# Internet Problem Solving Contest

## Solution to Problem E – Encoded Messages

The first and most simple tool of cryptographers are letter frequencies. For the easy input we get:

```_: 104        h: 20           b: 9
e: 61         l: 17           v: 8
t: 52         w: 15           y: 7
o: 41         d: 15           x: 2
n: 39         c: 15           p: 2
i: 38         u: 12           k: 2
a: 37         m: 11           j: 1
s: 34         f: 10           z: 0
r: 29         g: 9            q: 0
```

In English, the most frequent ten letters, in order, are ETAINOSRLD, and of course a space is more frequent than any of the letters. This almost exactly matches the results we got from the ciphertext.

This may lead us to the conclusion that we see the actual letters of the plaintext. The reason why they don't make sense is that they are rearranged in some way.

The problem statement gave a hint about permutations being used. And indeed, this is a simple permutation cipher. The key in this cipher is a fixed permutation on N elements. The encryption of a given plaintext works as follows: The text is divided into groups of N letters, and the letters in each group are rearranged according to the key permutation. (Decryption works in the same way, we just use the inverse permutation.)

In our case, a permutation with N=8 was used. Probably the best way how to find it was to use brute force – try out all reasonably small permutations, filter the really bad results, and examine the remaining ones.

A simple way of filtering is based on pairs of consequent letters: if there are too many uncommon pairs in the output, it is most probably wrong. A more sophisticated way is to use a dictionary, and only output those possibilities where we get a lot of known words in the decrypted text. We will show an even more sophisticated method later.

The plaintext:

```Place a needle on a table. Then, with your eye on the level of
the table, look at it side-ways, and you see the whole length of it;
but look at it end-ways, and you see nothing but a point,
it has become practically invisible. Just so is it with one
of our Women.  When her side is turned towards us, we see her
as a straight line; when the end containing her eye or mouth --
for with us these two organs are identical -- is the part that meets
our eye, then we see nothing but a highly lustrous point;
but when the back is presented to our view, then -- being only
sub-lustrous, and, indeed, almost as dim as an inanimate object --
her hinder extremity serves her as a kind of Invisible Cap.

The dangers to which we are exposed from our Women must now be
manifest to the meanest capacity in Spaceland. If even the angle
of a respectable Triangle in the middle class is not without
its dangers; if to run against a Working Man involves a gash;
if collision with an officer of the military class necessitates
a serious wound; if a mere touch from the vertex of a Private Soldier
brings with it danger of death; -- what can it be to run against
a Woman, except absolute and immediate destruction? And when a Woman
is invisible, or visible only as a dim sub-lustrous point,
how difficult must it be, even for the most cautious,
always to avoid collision!

The answer is the first word from the title of the book that contains
this excerpt. By the way, we truly recommend that you read this book
after the contest is over.
```

The book is "Flatland: A romance of many dimensions" by Edwin A. Abbott. It tells about the adventures of a square in his 2-dimensional, and in our 3-dimensional world. And we can only repeat the recommendation. Do read it.

The answer, of course, was FLATLAND.

In this case, the results of the frequency analysis seem to be less pleasant:

```h: 45           z: 15            u: 5
f: 36           x: 12            d: 4
i: 30           s: 11            l: 3
b: 27           y: 10            c: 3
w: 22           g: 10            q: 2
p: 22           a: 10            m: 1
r: 20           n: 9             o: 0
k: 19           e: 9             j: 0
v: 17           t: 6             _: 0
```

But still we may note that the letter frequencies for different letters differ. If we translate the frequencies into percents, we will see that the numbers quite closely match known percents for the English language... only the actual letters differ.

This suggests that a substitution cipher may have been used – a cipher where all occurences of a given letter are replaced by some other letter. For example, each H in the ciphertext was probably an E in the plaintext.

Substitution ciphers can often be done by hand, once you get the most frequent letters right the rest is simple. Sadly, our attempts are doomed to fail, the cipher is a bit more difficult than this. In addition to the substitution cipher, a permutation cipher was used. (In this case we used a permutation with N=6.)

Brute force is not an option any more. Even if we knew the permutation length, there would still be 6!x26! possible keys. We have to use a more clever way of deciphering this text.

We will start by finding the tetragram frequencies for the English language. (A tetragram are four consecutive letters of the text.) We can do this by analysing some large English text, such as several works of fiction. Once we have the tetragram frequencies, we have a pretty powerful weapon. We can use them to say how similar to English a given string "sounds".

Let `goodness(S) = sum_{T is a tetragram in S} log(frequency(S))`. The more unexpected a tetragram is, the lower is the logarithm of its frequency, and the lower is the resulting sum. In other words, `goodness(S)` tells us how common are the tetragrams of S in English text – and this is exactly what makes a text sound English.

A side note: why did we use logarithms? One can come to this idea by experimentation only. However, there is deeper mathematics involved. We will only sketch the general idea. Let E be an event with probability p. If E does occur, the surprise that this happened can be defined as log_2(1/p). (E.g., if we flip a coin, the surprise when we get a tail is 1.) Note that this "surprise" is exactly the amount of bits of information we get when the event happens. Now suppose that we generate an English-sounding text using tetragram frequencies only. (This is called a Markov source.) Then `goodness(S)` is inversely proportional to the surprise that we got the string S using our source. In other words, the higher `goodness(S)`, the lower the surprise, the better the string.

Using this measure we may write a simple hill-climbing algorithm to solve a substitution cipher: Start with a random permutation of letters. In each step, try to modify the current permutation by swapping the images of two letters. If some such changes lead to better-sounding plaintexts, pick the best one and continue. Otherwise, restart the whole process.

On a sufficiently long ciphertext and with a reasonably fast machine, this algorithm is able to find the solution in a matter of a few seconds. This is already enough to brute force the rest of this cipher. However, there is still plenty of space for major improvements. (We are aware of some, but we won't spoil you the fun of looking for them.)

Here is the deciphered plaintext:

```Regardless of what you may think, it is not easy to guess the first word
of the first sentence.

The proof you should submit are three uppercase letters used to denote
the young sidekick of the Bastard Operator From Hell.

The rest of this input only serves to make the plaintext longer
and easier to decrypt.

A memorandum of the wager was at once drawn up and signed by the six
parties, during which Phileas Fogg preserved a stoical composure. He
certainly did not bet to win, and had only staked the twenty thousand
pounds, half of his fortune, because he foresaw that he might have to
expend the other half to carry out this difficult, not to say unattainable,
project. As for his antagonists, they seemed much agitated; not so much by
the value of their stake, as because they had some scruples about betting
under conditions so difficult to their friend.

The clock struck seven, and the party offered to suspend the game so
that Mr. Fogg might make his preparations for departure.

"I am quite ready now," was his tranquil response. "Diamonds are
trumps: be so good as to play, gentlemen."
```

The BOFH is a fictional character from short stories (and later columns in The Register) written by Simon Travaglia. Another classic worth reading! Start right here and then move on to The Register part.

As you can find in the Wikipedia article (and of course in the texts!), his young assistant is known as the PFY – the Pimply-Faced Youth. This was the correct answer.