IPSC 2006
Solution to Problem M – Matrioska
This problem was rather easy - at least concerning its idea. All you had to
do was to find out what 'envelope' format was used, decode it and repeat
this several times (with different envelopes, of course).
We used several common file formats - ZIP, WAV, PS, some general encodings -
UUEncode, Base64 and several programming (or other) languages - Pascal, C/C++,
TeX, Befunge and Whitespace. There are tools/compilers/viewers available for
opening or otherwise processing all of these formats, see the Reference
section below.
Generally, Google is your very good friend for this task. If there is any
human-readable header in the data (at any step in decoding), searching for
relevant part of it usually reveals what format it is. Another useful tool
is file on most UNIX systems, which can most of the time tell you
exactly what the format is. However, this is not always the case - it would
be too simple then. Finally, Wikipedia always contains basic description and
links to other resources and tools.
EASY input
So, let's start. In the following, each step starts with format used and a
short description of how to guess it and what to do next.
- UUEncode - get this file
Format used for encoding binary attachments to e-mails (now obsoleted
by MIME). Googling for begin 644 does the trick; many tools
for decoding are available.
- C/C++ source code - get this file
This should be straight-forward for all of you.
- WAV file - get this file
Probably the most common file format for storing short audio sequences,
promoted by Microsoft. It can contain any actual sound encoding, however,
in this case raw PCM (8bit) is used and any audio player should play this.
- Morse code - get this file
Standardized (1865) encoding for transmitting characters into a sequence
of short and long beeps. If it is too fast for you, try increasing the
r=5 constant.
- Plain text - get this file
"THAT WAS TOO EASY TRY HARD ONE"; this is clearly the original message,
we are done.
HARD input
- Base64 - get this file
This is probably the most common encoding these days for binary e-mail
attachments, used in MIME (e-mail format capable of packaging several
parts into a single e-mail message); you can note that only symbols
A-Z, a-z, 0-9 and +/ are used (64 in total)
with some = signs at the end, hence the name. Many tools
(both online or offline) available, you can even create an e-mail message
by copy&pasting this into e-mail source and looking at the result in any
e-mail client.
- ZIP archive - get this file
One of the most popular data compression format, with PK signature
at the beginning. Some operating systems can even open this format directly
(after adding the .zip extension, probably).
- PostScript - get this file
Special page-description language for (hi-end) printers. It is a stack-based
general-purpose Turing-complete programming language with special commands
for describing graphic elements within a page. Looking at the header is enough
to guess the format. If you have a PostScript-capable printer around, just
print it; otherwise, use GhostScript or any GUI for it.
- Befunge source code - get this file
This could be a bit tricky to guess. First you need to write the code to
a text file since you have only a screenshot of it (or use some OCR software,
but that would probably take you much longer). Befunge is one of many
programming languages which look a bit funny and most coders don't
use it for regular code. Basically, the symbols <, >,
^ and v control the execution flow over a program written in
2-dimensional grid. It is stack-based and has some primitive operations
like printing a symbol or branching according to the value on top of its
stack.
- Pascal source code - get this file
Another straight-forward one. Created by Niklaus Wirth (1970) for
educational purposes.
- Pascal source code - get this file
Another straight-forward one. Created by Niklaus Wirth (1970) for
educational purposes.
- Pascal source code - get this file
Another straight-forward one. Created by Niklaus Wirth (1970) for
educational purposes.
- Pascal source code - get this file
OK. Due to limited disk storage on our server, we've decided not to
continue to describe each intermediate step. What we are dealing with
here is known as a quine, program which outputs its source code
when compiled and run. It should be clear that we won't get anything
else that this source code. That's why the problem description states
that in the case of being stuck, you should try again from the beginning.
Clearly, this is the right time to do it.
- TeX source code - get this file
TeX is a typesetting system created by Donald E. Knuth. If you look
carefully into the file, you will notice 2 parts of a file, which
look a bit different than the (PostScript) rest. This is the first one
(after adding some spaces and newlines):
()(0) eq {
\catcode`(=0
(catcode`)=9
(font (sm)=cmr5 at0.1pt)
(global(sm))
(global(output)={(setbox254)(box255)}))
} if
The first and last line is valid PostScript source - but since empty
string ("") is not equal to string containing zero ("0"), the middle part
is never executed by PostScript interpreter/printer.
However, if this
file is fed to TeX (without any modification), the string ()(0) eq
is treated as a text to typeset (as is all the previous junk in the file),
curly braces denote a block in TeX and the second line is regular TeX command
(all TeX commands start by a back-slash character by default). This command -
\catcode`(=0 tells TeX that also symbol ( should act the
same way as \ does - to mark a TeX command. So from this point (within
the current block) (command is exactly the same as
\command, but looks more like PostScript (which doesn't use \
characters at all).
In order for PostScript interpreter to silently skip this code, we have to make
sure our TeX code is well-parenthesized, ie, has a matching ) for every ( and
these are in proper order. Since our TeX source would typeset many )s, we
just define ) to be silently skipped (like any other whitespace).
The rest of TeX code is just for making sure that we don't actually put any
of previous/following PostScript 'junk' to the output.
The second TeX part (can be found by searching for another \) is similar
to this one, it just outputs something useful to TeX's output. You can try to
find out how it does that; TeXbook is highly suggested. But for our purposes,
it is enough to let TeX process our input file and look at the output.
- C/C++ source code - get this file
Straightforward, once again.
- Whitespace source code - get this file
Another tricky one. The output of the C code from previous step is just 50 empty
lines. This doesn't look like a message.. But wait, are those line really empty?
- Plain text - get this file
"IPSC RULEZ". Got it.
References:
http://en.wikipedia.org/wiki/Base64
http://en.wikipedia.org/wiki/Befunge
http://en.wikipedia.org/wiki/Morse_code
http://en.wikipedia.org/wiki/PostScript
http://en.wikipedia.org/wiki/Tex
http://en.wikipedia.org/wiki/Uuencode
http://en.wikipedia.org/wiki/Wav
http://en.wikipedia.org/wiki/Whitespace_programming_language
http://en.wikipedia.org/wiki/ZIP_(file_format)