IPSC 2006 Solution to Problem M – Matrioska

IPSC 2006

Solution to Problem M – Matrioska

This problem was rather easy - at least concerning its idea. All you had to do was to find out what 'envelope' format was used, decode it and repeat this several times (with different envelopes, of course).

We used several common file formats - ZIP, WAV, PS, some general encodings - UUEncode, Base64 and several programming (or other) languages - Pascal, C/C++, TeX, Befunge and Whitespace. There are tools/compilers/viewers available for opening or otherwise processing all of these formats, see the Reference section below.

Generally, Google is your very good friend for this task. If there is any human-readable header in the data (at any step in decoding), searching for relevant part of it usually reveals what format it is. Another useful tool is file on most UNIX systems, which can most of the time tell you exactly what the format is. However, this is not always the case - it would be too simple then. Finally, Wikipedia always contains basic description and links to other resources and tools.

EASY input

So, let's start. In the following, each step starts with format used and a short description of how to guess it and what to do next.

UUEncode - get this file
Format used for encoding binary attachments to e-mails (now obsoleted by MIME). Googling for begin 644 does the trick; many tools for decoding are available.
C/C++ source code - get this file
This should be straight-forward for all of you.
WAV file - get this file
Probably the most common file format for storing short audio sequences, promoted by Microsoft. It can contain any actual sound encoding, however, in this case raw PCM (8bit) is used and any audio player should play this.
Morse code - get this file
Standardized (1865) encoding for transmitting characters into a sequence of short and long beeps. If it is too fast for you, try increasing the r=5 constant.
Plain text - get this file
"THAT WAS TOO EASY TRY HARD ONE"; this is clearly the original message, we are done.

HARD input

Base64 - get this file
This is probably the most common encoding these days for binary e-mail attachments, used in MIME (e-mail format capable of packaging several parts into a single e-mail message); you can note that only symbols A-Z, a-z, 0-9 and +/ are used (64 in total) with some = signs at the end, hence the name. Many tools (both online or offline) available, you can even create an e-mail message by copy&pasting this into e-mail source and looking at the result in any e-mail client.
ZIP archive - get this file
One of the most popular data compression format, with PK signature at the beginning. Some operating systems can even open this format directly (after adding the .zip extension, probably).
PostScript - get this file
Special page-description language for (hi-end) printers. It is a stack-based general-purpose Turing-complete programming language with special commands for describing graphic elements within a page. Looking at the header is enough to guess the format. If you have a PostScript-capable printer around, just print it; otherwise, use GhostScript or any GUI for it.
Befunge source code - get this file
This could be a bit tricky to guess. First you need to write the code to a text file since you have only a screenshot of it (or use some OCR software, but that would probably take you much longer). Befunge is one of many programming languages which look a bit funny and most coders don't use it for regular code. Basically, the symbols <, >, ^ and v control the execution flow over a program written in 2-dimensional grid. It is stack-based and has some primitive operations like printing a symbol or branching according to the value on top of its stack.
Pascal source code - get this file
Another straight-forward one. Created by Niklaus Wirth (1970) for educational purposes.
Pascal source code - get this file
Another straight-forward one. Created by Niklaus Wirth (1970) for educational purposes.
Pascal source code - get this file
Another straight-forward one. Created by Niklaus Wirth (1970) for educational purposes.
Pascal source code - get this file
OK. Due to limited disk storage on our server, we've decided not to continue to describe each intermediate step. What we are dealing with here is known as a quine, program which outputs its source code when compiled and run. It should be clear that we won't get anything else that this source code. That's why the problem description states that in the case of being stuck, you should try again from the beginning. Clearly, this is the right time to do it.
TeX source code - get this file
TeX is a typesetting system created by Donald E. Knuth. If you look carefully into the file, you will notice 2 parts of a file, which look a bit different than the (PostScript) rest. This is the first one (after adding some spaces and newlines):
()(0) eq { \catcode`(=0 (catcode`)=9 (font (sm)=cmr5 at0.1pt) (global(sm)) (global(output)={(setbox254)(box255)})) } if
The first and last line is valid PostScript source - but since empty string ("") is not equal to string containing zero ("0"), the middle part is never executed by PostScript interpreter/printer.
However, if this file is fed to TeX (without any modification), the string ()(0) eq is treated as a text to typeset (as is all the previous junk in the file), curly braces denote a block in TeX and the second line is regular TeX command (all TeX commands start by a back-slash character by default). This command - \catcode`(=0 tells TeX that also symbol ( should act the same way as \ does - to mark a TeX command. So from this point (within the current block) (command is exactly the same as \command, but looks more like PostScript (which doesn't use \ characters at all).

In order for PostScript interpreter to silently skip this code, we have to make sure our TeX code is well-parenthesized, ie, has a matching ) for every ( and these are in proper order. Since our TeX source would typeset many )s, we just define ) to be silently skipped (like any other whitespace).

The rest of TeX code is just for making sure that we don't actually put any of previous/following PostScript 'junk' to the output.

The second TeX part (can be found by searching for another \) is similar to this one, it just outputs something useful to TeX's output. You can try to find out how it does that; TeXbook is highly suggested. But for our purposes, it is enough to let TeX process our input file and look at the output.
C/C++ source code - get this file
Straightforward, once again.
Whitespace source code - get this file
Another tricky one. The output of the C code from previous step is just 50 empty lines. This doesn't look like a message.. But wait, are those line really empty?
Plain text - get this file
"IPSC RULEZ". Got it.

References:

http://en.wikipedia.org/wiki/Base64
http://en.wikipedia.org/wiki/Befunge
http://en.wikipedia.org/wiki/Morse_code
http://en.wikipedia.org/wiki/PostScript
http://en.wikipedia.org/wiki/Tex
http://en.wikipedia.org/wiki/Uuencode
http://en.wikipedia.org/wiki/Wav
http://en.wikipedia.org/wiki/Whitespace_programming_language
http://en.wikipedia.org/wiki/ZIP_(file_format)

Internet Problem Solving Contest

IPSC 2006

Solution to Problem M – Matrioska

EASY input

HARD input

References: