IPSC Logo

Internet Problem Solving Contest

IPSC 2015

Solution to Problem J – Juicy Dot Coms

When dealing with an unknown file format, the extension of the filename can be a good hint. Wikipedia is quite helpful when it comes to .COM files.

Once you found out that it’s an old MS-DOS binary, you probably wanted to run it – but didn’t have a copy of MS-DOS lying around any more. Luckily for you, DOSBox runs almost everywhere, and can run these binaries as well. That was actually sufficient to solve the easy subproblem, as described below.

Hollywood hacking

The easy subproblem was actually solvable without any need for understanding how computers work – almost like hackers break passwords in movies. The program gives immediate feedback when an incorrect character is pressed. This means we can figure out the password on character-by-character basis, simply by entering the known prefix of the password and exhaustively trying to extend it by one more character. Once you discovered that the program doesn’t terminate if you press an uppercase A, you probably knew you were on the right track.

Still, it was beneficial to look at the “code” as well, as the easy subproblem could prepare you for the hard one.

16-bit machine code basics

Apparently, the files consist of raw machine code for a 16-bit processor. There are many ways of turning the machine code into something more readable; a few can be found in this Stack Overflow answer.

In order to understand the dissassembled output, one would need a quick course in x86 assembler. It’s Wikipedia to the rescue again: X86 assembly language!

For our purposes, it would be sufficient to understand that we are dealing with a few 16-bit values (so-called “registers”) which are named ax, bx, cx, dx, si, and di. Each of them can be split into two 8-bit halves: e.g., the higher 8-bit half of ax is called ah, and the lower 8-bit half is called al.

Most of the instructions are named reasonably enough to infer their meaning from the name. The syntax [X] denotes access to memory location [X]; similar to dereferencing a C-style pointer.

The only tricky instruction is “int 0x21”, which was, back in the MS-DOS days, used for calling various functions provided by the operating system – such as reading input from the user or writing output. The definitive guide to these functions (and a lot more) is Ralf Brown’s Interrupt List (RBIL).

Disassembling the easy subproblem

Now, we are ready to do some disassembling! From one of the above links, we have learned that the .COM file is loaded into memory so that its first instruction is at the address 100h. Hence, we may want to run a command like this one:

$ ndisasm -b16 -o100h j1.com 

The output starts with:

00000100  BA3001            mov dx,0x130
00000103  B409              mov ah,0x9
00000105  CD21              int 0x21

RBIL tells us this block just writes out a piece of text at the address that is stored in the dx register. A quick look into the position 0x30 of our file (remember, the first byte of the file is at address 0x100 in memory) makes it clear it is just the introduction message. Moving on.

00000107  BE2002            mov si,0x220 
0000010A  BF5E02            mov di,0x25e
0000010D  B90A00            mov cx,0xa
00000110  BA0102            mov dx,0x201

Similar reasoning allows us to find that si now points to the string “7ialnjpbjd”, di to “Pythagoras” and dx points to the message displayed after entering an incorrect password. Finally, cx has the value 10, which is suspiciously similar both to the length of the strange string and to the philosopher’s name.

If we now skip ahead a bit, we’ll see the following sequence of instructions:

0000011F  46                inc si
00000120  47                inc di
00000121  49                dec cx
00000122  75EF              jnz 0x113

The last two instructions are essentially a do-while loop: the register cx is decremented by one and if it didn’t hit zero yet, execution continues back at the address 0x113. Thus, cx serves as a loop counter, starting at 10 and decreasing with every iteration. At the same time, the registers si and di move forward over the respective strings they pointed at initially.

The instruction immediately following the loop is

00000124  BA3A02            mov dx,0x23a

and a quick check of the corresponding location in the file tells us we’re on the right track – the position 0x13a corresponds to the success message we want to see! The only way of reaching this point in the program is to run the loop through all the 10 iterations.

The body of the loop looks as follows:

00000113  B401              mov ah,0x1
00000115  CD21              int 0x21
00000117  8805              mov [di],al
00000119  28C8              sub al,cl
0000011B  3A04              cmp al,[si]
0000011D  7508              jnz 0x127

RBIL tells us that first two instructions just cause one character to be read from input. This character is then used to overwrite one character of “Pythagoras” and it is compared to one character of that strange string – but only after the current value of the loop-interator is subtracted from it.

If they differ (their difference being “non-zero”, thus jnz = “jump if not zero”), the program continues at location 0x127. Quick glance at that location reveals that it just contains one display-a-message request followed by program termination:

00000127  B409              mov ah,0x9
00000129  CD21              int 0x21
0000012B  B8004C            mov ax,0x4c00
0000012E  CD21              int 0x21

Since we need to go through all the iterations of the loop, we need to make sure the comparison always ends up true. This means the first character we input must be equal to the first character of the “7ialnjpbjd” string plus 10, the second one must be equal to the second character of this string plus 9, and so on. Adjusting the characters in the string reveals the password to be “Aristotele”.

(And indeed, if you enter Aristotele from your keyboard into the program, it will verify it and output a confirmation that this is indeed the password.)

Decompiled easy

Here’s some pseudocode that is roughly equivalent to j1.com:

    write("... welcome...");

    char *pkey = "7ialnjpbjd"; // register "si"
    char *presult = "Pythagoras"; // register "di"
    int count = 10; // register "cx"
    char *message = "... failed..."; // register "dx"

    do
    {
        char al = read_character_from_input();
        *presult = al;
        if (al - count != pkey)
            goto finish;
        pkey++;
        presult++; 
        count--;
    } while(count > 0);
    message = "... success..."

finish:
    write(message);
    exit(0);

Disassembling the hard subproblem

The hard sub-problem is slightly more complicated, but if you didn’t follow the movie-hacker path, you should be ready for it! Unlike the easy case, this program doesn’t take any input; it just displays three strings and terminates. The second of those messages is displayed in a subroutine at address 0x288 and mentions the program as being “cracked”. A subroutine is entered using “call” instruction and finishes when the execution reaches the “ret” one. Looking carefully, we can see that there is a large block within the program which looks reasonable enough, yet is never entered. This block starts at address 0x116 and seems to try to write some strings – which turn out to be mentioning “license”. Last but not least, it contains a “ret” (at 0x15c). All of these observations suggest this block is the license-checking function which the wannabe-cracker replaced by a call to his own code.

As the first step, let’s “un-crack” the program by modifying the “call” instruction at address 0x107. Call instruction consists of three bytes; the second and third of which determine the address to be called – they represent a signed 16-bit value which corresponds to a relative offset between the next instruction and the desired address to be called. In the cracked program, these two bytes represent value 0x17e (0x10A + 0x17E = 0x288). Replacing them by 0x10c (i.e. 0x0c, 0x01) changes the call destination to 0x10A + 0x0c = 0x116, just as we desired.

Doing this removes the crack… but of course, if you now run the code, it produces a complaint about license information not being valid.

Let’s see how the license is validated. A quick look tells us that the whole validation process takes place in the block starting at address 0x11d and if we reach address 0x139 in a legitimate, non-cracky way, we have won – since that’s where a message about license being valid is displayed. Conditional jumps leading to address 0x10d correspond to failed checks – if a program jumps to that address, it will only display a message and terminate.

Let’s see what those conditions are!

0000011D  A13302            mov ax,[0x231]
00000120  8B1E3502          mov bx,[0x233]
00000124  BA2402            mov dx,0x220
// Okay, we will be dealing with two 16-bit values: ax and bx
// "dx" is only used for the message about license being bad.

00000127  B117              mov cl,0x17
00000129  39C3              cmp bx,ax
0000012B  72E0              jc 0x10d
0000012D  83F800            cmp ax,byte +0x0
00000130  74DB              jz 0x10d
00000132  29C3              sub bx,ax
00000134  93                xchg ax,bx
00000135  FEC9              dec cl
00000137  75F0              jnz 0x129

Translating this block into a pseudocode yields (writing A and B instead of ax and bx):

    for (count = 23; count != 0; count--)
    {
        if (B < A || A == 0)
            fail;
        B -= A;
        swap(A, B);
    }

Since we are dealing with 16-bit values, it is easy to determine the correct values of A and B by exhaustive search and find that the only pair which survives through the 23 iterations is A = 28657 and B = 46368. Just out of curiosity… do you happen to know what the 23-rd Fibonacci number is? No, it’s most definitely not a coincidence.

Finally, we need to place these numbers at offset 0x131 in the file (that’s where ax and bx were initialized from), overwriting the conveniently located string IPSC. We have 0x6ff1 = 28657, while 0xb520 = 46368, so the final byte-level representation would be 0xf1, 0x6f, 0x20, 0xb5 (remember, we are working with a little-endian platform). Doing so yields the final executable which, upon execution, reveals the answer: “LeonardoPisano”.