IPSC 2004 Solution to Problem A – Andrew's revenge

IPSC 2004

Solution to Problem A – Andrew's revenge

Easy input

The most important part of the program is the function mex():

long long mex(long long a, long long b, long long c) {
  long long d;
  if (b==0) return 1;
  if (b==1) return a%c;
  d=mex(a,b/2,c); d*=d; d%=c; d*=mex(a,b%2,c); d%=c;
  return d;
}

We can easily find out that for a,b,c>=0, the return value of mex() is exactly a^b mod c. (With one exception: mex(0,0,x) is 1, whereas 0⁰ is undefined.)

The output file contains triplets a,c,mex(a,b,c). To produce the input file, we have to find the value of b. This is the discrete logarithm problem. This problem is assumed to be computationally hard. In fact, some cryptosystems (e.g. the ElGamal cryptosystem) are based on similar problems and would be broken, if we could compute discrete logarithms fast. However, most of the outputs were small enough to be solved by an exhaustive search. Note, that to identify b, we only need to try values between 0 and c-1. In fact, we may stop the search as soon as a^b is equal to some previously computed power of a.

A few "problematic" triplets:

0 47 1
For this triplet, the only valid solution is b=0; this is the special case mentioned above.
2 987654321 536870912
The modulus is large, but 536870912=2²⁹, therefore b=29. (This can also be found by the exhaustive search.)
21341241 941241241 757104002
Answer for this triplet was given to you for free. It is listed in the comments in the source code along with the correct value of b=123412321.
2 1000000007 3
This triplet can also be solved by looking at the examples in the code. We seek the value dlog₂(3) (modulo 1000000007). We do not know this value, but we are given the values dlog₂(72) and dlog₇₂(3). The answer is the product P of these two values. As we have to find a b satisfying b<c, the actual answer is b=P mod (1000000007-1).

Hard input

The program encrypts the input text (actually, only the uppercase letters) using the Hill cipher. The encryption key is an N×N matrix A, the decryption key is the inverse matrix of A. (Inverse matrix of the matrix A is a matrix A' such that AA'=I, where I is the identity matrix – with ones on the diagonal and zeroes elsewhere. The matrix A has to be chosen in such a way that the inverse matrix exists.)

The encryption process works as follows. We map letters to numbers from 0 to 25. We divide the input sequence into N-letter words, each of the words will be encoded separately. To encode a word, we take the corresponding N-dimensional vector of numbers, multiply it by matrix A (computing modulo 26) and convert the result back into letters. In this way, we obtain N new letters, each of them being a fixed linear combination of the original letters.

Suppose we know matrix A. If we knew the inverse matrix A', decryption would be easy – we would simply encode the ciphertext using matrix A' instead of A, obtaining the original plaintext. The inverse matrix can be computed using Gaussian elimination.

It is possible to obtain the same algorithm without reasoning about inverse matrices. To decrypt a single word, we consider the unknown letters to be variables. The matrix A and the ciphertext represent a set of N linear equations by which these variables are determined. Again, we can solve this system of linear equations by Gaussian elimination.

However, we still have to find the matrix A. The program computing the matrix is pretty slow and we will have to optimize it.

How does the program generate the matrix A? It starts with the identity matrix. In each iteration, a regular invertible matrix B is generated and A is multiplied by it. (As a matter of fact, these matrices correspond to "multiplying a row of A by a constant" and "adding one row of A to another row", but this doesn't play a role in our solution.)

Note that the value returned by the pseudo-random function RANDOM() (we will use the names from the C code) depends only on the previous value returned. Thus, the generated matrix B depends only on the number returned by the first call of RANDOM() in the loop. From the moment this number repeats, the sequence of generated matrices B is periodic.

A simple modification of our program shows that the 28360th matrix B is identical with the first one. Therefore the sequence of generated matrices is periodic with period 28359.

The loop is executed exactly 134620173=28359*4747 times. To compute A, we only have to compute the 4747th power of the product D of the first 28359 generated matrices. This is already feasible.

Perhaps the simplest way to compute the inverse of the matrix A is to modify the program. We may compute the required matrix A' by multiplying the inverses of the generated matrices B in reverse order. Finding the inverses of the generated matrices is trivial and to compute the actual product we use exactly the method described above.

Internet Problem Solving Contest

IPSC 2004

Solution to Problem A – Andrew's revenge

Easy input

Hard input