# Internet Problem Solving Contest

## Solution to Problem K – Kirk or Picard?

Linear congruential generators (LCGs) are one of the most popular sources of pseudorandom numbers. The main reasons why this is the case: they are very fast and simple. However, the quality of the random numbers they provide isn’t very high.

Additionally, not all LCGs are equal. By its nature, each LCG is periodic, but different LCGs can have periods of different lengths and with different patterns. Some of them will pass at least some basic statistical tests, some won’t.

In this problem we used two LCGs that aren’t really among the best ones.

The first sequence was generated by a distorted version of the LCG used in Java’s java.util.Random, POSIX [ln]rand48, and glibc [ln]rand48[_r]. This generator uses m = 248 − 1, $a={\rm 5DEECE66D}_{16}$, and c = 11. Its official implementations output the 32 highest-order bits of the internal state (bit 47 down to bit 16 of the current remainder modulo m).

In our implementation we distorted this LCG: instead of outputting the highest-order bits (which have a large period and behave nicely) we output the 24 lowest-order bits.

The second LCG is the generator used in random0 to generate random floating-point values between 0 and 1. This generator has m = 134 456, a = 8 121, and c = 28 411. The original generator outputs the value s/m after each step, where s is the current state. Again, our generator is actually a distorted form of this LCG: instead of doing this division, we simply output bits 16 down to 0 of the current state.

In the second LCG it should be quite obvious that there are multiple significant issues. First of all, the modulus is not a power of 2, which is why bit 16 of the current state is more likely to be 0 than 1. On the other end of the state we can observe an even simpler pattern: if the current state is odd, the next one will be even and vice versa. That should be pretty easy to observe and then to predict :)

There are many ways to solve this task. The correct submissions included both ones that barely made it over the 2500 correct per sequence, but there was also a lot of submissions that got something like 3998 + 3708 bits correct.

A painful but reasonable way of solving this task is based on statistical tests. Code up a bunch of tests, find irregularities, and use those to get good predictions for the missing bits.

One particular test that is worth looking at: for each pair of small integers 0 ≤ q < p take all bits on positions pi + q and for all small k count the number of occurrences of each k-bit substring.

Our favorite way of solving this task is to realize that there are some simple dependencies between adjacent bits. We don’t know what they are, but we don’t have to. We can just take some standard classifier (e.g., something from scikit-learn), train it using the known parts of the sequence (“in this context expect this bit”) and then use it to predict the missing bits from their context.

(You can easily verify whether this approach works for a particular classifier by dividing the training data you have into training and validation data. In other words, train on 90% of data you have, verify that it predicts the remaining 10% well, and then use the trained classifier to fill in the actual blanks.)

We are looking forward to hearing from you: what method did you use? How painful was it?