Both the easy and the difficult data set contain a program that is too inefficient to run in a reasonable time. Therefore a solution of the problem consists of two steps:
The program in the easy data set generates all possible combinations of 50 distinct numbers from 1,2,...,300. After each combination is generated in array a, character * is printed to the output.
One possible way how you could discover the purpose of the program was to substitute some small numbers for 300 and 50 and to display values of array a during the computation.
It is well known that the number of combinations of 50 elements from 300-element set is (300 choose 50) = 300! / (50! * 250!) = (300*299*298*...*250) / (50*49*48*...*1). This number is too large to be stored even in 32-bit integer. You can either write your own arithmetic of large numbers or to use a programming language that provides this feature. If you need to write your own arithmetic, you may preprocess the expression so that you need to program only multiplication of a large number by a small number, which is quite simple.
In our solution we have used Scheme programming language that has built-in large numbers. The program computes (n choose k) recursively using formula (n choose k) = (n choose k-1) * (n-k+1) / k. The result is
It is not very important to study what functions boo and foo do. The key observation is that function boo initializes the array which it gets as an argument, and function foo changes the value of this array so that the new value depends only on the previous value of the array. Thus we can make a sequence of values of an array, where the first value in the sequence is the value assigned by boo and each successive value is obtained from the previous one by using foo.
The program tries to find the smallest i such that there exist j<i such that the i-th and j-th value in the sequence are the same. The program writes exactly i-1 stars. The program uses a straightforward and inefficient way how to find such value of i - for each i=2,3,4,... it tries all values of j=1,2,3,...,i-1.
One way how to solve the problem was to use hash tables to store all values of a sequence until we generate a value that is already found in our hash table. Then the current element is the first element which is repeated. This program uses a lot of memory and the bad thing is that we do not know in advance how much memory will be needed.
Another very efficient method is known as Floyd's method. It can be used in general for finding periods (or cycles) in sequences that have a property that each element is uniquely determined by the previous element. First of all realize that once some element is repeated in the sequence, the sequence must be periodic from that point on. Therefore we can view the sequence as having some pre-period of unknown length p followed by an infinitely many times repeating period of length d. For example sequence 5, 4, 3, 1, 2, 1, 2, 1, 2, 1, 2, ... has p=3 and d=2. We want to compute number p+d because the (p+d+1)-st element is the first one that repeats in the sequence (it is the first element in the second occurrence of the period).
The Floyd's method works as follows. You maintain two pointers in your sequence so that the first pointer points to the i-th element when the second pointer points to the 2i-th element. In each step we move the first pointer to the next element and we move the second pointer twice (to the element after the next element). It is guaranteed that if a sequence is periodic then i-th and 2i-th elements will be the same for some i<p+d. In this way we find some element which is in the cycle and we find it in reasonable time.
Once you have a pointer inside a period, it is not difficult to find the length of the period d - just keep one pointer on its place and move the other pointer until the pointers point to elements with the same value. The length of pre-period is found as follows. Move two pointers, the first starts at the first element of the sequence and the second one starts at the (d+1)-st element. You move them simultaneously so that the second is always d elements apart from the first one. Once the corresponding elements have the same value, you have found the first element of the period.
Overall time complexity of this algorithm is O(d+p) and it needs a constant amount of memory. The solution for our sequence is