# Internet Problem Solving Contest

## Solution to Problem L – Luxor Catch-ya!

### Easy subproblem

Looking at the pictures of captchas from the easy subproblem, we can quickly see that they look rather simple – they are just slightly noisy versions of the reference alphabet. Moreover, the captcha letters are clearly separated, each in its own 100x70 block.

This hints at a simple idea: just count how many pixels have the same color in both images. Unfortunately, looks can be deceiving – if you wrote the simple comparison code, you would quickly discover that the accuracy is not very good. The problem is that the images are also slightly shifted by a few pixels. Fortunately, the inputs are small enough that it is possible to check all the reasonable shifts by brute force.
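The brute-force idea above can be sketched as follows. This is a minimal illustration, not the authors' actual code: it assumes images are given as 2-D lists of 0/1 pixels, and the `references` mapping from letters to reference images is a hypothetical helper.

```python
def match_score(block, ref, dx, dy):
    """Count pixels that agree when `ref` is shifted by (dx, dy)."""
    h, w = len(block), len(block[0])
    score = 0
    for y in range(h):
        for x in range(w):
            ry, rx = y + dy, x + dx
            if 0 <= ry < h and 0 <= rx < w and block[y][x] == ref[ry][rx]:
                score += 1
    return score

def classify(block, references, max_shift=3):
    """Try every shift in [-max_shift, max_shift]^2 and keep the best letter."""
    best_letter, best = None, -1
    for letter, ref in references.items():
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                s = match_score(block, ref, dx, dy)
                if s > best:
                    best, best_letter = s, letter
    return best_letter
```

With 100x70 blocks, 26 reference letters, and shifts of a few pixels, this is only a few million pixel comparisons per letter, which is easily fast enough.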

If you think you have a solution that’s almost correct, there is a neat trick you can use: use your solution’s outputs to generate a large bitmap of the pictures it thinks are correct. Generate a second such bitmap by concatenating the actual inputs. Then, open a graphics editor and place one on top of the other. You will quickly see the differences, and if there are only a few of them, you can easily fix the output by hand.

### Hard subproblem

Captchas in the hard subproblem pose more of a challenge. By observing a few of them, we can conclude that the letters are still separated. Unfortunately, unlike in the easy subproblem, the captchas here are heavily distorted in both the x and y directions. Also, the problem statement hints that we do not have to get all of them correct (though the tolerance is not very big). Finally, we provided an example set of 200 correctly labelled captchas, i.e., 1200 symbols in total. As you may have guessed, this was quite intentional, and the easiest way to solve this problem was to apply machine learning techniques.

There are several possible machine learning techniques. However, for the domain of image/letter recognition, the best choice is probably a convolutional neural network (although SVMs with handcrafted features can also score very high). In our case, we used a fairly standard setup for a convolutional network, slightly fine-tuned to quickly downsample the data. Our layers are:

• a 100x70 input layer
• a 7x7 convolution layer with 10 filters
• a maxpool layer pooling 5x5 blocks into a single value
• a 5x5x(10 channels) convolution layer with 10 filters
• a maxpool layer pooling 2x2 blocks into a single value
• the result of the previous layer is 10x7x(10 channels), i.e., 700 values after flattening
• a fully connected 700x70 layer; note that we apply dropout to this layer during training
• a fully connected 70x26 layer with a softmax activation function (a one-hot encoding of which letter of the alphabet it is)
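The dimension bookkeeping in the layer list can be double-checked with a few lines of arithmetic. This assumes "same" padding for the convolutions (so they preserve spatial size, an assumption not stated in the text) and non-overlapping max-pooling; it is shape arithmetic only, not an actual network.

```python
def conv_same(h, w, channels, filters):
    # "same"-padded convolution keeps the spatial size, changes channel count
    return h, w, filters

def maxpool(h, w, channels, k):
    # non-overlapping k x k pooling divides each spatial dimension by k
    return h // k, w // k, channels

shape = (100, 70, 1)                      # input layer
shape = conv_same(*shape, filters=10)     # 7x7 conv, 10 filters -> 100x70x10
shape = maxpool(*shape, k=5)              # 5x5 maxpool          -> 20x14x10
shape = conv_same(*shape, filters=10)     # 5x5 conv, 10 filters -> 20x14x10
shape = maxpool(*shape, k=2)              # 2x2 maxpool          -> 10x7x10
flat = shape[0] * shape[1] * shape[2]     # flatten: 700 values, matching
                                          # the 700x70 fully connected layer
```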

With this setup, the network learns the given labelled dataset pretty quickly, and we were able to obtain >99.5% per-captcha accuracy.
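Note that per-captcha accuracy is stricter than per-letter accuracy: a six-letter captcha is correct only if all six letters are. Assuming letter errors are independent (a simplifying assumption), the required per-letter accuracy is roughly:

```python
# A six-letter captcha is correct only if every letter is classified
# correctly, so p_captcha = p_letter ** 6 under independence.
p_letter = 0.9992          # roughly the per-letter accuracy needed
p_captcha = p_letter ** 6  # about 0.9952, i.e., above the 99.5% mark
```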

Even the example convolutional network from Keras.io works quite well.

### But what if I hate machine learning?

Use Amazon Mechanical Turk! Actually, sorry, but you can’t. That would violate the contest rule “It is strictly forbidden to communicate with people other than your team members about issues that concern solving the problems.” But not everything is lost. Indeed, if you were bored during the second half of the contest, guess what: you can be the mechanical turk – with three people on a team, it is a measly 1200 letters per person to recognize (out of the 800 captchas, only 600 are unlabelled, so there are 3600 letters in total and 3 people). And this can go even further – the winning non-machine-learning strategy would go something like this:

1. Realize that humans are much better at “comparison” than at “classification”
2. Sort the images by similarity. This could be something as simple as the “diff” from the easy subtask or something more complex
3. Write a simple program or a webpage which just displays a captcha letter and a few options (sorted by relevance) on which you can click
4. Click through the sequence
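Step 2 of the strategy above can be as simple as ranking the reference letters by the pixel-diff distance from the easy subtask, so the human verifier sees the most plausible options first. A minimal sketch, again assuming images are 2-D lists of 0/1 pixels and `references` is a hypothetical letter-to-image mapping:

```python
def pixel_distance(a, b):
    """Number of differing pixels between two equally sized 0/1 images."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def ranked_options(image, references):
    """Candidate letters for `image`, most similar reference first."""
    return sorted(references, key=lambda ch: pixel_distance(image, references[ch]))
```

The webpage from step 3 would then show `ranked_options(image, references)[:k]` as clickable buttons, so most answers are one click on the first option.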

### The golden middle way

Maybe you came up with something (perhaps a standard neural network, an SVM, or another machine learning technique) but realized that its accuracy is way below what was needed. Don’t worry – most machine learning tools also output the “probability” of the classification being right. So, the solution is to let the machine classify automatically, then take the worst predictions (according to their probability) and classify those manually. This saves you a lot of human work while not requiring expensive fine-tuning of the machine learning algorithms.
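The triage step can be sketched in a few lines. The classifier here is a stand-in that is assumed to return `(label, probability)` pairs; only the `budget` least confident predictions are sent to a human.

```python
def triage(predictions, budget):
    """Split prediction indices into auto-accepted and manual-review sets.

    `predictions` is a list of (label, probability) pairs; the `budget`
    lowest-confidence items go to manual review.
    """
    order = sorted(range(len(predictions)), key=lambda i: predictions[i][1])
    manual = set(order[:budget])                     # least confident first
    auto = [i for i in range(len(predictions)) if i not in manual]
    return auto, sorted(manual)
```

Even a modest manual budget of a few hundred letters can push a mediocre classifier's overall accuracy past the required threshold.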