Looking at the pictures of captchas from the easy subproblem, we can quickly see that they look rather simple: they are just slightly noisy versions of the reference alphabet. Moreover, the captcha letters are clearly separated, each in its own 100x70 block.
This hints at a simple idea: just count how many pixels have the same color in both images. Unfortunately, looks can be deceiving. If you wrote the simple comparison code, you would quickly discover that the accuracy is not very good. The problem is that the images are also shifted by a few pixels. Fortunately, the inputs are small enough that it is possible to check all the reasonable shifts by brute force.
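The brute-force idea above can be sketched as follows. This is a minimal illustration, not the reference code: images are assumed to be 2D lists of pixel values, and the function and parameter names (`count_matches`, `best_match`, `max_shift`) are ours.

```python
def count_matches(img, ref, dx, dy):
    """Count pixels that agree when img is sampled with offset (dx, dy)."""
    matches = 0
    for y in range(len(ref)):
        for x in range(len(ref[0])):
            sy, sx = y + dy, x + dx
            # Only compare pixels that fall inside the shifted image.
            if 0 <= sy < len(img) and 0 <= sx < len(img[0]):
                if img[sy][sx] == ref[y][x]:
                    matches += 1
    return matches

def best_match(img, refs, max_shift=3):
    """Return the index of the reference letter scoring best
    over all shifts in [-max_shift, max_shift]^2."""
    best_score, best_idx = -1, None
    for idx, ref in enumerate(refs):
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                score = count_matches(img, ref, dx, dy)
                if score > best_score:
                    best_score, best_idx = score, idx
    return best_idx
```

On the real data the reference images would be the 100x70 alphabet blocks; here any small 2D arrays work.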
If you think you have a solution that’s almost correct, there is a neat trick you can do: use your solution’s outputs to generate a large bitmap of the pictures it thinks are correct. Generate a second such bitmap by concatenating the actual inputs. Then open a graphics editor and place one on top of the other. You will quickly see the differences, and if there are only a few of them, you can easily fix the output by hand.
Captchas in the hard subproblem pose more of a challenge. By observing a few of them, we can conclude that the letters are still separated. Unfortunately, unlike in the easy subproblem, the captchas here are heavily distorted in both the x and y directions. Also, the problem statement hints that we do not have to get all of them correct (though the tolerance is not very generous). Finally, we provided an example set of 200 correctly labelled captchas, i.e., 1200 symbols in total. As you may have guessed, this was quite intentional, and the easiest way to solve this problem was to apply machine learning techniques.
There are several possible machine learning techniques. However, for the domain of image/letter recognition, probably the best choice is a convolutional neural network (although SVMs with handcrafted features can also score very high). In our case, we used a fairly standard setup for a convolutional network, slightly fine-tuned to downsample the data quickly. Our layers are
With this setup, the network learns the given labelled dataset pretty quickly, and we were able to obtain >99.5% per-captcha accuracy.
Even the example convolutional network from Keras.io works quite well.
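To make the "downsample quickly" idea concrete, here is a sketch of such a network in Keras. The layer sizes are illustrative, not the exact setup described above; it assumes 70x100 grayscale letter blocks and a 26-symbol alphabet.

```python
# Illustrative Keras CNN; strided convolutions shrink the spatial
# dimensions fast, keeping the network small. Layer sizes and the
# 26-class output are assumptions, not the contest setup.
from tensorflow.keras import layers, models

def build_model(num_classes=26):
    model = models.Sequential([
        layers.Input(shape=(70, 100, 1)),
        layers.Conv2D(16, 3, strides=2, activation="relu"),
        layers.Conv2D(32, 3, strides=2, activation="relu"),
        layers.Conv2D(64, 3, strides=2, activation="relu"),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With only 1200 labelled symbols, a small network like this (plus dropout) is less likely to overfit than a deep one.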
Use Amazon Mechanical Turk! Actually, sorry, but you can’t. This would be a violation of the contest rule “It is strictly forbidden to communicate with people other than your team members about issues that concern solving the problems.” But not everything is lost. Indeed, if you were bored during the second half of the contest, guess what: you can be the mechanical turk. With three people in a team, it is a measly 1200 letters per person to recognize (out of the 800 captchas, only 600 are unlabelled, so there are 3600 letters in total and 3 people). And this can go even further; the winning non-machine-learning strategy would go something like this:
Maybe you came up with something (perhaps a standard neural network, an SVM, or another machine learning technique) but realized that the accuracy was way below what was needed. Don’t worry: most machine learning tools also output the “probability” of a classification being right. So, the solution is to let the machine classify everything automatically, then take the worst predictions (according to their probability) and classify those manually. This saves a lot of human work while not requiring expensive fine-tuning of the machine learning algorithms.
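The triage step can be sketched in a few lines. This is a minimal illustration with hypothetical names (`split_for_review`, `manual_budget`): each prediction carries the classifier's reported probability, and the least confident ones are routed to a human.

```python
def split_for_review(predictions, manual_budget):
    """predictions: list of (item_id, label, probability) tuples.
    Returns (auto_accepted, needs_human_review)."""
    # Sort by reported probability, least confident first.
    ranked = sorted(predictions, key=lambda p: p[2])
    return ranked[manual_budget:], ranked[:manual_budget]

preds = [("a", "X", 0.99), ("b", "Y", 0.41),
         ("c", "Z", 0.87), ("d", "W", 0.35)]
auto, manual = split_for_review(preds, manual_budget=2)
# "d" and "b", the two lowest-probability predictions, go to a human.
```

Picking `manual_budget` is a trade-off between human effort and how far below the required accuracy the automatic classifier sits.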