IPSC 2014 Problem H – Hashsets

IPSC 2014

Problem H – Hashsets

There are not many data structures that are used in practice more frequently than hashsets and hashmaps (also known as associative arrays). They get a lot of praise, and deserve most of it. However, people often overestimate their capabilities. The Internet is full of bad advice such as “just use a hashset, all operations are O(1)” and “don’t worry, it always works in practice”. We hope that you know better. And if you don’t, now you’ll learn.

Let’s start by looking at a sample program in two of the most common modern programming languages: C++11 and Java 7.

// C++11
#include <iostream>
#include <unordered_set>
int main() {
    long long tmp;
    std::unordered_set<long long> hashset;
    while (std::cin >> tmp) hashset.insert(tmp);
}

// Java
import java.io.*;
import java.util.*;
public class HashSetDemo {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader( new InputStreamReader( System.in ) );
        HashSet<Long> hashset = new HashSet<Long>();
        for (String x = in.readLine(); x != null ; x = in.readLine())
            hashset.add( Long.parseLong(x) );
    }
}

Both programs do the same thing: they read a sequence of signed 64-bit values from the standard input and they insert them into a hashset.

We compiled and ran both programs on our server. If we used the sequence 1, 2, …, 50 000 as the input, the C++ program needed about 0.05 seconds and the Java program needed about 0.25 seconds. After we increased the input to 1, 2, …, 1 000 000, the C++ program ran in 0.6 seconds and the Java program took 0.8 seconds. That’s fast, right?

Based on this evidence and their limited understanding, many people would claim that the above programs will process any sequence of n integers in O(n) time. Are you one of those people?

Problem specification

Submit a sequence of 64-bit integers. The length of your sequence can be at most 50 000.

To solve the easy subproblem H1, your sequence must force at least one of our sample programs to run for at least 2 seconds on our server.

To solve the hard subproblem H2, your sequence must force both of our sample programs to run for at least 10 seconds on our server.

(The hard subproblem would certainly be solvable even if the limits were three times as large. We don’t require you to find the worst possible input – any input that’s bad enough will do.)

Technical details

For the purpose of this problem, there is no “the C++” or “the Java”. As the internals of hashsets are implementation-defined, we have to go with specific compiler versions. We did our best to choose the most common ones.

The officially supported versions are gcc 4.8.1, gcc 4.7.2, gcc 4.6.4, OpenJDK 1.7.0_40, and OpenJDK 1.6.0_24. Solutions that work with any combination of these compilers should be accepted.

Up to our best knowledge, any version of gcc in the 4.6, 4.7, and 4.8 branches should be OK. Any OpenJDK version of Java 6 or 7 should also be OK. However, the only officially supported versions are the ones explicitly given above. For reference, below are sources of two of the officially supported versions.

gcc 4.8.1
OpenJDK 7u40
Should one of the above links fail for some reason, try our local mirror instead.
You can also find sources of other versions and even browsable repositories on-line if you look hard enough.

Output specification

The file you submit should contain a whitespace-separated sequence of at most 50,000 integers, each between − 2⁶³ and 2⁶³ − 1, inclusive. (Don’t worry about the type of whitespace, we will format it properly before feeding it into our Java program.)

Internet Problem Solving Contest

IPSC 2014

Problem H – Hashsets

Problem specification

Technical details

Output specification