IPSC Logo

Internet Problem Solving Contest

IPSC 2014

Problem D – Disk space eater

“We could always ask them to find a message in a file full of zeros.”

“Isn’t that too easy?”

“Then let’s make the file huge, like 1 EB. That would be hard, right?”

“Sure it would, because nobody can download an exabyte of data.”

“We can compress it! And if it is still too large, we can compress it again and again. And again!”

Compression algorithm

The input files are compressed with bzip2. Bzip2 is a block-based compression algorithm that takes a single file and produces a compressed version of the same file. For more information on bzip2, see its Wikipedia entry or the project web page.

If you are on Linux or a Mac, you probably already have bzip2 installed (just type bzip2 or bunzip2 into your terminal). If not, install the bzip2 package. There is also bzip2 for Windows.

Problem specification

In each subproblem you are given a file that has been compressed using bzip2 one or more times. Your task is to extract a message from the original file. The original file contains exactly one message: a contiguous block of printable non-whitespace ASCII characters. The message has at most 100 characters. Every other byte in the original file is a zero byte.

In the easy subproblem, the size of the original file is roughly 1 TB and it was compressed twice in a row. The hard subproblem has four nested layers. Have fun!

Output specification

Submit a text file with the extracted message.