Internet Problem Solving Contest

IPSC 2012

Problem D – Data mania

This year, we, the IPSC organizers, finally decided to rewrite our old webpage and grading system. The old code was full of hacks and looked more like spaghetti code than anything else. During this conversion, we found out that we have quite a lot of data about previous years (teams, submissions, etc.). One of the things that is still missing (as you can see on our page) is the hall of fame. However, we would like to make it less boring than just “winners of the year XY”. Thus, we need to analyze our data and calculate the craziest statistics, such as

“The names of teams with the longest accepted submissions”. But as you can guess, we are now quite busy with the preparation of the contest – the new grader is being finished as we speak, and we still need to finish some tasks, including this one. Therefore, you should help us!

Problem specification

You will be given a set of five data tables about the previous three IPSC contests (2009-2011). We have already done a good job of cleaning this data, so you can assume that it is consistent (we will explain later what this means). Note that these data files are common to both subproblems (easy and hard).

Your task is to compute results of several data analytics queries that are given in plain English.

Data files specification

The supplied data will consist of the following tables: problems, teams, team members, submissions and submission results. The files are named “d-problems.txt”, “d-teams.txt”, “d-members.txt”, “d-submissions.txt” and “d-messages.txt”.

In general, each table consists of several space-separated columns. Each column holds a string consisting of one or more permitted characters. The permitted characters are letters (a-z, A-Z), digits (0-9), underscores, dashes and colons (_-:). All other characters (including spaces) were converted to underscores.
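Since every column is a single token drawn from a small character set, a row can be read by splitting on single spaces. A minimal Python sketch, assuming the files are plain ASCII; `parse_row` and `read_table` are hypothetical helper names, not part of the problem:

```python
import re
from typing import List

# Permitted characters: letters, digits, '_', '-' and ':'.
TOKEN = re.compile(r"[A-Za-z0-9_:-]+")


def parse_row(line: str) -> List[str]:
    """Split one row on single spaces and check each column is a valid token."""
    columns = line.rstrip("\r\n").split(" ")
    for col in columns:
        if not TOKEN.fullmatch(col):
            raise ValueError(f"unexpected column value: {col!r}")
    return columns


def read_table(path: str) -> List[List[str]]:
    """Read a whole data file (e.g. d-teams.txt), skipping empty lines."""
    with open(path, encoding="ascii") as f:
        return [parse_row(line) for line in f if line.strip()]
```

Rejecting unexpected characters early is a cheap way to notice a misread file before any query runs.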

The file d-problems.txt contains three space-separated columns. The first column contains the year of the contest, the second column contains the task (one letter, A-Z), and the third column contains the (co-)author of the problem. If the task had multiple co-authors, they will be listed on separate lines.
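Because co-authors of one task are spread over several rows, author-related queries usually start by grouping them. A sketch, assuming rows were already split into `[year, task, author]` lists; `group_authors` is a hypothetical name:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_authors(rows: List[List[str]]) -> Dict[Tuple[int, str], List[str]]:
    """Map (year, task letter) to the list of its (co-)authors.

    A task with several co-authors appears on several rows,
    so authors are accumulated per (year, task) key.
    """
    authors: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for year, task, author in rows:
        authors[(int(year), task)].append(author)
    return dict(authors)
```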

The file d-teams.txt contains five space-separated columns. The first column contains a TIS (a unique identifier of a team; also, teams in different years of the contest never share the same TIS). The second column contains the name of a team, the third column contains its country, the fourth column contains the name of its institution, and the fifth column contains the division (“open” or “secondary”).
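Since a TIS identifies a team uniquely across all years, it works well as a dictionary key for joining the other tables. A sketch, assuming TIS values are kept as strings; `Team` and `index_teams` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Team:
    tis: str
    name: str
    country: str
    institution: str
    division: str  # "open" or "secondary"


def index_teams(rows: List[List[str]]) -> Dict[str, Team]:
    """Build a TIS -> Team lookup; a repeated TIS is treated as an error."""
    teams: Dict[str, Team] = {}
    for tis, name, country, institution, division in rows:
        if tis in teams:
            raise ValueError(f"duplicate TIS {tis}")
        teams[tis] = Team(tis, name, country, institution, division)
    return teams
```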

The file d-members.txt contains three space-separated columns. The first column contains the TIS of a team, the second column contains the name of one team member, and the third column contains the age of that member (or zero if age was not specified). If the team had more than one team member, each team member will be listed on a separate line.
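An age of zero means “not specified”, so it should not be treated as a real age, for example when averaging. A sketch that groups members per team and maps age 0 to `None`; `group_members` is a hypothetical name:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def group_members(rows: List[List[str]]) -> Dict[str, List[Tuple[str, Optional[int]]]]:
    """Map TIS -> [(member name, age)], with age None when it was given as 0."""
    members: Dict[str, List[Tuple[str, Optional[int]]]] = defaultdict(list)
    for tis, name, age in rows:
        value = int(age)
        members[tis].append((name, value if value != 0 else None))
    return dict(members)
```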

The file d-submissions.txt contains six space-separated columns. Each row corresponds to one submission received by our server. The first column contains the date when the submission was received in the format “YY-MM-DD”. The second column contains the corresponding time in the format “hh:mm:ss”. The third column is the TIS of the team who sent the submission. The fourth column is the subproblem identifier. It consists of a letter and a number (1 for easy, 2 for hard), such as “A2” or “G1”. The fifth column contains the size of the submitted file (in bytes). Finally, the last column contains the hash of the submitted file.
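The date and time columns can be combined into one timestamp, and the subproblem identifier split into its letter and level. A sketch using Python's `datetime.strptime`, whose `%y` code maps “09” to 2009; the timezone of the server clock is not stated in the data, so the result is left naive. `Submission` and `parse_submission` are hypothetical names:

```python
from datetime import datetime
from typing import List, NamedTuple


class Submission(NamedTuple):
    received: datetime  # server time; the timezone is not stated in the data
    tis: str
    task: str           # problem letter, e.g. "G"
    level: int          # 1 = easy, 2 = hard
    size: int           # size of the submitted file in bytes
    file_hash: str


def parse_submission(cols: List[str]) -> Submission:
    date, time, tis, subproblem, size, file_hash = cols
    # "YY-MM-DD hh:mm:ss"; %y turns "10" into 2010.
    received = datetime.strptime(f"{date} {time}", "%y-%m-%d %H:%M:%S")
    return Submission(received, tis, subproblem[0], int(subproblem[1:]), int(size), file_hash)
```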

The file d-messages.txt contains four space-separated columns. Each row describes a message shown to one of the teams. The first column contains the TIS of the team. The second column contains a unix timestamp of the moment when the message was created. The third column contains the subproblem identifier (in the same format as above). Finally, the fourth column contains the status message, which is either “OK” or “WA”. (Note that during the actual contests there were several other possible messages, including “contest is not running” and “too many submissions”. Those were removed when we cleaned up the data.)
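Unlike d-submissions.txt, this table stores time as a unix timestamp, which can be converted to a timezone-aware UTC datetime. A sketch, assuming the cleaned data contains only “OK” and “WA” and rejecting anything else; `Message` and `parse_message` are hypothetical names:

```python
from datetime import datetime, timezone
from typing import List, NamedTuple


class Message(NamedTuple):
    tis: str
    created: datetime  # UTC, converted from the unix timestamp
    subproblem: str    # e.g. "A2"
    status: str        # "OK" or "WA"


def parse_message(cols: List[str]) -> Message:
    tis, timestamp, subproblem, status = cols
    if status not in ("OK", "WA"):
        raise ValueError(f"unexpected status {status!r}")
    created = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    return Message(tis, created, subproblem, status)
```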

You may assume the following:

Input specification

The first line of the input file contains an integer t specifying the number of test cases. Each test case is preceded by a blank line.
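The input can be split into queries by treating blank lines as separators; since a query may span several consecutive lines, the lines of one test case are joined here with single spaces (a choice of this sketch, not of the statement). `split_test_cases` is a hypothetical name:

```python
from typing import List


def split_test_cases(text: str) -> List[str]:
    """Return the query text of each test case.

    The first line holds t; every test case is preceded by a blank line.
    """
    lines = text.splitlines()
    t = int(lines[0])
    cases: List[str] = []
    current: List[str] = []
    for line in lines[1:]:
        if line.strip() == "":
            if current:  # a blank line closes the query collected so far
                cases.append(" ".join(current))
                current = []
        else:
            current.append(line.strip())
    if current:
        cases.append(" ".join(current))
    if len(cases) != t:
        raise ValueError(f"expected {t} test cases, found {len(cases)}")
    return cases
```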

Each test case consists of one or several consecutive lines describing the analytics query in plain English. Unless otherwise stated, you should assume that

Output specification

For each test case, output several lines. On the first line, output an integer c – the number of results for that test case. The next c lines should contain the results as tuples, with the elements of a tuple separated by single spaces. (Note that the output format is very similar to the format of the data files.)
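The output of one test case is the count line followed by the tuples. A minimal sketch; `format_results` is a hypothetical name, and the formatting of doubles is left to `str`, since the statement does not fix a precision here:

```python
from typing import List, Sequence


def format_results(rows: Sequence[Sequence[object]]) -> List[str]:
    """Count line first, then one line per tuple with elements joined by single spaces."""
    return [str(len(rows))] + [" ".join(str(x) for x in row) for row in rows]
```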

If the results should be sorted according to some criteria, proceed as follows: integers and doubles should be sorted by their numerical value, strings lexicographically by the ASCII values of their characters, and years chronologically (in increasing order).

Moreover, if the query involves string comparison, the comparison should be done using ASCII values. You may assume that no query will require comparison of floating-point values for equality.
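In Python, `str` values compare by Unicode code point, which coincides with ASCII order for the permitted characters, so uppercase letters sort before `_`, which sorts before lowercase letters. Numeric columns must be converted to numbers first, or “10” would sort before “9”. A sketch for a hypothetical query that orders (name, count) rows by count and then by name; `sort_results` is an illustrative name:

```python
from typing import List, Tuple


def sort_results(rows: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Sort by count numerically, then by name in ASCII order (the default for str)."""
    return sorted(rows, key=lambda r: (r[1], r[0]))
```

For example, `sorted(["b", "B", "a", "_"])` gives `["B", "_", "a", "b"]`, and `sorted(["10", "9"])` keeps `"10"` first, while `sorted([10, 9])` gives `[9, 10]`.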

Example

Input:

2

For each year (in chronological order) find the first accepted submission. Print the year, the name of the team that made it and the time from the start of the contest. Time should be formatted as "mm:ss".

For each year (in chronological order) print the year, the number of OK submissions and the number of WA submissions in that year.

Output:

3
2009 Bananas_in_Pyjamas 02:54
2010 Gennady_Korotkevich 02:28
2011 Never_Retired 06:16
3
2009 3352 2942
2010 4441 5579
2011 2437 3340
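The two sample queries could be answered roughly as follows, working only on the rows of d-messages.txt and treating an “OK” message as an accepted submission. Two assumptions are not confirmed by the statement: the contest year is taken to be the UTC calendar year of the message timestamp (d-teams.txt has no year column), and the contest start times, which are not in the data files, are passed in by the caller. All function names are hypothetical:

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# One row of d-messages.txt: (tis, unix_timestamp, subproblem, status).
Msg = Tuple[str, int, str, str]


def year_of(ts: int) -> int:
    # Assumption: the contest year equals the UTC calendar year of the message.
    return datetime.fromtimestamp(ts, tz=timezone.utc).year


def verdicts_per_year(messages: List[Msg]) -> List[Tuple[int, int, int]]:
    """Second sample query: (year, number of OK, number of WA), chronologically."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for _tis, ts, _sub, status in messages:
        counts[year_of(ts)][status] += 1
    return [(y, counts[y]["OK"], counts[y]["WA"]) for y in sorted(counts)]


def first_accepted(messages: List[Msg], team_names: Dict[str, str],
                   contest_start: Dict[int, int]) -> List[Tuple[int, str, str]]:
    """First sample query: (year, team name, "mm:ss" since the contest start).

    contest_start (year -> unix timestamp) is a hypothetical input, because
    the start times are not part of the supplied data files.
    """
    best: Dict[int, Tuple[int, str]] = {}
    for tis, ts, _sub, status in messages:
        if status != "OK":
            continue
        y = year_of(ts)
        if y not in best or ts < best[y][0]:  # keep the earliest OK per year
            best[y] = (ts, tis)
    out = []
    for y in sorted(best):
        ts, tis = best[y]
        minutes, seconds = divmod(ts - contest_start[y], 60)
        out.append((y, team_names[tis], f"{minutes:02d}:{seconds:02d}"))
    return out
```

Here `team_names` maps each TIS to the team name from d-teams.txt. Ties between equally early messages are resolved by file order, which the statement does not specify.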