IPSC 2012 Problem D – Data mania


This year, we, the IPSC organizers, finally decided to rewrite our old webpage and grading system. The old code was full of hacks and looked more like spaghetti code than anything else. During this conversion, we found out that we have quite a lot of data about previous years (teams, submissions, etc.). One of the things that is still missing (as you can see on our page) is the hall of fame. However, we would like to make it less boring than just “winners of the year XY”. Thus, we need to analyze our data and calculate the craziest statistics, like “The names of teams with the longest accepted submissions”.

But as you can guess, we are now quite busy with the preparation of the contest – the new grader is being finished as we speak, and we still need to finish some tasks, including this one. Therefore, you should help us!

Problem specification

You will be given a set of five data tables about the previous three IPSC contests (2009-2011). We have already cleaned these data, so you can assume that they are consistent (we will explain later what this means). Note that these data files are shared by both subproblems (easy and hard).

Your task is to compute results of several data analytics queries that are given in plain English.

Data files specification

The supplied data will consist of the following tables: problems, teams, team members, submissions and submission results. The files are named “d-problems.txt”, “d-teams.txt”, “d-members.txt”, “d-submissions.txt” and “d-messages.txt”.

In general, each table consists of several space-separated columns. Each column holds a string consisting of one or more permitted characters. The permitted characters are letters (a-z, A-Z), digits (0-9), underscores, dashes and colons (_-:). All other characters (including spaces) were converted to underscores.
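The character rule above can be sketched in Python as follows. This is only an illustration of the stated rule, not the organizers' actual cleaning code; the function names `sanitize` and `parse_row` are hypothetical, and replacing one underscore per offending character is an assumption.

```python
import re

# Characters the statement lists as permitted: letters, digits, "_", "-" and ":".
_NOT_PERMITTED = re.compile(r"[^A-Za-z0-9_:-]")


def sanitize(value: str) -> str:
    """Replace every non-permitted character (including spaces) with "_".

    Assumption: one underscore per replaced character.
    """
    return _NOT_PERMITTED.sub("_", value)


def parse_row(line: str, expected_columns: int) -> list[str]:
    """Split one table row on whitespace and check the column count."""
    columns = line.split()
    if len(columns) != expected_columns:
        raise ValueError(f"expected {expected_columns} columns, got {len(columns)}")
    return columns
```

Because spaces never occur inside a column, splitting each line on whitespace is enough to recover the columns.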

The file d-problems.txt contains three space-separated columns. The first column contains the year of the contest, the second column contains the task (one letter, A-Z), and the third column contains the (co-)author of the problem. If the task had multiple co-authors, they will be listed on separate lines.

The file d-teams.txt contains five space-separated columns. The first column contains a TIS (a unique identifier of a team; also, teams in different years of the contest never share the same TIS). The second column contains the name of a team, the third column contains its country, the fourth column contains the name of its institution, and the fifth column contains the division (“open” or “secondary”).

The file d-members.txt contains three space-separated columns. The first column contains the TIS of a team, the second column contains the name of one team member, and the third column contains the age of that member (or zero if age was not specified). If the team had more than one team member, each team member will be listed on a separate line.

The file d-submissions.txt contains six space-separated columns. Each row corresponds to one submission received by our server. The first column contains the date when the submission was received in the format “YY-MM-DD”. The second column contains the corresponding time in the format “hh:mm:ss”. The third column is the TIS of the team who sent the submission. The fourth column is the subproblem identifier. It consists of a letter and a number (1 for easy, 2 for hard), such as “A2” or “G1”. The fifth column contains the size of the submitted file (in bytes). Finally, the last column contains the hash of the submitted file.

The file d-messages.txt contains four space-separated columns. Each row describes a message shown to one of the teams. The first column contains the TIS of the team. The second column contains a unix timestamp of the moment when the message was created. The third column contains the subproblem identifier (in the same format as above). Finally, the fourth column contains the status message, which is either “OK” or “WA”. (Note that during actual contests there were several other possible messages including “contest is not running” and “too many submissions”. Those were removed when we cleaned up the data.)
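One possible way to hold the five tables in memory is sketched below. The file names and column order come from the descriptions above; the column labels (`tis`, `digest`, and so on) and the helper `load_table` are made up for this illustration.

```python
from typing import Iterable

# File names are from the statement; the column labels are illustrative only.
SCHEMAS = {
    "d-problems.txt": ("year", "task", "author"),
    "d-teams.txt": ("tis", "name", "country", "institution", "division"),
    "d-members.txt": ("tis", "member", "age"),
    "d-submissions.txt": ("date", "time", "tis", "subproblem", "size", "digest"),
    "d-messages.txt": ("tis", "timestamp", "subproblem", "status"),
}


def load_table(filename: str, lines: Iterable[str]) -> list[dict[str, str]]:
    """Turn the rows of one table into dicts keyed by column label."""
    columns = SCHEMAS[filename]
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != len(columns):
            raise ValueError(f"{filename}: unexpected row {line!r}")
        rows.append(dict(zip(columns, fields)))
    return rows
```

A typical call would be `with open("d-teams.txt") as f: teams = load_table("d-teams.txt", f)`. All values stay strings here; numeric columns such as the age, size and timestamp would need an explicit `int(...)` before any arithmetic or numeric sorting.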

You may assume the following:

For the purpose of conversion between unix timestamps and dates our server runs in the timezone +02:00.
Each contest starts exactly at noon and lasts until 16:59:59. (Note: we shifted the 2009 contest by -2 hours to guarantee this.)
The triples (TIS, timestamp, subproblem) are unique, i.e., there are no two concurrent submissions of the same problem from the same team.
Submissions and messages can be paired using the triple (timestamp, TIS, subproblem). That is:
- For each submission, there is exactly one corresponding message with its evaluation.
- For each message there is exactly one corresponding submission.
Submissions and messages are consistent with problems.
- For each submission/message, the subproblem is a valid subproblem for that year, i.e. there is a corresponding problem for that year.
- For each year and problem, there are exactly two subproblems (easy and hard) and both contain at least one submission.
For each submission there is a team with the corresponding TIS.
For each year and subproblem, each team has at most one submission that received the message “OK”. That submission, if it exists, is the last one in chronological order.
For each team there is at least one member.
For each team there are at most three members.
For each team member there is a team with the corresponding TIS.
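The pairing of submissions with messages can be sketched as below. It assumes that the date and time of a submission, read as local time at +02:00, convert to exactly the unix timestamp stored in the matching message; the names `SERVER_TZ`, `submission_timestamp` and `pair_results` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SERVER_TZ = timezone(timedelta(hours=2))  # the statement fixes the server at +02:00


def submission_timestamp(date: str, time: str) -> int:
    """Convert "YY-MM-DD" and "hh:mm:ss" (server local time) to a unix timestamp."""
    local = datetime.strptime(f"{date} {time}", "%y-%m-%d %H:%M:%S")
    return int(local.replace(tzinfo=SERVER_TZ).timestamp())


def pair_results(submissions, messages):
    """Attach the status to each submission via (timestamp, TIS, subproblem).

    submissions: rows of (date, time, tis, subproblem, size, digest)
    messages:    rows of (tis, timestamp, subproblem, status)
    """
    status = {(int(ts), tis, sub): msg for tis, ts, sub, msg in messages}
    paired = []
    for date, time, tis, sub, size, digest in submissions:
        key = (submission_timestamp(date, time), tis, sub)
        # A KeyError here would mean the consistency assumption does not hold.
        paired.append((date, time, tis, sub, int(size), digest, status[key]))
    return paired
```

Since the triples are guaranteed to be unique, a plain dictionary lookup is sufficient and no tie-breaking is needed.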

Input specification

The first line of the input file contains an integer t specifying the number of test cases. Each test case is preceded by a blank line.

Each test case consists of one or several consecutive lines describing the analytics query in plain English. Unless otherwise stated, you should assume that

“subproblem” means year + subproblem,
a team is identified by its TIS, not by its name,
an accepted submission is a submission that received the message “OK”.
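Reading the input file into individual queries could look like the sketch below. The function name `read_queries` is hypothetical, and joining the lines of a multi-line query with single spaces is an assumption made for convenience. Answering the queries themselves is left to the solver.

```python
def read_queries(text: str) -> list[str]:
    """Split the input into queries: a count line, then blank-line-separated blocks."""
    lines = text.splitlines()
    t = int(lines[0])
    queries, current = [], []
    for line in lines[1:]:
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line ends the query collected so far.
            queries.append(" ".join(current))
            current = []
    if current:
        queries.append(" ".join(current))
    if len(queries) != t:
        raise ValueError(f"expected {t} queries, found {len(queries)}")
    return queries
```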

Output specification

For each test case, output several lines. On the first line, output an integer c – the number of results for that test case. The next c lines should contain the results as tuples, with elements of a tuple separated by single spaces. (Note that the format of the output is very similar to the format of the data files.)

If the results should be sorted according to some sorting criteria, proceed as follows: Integers and doubles should be sorted according to their numerical value. Strings should be sorted lexicographically according to their ASCII value. Years should be sorted chronologically (in increasing value).

Moreover, if the query involves string comparison, the comparison should be done using ASCII values. You may assume that no query will require comparison of floating-point values for equality.
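As an illustration of the output format and the sorting rules, here is a minimal Python sketch. The helper `format_results` and the sample rows are made up. Python compares strings by code point, which coincides with ASCII order for the permitted characters, so uppercase letters sort before lowercase ones.

```python
def format_results(rows) -> str:
    """Render result tuples: a count line, then one space-separated tuple per line."""
    out = [str(len(rows))]
    out.extend(" ".join(str(field) for field in row) for row in rows)
    return "\n".join(out)


# Made-up rows: sorted by year (numerically), then by name (ASCII order).
rows = [(2010, "b_team"), (2009, "alpha"), (2009, "Zeta")]
rows.sort()
print(format_results(rows))
# 3
# 2009 Zeta
# 2009 alpha
# 2010 b_team
```

Note that years must be kept as integers for the numeric ordering to apply; sorting them as strings would only happen to work while all years have the same number of digits.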

Example

input

2

For each year (in chronological order) find the first accepted submission. Print the year, the name of the team that made it and the time from the start of the contest. Time should be formatted as "mm:ss".

For each year (in chronological order) print the year, the number of OK submissions and the number of WA submissions in that year.

output

3
2009 Bananas_in_Pyjamas 02:54
2010 Gennady_Korotkevich 02:28
2011 Never_Retired 06:16
3
2009 3352 2942
2010 4441 5579
2011 2437 3340
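The “mm:ss” values in the first sample query are offsets from the noon start of the contest. A minimal sketch of that formatting step follows; the name `since_start` is hypothetical, and letting the minutes run past 59 for offsets of an hour or more is an assumption the statement does not settle.

```python
def since_start(time: str) -> str:
    """Format the offset of "hh:mm:ss" from the 12:00:00 start as "mm:ss"."""
    hours, minutes, seconds = (int(part) for part in time.split(":"))
    elapsed = (hours - 12) * 3600 + minutes * 60 + seconds
    if not 0 <= elapsed < 5 * 3600:
        # Contests run from 12:00:00 to 16:59:59 according to the statement.
        raise ValueError(f"{time} is outside the contest window")
    # Assumption: minutes are not wrapped at 60 for offsets of an hour or more.
    return f"{elapsed // 60:02d}:{elapsed % 60:02d}"
```

For instance, `since_start("12:02:54")` returns `"02:54"`, which has the same shape as the first row of the sample output.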

Internet Problem Solving Contest
