Internet Problem Solving Contest

IPSC 2009

Solution to Problem M – Muzidabutur

This solution requires understanding of regular expressions. If you are not familiar with them, we recommended to visit page http://www.regular-expressions.info.

The description of Muzidabutur is a regular expression (regexp). Probably your first idea was to write program that matches regexp written in Muzidabutur syntax. This method works, but it is not the easiest one, because there are so many tools for working with regular expressions. A better idea is to rewrite the description of Muzidabutur into some normal regexp syntax and then to use existing tools like grep, sed, or programming languages with built-in regular expressions.

The transformation from the description to a regexp can also be done by using regexp tools (we used sed). In this solution we use extended regexps (ERE) as specified by the POSIX standard. It is easy to derive following rules.

  1. Pattern ‘THE STRING XXXXX’ changes to ‘(XXXXX)
  2. Pattern ‘AT LEAST XX TIMES’ changes to ‘\{12,\}’. Similar rules for other types of iteration.
  3. Pattern ‘OR’ changes to ‘|
  4. Pattern ‘FOLLOW BY’ changes to an empty string.
  5. Pattern ‘ONE OF THE CHARACTERS XXXXX’ changes to ‘[XXXXX]’.
  6. Pattern ‘ANY CHARACTER EXCEPT XXXXX’ changes to ‘[\^XXXXX]’.
  7. Erase all spaces and all backslashes before brackets, add ‘\^\\(’ at the beginning of the regexp and ‘\\)\$’ at the end of the regexp (we want to enclose whole expression in brackets – ‘^a|b\$’ is not correct).

The rules described above are enough to solve the easy input and most test cases from the hard input. In the hard input there were 10 cases with special characters in strings and character sets, and additionally they also contain special characters and sequences from other types of regular expressions. For example, there were character sets ‘[:digit]’, ‘a][bc’, ‘\^47’, and strings containing ‘*’, ‘\\+’, ‘\\t’, ‘\\s’ and others. These special cases could be resolved manually, or by improving our transformation rules (i.e., by fixing rules 1, 5 and 6). New rules are listed below.

  1. Pattern ‘THE STRING XXXXX’ changes to ‘([[.X.]][[.X.]][[.X.]][[.X.]][[.X.]])’ – or just add ‘\\’ before the special characters.
  2. Pattern ‘ONE OF THE CHARACTERS XXXXX’ changes to ‘[[.X.][.X.][.X.][.X.][.X.]]’. Another solution is to rearrange characters into proper order (‘\^’ is not on the beginning, ‘-’ is on the end, ‘]’ is on the beginning).
  3. Pattern ‘ANY CHARACTER EXCEPT XXXXX’ changes to ‘[\^[.X.][.X.][.X.][.X.][.X.]]’.