Coordinate compression is a useful tool in many problems in computational geometry. This is a simple trick that can be used whenever we have a task in which only relative positions of objects matter but we don’t care about their exact coordinates.
Coordinate compression is performed as follows:
For example, if the old x-coordinates of your objects were 10, 3, 10, and 47, their new x-coordinates are 1, 0, 1, and 2.
Why did we mention coordinate compression? For two reasons.
First, similarity of two sets of rectangles can obviously be defined using coordinate compression: two sets are similar if and only if they are exactly the same after coordinate compression.
Second, we can use this approach to enumerate all distinct arrangements.
Each of our n rectangles corresponds to two x-coordinates and two y-coordinates. Hence, in each direction there are at most 2n distinct coordinates. Which means that after coordinate compression all coordinates have to be between 0 and 2n − 1, inclusive.
And for n = 4 this gives us a solution that’s fast enough: Generate all possible n-tuples of rectangles on the 2n × 2n grid, and count those that do not change under coordinate compression.
In order to both speed up the generation and to make sure we only generate each set once, imagine that each rectangle is a 4-tuple of coordinates, and generate the rectangles so that these 4-tuples are in lexicographic order.
For n = 4 there are 282 = 784 ways to draw a rectangle onto an 8 × 8 grid, and therefore (784 ⋅ 783 ⋅ 782 ⋅ 781)/4!≈15.6 × 109 sets of four such rectangles. This is already a reasonably small number, and with a few simple optimisations it is possible to generate all distinct sets for n = 4 in under a minute.
With n up to 5000, we almost certainly need a polynomial-time solution.
Let’s start by considering a slightly different problem. As in the original, we are drawing n rectangles, but there are two changes. First, each rectangle has a different color (i.e., their order matters). Second, rectangles are now allowed to be identical.
The solution to our original problem will consist of two steps. First we solve the (much easier) new problem we just stated, then we show how to use its result to compute the answer we seek.
Let’s first solve the new problem in one dimension: what is the number of fundamentally different sets of n colored intervals on a line?
This can be done using dynamic programming. Let dp[n][k] be the number of arrangements of n intervals such that their endpoints have exactly k different coordinates. We can now start from dp[1][2]=1 and fill the table row by row. For each endpoint of a new segment we either select an endpoint used by any of the already placed segments, or we create a new one.
This intermediate result has an OEIS entry.
To extend this result to two dimensions, we just observe that the two dimensions are completely independent: any interval on the x axis can be paired with any interval on the y axis to get a rectangle. Thus, the number of arrangements of n colored rectangles is simply the square of the number of arrangements of n colored intervals.
Now we need to deal with the fact that some of the rectangles may coincide. Let’s assume that we already computed the number of arrangments of i distinct labeled rectangles for all i < n. Then the number of arrangements of n distinct labeled rectangles is:
$$\operatorname{distinct-labeled}(n) = \operatorname{labeled}(n) - \sum_{i=1}^{n-1} \operatorname{distinct-labeled}(i) \cdot S_2[n,i]$$
where S2[n, i] is the Stirling number of the second kind, or also the number of partitions of set of size n to i non-empty subsets. Removing the labels is now trivial
$$\operatorname{distinct-unlabeled}(n) = \frac{\operatorname{distinct-labeled}(n)}{n!}$$
In total, we need O(n2) to calculate the answer for all i between 1 and n.
One can use the inversion formula to express the above quantity using Stirling numbers of the first kind:
$$\operatorname{distinct-labeled}(n) = \sum_{i=1}^{n} (-1)^{n-i} \operatorname{labeled}(i) \cdot S_1[n,i]$$