What we want to do is to create a given permutation of the boxes. A permutation can be decomposed into disjoint cycles. In our case there are three types of cycles:
This all means that the number of moves necessary and sufficient to reach the final configuration is N-(number of fixed points)+(number of non-trivial cycles)-(2 if there is a non-trivial cycle with the free location).
The only problem remaining is to find the final positions of the blocks. It can be done by simulating the described procedure. We need some data structure to find the numbers x and y efficiently. Let g=gcd(N,d). Observe that all numbers of the form (C+d*x+y) mod N for fixed y are in the same remainder class modulo g. So for each number j=0, 1, ..., g-1 we will count number of blocks alread placed on positions with reaminder j (modulo g). When i-th block is processed, we compute the value v_{i}=(c_{i} mod N) mod g and look if there is a fee place in this remainder class. If there is we take y=0, else we look for next free remainder class and determine y. When we have y we can find the next free x.
So we need a data structure where we can store k items arranged into the cycle where each can be marked (as occupied) or unmarked. We need to support the following operations:
This can be implemented by the Union-FindSet algorithm. By this algorithm we will maintain the (maximal) connected segments of marked elements. To determine the next free unmarked item we perform a FIND operation to find the segment containing gived item and return the item following that segment. To mark an item we look if it does not connect some segments. If it does, we perform a UNION operation to join them.
This algorithm has linear space complexity and O(N*log^{*}(N)) time complexity if the Union-FindSet algorithm is implemented with path compression and balancing. (Path compression was sufficient to have a fast program.)