Published on *Computer Science Department at Princeton University* (http://www.cs.princeton.edu)

The paper is concerned with the optimal schedule for the permutation of n2 data items on an n x n square grid formed by the orthogonal superposition of 2n time multiplexed buses. The upper bound is proved by showing the existence of the n + 1 cycle, uniform schedule (that is, all two-step transfers consistently row first or alternately column first) for an arbitrary permutation. The lower bound is proved by showing the non-existence of n-cycle, non-uniform schedules for some non-degenerate permutations. We also show a simple way to perform an artitrary permutation dynamically with n buffers at every bus intersection. We further show that, with specific example of permutations, the potential increase

in parallelism by dynamically partitioning the 2n buses does not lead to a further reduction in time.