Performance of VLSI Engines for Lattice Computations
We address the problem of designing and building efficient custom VLSI-based processors for computations on large multidimensional lattices. We derive and analyze the design tradeoffs for two architectures that provide practical engines for lattice updates. We find that I/O constitutes the principal
bottleneck of processors designed for lattice computations, and we derive upper bounds on throughput for lattice updates based on Hong and Kung's graph-pebbling argument that models I/O. In particular, we show that $R = O(B S^{1/d})$, where R is the site update rate, B is the main memory bandwidth, S is the
processor storage, and d is the dimension of the lattice.
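
To make the scaling in this bound concrete, the following is a minimal Python sketch that evaluates the expression B * S^(1/d). It is an illustration only: the function names and the `constant` parameter (a stand-in for the unspecified factor hidden by the O-notation) are assumptions introduced here, not quantities given in the text, so the returned numbers indicate how the bound scales rather than any actual update rate.

```python
def update_rate_scale(bandwidth, storage, dim, constant=1.0):
    """Evaluate constant * B * S**(1/d), the shape of the bound R = O(B S^(1/d)).

    `constant` is a hypothetical placeholder for the factor hidden by the
    O-notation; the source does not specify it.
    """
    if bandwidth <= 0 or storage <= 0 or dim < 1:
        raise ValueError("bandwidth and storage must be positive, dim >= 1")
    return constant * bandwidth * storage ** (1.0 / dim)


def storage_factor_for_speedup(speedup, dim):
    """Factor by which S must grow to raise the bound by `speedup` at fixed B.

    Follows from solving (k * S)**(1/d) = speedup * S**(1/d) for k.
    """
    return speedup ** dim


if __name__ == "__main__":
    # Compare how the same storage helps in different lattice dimensions.
    for d in (1, 2, 3):
        print(d, update_rate_scale(bandwidth=1.0, storage=4096, dim=d))
```

Read this way, the bound grows linearly in the bandwidth B but only as the d-th root of the storage S: at fixed B, doubling the bound requires 4 times the storage for d = 2 and 8 times the storage for d = 3.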