Published on *Computer Science Department at Princeton University* (http://www.cs.princeton.edu)

Report ID:

TR-430-93

Authors:

Chao, Liang-Fang [1]

Date:

September 1993

Pages:

182

Download Formats:

[Postscript [2]]

In a parallel system, either a VLSI architecture in hardware

or a parallel program in software, the quality of the final design

depends on the ability of a synthesis system to exploit the

parallelism hidden in the input description of applications. Since

iterative or recursive algorithms are usually the most time-critical

parts of an application, the parallelism embedded in the repetitive

pattern of an iterative algorithm needs to be explored. This thesis

studies techniques and algorithms to expose the parallelism in an

iterative algorithm so that the designer can find an implementation

achieving a desired execution rate. In particular, the objective is

to find an efficient schedule to be executed iteratively.

A form of {it data-flow graphs} is used to model the iterative part

of an application, e.g. a digital signal filter or the while/for loop

of a program. Nodes in the graph represent operations to be performed

and edges represent both intra-iteration and inter-iteration

precedence relations. A schedule of the operations is executed

repeatedly with certain execution rate. Two transformation

techniques, {it retiming} (pipelining) and {it unfolding}

(unwinding), are combined to transform a data-flow graph into another

data-flow graph while preserving the intended behavior of the

application. After the interaction of retiming and unfolding is

analyzed, it is shown that the order of retiming and unfolding is

immaterial. Algorithms are designed to find a transformation such

that tThis new technique turns out to be very useful, and can be

generalized to other problems.

For example, the problem of {it software pipelining} in parallel

compilers is modeled as a special case of our technique. Efficient

polynomial-time algorithms are derived for the scheduling on different

parallel models and implementation styles. For uniform nested loops,

multi-dimensional retiming and unfolding are defined and studied for

nested loop scheduling.

Based on the theoretical results, a novel technique, {it rotation},

is designed for loop pipelining under resource constraints. Rotation

technique repeatedly transforms a schedule To a more compact schedule.

The rotation scheduling gives very good performance from the

experiments on many benchmarks.

**Links**

[1] http://www.cs.princeton.edu/research/techreps/author/281

[2] ftp://ftp.cs.princeton.edu/techreports/1993/430.ps.gz