Published on *Computer Science Department at Princeton University* (http://www.cs.princeton.edu)

Substantial progress in Parallel Scientific Computation will emerge

from improvements in the effectiveness of current parallel machines,

the development of new scientific algorithms, and the employment of

more powerful multiprocessors. Successfully addressing these issues in

a cost-effective way requires extensive experimentation.

Nevertheless, the large computational complexity of scientific

applications and the high cost of multiprocessor systems, make the

quantitative and qualitative analyses of parallel workloads a

difficult and costly endeavor.In this thesis, I address some of the issues involved in the modeling

and evaluation of parallel scientific computations. More

specifically, I introduce Functional Algorithm Simulation, that is,

simulation without performing the bulk of numerical calculations

involved in the applications studied. Functional Algorithm Simulation

is applicable in the evaluation of algorithms simulating complex

systems, for which the core-set of time-consuming calculations and

data-exchanges can be determined from input information, before the

actual computations take place. To assess the principles of

Functional Algorithm Simulation I built the {em Functional Algorithm

Simulation Testbed (FAST)/}, a software prototype system for

approximately simulating the parallel execution of such algorithms on

uniprocessor workstations. FAST has been used to evaluate parallel

executions of three interesting and important scientific algorithms:

SIMPLE, a Computational Fluid Dynamics code; the Fast Multipole

Method, and a modified version of the Barnes-Hut algorithm, which

solve the N-Body problem and have applications in Computational

Molecular Dynamics and Astrophysics. Experimentation with FAST shows

that approximate simulation can give valid and useful results. FAST

enables us to study parallel executions of much larger problem sizes

and more processors than those reported so far, with modest computing

resources. Also, it allows us to collect detailed information

characterizing parallel executions on various message-passing

architectures, analyze the effects of communication overhead to

parallel performance, and study the scalability of parallel

algorithms.