Related Documents
Optimizing Communication Scheduling using Dataflow Semantics. Adrian Soviani, Jaswinder Pal Singh. International Conference on Parallel Processing (ICPP'09). PDF, slides
Presents how coarse grain dataflow (CGD) semantics can describe data and task parallelism at a high level. Writing efficient code is easier than with message passing: communication and synchronization are added automatically and optimized for specific architectures, while design space exploration and many high-level optimizations require only redefining data distributions. CGD implementations currently support MPI, SHMEM, and pthreads. Results on an SGI Altix 4700 show a 27% improvement for NPB FT and 41% for the stencil micro-kernel.
A Hierarchical Bandwidth Cost Model for Collective Communication. Adrian Soviani, Jaswinder Pal Singh. Draft. request
Presents an improvement of DBSP for estimating collective communication costs on modern fat-tree interconnects. The proposed model accurately estimates cost for globally unbalanced patterns, where the number of messages going to each level of the hierarchy is uneven across processors (e.g., broadcast, nearest neighbor).
A Portable Communication Library for Distributed Datastructures. Adrian Soviani, Jaswinder Pal Singh. Draft. PDF
Summary of the CGD library interface. Includes experiments for the PDE and FFT application kernels running on SGI Altix 4700 and 2x4-core Opteron systems.
Coarse Grain Dataflow Programming Model. Presentation of preliminary results at Geofluid Dynamics Lab, Princeton. PDF
Discovering Performance Bottlenecks in Large Scale Parallel Applications. Presentation related to MOM4 project at Geofluid Dynamics Lab, Princeton. PDF
