CGD Kernel Applications
This page links to the implementations of a few simple kernels and benchmarks. The kernels show how simpler algorithms are implemented in CGD, while the benchmarks showcase more advanced CGD capabilities and serve to evaluate its performance.
For some benchmarks, both an original and an optimized version are presented. The CGD model simplifies some high-level optimizations, such as data layout changes. At the same time, many lower-level optimizations, including communication scheduling and data reuse, are performed automatically.
Examples
- Stencil : A standard 2D PDE solver for heat dissipation. Optimizations include communication aggregation and communication-computation overlap.
- FFT 2D : A standard 2D Fast Fourier Transform based on block column and block row decompositions.
- Matrix : A standard matrix multiplication algorithm based on 2D, block column, and block row decompositions.
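As a rough illustration of the computation behind the Stencil kernel, the sketch below performs explicit time steps of the 2D heat equation with a 5-point stencil. It is plain sequential Python, not CGD code: the names `heat_step` and `solve`, the coefficient `alpha`, and the fixed-boundary treatment are assumptions made for this example, and it does not show the communication aggregation or overlap optimizations.

```python
def heat_step(grid, alpha=0.25):
    """Return the grid after one explicit heat-equation step.

    Interior cells are updated with a 5-point stencil; boundary cells
    are held fixed (an assumed Dirichlet boundary condition).
    alpha = k * dt / dx**2 should not exceed 0.25 for stability.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy so all reads use the old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbors = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1])
            new[i][j] = grid[i][j] + alpha * (neighbors - 4 * grid[i][j])
    return new


def solve(grid, steps, alpha=0.25):
    """Apply heat_step repeatedly and return the final grid."""
    for _ in range(steps):
        grid = heat_step(grid, alpha)
    return grid
```

In a distributed version, each process would own a block of the grid and exchange one layer of boundary cells with its neighbors before every step; that exchange is the communication the listed optimizations target.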
Benchmarks
- NPB FT : The FT benchmark from the NAS Parallel Benchmarks (NPB) solves a 3D PDE using spectral methods. The most computation-intensive sections compute FFTs along the X, Y, and Z dimensions on local domains, while the most communication-intensive steps are the X-Y and Y-Z transpositions for 2D domain decompositions. Optimizations include communication-computation overlap.
- Barnes-Hut : This SPLASH-2 benchmark computes the interactions among N bodies over time. Each time step requires rebalancing a tree containing all particles, computing interactions hierarchically, moving particles, and repartitioning the particle tree.
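The transpose-based structure described for NPB FT can be sketched in two dimensions: transform along one dimension, transpose so the other dimension becomes contiguous, and transform again. The code below is a minimal sequential Python sketch, not the benchmark's implementation and not CGD code. The helper names `dft`, `transpose`, and `fft2d` are hypothetical, and a naive O(n^2) DFT stands in for a real FFT to keep the example short.

```python
import cmath


def dft(xs):
    """Naive 1D discrete Fourier transform (stand-in for a real FFT)."""
    n = len(xs)
    return [
        sum(xs[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def transpose(m):
    """Swap rows and columns of a rectangular list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def fft2d(m):
    """2D transform as row transforms, a transpose, and row transforms again."""
    rows_done = [dft(row) for row in m]             # transform along X
    cols_as_rows = transpose(rows_done)             # communication step when distributed
    cols_done = [dft(row) for row in cols_as_rows]  # transform along Y
    return transpose(cols_done)                     # restore the original layout
```

When the data is distributed, the row transforms are purely local while each transpose requires data exchange between processes, which is why the transpositions dominate the communication cost.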
