CGD Library Primitives
Types
The library can work with arbitrary global datastructure types as long as a range type is defined to represent datastructure subdomains:
| Range | Type representing a domain of a global datastructure | U |
| Data | Datastructure type decomposable by Range domains | U |
| Partition <Range> | Decomposition type assigning a domain to each PE | L |
| MPartition <Range> | Decomposition type assigning a list of domains to each PE | L |
| Swap <Range> | Describes a redistribution operation | L |
- U tagged types are user defined
- L tagged types are template types defined by the library
- A decomposition/partition may consist of overlapping domains
- A redistribution/swap type records, for each PE, which domains/ranges must be sent and received to redistribute a datastructure
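The user-defined types above can be sketched for a 1-D array. The names `Range1D` and `Array1D` below are illustrative, not part of the library: a range is a subdomain description, and a data type stores the elements of one such subdomain.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical user-defined Range/Data pair for a 1-D array.

// Range: a half-open index interval [lo, hi) describing a subdomain.
struct Range1D {
    std::size_t lo = 0;
    std::size_t hi = 0;
    std::size_t size() const { return hi - lo; }
};

// Data: a 1-D array holding the elements of one Range1D subdomain.
struct Array1D {
    Range1D domain;              // global indices held by this PE
    std::vector<double> elems;   // elems[i] holds global index domain.lo + i
    double& at(std::size_t globalIndex) {
        return elems[globalIndex - domain.lo];
    }
};
```

A `Partition<Range1D>` would then assign one such interval per PE, and a `Swap<Range1D>` would describe which intervals move between PEs.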
As previously mentioned, an SPMD program consists of sequential computations together with distributed datastructure allocation and redistribution primitives. When parallel computations are written in the CGD language, only type definitions, sequential computations, and helper functions need to be written in C++.
Datastructure Allocation
| Data (T, A) | Allocates datastructure A of type T |
| DataRa (T, A, R) | Allocates global datastructure A of type T for domain R |
- Datastructures can be allocated without being decomposed
- A datastructure should be allocated for the largest domain it will hold
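One plausible way the allocation primitives could be realized is as macros over a user-provided `ckalloc` (see the helper-function table below). The expansions here are a sketch under that assumption, reusing the illustrative `Range1D`/`Array1D` types; the library's actual macro bodies may differ.

```cpp
#include <cstddef>
#include <vector>

struct Range1D { std::size_t lo = 0, hi = 0; };

struct Array1D {
    Range1D domain;
    std::vector<double> elems;
};

// User-provided allocator: size datastructure A for domain R.
void ckalloc(Array1D& A, Range1D R) {
    A.domain = R;
    A.elems.assign(R.hi - R.lo, 0.0);
}

// Hypothetical expansions of the allocation primitives.
#define Data(T, A)      T A                   /* declared, not yet decomposed */
#define DataRa(T, A, R) T A; ckalloc(A, R)    /* declared and sized for domain R */
```

Usage would follow the rule stated above: `DataRa(Array1D, v, (Range1D{0, 8}))` allocates `v` for the largest domain it will hold (the extra parentheses protect the comma from the preprocessor).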
Datastructure Redistribution
A global datastructure can be transformed between two decompositions using two primitives:
| swapBegin (Swap S, Data A, Data B, int id, int pe) | Start redistribution of A to B according to S |
| swapEnd (Swap S, Data A, Data B, int id, int pe) | Finalizes redistribution |
| computeSwap (Partition Pa, Partition Pb, Swap S) | Computes swap S that transforms decomposition Pa into Pb |
Swaps:
- Datastructures A, B can be the same
- pe represents the current PE number
- overlapping swaps are possible when different ids are used; the maximum id number is user defined
- If S describes redistribution from decomposition Pa to Pb:
  - A should hold data for domain Pa [pe]
  - B should be able to hold data for domain Pb [pe]
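A toy, single-process sketch can show the calling pattern of these three primitives. Everything here is illustrative: the `Range1D`/`Array1D`/`Partition1D`/`Swap1D` types are assumptions, the toy models only the overlap a PE keeps between its old and new domain, and a real implementation would also exchange data between PEs via MPI/SHMEM or shared memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Range1D { std::size_t lo = 0, hi = 0; };
struct Array1D {
    Range1D domain;
    std::vector<double> elems;
    double& at(std::size_t g) { return elems[g - domain.lo]; }
};
using Partition1D = std::vector<Range1D>;     // Pa[pe] = domain of PE pe
struct Swap1D { std::vector<Range1D> keep; }; // keep[pe] = overlap of old/new domain

// Toy computeSwap: for each PE, the locally retained data is the
// intersection of its domain in Pa and in Pb (empty if disjoint).
void computeSwap(const Partition1D& Pa, const Partition1D& Pb, Swap1D& S) {
    S.keep.resize(Pa.size());
    for (std::size_t pe = 0; pe < Pa.size(); ++pe) {
        S.keep[pe].lo = std::max(Pa[pe].lo, Pb[pe].lo);
        S.keep[pe].hi = std::max(S.keep[pe].lo, std::min(Pa[pe].hi, Pb[pe].hi));
    }
}

// Toy swapBegin copies directly between A and B (in the spirit of the
// CC-SAS variant below); swapEnd is a no-op stand-in for completion.
void swapBegin(const Swap1D& S, Array1D& A, Array1D& B, int /*id*/, int pe) {
    for (std::size_t g = S.keep[pe].lo; g < S.keep[pe].hi; ++g)
        B.at(g) = A.at(g);
}
void swapEnd(const Swap1D&, Array1D&, Array1D&, int, int) {}
```

The call sequence matches the primitives above: `computeSwap(Pa, Pb, S)` once, then `swapBegin(S, A, B, id, pe)` and `swapEnd(S, A, B, id, pe)` on each PE, with A allocated for Pa[pe] and B for Pb[pe].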
Implementations:
- message passing: MPI, SHMEM
  - begin marshals and sends messages
  - end receives and unmarshals messages
  - asynchronous
- CC-SAS
  - begin waits for availability, then copies between datastructures
  - end waits for everything to be done
  - no buffering
Computations
Most work is done by functions taking subdomains of global datastructures as arguments:
| fname (Arg1, Arg2, ...) | Sequential computation taking in, out, and inout arguments |
Global datastructure argument:
- its local domain should be supplied as another argument
- it should be allocated for the given domain outside the function call
  - exception: when the domain is computed within the function
- can be used in read-only, write-only or read-write mode
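A sequential computation in this style might look as follows. The function name `scale` and the `Range1D`/`Array1D` types are illustrative; the "in" argument is passed by const reference, the "out" argument by mutable reference, and the local domain is supplied explicitly, as the rules above require.

```cpp
#include <cstddef>
#include <vector>

struct Range1D { std::size_t lo = 0, hi = 0; };
struct Array1D {
    Range1D domain;
    std::vector<double> elems;
    double  at(std::size_t g) const { return elems[g - domain.lo]; }
    double& at(std::size_t g)       { return elems[g - domain.lo]; }
};

// fname (Arg1, Arg2, ...) style: A is "in", B is "out", r is the local
// domain of both. Both arrays must be allocated for r before the call.
void scale(const Array1D& A, Array1D& B, Range1D r, double factor) {
    for (std::size_t g = r.lo; g < r.hi; ++g)
        B.at(g) = factor * A.at(g);
}
```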
Helper functions
The following helper functions are implemented for a global datastructure type:
| copy (Data A, Range Ra, Data B, Range Rb) | Copy A [Ra] to B [Rb] | R |
| sizeBytes (Data A, Range R) | Number of bytes needed to marshal A[R] | R |
| toBytes (char* buf, Data A, Range R) | Write a representation of A[R] to buf | R |
| fromBytes (char* buf, Data A, Range R) | Read a representation of A[R] from buf | R |
| ckalloc (A, R) | Allocate datastructure A for domain R | A |
| print (f, A, R) | Print A[R] to file f | O |
- R tagged functions are needed by redistribution operations / swaps
- A tagged functions are needed by data allocation macros
- O tagged functions are optional
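For the illustrative `Array1D` type, the R-tagged helpers could be written as below. This is a sketch of what a user would supply, assuming contiguous `double` storage; the signatures follow the table above.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

struct Range1D { std::size_t lo = 0, hi = 0; };
struct Array1D {
    Range1D domain;
    std::vector<double> elems;
};

// Bytes needed to marshal A[R].
std::size_t sizeBytes(const Array1D&, Range1D R) {
    return (R.hi - R.lo) * sizeof(double);
}

// Write a flat representation of A[R] into buf.
void toBytes(char* buf, const Array1D& A, Range1D R) {
    std::memcpy(buf, A.elems.data() + (R.lo - A.domain.lo), sizeBytes(A, R));
}

// Read a flat representation of A[R] back from buf.
void fromBytes(char* buf, Array1D& A, Range1D R) {
    std::memcpy(A.elems.data() + (R.lo - A.domain.lo), buf, sizeBytes(A, R));
}

// Copy A[Ra] to B[Rb]; Ra and Rb must have equal element counts.
void copy(const Array1D& A, Range1D Ra, Array1D& B, Range1D Rb) {
    std::memcpy(B.elems.data() + (Rb.lo - B.domain.lo),
                A.elems.data() + (Ra.lo - A.domain.lo),
                (Ra.hi - Ra.lo) * sizeof(double));
}
```

A message-passing swap would call `sizeBytes`/`toBytes` on the sender and `fromBytes` on the receiver; the CC-SAS variant would call `copy` directly.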
If a range type is used to automatically compute a redistribution with computeSwap, the following functions are needed:
| intersect (Range Ra, Range Rb, Range Rc) | Compute intersection of two domains |
| noElements (Range R) | Get domain element count |
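For an interval range such as the illustrative `Range1D`, these two functions are short; the clamping in `intersect` makes a disjoint intersection come out empty rather than ill-formed. The signatures follow the table, with the result range as an output parameter.

```cpp
#include <algorithm>
#include <cstddef>

struct Range1D { std::size_t lo = 0, hi = 0; };

// Rc = Ra ∩ Rb; disjoint inputs yield an empty range (lo == hi).
void intersect(Range1D Ra, Range1D Rb, Range1D& Rc) {
    Rc.lo = std::max(Ra.lo, Rb.lo);
    Rc.hi = std::max(Rc.lo, std::min(Ra.hi, Rb.hi));
}

// Number of elements in domain R.
std::size_t noElements(Range1D R) {
    return R.hi - R.lo;
}
```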
For N-dimensional arrays and other basic types, the library defines range and data types and all required helper functions.
