CGD Programming Language
Types
Generic types are declared as domain/range types or
datastructure types:
| Syntax | |
| type range range-type, ... ; | Declaration of user defined domain type |
| type partition <range-type> part-type [predef-part-type], ... ; | Declaration of distribution type with <= 1 domain / proc |
| type mpartition <range-type> part-type [predef-part-type], ... ; | Declaration of distribution type with any number of domains / proc |
| type swap <range-type> swap-type [predef-part-type], ... ; | Declaration of redistribution matrix type for distributions with <= 1 domain / proc |
| type mswap <range-type> swap-type [predef-part-type], ... ; | Declaration of redistribution matrix type for distributions with any number of domains / proc |
| type data data-type [range-type-list], ... ; | Declaration of user datastructure type |
| type data <base-type> data-type [range-type-list], ... ; | Declaration of user datastructure array type |
| Examples | |
| type range Ra, Rb; | User defined domain types |
| type partition <Ra> PartRa [PartNP]; type mpartition <Ra> MPartRa [PartNP]; | Distribution types based on Ra domains, assigning <= 1 or any number of domains to each proc |
| type swap <Ra> SwapRa [PartNP]; | Redistribution type describing how to convert PartRa distributions |
| type data Da, Db; | Non-decomposable datastructure types |
| type data Dc [PartRa, PartRb]; | Datastructure type decomposed by distributions PartRa and PartRb |
- Domain and datastructure types are defined by the user in C++ header files
- A global datastructure can be decomposed based on several domain representations
- Decomposition/partition and redistribution/swap types are based on arbitrary user domain types
- CGD uses C++ templates to define these types
- Redistribution types are defined as domain matrices
- if present, s(i,j) specifies a domain to be copied from proc i to proc j
- For domain and datastructure types it may be required to define helper functions (see Library Helpers).
- predefined for N-dimensional arrays and other basic types
- Predefined distributions
- n elements
- ALLn - each PE holds a copy of all n elements
- PPEn - each PE pi holds element pi
- ONEn - a single PE holds all n elements
- 1 element
- ALL1 - each PE holds a copy
- ONE1 - a single PE holds a copy
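Putting these declaration forms together, here is a hedged sketch following the Examples table. All names except the predefined PartNP (Row, RowPart, MRowPart, RowSwap, Matrix) are hypothetical, and C++-style comments are assumed since the types themselves are defined in C++ headers:

```
type range Row;                                  // user defined domain type
type partition <Row> RowPart [PartNP];           // distribution: <= 1 Row domain per proc
type mpartition <Row> MRowPart [PartNP];         // distribution: any number of Row domains per proc
type swap <Row> RowSwap [PartNP];                // redistribution matrix for RowPart distributions
type data <double> Matrix [RowPart, MRowPart];   // array datastructure decomposable by either distribution
```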
Declarations
Parallel and sequential computations are declared to take input and output arguments:
| Syntax | |
| function fname (IN-arg-list -> OUT-arg-list) type arg, ... , type arg [partition-arg], ... , ... ; | Computation declaration |
| Examples | |
| function foo (A -> B) Data A, B; | A, B are non-decomposable. Shortcut for A[ALL1], B[ALL1] |
| function foo (A, p -> B) Data A[p], B[p], PartRange p[ALLn]; | A has distribution p; p has predefined distribution ALLn |
| function foo (A, p -> B) Data A[p], B[p], Range p; | A has distribution taken from domain argument p. Shortcut for PartRange p[PPEn] |
| function foo (A, pin, pout -> A*) Data A[pin], A*[pout], Range pin, pout; | A is an in-out argument |
- For each argument
- type : data type, range type, partition type, swap type
- decomposition/partition optional : another argument or predefined
- Argument A has distribution argument pa
- A has data type allowing partition type of pa
- Distribution arguments are decomposed by predefined distributions
- ALLn - each PE knows the domains of all PEs
- PPEn - each PE knows only its own domain
- In-out arguments A, A*
- both have same type
- may have identical or different distributions
- C++ function declaration has single argument
- CGD uses different labels for data before and after execution
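A combined sketch of a declaration with an in-out argument, using the hypothetical Row/RowPart/Matrix types from the Types section (not part of the language):

```
type range Row;
type partition <Row> RowPart [PartNP];
type data <double> Matrix [RowPart];

// X is an in-out argument: labeled X before execution, X* after,
// both with the same distribution p; the C++ function sees a single argument
function update (X, p -> X*) Matrix X[p], X*[p], Row p;
```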
Decomposition Rules
Relations between partitions and swaps describe how datastructures can be converted between partitions:
| Syntax | |
| part partition < partition ... ; | Order : decomposition inclusion |
| part partition = partition + partition ... ; | Merge : decomposition union |
| part swap : partition -> partition ; | Swap : redistribution |
| Examples | |
| part pa < pb < pc; | Decomposition pa is included in pb, included in pc |
| part pa = pb + pc; | Decomposition pa is obtained by merging domains of pb and pc |
| part sw : pa -> pb; | Decomposition pa is converted to pb according to redistribution sw |
- Order
- larger partition domains include smaller partition domains
- large -> small conversion requires nothing to be done
- Merge
- the union of domains of multiple partitions produces new partition
- logical copy, avoided if same C++ datastructure used for all partitions
- Swap
- a redistribution operation is needed to convert the datastructure
- may do messaging, copy, and synchronization depending on implementation
- Each global datastructure argument is produced by one or more computations
- producers provide datastructures in given decomposition
- subsequent computations may need datastructures in other decompositions
- decomposition rules describe how to do the conversion
- Datastructures with no decompositions are assigned PAL decomposition
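The three rule forms can be sketched with hypothetical partitions (pFine, pCoarse, pLeft, pRight, pWhole) and a hypothetical swap (swLR), assuming C++-style comments:

```
// pFine's domains are included in pCoarse's:
// converting from the larger pCoarse to the smaller pFine requires nothing to be done
part pFine < pCoarse;

// pWhole is obtained as the union of the domains of pLeft and pRight (a logical copy)
part pWhole = pLeft + pRight;

// swLR is a redistribution converting pLeft-distributed data to pRight
part swLR : pLeft -> pRight;
```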
Parallel Computations
When the body of a computation is defined in CGD it is considered a parallel computation. For sequential computations only the declaration is provided, and the function is written in C++. More comprehensive examples are presented in Applications.
| Syntax | |
| function fname (IN-arg-list -> OUT-arg-list) argument-declarations { decomposition-rules | loop | computation ; ... } | Parallel computation definition; for each argument: type, distribution; function body |
| loop (idx-arg, cond-arg ; IN-arg-list -> OUT-arg-list) { loop | if | computation ; ... } | Loop computation; idx-arg is the iteration count, cond-arg is the end condition; loop body |
| if (cond-arg, IN-arg-list -> OUT-arg-list) { loop | if | computation ; ... } else { ... } | If computation; cond-arg is the condition argument; TRUE body, FALSE body; input args are optional |
| fname <task-part> (IN-arg-list -> OUT-arg-list); | Computation invocation assigning a number of sequential computations to each proc according to task-part |
| Examples | |
| loop (idx, cond ; A[p] -> B[p]) { foo (A[p], C -> B[p]); bar (idx -> cond); } | Loop computes B[p] each iteration; the value of B is given to A in the next iteration; the end condition cond is set using the iteration count idx |
| if (cond, A[all] -> B[pb]) { foo (A[p1], C[pc] -> B[pb]); } else { bar (A[p2], C[pc] -> B[pb]); } | If computes B[pb] calling either the foo or bar computation; A[all] is provided to both bodies; A[p1] or A[p2] is produced within each body |
| function foo (A, pa, pb -> B) Data A[pa], B[pb]; Range pa, pb; ... | Computation declaration |
| foo <ALL1> (A, pa, pb -> B); | Standard computation invocation |
| foo <ALL1> (A[pa], pa, pb -> B[pb]); | Explicit computation invocation |
| foo (A[pa] -> B[pb]); | Short computation invocation |
- A datastructure label represents a single value
- no logical overwriting
- C++ datastructure overwriting can happen
- updated value gets a new label
- Each datastructure produced by a computation, used by many
- decomposition conversion based on decomposition rules
- Body of parallel computation is a graph
- nodes :
- sequential, parallel computations
- (datastructure, distribution) pairs
- links between providers and users of a given datastructure
- Loops
- body of loop is subgraph
- seen as single computation from above
- dependencies: in args, other data nodes used by body
- produces: out args
- each iteration logically assigns output arguments to input arguments
- copy avoided using renaming
- If
- true and false bodies are subgraphs
- seen as single computation from above
- dependencies: in args, other data nodes used by true or false bodies
- produces: out args
- cond arg evaluated at runtime, single body executed
- Computation invocation
- long version : provide same arguments as declaration
- short version : last input arguments can be omitted when they are provided explicitly as partitions of other arguments
- An arbitrary number of sequential computations s is assigned to each proc p
- s = # domains assigned by <task-part> partition argument to p
- default ALL1 gives s = 1 for all p
- if s > 0, each argument should have a distribution with
- 1 domain when a single domain / proc is allowed
- >= s domains when multiple domains / proc are allowed
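A full parallel computation combining these pieces, again as a hedged sketch with hypothetical names (Matrix, Row, step, check) and assumed C++-style comments:

```
// Hypothetical iterative solver: the body is a graph of computations
function solve (A, p -> B) Matrix A[p], B[p], Row p;
{
    loop (idx, cond ; A[p] -> B[p]) {
        step (A[p] -> B[p]);    // short invocation: partition args are given explicitly
        check (idx -> cond);    // sequential computation sets the end condition from idx
    }
}
```

Each iteration the output B[p] is logically assigned back to the input A[p]; the copy is avoided by renaming, as noted above.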
CGD Compiler
- Traverses computation graph to satisfy data dependencies
- when multiple orders are possible, preserves the original order
- generates computation order
- Adds
- redistribution, merge, and copy primitives to convert datastructures as needed
- allocation primitives for datastructures
- loops to execute computations called with MPart distributions (multiple domains / PE)
- other auxiliary code
- Optimizations
- Data label to C++ datastructure assignment
- avoids redistribution and loop related copy
- memory reuse : multiple compatible non-overlapping data labels assigned to a single datastructure
- Computations
- removes unnecessary computations
- moves computations out of the loops when possible
- Inlines parallel computations to increase scope of above optimizations
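As a sketch of the MPart case (hypothetical names foo, mp, mpa, mpb; comments are assumed C++-style), the compiler-generated per-proc loop follows from the invocation form alone:

```
foo <ALL1> (A[pa] -> B[pb]);   // default: s = 1 sequential foo per proc
foo <mp>   (A[mpa] -> B[mpb]); // mp assigns s domains to each proc:
                               // the compiler emits a loop running foo s times there
```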

