This is the first of a blog series to dive into the loop transformations that are necessary for an automatic parallelizing compiler.
Loop transformations https://homes.luddy.indiana.edu/achauhan/Teaching/B629/2006-Fall/LectureNotes/27-loop-transformations.html
https://www.cse.iitk.ac.in/users/swarnendu/courses/autumn2020-cs610/loop-transformations.pdf
Transformation matrix A has integral components and $ | det(A) | =1$ |
Exchange the inner loop and outer loop.
A generalization of loop interchange for beyond two loops.
Reverse the iteration space of one loop. $[0-N]$ becomes $[N-0]$
“Skew” the loop nest to iterate in a different (e.g, diagonal) way so one of the loop level can be parallelized.
Transform a while loop into a if-do-while loop. Good for pipelining and loop-invariant code motion
Get the first couple of loop iterations outside of the loop.
Split out several iteration of the loop as sequential code, and the rest into multiple loop bodies.
Adjust one or more statement so the loop-carried dependence become intra-iteration dependence.
Partition the loop iteration space into smaller chunks. To improve the cache performance or for parallelization.
Inverse of loop distribution. Sometimes used to increase the granularity of the loop after distribution.
Break the loop body and separate a loop into several ones.
Combine several iterations of a loop into one loop body. Reduce the number of iterations. Straight-line code optimizations.
Break a scalar operation $T=T+xxx$ to an array operation $T[i]=T[i-1]+xxx$. Almost like privatization.
What is a dependence that can be broken by loop rotation?
for i in [0..N] {
res = a[i-1];
a[i] = cos[i];
}
while node {
load(node->value);
node = node->next;
}