Lesson 8: Loop Optimization

Gist

Loop-Invariant Code Motion

Recall from Lesson 5 the loop-invariant code motion optimization, which transforms code like this:

let a = ...;
let b = ...;
for (let i = 0; i < 100; ++i) {
    f(a * b);
}

Into this, by moving code that does the same thing on every iteration to the loop’s pre-header block:

let a = ...;
let b = ...;
let c = a * b;
for (let i = 0; i < 100; ++i) {
    f(c);
}
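
For the analysis side, here is a minimal sketch in TypeScript, assuming a simplified IR where every instruction has one destination and a list of argument names (the Instr and Loop shapes, and the licm helper, are hypothetical, not from any real compiler):

type Instr = { op: string; dest: string; args: string[] };
type Loop = { preheader: Instr[]; body: Instr[] };

// Iterate to a fixed point: an instruction is loop-invariant when all of its
// arguments are defined outside the loop or by already-invariant instructions.
// This sketch glosses over safety checks (side effects, multiple definitions,
// whether the hoisted code dominates the loop exits) that a real pass needs.
function licm(loop: Loop, definedOutside: Set<string>) {
  const invariant = new Set<string>();
  let changed = true;
  while (changed) {
    changed = false;
    for (const instr of loop.body) {
      if (invariant.has(instr.dest)) continue;
      if (instr.args.every((a) => definedOutside.has(a) || invariant.has(a))) {
        invariant.add(instr.dest);
        changed = true;
      }
    }
  }
  // Hoist the invariant instructions to the preheader.
  loop.preheader.push(...loop.body.filter((i) => invariant.has(i.dest)));
  loop.body = loop.body.filter((i) => !invariant.has(i.dest));
}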

Induction Variable Elimination

Induction variable elimination is an optimization that generally targets array indexing (and similar computations). Imagine this loop that accesses an array:

for (let i = 0; i < 100; ++i) {
    f(a[i]);
}

which, in C terms, is equivalent to:

for (let i = 0; i < 100; ++i) {
    f(a + i * stride);
}

where a is a base pointer and stride is the size of the array’s elements. Induction variable elimination rewrites the code to just “bump the pointer” instead of multiplying every time:

let a_100 = a + 100 * stride;
for (let a_i = a; a_i < a_100; a_i += stride) {
    f(a_i);
}

This code only has to do an addition in the loop—it has eliminated a multiplication.

The basic idea is to find affine expressions like a + i * stride that depend on a loop induction variable (i here) and some loop-invariant values (a and stride here). Here’s the recipe:

1. Find the basic induction variables: variables like i whose only update inside the loop adds a loop-invariant amount, as in i = i + c.
2. Find the derived induction variables: variables defined by an affine function of a basic induction variable, as in j = c1 * i + c2, where c1 and c2 are loop-invariant.
3. For each derived induction variable j, create a new variable j', initialize it with j’s affine expression in the preheader, add the step c1 * c to it at each point where i gets updated, and replace uses of j with j'.

After all this, you’ll want to do some copy propagation and dead code elimination to, for example, eliminate basic induction variables that are no longer necessary.
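
As a rough illustration of the recipe’s first step, here is how finding basic induction variables might look in TypeScript, reusing the hypothetical Instr shape from the LICM sketch above:

// A variable is a basic induction variable when its only definition inside
// the loop adds a loop-invariant step to itself, like i = i + 1.
function basicInductionVariables(body: Instr[], loopInvariant: Set<string>): Set<string> {
  const defs = new Map<string, Instr[]>();
  for (const instr of body) {
    const ds = defs.get(instr.dest) ?? [];
    ds.push(instr);
    defs.set(instr.dest, ds);
  }
  const ivs = new Set<string>();
  for (const [v, ds] of defs) {
    if (ds.length === 1 && ds[0].op === 'add' &&
        ds[0].args.includes(v) &&
        ds[0].args.some((a) => a !== v && loopInvariant.has(a))) {
      ivs.add(v);
    }
  }
  return ivs;
}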

This optimization is also sometimes called strength reduction for induction variables (but I think that’s confusing because strength reduction is a much more general concept that doesn’t by itself have anything to do with loops). For more, check out some slides from CMU on induction variable elimination and two blog posts that cover this topic from CS 6120 in 2019: one by Yi Jing, Zhijing Li, and Neil Adit and another by Shaojie Xiang, Yi-Hsiang Lai, and Yuan Zhou.

Loop Unswitching

Loop unswitching “lifts” conditions out of loops, creating two loops. So this code:

for (let i = 0; i < 100; ++i) {
    if (c) {  // Loop-invariant value.
        f();
    } else {
        g();
    }
}

Becomes:

if (c) {
    for (let i = 0; i < 100; ++i) {
        f();
    }
} else {
    for (let i = 0; i < 100; ++i) {
        g();
    }
}

The analysis, of course, requires identifying loop-invariant code. Then the heavy lifting in the transformation is “just” rearranging the CFG and duplicating the loop’s subgraph. For more, see a blog post by Sameer Lal from CS 6120 in 2019.
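
To make the transformation concrete, here is a toy AST-level sketch in TypeScript (real passes work on the CFG instead); it handles only the case where the loop body is exactly one if on a known loop-invariant condition, and all of the node shapes are hypothetical:

type Stmt =
  | { kind: 'loop'; body: Stmt[] }
  | { kind: 'if'; cond: string; then: Stmt[]; else: Stmt[] }
  | { kind: 'other' };

// Duplicate the loop, specializing each copy to one branch of the condition.
function unswitch(loop: { kind: 'loop'; body: Stmt[] }, invariant: Set<string>): Stmt {
  const first = loop.body[0];
  if (loop.body.length === 1 && first.kind === 'if' && invariant.has(first.cond)) {
    return {
      kind: 'if',
      cond: first.cond,
      then: [{ kind: 'loop', body: first.then }],
      else: [{ kind: 'loop', body: first.else }],
    };
  }
  return loop; // Leave loops we can't handle unchanged.
}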

Loop Fusion & Fission

Loop fusion or loop jamming takes two sequential loops and combines them into one. Its main purpose again has to do with arrays, particularly when feeding the output from one loop into the input of the next loop, like this:

for (let i = 0; i < 100; ++i) {
    b[i] = f(a[i]);
}
for (let i = 0; i < 100; ++i) {
    c[i] = g(b[i]);
}

Fusing these loops yields one that calls both f and g:

for (let i = 0; i < 100; ++i) {
    b[i] = f(a[i]);
    c[i] = g(b[i]);
}

Then, the real win comes if the intermediate array b is “dead” outside of this context, in which case we can eliminate it:

for (let i = 0; i < 100; ++i) {
    c[i] = g(f(a[i]));
}

In this case, fusion eliminated a lot of memory accesses (and perhaps even reduced the program’s heap footprint, if we no longer even have to allocate b). This optimization is extremely important in machine learning compilers and other data-intensive domains—for example, it is far more efficient to fuse a fully-connected layer with a ReLU nonlinearity than it is to compute them serially, with a temporary tensor in between.

The opposite transformation, loop fission or distribution, can be useful to optimize for locality—i.e., to make multiple, smaller loop bodies that each have a smaller memory footprint.
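
For instance, a fission pass might split a loop whose body touches two unrelated sets of arrays:

for (let i = 0; i < 100; ++i) {
    b[i] = f(a[i]);
    d[i] = g(c[i]);
}

Into two loops that each stream over fewer arrays at a time:

for (let i = 0; i < 100; ++i) {
    b[i] = f(a[i]);
}
for (let i = 0; i < 100; ++i) {
    d[i] = g(c[i]);
}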

The key analysis you need in both cases is independence, i.e., being certain that f and g do not interfere with each other, so it is legal to change the order in which they execute.
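
At the coarsest granularity, such a check might just require that the sets of arrays the two loop bodies read and write are suitably disjoint. Here is a sketch in TypeScript, where the Effects shape is a hypothetical summary of one loop body’s memory behavior:

type Effects = { reads: Set<string>; writes: Set<string> };

// Two loop bodies may be freely reordered if no location written by one can
// overlap a location read or written by the other.
function independent(a: Effects, b: Effects): boolean {
  const disjoint = (x: Set<string>, y: Set<string>) =>
    [...x].every((e) => !y.has(e));
  return disjoint(a.writes, b.reads) &&
         disjoint(b.writes, a.reads) &&
         disjoint(a.writes, b.writes);
}

This array-granularity test is conservative: it would reject the fusion example above outright, since the first loop writes b and the second reads it, even though that same-index flow is exactly the dependence fusion preserves. Real compilers rely on finer-grained dependence analysis to tell such cases apart.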

Global Optimizations Galore!

This is not an exhaustive list of loop optimizations—there are so many more! For example:

- Loop unrolling duplicates the loop body to cut down on branching overhead and expose more work to other optimizations.
- Loop peeling splits off the first (or last) few iterations so the remaining loop is more regular.
- Loop interchange swaps the inner and outer loops of a loop nest, often to improve spatial locality in array traversals.
- Loop tiling (or blocking) breaks an iteration space into cache-sized chunks.

See overviews of these and still more in Steve Chong’s slides on loop optimizations.

More broadly, this is where we will stop going down the standard function-level optimization rabbit hole. Before the class moves on to farther-flung topics in language implementation, for those who are curious about this sort of thing, I wanted to leave you with a few links to read about even more fancy global optimizations:

Tasks

Your task is to implement and evaluate a loop optimization.