A variable \(v\) is called an induction variable for a loop if it is updated during the loop only by adding some loop-invariant expression to it.
If an induction variable \(v\) takes on at loop iteration \(n\) a value of the form \(cn+d\), where \(c\) and \(d\) are loop-invariant, then \(v\) is a linear induction variable.
Induction variables are easier to reason about than other variables, enabling various optimizations. Linear functions of induction variables are also induction variables, which means that often loops have several induction variables that are related to each other.
A basic induction variable \(i\) is one whose only definitions in the loop are equivalent to \(i = i + c\) for some loop-invariant expression \(c\) (typically a constant). The value \(c\) need not be the same at every definition. A basic induction variable is linear if it has a single definition and that definition either dominates or postdominates every other node in the loop.
A derived induction variable \(j\) is a variable that can be expressed as \(ci + d\), where \(i\) is a basic induction variable and \(c, d\) are loop-invariant. All the derived induction variables that are based on the same basic induction variable \(i\) are said to be in the same family or class.
We write \(⟨i,a,b⟩\) to denote a derived induction variable in the family of basic induction variable \(i\), with the formula \(ai + b\). A basic induction variable \(i\) can therefore be written in this notation as \(⟨i, 1, 0⟩\).
In the following code, \(i\) is a basic induction variable, \(j\) is a linear basic induction variable, \(k\) and \(l\) are linear derived induction variables in the family of \(j\), and \(m\) is a derived induction variable in the \(i\) family.
while (i < 10) {
    j = j + 2;
    if (j > 4) i = i + 1;
    i = i - 1;
    k = j + 10;
    l = k * 4;
    m = i * 8;
}
All linear basic induction variables are linearly related to each other, so they can be placed into the same family.
We can find induction variables with a dataflow analysis over the loop. The domain of the analysis is mappings from variable names to the lattice of values depicted in the figure above. In this lattice, two induction variables are related only if they are the same induction variable; a variable can also be mapped to \(⊥\), meaning that it is not an induction variable, or \(⊤\), meaning that no assignments to the variable have been seen so far, and hence it is not known whether it is an induction variable.
A value computed for a program point is a mapping from variables to a value in the lattice above; that is, each dataflow value is a function. The ordering on these functions is pointwise. To compute the meet of two such functions, we take the meet at every variable: if functions \(F_1\) and \(F_2\) map variable \(v\) to values \(l_1\) and \(l_2\) respectively, then \(F_1 ⊓ F_2\) maps \(v\) to \(l_1 ⊓ l_2\). This sounds complicated but is just the obvious thing to do.
For example, we might take the meet of two functions as follows: \begin{align*} & \{ i ↦ ⟨i, 1, 0⟩, j ↦ ⟨i, 2, 4⟩, k ↦ ⊤ \} ⊓ \{ i ↦ ⟨i, 1, 0⟩, j ↦ ⟨i, 4, 4⟩, k ↦ ⟨i, 1, 1⟩ \} \\ =~& \{ i ↦ ⟨i, 1, 0⟩, j ↦ ⊥, k ↦ ⟨i, 1, 1⟩ \} \end{align*}
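The pointwise meet can be sketched in Python as follows; the names `TOP`, `BOT`, and the tuple encoding of \(⟨i,a,b⟩\) triples are illustrative, not taken from any particular compiler framework:

```python
TOP = "top"   # no assignments to the variable seen yet
BOT = "bot"   # not an induction variable

def meet(x, y):
    """Meet of two lattice values: TOP is the identity, equal triples
    meet to themselves, and anything else falls to BOT."""
    if x == TOP:
        return y
    if y == TOP:
        return x
    return x if x == y else BOT

def meet_env(f1, f2):
    """Pointwise meet of two variable -> lattice-value mappings."""
    return {v: meet(f1.get(v, TOP), f2.get(v, TOP))
            for v in set(f1) | set(f2)}

# The example above: j's coefficients disagree, so j falls to BOT.
f1 = {"i": ("i", 1, 0), "j": ("i", 2, 4), "k": TOP}
f2 = {"i": ("i", 1, 0), "j": ("i", 4, 4), "k": ("i", 1, 1)}
result = meet_env(f1, f2)
```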
The dataflow analysis is performed just over the loop nodes. To start the dataflow analysis, we first find all basic induction variables, which is straightforward. Then the initial dataflow value for each node is the function that maps all basic induction variables \(i\) to \(⟨i, 1, 0⟩\), and maps all other variables to \(⊤\). We introduce an artificial start node that transitions to the loop header.
\(n\) | \(f'(k)\)
---|---
\(\texttt{if}~e\), \(\texttt{return}~\vec{e}\), \([e_1] ← e_2\) | \(f(k)\)
\(\texttt{start}\) | \(⟨k, 1, 0⟩\) if \(k\) is a basic induction variable; \(⊤\) otherwise
\(x ← e\) | \(f(k)\) if \(k ≠ x\); \(⟨k, 1, 0⟩\) if \(k = x\) and \(x\) is a basic induction variable
One entry in the above table is not defined. When the variable being updated by an assignment is the same variable for which we are trying to compute an abstract value, and it is not a basic induction variable, its value is computed as an abstract interpretation of the expression being assigned. We interpret the various arithmetic operations on expressions as faithfully as possible given the abstract representation of variables:
\begin{align*} ⟨i, a, b⟩ ± c & = ⟨i, a, b ± c⟩ \\ ⟨i, a, b⟩ ± ⟨i, c, d⟩ & = ⟨i, a ± c, b ± d⟩ \\ ⟨i, a, b⟩ * c & = ⟨i, a*c, b*c⟩ \end{align*}

All other results of abstract interpretation yield \(⊥\), meaning that the analysis cannot determine that the variable is an induction variable.
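These rules can be sketched as a small abstract evaluator; the function name `abs_op` and the encoding of operands (ints for loop-invariant constants, tuples for \(⟨i,a,b⟩\) triples) are assumptions for illustration:

```python
BOT = "bot"  # not an induction variable

def abs_op(op, x, y):
    """Abstractly evaluate x op y, where each operand is either a
    loop-invariant constant (an int) or a triple (i, a, b)."""
    cx, cy = isinstance(x, int), isinstance(y, int)
    if op in ("+", "-"):
        s = 1 if op == "+" else -1
        if not cx and cy:                       # <i,a,b> +/- c
            i, a, b = x
            return (i, a, b + s * y)
        if cx and not cy and op == "+":         # c + <i,a,b>
            i, a, b = y
            return (i, a, b + x)
        if not cx and not cy and x[0] == y[0]:  # same family
            i, a, b = x
            _, c, d = y
            return (i, a + s * c, b + s * d)
    if op == "*":
        if not cx and cy:                       # <i,a,b> * c
            i, a, b = x
            return (i, a * y, b * y)
        if cx and not cy:                       # c * <i,a,b>
            i, a, b = y
            return (i, a * x, b * x)
    return BOT  # anything else: cannot show it is an induction variable

# From the earlier example: k = j + 10 and l = k * 4, with j = <j,1,0>.
k = abs_op("+", ("j", 1, 0), 10)   # <j, 1, 10>
l = abs_op("*", k, 4)              # <j, 4, 40>
```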
This analysis is simplified in SSA form. Since each variable is defined once, different abstract values do not need to be computed on each edge. A basic induction variable is one that is updated with a φ definition of the form i3 = φ(i1, i2), where the next value of the variable is incremented in the loop body by a loop-invariant expression c: i2 ← i3 + c.
Given that we have identified the induction variables in the code, some useful optimizations arise. Consider the following loop, which updates a sequence of memory locations:
while (i < a.length) {
    j = a + 3*i;
    [j] = [j] + 1;
    i = i + 2;
}
The variable \(j\) is computed using multiplication, but it is a derived induction variable \(⟨i, 3, a⟩\) in the notation introduced above, in the same family as the basic induction variable \(i\).
The idea of strength reduction using induction variables is to compute \(j\) using addition instead of multiplication. Perhaps even more importantly, we will compute \(j\) without using \(i\), possibly making \(i\) dead.
The optimization works as follows for a derived induction variable \(j = ⟨i,a,b⟩\), where the basic induction variable \(i\) has stride \(c\):

1. Create a fresh variable \(s\) to track the value of \(j\), and initialize it before the loop: \(s = a*i + b\).
2. Replace the definition of \(j\) inside the loop with \(j = s\).
3. After each update \(i = i + c\), add the update \(s = s + a*c\), where \(a*c\) is computed at compile time if possible.
On our example above, this has the following effect:
s = a + 3*i;
while (i < a.length) {
    j = s;
    [j] = [j] + 1;
    i = i + 2;
    s = s + 6;
}
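As a quick sanity check, we can simulate both versions (in Python, standing in for the IR) and verify that they produce the same sequence of addresses \(j\); the concrete values of a, the initial i, and the loop bound are made up for the demonstration:

```python
a, i0, length = 1000, 0, 10   # assumed concrete values

# Original loop: recompute j with a multiplication every iteration.
orig = []
i = i0
while i < length:
    j = a + 3 * i
    orig.append(j)
    i = i + 2

# Strength-reduced loop: s tracks a + 3*i using only addition.
reduced = []
i = i0
s = a + 3 * i        # hoisted initialization
while i < length:
    j = s
    reduced.append(j)
    i = i + 2
    s = s + 6        # stride 2 times coefficient 3
```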
Once we have derived induction variables, we can often eliminate the basic induction variables they are derived from. After strength reduction, the only use of basic induction variables is often in the loop guard. Even this use can often be removed through linear-function test replacement, also known as removal of almost-useless variables.
If we have an induction variable whose only uses are its own increment (\(i=i+c\)) and a loop-guard test (\(i\lt n\) where \(n\) is loop-invariant), and there is a derived induction variable \(k = ⟨i,a,b⟩\), we can rewrite the test \(i\lt n\) as \(k \lt a*n + b\), assuming \(a \gt 0\) (for \(a \lt 0\) the direction of the comparison flips). With luck, the expression \(a*n+b\) will be loop-invariant and can be hoisted out of the loop. Then, assuming \(i\) is not live at exit from the loop, it is not used for anything and its definition can be removed. The result of applying this optimization to our example is:
s = a + 3*i;
t = a + 3*a.length;
while (s < t) {
    j = s;
    [j] = [j] + 1;
    s = s + 6;
}
A round of copy propagation and dead code removal gives us tighter code:
s = a + 3*i;
t = a + 3*a.length;
while (s < t) {
    [s] = [s] + 1;
    s = s + 6;
}
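Simulating memory with a Python list, we can check that the fully optimized loop has the same effect as the original loop that computed j = a + 3*i; the concrete base address, array length, and initial i are made up for the demonstration:

```python
a, length = 4, 10      # assumed base "address" and a.length
i0 = 0                 # assumed initial value of i

# Original loop from the start of the section.
expected = [0] * 64
i = i0
while i < length:
    expected[a + 3 * i] += 1
    i += 2

# Fully optimized loop: no i, no multiplication in the body.
mem = [0] * 64
s = a + 3 * i0
t = a + 3 * length
while s < t:
    mem[s] += 1
    s += 6
```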
In type-safe languages, accesses to array elements generally incur a bounds check. Before accessing a[i], the language implementation must ensure that i is a legal index. For example, in Java array indices start at zero, so the implementation must test that \(0 ≤ \texttt{i} \lt \texttt{a.length}\).
Returning to our example from earlier, after strength reduction we can expect the code to look more like the following (or the equivalent CFG):
s = a + 3*i;
while (i < a.length) {
    j = s;
    if (i < 0 || i ≥ a.length) goto Lerror
    [j] = [j] + 1;
    i = i + 2;
    s = s + 6;
}
This extra branch inside the loop is likely to add overhead. Furthermore, it prevents the induction-variable elimination just discussed, because the check keeps \(i\) live inside the loop.
One simple improvement we can make is to implement the check \(0 ≤ \texttt{i} \lt \texttt{n}\) as a single test. Assuming that n is a signed nonnegative integer and i is a signed integer, this test can be implemented as an unsigned comparison \(\texttt{i}\lt \texttt{n}\). If i is negative, it looks like a large unsigned integer that fails the unsigned comparison, as desired. Processor architectures have an unsigned comparison mode that supports this. For example, the jae instruction (``jump above or equal'') on the Intel architecture implements unsigned comparison.
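The combined check can be sketched by simulating a 32-bit unsigned comparison in Python; the helper name and the 32-bit width are illustrative (in C this would simply be a cast, such as `(uint32_t)i < (uint32_t)n`):

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit machine word

def in_bounds(i, n):
    """Implements 0 <= i < n as one unsigned comparison, assuming n >= 0
    and both values fit in the 32-bit signed range. Reinterpreting a
    negative i as unsigned yields a huge value that fails the < n test."""
    return (i & MASK32) < (n & MASK32)
```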
Even better would be to eliminate the test entirely. The key insight is that the loop guard (in this case, \(\texttt{i} \lt \texttt{a.length}\)) often ensures that the bounds check succeeds. If this can be determined statically, the bounds check can be removed. If it can be tested dynamically, the loop can be split into two versions, a fast version that does not do the bounds check and a slow one that does.
The bounds-check elimination works under the following conditions:

- The loop guard is a comparison \(j \lt u\), where \(j\) is an induction variable and \(u\) is loop-invariant.
- The bounds check is a comparison \(k \lt n\), where \(k\) is an induction variable in the same family as \(j\) and \(n\) is loop-invariant.
- The loop guard dominates the bounds check, and neither \(j\) nor \(k\) is updated between the guard and the check, so the guard is known to have succeeded with the current values of \(j\) and \(k\).
Under these conditions, the bounds check on \(k\) is superfluous and can be eliminated. But when does \(j\lt u\) imply \(k\lt n\)? Suppose that \(j = ⟨i, a_j, b_j⟩\) and \(k = ⟨i, a_k, b_k⟩\). If the \(j\) test succeeds, then \(a_j i + b_j \lt u\). Without loss of generality, assume \(a_j\gt0\). Then this implies that \(i \lt (u-b_j)/a_j\). Therefore, multiplying by \(a_k\) (assuming \(a_k \gt 0\)), we have \(k = a_k i + b_k \lt a_k(u-b_j)/a_j + b_k\). If we can show statically or dynamically that this right-hand side is less than or equal to \(n\), then we know \(k \lt n\). So the goal is to show that \(a_k(u-b_j)/a_j + b_k ≤ n\). This can be done either at compile time or by hoisting a test before the loop. In our example, the test for \(\texttt{i} \lt \texttt{a.length}\) is that \(1*(\texttt{a.length} - 0)/1 + 0 ≤ \texttt{a.length}\), which can be determined statically. The compiler does still need to insert a test that \(i ≥ 0\) before the loop:
s = a + 3*i;
if (i < 0) goto Lerr
while (i < a.length) {
    j = s;
    [j] = [j] + 1;
    i = i + 2;
    s = s + 6;
}
After linear-function test replacement, this code example becomes subject to induction variable elimination.
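The redundancy condition \(a_k(u-b_j)/a_j + b_k ≤ n\) derived above can be sketched as a small compile-time check; the helper name is illustrative, and exact rational arithmetic stands in for the compiler's symbolic reasoning:

```python
from fractions import Fraction  # exact division, no rounding surprises

def check_redundant(aj, bj, ak, bk, u, n):
    """True if the guard a_j*i + b_j < u implies the bounds check
    a_k*i + b_k < n (both a_j and a_k assumed positive)."""
    return Fraction(ak) * (u - bj) / aj + bk <= n

# In the example, j = k = <i, 1, 0> and u = n = a.length, so the
# condition reduces to a.length <= a.length, which always holds:
length = 10  # assumed a.length
redundant = check_redundant(1, 0, 1, 0, length, length)
```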
Loop guards and induction variable updates add significant overhead to short loops. The cost of loop guards involving induction variables can often be reduced by unrolling loops to form multiple consecutive copies. Multiple loop guard tests can then be combined into a single conservative test. If the loop is unrolled to make \(n\) copies, and the loop guard has the form \(i \lt u\) where \(u\) is a loop-invariant expression and \(i\) is an induction variable with stride \(c\) that is updated at most once per loop iteration, the test \(i + c(n-1) \lt u\) conservatively ensures that all \(n\) loop copies can be run without having any guard fail.
Since this guard is conservative, there may still be fewer than \(n\) iterations left to be performed when it fails. So the unrolled loop has to be followed by another copy of the original loop, the loop epilogue, which performs any remaining iterations one by one. Since loop unrolling therefore results in \(n+1\) copies of the original loop, it trades off code size for speed. If the original loop was only going to be executed for a small number of iterations, loop unrolling could hurt performance rather than improve it.
Updates to basic linear induction variables inside the unrolled loop can also be combined. If a variable \(i\) is updated with the statement \(i = i + c\), then \(n\) copies of the update can be replaced with a single update \(i = i + nc\). However, any uses of \(i\) within the copies of the loop must be changed as well: in the second loop copy, to \(i+c\); in the third copy (if any), to \(i+2c\); and so on. These additions can often be folded into existing arithmetic computations.
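Putting the pieces together, here is a sketch (in Python, standing in for the IR) of the earlier stride-6 loop unrolled with \(n = 2\) copies, using the conservative guard, a combined update, and an epilogue; the concrete base address and array length are made up:

```python
a, length = 4, 10          # assumed base "address" and a.length
mem = [0] * 64             # simulated memory
s = a                      # i starts at 0, so s = a + 3*0
t = a + 3 * length

# Unrolled body, n = 2: the guard s + 6*(2-1) < t ensures both
# copies can run without either copy's own guard failing.
while s + 6 < t:
    mem[s] += 1            # first copy uses s
    mem[s + 6] += 1        # second copy uses s + c  (c = 6)
    s += 12                # combined update: s = s + 2*6

# Epilogue: perform any remaining iterations one by one.
while s < t:
    mem[s] += 1
    s += 6

# Reference: the original (un-unrolled) stride-6 loop.
expected = [0] * 64
x = a
while x < t:
    expected[x] += 1
    x += 6
```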