Lesson 5: Global Analysis & SSA
- discussion
- global analysis & optimization
- static single assignment
- SSA slides from Todd Mowry at CMU: another presentation of the pseudocode for various algorithms herein
- "Revisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency" by Boissinot et al., on more sophisticated ways to translate out of SSA form
- tasks due October 7
Gist
Dominators
Lots of definitions!
- Reminders: Successors & predecessors. Paths in CFGs.
- A dominates B iff all paths from the entry to B include A.
- The dominator tree is a convenient data structure for storing the dominance relationships in an entire function. The recursive children of a given node in the tree are the nodes that that node strictly dominates.
- A strictly dominates B iff A dominates B and A ≠ B. (Dominance is reflexive, so "strict" dominance just takes that part away.)
- A immediately dominates B iff A strictly dominates B but A does not strictly dominate any other node that strictly dominates B. (In which case A is B's direct parent in the dominator tree.)
- A dominance frontier is the set of nodes that are just "one edge away" from being dominated by a given node. Put differently, A's dominance frontier contains B iff A does not strictly dominate B, but A does dominate some predecessor of B.
- Post-dominance is the reverse of dominance. A post-dominates B iff all paths from B to the exit include A. (You can extend the strict version, the immediate version, trees, etc. to post-dominance.)
An algorithm for finding dominators:
dom = {every block -> all blocks}
while dom is still changing:
    for vertex in CFG:
        dom[vertex] = {vertex} ∪ ⋂(dom[p] for p in vertex.preds)
The dom relation will, in the end, map each block to its set of dominators.
We initialize it as the "complete" relation, i.e., mapping every block to the set of all blocks.
The loop pares down the sets by iterating to convergence.
The running time is O(n²) in the worst case. But there's a trick: if you iterate over the CFG in reverse post-order, and the CFG is well behaved (reducible), it runs in linear time—the outer loop runs a constant number of times.
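Here's a minimal Python sketch of this iterative algorithm, assuming the CFG is given as a dict preds mapping each block name to its list of predecessors, with a designated entry block and every block reachable from it (all names here are illustrative):

def find_dominators(preds, entry):
    blocks = set(preds)
    # Start with the "complete" relation: every block maps to all blocks.
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}  # The entry block is dominated only by itself.
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            # A block's dominators: itself, plus everything that
            # dominates all of its predecessors.
            new = {b} | set.intersection(*(dom[p] for p in preds[b]))
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom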
Natural Loops
Some things about loops:
- Natural loops are strongly connected components in the CFG with a single entry.
- Natural loops are formed around backedges, which are edges from A to B where B dominates A.
- A natural loop is the smallest set of vertices L including A and B such that, for every v in L, either all the predecessors of v are in L or v=B. (The sketch after this list computes exactly these sets.)
- A CFG is reducible iff every backedge has a natural loop.
- A language that only has for, while, if, break, continue, etc. can only generate reducible CFGs. You need goto or something to generate irreducible CFGs.
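As a sketch of how backedge detection and loop formation fit together, here's some illustrative Python that reuses the dominator sets from above (succs maps each block to its successors; these names are assumptions):

def natural_loops(preds, succs, dom):
    loops = []
    for a in succs:
        for b in succs[a]:
            if b in dom[a]:  # The edge a -> b is a backedge: b dominates a.
                # Grow the smallest set containing a and b that is
                # closed under predecessors, stopping at the header b.
                loop = {b, a}
                stack = [a]
                while stack:
                    v = stack.pop()
                    if v == b:
                        continue  # Don't walk past the header.
                    for p in preds[v]:
                        if p not in loop:
                            loop.add(p)
                            stack.append(p)
                loops.append((b, loop))  # (header, loop body)
    return loops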
Loop-Invariant Code Motion (LICM)
And finally, loop-invariant code motion (LICM) is an optimization that works on natural loops. It moves code from inside a loop to before the loop, if the computation always does the same thing on every iteration of the loop.
A loop's preheader is its header's unique predecessor. LICM moves code to the preheader. But while natural loops need to have a unique header, the header does not necessarily have a unique predecessor. So it's often convenient to invent an empty preheader block that jumps directly to the header, and then move all the in-edges to the header to point there instead.
LICM needs two ingredients: identifying loop-invariant instructions in the loop body, and deciding when it's safe to move one from the body to the preheader.
To identify loop-invariant instructions:
iterate to convergence:
    for every instruction in the loop:
        mark it as LI iff, for all arguments x, either:
            all reaching definitions of x are outside of the loop, or
            there is exactly one definition, and it is already marked as
                loop invariant
(This determination requires that you already calculated reaching definitions! Presumably using data flow.)
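In Python, over Bril-style instructions (JSON objects with "dest" and "args" fields), the marking loop might look roughly like this. Here reaching is an assumed precomputed reaching-definitions result where reaching[b][x] is the set of (block, index) sites defining x that reach block b; purity checks are omitted, and all the names are illustrative:

def find_loop_invariant(loop, blocks, reaching):
    li = set()  # (block, index) sites marked loop-invariant.
    changed = True
    while changed:
        changed = False
        for b in loop:
            for i, instr in enumerate(blocks[b]):
                if (b, i) in li or 'dest' not in instr:
                    continue
                def arg_ok(x):
                    defs = reaching[b][x]
                    # Either all reaching definitions are outside the loop...
                    if all(db not in loop for db, _ in defs):
                        return True
                    # ...or there's exactly one, already marked invariant.
                    return len(defs) == 1 and next(iter(defs)) in li
                if all(arg_ok(x) for x in instr.get('args', [])):
                    li.add((b, i))
                    changed = True
    return li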
It's safe to move a loop-invariant instruction to the preheader iff:
- The definition dominates all of its uses, and
- No other definitions of the same variable exist in the loop, and
- The instruction dominates all loop exits.
The last criterion is somewhat tricky: it ensures that the computation would have been computed eventually anyway, so it's safe to just do it earlier. But it's not true of loops that may execute zero times, which, when you think about it, rules out most for loops! It's possible to relax this condition (see the example after this list) if:
- The assigned-to variable is dead after the loop, and
- The instruction can't have side effects, including exceptions—generally ruling out division because it might divide by zero. (A thing that you generally need to be careful of in such speculative optimizations that do computations that might not actually be necessary.)
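For example, in this hypothetical loop fragment (the variable names are invented, and i, n, a, and b are defined before .preheader), the mul in the body is loop-invariant but does not dominate the loop exit:

.preheader:
  jmp .header;
.header:
  cond: bool = lt i n;
  br cond .body .exit;
.body:
  c: int = mul a b;  # Loop-invariant: a and b are defined outside the loop.
  i: int = add i c;
  jmp .header;
.exit:
  print i;

It is still hoistable under the relaxed condition: moving c: int = mul a b into .preheader is safe because c is dead after the loop and mul can't have side effects, so computing it speculatively costs nothing even if the loop runs zero times.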
Static Single Assignment (SSA)
You have undoubtedly noticed by now that many of the annoying problems in implementing analyses & optimizations stem from variable name conflicts. Wouldn't it be nice if every assignment in a program used a unique variable name? Of course, people don't write programs that way, so we're out of luck. Right?
Wrong! Many compilers convert programs into static single assignment (SSA) form, which does exactly what it says: it ensures that, globally, every variable has exactly one static assignment location. (Of course, that statement might be executed multiple times, which is why it's not dynamic single assignment.) In Bril terms, we convert a program like this:
@main {
a: int = const 4;
b: int = const 2;
a: int = add a b;
b: int = add a b;
print b;
}
Into a program like this, by renaming all the variables:
@main {
a.1: int = const 4;
b.1: int = const 2;
a.2: int = add a.1 b.1;
b.2: int = add a.2 b.1;
print b.2;
}
Of course, things will get a little more complicated when there is control flow. And because real machines are not SSA, using separate variables (i.e., memory locations and registers) for everything is bound to be inefficient. The idea in SSA is to convert general programs into SSA form, do all our optimization there, and then convert back to a standard mutating form before we generate backend code.
ϕ-Nodes
Just renaming assignments willy-nilly will quickly run into problems. Consider this program:
@main(cond: bool) {
.entry:
a: int = const 47;
br cond .left .right;
.left:
a: int = add a a;
jmp .exit;
.right:
a: int = mul a a;
jmp .exit;
.exit:
print a;
}
If we start renaming all the occurrences of a, everything goes fine until we try to write that last print a. Which "version" of a should it use?
To match the expressiveness of unrestricted programs, SSA adds a new kind of instruction: a ϕ-node. ϕ-nodes are flow-sensitive copy instructions: they get a value from one of several variables, depending on which incoming CFG edge was most recently taken to get to them.
In Bril, a ϕ-node appears as a phi instruction:
a.4: int = phi .left a.2 .right a.3;
The phi instruction chooses between any number of variables, and it picks between them based on labels. If the program most recently executed a basic block with the given label, then the phi instruction takes its value from the corresponding variable.
You can write the above program in SSA like this:
@main(cond: bool) {
.entry:
a.1: int = const 47;
br cond .left .right;
.left:
a.2: int = add a.1 a.1;
jmp .exit;
.right:
a.3: int = mul a.1 a.1;
jmp .exit;
.exit:
a.4: int = phi .left a.2 .right a.3;
print a.4;
}
It can also be useful to see how ϕ-nodes crop up in loops.
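For example, here's a hypothetical counting loop in SSA form (following the phi syntax above). The ϕ-node in the loop header merges the counter's initial value from .entry with its updated value from the previous trip through .body:

@main {
.entry:
  i.1: int = const 0;
  one: int = const 1;
  ten: int = const 10;
  jmp .header;
.header:
  i.2: int = phi .entry i.1 .body i.3;
  cond: bool = lt i.2 ten;
  br cond .body .exit;
.body:
  i.3: int = add i.2 one;
  jmp .header;
.exit:
  print i.2;
}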
(An aside: some recent SSA-form IRs, such as MLIR and Swift's IR, use an alternative to ϕ-nodes called basic block arguments. Instead of making ϕ-nodes look like weird instructions, these IRs bake the need for ϕ-like conditional copies into the structure of the CFG. Basic blocks have named parameters, and whenever you jump to a block, you must provide arguments for those parameters. With ϕ-nodes, a basic block enumerates all the possible sources for a given variable, one for each in-edge in the CFG; with basic block arguments, the sources are distributed to the "other end" of the CFG edge. Basic block arguments are a nice alternative for "SSA-native" IRs because they avoid messy problems that arise when needing to treat ϕ-nodes differently from every other kind of instruction.)
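To make the contrast concrete, here's the .exit merge from the example above rewritten in a hypothetical block-argument notation (this is not real Bril syntax): each jump supplies a value, and the block declares a parameter instead of containing a ϕ-node.

.left:
  a.2: int = add a.1 a.1;
  jmp .exit(a.2);
.right:
  a.3: int = mul a.1 a.1;
  jmp .exit(a.3);
.exit(a.4: int):
  print a.4;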
Bril in SSA
Bril has an SSA extension.
It adds support for a phi instruction.
Beyond that, SSA form is just a restriction on the normal expressiveness of Bril—if you solemnly promise never to assign statically to the same variable twice, you are writing "SSA Bril."
The reference interpreter has built-in support for phi, so you can execute your SSA-form Bril programs without fuss.
The SSA Philosophy
In addition to a language form, SSA is also a philosophy! It can fundamentally change the way you think about programs. In the SSA philosophy:
- definitions == variables
- instructions == values
- arguments == data flow graph edges
In LLVM, for example, instructions do not refer to argument variables by name—an argument is a pointer to the defining instruction.
Converting to SSA
To convert to SSA, we want to insert ϕ-nodes whenever there are distinct paths containing distinct definitions of a variable. We don't need ϕ-nodes in places that are dominated by a definition of the variable. So what's a way to know when control reachable from a definition is not dominated by that definition? The dominance frontier!
We do it in two steps. First, insert ϕ-nodes:
for v in vars:
    for d in Defs[v]:  # Blocks where v is assigned.
        for block in DF[d]:  # Dominance frontier.
            Add a ϕ-node to block,
                unless we have done so already.
            Add block to Defs[v] (because it now writes to v!),
                unless it's already in there.
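A rough Python rendering of this step, over Bril-style blocks (a dict from block name to instruction list), might look like the sketch below. The df, defs, and var_types structures are assumed to be precomputed, and all the names are illustrative:

def insert_phis(blocks, df, defs, var_types):
    for v in defs:
        worklist = list(defs[v])
        has_phi = set()  # Blocks that already have a ϕ-node for v.
        while worklist:
            d = worklist.pop()
            for block in df[d]:
                if block not in has_phi:
                    # Arguments and labels get filled in during renaming.
                    phi = {'op': 'phi', 'dest': v, 'type': var_types[v],
                           'args': [], 'labels': []}
                    blocks[block].insert(0, phi)
                    has_phi.add(block)
                    if block not in defs[v]:  # block now writes to v too.
                        defs[v].add(block)
                        worklist.append(block)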
Then, rename variables:
stack[v] is a stack of variable names (for every variable v)

def rename(block):
    for instr in block:
        replace each argument to instr with stack[old name]
        replace instr's destination with a new name
        push that new name onto stack[old name]
    for s in block's successors:
        for p in s's ϕ-nodes:
            Assuming p is for a variable v, make it read from stack[v].
    for b in blocks immediately dominated by block:
        # That is, children in the dominator tree.
        rename(b)
    pop all the names we just pushed onto the stacks

rename(entry)
Converting from SSA
Eventually, we need to convert out of SSA form to generate efficient code for real machines that don't have phi-nodes and do have finite space for variable storage.
The basic algorithm is pretty straightforward. If you have a ϕ-node:

v = phi .l1 x .l2 y;

Then there must be assignments to x and y (recursively) preceding this statement in the CFG. The paths from x to the phi-containing block and from y to the same block must "converge" at that block. So insert code into the phi-containing block's immediate predecessors along each of those two paths: one that does v = id x and one that does v = id y. Then you can delete the phi instruction.
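Here's a minimal Python sketch of this naive pass, under the assumptions that blocks maps label names to instruction lists and that every block ends with an explicit terminator (names are illustrative):

def from_ssa(blocks):
    for instrs in blocks.values():
        for phi in [i for i in instrs if i.get('op') == 'phi']:
            for label, arg in zip(phi['labels'], phi['args']):
                copy = {'op': 'id', 'dest': phi['dest'],
                        'type': phi['type'], 'args': [arg]}
                pred = blocks[label]
                # Insert the copy just before the predecessor's terminator.
                pred.insert(len(pred) - 1, copy)
            instrs.remove(phi)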
This basic approach can introduce some redundant copying. (Take a look at the code it generates after you implement it!) Non-SSA copy propagation optimization can work well as a post-processing step. For a more extensive take on how to translate out of SSA efficiently, see “Revisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency” by Boissinot et al.
Tasks
- Implement some dominance utilities:
- Find dominators for a function.
- Construct the dominance tree.
- Compute the dominance frontier.
- Implement the “into SSA” and “out of SSA” transformations on Bril functions.
- One thing to watch out for: a tricky part of the translation from the pseudocode to the real world is dealing with variables that are undefined along some paths.
- As usual, convince yourself that your implementation actually works!
- You will want to make sure the output of your "to SSA" pass is actually in SSA form. There's a really simple is_ssa.py script that can check that for you.
- You'll also want to make sure that programs do the same thing when converted to SSA form and back again. Fortunately, brili supports the phi instruction, so you can interpret your SSA-form programs if you want to check the midpoint of that round trip.
- For bonus "points," implement global value numbering for SSA-form Bril code.