The CS 6120 Course Blog

Sparse Conditional Constant Propagation

by Mark Anastos

The goal of this project was to implement the sparse conditional constant propagation optimization as a pass on programs in the intermediate language Bril. Sparse conditional constant propagation is a compiler optimization that detects variables and expressions in a program that will always evaluate to a fixed value, and computes their values at compile time rather than at runtime. It is set apart from traditional constant propagation by its reliance on static single assignment (SSA) form to improve the efficiency of the analysis (sparse), and by its ability to detect control-flow edges that will never be taken due to constant branch conditions (conditional).

Motivation

As an example of constant propagation, consider the following block of Bril code:

    a: int = const 1;
    b: int = add a a;
    cond: bool = const false;
    br cond then else;
then:
    b: int = add b a;
else:
    print b;

Any constant propagation analysis would be able to detect that the initial definition of b will always evaluate to the same value. As such, the addition operation could be completed at compile time and replaced in the program with const 2. Simple constant propagation draws no conclusions about control flow, and thus would be unable to determine whether or not the instruction b: int = add b a; will be executed. However, by inspecting the program, we can see that the branch condition is always false and consequently the instruction is dead code. As such, the final value of b that is printed will always be 2. Conditional constant propagation has this ability to reason about branch conditions, and thus can optimize the entire block above to the following three instructions:

    a: int = const 1;
    b: int = const 2;
    print b;

In fact, if the variable a is not used later on in the function, its definition could be removed as well.

The example above is clearly contrived to show the capabilities of conditional constant propagation, so it may not be obvious whether and when this optimization would truly benefit real programs. Why would a program contain code that will never run? A common answer to this question is that people often guard debugging code behind conditionals. As such, when the code is compiled for production, with debugging disabled, all of that code is unneeded and can be removed through this optimization.

However, I think that a more important motivation for this optimization revolves around the fact that it operates on an intermediate language. People would most likely not write programs directly in Bril; instead, they would write them in some other, higher-level language and then compile them down to Bril (and then to assembly). The compilation process could easily produce constant values that are not fully evaluated or control-flow edges that are never traversed. Other optimizations, such as function inlining, could produce opportunities for conditional constant propagation if, for example, the function arguments are constants. As such, this optimization could have significant benefits even for programs without any obviously constant expressions.

Implementation

Sparse conditional constant propagation was introduced by Wegman and Zadeck in “Constant Propagation with Conditional Branches” (1991). My implementation of the data-flow analysis is based on the description given in that paper. The analysis works on programs in SSA form, and as such I also needed to implement transformations on Bril programs to and from SSA form. After running the analysis, I then needed to actually use the information that it provides to replace computations with constant values and to eliminate dead code where possible. I wrote the optimization in TypeScript in order to take advantage of the pre-existing type definitions for Bril, which are written in that language. The optimization operates by taking in a Bril program (in the canonical JSON representation) through standard input, and outputting the optimized version of the program to standard output.
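
As a rough sketch of that outer driver (the "./bril" import path and the optimize placeholder here are illustrative, not my exact code):

    // Sketch of the driver: read a Bril program (JSON) from standard
    // input, run the pass, and print the result to standard output.
    import * as bril from "./bril";

    function optimize(program: bril.Program): bril.Program {
      // to-SSA conversion, analysis, rewriting, from-SSA conversion,
      // as described in the sections below
      return program;
    }

    let input = "";
    process.stdin.setEncoding("utf8");
    process.stdin.on("data", (chunk) => { input += chunk; });
    process.stdin.on("end", () => {
      const program = JSON.parse(input) as bril.Program;
      console.log(JSON.stringify(optimize(program), null, 2));
    });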

SSA

After separating out the basic blocks of a program and generating a representation of the control-flow graph (CFG), the first step in performing sparse conditional constant propagation is to convert the program into SSA form, in which each assignment is to a unique variable. In order to handle cases where the value of a variable could come from multiple of its definitions, SSA introduces a φ instruction (e.g., x_2: int = phi x_0 x_1;), which takes one argument per in-edge of the block in the CFG, and assigns to the destination the argument corresponding to the in-edge through which the block was entered. In order to convert back out of SSA, these φ instructions must be removed. In general, they can be removed by placing assignments at the end of the predecessor nodes or along the edges. As an example, the code block given in the motivation section would require one φ instruction, as there are two definitions of b that reach the print statement:

    a: int = const 1;
    b_0: int = add a a;
    cond: bool = const false;
    br cond then else;
then:
    b_1: int = add b_0 a;
else:
    b_2: int = phi b_0 b_1;
    print b_2;

The conversion to SSA form is divided into two parts: inserting φ instructions where necessary, and then renaming the variables to give each definition its own variable name. The only places where φ instructions might be necessary for a variable are in blocks on the dominance frontier of definitions of the variable. As such, in order to avoid adding far too many instructions to the program, I needed to compute the dominator tree of the control-flow graph.1 To do this I used the Lengauer-Tarjan dominator tree algorithm. The number of φ instructions could be reduced further by running a live variable analysis and only inserting them where the variable is live. However, I did not do this, as it does not matter for the effectiveness of the optimization and, I believe, it would more likely than not make the optimization slower.
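
As a sketch of the φ-insertion half, assuming the dominance frontiers and the set of blocks defining each variable have already been computed (all of the names here are illustrative):

    // Sketch of φ-insertion via dominance frontiers. `df` maps each
    // block to its dominance frontier, and `defBlocks` maps each
    // variable to the blocks that assign to it; both are assumed to
    // be precomputed.
    function insertPhis(
      df: Map<string, Set<string>>,
      defBlocks: Map<string, Set<string>>,
      hasPhi: (block: string, v: string) => boolean,
      addPhi: (block: string, v: string) => void,
    ): void {
      for (const [v, blocks] of defBlocks) {
        const worklist = [...blocks];
        const enqueued = new Set(worklist);
        while (worklist.length > 0) {
          const block = worklist.pop()!;
          for (const frontier of df.get(block) ?? []) {
            if (!hasPhi(frontier, v)) {
              addPhi(frontier, v);
              // The new φ is itself a definition of v, so blocks in
              // its own dominance frontier may need φs as well.
              if (!enqueued.has(frontier)) {
                enqueued.add(frontier);
                worklist.push(frontier);
              }
            }
          }
        }
      }
    }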

Typically, when converting back out of SSA form, you would need to consider each SSA variable independently and add assignment statements along the control-flow edges in order to replace the φ instructions. Converting to and directly back from SSA would then make a program, in general, less efficient than the original.2 However, during implementation I realized that, specifically for this optimization, the fully general conversion back from SSA is not necessary. This is because constant propagation only ever decreases the live ranges of SSA variables, by replacing their uses with constants or removing dead code. As such, the live ranges of SSA variables that come from the same original variable will never interfere with each other, and those variables can simply be renamed back to the original variable name. If this SSA conversion were to be used for other optimizations that do not share this property with constant propagation, the conversion back would need to be modified.
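
Concretely, the simplified conversion back amounts to deleting the φ instructions and stripping the SSA suffixes. A minimal sketch, assuming SSA names were formed by appending an index (as in b_0) and using a pared-down instruction type:

    // Sketch of the simplified out-of-SSA conversion. Because the
    // optimized live ranges never interfere, each SSA variable (b_0,
    // b_1, ...) can be renamed back to its original name (b) and the
    // φ instructions dropped.
    type Instr = { op: string; dest?: string; args?: string[] };

    function fromSSA(instrs: Instr[]): Instr[] {
      const original = (name: string) => name.replace(/_\d+$/, "");
      const result: Instr[] = [];
      for (const instr of instrs) {
        if (instr.op === "phi") continue; // no longer needed
        result.push({
          ...instr,
          dest: instr.dest === undefined ? undefined : original(instr.dest),
          args: instr.args?.map(original),
        });
      }
      return result;
    }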

Constant Propagation

The primary component of the optimization is the constant propagation analysis itself. I implemented it according to the worklist algorithm described in the paper. Doing so was mostly straightforward. However, the paper seems to not mention one point in the algorithm where it is necessary to add to the worklist, which took a while for me to figure out.3 The output of the analysis is a mapping from variables to elements of a lattice, which can be ⊤ (the variable is undefined), ⊥ (the value is unknown), or a value (the variable is a constant). Because the program is in SSA form, the analysis only needs one lattice element per variable, instead of a lattice element for each variable at each program point. The paper alludes to the fact that a constant propagation analysis can gain information from control-flow branches. For example, for any Bril branch, we can conclude that the condition variable is true on the first out-edge and false on the second. I did not implement this, as it would require keeping track of multiple lattice positions per variable and would significantly complicate the analysis.4
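
A minimal sketch of that three-point lattice and its meet operation (the representation here is illustrative):

    // Sketch of the lattice: ⊤ (no value seen yet), a known constant,
    // or ⊥ (provably not a constant). The analysis keeps one cell per
    // SSA variable and only ever lowers it.
    type Value = number | boolean;
    type Cell =
      | { kind: "top" }
      | { kind: "const"; value: Value }
      | { kind: "bottom" };

    function meet(a: Cell, b: Cell): Cell {
      if (a.kind === "top") return b;
      if (b.kind === "top") return a;
      if (a.kind === "const" && b.kind === "const" && a.value === b.value) {
        return a;
      }
      return { kind: "bottom" };
    }

Whenever a cell moves down this lattice, the uses of that variable are re-evaluated, which is what drives the worklist.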

After completing the analysis, the next step is to use its results to actually modify the code by replacing computations with constants and removing dead code. Any block that is unreachable can simply be removed from the CFG. In fact, doing this can create more dead code to remove if any variable defined elsewhere is only used in removed blocks. The next step is to replace the expressions in definitions of variables that are known to be constant with the constant value itself. Similarly, this could also create more dead code. As such, the last step is to remove any definitions of variables that have no uses. After this, the program is converted back out of SSA form, the CFG is flattened back into a single list of instructions, and the program is output in its standard JSON form.
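
A sketch of that rewriting step, reusing the Cell type from the sketch above with a slightly richer instruction type (cells and useCounts are illustrative names for the analysis results):

    // Sketch of the rewriting step: a definition whose cell is a
    // known constant becomes a `const` instruction, and a definition
    // with no remaining uses is dropped (safe in core Bril, where
    // value operations have no side effects).
    type Instr = { op: string; dest?: string; type?: string;
                   args?: string[]; value?: number | boolean };

    function rewrite(
      instrs: Instr[],
      cells: Map<string, Cell>,
      useCounts: Map<string, number>,
    ): Instr[] {
      const result: Instr[] = [];
      for (const instr of instrs) {
        if (instr.dest !== undefined) {
          if ((useCounts.get(instr.dest) ?? 0) === 0) continue; // dead
          const cell = cells.get(instr.dest);
          if (cell !== undefined && cell.kind === "const") {
            result.push({ op: "const", dest: instr.dest,
                          type: instr.type, value: cell.value });
            continue;
          }
        }
        result.push(instr);
      }
      return result;
    }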

Evaluation

It has proven difficult to evaluate the effectiveness of this optimization in a manner that would accurately reflect its utility. Upon running the optimization on several test cases, I have observed that, for programs written directly by humans in Bril, the optimization tends to exhibit one of two behaviors: either the program is effectively unchanged, or it is completely evaluated, leaving only assignments of constants to variables and print statements of those variables. By running these test cases, however, I have been able to ensure that the optimization does not change the behavior of well-typed Bril programs.

For example, of the test cases provided in the bril-benchmark repository, two (factorial and fibonacci) are completely unchanged by the optimization modulo the order of basic blocks, and the other two (matrix_mul and poly_mul) are completely evaluated to just constant assignments and print statements. The factor that seems to separate these two classes of programs is the presence or absence of loops. This is because, if there are no loops, then every assignment statement executes at most once. As such, because vanilla Bril has no channels through which data can enter a function from the outside, every variable's value can be determined through the conditional constant propagation analysis. When loops are involved, variables' values change between iterations, and as such the analysis is unable to determine a constant value for the variables.

I wrote a few programs in TypeScript and used the ts2bril compiler to convert them to Bril. These converted programs are qualitatively different from the handwritten ones, as the compiler inserts many short-lived variables in order to translate constructs from the TypeScript language. Due to the high number of redundant variables, I believe a more effective optimization for programs output by this compiler would be copy propagation. For a TypeScript program that prints the first 20 Fibonacci numbers, the constant propagation optimization had the following effect:

    Instruction Count    Unoptimized    Optimized    Change
    Static                        26           23    –11.5%
    Dynamic                      410          388     –5.4%

For this program, the optimization removed only three instructions, one of which was in the loop body (an extraneous assignment to a variable that was never used). In general, it is difficult to judge the effectiveness of this optimization without the infrastructure of other optimizations and compilers to pair it with. In the future, if someone were to implement, say, a function inlining optimization, and were to extend this optimization to support function calls, I would be interested to see the effect it would have.

1

For the renaming step of the SSA conversion, it was also necessary to compute the immediate dominator of each node. As such, the dominator tree was computed instead of just the dominance relation.

2

The efficiency lost by adding assignments in the transformation back from SSA can be regained through move coalescing during register allocation.

3

In the function that the paper calls Visit-φ, if the lattice position of the variable changes, you must add all uses of the variable to the worklist.

4

You could imagine gaining a lot of information from looking at branches in this way. For instance, in the following case, you could conclude that i is 5 on the true edge of the branch:

    ...
    b: bool = eq i 5;
    c: bool = and a b;
    br c foo bar;