Lesson 4: Data Flow
- discussion thread
- data flow
- implementation task
-
the CS 4120 notes
for a longer introduction to data flow -
Section 5.3 of SPA
has more on fixed point algorithms for solving data flow - tasks due February 17
Gist
The Data Flow Framework
- Reaching definitions are an example of a global property that require you to look at an entire CFG.
- Terminology:
- Use: An instruction uses all of its arguments. (So any instruction with arguments is a “use.”)
- Definition: An instruction defines the variable it writes to. (You can call every instruction that writes to a variable a “definition.”)
- Available: Definitions that reach a given point in the program are available at that point.
- Kills: Any definition kills all of the currently available definitions.
- The data flow framework. Here’s how you solve a global analysis problem by imbuing a local analysis with a dose of data-flow magic:
- Figure out the thing you want to know at the entry and exit of each block.
- Write an equation for every block relating that thing at the entry to that thing at the exit. (In general, this is a local analysis for the block.)
- Generate equalities according to the edges in the CFG.
- Solve the system of equations! (Using a general solver algorithm that you don’t need to adapt to every problem.)
- Instantiating the data flow framework for reaching definitions.
- Initial value: the empty set.
- Transfer function:
out(in) = def(b) ∪ (in - kills(b))
- Merge function: union.
- The worklist algorithm for solving data flow.
Here’s the pseudocode for solving a forward data flow problem with a worklist algorithm:
in[entry] = init
out[*] = init
worklist = all blocks
while worklist is not empty:
b = pick any block from worklist
in[b] = merge(out[p] for every predecessor p of b)
out[b] = transfer(b, in[b])
if out[b] changed:
worklist += successors of b
For the backward version, flip predecessors & successors, and flip in
and out
.
Instantiating Data Flow
- Theory interlude: the requirements for a general data flow analysis. How do you know the worklist algorithm terminates and gives you the right answer?
- The domain of values you’re trying compute needs to form a partial order with a unique lower bound. The rough idea is that the worklist algorithm should only “move” values monotonically in the order, so it’s guaranteed to eventually terminate.
- In terms of a partial order ⊑, the merge function is the meet (greatest lower bound) operator ⊓; the initial value is the top value ⊤; and the transfer function must be a monotonic function, so
x ⊑ y
impliesf(x) ⊑ f(y)
. - The usual definition of a “correct” solution to a data-flow problem is the meet-over-all-paths solution: the meet of chained applications of the transfer functions for every path in the CFG from the entry block to any given block.
- For more on the theory, I recommend Chapter 5 of Static Program Analysis by Møller and Schwartzbach.
- More examples of things you can do with the data flow framework.
- Reaching definitions.
- Live variables: which variables are both defined and going to be used at some point in the future?
- Constant propagation: which variables have statically knowable constant values?
- Available expressions: which expressions have already been computed computed in the past? (Useful in CSE.)
- Initialized variables: like in Java, which variables have had something written to them?
- Interval analysis: what is the numerical range of values that a given variable might hold?
- Demonstrating a simple generic implementation.
- If you want, you can use the
{args}
feature of Turnt and its-a
command-line flag to quickly switch between different analyses.
- If you want, you can use the
Tasks
- Implement at least one data flow analysis. You choose which.
- For bonus “points,” implement a generic solver that supports multiple analyses.