Lesson 10: Alias Analysis

video
Pointer Analysis
tutorial by Yannis Smaragdakis and George Balatsouras

Gist

Motivation

Lots of languages have pointers! Whether you call them that or not.

C, obviously.
In Java, most values are pointers—everything but the primitive types.
In ML, pointers are everywhere implicitly, and you can create them yourself explicitly with the 'a ref type.
Bril has pointers too, with the memory extension.

Tricky quiz to emphasize the point that pointers make semantics hard: what does this C program return?

bool foo(char *a, char *b) {
    *a = 'a';
    *b = 'b';
    return *a == 'a';
}

The answer is that it depends on aliasing, i.e., whether a and b point to the same location in memory. Really, doing anything with pointers depends on aliasing! Basically anything that you can normally do with non-aliasable local (stack) variables is extra hard to do on heap variables (i.e., pointers) without aliasing information.

An example especially close to my heart is parallelization. Aliasing is a major impediment (the major impediment?) to automatic parallelization of code—you need to be sure that two things running in parallel can’t be interfering with each others’ data, which requires knowing that they don’t touch pointers that alias.

Stating the Alias Analysis Problem

The problem: “For every program point, and for every pair of pointer-typed variables p and q, do p and q point to the same memory location at that point in time?”

Of course, this problem in undecidable. And even when it is computable, sometimes it can be very expensive. So we will have to make do with partial information: sometimes (often), the answer will be “maybe.”
Useful answers are “must” alias vs. “must not” alias.
A common from of question to ask is a “may alias” query, which says “yes” (not very useful, and optimizations must be conservative) or “no” (we know these things must not alias, which is often useful for optimization).

Alias Analysis with Data Flow

Let’s try to concoct a simple alias analysis using the data flow framework!

Direction: Forward.
Domain: A map from variable names to sets of locations that the variable may refer to. (You can use this data structure to answer may-alias queries by checking whether two variables map to sets with a nonempty intersection.) (What’s a “location”? See “Heap Models,” below.)
Initial value: Every variable has an empty set.
Merge function: Union for every variable.
Transfer function: do these things to the mapping for pointer-relevant Bril instructions.
- x = const K: map[x] = {}
- x = id y: map[x] = map[y]
- x = alloc y: map[x] = {fresh location}

Heap Models

Any alias needs a definition of what a “memory location” is. A common answer is that there is one location per static allocation site. In Bril, for example, every alloc instruction becomes a memory location.

For realistic languages, it often helps to disambiguate memory locations further: for example, to give every offset in an array a different location, or to give every field in an object a different location.

Context-Sensitive Alias Analysis

See last time’s discussion about context sensitivity in general. Context sensitivity is a big topic in alias analysis: it’s common to use some limited calling-context context to disambiguate memory locations and alias information.

Of course, there is a sharp trade-off between cost and precision. Scalable & precise alias analysis remains an open problem. Seriously, it’s its own world of ongoing research. For much more, see Smaragdakis and Balatsouras.

Tasks

There are no implementation tasks for this lesson. If alias analysis is your bag, you can start with using your data flow implementation to implement a straightforward may-alias analysis for Bril pointers, then proceed on to the literature to find and implement more and more interesting pointer analyses.