Struct Extension for Compiled Bril Programs

by Mark Moeller December 18, 2020

Motivation: Garbage Collection for Compilers

Building garbage collectors for compilers (as opposed to interpreters) is particularly tricky, since almost everything about garbage collection algorithms is informed by dynamic information. Assuming we are not in a virtual machine environment (such as Java, which can relegate details of garbage collection to the JVM), we must "bake in" statically a host of things to be checked or performed dynamically. This involves, at least, the following decisions:

What type of collection? While the two main choices here are tracing and reference counting, we saw in the Unified Theory of Garbage Collection, complete answers lie on a spectrum. This choice is somewhat orthogonal to the compiler/interpreter issue.
When to collect? Certainly this decision will ideally be informed by dynamic information about the heap state to minimize unnecessary pauses or overhead. This is still more complicated when compiling to native binaries as opposed to virtual machine bytecode where more safepoints may be available. For example, to be a drop-in replacement for manual memory managment in C, one (naively at least) has only malloc calls to work with as possible collection times.
How much to collect? Both tracing and reference counting have tunable early returns that offer sacrificing memory efficiency for performance. Again, whatever decisions we need to make must be evaluated by code that was emitted statically.

Contributions

To explore these issues, I built a new compiler from Bril to LLVM that supports the memory extension (due to Zagieboylo, Doenges).

Before getting into any garbage collection, I needed a little more expressive language for heap manipulation than just pointers to primitive arrays. For this, I built a compiler that implements a new struct extention for Bril, which allows the use of structs similar to those in C.

A Struct Extension for Bril

Programs using this extension have a second top-level key (i.e., in addition to "functions"), called "structs", which define C-like struct datatypes for use in the program. A struct definition has only two keys:

name: a name for the struct type.
mbrs: a list of dictionaries that represent the members of the struct type. Each dictionary has two keys, type, which gives a valid Bril type, and name. (Observe that these are the same two fields that an argument to a function has).

Warm-up Example

To use the standard first struct example, we might define a struct to represent a point in $\textbf{Z}^2$ . (Note that we narrowly miss one of the three disallowed names for a struct type, int, bool, and ptr.)

struct point = {
    x: int;
    y: int;
}

which would have the JSON representation:

{"name": "point", "mbrs":[{"type": "int", "name": "x"}, {"type": "int", "name": "y"}]}

For the rest of this post, we'll only use the text representation for definitions.

Memory Allocation for Structs

Instances of structs are only allowed to be allocated on the heap, using the memory extension. Thus, we can use alloc without modification to get a space for an instance of a struct on the heap, or to get really exciting, a whole array. Importantly, the struct is not automatically initialized. It is illegal to load from any struct member that has not been previously written to.

So we can allocate a point with:

one: int = const 1
p: ptr<point> = alloc one

To access the members of the struct, we only need a way to get a name for the members, since we can read and write using load and store from the normal memory extension. For this, we a new value instructions:

getmbr, which takes two arguments: a pointer to a struct, and the name of a member of the struct. It returns a pointer to the requested element, with appropriate pointer type. Since we can only ever have pointers to structs, this is always possible.

In text form, this would look like this:

<dest>: <ty> = getmbr <ptr> <mbr>

If this evokes LLVM's getelementptr in your mind, you've got it! (That is how this instruction is implemented.) In the future, it might be convenient to provide a "dot-notation" as syntactic sugar for member access.

Or in the case of our point struct, we would initialize (x, y) to (1, 1) with:

px: ptr<int> = getmbr p x;
store px one;

py: ptr<int> = getmbr p y;
store py one;

Recursive Example

Of course, to do anything actually interesting with structs, we need the ability to point to other structs from a struct. We can do this, but note that we can only have pointers to other structs, not nested definitions of new structs (Since we don't have any concept of closures, this would not be useful anyway.)

For example, we could have a linked list as:

struct int_list = {
    elt: int;
    next: ptr<int_list>;
}

Naturally, we need a way to close off such a recursive structure. Strictly speaking, we could do this with another member (say, a bool to indicate the last element). This would be rather inelegant, however, since then we'd have uninitialized data in an active struct. Instead, somewhat reluctantly, I added a nullptr constant that can be assigned to any pointer type. Suppose last is a ptr<int_list> and the last element of a linked list. Then we can end the list with:

last_n: ptr<ptr<int>> = getmbr last next;

Finally to allow recursive algorithms to be encoded in the usual way, a test for whether a pointer is nullptr is provided with an isnull predicate:

<dest>: bool = isnull <ptr>

With those preliminaries in hand, we can start expressing typical algorithms for our new data structure:

@cons (head: int, tail:ptr<int_list>): ptr<int_list> {
    one: int = const 1;
    p: ptr<int_list> = alloc one;

    phead: ptr<int> = getmbr p elt;
    ptail: ptr<ptr<int_list>> = getmbr p next;

    store phead head;
    store ptail tail;

    ret p;
}

@print_list (list: ptr<int_list>) {
    empty: bool = isnull list;
    br empty .end .print;
.print:
    xp: ptr<int> = getmbr list elt;
    x: int = load xp;
    print x;

    tp: ptr<ptr<int_list>> = getmbr list next;
    t: ptr<int_list> = load tp;
    call @print_list t;
.end:
    ret;
}

@free_list (list: ptr<int_list>) {
    empty: bool = isnull list;
    br empty .end .freetail;
.freetail:
    tp: ptr<ptr<int_list>> = getmbr list next;
    t: ptr<int_list> = load tp;
    call @free_list t;
    free list;
.end:
    ret;
}

Implementation

The compiler is implemented in Python, with help from clang to generate LLVM for builtin functions written in C. I modified the Bril parser to allow structs as described; that extension is here. I add the struct extension features to the reference interpreter.

The compiler first performs the SSA conversion on the input program. I used my implementation from Lesson 5.) This required a couple of modifications:

The member name argument to getmbr is "special" syntactically, and should not be renamed during SSA conversion.
My SSA conversion allowed blocks to "fallthrough" the next label. Since LLVM disallows this, I updated my SSA conversion to end each block with a terminator.
Bril actually disallows constant propagation to a degree, in the sense that literals cannot be dropped into later expressions where constants are used. LLVM requires this since you cannot assign constants to registers.

Evaluation

Correctness

I validated my solution first and foremost by compiling and running the Bril benchmark suite. There is one outstanding bug from the benchmark suite: mat-mul outputs some negative numbers which does not happen when under the reference interpreter. In experimenting with this bug, I wondered if I had selected the wrong LLVM instruction for Bril's div (I had picked sdiv, the signed integer division instruction.) Curiously enough, replacing sdiv with any of udiv, udiv exact, or sdiv exact "fixed" the bug (but broke one other test in each case).

Memory Management

The major evaluation in my proposal had been to measure the memory footprint of programs using garbage collection. Since my implementation is well short of that goal, I did not evaluate the memory management (as it was being done manually). I did at least confirm that my programs did not have dynamic memory violations by running them under Valgrind.

Part of my proposal had been fairly hand wavy about where I would get enough benchmarks to measure the code running in the first place. A big lesson learned for me from the experience of doing this project is that generating a lot of valid input programs for compiler testing is really hard!

Conclusions

While this project has not matured quickly enough yet to explore the garbage collection issues I was originally after, we are at least now in a position to do so in future work.

The decision to write something from scratch always has its costs and benefits. I do think my compiler is built in a way suitable for an automatic garbage collection improvement.

To return to three garbation collection design questions I posed at the top of this post, I think the best approach is to start with a simple, flexible solution and tune towards particular use cases. To that end, I think the best answers for a first version of a collector for this compiler would be:

What type of collection? Tracing, since the naive implementation collects everything, while a naive implementation of reference counting misses cycles. To make things more efficient, it can be an improvement later on to add generations that are only reference counted.
When to collect? For a simple starting point, we could record heap size periodically (whether it is an object count or byte count) and collect in conjunction with an alloc whenever we exceed some threshold increase. That would provide plenty of opporunity for tuning later on as well.
How much to collect? Since we chose tracing, the best starting point would just be to do a full collection every time, then we can experiment from there with generations to decide what types of collection are the most worthwhile.

The CS 6120 Course Blog