Struct Extension for Compiled Bril Programs
Motivation: Garbage Collection for Compilers
Building garbage collectors for compilers (as opposed to interpreters) is particularly tricky, since almost everything about garbage collection algorithms is informed by dynamic information. Assuming we are not in a virtual machine environment (such as Java, which can relegate details of garbage collection to the JVM), we must "bake in" statically a host of things to be checked or performed dynamically. This involves, at least, the following decisions:
- What type of collection? While the two main choices here are tracing and reference counting, we saw in the Unified Theory of Garbage Collection, complete answers lie on a spectrum. This choice is somewhat orthogonal to the compiler/interpreter issue.
- When to collect? Certainly this decision will ideally be informed by dynamic information about the heap state to minimize unnecessary pauses or overhead. This is still more complicated when compiling to native binaries as opposed to virtual machine bytecode where more safepoints may be available. For example, to be a drop-in replacement for manual memory managment in C, one (naively at least) has only malloc calls to work with as possible collection times.
- How much to collect? Both tracing and reference counting have tunable early returns that offer sacrificing memory efficiency for performance. Again, whatever decisions we need to make must be evaluated by code that was emitted statically.
Contributions
To explore these issues, I built a new compiler from Bril to LLVM that supports the memory extension (due to Zagieboylo, Doenges).
Before getting into any garbage collection, I needed a little more expressive language for heap manipulation than just pointers to primitive arrays. For this, I built a compiler that implements a new struct extention for Bril, which allows the use of structs similar to those in C.
A Struct Extension for Bril
Programs using this extension have a second top-level key (i.e., in addition to
"functions"
), called "structs"
, which define C-like struct datatypes for use
in the program. A struct definition has only two keys:
name
: a name for the struct type.mbrs
: a list of dictionaries that represent the members of the struct type. Each dictionary has two keys,type
, which gives a valid Bril type, andname
. (Observe that these are the same two fields that an argument to a function has).
Warm-up Example
To use the standard first struct example, we might define a struct to represent
a point in $\textbf{Z}^2$
. (Note that we narrowly miss one of the three
disallowed names for a struct type, int
, bool
, and ptr
.)
struct point = {
x: int;
y: int;
}
which would have the JSON representation:
{"name": "point", "mbrs":[{"type": "int", "name": "x"}, {"type": "int", "name": "y"}]}
For the rest of this post, we'll only use the text representation for definitions.
Memory Allocation for Structs
Instances of structs are only allowed to be allocated on the heap, using the
memory extension. Thus, we can use
alloc
without modification to get a space for an instance of a struct on
the heap, or to get really exciting, a whole array. Importantly, the struct is
not automatically initialized. It is illegal to load
from any struct member
that has not been previously written to.
So we can allocate a point
with:
one: int = const 1
p: ptr<point> = alloc one
To access the members of the struct, we only need a way to get a name for the
members, since we can read and write using load
and store
from the normal
memory extension. For this, we a new value instructions:
getmbr
, which takes two arguments: a pointer to a struct, and the name of a member of the struct. It returns a pointer to the requested element, with appropriate pointer type. Since we can only ever have pointers to structs, this is always possible.
In text form, this would look like this:
<dest>: <ty> = getmbr <ptr> <mbr>
If this evokes LLVM's getelementptr
in your mind, you've got it! (That is how
this instruction is implemented.) In the future, it might be convenient to
provide a "dot-notation" as syntactic sugar for member access.
Or in the case of our point
struct, we would initialize (x
, y
) to (1, 1) with:
px: ptr<int> = getmbr p x;
store px one;
py: ptr<int> = getmbr p y;
store py one;
Recursive Example
Of course, to do anything actually interesting with structs, we need the ability to point to other structs from a struct. We can do this, but note that we can only have pointers to other structs, not nested definitions of new structs (Since we don't have any concept of closures, this would not be useful anyway.)
For example, we could have a linked list as:
struct int_list = {
elt: int;
next: ptr<int_list>;
}
Naturally, we need a way to close off such a recursive structure. Strictly
speaking, we could do this with another member (say, a bool to indicate the last
element). This would be rather inelegant, however, since then we'd have
uninitialized data in an active struct. Instead, somewhat reluctantly, I
added a nullptr
constant that can be assigned to any pointer type. Suppose
last
is a ptr<int_list>
and the last element of a linked list. Then we can
end the list with:
last_n: ptr<ptr<int>> = getmbr last next;
Finally to allow recursive algorithms to be encoded in the usual way, a test for
whether a pointer is nullptr
is provided with an isnull
predicate:
<dest>: bool = isnull <ptr>
With those preliminaries in hand, we can start expressing typical algorithms for our new data structure:
@cons (head: int, tail:ptr<int_list>): ptr<int_list> {
one: int = const 1;
p: ptr<int_list> = alloc one;
phead: ptr<int> = getmbr p elt;
ptail: ptr<ptr<int_list>> = getmbr p next;
store phead head;
store ptail tail;
ret p;
}
@print_list (list: ptr<int_list>) {
empty: bool = isnull list;
br empty .end .print;
.print:
xp: ptr<int> = getmbr list elt;
x: int = load xp;
print x;
tp: ptr<ptr<int_list>> = getmbr list next;
t: ptr<int_list> = load tp;
call @print_list t;
.end:
ret;
}
@free_list (list: ptr<int_list>) {
empty: bool = isnull list;
br empty .end .freetail;
.freetail:
tp: ptr<ptr<int_list>> = getmbr list next;
t: ptr<int_list> = load tp;
call @free_list t;
free list;
.end:
ret;
}
Implementation
The compiler is implemented in Python, with help from clang to generate LLVM for builtin functions written in C. I modified the Bril parser to allow structs as described; that extension is here. I add the struct extension features to the reference interpreter.
The compiler first performs the SSA conversion on the input program. I used my implementation from Lesson 5.) This required a couple of modifications:
-
The member name argument to
getmbr
is "special" syntactically, and should not be renamed during SSA conversion. -
My SSA conversion allowed blocks to "fallthrough" the next label. Since LLVM disallows this, I updated my SSA conversion to end each block with a terminator.
-
Bril actually disallows constant propagation to a degree, in the sense that literals cannot be dropped into later expressions where constants are used. LLVM requires this since you cannot assign constants to registers.
Evaluation
Correctness
I validated my solution first and foremost by compiling and running the Bril
benchmark suite. There is one outstanding bug from the benchmark suite:
mat-mul
outputs some negative numbers which does not happen when under the
reference interpreter. In experimenting with this bug, I wondered if
I had selected the wrong LLVM instruction for Bril's div
(I had picked sdiv
,
the signed integer division instruction.) Curiously enough, replacing sdiv
with any of udiv
, udiv exact
, or sdiv exact
"fixed" the bug (but broke
one other test in each case).
Memory Management
The major evaluation in my proposal had been to measure the memory footprint of programs using garbage collection. Since my implementation is well short of that goal, I did not evaluate the memory management (as it was being done manually). I did at least confirm that my programs did not have dynamic memory violations by running them under Valgrind.
Part of my proposal had been fairly hand wavy about where I would get enough benchmarks to measure the code running in the first place. A big lesson learned for me from the experience of doing this project is that generating a lot of valid input programs for compiler testing is really hard!
Conclusions
While this project has not matured quickly enough yet to explore the garbage collection issues I was originally after, we are at least now in a position to do so in future work.
The decision to write something from scratch always has its costs and benefits. I do think my compiler is built in a way suitable for an automatic garbage collection improvement.
To return to three garbation collection design questions I posed at the top of this post, I think the best approach is to start with a simple, flexible solution and tune towards particular use cases. To that end, I think the best answers for a first version of a collector for this compiler would be:
- What type of collection? Tracing, since the naive implementation collects everything, while a naive implementation of reference counting misses cycles. To make things more efficient, it can be an improvement later on to add generations that are only reference counted.
- When to collect? For a simple starting point, we could record heap size periodically (whether it is an object count or byte count) and collect in conjunction with an alloc whenever we exceed some threshold increase. That would provide plenty of opporunity for tuning later on as well.
- How much to collect? Since we chose tracing, the best starting point would just be to do a full collection every time, then we can experiment from there with generations to decide what types of collection are the most worthwhile.