Bril Debugger
The goal of this project was to create an interactive debugger in the style of GDB and LLDB that would run programs in the intermediate language Bril. In order to be a helpful tool for debugging a Bril program, I decided that the debugger must have, at a minimum, the capability to perform the following tasks:
- Interpret a program and present its output
- Step through individual instructions of a program
- Present the current state of a program to the user
- Halt interpretation of a program upon reaching a user-declared program point
In my implementation of this project, BrilDB, I included the above four core capabilities as well as a few additional features, including the ability to modify the values of variables in the state, and the ability to condition breakpoints on expressions such that the interpreter only halts at the program point if the condition holds.
Design
BrilDB is designed as a command-line interface. The Bril program to be debugged is declared by its file name as an argument to the program, and commands are issued by text in the interface. Commands can only be issued while the program is halted, not while it is actively being interpreted. The commands that BrilDB accepts are as follows:
run
: Interprets the program from its current state until it reaches a breakpoint whose condition is satisfied or the program terminates.step [N]
: The same asrun
, but stops executing afterN
instructions if neither a satisfied breakpoint nor the end of the program is reached before that.N
defaults to1
if not given in the command.restart
: Resets the program state to the beginning of themain
function, undeclaring all program variables. Breakpoints are not affected.scope
: Lists all variables that have already been declared in the current state and their values.print VAR
: Prints the value of the variableVAR
in the current state.assign VAR VAL
: Modifies the current state to set variableVAR
to have valueVAL
.VAL
must be an integer or boolean literal.breakpoint LOC [COND]
: Places a breakpoint at locationLOC
that halts when conditionCOND
is satisfied.LOC
must be either a label name or an instruction/line number.COND
defaults totrue
if not given. The syntax for conditions is explained below. Breakpoints can be "removed" by placing new breakpoints withCOND
asfalse
.list
: Prints the current function with all of its instructions shown in their textual format. Includes inline information about breakpoints and the line numbers.
The interpreter does not print any logging information until it reaches a
satisfied breakpoint or the program terminates. This is to ensurethat the Bril
program output is not confused with any BrilDB output. In contrast to this, the
step
function, when used to execute only a single instruction, will print the
instruction that it executes to the screen. This is to help orient the user as
to where they currently are within the program.
In my experience, a common use case for debuggers is to set a breakpoint at a section in the code where a bug is believed to reside, run the program up to that breakpoint, and then step through the section instruction-by-instruction until the bug presents itself. As such it is important to know exactly where you are within the code while stepping so that you can know where to fix the bug when you find it.
The scope
and print
commands are useful to see whether variables hold the
values that you expect them to at a given program point. assign
is useful,
in the case where a variable is not as expected, to see whether modifying it to
the expected value fixes the issue.
The list
command gives a perspective on the whole function currently being
executed. Because it shows the line number for each instruction, it can be
useful for setting breakpoints in exactly the correct position. It also shows
the current position within the function and breakpoints that have been places
in the function in order to further orient the user. For example, below is an
invocation of list
on a program for calculating Fibonacci numbers that has a
conditional breakpoint set at the loop
label:
(brildb) list main { 1 n: int = const 10; 2 a: int = const 0; 3 b: int = const 1; 4 loop: (B: eq n 3) 5 zero: int = const 0; -> 6 done: bool = eq n zero; 7 br done end iter; 8 iter: 9 t: int = id a; 10 a: int = id b; 11 b: int = add t a; 12 one: int = const 1; 13 n: int = sub n one; 14 jmp loop; 15 end: 16 print a; }
Breakpoint Conditions
In order to support conditioning breakpoints on arbitrary expressions, I needed a syntax in which users could express the conditions. I decided to use a syntax based on the operations in Bril, with a grammar as follows:
BEXP ::= "true" | "false" | IDENTIFIER | "(" AB_BINOP AEXP AEXP ")" | "(" "not" BEXP ")" | "(" BB_BINOP BEXP BEXP ")" AEXP ::= INTEGER | IDENTIFIER | "(" AA_BINOP AEXP AEXP ")" BB_BINOP ::= "and" | "or" AB_BINOP ::= "eq" | "lt" | "gt" | "le" | "ge" AA_BINOP ::= "add" | "sub" | "mul" | "div"
Any BEXP
(boolean expression) can be used as the condition of a breakpoint.
For example, if you wanted to halt at a label foo
in the program only if
x < y < z, you would issue the command:
breakpoint foo (and (lt x y) (lt y z))
.
The condition grammar is typed such that only boolean expressions can be placed
where a boolean would be expected and only arithmetic expressions (AEXP
) can
be placed where an integer would be expected. However, notably, any identifier
can be used as either a boolean or an arithmetic expression.
Implementation
BrilDB is implemented as a standalone Haskell program. The primary component of the program is the interpreter, which handles executing commands and manipulating the state. The other components are the command parser and the type definitions, which include the ability to convert from the canonical JSON representation of a Bril program to the internal algebraic data type (ADT) representation.
The command parser component is built using the parser combinator library Parsec. Use of this library for this task is arguably excessive, as the command syntax is very simple. However, as the grammar for breakpoint conditions is somewhat more complex, I found it useful to use Parsec for that, and thus it made sense to use it for all of the command parsing.
The implementation of the interpreter component and the type definitions are
designed with the anticipation of adding support for function calls in mind. The
program state is built with a call stack, which currently has at most one stack
frame in it. Interpreting the ret
instruction corresponds to popping an
element off the call stack; determining if the program has terminated
corresponds to checking whether the call stack is empty. However, to add support
for calls, significant changes would need to be made to the commands to support
actions such as viewing the call stack, setting breakpoints in other functions,
stepping into or over calls, etc.
Breakpoints are implemented as an extra field in the ADT for instructions. Every
instruction has a condition (which is an ADT corresponding to BEXP
above)
which is by default false
. Adding a breakpoint replaces that condition with
the one supplied by the command.
How to Break BrilDB
While BrilDB handles most kinds of user errors gracefully, such as malformed
commands or nonsensical conditions on breakpoints, it will crash under certain
circumstances. Specifically, BrilDB does not act as a typechecker. If a program
is not well-typed and, say, tries to add an integer to a boolean, BrilDB will
crash as it expects the arguments to an add
operation to both be integers. As
such, it is suggested that you run a program through a typechecker first before
attempting to debug it with BrilDB.
Difficulties
The most difficult part of this project for me was determining the syntax and
writing the parser for the breakpoint
command with a condition. I needed the
language to be usable in a one-line command, while also fitting in with the Bril
language and being expressive enough for complex conditions. Creating the parser
was somewhat difficult due to differences between the command language I was
creating and Parsec's expectations of how a language should work. For example,
in the command language, I wanted a variable to be allowed to have the same name
as a command or operator.
Another difficulty I faced was making sure that the debugger would never attempt to interpret an instruction after the Bril program had terminated. I had to consider both how the user might attempt to execute instructions post-termination, and how the iterative interpreter could keep going despite reaching the end. To help with this I wrote a function which wraps a state operation and checks that the program has not terminated before completing that operation:
checkTerminated :: StateT DebugState IO () -> StateT DebugState IO () checkTerminated st = do term <- gets terminated if term then liftIO $ putStrLn "program terminated." else st
Evaluation
As BrilDB is a program built around a user-interface, it did not make sense to me to evaluate it through an automated testing suite. Instead I opted to perform tests of the program's effectiveness by using the program myself. I also decided that the grounds on which I would evaluate the program would be its correctness, as it is vitally important when trying to debug a program that the tools you are using to debug it are correct in what they are showing you. The speed of the debugger is not a huge concern (within reason) as in most cases the interpreter will not be running for very long before it hits a breakpoint.
As such, to evaluate the debugger I first ran a collection of programs through
the run
command to see if pure interpretation of the programs gave the
expected results. Some of these programs came from the tests for brili
and
others were slightly longer programs that I created to test specific constructs.
I then ran through a few of these program setting breakpoints and conditional
breakpoints to ensure they triggered when and only when they were supposed to.
I used the scope
command to test whether the state is as expected at certain
program points. I tested state modification by seeing if running the program
after issuing an assign
command gave the same result as if there were an
assignment instruction at that location.
During this evaluation, I found the following two issues, both of which have since been fixed:
- The
assign
command did not allow the value to be a negative number (parsing issue). - If a program ended without a
ret
, the interpreter would crash when the program terminated.