Easy, Slow Dynamic Analysis With the LLVM Interpreter

January 24, 2013

In systems/languages/architecture/et cetera research, one often needs a quick and dirty dynamic analysis: you need to observe or manipulate what a program does at runtime. The usual approach to this is instrumentation. You use dynamic instrumentation tools like Pin or DynamoRIO or static instrumentation with ASM for Java or an LLVM pass to add the necessary code to the program under analysis. This works great, and can perform well if the instrumentation isn’t too heavy, but it can also be a headache: inserting just the right instructions to call into your runtime library is error-prone and less convenient than writing application-level code.

Another approach is interpretation. With an interpreter, you don’t have to bother with inserting assembly or bytecode—you can just hack the interpreter’s behavior directly. While a simple, extensible interpreter is likely to be slow, this approach can make sense in situations when performance isn’t very important, such as during exploration phases or while prototyping an analysis that will eventually be implemented with instrumentation.

For a current project, I need to observe executions of C/C++ programs compiled to LLVM bitcode. And, fortunately, LLVM ships with a bitcode interpreter. It’s slow but it works. Less fortunately, however, this interpreter is not meant to be extended to do dynamic analysis. So I’ve thrown together a quick pile of hacks, called ExtensibleInterpreter, that wraps the interpreter but allows clients to interpose on instruction execution.

Here’s what a simple tracing interpreter looks like:

class Tracer : public ExtensibleInterpreter {
public:
    Tracer(Module *M) : ExtensibleInterpreter(M) {};
    virtual void execute(Instruction &I) {
        I.dump();
        ExtensibleInterpreter::execute(I);
    }
};

You can interpose on the execution of every instruction. Here, the interpreter just dumps a representation of the instruction and calls the superclass’ execute() method to interpret it as normal. But you can write arbitrary C++ code to implement your analysis.

More information about using the ExtensibleInterpreter is available on the README. If you don’t look too closely at the horrible hacks behind the curtain, this technique might be useful for your next quick, one-off dynamic analysis.