My goal is to simplify and speed the development of complex software
systems. To that end, I have focused on developing tools that detect
run-time errors at compile time while allowing programmers to express
efficient programs concisely. Specifically, my thesis work combines
run-time code generation (an optimization for improving speed) with
certifying compilation (a technique for checking at compile time that
the resulting code will not crash). In a separate project, Kathleen
Fisher, Anne Rogers, and I developed Hancock, a domain-specific
language for processing large volumes of data that makes
call-processing programs easier to maintain and more efficient.
Hancock is used daily within AT&T. Finally, I devised an enhanced
mostly-copying garbage collection algorithm that improves the
efficiency of automatic memory management for garbage-collected
programs interoperating with code that manages memory manually (such as
most C programs). That algorithm and my implementation (MCC) serve as
the basis for the Moby compiler's garbage collector.
Certifying run-time code generation.
Portable executable formats, such as Java bytecode, rely on
just-in-time compilers to regain efficiency. Transmeta processors rely
on binary translation to convert x86 instructions into their native
format. This choice allows Transmeta engineers to use the latest
hardware technology with only the marginal cost of updating their
binary translator. Both of these paradigms rely on aggressive
run-time code generation (RTCG) to recover performance. Both require
that the dynamically generated code be as reliable as hardware.
Reliability can be increased by checking that generated code is
type-safe (cannot crash). Typed Assembly Language (TAL) can be used
to check that the output of a certifying compiler is type-safe. My
research extends this guarantee to programs using RTCG.
Our RTCG system copies code fragments, called templates, sequentially
into a code region until a complete well-typed function is produced.
At each intermediate step, the type of the code region includes a
post-condition stating what it requires of the next template. A
fundamental problem we faced was handling the changing type of the
code region.
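To make the mechanics concrete, the sketch below (in C++) shows the
general template-copying idea. The data structures and byte-level
representation are illustrative only; the actual Cyclone/TAL/T system
works on typed, verified machine-code templates rather than raw bytes.

    // Minimal sketch of template-based run-time code generation.  Each
    // "template" is a precompiled code fragment with holes for run-time
    // constants.  Code generation copies templates one after another into
    // a code region and fills the holes; a real system would also mark the
    // region executable and patch genuine instruction encodings.
    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    struct Template {
        std::vector<uint8_t> bytes;   // precompiled instruction bytes
        std::vector<size_t>  holes;   // offsets of 32-bit immediates to fill
    };

    class CodeRegion {
        std::vector<uint8_t> buf;
    public:
        // Append one template, filling each of its holes with `constant`.
        void emit(const Template& t, uint32_t constant) {
            size_t base = buf.size();
            buf.insert(buf.end(), t.bytes.begin(), t.bytes.end());
            for (size_t off : t.holes)
                std::memcpy(&buf[base + off], &constant, sizeof constant);
        }
        const uint8_t* code() const { return buf.data(); }
        size_t size() const { return buf.size(); }
    };

Type-checking such a region means tracking how its type changes after
each emitted template.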
Traditional type systems assign a single, fixed type to each location
and therefore cannot type-check a code region whose type changes as
templates are added. Linear type systems permit types to change, but
only for locations that are not aliased. Modern optimizing compilers
rely on register allocation, one effect of which is to create aliases
between stack locations and registers, violating the key assumption of
linear type systems.
David Walker and I developed alias types to solve this problem. Alias
types track aliasing explicitly by introducing a level of indirection
in the type system. Using this level of indirection, alias types
easily accommodate type-changing operations such as code generation and
memory deallocation. The formal methods we used to prove alias types
sound increased our confidence in the correctness of the TAL
implementation.
To evaluate our technology, Dan Grossman and I developed an optimizing
compiler for Cyclone, a type-safe C-like language augmented with RTCG
constructs. Our compiler translates Cyclone into TAL/T, the
corresponding extension of TAL. Because code generation is part of the
program's
execution time, it is essential to generate dynamic code quickly and
effectively. To minimize code generation time, our compiler performs
most optimizations (for example, register allocation) at compile time
and only assembles pre-computed code templates at run time. Our
experiments show that our type-safe system achieves speedups
comparable to those of similar unsafe systems.
Most recently, I have worked on evaluating the effectiveness of RTCG
and our compiler. The results have been surprising. For a ray-tracer
application (based on the winning entry of the ICFP 2000 programming contest)
we found little or no benefit. Although the application seems a good
candidate for RTCG, containing both an interpreter and many invariant
matrix-vector multiplies, the benefit is negligible because these
computations do not account for a significant fraction of the overall
running time.
On the other hand, I ported the functional simulator from the
SimpleScalar Toolset, a widely used simulator in the hardware
community, and ran the SPEC CPU95 benchmark suite through it. RTCG
was able to eliminate the fetch and decode stages of the instruction
pipeline for the simulator. Because these stages are relatively
expensive, we obtained an average 2.7 times speedup.
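The sketch below illustrates, in C++, why specialization pays off for
such a simulator. The instruction encoding and handlers are
hypothetical, not SimpleScalar's, and closures stand in for the native
code that a real RTCG system would emit.

    // An ordinary functional simulator fetches and decodes every instruction
    // each time it executes it.  Because the simulated program is fixed,
    // RTCG can decode each instruction once and generate code that performs
    // only the execute step.
    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <vector>

    struct Cpu  { uint32_t regs[32] = {0}; };
    struct Insn { uint8_t op, dst, src; uint16_t imm; };   // toy encoding

    // Interpretation: fetch and decode are repeated on every execution.
    void interpret(Cpu& cpu, const std::vector<Insn>& prog) {
        for (size_t pc = 0; pc < prog.size(); ++pc) {
            const Insn& i = prog[pc];                           // fetch
            switch (i.op) {                                     // decode
                case 0: cpu.regs[i.dst] = i.imm;            break;
                case 1: cpu.regs[i.dst] += cpu.regs[i.src]; break;
            }
        }
    }

    // Specialization: decode once, before execution.  The closures model
    // the straight-line code that run-time code generation would produce.
    std::vector<std::function<void(Cpu&)>>
    specialize(const std::vector<Insn>& prog) {
        std::vector<std::function<void(Cpu&)>> code;
        for (const Insn i : prog) {
            switch (i.op) {
                case 0:
                    code.push_back([i](Cpu& c){ c.regs[i.dst] = i.imm; });
                    break;
                case 1:
                    code.push_back([i](Cpu& c){ c.regs[i.dst] += c.regs[i.src]; });
                    break;
            }
        }
        return code;
    }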
Hancock.
AT&T's statisticians are interested in studying telephone call records to
detect fraud. Initial trial applications showed that this approach
was feasible but the programs were hard to write at the scale of the
incoming data (250 million calls per day). The programs were also hard
to maintain because native C constructs do not map cleanly onto the
abstractions used by the statisticians. Anne Rogers, Kathleen Fisher,
and I designed and implemented the domain-specific language Hancock to
address these two concerns through the use of special-purpose language
constructs and a high-performance database-like run-time system. To
ease the migration from C, we designed Hancock to be a strict superset
of C. Kathleen Fisher and I implemented the Hancock compiler as a
Hancock-to-C translator. I worked on Hancock through the first public
release. Today, Hancock is available for download for non-commercial
use. AT&T's production machines run Hancock-compiled programs, and
statisticians within AT&T use Hancock to develop new applications.
Mostly-copying garbage collection.
Garbage collectors automatically reclaim unusable memory. A garbage
collector works by tracing pointers in the heap and reclaiming any
data not reachable from static data, the stack, or registers. Most
type-safe language implementations rely on a garbage collector for all
memory deallocation. Most garbage collectors are either accurate or
conservative. Accurate collectors rely on explicit tags (usually
compiler-inserted) to detect pointers. Conservative collectors do not
use such tags but are usually slower and less precise (they may
reclaim less memory) than accurate collectors.
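The C++ fragment below sketches the difference, assuming a heap with
known bounds; the layout and the mark interface are hypothetical.

    #include <cstddef>
    #include <cstdint>

    uintptr_t heap_lo, heap_hi;  // heap bounds, set when the heap is created

    // Conservative root scanning: any word that falls within the heap is
    // treated as a pointer and keeps its target alive, even if it is really
    // an integer that happens to look like an address.  Such ambiguously
    // referenced objects can never be moved.
    void scan_conservatively(const uintptr_t* start, const uintptr_t* end,
                             void (*mark)(uintptr_t)) {
        for (const uintptr_t* p = start; p < end; ++p)
            if (*p >= heap_lo && *p < heap_hi)
                mark(*p);
    }

    // Accurate root scanning: a compiler-inserted bitmap records exactly
    // which words are pointers, so nothing is retained spuriously and
    // objects may safely be moved (and the heap compacted).
    void scan_accurately(const uintptr_t* frame, const uint8_t* ptr_bitmap,
                         size_t nwords, void (*mark)(uintptr_t)) {
        for (size_t i = 0; i < nwords; ++i)
            if (ptr_bitmap[i / 8] & (1u << (i % 8)))
                mark(frame[i]);
    }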
In the process of producing a C backend for the ML compiler TIL, Greg
Morrisett and I had to choose a garbage collector. Although our
compiler could tag objects in the heap, the stack layout was under the
C compiler's control and could not be tagged precisely. Therefore we
could not use an accurate collector. At first we used the
Boehm-Demers-Weiser conservative collector for C (BDW) but found it to
be much slower than the accurate collector we had used previously.
Instead we developed and implemented a collection algorithm that we
later discovered was an enhanced version of the mostly-copying
collection algorithm. Our collector, MCC, uses a hybrid approach. It
behaves like an accurate copying collector when possible, and like a
conservative mark-sweep collector when necessary. The result is a
garbage collector that allows easy inter-operability with C and has
good performance. One contribution of our work was a careful analysis
that showed MCC outperformed BDW when memory was plentiful but not
when space was at a premium. MCC serves as the basis for the garbage
collector of Moby, a language being developed by Kathleen Fisher and
John Reppy.
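A simplified C++ sketch of the hybrid policy follows; the page size,
data structures, and names are illustrative and do not reflect the
actual MCC implementation.

    // Mostly-copying collection in outline: pointers found by conservatively
    // scanning the untagged C stack pin the pages they refer to, and objects
    // on pinned pages are left in place; everything reachable only through
    // tagged, unambiguous pointers is copied to to-space, as an accurate
    // copying collector would do.
    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>
    #include <unordered_set>

    constexpr uintptr_t PAGE = 4096;
    inline uintptr_t page_of(uintptr_t a) { return a & ~(PAGE - 1); }

    struct Heap {
        uintptr_t lo, hi;                                   // from-space bounds
        uintptr_t tospace_next = 0;                         // bump pointer
        std::unordered_set<uintptr_t> pinned;               // immovable pages
        std::unordered_map<uintptr_t, uintptr_t> forwarded; // old -> new

        // Phase 1: conservative scan of the C stack.  Any value that might
        // be a pointer pins its page, so objects there will not move.
        void scan_stack(const uintptr_t* start, const uintptr_t* end) {
            for (const uintptr_t* p = start; p < end; ++p)
                if (*p >= lo && *p < hi)
                    pinned.insert(page_of(*p));
        }

        // Phase 2: accurate tracing from tagged roots.  Objects on unpinned
        // pages are copied and forwarded; objects on pinned pages stay put
        // and are treated as live, as a conservative collector would do.
        uintptr_t forward(uintptr_t obj, size_t size_from_tag) {
            if (pinned.count(page_of(obj)))
                return obj;                                 // cannot move
            auto it = forwarded.find(obj);
            if (it != forwarded.end())
                return it->second;                          // already copied
            uintptr_t dst = tospace_next;                   // a real collector
            tospace_next += size_from_tag;                  // copies the bytes
            return forwarded[obj] = dst;
        }
    };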
Conclusion.
My research has consistently reinforced the importance of four skills.
Formal methods help validate claims of correctness and deepen
understanding of language design principles. Implementation skills
demonstrate practical utility. Analytical and experimental skills are
necessary to assess performance. Finally, collaboration provides new
perspectives and techniques. For me, research combining these four
skills is the most satisfying and the most effective for achieving my
goal: simplifying the production of complex software.
In the near term, I intend to investigate techniques for helping
programmers express performance intentions. Strong interfaces are a
desirable software engineering property but most compilers impose
implicit performance penalties on their use. For example, constants
are usually not propagated across module boundaries. By ignoring the
programmer's performance intention, compilers effectively penalize
good software engineering practice. One solution to this problem is
to rely on compiler optimizations. From the programmer's perspective
this technique is inadequate because it provides no guarantee that the
optimization will be performed. A better solution is to make it
possible to express performance intentions directly in the language.
More generally, languages should provide direct support for
compile-time computation. C++ templates inadvertently provide a clumsy
form of compile-time computation, yet matrix libraries have used them
to great effect for this purpose.
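As a small illustration, the C++ fragment below uses templates to
unroll a fixed-length dot product at compile time; fixed-size numeric
libraries apply the same device on a much larger scale.

    // The vector length N is a compile-time constant, so the recursion
    // below unfolds entirely at compile time into straight-line code with
    // no loop overhead.
    template <int N>
    struct Dot {
        static double apply(const double* a, const double* b) {
            return a[0] * b[0] + Dot<N - 1>::apply(a + 1, b + 1);
        }
    };

    template <>                       // base case stops the recursion
    struct Dot<0> {
        static double apply(const double*, const double*) { return 0.0; }
    };

    // Dot<3>::apply(x, y) evaluates x[0]*y[0] + x[1]*y[1] + x[2]*y[2].

Providing such computation directly and cleanly in the language, rather
than as an accident of the template mechanism, is the direction I plan
to pursue.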