Adrian Sampson: Truffle, an Architecture for Approximate Computing

In an extension of some previous work on the EnerJ approximation-aware programming language, I recently worked (with Hadi Esmaeilzadeh, Luis Ceze, and Doug Burger) on designing a hardware architecture to support this kind of approximate computing. The architecture, which we call Truffle (a soft core with a hard shell—this name due to Hadi’s genius with nomenclature), allows parts of programs to run precisely and parts in a low-power, approximate mode. Our paper about Truffle was recently accepted to ASPLOS 2012.

As we originally imagined in the EnerJ paper, we wanted a processor that could alternate between precise and approximate operation in several different structures:

Functional units (ALUs and FPUs).
On-chip caches.
Registers.
Main memory (not explored much in the Truffle, which is a CPU design).

This means that the CPU should have an ISA that allows approximate arithmetic operations along with approximate loads and stores with carefully-defined effects on the registers and caches. Truffle’s ISA lets the compiler use approximate instructions—for example, an ADD.A instruction that executes an approximate addition—that have approximate semantics: they do not define any particular output behavior, instead allowing arbitrary errors to occur in the data they compute. The register file and caches also both have two modes; they switch between approximate and precise mode based on the instructions that write to them. This makes it simple for a compiler to keep track of the storage structures’ modes and to carefully avoid storing precise data in approximate mode (which could lead to corruption of critical data).

In fact, this co-design with the compiler is an important facet of Truffle’s design: to avoid spending energy to save energy, Truffle relies on all safety checks to be performed off-line by the compiler. No costly dynamic checks need to be performed at runtime. This also makes Truffle substantially simpler than other information-flow-inspired architectures, which generally need to tag and track data movement across the processor.

In the end, simulations of EnerJ benchmarks on Truffle demonstrate energy savings up to the 40% range—which is clearly encouraging. We found that Truffle’s energy savings are bounded by the processor’s “front end”: the fetch, decode, and register renaming logic that must be performed precisely. In the future, we hope to look into approximate computing structures that bypass the need for a fully-precise front end to achieve even better energy efficiency.