In an extension of some previous work on the EnerJ approximation-aware programming language, I recently worked (with Hadi Esmaeilzadeh, Luis Ceze, and Doug Burger) on designing a hardware architecture to support this kind of approximate computing. The architecture, which we call Truffle (a soft core with a hard shell—this name due to Hadi’s genius with nomenclature), allows parts of programs to run precisely and parts in a low-power, approximate mode. Our paper about Truffle was recently accepted to ASPLOS 2012.
As we originally imagined in the EnerJ paper, we wanted a processor that could alternate between precise and approximate operation in several different structures:
- Functional units (ALUs and FPUs).
- On-chip caches.
- Registers.
- Main memory (not explored much in the Truffle, which is a CPU design).
This means that the CPU should have an ISA that allows approximate arithmetic
operations along with approximate loads and stores with carefully-defined
effects on the registers and caches. Truffle’s ISA lets the compiler use
approximate instructions—for example, an ADD.A
instruction that executes an
approximate addition—that have approximate semantics: they do not define any
particular output behavior, instead allowing arbitrary errors to occur in the
data they compute. The register file and caches also both have two modes; they
switch between approximate and precise mode based on the instructions that write
to them. This makes it simple for a compiler to keep track of the storage
structures’ modes and to carefully avoid storing precise data in approximate
mode (which could lead to corruption of critical data).
In fact, this co-design with the compiler is an important facet of Truffle’s design: to avoid spending energy to save energy, Truffle relies on all safety checks to be performed off-line by the compiler. No costly dynamic checks need to be performed at runtime. This also makes Truffle substantially simpler than other information-flow-inspired architectures, which generally need to tag and track data movement across the processor.
In the end, simulations of EnerJ benchmarks on Truffle demonstrate energy savings up to the 40% range—which is clearly encouraging. We found that Truffle’s energy savings are bounded by the processor’s “front end”: the fetch, decode, and register renaming logic that must be performed precisely. In the future, we hope to look into approximate computing structures that bypass the need for a fully-precise front end to achieve even better energy efficiency.