A transient hardware fault occurs when an energetic particle strikes a transistor, causing it to change state. Although transient faults do not permanently damage the hardware, they may corrupt computations by altering stored values and signal transfers. Existing solutions can detect transient faults by duplicating computations and comparing the results, however these solutions lack any formal reasoning about their behavior.
In this talk, I will show how to use low-level type systems to cleanly express invariants about redundant computations and to formally reason about code behavior, even when execution may be affected by transient faults. In particular, I will present a typed assembly language named TAL_FT and use it to prove that well-typed programs will always detect any single fault before the fault causes a change in the program output.