Quick links

Fault-tolerant Typed Assembly Language

Report ID:
TR-776-07
Date:
March 2007
Pages:
35
Download Formats:
[PDF]

Abstract:

A transient hardware fault occurs when an energetic particle strikes a
transistor, causing it to change state. Although transient faults do not
permanently damage the hardware, they may corrupt computations by altering
stored values and signal transfers. In this paper, we propose a new scheme
for provably safe and reliable computing in the presence of transient
hardware faults. In our scheme, software computations are replicated to
provide redundancy while special instructions compare the independently
computed results to detect errors before writing critical data. In stark
contrast to any previous efforts in this area, we have analyzed our fault
tolerance scheme from a formal, theoretical perspective. To be specific,
first, we provide an operational semantics for our assembly language, which
includes a precise formal definition of our fault model. Second, we develop
an assembly-level type system designed to detect reliability problems in
compiled code. Third, we provide a formal specification for program fault
tolerance under the given fault model and prove that all well-typed programs
are indeed fault tolerant. In addition to the formal analysis, we evaluate
our detection scheme and show that it only takes 34% longer to execute than
the unreliable version.

Follow us: Facebook Twitter Linkedin