Frances Perry - Research Statement
My research focuses on designing and implementing techniques for improving code reliability.
When executing untrusted code,
users want to guarantee that it will execute safely
— it won't crash, corrupt their data, or exhibit other
unexpected behaviors.
Proof-carrying code systems provide this security by supplying a safety proof along with the binary code.
The user can then verify that the code satisfies the proof before executing it.
Traditionally, these safety proofs have included guarantees of memory safety and type safety.
My research extends the properties that can be guaranteed by providing techniques for verifying the correctness of
fault tolerance protocols and data
allocation on the stack.
The cleanest way to express low-level safety proofs is using typed assembly languages.
While source-level type systems track the types of program variables, typed assembly
languages track the types of values in the low-level machine state including the heap, stack, and register file.
As code is compiled into simpler instructions, the compiler translates the corresponding types into the
more complex types needed to capture the invariants of low-level code.
Once code generation is complete, the resulting code is type checked, which verifies that the code obeys
the invariants encapsulated by the type system.
By type checking code after optimizations, register allocation and instruction selection, we can verify the actual
code that will be executed, as well as detect bugs in the compiler.
Transient Fault Tolerance
My thesis research focuses on using low-level type systems to guarantee that software is reliable in the presence
of transient faults.
A transient fault occurs when an energetic particle strikes a chip and causes a change in state in the processor.
These faults do not cause permanent damage, but they may affect the computation that is currently executing.
Transient faults have been shown to be the cause of crashes at major companies, and
current technology trends, such as decreasing transistor size, will make future processors much more susceptible.
Researchers in architecture and compilers have already developed a number of different solutions using software,
hardware or a combination of the two.
The general approach of all these solutions is to duplicate computations and check for consistency between the copies.
Unfortunately, generating correct fault-tolerant code is extremely hard,
and there has been very little research on verifying these techniques.
Testing is rarely sufficient because full coverage would require testing all combinations of
features in conjunction with all combinations of faults.
Along with others at Princeton, I designed a new type system
[5]
that guarantees that faulty executions of a well-typed program
are identical to non-faulty executions up to the point where a fault is detected.
We use a hybrid hardware-software solution in which an added hardware queue
buffers the stores from the original computation and only writes them to memory if the duplicate computation agrees.
Faults can occur at any point and randomly modify a register or queue value.
The assembly-level type system tracks information about the two computations which we use to rigorously prove that
well-typed programs satisfy reliability properties. We use these reliability properties
to show that all executions containing a single fault will always detect
the fault before the difference from non-faulty versions becomes observable.
This work received the Best Paper Award at Programming Language Design and Implementation (PLDI) 2007.
More recently, I designed a type system
for reasoning about software solutions for control-flow faults
[6].
Control-flow faults arise when a fault causes a jump or branch instruction to transfer control to an unexpected code address.
Current software solutions can detect most, but not all, control-flow faults. I formally analyzed a subset of these solutions
that can detect any fault that causes incorrect control transfers between basic blocks.
Similarly to the previous work, this is accomplished by using a type system to track reliability properties.
The main results are new techniques to allow sound reasoning even after code transfers to an unexpected location and continues
executing.
Here, I proved that executions with a single fault will visit at most one incorrect block before
the fault is detected.
Data Allocation on the Stack
As well as using typed assembly languages for proving properties about fault tolerance, I have
designed typed assembly languages that can guarantee the memory safety of real-world stack use.
Unlike memory allocated in the heap, the type of the value stored in a given stack slot will change during the execution
of the program. As the stack grows and shrinks, a location may hold a code pointer in one stack frame and an
integer in some later stack frame. Assembly-level type systems need to track these changing types carefully to avoid
unsafe behavior. Complications arise when allocating data on the stack because stack locations
may be aliased.
Once locations are potentially aliased, it becomes extremely difficult to safely
modify the type of a location. Previous work was unable to design a stack-based type system that was flexible enough
to express common stack use, such as passing and returning arguments by reference.
Together with colleagues at Princeton, I designed a typed assembly language based on linear logic
[2]
that could express real-world
invariants about stacks.
Linear logic, which is similar to separation logic, is well-suited for reasoning about memory because
it can easily represent disjoint portions of memory. Once we know that two locations are disjoint, we know they cannot be aliased, so we can safely update
the type of one without affecting the other.
Our type system used formulas in linear logic to describe the typing information for the heap, stack, and register
file at each program point.
To show that our solution could express typical stack use, we provided a type-preserving translation from a
simple C-like source language into our typed assembly language. Though expressive, the linear logic formulas
required an undecidable subset of linear logic, making automatic proof generation impossible.
During a summer at Microsoft Research, I worked with colleagues to develop a related type system
[3]
that required only a subset of linear logic.
By simplifying the logic used to describe stacks we were able to provide a decision procedure
that made type checking decidable, while still maintaining the expressiveness needed for general
data allocation. This type system is currently implemented in Microsoft's Bartok Compiler
[1].
Dynamic Verification of Aliasing Invariants
In the work at Princeton on stack types, we saw how linear logic can elegantly reason about aliasing in low-level memory.
We used this insight to develop a system of contracts (dynamically checked programmer assertions) for a C-like language
[4].
A programmer uses logic to define the aliasing invariants of data structures and then adds assertions using these definitions into the program.
The assertions are dynamically verified using a modified linear logic programming language. By using linear logic to express
the invariants, the programmer can clearly specify where aliasing is and is not allowed.
Static Deadlock Detection
During an internship at Microsoft in 2005, I worked with members of the Program Analysis Group to design and implement a static
analysis for detecting potential deadlocks in Windows code.
The tool infers the global lock ordering by using a sequential analysis to track the sequence of locks
held at each program point. After gathering all the lock sequence data, it looks for ordering conflicts in this data.
Compared to previous work, we obtain fewer fault positives by using a more precise alias analysis and ruling out infeasible paths
and provide results that are simpler to understand by incorporating a graphical viewer for code traces.
The tool continues to be run regularly and has found over 100 confirmed concurrency bugs in Windows Vista.
Future Research
As code complexity continues to grow, it is increasingly important to understand code behavior.
My recent work has focused on using typed assembly languages to prove
the correctness of memory management and reliability schemes.
I plan to continue developing program analysis techniques for verifying critical program properties.
In the short term, there are many opportunities for future work with fault-tolerant typed assembly languages.
I am currently defining a translation from a simple C-like
language into the language of our PLDI paper to show that the type system is expressive enough to be
used in a compiler.
I would like to extend the work on control flow faults to reason more exactly about
what the existing software solutions can do, even though they can only catch certain faults.
In general, I look forward to making these ideas more concrete by implementing them in a compiler
and extending them
beyond fault detection to also reason about fault recovery.
Each new type system for reasoning about fault tolerance
has generalized the schemes used by former work.
What started
as a simple division of values into either
the original or backup computation has evolved into a general
scheme for classifying groups of values with related reliability properties.
I would like
to develop a generic framework for reasoning in the presence of
transient faults, perhaps looking beyond type systems to more general logics.
Moving ahead, I would like to look into applying similar techniques to other situtations where systems may be unexpectedly
modified or fail without warning. In particular, I hope to investigate parallels with concurrent and distributed systems.
I will continue to improve code reliability by developing new techniques for reasoning about program behavior.
In particular, I hope to collaborate with researchers in other fields to identify domain-specific
issues that will benefit from formal reasoning.
References
[1]
Type-Preserving Compilation for Realistic Object-Oriented Compilers
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), To Appear June 2008
[2]
IEEE Symposium on Logic in Computer Sciences (LICS), June 2005
[3]
International Workshop on Aliasing, Confinement and Ownership (IWACO), July 2007
[4]
ACM SIGPLAN Conference on Generative Programming and Component Engineering (GPCE), October 2006
[5]
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2007
[6]
Reasoning about Control Flow in the Presence of Transient Faults
International Static Analysis Symposium (SAS), To Appear July 2008