Foundational Proof-Carrying Code

The FPCC project 1999-2005 was led by Andrew W. Appel at Princeton University. The goal was to build Proof-Carrying Code systems with machine-checked proofs of security from the foundations of logic.

Background. In the 1990s, researchers were working on the question, "Can we automatically prove that machine-language programs are safe?" For example, could you safely embed Java applets in the same memory space as the rest of your browser? In principle you could, if the Java type system were sound, if the Java bytecode verifier didn't have any bugs, and if the just-in-time compiler had no bugs. But how could you prove all that? The verification technology of the 1990s was not up to the task of proving correctness of a compiler.

Research 1978-1995 by Robin Milner, Mads Tofte, Gordon Plotkin, Matthias Felleisen, Robert Harper, David MacQueen, and many others had established the mathematical methods of proving the soundness of type systems: if your program type-checks, then it won't go wrong at run time. In 1994-1996 Zhong Shao, Greg Morrisett, and others had demonstrated type-preserving compilers, that could transform a well-typed source program (in ML) into a well-typed low-level intermediate language. In 1997-98 George Necula, Peter Lee, and others demonstrated Proof-Carrying Code, a compiler and checker that could take a type-checked BPF or Java program and produce an assembly-language program with a PCC-checker, so that in principle if the PCC checker accepts your program then it will stay in the Java sandbox. In 1999 Greg Morrisett, David Walker, and others reformulated the PCC checker as "Typed Assembly Language" (TAL).

But how do you prove that your TAL-checker is sound? That if your TAL-checker accepts your program then your program will stay in its sandbox? In 1999 the state of the art was proofs in English+math about an abstraction of the TAL with respect to an idealized machine. We wanted a machine-checked proof about a real source language, a real TAL, and a real machine.

The Princeton FPCC project focused on proving the correctness of a PCC-checker (a TAL). The source language was Standard ML; the target language was SPARC machine language; and the proof language was Twelf. That is, given a source program in ML, the system would produce a SPARC machine-language program with a machine-checkable Twelf proof that the machine-language program was memory-safe (stayed in its sandbox).

Research results

We started in 1999 with an ambitious goal. It took a lot of exploration to find our way, but by 2005 we had all the pieces put together.

Explorations of what proof system to use

began with lambda-Prolog (Appel & Felty ICSP'99) before settling on Twelf (Appel & Felty TPLP'04).

Semantic methods for proving soundness of type systems

began with Appel & Felty (POPL'00) for a language without function-pointers or mutable references, extended to function-pointers and quantified types by Appel & McAllester (TOPLAS'01) and to mutable references by Ahmed, Appel, and Virga (LICS'02, Tech report '03, Ahmed Phd Thesis '04). This introduced the method of step-indexed models, which eventually became step-indexed logical relations (Ahmed & Dreyer in various publications), and became also modal models of type systems and program logics (Appel, Mellies, Richards, Vouillon POPL'07, Richard PhD Thesis 2010). Later lines of research by Appel, by Ahmed, by Dreyer, by Birkedal, and others led to applications in program logics such as Iris, VST, and others.

Specifications of machine language

were needed because you can't prove the safety of a machine-language with knowing what the machine language does. See Michael & Appel (CADE'00).

Our type-preserving compiler

had a front-end built at Yale by Zhong Shao's group, and a back end built by Juan Chen in her Princeton PhD thesis (Chen, Wu, Appel, Fang PLDI'03, Chen PhD Thesis 2004).

Connecting the TAL to its semantic type-soundness proof

was done by Dinghao Wu based on a theory of control flow by Gang Tan and type parameterization by Kedar Swadi (Tan, Appel, Swadi, Wu VMCAI'04; Swadi PhD Thesis 2003, Tan PhD Thesis 2005, Wu PhD Thesis 2005).

The minimal trusted base.

If we were to claim, "we have a machine-checked proof that this TAL-checker is sound", you might ask, "do you have a proof that your proof-checker is sound?" To address this question, we built the smallest and simplest possible proof checker, based on the most trustworthy proof-checking algorithms in the refereed scientific literature (Appel, Michael, Stump, Virga JAR'03; Wu, Appel, Stump PPDP'03; Wu PhD Thesis '05)

Tying all the pieces together

was described by Ahmed, Appel, Richards, Swadi, Tan, and Wang (TOPLAS 2010).

Proof-Carrying Authentication

was a related side project, in which we applied these foundational principles to logic-based authentication and authorization systems: Appel and Felten CCS'99, Appel and Felten tech. report 2001, Bauer PhD Thesis 2003.

Funding

was provided by the Defense Advanced Research Projects Agency and the National Science Foundation, with smaller grants from IBM and Sun Microsystems.

PhD Theses

Kedar N. Swadi, Ph.D. (2003) Typed Machine Language.
Lujo Bauer, Ph.D. (2003) Access Control for the Web via Proof-Carrying Authorization.
Eunyoung Lee, Ph.D. (2004) Secure Linking: A Logical Framework for Policy-Enforced Component Composition.
Juan Chen, Ph.D. (2004) A Low-Level Typed Assembly Language with a Machine-checkable Soundness Proof.
Amal J. Ahmed, Ph.D. (2004) Semantics of Types for Mutable State.
Gang Tan, Ph.D. (2005) A Compositional Logic for Control Flow and its Application to Foundational Proof-Carrying Code.
Dinghao Wu, Ph.D. (2005) Interfacing Compilers, Proof Checkers, and Proofs for Foundational Proof-Carrying Code.
Christopher D. Richards, Ph.D. (2010) The Approximation Modality in Models of Higher-Order Types.

Research papers

Semantic Foundations for Typed Assembly Languages, by A. Ahmed, A. W. Appel, C. D. Richards, K. Swadi, G. Tan, and D. C. Wang. ACM Transactions on Programming Languages and Systems, 32(3):7.1-7.67, March 2010. [local copy]

A Very Modal Model of a Modern, Major, General Type System, by Andrew W. Appel, Paul-Andre Mellies, Christopher D. Richards, and Jerome Vouillon. POPL 2007: The 34th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 2007. [local copy]

A Compositional Logic for Control Flow by Gang Tan and Andrew W. Appel. In 7th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI), January 2006.

Social processes and proofs of theorems and programs, revisited. Andrew W. Appel, PLDI '04: Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation, 2004

Construction of a Semantic Model for a Typed Assembly Language, by Gang Tan, Andrew W. Appel, Kedar N. Swadi, and Dinghao Wu. In 5th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI '04),, January 2004.

Policy-Enforced Linking of Untrusted Components (Extended Abstract), by Eunyoung Lee and Andrew W. Appel. European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 371-374, September 2003.

Foundational Proof Checkers with Small Witnesses, by Dinghao Wu, Andrew W. Appel, and Aaron Stump. 5th ACM-SIGPLAN International Conference on Principles and Practice of Declarative Programming, pp. 264-274, August 2003. [local copy]

A Provably Sound TAL for Back-end Optimization by Juan Chen, Dinghao Wu, Andrew W. Appel, and Hai Fang. PLDI 2003: ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 208-219, June 2003. [local copy]

Using Memory Errors to Attack a Virtual Machine by Sudhakar Govindavajhala and Andrew W. Appel, 2003 IEEE Symposium on Security and Privacy, pp. 154-165, May 2003.

A Kind System for Typed Machine Language, by Andrew W. Appel, Christopher D. Richards, and Kedar N. Swadi. Princeton University, October 2002.

An Indexed Model of Impredicative Polymorphism and Mutable References, by Amal Ahmed, Andrew W. Appel, and Roberto Virga. Princeton University, January 2003.

Secure Linking: a Framework for Trusted Software Components, by Eunyoung Lee and Andrew W. Appel. Princeton University CS TR-662-02, September 2002. Extended Version, CS TR-663-02, September 2002.

Polymorphic Lemmas and Definitions in Lambda Prolog and Twelf, by Andrew W. Appel and Amy P. Felty. Theory and Practice of Logic Programming 4 (1) 1-39, January 2004.

Dependent Types Ensure Partial Correctness of Theorem Provers, by Andrew W. Appel and Amy P. Felty. Journal of Functional Programming 14(1):3-19, January 2004.

A Stratified Semantics of General References Embeddable in Higher-Order Logic, by Amal J. Ahmed, Andrew W. Appel, and Roberto Virga. 17th Annual IEEE Symposium on Logic in Computer Science (LICS 2002), pp. 75-86, June 2002. (Preliminary version of July 2001.)

A Trustworthy Proof Checker, by Andrew W. Appel, Neophytos G. Michael, Aaron Stump, and Roberto Virga. Journal of Automated Reasoning 31:231-260, 2003. Previous version appeared in Verification Workshop - VERIFY 2002 and (jointly) in Foundations of Computer Security - FCS 2002 Copenhagen, Denmark, July 25-26, 2002.

JVM TCB: Measurements of the Trusted Computing Base of Java Virtual Machines, by Andrew W. Appel and Daniel C. Wang. Princeton University CS TR-647-02, April 2002.

Typed Machine Language and its Semantics, by Kedar N. Swadi and Andrew W. Appel, preliminary version of July 2001.

Foundational Proof-Carrying Code. Andrew W. Appel, in 16th Annual IEEE Symposium on Logic in Computer Science (LICS '01), pp. 247-258, June 2001.

Models for Security Policies in Proof-Carrying Code. Andrew W. Appel and Edward W. Felten, Princeton University Computer Science TR-636-01, March 2001.

Hints on Proving Theorems in Twelf. Andrew W. Appel, Princeton University, February 2000.

An Indexed Model of Recursive Types for Foundational Proof-Carrying Code. Andrew W. Appel and David McAllester. ACM Transactions on Programming Languages and Systems 23 (5) 657-683, September 2001.

Efficient Substitution in Hoare Logic Expressions. Andrew W. Appel, Kedar N. Swadi, and Roberto Virga. 4th International Workshop on Higher-Order Operational Techniques in Semantics (HOOTS 2000), pp. 35-50, September 2000.

Machine Instruction Syntax and Semantics in Higher Order Logic. Neophytos G. Michael and Andrew W. Appel. 17th International Conference on Automated Deduction (CADE-17), pp. 7-24, Springer-Verlag (Lecture Notes in Artificial Intelligence), June 2000.

Protection against untrusted code. Andrew W. Appel. IBM Developer Works, September 1999.

A Semantic Model of Types and Machine Instructions for Proof-Carrying Code, Andrew W. Appel and Amy P. Felty. 27th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '00), pp. 243-253, January 2000. [local copy]

Lightweight Lemmas in Lambda Prolog, Andrew W. Appel and Amy P. Felty, in 16th International Conference on Logic Programming, November 1999.

Proof-Carrying Authentication. Andrew W. Appel and Edward W. Felten, 6th ACM Conference on Computer and Communications Security, November 1999. [local copy]