VFAVerified Functional Algorithms

by Andrew W. Appel

This textbook is in beta release. If you use this material, please send your comments and feedback to Andrew Appel.

Introduction

This electronic book is Volume 3 of the Software Foundations series, which presents the mathematical underpinnings of reliable software.

Some software should be deductively verified for functional correctness, that is, you should prove that it behaves correctly on all inputs. That can't be achieved by testing alone, unless there are only a finite number of possible inputs. I say "some software" because many programs, perhaps most programs, don't need to be correct; or we can't even specify what the "correct" behavior would be. But the programs we must rely on—to protect us from malicious inputs sent by hackers, to protect one untrusted program from another, to calculate life-critical values in airplanes or medical devices—are worth writing carefully and then proving correct.

I say "deductively verified" to indicate the process of stating theorems and then deriving machine-checked proofs. There are other kinds of verification—just look up https://en.wikipedia.org/wiki/Software_verification to see what people can mean by the term. And the proofs must be machine-checked: unlike theorems in mathematics—which often have concise statements and sometimes beautiful, understandable, concise proofs—software is big and sometimes complex, its specifications are sometimes large, the proofs are sometimes huge. And programs never stay still: they are maintained, and their proofs must be maintained with them. Therefore, while humans can usually check proofs in mathematics, it takes a computer to check proofs in software engineering.

To do software verification: program in a programming language, use the proof theory (or program logic) of that language to state correctness theorems and prove them, and use a proof assistant (or sometimes a theorem prover) to manage the creation of the proof, check the proof, and check the correspondence of the proof to the program.

This process was first outlined by Floyd and Hoare in 1969, where the programming language was a small subset of Algol, the program logic was Hoare Logic, and there were no proof assistants or computerized proof checkers usable for Hoare Logic (though Turing outlined proof checkers in his famous 1936 paper). Floyd and Hoare's early approach has descendants today: separation-logic checkers for languages such as C and Java. But the proof theory, the program logics, of C and Java are quite complex and difficult to use; in part because of the side effects (mutable data structures) of imperative languages.

It is much better, simpler, easier to write programs in a clean, purely functional language with a simple, expressive proof theory. Then the proofs are much more elegant and enjoyable. That's what we will do in Verified Functional Algorithms. The language is Gallina, the pure functional language embedded in the Coq logic. The proof theory is Coq's Calculus of Inductive Constructions (CiC), the proof assistant is Coq, and the mechanical checker, both for proofs of theorems and correspondence of the theorems to the programs, is the Coq kernel. Programs can then be extracted from Coq to run in ML (Ocaml), Haskell, Lisp, or other functional languages.

There is a disadvantage to this approach: sometimes a pure functional program must pay a factor-of-logN penalty (in execution time) compared to an imperative program. This is because search trees (or other tree data structures) must be used where the imperative program would use a mutable array. So functional algorithms (as compared to imperative algorithms) can be used only where this factor-of-logN penalty is affordable.

Fortunately, many application domains can tolerate such performance: for example, the success of the CompCert verified optimizing C compiler, which pays the logN penalty but is generally about as fast as gcc, reminds us that compilers don't have to be blazingly fast, they just have to be fast enough.

And we do care about efficient performance! In this volume we will study algorithms and data structures that are asymptotically efficient. You will learn how to specify and verify (prove the correctness of) sorting algorithms, binary search trees, balanced binary search trees, and priority queues. We will not prove in Coq their asymptotic efficiency, but we will reason about it informally (that is, using proofs that mathematicians would call "formal" but proof-assistant users call "informal.") And, in some chapters, we will extract the programs to ML, compile them with the optimizing Ocaml compiler, and measure their performance.

Prerequisites and practicalities

Before using this book, you should have some understanding of these algorithms and data structures, available in any standard undergraduate algorithms textbook.

You should understand all the material in Software Foundations Volume 1 (Logic Foundations), but you will not need anything from Volume 2. The exposition here is intended for a broad range of readers, from advanced undergraduates to PhD students and researchers.

The principal novelty of Software Foundations is that it is one hundred percent formalized and machine-checked: the entire text is literally a script for Coq. It is intended to be read alongside an interactive session with Coq. All the details in the text are fully formalized in Coq, and the exercises are designed to be worked using Coq.

Chapter Dependencies

Before using Verified Functional Algorithms, read (and do the exercises in) these chapters of Software Foundations Volume I: Preface, Basics, Induction, Lists, Poly, Tactics, Logic, IndProp, Maps, and perhaps (ProofObjects), (IndPrinciples).

In this volume, the core path is:

VFA -> Perm -> Sort -> SearchTree -> Redblack

with many optional chapters whose dependencies are,

Sort -> ( Multiset or Selection or Decide )
SearchTree -> ( ADT or Extraction )
Perm -> Trie
Sort -> Selection -> SearchTree -> ADT -> Priqueue -> Binom

The Color chapter is advanced material that should not be attempted until the student has had experience with most of the earlier chapters, or other experience using Coq.

System Requirements

Coq runs on Windows, Linux, and OS X. The Preface of Volume 1 describes the Coq installation you will need. This edition was built with Coq 8.5pl₂.

In addition, two of the chapters ask you to compile and run an Ocaml program; having Ocaml installed on your computer is helpful, but not essential.

Exercises

Each chapter includes numerous exercises. Each is marked with a "star rating," which can be interpreted as follows:

One star: easy exercises that underscore points in the text and that, for most readers, should take only a minute or two. Get in the habit of working these as you reach them.
Two stars: straightforward exercises (five or ten minutes).
Three stars: exercises requiring a bit of thought (ten minutes to half an hour).
Four and five stars: more difficult exercises (half an hour and up).

Also, some exercises are marked "advanced", and some are marked "optional." Doing just the non-optional, non-advanced exercises should provide good coverage of the core material. Optional exercises provide a bit of extra practice with key concepts and introduce secondary themes that may be of interest to some readers. Advanced exercises are for readers who want an extra challenge (and, in return, a deeper contact with the material).

Please do not post solutions to the exercises in any public place: Software Foundations is widely used both for self-study and for university courses. Having solutions easily available makes it much less useful for courses, which typically have graded homework assignments. The authors especially request that readers not post solutions to the exercises anyplace where they can be found by search engines.

Downloading the Coq Files

A tar file containing the full sources for the "release version" of these notes (as a collection of Coq scripts and HTML files) is available here:

        http://www.cs.princeton.edu/~appel/vfa/vfa.tgz

If you are using the notes as part of a class, your professor may give you access to a locally extended version of the files, which you should use instead of the release version.

Acknowledgment

Development of this book was supported in part by National Science Foundation grant CCF-1521602.

For instructors and contributors

If you intend to use these materials in your own course, you will undoubtedly find things you'd like to change, improve, or add. Your contributions are welcome!

To keep the legalities of the situation clean and to have a single point of responsibility in case the need should ever arise to adjust the license terms, sublicense, etc., we ask all contributors (i.e., everyone with access to the developers' repository) to assign copyright in their contributions to the appropriate "author of record," as follows:

I hereby assign copyright in my past and future contributions to the Software Foundations project to the Author of Record of each volume or component, to be licensed under the same terms as the rest of Software Foundations. I understand that, at present, the Authors of Record are as follows: For Volumes 1 and 2, known until 2016 as "Software Foundations" and from 2016 as (respectively) "Logical Foundations" and "Programming Foundations," the Author of Record is Benjamin Pierce. For Volume 3, "Verified Functional Algorithms", the Author of Record is Andrew W. Appel. For components outside of designated Volumes (e.g., typesetting and grading tools and other software infrastructure), the Author of Record is Benjamin Pierce.

To get started, please send an email to Benjamin Pierce and Andrew Appel, describing yourself and how you plan to use the materials and including (1) the above copyright transfer text and (2) the result of doing "htpasswd -s -n NAME" where NAME is your preferred user name.

We'll set you up with access to the subversion repository and developers' mailing lists. In the repository you'll find a file INSTRUCTORS with further instructions.