by Andrew W. Appel
This textbook is in beta release. If you use this material,
please send your comments and feedback to Andrew Appel.
Introduction
This electronic book is Volume 3 of the
Software Foundations
series, which presents the mathematical underpinnings of reliable software.
Some software should be
deductively verified for functional
correctness, that is, you should prove that it behaves correctly
on all inputs. That can't be achieved by
testing alone, unless
there are only a finite number of possible inputs. I say "some
software" because many programs, perhaps most programs, don't
need to be correct; or we can't even specify what the "correct"
behavior would be. But the programs we must rely on—to protect
us from malicious inputs sent by hackers, to protect one untrusted
program from another, to calculate life-critical values in
airplanes or medical devices—are worth writing carefully and
then proving correct.
I say "deductively verified" to indicate the process of stating
theorems and then deriving machine-checked proofs. There are
other kinds of verification—just look up
https://en.wikipedia.org/wiki/Software_verification
to see what people can mean by the term. And the proofs must
be machine-checked: unlike theorems in mathematics—which often
have concise statements and sometimes beautiful, understandable,
concise proofs—software is big and sometimes complex, its specifications
are sometimes large, the proofs are sometimes huge. And programs
never stay still: they are
maintained, and their proofs must
be maintained with them. Therefore, while humans can usually
check proofs in mathematics, it takes a computer to check proofs
in software engineering.
To do software verification: program in a
programming language,
use the
proof theory (or
program logic) of that language
to state correctness theorems and prove them, and use a
proof assistant (or sometimes a
theorem prover) to manage
the creation of the proof, check the proof, and check the
correspondence of the proof to the program.
This process was first outlined by Floyd and Hoare in 1969,
where the programming language was a small subset of Algol,
the program logic was Hoare Logic, and there were no proof
assistants or computerized proof checkers usable for Hoare Logic
(though Turing outlined proof checkers in his famous 1936 paper).
Floyd and Hoare's early approach has descendants today:
separation-logic checkers for languages such as C and Java.
But the proof theory, the program logics, of C and Java are
quite complex and difficult to use; in part because of the
side effects (mutable data structures) of imperative languages.
It is much better, simpler, easier to write programs in a clean,
purely functional language with a simple, expressive proof theory.
Then the proofs are much more elegant and enjoyable. That's what
we will do in
Verified Functional Algorithms. The language is
Gallina, the pure functional language embedded in the Coq logic.
The proof theory is Coq's Calculus of Inductive Constructions
(CiC), the proof assistant is Coq, and the mechanical checker,
both for proofs of theorems and correspondence of the theorems to
the programs, is the Coq kernel. Programs can then be
extracted
from Coq to run in ML (Ocaml), Haskell, Lisp, or other functional
languages.
There is a disadvantage to this approach: sometimes a pure
functional program must pay a factor-of-logN penalty (in
execution time) compared to an imperative program. This is because
search trees (or other tree data structures) must be used where
the imperative program would use a mutable array. So
functional algorithms (as compared to imperative algorithms)
can be used only where this factor-of-logN penalty is affordable.
Fortunately, many application domains can tolerate such performance:
for example, the success of the CompCert verified optimizing C
compiler, which pays the logN penalty but is generally about as
fast as gcc, reminds us that compilers don't have to be blazingly
fast, they just have to be fast enough.
And we do care about efficient performance! In this volume we
will study algorithms and data structures that are asymptotically
efficient. You will learn how to specify and verify (prove the
correctness of) sorting algorithms, binary search trees, balanced
binary search trees, and priority queues. We will not prove
in
Coq their asymptotic efficiency, but we will reason about it
informally (that is, using proofs that mathematicians would call
"formal" but proof-assistant users call "informal.") And, in some
chapters, we will extract the programs to ML, compile them with
the optimizing Ocaml compiler, and measure their performance.
Prerequisites and practicalities
Before using this book, you should have some understanding of
these algorithms and data structures, available in any standard
undergraduate algorithms textbook.
You should understand all the material in
Software Foundations Volume 1 (Logic Foundations),
but you will not need anything from Volume 2.
The exposition here is intended for a broad range of readers,
from advanced undergraduates to PhD students and researchers.
The principal novelty of
Software Foundations is that it is one hundred
percent formalized and machine-checked: the entire text is
literally a script for Coq. It is intended to be read alongside
an interactive session with Coq. All the details in the text are
fully formalized in Coq, and the exercises are designed to be
worked using Coq.
Chapter Dependencies
Before using
Verified Functional Algorithms, read
(and do the exercises in) these chapters of
Software Foundations Volume I:
Preface, Basics, Induction, Lists, Poly, Tactics, Logic,
IndProp, Maps, and perhaps (ProofObjects), (IndPrinciples).
In this volume, the core path is:
VFA ->
Perm ->
Sort ->
SearchTree ->
Redblack
with many optional chapters whose dependencies are,
The
Color chapter is advanced material that should not be
attempted until the student has had experience with most
of the earlier chapters, or other experience using Coq.
System Requirements
Coq runs on Windows, Linux, and OS X. The Preface of
Volume 1 describes the Coq installation you will need.
This edition was built with Coq 8.5pl
2.
In addition, two of the chapters ask you to compile and run
an Ocaml program; having Ocaml installed on your computer is
helpful, but not essential.
Exercises
Each chapter includes numerous exercises. Each is marked with a
"star rating," which can be interpreted as follows:
- One star: easy exercises that underscore points in the text
and that, for most readers, should take only a minute or two.
Get in the habit of working these as you reach them.
- Two stars: straightforward exercises (five or ten minutes).
- Three stars: exercises requiring a bit of thought (ten
minutes to half an hour).
- Four and five stars: more difficult exercises (half an hour
and up).
Also, some exercises are marked "advanced", and some are marked
"optional." Doing just the non-optional, non-advanced exercises
should provide good coverage of the core material. Optional
exercises provide a bit of extra practice with key concepts and
introduce secondary themes that may be of interest to some
readers. Advanced exercises are for readers who want an extra
challenge (and, in return, a deeper contact with the material).
Please do not post solutions to the exercises in any public place:
Software Foundations is widely used both for self-study and for
university courses. Having solutions easily available makes it
much less useful for courses, which typically have graded homework
assignments. The authors especially request that readers not post
solutions to the exercises anyplace where they can be found by
search engines.
Downloading the Coq Files
A tar file containing the full sources for the "release version"
of these notes (as a collection of Coq scripts and HTML files) is
available here:
http://www.cs.princeton.edu/~appel/vfa/vfa.tgz
If you are using the notes as part of a class, your professor
may give you access to a locally extended version of the files,
which you should use instead of the release version.
Acknowledgment
Development of this book was supported in part by
National Science Foundation grant CCF-1521602.
For instructors and contributors
If you intend to use these materials in your own course, you will
undoubtedly find things you'd like to change, improve, or add.
Your contributions are welcome!
To keep the legalities of the situation clean and to have a single
point of responsibility in case the need should ever arise to
adjust the license terms, sublicense, etc., we ask all
contributors (i.e., everyone with access to the developers'
repository) to assign copyright in their contributions to the
appropriate "author of record," as follows:
I hereby assign copyright in my past and future contributions
to the Software Foundations project to the Author of Record of
each volume or component, to be licensed under the same terms
as the rest of Software Foundations. I understand that, at
present, the Authors of Record are as follows: For Volumes 1
and 2, known until 2016 as "Software Foundations" and from
2016 as (respectively) "Logical Foundations" and "Programming
Foundations," the Author of Record is Benjamin Pierce. For
Volume 3, "Verified Functional Algorithms", the Author of
Record is Andrew W. Appel. For components outside of
designated Volumes (e.g., typesetting and grading tools and
other software infrastructure), the Author of Record is
Benjamin Pierce.
To get started, please send an email to Benjamin Pierce and Andrew Appel,
describing yourself and how you plan to use the materials and including
(1) the above copyright transfer text and
(2) the result of doing "htpasswd -s -n NAME"
where NAME is your preferred user name.
We'll set you up with access to the subversion repository and
developers' mailing lists. In the repository you'll find a
file
INSTRUCTORS with further instructions.