Power
Acknowledgement: The materials in this lecture were derived from course materials created by Bob Harper and others at CMU for course number 15150.
Quick Links:
 Definition of when two programs are equal
 List of general-purpose equational reasoning rules
 Definition of a valuable expression
 Definition of total and partial functions
 Template for a proof about lists
Reasoning About Programs
In the last note, we discussed the operational semantics of simple ML programs. This operational semantics gives us a means to prove some simple things about the programs we write. For example:
Theorem: The expression
let x = 1 in let y = x + 2 in x + y
evaluates to the value 4.
Proof: By the definition of the operational semantics of ML:
let x = 1 in let y = x + 2 in x + y
--> let y = 1 + 2 in 1 + y
--> let y = 3 in 1 + y
--> 1 + 3
--> 4
QED.
That was a good proof (of a pretty easy theorem). Each line was justified by the operational semantics of ML. Evaluating expressions using the operational semantics is always going to be at the heart of our proofs, but we need a few more ideas to prove interesting results about our programs. In particular, rather than simply proving that a function operates correctly on a single input (by evaluating the function as applied to that input using the operational semantics), we will look at proofs that show that a function operates correctly on all inputs. These latter kinds of proofs allow us to show once and for all, for instance, that a sorting function sorts every list we supply it with properly. In general, we need to be able to reason that small parts of our programs (like the sorting function) behave correctly on all inputs. Then we will build up assurance that our overall application, which uses these many small pieces, behaves correctly.
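As a quick sanity check on the theorem above, the expression can simply be run. The name result below is introduced only for this sketch; running code confirms one evaluation and is not a substitute for the step-by-step proof.

```ocaml
(* The expression from the theorem, bound to a hypothetical name so
   that its value can be inspected. *)
let result : int =
  let x = 1 in
  let y = x + 2 in
  x + y

(* Check the value that the proof derived by hand. *)
let () = assert (result = 4)
```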
Review: Substitution Notation
Recall the substitution notation that we introduced in the previous note. In general, the notation looks like this:
e1[e2/x]
It stands for the expression you get when you systematically replace all free occurrences of the variable x in e1 with e2. For example, if we have an expression like (x + 7*x) and we systematically replace x with (2+3), we wind up with ((2+3) + 7*(2+3)). In other words:
(x + 7*x)[2+3/x] == ((2+3) + 7*(2+3))
As another example:
(let y = 13 + x in let x = 2 + y in x + y)[7/x] == let y = 13 + 7 in let x = 2 + y in x + y
We do not replace the 2nd occurrence of x with 7 because that second occurrence of x refers to the x bound by the inner let statement. In other words, it refers to 2 + y.
Substitution is a critical operation because it defines how functions and let statements and pattern matching execute, and when it comes down to it, all our reasoning about programs comes back to reasoning about how those programs execute.
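To make the notation concrete, here is a minimal sketch of substitution over a tiny expression language. The type exp and the function subst are illustrative names, not part of these notes, and the sketch assumes the substituted expression e2 is closed (has no free variables), so variable capture cannot occur.

```ocaml
(* A tiny, hypothetical expression language: just enough to express
   the two substitution examples above. *)
type exp =
  | Int of int
  | Var of string
  | Add of exp * exp
  | Mul of exp * exp
  | Let of string * exp * exp   (* Let (x, e1, e2) is: let x = e1 in e2 *)

(* subst e2 x e1 computes e1[e2/x].
   Assumption: e2 is closed, so no capture-avoiding renaming is needed. *)
let rec subst (e2 : exp) (x : string) (e1 : exp) : exp =
  match e1 with
  | Int _ -> e1
  | Var y -> if y = x then e2 else e1
  | Add (a, b) -> Add (subst e2 x a, subst e2 x b)
  | Mul (a, b) -> Mul (subst e2 x a, subst e2 x b)
  | Let (y, bound, body) ->
      let bound' = subst e2 x bound in
      (* If the let rebinds x, occurrences of x in the body are not free,
         so the body is left alone. *)
      if y = x then Let (y, bound', body)
      else Let (y, bound', subst e2 x body)

(* (x + 7*x)[2+3/x] *)
let ex1 =
  subst (Add (Int 2, Int 3)) "x" (Add (Var "x", Mul (Int 7, Var "x")))

(* (let y = 13 + x in let x = 2 + y in x + y)[7/x] *)
let ex2 =
  subst (Int 7) "x"
    (Let ("y", Add (Int 13, Var "x"),
          Let ("x", Add (Int 2, Var "y"), Add (Var "x", Var "y"))))
```

In ex2 only the x in 13 + x is replaced; the x in x + y is bound by the inner let and stays untouched.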
Equational Reasoning
The kind of reasoning we will be performing about our programs is called equational reasoning. In other words, we will be proving that one expression is equal to another. For example, we might want to prove that for any list l,
l == reverse (reverse l)
In other words, for any list l, the reverse of the reverse of l is equal to l.
What does it mean for two expressions to be equal to one another?
Definition [Equivalent Expressions]: Two expressions are equivalent if and only if
 they both evaluate to the same value, or
 they both loop forever, or
 they both raise the same exception.
In order to actually prove complex expressions are equal to one another, we can rely on the following facts about expression equivalence.
 reflexivity: for any expression e, e == e
 symmetry: for any expressions e1 and e2, if e1 == e2 then e2 == e1
 transitivity: for any expressions e1, e2 and e3, if e1 == e2 and e2 == e3 then e1 == e3
 congruence: for any expressions e1, e2 and e, if e1 == e2 then e[e1/x] == e[e2/x]
 alpha-equivalence: for any expressions e1 and e2, if e1 differs from e2 only by consistent renaming of variables then e1 == e2
 syntactic sugar: a let declaration such as this one:
let f x = e
is equivalent to an ordinary let declaration for an anonymous function:
let f = fun x -> e
 eval: for any expressions e1 and e2, if e1 --> e2 then e1 == e2
We may also use well-known mathematical properties of the built-in operations. For example,
x + y == y + x
no matter what x and y are.
Here are some simple examples of equivalent expressions using the rules above, with justifications to their right. We will refer to previous things we have proven using the numbers (1), (2), (3), etc. We write "eval" when expressions are equivalent by evaluation. We write "alpha" when they are equivalent by alpha-conversion.
(1)  1 == 1                                                    (reflexivity)
(2)  let x = 3 in x + x == let x = 3 in x + x                  (reflexivity)
(3)  1 + 2 == 3                                                (eval)
(4)  3 == 1 + 2                                                ((3), symmetry)
(5)  -5 + 8 == 3                                               (eval)
(6)  -5 + 8 == 1 + 2                                           ((4), (5), transitivity)
(7)  let x = 1 in x == let y = 1 in y                          (alpha)
(8)  1 + 2 == (fun x -> 1 + x) 2                               (eval, symmetry)
(9)  let f x = x+1 in f 3 == let g = fun y -> y+1 in g 3       (alpha (twice), syntactic sugar)
(10) (fun x -> 1 + x) 2 == -5 + 8                              ((6), (8), transitivity, symmetry)
(11) (-5 + 8) * y == (1 + 2) * y                               ((6), congruence)
The last equivalence is the most fun. It uses the congruence rule and it considers an expression with a free variable in it: the variable y. The reasoning goes like this: since we know that (-5 + 8) is equivalent to (1 + 2) (by line 6), we can substitute those equivalent expressions into some other bigger expression (in this case, substitute them for x in the expression x * y) and get an equivalent expression. Notice that it does not matter that we do not know what y is. No matter what value it takes on, the two expressions are equivalent. Here is another example of using congruence:
f (z + 0) == f z
In the case above, we know that z is equivalent to z + 0 as this is a property of the built-in addition operation. Hence, by congruence, we can substitute one expression for the other. (It does not matter what z is or what the function f does.)
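A few of these equivalences can be spot-checked by running both sides and comparing the results. In the sketch below, the function f and the list samples are hypothetical stand-ins chosen for illustration. Checking a handful of values is only evidence, whereas the congruence rule covers every y and z.

```ocaml
(* A hypothetical function standing in for the arbitrary f in the text. *)
let f (n : int) : int = n * n - 4

let () =
  assert (1 + 2 = 3);                                  (* eval *)
  assert ((let x = 1 in x) = (let y = 1 in y));        (* alpha *)
  assert (1 + 2 = (fun x -> 1 + x) 2);                 (* eval, symmetry *)
  (* alpha plus syntactic sugar *)
  assert ((let f x = x + 1 in f 3) = (let g = fun y -> y + 1 in g 3))

(* Congruence, tried on a few sample values of the free variable. *)
let samples = [-2; 0; 7]

let congruence_holds : bool =
  List.for_all (fun z -> f (z + 0) = f z) samples
  && List.for_all (fun y -> (1 + 2) * y = 3 * y) samples
```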
Finally, let's look at an ever so slightly more interesting theorem involving the easy function defined below.
let easy (x:int) (y:int) (z:int) : int = x * (y + z) ;;
Theorem: for all integer values a, b, c, easy a b c == easy a c b.
Proof: By simple equational reasoning.
easy a b c
== a * (b + c)     (eval)
== a * (c + b)     (property of +, congruence)
== easy a c b      (eval, symmetry)
QED
Above, through a series of steps and the use of transitivity, we conclude that easy a b c == easy a c b. Notice that when we moved from the second-to-last line to the last line of the proof, we performed "evaluation in reverse." In other words, a * (c + b) is the body of the easy function (with a, c and b substituted for x, y, and z) and we claimed that it was equivalent to easy a c b. Evaluation in reverse is justified by evaluation in the normal direction plus the rule of symmetry.
Now that you have the basics down, we can start to move a little faster. We will cease to mention uses of basic rules such as symmetry, transitivity or congruence and only mention the "interesting" parts of the proof. For instance, we will streamline the proof above as follows:
Proof: By simple equational reasoning.
easy a b c
== a * (b + c)     (eval)
== a * (c + b)     (math)
== easy a c b      (eval)
QED
Even though we have slimmed down our proofs a little, we will always write our proofs in two-column style. The first column includes the facts that we have proven. The second column has a justification, which explains why we were able to conclude that the line is true. Proofs in this style are very easy to read and hence are easier to get right than other styles. Sometimes we want to refer to lines of the proof that appear earlier (ie: not just the previous line). In this case, we will number our lines. For example:
Proof: By simple equational reasoning.
(1) easy a b c
(2) == a * (b + c)     (eval)
(3) == a * (c + b)     (math)
(4) == easy a c b      (eval)
QED
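The theorem about easy can also be exercised on concrete inputs. The sketch below checks it on a small, arbitrarily chosen set of integers (the names samples and all_agree are introduced here). Passing such a test does not prove the theorem; the equational proof above is what covers all a, b and c.

```ocaml
let easy (x:int) (y:int) (z:int) : int = x * (y + z)

(* A small, arbitrary sample of integers to try. *)
let samples = [-3; -1; 0; 1; 2; 10]

(* True when easy a b c = easy a c b for every triple drawn from samples. *)
let all_agree : bool =
  List.for_all (fun a ->
    List.for_all (fun b ->
      List.for_all (fun c -> easy a b c = easy a c b) samples)
      samples)
    samples
```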
The Value of Values
Before we can move on to doing some really interesting proofs, there is one more basic concept that we must learn. As you know, the operational semantics for ML programs state that you may call a function, substituting the arguments for the parameters provided those arguments have been evaluated to a value.
Consider this simple function:
let inc (x:int) : int = x + 1
For instance, we know that:
inc 3 --> 3 + 1 --> 4
And consequently, we may conclude that inc 3 == 4 by the (eval) rule for expression equivalence. But what about the expression inc (y + 3) where y is a free integer variable? We might like to reason as follows:
inc (y + 3) == (y + 3) + 1 == y + 4
Indeed, this sort of proof is valid. It is valid because we know that no matter what value we substitute for y, y + 3 will evaluate to some other value (call it v) and then we can apply inc to v and continue.
That kind of reasoning leads one to believe that perhaps it is the case that if you have a function and an argument expression to that function then you can always substitute the argument expression for the parameter in the body of the function. In other words, one might conjecture that the following equational rule is true:
 Bad Eval Rule: For any expression exp and for any function f defined as follows
let f x = body
the following expressions are equivalent:
f exp == body[exp/x]
Since I called the rule the Bad Eval Rule, you have a pretty big clue that the above rule is NOT a valid equivalence rule. Here is a counterexample. Consider the function forever, which loops forever, and the function one, which, when called, always returns the value 1:
let rec forever () : int = forever ()
let one (x:int) : int = 1
Now, using the Bad Eval Rule we can immediately prove that:
one (forever ()) == 1
But that is obviously incorrect because the left-hand side of the equation loops forever and never returns a value whereas the right-hand side returns the value 1 immediately. According to our basic definition of equivalence, the two are not equal.
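The looping counterexample cannot be observed by running it, since the program would never finish. The sketch below therefore swaps in a different non-valuable argument: a hypothetical function boom that raises an exception instead of looping. The parameter of one is written _x only because the function ignores it. The failure of the Bad Eval Rule is the same: the argument is evaluated before one is called, so the call does not behave like the value 1.

```ocaml
(* Always returns 1, ignoring its argument. *)
let one (_x:int) : int = 1

(* Hypothetical stand-in for forever: not valuable because it raises
   rather than because it loops. *)
let boom () : int = failwith "boom"

(* Records what happens when we evaluate one (boom ()). *)
let outcome : string =
  match one (boom ()) with
  | n -> Printf.sprintf "returned %d" n
  | exception Failure msg -> "raised Failure: " ^ msg
```

Here outcome is "raised Failure: boom", not "returned 1", so one (boom ()) and 1 are not equivalent.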
Clearly, the problem with the Bad Eval Rule is that it allows any expression to be substituted for a function argument. If it were a bit more restrictive and only allowed substitution of those argument expressions that are guaranteed to terminate and produce a value, then it would be a good rule.
Definition [Valuable Expression] An expression that always terminates and produces a value is called a valuable expression (no pun intended).
The following expressions are all examples of valuable expressions:
 constants: any constant (eg: 1, 2, "hello", 3.5, 'a', [], etc.) is valuable
 values: any value (eg: (2,3), [14], None, fun x -> x + x) is valuable
 variables: any variable x is valuable because such variables will be replaced by values when the code is executed
 pairs: if exp1 and exp2 are both valuable expressions then the pair expression (exp1, exp2) is also valuable
 lists: if exp1 and exp2 are both valuable expressions then the list expression exp1::exp2 is also valuable
 any other data type constructor: if exp is a valuable expression and C is a data type constructor (perhaps for an option or a tree or a set or a table or a shape or anything else) then C exp is also valuable
 if statements: an if statement if exp1 then exp2 else exp3 is valuable if exp1, exp2 and exp3 are all valuable expressions (pattern matching statements are similar if they are not missing any pattern matching branches)
 total functions: if f is a total function and exp is a valuable expression then f exp is a valuable expression as well
In the last case, we referred to the idea of a total function.
Definition [Total Function] A total function is a function that when supplied with any argument (of the correct type) always terminates and returns a value.
Definition [Partial Function] A partial function is a function that when supplied with some argument (of the correct type) will fail to terminate or will raise an exception.
For example, the function inc defined above is a total function: it always terminates and adds one. Many of the built-in operations such as +, -, and ^ are total functions. However, some are not. For instance, the function List.hd, which returns the first element of a list, is a partial function as it raises an exception when its argument is the empty list. In general, we discourage you from using partial functions like List.hd, because they are harder to reason about: you have to be careful that you don't miss a case. Using pattern matching (with an exhaustive set of cases) is preferable. Of course, one must use judgement. There are some effective uses of exceptions; one just does not want to overuse them.
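One way to follow this advice is to replace a call to List.hd with an exhaustive match that returns an option. The function safe_hd below is a hypothetical helper written for this sketch, not a function from the notes or the standard library.

```ocaml
(* A total alternative to List.hd: every list is handled by some branch. *)
let safe_hd (xs : 'a list) : 'a option =
  match xs with
  | [] -> None
  | hd :: _ -> Some hd

(* List.hd, by contrast, is partial: on the empty list it raises Failure. *)
let hd_of_empty_raises : bool =
  match List.hd ([] : int list) with
  | _ -> false
  | exception Failure _ -> true
```

A caller of safe_hd must match on None and Some, so the type checker flags the missing empty-list case instead of leaving it to a run-time exception.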
Recursive functions can be total functions. For instance, a list-processing function like List.length is total: it will terminate and produce an integer when given any list. Here is the definition of List.length:
let rec length (xs:'a list) : int =
  match xs with
  | [] -> 0
  | hd::tail -> 1 + length tail
We can tell that length is a total function because:
 It does not itself call any partial functions.
 It does not have any incomplete pattern matching.
 Whenever length is called recursively, it does so on a strictly smaller argument (and there are no infinite sequences of lists that get smaller and smaller forever). In particular, in the second branch of the match statement, the recursive call length tail involves an argument to length (ie: tail) that is strictly smaller than the input, which is hd::tail.
Consequently, length terminates on all inputs, never raises an exception and always returns a value.
The following are examples of partial functions.
let rec forever (xs:'a list) : int =
  match xs with
  | [] -> 0
  | hd::tail -> 1 + forever (1::tail)

let rec oops (xs:'a list) (acc:int) (num:int) : int =
  match xs with
  | [] -> acc / num
  | hd::tail -> oops tail (acc+hd) (num+1)
Above, the function forever loops forever because we pass the recursive call an argument that is not shorter than the input list. Hence, forever is a partial function. (Note, it does terminate in one case, when given the empty list, but it must terminate in all cases to be total.) The function oops has a subtler problem. It might raise a divide-by-zero error if the denominator num turns out to be zero in the base case of the recursion. Therefore, oops is not total.
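The problem with oops can be observed directly. In this sketch the names average_of_two and empty_case are introduced for illustration; the definition of oops is the one from the notes.

```ocaml
let rec oops (xs:'a list) (acc:int) (num:int) : int =
  match xs with
  | [] -> acc / num
  | hd::tail -> oops tail (acc+hd) (num+1)

(* On a non-empty list the base case divides by a positive count:
   oops [2; 4] 0 0 reaches 6 / 2. *)
let average_of_two : int = oops [2; 4] 0 0

(* On the empty list with num = 0 the base case divides by zero. *)
let empty_case : int option =
  match oops [] 0 0 with
  | n -> Some n
  | exception Division_by_zero -> None
```

Because at least one well-typed input makes oops raise, it is partial, even though it behaves well on many inputs.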
As an aside, one reason I dislike the design of Java is that most interesting expressions in Java, like method calls and field dereferences, are partial (not total) and might raise an exception because Java's type system does not distinguish between null and non-null values (like ML's type system distinguishes between values with option type and non-option type). In contrast, most (or at least many, if you follow the style advocated in this class) functions in ML are total. This is one of the reasons that Java is substantially harder to reason about than ML in theory, and in practice you see the same thing: lots of null pointer dereference bugs show up in real applications.
Finally, we can now state our good, valuable eval rule:
 Eval Value Rule: For any valuable expression exp and for any function f defined as follows
let f x = body
the following expressions are equivalent:
f exp == body[exp/x]
More generally, any property that is true of ALL values of a particular type also holds for the corresponding valuable expressions. For instance, for any valuable expressions exp1 and exp2, we know all of the following equations hold (and lots more to boot):
arithmetic equations:
exp1 + exp1 == 2*exp1
or
0*exp1 == 0
(And many other such equations.)
let evaluation:
(let x = exp1 in exp) == exp[exp1/x]
pair evaluation:
(let (x,y) = (exp1,exp2) in exp) == exp[exp1/x][exp2/y]
option evaluation:
(match (Some exp1) with None -> exp3 | Some x -> exp4) == exp4[exp1/x]
list evaluation:
(match (exp1 :: exp2) with [] -> exp3 | hd::tail -> exp4) == exp4[exp1/hd][exp2/tail]
data evaluation: For any constructor C:
(match (C exp1) with C x -> exp3 | D x -> exp4 | ...) == exp3[exp1/x]
Feel free to use any such equations in your proofs and refer to them as "eval value rules" or "valuable expression rules". To re-emphasize: you can manipulate a valuable expression in your proof just like you manipulate any ordinary value (like a number or string or a pair).
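The rules above can be spot-checked with concrete valuable expressions. The sketch below uses calls to the total function inc as the valuable expressions; the particular arguments and bodies are arbitrary choices made for illustration.

```ocaml
let inc (x:int) : int = x + 1

let () =
  (* arithmetic: exp1 + exp1 == 2*exp1, with exp1 = inc 3 *)
  assert (inc 3 + inc 3 = 2 * inc 3);
  (* let evaluation: (let x = exp1 in x * x) == (x * x)[exp1/x] *)
  assert ((let x = inc 3 in x * x) = inc 3 * inc 3);
  (* pair evaluation *)
  assert ((let (x, y) = (inc 1, inc 2) in x - y) = inc 1 - inc 2);
  (* option evaluation: the Some branch is taken *)
  assert ((match Some (inc 3) with None -> 0 | Some x -> x + 1) = inc 3 + 1);
  (* list evaluation: the hd::tail branch is taken *)
  assert ((match inc 1 :: [inc 2] with
           | [] -> 0
           | hd :: tail -> hd + List.length tail)
          = inc 1 + List.length [inc 2])
```

Each assertion compares the left-hand side of a rule with its substituted right-hand side; none of them would be safe to assume if inc could loop or raise.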
An Example Proof Using Valuable Expressions
Recall the length function on lists:
let rec length (xs:int list) : int =
  match xs with
  | [] -> 0
  | hd::tl -> 1 + length tl
Now, assume that functions f and g are total with type int -> int and prove the following statement for any integers m and n:
length (f m :: g n :: []) == 2
Note first that (f m) is valuable since f is total. Likewise (g n) is valuable since g is total. Consequently, both (f m :: g n :: []) and (g n :: []) are valuable since a list is valuable iff it is constructed exclusively from valuable expressions. With those facts in mind, here is the proof:
length (f m :: g n :: [])
== match (f m :: g n :: []) with [] -> 0 | hd::tl -> 1 + length tl     (by eval value since f m :: g n :: [] valuable)
== 1 + length (g n :: [])                                              (by eval value since f m :: g n :: [] valuable)
== 1 + (match (g n :: []) with [] -> 0 | hd::tl -> 1 + length tl)      (by eval value since g n :: [] valuable)
== 1 + (1 + length [])                                                 (by eval value since g n :: [] valuable)
== 1 + (1 + (match [] with [] -> 0 | hd::tl -> 1 + length tl))         (by eval, since [] is a value)
== 1 + (1 + (0))                                                       (by eval, since [] is a value)
== 2                                                                   (by math)
Revising the Example Proof
You will have noticed that we wind up having to write down a lot of side conditions specifying that certain expressions are valuable. That's the correct way of doing this, but it takes a lot of work. What we are going to do for (most of) the rest of the class is to pretend we are working in a sublanguage where all computations are pure (they have no side effects, such as raising an exception or printing or modifying mutable data structures) and terminating. Hence you can freely use the sorts of laws above without explicitly stating that certain expressions are valuable everywhere. With that convention, the proof above becomes:
length (f m :: g n :: [])
== match (f m :: g n :: []) with [] -> 0 | hd::tl -> 1 + length tl     (by eval)
== 1 + length (g n :: [])                                              (by eval)
== 1 + (match (g n :: []) with [] -> 0 | hd::tl -> 1 + length tl)      (by eval)
== 1 + (1 + length [])                                                 (by eval)
== 1 + (1 + (match [] with [] -> 0 | hd::tl -> 1 + length tl))         (by eval)
== 1 + (1 + (0))                                                       (by eval)
== 2                                                                   (by math)
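The statement just proven can be tried out with particular total functions. The definitions of f and g below are hypothetical examples chosen for this sketch; the proof itself only assumes that f and g are total.

```ocaml
let rec length (xs:int list) : int =
  match xs with
  | [] -> 0
  | _ :: tl -> 1 + length tl

(* Two hypothetical total functions of type int -> int. *)
let f (m:int) : int = m * m
let g (n:int) : int = n - 7

(* True when length (f m :: g n :: []) evaluates to 2. *)
let check (m:int) (n:int) : bool =
  length (f m :: g n :: []) = 2
```

The result does not depend on what f m and g n evaluate to, only on the fact that they do evaluate to some value.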
A Proof about a List-Processing Function
It was easy to prove a property of all inputs to the easy function primarily because it was not a recursive function. When functions are recursive we can't just use a small number of evaluation steps to reach any interesting conclusions. List-processing functions are a good source of recursion, so we will explore some proofs about them. As an example, let's consider the function double below.
let rec double (xs:int list) : int list =
  match xs with
  | [] -> []
  | hd::tail -> hd::hd::(double tail)
Now let's prove this theorem:
Theorem: for all integer lists xs, length (double xs) == 2 * (length xs).
To do so, recall the definition of lists. Every list xs is either:
 []: the empty list, or
 y::ys: a list with at least one element (y) and a tail (ys)
Our proof will consider these two cases in turn. In the second case, where xs is y::ys, we will assume that our theorem holds for the smaller list ys. In other words, we will assume that:
length (double ys) == 2 * (length ys)
when we are trying to prove that
length (double (y::ys)) == 2 * (length (y::ys))
This assumption is called the induction hypothesis (IH). Whenever we do a proof about a recursive function, we will rely on such an induction hypothesis.
Ok, here goes the first case of the proof. In this case, we are considering the possibility that xs is the empty list []. So we replace xs with [] in the theorem. Then we try to prove that the left-hand side of the theorem is equal to the right-hand side of the theorem. (Notice that when we start our proof, we describe the methodology that we use to do the proof: in this case, a proof by induction on the structure of lists.)
Proof: By induction on the structure of lists.
case xs == []:
(1) length (double [])
(2) == length (match [] with [] -> [] | hd::tail -> hd::hd::double tail)     (eval double)
(3) == length []                                                             (eval match expression)
(4) == 0                                                                     (eval length)
(5) == 2 * 0                                                                 (math)
(6) == 2 * (length [])                                                       (eval length in reverse)
We succeeded on the first case. Line (1) is the left-hand side of the theorem with xs replaced by [] and the last line (6) is the right-hand side of the theorem with xs replaced by []. Every line in the middle is justified by simple equational reasoning.
Let's move on to the second case where xs == y::ys for some values y and ys. In this proof, we are going to use two well-known properties of the length function. For any list expression e and any element y, we know that:
length (y::e) == 1 + length e
Hence, by applying that equation twice in a row, we also know that if we have two elements on the front of the list (say y1 and y2), we can conclude:
length (y1::y2::e) == 1 + 1 + length e == 2 + length e
On to the proof:
case xs == y::ys:
(1) length (double (y::ys))
(2) == length (match y::ys with [] -> [] | hd::tail -> hd::hd::(double tail))     (eval double)
(3) == length (y::y::(double ys))                                                 (eval match expression)
(4) == 2 + length (double ys)                                                     (property of length)
(5) == 2 + 2*(length ys)                                                          (IH)
(6) == 2*(1 + length ys)                                                          (math)
(7) == 2*(length (y::ys))                                                         (property of length)
Success! We showed that the left-hand side of the theorem we were trying to prove (with y::ys replacing xs) was equivalent to the right-hand side of the theorem (also with y::ys replacing xs). The key step was lines (4) to (5) where we used the induction hypothesis. Isolating those two lines, we showed:
(4) 2 + length (double ys)
(5) == 2 + 2*(length ys)     (IH)
We were able to do that since the induction hypothesis in this case specifically states that
length (double ys) == 2*(length ys)
Hint: When doing your proofs, it often helps for you to write down all the available information, so we recommend that you always write down the induction hypothesis that you are allowed to use before starting a particular case of a proof.
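The theorem can be tested on a few concrete lists before (or after) proving it. In this sketch the predicate holds is a name introduced for illustration. Testing covers only the lists tried, while the induction covers every list.

```ocaml
let rec length (xs:int list) : int =
  match xs with
  | [] -> 0
  | _ :: tail -> 1 + length tail

let rec double (xs:int list) : int list =
  match xs with
  | [] -> []
  | hd::tail -> hd::hd::(double tail)

(* The statement of the theorem, specialised to one list xs. *)
let holds (xs:int list) : bool =
  length (double xs) = 2 * length xs
```

For example, double [1; 2] is [1; 1; 2; 2], whose length is 4 = 2 * 2.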
A General Schema for Proofs About Lists
In general, when we program with lists, we saw in the past that we often find ourselves writing functions with the form:
let rec f xs =
  match xs with
  | [] -> ... no recursive calls ...
  | hd::tail -> ... f tail ...
The double function was one example. (Note that f can be called on tail two times, three or more times or zero times in the recursive branch. It does not have to be exactly once for the pattern to work.)
Likewise, proofs about lists also have a typical form. Suppose we have a theorem of the form: "for all lists xs, some property P holds of xs". Property P might be a property about a function like f (or double) applied to the list xs. In such situations, we typically use induction and our proofs will be structured as follows:
case xs == []:
  ... must show that property P holds of [] ...
case xs == y::ys:
  ... must show that property P holds of y::ys ...
  ... this case uses the induction hypothesis, which states that P holds of ys ...
Another Example
Consider the functions sum_list and concat:
let rec sum_list (bs:int list) : int =
  match bs with
  | [] -> 0
  | hd::tail -> hd + sum_list tail

let rec concat (bs:int list) (cs:int list) : int list =
  match bs with
  | [] -> cs
  | hd::tail -> hd::(concat tail cs)
Now a theorem:
Theorem: for all integer lists xs and zs, sum_list (concat xs zs) == sum_list xs + sum_list zs
We will prove this theorem by induction. Since there are two lists involved (xs and zs), we should indicate the list that we will be analyzing by cases (formally speaking, "the list over which we perform the induction"). We will perform induction over the list xs. We will also use the fact that concat is a total function and therefore that concat exp1 exp2 is valuable for any valuable expressions exp1 and exp2. Recall that a total function is any function that terminates and produces a value given any input.
case xs == []:
Must Prove: sum_list (concat [] zs) == sum_list [] + sum_list zs
Proof:
sum_list (concat [] zs)
== sum_list zs                          (eval concat)
== 0 + sum_list zs                      (math)
== sum_list [] + sum_list zs            (eval sum_list in reverse)

case xs == y::ys:
Must Prove: sum_list (concat (y::ys) zs) == sum_list (y::ys) + sum_list zs
Induction Hypothesis: sum_list (concat ys zs) == sum_list ys + sum_list zs
Proof:
sum_list (concat (y::ys) zs)
== sum_list (y::(concat ys zs))         (eval concat)
== y + sum_list (concat ys zs)          (eval sum_list)
== y + sum_list ys + sum_list zs        (IH)
== sum_list (y::ys) + sum_list zs       (eval sum_list in reverse)
QED!
Hint: You will notice above that we wrote down what we had to prove concretely for each case before we started (in the first case, we substituted [] for xs in the theorem and in the second case, we substituted y::ys for xs in the theorem). This is often a helpful trick.
Hint: You may start to see a pattern with some of these proofs. A common idiom is to use some amount of evaluation and/or math initially. Then use the induction hypothesis. Then use some amount of evaluation in reverse to sew things back together in the correct form.
Hint: You do not have to figure out a proof from the left to the right. You can start on the right and prove equalities, trying to come closer to the left-hand side. You can start on the left and prove equalities, trying to come closer to the right-hand side. You can start from both sides and try to meet in the middle. Once you have figured it out, if you have made a mess, you should clean it up and present it nicely as I have done in these notes.
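The theorem about sum_list and concat can also be checked on sample inputs. The sketch below writes concat so that it matches on its first argument and returns an int list; the predicate holds and the sample lists are introduced here for illustration only.

```ocaml
let rec sum_list (bs:int list) : int =
  match bs with
  | [] -> 0
  | hd::tail -> hd + sum_list tail

let rec concat (bs:int list) (cs:int list) : int list =
  match bs with
  | [] -> cs
  | hd::tail -> hd::(concat tail cs)

(* The statement of the theorem, specialised to one pair of lists. *)
let holds (xs:int list) (zs:int list) : bool =
  sum_list (concat xs zs) = sum_list xs + sum_list zs

(* A few arbitrary sample lists, including the empty list. *)
let samples = [[]; [1]; [2; 3]; [-4; 0; 9]]

let all_hold : bool =
  List.for_all (fun xs -> List.for_all (fun zs -> holds xs zs) samples) samples
```

The empty list appears among the samples so that both cases of the proof, xs == [] and xs == y::ys, are exercised.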
Summary 1: Reasoning in General
In this note, we looked at how to do proofs about functional programs. More specifically, we looked at how to prove that two different functional programs are equivalent. These kinds of proofs use the following basic rules about equivalence:
 reflexivity: for any expression e, e == e
 symmetry: for any expressions e1 and e2, if e1 == e2 then e2 == e1
 transitivity: for any expressions e1, e2 and e3, if e1 == e2 and e2 == e3 then e1 == e3
 congruence: for any expressions e1, e2 and e, if e1 == e2 then e[e1/x] == e[e2/x]
 alpha-equivalence: for any expressions e1 and e2, if e1 differs from e2 only by consistent renaming of variables then e1 == e2
 syntactic sugar: a let declaration (or expression) such as this one:
let f x = e
is equivalent to an ordinary let declaration (or expression) for an anonymous function:
let f = fun x -> e
 eval: for any expressions e1 and e2, if e1 --> e2 then e1 == e2
 valuability: any equational property of values also holds of corresponding valuable expressions. For instance, for any valuable expression e and any function f defined by
let f x = body
we know that f e == body[e/x].
Summary 2: Reasoning About Lists
In addition to using these general rules of equivalence, we also looked at the structure of proofs about list-processing functions. These list-processing proofs were proofs by induction on the structure of the lists that were processed. In order to complete such a proof, it is typically necessary to use the induction hypothesis, which is a statement of the theorem being proven applied to a strictly smaller list than the list about which the theorem is being proven.
A key thing to remember is the general template for doing proofs about lists. When showing that a property P holds of all lists, the proof structure usually looks like the following:
Proof: By induction on the structure of the list xs.
case xs == []:
  To show: P([])
  ... proof that property P holds of [] ...
case xs == y::ys:
  To show: P(y::ys)
  IH: P(ys)
  ... proof that property P holds of y::ys ...
  ... uses the induction hypothesis, which states that P holds of ys ...
Keep in mind that, just like when one is programming with lists, when one is proving with lists there are some other ways to break down the cases. On occasion, for instance, one uses cases for [], [y], and y1::y2::ys, and the induction hypothesis is used over ys in the last case. The rule of thumb is that the induction hypothesis can only be used over strictly smaller lists than the input list.
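As an illustration of the three-case breakdown, here is a hypothetical function every_other (not from these notes) whose recursion skips two elements at a time. A proof about it would naturally use the cases [], [y] and y1::y2::ys, with the induction hypothesis applied to ys.

```ocaml
(* Keeps the elements at positions 0, 2, 4, ... of the input list. *)
let rec every_other (xs:int list) : int list =
  match xs with
  | [] -> []
  | [y] -> [y]
  | y1 :: _ :: ys -> y1 :: every_other ys

(* A property one might prove with the three-case template:
   the result has (n + 1) / 2 elements when the input has n. *)
let length_property (xs:int list) : bool =
  List.length (every_other xs) = (List.length xs + 1) / 2
```

In the y1::y2::ys case, the recursive call is on ys, which is two elements shorter than the input, so using the induction hypothesis on ys respects the rule of thumb above.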