
Acknowledgement: The materials in this lecture were derived from course materials created by Bob Harper and others at CMU for course number 15-150.

Reasoning About Programs

In the last note, we discussed the operational semantics of simple ML programs. This operational semantics provides us with a means to prove some simple things about the programs we write. For example:

Theorem: The expression let x = 1 in let y = x + 2 in x + y evaluates to the value 4.

Proof: By the definition of the operational semantics of ML:

    let x = 1 in let y = x + 2 in x + y
--> let y = 1 + 2 in 1 + y
--> let y = 3 in 1 + y
--> 1 + 3
--> 4
QED.

That was a good proof (of a pretty easy theorem). Each line was justified by the operational semantics of ML. Evaluating expressions using the operational semantics is always going to be at the heart of our proofs, but we need a few more ideas to prove interesting results about our programs. In particular, rather than simply proving that a function operates correctly on a single input (by evaluating the function as applied to that input using the operational semantics), we will look at proofs that show that a function operates correctly on all inputs. These latter kinds of proofs allow us to show once and for all, for instance, that a sorting function properly sorts every list we supply it. In general, we need to be able to reason that small parts of our programs (like the sorting function) behave correctly on all inputs. Then we will build up assurance that our overall application, which uses these many small pieces, behaves correctly.

Review: Substitution Notation

Recall the substitution notation that we introduced in the previous note. In general, the notation looks like this:

e1[e2/x]

It stands for the expression you get when you systematically replace all free occurrences of the variable x in e1 with e2. For example if we have an expression like (x + 7*x) and we systematically replace x with (2+3), we wind up with ((2+3) + 7*(2+3)). In other words:

(x + 7*x)[2+3/x]   ==   ((2+3) + 7*(2+3))

As another example:

  (let y = 13 + x in let x = 2 + y in x + y)[7/x]
==
   let y = 13 + 7 in let x = 2 + y in x + y

We do not replace the 2nd occurrence of x with 7 because that second occurrence of x refers to the x bound by the inner let statement. In other words, it refers to 2 + y.

Substitution is a critical operation because it defines how functions and let statements and pattern matching execute, and when it comes down to it, all our reasoning about programs comes back to reasoning about how those programs execute.
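To make substitution concrete, here is a small OCaml sketch of e1[e2/x] for a toy expression language with let-bindings. The type and function names (expr, subst) are our own inventions for illustration, and this simple version does not handle variable capture (renaming bound variables when e2 contains free variables), which a full treatment of substitution must.

```ocaml
(* A toy expression language with let-bindings, to illustrate e1[e2/x]. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Let of string * expr * expr  (* Let (x, e1, e2) represents: let x = e1 in e2 *)

(* subst e v x computes e[v/x]: replace the free occurrences of x in e with v *)
let rec subst (e : expr) (v : expr) (x : string) : expr =
  match e with
  | Int _ -> e
  | Var y -> if y = x then v else e
  | Add (e1, e2) -> Add (subst e1 v x, subst e2 v x)
  | Let (y, e1, e2) ->
    (* always substitute in e1; substitute in e2 only if the let does not
       rebind x, since a rebound x refers to the inner binding (shadowing) *)
    Let (y, subst e1 v x, if y = x then e2 else subst e2 v x)

(* (let y = 13 + x in let x = 2 + y in x + y)[7/x] *)
let example =
  subst
    (Let ("y", Add (Int 13, Var "x"),
          Let ("x", Add (Int 2, Var "y"), Add (Var "x", Var "y"))))
    (Int 7) "x"
(* example is: let y = 13 + 7 in let x = 2 + y in x + y,
   matching the shadowing example above *)
```

Notice that the second occurrence of x is left alone precisely because the inner let rebinds it.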

Equational Reasoning

The kind of reasoning we will be performing about our programs is called equational reasoning. In other words, we will be proving that one expression is equal to another. For example, we might want to prove that for any list l,

l == reverse (reverse l)
In other words, for any list l, the reverse of the reverse of a list l is equal to l.

What does it mean for two expressions to be equal to one another?

Definition [Equivalent Expressions]: Two expressions are equivalent if and only if

  • they both evaluate to the same value, or
  • they both loop forever, or
  • they both raise the same exception.

In order to actually prove complex expressions are equal to one another, we can rely on the following facts about expression equivalence.

  1. reflexivity: for any expression e, e == e
  2. symmetry: for any expressions e1 and e2, if e1 == e2 then e2 == e1
  3. transitivity: for any expressions e1, e2 and e3, if e1 == e2 and e2 == e3 then e1 == e3
  4. congruence: for any expressions e1, e2 and e, if e1 == e2 then e[e1/x] == e[e2/x]
  5. alpha-equivalence: for any expressions e1 and e2, if e1 differs from e2 only by consistent renaming of variables then e1 == e2
  6. syntactic sugar: a let declaration such as this one:
    let f x = e;;
    
    is equivalent to an ordinary let declaration for an anonymous function:
    let f = fun x -> e;;
    
  7. eval: for any expressions e1 and e2 if e1 --> e2 then e1 == e2
In addition to the above general properties of expression equivalence, we will also use well-known properties of built-in ML operations. For example, addition is commutative so:
x + y == y + x
no matter what x and y are.

Here are some simple examples of equivalent expressions using the rules above, with justifications to their right. We will refer to previous things we have proven using the numbers (1), (2), (3), etc. We write "eval" when expressions are equivalent by evaluation. We write "alpha" when they are equivalent by alpha-conversion.

(1)  1 == 1                                        (reflexivity)

(2)  let x = 3 in x + x == let x = 3 in x + x      (reflexivity)

(3)  1 + 2 == 3                                    (eval)

(4)  3 == 1 + 2                                    ((3), symmetry)

(5)  -5 + 8 == 3                                   (eval)

(6)  -5 + 8 == 1 + 2                               ((4), (5), transitivity)

(7)  let x = 1 in x == let y = 1 in y              (alpha)

(8)  1 + 2 == (fun x -> 1 + x) 2                   (eval, symmetry)

(9) let f x = x+1 in f 3
     == let g = fun y -> y+1 in g 3                (alpha (twice), syntactic sugar)


(10) (fun x -> 1 + x) 2 == -5 + 8                  ((6), (8), transitivity, symmetry)

(11) (-5 + 8) * y == (1 + 2) * y                   ((6), congruence)
The last equivalence is the most fun. It uses the congruence rule and it considers an expression with a free variable in it -- the variable y. The reasoning goes like this: since we know that (-5 + 8) is equivalent to (1 + 2) (by line 6), we can substitute those equivalent expressions into some other bigger expression (in this case, substitute them for x in the expression x * y) and get an equivalent expression. Notice that it does not matter that we do not know what y is. No matter what value it takes on, the two expressions are equivalent. Here is another example of using congruence:
   f (z + 0) == f z
In the case above, we know that z is equivalent to z + 0 as this is a property of the built-in addition operation. Hence, by congruence, we can substitute one expression for the other. (It does not matter what z is or what the function f does.)

Finally, let's look at an ever so slightly more interesting theorem involving the easy function defined below.

let easy (x:int) (y:int) (z:int) : int =
  x * (y + z)
;;

Theorem: for all integer values a, b, c, easy a b c == easy a c b.

Proof: By simple equational reasoning.

   easy a b c
== a * (b + c)        (eval)
== a * (c + b)        (property of +, congruence)
== easy a c b         (eval, symmetry)

QED

Above, through a series of steps, and use of transitivity, we conclude that easy a b c == easy a c b. Notice that when we moved from the second-to-last line to the last line of the proof, we performed "evaluation in reverse." In other words, a * (c + b) is the body of the easy function (with a, c and b substituted for x, y, and z) and we claimed that was equivalent to easy a c b. Evaluation in reverse is justified by evaluation in the normal direction plus the rule of symmetry.

Now that you have the basics down, we can start to move a little faster. We will cease to mention uses of basic rules such as symmetry, transitivity or congruence and only mention the "interesting" parts of the proof. For instance, we will streamline the proof above as follows:

Proof: By simple equational reasoning.

   easy a b c
== a * (b + c)        (eval)
== a * (c + b)        (math)
== easy a c b         (eval)

QED
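The theorem holds for all integers, which no finite test can establish, but a quick executable spot-check is a useful sanity check before attempting a proof. This sketch simply re-runs the easy function on a few sample triples:

```ocaml
let easy (x : int) (y : int) (z : int) : int =
  x * (y + z)

(* spot-check easy a b c == easy a c b on a few sample inputs;
   only the equational proof covers all integers *)
let () =
  List.iter
    (fun (a, b, c) -> assert (easy a b c = easy a c b))
    [ (0, 1, 2); (3, -4, 5); (-7, 8, -9) ]
```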

Even though we have slimmed down our proofs a little, we will always write our proofs in 2-column style. The first column includes the facts that we have proven. The second column has a justification. The justification explains why we were able to conclude that line was true. Proofs in this style are very easy to read and hence are easier to get right than other styles. Sometimes we want to refer to lines of the proof that appear earlier (ie: not just the previous line). In this case, we will number our lines. For example:

Proof: By simple equational reasoning.

(1)      easy a b c
(2)   == a * (b + c)        (eval)
(3)   == a * (c + b)        (math)
(4)   == easy a c b         (eval)

QED

The Value of Values

Before we can move on to doing some really interesting proofs, there is one more basic concept that we must learn. As you know, the operational semantics for ML programs state that you may call a function, substituting the arguments for the parameters provided those arguments have been evaluated to a value.

let inc (x:int) : int = x + 1;;
For instance, we know that:
inc 3 --> 3 + 1
      --> 4
And consequently, we may conclude that inc 3 == 4 by the (eval) rule for expression equivalence. But what about the expression inc (y + 3) where y is a free integer variable? We might like to reason as follows:
   inc (y + 3) 
== (y + 3) + 1 
== y + 4
Indeed, this sort of proof is valid. It is valid because we know that no matter what value we substitute for y, y + 3 will evaluate to some other value (call it v) and then we can apply inc to v and continue.
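This reasoning can be spot-checked by running inc on a few sample instantiations of y (a sanity check on particular values, not a substitute for the argument above, which covers every integer y):

```ocaml
let inc (x : int) : int = x + 1

(* check inc (y + 3) == y + 4 on a few sample values of y *)
let () =
  List.iter (fun y -> assert (inc (y + 3) = y + 4)) [ -2; 0; 5; 100 ]
```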

That kind of reasoning leads one to believe that perhaps it is the case that if you have a function and an argument expression to that function then you can always substitute the argument expression for the parameter in the body of the function. In other words, one might conjecture that the following equational rule is true:

  • Bad Eval Rule: For any expression exp and for any function f defined as follows
    let f x = body;;
    
    The following expressions are equivalent
    f exp == body[exp/x]
    

Since I called the rule the Bad Eval Rule, you have a pretty big clue that the above rule is NOT a valid equivalence rule. Here is a counter-example. Consider the function forever, which loops forever, and the function one, which, when called, always returns the value 1:

let rec forever () : int = forever ();;

let one (x:int) : int = 1;;
Now, using the Bad Eval Rule we can immediately prove that:
one (forever ()) == 1
But that is obviously incorrect because the left-hand side of the equation loops forever and never returns a value whereas the right-hand side returns the value 1 immediately. According to our basic definition of equivalence, the two are not equal.

Clearly, the problem with the Bad Eval Rule is that it allows any expression to be substituted for a function argument. If it was a bit more restrictive and only allowed for substitution of those function argument expressions that are guaranteed to always terminate and produce a value then it would be a good rule.

Definition [Valuable Expression] An expression that always terminates and produces a value is called a valuable expression (no pun intended).

The following expressions are all examples of valuable expressions:

  • constants: any constant (eg: 1, 2, "hello", 3.5, 'a', [], etc.) is valuable
  • values: any value (eg: (2,3), [14], None, fun x -> x + x) is valuable
  • variables: any variable x is valuable because such variables will be replaced by values when the code is executed.
  • pairs: if exp1 and exp2 are both valuable expressions then pair expression (exp1, exp2) is also valuable
  • lists: if exp1 and exp2 are both valuable expressions then list expression exp1::exp2 is also valuable
  • any other data type constructor: if exp is a valuable expression and C is a data type constructor (perhaps for an option or a tree or a set or a table or a shape or anything else) then C exp is also valuable
  • if statements: an if statement if exp1 then exp2 else exp3 is valuable if exp1, exp2 and exp3 are all valuable expressions. (Pattern-matching statements are similar, provided they are not missing any pattern-matching branches.)
  • total functions: if f is a total function and exp is a valuable expression then f exp is a valuable expression as well.

In the last case, we referred to the idea of a total function.

Definition [Total Function] A total function is a function that when supplied with any argument (of the correct type) always terminates and returns a value.

Definition [Partial Function] A partial function is a function that, when supplied with some argument (of the correct type), will fail to terminate or will raise an exception.

For example, the function inc defined above is a total function --- it always terminates and adds one. Many of the built-in operations such as +, -, and ^ are total functions. However, some are not. For instance, the function List.hd, which returns the first element of a list, is a partial function as it raises an exception when its argument is the empty list. In general, we discourage you from using partial functions like List.hd, because they are harder to reason about -- you have to be careful that you don't miss a case. Using pattern matching (with an exhaustive set of cases) is preferable. Of course, one must use judgement. There are some effective uses of exceptions --- one just does not want to overuse them.
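As a concrete alternative to the partial List.hd, a common idiom is to make the function total by returning an option. The name safe_hd below is our own choice for this sketch:

```ocaml
(* A total alternative to the partial List.hd: return None on the empty
   list instead of raising an exception. *)
let safe_hd (xs : 'a list) : 'a option =
  match xs with
  | [] -> None
  | hd :: _ -> Some hd
```

Callers must then pattern match on the option, which forces them to handle the empty-list case explicitly.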

Recursive functions can be total functions. For instance, a list-processing function like List.length is total -- it will terminate and produce an integer when given any list. Here is the definition of List.length:

let rec length (xs:'a list) : int =
  match xs with
    [] -> 0
  | hd::tail -> 1 + length tail
;;

We can tell that length is a total function because:

  • It does not itself call any partial functions.
  • It does not have any incomplete pattern matching.
  • Whenever length is called recursively, it does so on a strictly smaller argument (and there are no infinite sequences of lists that get smaller and smaller forever). In particular, in the second branch of the match statement, the recursive call length tail involves an argument to length (ie: tail) that is strictly smaller than the input, which is hd::tail.

Consequently, length terminates on all inputs, never raises an exception and always returns a value.

The following are examples of partial functions.

let rec forever (xs:'a list) : int =
  match xs with
    [] -> 0
  | hd::tail -> 1 + forever (1::tail)
;;

let rec oops (xs:'a list) (acc:int) (num:int) : int =
  match xs with
    [] -> acc / num
  | hd::tail -> oops tail (acc+hd) (num+1)
;;

Above, the function forever loops forever because we pass the recursive call an argument that is not shorter than the input list. Hence, forever is a partial function. (Note, it does terminate in one case -- when given the empty list -- but it must terminate in all cases to be total.) The function oops has a subtler problem. It might raise a divide-by-zero error if the denominator num turns out to be zero in the base case of the recursion. Therefore, oops is not total.

As an aside, one reason I dislike the design of Java is that most interesting expressions in Java, like method calls and field dereferences, are partial (not total) and might raise an exception because Java's type system does not distinguish between null and non-null values (like ML's type system distinguishes between values with option type and non-option type). In contrast, most (or at least many, if you follow the style advocated in this class) functions in ML are total. This is one of the reasons that Java is substantially harder to reason about than ML in theory, and in practice you see the same thing: lots of null pointer dereference bugs show up in real applications.

Finally, we can now state our good, valuable eval rule:

  • Eval Value Rule: For any valuable expression exp and for any function f
    let f x = body;;
    
    The following expressions are equivalent
    f exp == body[exp/x]
    

More generally, any property of values also holds for the corresponding valuable expressions. For instance, for any valuable expressions exp1 and exp2, we know all of the following equations hold (and lots more to boot):

arithmetic equations:

exp1 + exp1 == 2*exp1

or

0*exp1 == 0

(And many other such equations.)

let evaluation:

let x = exp1 in exp == exp[exp1/x]

pair evaluation:

let (x,y) = (exp1,exp2) in exp == exp[exp1/x][exp2/y]

option evaluation:

   match (Some exp1) with None -> exp3 | Some x -> exp4 
== 
   exp4[exp1/x]

list evaluation:

   match (exp1 :: exp2) with [] -> exp3 | hd::tail -> exp4 
== 
   exp4[exp1/hd][exp2/tail]

data evaluation: For any constructor C:

   (match (C exp1) with 
     C x -> exp3 
   | D x -> exp4 
   | ...)
== 
   exp3[exp1/x]

Feel free to use any such equations in your proofs and refer to them as "eval value rules" or "valuable expression rules". To reemphasize: you can manipulate a valuable expression in your proof just like you manipulate any ordinary value (like a number or string or a pair).
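Although these are reasoning principles rather than code, a couple of them can be sanity-checked at the top level by picking a particular valuable expression for exp1 (here, the expression 3 + 4):

```ocaml
(* instantiate exp1 as the valuable expression 3 + 4 *)
let exp1 = 3 + 4

(* arithmetic equation: exp1 + exp1 == 2 * exp1 *)
let () = assert (exp1 + exp1 = 2 * exp1)

(* option evaluation: matching Some exp1 against Some x and returning
   x * x should equal (x * x)[exp1/x], i.e. exp1 * exp1 *)
let () =
  assert ((match Some exp1 with None -> 0 | Some x -> x * x) = exp1 * exp1)
```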

An Example Proof Using Valuable Expressions

Recall the length function on lists:

let rec length (xs:int list) : int =
  match xs with 
    [] -> 0
  | hd::tl -> 1 + length tl
;;

Now, assume that functions f and g are total with type int -> int and prove the following statement for any integers m and n:

length (f m :: g n :: []) == 2

Note first that (f m) is valuable since f is total. Likewise (g n) is valuable since g is total. Consequently, both (f m :: g n :: []) and (g n :: []) are valuable since a list is valuable iff it is constructed exclusively from valuable expressions. With those facts in mind, here is the proof:

   length (f m :: g n :: [])

== match (f m :: g n :: []) with 
     [] -> 0 
   | hd::tl -> 1 + length tl                (by eval value
                                             since f m :: g n :: [] valuable)

== 1 + length (g n :: [])                   (by eval value
                                             since f m :: g n :: [] valuable)

== 1 + (match (g n :: []) with 
          [] -> 0 
        | hd::tl -> 1 + length tl)          (by eval value
                                             since g n :: [] valuable)

== 1 + (1 + length [])                      (by eval value
                                             since g n :: [] valuable)

== 1 + (1 + (match [] with 
               [] -> 0 
             | hd::tl -> 1 + length tl))    (by eval -- [] is a value)

== 1 + (1 + (0))                            (by eval -- [] is a value)

== 2                                        (by math)
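The same claim can be exercised by running the code with particular total functions plugged in for f and g. The definitions of f and g below are our own sample choices; any total functions of type int -> int would do:

```ocaml
let rec length (xs : 'a list) : int =
  match xs with
  | [] -> 0
  | _ :: tl -> 1 + length tl

(* sample total functions standing in for the f and g of the theorem *)
let f (x : int) : int = x + 1
let g (x : int) : int = 2 * x

(* length (f m :: g n :: []) == 2 for these particular m and n *)
let () = assert (length (f 3 :: g 4 :: []) = 2)
```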

A Proof about a List-Processing Function

It was easy to prove a property of all inputs to the easy function primarily because it was not a recursive function. When functions are recursive we can't just use a small number of evaluation steps to reach any interesting conclusions. List-processing functions are a good source of recursion, so we will explore some proofs about them.

As an example, let's consider the function double below.

let rec double (xs:int list) : int list =
  match xs with
    [] -> []
  | hd::tail -> hd::hd::(double tail)
;;
Now let's prove this theorem:

Theorem: for all integer lists xs, length (double xs) == 2 * (length xs).

To do so, recall the definition of lists: Every list xs is either:
  • []: the empty list
  • y::ys -- a list with at least one element (y) and a tail (ys)
In order to perform the proof, we will split the proof into two cases -- one case for the empty list and one case for a list with at least one element. We will do each case of the proof separately. (When we have completed both cases, we will have covered all possible kinds of lists.) The detailed proof of each case mostly involves doing the kind of equational reasoning we have done before. However, there is one addition: when we do the proof for lists y::ys, we will assume that our theorem holds for the smaller list ys. In other words, we will assume that:
length (double ys) == 2 * (length ys)
when we are trying to prove that
length (double (y::ys)) == 2 * (length (y::ys))
This assumption is called the induction hypothesis (IH). Whenever we do a proof about a recursive function, we will rely on such an induction hypothesis.

Ok, here goes the first case of the proof. In this case, we are considering the possibility that xs is the empty list []. So we replace xs with [] in the theorem. Then we try to prove that the left-hand side of the theorem is equal to the right-hand side of the theorem. (Notice that when we start our proof, we describe the methodology that we use to do the proof -- in this case, a proof by induction on the structure of lists.)

Proof: By induction on the structure of lists.

case xs == []:

(1)      length (double [])

(2)   == length (match [] with [] -> [] | hd::tail -> hd::hd::double tail)
                                  (eval double)

(3)   == length []                (eval match expression)
(4)   == 0                        (eval length)
(5)   == 2 * 0                    (math)
(6)   == 2 * (length [])          (eval length in reverse)

We succeeded on the first case. Line (1) is the left-hand side of the theorem with xs replaced by [] and the last line (6) is the right-hand side of the theorem with xs replaced by []. Every line in the middle is justified by simple equational reasoning.

Let's move on to the second case where xs == y::ys for some values y and ys. In this proof, we are going to use 2 well-known properties of the length function. For any list expression e and any element y, we know that:

length (y::e) == 1 + length e
Hence, by applying that equation twice in a row, we also know that if we have two elements on the front of the list (say y1 and y2), we can conclude:
length (y1::y2::e) == 1 + 1 + length e
                   == 2 + length e

On to the proof:

case xs == y::ys:

(1)      length (double (y::ys))

(2)   == length (match y::ys with [] -> [] | hd::tail -> hd::hd::(double tail))       
                                         (eval double since y::ys is valuable)

(3)   == length (y::y::(double ys))      (eval match expression)
(4)   == 2 + length (double ys)          (property of length)
(5)   == 2 + 2*(length ys)               (IH)
(6)   == 2*(1 + length ys)               (math)
(7)   == 2*(length (y::ys))              (property of length)

Success! We showed that the left-hand side of the theorem we were trying to prove (with y::ys replacing xs) was equivalent to the right-hand side of the theorem (also with y::ys replacing xs). The key step was lines (4) to (5) where we used the induction hypothesis. Isolating those two lines, we showed:

(4)      2 + length (double ys)          
(5)   == 2 + 2*(length ys)               (IH)
We were able to do that since the induction hypothesis in this case specifically states that
length (double ys) ==  2*(length ys)
Hint: When doing your proofs, it often helps for you to write down all the available information, so we recommend that you always write down the inductive hypothesis that you are allowed to use before starting a particular case of a proof.
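The theorem we just proved can also be exercised mechanically. This spot-check runs double on a handful of sample lists; it is the induction proof above, not the test, that establishes the result for all lists:

```ocaml
let rec double (xs : int list) : int list =
  match xs with
  | [] -> []
  | hd :: tail -> hd :: hd :: double tail

(* check length (double xs) == 2 * length xs on a few sample lists *)
let () =
  List.iter
    (fun xs -> assert (List.length (double xs) = 2 * List.length xs))
    [ []; [ 1 ]; [ 1; 2; 3 ]; [ 0; 0; 0; 0 ] ]
```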

A General Schema for Proofs About Lists

As we saw in the past, when we program with lists we often find ourselves writing functions with the form:

let rec f xs =
  match xs with
    [] ->  ... no recursive calls ...
  | hd::tail -> ... f tail ...
The double function was one example. (Note that f may be called on tail zero, one, two, or more times in the recursive branch. It does not have to be called exactly once for the pattern to work.)

Likewise, proofs about lists also have a typical form. Suppose we have a theorem of the form: "for all lists xs, some property P holds of xs". Property P might be a property about a function like f (or double) applied to the list xs. In such situations, we typically use induction and our proofs will be structured as follows:

case xs == []:

  ... must show that property P holds of [] ...

case xs == y::ys:

  ... must show that property P holds of y::ys ...
  ... this case uses the induction hypothesis, which states that P holds of ys ...

Another Example

Consider the functions sum_list and concat:

let rec sum_list (bs:int list) : int =
  match bs with
    [] ->  0
  | hd::tail -> hd + sum_list tail
;;

let rec concat (bs:int list) (cs:int list) : int list =
  match bs with
    [] ->  cs
  | hd::tail -> hd::(concat tail cs)
;;

Now a theorem:

Theorem: for all integer lists xs and zs, sum_list (concat xs zs) == sum_list xs + sum_list zs

We will prove this theorem by induction. Since there are two lists involved (xs and zs), we should indicate the list that we will be analyzing by cases --- formally speaking, "the list over which we perform the induction." We will perform induction over the list xs. We will also use the fact that concat is a total function and therefore that concat exp is valuable for any valuable expression exp. Recall that a total function is any function that terminates and produces a value given any input.

case xs == []:

  Must Prove:   
     sum_list (concat [] zs) == sum_list [] + sum_list zs
   
  Proof:  
     sum_list (concat [] zs)
  == sum_list zs                 (eval concat)
  == 0 + sum_list zs             (math)
  == sum_list [] + sum_list zs   (eval sum_list in reverse)  

case xs == y::ys:

  Must Prove:   
     sum_list (concat (y::ys) zs) == sum_list (y::ys) + sum_list zs
  
  Induction Hypothesis:
     sum_list (concat ys zs) == sum_list ys + sum_list zs

  Proof:  
     sum_list (concat (y::ys) zs)
  == sum_list (y::(concat ys zs))   (eval concat -- y::ys is valuable)
  == y + sum_list (concat ys zs)    (eval sum_list -- y::(concat ys zs) is valuable 
                                                      because concat is total)
  == y + sum_list ys + sum_list zs  (IH)
  == sum_list (y::ys) + sum_list zs (eval sum_list in reverse)

QED!
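As with the double theorem, the result can be spot-checked by running the (corrected, self-contained) definitions on a few sample pairs of lists; the induction proof is what covers all lists:

```ocaml
let rec sum_list (bs : int list) : int =
  match bs with
  | [] -> 0
  | hd :: tail -> hd + sum_list tail

let rec concat (bs : int list) (cs : int list) : int list =
  match bs with
  | [] -> cs
  | hd :: tail -> hd :: concat tail cs

(* check sum_list (concat xs zs) == sum_list xs + sum_list zs on samples *)
let () =
  List.iter
    (fun (xs, zs) ->
      assert (sum_list (concat xs zs) = sum_list xs + sum_list zs))
    [ ([], [ 1; 2 ]); ([ 3 ], []); ([ 1; 2 ], [ 3; 4; 5 ]) ]
```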

Hint: You will notice above that we wrote down what we had to prove concretely for each case before we started (in the first case, we substituted [] for xs in the theorem and in the second case, we substituted y::ys for xs in the theorem). This is often a helpful trick.

Hint: You may start to see a pattern with some of these proofs. A common idiom is to use some amount of evaluation and/or math initially. Then use the induction hypothesis. Then use some amount of evaluation in reverse to sew things back together in the correct form.

Hint: You do not have to figure out a proof from the left to the right. You can start on the right and prove equalities, trying to come closer to the left-hand side. You can start on the left and prove equalities, trying to come closer to the right-hand side. You can start from both sides and try to meet in the middle. Once you have figured it out, if you have made a mess, you should clean it up and present it nicely as I have done in these notes.

Summary 1: Reasoning in General

In this note, we looked at how to do proofs about functional programs. More specifically, we looked at how to prove that two different functional programs are equivalent. These kinds of proofs use the following basic rules about equivalence:
  1. reflexivity: for any expression e, e == e
  2. symmetry: for any expressions e1 and e2, if e1 == e2 then e2 == e1
  3. transitivity: for any expressions e1, e2 and e3, if e1 == e2 and e2 == e3 then e1 == e3
  4. congruence: for any expressions e1, e2 and e, if e1 == e2 then e[e1/x] == e[e2/x]
  5. alpha-equivalence: for any expressions e1 and e2, if e1 differs from e2 only by consistent renaming of variables then e1 == e2
  6. syntactic sugar: a let declaration (or expression) such as this one:
    let f x = e;;
    
    is equivalent to an ordinary let declaration (or expression) for an anonymous function:
    let f = fun x -> e;;
    
  7. eval: for any expressions e1 and e2 if e1 --> e2 then e1 == e2
  8. valuability: Any equational property of values also holds of corresponding valuable expressions.
A particularly important special case of the valuability rule is the rule that states that we may substitute any valuable argument applied to a function into the body of the function. In other words, for any valuable expression e, and function f defined as let f x = body;; we know that f e == body[e/x].

Summary 2: Reasoning About Lists

In addition to using these general rules of equivalence, we also looked at the structure of proofs about list-processing functions. These list-processing proofs were proofs by induction on the structure of the lists that were processed. In order to complete such a proof, it is typically necessary to use the induction hypothesis, which is a statement of the theorem being proven applied to a strictly smaller list than the list about which the theorem is being proven.

A key thing to remember is the general template for doing proofs about lists. When showing that a property P holds of all lists, the proof structure usually looks like the following:

Proof:  Proof by induction on the structure of the list xs.

case xs == []:

  To show: P([])

  ... proof that property P holds of [] ...

case xs == y::ys:

  To show: P(y::ys)

  IH: P(ys)

  ... proof that property P holds of y::ys ...
  ... uses the induction hypothesis, which states that P holds of ys ...

Keep in mind that, just as when one is programming with lists, when one is proving with lists there are some other ways to break down the cases (eg: on occasion, one uses cases for [], [y], and y1::y2::ys, with the induction hypothesis used over ys in the last case). The rule of thumb is that the induction hypothesis can only be used over lists strictly smaller than the input list.