Problem Set 4: Proofs and Pattern Matching

In this assignment, you will explore OCaml's execution model in further depth by extending the evaluator we looked at in class with pattern matching. You will also practice proving your functional programs are correct. You will do this assignment individually.

Getting Started

Download the code here. Unzip and untar it:

$ tar xfz a4.tgz
You should see the following files:

  •,, and
  • Makefile
  • proofs.txt

A few important things to remember before you start:

  • This assignment must be done individually.
  • As in the previous assignment, all of your programs must compile. Programs that do not compile will receive an automatic zero. Make sure that the functions you are asked to write have the correct names and the number and type of arguments specified in the assignment.
  • In this problem set, it is important to use good style (style will factor into your grade). Style is naturally subjective, but the COS 326 style guide provides some good rules of thumb. As always, think first; write second; test and revise third. Aim for elegance.

Part 1: Pattern Matching

The goal of this part of the assignment is to extend the interpreter for core OCaml explained in class with data types and pattern matching facilities. To prepare for this part of the assignment, read the online notes on OCaml operational semantics, substitution and scoping.

In addition, before tackling this problem, you should be sure that you understand how substitution and evaluation work in the simpler interpreters. You do not have to understand the code that converts expressions into strings for the purpose of printing them, though of course you will want to use it to facilitate debugging. (Moreover, if you ever want to build your own little language in OCaml, this is very handy reference code to show you how to print well-formatted expressions.) Working code for the simpler interpreters that you can play with has been included in the code bundle that you downloaded. The simpler, but correct, fully-implemented interpreters cover the following features:

  • Integers and let expressions:
  • Adding booleans and non-recursive functions:
  • Adding recursive functions:

We encourage you to use the top-level interpreter to explore a couple of these files. Try running the example programs through the interpreters given and then printing out the results. Try creating a few of your own simple programs by using the exp data type constructors. Then try running those through the interpreter. Feel free to ask questions in precept about the details of how these interpreters work.

Representing Data Types and Patterns

To represent an extended language with data types and pattern matching, we use the following data type definition.

type exp = 
  | Constant_e of constant
  | Op_e of exp * operator * exp
  | Var_e of variable
  | Fun_e of variable * exp
  | FunCall_e of exp * exp
  | Let_e of variable * exp * exp
  | Letrec_e of variable * exp * exp
  | Data_e of constructor * (exp list)        (* New! *)
  | Match_e of exp * ((pattern * exp) list)   (* New! *)

(* Patterns are either constants, variables, datatype constructors
 * applied to pattern arguments, or the underscore. *)
and pattern = 
  | Constant_p of constant
  | Var_p of variable
  | Data_p of constructor * (pattern list)
  | Underscore_p

To create (construct) a data type, we use the form:

Data_e (d,[e1;e2;...;en])
This creates a data value with the constructor named d and arguments e1, ..., en. (This is really a lot like combining a standard OCaml data type constructor, which always takes one argument, with a tuple.) For example, if you wanted to represent booleans true and false using data types, you would represent them using:
Data_e ("true",[])         (* represents true *)

Data_e ("false",[])        (* represents false *)
The arguments are just the empty list of arguments since true and false do not "carry" any additional values. If instead, you wanted to represent the elements of an option type, you would represent them as follows:
Data_e ("None", [])        (* represents None *)

Data_e ("Some", [e])       (* represents (Some e) *)
If you wanted to represent elements of a list, you would use:
Data_e ("Nil", [])         (* represents the empty list *)

Data_e ("Cons", [hd; tl])  (* represents hd::tl for some hd and tl *)

(* represents 2 and 3 respectively *)
let two = Constant_e (Int 2);;
let three = Constant_e (Int 3);;

(* represents 2::3::[] *)
let two_three =
  Data_e ("Cons", [two; 
    Data_e ("Cons", [three;
      Data_e ("Nil", [])])]) 
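Writing nested Cons cells by hand gets tedious for longer lists. The sketch below shows one way to build the same encoding from an ordinary OCaml int list. The helper exp_of_int_list is hypothetical and not part of the provided code, and the definitions of constant and constructor are assumptions inferred from how they are used above; exp is trimmed down to the two constructors the sketch needs.

```ocaml
(* Trimmed-down copy of the assignment's types: only the constructors
   used in this sketch. The definitions of constructor and constant
   are assumptions based on how they are used in the examples. *)
type constructor = string
type constant = Int of int

type exp =
  | Constant_e of constant
  | Data_e of constructor * (exp list)

(* Hypothetical helper: encode an OCaml int list as nested Cons/Nil data. *)
let rec exp_of_int_list (xs : int list) : exp =
  match xs with
  | [] -> Data_e ("Nil", [])
  | hd :: tl -> Data_e ("Cons", [Constant_e (Int hd); exp_of_int_list tl])

(* Same value as the hand-written two_three. *)
let two_three = exp_of_int_list [2; 3]
```

A helper like this can be handy when building test inputs for the evaluator.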

To create a pattern matching expression, we use the Match_e expression. Such expressions take an expression to match and a list of guards as arguments. Each guard is a pattern paired with an expression. So an OCaml match that looks like this:

 match e with 
   | p1 -> e1
   | p2 -> e2
   | ...
is represented as:

Match_e (e, [(p1, e1); (p2, e2); ...])
To evaluate such an expression, we evaluate e until we obtain a value v and then check, one by one, whether v matches one of the patterns p1, p2, .... We execute the first branch that has a matching pattern (i.e., we execute e3, if pattern p3 is the first to match). If there is no match, we raise the BadMatch exception.

For our little language, there are only four kinds of patterns: a constant pattern, a variable pattern, a datatype pattern and an underscore pattern. They work as follows.

  • Pattern (Constant_p c1): constant values (Constant_e c2) will match this pattern when the constant c1 is exactly equal to c2.
  • Pattern (Var_p x): any value will match this pattern. The expression in the corresponding branch of the match expression may use the variable x. When the match expression is evaluated, the matching value v will be substituted into the branch expression for the variable x.
  • Pattern (Underscore_p): any value will match this pattern. Hence, it is similar to the Var_p pattern. However, it is strictly easier to handle since no substitution must be performed.
  • Data_p (cons1, [p1; p2; ...; pn]): a datatype value with the form Data_e (cons2, [v1; v2; ...; vn]) will match this pattern provided that cons1 and cons2 are the same constructor and v1 recursively matches the pattern p1 and v2 recursively matches p2, etc. The data constructor must carry exactly the same number of values (n) as the data pattern has subpatterns (n) --- otherwise, there is no match.

Here are a couple of examples of patterns for true and false:

Data_e("true",[])    matches   Data_p ("true",[])

Data_e("false",[])   matches   Data_p ("false",[])
Moreover, recall our simple list two_three:
(* represents 2::3::[] *)
let two_three =
  Data_e ("Cons", [Constant_e (Int 2); 
    Data_e ("Cons", [Constant_e (Int 3);
      Data_e ("Nil", [])])]) 
It matches this pattern:
(* represents hd::tl *)
let list_pat = Data_p ("Cons",[Var_p "hd"; Var_p "tl"]);;
and when two_three matches list_pat the following substitutions should occur:

  • Constant_e (Int 2) should be substituted for variable "hd"
  • Data_e ("Cons", [Constant_e (Int 3); Data_e ("Nil", [])]) should be substituted for "tl"
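To see how data values, patterns and branches fit together, here is a small sketch of a complete match expression over two_three. The name head_or_zero is made up for illustration, the definitions of variable, constructor and constant are assumptions inferred from the examples, and exp is trimmed to the constructors used here.

```ocaml
(* Trimmed-down copy of the assignment's types. The definitions of
   variable, constructor and constant are assumptions. *)
type variable = string
type constructor = string
type constant = Int of int

type exp =
  | Constant_e of constant
  | Var_e of variable
  | Data_e of constructor * (exp list)
  | Match_e of exp * ((pattern * exp) list)
and pattern =
  | Constant_p of constant
  | Var_p of variable
  | Data_p of constructor * (pattern list)
  | Underscore_p

(* represents 2::3::[] *)
let two_three =
  Data_e ("Cons", [Constant_e (Int 2);
    Data_e ("Cons", [Constant_e (Int 3);
      Data_e ("Nil", [])])])

(* Hypothetical example; represents:
     match 2::3::[] with
     | [] -> 0
     | hd::_ -> hd *)
let head_or_zero =
  Match_e (two_three,
    [ (Data_p ("Nil", []), Constant_e (Int 0));
      (Data_p ("Cons", [Var_p "hd"; Underscore_p]), Var_e "hd") ])
```

Under the rules described above, the Nil pattern does not match two_three, so the second branch is the first to match, and Constant_e (Int 2) is substituted for "hd" in the branch expression Var_e "hd".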

Tasks for Part 1

Your tasks for this part are as follows:

  • Write an exp named increment_all. This exp will represent a piece of code that declares a recursive function to add one to every element of a list and then applies that function to the list [1;2]. Follow the style of the example appendit given in the provided code; see the very bottom of the file to complete this first task. When you complete the other elements of Part 1, you can test your work by trying your evaluator out on this example. (Of course, you will also want to construct other test cases as well.)
  • Complete the definition of the substitute function by replacing the "unimplemented()" expressions with correct implementations of substitution for data type and match expressions. For match expressions, if you are substituting a value v for a variable x and the variable x also appears as a variable in a pattern then substitution should cease at that point --- in other words, you should not substitute v into the corresponding branch.
  • Complete the definition of the eval function. The most challenging aspect of this part is the proper implementation of pattern matching. For example, consider the following code:
    match e1 with
     pat -> e2
    Here, suppose e1 evaluates to the value v1. You will need to determine whether v1 matches the pattern pat and if it does match, you will need to perform the correct substitutions on expression e2. In OCaml, the same variable cannot appear more than once in a pattern. OCaml prevents this through the use of type checking. If a variable does occur more than once in one of your patterns (because we are not type checking patterns or any other aspect of the code), you may deal with it in any way you choose. For instance, you may raise an exception or you may say that the first or last variable that you encounter when processing such a pattern is the only one that "counts" and all others are treated as the underscore pattern. Write a comment explaining what your code does when it encounters multiple identical variables in a pattern.
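Both the substitution rule and the duplicate-variable question above depend on knowing which variables a pattern binds. The sketch below shows one possible pair of helpers for that. Neither pattern_vars nor has_duplicate_vars is part of the provided code; they are hypothetical names, and the definitions of variable, constructor and constant are assumptions inferred from the examples.

```ocaml
(* Copy of the pattern type; variable, constructor and constant
   are assumed definitions. *)
type variable = string
type constructor = string
type constant = Int of int

type pattern =
  | Constant_p of constant
  | Var_p of variable
  | Data_p of constructor * (pattern list)
  | Underscore_p

(* Hypothetical helper: the variables a pattern binds, left to right. *)
let rec pattern_vars (p : pattern) : variable list =
  match p with
  | Constant_p _ | Underscore_p -> []
  | Var_p x -> [x]
  | Data_p (_, ps) -> List.concat (List.map pattern_vars ps)

(* Hypothetical helper: true if some variable occurs more than once. *)
let has_duplicate_vars (p : pattern) : bool =
  let vars = pattern_vars p in
  List.length (List.sort_uniq compare vars) <> List.length vars
```

For instance, pattern_vars applied to Data_p ("Cons", [Var_p "hd"; Var_p "tl"]) lists "hd" and "tl", so a substitution for "hd" would stop at a branch guarded by that pattern. What to do when has_duplicate_vars is true remains your design choice, as described above.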

Other comments and information can be found in the file. Please edit, complete and test that file to create a working interpreter. To complete some of the tasks above, you may want to create additional helper functions. Please go ahead and do so.

Part 2: Program Correctness

In this section, you must answer several theoretical questions. You must present your proofs in the 2-column style illustrated in class with facts that you conclude (like a new expression being equal to a prior one) on the left and a justification (like by "evaluation" or "simple math") on the right. Hand in the text file proofs.txt filled in with your answers.
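As a rough illustration of the 2-column layout, here is a tiny proof about a function that is not part of the assignment. The function double is hypothetical, and the exact wording of the justifications is an assumption; follow the conventions from class and the notes where they differ.

```ocaml
(* Hypothetical function, not part of the assignment. *)
let double (x : int) : int = x + x

(* Claim: for all x : int, double x == 2 * x.

   Proof:
        double x
     == x + x        (eval: definition of double; x is valuable)
     == 2 * x        (simple math)
   QED.

   Facts go on the left, one rewriting step per line; the
   justification for each step goes on the right. *)

(* A spot check of one instance; this is a test, not a proof. *)
let () = assert (double 3 = 2 * 3)
```

Note that == in the proof text means that two expressions are equivalent, while the executable check uses OCaml's structural equality (=).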

Part 2.0

To prepare for this part of the assignment, read the online notes on equational reasoning. Hand in proofs in the style presented in those notes.

Part 2.1

For each of the following functions, either:

  • explain why it is total in a sentence, or
  • prove that it is not total by giving a counter-example.
You may use the fact that addition (+), equality (=) and inequality (<) on integers are total functions.
let inc x = x + 1;;			(* a *)

let biginc x = x + max_int;;  		(* b *)

let bigdiv x = max_int/x;;    		(* c *)

type tree = 
| Leaf
| Node of int * tree * tree

let rec search ((t,n) : tree * int) =	(* d *)
  match t with
    Node (x, left, right) -> 
      if n = x then Some x
      else if n < x then search (left, n)
      else search (right, n)
  | Leaf -> None

let twice (f: int -> int) : int =	(* e *)
  f (f 0)

let gen () =				(* f *)
  fun x -> failwith "hahaha"

Part 2.2: Complex Numbers

Let us represent complex numbers as a pair of integers. Consider the following addition function that adds the corresponding components of the pairs.

type complex = int * int

let cadd (c1:complex) (c2:complex) = 
  let (x1,y1) = c1 in
  let (x2,y2) = c2 in
  (x1+x2, y1+y2)

Prove that for any three complex numbers, a, b, c : complex, complex addition is associative:

cadd a (cadd b c) == cadd (cadd a b) c

Part 2.3: A Minimal Proof

Consider the following code (and please note that min_int is a constant with type int that OCaml provides you with):

let max (x:int) (y:int) = 
  if x >= y then x else y

let rec maxs (xs:int list) =
  match xs with
      [] -> min_int
    | hd::tail -> max hd (maxs tail)

let rec append (l1:'a list) (l2:'a list) : 'a list =
  match l1 with 
    [] -> l2
  | hd::tail -> hd :: append tail l2

You may use the fact that max, maxs and append are all total functions. You may also use the following simple properties of max without proving them:

For all integer values a, b, c:

(commutativity)  max a b == max b a

(associativity)  max (max a b) c == max a (max b c)

(idempotence)    max a (max a b) == max a b

(min_int)        max min_int a == a

Now, prove that for all integer lists xs and ys,

max (maxs xs) (maxs ys) == (maxs (append xs ys))

Your proof is by induction on the structure of the list xs. Mention where you use properties such as associativity, commutativity, idempotence or min_int, if and when you need to use them. Also state where it is necessary to appeal to the idea that some expression is valuable or some function is total (if necessary).
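Before starting the proof, it can help to spot-check the claim on a few concrete lists. The sketch below repeats the definitions from above and adds a hypothetical checker, holds_for, along with some made-up sample inputs. Passing checks are evidence only; they do not replace the inductive proof.

```ocaml
let max (x:int) (y:int) =
  if x >= y then x else y

let rec maxs (xs:int list) =
  match xs with
      [] -> min_int
    | hd::tail -> max hd (maxs tail)

let rec append (l1:'a list) (l2:'a list) : 'a list =
  match l1 with
    [] -> l2
  | hd::tail -> hd :: append tail l2

(* Hypothetical check: does the claimed equation hold for one pair of
   lists? The theorem's == is written as OCaml's structural = here. *)
let holds_for (xs : int list) (ys : int list) : bool =
  max (maxs xs) (maxs ys) = maxs (append xs ys)

(* Made-up sample inputs, including the empty-list cases. *)
let samples = [ ([], []); ([1], []); ([], [5; -2]); ([3; 9; 4], [7; 0]) ]

let all_hold = List.for_all (fun (xs, ys) -> holds_for xs ys) samples
```

The empty-list samples correspond to the base case of the induction, which is where the min_int property is likely to come up.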

Part 2.4: Map and back

In previous assignments, we sometimes asked you to write recursive functions over lists using map and fold and sometimes without them. Now you can prove that it doesn't matter which way you choose to write your function: the two styles are equivalent.

let rec map (f:'a -> 'b) (xs:'a list) : 'b list =
  match xs with
      [] -> []
    | hd::tail -> f hd :: map f tail

let bump1 (xs:int list) : int list = 
  map (fun x -> x+1)  xs

let bump2 (xs:int list) : int list =
  match xs with
    [] -> []
  | hd::tail -> (hd+1) :: map (fun x -> x+1) tail

let rec bump3 (xs:int list) : int list =
  match xs with
    [] -> []
  | hd::tail -> (hd+1) :: bump3 tail

(a) Prove that for all integer lists l, bump1 l == bump2 l.

(b) Prove that for all integer lists l, bump1 l == bump3 l.

(c) In one sentence, what's the big difference between part (a) and part (b)?

Part 2.5: Zippity do da

Consider the following functions:

let rec zip (ls:'a list * 'b list) : ('a * 'b) list =
  let (xs,ys) = ls in
  match (xs,ys) with
      ([],_) -> []
    | (_,[]) -> []
    | (x::xrest, y::yrest) -> (x,y)::zip (xrest,yrest)

let rec unzip (xs:('a * 'b) list) : 'a list * 'b list =
  match xs with
      [] -> ([],[])
    | (x,y)::tail -> 
      let (xs,ys) = unzip tail in
      (x::xs, y::ys)

Prove or disprove each of the following.

(a) For all l : ('a * 'b) list, zip(unzip l) == l.

(b) For all l1 : 'a list, l2 : 'b list, unzip(zip (l1,l2)) == (l1,l2).

Handin Instructions

This problem set is to be done individually.

You must hand in these files to dropbox (see link on assignment page):

  2. proofs.txt