Power
Thinking Recursively
We do not need recursive functions to analyze data structures such as tuples, pairs and options: we only need to do a small, finite, and predictable amount of work to extract all the information contained in such structures. On the other hand, data structures such as lists, trees and graphs may be arbitrarily large. They are recursively defined data structures, and we do need recursive functions to get at all the information they contain.
In this respect, natural numbers are quite similar to lists and trees, though you might not have thought of them this way before. Like lists, natural numbers may be understood as recursive data. Hence it is entirely sensible to analyze these recursive structures using recursive functions.
To define a recursive function, you must use a let rec declaration instead of a simple let declaration.
    let rec f (x:t) : t' = ... f (e) ...
If you are trying to define a recursive function like this one:
    let sum (n:int) : int =
      if n <= 0 then 0 else n + sum (n-1)
    ;;
but find yourself flummoxed by an error message like this one:
    File "sum.ml", line 2, characters 28-31:
    Error: Unbound value sum
then you probably forgot to put in the rec keyword, which indicates that you want a recursive function instead of a nonrecursive one.
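As a minimal sketch, here is the same sum function with the rec keyword in place. The two checks at the end are illustrative additions rather than part of the example above.

```ocaml
(* With [rec], the name [sum] is in scope inside its own body,
   so the recursive call below is accepted by the compiler. *)
let rec sum (n : int) : int =
  if n <= 0 then 0 else n + sum (n - 1)

let () =
  assert (sum 0 = 0);
  assert (sum 4 = 10)  (* 4 + 3 + 2 + 1 + 0 *)
```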
Reasoning about Recursion: First Steps
How do we convince ourselves that a recursive function produces a correct result? Consider f below.
    let rec f (x:t) : t' = ... f (e) ...
In general, we assume any result produced by the recursive call f(e) is correct, and on the basis of that assumption, we prove that any result generated by the rest of the body ( ... f(e) ... ) is also correct.
Now, this notion of correctness is sound but limited:
it does not characterize the time or space used by the function.
It does not even guarantee that the function terminates. For
the latter, we must ensure no argument ever leads to an
infinite sequence of recursive calls.
More on each of these issues later in the course.
Integers and Natural Numbers
The boolean type, the first basic type we examined, contained just two values. O'Caml's integer type contains a whole lot more values (2^31 values on a 32-bit machine and 2^63 values on a 64-bit machine). Still, the basic programming paradigm does not change: when given an integer input, pattern match to analyze your input; then, based on the information extracted, construct your result. Of course, writing down 2^63 patterns for all the integers (or even just 2^31 patterns on a 32-bit machine) will take you quite a long time! So, when dealing with integers, one must partition the input space into suitable subsets for processing.
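To make the idea of partitioning concrete, here is a small sketch that splits all integers into three subsets (zero, positive, negative) instead of listing one pattern per integer. The function name sign and the strings it returns are hypothetical, chosen only for illustration.

```ocaml
(* Classify an integer by sign. The three cases together cover
   every value of type int, so the match is exhaustive. *)
let sign (n : int) : string =
  match n with
  | 0 -> "zero"
  | _ when n > 0 -> "positive"
  | _ -> "negative"
```

Each branch handles a whole subset of the integers at once, which is what keeps the number of patterns small.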
Example. The natural numbers (0, 1, ...) are a subset of the integers. Write a function that, given a natural number n, sums the naturals from 0 to n (inclusive).
To begin, we write our function name, types, comment and tests. Moreover, because there is no built-in type for natural numbers, only integers, we start our function with an assertion to ensure the function is only called with a natural number argument. (In the future, we will be able to define our own abstract type of natural numbers, a superior solution in many situations; for now an assertion suffices to augment the type declaration.)
    (* the sum of 0..n; n must be a natural number *)
    let sum_to (n:int) : int =
      assert(0 <= n);
      ...
    ;;

    assert (sum_to 0 = 0);;
    assert (sum_to 3 = 6);;
Next, we need to deconstruct our natural number input. But how? One way is to observe that every natural number n is either:
 0, or
 m+1 for some other (smaller) natural number m.
Theorem 1: For all natural numbers n, if n is not 0 then n-1 is a (smaller) natural number.
Proof: According to our definition, since n is not 0, it must be m+1 for some natural number m. And (m+1)-1 is just the natural number m. Clearly, m is smaller than m+1.
That was easy to prove, but it's a theorem we use a lot when programming with natural numbers, so it's good to know it is true! With this information in hand, here is the code for our sum function.
    (* the sum of 0..n; n must be a natural number *)
    let rec sum_to (n:int) : int =
      assert(0 <= n);
      match n with
        0 -> 0
      | _ -> n + sum_to (n-1)
    ;;
The two patterns are 0 and _. In the first case, the sum of 0..0 is just 0, so we return that value. In the second case, the sum of 0..n is n plus the sum of 0..n-1, so we compute n + sum_to (n-1).
The first thing to notice about this second branch is that we had to convince ourselves that the precondition of sum_to is satisfied, or else its assertion might fail. Fortunately, according to our theorem, the precondition is satisfied (n-1 is a natural number).
The second thing to notice is that we merely assumed that sum_to was implemented correctly (i.e., that sum_to (n-1) returns the sum of 0..n-1) and used that assumption to convince ourselves that the second branch as a whole returns a correct result.
The third thing to notice is that every recursive call operates over a smaller natural number (one smaller to be precise). Hence, no matter what natural number we start with, only finitely many recursive calls may be made before the recursion bottoms out. The function will always terminate.
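The three observations can be seen at work by unfolding one call by hand. The sketch below repeats sum_to so that it runs on its own and traces sum_to 3 in a comment. The final check against the closed form n*(n+1)/2 is an added sanity test, not part of the original development.

```ocaml
(* the sum of 0..n; n must be a natural number *)
let rec sum_to (n : int) : int =
  assert (0 <= n);
  match n with
  | 0 -> 0
  | _ -> n + sum_to (n - 1)

(* Unfolding the recursion by hand:
     sum_to 3
   = 3 + sum_to 2
   = 3 + (2 + sum_to 1)
   = 3 + (2 + (1 + sum_to 0))
   = 3 + (2 + (1 + 0))
   = 6
   Each call receives a smaller natural number, so the chain
   reaches the 0 case after finitely many steps. *)
let () =
  assert (sum_to 0 = 0);
  assert (sum_to 3 = 6);
  assert (sum_to 100 = 100 * 101 / 2)
```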
In general, whenever one wants to write a function over a natural number input, one might consider using the following function schema.
    let rec f (n:int) : int =
      assert(0 <= n);
      match n with
        0 -> ... no recursive calls to f ...
      | _ -> ... f (n-1) ... f (n-1) ...
    ;;
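As one possible instance of this schema, the sketch below computes 2 to the power n. The name pow2 is hypothetical. The base case makes no recursive call, and the other case makes a single call on n-1.

```ocaml
(* 2 raised to the power n; n must be a natural number *)
let rec pow2 (n : int) : int =
  assert (0 <= n);
  match n with
  | 0 -> 1                    (* 2^0 = 1, no recursive call *)
  | _ -> 2 * pow2 (n - 1)     (* 2^n = 2 * 2^(n-1) *)
```

By Theorem 1, n-1 is a natural number in the second branch, so the assertion in the recursive call cannot fail.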
Moving on, we can observe that there are a bunch of other useful theorems about naturals we might use when programming:
Theorem 2: For all natural numbers n, if n is not 0 and not 1 then n-2 is a (smaller) natural number.
Theorem 3: For all natural numbers n, if n is not 0 then n/2 is a (smaller) natural number.
Each such theorem leads to a different recursion scheme over the natural numbers. For instance, Theorem 2 leads to this scheme:
    let rec f (n:int) : int =
      assert(0 <= n);
      match n with
      | 0 -> ... no recursive calls to f ...
      | 1 -> ... no recursive calls to f ...
      | _ -> ... f (n-2) ... f (n-2) ...
    ;;
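A sketch of a function that fits the Theorem 2 scheme is a parity test. The name is_even is hypothetical, and the result type here is bool rather than the int used in the schema. The shape of the recursion is the same.

```ocaml
(* true exactly when n is even; n must be a natural number *)
let rec is_even (n : int) : bool =
  assert (0 <= n);
  match n with
  | 0 -> true                 (* base case: 0 is even *)
  | 1 -> false                (* base case: 1 is odd *)
  | _ -> is_even (n - 2)      (* n and n-2 have the same parity *)
```

Both base cases are needed: without the case for 1, the call is_even 1 would recurse on -1 and trip the assertion.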
Theorem 3 leads to the following recursion scheme.
    let rec f (n:int) : int =
      assert(0 <= n);
      match n with
      | 0 -> ... no recursive calls to f ...
      | _ -> ... f (n/2) ...
    ;;
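To illustrate the Theorem 3 scheme, here is a sketch that counts how many binary digits are needed to write n, treating 0 as needing none. The name num_bits is hypothetical. Because the argument is halved at each step, the number of recursive calls grows only logarithmically in n.

```ocaml
(* number of binary digits of n (0 for n = 0); n must be a natural number *)
let rec num_bits (n : int) : int =
  assert (0 <= n);
  match n with
  | 0 -> 0
  | _ -> 1 + num_bits (n / 2)   (* integer division drops the last bit *)
```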
In general, if you consistently call f recursively with a smaller argument (and convince yourself that the argument is indeed a natural number, as opposed to, perhaps, a negative integer) then your function will always terminate and the assertion will never fail.
Exercise. Consider a descending loop in Java or C such as for(int i=n;i>=0;i--){...}, which computes some natural number (or string or list or ...) result. Translate this loop into a recursive function over the naturals in O'Caml. Now try translating an ascending loop for(int i=0;i<=n;i++){...} into a recursive function. Now, instead of adding or subtracting just 1, add or subtract k, for some k.
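One possible translation, offered only as a sketch, assumes the loop body accumulates the square of i into a running total. That loop body, and the names sum_squares_down, sum_squares_up and loop, are assumptions made for illustration. The descending loop follows the natural number schema directly. The ascending loop carries the counter and an accumulator in a helper function, and it terminates because the distance n - i shrinks at every call.

```ocaml
(* Descending: mirrors  for (int i = n; i >= 0; i--) { acc += i * i; }  *)
let rec sum_squares_down (i : int) : int =
  assert (0 <= i);
  match i with
  | 0 -> 0
  | _ -> i * i + sum_squares_down (i - 1)

(* Ascending: mirrors  for (int i = 0; i <= n; i++) { acc += i * i; }  *)
let sum_squares_up (n : int) : int =
  assert (0 <= n);
  let rec loop (i : int) (acc : int) : int =
    if i > n then acc                       (* loop condition i <= n is false *)
    else loop (i + 1) (acc + i * i)         (* one iteration, then i++ *)
  in
  loop 0 0
```

Both functions compute the same total. They differ only in the order in which the values of i are visited.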
Lists
Lists are structurally very similar to natural numbers. Every list has one of two forms:

 [], an empty list, or
 hd::tail, a nonempty list with first element hd followed by some other (smaller) list tail.
And there are no other lists. Programming with lists is even easier than programming with natural numbers. When we programmed with natural numbers, we had to be very careful that we supplied a natural number to our recursive function to avoid causing an assertion failure. (We used a couple of theorems to convince ourselves this was true.) However, when we program with lists, the type system will automatically tell us whether we do or do not have a list; we can't get this aspect of our program wrong.
Thought experiment for Java, C or C++ programmers: When you first programmed in Java or C, was it easier to program with data structures like lists or was it easier to program with numbers? Why was that? What kinds of trickiness arise when dealing with lists in these other languages?
Example. Write a function that, given a list of pairs, produces a list of the products of those pairs. To start, we write down the function name, types and a few examples.
    let rec prods (xs : (int * int) list) : int list =
      ...
    ;;

    assert (prods [] = []);;
    assert (prods [(2,3); (4,7); (5,2)] = [6; 28; 10]);;
Now, to write the body of the function, we decompose the input list into two cases.
    let rec prods (xs : (int * int) list) : int list =
      match xs with
        [] -> ...
      | hd::tail -> ...
    ;;
Next, filling in the case for the empty list is easy (we construct the empty list as a result). For the second case, we realize that each element of the list is a pair by looking at the type of the argument, so we can refine the pattern in that second case before proceeding.
    let rec prods (xs : (int * int) list) : int list =
      match xs with
        [] -> []
      | (x,y)::tail -> ...
    ;;
Finally, we must construct a result list in that second branch. The first element of that result list will be x*y. To construct the rest of the list, we assume that prods works correctly on the (smaller) list tail, producing the list of products of its pairs. The resulting code follows.
    let rec prods (xs : (int * int) list) : int list =
      match xs with
        [] -> []
      | (x,y)::tail -> (x*y)::prods tail
    ;;
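The same recipe (decompose the list into its two cases, then assume the recursive call is correct on the tail) applies to other list functions. As a sketch, here is a function that adds up a list of integers. The name sum_list is hypothetical.

```ocaml
(* the sum of all the integers in xs *)
let rec sum_list (xs : int list) : int =
  match xs with
  | [] -> 0                          (* the empty list sums to 0 *)
  | hd :: tail -> hd + sum_list tail (* assume sum_list is correct on tail *)
```

Unlike the functions over natural numbers, no assertion is needed: the type int list already guarantees that tail is a list.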
Example. Write a function that takes two lists as arguments and returns an optional list of pairs. Return None if the lists have different lengths. Return Some if the lists have the same length.
    let rec zip (xs : int list) (ys : int list) : (int * int) list option =
      ...
    ;;

    assert (zip [] [] = Some []);;
    assert (zip [2] [] = None);;
    assert (zip [] [2] = None);;
    assert (zip [2;3] [4;5] = Some [(2,4);(3,5)]);;
We have two inputs to this function and we must analyze both of them. Each list may be empty or it may be nonempty. If we consider each combination of empty and nonempty separately, there are four cases.
    let rec zip (xs : int list) (ys : int list) : (int * int) list option =
      match (xs,ys) with
        ([], []) -> ...
      | (x::xtail,[]) -> ...
      | ([],y::ytail) -> ...
      | (x::xtail,y::ytail) -> ...
    ;;
By the way, when we wrote (xs,ys) in the match statement, we were constructing a pair of lists from the separate inputs xs and ys. This is why the resulting patterns are pair (of lists) patterns.
Now, we fill in each of the four cases. In the last case below, we assume zip operates correctly on a pair of shorter lists. In addition, since zip returns an optional value, we pattern match on the result of zip, acting differently depending on whether the recursive call returns None or Some.
    let rec zip (xs : int list) (ys : int list) : (int * int) list option =
      match (xs,ys) with
        ([], []) -> Some []
      | (x::xtail,[]) -> None
      | ([],y::ytail) -> None
      | (x::xtail,y::ytail) ->
          (match zip xtail ytail with
             None -> None
           | Some zs -> Some ((x,y) :: zs))
    ;;
By the way, notice that I surrounded the inner match statement with parentheses. It is always a good idea to surround inner match statements with such parentheses because otherwise O'Caml can sometimes get confused about whether certain branches of a match belong to the inner or outer match statement. For more tips, see the O'Caml style guide.
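To see why the parentheses matter, consider the small sketch below. The function describe and its strings are hypothetical, made up for illustration. The inner match is not in the last branch of the outer one, so without the parentheses the final branch would be read as a third case of the inner match on flag, and the compiler would reject it because a list pattern cannot match a bool.

```ocaml
let describe (xs : int list) (flag : bool) : string =
  match xs with
  | [] ->
    (* The parentheses end the inner match here. *)
    (match flag with
     | true -> "empty, flagged"
     | false -> "empty")
  | _ :: _ -> "nonempty"   (* belongs to the outer match on xs *)
```

In zip the inner match happens to sit in the last branch, so the parentheses there are a matter of habit and readability. Here they are required.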
Summary
Recursive functions often arise when one must process recursive data. Both natural numbers and lists may be viewed as recursive data. Indeed, they are structurally very similar types:
 Natural Numbers:
 0 is a natural number
 1+m is a natural number when m is a natural number
 Lists:
 [] is a list
 hd::tail is a list when tail is a list.
The similarity in the structure of the values of each type leads to (somewhat) structurally similar programs.
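The correspondence can be made concrete with a pair of functions, sketched here under hypothetical names: replicate consumes a natural number and builds a list, and length consumes a list and builds a natural number. The 0 case lines up with the [] case, and the n-1 case lines up with the hd::tail case.

```ocaml
(* a list containing n copies of x; n must be a natural number *)
let rec replicate (n : int) (x : int) : int list =
  assert (0 <= n);
  match n with
  | 0 -> []                          (* 0 corresponds to [] *)
  | _ -> x :: replicate (n - 1) x    (* m+1 corresponds to hd::tail *)

(* the number of elements in xs *)
let rec length (xs : int list) : int =
  match xs with
  | [] -> 0                          (* [] corresponds to 0 *)
  | _ :: tail -> 1 + length tail     (* hd::tail corresponds to 1+m *)
```

Going from a natural number to a list and back returns the starting number: length (replicate n x) is n for every natural number n.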