Caml
Power

Acknowledgement: This note was created by Pramod Subramanyan and David Walker.

Polymorphism and Higher-Order Programming

Good programmers are lazy: they never write the same piece of code twice. Instead, they strive to factor out the common bits into meaningful, reusable components. Write a component once, find and fix all its bugs, and use it many times. That is the path to becoming an efficient programmer. If you need to update the component, perhaps for performance or to fix an error, it only needs to be updated in one place.

In OCaml, higher-order and polymorphic functions are powerful tools for code reuse. Higher-order functions are those functions that take other functions as arguments or return functions as results, while polymorphic functions are functions that act over values with many different types. Together, they enable a great deal of code reuse. In this lecture, we will look specifically at how to use higher-order and polymorphic functions to represent complex, recursive, control-flow patterns.

Higher-Order Programming

Consider the following two functions, which (a) increment all the elements in a list and (b) square all the elements in a list.

let rec inc_all (xs:int list) : int list = 
  match xs with 
  | [] -> []
  | hd::tl -> (hd+1)::(inc_all tl)

let rec square_all (xs:int list) : int list =
  match xs with
  | [] -> []
  | hd::tl -> (hd*hd)::(square_all tl)

The only difference between inc_all and square_all is in the expressions hd+1 and hd*hd; the other parts of these functions are exactly the same. OCaml's higher-order functions make it easy to extract these commonalities into a reusable component. Below, we present the map function, which applies its argument f to all elements of a list.

let rec map (f:int->int) (xs:int list) : int list = 
  match xs with 
  | [] -> []
  | hd::tl -> (f hd)::(map f tl)

map is one of the most ubiquitous OCaml functions --- you should get used to reading and writing programs that use it. With map, recursive functions like inc_all and square_all become simple, non-recursive one-liners as shown below.

let inc x = x+1
let inc_all xs = map inc xs

let square y = y*y
let square_all xs = map square xs
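
To see how far this reuse goes, here is a small sketch that builds two more list functions from the same map. The helpers double and negate, and the functions double_all and negate_all, are illustrative names that do not appear elsewhere in these notes.

```ocaml
(* The int-only map from the text, repeated so this block runs on its own. *)
let rec map (f:int->int) (xs:int list) : int list =
  match xs with
  | [] -> []
  | hd::tl -> (f hd)::(map f tl)

(* Hypothetical helpers, introduced only for this sketch. *)
let double x = 2 * x
let negate x = -x

(* Each new list function is a one-liner: only the element operation changes. *)
let double_all xs = map double xs
let negate_all xs = map negate xs
```

With these definitions, double_all [1; 2; 3] evaluates to [2; 4; 6].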

Anonymous Functions

When programming with higher-order functions like map, one has a tendency to need many little functions like inc and square, which are then often only used once. Rather than defining a named function to be used just once, we can define it without a name and use it in place. For instance we would usually write inc_all and square_all as follows.

let inc_all xs = map (fun x -> x + 1) xs
let square_all xs = map (fun y -> y * y) xs

The expression fun x -> x + 1 is an anonymous function that takes one argument (x) as input and returns x+1 as a result. One can also define multi-argument anonymous functions using the following syntax:

fun x y z -> x + y * z

However, one cannot define recursive functions this way; a function needs a name in order to call itself.
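
As a quick illustration, an anonymous function can be applied directly to its arguments, while a recursive function needs a name introduced with let rec. The names seven and fact below are assumptions made for this sketch.

```ocaml
(* Apply a three-argument anonymous function in place: 1 + (2 * 3). *)
let seven = (fun x y z -> x + y * z) 1 2 3

(* Recursion needs a name so the body can refer to the function itself. *)
let rec fact (n:int) : int =
  if n <= 1 then 1 else n * fact (n - 1)
```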

Conceptually, anonymous functions are no more complicated than anonymous numbers (like 3 or 4), anonymous strings ("hello") or any other anonymous value. It would certainly be annoying if instead of writing:

print_string ("hello" ^ " " ^ "world")
one had to explicitly bind names to each of the strings first:
let hello = "hello" in
let space = " " in
let world = "world" in
print_string (hello ^ space ^ world) 

Why should function values be treated differently from other values like integers or strings? They shouldn't!

Non-anonymous (Conspicuous?) Functions as Anonymous Functions

It turns out that the function definitions we have been using so far are actually abbreviations. The following code:

let square x = x*x 
let add x y = x+y 
is just syntactic sugar for:
let square = (fun x -> x*x) 
let add = (fun x y -> x+y) 

With this in mind, it is easy to see that several of the functions we have written earlier are equivalent:

let square x = x*x in
map square xs

==

let square = fun x -> x*x in
map square xs

==

map (fun x -> x*x) xs
The third expression is derived from the second by substituting fun x -> x*x for the variable square.
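
The equivalence can be checked directly. The sketch below evaluates all three forms on a sample list; the names xs, r1, r2 and r3 are chosen here purely for illustration.

```ocaml
(* The int-only map from earlier, repeated so this block runs on its own. *)
let rec map (f:int->int) (xs:int list) : int list =
  match xs with
  | [] -> []
  | hd::tl -> (f hd)::(map f tl)

let xs = [1; 2; 3]

(* Named function, written with the usual abbreviation. *)
let r1 = let square x = x*x in map square xs

(* The same function with the abbreviation expanded. *)
let r2 = let square = fun x -> x*x in map square xs

(* The anonymous function substituted for the variable square. *)
let r3 = map (fun x -> x*x) xs
```

All three results are the list [1; 4; 9].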

A comment on style: One must be somewhat careful with anonymous functions. They are great when one needs to define a small function (like square or increment) that is used once. However, if one must define a larger function, it is typically better to give it a name, because it will be easier for a colleague or teammate to read. Use your judgement and for more tips, see our style guide.

Polymorphic Functions

map seems like a pretty great function until we stumble across div_all:

let rec div_all (xs:float list) : float list =
  match xs with
  | [] -> []
  | hd::tl -> (hd /. 2.0)::(div_all tl)
Once again, the code of div_all is almost identical to the code of square_all or inc_all, and it seems like we should be able to implement it using map, but we can't: map operates over integer lists, whereas we need a function that operates over floating-point lists. Fortunately, we can redefine map to make it more general. There is no reason to constrain map to operate over integers alone; we can define it to work over lists with elements of any type 'a and transform them into lists of any other type 'b.
let rec map (f:'a -> 'b) (xs:'a list) : 'b list = 
  match xs with 
  | [] -> []
  | hd::tl -> (f hd)::(map f tl)

In general, in OCaml, whenever a type name is preceded by an apostrophe (as in 'a and 'b), it is a type variable that may stand for any type. If one were to write out the full type of map, it would be the following:

map : ('a -> 'b) -> 'a list -> 'b list
We would read the type as saying "for all types 'a and 'b, map takes a first argument with type 'a -> 'b, a second argument with type 'a list, and produces a result with type 'b list." To understand how we might use a polymorphic value like map, we can substitute any concrete type we like for the type variables that appear in map's type. For instance, if we substitute int for 'a and bool for 'b in the type of map, we wind up with a type like this:
(int -> bool) -> int list -> bool list
Consequently, we could use map at that type in the following expression:
let pos : int -> bool = fun n -> n > 0 in
map pos [1;2;3;-1;-2;-3]

Alternatively, if we substitute float for 'a and float for 'b in the type of map, we wind up with a type like this:

(float -> float) -> float list -> float list
We can use map at that type to implement div_all:
map (fun x -> x /. 2.0) [5.0; 7.0; -3.2]

Finally, there is nothing stopping us from substituting arbitrarily complex types like list types and option types and tuple types and other function types for the type variables 'a and 'b. For instance, below, we substitute the type int list for 'a and also for 'b.

map (map (fun x -> x + 1)) [[2]; [4;5]] 
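
The three instantiations discussed above can be collected into one runnable sketch. The binding names signs and nested are assumptions introduced for this example; div_all is redefined here in terms of the polymorphic map.

```ocaml
(* The polymorphic map from the text. *)
let rec map (f:'a -> 'b) (xs:'a list) : 'b list =
  match xs with
  | [] -> []
  | hd::tl -> (f hd)::(map f tl)

(* 'a = int, 'b = bool *)
let signs = map (fun n -> n > 0) [1;2;3;-1;-2;-3]
(* signs = [true; true; true; false; false; false] *)

(* 'a = float, 'b = float *)
let div_all (xs:float list) : float list = map (fun x -> x /. 2.0) xs

(* 'a = int list, 'b = int list: the inner map is itself the function argument. *)
let nested = map (map (fun x -> x + 1)) [[2]; [4;5]]
(* nested = [[3]; [5; 6]] *)
```

The same definition of map is used at three different types without any change to its code.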

A Generic Reducer

The higher-order function map implements one very common recursion pattern over lists, but there are more. Consider the following two functions. What do they have in common?

let rec sum (xs:float list) : float = 
  match xs with 
  | [] -> 0.0
  | hd::tl -> hd +. (sum tl)


let rec all_pos (xs:int list) : bool = 
  match xs with 
  | [] -> true
  | hd::tl -> (hd > 0) && (all_pos tl)

Both functions are defined using two cases -- one base case for the empty list, and one recursive case for a non-empty list. The base case returns a specific, pre-determined value. The recursive case makes a recursive call over the tail of the list and uses the result of that recursive call, together with the head of the list in a computation that produces the final result for that case. To capture this recursion pattern, we will define a function reduce that has the following property.

reduce f u [x1; x2; x3; ...; xn]
==
f x1 (f x2 (... (f xn u)))
For instance:
reduce (+.) 0.0 [1.0; 2.0; 3.0]
==
1.0 +. (2.0 +. (3.0 +. 0.0))
or
let pos (x:int) (b:bool) : bool = (x > 0) && b

reduce pos true [1; 2; 3]
==
pos 1 (pos 2 (pos 3 true))
Here is our definition.
let rec reduce (f:'a -> 'b -> 'b) (u:'b) (xs:'a list) : 'b = 
  match xs with
  | [] -> u
  | hd::tl -> f hd (reduce f u tl)
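
With reduce in hand, sum and all_pos can be rewritten as non-recursive one-liners, in the same way that map simplified inc_all and square_all. The following is a minimal sketch; the length function is an extra example added here and is not part of the original notes.

```ocaml
(* The reduce function from the text. *)
let rec reduce (f:'a -> 'b -> 'b) (u:'b) (xs:'a list) : 'b =
  match xs with
  | [] -> u
  | hd::tl -> f hd (reduce f u tl)

(* Base value 0.0; combine the head with the reduced tail using +. *)
let sum (xs:float list) : float = reduce (+.) 0.0 xs

(* Base value true; the head must be positive and the tail must be all positive. *)
let all_pos (xs:int list) : bool = reduce (fun x b -> (x > 0) && b) true xs

(* Hypothetical extra: ignore each element and add one to the count for the tail. *)
let length xs = reduce (fun _ n -> n + 1) 0 xs
```

Note that in all_pos the element type 'a is int while the result type 'b is bool, which is why reduce needs two type variables.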

A Note on MapReduce

It is worth noting that functions very similar to the map and reduce we've defined above are the basis of Google's MapReduce framework. If you're interested in learning more, this paper from OSDI 2004 and a related paper from HPCA 2007 are good places to start reading.
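
As a toy, single-machine illustration of the "map, then reduce" shape (this is not the MapReduce framework's API, only the two functions from these notes composed together), the sketch below computes the total number of characters in a list of words. The function name total_length is hypothetical.

```ocaml
(* map and reduce as defined in the text. *)
let rec map (f:'a -> 'b) (xs:'a list) : 'b list =
  match xs with
  | [] -> []
  | hd::tl -> (f hd)::(map f tl)

let rec reduce (f:'a -> 'b -> 'b) (u:'b) (xs:'a list) : 'b =
  match xs with
  | [] -> u
  | hd::tl -> f hd (reduce f u tl)

(* Map step: turn each word into its length.
   Reduce step: add the lengths together, starting from 0. *)
let total_length (words:string list) : int =
  reduce (+) 0 (map String.length words)
```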

Curried Functions and Partial Application

It turns out that all functions in OCaml are unary (1-argument) functions! But how can this be? Didn't we just write functions with 2 and 3 arguments while writing map and reduce?

The following declaration of a function that seemingly accepts two arguments:

let add = (fun x y -> x+y)

is actually shorthand for the following.

let add = (fun x -> (fun y -> x+y))

Let's parse the complicated definition of add. We will start from the inside and work our way out. The innermost expression fun y -> x + y declares a function that takes as argument an integer y and returns the integer x+y.

Where does x come from? We see that x is bound by the outer function (fun x -> (fun y -> x+y)). The way to understand this is that the expression creates a function that takes a single argument x and returns the function fun y -> x + y. In other words, add is itself a single-argument function; when applied to an argument x, it returns another single-argument function that adds x to its own argument y.

This is a slightly subtle concept, so let's look at an example OCaml session that might help explain it.

# let add = (fun x -> fun y -> x + y);;
val add : int -> int -> int = <fun>
# add;;
- : int -> int -> int = <fun>

The session shows that add has type int -> int -> int which we now realize means that it is of type int -> (int -> int), or equivalently add takes an argument of type integer and returns a function of type integer to integer.

# let add2 = add 2;;
val add2 : int -> int = <fun>

We've defined add2 by applying the function add to the argument 2. As we'd expect, the type of add2 is a function from integer to integer.

And what does add2 do?

# add2 3;;
- : int = 5
# add2 10;;
- : int = 12
# add2 100;;
- : int = 102

It simply adds the integer 2 to its argument.
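
To connect the session back to the abbreviations discussed earlier, the sketch below defines the same function in three ways and applies each one to 2 and 3. The names add_sugar, add_multi, add_nested, a, b and c are assumptions made for this example.

```ocaml
(* Three spellings of the same curried function. *)
let add_sugar x y = x + y
let add_multi = fun x y -> x + y
let add_nested = fun x -> (fun y -> x + y)

(* Function application associates to the left, so f 2 3 means (f 2) 3. *)
let a = add_sugar 2 3
let b = add_multi 2 3
let c = (add_nested 2) 3
```

Each of a, b and c is 5, and all three functions have type int -> int -> int.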

Partial Application

This process of applying an n-argument function to fewer than n arguments is called partial application. The function add2 was defined by partially applying the function add to the argument 2.

Another example of partial application is the following:

let inc = add 1
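
Partial application combines naturally with higher-order functions such as map. The following sketch, which assumes the polymorphic map defined earlier, partially applies both add and map; add10_all is a hypothetical name used only here.

```ocaml
let add = (fun x -> (fun y -> x + y))
let inc = add 1

(* The polymorphic map from earlier, repeated so this block runs on its own. *)
let rec map (f:'a -> 'b) (xs:'a list) : 'b list =
  match xs with
  | [] -> []
  | hd::tl -> (f hd)::(map f tl)

(* Supplying only the function argument of map leaves a function
   that is still waiting for the list. *)
let inc_all = map inc
let add10_all = map (add 10)
```

Here inc_all and add10_all both have type int list -> int list, and inc_all [1; 2; 3] evaluates to [2; 3; 4].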

Summary

Using higher-order and polymorphic functions, OCaml programmers are able to capture recurring recursion patterns in their code.