
## Type-directed Programming

One of the principal advantages of programming in a strongly typed language like OCaml is that the types of function arguments and results can help guide the way you construct programs.

In this class, whenever you write a function in
OCaml, you should do so by following the *type-directed programming
methodology*. This programming methodology has the following steps:

- Write down the name of the function and the types of its arguments and results.
- Write a comment that explains its purpose and any preconditions.
- Write several examples of what your function does.
- Write the body of the function. (This is the hard part!)
- Turn your examples into tests.

The place where types really help is in the hard part: *Write the body
of the function* because function bodies involve two conceptual
activities:

- Deconstruct (i.e., tear apart or analyze) the input values.
- Construct (i.e., build) the output values.

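In both activities, the types tell you which cases must be considered, and following them helps you arrive at *complete* solutions to the programming problem at hand.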

For example, the type `bool` comes with exactly two values, `true` and `false`. When writing a function with an argument of type `bool`, one must consider what to do when supplied with the input `true` and one must consider what to do when supplied with the input `false` -- there are never any other possibilities to consider. Analogously, when writing a function with a result type `bool`, there are only two possible results you can construct -- the value `true` and the value `false`.
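As a small illustration of this point, here is a sketch of a function from `bool` to `bool`. The name `negate` is hypothetical (it is not part of the lecture's code), and the `match` syntax it uses is explained in the Booleans section. The argument type leaves exactly two cases to handle, and the result type leaves exactly two values to choose from in each case.

```ocaml
(* negate: a hypothetical example, not part of the lecture code.
   The argument type bool gives two cases to handle; the result type
   bool gives two possible values to return in each case. *)
let negate (b : bool) : bool =
  match b with
  | true -> false
  | false -> true

let () =
  assert (negate true = false);
  assert (negate false = true)
```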

In the following, we consider a number of the built-in OCaml types. For each type, we'll discuss the set of values for that type as well as how to deconstruct those values (for inputs) and construct those values (for outputs). An ml file associated with this lecture may be found here.

### Booleans

You already know the set of values that make up the boolean type: `true` and `false` -- and that's it. Given an input of type `bool`, we may determine which of the two values we have using a `match` statement. In general, match statements have the following form.

    match expression with
    | pattern1 -> result1
    | pattern2 -> result2
    ...
    | patternk -> resultk

The code above evaluates the `expression` and then checks to see whether the resulting value matches one of the patterns. The patterns are checked against the computed value in order, and the result associated with the first pattern that matches is executed. The kinds of patterns available depend upon the type of the `expression`. When the `expression` has type `bool`, there are two patterns, `true` and `false`, because, of course, those are the only two values that can have type `bool`. Hence, a boolean pattern matching expression looks like this:

    match expression with
    | true -> result1
    | false -> result2

**Example.**
Let's define a function that converts a boolean into an
integer. According to our methodology, we will first
define the function name and types, write a comment
to explain what it does, and then write down some examples:

    (* convert a boolean to an integer:
     * bool_to_int true = 1
     * bool_to_int false = 0
     *)
    let bool_to_int (b:bool) : int =
      ...

Next, we fill in the body of the function by deconstructing the input and reconstructing a result.

    (* convert a boolean to an integer:
     * bool_to_int true = 1
     * bool_to_int false = 0
     *)
    let bool_to_int (b:bool) : int =
      match b with
      | true -> 1
      | false -> 0

Finally, we will convert our examples into tests. Below I've created a series of tests from our examples using assert statements. An assertion is simply an expression with boolean type wrapped in the keyword `assert`. Assertions have the benefit that they may be turned off in production code by using the compiler option `-noassert`. Hence, you can put them in your code for testing purposes but suffer no performance penalty when deploying your final product. See the OCaml manual for more. Here is our final code with our tests.

    let _ =
      assert (bool_to_int true = 1);
      assert (bool_to_int false = 0);

Notice that once the examples have become tests, we can delete the portion of the comment that contained them. The examples were a good intermediate step for our own thinking process, but the tests are almost identical to the commented examples, so keeping both is redundant. The code itself is simple and clear enough that additional comments just get in the way. It is better style to omit them and "let the code do the talking" in this case. (As an aside, notice that any function application "`f arg`" has higher precedence than an operator such as "=", so you do not need parens around `bool_to_int true`.)

If you compile your file (using ocamlbuild) and run it, and an assertion fails, the `Assert_failure` exception is raised with the source file name and the location of the boolean expression as arguments.
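To see that a failed assertion really is an ordinary exception, here is a minimal sketch that catches it. The helper name `assertion_fails` is hypothetical, and it uses `try ... with` and an anonymous function (`fun () -> ...`), neither of which this lecture covers. The exception carries the file name, line number, and column as a triple.

```ocaml
(* assertion_fails: a hypothetical helper, not part of the lecture code.
   It runs a check and reports whether an assertion inside it failed,
   instead of letting the exception end the program. *)
let assertion_fails (check : unit -> unit) : bool =
  try
    check ();
    false
  with Assert_failure (_file, _line, _column) -> true

let () =
  (* a true assertion does nothing, so no exception is caught *)
  assert (not (assertion_fails (fun () -> assert (1 + 1 = 2))));
  (* a false assertion raises Assert_failure, which the helper catches *)
  assert (assertion_fails (fun () -> assert (1 + 1 = 3)))
```

In everyday testing you would normally let the exception escape, since the file name and location it reports are exactly what you need to find the failing test.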

By the way, a synonym for a match statement on booleans like the one above is an if-then-else statement. Hence, a completely equivalent piece of code is as follows. Notice, of course, that our tests do not change when we change how we write our function. Creating durable tests for a function helps keep code correct as it evolves.

    (* convert a boolean to an integer *)
    let bool_to_int (b:bool) : int =
      if b then 1 else 0

    let _ =
      assert (bool_to_int true = 1);
      assert (bool_to_int false = 0);

Most programmers will write functions on a single boolean using an if statement. We introduced the idea of analyzing a boolean value using pattern matching because pattern matching is the general paradigm that programmers use to deconstruct data. An if statement is a special case that only really exists for historical reasons and because programmers coming from other kinds of languages feel comfortable with it.
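As a quick check of that equivalence, the sketch below writes one function both ways and asserts that the two versions agree on every possible input. The names `sign_match` and `sign_if` are hypothetical and exist only for this comparison.

```ocaml
(* Two hypothetical versions of the same function, written to compare
   the two styles. Both return -1 when the flag is true and 1 otherwise. *)
let sign_match (negative : bool) : int =
  match negative with
  | true -> -1
  | false -> 1

let sign_if (negative : bool) : int =
  if negative then -1 else 1

(* bool has only two values, so these two tests cover every input *)
let () =
  assert (sign_match true = sign_if true);
  assert (sign_match false = sign_if false)
```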

OK, that's booleans. By the way, if you were thinking "oh my god, I can't believe he spent so much time on such a simple function," well, you are right. It was pretty easy -- a proficient OCaml programmer can write that function in 5 seconds. Onwards and upwards!

### Tuples

The type `t1 * t2` represents pairs of values where the first component of the pair has type `t1` and the second component has type `t2`. The type `t1 * t2 * t3` represents triples of values where the first component of the triple has type `t1`, the second component has type `t2` and the third component has type `t3`. An n-tuple has type `t1 * t2 * ... * tn` and has n such components.

We create pair, triple, or n-tuple values by writing down a series of expressions separated by commas and enclosed in parentheses. For example:

    let name_and_age1 : string * string * int = ("David", "Walker", 25)
    let name_and_age2 : string * string * int = ("Brian", "Kernighan", 15)

**Example.**
To analyze a pair or any other kind of tuple, we may again use pattern
matching. Let's write a function to extract the string components of
a triple like the one above and return a string. According to our
methodology, we write the name and types first along with a comment.
Then we add some examples.

    (* create a string from the first two components, separated by a space *)
    let full_name (name_and_age : string * string * int) : string =
      ...

    let _ =
      assert(full_name name_and_age1 = "David Walker");
      assert(full_name name_and_age2 = "Brian Kernighan");

To fill in the body of the function, we use pattern matching to extract the content we need from the triple. Recall that the "^" operator concatenates two strings.

    (* create a string from the first two components, separated by a space *)
    let full_name (name_and_age : string * string * int) : string =
      match name_and_age with
      | (first, last, _) -> first ^ " " ^ last

    let _ =
      assert(full_name name_and_age1 = "David Walker");
      assert(full_name name_and_age2 = "Brian Kernighan");

Above, the pattern `(first, last, _)` matches any triple; the variable `first` is bound to the first value in the triple and the variable `last` is bound to the second value in the triple. The "_" is a pattern that matches any value. It informs the reader of the code that "I don't care about this value." In this function, we don't use the contents of the age component, so the underscore pattern is appropriate.

Whenever a match statement contains just one pattern, a programmer may replace the match statement with a let statement. For instance, the following is a bit more compact.

    (* create a string from the first two components, separated by a space *)
    let full_name (name_and_age : string * string * int) : string =
      let (first, last, _) = name_and_age in
      first ^ " " ^ last

Even more compact, we can place the pattern match in the function argument position:

    (* create a string from the first two components, separated by a space *)
    let full_name ((first, last, _) : string * string * int) : string =
      first ^ " " ^ last

While these latter two examples are more compact, it is important to understand that pair patterns are just like boolean patterns or integer patterns or any other kind of pattern in that they may be used within a match expression. In more complicated examples, we may use several different patterns in conjunction to analyze and extract information from an input.
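As a minimal sketch of combining patterns, the hypothetical function `pick` below takes a pair whose first component is a boolean and whose second component is itself a pair of integers. Each branch mixes a boolean constant pattern, a nested pair pattern, a variable, and a wildcard.

```ocaml
(* pick: a hypothetical example, not part of the lecture code.
   If the flag is true, return the first component of the inner pair;
   otherwise return the second. *)
let pick (input : bool * (int * int)) : int =
  match input with
  | (true, (x, _)) -> x
  | (false, (_, y)) -> y

let () =
  assert (pick (true, (1, 2)) = 1);
  assert (pick (false, (1, 2)) = 2)
```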

**Example.**
Define a function that computes the disjunction of a pair of booleans.

    (* compute disjunction *)
    let or_pair (p:bool*bool) : bool =
      ...

    let _ =
      assert(or_pair (true,true) = true);
      assert(or_pair (true,false) = true);
      assert(or_pair (false,true) = true);
      assert(or_pair (false,false) = false);

Next, we fill in the body of the function by deconstructing the input and reconstructing a result.

    (* compute disjunction *)
    let or_pair (p:bool*bool) : bool =
      match p with
      | (true,true) -> true
      | (true,false) -> true
      | (false,true) -> true
      | (false,false) -> false

    let _ =
      assert(or_pair (true,true) = true);
      assert(or_pair (true,false) = true);
      assert(or_pair (false,true) = true);
      assert(or_pair (false,false) = false);

Now, since the input contains a pair of booleans and each boolean contains two different values, true and false, it is natural that we would start out writing our function using 4 cases (2*2 = 4). However, we might now observe that these 4 cases may be written as two using a wildcard pattern:

    (* compute disjunction *)
    let or_pair (p:bool*bool) : bool =
      match p with
      | (false,false) -> false
      | _ -> true

    let _ =
      assert(or_pair (true,true) = true);
      assert(or_pair (true,false) = true);
      assert(or_pair (false,true) = true);
      assert(or_pair (false,false) = false);
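The same simplification should apply to conjunction, where the one special case is the pair of two `true` values. Here is a sketch; the name `and_pair` is hypothetical and chosen only to mirror `or_pair`.

```ocaml
(* compute conjunction; and_pair is a hypothetical companion to or_pair *)
let and_pair (p : bool * bool) : bool =
  match p with
  | (true, true) -> true
  | _ -> false

let () =
  assert (and_pair (true, true) = true);
  assert (and_pair (true, false) = false);
  assert (and_pair (false, true) = false);
  assert (and_pair (false, false) = false)
```

Because patterns are checked in order, the specific case must come before the wildcard; if the wildcard came first, it would match every input.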

**Example.** Write a function that counts the number
of true values in a 5-tuple.

    (* count the number of occurrences of "true" in the input *)
    let count5 (p:bool*bool*bool*bool*bool) : int =
      ...

    let _ =
      assert(count5 (true,true,true,true,true) = 5);
      assert(count5 (false,false,true,false,false) = 1);
      assert(count5 (false,false,false,false,false) = 0);

Perhaps your first instinct when writing the function is to once again break down the tuple of booleans into cases as follows.

    (* count the number of occurrences of "true" in the input *)
    let count5 (p:bool*bool*bool*bool*bool) : int =
      match p with
      | (false, false, false, false, false) -> 0
      | (true, false, false, false, false) -> 1
      | (false, true, false, false, false) -> 1
      ...

... but that is clearly going to get way out of hand, and so many cases are going to be hard to read and verify. How can we reduce the number of cases? Well, "counting" a single boolean is easy -- it involves just two cases (as all functions on single booleans do!) -- if the boolean is true, it returns 1 and if it is false, it returns 0. We'll just use that function 5 times and sum the results. Moreover, we've already written a function to "count" a single boolean -- it is called `bool_to_int`! How lucky is that? (Even if we hadn't written it already, writing it now would take 5 seconds and be far easier and clearer than writing the 2^5 patterns we would have had to write if we followed the naive approach.) Here is the code.

    (* count the number of occurrences of "true" in the input *)
    let count5 (p:bool*bool*bool*bool*bool) : int =
      let (b1, b2, b3, b4, b5) = p in
      bool_to_int b1 + bool_to_int b2 + bool_to_int b3
        + bool_to_int b4 + bool_to_int b5

    let _ =
      assert(count5 (true,true,true,true,true) = 5);
      assert(count5 (false,false,true,false,false) = 1);
      assert(count5 (false,false,false,false,false) = 0);

### Unit Type

We have talked about pairs, which are tuples with 2 fields. We have talked about triples, which are tuples with 3 fields. We have talked about quintuples, which are tuples with 5 fields. Ever consider what a tuple with 0 fields looks like? It looks like this: `()`. In OCaml, this value is referred to colloquially as "unit" and its type is also called `unit`.

Surprisingly, even though the unit value has no information content, it is quite heavily used! Whenever an expression has an effect on the outside world but returns no interesting data, unit is its type. For instance, expressions that do nothing but print data to stdout will typically have type unit. The following is an example of an expression with type unit:

    print_string "hello world\n\n"

Assertions are also expressions with type unit. Why? Because when an assertion succeeds, it does nothing, returning the unit value.

It is also possible for a function to have no interesting input -- such functions already contain all the data they need to execute. In such cases, `unit` is a reasonable argument type. Like other types, one can pattern match on expressions with unit type -- the pattern is `()`.

**Example (Poor Style).**
Here is a function that prints hello world:

    let hello_world (x:unit) : unit =
      match x with
      | () -> print_string "hello world\n"

However, as with other kinds of tuples, since there is only one branch of the match expression, we can (and should) shift the pattern into the argument position as follows. (Note that we omit the type of the argument in this revision, since the argument type `unit` is fully determined by the pattern `()`.)

**Example (Better Style).**

    let hello_world () : unit =
      print_string "hello world\n"

**Example.**
Sometimes, we need to execute several unit-valued expressions in a row.
We could use successive pattern matching, but that is overly verbose.
Instead, use a semicolon to separate one unit-valued expression from the next.

    let hello_world () : unit =
      print_string "hello";
      print_string " ";
      print_endline "world"
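To make the comparison concrete, here is a sketch of both styles side by side. The two function names are hypothetical, and the sketch appends to a `Buffer.t` from the standard library instead of printing, purely so that the result can be checked with an assertion; `Buffer.add_string` returns unit just as `print_string` does.

```ocaml
(* Successive pattern matching: each unit result is matched against (). *)
let greet_with_matching (b : Buffer.t) : unit =
  let () = Buffer.add_string b "hello" in
  let () = Buffer.add_string b " " in
  Buffer.add_string b "world"

(* The same sequence written with semicolons. *)
let greet_with_semicolons (b : Buffer.t) : unit =
  Buffer.add_string b "hello";
  Buffer.add_string b " ";
  Buffer.add_string b "world"

let () =
  let b1 = Buffer.create 16 in
  let b2 = Buffer.create 16 in
  greet_with_matching b1;
  greet_with_semicolons b2;
  assert (Buffer.contents b1 = "hello world");
  assert (Buffer.contents b2 = Buffer.contents b1)
```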

**Example.** Now that we know that assertions are just expressions with unit type, we can use them inside functions to check necessary conditions of our inputs and verify the correctness of our outputs.

    (* precondition: 0 <= n < String.length s
     * returns the nth character of s
     *)
    let nth (s:string) (n:int) : char =
      assert (0 <= n && n < String.length s);
      String.get s n

**Example.** We can also use assertions within unit-valued functions as part of our testing apparatus. An effective way to test your functions is often to compute the same answer in two different ways. For instance, we know that disjunction should be symmetric. If we find it isn't, we must have an error. Here is some simple code to test for symmetry.

    let test_symmetry (x:bool) (y:bool) : unit =
      assert(or_pair (x,y) = or_pair (y,x))

    let _ =
      test_symmetry true false;
      test_symmetry false false;
      test_symmetry true true;
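Another way to "compute the same answer in two different ways" is to compare a function against an existing operation that should agree with it. The sketch below checks `or_pair` against OCaml's built-in `||` operator. The name `test_against_builtin` is hypothetical, and `or_pair` is repeated here only so that the sketch stands on its own.

```ocaml
(* or_pair as defined earlier, repeated so this sketch is self-contained *)
let or_pair (p : bool * bool) : bool =
  match p with
  | (false, false) -> false
  | _ -> true

(* hypothetical test: or_pair should agree with the built-in || operator *)
let test_against_builtin (x : bool) (y : bool) : unit =
  assert (or_pair (x, y) = (x || y))

(* a pair of booleans has only four possible values, so we can try them all *)
let () =
  test_against_builtin true true;
  test_against_builtin true false;
  test_against_builtin false true;
  test_against_builtin false false
```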

### Option Types

An option type, written `t option`, contains two sorts of values, the value `None` and the value `Some v` where `v` is a value with type `t`.

**Example.** A point is a pair of integer coordinates. Write a function that finds the slope of a line between two points. Return `None` if the line is vertical and the slope is undefined. Return `Some slope` otherwise.

To start out, it is useful to define a type abbreviation for points:

    type point = int * int

When defining a type abbreviation, the new type name (i.e., `point`) is in every way identical to its definition (i.e., `int * int`). Hence we may now use `point` and `int * int` interchangeably in our code. However, using `point` (where the data in question does in fact represent a point) makes the code easier to read. It is good documentation and good style. Now, on to computing the slope of a line between two points.

    (* slope of a line:
     * slope (0,0) (2,1) = Some 0.5
     * slope (2,-1) (2,17) = None
     *)
    let slope (p1:point) (p2:point) : float option =
      ...

Since computations with floating point values may be imprecise, we'll start with our examples in comments and create some tests from them afterwards. On to the body of the function:

    (* slope of a line:
     * slope (0,0) (12,0) = Some 0.0   -- horizontal line
     * slope (2,-1) (2,17) = None      -- vertical line
     * slope (0,0) (1,1) = Some 1.0    -- 45 degree angle
     *)
    let slope (p1:point) (p2:point) : float option =
      let (x1,y1) = p1 in
      let (x2,y2) = p2 in
      let xd = x2 - x1 in
      let yd = y2 - y1 in
      (* <> is structural inequality, the idiomatic test on integers *)
      if xd <> 0 then
        Some ( float_of_int yd /. float_of_int xd )
      else
        None

Notice that we used pattern matching on points to extract their components. This is perfectly legal since a point is a pair and pattern matching on pairs is legal. Before dividing yd by xd, we tested xd for zero. If it is not zero, we divide and return a Some. If it is zero, we return None.
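A caller of `slope` then has to deconstruct the option it gets back, and the type forces it to say what happens in the `None` case. Here is a sketch of such a caller; the function `describe` and its three result strings are hypothetical, and a version of `slope` is repeated only so that the sketch stands on its own.

```ocaml
type point = int * int

(* a version of slope from above, repeated so this sketch is self-contained *)
let slope (p1 : point) (p2 : point) : float option =
  let (x1, y1) = p1 in
  let (x2, y2) = p2 in
  let xd = x2 - x1 in
  let yd = y2 - y1 in
  if xd <> 0 then Some (float_of_int yd /. float_of_int xd) else None

(* describe: a hypothetical caller that deconstructs the result of slope.
   Comparing with 0.0 exactly is safe here only because a zero numerator
   divided by a nonzero denominator is exactly zero. *)
let describe (p1 : point) (p2 : point) : string =
  match slope p1 p2 with
  | None -> "vertical"
  | Some s -> if s = 0.0 then "horizontal" else "sloped"

let () =
  assert (describe (2, -1) (2, 17) = "vertical");
  assert (describe (0, 0) (12, 0) = "horizontal");
  assert (describe (0, 0) (1, 1) = "sloped")
```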

When testing floating point results, we would like to test that the results are within an acceptable range, as opposed to being exactly equal to some constant we write down in our file, because floating point arithmetic is imprecise. Hence, to facilitate testing, we will write another function, `in_range`, to help us. Of course, whenever we write code to help us test our program functionality, it is possible the testing code is incorrect. Nevertheless, it usually helps us detect errors in our work because writing a computation two different ways typically helps weed out errors. (Sometimes we'll find an error in our test when the function being tested is correct. That's ok, we can quickly fix the test.) Here's a testing function:

    (* in_range:
     * true if x is (Some f) and f is between low and high, inclusive
     * false otherwise
     *)
    let in_range (x: float option) (low:float) (high:float) =
      match x with
      | None -> false
      | Some f -> (low <= f && f <= high)

    let _ =
      assert(in_range (Some 0.0) (-1.0) (2.0));
      assert(in_range (Some 0.0) (0.0) (0.0));
      assert(not (in_range (Some 3.0) (5.0) (-2.0)));
      assert(not (in_range (None) (-100.0) (200.0)));

Now that we have in_range, testing slope is made easier.

    let _ =
      assert(slope (0,0) (0,0) = None);
      assert(slope (1,17) (1,-15) = None);
      assert(in_range (slope (0,0) (1,0)) (-0.0001) 0.0001);
      assert(in_range (slope (0,0) (1,1)) (0.9999) 1.0001);

### Summary

Strong, precise type systems help guide the construction of functions. Typically, we analyze the inputs to our functions according to their type and build outputs for our functions, again, according to their type. The following table summarizes the types we have looked at, the shape of the patterns for analyzing values of those types, and the common deconstruction idiom for each type.

Type T | Pattern(s) | Common Deconstruction |
---|---|---|
`bool` | `true`; `false` | `if e then ... else ...` |
`t1 * t2` | `(pat,pat)` | `let (x,y) = e in ...` |
`t1 * t2 * t3` | `(pat,pat,pat)` | `let (x,y,z) = e in ...` |
`t1 * ... * tn` | `(pat,...,pat)` | `let (x1,...,xn) = e in ...` |
`unit` | `()` | `e; ...` |
`t option` | `None`; `Some pat` | `match e with None -> ... \| Some x -> ...` |
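To tie the rows of the table together, here is a sketch of one small function that uses each deconstruction idiom once. The function `summarize`, its input type, and its output strings are all hypothetical and chosen only for illustration.

```ocaml
(* summarize: a hypothetical function whose input combines the types in
   the table: a pair of a bool and an int option. *)
let summarize (input : bool * int option) : string =
  (* tuple: deconstruct with a let pattern *)
  let (flag, maybe_n) = input in
  (* bool: deconstruct with if-then-else *)
  let prefix = if flag then "on" else "off" in
  (* option: deconstruct with match *)
  match maybe_n with
  | None -> prefix
  | Some n -> prefix ^ " " ^ string_of_int n

(* unit: sequence the unit-valued assertions with semicolons *)
let () =
  assert (summarize (true, Some 3) = "on 3");
  assert (summarize (false, None) = "off")
```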