# Assignment 6: Lazy Programming

You may do this assignment in pairs. Each member of the pair is responsible for
participating in all elements of the assignment.

## Quick Links:

- [Part 1](#part-1-lazy-streams-streamsml)
- [Part 2](#part-2-memoization)
- [Part 3](#part-3-property-based-testing)

## Part 1: Lazy Streams (`streams.ml`)

In class, we looked at a couple of ways to implement infinite streams, which you
can check out in the lecture notes. In the referenced code, we repeatedly used
functions with type `unit -> 'a * 'a stream`. Such functions are sometimes called
"suspended computations." They are very much like plain expressions with type
`'a * 'a stream` except that plain expressions are *evaluated right away*
whereas functions are not evaluated until they are called. In other words,
evaluation is *delayed*. That delay was critical in our implementation of the
stream infinite data structure. Without the delay, we would have written code
containing infinite loops.
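As a small illustration of why the delay matters, here is a sketch of an infinite stream of integers. The `stream` type below is an assumption made for this example (the definition in the lecture notes may differ), and `from` and `take` are hypothetical helper names.

```ocaml
(* Assumed stream type: a suspended computation that yields a head
   and the rest of the stream. The lecture's definition may differ. *)
type 'a stream = Stream of (unit -> 'a * 'a stream)

(* The integers from n upward. The recursive call sits under `fun ()`,
   so it runs only when the tail is actually demanded. *)
let rec from (n : int) : int stream =
  Stream (fun () -> (n, from (n + 1)))

(* Hypothetical helper: the first k elements of a stream, as a list. *)
let rec take (k : int) (Stream f : 'a stream) : 'a list =
  if k <= 0 then []
  else
    let (x, rest) = f () in
    x :: take (k - 1) rest

let first_five = take 5 (from 0)
```

Without the `fun () ->` wrapper, `from` would call itself immediately and never return.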

In general, we can implement suspended or delayed computations that return a
value with type `'a` with just a function with type `unit -> 'a`. Let's try it:

```
type 'a delay = unit -> 'a

(* create a delayed computation *)
let create_delay (d:unit -> 'a) = d

(* execute a delayed computation *)
let force_delay (d:'a delay) : 'a = d ()
```

You'll notice, however, that this implementation (like our original stream
implementation) can incur extra, unnecessary work when we force the same delay
more than once. For instance:

```
let adder () =
  let l = [1;2;3;4;5;6;7;8;9;0] in
  List.fold_left (+) 0 l

let x = force_delay adder
let y = force_delay adder
```

Above, we run over the list l once to compute x and then we do exactly the same
thing again to compute y. A smarter implementation would use a technique called
*memoization*, which saves away the result of forcing a delay the first time so
that any subsequent time the computation is forced, it can simply look up the
already-computed result. A delayed computation coupled with memoization is
called a *lazy* computation. Here is how we might implement a general-purpose
library for lazy computations, as seen in lecture, but with some added error
handling:

```
type 'a thunk = Unevaluated of (unit -> 'a) | Evaluated of 'a | Error of exn
type 'a lazy_t = ('a thunk) ref

let create_lazy (f : unit -> 'a) : 'a lazy_t =
  ref ( Unevaluated f )

let force_lazy (l:'a lazy_t) : 'a =
  match !l with
  | Error exn -> raise exn
  | Evaluated a -> a
  | Unevaluated f ->
        try
          let result = f () in
          l := Evaluated result;
          result
        with
        | exn -> l := Error exn;
                 raise exn
```

Note that if the underlying function f that you use to create a lazy computation
has no effects other than raising an exception&mdash;does not print, does not
mutate external data structures, does not read from standard input, etc.&mdash;
then an outside observer cannot tell the difference between a lazy computation
and an ordinary function with type `unit -> 'a` (except that the lazy
computation is faster if you force it a 2nd, 3rd or 4th time). If, on the other
hand, your function f does have some effects (like printing) then you will see a
difference between using a lazy computation and using a function with type `unit
-> 'a`. For instance, this code:

```
let a = create_lazy (fun () -> (1 + (print_string "hello\n"; 1)))

let _ = force_lazy a
let _ = force_lazy a
```

only prints `hello\n` once, not twice.
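The error-handling case can be observed in the same way. The sketch below repeats the library definitions so that it runs on its own; the counter `runs`, the value `failing`, and the helper `attempt` are hypothetical names introduced only for this example.

```ocaml
(* Library definitions repeated from above so the block is self-contained. *)
type 'a thunk = Unevaluated of (unit -> 'a) | Evaluated of 'a | Error of exn
type 'a lazy_t = 'a thunk ref

let create_lazy (f : unit -> 'a) : 'a lazy_t = ref (Unevaluated f)

let force_lazy (l : 'a lazy_t) : 'a =
  match !l with
  | Error exn -> raise exn
  | Evaluated a -> a
  | Unevaluated f ->
    (try
       let result = f () in
       l := Evaluated result;
       result
     with exn ->
       l := Error exn;
       raise exn)

(* Hypothetical counter: how many times the suspended body has run. *)
let runs = ref 0

let failing : int lazy_t =
  create_lazy (fun () -> incr runs; failwith "boom")

(* Force the computation, turning a Failure into None. *)
let attempt () : int option =
  try Some (force_lazy failing) with Failure _ -> None

let first = attempt ()
let second = attempt ()
```

Both attempts fail, but the second one re-raises the saved exception without running the body again, so `runs` ends up at 1.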

You'll notice that it is a little bit verbose to have to write things like:

```
let a = create_lazy (fun () -> ...)
```

Consequently, OCaml provides convenient built-in support for creating and using
lazy computations via the `Lazy` module, which is
[documented here](https://caml.inria.fr/pub/docs/manual-ocaml/libref/Lazy.html).
This module is not just an ordinary module&mdash;it is really a language
extension as it changes the way OCaml code is evaluated. In particular, any code
that appears inside the lazy constructor

```
lazy (...)
```

is suspended and is not executed until the lazy computation is forced. It is
just as if you had written

```
create_lazy (fun () -> ...)
```
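A minimal sketch of the built-in version, using the `lazy` keyword and `Lazy.force` from the standard library. The counter `evaluations` is a hypothetical name used only to observe when the suspended code runs.

```ocaml
(* Hypothetical counter: incremented each time the suspended body runs. *)
let evaluations = ref 0

(* Nothing inside `lazy (...)` runs at this point. *)
let a : int Lazy.t = lazy (incr evaluations; 1 + 1)

let before_forcing = !evaluations

(* The first force runs the body; the second reuses the saved result. *)
let first = Lazy.force a
let second = Lazy.force a

let after_forcing = !evaluations
```

Here `before_forcing` is 0 and `after_forcing` is 1, even though the value was forced twice.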

**See the file `streams.ml` for a series of questions concerning implementing
streams using OCaml's `Lazy` module. You will also need to read the OCaml
documentation on `Lazy` data
[here](https://caml.inria.fr/pub/docs/manual-ocaml/libref/Lazy.html).**

## Part 2: Memoization

A lazy computation uses memoization to avoid recomputing the result of executing
a single expression. A more general kind of memoization avoids recomputing the
results of a whole class or set of expressions. In this part of the assignment,
we will explore using memoization for functions instead of simple expressions. A
memoized function f computes f x at most once for each argument x.

To build generic support for function memoization, we will use a dictionary to
store a mapping from function inputs to outputs already computed. You can think
of this dictionary like a cache if you want: The cache saves away function
results for later reuse.

To begin, look at the naive implementation of the Fibonacci sequence defined in
the module `Fib` provided in `fib.ml`.

```
(* slow fib! *)
module Fib : FIB =
struct
  let rec fib (n : int) : int =
    if n > 1 then
      fib (n-1) + fib (n-2)
    else
      n
end
```

Anyone who has taken COS 226 will recognize this as a horrendously inefficient
implementation of the Fibonacci sequence that takes exponential time because we
recompute fib n for small values of n over and over and over again instead of
reusing computation. A much more efficient, linear-time way to compute the
Fibonacci numbers in OCaml is something like this:

```
(* fast fib! *)
module FastFib : FIB =
struct
  let fib (n : int) : int =
    (* f1 is fib(i-1) and f2 is fib (i-2) *)
    let rec aux i f1 f2 =
      if i = n then
        f1 + f2
      else
        aux (i+1) (f1+f2) f1
    in
    if n > 1 then aux 2 1 0
    else n
end
```

Intuitively, what is happening here is that instead of recomputing `fib(n-1)`
and `fib(n-2)` over and over again, we are saving those results and then reusing
them. When computing the Fibonacci sequence, one only has to save away the last
two results to compute the next one. In other computations, we must save away a
lot of data, not just two elements. Sometimes we don't know exactly what we need
to save so we might want to save all the data we have space for. There are
entire businesses built around memoization and caching like this.
[Akamai](https://www.akamai.com/) is one that comes to mind&mdash;they cache web
page requests on servers close to customers who make requests in order to make
web response times faster. In any event, the idea of caching is clearly a
fundamental concept in computer science. Hence, we are going to build generic
caching infrastructure for OCaml computations. This generic caching
infrastructure is going to save away *all* results computed by a function. This
is more than we need to save away to compute Fibonacci *once* (and it is a big
waste of space in this case), but if we wanted to call Fibonacci many times,
there would be savings across the many calls. (True, there aren't many critical
applications that I can think of that make 1,000,000 calls to the Fibonacci
function but it does make a good, simple test case!)
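To make the cost of the naive version concrete, the sketch below counts how many calls it makes. `counted_fib` is a hypothetical copy of the slow `fib` with a call counter added; it is not part of the handout code.

```ocaml
(* Hypothetical counter: total number of calls made so far. *)
let calls = ref 0

(* Same recursion as the slow fib, plus a counter bump on every call. *)
let rec counted_fib (n : int) : int =
  incr calls;
  if n > 1 then counted_fib (n - 1) + counted_fib (n - 2)
  else n

let result = counted_fib 20
let naive_calls = !calls
```

Computing the 20th Fibonacci number (6765) this way takes 21891 calls, whereas the loop in `FastFib` performs only 19 steps for the same input.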

### Question 2.1

Finish the `MemoFib` functor in `fib.ml` by writing a memoized version of
Fibonacci. In this implementation, you will save away *all* results from ever
calling the function `fib`, including all recursive calls that `fib` might make
to itself. You will do this by representing the (input,output) mapping using a
reference containing a persistent dictionary. It is important that the
dictionary be shared between all calls to the memoized `fib`, so that results
are reused between multiple top-level calls (i.e., if you call fib 10000 first
and then some time later in your application call fib 10001, the second call
should be instantaneous since you'll reuse all the work you did on the first call).

In this assignment, we will use OCaml's `Map` library to implement dictionaries.
To investigate OCaml's `Map` library, start by looking
[here](https://caml.inria.fr/pub/docs/manual-ocaml/libref/Map.html). You'll note
that from that web page, you can click on links to find definitions of the
[`Map.OrderedType`](https://caml.inria.fr/pub/docs/manual-ocaml/libref/Map.OrderedType.html)
signature and of
[`Map.S`](https://caml.inria.fr/pub/docs/manual-ocaml/libref/Map.S.html), which
is the signature of a module implementing a `Map`. The functor `Map.Make` takes
a module with signature `Map.OrderedType` as an argument and produces a module
with signature `Map.S` as a result.
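As a quick orientation, here is a sketch of applying `Map.Make` to integer keys and using the resulting module. The definition of `IntOrder` below is an assumption for this example (the handout may define it differently), and `IntMap`, `squares`, and `cache` are hypothetical names.

```ocaml
(* Assumed key module: integers ordered by the built-in comparison. *)
module IntOrder = struct
  type t = int
  let compare (a : int) (b : int) : int = compare a b
end

module IntMap = Map.Make (IntOrder)

(* Maps are persistent: add returns a new map and leaves the old one alone. *)
let squares : int IntMap.t =
  IntMap.empty |> IntMap.add 2 4 |> IntMap.add 3 9

let bigger = IntMap.add 4 16 squares

let found = IntMap.find_opt 3 squares        (* Some 9 *)
let missing = IntMap.find_opt 5 squares      (* None *)
let old_unchanged = not (IntMap.mem 4 squares)

(* A reference holding a persistent map can be updated in place. *)
let cache : int IntMap.t ref = ref IntMap.empty
let () = cache := IntMap.add 7 49 !cache
let cached = IntMap.find_opt 7 !cache        (* Some 49 *)
```

Because the map itself never changes, any state that must survive across calls has to live in the reference, not in the map.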

If you don’t know where to start implementing `MemoFib`, one good strategy is to
use a pair of mutually recursive functions: make one function in the pair the
fib function required by the signature; make the other function responsible for
checking and updating the mapping. The benefit to this strategy is that it lets
you separate memoizing from the function being memoized.

### Question 2.2

Instead of hand-rolling a new version of every function that we’d like to
memoize, it would be nice to have a higher-order function that produces a
memoized version of any function. A totally reasonable—but wrong—first attempt
at writing such an automatic memoizer is shown below (and also may be found in
`memoizer.ml`).

```
module PoorMemoizer (D : DICT) : (POORMEMOIZER with type key = D.key) =
struct
  type key = D.key

  let memo (f : key -> 'a) : key -> 'a =
    let f_memoed x =
      let history = ref (D.empty) in
      try D.find x (!history) with
        Not_found ->
          let result = f x in
          history := D.add x result (!history);
          result
    in
    f_memoed
end
```

What is wrong with this code? For example, apply the functor and use it to
memoize an implementation of Fibonacci like this: 

```
module BadMemo = PoorMemoizer(Map.Make(IntOrder))

let bad_fib = BadMemo.memo (Fib.fib)

let _ = bad_fib big_number
```

You should observe that bad_fib is much slower than the hand-rolled version you 
wrote. (If not, your hand-rolled version is wrong!) Why?  Try tracing through 
the memo function in PoorMemoizer assuming that f is the exponential time fib 
function.  What happens when f is called and then calls itself recursively?

The goal of this part is to finish off the `Memoizer` functor in `memoizer.ml`
by writing an automatic memoizer that doesn’t have the problems of the
`PoorMemoizer`. Notice that `Memoizer` has a different signature than
`PoorMemoizer`. Functions that can be memoized by `Memoizer` take a new
argument: rather than having type `key -> 'a`, they have type `(key -> 'a) ->
(key -> 'a)`. When implementing `Memoizer`, assume that any function you memoize
uses this new first argument instead of directly calling itself recursively. As
an example, here is the factorial function crafted in this style:

```
let fact_body (recurse:int->int) (n:int) : int =
  if n <= 0 then 1
  else n * (recurse (n-1))


let rec fact (n:int) : int =
  fact_body fact n
```

Notice how easy it is now to reuse the body of fact but print out intermediate
results after every intermediate call to fact.

```
let rec printing_fact (n:int) : int =
  let result = fact_body printing_fact n in
  print_int result; print_newline();
  result
```

Try out that code to make sure you understand it. Recall also that this is a
very similar technique to what we used in the evaluator code in earlier lectures
and in Assignment 4. Search through that code and near the bottom you will see
the functions `eval_body`, `eval`, and `debug_eval`. All three are closely
related to our variants of factorial.
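The same trick supports other wrappers too. As a sketch, the hypothetical `counting_fact` below reuses `fact_body` unchanged and counts how many calls are made; `call_count` is a name introduced only for this example.

```ocaml
(* The factorial body from above, repeated so the block runs on its own. *)
let fact_body (recurse : int -> int) (n : int) : int =
  if n <= 0 then 1
  else n * (recurse (n - 1))

(* Hypothetical counter: number of calls made through the wrapper. *)
let call_count = ref 0

(* Every recursive call goes back through counting_fact, so each one
   bumps the counter before delegating to the shared body. *)
let rec counting_fact (n : int) : int =
  incr call_count;
  fact_body counting_fact n

let result = counting_fact 5
```

Computing `counting_fact 5` yields 120 after 6 calls (for n = 5 down to 0), because the wrapper sees every recursive call, not just the top-level one.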

### Question 2.3

In `fib.ml`, finish the structure `AutoMemoedFib` using your `Memoizer` functor.
This will let you test your `Memoizer` structure to make sure that you solved
the problem. The Fibonacci implementation produced by your `Memoizer` functor
should be very nearly as fast as the hand-rolled version in `MemoFib`.

**Rhetorical question:** (Don't hand in an answer but discuss with your
classmates, prof or TA) What happens if you use `Memoizer` to memoize a function
that has effects? In particular, what happens if you memoize a function that
prints things?

## Part 3: Property-Based Testing

To do this part of the assignment, you must have installed the qcheck package on
your machine. This can be done by executing:

```
opam install qcheck
```

The materials for Part 3 can be found in `lib/part.ml`, which contains a module
that implements the "partition" function for quicksort.
You should know how quicksort works from COS 226, but if you've forgotten,
you can google it.

The module `Partition` contains a single function called `partition`. Here is the
signature of the module.

```
module type PARTITION = sig
  val partition : int -> int list -> int list * int list
end

module Partition : PARTITION
```

The implementation we have given you is correct.  

Your goal is to develop property-based tests using the QCheck library that fully
test the functionality of the `partition` function. To do that, you will
implement the `Test` functor, which also appears in `handout-pbt/lib/part.ml`.
That functor will take an implementation of `PARTITION` and apply a bunch of
tests to it. We have developed a number of incorrect implementations of
`PARTITION`. We won't show you what our incorrect implementations do. It is up
to you to create enough tests (i.e., to specify the `partition` function tightly
enough) that your tests detect the errors in our incorrect implementations. We
will apply your `Test` functor to each of our implementations and run the tests
generated.

Note: when developing your tests, you are not allowed to use the correct
version of `partition` (or your own reimplementation of `partition`) or an
implementation of quicksort that uses `partition`. Your goal is to think about
what properties the `partition` function should have and to use those properties
as tests.
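To illustrate what a "property" looks like, here is a hand-rolled sketch that uses only the standard library and a different function, `List.sort`, as its subject. It does not use QCheck, and the names `random_list`, `is_sorted`, `prop_sort`, and `check` are all hypothetical. Note that the property never calls a reference implementation; it only states facts that any correct output must satisfy.

```ocaml
(* Hypothetical generator: a list of fewer than 20 ints, each in [0, 100). *)
let random_list (st : Random.State.t) : int list =
  List.init (Random.State.int st 20) (fun _ -> Random.State.int st 100)

(* True when every element is <= its successor. *)
let rec is_sorted (l : int list) : bool =
  match l with
  | x :: (y :: _ as rest) -> x <= y && is_sorted rest
  | _ -> true

(* Property: sorting preserves the length and yields an ordered list. *)
let prop_sort (l : int list) : bool =
  let s = List.sort compare l in
  List.length s = List.length l && is_sorted s

(* Check a property on `count` randomly generated inputs. *)
let check ~(count : int) (prop : int list -> bool) : bool =
  let st = Random.State.make [| 42 |] in
  let rec loop i = i >= count || (prop (random_list st) && loop (i + 1)) in
  loop 0

let sort_ok = check ~count:100 prop_sort
```

These two properties alone do not pin the function down: an implementation that returned a sorted list of zeros of the right length would also pass, which is why a tight specification usually needs several properties.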

```
module Test (P:PARTITION) : PTESTS = struct
  let arb1 = ...
  let prop1 = ...
  let name1 = ...
  let t1 = Q.Test.make ~name:name1 ~count:20 arb1 prop1
  let tests = [t1]
end
```

Your test functor should implement at least 2 tests.  You can implement as many 
as you like.

Once you have implemented the `Test` functor, you should be able to build it and
then run it using:

```
dune build
dune exec partition
```

For this part, you should only hand in `part.ml`.

## Handin Instructions

You may do this problem set in pairs. If you do so, both students are
responsible for all components of the assignment, both students will earn the
same grade and accrue the same number of late days.

## Acknowledgements

Assignment components drawn from work by Bob Harper (CMU) and Dan Licata (then-CMU,
now Wesleyan) and Greg Morrisett (then-Harvard, now Cornell).
