Parallelism and Concurrency
Quick Links
While reading this note, you will want to refer to several bits of code. Refer to the Makefile to see how to build executables that use threads as well as how to build and use an OCaml top-level environment that uses threads.
- Makefile
- nondeterminism.ml
- counter.ml
- future.mli
- future.ml
- pmap.ml
- or all the above files in a tarball: parallel.tgz
Introduction
Most people would say that writing parallel programs is hard,
much harder than writing sequential ones.  Why is that?  Consider the
functions f and g defined below.
let r = ref 1;;
let f () =
  let t1 = !r in          (* 1 *)
  let t2 = !r in          (* 2 *)
  r := t1 + t2 + 1        (* 3 *)
;;
let g () =
  let t = !r * 5 in       (* 4 *)
  r := t                  (* 5 *)
;;
Suppose we run
f and g at roughly the same time.
What might happen? Well, it depends on
exactly when the instructions in f and g
are executed. Unfortunately, the exact timing of those
instructions is typically determined by
the whim of the parallel or concurrent system executing
the functions. It is not something the programmer can predict, and
it isn't necessarily consistent from one run of the program to the next:
it often depends on what other processes are running on the machine
at the time of the run, and possibly on other factors. In one case, the
lines 1, 2 and 3 might be
executed first (perhaps function f is scheduled to execute first in
its entirety) followed by 4, 5.  In such a case, the result of the
computation is 15.  If, on the other hand,
instructions 4, 5 execute prior to 1, 2, and 3 then the result is 11. 
If the instructions of the two functions are interleaved 1, 4, 2, 5, 3
then the result is 3. 
Moral of the story: if you allow functions like f and g to run in parallel, you will have an immensely difficult time trying to figure out what your program does because there are so many different possible outcomes, and the outcome you get depends on the rate of execution of the instructions in the two functions -- something we can't predict or analyze effectively.
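The three schedules above are only a sample. One way to see every possible outcome is to treat each numbered line as an atomic step and enumerate all the interleavings that keep each function's own steps in order. The sketch below does that in pure OCaml, with no threads involved. It assumes each numbered line executes atomically (a real scheduler need not guarantee this), and the names state, f_steps, g_steps and finals are illustrative, not part of the course files.

```ocaml
(* The shared reference r plus the local temporaries of f and g. *)
type state = { r : int; t1 : int; t2 : int; t : int }

(* Each function is a list of atomic steps, one per numbered line. *)
let f_steps : (state -> state) list = [
  (fun s -> { s with t1 = s.r });              (* 1: t1 = !r *)
  (fun s -> { s with t2 = s.r });              (* 2: t2 = !r *)
  (fun s -> { s with r = s.t1 + s.t2 + 1 })    (* 3: r := t1 + t2 + 1 *)
]

let g_steps : (state -> state) list = [
  (fun s -> { s with t = s.r * 5 });           (* 4: t = !r * 5 *)
  (fun s -> { s with r = s.t })                (* 5: r := t *)
]

(* Final value of r for every interleaving that preserves each function's order. *)
let rec finals (s : state) fs gs : int list =
  match fs, gs with
  | [], [] -> [ s.r ]
  | step :: fs', [] -> finals (step s) fs' []
  | [], step :: gs' -> finals (step s) [] gs'
  | fstep :: fs', gstep :: gs' ->
    (* either f takes the next step, or g does *)
    finals (fstep s) fs' gs @ finals (gstep s) fs gs'

let all_finals = finals { r = 1; t1 = 0; t2 = 0; t = 0 } f_steps g_steps
let outcomes = List.sort_uniq compare all_finals

let () =
  Printf.printf "%d interleavings, outcomes: %s\n"
    (List.length all_finals)
    (String.concat ", " (List.map string_of_int outcomes))
```

Running it prints `10 interleavings, outcomes: 3, 5, 7, 11, 15`. So even this tiny pair of functions has five distinct results, two more than the ones traced by hand.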
Now consider the following functions h and j.
let x = 1;;
let h () : int =
  let t1 = x in           (* 1 *)
  let t2 = x in           (* 2 *)
  t1 + t2 + 1             (* 3 *)
;;
let j () : int =
  let t = x * 5 in        (* 4 *)
  t                       (* 5 *)
;;
If h and j are executed in parallel, how 
many possible results can they produce?  Well, if the statements
are executed in the order 1, 2, 3 then 4, 5, 
then function
h produces 3 and j 
produces 5.
If the order is 4, 5, then 1, 2, 3, then 
h still produces 3 and j 
still produces 5.
If it is 1, 4, 2, 5, 3 then ... there is still no change.  In
fact, it is easy to see that
h always produces 3 and j always
produces 5.  These functions do not depend upon a changing environment
(x never changes) so they always produce the same results
whenever they are called and no matter how their execution is interleaved.
In common programming parlance, h and j
are called deterministic functions.  Mathematically speaking, we simply
call them functions --- a mathematical function (as opposed to a
relation) is an object
whose output (or, more generally, whose behavior) is determined exclusively
by its input.  Functions f and g, when executed
concurrently, are non-deterministic --- they can produce many 
different behaviors depending on how the instructions in each function
are scheduled.
In general, the secret to making parallel programming easy, or at least easier, is to cut down on the amount of non-determinism in your program. In fact, all proposals for structuring parallel programs that I know of seek to cut down the amount of (visible) non-determinism in such programs. Indeed, if you can cut the non-determinism down to zero, then writing parallel programs is no more difficult than writing sequential ones.
One of the best and easiest ways to cut out the non-determinism in
parallel programs is to write pure functions -- i.e.,
functions like h and j that use immutable
data like the value x and don't otherwise interact with
a changing environment (by reading from stdin or writing to stdout, for instance).
Of course, to do anything interesting, programs
must interact with the environment in some way.  But if you can push
those interactions to the sequential "edges" of your program, and
use parallel pure functions in the "middle," then you will have a
deterministic program.  
Many of the most successful parallel programming frameworks, such as Google's MapReduce, Apache Hadoop and parallel database implementations, use these concepts. It's not surprising that Google named its parallel programming system after the functions map and reduce you've already learned in this class: the ideas come from functional programming. Indeed, as we will see, functional languages like OCaml (or Haskell or F#) are great languages for this style of parallel programming, since most of their data structures (lists, tuples, variants and ordinary records) are immutable by default.
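To make the "pure middle, effectful edges" idea concrete, here is a small word count written in the map-reduce style using only List.map and List.fold_left. It is a sketch, not the API of MapReduce or Hadoop: count_words and total_words are hypothetical names, and everything runs sequentially. The point is that each call to count_words depends only on its own document, so the map step could be spread across threads or machines without changing the answer, and the reduce step combines counts with +, which is associative.

```ocaml
(* Work done on one document (the "map" step): count its words. *)
let count_words (doc : string) : int =
  String.split_on_char ' ' doc
  |> List.filter (fun w -> w <> "")   (* drop empty strings from repeated spaces *)
  |> List.length

(* Apply count_words to every document, then combine the counts with + ("reduce"). *)
let total_words (docs : string list) : int =
  docs |> List.map count_words |> List.fold_left ( + ) 0

(* Effectful "edge": the input is hard-coded here, and the output is printed once. *)
let () =
  let docs = [ "the quick brown fox"; "jumps over"; "the lazy dog" ] in
  Printf.printf "total words: %d\n" (total_words docs)
```

With the three sample documents, the program prints `total words: 9`.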
Parallelism vs. Concurrency
Some people distinguish between the words parallelism
and concurrency.  A
parallel computation is a computation in 
which some number of tasks
occur simultaneously.  Parallelism improves throughput by using
a number of processors (or cores or machines) to execute a set of 
tasks.  A
concurrent computation is a computation in which several
tasks access a shared, mutable resource. The example given
above of the functions f and g reading
and writing the mutable reference r is an example
of a concurrent computation: the shared resource is the mutable cell
r.
It is important to know that the OCaml run-time system used in these notes does not execute threads in parallel. (OCaml 5 added parallel domains through the Domain module, but threads created with the Thread library within one domain still take turns on a single core.) Below, we will explain how to execute separate computations in separate threads. However, even if your machine has multiple cores, only one core will be used to execute the OCaml program and all of its threads. The OCaml run-time system is responsible for scheduling these threads: it allows one thread to execute for a short time, then switches that thread out and resumes executing another one. Despite the fact that there is no parallelism in the OCaml run-time, we will often pretend that there is. When it comes to understanding the correctness of programs, it makes little difference whether computations are truly parallel or only concurrent, as in OCaml.
Threads
Threads are one of the most common parallel and concurrent computing paradigms; there are thread libraries in most modern programming languages, including OCaml. You may think of a thread as a virtual processor. The Thread library provides several functions for creating, managing and synchronizing threads. The most important are listed below.
(* create f x starts a new thread that executes f x *)
Thread.create : ('a -> unit) -> 'a -> Thread.t
(* join t waits for the thread t to terminate before continuing *)
Thread.join : Thread.t -> unit
(* delay n suspends the current thread until at least n seconds have passed *)
Thread.delay : float -> unit
(* return the currently executing thread *)
Thread.self : unit -> Thread.t
(* return the unique identifier associated with the thread *)
Thread.id : Thread.t -> int
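A short program exercising these functions might look like the sketch below. It starts three child threads, each of which pauses with Thread.delay and then records its own identifier, and the main thread waits for all of them with Thread.join. It must be built with the threads library (see Compiling Multi-threaded programs below); child_ids and child are illustrative names. Each child writes only its own slot of the array, so no two threads ever write the same cell.

```ocaml
(* One slot per child; -1 means "not yet written". *)
let child_ids = Array.make 3 (-1)

(* Body of child thread i: pause briefly, then record this thread's identifier. *)
let child (i : int) : unit =
  Thread.delay 0.01;
  child_ids.(i) <- Thread.id (Thread.self ())

let () =
  (* Start three children; Thread.create returns a handle immediately. *)
  let threads = List.init 3 (fun i -> Thread.create child i) in
  (* Wait for every child to finish before reading what they wrote. *)
  List.iter Thread.join threads;
  let main_id = Thread.id (Thread.self ()) in
  Array.iteri
    (fun i id -> Printf.printf "child %d had id %d (main thread is %d)\n" i id main_id)
    child_ids
```

The printed identifiers differ from one another and from the main thread's identifier; their exact values are up to the run-time system.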
When a programmer decides
that several bits of work may happen at the same time, he or she may 
create several threads and execute a computation in each thread.  For example,
assume we have a function work : t -> unit and arguments
a and b with type t.  We can compute 
work a and work b concurrently as follows.
let t = Thread.create work a in
work b;
Thread.join t;
Printf.printf "a and b done\n"
Thread.create starts execution of work a in a new
thread t and returns to execute
work b.
The thread that executes work a is typically called the
child thread.  The thread that created the child is typically called
the parent thread.
The computation work b may terminate before work a.
If it does, Thread.join t will suspend execution of the parent
thread until the child containing work a terminates.
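Here is one concrete instance of this pattern, as a sketch. The type t becomes a hypothetical job record, and work sums a range of integers and stores the total in a cell that belongs to that job alone. Because work a and work b touch disjoint state, the total printed at the end is the same on every run, no matter how the two threads are scheduled. Build it with the threads library, as described in the next section.

```ocaml
(* Hypothetical stand-ins for the type t and the function work from the text. *)
type job = { lo : int; hi : int; result : int ref }

(* Sum the integers lo..hi and store the total in this job's own cell. *)
let work (j : job) : unit =
  let total = ref 0 in
  for i = j.lo to j.hi do
    total := !total + i
  done;
  j.result := !total

let a = { lo = 1; hi = 500_000; result = ref 0 }
let b = { lo = 500_001; hi = 1_000_000; result = ref 0 }

let () =
  let t = Thread.create work a in   (* child thread runs work a *)
  work b;                           (* parent thread runs work b meanwhile *)
  Thread.join t;                    (* block until work a has finished *)
  Printf.printf "a and b done: %d\n" (!(a.result) + !(b.result))
```

On a 64-bit system this prints `a and b done: 500000500000`, the sum of 1 through 1,000,000.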
Compiling Multi-threaded programs
Implementing multi-threaded programs requires special support. Hence,
whenever you use the Thread library, you must compile using the
-thread option
and include unix.cma and threads.cma. For example, to compile your own multi-threaded
program you have written in the my_prog.ml file, use the following
command in your Makefile:
ocamlc -thread -o my_prog unix.cma threads.cma my_prog.ml
Then execute your program as usual:
./my_prog
To use threads within the ocaml toplevel environment, create a custom top-level environment as follows.
ocamlmktop -thread unix.cma threads.cma -o threaded_top
Then run the top-level environment using the following command.
./threaded_top -I +threads
Non-Determinism
As mentioned above, non-determinism is what makes parallel or concurrent programming incredibly hard. A non-deterministic function has many outcomes and programmers just aren't very good at keeping track of all the possible outcomes (it is hard to keep track of just one outcome).
Non-determinism can arise when two concurrently executing threads both contain effects such as printing, reading from standard input, or reading and writing shared mutable references. In all such cases, the exact order in which the instructions of each thread are scheduled can cause the threads, and the overall program, to exhibit different behaviors.
To get a concrete sense of the relationship between effects and non-determinism, consider the functions f and g from above (modified slightly):
let f r =
  let t1 = get r in       (* 1 *)
  let t2 = get r in       (* 2 *)
  set r (t1 + t2 + 1)     (* 3 *)
;;
let g r =
  let t = get r * 5 in    (* 4 *)
  set r t                 (* 5 *)
;;
Each of f and g takes a mutable reference as an argument, and the functions get and set respectively read and write the contents of that reference. For the purpose of illustration, we've added a random delay to get and set (their definitions follow; in an actual source file they must come before f and g). This gives us a lot of variation between runs and increases the non-determinism.
let delay () = Thread.delay (Random.float 1.);;
let get r = delay (); !r;;
let set r v = delay (); r := v;;
Now, we can execute functions f and g concurrently and examine what happens:
let run i random_seed =
  Random.init random_seed;
  let r = ref 1 in
  let t = Thread.create f r in
  g r;
  Thread.join t;
  Printf.printf "Result for run %d is %d\n" i !r;
  flush stdout  (* ensure output printed now *)
;;
Finally, a main program runs the same pair of functions with different random seeds that vary the delays in each thread.
let main () =
  run 1 17;
  run 2 9;
  run 3 19
;;
main ();;
The output I get is as follows.
Result for run 1 is 7
Result for run 2 is 15
Result for run 3 is 3
Try running the code yourself. Untar the package (or download the individual files at the top of this note) and type:
make nondeterminism
./nondeterminism
Another interesting example in the package is the counter.ml code.
Futures
So far we have seen that the combination of effects and threads often leads to highly non-deterministic code, which is difficult to reason about. Let's try to write a simple program in a pure, functional style using only immutable data structures. For example, recall our pure functions h and j from above:
let x = 1;;
let h () : int =
  let t1 = x in           (* 1 *)
  let t2 = x in           (* 2 *)
  t1 + t2 + 1             (* 3 *)
;;
let j () : int =
  let t = x * 5 in        (* 4 *)
  t                       (* 5 *)
;;
Suppose our goal is to execute h and j in parallel and then sum their results. Our first attempt might begin as follows:
let t = Thread.create h () in    (* WRONG! h's result is lost *)
let j_result = j () in
Thread.join t;
let final_result = j_result + ...
You will notice that we created a thread in which to run h, but we couldn't actually get h's result back from the thread. The signature of Thread.create listed earlier is a simplification: in the actual Thread library its type is ('a -> 'b) -> 'a -> Thread.t, so the first line above does type check, but whatever h returns is silently discarded. Neither the Thread.t handle nor Thread.join t, which returns only (), gives us any way to retrieve it.
It seems that in order to get the value back out of the thread that executes h, we need to use a mutable reference! This sounds worrisome, but with some thought, we can create a library designed specifically to execute pure code in a new thread and return its result (using a reference) without introducing any non-determinism into a program. The library is amazingly simple: it declares a new abstract data type called a future, along with two operations on that type.
type 'a future
val future : ('a -> 'b) -> 'a -> 'b future
val force : 'a future -> 'a
Intuitively, the function future g x initiates
execution of g x
in a separate thread and returns the future data structure f.  Later, 
when you require the result of the
computation, you may call force f, which will 
return the value g x.  If g x has not
completed when force f is executed, your program will block
until  g x does complete. 
The implementation of futures is as follows.
type 'a future = {t : Thread.t; value : 'a option ref};;
let future f x = 
  let r = ref None in     
  let t = Thread.create (fun () -> r := Some(f x)) () in    
  {t=t ; value=r}
;;
let force (f:'a future) : 'a = 
  Thread.join f.t;
  match !(f.value) with
    | Some v -> v    
    | None -> failwith "impossible!"
;;
We placed these definitions in future.ml and 
the interface in future.mli.
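With futures in hand, our earlier goal of running h and j in parallel and summing their results takes only a few lines. The sketch below repeats the implementation above inside a local Future module so that the file stands alone (build it with the threads library); sum_h_j is a hypothetical name. h runs in a child thread while the current thread computes j (), and force waits for h only when its value is actually needed.

```ocaml
(* Local copy of the futures implementation from this note. *)
module Future = struct
  type 'a future = { t : Thread.t; value : 'a option ref }

  (* Start f x in a new thread; the thread stores its answer in value. *)
  let future (f : 'a -> 'b) (x : 'a) : 'b future =
    let r = ref None in
    let t = Thread.create (fun () -> r := Some (f x)) () in
    { t; value = r }

  (* Wait for the thread to finish, then read the stored answer. *)
  let force (fut : 'a future) : 'a =
    Thread.join fut.t;
    match !(fut.value) with
    | Some v -> v
    | None -> failwith "impossible!"
end

(* Pure functions h and j as intended in the text: h returns 3, j returns 5. *)
let x = 1
let h () : int = let t1 = x in let t2 = x in t1 + t2 + 1
let j () : int = let t = x * 5 in t

(* h runs in a child thread while the current thread evaluates j (). *)
let sum_h_j () : int =
  let h_result = Future.future h () in
  let j_result = j () in
  Future.force h_result + j_result

let () = Printf.printf "h () + j () = %d\n" (sum_h_j ())
```

This prints `h () + j () = 8`, exactly what the sequential expression h () + j () produces.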
Programming with futures is quite easy because futures support the following equational law whenever g is a pure (effect-free), total function.
let y = g x in e'  ==  let f = future g x in e'[force f/y]    (f fresh, i.e., not free in e')
The above equation can be derived from the even simpler law below, which holds provided e is a pure (effect-free), valuable expression:
e == force(future (fun _ -> e) ())
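One way to build confidence in the law is to test both sides on a concrete case. In the sketch below, g is a hypothetical pure, total fib function, and e' is y * 10 + 3, where y is the bound variable; the future version replaces that single occurrence of y with force f. Checking a few inputs is no substitute for the equational argument, but the two sides should agree on every input. As before, Future is a local copy of the implementation above, and the file needs the threads library.

```ocaml
(* Local copy of the futures implementation from this note. *)
module Future = struct
  type 'a future = { t : Thread.t; value : 'a option ref }

  let future (f : 'a -> 'b) (x : 'a) : 'b future =
    let r = ref None in
    let t = Thread.create (fun () -> r := Some (f x)) () in
    { t; value = r }

  let force (fut : 'a future) : 'a =
    Thread.join fut.t;
    match !(fut.value) with
    | Some v -> v
    | None -> failwith "impossible!"
end

(* A hypothetical pure, total function standing in for g. *)
let rec fib (n : int) : int = if n < 2 then n else fib (n - 1) + fib (n - 2)

(* Left side: bind y to g x, then evaluate e' = y * 10 + 3. *)
let sequential (x : int) : int =
  let y = fib x in
  y * 10 + 3

(* Right side: start g x as a future and put force f where y appeared in e'. *)
let with_future (x : int) : int =
  let f = Future.future fib x in
  Future.force f * 10 + 3

let () =
  List.iter
    (fun n -> Printf.printf "n = %d: %d vs %d\n" n (sequential n) (with_future n))
    [ 0; 5; 10; 20 ]
```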
Using that law, we can easily transform effect-free sequential programs into parallel programs that use futures. For instance, consider the following map function over trees.
type 'a tree = Leaf | Node of 'a * 'a tree * 'a tree
let rec map (f:'a -> 'b) (t:'a tree) : 'b tree =
  match t with
      Leaf -> Leaf
    | Node (x, left, right) ->
      let left' = map f left in
      let right' = map f right in
      Node (f x, left', right')
;;
We can (recursively) parallelize our map function simply by executing
the recursive call on the left subtree in a separate thread from the call on the right subtree.
let rec parallel_map (f:'a -> 'b) (t:'a tree) : 'b tree =
  match t with
      Leaf -> Leaf
    | Node (x, left, right) ->
      let left' = Future.future (parallel_map f) left in
      let right' = parallel_map f right in
      Node (f x, Future.force left', right')
;;
Please see pmap.ml for an example of parallel_map in action (as well as parallel tree creation).
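Note that parallel_map forks a thread at every Node, including nodes whose subtrees are just Leaf. One common way to limit that overhead, sketched below, is to stop forking once a budget runs out and hand the remaining subtree to the sequential map. This sketch uses recursion depth as a cheap stand-in for subtree size (an assumption of the sketch); cutoff_map and build are hypothetical names, and Future is again a local copy of the implementation above. With a depth budget of d, at most 2^d - 1 threads are created.

```ocaml
(* Local copy of the futures implementation from this note. *)
module Future = struct
  type 'a future = { t : Thread.t; value : 'a option ref }

  let future (f : 'a -> 'b) (x : 'a) : 'b future =
    let r = ref None in
    let t = Thread.create (fun () -> r := Some (f x)) () in
    { t; value = r }

  let force (fut : 'a future) : 'a =
    Thread.join fut.t;
    match !(fut.value) with
    | Some v -> v
    | None -> failwith "impossible!"
end

type 'a tree = Leaf | Node of 'a * 'a tree * 'a tree

(* The sequential map from this note. *)
let rec map (f : 'a -> 'b) (t : 'a tree) : 'b tree =
  match t with
  | Leaf -> Leaf
  | Node (x, left, right) ->
    let left' = map f left in
    let right' = map f right in
    Node (f x, left', right')

(* Fork a thread for the left subtree only while depth > 0; below that,
   hand the whole subtree to the sequential map. *)
let rec cutoff_map (depth : int) (f : 'a -> 'b) (t : 'a tree) : 'b tree =
  match t with
  | Leaf -> Leaf
  | Node (x, left, right) when depth > 0 ->
    let left' = Future.future (cutoff_map (depth - 1) f) left in
    let right' = cutoff_map (depth - 1) f right in
    Node (f x, Future.force left', right')
  | Node _ -> map f t

(* Hypothetical helper: a complete binary tree of the given height. *)
let rec build (height : int) (v : int) : int tree =
  if height = 0 then Leaf
  else Node (v, build (height - 1) (2 * v), build (height - 1) (2 * v + 1))

let () =
  let t = build 16 1 in
  let square v = v * v in
  Printf.printf "cutoff_map agrees with map: %b\n"
    (cutoff_map 3 square t = map square t)
```

Because f is pure, the depth budget changes only how many threads are created, never the resulting tree.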
Unfortunately, if one were to measure the performance of the code above, one would likely find the parallel code significantly slower than the sequential code, for two reasons. First, in OCaml, separate threads do not execute on separate processors. This first objection would be addressed immediately if one were to write similar code in a language like Java, C#, F# (Microsoft's variant of OCaml) or Haskell. Second, to make it worthwhile to fork a new parallel thread, one must have a sizeable amount of work for it to do. Forking a new thread at every node of the tree, all the way down to subtrees that are just Leaf, incurs a lot of overhead. Hence, one would want to stop creating new threads once the size of the tree being mapped falls below a certain threshold. (This technique is highly reminiscent of the way efficient sorts of large data are implemented: when sorting a large segment of an array one would use quicksort, but once the portion of the array to be sorted falls below a threshold, it is more efficient to switch to insertion sort.)
Summary
There are several key points to remember from this note:
- A large part of the difficulty of parallel and concurrent programming is due to the presence of non-determinism.
- This non-determinism arises when threads are mixed with effectful code (most perniciously in code that uses mutable data structures).
- To combat these difficulties, one should use immutable data structures and effect-free code whenever possible. Moreover, it is more important than ever to build and use high-level, correct-by-construction libraries that hide difficult implementation details.
- One example of such a library is a library of futures. You can use
futures to transform pure, sequential functional code into pure, parallel
functional code extremely easily, using the following simple equational law,
which holds whenever g is a pure (i.e., effect-free), total function:
let y = g x in e'  ==  let f = future g x in e'[force f/y]
- Though the implementation of futures involves a mutable reference, the use of the reference in the implementation does not result in non-determinism. Moreover, since the future type is abstract, client code may not (accidentally) interfere with the implementation invariants and introduce non-determinism unless that client code allocates its own references or performs other effectful operations.
Acknowledgement: Lecture notes adapted from materials developed by Greg Morrisett.