Problem Set 7: Parallel Sequences

Quick Links


This assignment is longer and more open-ended than previous assignments. We have left you a README file in which you may comment on any unusual design decisions you made or problems that you had. You are not required to write lengthy explanations of your code here but you may add any elements that you believe will help the graders understand and appreciate your work. Also, any comments on the assignment and suggestions for next year are appreciated.

Part 1: Parallel Sequences

A sequence is an ordered collection of elements. OCaml's built-in list type is one representation of sequences. In this problem, you will develop another representation for sequences that support a number of parallel operations. For example, the map function will apply a function to all the elements of the sequence in parallel.

The following table summarizes the most important operations in the sequence module. The work and span columns describe the complexity requirements of your implementation under several assumptions listed below. The only special case is flatten -- you may use any correct, functional implementation without worrying about the work/span complexity.

Function Work Span Notes
length O(1) O(1)
empty O(1) O(1)
cons O(n) O(1) Second argument has length n.
singleton O(1) O(1)
append O(n+m) O(1) Arguments have length n and m.
tabulate O(n) O(1) Argument has length n.
nth O(1) O(1)
filter O(n) O(log n) Argument has length n.
map O(n) O(1) Argument has length n.
reduce O(n) O(log n) Argument has length n.
repeat O(n) O(1) Second argument is n.
flatten -- -- Sum of lengths of lists contained in argument is n.
zip O(min(n,m)) O(1) Arguments have length n and m.
split O(n) O(1) Argument has length n.

Complexity Assumptions

The OCaml Thread library provides functions for creating and interacting with independent tasks that may be executed in parallel. In order to implement the operations in the parallel sequences library, you will assume that OCaml threads operate in parallel. You will also have to make the following assumptions:

  • Array.make is O(1) work and span.
  • The two functions below, multi_create and multi_join, are O(1) work and span.
These costs are not accurate, of course, in the actual OCaml run-time, but they do hold in the idealized parallel machine we assume you would use.

let multi_create (f:int -> unit) (n:int) : Thread.t array = 
  let a = Array.make n (Thread.self ()) in 
  for i=0 to n-1 do 
    a.(i) <- Thread.create f i;

let multi_join (a:Thread.t array) : unit = 
  Array.iter Thread.join a

For experts: Array.make is actually performing two tasks: allocation and initialization. Allocation can indeed be peformed in (amortized) constant time. Initialization is trickier -- it would be implemented similarly to multi-create on an idealized parallel machine. For multi_create and multi_join, one can imagine a parallel machine where initializing a thread of execution running the same function on each of its processing units can be done in constant time. Our multi_create lets you do exactly this.

Your Task

Your task is to build an implementation of sequences that have the work and span listed above. We have provided a complete interface in the sequence.mli as well as a reference implementation based on lists in This implementation is only meant to be a guide — it does not have the time complexities described above, but may be useful for debugging.

To submit:

Part 2: Map-Reduce Applications

The following applications are designed to test out your sequence implementation and illustrate the "Map-Reduce" style of computation.

When implementing your routines, you should do so under the assumption that your code will be executed on a parallel machine with many cores. You should also assume that you are computing over a large amount of data (eg: computing your inverted index may involve processing many documents; your perfect matching service may have many profiles). Hence, an implementation that iterates in sequence over all profiles or over all documents will be considered impossible slow --- you must use bulk parallel operations such as map and reduce when processing sets of documents to minimize the span of your computation. Note, however, that we do not care about small constants; you do not need to over-optimize your code. Style and clarity is important too.

Inverted Index

An inverted index is a mapping from words to the documents in which they appear. For example, if we started with the following documents:

Document 1:

OCaml map reduce

Document 2:
fold filter ocaml
The inverted index would look like this:

ocaml1 2
map 1
reduce 1
fold 2
filter 2

To implement this application, you should take a dataset of documents (such as data/reuters.txt or data/test1.txt) as input and use the map and reduce operations to produce an inverted index. Complete the mkindex function in apps/ This function should accept the name of a data file and print the index to screen. To get started, investigate some of the functions in the module Util (with interface util.mli and implementation In particular, notice that Util contains a useful function called load_documents --- use it to load a collection of documents from a file. You should print the final result using Util function print_reduce_results.

Hint: The example above suggests that your inverted index should be case insensitive (OCaml and ocaml are considered the same word). You might find functions such as String.uppercase and/or String.lowercase from the String library useful.

A Perfect Matching

For the next application, you will implement apm, a simple dating service. The files data/profiles.txt (long) and data/test_profiles.txt (shorter) contain some randomly generated profiles of fictional people seeking partners for a romantic relationship in the files. The type of a profile is Apm.profile. We have supplied code in for parsing data files and printing results to the console. You must write the matchme function.

This function takes several arguments as a string array: the data file, the number of matches requested, and the first and last name of the client, who should have a profile in the data file. The map step should pair the client's profile with every other profile in the database and test the match between them. For each pair, compute a compatibility index, which is a float between 0. (perfectly incompatible) and 1. (perfectly compatible). We leave the details of this computation up to you. Please include comments as to your reasoning; be respectful and don't get too carried away.

The reduce step should use the compatibility index and the corresponding pairs to create the desired number of matches. The results should then be printed out the using the supplied print_matches function. The Util module has some functions for reading in files and manipulating strings that may be useful.

Here is a sample run of apm using our solution.

Client: Jeremiah Sanford
  sex: M  age: 23  profession: trucker
  nondrinker  nonsmoker
  has children  does not want children
  prefers a female partner between the ages of 19 and 29
  likes classical music and sports

2 best matches:
Compatibility index: 0.691358
Katy Allen
  sex: F  age: 21  profession: trucker
  nondrinker  nonsmoker
  no children  does not want children
  prefers a male partner between the ages of 18 and 24
  likes country music and sports
Compatibility index: 0.691358
Corina Savage
  sex: F  age: 24  profession: trucker
  nondrinker  nonsmoker
  no children  does not want children
  prefers a male partner between the ages of 18 and 32
  likes country music and sports

Your Task

Complete the matchme and mkindex functions making use of the parallel map and reduce functionality provided by your Sequence implementation. You may use your own metric for calculating match compatibility.

You have also been provided with a Makefile in the apps directory that will compile the apps into an executable for testing. For example to run the inverted index, run make (on Unix) and then run:

  ./main_apps mkindex data/test1.txt
and for A Perfect Match run
  ./main_apps matchme data/profiles.txt 5 John Doe

DO NOT change any code already provided to you (especially the parsing functions in Make sure you place any additional functions you write in your submission files.

To submit: and

Part 3: N-Body Simulation

An n-body simulation models the movement of objects in space due to the gravitational forces acting between them. There is a straightforward O(n^2) algorithm that computes the movement of each object by calculating the interactions with every other object. Your task in this problem will be to implement a more efficient algorithm that approximates an n-body simulation. Your solution will build on the code for parallel sequences developed in Part 2.

Naive Algorithm

In an n-body simulation, we are given a collection of n objects or bodies, each possessing a mass, location, and velocity. The goal of the simulation is to compute, for each time step, the new positions and velocities for each body based on the gravitational forces acting on each.

Assume that we have a module Plane that defines representations for scalar values, two-dimensional points, vectors, and common functions such as Euclidean distance (implementing such a module will be a part of this problem). Using Plane, we can define a type that represent the mass, position, and velocity of a body,

  type mass = Plane.scalar
  type location = Plane.point
  type velocity = Plane.vector
  type body = mass * location * velocity
and a function acc that calculates the acceleration of one body on another:
  val acceleration : body -> body -> Plane.vector

To understand how the acceleration function works, we need to review a few basic facts from physics. Recall that force is equal to mass times acceleration (F = m × a) and the gravitational force between objects with masses m and n separated by distance d is given by (G × m × n) / d² where G is the gravitational constant. Putting these two equations together, and solving for a, we have that the magnitude of the acceleration vector due to gravity for the object with mass n is G × m / d²; the direction of the acceleration vector is the same as the direction of the unit vector between the objects. Note that this calculation assumes that the objects do not collide. We will make this simplifying assumption throughout this problem.

Using acceleration and the functions from the Sequence module, we can then define a function accelerations that computes the accelerations on each of the bodies in the simulation:

val accelerations : body Sequence.t -> Plane.vector Sequence.t 
Finally, we update the position p and velocity v of each body to p + v + a/2 and v + a respectively, where a is the Plane.vector in the sequence returned by accelerations.


An n-body simulation produces a stream of results — one "frame" for each step of the simulation. In your solution, you will need to define a function make_simulation that creates a stream to represent a simulation. We will use the following datatype to represent streams:
  type 'a stream = Nil | Cons of 'a * (unit -> 'a stream)
The helper functions we have provided for running simulations (discussed next) will access these streams and give you a way to test your code on simple examples.

Testing and Visualization

N-Body GUI
We have provided several modules, helper functions, test cases, and a graphical interface that may be useful while developing your solution. The function run_simulation in Main_nbody can be used to generate "transcripts" of simulations. The module Test_nbody defines the initial values for several simulations. The Java application bouncy.jar (depicted in the image above) can be used to animate a simulation using a transcript. Assuming you have java installed on your system, simply run

java -jar bouncy.jar
and the GUI will pop up. If you click on the "open" button, you will be able to load a transcript file. Then click the "run" button to execute that transcript.

Your Tasks

Your task for the first part of this problem is to implement the naive n-body algorithm. In particular, you will need to implement and fill in the missing functions in

To submit: and

The Barnes-Hut Algorithm

The Barnes-Hut algorithm provides a way to approximate an n-body simulation, while dramatically decreasing its cost. The insight behind the algorithm is that the gravitational forces between objects separated by large distances are weak, so they can be approximated without affecting the fidelity of the overall simulation very much.

The algorithm works by grouping bodies into regions and using a fixed threshold θ to determine if it should perform the exact acceleration calculation for the each body in the region, or if it should merely approximate acceleration using a pseudobody that represents aggregate information about all of the bodies contained in the region. This dramatically reduces the time needed to perform large simulations — asymptotically, from O(n^2) to O(n log n).

Bounding Boxes and Quad Trees

To represent regions, we will use bounding boxes comprising two points:

type bbox = {north_west:Plane.point; south_east:Plane.point}
To represent the decomposition of a collection of bodies into regions, we will use the following datatype:
type bhtree = 
  | Single of body 
  | Cell of body * bbox * (bhtree Sequence.t)
Intuitively, Empty represents a region with no bodies; Single(b) represents a region with a single body b; and Cell(b,bb,qs) represents a region with pseudobody b, bounding box bb, and four sub-quadrants qs. The position of the pseudobody in a Cell is the center of mass (or barycenter) of all objects in the region. The mass of the pseudobody is the total mass of all objects. The velocity of the pseudobody can be arbitrary, as it is not needed for the acceleration calculation. In your solution, you will likely have to write a function that decomposes a sequence of n bodies into a bhtree.


The actual Barnes-Hut simulation works as follows. For each body b, the algorithm traverses the top-level bhtree. If it encounters a region with more than one body — i.e., a Cell node — it checks whether θ ≥ m / d, where m is the total mass of the pseudobody associated with the region, and d is the distance between b and the pseudobody's center of mass. If so, then it treats the entire region as if it were one large pseudobody. Otherwise, it recursively performs the computation on the sub-quadrants of the region. The base cases are the Empty and Single(b) cases, where the algorithm simply invokes the acceleration function as in the naive algorithm.

Your Tasks

Your task for the second part of this problem is to implement the Barnes-Hut algorithm. You will need to fill in the missing code in

To submit:

Handin Instructions

You may do this problem set in pairs. If you do so, both students are responsible for all components of the assignment. Please write both names at the top of all files if you do it in pairs. Please only have 1 member of the pair hand in the assignment to dropbox.

You must hand in these files to dropbox (see link on assignment page):



This assignment is based on materials developed by Dan Licata and David Bindel originally, and modified by Nate Foster.