Power
Problem Set 7: Parallel Sequences
Quick Links
 Code Bundle for this assignment
 Notes on how to compile and use OCaml threads
 Notes on work and span and parallel complexity
 Parallel Sequences
 Mapreduce Apps
 Parallel Nbody Simulation
README
This assignment is longer and more openended than previous assignments. We have left you a README file in which you may comment on any unusual design decisions you made or problems that you had. You are not required to write lengthy explanations of your code here but you may add any elements that you believe will help the graders understand and appreciate your work. Also, any comments on the assignment and suggestions for next year are appreciated.
Part 1: Parallel Sequences
A sequence is an ordered collection of elements. OCaml's
builtin list
type is one representation of sequences. In
this problem, you will develop another representation for sequences
that support a number of parallel operations. For example,
the map
function will apply a function to all the
elements of the sequence in parallel.
The following table summarizes the most important operations in the sequence module. The work and span columns describe the complexity requirements of your implementation under several assumptions listed below. The only special case is flatten  you may use any correct, functional implementation without worrying about the work/span complexity.
Function  Work  Span  Notes 

length  O(1)  O(1)  
empty  O(1)  O(1)  
cons  O(n)  O(1)  Second argument has length n. 
singleton  O(1)  O(1)  
append  O(n+m)  O(1)  Arguments have length n and m. 
tabulate  O(n)  O(1)  Argument has length n. 
nth  O(1)  O(1)  
filter  O(n)  O(log n)  Argument has length n. 
map  O(n)  O(1)  Argument has length n. 
reduce  O(n)  O(log n)  Argument has length n. 
repeat  O(n)  O(1)  Second argument is n. 
flatten      Sum of lengths of lists contained in argument is n. 
zip  O(min(n,m))  O(1)  Arguments have length n and m. 
split  O(n)  O(1)  Argument has length n. 
Complexity Assumptions
The OCaml Thread library provides functions for creating and interacting with independent tasks that may be executed in parallel. In order to implement the operations in the parallel sequences library, you will assume that OCaml threads operate in parallel. You will also have to make the following assumptions:
 Array.make is O(1) work and span.
 The two functions below, multi_create and multi_join, are O(1) work and span.
let multi_create (f:int > unit) (n:int) : Thread.t array = let a = Array.make n (Thread.self ()) in for i=0 to n1 do a.(i) < Thread.create f i; done; a let multi_join (a:Thread.t array) : unit = Array.iter Thread.join a
For experts: Array.make is actually performing two tasks: allocation and initialization. Allocation can indeed be peformed in (amortized) constant time. Initialization is trickier  it would be implemented similarly to multicreate on an idealized parallel machine. For multi_create and multi_join, one can imagine a parallel machine where initializing a thread of execution running the same function on each of its processing units can be done in constant time. Our multi_create lets you do exactly this.
Your Task
Your task is to build an implementation of sequences that have the
work and span listed above. We have provided a complete interface in
the sequence.mli
as well as a reference implementation
based on lists in sequence.ml
. This implementation is
only meant to be a guide — it does not have the time
complexities described above, but may be useful for debugging.
To submit: sequence.ml.
Part 2: MapReduce Applications
The following applications are designed to test out your sequence implementation and illustrate the "MapReduce" style of computation.
When implementing your routines, you should do so under the assumption that your code will be executed on a parallel machine with many cores. You should also assume that you are computing over a large amount of data (eg: computing your inverted index may involve processing many documents; your perfect matching service may have many profiles). Hence, an implementation that iterates in sequence over all profiles or over all documents will be considered impossible slow  you must use bulk parallel operations such as map and reduce when processing sets of documents to minimize the span of your computation. Note, however, that we do not care about small constants; you do not need to overoptimize your code. Style and clarity is important too.
Inverted Index
An inverted
index is a mapping from words to the documents in which they
appear. For example, if we started with the following documents:
Document 1:
OCaml map reduce
Document 2:
fold filter ocamlThe inverted index would look like this:
word  document 

ocaml  1 2 
map  1 
reduce  1 
fold  2 
filter  2 
To implement this application, you should take a dataset of documents (such
as data/reuters.txt
or data/test1.txt
)
as input and use the map and reduce operations
to produce an inverted index. Complete the mkindex
function
in apps/inverted_index.ml
. This function should
accept the name of a data file and print the index to screen.
To get started, investigate some of the functions in the module
Util
(with interface util.mli
and implementation
util.ml
). In particular, notice that Util
contains
a useful function called load_documents
 use it to
load a collection of documents from a file.
You should print the
final result using Util
function print_reduce_results
.
Hint: The example above suggests that your inverted index should be case insensitive (OCaml and ocaml are considered the same word). You might find functions such as String.uppercase and/or String.lowercase from the String library useful.
A Perfect Matching
For the next application, you will implement apm
, a
simple dating service. The files data/profiles.txt
(long) and data/test_profiles.txt
(shorter) contain some
randomly generated profiles of fictional people seeking partners for a
romantic relationship in the files. The type of a profile
is Apm.profile
. We have supplied code
in apm.ml
for parsing data files and printing results to
the console. You must write the matchme
function.
This function takes several arguments as a string array: the data
file, the number of matches requested, and the first and last name of the
client, who should have a profile in the data file. The map step should pair
the client's profile with every other profile in the database and test the match
between them. For each pair, compute a compatibility index, which is
a float
between 0. (perfectly incompatible) and 1. (perfectly
compatible). We leave the details of this computation up to you. Please include
comments as to your reasoning; be respectful and don't get too carried
away.
The reduce step should use the compatibility index and the corresponding pairs
to create the desired number of matches. The results should then be printed out
the using the supplied print_matches
function. The Util
module has some functions for reading in files
and manipulating strings that may be useful.
Here is a sample run of apm using our solution.
 Client: Jeremiah Sanford sex: M age: 23 profession: trucker nondrinker nonsmoker has children does not want children prefers a female partner between the ages of 19 and 29 likes classical music and sports 2 best matches:  Compatibility index: 0.691358 Katy Allen sex: F age: 21 profession: trucker nondrinker nonsmoker no children does not want children prefers a male partner between the ages of 18 and 24 likes country music and sports  Compatibility index: 0.691358 Corina Savage sex: F age: 24 profession: trucker nondrinker nonsmoker no children does not want children prefers a male partner between the ages of 18 and 32 likes country music and sports
Your Task
Complete the matchme
and mkindex
functions making use
of the parallel map and reduce functionality provided by your Sequence
implementation. You may use your own metric for calculating match
compatibility.
You have also been provided with a Makefile in the apps directory that will compile the apps into an executable for testing. For example to run the inverted index, run make (on Unix) and then run:
./main_apps mkindex data/test1.txtand for A Perfect Match run
./main_apps matchme data/profiles.txt 5 John Doe
DO NOT change any code already provided to you (especially the parsing functions in apm.ml). Make sure you place any additional functions you write in your submission files.
To submit: apm.ml and inverted_index.ml.
Part 3: NBody Simulation
An nbody simulation models the movement of objects in space due to the gravitational forces acting between them. There is a straightforward O(n^2) algorithm that computes the movement of each object by calculating the interactions with every other object. Your task in this problem will be to implement a more efficient algorithm that approximates an nbody simulation. Your solution will build on the code for parallel sequences developed in Part 2.
Naive Algorithm
In an nbody simulation, we are given a collection of n objects or bodies, each possessing a mass, location, and velocity. The goal of the simulation is to compute, for each time step, the new positions and velocities for each body based on the gravitational forces acting on each.
Assume that we have a module Plane
that defines
representations for scalar values, twodimensional points, vectors,
and common functions such as Euclidean distance (implementing such a
module will be a part of this problem). Using Plane
, we
can define a type that represent the mass, position, and velocity of a
body,
type mass = Plane.scalar type location = Plane.point type velocity = Plane.vector type body = mass * location * velocityand a function
acc
that calculates the acceleration of
one body on another:
val acceleration : body > body > Plane.vector
To understand how the acceleration
function works, we need to
review a few basic facts from physics. Recall that force is equal to
mass times acceleration (F = m × a) and the gravitational
force between objects with masses m and n separated by
distance d is given by (G × m × n) / d²
where G is the gravitational constant. Putting these two
equations together, and solving for a, we have that the
magnitude of the acceleration vector due to gravity for the object
with mass n is G × m / d²; the direction of
the acceleration vector is the same as the direction of the unit
vector between the objects. Note that this calculation assumes that
the objects do not collide. We will make this simplifying assumption
throughout this problem.
Using acceleration
and the functions from
the Sequence
module, we can then define a
function accelerations
that computes the accelerations on
each of the bodies in the simulation:
val accelerations : body Sequence.t > Plane.vector Sequence.tFinally, we update the position p and velocity v of each body to p + v + a/2 and v + a respectively, where a is the
Plane.vector
in the sequence
returned by accelerations
.
Streams
An nbody simulation produces a stream of results — one "frame" for each step of the simulation. In your solution, you will need to define a functionmake_simulation
that creates a
stream to represent a simulation. We will use the following datatype
to represent streams:
type 'a stream = Nil  Cons of 'a * (unit > 'a stream)The helper functions we have provided for running simulations (discussed next) will access these streams and give you a way to test your code on simple examples.
Testing and Visualization
We have provided several modules, helper functions, test cases, and a
graphical interface that may be useful while developing your
solution. The function run_simulation
in Main_nbody
can be used to generate "transcripts" of
simulations. The module Test_nbody
defines the initial
values for several simulations. The Java
application bouncy.jar
(depicted in the image above) can
be used to animate a simulation using a transcript. Assuming you
have java installed on your system, simply run
java jar bouncy.jarand the GUI will pop up. If you click on the "open" button, you will be able to load a transcript file. Then click the "run" button to execute that transcript.
Your Tasks
Your task for the first part of this problem is to implement the
naive nbody algorithm. In particular, you will need to
implement plane.ml
and fill in the missing functions
in naive_nbody.ml
.
To submit: plane.ml and naive_nbody.ml.
The BarnesHut Algorithm
The BarnesHut algorithm provides a way to approximate an nbody simulation, while dramatically decreasing its cost. The insight behind the algorithm is that the gravitational forces between objects separated by large distances are weak, so they can be approximated without affecting the fidelity of the overall simulation very much.
The algorithm works by grouping bodies into regions and
using a fixed threshold θ to determine if it should perform the
exact acceleration
calculation for the each body in the
region, or if it should merely approximate acceleration
using a pseudobody that represents aggregate information about
all of the bodies contained in the region. This dramatically reduces
the time needed to perform large simulations — asymptotically,
from O(n^2) to O(n log n).
Bounding Boxes and Quad Trees
To represent regions, we will use bounding boxes comprising two points:
type bbox = {north_west:Plane.point; south_east:Plane.point}To represent the decomposition of a collection of bodies into regions, we will use the following datatype:
type bhtree = Empty  Single of body  Cell of body * bbox * (bhtree Sequence.t)Intuitively,
Empty
represents a region with no
bodies; Single(b)
represents a region with a single
body b
; and Cell(b,bb,qs)
represents a
region with pseudobody b
, bounding box bb
,
and four subquadrants qs
. The position of the pseudobody
in a Cell
is the center of mass (or barycenter) of
all objects in the region. The mass of the pseudobody is the total
mass of all objects. The velocity of the pseudobody can be arbitrary,
as it is not needed for the acceleration
calculation. In
your solution, you will likely have to write a function that
decomposes a sequence of n bodies into
a bhtree
.
Simulation
The actual BarnesHut simulation works as follows. For each
body b, the algorithm traverses the toplevel
bhtree
. If it encounters a region with more than one body
— i.e., a Cell
node — it checks
whether θ ≥ m / d, where m is the total mass
of the pseudobody associated with the region, and d is the
distance between b and the pseudobody's center of mass. If so,
then it treats the entire region as if it were one large
pseudobody. Otherwise, it recursively performs the computation on the
subquadrants of the region. The base cases are the Empty
and Single(b)
cases, where the algorithm simply invokes
the acceleration
function as in the naive algorithm.
Your Tasks
Your task for the second part of this problem is to implement the
BarnesHut algorithm. You will need to fill in the missing code
in barnes_hut_nbody.ml
.
To submit: barnes_hut_nbody.ml.
Handin Instructions
You may do this problem set in pairs. If you do so, both students are responsible for all components of the assignment. Please write both names at the top of all files if you do it in pairs. Please only have 1 member of the pair hand in the assignment to dropbox.
You must hand in these files to dropbox (see link on assignment page):

README

sequence.ml

inverted_index.ml

apm.ml

plane.ml

naive_nbody.ml

barnes_hut_nbody.ml
Acknowledgments
This assignment is based on materials developed by Dan Licata and David Bindel originally, and modified by Nate Foster.