Caml
Power

Problem Set 7: Parallel Sequences

Quick Links

Introduction

In this assignment, we will implement a functional parallel sequence abstraction on top of F#'s parallel Async library. This short assignment will give you a taste of parallel functional programming and an introduction to another functional programming language in the ML family (F#).

You may do this problem set in pairs. If you do so, both students are responsible for all components of the assignment. Please write both names and netids where requested in the files if you do it in pairs.

By the way, your prof has just started to play with F# and it certainly seems like a fun language. It has some advantages over OCaml and some disadvantages. Here are some things I've noticed about F#. It's certainly not a conclusive list and some of these things could change if I gained familiarity with the language and began to adapt my programming style:

  • Pro: The libraries and documentation are great! File and string processing is so easy to get started with. I may start using it as my go-to scripting language! When doing this assignment, I highly recommend googling for help with syntax, programming style and libraries to use. Stack Overflow and Microsoft docs are your friends.
  • Pro: Computation expressions and generators are really fun. Simple, terse ways to generate and manipulate collections.
  • Pro: Parallel run-time system!
  • Con: The module system. I really wanted to reuse signatures and to have functors for this assignment to make it easier to both parameterize clients over several possible sequence implementations and to make several modules with the same signature available at the same time for benchmarking. We circumvented this problem, but, basically, it was more typing (and lots of retyping while we were adjusting the interface we wanted you to implement).
  • Con: Incomplete type inference. F# has a little bit of subtyping in it. For example, there is a seq type, which is a supertype of both array and list types (and probably some others). Sometimes, to resolve the overloading of certain operations, it is necessary to place typing annotations on your functions (OCaml avoids this but that results in a different kind of ugliness, e.g. + and +. and ^ are three separate operators). It also seems that the signature subtyping algorithm (the algorithm that checks to see if a module implementation matches a signature) is occasionally insufficient to figure out that a module really is an instance of a given signature. I found this to be resolved when one adds typing annotations to one's functions in the module. But if you get an unsual type error message, consider placing type annotations in your code to help you resolve it.

Preliminaries

You will need to install F# to complete this assignment. There are many options for using F# with various IDEs, code editors, and command line tools. We are happy to try to help, but we cannot be experts in all of these. Here are some suggestions that we have found to work for us:

F# Installation

  • Mac OS See here. If you wish to use your own editor, follow Option 5 to install F# and Mono. Otherwise, options 1 or 2 will give you IDE options that are tuned to F#.
  • Windows See here. If you wish to use your own editor, follow option 4 to install the F# command line tools. Otherwise, options 1 or 2 will give you IDE options that are tuned to F#.
  • Linux See here and follow Option 1. This is sufficient if you want to use your own editor, but there is also a Linux-compatible offering of Visual Studio Code (Option 1 further down the page) if you would prefer.
  • Note: we do not recommend the COS 326 Virtual Box image for this assignment.

F# Development Environment

  • Visual Studio. Download Visual Studio 2017 Community and select "F# language support"
  • VS Code. Install the package Ionide-fsharp
  • Atom. Install the package Ionide-fsharp. Note: there are differences in configuration depending on how you installed Mono. If installed using brew, Atom may not be able to find Mono (the cryptic error "Critical error: language services could not be spawned"), which can be fixed by chaning the path under the monopath settings for ionide-fsharp from the default to one appropriate for your brew installation (possibly /usr/local/bin, /usr/local/Cellar, /Library/Frameworks/Mono.framework, or others).
  • Sublime. Install FSharp with Package Control
  • Emacs. The consensus seems to be the package fsharp-mode from MELPA. You'll have to configure emacs to use MELPA and then update your package list. We found the "getting started" instructions sufficient.
  • Vim. The consensus seems to be vim-fsharp but we have not yet used it extensively.

Compiling and Running

If you are planning to use an editor other than Visual Studio, you may also want to compile your code using make. You should be able to compile your assignment using the Makefile we included, though some installations may require changing the names for the compiler (e.g. fsc instead of fsharpc), or removing the mono invocations.

  make A7.exe    # compile the benchmarks to compare sequential and parallel
                 # performance, and also the word search application
  make rbench    # run the benchmarks to compare sequential and parallel performance
  make jingle    # find the song with the most occurrences of the word jingle
  make dreidel   # find the song with the most occurrences of the word dreidel
  make flurries  # find the song with the most occurrences of the word flurries

You may also compile the assignment using Visual Studio. In the provided A7.zip file, there is a file named A7.sln in addition to a folder named A7. A7.sln and the A7 folder containing the assignment files need to be in the same folder. Then, you can open A7.sln in Visual Studio.

This will give you a project that will build, and will call Words or Bench based on the number of arguments given.

Obtaining Speedup

In order to obtain the expected speedup from parallel computation, you will need to test this assignment on a multi-core machine. If you do not have a multi-core machine (or if you do, but are running this on a VM that does not have access to multiple cores), you can still implement and test all of the desired functionality. If you have an undergraduate CS account, you can use the CS department machine wash.cs.princeton.edu to measure your speedup.

Parallel Libraries

In this assignment, you will use F#'s Async library. We discussed Async in class and precept. You can find more comprehensive documentation here. There are three functions that you will find particularly useful:

  • Async.Parallel : Async<'a> array -> Async<'a array> Run an array of Async's in parallel
  • Async.StartChild : Async<'a> -> Async< Async< 'a > > Start running an Async now and return a handle to the running background thread that when forced using let!, waits for the background process to continue.
  • Async.RunSynchronously : Async<'a> -> 'a Run an async computation now and wait until it completes
Here are a couple of examples:
// create an async computation:
let work n = async { ... }

// run several async computations in parallel:
   [| work 0; work 1; work 2 |] 
|> Async.Parallel
|> Async.RunSynchronously
// given two functions, f and g, which when
// applied to arguments x and y respectively
// return async computations, run both
// async computations in parallel
let bothAsync f x g y =
    async { 
           let! cf = Async.StartChild (f x)
           let! resg = g y
           let! resf = cf
           return (resf, resg)
          }

Part 1: Parallel Sequences

A sequence is an ordered collection of elements. One way to represent a sequence is using a list. That's not a bad representation for a short sequence, but it's a terrible representation for a large one. In this problem, you will develop another representation for sequences. Your representation will be built using arrays and will use F#'s Asyncs to implement a few of the sequence operations in parallel.

The following table summarizes the most important operations in the sequence module and their complexity requirements. If the table says "Sequential" in the "Implementation" column, then you are expected to provide a Sequential implementation (even in the parallel version of the code). For instance, "iter" is sequential -- we often want to use that to print out the results in order, so the natural parallel version will have the wrong semantics.

In your README.txt file, briefly explain the work and the span of your parallel implementation of each of tabulate, map, reduce. State any reasonable assumptions you make in doing so. For example, reasonable assumptions include the fact that any hard-coded sequential cutoff that you use involves work and span that are small relative to the overall computation. You may assume that array allocation and initialization are constant time operations (even though they aren't).

Function Description Implementation
tabulate Create a sequence of length n where each element i holds the value f(i) Parallel
from_array Create a sequence from F#'s built-in array type. Note that you would like to guarantee that your sequences are immutable data types. Consequently, you will need to copy elements out of the input array to create your sequence. Sequential
to_array Create an array from a sequence. Here again, you should ensure that if the client writes to the array they receive, that will not cause a modification to an object of type sequence (which should be immutable). Sequential
iter Iterate through the sequence applying function f on each element in order from smallest to largest element of the sequence. Useful for debugging Sequential
length Return the length of the sequence Constant time
nth Get the nth value in the sequence. Indexing is zero-based. Constant time
map Map the function f over a sequence Parallel
reduce Fold a function f : 'a -> 'a -> 'a over the sequence. The result of reduce f z [a; b; c] should be f (f (f z a) b) c. Additionally, f must be associative. You should not assume z is an identity for f. In other words, f (f z a) b is not necessarily the same as f a b. Parallel

Implementation Details

Your goal will be to write parallel versions of the functions tabulate, map, and reduce and to obtain a speedup when executing these functions on large inputs on multicore machines. Because the overhead of starting and managing parallel threads is relatively high, one will only get a speedup if the amount of work is high (which is generally okay -- if there is not much work, one doesn't need a parallel implementation anyway!).

The Bench.fs application contains several artificial workloads to test your sequence implementation and compare it with the sequential implementation. Note that it is plausible that you will see very little speedup (the speedup you get will depend on a wide variety of factors, ranging from how you implemented your data structure to the number of cores on your machine). If you find you aren't getting much speedup, you can increase the amount of work by altering the benchmarks (ie, by increasing the amount of work they do or the size of the sequence). If you can't get any speedup, even on large workloads, you may need to reconsider your implementation strategy. In particular, every thread must be given enough work. That means that successful implementations will almost certainly need to implement a "sequential cutoff." In other words, when sequence sizes are small, one will need to use a sequential algorithm rather than a parallel one. If one is implementing a divide-and-conquer parallel algorithm, dividing up the problem into chunks that are too small may cause your parallel algorithm to take much longer than the sequential version, even if it is using multiple cores!

Array library: You may represent sequences however you would like, but one obvious choice is to use arrays (at least in part) and the array library. Notice that one quirk of the (very limited) F# module system is that if your implementation (.fs file) contains a simple abbreviation such as

type 'a t = 'a array
then you will get an error message if your interface file (.fsi file) makes the type abstract such as:
type 'a t
To circumvent this issue, you have to create a type abbreviation that has at least one constructor as follows:
type 'a t = Rep of 'a array

Yes, this is pretty weird. Only objects or data types can be abstract types. Few languages are perfect. :-)

Please note that the sequence API must be a functional API. If your underlying representation involves a mutable data structure such as an array, you must make a copy of the array when implementing functions such as map. You should not destructively over-write elements of an array when performing a map. If you avoid such imperative overwrites, client code can freely use your implementation in parallel applications without having to worry about synchronization and the extreme debugging headaches it usually entails.

Chunking: In class, most of the algorithms we covered were binary divide-and-conquer algorithms. i.e.: We cut a problem into 2 halves and then recursively solved each of those problems in half, and then recursively cut the problem in half again and again. You can take this approach if you'd like. Another approach is to generate a whole array of M Async tasks at once (where M is much greater than 2, for instance). One can then use Async.Parallel to process all of the tasks at once. Feel free to compare and contrast approaches to see if you notice a performance difference. If you notice a difference, tell us about it and the experiment that you used to investigate the situation in your README.

Async.RunSynchronously Cost: This may be a costly method. You should probably only execute this method once for each call to map, tabulate or reduce. For example, your solution is unlikely to be very efficient if you use this method recursively on every recursive call to a recursive divide-and-conquer algorithm.

Your Task: The details

We have provided a complete interface for sequences as well as a reference implementation based on lists in Sequential.fs. This implementation is only meant to be a guide to the functional behavior expected of the module. The reference implementation should be useful for debugging. You can also use this implementation to complete parts the word counting part of the assignment (Words.fs and Counter.fs) without first finishing your parallel sequence module. The word counting part of the assignment works on a very small amount of data so it is unlikely you will actually get a speed up, but you could try it on some big repository of files in your spare time (feel free to let us know on Piazza and in your README if you generate a data set that you think will be interesting for other students to try out. We believe you are going to need to process *a lot* of files to get a speedup. You could check out data.gov to find some interesting data to analyze.

To test your sequence implementation, we encourage you to use Bench.fs. (We got roughly a 2x speed up on most of the benchmarks on a Mid 2012 2GHz Intel Core i7 laptop --- your experience on your machine may vary widely! Do not worry if you do not see a similar speedup). Your results may vary depending on your computer's hardware, whether you are running in a VM, etc. Implementations with unnecessary data copying, which leads to poor performance will receive less credit.

We also highly encourage you to write a top-level testing file yourself to test for correctness. You do not need to hand in such a file.

You do need to hand a very brief write-up in your README that describes the order of growth of the WORK and the SPAN of the parallel operations you have implemented.

To submit: Parallel.fs, README.txt

Part 2: A Word Count Application

The following application is designed to test out your sequence implementation and illustrate the "Map-Reduce" style of computation, as described in Google's influential paper on their distributed Map-Reduce framework.

A common prelude to indexing the web is to count words of various kinds. This simple application implements a word count over a directory of files. For example, if words.exe is the generated executable (and you are using mono to execute the program), then:
mono words.exe data/holidays jingle
will count all occurrences of the word "jingle" in all of the files in the directory data/holidays. If it can't find any occurrences of jingle, it should say:
jingle was not found in any files in data/holidays
If it can find occurrences of jingle in some of the files, it should print out the file name with the most occurrences of jingle and print out the number of occurrences. e.g.:
file jinglebells.txt had 25

Your Task

This application will involve the files Words.fs (the main file), Counter.fs, and the Sequence implementation. Note that Words.fs and Counter.fs both refer to the Sequence module. Also notice that Sequence is just a wrapper for either Parallel or for Sequential sequence modules. Initially, we have set things up so that Sequence is a wrapper for Sequential. You can toggle between sequential and parallel implementations by commenting/uncommenting lines 6/7 of Sequence.fs. (Though it will not make much difference in performance because we are not processing much data. Once again, if you can supply a large data set for which the parallelism will give you a nice speedup, please let your prof know! We'd be glad to have it.)

Complete the function counter in the file Counter.fs, making use of the Sequence implementation.

To submit: Counter.fs.

Handin Instructions

You must hand in these files to TigerFile (see link on assignment page):

  1. README.txt
  2. Parallel.fs
  3. Counter.fs

Acknowledgments

Many of the ideas behind this assignment pertaining to sequences were developed by Dan Licata and David Bindel originally, and modified by Nate Foster and Ryan Beckett. Nick Giannarakis helped develop the current version for F#.