Caml
Power

Assignment 2: Box Office Trivia

Introduction

The main goal of this assignment is to improve your skills at writing recursive functions. We're writing recursive functions over lists, but the main way of thinking about these problems---i.e., assuming that the recursive call "does the right thing" and then using the result of recursion to build a solution to your problem---is applicable to other kinds of recursive data as well.

This is an individual assignment (not an assignment where you pair up).

More concretely, the assignment has two parts. In the first part, you'll solve a series of smaller problems to get some practice with lists. Note that some of these "smaller" problems are actually quite difficult. Feel free to move on to the second part of the assignment before you are done with the first part. You don't need to do things in order. In the second part, you'll produce OCaml code that will help you answer a set of questions about the top-grossing films in movie history (forgive us---our movie data is slightly out of date!).

NOTE: Because the purpose of the assignment is to get practice implementing recursive functions, we ask that in solving the problems in this assignment you do not use the OCaml List library. Normally, using libraries is good style and one could certainly use the list library to solve many of the questions posed effectively. However, we want you to learn about recursion. So please use the basic list operations like "::", and "@" and "[]" and pattern matching/match statements to do your work. Don't use List.map or List.fold or List.sort, etc. (Technically, one might consider "@" to be part of the list library, but we'll let you use it. Otherwise, one of the first things you might do is simply reimplement it as a helper function, which is pretty easy.)

To get started, download the tarball located here. Unzip and untar the code and data by typing the following command at a unix prompt.

$ tar xfz a2.tgz

You should find that the tarball contains the following items.

  • .merlin: This file contains directives for Merlin to find your source code and additional files after you have compiled.
  • part1.ml: This file contains list-processing problems for you to complete as part 1 of the assignment.
  • query.ml: This file is mostly blank. You will write a set of functions to analyze lists of movies as part 2 of the assignment.
  • main.ml: This file contains the driver for the program. Don't modify this file, but you can take a look to see how it's organized. It uses the module Arg to parse command line arguments in a nice way.
  • io.ml: This utility file contains a set of useful routines for parsing and printing movie files. Don't modify it, but feel free to take a look to see how it's organized. If you'd like to parse and print other simple text files using OCaml, this code could be a starting point.
  • README.txt: Report your answers to the movie trivia questions in this file. Also, explain any unusual design decisions or problems you had here. Suggestions for improving this assignment are also welcome.
  • Makefile: Type make all to compile your code and create an executable you can run to help you answer the key boxoffice trivia questions.
  • data: This directory contains the data you need to answer the questions asked on this assignment. The data was downloaded and processed in May 2012, so it will not be up to date -- films like the The Hunger Games will have continued to rack up the dollars over the last few years, and new movies will have been released. That's okay. In order to make it easy to verify your answers, use the data we provide as opposed to more recent data that you download from other sites. There is one file for each movie rating category (G, PG, PG-13, R) and one file for all-time inflation-adjusted gross movies. There are also a couple of small files you can use for testing. You will find that the data in the files comes in an arbitrary order.
  • data.ml: This file contains example data already formatted as Ocaml structures for use in testing. These examples are designed to make testing query.ml easier.
  • run_tests.ml: This file runs the unit tests for part1.ml and query.ml.

Part 1: Lists

Examine the contents of the file part1.ml and answer the questions found therein. This file must typecheck and compile when you hand it in. Please add more tests to the test functions below questions 4, 5 and 6. You can run the test functions by compiling with make test, then running run_tests.native.

Part 2: Box Office Analysis in OCaml

The file query.ml is missing the implementations to many functions that you need to code. See the file for details. Your goal should be to focus first on this file independently of any of the other files. You will use these functions to query just a few data files. However, we will test your functions thoroughly when grading them. There are unit tests are provided for each function, however, these tests are incomplete. You must expand on these tests to make sure that you functions operate correctly on all possible inputs. You can run the test functions by compiling with make test, then running run_tests.native

Since the data files you are working with are relatively small, you should not overly concern yourself with the efficiency of your code -- though needlessly or egregiously inefficient solutions should be avoided. Your main goals are correctness, clarity and good style. Be sure to refer to our style guide.

We suggest you implement the functions in part 2 in the following order.

  • take -- return only the first n elements of a movie list
  • drop -- return everything but the first n elements of a movie list
  • average -- return the average gross of all movies
  • decade -- return all movies produced in a given decade
  • sort -- a polymorphic (selection) sort function
  • sort_by_gross -- sort by gross revenue
  • sort_by_year -- sort by year produced
  • by_studio -- return total gross from all movies produced per studio
  • sort_by_studio -- sort list of studio-gross revenue pairs by gross revenue

Part 3: Answering the Box Office Trivia Questions

Scripting is a kind of functional programming: Scripts take data files (often representing lists) as inputs and produce new data files as outputs. Like functions in a functional program, scripts often compose: you can pipe the output on stdout of one script into the input on stdin of another script.

When you have finished coding and thoroughly testing the functions in query.ml, compile the entire application by typing "make" at a shell prompt in your code directory. To find out what you can do with your script, type the following at a shell prompt.

./boxoffice -help

You should see a list of options you can use. As a simple check to make sure things are working properly, type the following:

./boxoffice -echo < data/trial1.txt

The above command should send the contents of the trial1.txt data file out on to standard output. We also included that test as a part of your Makefile so you can also type the following to check your setup.

make check
Another command you might try is below. What should it do?
./boxoffice -take 1 < data/G.txt
Recall that the pipe operator (vertical bar) allows you to send the output of one command in to the input of another command. With that in mind, what does the following do?
./boxoffice -sort-gross < data/G.txt | ./boxoffice -take 1

Now, take a look inside the Makefile. You will see the clause for compiling boxoffice at the top. At the bottom, you'll see the clause for "topG". If you type:

make topG
you'll see the same thing. Feel free to add your own commands to the file.

To Do: Use your script to answer the questions about boxoffice trivia posed in the README.txt file. In addition to reporting the answers to the questions, report the scripting commands you used to find the answers. Try to make the script do as much work as you can. If possible, use a series of calls to your script to produce only the data you need to answer the question and no more. (This may not be possible.)

Hand-in Instructions

This problem set is to be done individually.

You must hand in these files to TigerFile:

  1. part1.ml -- this file contains the list processing problems
  2. query.ml -- this file contains the movie processing functions
  3. README.txt -- this file contains written answers

Please make sure you submit your solutions, not the blank stubs you downloaded.

Important notes about grading:

  1. Rec: For this assignment, you may add the rec keyword to any of the let statements that define functions if you would like the function to call itself. And in the other direction, a function for which we have given you a let rec needn't call itself, for example, if you have defined a recursive auxiliary function to use instead.
  2. Compile errors: All programs that you submit must type check and compile. Programs that do not compile will be subject to a penalty in addition to the deduction for the bug that causes the failure to compile. If you are having trouble getting your assignment to compile, please visit office hours. If you run out of time, it is better to comment out the parts that do not compile and hand in a file that compiles, rather than handing in a more complete file that does not compile.
  3. Missing functions: We will be using an automatic grading script, so it is crucial that you name your functions and order their arguments according to the problem set instructions, and that you place the functions in the correct files. Otherwise you may not receive credit for a working but misnamed function.
  4. Code style: Finally, please pay attention to style. Refer to the OCaml style guide and lecture notes. Ugly code is not fun to debug, nor to grade. Take the extra time to think through the problems and find the most elegant solutions before coding them up. Good programming style is also required on all the subsequent assignments.