Assignment 2: Box Office Trivia


Introduction

The goal of this assignment is to produce O'Caml code that will help you answer a set of questions about the top-grossing films in movie history. To get started, download the tarball located here. Unzip and untar the code and data by typing the following command at a Unix prompt.

$ tar -xzf a2.tar

You should find that the tarball contains the following items.

Task 1: Data Analysis in O'Caml

The file query.ml is missing a number of functions that you need to write. See the file for details. Focus first on this file, independently of the other files. Use the O'Caml top-level environment to fully debug every function you write before proceeding to the second part of the assignment. You will use these functions to query just a few data files. However, we will test your functions thoroughly when grading them (and you should test them thoroughly when writing them). They should operate correctly on all possible inputs. At the bottom of the file there is one piece of sample data, data4, and one unit test, test1. You must create many more such tests to debug your code.
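As a rough illustration of what additional unit tests might look like, here is a minimal sketch. The movie type, the sample data, and the function titles are all hypothetical stand-ins chosen for this example; the real type, data4, and test1 are defined in the handout code and may look different.

```ocaml
(* Hypothetical representation; the real movie type is in the handout. *)
type movie = string * string * float * int  (* title, studio, gross, year *)

(* Hypothetical sample data, in the spirit of data4. *)
let sample : movie list = [
  ("Film A", "Studio X", 100.0, 1975);
  ("Film B", "Studio Y", 250.0, 1982);
]

(* Illustrative function under test (not one of the assigned functions):
   collect the title of every movie, in order. *)
let rec titles (ms : movie list) : string list =
  match ms with
  | [] -> []
  | (title, _, _, _) :: tl -> title :: titles tl

(* Unit tests: each returns true on success.
   Cover the empty list as well as a non-empty list. *)
let test_titles_empty () : bool = titles [] = []
let test_titles_sample () : bool = titles sample = ["Film A"; "Film B"]

(* Running the tests: assert raises Assert_failure if a test fails. *)
let () =
  assert (test_titles_empty ());
  assert (test_titles_sample ())
```

Testing the empty list, a one-element list, and a longer list separately tends to catch most mistakes in recursive list functions.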

Since the data files you are working with are relatively small, you should not overly concern yourself with the efficiency of your code. However, none of your functions should be more than quadratic in complexity. Your main goals are correctness, clarity and good style. Be sure to refer to our style guide.

For this assignment, you should not use functions from the List module. You should code these functions "from first principles" using recursion.
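To make "from first principles" concrete, the sketch below writes two small list functions using only pattern matching and recursion, with no calls into the List module. The names length and keep are illustrative choices for this example and are not part of the assignment.

```ocaml
(* Number of elements in a list, written without List.length. *)
let rec length (xs : 'a list) : int =
  match xs with
  | [] -> 0
  | _ :: tl -> 1 + length tl

(* Keep only the elements satisfying pred, written without List.filter. *)
let rec keep (pred : 'a -> bool) (xs : 'a list) : 'a list =
  match xs with
  | [] -> []
  | hd :: tl ->
    if pred hd then hd :: keep pred tl
    else keep pred tl
```

Each function has one case for the empty list and one case for a head followed by a tail; most of the assigned functions can be structured the same way.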

We suggest you implement them in the following order.

  • take -- return only the first n elements of a movie list
  • drop -- return everything but the first n elements of a movie list
  • average -- return the average gross of all movies
  • decade -- return all movies produced in that decade
  • sort -- a polymorphic (selection) sort function
  • sort_by_gross -- selection sort by gross revenue
  • sort_by_year -- selection sort by year produced
  • by_studio -- return total gross from all movies produced per studio
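
For reference, the general shape of a polymorphic selection sort on lists can be sketched as follows. This is only one possible formulation: the names select_min and selection_sort, and the choice to pass the ordering as a function le of type 'a -> 'a -> bool, are assumptions made for this example. The signature required for sort is the one given in the handout code.

```ocaml
(* Find the least element of x :: rest according to le, and return it
   together with all the remaining elements. *)
let rec select_min (le : 'a -> 'a -> bool) (x : 'a) (rest : 'a list)
  : 'a * 'a list =
  match rest with
  | [] -> (x, [])
  | y :: tl ->
    let (m, others) = select_min le y tl in
    if le x m then (x, m :: others) else (m, x :: others)

(* Selection sort: repeatedly pull out the least remaining element.
   Each call to select_min is linear, so the sort is quadratic overall. *)
let rec selection_sort (le : 'a -> 'a -> bool) (xs : 'a list) : 'a list =
  match xs with
  | [] -> []
  | x :: tl ->
    let (m, others) = select_min le x tl in
    m :: selection_sort le others
```

Because the ordering is a parameter, the same function sorts ascending with (<=) and descending with (>=), which is the idea behind defining the more specific sorts in terms of the polymorphic one.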

Task 2: Answering the Box Office Trivia Questions

    Scripting is a kind of functional programming: Scripts take data files (often representing lists) as inputs and produce new data files as outputs. Like functions in a functional program, scripts often compose: you can pipe the output on stdout of one script into the input on stdin of another script.
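
The analogy between piping scripts and composing functions can be sketched directly in O'Caml. In the example below, evens and squares are hypothetical "stages" invented for illustration; the standard |> operator plays the role of the shell's vertical bar.

```ocaml
(* Stage 1: keep only the even numbers. *)
let rec evens (xs : int list) : int list =
  match xs with
  | [] -> []
  | hd :: tl -> if hd mod 2 = 0 then hd :: evens tl else evens tl

(* Stage 2: square every number. *)
let rec squares (xs : int list) : int list =
  match xs with
  | [] -> []
  | hd :: tl -> (hd * hd) :: squares tl

(* Like "stage1 | stage2" in the shell: the output of evens
   becomes the input of squares. *)
let pipeline (xs : int list) : int list =
  xs |> evens |> squares

(* The same composition written as nested function application. *)
let pipeline' (xs : int list) : int list =
  squares (evens xs)
```

Both definitions compute the same result; the |> form simply reads left to right, in the same order as a shell pipeline.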

    When you have finished coding and thoroughly testing the functions in query.ml, compile the entire application by typing "make" at a shell prompt in your code directory. To find out what you can do with your script, type the following at a shell prompt.

    ./boxoffice -help
    

    You should see a list of options you can use. As a simple sanity check to make sure things are working properly, type the following:

    ./boxoffice -echo < data/trial1.txt
    

    The above command should send the contents of the trial1.txt data file to standard output. We also included that test as a part of your makefile, so you can also type the following to check your setup.

    make check
    
    Another command you might try is this one:
    ./boxoffice -take 1 < data/G.txt
    
    What does it do? Recall that the pipe operator (vertical bar) allows you to send the output of one command into the input of another command. With that in mind, what does the following do?
    ./boxoffice -sort-gross < data/G.txt | ./boxoffice -take 1
    

    Now, take a look inside the makefile. You will see the clause for compiling boxoffice at the top. At the bottom, you'll see the clause for "topG". If you type:

    make topG
    
    you'll see the same output as the piped command above. Feel free to add your own commands to the makefile.

    To Do: Use your script to answer some questions about box office trivia. Report your answers in the README.txt file, along with the scripting commands you used to find them. Try to make the script do as much of the work as you can. If possible, use a series of calls to your script to produce only the data you need to answer the question and no more. (This may not be possible.)

    1. Adjusting for inflation, what is the top-grossing film of all time (use the alltime.txt data)?
    2. What is the 50th ranked R film by gross?
    3. Suppose you had a chance to make 1 film with a top director and the director was so good you were guaranteed that whatever film you made would be in the top 5 grossing films in its ratings category (G, PG, PG-13, R). What rating (G, PG, PG-13, R) would you choose to give your film if you wanted to make the most money?
    4. Taking inflation into account, would you have preferred to make money off of blockbusters in the 70s or in the 80s?

    Karma (Optional)

    Recall, you are not required to do Karma questions. They will have little if any impact on your grade. They are mostly for fun and a little extra challenge. Be sure to do the other questions first and ensure they are correct.

    Handin Instructions

    This problem set is to be done individually.

    You must hand in these files to dropbox:

    1. query.ml -- this file contains the bulk of your solution
    2. README.txt -- this file contains written answers
    3. io.ml -- this file does not need to be modified at all unless you did the Karma question
    4. main.ml -- this file does not need to be modified at all unless you did the Karma question

    Be sure to include your name and netid at the top of every file.

    Please make sure you submit your solutions, not the blank stubs you downloaded.

    Important notes about grading:

    1. Compile errors: All programs that you submit must type check and compile. Programs that do not compile will likely receive an automatic zero. If you are having trouble getting your assignment to compile, please visit office hours. If you run out of time, it is better to comment out the parts that do not compile and hand in a file that compiles, rather than handing in a more complete file that does not compile.
    2. Missing functions: We will be using an automatic grading script, so it is crucial that you name your functions and order their arguments according to the problem set instructions, and that you place the functions in the correct files. Otherwise you may not receive credit for a function properly written.
    3. Code style: Finally, please pay attention to style. Refer to the O'Caml style guide and lecture notes. Ugly code that is functionally correct may still lose points. Take the extra time to think out the problems and find the most elegant solutions before coding them up. Good programming style is also required for all the subsequent assignments in the rest of the semester.
    4. Late assignments: Please carefully review the course website's policy on late assignments, as all assignments handed in past the deadline will be considered late. Verify on CMS that you have submitted the correct version, before the deadline. Submitting the incorrect version before the deadline and realizing it after the deadline still constitutes a late assignment.