COS 226 Programming Assignment Checklist: Small World Phenomenon

Frequently Asked Questions

What's the maximum number of movies and actors? The Internet Movie Database contains less than 125,000 movies and 500,000 actors. We won't test your program on anything larger than this.

What if there are multiple chains of the same length. Which one do I print out? Yes, in general there will be many paths of the same length from an actor to Kevin Bacon. Print out any one you like.

Do I always use "Kevin Bacon" as the source? Yes, although you're welcome to experiment with others. Apparently, Christopher Lee is the center of the Hollywood Universe. For your final submission, we recommend something like

#define SOURCE "Bacon, Kevin"

What do I print if an actor is not in the same connected component as Kevin Bacon? Print that their Bacon number is "infinity."

How on earth is the reference solution so fast? It takes me much longer just to build the graph. Who said you had to build the graph explicitly? See Problem 17.79 in Sedgewick.

Input, Output, and Testing

Input and output. Here are a number of sample input files. The numbers below are slightly off because we are now editing the data files to remove movies with no actors and remove '/' characters from movie titles.

File Movies Actors Description
input-bacon.txt 46 1,498 With Kevin Bacon
input-top-grossing.txt 187 8,265 Top grossing
input-hero.txt 193 2,513 Contains string "hero"
input-mpaa-g.txt 967 13,850 Rated G by MPAA
input-year2000.txt 4,757 43,940 Released in 2000
input-mpaa.txt 14,192 170,539 Rated by MPAA
input-all.txt 122,938   418,577   All

Our data is taken from the Internet Movie Database. Feel free to download the full 850MB+ database for yourself. In addition to cast lists, it has movie reviews, audio clips, ratings, etc. If you want to create interesting input files (in our format), send them to us. We'll award extra credit and post them for other students to use.

Reference solutions.   For reference, we provide executable code for our solution in Windows, Solaris, and OS X. The program has not been thoroughly tested, so please report any bugs for extra credit. Note that our solutions on input-all.txt does not agree precisely with The Oracle of Bacon because our input file includes TV movies and excludes a few movies that our perl script don't parse.

Submission and readme

We will compile your program with "gcc *.c" so you are free to organize your program as you see fit. As usual, we encourage and reward sound design principles.

Here is a template readme.txt file. It should contain the following information:

  • A high-level description for the design of your algorithm, and what influenced your decisions.
  • Unix users may find the shell script timeit.csh useful for computing a table of CPU times.

    Getting started

  • Download the directory bacon. This directory is mirrored on arizona at /u/cs226/pub/bacon. It contains reference solutions and sample input files.

  • Successfully read in and parse the input lines. You may wish to use parse.c as a starting point.

  • A useful next step is to store the actors in a symbol table, perhaps as in Sedgewick Program 17.10. Print out all of the distinct actors in lexicographic order.
  • Enrichment links

  • The Erdös number is a statistic very similar to the Kevin Bacon number, but it involves academic journal publications and mathematicians instead of movies and actors. Two mathematicians are linked if they have co-authored a journal paper. Paul Erdös was an incredibly prolific 20th century mathematician, and he plays the role of Kevin Bacon in this domain. The Erdös number project is dedicated to this problem.

  • The Black Sabbath number is another variant dealing with rock bands. Two bands are linked if a common musician performed for both groups. (This is not quite analogous to the Kevin Bacon number since here we are linking the analogs of movies instead of actors.) As an example, AC/DC has a Black Sabbath number of 2 since Simon Wright used to be the drummer of AC/DC and Wright also played drums for the band Dio; Ronnie James Dio is the singer for Dio and also sang for Black Sabbath. The Black Sabbath Game is dedicated to this problem.

  • COS 226 Home Page
    wayne@cs.princeton.edu
    Last modified: April 15, 2002