COS 226 Programming Assignment

Small World Phenomenon

In The Small-World Phenomenon: An Algorithmic Perspective, Jon Kleinberg writes:

Long a matter of folklore, the small-world phenomenon -- the principle that we are all linked by short chains of acquaintances -- was inaugurated as an area of experimental study in the social sciences through the pioneering work of Stanley Milgram in the 1960's. This work was among the first to make the phenomenon quantitative, allowing people to speak of the six degrees of separation between any two people in the United States. Since then, a number of network models have been proposed as frameworks in which to study the problem analytically. One of the most refined of these models was formulated in recent work of Watts and Strogatz; their framework provided compelling evidence that the small-world phenomenon is pervasive in a range of networks arising in nature and technology, and a fundamental ingredient in the evolution of the World Wide Web.
In this assignment you will investigate the six degrees of Kevin Bacon. Two actors or actresses are linked if they appeared in a movie together. The Kevin Bacon number of an actor is the shortest chain of links that leads to Kevin Bacon. For example, Robert De Niro has a Kevin Bacon number of 1 because he appeared in Sleepers with Kevin Bacon. Elvis Presley's number is 2: although Elvis did not appear in any movie with Kevin Bacon, he was in Change of Habit with Edward Asner, and Asner appeared in JFK with Kevin Bacon. Your task is to read in a file containing a list of movies and the actors that appeared in them and compute the Kevin Bacon numbers for each actor. You will then read in a list of actors from standard input and print out a shortest chain of movies that leads each actor back to Kevin Bacon.

Input format. In addition to the massive list of movies and casts from the Internet Movie Database, we include some smaller data files that include only a specific subset of movies, e.g., all movies released in 2000. Each line in the data file consists of a movie title, followed by a list of actors and actresses that appeared in that movie, delimited by the character '/'. Here is an abbreviated example:

Picture Perfect (1997)/Aniston, Jennifer/Bacon, Kevin/Dukakis, Olympia/Mohr, Jay
Planes, Trains & Automobiles (1987)/Bacon, Kevin/Candy, John/Martin, Steve/Robins, Laila
Beach, The (2000)/DiCaprio, Leonardo/York, Daniel/Patarakijjanon, Patcharawan
Use a command-line parameter to enter the name of the movie database file. You will also read in a list of actors from standard input, one per line.

Output format. First, print out a table of the distribution of Kevin Bacon numbers.

Bacon number    Frequency
-------------------------
           0            1
           1         1494
           2       127791
           3       239671
           4        36475
           5         2965
           6          275
           7           39
           8           47
           9           99
          10           15
          11            2
    infinity         9703
Then, for each actor and actress read from standard input, print out their Kevin Bacon number and a shortest chain of movies that connect them to Kevin Bacon. Here's an example.
Dane, Cynthia has a Bacon number of 3
Dane, Cynthia was in "Solstice (1994)" with Scott, Dennis
Scott, Dennis was in "What About Bob? (1991)" with Murray, Bill
Murray, Bill was in "Wild Things (1998)" with Bacon, Kevin