COS 226 Programming Assignment Checklist: WordNet


Frequently Asked Questions

Should SAP work if the digraph is not a DAG? Yes, the definition still applies in the presence of directed cycles.

What data structure(s) should I use to store the synsets? This part of the assignment is up to you. You must carefully select data structures to achieve the specified performance requirements.

I understand how to compute the length(int v, int w) method in time proportional to E + V but my length(Iterable<Integer> v, Iterable<Integer> w) method takes time proportional to a * b (E + V), where a and b are the sizes of the two iterables. How can I improve it to be proportional to E + V? The key is using the constructor in BreadthFirstDirectedPaths that takes an iterable of sources instead of using a single source.

How do I read input directly from a file, without redirecting standard input? Use the In data type. Read pp. 82-83 of the textbook for more details.

Can I read the synset or hypernym file twice? No, file i/o is very expensive so please read each file only once and store it in an appropriate data structure.

Any advice on how to read in and parse the synset and hypernym data files? Use the readLine() method in our In library to read in the data one line at a time. Use the split() method in Java's String library to divide a line into fields. You can find an example using split() in Domain.java. Use Integer.parseInt() to convert string id numbers into integers.

Can I assume the hypernym digraph for WordNet is a DAG? Yes. However, your program should work on any input, not just the synset and hypernym files provided. You may assume that if there are S synsets, then the ids will be numbered 1 through S (sorry, not the usual 0 through S-1). However, there is no guarantee that the id numbers will appear consecutively in the synset file.

Should I re-implement breadth-first search in my SAP class? No, you should call the relevant method(s) in BreadthFirstDirectedPaths.java. You may modify BreadthFirstDirectedPaths.java to optimize your code, but if you do so, rename it, say to DeluxeBFS.java, and submit it.

Can I use my own Digraph class? No, it must have the same API as our Digraph.java class; otherwise, you are changing the API to the SAP constructor (which takes a Digraph argument). Do not submit Digraph.java.

Is a vertex considered an ancestor of itself? Yes.

Can a noun appear in more than one synset? Absolutely. It will appear once for each meaning that the noun has. For example, here are all of the entries in synsets.txt that include the noun word.

37559,discussion give-and-take word,an exchange of views on some topic; "we had a good discussion"; "we had a word or two about it"
50266,news intelligence tidings word,new information about specific and timely events; "they awaited news of the outcome"
60429,parole word word_of_honor,a promise; "he gave his word"
60430,password watchword word parole countersign,a secret word or phrase known only to a restricted group; "he forgot the password"
80883,word,a unit of language that native speakers can identify; "words are the blocks from which sentences are made"; "he hardly said ten words all morning"
80884,word,a brief statement; "he didn't say a word about it"
80885,word,a verbal command for action; "when I give the word  charge!"
80886,word,a word is a string of bits stored in computer memory; "large computers use words up to 64 bits long"

Can a synset consist of exactly one noun? Yes. Moreover, there can be several different synsets that consist of the same noun.

71,Aberdeen,a town in western Washington
72,Aberdeen,a town in northeastern South Dakota
73,Aberdeen,a town in northeastern Maryland
74,Aberdeen,a city in northeastern Scotland on the North Sea

Can two synsets have identical glosses? Yes. In such cases, glosses() should include a copy of each.

24936,barley barleycorn,a grain of barley
24941,barleycorn,a grain of barley

64933,quadrant right_angle,a quarter of the circumference of a circle
64934,quadrant quarter-circle,a quarter of the circumference of a circle

Some of the glosses have example sentences at the end. What is this? The example sentence is considered to be part of the gloss. You shouldn't need to do anything special to handle it.

I'm an ontologist and I noticed that your hypernyms.txt file contains both is-a and is-instance-of relationships. Yes, you caught us. This ensures that every noun (except entity) has a hypernym. Here is an article on the subtle distinction.

In SAP, what should length() and ancestor() do if one (or more) of the input arguments is not an integer between 0 and G.V() - 1? It's not in the API, so we won't test such cases. Throwing a runtime exception is recommended.

In WordNet, what should glosses() do if the input argument is not a wordnet noun? Return an Iterable<String> that has zero elements.

In WordNet, what should distance() and sap() do if one (or both) of the input arguments are not wordnet nouns? The assignment defines the distance to be infinite if either argument is not a wordnet noun. In such cases, distance() should return -1 and sap() should return null, as specified in the API.

What should outcast() do if one (or more) of its arguments is not a noun in WordNet? Or if the array is of size 0? It's not in the API, so we won't test such cases.

How can I make SAP immutable? You can (and should) save the associated digraph in an instance variable. However, because digraphs are mutable, you must first make a defensive copy by calling the copy constructor.

Input, Output, and Testing

Some examples. Here are some interesting examples that you can use in your code.

Possible progress steps

Optional Optimizations

There are a few things you can do to speed up a sequence of SAP computations on the same digraph. Do not attempt to do this or any of your own invention without thoroughly testing your code. You can gain bonus points for correctly implementing some of these optimizations but you risk losing more points than you can gain if you introduce bugs that render your solution no longer correct.

Enrichment