COS 226 Programming Assignment Checklist: WordNet


Frequently Asked Questions

Is a vertex considered an ancestor of itself? Yes.

Can a noun appear in more than one synset? Absolutely. It will appear once for each meaning that the noun has. For example, here are all of the entries in synsets.txt that include the noun word.

37559,discussion give-and-take word,an exchange of views on some topic; "we had a good discussion"; "we had a word or two about it"
50266,news intelligence tidings word,new information about specific and timely events; "they awaited news of the outcome"
60429,parole word word_of_honor,a promise; "he gave his word"
60430,password watchword word parole countersign,a secret word or phrase known only to a restricted group; "he forgot the password"
80883,word,a unit of language that native speakers can identify; "words are the blocks from which sentences are made"; "he hardly said ten words all morning"
80884,word,a brief statement; "he didn't say a word about it"
80885,word,a verbal command for action; "when I give the word  charge!"
80886,word,a word is a string of bits stored in computer memory; "large computers use words up to 64 bits long"
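
Because a noun can belong to several synsets, a lookup from noun to synset must be one-to-many. The following is a minimal sketch of such an index, not the required design. The class name NounIndex and the method build() are hypothetical, and the three-field split assumes the id,nouns,gloss layout shown in the entries above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps each noun to the ids of all synsets that contain it.
class NounIndex {
    static Map<String, List<Integer>> build(String[] lines) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (String line : lines) {
            // Limit of 3 keeps any commas inside the gloss in the last field.
            String[] fields = line.split(",", 3);
            int id = Integer.parseInt(fields[0]);
            // Nouns within one synset are separated by single spaces.
            for (String noun : fields[1].split(" ")) {
                index.computeIfAbsent(noun, k -> new ArrayList<>()).add(id);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] lines = {
            "60429,parole word word_of_honor,a promise; \"he gave his word\"",
            "60430,password watchword word parole countersign,a secret word or phrase",
            "80884,word,a brief statement; \"he didn't say a word about it\""
        };
        Map<String, List<Integer>> index = build(lines);
        System.out.println("word   -> " + index.get("word"));
        System.out.println("parole -> " + index.get("parole"));
    }
}
```

A map to a single id would silently keep only the last meaning of each noun, which is the mistake this structure avoids.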

Can I assume the id numbers will be integers in a small range? Yes. If there are V synsets, the ids will be numbered 1 through V (sorry, not the usual 0 through V-1). However, there is no guarantee that the ids appear in consecutive order in the input file.
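
One way to use this guarantee is to store synsets in an array indexed directly by id, so that the order of lines in the file does not matter. This is only a sketch: the class SynsetTable and its build() method are hypothetical, and the three input lines in main() are made-up data, not entries from synsets.txt.

```java
// Hypothetical helper: stores the noun field of each synset at the index equal to its id.
class SynsetTable {
    static String[] build(String[] lines) {
        int V = lines.length;                    // one line per synset
        String[] synsetById = new String[V + 1]; // index 0 is unused because ids start at 1
        for (String line : lines) {
            String[] fields = line.split(",", 3);
            int id = Integer.parseInt(fields[0]);
            synsetById[id] = fields[1];          // placed by id, not by position in the file
        }
        return synsetById;
    }

    public static void main(String[] args) {
        // Made-up input whose ids are deliberately out of order.
        String[] lines = { "3,c,gloss c", "1,a,gloss a", "2,b,gloss b" };
        String[] table = build(lines);
        for (int id = 1; id < table.length; id++) {
            System.out.println(id + " -> " + table[id]);
        }
    }
}
```

Allocating V + 1 slots wastes one entry but avoids subtracting 1 on every access.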

Should my program work on datasets other than WordNet? Absolutely. It should work on any datasets in the appropriate format.

Should SAP work if the digraph is not a DAG? Yes, the definition still applies in the presence of directed cycles.
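
To see why directed cycles cause no trouble, here is a sketch that computes the length of a shortest ancestral path with two breadth-first searches. It is an illustration under stated assumptions, not the reference solution: it uses plain adjacency lists rather than our Digraph class so that it is self-contained, the names SapSketch, bfs(), length(), and digraph() are hypothetical, and returning -1 when no common ancestor exists is a convention chosen only for this sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

class SapSketch {
    // Directed distances from s to every vertex; -1 marks unreachable vertices.
    static int[] bfs(List<List<Integer>> adj, int s) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);
        dist[s] = 0;                       // a vertex is an ancestor of itself
        Queue<Integer> queue = new ArrayDeque<>();
        queue.add(s);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            for (int w : adj.get(v)) {
                if (dist[w] == -1) {       // visited check makes cycles harmless
                    dist[w] = dist[v] + 1;
                    queue.add(w);
                }
            }
        }
        return dist;
    }

    // Length of a shortest ancestral path between v and w, or -1 if none (sketch convention).
    static int length(List<List<Integer>> adj, int v, int w) {
        int[] dv = bfs(adj, v);
        int[] dw = bfs(adj, w);
        int best = -1;
        for (int a = 0; a < adj.size(); a++) {
            if (dv[a] != -1 && dw[a] != -1) {      // a is a common ancestor
                int len = dv[a] + dw[a];
                if (best == -1 || len < best) best = len;
            }
        }
        return best;
    }

    // Builds adjacency lists for n vertices from an array of {from, to} pairs.
    static List<List<Integer>> digraph(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(e[1]);
        return adj;
    }

    public static void main(String[] args) {
        // Directed cycle 0 -> 1 -> 2 -> 0, plus 3 -> 0; vertex 4 is isolated.
        int[][] edges = { {0, 1}, {1, 2}, {2, 0}, {3, 0} };
        List<List<Integer>> g = digraph(5, edges);
        System.out.println(length(g, 1, 3));
        System.out.println(length(g, 0, 0));
        System.out.println(length(g, 0, 4));
    }
}
```

Each search marks a vertex the first time it is reached and never enqueues it again, so the searches terminate on any digraph, and the definition of a shortest ancestral path is applied unchanged.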

Some of the glosses have example sentences at the end. What is this? The example sentence is considered to be part of the gloss. You shouldn't need to do anything special to handle it.

Any advice on how to read in and parse the data files? Use the readLine() method in our In library to read in the data, one line at a time. Use the split() method in Java's String library to divide a line into fields. Use Integer.parseInt() to convert string id numbers into integers.
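
The steps above might look like the following sketch. To keep it runnable without our libraries, it reads lines with java.io.BufferedReader in place of the readLine() method in In; the class SynsetLineParser and its Entry type are hypothetical names. Splitting with a limit of 3 is a defensive assumption so that any comma inside the gloss stays in the last field.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class SynsetLineParser {
    // One parsed line of synsets.txt: id, the nouns in the synset, and the gloss.
    static final class Entry {
        final int id;
        final String[] nouns;
        final String gloss;

        Entry(int id, String[] nouns, String gloss) {
            this.id = id;
            this.nouns = nouns;
            this.gloss = gloss;
        }
    }

    static Entry parseLine(String line) {
        String[] fields = line.split(",", 3);       // id, nouns, gloss
        int id = Integer.parseInt(fields[0]);       // string id -> int
        String[] nouns = fields[1].split(" ");      // nouns are space-separated
        String gloss = fields.length > 2 ? fields[2] : "";
        return new Entry(id, nouns, gloss);
    }

    // Reads one line at a time until the end of the input.
    static List<Entry> parseAll(BufferedReader reader) throws IOException {
        List<Entry> entries = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.isEmpty()) entries.add(parseLine(line));
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        String sample = "60429,parole word word_of_honor,a promise; \"he gave his word\"\n"
                      + "80884,word,a brief statement; \"he didn't say a word about it\"\n";
        for (Entry e : parseAll(new BufferedReader(new StringReader(sample)))) {
            System.out.println(e.id + " -> " + String.join("|", e.nouns));
        }
    }
}
```

The same pattern applies to hypernyms.txt, where every field after a split on commas is an id to pass to Integer.parseInt().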

In WordNet, what should distance() and sap() return if there is no ancestral path? Return Double.POSITIVE_INFINITY and null, respectively.

In WordNet, what should glosses() return if the noun is not in WordNet? Return an Iterable that has zero items.
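
A sketch of one way to honor this, assuming glosses are kept in a map from noun to list of glosses. The class GlossLookup, its field, and the add() method are hypothetical and stand in for whatever your WordNet class uses internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the part of WordNet that answers glosses().
class GlossLookup {
    private final Map<String, List<String>> glossesByNoun = new HashMap<>();

    void add(String noun, String gloss) {
        glossesByNoun.computeIfAbsent(noun, k -> new ArrayList<>()).add(gloss);
    }

    // Returns an Iterable with zero items (never null) when the noun is unknown.
    Iterable<String> glosses(String noun) {
        List<String> found = glossesByNoun.get(noun);
        if (found == null) return Collections.emptyList();
        return Collections.unmodifiableList(found);
    }

    public static void main(String[] args) {
        GlossLookup lookup = new GlossLookup();
        lookup.add("word", "a promise");
        for (String g : lookup.glosses("word")) System.out.println(g);
        for (String g : lookup.glosses("not-a-noun")) System.out.println(g); // loop body never runs
    }
}
```

Returning an empty Iterable lets callers use a for-each loop without a null check.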

What should I do if one of the nouns in Outcast is not in WordNet? We'll only give you nouns that are in WordNet.

I'm an ontologist and I noticed that your hypernyms.txt file contains both is-a and is-instance-of relationships. Yes, you caught us. This ensures that every noun (except entity) has a hypernym. Here is an article on the subtle distinction.

Can I use my own Digraph class? No. Any Digraph class you use must have the same API as our Digraph.java class; otherwise, you are implicitly changing the API to SAP (which takes a Digraph argument in the constructor).

Input, Output, and Testing

Input and output. We encourage you to create your own (possibly pathological) inputs to help test your program. If your datasets create problems for other programs (or ours!), we'll award extra credit. The input should be very small, and it should expose a potential flaw that other programs are likely to face. In your readme.txt, you should describe what the input is testing.

Extra credit. Submit an interesting example (or corner case) that you used to test your code, preferably one that arises in the WordNet digraph and uses everyday words. You can also make up your own small synsets.txt and hypernyms.txt files.

Submission and readme

Here is a template readme.txt file. It should contain the following information:

Possible progress steps

Optional Optimizations

There are a few things you can do to speed up a sequence of SAP computations on the same digraph.