COS 226 Programming Assignment Checklist: WordNet


Frequently Asked Questions

Is a vertex considered an ancestor of itself? Yes, this is the typical convention.

Can a noun appear in more than one synset? Absolutely. It will appear once for each meaning that the noun has. For example, here are all of the glosses associated with synsets that contain the noun "word":

a brief statement; "he didn't say a word about it"
a promise; "he gave his word"
a secret word or phrase known only to a restricted group; "he forgot the password"
a unit of language that native speakers can identify; "words are the blocks from which sentences are made"; ...
a verbal command for action; "when I give the word, charge!"
a word is a string of bits stored in computer memory; "large computers use words up to 64 bits long"
an exchange of views on some topic; "we had a good discussion"; ...
new information about specific and timely events; "they awaited news of the outcome"

Can I assume the id numbers will be integers in a small range? Yes, if there are V synsets, the ids will be numbered 1 through V (sorry, not the usual 0 through V-1). However, they may not appear in consecutive order in the input file.

Should my program work on datasets other than WordNet? Absolutely. It should work on any datasets in the appropriate format.

Should SAP work if the digraph is not a DAG? Yes, the definition still applies in the presence of directed cycles.
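To illustrate why directed cycles cause no trouble, here is a minimal sketch of an ancestral-path length computation using two breadth-first searches. It is not the course's SAP or Digraph API: the class SapSketch, the adjacency-list representation, and the choice of -1 to signal "no ancestral path" are all assumptions made for this example. Because BFS marks each vertex the first time it is reached, a cycle is never traversed twice, and each vertex starts at distance 0 from itself, so it counts as its own ancestor.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical helper for illustration; not the course's SAP or Digraph API.
final class SapSketch {

    // BFS distances from source along directed edges; -1 marks unreachable vertices.
    static int[] bfs(List<List<Integer>> adj, int source) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);
        dist[source] = 0;                 // a vertex is an ancestor of itself
        Queue<Integer> queue = new ArrayDeque<>();
        queue.add(source);
        while (!queue.isEmpty()) {
            int x = queue.remove();
            for (int y : adj.get(x)) {
                if (dist[y] == -1) {      // first visit only, so cycles terminate
                    dist[y] = dist[x] + 1;
                    queue.add(y);
                }
            }
        }
        return dist;
    }

    // Length of a shortest ancestral path between v and w,
    // or -1 (an assumption of this sketch) if they share no ancestor.
    static int length(List<List<Integer>> adj, int v, int w) {
        int[] fromV = bfs(adj, v);
        int[] fromW = bfs(adj, w);
        int best = -1;
        for (int a = 0; a < adj.size(); a++) {
            if (fromV[a] != -1 && fromW[a] != -1) {   // a is a common ancestor
                int total = fromV[a] + fromW[a];
                if (best == -1 || total < best) best = total;
            }
        }
        return best;
    }
}
```

For example, on the digraph with edges 0->1, 1->2, 2->0 (a directed cycle) and 3->1, the shortest ancestral path between 0 and 3 goes through the common ancestor 1.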

Some of the glosses have example sentences at the end. What is this? That's just part of the gloss.

Any advice on how to read in and parse the data files? Use the readLine() method in our In library to read in the data, one line at a time. Use the split() method in Java's String library to divide a line into fields. Use Integer.parseInt() to convert string id numbers into integers.
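The parsing steps above can be sketched as follows. This example assumes a comma-separated layout that is not spelled out in this checklist: a synsets line of the form "id,noun1 noun2 ...,gloss" and a hypernyms line of the form "id,hypernymId1,hypernymId2,...". The class and method names are hypothetical, and the methods take a line as a String (for instance, one returned by readLine()) so that the sketch stays self-contained.

```java
// Hypothetical parsing helpers; the file layouts are assumptions of this sketch.
final class LineParsing {

    // Synset id: the first comma-separated field.
    static int synsetId(String line) {
        String[] fields = line.split(",", 3);   // limit 3 keeps commas inside the gloss intact
        return Integer.parseInt(fields[0].trim());
    }

    // Nouns in the synset: the second field, separated by spaces.
    static String[] synsetNouns(String line) {
        String[] fields = line.split(",", 3);
        return fields[1].trim().split(" ");
    }

    // Gloss: everything after the second comma.
    static String synsetGloss(String line) {
        String[] fields = line.split(",", 3);
        return fields[2].trim();
    }

    // Hypernym ids: every field after the first; may be empty for a root.
    static int[] hypernymIds(String line) {
        String[] fields = line.split(",");
        int[] ids = new int[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            ids[i - 1] = Integer.parseInt(fields[i].trim());
        }
        return ids;
    }
}
```

Passing a limit of 3 to split() matters here because a gloss may itself contain commas; without the limit, the gloss would be broken into extra fields.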

In WordNet, what should distance() and sap() return if there is no ancestral path? Return Double.POSITIVE_INFINITY and null, respectively.

In WordNet, what should glosses() return if the noun is not in WordNet? Return an Iterable that has zero items.
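One way to get this behavior is sketched below. The class GlossIndex and its add() method are hypothetical names for this example and say nothing about how your WordNet class must be organized; the point is only that a lookup for an unknown noun yields an empty Iterable rather than null, and that a noun with several meanings yields one gloss per synset.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical index from noun to glosses; not the required WordNet API.
final class GlossIndex {
    private final Map<String, List<String>> glossesByNoun = new HashMap<>();

    // Record that the given noun belongs to a synset with this gloss.
    void add(String noun, String gloss) {
        glossesByNoun.computeIfAbsent(noun, k -> new ArrayList<>()).add(gloss);
    }

    // All glosses for the noun; an Iterable with zero items if the noun is unknown.
    Iterable<String> glosses(String noun) {
        List<String> found = glossesByNoun.get(noun);
        if (found == null) return Collections.emptyList();
        return Collections.unmodifiableList(found);
    }
}
```

Returning an empty Iterable lets a client write a for-each loop over the result without a separate null check.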

What should I do if one of the nouns in Outcast is not in WordNet? We'll only give you nouns that are in WordNet.

I'm an ontologist and I noticed that your hypernyms.txt file contains both is-a and is-instance-of relationships. Yes, you caught us. This ensures that every noun (except entity) has a hypernym. Here is an article on the subtle distinction.

Can I use my own Digraph class? No, it must have the same API as our Digraph class; otherwise, you are implicitly changing the API to SAP (which takes a Digraph in the constructor).

Input, Output, and Testing

Input and output. We encourage you to create your own (possibly pathological) inputs to help test your program. If your datasets create problems for other programs (or ours!), we'll award extra credit. The input should be very small, and it should expose a potential flaw that other programs are likely to face. In your readme.txt, you should describe what the input is testing.

Extra credit. Submit an interesting example (or corner case) that you used to test your code, preferably a case that arises in the WordNet digraph (and one that uses everyday words). You can also make up your own small synsets.txt and hypernyms.txt files.

Submission and readme

Here is a template readme.txt file. It should contain the following information:

Possible progress steps

Optional Optimizations

There are a few things you can do to speed up a sequence of SAP computations on the same digraph.