Programming Assignment Checklist: Markov Model of Natural Language


Pair programming. On this assignment, just like the last one, you are encouraged (not required) to work with a partner provided you practice pair programming (same rules as last assignment).

Frequently Asked Questions

What are the main goals of this assignment? Use a graph-like data structure, learn about natural language processing, and learn about Markov chains.

What is the origin of the Markov text generator? It was first described by Claude Shannon in 1948. The first computer version was apparently written by Don P. Mitchell, adapted by Bruce Ellis, and popularized by A. K. Dewdney in the Computing Recreations section of Scientific American. Brian Kernighan and Rob Pike revived the program in a University setting and described it as an example of design in The Practice of Programming. The program is also described in Jon Bentley's Programming Pearls.

How do I read in characters from standard input? Use StdIn.readAll().

Given a string s, is there an efficient way to get a substring? Use s.substring(i, i+k) to get the k-character substring starting at i. In Java, this operation takes constant time, regardless of the size of k.

I get the error foreach not applicable to expresion type when using a foreach loop. What am I doing wrong? Probably nothing. It seems to be a bug with some versions of DrJava. Upgrade to a newer version of DrJava or compile from the command-line and your problem will go away.

What does the following statement in Graph.java do?

private ST<String, SET<String>> st;
It declares a symbol table named st whose keys are strings and whose values are sets of strings. For more details, read Section 4.3 (to learn about generic types) and Section 4.4 (for symbol tables and sets). To create a symbol table whose keys are string and whose values are random queues of strings, use
private ST<String, RandomQueue<String>> st;

Do I need to worry about returning objects of type Iterable<String>? No, you don't need to do this on this assignment.

How do I emulate the behavior of a cyclic string? There are a number of approaches. The easiest might be to concatenate the first k characters to the end of the text.

Should my program generate a different output each time I run it? Yes.

My random text ends in the middle of a sentence. Is that OK? Yes, that's to be expected. You can add a System.out.println(); to the end of the text you generate to pretty it up.

For which values of k should my program work? It's fine to assume k > 0. Naturally, as k gets larger, your program will use more memory and take longer. Also, you can assume k <= L (where L is the length of the input string).

I get an OutOfMemoryException. How do I tell Java to use more of my computer's memory? Depending on your operating system, you may have to ask the Java Virtual Machine for more main memory beyond the 64MB default.

java -Xmx100m TextGenerator 7 1000 < input.txt
The 100m means 100MB, and you should adjust this number depending on the size of the input text.

Why not store the probability of each transition instead of using parallel edges? It's easier to implement this way. Storing the probabilities would be more efficient and more general, but it's not necessary on this assignment.

Can I use Java's built in LinkedList class? Absolutely not!

Submission

readme.txt. Use the following readme file template.

Possible Progress Steps

These are purely suggestions for how you might make progress. You do not have to follow these steps.

  1. Download the directory markov. It contains a number of sample text files.

  2. Create the RandomQueue.java data type. As a starting point, you may wish to review QueueLite.java (a stripped down version of Queue.java).

  3. Create the MarkovChain.java data type. As a starting point, you may wish to review GraphLite.java, (a stripped down version of Graph.java), which relies on ST.java.

  4. Write the client program TextGenerator.java in small steps.

Enrichment