Programming Assignment Checklist: Data Compression

Frequently Asked Questions

What are the goals of this assignment? To learn about binary trees and data compression.

Do I need to use a symbol table on this assignment? No. However, note that the encoding table is really a symbol table that maps from characters to sequences of 0s and 1s. To encode the message, you would build this symbol table, and then compress the message by replacing each character with the corresponding sequence of 0s and 1s. But your task on this assignment is to decode, so you don't need to build an explicit symbol table.

Can there be whitespace characters in the preorder representation of the tree? Yes, if the original message uses whitespace, then the tree will contain whitespace characters. You should not need to do anything special to handle them.

How do I detect the newline between the preorder and the sequence of 0s and 1s? The easiest way is to simply ignore anything other than 0s and 1s when traversing the tree: if 0 go left, if 1 go right, otherwise ignore.

The .pre file seems to have a trailing newline character after all of the 0s and 1s. Why is this? Unfortunately, all files end with a newline. When decoding, ignore any characters other than 0s or 1s as in the prevous question. If you get the error message "Error: tried to read from empty input stream", this is the likely cause. Be sure to check if (!CharStdIn.isEmpty()) before reading each character.

How do I compare characters? If c is of type char, then if (c == '*') is true if and only if c represents the asterisk character.

If I invoke the method preorder with tree.preorder(), how can I access the invoking object tree from with the method preorder? The statement PrefixTree x = this; stores a reference to the invoking object in the variable x.

Do I have to organize my program the way it is outlined below. No, you are free to organize your program as you see fit. However, we will compile your program with java PrefixTree and execute it with java PrefixTree < input.pre.

Can I assume the uncompressed message will have at least two characters? Yes.

Do I need to format my output exactly as specified in the assignment? No, but you should put the symbol encoding table first, then the uncompressed message, then the statistics. Don't worry about lining things up perfectly.

How come System.out.print(c) doesn't work when c is a char? Java is a bit inconsistent here. The println method supports primitive input types, but print does not. Instead, use string concatenation to ensure that you're sending a string, e.g., System.out.print(c + "").

Input, Output, and Testing

Input.   Here are some sample input files (with extension .pre). The input file consists of two parts. The first part is the preorder traversal of the tree. The second part is the string of 0's and 1's. There will be a single newline character separating the two parts.

Execution.   Execute with the following command:

java PrefixTree < monalisa.pre
To avoid seeing the text whiz by, use
java PrefixTree < monalisa.pre | more

Submission

Submission. Your programs must be named PrefixTree.java. Don't forget to hit the "Run Script" button on the submission system to test that it compiles cleanly.

readme.txt. Use the following readme file template.

Possible Progress Steps

These are purely suggestions for how you might make progress. You do not have to follow these steps.

  1. Do the assigned readings and exercises on binary trees, especially Sedgewick 5.6-5.7. Program ParseTree.java is a version of Sedgewick Program 5.20 that is quite relevant. It's deceptively simple, so be sure you understand what's going on before continuing.

  2. Download the directory prefix to your system. It contains some sample data.

  3. Write the PrefixTree constructor to read in the preorder traversal (from standard input) and create the corresponding binary tree. Hint: think recursively. Use CharStdIn.java to read in the input, one character at a time. You will know you're done building the tree when every internal node (i.e., node that stores '*') has two non-null children. This is the trickiest code of the assignment (a recursive constructor), but not necessarily the hardest to write (at most 8-10 lines).

  4. To test your constructor, implement the recursive method public void preorder() to traverse the tree in preorder and print out the contents of each node. It should be very similar to the toString method in ParseTree.java. Check that you get back the original by including the following main in PrefixTree.java.
    public static void main(String[] args) {
       PrefixTree tree = new PrefixTree();
       tree.preorder();
    }
    

  5. Now, modify preorder to print out only the external nodes (nodes containing a character other than '*'). Modify it again to keep track of the encoding itself. (You will need to make preorder take a string argument.) String concatenation is your friend here. The finished function should not be alot of code (7-10 lines).

  6. Now, write the method public void uncompress(). There is no need for recursion here, and it will only make your program harder to write. Instead use two nested while loops: the inner loop iterates from the top of the tree to the bottom; the outer loop makes this process repeat until there are no more bits left to read in. To avoid annoying issues with the first and last newline characters, ignore any characters read in besides '0' and '1'.

Enrichment

See Project Gutenberg for an amazing selection of free etexts. You'll notice that the zip'd versions are roughly half the size of the corresponding raw text versions. The method of data compression is very similar to the one you are using on this assignment!


COS 126 Home Page
wayne@cs.princeton.edu