COS 111, Fall 2000: Problem Set 3

COS 111, Fall 2000 - Problem Set 3

Due by 5pm, Tuesday Oct. 10, 1998

Show your work on all problems. Without it, we can't give partial credit.

Problem 1
Consider the following simple compression scheme based on run-length encoding. The scheme encodes a file in a sequence of 10-bit "chunks". In a 10-bit "chunk" we can represent some single bit repeated up to 255 times by writing a leading 0, followed by the bit to repeat, followed by an 8-bit binary number representing the number of repetitions (a run-length encoding). Alternatively, we could in this 10-bit "chunk" represent a 9-bit sequence by writing a leading 1, followed by the 9 bits of the sequence. In this scheme, for more than 9 repetitions of a bit, it makes sense (because it makes the encoded file smaller) to use the run-length encoding.

Consider compressing with this scheme a simple bit sequence consisting of n repetitions of the bit 0 followed by 90 bits of 0's and 1's that have no repetitions longer than 8. What does the value of n have to be to get a compressed representation that is 80% the length of the original sequence?

Let's start out by guessing that n isn't bigger than 255, so that we can represent the repeated zero bits by a single chunk. Then the size of the original (uncompressed) data is n+90 bits, and the compressed data requires 11 chunks (one for the repeated zero bits, and ten for the other 90 bits), or 110 bits in all (since a chunk is ten bits).

To make the compressed file 80% of the size of the original, choose n to solve this equation:

	110 = 0.80 x (n+90)

Expanding out the right-hand side, we get:

	110 = 0.8n + 72

Rearranging terms, we have

	(110-72)/0.8 = n

So n = 47.5.

Problem 2
Brookshear, Chapter One Review Problem 60, pg. 75:
The following message was compressed using LZ77. Decompress the message.

0100101 (4,3,0) (8,7,1) (17,9,1) (8,6,1)

To expand out the (4,3,0), we go back 4 characters and take 3 characters (010), then add a single 0. So we have

01001010100 (8,7,1) (17,9,1) (8,6,1)

Now we go back 8 characters and take 7 characters (0101010), then add a single 1. Now we have

0100101010001010101 (17,9,1) (8,6,1)

Next we go back 17 characters and take 9 characters (001010100), then add a single 1. Now we have

01001010100010101010010101001 (8,6,1)

Finally we go back 8 characters and take 6 characters (101010), then add a single 1. We end up with

010010101000101010100101010011010101

Problem 3
As we discussed in lecture, in some situations it is sufficient to detect errors in data, and in others we need to be able to correct errors. Give one example of a situation where error detection is enough, and one example where error correction is necessary. Justify your answers.

Error detection is enough when there is a backup copy of the data somewhere. For example, when you install software from a CD onto your computer, you would be satisfied with error detection on the copy of the software that is on your computer. If an error was detected, you could just reinstall the software from the CD.

Error correction is necessary when you really need the right answer. For example, if you are storing data in an airplane's flight recorder (the "black box"), you want to use error correction techniques so that you can reconstruct what happened if the flight recorder is damaged.

Problem 4
Suppose we want to store a 64-bit piece of data, and we want to use the parity method to detect errors in the data. One way to do it is to add a single parity bit that holds the parity of the entire 64-bit value. Another way to do it is to divide the 64-bit data into two 32-bit halves, and then to add two parity bits, one for each half. Using two parity bits has the disadvantage that it makes the file slightly bigger (66 vs. 65 bits). Can you think of an advantage that the two-parity-bits method might have over the one-parity-bit method?

The two-parity-bits method can survive some errors that the one-parity-bit method cannot survive. For example, suppose that two bits are corrupted, one in each half of the data. With separate parity bits for each half, we can detect these errors. If we had only a single parity bit for all the data, then we might not notice the double corruption (because the parity bit is guaranteed only to detect one-bit errors).

Back to Schedule and Assignments