COS 126: Hamming Codes in TOY

COS 126

Hamming Codes in TOY

Programming Assignment

Checklist

Write a TOY program to encode data using Hamming codes. Then write a TOY program to correct encoded data that has been corrupted.

Perspective. Error-correcting codes enable data to be sent through a noisy communication channel without corruption. To accomplish this, the sender appends redundant information to the message, so that even if some of the original data is corrupted during transmission, the receiver can still recover the original message intact. Transmission errors are common and can arise from scratches on a CD, static on a cell phone, or atmospheric interference. In a noisy environment, error-correcting codes can increase the throughput of a communication link since there is no need to retransmit the message if it becomes partially corrupted during transmission. For this reason, error-correcting codes are used in many common systems, such as storage devices (CD, DVD, DRAM), mobile communication (cell phones, wireless, microwave links), and digital television.

Hamming Codes. A Hamming code is a specific type of error-correcting code that allows the detection and correction of single-bit transmission errors. Hamming codes are used in many applications where such errors are common, including DRAM memory chips and satellite communication hardware. Hamming codes work by repeatedly reading four message bits, which we denote by m₁, m₂, m₃, m₄, and then inserting three parity bits, which we denote by p₁, p₂, and p₃. If any one of these seven bits is corrupted during transmission, the receiver can detect the error and recover the original four message bits intact. This is called single-bit error correction because at most one bit can be corrected per unit of data sent. The overhead for using this method is a 75% increase in bandwidth because it requires three extra parity bits for every four message bits. An alternative approach of sending three copies of each bit (and using the one that appears most frequently) results in a 200% increase in bandwidth.

Before we describe the algebra of Hamming codes, we first visualize the calculation of the parity bits using Venn diagrams. As an example, suppose we wish to send the 4-bit message 1101. We associate each of the four message bits with a specific intersection region of three pairwise overlapping circles, as illustrated below:

The Hamming code adds three parity bits so that each of the three circles has even parity.

That is, the sum of the four bits contained in each of the three circles is even:

For this example, the three parity bits are 1 (top), 0 (left), and 0 (right). So, to send a version of the message 1101 that is robust against single-bit errors, the actual message we send is the 7-bit message 1101100.

Now, imagine this picture is transmitted over a noisy communication channel, and that one bit is corrupted so that the following picture arrives at the receiving station (corresponding to 1001100):

The receiver discovers that an error occurred by checking the parity of the three circles. Moreover, the receiver can identify where the error occurred (the second bit) and recover the four original message bits!

The parity check for the top circle and right circle failed, but the left circle was ok. So, the only 7-bit message that balances all three parity checks and has at most one changed bit is the one obtained by flipping m₂, since m₂ represents the intersection of exactly those two circles. Thus the receiver knows that the message originally sent was 1101100—representing the four message bits 1101 plus the three parity check bits 100. This achieves the goal, since the original message bits sent were indeed 1101.

If the center bit m₄ is corrupted, then all three parity checks will fail. If a parity bit itself is corrupted, then only one parity check will fail. If the communication channel is so noisy that two or more bits are simultaneously corrupted, then the scheme will not work. Can you see why? More sophisticated types of error-correcting codes can handle such situations.

Of course, in practice, only the seven bits are transmitted, rather than the Venn diagrams.

Part 1. Write a TOY program encode.toy to encode a binary message using the scheme described above. Repeatedly read 4 bits m₁, m₂, m₃, and m₄ from TOY standard input and write the 7 bits m₁, m₂, m₃, m₄, p₁, p₂, p₃ to TOY standard output. Stop upon reading FFFF from standard input.

p₁ = m₁ ^ m₂ ^ m₄

p₂ = m₁ ^ m₃ ^ m₄

p₃ = m₂ ^ m₃ ^ m₄

Recall that ^ is the exclusive or operator in Java and TOY. This captures the parity notion described above.

Part 2. Write a TOY program decode.toy to correct a Hamming encoded message. Repeatedly read 7 bits m₁, m₂, m₃, m₄, p₁, p₂, p₃ from TOY standard input and write 4 bits to TOY standard output. Stop upon reading FFFF from standard input. Recall, to determine which one, if any, of the message bits is corrupted, perform the parity checks:

p₁ = m₁ ^ m₂ ^ m₄
p₂ = m₁ ^ m₃ ^ m₄
p₃ = m₂ ^ m₃ ^ m₄

Compare the parity bits you computed with the parity bits you received. If they don't match, then some bit got flipped. Here's a summary of what to do with the results:

If exactly zero or one of the parity checks fail, then all four message bits are correct.

If checks 1 and 2 fail (but not check 3), then bit m₁ is incorrect.

If checks 1 and 3 fail (but not check 2), then bit m₂ is incorrect.

If checks 2 and 3 fail (but not check 1), then bit m₃ is incorrect.

If all three checks fail, then bit m₄ is incorrect.

Flip the corrupted message bit (if necessary) and write m₁, m₂, m₃, and m₄ to standard output.

Input format. The input format for encode.toy is a text file that contains the sequence of bits to be transmitted. Each line consists of a sequence of 4 bits, with each bit specified as a 4-digit hexadecimal integer (either 0000 or 0001), separated by whitespace. The last line consists of the single integer FFFF. The input format for decode.toy is the same, except that each line consists of a sequence of 7 bits. Here are input files for encode.toy and decode.toy in the specified format:

% more encode3.txt
0001 0001 0000 0001
0001 0001 0001 0000
0001 0001 0001 0001
FFFF

% more decode4.txt
0001 0001 0000 0001 0001 0000 0000
0001 0000 0000 0001 0001 0000 0000
0001 0001 0000 0000 0001 0000 0000
0001 0001 0000 0001 0001 0000 0001
FFFF

Files for this assignment. The file hamming.zip includes two Java programs that illustrate the encoding and decoding procedures; sample input and output files for both encode.toy and decode.toy; the TOY reference card; the readme.txt template; a sample TOY program; and the TOY simulator TOY.java.

Using the TOY simulator. To execute your TOY program using the command-line simulator TOY.java, type the commands that appear in bold below. You should see the following output:

% java-introcs TOY encode.toy < encode3.txt
0001
0001
0000
0001
0001
0000
0000
0001
0001
0001
0000
0000
0000
0000
0001
0001
0001
0001
0001
0001
0001

% java-introcs TOY decode.toy < decode4.txt
0001
0001
0000
0001
0001
0001
0000
0001
0001
0001
0000
0001
0001
0001
0000
0001

Alternatively, you can use the Visual X-TOY simulator.

Documentation. Include a standard header at the top of each TOY program; comment each line of TOY code with the corresponding pseudocode; optionally, include a comment describing the Java equivalent. See multiply.toy for an example. In the readme.txt file, describe what you used each of the registers for in your programs.

Submission. Submit encode.toy, decode.toy, and a completed readme.txt.

Leaderboard (optional). Submit an "optimized" decode.toy to the leaderboard for class fame and glory.

Optimize decode.toy with the goal of using the fewest (nonzero) words of TOY main memory. Under 40 (in decimal) is very good; under 35 is great; the all-time record is 27.
Optimize decode.toy with the goal of executing the fewest instructions to process a certain input with 1,000 7-bit encoded messages. This is a new challenge this semester, so we're not yet sure what is a good score.