9. Hamming Codes in Toy

Download Project Zip | Submit to TigerFile

Goals

To learn about error-correcting codes.
To learn to program in a machine language.
To appreciate the programming and debugging advantages of a structured programming language like Java over machine language.

Getting Started

Download the project zip file for this assignment from TigerFile , which contains the files you will need for this assignment, which contains:
- two Java programs (HammingEncoder.java and HammingDecoder.java) that illustrate the encoding and decoding procedures;
- sample input and output files;
- TOY reference card;
- the readme.txt template;
- the acknowledgments.txt template;
- multiply.toy TOY program;
- Visual X-TOY simulator;
- TOY simulator TOY.java
This is a partner assignment. Instructions for help finding a partner and creating a TigerFile group can be found on Ed.
The rules for partnering:
- Choose a partner whose skill level is close to your own - only two partners per group.
- Your partner does not have to be in your precept.
- The rules for partnering are specified on the course syllabus. Make sure you read and understand these rules, and please post on Ed if you have questions. In your acknowledgments.txt file, you must indicate that you adhered to the COS 126 partnering rules.

Background

Error-correcting codes enable data to be sent through a noisy communication channel without corruption. To accomplish this, the sender appends redundant information to the message, so that even if some of the original data is corrupted during transmission, the receiver can still recover the original message intact. Transmission errors are common and can arise from scratches on a CD, static on a cell phone, or atmospheric interference. In a noisy environment, error-correcting codes can increase the throughput of a communication link since there is no need to retransmit the message if it becomes partially corrupted during transmission. For this reason, error-correcting codes are used in many common systems, such as storage devices (CD, DVD, DRAM), mobile communication (cell phones, wireless, microwave links), and digital television.

A Hamming code is a specific type of error-correcting code that allows the detection and correction of single-bit transmission errors. Hamming codes are used in many applications where such errors are common, including DRAM memory chips and satellite communication hardware. Hamming codes work by repeatedly reading four message bits, which we denote by \( m_1, m_2, m_3, m_4 \), and then inserting three parity bits, which we denote by \(p_1, p_2, p_3\). If any one of these seven bits is corrupted during transmission, the receiver can detect the error and recover the original four message bits intact. This is called single-bit error correction because at most one bit can be corrected per unit of data sent. The overhead for using this method is a 75% increase in bandwidth because it requires three extra parity bits for every four message bits. An alternate approach is sending three copies of each bit (and using the one that appears most frequently), which results in a 200% increase in bandwidth.

Approach

Before we describe the algebra of Hamming codes, we first visualize the calculation of the parity bits using Venn diagrams. As an example, suppose we wish to send the 4-bit message 1101. We associate each of the four message bits with a specific intersection region of three pairwise overlapping circles, as illustrated below:

The Hamming code adds three parity bits so that each of the three circles has even parity:

That is, the sum of the four bits contained in each of the three circles is even:

For this example, the three parity bits are 1 (top), 0 (left), and 0 (right). So, to send a version of the message 1101 that is robust against single-bit errors, the actual message we send is the 7-bit message 1101100.

Now, imagine this picture is transmitted over a noisy communication channel, and that one bit is corrupted so that the following picture arrives at the receiving station (corresponding to 1001100):

The receiver discovers that an error occurred by checking the parity of the three circles. Moreover, the receiver can identify where the error occurred (the second bit) and recover the four original message bits!

The parity check for the top circle and right circle failed, but the left circle was ok. So, the only 7-bit message that balances all three parity checks and has at most one changed bit is the one obtained by flipping \(m_2\), since \(m_2\) represents the intersection of exactly those two circles. Thus the receiver knows that the message originally sent was 1101100—representing the four message bits 1101 plus the three parity check bits 100. This achieves the goal, since the original message bits sent were indeed 1101.

If the center bit \(m_4\) is corrupted, then all three parity checks will fail. If a parity bit itself is corrupted, then only one parity check will fail. If the communication channel is so noisy that two or more bits are simultaneously corrupted, then the scheme will not work. Can you see why? More sophisticated types of error-correcting codes can handle such situations.

Of course, in practice, only the seven bits are transmitted, rather than the Venn diagrams.

Requirements

Write two programs encode.toy and decode.toy.
Each TOY instruction must have corresponding pseudocode. (This is auto-generated by Visual X-Toy – see below.)
It is good practice to add line breaks between logically related sections of TOY code.
Write a comment above each section explaining what that code does.
The third column is used for your own comments. Write comments as you write your code. This will help you understand your design as well as help during office hours. Hint: use the Java code from HammingEncoder.java and HammingDecoder.java as comments.

TOY Program: `encode.toy`

Write a TOY program encode.toy to encode a binary message using the scheme described above. Repeatedly read four (4) bits \( m_1, m_2, m_3, m_4 \) from TOY standard input and write the seven (7) bits \( m_1, m_2, m_3, m_4, p_1, p_2, p_3 \) to TOY standard output. Stop upon reading FFFF from standard input and then output FFFF to standard output.

\(p_1 = m_1 \wedge m_2 \wedge m_4\)
\(p_2 = m_1 \wedge m_3 \wedge m_4\)
\(p_3 = m_2 \wedge m_3 \wedge m_4\)

Recall that ^ is the exclusive or operator in Java and TOY. This captures the parity concept described above.

TOY Program: `decode.toy`

Write a TOY program decode.toy to decode and correct a Hamming encoded message. Repeatedly read seven (7) bits \( m_1, m_2, m_3, m_4, p_1, p_2, p_3 \) from TOY standard input and write the four (4) correct bits \( m_1, m_2, m_3, m_4 \) to TOY standard output. Stop upon reading FFFF from standard input and then output FFFF to standard output. Recall, to determine which one, if any, of the message bits is corrupted, perform the parity checks:

\(p_1 = m_1 \wedge m_2 \wedge m_4\)
\(p_2 = m_1 \wedge m_3 \wedge m_4\)
\(p_3 = m_2 \wedge m_3 \wedge m_4\)

Compare the parity bits you computed with the parity bits you received. If they do not match, then some bit was flipped. Here’s what to do with the results:

If exactly zero or one of the parity checks fail, then all four message bits are correct.
If checks 1 and 2 fail (but not check 3), then bit \(m_1\) is incorrect.
If checks 1 and 3 fail (but not check 2), then bit \(m_2\) is incorrect.
If checks 2 and 3 fail (but not check 1), then bit \(m_3\) is incorrect.
If all three checks fail, then bit \(m_4\) is incorrect.

Flip the corrupted message bit (if necessary) and write (only) \( m_1, m_2, m_3, m_4 \) to standard output.

Input format

The input format for encode.toy is a text file that contains the sequence of bits to be transmitted. Each line consists of a sequence of four (4) bits, with each bit specified as a 4-digit hexadecimal integer (either 0000 or 0001), separated by whitespace. The last line consists of the single integer FFFF, representing the end of the file. If there are no bits to be transmitted, the file contains only the single integer FFFF.

The file encode3.txt contains three lines of 4-hex sequences, each representing a 4-bit sequence:

> more encode3.txt
0001 0001 0000 0001
0001 0001 0001 0000
0001 0001 0001 0001
FFFF

The 4-hex sequence 0001 0001 0000 0001represents the 4-bit sequence 1101.
The 4-hex sequence 0001 0001 0001 0000 represents the 4-bit sequence 1110.
The 4-hex sequence 0001 0001 0001 0001 represents the 4-bit sequence 1111.

The input format for decode.toy is similar, except that each line consists of a sequence of seven (7) hex digits, each representing a 7-bit sequence. For example:

> more decode4.txt
0001 0001 0000 0001 0001 0000 0000
0001 0000 0000 0001 0001 0000 0000
0001 0001 0000 0000 0001 0000 0000
0001 0001 0000 0001 0001 0000 0001
FFFF

You can assume the input files are in the prescribed format. The input files will contain only a sequence of 0000’s and 0001’s, followed by FFFF. If there are no bits to be decoded, the file contains only the single integer FFFF.

In summary, input files for encode.toy will have a multiple of four 0000’s and 0001’s; input files for decode.toy will have a multiple of seven 0000’s and 0001’s; and the last line consists of the single integer FFFF, representing the end of the file.

Using the Visual X-TOY Simulator

Use the Visual X-TOY simulator by executing the following command in the IntelliJ terminal:

> java -jar toy.jar

The Visual X-TOY simulator provides lots of useful development features. There are two major modes: edit and debug. Use Mode > Edit Mode to edit .toy programs and Mode > Debug Mode to debug or run .toy programs. In Debug Mode, use the Reference, Stdin, Stdout and Core tabs on the right to help debug your program.

To open a TOY program from a file, use Select File > Open, then browse to find the program, e.g., multiply.toy.

To edit a TOY program, use Select Mode > Edit Mode. You can type your program directly in the edit window. Use multiply.toy as a reference for formatting and commenting.

To redirect standard input from a file, in Edit Mode, select Mode > Load File to Stdin.

To execute a TOY program, use Select Mode > Debug Mode. Then click the Run button at the bottom. You can enter data on standard input by typing the value in the Stdin tab and clicking the Add button. You will have to click the Run button to resume execution. You can see the results on standard output in the Stdout tab.

Common TOY programming issues

The program counter starts at 10 so make sure that you place your code starting at memory address 10.
Be sure that each line follows the required format XX: XXXX.
Check for duplicated line numbers or out-of-order line numbers.
Check for using a letter like O or I instead of the number 0 or 1.
Remember that all values, memory addresses, and arithmetic are in hex. For example, memory address 1A follows 19. If you don’t specify the value for a memory address, that memory address will have the default value of 0000, which is a HALT instruction.
Watch out for jump statements — if you insert a new line of code between existing lines, you will have to renumber all the following lines. Also, be sure to update any jump statements that have hardwired line numbers.
Be very careful to update both your comments and your actual code at the same time! It is very common to update only the pseudocode and not the machine instruction.
All registers are global variables. One way to be sure that a register is not used for two different purposes is to fill in the question in the readme.txt about registers before or during debugging.

Possible Progress Steps

These are purely suggestions for how you might make progress. You do not have to follow these steps. If you get stumped or frustrated on some portion of the assignment, you should not hesitate to consult a preceptor.

Open the Visual X-Toy Simulator and experiment with some of the example programs.
1. Try the sample programs Add, Echo, and Sum by going to File > Open Example Programs.
2. Open and experiment with multiply.toy, which is included in the project folder.
3. Try executing some of the programs from precept.
Read, compile, and run the reference solutions HammingEncoder.java and HammingDecoder.java. Use these reference solutions as a design for your TOY programs.
As you write your TOY programs, remember:
- The first column contains the TOY instruction in hex.
- The second column contains the TOY pseudo-code.
- The third column is used for your own comments. Write comments as you write your code. This will help you understand your design as well as help during office hours, grading, etc.
In X-Toy, create a new file (File > New).
Save the file (File > Save) as encode.toy in your Hamming assignment project folder.
Incrementally implement and test encode.toy, using HammingEncoder.java as a guide. Remember that as you revise your program, you will need to edit your TOY statement numbers accordingly.
- Write TOY code that reads a number from standard input. If the number is negative, the program shall halt. (Note, the only valid negative input is FFFF.) Otherwise, it prints the number and inputs the next number.
- Edit your TOY program from step (a) so that it now reads in sequences of four numbers, halting if the first number is negative. Otherwise, it writes the four numbers and inputs the next sequence of numbers.
- Edit your TOY program from step (b) so that it now computes the parity bits, and writes the parity bits.
Incrementally implement and test decode.toy, using HammingDecoder.java as a guide.

Testing

We provide several input test files of different sizes for both encode.toy and decode.toy, as well as the answers for each. Note that for convenience, the answer files list each input line and just a shortened version of the corresponding outputs.

Encoding

input file	output file
`encode0.txt`	`encode0-answer.txt`
`encode1.txt`	`encode1-answer.txt`
`encode3.txt`	`encode3-answer.txt`
`encode16.txt`	`encode16-answer.txt`

Decoding

input file	output file
`decode0.txt`	`decode0-answer.txt`
`decode1.txt`	`decode1-answer.txt`
`decode3.txt`	`decode3-answer.txt`
`decode4.txt`	`decode4-answer.txt`
`decode6.txt`	`decode6-answer.txt`
`decode16.txt`	`decode16-answer.txt`
`decode128.txt`	`decode128-answer.txt`

To test your TOY program using the command-line TOY.java simulator, type the commands below into the IntelliJ embedded terminal. You should see the following output - one bit (Toy hex word) per line:

> java-introcs TOY encode.toy < encode3.txt
0001
0001
0000
0001
0001
0000
0000
0001
0001
0001
0000
0000
0000
0000
0001
0001
0001
0001
0001
0001
0001
FFFF

> java-introcs TOY decode.toy < decode4.txt
0001
0001
0000
0001
0001
0001
0000
0001
0001
0001
0000
0001
0001
0001
0000
0001
FFFF

You can then compare the output to the corresponding answer file.

The most complete way to test your TOY programs is to encode and decode all possible inputs. Use the sample inputs encode16.txt and decode128.txt to do this.

Complete `readme.txt`

Your comments in your Toy program should help you answer these questions.

`encode.toy` registers

Describe what, if anything, you use each of the registers for in encode.toy.

`decode.toy` registers

Describe what, if anything, you use each of the registers for in decode.toy.

Submission

Submit to TigerFile : encode.toy, decode.toy, completed readme.txt and acknowledgments.txt files.

Grading

Files	Points
encode.toy	16
decode.toy	22
readme.txt	2
Total	40

Leaderboard - Optional

Submit an optimized decode.toy to the Hamming Leaderboard for class fame and glory. Optimize decode.toy with the goal of using the fewest (non-zero) words of TOY main memory. Under 40 (in decimal) is very good; under 35 is great; the all-time record is 27. You may work with your partner or by yourself.

Submit to Leaderboard | View Current Leaderboard

Also

Upload a nickname.txt file that contains the name of your group. Please be respectful with the nickname.
Upload a leaderboard.txt file that describes your approach.

Enrichment

Error correcting codes are used in CDs. You can drill a 2.5mm hole through it, and it will still play. Only 1/3 of data is music.

Error correcting codes are also widely used on noisy analog channels such as radio links. This enables you to reduce the amount of power needed to transmit signals.