Computer Science 111
Problem Set 5:
-----------------
Due date: November 13, 2001

In this assignment, we will look at the basic ideas underlying steganography. You will work out the details of encoding a secret message in an image. The details build upon ideas that you developed in assignments and labs during the first part of the course.

Steganography is a technique that is said to have been used by terrorists in the past and may currently be used by Al Qaeda. See the New York Times article for further details about steganography.

For our version of steganography, we will do a simple encoding. You are to create a message and encode it into a binary representation, which you can write as hexadecimal digits. You must show the details of your work and also send an email message with the hexadecimal digits to cs111@princeton.edu. We will then use your code to modify a picture. You will be able to view the changes to the picture brought about by your code and decide for yourself whether the changes are noticeable.

Step 1 -- Create a message of between 150 and 200 characters that you wish to transmit. Your message should consist only of the characters a-z (i.e., no punctuation, no capital letters, no numbers).

Step 2 -- Give an encoding of the alphabet so that each character is stored in 1 byte (i.e., 2 hexadecimal digits). Write your message in this encoding.

Step 3 -- Use the compression ideas we discussed to compress your message. When you build your dictionary, you can use the numbers 1-9 to represent repeated strings of characters. You can also use the character # to separate words in the dictionary and the character * to separate the dictionary words from the compressed text. Explain how you made the choices needed to compress your message.

Step 4 -- Add representations for # and * to your encoding of Step 2 and write your compressed message as a sequence of hexadecimal digits in this encoding.

Step 5 -- Now you are ready to create the message to be hidden.
Your message will start with a prefix of 37 bytes. These will be the bytes that represent the letters a through z, followed by the bytes that represent the numbers 0 through 9 and then the byte for # and the byte for *. Of course, each byte will be written as 2 hexadecimal digits. Following this prefix you will have the bytes corresponding to the compressed version of your message: a stream of bytes representing your dictionary, followed by a stream of bytes representing your message.

Step 6 -- We now consider a picture that could be displayed on your web page. Remember that a picture consists of a stream of bytes representing color intensities. We now change the picture so that there are only 128 intensities for each color, rather than 256. These intensities will be 0, 2, 4, 6, ... , 254. So, each byte of color intensity might have to be changed by 1 to give the even number next to it. We can do this by adding 1 to or subtracting 1 from the original color byte. Remember that the binary representation of an even number always ends in 0. So, we have transformed the colors so that they all have bytes in which the last bit is irrelevant.

Imagine that the image is 80x60 pixels in size. Each pixel is stored in 3 bytes. In each of these bytes, the last bit is 0. A hexadecimal digit involves 4 bits. Put all of this together and compute how large a message could be stored by using these bits.

We now take our message (as hexadecimal digits from Step 5) and encode it as bits (using the extra bits as described above). Explain how we might do this encoding. We put the encoding into the picture by storing the bits of the message into the last bits of the color bytes. Explain why we can do this and why it will have little effect on the image. We will do the work of this encoding for you, since it requires a tool that is too difficult for you to write at this stage of the course.
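To make Steps 2 through 6 concrete, here is a minimal Python sketch of the whole pipeline. It is an illustration under stated assumptions, not the tool the course staff will use. The code table (a = 01, b = 02, and so on), the sample compressed string "the#cat*1212", the function names, and the made-up color bytes are all hypothetical. The sketch also assumes the digits 1 through 9 from Step 3, which gives a prefix of 26 + 9 + 2 = 37 bytes; including the digit 0 as well would make the prefix 38 bytes.

```python
import string

# ASSUMPTION: a hypothetical code table. Letters a-z, digits 1-9 (as in Step 3),
# then '#' and '*': 37 symbols, numbered 0x01 through 0x25 in this order.
SYMBOLS = string.ascii_lowercase + "123456789" + "#*"
CODE = {ch: i + 1 for i, ch in enumerate(SYMBOLS)}


def to_hex(text):
    """Steps 2 and 4: write each character as one byte (two hexadecimal digits)."""
    return "".join(f"{CODE[ch]:02X}" for ch in text)


def build_hidden_message(compressed):
    """Step 5: the prefix (one byte per symbol) followed by the compressed text."""
    return to_hex(SYMBOLS) + to_hex(compressed)


def hex_to_bits(hex_digits):
    """Expand each hexadecimal digit into its 4 bits, most significant bit first."""
    return [int(b) for d in hex_digits for b in f"{int(d, 16):04b}"]


def bits_to_hex(bits):
    """Inverse of hex_to_bits: regroup bits four at a time into hexadecimal digits."""
    groups = (bits[i:i + 4] for i in range(0, len(bits), 4))
    return "".join(f"{int(''.join(map(str, g)), 2):X}" for g in groups)


def embed(color_bytes, bits):
    """Step 6: make every color byte even, then store one message bit in its last bit."""
    if len(bits) > len(color_bytes):
        raise ValueError("message does not fit in the image")
    # Clearing the last bit rounds odd values down by 1 (0, 2, 4, ..., 254).
    evened = [b & 0xFE for b in color_bytes]
    return [b | bit for b, bit in zip(evened, bits)] + evened[len(bits):]


def extract(color_bytes, n_bits):
    """Read the message back from the last bit of the first n_bits color bytes."""
    return [b & 1 for b in color_bytes[:n_bits]]


# Capacity of an 80x60 image with 3 color bytes per pixel and 1 spare bit per byte.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 80, 60, 3
capacity_bits = WIDTH * HEIGHT * BYTES_PER_PIXEL
capacity_hex_digits = capacity_bits // 4

# Made-up color bytes standing in for a real image.
pixels = [(i * 7) % 256 for i in range(capacity_bits)]

# Hypothetical compressed text: dictionary "the#cat", then '*', then the codes.
hidden = build_hidden_message("the#cat*1212")
bits = hex_to_bits(hidden)
stego = embed(pixels, bits)
recovered = bits_to_hex(extract(stego, len(bits)))

print(capacity_bits, capacity_hex_digits, recovered == hidden)
```

In this sketch each color byte changes by at most 1 from its original value, which is why the altered picture should look essentially the same. The capacity works out to 80 x 60 x 3 = 14400 bits, that is, 3600 hexadecimal digits or 1800 bytes.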
Submit the hexadecimal digits from Step 5 by sending them in an email message to cs111@princeton.edu with the subject line "Assignment 5". We will then build a web page of paired images, and you will be able to see whether the images differ and whether you can tell that a message has been encoded.

Hand in the rest of the assignment as you normally do, by bringing it to class or by putting it into the homework box before 5 PM on November 13.