The purpose of this assignment is to help you learn (1) how programs are represented in machine language, (2) how stack frames are structured in memory, and (3) how programs can be vulnerable to buffer overrun attacks.
You may work with one partner on this assignment. You need not work with a partner, but we strongly prefer that you do. Your preceptor will help you with your partner search, if you so desire.
Your partner must be from your precept. Sorry, no exceptions. So make sure you find a partner as soon as you can. After all, if your precept has an odd number of students, then someone must work alone; don't let that person be you!
If you work with a partner, then only one of the partners should submit files. Your
readme file, your
memorymap file, and your source code files should contain both your name and your partner's name.
"Part A+" (as defined below) is the "on your own" part of this assignment. That is, when doing that part you may consult with your partner, but must not consult with the course instructors, lab TAs, Piazza, etc., except perhaps to clarify requirements. As noted below, that part is worth 10% of this assignment.
We will provide a program, both source code (
grader.c) and executable binary code (
grader). The file
grader was produced from
grader.c using the
gcc217 command with the
-static options. The
-static option commands the linker to do "static" linking rather than the default "dynamic" linking. Chapter 7 of Computer Systems: A Programmer's Perspective (Bryant & O'Hallaron) describes static vs. dynamic linking. Static linking could make your analysis of the
grader program simpler, and ensures that each function of the program resides at the same address on both fez and fedora.
The program asks you your name, and prints out something like this (where the
user input and
program output are indicated by font style):
$ grader What is your name? Bob D is your grade. Thank you, Bob.
However, the author of the program inexplicably has forgotten to do bounds-checking on the array into which the program reads the input, and therefore the program is vulnerable to attack.
Precepts will explain such buffer overrun (alias buffer overflow) attacks. The paper entitled Detection and Prevention of Stack Buffer Overflow Attacks by Kuperman et al. also does so. That paper is on electronic reserve at Princeton's library.
Your task is to attack the given program by exploiting its buffer overrun vulnerability. More specifically, your job is to provide input data to the program so that it prints something more like this:
$ grader < data What is your name? A+ is your grade. Thank you, Bob.
As you can see from reading the program, it is designed not to give anyone an A+ under any circumstances. However, it is programmed sloppily: it reads the input into a buffer, but forgets to check whether the input fits. This means that a too-long input can overwrite other important memory, and you can trick the program into giving you an A+.
This assignment has five parts:
Copy this sentence to your
readme file, and fill in the blanks so the sentence is correct:
"If you were to use a buffer overrun attack to knowingly gain unauthorized access or to cause damage to other people's computers, the Computer Fraud and Abuse Act provides a maximum penalty of ______ years in prison for a first offense. However, the creator of the Melissa virus plea-bargained down to ______ months in prison."
grader executable binary file that we have provided you, and use gdb to analyze its sections:
$ gdb grader (gdb) x/72i readString
Copy the resulting 72 lines of text into a text file named
memorymap, and then annotate the code to explain what's going on. You should use the source code in
grader.c as a reference, and indeed your annotation should consist of showing how the machine code corresponds to the C code. You don't need an annotation for every line of machine code, but every line of C code (or equivalent "flattened" C code) should be present as an annotation. Your analysis of the text section should be of the form:
Line of (flattened) C code Line of corresponding assembly language code Line of corresponding assembly language code ... <blank line> Line of (flattened) C code Line of corresponding assembly language code Line of corresponding assembly language code ...
Your analysis of the text section should begin with a few lines that state where the
readString function's parameter (
s) and local variables (
c) are stored. Are they stored in registers? If so, in which registers? Are they stored in memory? If so, in which sections of memory?
$ gdb grader (gdb) print &grade (gdb) print/x grade
Place a table in your
memorymap file showing the layout of the data section. The table should have three columns: Address (in hex), Contents (in hex), and Description. The table should contain only one row.
$ gdb grader (gdb) print &name
Place a table in your
memorymap file showing the layout of the bss section. The table should have three columns: Address (in hex), Contents (in hex), and Description. The table should contain one row for each element of the
At the start of program execution, the contents of the
name array will be zeros. Later during program execution, the
name array will contain more interesting data.
You should compose your memory map of the bss section before you implement Part A (as described below), that is, before you implement your "A attack." The table in your memory map should show the contents of the
name array as you wish it to appear near the end of program execution during your A attack. Thus your memory map of the bss section will serve as a plan for your A attack. Then while you're debugging your A attack, you should make sure that the bss section indeed contains that data that it should contain, as indicated by your memory map.
For your sake, it's fine to add another column to your memory map showing the contents of the
name array as you wish it to appear near the end of program execution during your A+ attack (as described below). But the assignment doesn't require you to do that.
buf. Issue these commands to do that:
$ gdb grader (gdb) break *readString+77 (gdb) run Type a name (gdb) print $esp (gdb) print $ebp (gdb) x/??xb $esp (where ?? is the appropriate number of bytes)
Place a table in your
memorymap file showing the layout of the stack frame. The table should have three columns: Address, Contents (in hex), and Description. Each address should be expressed as a positive offset relative to the ESP register. The table should contain one row for each byte in the
readString function's stack frame, from the first byte pointed to by ESP through the last byte of the return address. The table should show the contents of the
readString function's stack frame when the function has read "normal" data.
Write a C program named
createdataC.c that produces a file named
dataC, as short and simple as possible, that causes the
grader program to generate a segmentation fault. Explain its principles of operation in one sentence as a comment within your
createdataC.c program should write to the
dataC file; it should not write to
Write a C program named
createdataB.c that produces a file named
dataB, as short and simple as possible, that causes the
grader program to print your name and recommend a grade of "B". You can see by reading the program that, if your name is Andrew Appel, this is very easy to do. But probably your name isn't Andrew Appel! Explain its principles of operation in one sentence as a comment within your
createdataB.c program should write to the
dataB file; it should not write to
Write a C program named
createdataA.c that produces a file named
dataA, as short and simple as possible, that causes the
grader program to print your name and recommend a grade of "A". Explain its principles of operation in a few sentences as a comment within your
createdataA.c program should write to the
dataA file; it should not write to
Write a C program named
createdataAplus.c that produces a file named
dataAplus, as short and simple as possible, that causes the
grader program to print your name and recommend a grade of "A+". Explain its principles of operation in a few sentences as a comment within your
createdataAplus.c program should write to the
dataAplus file; it should not write to
For parts B, A, and A+ feel free to truncate your name(s) if necessary.
On some versions of Linux, every time the program is executed the initial stack pointer is in a different place. This makes it difficult to make an attack in which the return address points into the same data that was just read into the buffer on the stack. (Indeed, that is the purpose of varying the initial stack pointer!) However, you will note that the data is copied from
name. You'll find that
name is reliably in the same place every time you (or we) run the program.
On some versions of Linux, executing instructions from the bss section causes a segmentation fault. The purpose of this is to defend against buffer overrun attacks! The
mprotect call in our sample program is to disable this protection. You're not required to understand or explain how this line works. Note, however, that this mechanism (even if we didn't disable it) would not defend against the "C" or "B" attacks.
If you work hard, you could create a data input that will exploit the buffer overrun to take over the grader's Linux process and do all sorts of damage. DO NOT DO THAT! Any deliberate attempt of that sort is a violation of the University's disciplinary code, and also is a violation of the Computer Fraud and Abuse Act (see part F above).
Create your programs on hats using
/u/cos217/Assignment6 contains the
grader files. It also contains a simple
makefile that you might find useful during development.
readme text file that contains:
A description of whatever help (if any) you received from others while doing the assignment, and the names of any individuals with whom you collaborated, as prescribed by the course "Policies" web page.
The text from Part F.
(Optionally) An indication of how much time you spent doing the assignment.
(Optionally) Your assessment of the assignment: Did it help you to learn? What did it help you to learn? Do you have any suggestions for improvement? Etc.
(Optionally) Any information that will help us to grade your work in the most favorable light. In particular you should describe all known bugs.
Submit your work electronically on hats via the commands:
submit 6 createdataC.c createdataB.c createdataA.c createdataAplus.c submit 6 memorymap readme
When we grade this assignment, we will take the recommendation of the
grader program into account. But that will not be the only criterion. In particular, see the grade percentages noted above.
We will grade your code on quality from the user's and programmer's points of view. From the the user's point of view, your code has quality if it successfully implements the attacks described above.
From the programmer's point of view, your code has quality if it is well styled. In part, good style is defined by the
critTer tools, and by the rules given in The Practice of Programming (Kernighan and Pike) as summarized by the Rules of Programming Style document. Each C program should contain file, function, and local comments as appropriate, as well as an explanation of the program's principles of operation. For this assignment (only) it is acceptable to use "magic numbers" in your C programs as long as they are well commented. To encourage good coding practices, we will deduct points if
gcc217 generates warning messages.
While debugging your attacks you might find it useful to use
gdb to step through the execution of the
grader program at the machine language level. These commands are appropriate for doing that:
gdb maintains a "display list." You can issue the
display command to place items on the display list. Typically you place variables on the display list. At each pause in execution,
gdb displays the values of all those variables. Thus the display list is a handy way to track the values of variables throughout the execution of your program.
display command tells
gdb to place the EIP (instruction pointer) register on the display list. Thus at each pause in execution
gdb will display the contents of the memory to which EIP points. That is, at each pause in execution
gdb will display the instruction that is about to be executed.
Moreover, that particular
display command tells
gdb to use the
i (instruction) format when displaying. Thus at each pause in execution
gdb will interpret the contents of the memory to which EIP points as a machine language instruction, and will display that instruction in assembly language.
In short, that command tells
gdb to display the instruction that is about to be executed.
As you know, in
step command (abbreviated
s) executes the next line of C code. Since the
grader executable binary file was built without the
gdb has no knowledge of how the machine language instructions of the
grader executable binary file correspond to lines of C code. So the
step command is useless when analyzing the
grader executable binary file.
Instead you can use the lower-level
stepi command. The
stepi command (abbreviated
gdb to execute the next machine language instruction.
This assignment was written by Andrew Appel with contributions by Robert M. Dondero, Jr.