Project 5 - Buffer Overflows


CS3410

Due: 11:59pm, Tuesday, November 1, 2016

Late Policy: Standard

Reminder: you must work alone for this project.


Setting Up Your Linux Environment

In this project you should be using the course VM.

Files. To obtain the files in your VM, open a terminal and type:

$ cd
$ wget http://www.cs.cornell.edu/courses/cs3410/2016fa/projects/p5/p5.zip
$ unzip p5.zip

Now, run sudo apt-get install libreadline5 unzip libc6-i386. Enter y when prompted.

$PATH. The first cd put us in our $HOME directory, but we still need to update our $PATH. Adding these files to your $PATH tells your shell where to find the simulate program and all of the mipsel-linux tools, so you can just type simulate instead of /home/vagrant/mips-sim/binary/simulate every time you want to run something. From here forward, we assume you are on the course VM, where the user is vagrant. If you decide not to work on the VM, you will need to change these paths accordingly.

$NETID. This project requires a NETID environment variable to be set.

In the above commands, if your NetID were abc123, you would have executed export NETID=abc123, without the less-than / greater-than signs.

The Project. Now that you have followed the directions from the environment page, you have a folder p5 in your home directory. Because you added the executables to your $PATH in the environment directions, you can work on this project wherever you want.

$ cd <preferred_location>
$ mv ~/p5 .
$ cd p5

Overview

The goal of this project is to get intimately familiar with the layout and use of call stacks, as well as MIPS machine language, assembly and disassembly, debugging, and reverse engineering. As a side benefit, we hope to raise your awareness of computer security issues. To this end, you will write a buffer overrun exploit to break a program that we provide to you.

WARNING: These kinds of friendly hacking challenges have a long history, and hacking skills are priceless, as they reflect a deep understanding of the operation of a computer system. But you must be responsible and use your skills wisely. Taking over machines or hacking the Internet carries stiff penalties, is a sure-fire way to get expelled from Cornell, interferes with other people's lives, and is a waste of your talent. It is also plain wrong.

The Story

In this homework, you will "0wn" a binary program called browser that we will provide to you. We will not be providing the source code for this program. All that you know about this program is what is documented here, and what you can figure out for yourself by running or examining the binary. The browser is a simplified web browser, and its normal operation is very simple. When executed, it prompts you for a URL, and then prints a simple message (the '$' shown here is the Linux shell prompt):

$ simulate browser
Where to connect? www.google.com
Connected to www.google.com!

I can also send input to browser from another program using the Linux shell '|' operator, with the same results:

$ echo "www.google.com" | simulate browser
Where to connect?
Connected to www.google.com!

However, this browser was supplied to you by Google, which has decided it is finally time for them to abandon their long-time motto of "Don't be evil"!!! The browser only lets you connect to www.google.com. All other URLs will be rejected — try it and see!

The rumor is that browser suffers from a buffer overflow vulnerability. Since the program only takes one input, it's not difficult to guess where the problem might lie. You are a typical college student wanting to use social networking tools to connect with your friends. Thus, you would like to get this browser to let you connect to Facebook, even though Google originally designed it to only allow access to its own website.

0wning browser. Your job is to craft some input to browser that will cause it to print out a different message, specifically: "LOL 0wn3d! <netid> is on Facebook!" (substitute your own NetID). The fact that the normal "Only www.google.com is allowed" message is missing constitutes proof that you have completely subverted the browser, and have gotten it to do something that it could not do before.

$ cat exploit | simulate browser
Where to connect?
LOL 0wn3d! hw342 is on Facebook!

To do this, you will need to inject new code into the browser program as it is running. You are not allowed to modify or replace the browser program on disk. The only way you get to interact with browser is to feed it some carefully crafted input.

The simulator. The browser program is compiled to run on a MIPS CPU. Since most of you don't have access to a real MIPS CPU (neither do we), you will not be able to natively execute the program. Instead, you can run a program which takes browser and simulates the execution of the code. The simulate program we provide is just such a simulator. (Another popular simulator is spim, but it won't work for this assignment: spim only deals with assembly code, rather than actual binary machine code as our in-house simulator does.)

To figure out how to attack browser, you'll need to step through its code as it is executing and reverse engineer the parts that matter, namely, where (i.e., at which memory location) the input buffer is stored, which values lie near it in memory, and what precise instruction sequence is vulnerable to a buffer overflow attack. Since you have the MIPS binary, you can use various mipsel-linux tools, like mipsel-linux-objdump and mipsel-linux-readelf, to disassemble the browser binary and learn about its layout and code.

You can also use the "-d" option to the simulator, which starts an interactive debugger for the simulated program execution. This lets you step through the execution one instruction at a time, examine memory and the stack contents, and so on. See the README file in /home/vagrant/mips-sim/ for help using the simulator and its built-in debugger.

Stack Randomization. In a feeble effort to thwart just such attacks, the simulator, like many real machines, implements stack randomization, a limited kind of program layout randomization. When the simulator starts, it initializes the stack to a variable address rather than the standard 0x7FFFFFFC. The starting location of the stack is derived from the $USER or $NETID environment variable. Because $USER on the course VM is vagrant, which is different from your NetID, it is very important that you set the $NETID environment variable properly before running the simulator. (See the directions at the top.)
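Because a wrong $NETID silently changes the stack layout, a quick pre-flight check can save some debugging time. The following is only a sketch: the function name netid_problems and the specific checks are our own illustration, not part of the simulator, and it makes no claim about how the simulator chooses between $USER and $NETID.

```python
import os

def netid_problems(env) -> list:
    """Return human-readable warnings about the NETID setting in env."""
    problems = []
    netid = env.get("NETID", "")
    if not netid:
        problems.append("NETID is not set")
    elif netid == env.get("USER"):
        # On the course VM, USER is vagrant, so this is probably a mistake.
        problems.append("NETID equals USER; is it really your NetID?")
    elif "<" in netid or ">" in netid:
        problems.append("NETID still contains angle brackets")
    return problems

# Check the environment of the current shell session.
for warning in netid_problems(os.environ):
    print("warning:", warning)
```

If this prints nothing, the variable at least looks plausible. It cannot tell you whether the value is actually your NetID.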

Executing the Attack. Once you have figured out the program and stack layout, you need to come up with a carefully crafted input that will take over browser. This input will likely contain some binary data (the attack payload) that corresponds to MIPS instructions you want to have executed. There are several tools you might want to use to create the payload and inject it into the running browser: a MIPS assembler (e.g. mipsel-linux-gcc, Logisim's "MIPS Rom" assembler, web-based MIPS assemblers, or yourself with the book's green reference card) to convert from MIPS assembly into MIPS machine language; xxd for converting text files containing hex digits to (or from) raw binary files; and cat for sending raw binary input to browser.
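If you assemble by hand from the reference card, it helps to see how one instruction word turns into bytes in a file. The sketch below encodes a single I-type instruction and packs it little-endian. The helper names encode_itype and words_to_bytes are hypothetical, and the byte order is an assumption based on the "el" in the mipsel- tool prefix, so cross-check any bytes you produce against the output of mipsel-linux-objdump.

```python
import struct

def encode_itype(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Build a 32-bit MIPS I-type word: opcode(6) rs(5) rt(5) immediate(16)."""
    if not (0 <= opcode < 64 and 0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("opcode or register field out of range")
    if not -0x8000 <= imm <= 0xFFFF:
        raise ValueError("immediate does not fit in 16 bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def words_to_bytes(words) -> bytes:
    """Pack 32-bit words as little-endian bytes (assumed byte order)."""
    return b"".join(struct.pack("<I", w) for w in words)

ADDIU = 0x09  # opcode field for addiu

word = encode_itype(ADDIU, rs=0, rt=2, imm=2)   # addiu $v0, $zero, 2
print(f"{word:08x}")                            # the instruction word
print(words_to_bytes([word]).hex(" "))          # the bytes as stored in a file
```

Note that the bytes in the file appear in the reverse order of the hex digits in the instruction word, which is easy to get wrong when typing hex by hand.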

Once your attack causes browser to print the "LOL 0wn3d! <netid> is on Facebook!" message, the browser program should exit gracefully. It is trivial to make it loop forever. A clean exit only takes a few extra instructions to invoke the normal exit() routine.

Command-line arguments:

Standard convention is for main(int argc, char **argv) to be invoked with the name of the program itself as argv[0]. In order to support this, the simulator pushes the name of the MIPS program file on to the stack before invoking main. You may have noticed this file name appears near the top of the stack.

Important: An unfortunate consequence is that the stack alignment of main and all other functions depends on how you invoke the program. There are ways to avoid this.

Option 1: Always invoke the simulator using the exact same file name argument for the browser file. In other words, whatever you pass as the last argument to the simulate program while developing your exploit is what you should pass as the last argument every time.

So if you invoke the simulator like this:

$ simulate browser

Then any of these is acceptable as well, and will not change the stack alignment:

$ /home/vagrant/mips-sim/bin/simulate browser
$ simulate -d browser
$ /home/vagrant/mips-sim/bin/simulate -d -i exploit.binary browser
$ cat exploit.txt | xxd -r -p | simulate browser

But any of these will cause the stack to have a different alignment:

$ simulate ./browser
$ simulate /home/vagrant/Desktop/p5/browser
$ simulate ~/browser
$ simulate p5/browser

Option 2: Use the "-noargs" option of the simulator, which always invokes main with some constant-length arguments regardless of what you write on the command line. Using the "-noargs" option, you can invoke the simulator any way you like and it will always use the same stack layout:

$ simulate -noargs browser
$ simulate -noargs ./browser
$ simulate -noargs ~/browser
$ simulate -noargs /home/vagrant/Desktop/p5/browser

Although this is how real programs do work, this extra wrinkle in the assignment isn't intended to add extra difficulty to it. You may specify in your documentation what path to browser is appropriate for your solution, and we will honor it when testing.

Without any specification, we will assume that "browser" was used, or that the "-noargs" flag was used. You should prefer one of these two options unless you have already tested extensively with another option before reading this paragraph. This should never happen, since every student reads this handout carefully before starting the assignment, right? :)
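To see why the file name matters, consider a deliberately simplified model of what happens when a string of varying length is pushed before main runs. This is a toy illustration only: the function toy_stack_pointer, the 4-byte alignment rule, and the use of the standard 0x7FFFFFFC starting address are all assumptions for the sake of the example, and the real simulator pushes more than this and randomizes the starting address.

```python
def toy_stack_pointer(program_name: str, stack_top: int = 0x7FFFFFFC) -> int:
    """Toy model: push the NUL-terminated name, then round down to a word.

    This is NOT the simulator's real layout; it only shows that a longer
    name moves everything that is pushed afterwards.
    """
    size = len(program_name.encode()) + 1   # bytes for the string plus NUL
    return (stack_top - size) & ~0x3        # align down to 4 bytes

for name in ["browser", "./browser", "p5/browser"]:
    print(f"{name!r:14} -> {toy_stack_pointer(name):#010x}")
```

Even in this toy model, "browser" and "./browser" leave the stack pointer at different addresses, so any absolute address baked into an exploit would be off by a few bytes.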

What to submit

Submit your raw binary exploit file containing the specially crafted input. We will try it out on our own copy of browser to see if it successfully breaks it.

Also submit a text document that briefly explains the exploit file. A text listing from xxd of the bytes in your exploit file, annotated with a few comments to explain what it is doing (or trying to do) is sufficient documentation. We will use this document for partial credit and for understanding your exploit.
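If you would rather annotate your exploit one machine word at a time, a short script can produce a starting point for the write-up. This is an optional sketch, not a required format: dump_words is a hypothetical helper, and the little-endian interpretation of each word is an assumption based on the mipsel- tool prefix. A plain xxd listing with comments remains sufficient.

```python
import struct

def dump_words(data: bytes, base: int = 0):
    """Yield one text line per 4-byte group: offset, bytes, word value."""
    for off in range(0, len(data), 4):
        chunk = data[off:off + 4]
        line = f"{base + off:04x}: {chunk.hex(' ')}"
        if len(chunk) == 4:
            # Interpret the group as a little-endian 32-bit word (assumed).
            (value,) = struct.unpack("<I", chunk)
            line += f"  # word 0x{value:08x}"
        yield line

# Made-up sample bytes; replace with open("exploit", "rb").read().
sample = bytes.fromhex("01020304aabbccdd1122")
for line in dump_words(sample):
    print(line)
```

Each output line ends in a comment marker where you can add a note about what that word is for.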

Tools

Here are a few tools you might find useful for this homework.

xxd is a tool for converting back and forth between raw binary files and text representations of the binary data. For example, if I create a file exploit.txt (using a regular text editor) specifying twenty-eight consecutive "bytes" in hex:

68 77 33 34 32 20
00 00 00 00 00 00 00 00 00
00
01 02 03 04
aa bb cc dd
11 22 33 44

then I can convert this into raw binary using xxd in "reverse plain" mode:

$ xxd -r -p exploit.txt > exploit
$ ls -l exploit*
-rw-r--r-- 1 hw342 hw342 28 2011-02-25 12:06 exploit
-rw-r--r-- 1 hw342 hw342 84 2011-02-25 12:06 exploit.txt

You can see that the text version is 84 bytes (includes spaces and 2 digits of text per "byte"), and the raw binary version is exactly 28 bytes (one byte per byte, of course). xxd is extremely picky about the format of the input file in "reverse" mode (spaces at the ends of lines silently mess things up, for example). So you may want to convert the raw file back to text and compare to your desired bytes to make sure nothing went wrong:

$ xxd exploit
0000000: 6877 3334 3220 0000 0000 0000 0000 0000  hw342 ..........
0000010: 0102 0304 aabb ccdd 1122 3344            ........."3D
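As a cross-check on xxd, the same conversion can be sketched in a few lines of Python, which rejects stray characters loudly rather than silently. The function name hex_text_to_bytes is our own, and this is only an approximation of what xxd -r -p does for well-formed input, not a replacement for it.

```python
import string

SAMPLE_TEXT = """\
68 77 33 34 32 20
00 00 00 00 00 00 00 00 00
00
01 02 03 04
aa bb cc dd
11 22 33 44
"""

def hex_text_to_bytes(text: str) -> bytes:
    """Convert whitespace-separated hex digit pairs into raw bytes."""
    digits = "".join(text.split())  # drop spaces, tabs, and newlines
    bad = sorted(set(digits) - set(string.hexdigits))
    if bad:
        raise ValueError(f"non-hex characters in input: {bad}")
    if len(digits) % 2:
        raise ValueError("odd number of hex digits; a byte is incomplete")
    return bytes.fromhex(digits)

payload = hex_text_to_bytes(SAMPLE_TEXT)
print(len(SAMPLE_TEXT), len(payload))   # text size vs. raw binary size
print(payload.hex(" "))                 # round trip back to hex text
```

Comparing the printed hex against your intended bytes serves the same purpose as running xxd on the raw file.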

mipsel-linux-objdump can give you a listing of the assembly code for browser:

$ mipsel-linux-objdump -xdl browser

Pipes and redirection, you may recall, are shell command line operators that let you connect the output of one program (say cat or xxd) to the input of another program or to a file. So you can, for example, concatenate two text files using cat, send the resulting text as input to xxd -r -p, send the resulting raw binary to the simulated browser, then send the resulting output to a file output.txt, all using a single command:

$ cat exploit_part1.txt exploit_part2.txt | xxd -r -p | simulate browser > output.txt

Epilogue

We're here to help. Take advantage of our office hours if you are stuck.

For an entertaining (and somewhat dated) read on buffer overflow attacks, check out:

Aleph One. Smashing the Stack for Fun and Profit. Phrack Magazine, 7(49), November 1996.
http://www.phrack.org/issues.html?issue=49&id=14

And finally, to reiterate: a friendly hacking challenge can be fun, and hacking skills are invaluable for working with real systems. But you must be responsible for your own behavior. We are not giving you free rein to launch attacks on CMS, fellow students' machines, or anything else. Such behavior is unethical and most likely illegal as well.

FAQ


Syscalls
As you should be able to tell from the disassembled code for browser, the SYSCALL instruction works pretty much like a regular function call. The only difference is that before it is invoked, $v0 must contain a numeric code selecting which operating system service to invoke: 1 for "printi", 2 for "prints", 3 for "putc", and so on. These numeric codes are not standard, but are specific to the CS3410 simulator. Other simulators you might find on the web will use their own numeric codes.

You need the newlines!

Yes, you need the newlines both before and after the "LOL 0wn3d!" message. Of course getting the message in the first place is worth the most points, but the newlines will get you those final few points.

So, an exploit that looks like this:

$ simulate browser < pht24-soln
Where to connect?
LOL 0wn3d! pht24 is on Facebook!
MIPS program exits with status 0 (approx. 8105 instructions in 249071 nsec at 32.32540 MHz)

... is preferable to an exploit that looks like this:

$ simulate browser < pht24-bad
Where to connect?  LOL 0wn3d! pht24 is on Facebook! MIPS program exits with status 0 (approx. 8105 instructions in 252959 nsec at 32.32040 MHz)

As you may have discovered, you can't simply embed a newline or carriage return in the message, because the browser stops reading when it encounters these characters. Something more clever is called for.
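Since a stray newline or carriage return byte anywhere in your input cuts the read short, it is worth scanning the raw file before feeding it to browser. The following is a minimal sketch; find_forbidden_bytes is a hypothetical helper, and the set of bytes it flags covers only the two characters named above.

```python
FORBIDDEN = {0x0A: "newline (LF)", 0x0D: "carriage return (CR)"}

def find_forbidden_bytes(payload: bytes):
    """Return (offset, description) for each byte that would end the read."""
    return [(i, FORBIDDEN[b]) for i, b in enumerate(payload) if b in FORBIDDEN]

# Made-up sample; replace with open("exploit", "rb").read().
sample = bytes([0x68, 0x77, 0x0A, 0x34, 0x0D])
for offset, name in find_forbidden_bytes(sample):
    print(f"offset {offset}: {name}")
```

Keep in mind that these bytes can hide inside instruction words and addresses, not just inside the message text, so scan the whole file.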


Aha! I found this handy vertical tab (0x0b) character! I can just use that instead of a newline, right?

No, a vertical tab is not a newline. You must embed a newline into the message.


Why doesn't my exploit work when I try to use it with the debugger?

The short answer: Use the command simulate -i exploit -d browser

The long answer:

You are likely trying to use one of these two commands:

simulate -d browser < exploit
cat exploit | simulate -d browser

These commands work just fine without the debugger (simulate browser), but what happens when we run them with the debugger? The shell takes the contents of the file named exploit and feeds them to the standard input of simulate -d browser... but what input does simulate -d browser expect? The debugger is expecting a command from the user (step, continue, info, etc.). Your exploit code is being used as a debugger command, which confuses the debugger and causes it to exit without executing the browser. The -i flag to simulate will solve this problem.

Don't forget that you can find out about the -i flag and other potentially useful flags by running simulate without any arguments.


Is there a way to get around the command-line arguments limitation?

There are ways to make your program resistant to changes in stack layout. These clever exploits work when the stack starts in some small region, instead of only working for one fixed location. If you implement such an exploit, feel free to brag about it in your documentation for extra credit!

Finally, there is a way to make your program work with any arbitrary stack layout. We'll leave this one for the adventurous. If you find this exploit, again, specify clearly in your documentation what we need to do to see this awesome exploit in action, and you will be awarded more extra credit.