CS3410
Due: 11:59pm, Tuesday, November 1, 2016
Late Policy: Standard
Reminder: you must work alone for this project.
In this project you should be using the course VM.
Files. To obtain the files in your VM, open a terminal and type:
$ cd
$ wget http://www.cs.cornell.edu/courses/cs3410/2016fa/projects/p5/p5.zip
$ unzip p5.zip
Now, run sudo apt-get install libreadline5 unzip libc6-i386. Enter y when prompted.
$PATH. The first cd put us in our $HOME directory, but we need to update our $PATH. Adding these files to your $PATH tells your shell where to find the simulate program and all of the mipsel-linux tools. This means that you can just type simulate as opposed to /home/vagrant/mips-sim/bin/simulate every time you want to run something.
From here forward, we assume you are on the course VM, where the user is vagrant. If you decide not to work on the VM, you need to change the user name in the paths accordingly.
If you use bash (which is what the course VM uses), you can avoid having to re-set your $PATH every time you log in to your VM by doing the following just once:
$ cd
$ echo 'export PATH=${PATH}:/home/vagrant/mipsel-linux/bin:/home/vagrant/mips-sim/bin' >> .bashrc
$ . .bashrc
$NETID. This project requires a NETID environment variable to be set.
If you use bash, you can avoid the need to do this every time you log in to the VM by doing the following just once:
$ cd
$ echo 'export NETID=<netid>' >> .bashrc
$ . .bashrc
In the above commands, if your NetID were abc123, you would execute export NETID=abc123, without the less-than and greater-than signs.
The Project. Now that you have followed the environment directions, you have a folder p5 in your home directory. Because you have added the executables to your $PATH, you can work on this project wherever you want:
$ cd <preferred_location>
$ mv ~/p5 .
$ cd p5
The goal of this project is to get intimately familiar with the layout and use of call stacks, as well as MIPS machine language, assembly and disassembly, debugging, and reverse engineering. As a side benefit, we hope to raise your awareness of computer security issues. To this end, you will write a buffer overrun exploit to break a program that we provide to you.
WARNING: These kinds of friendly hacking challenges have a long history, and hacking skills are priceless, as they reflect a deep understanding of the operation of a computer system. But you must be responsible and use your skills wisely. Taking over machines or hacking the Internet carries stiff penalties, is a sure-fire way to get expelled from Cornell, interferes with other people's lives, and is a waste of your talent. It is also plain wrong.
In this homework, you will "0wn" a binary program called browser that we will provide to you. We will not be providing the source code for this program. All that you know about this program is what is documented here, and what you can figure out for yourself by running or examining the binary. The browser is a simplified web browser. The normal operation of browser is very simple. When executed, it prompts you for a URL, and then prints a simple message (the '$' shown here is the Linux shell prompt):
$ simulate browser
Where to connect? www.google.com
Connected to www.google.com!
I can also send input to browser from another program using the Linux shell '|' operator, with the same results:
$ echo "www.google.com" | simulate browser
Where to connect? Connected to www.google.com!
However, this browser was supplied to you by Google, which has decided it is finally time for them to abandon their long-time motto of "Don't be evil"!!! The browser only lets you connect to www.google.com. All other URLs will be rejected; try it and see!
The rumor is that browser suffers from a buffer overflow vulnerability. Since the program only takes one input, it's not difficult to guess where the problem might lie. You are a typical college student wanting to use social networking tools to connect with your friends. Thus, you would like to get this browser to let you connect to Facebook, even though Google originally designed it to only allow access to its own website.
0wning browser. Your job is to craft some input to browser that will cause it to print out a different message, specifically: "LOL 0wn3d! <netid> is on Facebook!" (substitute your own NetID). The fact that the normal "Only www.google.com is allowed" message is missing constitutes proof that you have completely subverted the browser, and have gotten it to do something that it could not do before.
$ cat exploit | simulate browser
Where to connect?
LOL 0wn3d! hw342 is on Facebook!
To do this, you will need to inject new code into the browser program as it is running. You are not allowed to modify or replace the browser program on disk. The only way you get to interact with browser is to feed it some carefully crafted input.
The simulator. The browser program is compiled to run on a MIPS CPU. Since most of you don't have access to a real MIPS CPU (neither do we), you will not be able to natively execute the program. Instead, you can run a program which takes browser and simulates the execution of the code. The simulate program we provide is just such a simulator. (Another popular simulator is spim, but it won't work for this assignment: spim only deals with assembly code, rather than actual binary machine code as our in-house simulator does.)
To figure out how to attack browser, you'll need to step through its code as it is executing and reverse engineer the parts that matter, namely, where (i.e., at which memory location) the input buffer is stored, what values lie near it in memory, and what precise instruction sequence is vulnerable to a buffer overflow attack.
Since you have the MIPS binary, you can use various mipsel-linux tools, like mipsel-linux-objdump and mipsel-linux-readelf, to disassemble the browser binary and learn about its layout and code.
You can also use the "-d" option to the simulator, which starts an interactive debugger for the simulated program execution. This lets you step through the execution one instruction at a time, examine memory and the stack contents, and so on. See the README file in /home/vagrant/mips-sim/ for help using the simulator and its built-in debugger.
Stack Randomization. Note that in a feeble effort to thwart just such attacks, the simulator, like many real machines, implements stack randomization, a limited kind of program layout randomization. When the simulator starts, it initializes the stack to a variable address, rather than the standard 0x7FFFFFFC. The starting location of the stack is derived from the $USER or $NETID environment variable. Since the $USER on the course VM is vagrant, which is different from your NetID, it is very important that you set the $NETID environment variable properly before running the simulator. (See the directions at the top.)
Executing the Attack. Once you have figured out the program and stack layout, you need to come up with a carefully crafted input that will take over browser. This input will likely contain some binary data (the attack payload) that corresponds to MIPS instructions you want to have executed. There are several tools you might want to use to create the payload and inject it into the running browser: a MIPS assembler (e.g. mipsel-linux-gcc, Logisim's "MIPS Rom" assembler, web-based MIPS assemblers, or yourself with the book's green reference card) to convert from MIPS assembly into MIPS machine language; xxd for converting text files containing hex digits to (or from) raw binary files; and cat for sending raw binary input to browser.
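If you assemble by hand, the step from a 32-bit instruction word to the hex text that xxd -r -p expects can be sketched in a few lines of Python. This is only an illustration, not a course tool: the helper names i_type and word_to_hex_le are made up here, and the little-endian byte order is an assumption based on the "mipsel" tool prefix, which you may want to confirm with mipsel-linux-readelf -h browser. The two instructions encoded are arbitrary examples, not a working payload.

```python
import struct

def i_type(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Assemble a MIPS I-type word: opcode(6) | rs(5) | rt(5) | imm(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def word_to_hex_le(word: int) -> str:
    """Render a 32-bit word as space-separated hex bytes, least significant first."""
    return " ".join(f"{b:02x}" for b in struct.pack("<I", word))

# addiu $v0, $zero, 2   (opcode 0x09, rs = $zero = 0, rt = $v0 = 2, imm = 2)
ADDIU_V0_2 = i_type(0x09, 0, 2, 2)
# syscall               (SPECIAL opcode 0, funct 0x0c)
SYSCALL = 0x0000000C

print(f"{ADDIU_V0_2:08x} -> {word_to_hex_le(ADDIU_V0_2)}")
print(f"{SYSCALL:08x} -> {word_to_hex_le(SYSCALL)}")
```

The printed byte strings can be pasted into a text file and converted with xxd -r -p. Note that the byte order on disk is the reverse of how the word reads in a disassembly listing.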
Once your attack causes browser to print the "LOL 0wn3d! <netid> is on Facebook!" message, the browser program should exit gracefully. It is trivial to make it loop forever. A clean exit only takes a few extra instructions to invoke the normal exit() routine.
Standard convention is for main(int argc, char **argv) to be invoked with the name of the program itself as argv[0]. In order to support this, the simulator pushes the name of the MIPS program file on to the stack before invoking main. You may have noticed this file name appears near the top of the stack.
Important: An unfortunate consequence is that the stack alignment of main and all other functions depends on how you invoke the program. There are ways to avoid this.
Option 1: Always be sure to invoke the simulator using the exact same file name argument for the browser file. In other words, whatever you use as the last argument to the simulate program, you should use that same argument every time.
So if you invoke the simulator like this:
$ simulate browser
Then any of these is acceptable as well, and will not change the stack alignment:
$ /home/vagrant/mips-sim/bin/simulate browser
$ simulate -d browser
$ /home/vagrant/mips-sim/bin/simulate -d -i exploit.binary browser
$ cat exploit.txt | xxd -r -p | simulate browser
But any of these will cause the stack to have a different alignment:
$ simulate ./browser
$ simulate /home/vagrant/Desktop/p5/browser
$ simulate ~/browser
$ simulate p5/browser
Option 2: Use the "-noargs" option of the simulator, which will always invoke main with some constant-length arguments regardless of what you write on the command line. Using the "-noargs" option, you can invoke the simulator any way you like and it will always use the same stack layout:
$ simulate -noargs browser
$ simulate -noargs ./browser
$ simulate -noargs ~/browser
$ simulate -noargs /home/vagrant/Desktop/p5/browser
Although this is how real programs do work, this extra wrinkle in the assignment isn't intended to add extra difficulty to it. You may specify in your documentation what path to browser is appropriate for your solution, and we will honor it when testing.
Without any specification, we will assume that "browser" was used, or that the "-noargs" flag was used. You should prefer one of these two options unless you have already tested extensively with another option before reading this paragraph. This should never happen, since every student reads this handout carefully before starting the assignment, right? :)
Submit your raw binary exploit file containing the specially crafted input. We will try it out on our own copy of browser to see if it successfully breaks it.
Also submit a text document that briefly explains the exploit file. A text listing from xxd of the bytes in your exploit file, annotated with a few comments to explain what it is doing (or trying to do) is sufficient documentation. We will use this document for partial credit and for understanding your exploit.
Here are a few tools you might find useful for this homework.
xxd is a tool for converting back and forth between raw binary files and text representations of the binary data. For example, if I create a file exploit.txt (using a regular text editor) specifying twenty-eight consecutive "bytes" in hex:
68 77 33 34 32 20 00 00 00 00 00 00 00 00 00 00 01 02 03 04 aa bb cc dd 11 22 33 44
then I can convert this into raw binary using xxd in "reverse plain" mode:
$ xxd -r -p exploit.txt > exploit
$ ls -l exploit*
-rw-r--r-- 1 hw342 hw342 28 2011-02-25 12:06 exploit
-rw-r--r-- 1 hw342 hw342 84 2011-02-25 12:06 exploit.txt
You can see that the text version is 84 bytes (includes spaces and 2 digits of text per "byte"), and the raw binary version is exactly 28 bytes (one byte per byte, of course). xxd is extremely picky about the format of the input file in "reverse" mode (spaces at the ends of lines silently mess things up, for example). So you may want to convert the raw file back to text and compare to your desired bytes to make sure nothing went wrong:
$ xxd exploit
0000000: 6877 3334 3220 0000 0000 0000 0000 0000  hw342 ..........
0000010: 0102 0304 aabb ccdd 1122 3344            ........."3D
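The byte counts in this example can be double-checked without xxd. The following Python sketch (not part of the course tools; hex_to_raw is a hypothetical helper name) converts the same hex text to raw bytes, assuming the text file holds the twenty-eight hex pairs separated by single spaces and ends with one newline:

```python
# The hex text from the exploit.txt example, with one trailing newline (assumed).
HEX_TEXT = (
    "68 77 33 34 32 20 00 00 00 00 00 00 00 00 00 00 "
    "01 02 03 04 aa bb cc dd 11 22 33 44\n"
)

def hex_to_raw(text: str) -> bytes:
    """Drop all whitespace, then decode the remaining hex digits to raw bytes."""
    return bytes.fromhex("".join(text.split()))

raw = hex_to_raw(HEX_TEXT)
print(len(HEX_TEXT))  # size of the text version in bytes
print(len(raw))       # size of the raw binary version in bytes
print(raw[:6])        # the leading bytes are printable ASCII
```

Unlike xxd in reverse mode, this helper discards every whitespace character before decoding, so it is a convenient way to confirm which bytes you intended to write when the xxd output looks wrong.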
mipsel-linux-objdump can give you a listing of the assembly code for browser:
$ mipsel-linux-objdump -xdl browser
Pipes and redirection, you may recall, are shell command line operators that let you connect the output of one program (say cat or xxd) to the input of another program or to a file. So you can, for example, concatenate two text files using cat, send the resulting text as input to xxd -r -p, send the resulting raw binary to the simulated browser, then send the resulting output to a file output.txt, all using a single command:
$ cat exploit_part1.txt exploit_part2.txt | xxd -r -p | simulate browser > output.txt
We're here to help. Take advantage of our office hours if you are stuck.
For an entertaining (and somewhat dated) read on buffer overflow attacks, check out:
Aleph One. Smashing the Stack for Fun and Profit. Phrack Magazine, 7(49), November 1996.
http://www.phrack.org/issues.html?issue=49&id=14
And finally, to reiterate: a friendly hacking challenge can be fun, and hacking skills are invaluable for working with real systems. But you must be responsible for your own behavior. We are not giving you free rein to launch attacks on CMS, fellow students' machines, or anything else. Such behavior is unethical and most likely illegal as well.
In browser, the SYSCALL instruction works pretty much just like a regular function call. The only difference is that before invoking it, $v0 must contain a numeric code for which operating system service should be invoked: 1 for "printi", 2 for "prints", 3 for "putc", and so on. These numeric codes are not standard, but are specific to the cs3410 simulator. Other simulators you might find on the web will use their own numeric codes.
Yes, you need the newlines both before and after the "LOL 0wn3d!" message. Of course getting the message in the first place is worth the most points, but the newlines will get you those final few points.
So, an exploit that looks like this:
$ simulate browser < pht24-soln
Where to connect?
LOL 0wn3d! pht24 is on Facebook!
MIPS program exits with status 0 (approx. 8105 instructions in 249071 nsec at 32.32540 MHz)
... is preferable to an exploit that looks like this:
$ simulate browser < pht24-bad
Where to connect? LOL 0wn3d! pht24 is on Facebook! MIPS program exits with status 0 (approx. 8105 instructions in 252959 nsec at 32.32040 MHz)
As you may have discovered, you can't simply embed a newline or carriage return in the message, because the browser stops reading when it encounters these characters. Something more clever is called for.
No, a vertical tab is not a newline. You must embed a newline into the message.
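Because the browser stops reading at a newline or carriage return, it can help to scan a candidate payload for those byte values before feeding it to the simulator. The sketch below is a hypothetical helper, not a course tool. It checks only the two bytes this handout names, and whether any other byte values also end the read is something you would have to find out from the binary.

```python
# Byte values the handout says make the browser stop reading its input.
STOP_BYTES = {0x0A: "newline (LF)", 0x0D: "carriage return (CR)"}

def find_stop_bytes(payload: bytes):
    """Return (offset, description) for every stop byte found in the payload."""
    return [(i, STOP_BYTES[b]) for i, b in enumerate(payload) if b in STOP_BYTES]

# Example: a message with an embedded newline and a trailing carriage return.
for offset, what in find_stop_bytes(b"LOL\n0wn3d\r"):
    print(f"offset {offset}: {what}")
```

Keep in mind that these values can also turn up inside an encoded instruction word or an address, not just in the message text, and that a newline placed deliberately at the very end of the input will be reported too.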
The short answer: Use the command simulate -i exploit -d browser
The long answer:
You are likely trying to use one of these two commands:
simulate -d browser < exploit
cat exploit | simulate -d browser
This works just fine without the debugger (simulate browser), but what happens when we run it with the debugger? It takes the contents of the file named exploit and pipes that to the standard input of simulate -d browser ... but what input does simulate -d browser expect? The debugger is expecting a command from the user (step, continue, info, etc.). Your exploit code is being used as a debugger command, which confuses the debugger and causes it to exit without executing the browser. The -i flag to simulate will solve this problem.
Don't forget that you can find out about the -i flag and other potentially useful flags by running simulate without any arguments.
There are ways to make your program resistant to changes in stack layout. These clever exploits work when the stack starts in some small region, instead of only working for one fixed location. If you implement such an exploit, feel free to brag about it in your documentation for extra credit!
Finally, there is a way to make your program work with any arbitrary stack layout. We'll leave this one for the adventurous. If you find this exploit, again, specify clearly in your documentation what we need to do to see this awesome exploit in action, and you will be awarded more extra credit.