Project 3 - Buffer Overflow
CS 3410 Fall 2018
Due: 11:59pm, October 22nd, 2018
Late Policy: Up to 2 slip days can be used for this project. If you are out of slip days, submissions after the due date will incur a 25% deduction per day late.
Grace Period Policy: Do not rely on the grace period to submit on time. Everything should be uploaded BEFORE the due date above.
Reminder: You must work alone for this project.
Warning: Read the ENTIRE writeup before you begin. Regrades will not be honored for submissions that do not follow the writeup.
Setting up your Environment
For this project you should either be SSH'd into a UGCLINUX machine or
be using the course VM found on the course webpage.
Files: The files you will need for this assignment will be in
your personal GitHub repository.
These instructions assume you are SSH'd into a UGCLINUX machine.
We assume that you cloned your GitHub repo into your home directory. If
you placed it elsewhere, you will have to modify the commands below.
Setting up $PATH: Adding the project tools to your $PATH lets your shell
find the simulate program and all of the mipsel-linux tools. This means
that you can just type simulate as opposed to
/home/<<NETID>>/<<REPONAME>>/mips-sim/bin/simulate
every time you want to run something.
- To avoid having to reset your $PATH every time you log in, you can do
the following just once:
Note: You should replace <<NETID>> with your own NetID and <<REPONAME>>
with the name of your GitHub repository (which is most likely also your
NetID). The first cd puts you in your $HOME directory; the echo then
appends the $PATH update to your .bashrc.
cd
echo 'export PATH=${PATH}:/home/<<NETID>>/<<REPONAME>>/mipsel-linux/bin:/home/<<NETID>>/<<REPONAME>>/mips-sim/bin' >> .bashrc
Then log out and log back in. You can type exit to log out.
These instructions assume you are using the course Virtual Machine.
Setting up $PATH: Adding the project tools to your $PATH lets your shell
find the simulate program and all of the mipsel-linux tools. This means
that you can just type simulate as opposed to
/home/vm/<<REPONAME>>/mips-sim/bin/simulate
every time you want to run something.
- To avoid having to reset your $PATH every time you log in, you can do
the following just once:
Note: You should replace <<REPONAME>> with the name of your GitHub
repository (which is most likely also your NetID). The first cd puts you
in your $HOME directory; the echo then appends the $PATH update to your
.bashrc.
cd
echo 'export PATH=${PATH}:/home/vm/<<REPONAME>>/mipsel-linux/bin:/home/vm/<<REPONAME>>/mips-sim/bin' >> .bashrc
Then log out and log back in. You can type exit to log out.
Setting up $NETID: You must follow these steps to set up your
environment correctly before attempting this assignment.
- In order to set your $NETID variable, do the following just once:
Once again, note that you should replace <<NETID>> with your own NetID.
cd
echo 'export NETID=<<NETID>>' >> .bashrc
Then log out and log back in.
You should now be able to check that your $NETID variable is set
correctly:
echo $NETID
If you see your NetID printed to your terminal, you are all set.
Checking $PATH and $NETID environment variables:
- If your $PATH and $NETID are not showing up, try
cat ~/.bashrc
Look at the output: 'export PATH=....' should only appear once, and
'export NETID=...' should only appear once. If either appears more than
once, open ~/.bashrc in a text editor and remove all instances except
the last one. Then log out, log back in, and check again:
echo $NETID
echo $PATH
Does it work? If not, try:
simulate --help
Does it work?
If you can see your NETID and correct PATH, then you are all set.
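The duplicate check above can also be done mechanically. The following is a minimal sketch, not part of the course tools: count_exports is a hypothetical helper name, and the sample text stands in for a real ~/.bashrc in which the PATH line was appended twice.

```python
import re

def count_exports(bashrc_text, name):
    """Count lines of the form `export NAME=...` in the given .bashrc text."""
    pattern = re.compile(r"^[ \t]*export[ \t]+" + re.escape(name) + r"=",
                         re.MULTILINE)
    return len(pattern.findall(bashrc_text))

# Hypothetical .bashrc contents (abc123 is a made-up NetID).
sample = (
    "export PATH=${PATH}:/home/abc123/abc123/mipsel-linux/bin\n"
    "export NETID=abc123\n"
    "export PATH=${PATH}:/home/abc123/abc123/mipsel-linux/bin\n"
)

for var in ("PATH", "NETID"):
    print(var, "appears", count_exports(sample, var), "time(s)")
```

To check your real file, pass in the contents of ~/.bashrc instead of sample. Any count above 1 means there are extra lines to remove.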
Overview
The goal of this project is to get intimately familiar with the layout and
use of call stacks, as well as MIPS machine language, assembly and
disassembly, debugging, and reverse engineering. As a side benefit, we hope
to raise your awareness of computer security issues. To this end, you will
write a buffer overrun exploit to break a program that we provide to
you.
WARNING: These kinds of friendly hacking challenges have a long
history, and hacking skills are priceless, as they reflect a deep
understanding of the operation of a computer system. But you must be
responsible and use your skills wisely. Taking over machines or hacking the
Internet carries stiff penalties, is a sure-fire way to get expelled from
Cornell, interferes with other people's lives, and is a waste of your
talent. It is also plain wrong.
What to Submit
Submit your raw binary exploit file containing the specially crafted
input. We will try it out on our own copy of browser to see if it
successfully breaks it.
Also submit a text document that briefly explains the exploit file.
A text listing from xxd of the bytes in your exploit file, annotated
with a few comments to explain what it is doing (or trying to do), is
sufficient documentation.
The Story
In this project, you will "0wn" a binary program called browser that we
will provide to you. We will not be providing the source code for this
program. All that you know about this program is what is documented
here, and what you can figure out for yourself by running or examining
the binary. The browser is a simplified web browser. The normal
operation of browser is very simple. When executed, it prompts you for a
URL, and then prints a simple message (the '$' shown here is the Linux
shell prompt):
$ simulate browser
Where to connect? www.google.com
Connected to www.google.com!
I can also send input to browser from another program using the Linux
shell '|' operator, with the same results:
echo "www.google.com" | simulate browser
Where to connect?
Connected to www.google.com!
However, this browser was supplied to you by Google, which has decided it
is finally time for them to abandon their long-time motto of "Don't be
evil"!!! The browser only lets you connect to www.google.com. All other
URLs will be rejected; try it and see!
The rumor is that browser suffers from a buffer overflow vulnerability.
Since the program only takes one input, it's not difficult to guess where
the problem might lie. You are a typical CS 3410 student wanting to
procrastinate on your work by browsing memes and being influenced
politically by foreign agents. Thus, you would like to get this browser
to let you connect to facebook.ru (NOT facebook.com), even though Google
originally designed it to only allow access to its own website.
0wning browser: Your job is to craft some input to browser that will
cause it to print out a different message, specifically: "LOL 0wn3d!
<netid> is on facebook.ru!" (substitute your own NetID). The fact that
the normal "Only www.google.com is allowed" message is missing
constitutes proof that you have completely subverted the browser, and
have gotten it to do something that it could not do before.
cat exploit | simulate browser
Where to connect?
LOL 0wn3d! awb93 is on facebook.ru!
To do this, you will need to inject new code into the browser program as
it is running. You are not allowed to modify or replace the browser
program on disk. The only way you get to interact with browser is to
feed it some carefully crafted input.
The simulator: The browser program is compiled to run on a MIPS CPU.
Since most of you don't have access to a real MIPS CPU (neither do we),
you will not be able to natively execute the program. Instead, you can
run a program which takes browser and simulates the execution of the
code. The simulate program we provide is just such a simulator. (Another
popular simulator is spim, but it won't work for this assignment: spim
only deals with assembly code, rather than actual binary machine code as
our in-house simulator does.)
To figure out how to attack browser, you'll need to step through its code
as it is executing and reverse engineer the parts that matter, namely,
where (i.e., at which memory location) the input buffer is stored, which
values lie near it in memory, and what precise instruction sequence is
vulnerable to a buffer overflow attack. Since you have the MIPS binary,
you can use various mipsel-linux tools, like mipsel-linux-objdump and
mipsel-linux-readelf, to disassemble the browser binary and learn about
its layout and code.
You can also use the -d option to the simulator, which starts an
interactive debugger for the simulated program execution. This lets you
step through the execution one instruction at a time, examine memory and
the stack contents, and so on. See the README file in your repo for help
using the simulator and its built-in debugger.
Stack Randomization: Note that in a feeble effort to thwart just
such attacks, the simulator, like many real machines, implements stack
randomization, a limited kind of program layout randomization. When the
simulator starts, it initializes the stack to a variable address, rather
than the standard 0x7FFFFFFC. The starting location of the stack is
derived from the $NETID environment variable.
Executing the Attack: Once you have figured out the program and
stack layout, you need to come up with a carefully crafted input that will
take over browser. This input will likely contain some binary data (the
attack payload) that corresponds to MIPS instructions you want to have
executed. There are several tools you might want to use to create the
payload and inject it into the running browser: a MIPS assembler (e.g.
mipsel-linux-gcc, Logisim's "MIPS Rom" assembler, web-based MIPS
assemblers, or yourself with the book's green reference card) to convert
from MIPS assembly into MIPS machine language; xxd for converting text
files containing hex digits to (or from) raw binary files; and cat for
sending raw binary input to browser.
Once your attack causes browser to print the "LOL 0wn3d! <netid> is on
facebook.ru!" message, the browser program should exit gracefully (this
means, exit with status 0). It is trivial to make it loop forever. A
clean exit only takes a few extra instructions to invoke the normal
exit() routine.
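If you assemble by hand from the reference card, the arithmetic looks like the following minimal sketch. The helper names are hypothetical, and the instruction (addiu $v0, $zero, 10) is an arbitrary illustration rather than part of any solution. The little-endian byte order is an assumption based on the "mipsel" tool prefix; confirm it against mipsel-linux-objdump output or memory dumps in the debugger before relying on it.

```python
import struct

def encode_i_type(opcode, rs, rt, imm):
    """Pack the fields of a MIPS I-type instruction into one 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def to_little_endian_hex(word):
    """Return the four bytes of `word`, lowest byte first, as hex text."""
    return " ".join(f"{b:02x}" for b in struct.pack("<I", word))

# addiu $v0, $zero, 10: opcode 0x09, rs = $zero (0), rt = $v0 (2), imm = 10
word = encode_i_type(0x09, 0, 2, 10)
print(f"word:  0x{word:08x}")                 # word:  0x2402000a
print("bytes:", to_little_endian_hex(word))   # bytes: 0a 00 02 24
```

The hex text printed on the last line is in the form that xxd -r -p accepts.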
Command-line Arguments
Standard convention is for main(int argc, char **argv) to be invoked
with the name of the program itself as argv[0]. In order to support
this, the simulator pushes the name of the MIPS program file on to the
stack before invoking main. You may have noticed this file name appears
near the top of the stack.
Important: An unfortunate consequence is that the stack alignment of
main and all other functions depends on how you invoke the program.
There are ways to avoid this.
Option 1: Always be sure to invoke the simulator using the exact same
file name argument for the browser file. In other words, whatever you
are using as the last argument to the simulate program, you should
always use as the last argument.
So if you invoke the simulator like this:
$ simulate browser
Then any of these is acceptable as well, and will not change the stack
alignment:
/home/<<USERNAME>>/<<REPONAME>>/mips-sim/bin/simulate browser
simulate -d browser
/home/<<USERNAME>>/<<REPONAME>>/mips-sim/bin/simulate -d -i exploit.binary browser
cat exploit.txt | xxd -r -p | simulate browser
where <<USERNAME>> represents the username depending on whether you are
using SSH or the VM, and <<REPONAME>> is the name of your repository.
But any of these will cause the stack to have a different alignment:
simulate ./browser
simulate /home/<<USERNAME>>/<<REPONAME>>/p3/browser
simulate ~/browser
simulate p3/browser
Option 2: Use the -noargs option of the simulator, which will always
invoke main with some constant-length arguments regardless of what you
write on the command line. Using the -noargs option, you can invoke the
simulator any way you like and it will always use the same stack layout:
simulate -noargs browser
simulate -noargs ./browser
simulate -noargs ~/browser
simulate -noargs /home/<<USERNAME>>/<<REPONAME>>/p3/browser
Although this is how real programs do work, this extra wrinkle in the
assignment isn't intended to add extra difficulty to it. You may specify in
your documentation what path to browser is appropriate for your solution,
and we will honor it when testing.
Without any specification, we will assume that "browser" was used, or that
the -noargs flag was used. You should prefer one of these two
options unless you have already tested extensively with another option
before reading this paragraph. This should never happen, since every
student reads this handout carefully before starting the assignment, right?
:)
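To see why the file name matters, consider how many bytes each name would occupy on the stack. This is only a sketch under stated assumptions: we assume the name is stored as a NUL-terminated string padded to a 4-byte boundary, which may not match what the simulator actually does, and name_footprint is a hypothetical helper. The point is only that a longer name takes more room, so everything pushed after it lands at a different address.

```python
def name_footprint(path, align=4):
    """Bytes occupied by a NUL-terminated string, rounded up to `align`.

    The 4-byte alignment is an assumption made for illustration.
    """
    raw = len(path.encode()) + 1  # +1 for the terminating NUL byte
    return (raw + align - 1) // align * align

for path in ("browser", "./browser", "p3/browser"):
    print(f"{path!r}: {name_footprint(path)} bytes")
```

Under these assumptions "browser" occupies 8 bytes while "./browser" and "p3/browser" occupy 12, so an address worked out with one invocation would be off by 4 with another.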
Tools
Here are a few tools you might find useful for this homework.
xxd is a tool for converting back and forth between raw binary files and
text representations of the binary data. For example, if I create a file
exploit.txt (using a regular text editor) specifying twenty-eight
consecutive "bytes" in hex:
68 77 33 34 32 20
00 00 00 00 00 00 00 00 00
00
01 02 03 04
aa bb cc dd
11 22 33 44
then I can convert this into raw binary using xxd in "reverse plain"
mode:
xxd -r -p exploit.txt > exploit
ls -l exploit*
-rw-r--r-- 1 hw342 hw342 28 2011-02-25 12:06 exploit
-rw-r--r-- 1 hw342 hw342 84 2011-02-25 12:06 exploit.txt
You can see that the text version is 84 bytes (includes spaces and 2
digits of text per "byte"), and the raw version is 28 bytes. Be careful
with the formatting of the input file in "reverse" mode (spaces at the
ends of lines silently mess things up, for example). So you may want to
convert the raw file back to text and compare to your desired bytes to
make sure nothing went wrong:
xxd exploit
0000000: 6877 3334 3220 0000 0000 0000 0000 0000 hw342 ..........
0000010: 0102 0304 aabb ccdd 1122 3344 ........."3D
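The same round trip can be cross-checked outside of xxd. The following is a minimal sketch, not a replacement for the tool: hex_text_to_bytes is a hypothetical helper that mimics "reverse plain" mode by discarding all whitespace before decoding, and the text is the 28-byte example from above.

```python
def hex_text_to_bytes(text):
    """Rough equivalent of `xxd -r -p`: drop whitespace, decode hex pairs."""
    return bytes.fromhex("".join(text.split()))

exploit_txt = """68 77 33 34 32 20
00 00 00 00 00 00 00 00 00
00
01 02 03 04
aa bb cc dd
11 22 33 44
"""

raw = hex_text_to_bytes(exploit_txt)
print(len(exploit_txt), "bytes of text ->", len(raw), "raw bytes")
print(raw[:6])  # the first six bytes are printable ASCII
```

Because the whitespace is stripped first, trailing spaces cannot change the result here, and bytes.fromhex raises ValueError on a stray non-hex character or an odd number of digits. Comparing this result with the file that xxd produced is one way to catch a formatting slip.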
mipsel-linux-objdump can give you a listing of the assembly code for
browser:
mipsel-linux-objdump -xdl browser
Pipes and redirection, you may recall, are shell command line
operators that let you connect the output of one program (say cat or
xxd) to the input of another program or to a file. So you can, for
example, concatenate two text files using cat, send the resulting text as
input to xxd -r -p, send the resulting raw binary to the simulated
browser, then send the resulting output to a file output.txt, all using
a single command:
cat exploit_part1.txt exploit_part2.txt | xxd -r -p | simulate browser > output.txt
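A pipeline like this can also be driven from a script, which makes it easy to check both the output and the exit status in one step. The following is a minimal sketch: run_with_input is a hypothetical helper, and the child process is a stand-in that just echoes its input so that the example runs anywhere. For the project, the command would be ["simulate", "browser"] and the payload would be the contents of your exploit file.

```python
import subprocess
import sys

def run_with_input(cmd, payload):
    """Feed `payload` to the stdin of `cmd`; return (exit status, stdout)."""
    done = subprocess.run(cmd, input=payload, stdout=subprocess.PIPE)
    return done.returncode, done.stdout

# Stand-in child process (an assumption for illustration): copies stdin
# to stdout and exits with status 0.
stand_in = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]

status, output = run_with_input(stand_in, b"www.google.com\n")
print("exit status:", status)
print("output:", output)
```

An exit status of 0 here corresponds to the graceful exit that the assignment asks for.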
Epilogue
We're here to help. Take advantage of our office hours if you are stuck.
For an entertaining (and somewhat dated) read on buffer overflow
attacks, check out:
Aleph One. Smashing the Stack for Fun and Profit. Phrack Magazine, 7(49), November 1996.
http://www.phrack.org/issues.html?issue=49&id=14
And finally, to reiterate: a friendly hacking challenge can be fun, and
hacking skills are invaluable for working with real systems. But you must
be responsible for your own behavior. We are not giving you free
rein to launch attacks on CMS, fellow students' machines, or anything
else. Such behavior is unethical and most likely illegal as well.