A8: Buffer Overflow
Instructions:
Remember, all assignments in CS 3410 are individual.
You must submit work that is 100% your own.
Remember to ask for help from the CS 3410 staff in office hours or on Ed!
If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt
file along with your submission.
The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.
- Submission Requirements
- Restrictions
- Getting Started
- Overview
- Background: Buffer Overflow Vulnerabilities
- Your Task
- (Lab) Part 0: Overflow
- Part 1: Overwrite
- Part 2: Execute the Payload
- Epilogue
- FAQ
- ECALLS and Other Instructions
- You need the newlines!
- Aha! I found this handy vertical tab (0x0b) character! I can just use that instead of a newline, right?
- Why does calling printf in my exploit print garbage?
- Why are some instructions only 16 bits wide?
- Why does it fail to connect to my program when I run gdb and try to connect to localhost?
- Docker gives an error when I try to launch the container with rv-debug, saying there is a conflict.
- I’m getting “broken pipe” and/or “the input device is not a TTY” errors when I try to pipe my exploit to browser’s input.
- Submission
- Rubric
Submission Requirements
You will submit your completed solution to this assignment on Gradescope. You must submit:
- exploit, a raw binary file containing your specially-crafted input. We’ll try it out on our own copy of browser (in an environment that uses your NetID) to see if it successfully breaks it. A successful exploit causes browser to output exactly the following:
Launching for user <NETID>
Where to connect?
LOL 0wn3d! <NETID> is on imgur.com!
where <NETID> is replaced with your NetID. browser must also exit gracefully (i.e., exit with a return code of 0).
- README.md, a text document that explains your exploit file. It should contain the following:
  - A text listing from xxd of the bytes in your exploit file, annotated with comments to explain what your exploit is doing (or trying to do).
  - An explanation of how your exploit tries to subvert the program’s check that the input string matches the expected string, and why this works.
  - An explanation of how your exploit is able to take control of the program and what steps the exploit takes to force the program to print out the desired string.
  - If you wrote a script to make your exploit for you, you may include it in your README.md. However, by itself, a script does not constitute an explanation. We still expect a prose description of how your exploit works.
Restrictions
- You cannot use system calls anywhere in your exploit. You may only use the standard library functions already linked to the browser executable.
- Don’t modify the executable in any way. We will be evaluating your exploit using our own copy of browser.
- Your exploit must work when piped in as input to the browser executable. Specifically, we will run the following command within the CS3410 container, launched with the docker command that is aliased by rv (as discussed previously in the course infrastructure setup and in Lab 4):
cat exploit | qemu browser
Getting Started
To get started, obtain the release code by cloning your assignment repository from GitHub:
$ git clone git@github.coecis.cornell.edu:cs3410-2025sp-student/<NETID>_bufferof.git
Replace <NETID>
with your NetID. All the letters in your NetID should be in lowercase.
Overview
In this assignment you will get a chance to apply your knowledge of RISC-V assembly, calling conventions, and the layout of memory in order to exploit a buffer overflow vulnerability in a program we provide you. To accomplish this feat, you’ll analyze a pre-compiled binary using disassembly and debugging tools and write an exploit which assumes control of the target program. We also hope to raise your awareness of real-world computer security issues.
There are conceptually three parts to this assignment:
- Part 0 (Lab). Starting in the lab, you’ll begin by understanding how the program we provide you works with the aim of identifying where the buffer overflow vulnerability is and how you can exploit it. The goal of this part is to use the buffer overflow to cause the program to crash.
- Part 1. The second step is to modify the return address stored on the stack in order to cause the program to jump execution to a location in memory of your choosing, thereby giving you (the attacker) control over the execution of the program.
- Part 2. In the final part of this assignment, you’ll update your buffer overflow exploit to execute a set of RISC-V instructions of your choosing.
These kinds of friendly hacking challenges have a long history, and hacking skills are priceless, as they reflect a deep understanding of the operation of a computer system. But you must be responsible and use your skills wisely. Taking over machines or hacking the Internet carries stiff penalties, is a sure-fire way to get expelled from Cornell, interferes with other people’s lives, and is a waste of your talent. It is also plain wrong.
Background: Buffer Overflow Vulnerabilities
Before getting your hands dirty, let’s start by understanding what a buffer overflow vulnerability is.
First, what do we mean by “vulnerability”? In the context of computer security, a vulnerability is a flaw in a computer, system, or program that compromises its security. Vulnerabilities can be caused by a design flaw or an implementation bug. Malicious attackers then can exploit these vulnerabilities to steal or damage the hardware, software, or data of a system, as well as disrupt any services the system provides. If you’re curious, most of the exploitable vulnerabilities that have been discovered are documented in the Common Vulnerabilities and Exposures (CVE) database.
A buffer overflow is perhaps the best-known form of software vulnerability. Despite this, buffer overflow exploits are still quite common today. A buffer overflow is the result of a program trying to put more data into a buffer than the buffer can hold. We’ve actually seen buffer overflows earlier this semester but under a different name: an out-of-bounds memory access.
For example, consider the following C program which contains a buffer overflow.
void foo(char* str, int n) {
char buffer[8];
for (int j = 0; j < n; j++) {
buffer[j] = str[j];
}
}
int main() {
char long_str[128];
for (int i = 0; i < 128; i++) {
long_str[i] = 'A';
}
foo(long_str, 128);
return 0;
}
This program initializes a string long_str consisting of 128 'A's. The foo() function then copies long_str into the local variable buffer. A buffer overflow occurs when the for loop in foo() starts to overwrite memory beyond the end of buffer (i.e., when j >= 8), as buffer is only 8 bytes large but long_str is 128 bytes large.
What will happen when we run this program? Because accessing out-of-bounds memory is a form of undefined behavior in C, we don’t know for sure! But let’s take a closer look at what happens when the program tries to write to &buffer[8] through &buffer[127].
Depicted above is the layout of foo’s stack (call) frame. From top to bottom, the stack stores the return address (ra), the frame pointer (s0/fp), and then the two local variables buffer and j. Since buffer was only given 8 bytes on the stack, the 120 bytes after buffer would be overwritten with the ASCII character 'A' (0x41) once the for loop finishes. This includes the frame pointer and the return address! This means that after the for loop the return address is now 0x4141414141414141. When foo() returns, the next instruction is read from 0x4141414141414141, possibly causing a segmentation fault.
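To make the overwrite concrete, here is a toy Python model of the frame described above. It is only a sketch: the offsets, the made-up saved values, and the helper names (simulate_overflow, saved_ra) are assumptions for illustration, the local variable j is left out, and nothing here reflects the layout of any real binary.

```python
import struct

# Illustrative offsets within a toy frame, lowest address first.
# They mirror the picture in the text, not any real program.
BUFFER_OFF = 0   # char buffer[8]
FP_OFF = 8       # saved frame pointer (8 bytes)
RA_OFF = 16      # saved return address (8 bytes)

ORIGINAL_FP = 0x0000003FFFFFF000  # made-up value
ORIGINAL_RA = 0x0000000000010A00  # made-up value


def simulate_overflow(data: bytes, memory_size: int = 160) -> bytearray:
    """Copy data into the toy buffer with no bounds check, like foo()'s loop."""
    memory = bytearray(memory_size)
    struct.pack_into("<Q", memory, FP_OFF, ORIGINAL_FP)
    struct.pack_into("<Q", memory, RA_OFF, ORIGINAL_RA)
    for j, byte in enumerate(data):
        memory[BUFFER_OFF + j] = byte  # writes past buffer once j >= 8
    return memory


def saved_ra(memory: bytearray) -> int:
    """Read the 64-bit little-endian return address slot."""
    return struct.unpack_from("<Q", memory, RA_OFF)[0]


# 8 bytes fit in the buffer, so the return address slot is untouched.
print(hex(saved_ra(simulate_overflow(b"A" * 8))))
# 128 bytes spill over the saved fp and ra, leaving ra filled with 0x41.
print(hex(saved_ra(simulate_overflow(b"A" * 128))))
```

In this model, any input longer than 16 bytes starts to change the return address slot, which is the effect the paragraph above describes.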
To summarize, a buffer overflow allowed us to overwrite the return address of the call to foo(). This caused the program to jump to 0x4141414141414141 instead of back to main(), likely crashing the program. While this certainly is interesting, it may not yet be clear why buffer overflows are so dangerous.
Suppose that an attacker knew there was a buffer overflow in the program and also knew where the return address was stored on the stack. They could then set the return address to an address of their choosing causing whatever instructions located there to be blindly executed by the program! This effectively allows the attacker to execute arbitrary code without actually modifying the program itself — yikes!
Your Task
In this assignment, you will “0wn” a binary program called browser
that we will provide to
you. We provide one file, the compiled program. Although we do not provide any source
code files, a version of the program’s source can be viewed (or extracted) from the
compiled binary. All that you know about this program is what is documented here, and
what you can figure out for yourself by running or examining the binary.
The browser
program is a simplified “web browser”. When executed, it prompts you
for a URL, and then prints a short message (the ~#
shown here is the shell prompt within
the CS3410 container):
$ rv
~# qemu browser
Launching for user hw342
Where to connect? www.cs.cornell.edu
hw342 is connected to www.cs.cornell.edu!
You can also send input to browser
from another program using the Unix pipe
operator (|
):
~# echo "www.cs.cornell.edu" | qemu browser
Launching for user hw342
Where to connect?
hw342 is connected to www.cs.cornell.edu!
However, this browser only lets you connect to www.cs.cornell.edu
.
All other URLs will be rejected — try it and see!
As a proud hacker and social media enthusiast, you demand the right to connect to
other, less productive sites, and the fact that browser
was designed to only allow
access to cs.cornell.edu
is unacceptable. Rumor has it that browser
suffers
from a buffer overflow vulnerability. Since the program only takes one input, it’s
not difficult to guess where the problem might lie.
The browser
program has been compiled for RISC-V, which means that you need to run it using the QEMU emulator.
Further, because it makes some use of interactivity and TTY escape sequences, there are
uses of it—in particular, the exploits you will most likely write—that must be done at
an existing shell prompt within the CS3410 container. While you can run it using the rv
alias
(i.e., rv qemu browser
), attempts to redirect input using rv
will most likely fail. There are
ways to do it, but not elegant ones.
Instead, we recommend running all your commands using an interactive shell inside of a CS 3410 infrastructure container.
The easiest way to open one is to use the rv-debug
alias.
0wning browser
Your job is to craft some input to browser that will cause it to print out a different message, specifically: “LOL 0wn3d! <NETID> is on imgur.com!”. Save this input in a file named exploit. When it is fed as input to browser, the compromised behavior of the browser program should be the result:
~# cat exploit | qemu browser
Launching for user hw342
Where to connect?
LOL 0wn3d! hw342 is on imgur.com!
The fact that the normal “Only www.cs.cornell.edu is allowed
”
message is missing constitutes proof that you have completely subverted the
browser, and have gotten it to do something that it could not do before.
You are not allowed to modify or replace the browser
program on disk. The only
way you get to interact with browser
is to feed it some carefully crafted input.
While not trivial to do, attacking browser
is easier than you might think. The trick
is to encode within your input a sequence of bytes that are structured in a way that injects
new code into the browser
program as it is running. To craft an input that will be read
and subsequently give you control of the program, you need to find where the vulnerable
input buffer is stored, how big it is, what the values are that lie near it in memory, and
what precise instruction sequence is vulnerable to a buffer overflow attack.
In a feeble effort to thwart just such attacks, the QEMU VM, like many real machines,
implements stack randomization, a limited kind of program layout randomization. When
the simulator starts, it initializes the stack to a variable address, rather than the
standard 0x7FFFFFFC
. Likewise, Linux also implements ASLR. If you craft your exploit
properly, neither of those things will matter.
The browser
executable also performs a weak form of stack protection. The actual starting
location of the stack is derived from the NETID
environment variable, which should be equal to your
Cornell NetID. We will test your submitted exploit in an environment where
$NETID
is your NetID. Hopefully, you already set that when you made your rv
and rv-debug
aliases/PowerShell
functions! You can check this by running rv env
and seeing whether NETID
is
indeed set to your Cornell NetID.
(Lab) Part 0: Overflow
View the lab slides here.
Before starting the lab, make sure you’ve got your copy of the browser
binary by
following the instructions in Getting Started.
The goal of this lab is to familiarize yourself with the
browser
binary, locate where the buffer overflow is, and finally craft an input which
causes browser
to crash with a segmentation fault. Once you’ve crashed
browser
, you can move on to Part 1.
As stated previously, you are only given the browser
binary for this
assignment. We encourage you to start exploring how browser
works by testing
it on example inputs. Recall that by default, browser
will only allow
connections to www.cs.cornell.edu
and will refuse to connect to any other
website.
It likely won’t be long until you’ve learned all that you can about browser
simply by running it. You’ll need to use standard developer tools to help you
learn more about how browser
works. Luckily, browser
has been built with
source-level symbol and debugging information!
Whoever made browser appears to have embedded the source code within the executable! You can definitely use that to your advantage.
To extract the source code, you’ll need to run browser
through
GDB by following the instructions below. Once you’re inside
a GDB prompt, run (gdb) printf "%s\n", src
. This will open an application
called a “pager” which allows you to page through the source code. You’re also
free to copy the source code into a separate file (e.g., browser.c
). If you do
so, make sure that your text editor doesn’t automatically format the code!
Otherwise, the line numbers that GDB reports won’t match up with line numbers in
the source code that you extracted. VSCode in particular is notorious for this.
Once you’ve found the source of the buffer overflow, you need to use it to crash
the program. One way to easily cause the program to crash is to overwrite the
return address to a restricted area of memory. Note that you don’t need to know
exactly where the return address is in order to get browser
to crash. We
recommend that you run browser
through GDB so you can see where it
crashes.
objdump
objdump
(“object dump”) is a tool to display information about object files
(i.e., machine code). You can use it to give you a listing of the assembly code
for browser
(among many other uses). For example, to see the assembly code of
browser
and all the libraries it uses, run:
$ rv objdump -xdl browser
To save the output of this command to a file, you can redirect the output using
the >
shell operator:
$ rv objdump -xdl browser > browser.s
This becomes very helpful as it includes the disassembly of the standard library, which has functions you’ll need to call later in Part 2.
Example: In the assembly of browser
, you’ll find many labeled blocks
which correspond to included standard library functions. For example, the
following block ultimately calls the strlen()
function from string.h
:
00000000000109d0 <strlen@plt>:
109d0: 00002e17 auipc t3,0x2
109d4: 670e3e03 ld t3,1648(t3) # 13040 <strlen@GLIBC_2.27>
109d8: 000e0367 jalr t1,t3
109dc: 00000013 nop
This block calls the strlen()
function by first loading the address of the
strlen()
function into the t3
register. Elsewhere in the assembly you will
be able to find multiple occurrences of the instruction jal 109d0 <strlen@plt>
. This tells us that to call the strlen()
function we need to
first load our arguments into the appropriate registers according to our calling
conventions and then we need to jump to 0x109d0
.
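The address in the objdump comment can be checked by hand: auipc adds its 20-bit immediate, shifted left by 12, to the address of the auipc instruction itself, and ld then adds a sign-extended 12-bit offset. The Python sketch below redoes that arithmetic from the two instruction words in the listing; the function names are invented here for illustration.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def auipc_ld_target(pc: int, auipc_word: int, ld_word: int) -> int:
    """Address read by an `auipc rd, imm` / `ld rd, off(rd)` pair located at pc."""
    upper = sign_extend(auipc_word >> 12, 20) << 12  # U-type immediate, bits 31:12
    offset = sign_extend(ld_word >> 20, 12)          # I-type immediate, bits 31:20
    return (pc + upper + offset) & 0xFFFFFFFFFFFFFFFF


# Words taken from the strlen@plt listing above.
slot = auipc_ld_target(0x109D0, 0x00002E17, 0x670E3E03)
print(hex(slot))  # 0x13040, matching the "# 13040" comment in the listing
```

The result is the location from which the real strlen() address is loaded, not the address of strlen() itself; the jump target you use in your own code is still the stub at 0x109d0.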
All functions which aren’t prefixed by underscores follow calling
conventions. If you want to see information about function calls to the standard
library (e.g., printf()
, exit()
) that you see in the object dumps, refer to
Section 3 of the Linux man pages.
GDB
Another powerful tool that you can (and should!) use in this assignment is GDB.
Recall from Lab 4 that you can start an interactive GDB session for the program execution by opening two terminal windows within the same CS3410 container image. In one of them, invoke our CS3410 “debugging” container
$ rv-debug
and in the other, open a shell in this same container:
$ docker exec -it `docker ps -f name=testing -q` bash
Next, start the browser
executable for remote GDB in one of the terminal windows:
root@dd70ff2495b5:~# qemu -g 1234 browser
Finally, open gdb using the following commands in the other terminal window:
root@dd70ff2495b5:~# gdb -ex 'target remote localhost:1234' -ex 'set sysroot /opt/riscv/sysroot' browser
You can now debug the program remotely. From Lab 4, you are already familiar with common GDB commands for investigating details of source code-level symbols. In addition to those, you may find some of the following lower-level commands helpful:
- To see the assembly of a single procedure in GDB, you can use disassemble <procedure name>:
(gdb) disassemble main
Dump of assembler code for function main:
<output omitted>
Just typing disassemble without specifying a procedure name will give you the assembly for the program’s entry point, _start.
- To step to the next assembly instruction, you can use stepi (or its abbreviation, si):
(gdb) stepi
- To set a breakpoint at a memory address addr, prefix the address with a *:
(gdb) break *addr
For example, if we wanted to set a breakpoint at the address 0x123456 we would use the following:
(gdb) break *0x123456
- To examine the stack, we can use the sp register. This will give you the first 4 words of memory, starting from sp:
(gdb) x/4x $sp
This will give you the first 10 words starting from sp - 20:
(gdb) x/10x $sp-20
- To list all the registers, you can use:
(gdb) info r
- To print the contents of a specific register, use the following (where <reg> is the register’s name, such as sp, ra, or a0):
(gdb) print $<reg>
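When x prints 4-byte words, as in the x/4x example above, a 64-bit value such as a saved return address shows up as two adjacent words with the low half first, because RISC-V is little-endian. The Python sketch below stitches such a pair back together. The sample line is invented for illustration, and the helper names are ours; GDB's x/2gx format prints 8-byte values directly if you prefer to avoid the arithmetic.

```python
from typing import List


def parse_x_line(line: str) -> List[int]:
    """Parse one line of `x/Nx` output into its word values.

    Expects the form '<address>: <word> <word> ...' and ignores the address.
    """
    _, _, values = line.partition(":")
    return [int(token, 16) for token in values.split()]


def join_words(low_word: int, high_word: int) -> int:
    """Combine two 32-bit words (lower address first) into one 64-bit value."""
    return (high_word << 32) | low_word


# Invented sample output, not taken from a real debugging session.
sample = "0x3ffffff000:\t0x00010a00\t0x00000000\t0x41414141\t0x41414141"
words = parse_x_line(sample)
print(hex(join_words(words[0], words[1])))
print(hex(join_words(words[2], words[3])))
```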
You may find the GDB lab (Lab 4) useful as a refresher. For more information on the GDB x
command, refer to: https://sourceware.org/gdb/onlinedocs/gdb/Memory.html.
Part 1: Overwrite
You should now have an input which exploits the buffer overflow in
browser
to cause it to crash by overwriting the return address with some
garbage value. Next, you’ll refine this input by locating exactly where the
return address is stored on the stack. Once you know where
the return address is stored, you’ll be able to change it to whatever value you
wish. We suggest that you try to change the return address to
0x0000000000000000
.
We strongly recommend writing a script that will build the exploit string for you. You’ll rapidly go through different versions of your exploit as you test it. Having a script that constructs your exploit will likely save you a lot of time, as well as help you document how your exploit works.
Be careful to ensure that any string you build consists of raw byte values in the places where they are needed, not just ASCII characters. For example, in Python, all ordinary string concatenation operations will produce ASCII characters, but you can use other means (e.g., the pack function in Python’s struct module) to convert non-byte values to bytes.
You may use any language you wish to write your script in, if you choose to write one at all. You do not need to submit it, although you can certainly reference it in your exploit writeup.
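As a starting point, such a script might look like the following Python sketch. Everything specific in it is a placeholder: PADDING_LEN is a hypothetical offset you would replace with what you discover in GDB, the target address is simply the all-zero value suggested above, and build_exploit and write_exploit are names invented here.

```python
import struct

PADDING_LEN = 24                     # hypothetical: bytes before the saved ra
TARGET_ADDRESS = 0x0000000000000000  # the value suggested in the text


def build_exploit(padding_len: int, return_address: int) -> bytes:
    """Filler bytes followed by a 64-bit little-endian return address."""
    padding = b"A" * padding_len
    # "<Q" packs an unsigned 64-bit integer in little-endian byte order,
    # producing raw bytes rather than ASCII hex digits.
    return padding + struct.pack("<Q", return_address)


def write_exploit(path: str, payload: bytes) -> None:
    """Write the payload in binary mode so no text encoding is applied."""
    with open(path, "wb") as f:
        f.write(payload)


payload = build_exploit(PADDING_LEN, TARGET_ADDRESS)
print(len(payload), payload.hex())
```

Calling write_exploit("exploit", payload) would then produce a file you can pipe into browser as described below. Keeping the offset and address as named constants also gives you something to point at when you explain the exploit in your README.md.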
Below are some additional tools and tricks that can help you during this part of the attack.
xxd
xxd
is a tool for converting back and forth between
raw binary files and text (ASCII) representations of the binary
data.
Using xxd
in “plain mode”, you can convert ASCII text (interpreted as raw
binary) into ASCII hexadecimal digits. For example,
$ echo "CS 3410" | xxd -p
435320333431300a
xxd
can also go in reverse. For example,
if you create a file exploit.txt
(using a regular text editor)
specifying twenty-eight consecutive “bytes” in hex:
68 77 33 34 32 20
00 00 00 00 00 00 00 00 00
00
01 02 03 04
aa bb cc dd
11 22 33 44
You can convert these bytes, which are currently written as hexadecimal
characters in ASCII, into raw binary using xxd
in “reverse plain” mode:
$ xxd -r -p exploit.txt > exploit
$ ls -l exploit*
-rw-r--r-- 1 hw342 hw342 28 2025-02-25 12:06 exploit
-rw-r--r-- 1 hw342 hw342 84 2025-02-25 12:06 exploit.txt
You can see that the text version is 84 bytes (two hex digits of text per “byte”, plus spaces and newlines), while the raw binary produced in “reverse” mode is exactly 28 bytes. Small formatting slips can silently mess things up (spaces at the ends of lines, for example), so you may want to convert the raw file back to text and compare it to your desired bytes to make sure nothing went wrong:
$ xxd exploit
0000000: 6877 3334 3220 0000 0000 0000 0000 0000 hw342 ..........
0000010: 0102 0304 aabb ccdd 1122 3344 ......... "3D
You can learn more about xxd
by reading its manpage.
When in “reverse plain” mode (xxd -r -p
), xxd
will only convert the ASCII
hexadecimal digits in its input to raw binary. All other characters will be
skipped and won’t appear in the output.
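If you would rather stay inside one script, the behavior just described (keep the hex digits, skip everything else) is easy to mimic in Python. This is a sketch of that description, not a reimplementation of xxd: the name reverse_plain is ours, and real xxd may treat edge cases such as an odd number of digits differently.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def reverse_plain(text: str) -> bytes:
    """Keep only hex digits from text and convert each pair to one raw byte."""
    digits = "".join(ch for ch in text if ch in HEX_DIGITS)
    if len(digits) % 2:
        raise ValueError("odd number of hex digits")
    return bytes.fromhex(digits)


# The same 28 bytes as the exploit.txt example above.
EXAMPLE = """68 77 33 34 32 20
00 00 00 00 00 00 00 00 00
00
01 02 03 04
aa bb cc dd
11 22 33 44
"""

raw = reverse_plain(EXAMPLE)
print(len(EXAMPLE), len(raw))  # 84 28
print(raw[:6])                 # b'hw342 '
```

Note that this sketch treats every letter from a to f as a hex digit, so any words you add to the text as comments would be picked up as data.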
Pipelines and Redirections
The shell command line pipe operator (|
) allows you to
connect the output of one program (e.g., cat
or xxd
) to the input of
another. For example, you can call
~# cat exploit | qemu browser
to pass the contents of the exploit
file as the input to qemu browser
.
You can also redirect the output of one program to a file, overwriting it if
it previously existed, using the redirect operator (>
). For example, the following command writes the string
Hi!
to a file hi.txt
:
~# echo "Hi!" > hi.txt
The append operator (>>
) does nearly the same thing, except that it doesn’t
overwrite the file and instead appends its input to the end of the given file.
You can also chain multiple shell operators together to form a pipeline. For
example, the following uses cat
to feed the contents of exploit.txt
as
input to xxd -r -p
, sends the resulting raw binary
to the browser
binary, and finishes by writing the output of browser
to
output.txt
, all with one command:
~# cat exploit.txt | xxd -r -p | qemu browser > output.txt
Part 2: Execute the Payload
You’re nearly there! Now you’ll finish your exploit by injecting into the
browser
program
some RISC-V assembly code to cause it to print “LOL 0wn3d! <netid> is on imgur.com!
” and then exit gracefully with a return code of 0.
Early forms of buffer overflow attacks were made easier by the ability to modify an executable’s instructions directly. By default, Linux makes a program’s text and data sections read-only, so any attempt to modify the program’s instructions while it is running will cause a segmentation fault. You’ll need to find another way.
asbin
The asbin
script inside of the CS 3410 infrastructure container is a
convenient way to assemble RISC-V assembly instructions.
For example, we can use asbin
to turn the RISC-V assembly instructions within
payload.s
into machine code:
$ rv asbin payload.s
This will create a file payload.bin
containing the binary encoded instructions
in your current working directory.
You can also use the equivalent shell incantation (assuming you’re already within the CS 3410 infrastructure container):
~# as payload.s -o tmp.o && objcopy tmp.o -O binary payload.bin && rm tmp.o
Epilogue
We’re here to help! Start early and take advantage of our office hours if you get stuck. Also, see the FAQ!
For an entertaining (and a somewhat dated) read on buffer overflow attacks, check out:
Aleph One. “Smashing the Stack for Fun and Profit”. Phrack Magazine, 7(49), November 1996. https://phrack.org/issues/49/14#article
And finally, to reiterate: a friendly hacking challenge can be fun, and hacking skills are invaluable for working with real systems. But you must be responsible for your own behavior. We are not giving you free rein to launch attacks on CMS, fellow students’ machines, or anything else. Such behavior is unethical and most likely illegal as well.
FAQ
ECALLS and Other Instructions
You may see ECALL and other RISC-V instructions in the object dump. ECALL is an assembly instruction used to make a system call to the OS. You can refer to the RISC-V manual for further explanation of instructions, but don’t worry too much about understanding every instruction.
You need the newlines!
Yes, you need the newlines both before and after the “LOL 0wn3d!
” message. Of
course, getting the message in the first place is worth the most points, but the
newlines will get you those final few points. So, an exploit that looks like this:
~# cat exploit | qemu browser
Where to connect?
LOL 0wn3d! hw342 is on imgur.com!
… is preferable to an exploit that looks like this:
~# cat exploit | qemu browser
Where to connect? LOL 0wn3d! hw342 is on imgur.com!
As you may have discovered, you can’t simply embed a newline or carriage return in the message, because the browser stops reading when it encounters these characters. Something more clever is called for here.
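Because of this, it can be worth scanning a candidate input for these bytes before you test it. The Python sketch below reports where newline (0x0a) and carriage return (0x0d) bytes occur; find_terminators is a name invented here, and whether a given occurrence matters depends on where it falls in your input.

```python
from typing import List, Tuple

TERMINATORS = {0x0A: "newline", 0x0D: "carriage return"}


def find_terminators(payload: bytes) -> List[Tuple[int, str]]:
    """Return (offset, name) for every newline or carriage return byte."""
    return [
        (offset, TERMINATORS[byte])
        for offset, byte in enumerate(payload)
        if byte in TERMINATORS
    ]


# Small made-up input, just to show the report format.
sample = b"ab\ncd\r"
for offset, name in find_terminators(sample):
    print(f"offset {offset}: {name}")
```

Remember that these byte values can also appear inside packed addresses or assembled instructions, not only in the message text.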
Aha! I found this handy vertical tab (0x0b
) character! I can just use that instead of a newline, right?
No, a vertical tab is not a newline. You must embed a newline into the message.
Why does calling printf in my exploit print garbage?
Because of the nature of the exploit, we may end up ruining the value of the stack pointer.
We need to set our sp
and fp
to be valid stack values so that function calls still work
nicely.
Why are some instructions only 16 bits wide?
Some instructions in the browser binary use the 16-bit compressed RISC-V encoding. This shouldn’t affect your solution.
Why does it fail to connect to my program when I run gdb and try to connect to localhost?
This might be because you did not open your second shell in the same Docker container, which will happen if you use rv instead of rv-debug to start the container. Check the first two steps in the GDB section above, and make sure you didn’t mistype anything. You can verify that the shell prompts in the two windows are in the same container instance by looking at the full text of the prompt, “root@<container_id>:~#”. The <container_id> value for both prompts should match.
Docker gives an error when I try to launch the container with rv-debug
, saying there is a conflict.
This could result from having an rv-debug instance running in a different tab, or an older container instance may have failed to tear down, even though it is no longer accessible (among other ways, this can happen with some uses of remote GDB). Check to see if you have another instance of the container running, and if so, use docker stop to kill it.
$ rv-debug
docker: Error response from daemon: Conflict. The container name "/testing" is already in use by container "d56938529b09ec020c69431d49ecc08a0f3043df26df684e125e92eb4b3f78ab". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d56938529b09 ghcr.io/sampsyo/cs3410-infra "/bin/bash" 3 minutes ago Up 3 minutes testing
$ docker stop testing
testing
$ rv-debug
root@45efb0b3855a:~#
I’m getting “broken pipe” and/or “the input device is not a TTY” errors when I try to pipe my exploit to browser
’s input.
This will happen if you try to run browser
with your exploit using rv
or rv-debug
in a single command, instead of first invoking a shell prompt in the container, because
you’re connecting the stdout
of one command with the stdin
of browser
, and there
isn’t a way to invoke rv
/rv-debug
on both with the same container image. Possible
forms of this error may look like one of the following:
$ rv cat exploit | qemu browser
bash: qemu: command not found
write /dev/stdout: broken pipe
$ cat exploit | rv qemu browser
the input device is not a TTY
$ rv cat exploit | rv qemu browser
the input device is not a TTY
write /dev/stdout: broken pipe
Submission
Submit all the files listed in Submission Requirements to Gradescope.
Rubric
- exploit: 72 points
- README.md: 28 points
Exploits that make system calls directly (i.e., using ecall
) will receive no
credit.