Reminder: you must work in a group of two for this project. You don't need to have the same partner as the earlier projects.
This assignment will help you and your partner further hone your
skills as elite programmers. Your goal is to program a multicore Twitter bot who will race against an opponent to write as many Tweets as possible. This competition simultaneously tests your knowledge of computer systems and efficiency, your
programming skills, and your ability to protect programs from being
hacked.
At the end of the project we will hold a
tournament pitting all your Twitter bots against each other for the 3410 cup, and the top bots
will receive not only bragging rights but actual bonus points on the
assignment. Plus we will be holding a socialist social gathering (read: pizza
and soda) for all while we watch the tournament!
As the midterm elections approach, you begin to worry the results will not be in your favor. The pro-forwarding RePipelincan party is looking to flip-flop the Senate in their favor. You and your comrade, strong pro-stallin' DeMoorecrats, decide something mux be done.
Being the excellent programmers you are, you decide to code up a Twitter bot to overflow the internet with pro-stallin' propaganda hoping other users will latch onto your ideals. To have the most significant impact, your bot will be composed of four cores and will be rushin' to write tweets as fast as possible.
However, you are not the only patriots hoping to evict the RePipelincan's from the Senate. Your Twitter bot will be racing the clock while also competing against dozens of other bots in a race to fill the internet. Only one can be the best. You and your comrade's prize, if successful, will not only be to put your marx on history, but also gain fame and immediate glory.
What to submit
the source code in C, with comments, for your Twitter bot
a Makefile so we can build your code - the first additional "FLAGS_" line in the Makefile will be the one used to
determine what flags to compile your bot with and what your bot will be named for the tournament, be creative - Putin some effort!
Game Overview
The game is played by two bots, each with four cores. All eight
cores are in a single shared memory space, running on a
simulated 8-core MIPS machine using a small, partitioned,
direct-mapped data memory cache. To simulate 8 MIPS CPUs simultaneously, one
instruction of core 0 is executed, then one instruction of core 1, and so
forth.
Each bot is assigned a Tweet to write, called a payload, in this case it is a particular byte
of data. The goal of the game is to fill as much memory (any section - yours or your opponents') with your bot's
payload, while simultaneously trying to impede the progress of the
other bot. All access to data memory goes through a very small cache,
and there is a large penalty for missing the cache, so a clever Twitter bot
will need to carefully optimize its algorithms for filling
memory (writing Tweets). Each core can normally only read and write its own team's half
of the memory space using its own half of the data cache. However, a
core can enable_proxy to set up a VPN to spoof the origin of your tweets in order to enter the opponent's memory, which enables the core to
take advantage and read and write the opponent's half of memory space and data cache.
The Twitter bots has several resources at their disposal to aid
in their mission. Each bot can use any of its four cores to DDoS the other bot's Twitter account, this denial of service
will cause the opposing bot's cores to stall for a short
period of time as they try to refresh the page. A core can also write FAKE_NEWS tweets to different memory locations-
these tweets, if encountered by a core of
either Twitter bot, will entice the core to read an irresistible click-bait article rendering the core paralyzed for a long period of
time, unable to do anything.
Each Twitter bot supplies a single MIPS executable program. At the
beginning of the game, the simulator starts each by calling the
function
void __start(int core_id, int num_crashed, unsigned char payload);
Each of the four cores devoted to one bot will invoke the function
when it first boots, passing its core_id (a number from 0 to 3), and
the bot's payload (from 0 to 239). If a core ever crashes, it
will simply reset and invoke the function again.
The num_crashes parameter is 0 when a core first boots, and
is incremented each time the core crashes.
A core can enable_proxy to infiltrate the opponent's memory
space by calling
void enable_proxy();
and can retreat back to its own memory space by calling
void retreat();
Invoking enable_proxy can potentially let a Twitter bot write more tweets and fill memory
more quickly, because it can take advantage of the opponent's memory
cache, and can also be used to impede the progress of the opponent by
interfering with their use of that same cache. However, these tactics
come with risks. Whenever a core is in enable_proxy mode, its core_id
is written into a taunt[] array that is available to its
opponent. Additionally, it is possible to detect the presence of a snooper by looking at cache usage statistics and timing
information. A core can accuse one of its opponent's cores
of enabling a proxy to tweet by calling:
int report_to_twitter(int opponent_core_id);
If an opponent's core is caught by Twitter as attempting to hide their location through proxy, Twitter will permanently ban that core's IP address for the rest
of the election season in attempt to keep the elections free and fair (aka, essentially deactivating the core for the rest of the game).
If a core attempts to report_to_twitter a core that is not
in proxy mode, the accusing core stalls for
100,000 (STALL_ON_SPAM) cycles as Twitter temporary bans your core for making false reports.
A core can write a FAKE_NEWS tweet by writing 0xF0 to a
location in memory. FAKE_NEWS tweets are set up immediately once 0xF0 is
written to a memory address, but due to the intense amount of effort it take to write an entirely fabricated article, the core will rest for 100
(STALL_TO_WRITE_FAKE_NEWS) cycles before executing the next
instruction. FAKE_NEWS tweets cause stalling when any core attempts to
write to a memory location that currently holds the trap byte (even cores of the Twitter bot who wrote them), as you attempt to figure out the truth. Such an
attempt causes the write to fail and the writing core to be stalled
for 1,000,000 (STALL_ON_FAKE_NEWS_CONFUSION) cycles. Since the write
fails, multiple cores can be stalled by the same FAKE_NEWS article. FAKE_NEWS tweets are not triggered by reading memory, as you skim over the clearly fake headline. Due to
the effort required to come up with these lies, each Twitter bot can set
up at most 15 (FAKE_NEWS_PER_TEAM) FAKE_NEWS tweets.
The upper 4 MB (4 * STACK_SIZE) of memory is for verified twitter accounts and is protected from FAKE_NEWS.
A core can neutralize a FAKE_NEWS tweet by proving the story to be false. It does so by writing 0xF2 to the FAKE_NEWS memory location. However, it takes some time to verify your sources, so it requires 100
(STALL_TO_CORRECT_FAKE_NEWS) cycles. This will not
retroactively un-stall any core that has already fallen victim to a FAKE_NEWS tweet
at that location, but will of course save you from being bamboozled there in the future.
Twitter bots can also temporarily take down their opponents by using a
DDoS attack. This is done by writing the byte 0xF1 into a
location in memory. When the DDoS is cast in one half of
memory, any opponent cores who try to write to that half of memory
during the next 100 (DOWNTIME_DURATION) cycles will be
denied service for 300 (STALL_WHEN_DDoSSED) cycles. Since it takes
so much bandwidth to launch a distributed denial of service attack, each bot's cores can make at most 30 (DDoS_PER_TEAM) of these attacks.
The race ends when one Twitter bot manages to fill the equivalent of half
of its data memory with its payload tweet, or after a fixed number of
cycles, whichever comes first. The Twitter bot having the most memory filled
with its payload wins the match and the rights to proceed in
the Cache Collusion tournament. But beware...for if
you do win the tournament, the FBI awaits.
Game Details: Memory mapping and layout
The simulator makes it seem like all four of your bot's cores live
in the lower half of the memory address space (addresses below
0x80000000). The first 1 MB of this memory is read-only, and is where
the text segment of your MIPS executable is loaded. The next 31 MB is
read-write memory, where global variables and stacks are
placed. Global variables are typically placed near the bottom of this
segment by the compiler, and the stacks are initialized to near the
top of the segment and grow down. All four of the cores of a bot
share this same space, and share the same global variables. Your
opponent is mapped into the upper half of the memory address space
(addresses starting at 0x80000000), with an identical layout.
Program stacks: The four program stacks for the four cores
on a team are initially spaced evenly near the top of the 31 MB
read-write segment (though your code can set $sp to anything you like
once it starts executing). The stacks are spaced 1 MB apart; this is
plenty of space for most programs, but if you invoke a deeply recursive
function, the call stacks can overwrite each other.
Status and cores: The simulator keeps track of the
information about each team and each core, and maps this data into
part of the 1 MB read-only memory region, where your bot can access
it, and where your opponent can access it when it is
in enable_proxy mode. This includes information about your
current score (how many bytes of memory currently contain your
team's payload), how many cycles are left remaining in the
game, the contents of all four of your core's register files, and
statistics about cache misses and stalls for each of your four
cores. Some of this information is available in other places as
well: your team's payload is passed to __start(), for
example; and cache statistics are also accessible via a system
call.
Physical addresses: Both bots view the lower half of
memory as their own, and both access their own code, data, and stacks
using addresses below 0x80000000. These, of course, are virtual
addresses. In actuality, the first player's virtual addresses
correspond directly to physical memory addresses, while the second
player's virtual addresses are mapped in reverse: virtual address
0x00001234, for example, is mapped to physical addresses 0x80001234,
and virtual address 0x80001234 is mapped to physical address
0x00001234. This virtual to physical mapping is completely transparent
to the program.
Caches: Instruction fetch is assumed to use an infinitely
large cache with no miss penalty. All loads and stores to memory done
by the program use a data cache. Cache hits are serviced in a single
cycle, with no processor stalls. Cache misses cause roughly 100 cycles
of processor stalling. The data cache is direct-mapped and physically
tagged, with 4 cache lines, and a 256 byte block size, for a total of
1024 bytes of cache. However, the data cache is
partitioned, with the first 2 lines used to cache the lower half of
physical memory (the first player's half), and the last 2 lines used to cache
the upper half of physical memory (the second player's half).
Cache Management: The version of MIPS we are using (MIPS-I)
lacks instructions to manage the cache, but we have provided system
calls instead. A bot can call
void invalidate(void *ptr);
to remove from the cache any block containing the data pointed to
by ptr. This only works if the bot has access to the
memory in question at the time the call is made. Similarly, when a
bot calls
void prefetch(void *ptr);
the cache will start to fetch the block containing the given
address into the cache. The call returns immediately with no stalling,
and roughly 100 cycles later the data will arrive in the cache and be
available for use. This call only works if the bot actually has
access to the specified block of memory. For each cache line, there
can be at most one outstanding fetch in progress. Prefetch requests
are ignored if a fetch is already in progress for that line (in this
context, "fetch" means either an earlier prefetch, or a core waiting
on the cache line to fill because of a cache miss), and actual data
fetch requests (e.g. from load/store instructions, rather than
prefetch requests) supersede any prefetch requests that are in
progress. There are additional system calls to read the current cache
tags and the tags for lines that are being fetched. This same
information is also available in the memory mapped
data-structures.
We have provided a header file (described below) containing various
useful macros for accessing your own code and data segments, and those
of your opponent. The header also describes in detail the memory
layout, cache organization, and available system calls.
Pointer arithmetic in C: When performing pointer
arithmetic in C, the compiler implicitly multiplies by the size of the
data type that is pointed to. For instance:
char *byteptr = (char *)0x1000;
byteptr[3] = 0xFF; // store one byte at memory three BYTES after ptr, so address 0x1000+3 = 0x1003
byteptr += 5; // increments pointer by five BYTES, to address 0x1005
int *wordptr = (int *)0x1000;
wordptr[3] = 0xFFFFFFFF; // store one word at memory three WORDS after ptr, so address 0x1000+12 = 0x100c
wordptr += 5; // increments pointer by five WORDS, to address 0x1000+20 = 0x1014
From this, we can see that the following code is almost certainly a
bug (two bugs, in fact), since the first statement advances by 12
cache lines instead of 3 cache lines, and the second statement
advances by 8 cache lines instead of 2 cache lines:
int *ptr = (int *)HOME_DATA_START + 3*CACHE_LINE;
ptr += 2*CACHE_LINE;
To advance by 3 cache lines (first statement) or 2 cache lines
(second statement) when using integer pointers, one could do:
int *ptr = (int *)HOME_DATA_START + (3*CACHE_LINE)/4;
ptr += (2*CACHE_LINE)/4;
Basic Strategy
Design code that makes efficient use of the data cache to fill
memory quickly, and find clever ways to beat your opponents, stop them
from posting their payload, and interfere with your opponent's
cache usage. And although you cannot directly modify your opponent's
registers, you can modify their call stack and global
variables. Coupled with what you know about exploiting programs, you
may even be able to get your opponent to work for you. At the same
time, you need to be on the lookout for cores from your opponent's
team. Because all of these tasks require processing time and access to
memory, you will need to carefully plan how to coordinate use of these
scarce resources.
To get started, we will supply several example teams that employ
some basic strategies. Their source code can be found in the same
directory as the Cache Collusion simulator. The example teams
are:
noop: The simplest bot there is; it does nothing.
greedy: Fills its own half of memory and the opponent's
half using a simple loop with a random start address.
taunter: Doesn't try to win, but instead dedicates all its
cores to annoying the opponent by (a) reading and writing
randomly within opponent's half of memory in order to fill
opponent's cache with garbage and (b) overwriting return
addresses on the opponent's stack with random values.
fbi: Dedicates one core to watching for opponent's
cores, with the other three cooperating to filling its own half
of memory. Catches greedy.
titfortat: Initially behaves like fbi. But whenever it
catches a opponent's core, it goes on the offensive and tries
to fill up opponent's half of memory. This requires some
inter-core communication through global variables.
Note: these teams do not do a particularly good job making effective use of
the cache. For instance, most write a single byte at a time to memory, and take
only a little care to ensure that their bots don't compete with each other for
cache lines.
Playing the Game
All of these instructions must be done on the VM or through SSH. For
this project, just as Buffer Overflow, it is particularly
important the work is done through either of these means as system behavior may be different on different systems
git pull from your repository to get the project 5
codebase.
Inside the p5 directory are two executables, cachecollusion and cachevall.sh,
and a directory twitter_bots/. Within twitter_bots/ should be a Makefile
and the folder src/. Within src/, you will
find sample course bots that are described above. You will code
your own bot's tactic within this folder.
Each team should create a single program, written in C and compiled
into a MIPS binary executable. The four cores on a team all share the
same code (though your program can obviously do different things
depending on the core_id parameter it receives from the
simulator). Going up from the src/ directory
(cd ..), and typing make will compile all
the bots within the src/ directory.
**Important!** By default, your new bot will not be optimized
by the compiler. You can (and should) tell the Makefile to compile your
bot with optimizations enabled. We highly recommend using the -O3 optimization
flag (the letter "O", not the number "0"). For example, if my bot is
named XYZ, I would add the following line to the Makefile.
FLAGS_XYZ = -O3
We have already included these optimizations in the Makefile for the course bots,
so you can look at those to make sure you're doing it correctly
In the main directory you will also find cachecollusion, the
simulator used to play the game. The simulator cachecollusion
takes two arguments, the names of the teams in the match, for
example:
You will also find simple shell script cachevall.sh which
takes the name of a single team and pits it against every other team
in turn. This lets you compare how one team does in comparison to all
the others:
Compilers don't always produce the code you expect them to, so you
will likely want to read the compiled assembly. As you have seen
before, you can decompile compiled code using objdump:
Finally you have two tools at your disposition. First, you will
want to read the cachecollusion.h header file in
the twitter_bots/src directory. It is the only header file you
should import into your bot programs, besides any you might write
yourself, but it is designed to be all you will need. It contains the
basic functions enable_proxy , report_to_twitter,
retreat and so on, along with various other system calls
like printf and rand. But more importantly, its
comment describes in detail how your program is laid out in memory and
how simulator statistics and other data is mapped into your address
space.
Your second major tool is the cachecollusion simulator which
has several useful command line options. Type ./cachecollusion
with no arguments to see a list.
Getting Started
As with many other 3410 projects, this writeup is quite long and a bit
intimidating at first. However, you don't necessarily need to even
know everything in this writeup to do well on this project (though if you're
aiming to win the 3410 elections, you probably will).
Probably the best place to get started is to look at the greedy bot. Make
sure you understand what it's doing. Generate the MIPS assembly code and check
out what that does. As described above, it's only writing 1 byte at a time - how
could you improve that?
Next, make sure you understand how the cache works, ignoring prefetch and invalidate
for now. You have 4 bots to use - what would be the most efficient method to write such that
you take full advantage of the cache lines?
Now dive into prefetch - how can you use this to reduce the amount of time your cores will
be sitting around doing nothing just waiting to be able to write?
Now start developing a strategy. In no particular order, how can you best play defense
and stop the opponent's cores from writing to your memory? What's the best strategy to write to
your opponents memory? How can you best utilize the tweets available to you? What other clever
techniques can you find to come out on top?
Finally, have fun with this project! There's lots of techniques that can be successful, so play
around and get creative, and we'll see you on election night!
FAQ
If I write a FAKE_NEWS on my opponent's side and I activate it (write to that location in memory), what will happen?
You will be stalled for 1,000,000 cycles.
If I launch a DDoS attack and within 100 cycles I launch another one, do the durations overlap or accumulate?
The durations will accumulate, so the duration will increase by 100 cycles.
I want to access my own stack, or my opponent's stack. How do I know where it is?
To start off, for each core there is a mips_core_data structure which contains an array R. This array stores the content of the registers for that core. Work from there.
How do we determine what addresses the opponent currently has in the
cache?
Use the rdctag function to read a cache tag. Check titfortat.c for an
example of the proper usage.
What is the policy for coherence between the cache and memory?
The cache is write-allocate: If you want to write to an address, it
must be in the cache first (and therefore you suffer a cache miss if it
wasn't in there already).
The cache is write-through: Once a line is in the cache, any writes to it
immediately appear in memory (and therefore immediately increase your
score).
Finally, the cache automatically updates from memory: In particular, if you
have the taunt array in cache, and a new bot enters enable_proxy mode, the taunt
array in cache updates immediately.
How should we time our prefetches? How does prefetching work, anyway?
It's possible to prefetch too early, such that the line you prefetch
arrives in the cache while you're still working on a different line. In
this case, you'll doubly penalize yourself, as you'll suffer a miss for
the line you're currently accessing, and then miss on the line you meant
to prefetch in the first place. Oops! Don't do that.
If you prefetch slightly late, you only lose the difference. If address
0x01000000 is not in the cache, but you prefetch it, wait 50 cycles, then try
to write to/read from it, you'll still miss but only stall 50 cycles instead of
the usual 100, since 50 of the cycles were alerady taken up by the prefetch.
This is why prefetching is beneficial. If you run the simulator in "very
verbose mode" via simulate -vvvv, you'll see messages like "upgrades existing
prefetch", and this is the meaning of that message.
On the other hand, if address 0x01000400 is in the cache, you prefetch
0x01000200, then try to read/write 0x01000000, the prefetch is canceled, You
miss 0x01000000, and stall the full 100 cycles, no matter how far along the
prefetch was.
Can I call prefetch, but continue working on the line currently in the cache?
Of course. prefetch(ptr) does not automatically call invalidate(ptr); that's why they're separate function calls. The line currently in the cache is still valid until the prefetch finishes.
Since only one prefetch can be in progress for a line at a time, what happens if we call prefetch twice?
The earlier one sticks. The writeup states "prefetch requests are ignored if a fetch is already in progress for that line", and in this context "fetch" means either an earlier prefetch or a core waiting on the cache line to fill due to a cache miss.
What activates a FAKE_NEWS, reading or writing?
FAKE_NEWS tweets are triggered when any core attempts to write to a memory location that currently holds the trap byte.
Such an attempt causes the write to fail and the writing core to be stalled for 1,000,000 (STALL_ON_FAKE_NEWS_CONFUSION) cycles.
Since the write fails, multiple cores can be trapped at the same memory location.
FAKE_NEWS tweets are not triggered by reading memory.
Examples
Here are some visualizations of some of the example bots competing against each other.
You can run the GUI yourself on the course VM! To run the GUI, first run sudo apt-get update then sudo apt-get install default-jre and then replace ./cachecollusion with ./gui-cachecollusion with the same arguments.
Grading
Your grade for this assignment will have three parts.
50 points for designing a bot that beats all of the
example teams above every time it plays them.
50 points for designing a bot that beats several secret
course bots. These course bots will use several different
techniques and be of varying difficulty. You should program
defensively for various strategies that you think may arise. If
you beat at least one of the secret course bots, you are
guaranteed at least 20 points out of the 50.
Bonus!! for designing one of the top bots in the 3410
election. The tournament will pair your team against others in
series. The overall elimination strategy will be single elimination, all or nothing, winner take all.