CS 414/415 Programming Project 4
Emin Gün Sirer
Overview
Projects 1 through 3 covered many of the essential aspects of systems
programming. The goal of project 4 is to allow you to showcase what
you have learned in 414/415 by building an interesting system component.
There is no set assignment. You are free to pick any one of the problems
listed below, or to persue other projects that
There are very few ground rules for project 4. You may develop in any
language you like, including C, Java, assembler, or whatever other
language you choose. You may build on minithreads, or you may develop
code that runs on top of Linux, Windows NT, CE, or bare hardware. You
may even integrate other people's code into your projects, provided
that the part of the project you write is substantial and embodies
an interesting systems concept. In fact, we encourage you to save work
whereever possible by reusing code components such as GUI toolkits,
image manipulation libraries, and regular expression packages.
Project Options
-
Idea:
Peer-to-peer messaging application: Implement DSR,
AODV or any other
ad-hoc routing protocol to ferry messages from computer to computer
without resorting to a central host.
Implementation:
If you do this on top of minithreads, it will be easy to port to
PDAs. You will have to develop some test-bed software
to make a cluster of workstations behave like an ad-hoc network -
luckily, this only involves restricting each node to talk to only
a subset of the other nodes in the network. Make sure that this ad-hoc
network simulation component is a clean, separable module, as we might
want to use it in future classes on its own.
Demo:
N students sit at N machines, a student sends an ICQ-like message to
another student, it gets routed automatically through M machines to
get to its destination. Extra points for a GUI for sending messages,
and for displaying incoming messages in popup windows.
Variants:
DSR, AODV, ZRP, etc. Image files instead of plain text messages.
-
Idea:
Ordered communication subsystem. Implement a chat system
where N nodes participate in a shared, globally broadcast
conversation. Ensure that if person A sends message 23 to the group in
response to message 57 from person B, all parties to the conversation
observe message 23 from A AFTER message 57 from B.
Implementation:
You will undoubtedly want to use a causally-ordered broadcast
primitive.
These papers from the Horus and Isis projects describe some alternatives.
Demo:
A lot like the peer-to-peer messaging demo, except no one can observe
messages out of order.
-
Idea:
Peer-to-peer filesharing. Implement Freenet, as described in class and
in their papers.
Implementation:
You will probably need a software package that implements the
DES,
MD5 or
SHA algorithms.
Demo:
I ask for a Metallica album that you have made available, my machine
finds it, decrypts it and caches it for future use (for this demo, we
will purchase two copies of a Metallica album).
-
Idea:
Distributed event notification system. There are N hosts arranged in
an arbitrary network, with clients attached. Clients can register to
be notified when an event occurs, and they can raise events. When a
client generates an event, e.g. "X happened", everyone who registered
to be notified of the X event gets a message.
Implementation:
The trick is to do it first by using dumb flooding, and then to move
to an intelligent event-filtering scheme that will scale. Note that the
excitement level of this project is closely indexed to how expressive
a language you provide for event handlers. Here are some definitions
you will need:
typedef struct EventHandler EventHandler;
typedef struct Predicate Predicate;
typedef enum Op Op;
enum Op { EQUAL, LE, LT, GE, GT };
struct Predicate {
Op op; // comparison operator
int offset; // offset of byte in event array
int value; // value to compare to
};
struct EventHandler {
Predicate p;
Predicate *next;
};
void raiseevent(char *eventarray, int len) {} // event can be anything
void registereventhandler(EventHandler eh) {}
An event handler interprets predicates specified according to the
scheme above, e.g. "first byte of the event should be equal to "M",
second byte to "S", third to "F", fourth to "T" and the fifth byte
should be greater than 100.
Demo:
"Let me know when YAHOO stock goes above 80," or "let me know when
someone is sharing Metallica songs."
-
Idea:
Query caching. A wireless host has some information that satisfies a
predicate cached. When another host in the vicinity needs info that
satisfies the same predicate, its query gets routed to the host that
has that info cached.
Implementation:
This is a lot like peer-to-peer filesharing. The clever bit is the
representation of the predicate for easy lookup. Start with a complete
text match, and then support regular expression matching on strings,
or perhaps string tuples.
Demo:
Student A says that his machine knows the answer to "Where are the
Starbucks stores in upstate NY ?" and provides a web page with
answers. Student B asks the question "Where are the
Starbucks stores in upstate NY ?" and gets the web page from student
A.
-
Idea:
Intrusion detection tool. Build a tool that examines the pattern of
system calls emitted by known-to-be-uncompromised applications and
extracts a pattern. Once the pattern has been extracted, the tool
runs in a security checking mode, and flags any deviation from the
pattern as a possible security compromise.
Implementation:
The easy way to do this is to take the trace as a textual input, or
else to layer the tool on top of strace, the Unix system call tracing
utility.
There are many variants of algorithms for extracting patterns.
If you are interested, we'll provide a set of papers - you can pick
one and implement just one simple scheme.
Demo:
You run your tool with a "good" trace followed by a "bad" trace, and
your tool flags that there may be a compromise. Extra points if the
false positive rate is low.
-
Idea:
Authentication and payment scheme. Assume that the big NT workstation
could be made as small as a PDA, be physically secure, and could carry
1G of data. You could carry them anywhere but they will not be
constantly connected. Come up with a scheme by which you could pay
someone without having to communicate with a bank.
Implementation:
These two papers by Doug Tygar et al. provide a survey and a possible
implementation.
Demo:
You issue some fake cash to machine A, it pays B, B returns the cash
to the bank at some point and gets credited. If A spends the same
money at C's store, A is caught when B and C both check in with the
bank.
-
Idea:
Secret sharing. A text file is partitioned between N hosts, and
can be decrypted when N/2+1 of those hosts get together.
Implementation:
Need to ensure that N/2+1 is both necessary and sufficient.
Look in Applied Cryptography by Bruce Schneier for different
algorithms. The simplest implementation relies on creating a linear
system of N/2+1 unknowns, which is solvable by any majority of the
secret holders. You can use matlab for solving linear systems of equations
easily. Or you can use the
GNU bignum package.
Demo:
A text file is encrypted and partitioned between N hosts. N/2+1 get
together and read the file.
-
Idea:
Leader election. N hosts agree that host A is their leader. If
host A dies, they detect who is up and elect one as a leader. If
they get partitioned into smaller groups, the group that has N/2+1
members elects a leader, all the rest sit idly until reconnected.
Implementation:
Electing leaders is often an important subcomponent of distributed
algorithms. Can assume that each host has a unique number.
Demo:
Run N copies of your system, kill off N/2-2, see that they elect
a new leader.
-
Idea:
Software RAID. The idea is to build a filesystem that computes
and stores redundant information about a file such that it can
recover from errors.
Implementation:
You can build your RAID filesystem on top of a regular
filesystem. Simply partition file A into files A0, A1, A2, etc.
Build this component so that it can be mounted as an NFS filesystem on
a Unix machine. Look into one of the user-level NFS server
implementations on the web, throw everything out and replace with your
code.
Demo:
We store a file in your system, delete one of the underlying storage
units (i.e. simulate disk crash), and check if we can still read the
file.
Variants:
The higher the RAID level, the better the project implementation.
-
Idea:
Automatic caching and prefetching. Write a web proxy, and prefetch
files that you see used frequently or periodically.
Implementation:
A web proxy is easy to write in Java, but likely to be slow. Use any
learning algorithm to manage the cache on the proxy. A possible
problem is that the percentage of documents marked cacheable may be
small. In that case, you can go ahead and cache documents marked
uncacheable - you can then detect two consecutive requests for the
same document from the same client, and take that as an indication
that the cached copy is out of date.
Demo:
We set our web browser to use your proxy instead of going to the
network directly, and see a performance improvement.
-
Idea:
Quality of service and multimedia. Implement a protocol for
transferring multimedia files over lossy networks with changing
bandwidth and congestion parameters. A simple way to do this
is to structure each file transfer as N (where N is a small number
like 4) parallel transfers. Connection 1 transfers the important,
coarse grain images, connection 2 has more detail, conn 3 carries even
more detail, and so forth. In response to packet loss in connection M,
reduce bandwidth consumption at levels >= M, so you give up on the
details but the coarse image still makes it through
Implementation:
You can use the
Intel
JPEG library
or the
JPEG library from the Independent JPEG Group.
for encoding images with different levels of detail. In you build on top of minimessages, you will be able to detect any loss on any
connection, and thus react appropriately.
Demo:
Send and display a picture, while we vary the network speed and
congestion. Use video (and an MPEG recoder) instead of still images
for a far more impressive demo.
-
Idea:
Process migration. Migrate a minithread from one host to another.
Implementation:
Assume that the minithread has no files open or other system resources
in use at the time of migration.
Demo:
Computation moves from host to host.
-
Idea:
Measure and compare the performance of different IPC mechanisms, e.g.
pipes, FIFOs, message queues, shared memory, provided by an OS of your
choice.
Implementation:
Under Unix or Windows. Need to be complete.
Demo:
A set of graphs.
-
Idea:
Implement and test the performance of different scheduling algorithms
(RR, FIFO, SJF, priority scheduling, multilevel feedback queue, ...) in minithreads.
Implementation:
Need to implement the algorithms as well as an interesting test case
that behaves differently and showcases each algorithm.
Demo:
-
Idea:
Deadlock detection. Add mutexes and condition
variables to minithreads. Implement deadlock detection.
Implementation:
Demo:
We feed some bad minithreads code into your system, your code detects
the deadlock problem.
-
Idea:
Implement a "virtual file system" for the minithreads package over a
single sequential file. Normal file read/writes should be supported.
Implementation:
Provide a block-level interface, i.e. minidisk_getblocksize(), minidisk_read(blockno) and minidisk_write(blockno, data), that operates on a single large file. Use these block-level primitives to implement a hierarchical filesystem. Your filesystem should support eight primitives:
minifile_t minicreat(char *filename),
minifile_t miniopen(char *filename, int mode),
int miniread(minifile_t file, char *data, int maxlen),
int miniwrite(minifile_t file, char *data, int len),
miniclose(minifile_t),
miniunlink(char *filename),
minimkdir(char *dirname),
minirmdir(char *dirname)
.
Look up these functions without the "mini" prefix on Unix or NT to get the documentation on what they are supposed to do.
Demo:
We create directories and files using your filesystem and minifile_
operations.
Variants:
UFS or LFS.
-
Idea:
Remote execution service. A program provides a script to run. Your system
selects the most appropriate machine, and maybe runs multiple copies of the
script to guard against failure.
Implementation:
Initially assume that scripts have no side effects (e.g. they do not
write files to disk). Then capture the side effects and commit to disk
only once when the first copy completes. You can use a safe-language
interpreter (e.g. python).
Variants:
You can focus on the runtime design instead.
Demo:
We run start some code on Host A, it runs pieces of computation on
hosts B, C and D and returns.
-
Idea:
Secure location service.
Implementation:
Demo:
"I am logged in to this computer. I want person A to be able to know this,
but not person B"
-
Idea:
Memory allocator with garbage collector.
Implement your own memory allocator for minithreads, which only calls
malloc for relatively large, fixed-size blocks of memory (say, 4K at a
time). Applications call minimalloc to get memory from your system.
No one ever calls minifree. Your collector scans all the thread
stacks and TCBs every now and then (say every 1000th minimalloc) and
finds which of the allocated objects are still in use. It returns
the unused objects to the free pool.
Implementation:
This is called a mark-sweep collector. You can also use it to detect
memory leaks. See this paper for details: Boehm, H., and M. Weiser, "Garbage Collection in an Uncooperative Environment", Software Practice & Experience, September 1988, pp. 807-820.
Demo:
for(;;) minimalloc(100); does not run out of memory.
-
Idea:
Encryption for minimessages. Encrypt minimessages such that eavesdroppers
cannot tell what is being transmitted.
Implementation:
Start with a shared-key symmetric cipher, like DES. Add diffie-helman
key exchange.
Demo:
tcpdump yields no information.
-
Idea:
Anonymous Message Broadcast. There are n users who want to
communicate with one another anonymously. Every user should be able to
broadcast a message to all other users, but no other user should be able
deduce who among the n sent the message.
Implementation:
You can use matlab, or better yet, to call matlab's linear system
solving routines from your C program, for this project.
Demo:
-
Idea:
Broadcast Routing for Real Time Multi-Media Streams,
using client based multicasting.
Implementation:
Multicast routing efficently utilizes bandwidth by
eliminating duplicate streams to clients located near
each other on the network.
Demo:
Transfer MP3 files using this multicast transport mechanism.
How to Get Started
After you have selected one of the projects from the list above, send
an email to egs@cs.cornell.edu where the subject line says "GROUP #GROUPNO: PROJECT #PROJECTNO" where GROUPNO is your group's number, and PROJECTNO is the number of the project from the list above.
Grading
The usual criteria apply: correctness, robustness, and elegance. An
additional criterion for this project is the challenge posed by the
problem picked, and the flashiness of the resulting system.
If your system requires that you be present to set up and demonstrate, i.e.
us running demo.exe will not be able to showcase your project well, then
please arrange to meet with the course staff for a 10-minute demo.
Submissions
Follow the following steps to submit.
- Log into CSUGLAB using the login id given to you in class. This login id
should be a member of the cs415 group.
- Create your submission:
- Get your group numbers from the Group Assignments
link on the web-page. Create a folder named as proj4-yourgroupnumber.
So if your group number is, say, 10, then you would create proj4-10.
- Create a subfolder within this folder named
Code .
- Put all your .c files and project files into the Code
folder. We should be able to go to this folder and just click on your
project file to open up all your working code.
- Check to make sure that you can compile your code and
generate an executable - this is what we will be doing to test it.
- Put a README file into the proj4-yourgroupnumberfolder.
It should contain:
- Your names, netid's and group number
- Describe whether or not your code works, and how you tested
it. Include all test programs.
- Check your submission:
- Check: that you are able to open your project by double-clicking on
it (in the Code folder).
- Check: that you are able to run your demo by
double-clicking on the executable file in the Executable
folder.
- Check: that Executable and Code
folders, and README is
actually within the main folder you created (i.e.,
with your netid).
- Submit:
- Log into goose using the same id as in step 1.
- Right click on the proj4-yourgroupnumber folder
you created above, and
choose copy.
- Go to the folder \Courses\cs415-sp01\Project4-submit on goose,
right click on this folder, and choose paste.
- Do not drag and drop the proj4-yourgroupnumber
folder into the \Courses\cs415-sp01\Project4-submit folder. We will not be able to access it.
- Your project will have been submitted ! You won't get any confirmation, or
be able to view anyone' submissions (including your own).
- If you absolutely need
to submit a second time, create a proj4-resubmit-n-yourgroupnumber
(where n is higher than your earlier
submissions) and repeat the above steps with it instead of proj4-yourgroupnumber.
So, if your group number is 10, your first resubmission would be proj4-resubmit-2-10,
your second resubmission would be proj4-resubmit-3-10
...
Final Word
If you need help with any part of your project, we are
here to help.
cs414@cs.cornell.edu