CS 414/415 Programming Project 4

CS 414/415 Programming Project 4
Emin Gün Sirer

Overview

Projects 1 through 3 covered many of the essential aspects of systems programming. The goal of project 4 is to allow you to showcase what you have learned in 414/415 by building an interesting system component. There is no set assignment. You are free to pick any one of the problems listed below, or to persue other projects that

There are very few ground rules for project 4. You may develop in any language you like, including C, Java, assembler, or whatever other language you choose. You may build on minithreads, or you may develop code that runs on top of Linux, Windows NT, CE, or bare hardware. You may even integrate other people's code into your projects, provided that the part of the project you write is substantial and embodies an interesting systems concept. In fact, we encourage you to save work whereever possible by reusing code components such as GUI toolkits, image manipulation libraries, and regular expression packages.

Project Options

Idea: Peer-to-peer messaging application: Implement DSR, AODV or any other ad-hoc routing protocol to ferry messages from computer to computer without resorting to a central host.
Implementation: If you do this on top of minithreads, it will be easy to port to PDAs. You will have to develop some test-bed software to make a cluster of workstations behave like an ad-hoc network - luckily, this only involves restricting each node to talk to only a subset of the other nodes in the network. Make sure that this ad-hoc network simulation component is a clean, separable module, as we might want to use it in future classes on its own.
Demo: N students sit at N machines, a student sends an ICQ-like message to another student, it gets routed automatically through M machines to get to its destination. Extra points for a GUI for sending messages, and for displaying incoming messages in popup windows.
Variants: DSR, AODV, ZRP, etc. Image files instead of plain text messages.
Idea: Ordered communication subsystem. Implement a chat system where N nodes participate in a shared, globally broadcast conversation. Ensure that if person A sends message 23 to the group in response to message 57 from person B, all parties to the conversation observe message 23 from A AFTER message 57 from B.
Implementation: You will undoubtedly want to use a causally-ordered broadcast primitive. These papers from the Horus and Isis projects describe some alternatives.
Demo: A lot like the peer-to-peer messaging demo, except no one can observe messages out of order.
Idea: Peer-to-peer filesharing. Implement Freenet, as described in class and in their papers.
Implementation: You will probably need a software package that implements the DES, MD5 or SHA algorithms.
Demo: I ask for a Metallica album that you have made available, my machine finds it, decrypts it and caches it for future use (for this demo, we will purchase two copies of a Metallica album).
Idea: Distributed event notification system. There are N hosts arranged in an arbitrary network, with clients attached. Clients can register to be notified when an event occurs, and they can raise events. When a client generates an event, e.g. "X happened", everyone who registered to be notified of the X event gets a message.
Implementation: The trick is to do it first by using dumb flooding, and then to move to an intelligent event-filtering scheme that will scale. Note that the excitement level of this project is closely indexed to how expressive a language you provide for event handlers. Here are some definitions you will need:
```
typedef struct EventHandler EventHandler;
typedef struct Predicate Predicate;
typedef enum Op Op;

enum Op { EQUAL, LE, LT, GE, GT };

struct Predicate {
     Op op;  // comparison operator
     int offset; // offset of byte in event array
     int value;  // value to compare to
};

struct EventHandler {
     Predicate p;
     Predicate *next;
};

void raiseevent(char *eventarray, int len) {} // event can be anything

void registereventhandler(EventHandler eh) {}
```
An event handler interprets predicates specified according to the scheme above, e.g. "first byte of the event should be equal to "M", second byte to "S", third to "F", fourth to "T" and the fifth byte should be greater than 100.
Demo: "Let me know when YAHOO stock goes above 80," or "let me know when someone is sharing Metallica songs."
Idea: Query caching. A wireless host has some information that satisfies a predicate cached. When another host in the vicinity needs info that satisfies the same predicate, its query gets routed to the host that has that info cached.
Implementation: This is a lot like peer-to-peer filesharing. The clever bit is the representation of the predicate for easy lookup. Start with a complete text match, and then support regular expression matching on strings, or perhaps string tuples.
Demo: Student A says that his machine knows the answer to "Where are the Starbucks stores in upstate NY ?" and provides a web page with answers. Student B asks the question "Where are the Starbucks stores in upstate NY ?" and gets the web page from student A.
Idea: Intrusion detection tool. Build a tool that examines the pattern of system calls emitted by known-to-be-uncompromised applications and extracts a pattern. Once the pattern has been extracted, the tool runs in a security checking mode, and flags any deviation from the pattern as a possible security compromise.
Implementation: The easy way to do this is to take the trace as a textual input, or else to layer the tool on top of strace, the Unix system call tracing utility. There are many variants of algorithms for extracting patterns. If you are interested, we'll provide a set of papers - you can pick one and implement just one simple scheme.
Demo: You run your tool with a "good" trace followed by a "bad" trace, and your tool flags that there may be a compromise. Extra points if the false positive rate is low.
Idea: Authentication and payment scheme. Assume that the big NT workstation could be made as small as a PDA, be physically secure, and could carry 1G of data. You could carry them anywhere but they will not be constantly connected. Come up with a scheme by which you could pay someone without having to communicate with a bank.
Implementation: These two papers by Doug Tygar et al. provide a survey and a possible implementation.
- Anonymous Atomic Transactions.
- NetBill: An Internet Commerce System Optimized for Network-Delivered Systems.
Demo: You issue some fake cash to machine A, it pays B, B returns the cash to the bank at some point and gets credited. If A spends the same money at C's store, A is caught when B and C both check in with the bank.
Idea: Secret sharing. A text file is partitioned between N hosts, and can be decrypted when N/2+1 of those hosts get together.
Implementation: Need to ensure that N/2+1 is both necessary and sufficient. Look in Applied Cryptography by Bruce Schneier for different algorithms. The simplest implementation relies on creating a linear system of N/2+1 unknowns, which is solvable by any majority of the secret holders. You can use matlab for solving linear systems of equations easily. Or you can use the GNU bignum package.
Demo: A text file is encrypted and partitioned between N hosts. N/2+1 get together and read the file.
Idea: Leader election. N hosts agree that host A is their leader. If host A dies, they detect who is up and elect one as a leader. If they get partitioned into smaller groups, the group that has N/2+1 members elects a leader, all the rest sit idly until reconnected.
Implementation: Electing leaders is often an important subcomponent of distributed algorithms. Can assume that each host has a unique number.
Demo: Run N copies of your system, kill off N/2-2, see that they elect a new leader.
Idea: Software RAID. The idea is to build a filesystem that computes and stores redundant information about a file such that it can recover from errors.
Implementation: You can build your RAID filesystem on top of a regular filesystem. Simply partition file A into files A0, A1, A2, etc. Build this component so that it can be mounted as an NFS filesystem on a Unix machine. Look into one of the user-level NFS server implementations on the web, throw everything out and replace with your code.
Demo: We store a file in your system, delete one of the underlying storage units (i.e. simulate disk crash), and check if we can still read the file.
Variants: The higher the RAID level, the better the project implementation.
Idea: Automatic caching and prefetching. Write a web proxy, and prefetch files that you see used frequently or periodically.
Implementation: A web proxy is easy to write in Java, but likely to be slow. Use any learning algorithm to manage the cache on the proxy. A possible problem is that the percentage of documents marked cacheable may be small. In that case, you can go ahead and cache documents marked uncacheable - you can then detect two consecutive requests for the same document from the same client, and take that as an indication that the cached copy is out of date.
Demo: We set our web browser to use your proxy instead of going to the network directly, and see a performance improvement.
Idea: Quality of service and multimedia. Implement a protocol for transferring multimedia files over lossy networks with changing bandwidth and congestion parameters. A simple way to do this is to structure each file transfer as N (where N is a small number like 4) parallel transfers. Connection 1 transfers the important, coarse grain images, connection 2 has more detail, conn 3 carries even more detail, and so forth. In response to packet loss in connection M, reduce bandwidth consumption at levels >= M, so you give up on the details but the coarse image still makes it through
Implementation: You can use the Intel JPEG library or the JPEG library from the Independent JPEG Group. for encoding images with different levels of detail. In you build on top of minimessages, you will be able to detect any loss on any connection, and thus react appropriately.
Demo: Send and display a picture, while we vary the network speed and congestion. Use video (and an MPEG recoder) instead of still images for a far more impressive demo.
Idea: Process migration. Migrate a minithread from one host to another.
Implementation: Assume that the minithread has no files open or other system resources in use at the time of migration.
Demo: Computation moves from host to host.
Idea: Measure and compare the performance of different IPC mechanisms, e.g. pipes, FIFOs, message queues, shared memory, provided by an OS of your choice.
Implementation: Under Unix or Windows. Need to be complete.
Demo: A set of graphs.
Idea: Implement and test the performance of different scheduling algorithms (RR, FIFO, SJF, priority scheduling, multilevel feedback queue, ...) in minithreads.
Implementation: Need to implement the algorithms as well as an interesting test case that behaves differently and showcases each algorithm.
Demo:
Idea: Deadlock detection. Add mutexes and condition variables to minithreads. Implement deadlock detection.
Implementation:
Demo: We feed some bad minithreads code into your system, your code detects the deadlock problem.
Idea: Implement a "virtual file system" for the minithreads package over a single sequential file. Normal file read/writes should be supported.
Implementation: Provide a block-level interface, i.e. minidisk_getblocksize(), minidisk_read(blockno) and minidisk_write(blockno, data), that operates on a single large file. Use these block-level primitives to implement a hierarchical filesystem. Your filesystem should support eight primitives: minifile_t minicreat(char *filename), minifile_t miniopen(char *filename, int mode), int miniread(minifile_t file, char *data, int maxlen), int miniwrite(minifile_t file, char *data, int len), miniclose(minifile_t), miniunlink(char *filename), minimkdir(char *dirname), minirmdir(char *dirname). Look up these functions without the "mini" prefix on Unix or NT to get the documentation on what they are supposed to do.
Demo: We create directories and files using your filesystem and minifile_ operations.
Variants: UFS or LFS.
Idea: Remote execution service. A program provides a script to run. Your system selects the most appropriate machine, and maybe runs multiple copies of the script to guard against failure.
Implementation: Initially assume that scripts have no side effects (e.g. they do not write files to disk). Then capture the side effects and commit to disk only once when the first copy completes. You can use a safe-language interpreter (e.g. python).
Variants: You can focus on the runtime design instead.
Demo: We run start some code on Host A, it runs pieces of computation on hosts B, C and D and returns.
Idea: Secure location service.
Implementation:
Demo: "I am logged in to this computer. I want person A to be able to know this, but not person B"
Idea: Memory allocator with garbage collector. Implement your own memory allocator for minithreads, which only calls malloc for relatively large, fixed-size blocks of memory (say, 4K at a time). Applications call minimalloc to get memory from your system. No one ever calls minifree. Your collector scans all the thread stacks and TCBs every now and then (say every 1000th minimalloc) and finds which of the allocated objects are still in use. It returns the unused objects to the free pool.
Implementation: This is called a mark-sweep collector. You can also use it to detect memory leaks. See this paper for details: Boehm, H., and M. Weiser, "Garbage Collection in an Uncooperative Environment", Software Practice & Experience, September 1988, pp. 807-820.
Demo: for(;;) minimalloc(100); does not run out of memory.
Idea: Encryption for minimessages. Encrypt minimessages such that eavesdroppers cannot tell what is being transmitted.
Implementation: Start with a shared-key symmetric cipher, like DES. Add diffie-helman key exchange.
Demo: tcpdump yields no information.
Idea: Anonymous Message Broadcast. There are n users who want to communicate with one another anonymously. Every user should be able to broadcast a message to all other users, but no other user should be able deduce who among the n sent the message.
Implementation: You can use matlab, or better yet, to call matlab's linear system solving routines from your C program, for this project.
Demo:
Idea: Broadcast Routing for Real Time Multi-Media Streams, using client based multicasting.
Implementation: Multicast routing efficently utilizes bandwidth by eliminating duplicate streams to clients located near each other on the network.
Demo: Transfer MP3 files using this multicast transport mechanism.

How to Get Started

After you have selected one of the projects from the list above, send an email to egs@cs.cornell.edu where the subject line says "GROUP #GROUPNO: PROJECT #PROJECTNO" where GROUPNO is your group's number, and PROJECTNO is the number of the project from the list above.

Grading

The usual criteria apply: correctness, robustness, and elegance. An additional criterion for this project is the challenge posed by the problem picked, and the flashiness of the resulting system.

If your system requires that you be present to set up and demonstrate, i.e. us running demo.exe will not be able to showcase your project well, then please arrange to meet with the course staff for a 10-minute demo.

Submissions

Follow the following steps to submit.

Log into CSUGLAB using the login id given to you in class. This login id should be a member of the cs415 group.
Create your submission:
1. Get your group numbers from the Group Assignments link on the web-page. Create a folder named as proj4-yourgroupnumber. So if your group number is, say, 10, then you would create proj4-10.
2. Create a subfolder within this folder named Code .
3. Put all your .c files and project files into the Code folder. We should be able to go to this folder and just click on your project file to open up all your working code.
4. Check to make sure that you can compile your code and generate an executable - this is what we will be doing to test it.
5. Put a README file into the proj4-yourgroupnumberfolder. It should contain:
- Your names, netid's and group number
- Describe whether or not your code works, and how you tested it. Include all test programs.
Check your submission:
1. Check: that you are able to open your project by double-clicking on it (in the Code folder).
2. Check: that you are able to run your demo by double-clicking on the executable file in the Executable folder.
3. Check: that Executable and Code folders, and README is actually within the main folder you created (i.e., with your netid).
Submit:
1. Log into goose using the same id as in step 1.
2. Right click on the proj4-yourgroupnumber folder you created above, and choose copy.
3. Go to the folder \Courses\cs415-sp01\Project4-submit on goose, right click on this folder, and choose paste.
4. Do not drag and drop the proj4-yourgroupnumber folder into the \Courses\cs415-sp01\Project4-submit folder. We will not be able to access it.
Your project will have been submitted ! You won't get any confirmation, or be able to view anyone' submissions (including your own).
If you absolutely need to submit a second time, create a proj4-resubmit-n-yourgroupnumber (where n is higher than your earlier submissions) and repeat the above steps with it instead of proj4-yourgroupnumber. So, if your group number is 10, your first resubmission would be proj4-resubmit-2-10, your second resubmission would be proj4-resubmit-3-10 ...

Final Word

If you need help with any part of your project, we are here to help.

cs414@cs.cornell.edu