CS5412 Spring 2012 Project Ideas
How CS5412 Projects Work:
Below we have a list of projects broken into three categories: easy, medium
and hard. The idea is that an easy project would be safe for a single
person who is unsure of his or her skills to do on their own for the class.
The medium and hard projects would be suitable for a pair of students to tackle
jointly (they would get the identical grade and so should do equal
work). At the end of the semester you will tell us if you did the work as
a team of two. If a team falls apart, each of you will need to finish up
separately.
How to tell us what project you are doing:
We need you to tell us what project you will be doing. By February 15,
please upload a one-page (two pages, absolute maximum) document to the CMS
system (http://cms.csuglab.cornell.edu)
into the "Project Plan" assignment. Later, on the last day of classes when
we review projects, we'll have this original plan with us and will want you to
explain how you departed from the plan if the thing you actually do isn't quite
what you originally had in mind.
This document should have the following information:
1) The full list of project participants (either just yourself, or if you
work as a two-person team, full names and netids for both of you). Both of
you must upload the same document, separately, to your respective CMS accounts.
2) The project title and difficulty level (either from the list below or,
if you propose a new project, similar in style).
3) The short description copied from below or, if you propose a new
project, similar in length and style
4) If you will use your project for MEng credit, a sentence saying
"This project will be used for CS MEng credit, approved by (Ken/Hussam/Qi/Z/Li)."
Please note the rules given below! You must respect them or we can't
approve the CS MEng credit request. The name to list can be Professor
Birman or one of the TAs and that person must meet with you, discuss your plan
for MEng project credit and approve your plan.
5) A paragraph on what you will do to carry out the project: "We
will be downloading the Isis2 and Live Distributed Objects system, building them
under Windows, implementing our own architecture for monitoring electric-power
"phasor management unit" devices (PMUs), placing simulated PMUs on the NSF GENI
testbed...." etc. You can evolve this plan later if needed; the one you
file is an initial concept.
6) A paragraph explaining how you will demonstrate the project (on
completion, we will have a visual demo and a poster. The demo will
show....) This can evolve over time too.
7) If you are a team of two, who will do what.
MEng Project Credit:
If you wish to use CS5412 for MEng project credit, just sign up for 3
credits, graded, of CS5999 with Professor Birman's code. We will use the
CS5412 grade as the CS5999 grade. Note that this means your quiz scores in
CS5412 actually count towards your CS5999 grade too.
MEng projects done by a single person must be of medium or hard difficulty.
We rarely approve MEng projects for teams of two and when we do, they must
always be hard and we must always have a clear explanation in advance of who
will do what and why both students would deserve MEng credit for the work.
Due Date: CS5412 projects are due on the last day of
the course, which is set aside as a project demo day. On request, short
extensions of at most 10 days may be granted, but you must request the
extension, explain precisely why you need extra time, and get actual permission
from Professor Birman or a TA, in writing. Otherwise, late projects will
be reviewed during the same 10 day period but if you didn't get permission to
finish late, a penalty to your grade may apply (e.g. A+ work might get an A
grade if you finished a week late and didn't have permission to work a week
longer).
Grading: Your MEng project will be graded on a
demo and a poster showing what you did, both presented to the grading team
composed of Professor Birman and the TAs. We grade in the range B to
A+ for most projects. Sometimes a very weak effort may receive a B- or
lower. Our aim is to have the median grade be on the B+/A- border: half
above and half below.
To get an A+ in CS5412, your project must be one of the very best that
the team saw. We award very few A+ grades. Sometimes we don't award
any; more often, four or five students in the entire class might receive an A+.
The quiz scores also count towards your CS5412 grade.
Extra credit: An MEng project shown at the BOOM
projects fair will receive extra credit (e.g. B work might receive a B+ grade).
However, extra credit will not boost your grade beyond A.
Projects not on our List: You can suggest a project of
your own but it should be similar to the ones on the list and you should tell us
which chapters of the textbook you hope to draw on in developing your solution.
We do not allow CS5412 projects to come from completely different courses or
areas. Thus while you might manage to find a project that overlaps between
the security class and the cloud computing class (in which case we would
probably let you do the one project for both courses), more often it would be
hard to pull that off because the courses cover different material. A
CS5412 project, in short, must be based on what we learn in the CS5412 class.
- [Easy] Build a
distributed web crawler and indexer.
Build a system that uses multiple crawler
processes (potentially on tens of machines) to crawl a set of websites
(bit-torrents or any other open system also work), and process/organize the
data in a way that is easy to search. One example is to create an index page
listing the acquired data through the crawl. Students can use Isis2 to ease
their development.
- [Easy] Integrate the
Isis2 system with the Live Objects system, both available for
download from codeplex.com. By creating a new kind of network
monitoring "sensor", show how this solution could let a cloud management team
easily build applications to monitor the network behavior of cloud-hosted
applications.
- [Easy]
Experiment to see how fast Facebook updates propagate and how consistent
their data is. You would mix updates coming from
Cornell with test systems running on PlanetLab (or vice versa).
- [Medium] Develop a
RAMCloud-like System. The system need not be a kernel module, but
can instead reside in user-space.
- [Medium] Develop a
DiskCloud-like System. The students can use FUSE to build a
distributed storage service that can be mounted as a local file system.
Note: this is a reuse of the project in Hakim's distributed storage class.
- [Medium] P2P
MapReduce. The idea is to build a distributed processing platform
based on a P2P substrate. Students should write code to schedule and manage
the processes running the different tasks. The platform need not be
MapReduce; it just needs to be a platform that can take in jobs and process
them in a distributed fashion.
- [Medium] Design
a P2P solution for mobile users who want to connect with their friends or
avoid some people. It should run on mobile phones and
let you set your status (in a hurry, eager to hang out, etc.). On that
basis, if two people come within some range of each other, it could pop up a
notification of the right kind.
- [Medium] Build a
distributed storage system with a simple client interface like Dropbox.
Clients can connect through a web interface or a
desktop application. The system should support different users with
different files. The system should scale with the addition of new servers
and balance load across all servers. Data should be replicated to tolerate
server failure. Other interesting features may be added as well.
- [Medium] Using gossip,
build a system that runs on a cloud and senses DDoS attacks on some of its
first-tier applications. You should invent an
instrumentation API to sense the events that you are using as your
indication of an attack (so the application would "help" by providing you
with information). Your service should have a visual GUI that an
administrator could use to see where hot-spots are arising and maybe even a
way to tell a service to shift from a hot-spot to some other node that isn't
under attack.
- [Medium] Implement a
gossip-based failure detector inside cloud systems. A group of nodes
should organize itself using a gossip protocol, with each node monitoring its
neighbors' health. Since failures in the cloud can be server-based,
rack-based and even cluster-based, it is worth exploiting the
group's physical placement to optimize a layered neighbor selection.
For example, virtual machines on the same server should be 1st-layer
neighbors, and servers in the same rack 2nd-layer neighbors. Then, different fan-out
and gossip intervals can be tuned per layer for less cross-layer traffic but fast
failure notification across the whole group. (Leave out the virtual host
detection feature if this is too hard.)
- [Medium/Hard]
Build a purely P2P version of Twitter that has strong
guarantees of privacy and also anonymity. It should have a notion of
groups that can be created and with access keys that get shared outside of
the system. A user should be able to post private messages that only
members of the right groups can access, and the system should be designed so
that even if someone was spying on it, they would not be able to tell who
posted which message.
- [Medium] Distributed Query Processing Service: Using
Cornell's new MiCA gossip programming language, implement the following
application. Individual nodes maintain their own logs,
which are collated to form a system-wide log partially ordered using vector
clocks. Queries can be executed on the log, and ideally a query's result
should be updated continuously as new events are appended to the log
(without recomputing the result from scratch). For example, every node
might periodically log its CPU usage and available disk capacity.
Reasonable queries would be "what's the average CPU usage system-wide?" and
"what's the total available disk space in the system?"
- [Medium] Distributed Log Service: Using
Cornell's new MiCA gossip programming language, implement
cloud tomography: Infer the shape of the underlying
communication network through gossip.
- [Medium]: Using Cornell's new MiCA gossip programming language, implement
sensor network visualization. Make a general system for visualizing
the output of a gossip-based sensor network on Google Maps using PlanetLab.
- [Medium]: Using Cornell's new MiCA gossip programming language,
reinvent Facebook: Create an eventually consistent distributed
publish-subscribe system with a social network interface.
- [Hard]: Using Cornell's new MiCA gossip programming language,
implement Gossip Objects as a MiCA layer. Gossip Objects
improves the performance of probabilistic publish-subscribe by speculatively
delivering messages to intermediary nodes, which may in turn deliver messages
to their intended recipients.
- [Medium]: Using Cornell's new MiCA gossip programming language,
implement distributed cache optimization. Memcached is a distributed
in-memory cache that stores key-value pairs for rapid lookup. Create a
gossip system that helps memcached nodes coordinate by speculatively caching
popular keys and evicting unpopular ones. (See also Beehive by
Ramasubramanian and Sirer.)
- [Hard] Using Isis2, design a replicated file system service
that brings a restarting file system "close" to synchronization with
existing active ones before joining and transfers just the remaining delta
of file system state, to minimize restart disruptions. Demonstrate it
on Amazon EC2 or RedCloud or Azure.
- [Hard] Using the
(delayed) feed of aircraft location data from the FAA, build a
high-assurance ATC system. Carefully justify the assurance
properties of the solution.
- [Hard] Customized web
service that adapts its behavior according to load. Monitor the
load on the servers and, if the load is too high, start more instances (on EC2) to
serve requests. If the load is low, shut down instances. Your solution
should make sure the instances are consistent with each other.
- [Hard] Cloud Geo-Caching: build
a storage service to manage data on at least two geographical locations
(e.g. different Amazon AWS availability zones) and a local client. The local
client can be either a smart phone or a desktop application. The storage
service should transparently move the data between the different locations
(and perhaps do prefetching for reads) in order to minimize client-perceived
latency. The three locations (2 cloud and 1 local) should not be full
mirrors of one another. Instead, one of them should be a master and the
other 2 should act as caches to improve performance. The master can be
changed to a different location if the client relocates (and you need to be
able to show that). The space at each of the cache locations is limited.
For example, assume the client application is a photo viewer+editor. The
end-user might add new photos to his album on the local client, and the data
will transparently move to the master. If the client views a picture from
one album, the storage layer can perhaps pre-fetch all the pictures of that
album in order to minimize the latency for future views.
The use of caching in your project should minimize cost and improve
performance over a non-cached system.
- [Medium or Hard]
Port/Build an application based on Isis running on Android. Mono
has an Android version and can support native C#/.NET code. So I am
interested in seeing an Android phone/tablet running something based on
Isis. I haven't looked into it yet, and have no idea how to implement it, so it
might be listed as harder than it should be. The application can be running on
a cluster of phones, and then with this bunch of wimpy computers we can
provide a portable consistent cluster...
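
For the Dropbox-like distributed storage project listed above, one possible starting point (not a required design) is consistent hashing, which is a common way to spread files over servers, replicate each file, and limit data movement when servers are added. The sketch below is only an illustration: the class name HashRing, the function stable_hash, the vnodes parameter and the server names are all made up for this example.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Map a string to a large integer, identically on every machine."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring; each server owns several virtual points."""

    def __init__(self, servers, vnodes=64):
        self.servers = set(servers)
        # Each (hash, server) pair is one virtual point on the ring.
        self.ring = sorted(
            (stable_hash(f"{server}#{i}"), server)
            for server in self.servers
            for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def replicas(self, key: str, count: int = 2) -> list:
        """Return up to `count` distinct servers that should hold `key`."""
        if not self.ring:
            return []
        count = min(count, len(self.servers))
        # Start at the first ring point clockwise from the key's hash.
        index = bisect.bisect(self.points, stable_hash(key)) % len(self.ring)
        chosen = []
        while len(chosen) < count:
            server = self.ring[index][1]
            if server not in chosen:
                chosen.append(server)
            index = (index + 1) % len(self.ring)
        return chosen
```

A call such as HashRing(["s1", "s2", "s3"]).replicas("alice/photo.jpg", 2) returns two distinct servers for that file. When a fourth server joins, a file's first replica either stays where it was or moves to the new server, which is the property that keeps rebalancing cheap. Failure handling and re-replication are left out entirely.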
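
For the gossip-based failure detector project listed above, the two ideas in the description (layered neighbors, and failure detection by gossip) might be sketched roughly as follows. This is a minimal sketch under assumptions: it uses heartbeat counters with a fixed timeout, which is one common approach and not necessarily the one intended, and the names Placement, neighbor_layer and HeartbeatTable are invented for illustration. Time is passed in explicitly so the logic can be tested without a network.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Placement:
    """Hypothetical placement record: which rack and server hosts a node."""
    rack: str
    server: str


def neighbor_layer(a: Placement, b: Placement) -> int:
    """Return 1 for same server, 2 for same rack, 3 otherwise."""
    if a.rack == b.rack and a.server == b.server:
        return 1
    if a.rack == b.rack:
        return 2
    return 3


@dataclass
class HeartbeatTable:
    """One node's view: highest heartbeat counter seen per peer, and when."""
    timeout: float
    counters: dict = field(default_factory=dict)   # peer -> heartbeat counter
    last_seen: dict = field(default_factory=dict)  # peer -> local time of last increase

    def beat(self, me: str, now: float) -> None:
        """Increment our own heartbeat before gossiping our table."""
        self.counters[me] = self.counters.get(me, 0) + 1
        self.last_seen[me] = now

    def merge(self, remote_counters: dict, now: float) -> None:
        """Merge a gossiped table, keeping the larger counter per peer."""
        for peer, counter in remote_counters.items():
            if counter > self.counters.get(peer, -1):
                self.counters[peer] = counter
                self.last_seen[peer] = now

    def suspected(self, now: float) -> set:
        """Peers whose counter has not advanced within the timeout."""
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}
```

A real solution would pick gossip partners by layer (for example, gossiping more often with layer-1 neighbors than with layer-3 ones) and would use a separate timeout or fan-out per layer; those tuning choices are the interesting part of the project and are not shown here.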
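
For the Distributed Query Processing Service project listed above, the two building blocks in the description are a partial order on log events given by vector clocks and a query result that is updated as events arrive instead of being recomputed. The sketch below is written in plain Python rather than MiCA, purely to show the logic; the function and class names are hypothetical, and the "average CPU usage" query is assumed to mean the average of each node's most recent reading.

```python
def happened_before(a: dict, b: dict) -> bool:
    """True if vector clock `a` is strictly earlier than `b`."""
    nodes = set(a) | set(b)
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_less = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and some_less


def concurrent(a: dict, b: dict) -> bool:
    """True if neither clock precedes the other and they are not equal."""
    nodes = set(a) | set(b)
    equal = all(a.get(n, 0) == b.get(n, 0) for n in nodes)
    return not equal and not happened_before(a, b) and not happened_before(b, a)


class RunningAverage:
    """Average of each node's latest reading, updated in O(1) per event."""

    def __init__(self):
        self.latest = {}   # node -> most recent value
        self.total = 0.0   # sum of the values in `latest`

    def update(self, node: str, value: float) -> None:
        """Apply one new log event without rescanning the log."""
        self.total += value - self.latest.get(node, 0.0)
        self.latest[node] = value

    def value(self):
        """Current system-wide average, or None if no events were seen."""
        return self.total / len(self.latest) if self.latest else None
```

The same pattern (keep a small summary, adjust it per event) also covers the "total available disk space" query by reporting the running total instead of the average.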
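
For the load-adaptive web service project listed above, the scaling rule in the description (start instances when load is high, shut them down when it is low) could begin as a simple threshold policy. The sketch below is an assumption-laden illustration: the function name, the thresholds 0.8 and 0.3, and the instance limits are arbitrary placeholders, load is assumed to be a per-instance utilization between 0 and 1, and no cloud API is called.

```python
def desired_instances(loads, high=0.8, low=0.3, min_instances=1, max_instances=10):
    """Return how many instances should run, given one load value per instance."""
    current = len(loads)
    if current == 0:
        return min_instances
    average = sum(loads) / current
    # Scale out by one instance when the average load is above the high mark.
    if average > high and current < max_instances:
        return current + 1
    # Scale in by one instance when the average load is below the low mark.
    if average < low and current > min_instances:
        return current - 1
    return current
```

For example, desired_instances([0.9, 0.95]) asks for a third instance, while desired_instances([0.1, 0.2, 0.1]) asks to drop to two. The gap between the two thresholds keeps the system from oscillating. Keeping the instances consistent with each other, which the project requires, is a separate problem that this policy does not address.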