CS5412 Spring 2015 Homework and Projects
Prior offerings of CS5412 had a large semester-long project that could be done in teams. However, we found that some students were unprepared to tackle a completely open-ended effort and weren't really ready to define their own project. This led to too many delays and too much confusion, and people didn't get started early enough in the semester.
Accordingly, we've decided that in spring 2015 we will experiment with a slightly more directed approach. This semester, only students who are enrolling for MEng project credit will do larger semester-long projects. Everyone else will do a series of assignments we'll pin down fairly concretely, to make sure that you get hands-on experience with the key cloud computing technologies. We'll also run an evening recitation section on Wednesdays to answer questions about how to approach those assignments. But CS5412 is a graduate course and we won't hold your hand: as an MEng student you need to become very independent and can't count on people to tell you step by step what to do.
With this in mind, in the spring 2015 semester every CS5412 student will be expected to undertake a series of individual homeworks aimed at building up a technical skill set using real cloud computing technologies, and then demonstrating those skills in an integrated cloud application for the final assignment.
How about MEng project credit? Here it gets complicated. In the past we used to let you count your CS5412 project for MEng credit, but now we don't have a project. So instead, if you want MEng project credit in association with CS5412, you'll need to enroll for 3 credits of CS5999. Then you will do the basic requirements for CS5412 like everyone else, but will also do an additional personalized project.
We should probably warn you that we use a surprisingly effective software system to check for cheating of any kind. You are welcome to use open-source code, but only from these sources: Amazon cloud computing (AWS) itself, Oracle's Java.com, or Microsoft's .NET. If you cut and paste code from one of those sites, include comments that indicate the source. Do not hand in code developed by other students, or purchased from "we'll do your homework" web sites, and please do not share your code with other students. We expect each and every student to do his or her own work!
If you fall behind, talk to a TA or to Professor Birman. We can usually work something out, and that is way better than borrowing a solution from someone else and then having both of you get into trouble.
In assigning these homeworks, we're assuming you have satisfied the prerequisites for this class. To be in CS5412 you need either CS4410 or an equivalent background. The homework assignments will also require programming skills in Java or C# or C++. If you are not skilled in one of those languages, or have never taken an operating systems course, or do not know how to open and read and write a file, you are not yet prepared for CS5412. You should also be familiar with multithreaded programming and locking; again, if this is something unfamiliar, you are not yet ready for this course and should take a more basic course.
Homework assignments:
CS5412 will have a series of homework assignments, with 2 or 3 weeks to work on each. All are required and will be graded. They are aimed at building up your skill set for cloud computing and some draw heavily on material covered in class. We do have a weekly recitation planned, but it will be for trouble-shooting, not to teach any topics beyond those covered in the main lectures.
The first assignment involves reading a large data set into memory and writing code that can search the data set using various search criteria such as GPS coordinates. You will need to code the solution in Java, C# or C++ under Linux or Windows. We'll give a more detailed handout to everyone in our first recitation section meeting on Wednesday January 28, but here's a summary:
We will be providing you with a restaurant database that has names, a bit of classification information, and address information for about 650,000 restaurants, formatted as a table in "comma-separated" format. The database is for use by CS5412 students only and should not be copied from our web site or shared outside of the class. Your program should access it directly from the course web site when running, and you will need to be on a Cornell network (or VPN). You have our permission to copy a few dozen rows to your own machine for debugging if that would be helpful.
Your job will be to build a small program in Java, C# or C++, that can take an address and list the restaurants that are (1) within the same zip (postal) code and also (2) closest to it in "as the crow flies" distance. We'll tell you how far to go in this search. We realize that if the address is on the boundary of a zip code we would miss nearby restaurants in the adjacent zip code region, but for our purposes, we don't plan to fuss over that.
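One common way to compute "as the crow flies" distance between two GPS coordinates is the haversine formula. The handout does not prescribe a formula, so treat the following as a minimal sketch under that assumption; the class name CrowFlies, the method name, and the Earth-radius constant are illustrative choices, not part of the assignment.

```java
final class CrowFlies {
    // Mean Earth radius in statute miles. This is an approximation:
    // the Earth is not a perfect sphere.
    static final double EARTH_RADIUS_MILES = 3958.8;

    /**
     * Great-circle (haversine) distance in miles between two points
     * whose latitude and longitude are given in decimal degrees.
     */
    static double distanceMiles(double lat1, double lon1,
                                double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 so rounding error can never push asin out of its domain.
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }
}
```

A restaurant would then count as a match when distanceMiles(queryLat, queryLon, restaurantLat, restaurantLon) is no larger than the distance limit we give you.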
To do this task you need an ability to convert from a string address like "123 Main Street, Madison WI, 75432" to GPS coordinates. For that task you will be using a cloud-based "Geolocation" service. Microsoft, Google, Apple and several other sources offer free versions. They are fast and quite easy to use, but any given account is usually limited to some maximum number of queries per day (Google, for example, only allows 2500 queries per day, and although they do allow you to do a "batch" address lookup, there are limits on how many items can be in one batch). So you'll need to be smart about which addresses you convert to GPS coordinates, or you will rapidly use up your "quota" if we ask you to search from a location with a lot of nearby restaurants. Please don't cache the results; that would confuse our way of measuring performance.
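To make the quota concern concrete, here is a minimal sketch of a wrapper that counts lookups and refuses to exceed a daily budget. Everything in it is hypothetical: the QuotaGeocoder class, the idea of passing the real service call in as a function from an address to a {latitude, longitude} pair, and the limit you choose. It deliberately does not cache results, and it does not reset the counter at midnight.

```java
import java.util.function.Function;

final class QuotaGeocoder {
    // Hypothetical stand-in for the real service call: address -> {lat, lon}.
    private final Function<String, double[]> lookup;
    private final int dailyLimit;
    private int used = 0;

    QuotaGeocoder(Function<String, double[]> lookup, int dailyLimit) {
        this.lookup = lookup;
        this.dailyLimit = dailyLimit;
    }

    /** Forwards one lookup to the service, or fails once the budget is spent. */
    double[] geocode(String address) {
        if (used >= dailyLimit) {
            throw new IllegalStateException("daily geocoding quota exhausted");
        }
        used++;                       // every call counts; nothing is cached
        return lookup.apply(address);
    }

    /** Number of lookups still available under the configured limit. */
    int remaining() {
        return dailyLimit - used;
    }
}
```

Wrapping the service this way makes it obvious during testing how many queries a given search strategy really costs.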
Don't even think of Geolocating all 650,000 rows: find a way to only translate "likely" matches. If you wish, you can use other APIs from your favorite Geolocation provider. As noted, you can restrict to the same zip code, and in many cases that would be all you need to do. But keep in mind that some zip codes have a lot of restaurants (like the one for downtown Manhattan). You also need to use the restaurants from the database we provided, not from Google's built-in restaurant database. We realize that this makes the job harder.
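One possible way to restrict the search to "likely" matches is to group the rows by zip code once, at load time, so that only the rows sharing the query's zip code are ever sent to the Geolocation service. The sketch below assumes each row has been reduced to a name, an address, and a zip code; those field names and the ZipIndex class are illustrative, and the real column layout comes from the database itself. For a dense zip code this first cut will still return many rows, so further narrowing is up to you.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ZipIndex {
    /** One database row. The field names are assumptions, not real column names. */
    static final class Restaurant {
        final String name;
        final String address;
        final String zip;

        Restaurant(String name, String address, String zip) {
            this.name = name;
            this.address = address;
            this.zip = zip;
        }
    }

    private final Map<String, List<Restaurant>> byZip = new HashMap<>();

    /** Files the row under its zip code. */
    void add(Restaurant r) {
        byZip.computeIfAbsent(r.zip, k -> new ArrayList<>()).add(r);
    }

    /** Rows sharing the given zip code; an empty list if there are none. */
    List<Restaurant> candidates(String zip) {
        return byZip.getOrDefault(zip, Collections.emptyList());
    }
}
```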
We actually realize that there are some cloud platforms that would solve this for you (given the address, they would give you that list of nearby restaurants). Your solution is not allowed to use such an API. You need to solve this as a query against the database we provided, because we want you to think about how to solve that problem -- combining cloud resources with local data.
We won't be providing any instructions at all about the details of how to build this program. You will need to learn to read in data in CSV format, or perhaps write your own parser, and will have to decide how to represent the database, build your own GUI, use web documentation to pick a Geolocation service and learn to use it, and put all of this together. Furthermore, you must work on your own. You can talk to friends, but the detailed design of your solution and code must be yours alone.
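If you do write your own parser, the main trap is that an address field may itself contain commas and is then usually wrapped in double quotes. The following is a minimal sketch of a single-line splitter that handles quoted fields and doubled quotes inside them. It assumes one record per line (no line breaks inside a quoted field), which may or may not hold for the course database; the class name CsvLine is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvLine {
    /** Splits one CSV record into fields, honoring double-quoted fields. */
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"');   // "" inside quotes is a literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    cur.append(c);         // commas here belong to the field
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());        // last field has no trailing comma
        return fields;
    }
}
```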
Then you will test the speed of your program and its accuracy by handing in lists of the closest restaurants for various addresses we will provide to you (we'll give you a distance limit, like 0.5 miles "as the crow flies"). You'll print the list of restaurants you found and also the amount of time in milliseconds that elapsed from start to finish. Grades will be partly based on the speed. We'll also check a random subset of the programs to make sure they actually work as advertised (we'll ask you to help us if we decide to run your solution, to avoid mixups caused by using different hardware or the wrong networking setup).
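Measuring the elapsed time can be as simple as reading a monotonic clock before and after the query. A minimal sketch, assuming the whole query is wrapped in a single Runnable; the QueryTimer class is illustrative, and exactly what counts as "start to finish" is for the detailed handout to define.

```java
final class QueryTimer {
    /** Runs the query once and returns the elapsed wall-clock time in milliseconds. */
    static long elapsedMillis(Runnable query) {
        // nanoTime is monotonic, so it is unaffected by system clock adjustments.
        long start = System.nanoTime();
        query.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

You would print the returned value alongside the list of restaurants found for each test address.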
For extra credit, your GUI should also display the restaurants on a Google map centered at the address given in the query, using the push-pin mashups that Google offers and popping up the data from the spreadsheet when the cursor clicks or hovers over a pushpin.
You'll have two weeks to do this starting when we hand out the assignment, on Tuesday January 27.
The second assignment will be an introduction to Amazon's EC2 platform. You will need to get an account on EC2, and we are checking to see if we can do this for the whole class in one shot. The assignment involves reading the AWS documentation, writing some code, and in this way acquiring and demonstrating some basic skills using Elastic Beanstalk.
The third assignment will involve putting the server you created in your first assignment onto Amazon EC2. Although EC2 has built-in abilities similar to what your server can do, those are not what we want you to do. We are asking you to "port" your actual solution to assignment 1 onto EC2, so that it can be accessed over the web, making minimal changes to your solution. The reason we want you to do it this way is to gain experience in porting a program to the cloud and talking to it from a browser -- a real hands-on experience of the full sequence of cloud technologies.
In the fourth assignment we will ask you to compare the performance of your solution to assignment three with Amazon's built-in AWS technology used for the same tasks. So here you will be able to redevelop the same code using technology built into Elastic Beanstalk.
In the fifth and final assignment, you will be modifying your server into a replicated cloud-hosted application using the Isis2 system, running it on your own laptop computer or one in our MEng or CSUG lab, to gain experience with fault-tolerance and consistency. Isis2 can be used from C#, Python, C++/CLI, and native C++, so you will need to work in one of those languages, either on Windows or using Linux and Mono. The Windows .NET approach is probably easier. Notice that Isis2 cannot currently be used from native Java. We'll provide details later, but the idea will be to support a form of social networking service in which groups of people who like to meet up after work can easily collaborate to pick a place to meet and connect up at that place. It will build on your solution from the fourth assignment but you'll add a data replication layer that tracks dynamic information about where group members are currently, what restaurant they favor for tonight, and then helps with the vote to select a specific meeting spot.
Projects are for MEng Project Credit only.
As noted earlier, CS5412 no longer has a required large project. The series of homework assignments has replaced it. In spring 2015, only students seeking to get independent project credit under CS5999 (or for undergraduates, CS4999 or a similar course code) will undertake a large independent project.
If you plan to sign up with Professor Birman for CS5999 credits, please arrange to meet with him for approval first. He'll want to make sure you have the prerequisites for these projects. For example, you do need experience using the kind of software platform the project requires. You'll work pretty hard for your 3 credits, so he'll also want to be sure you actually have time for what you are proposing to do.
We have placed a number of project suggestions on Isis2.codeplex.com, and we recommend that students who wish to get MEng project credit in connection with CS5412 look at that list and consider doing one of those projects. You can then take 3 credits of CS5999 for your project. This semester, due to a lack of TA resources, we do not expect to supervise any projects outside of this basic set, but there are ways to adapt some of the Isis2 projects to look a bit different. For example, you could get access to a radar track and flight plans database and do a fault-tolerant air traffic control system. This would result in a project rather similar to one of the existing isis2.codeplex.com suggestions, although you would develop a different display (GUI) and obtain data from a different source. So that would be an example of how one can customize the existing suggestions.