Who should take this class? CS5412 aims primarily at graduate students (MEng, MS, PhD), from any area. We welcome undergraduates who are far enough into their studies to have a solid "systems" background, but require that all students be strong programmers with plenty of experience writing systems software. CS4414 is an ideal background, but some people will instead have taken operating systems, distributed systems, networking, or database systems. So background can come from many kinds of courses -- but every student must have a strong background.
I am mostly interested in machine learning. Will this be useful to me? Absolutely! Machine learning systems need to run on a platform -- and the cloud is the platform everyone uses. Even our projects will be ML-based in many cases, except that we also think hard about how to embed that ML into a context where it can do something like making a dairy farm smart.
Help! I'm unable to sign up! In spring 2022 there is no cap, but because the CS department caps many other courses, the same procedure is being used for us. In practice what we anticipate is that you would add the course in add/drop week. Doing so will put you on a waitlist, initially, but then you will be given an enrollment PIN, and with the PIN should be able to add the class without any problem. The PINs are given out group by group in a standardized sequence, so you may wait a few days, but there is no enrollment cap, so eventually you will get a PIN. That is the thing to keep in mind. Even if you somehow don't get the PIN in time for the first lecture, you will eventually receive one.
How does this class differ from the one Christina Delimitrou teaches in ECE? Christina has a focus on the microservices infrastructure used to support web sites and web services, although she has a recent interest in IoT. Our course is mostly focused on using ML or AI within a cloud platform context, with quite a lot of attention to IoT and a bit less focus on the microservices infrastructure. We do not teach you any AI or ML tools, but we do study the platforms on which those tools run.
What about big data? We focus on the connection between the cloud and the IoT edge, but we leave topics like big data analytics for other courses.
What background is required? CS5412 assumes that you have a solid background doing systems programming and are comfortable writing small amounts of code in C# or Python or a similar language (some people have used C++17, Java, or Scala, and other languages work in the cloud too). Most cloud programming involves writing just a few methods -- "lambdas" or "trigger functions" to customize existing machinery provided by the cloud vendor -- Microsoft Azure, in our case. Our TAs are mostly experienced with using Python with Flask, a lightweight web framework, inside containers, and a few also know C#, so examples they show you would generally be in one of those two languages.
Can I base my project on Jupyter Notebook? Mixed answer here. We think that Jupyter Notebook is a fantastic Python-based big-data computing environment, so we understand why you would love it. And it could definitely be used to create a new ML component for your project -- but only if your project also uses other cloud services. Most cloud services are customizable in actual Python, not in the Jupyter Notebook version, which is automatically parallelized and has a number of specialized features for big data analysis. So, you can use Jupyter Notebook as one of many technologies, primarily in the AI portion of a solution, but you can't base the whole project on Jupyter Notebook. Your project will require some coding in C++, C#, Python, Java or some other "real" language.
If the course won't drill down and teach the specific cloud APIs and subsystems I am supposed to customize, how can I learn them? We are focusing on Azure IoT because the environment is really a Linux-based one. The specialized tools Azure provides have APIs of their own, but they are super-well documented. Moreover, typical projects start by downloading the code for a precreated demo that Microsoft has posted online, and then customizing it. So rather than memorizing a whole new system, the style of coding is really more like downloading "Hello World", and then changing the string to "Hey look, my cloud project is working!". Well, perhaps a little more complicated. But this is not a class where writing heroic amounts of code is necessary.
Is Azure PaaS or IaaS? Even though Azure programming is generally done on Linux using a language like Python, C# or C++ (many other languages are supported too), most Azure work fits the PaaS model: You use some existing Azure platform and then customize it with little "lambdas" (functions) written in one of those same languages, but rarely more than a few dozen lines of code in length -- and even then, often based on a template Azure provides, with most of those lines already in the template. This is in contrast to IaaS programming, where you work with clusters of Linux VMs.
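To give a feel for how small such a "lambda" can be, here is a minimal sketch in plain Python. It is not the Azure API: the function name on_sensor_message, the message fields tank_id and milk_temp_c, and the 4.0 degree threshold are all hypothetical, chosen only to illustrate the style. In a real project the platform, not your code, would receive each message and call a handler like this one.

```python
import json

# Hypothetical threshold, for illustration only (not a course or Azure value).
MILK_TEMP_LIMIT_C = 4.0


def on_sensor_message(message_body: str) -> dict:
    """Trigger-style handler: invoked once per incoming sensor message.

    The platform would own the queueing, scaling, and delivery; this
    function holds only the few lines of logic specific to the project.
    """
    reading = json.loads(message_body)      # message arrives as JSON text
    temp = float(reading["milk_temp_c"])    # hypothetical field name
    return {
        "tank_id": reading["tank_id"],      # hypothetical field name
        "milk_temp_c": temp,
        "alert": temp > MILK_TEMP_LIMIT_C,  # flag readings above the limit
    }


if __name__ == "__main__":
    # Local stand-in for a message the cloud platform would deliver.
    sample = json.dumps({"tank_id": "tank-7", "milk_temp_c": 5.2})
    print(on_sensor_message(sample))
```

Because the handler is an ordinary function, it can be run and tested locally before being pasted into a vendor template.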
How do we grade? We normally have in-person exams, sometimes even in-class, plus projects; 50% of your grade comes from each. Whether the exams are done in class or at home depends partly on the Covid situation and partly on the prelim and final dates Cornell offers us.
Projects: For the project, we encourage group work. Whether a project is done individually or by a team, we grade everyone the same way. We do expect that every person does individual work, but this can mean doing different aspects of a shared solution. We generally expect that a group of size N>1 can do more than a single person. We generally give the entire project group the same grade.
You'll need to hand various things in for your project, once every few weeks: an initial topic writeup, a project plan, then a few mid-point status updates with code your team has written and screenshots. This is to ensure that the projects get attention from week one, not just in the last ten days before the due date for demos.
Grading will be S/U with comments as needed for the various things you hand in, but will be on a numerical scale (0-100) for the actual final project demo and overall grade. Then we translate that back to a letter grade for the class.
MEng project: Historically, some CS5412 students have been MEng students and have asked to do a slightly larger project that can count as their MEng degree project. For this, sign up for additional credits using CS5999, the MEng project credit course. We will expect you to explain to us precisely how you enlarged the project relative to what you would have done had it not been an MEng project, in the form of a written document you will submit to us when doing project demos. If one group has some people doing MEng and some people not doing MEng credit, this becomes hard to justify because in effect, you would be asserting that some people did X amount of work, but some other people did 2X work. We will generally not approve such plans: in any group, everyone should do equal work. But if everyone in the group is doing MEng, that would be ok with us.
If you do an MEng project we always give you the identical grade for CS5999 that we give you for CS5412.
Groups that experience setbacks. It isn't common, but now and then some group has a problem -- people don't get along, or a person wants to join a group, or two small groups wish to merge. We allow this, but you must ask our permission, explain what the plan is, and everyone involved must agree.
Group size: We find that groups of size 2 or 3 are best. We almost never agree that a group can have more than 4 people. The 2 or 3 would not include the external "ANSC (dairy science) consultant" many teams will work with.
We wanted an ANSC team member, but there weren't enough ANSC students. This happens. Sorry.
Will we need to take an ANSC class too? No. CS5412 students only take this course, and the CALS students will take classes in ANSC. But our courses are working together to make it possible for us to collaborate on shared projects. Grading will be separate: the CALS students get graded in ANSC for their class, and you'll be graded in CS5412. Still, by working with the ANSC "customers", we will get a kind of direct hands-on experience with real users.
Lectures: We will not be taking attendance, but we test on the lecture content, and we expect all projects to be smart about things covered in lecture. You cannot get a high grade in this course without demonstrating that you have mastered the material covered in class.
Textbook: The textbook link lists a few books you could use to learn more.