Kenneth P. Birman
N. Rama Rao Professor of Computer Science
435 Gates Hall, Cornell University
Ithaca, New York 14853
W: 607-255-9199; M: 607-227-0894; F: 607-255-9143
Email: ken@cs.cornell.edu
CV: Sept 24
Current Research (full publications list).
- Cascade, Vortex.
This pair of systems is my current main focus, although the effort is split among Vortex (which is new), Cascade itself (closer to finished), and Derecho, described below. Cascade centers on the
observation that data movement is a huge overhead in modern AI and ML
applications. How can we run these systems at "peak possible speed" if
we have this data movement barrier? Cutting to the chase,
Cascade is often 5x, 10x, and sometimes 20x or 100x faster than other
platforms when running identical AI logic! We gain these huge speedups
through a few innovations.
Vortex extends Cascade with a set of specialized features in support of RAG (retrieval-augmented generation) LLM systems, in which vector databases supporting approximate nearest-neighbor search are a major component.
We haven't published anything on Vortex yet.
We maintain a full project web site here, and
our GitHub site is here.
I'll limit myself to a summary on this page, but you can find links to some
papers at the bottom of the Cascade project page and also on our
publications list.
The first of these innovations is to move the user's AI or ML code into our storage
server, so that the code runs right where data is located and can access
objects via pointers with no copying needed. A second idea is to use a
mixture of scheduling and planned placement for computing (and for objects
created during computation), so that when a computation is needed, the data
it requires is collocated at the node where we schedule the AI program to
run. This pays off because the models used by AI programs (ML training
results in big parameter vectors called model objects) can be
enormous. The win is even larger for applications doing computer
vision, because photos and videos are huge, too.
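To make that first idea concrete, here is a tiny, self-contained C++ sketch of "shipping the computation to the data." It is purely illustrative and is not the real Cascade API (the class and method names are invented), but it shows the key property: the user's logic runs inside the storage node and sees each object by reference, so nothing is copied across a network or between processes.

    // Hypothetical sketch, NOT the Cascade API: user logic registered as a
    // trigger runs inside the store and observes objects in place.
    #include <cstddef>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    using Blob = std::vector<std::byte>;

    class InStoreCompute {                       // stand-in for a storage node
    public:
        // The handler sees the stored bytes by const reference: no copy is made.
        using Handler = std::function<void(const std::string& key, const Blob& value)>;

        void register_trigger(std::string prefix, Handler h) {
            triggers_.emplace(std::move(prefix), std::move(h));
        }

        void put(const std::string& key, Blob value) {
            auto& slot = objects_[key] = std::move(value);   // store the object once
            for (auto& [prefix, handler] : triggers_)
                if (key.rfind(prefix, 0) == 0)
                    handler(key, slot);                      // run the logic in place
        }

    private:
        std::unordered_map<std::string, Blob> objects_;
        std::unordered_map<std::string, Handler> triggers_;
    };

    int main() {
        InStoreCompute node;
        // "AI logic" collocated with the data: here it just reports the size,
        // but in a real deployment it might run inference on a camera frame.
        node.register_trigger("/camera/", [](const std::string& key, const Blob& frame) {
            std::cout << key << ": " << frame.size() << " bytes, processed in place\n";
        });
        node.put("/camera/frame-001", Blob(1024));
    }

In Cascade itself this principle is combined with the scheduling and placement machinery described above, so that the (often enormous) model objects a computation needs are usually already resident on the node where the logic runs.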
One puzzle with this approach is that it departs from the prevailing cloud computing model in so many ways. Yet there are reasons to believe that applications of this kind inevitably depart from cloud computing as we practice it today. For example, in settings where the AI or ML
will access sensitive data, it is often important to do that "close" to
where the data is gathered and then to either discard the data after
computing without ever storing it, or store it only where the user "lives".
The European focus on privacy could mandate this architecture... and the
cloud isn't very well-prepared for it today. Cascade could offer an
answer. A second consideration centers on the wide-area internet link
needed to upload data from a camera to a service on a cloud: often, uploads
of this kind can be the slowest step.
We solve these problems by running Cascade directly on hardware close to the
edge: a cluster of computers that might sit right in a hospital computing
center, or on a factory floor, or in an airplane servicing center.
Then we can also leverage shared memory to reduce data movement between the user's logic and the Cascade data storage layer, and hardware-accelerated networking for communication between nodes. That last
idea uses Derecho, discussed below. And we do a lot of work on
scheduling, to ensure that compute and storage tend to be collocated on the
same nodes.
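As a rough illustration of the shared-memory piece (again, a sketch rather than Cascade code), the snippet below uses standard POSIX shared memory so that a large object written once by the "storage" side is visible in place to the "user logic" side; the segment name and sizes are made up for the example.

    // Minimal POSIX shared-memory sketch: one write, zero extra copies.
    #include <fcntl.h>      // shm_open, O_* flags
    #include <sys/mman.h>   // mmap, munmap
    #include <unistd.h>     // ftruncate, close
    #include <cstring>
    #include <iostream>

    int main() {
        const char* name = "/demo_object";     // hypothetical segment name
        const size_t size = 1 << 20;           // pretend this is a 1 MB model or frame

        // "Storage layer" side: create the segment and write the object into it.
        int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
        if (fd < 0 || ftruncate(fd, size) != 0) { perror("shm"); return 1; }
        void* base = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }
        std::memcpy(base, "object bytes", 13);  // the single write of the data

        // "User logic" side (normally another process mapping the same name):
        // it reads the bytes in place, with no second copy of the payload.
        std::cout << "shared view sees: " << static_cast<char*>(base) << "\n";

        munmap(base, size);
        close(fd);
        shm_unlink(name);
        return 0;
    }

Cascade's real mechanism is far more elaborate, but the zero-copy property this toy example relies on is the same one.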
When people talk about storage, it is common for them to mean "in a scalable
file system". Cascade is very flexible in this sense. It can be
used as a file system through POSIX file system APIs, but can also be used
as a key-value storage layer (like MemCached), or treated like a pub-sub
system (similar to Kafka). In fact, our APIs are often identical to those of the standard tools in each of these areas. Use whichever storage abstraction you prefer!
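The sketch below illustrates this "one store, several API personalities" point. The calls are hypothetical stand-ins rather than Cascade's real interfaces: the same object is written once and can then be read in a key-value style (as with MemCached) or delivered to subscribers in a pub-sub style (as with Kafka).

    // Hypothetical facade, not Cascade code: one object store, two API styles.
    #include <functional>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    class MultiApiStore {
    public:
        // Key-value personality
        void put(const std::string& key, std::string value) {
            data_[key] = std::move(value);
            for (auto& cb : subscribers_[key]) cb(data_[key]);   // also drives pub-sub
        }
        const std::string* get(const std::string& key) const {
            auto it = data_.find(key);
            return it == data_.end() ? nullptr : &it->second;
        }

        // Pub-sub personality over the same objects
        void subscribe(const std::string& topic, std::function<void(const std::string&)> cb) {
            subscribers_[topic].push_back(std::move(cb));
        }

    private:
        std::map<std::string, std::string> data_;
        std::map<std::string, std::vector<std::function<void(const std::string&)>>> subscribers_;
    };

    int main() {
        MultiApiStore store;
        store.subscribe("/sensors/temp", [](const std::string& v) {
            std::cout << "pub-sub consumer saw: " << v << "\n";
        });
        store.put("/sensors/temp", "21.5C");              // producer writes once
        if (auto* v = store.get("/sensors/temp"))         // key-value reader
            std::cout << "key-value reader saw: " << *v << "\n";
    }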
When configured this way, edge cameras can be connected directly to the same machines where the user's AI logic is running. But we also want Cascade to look very transparent to the AI designer: platforms like PyTorch, TensorFlow, Julia, Apache Spark/Databricks, MXNet and so forth are very popular, and we want to be fully compatible with them. That leads to
the view that Cascade should have a second hosting option, as a service on a
normal cloud, able to run the user's AI and ML through a function (lambda)
model, or in containers. In work we hope to do during 2024, we'll
connect these two options into a single service that would be perceived as a
cloud service and yet might manage resources right on the cloud edge.
Read a paper about Cascade here.
Or check out my slide deck here.
A Vortex-centric slide deck is
here. By the way, this first link is not yet a published paper: we do have a bunch
of papers in the publication pipeline, but are only just starting to see
them come out.
Students interested in joining the Cascade or
Vortex effort should reach out to me
directly.
- Derecho. Cascade is actually built using Derecho, a project that was very active from
2017 through 2019, but continues at a lower pace today (notably the DCCL
work mentioned below).
We maintain a full project web site
here, and
our GitHub site is here.
I'll limit myself to a summary on this page.
Derecho looks at ways of leveraging remote direct memory access (RDMA) technologies to move large data objects at wire speeds, and modern
storage technologies to persist data. Recently we ported Derecho to run
over DPDK too (a pure software solution... not quite as fast, but we still
set records), and we support normal TCP as well (slowest of all). Learn more from our ACM TOCS paper
here and our two DSN papers,
here and
here. A paper on optimizations for small objects based on a
methodology called Spindle is
here.
Our newest work on Derecho centers on an implementation of the collective communication library APIs (AllReduce and related primitives), sometimes called the CCLs.
Weijia Song completed a Derecho CCL (DCCL) and it substantially outperforms
alternatives, notably beating the Open MPI CCL "in its own home stadium",
namely on clusters configured as HPC systems! We get as much as a 2x
speedup for AllReduce, for example. Weijia has not yet written the
work up, but you can already use it in the most current Derecho release.
We plan to do a deep integration of DCCL into Cascade soon.
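For readers who haven't used a CCL, the fragment below shows what the AllReduce primitive computes, written against standard Open MPI (the baseline DCCL is compared with); it is not DCCL code. Every rank contributes a vector and every rank ends up holding the element-wise reduction, which is the core communication step in distributed ML training.

    // Plain Open MPI usage illustrating AllReduce; compile with mpicxx, run with mpirun.
    #include <mpi.h>
    #include <iostream>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, nranks = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        std::vector<float> local(4, static_cast<float>(rank + 1));  // this rank's "gradient"
        std::vector<float> global(4, 0.0f);

        // After this call, every rank holds the element-wise sum of all contributions.
        MPI_Allreduce(local.data(), global.data(), static_cast<int>(local.size()),
                      MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            std::cout << "sum at every rank: " << global[0]
                      << " (with " << nranks << " ranks)\n";

        MPI_Finalize();
        return 0;
    }

DCCL implements this same primitive; the 2x figure above refers to speedups we have measured for AllReduce relative to Open MPI.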
The same comment applies here: if you are a student with a distributed systems,
networking or "low level" focus, the Derecho work could be a great
opportunity for you to pursue your passion while being relevant to the
modern AI-centric world. And we have lots of opportunities for pushing
the work forward. Some center on a mix of
PL, verification and theory, while others are very practical.
Again, reach out to Ken.
- Derecho Secure Audit Log / Blockchain.
Edward Tremel is extending Derecho to include a novel Byzantine fault-tolerant (BFT) layer over the object store. It could be used much like a permissioned blockchain. Details soon... but I should note that Edward leads this effort and is now a faculty member at Augusta University. We are collaborating on the work, but he is the person with the real vision on where to take it.
- Using all of this technology for
IoT applications, notably in the smart power grid, healthcare, and
industrial settings (IIoT).
My group generally has application areas in mind, and in recent years the
bulk electric power grid has been a rich source of ideas. We've also
been branching out and thinking about other kinds of environments that are
rich in sensors and actuators, such as healthcare and industrial automation
(sometimes called "digital twin" systems). PhD students
Alicia Yang, Tiancheng Yuan and Yifan Wang are leading this work, in collaboration with Siemens Corporate Research, and also in a smart farming setting (a dairy).
Teaching:
I teach two courses, both in the fall. Other people also teach
these from time to time, but I would normally be teaching one of the two.
- The first is cs4414: Systems
Programming.
The course aims at students who have learned programming and data
structures, but don't have direct experience building real applications.
The course tackles systems programming from the perspective of a more
performance-oriented programming language (we'll teach the basics of C++),
directly accessing operating system calls in Linux to create high-quality
solutions to more complex tasks. The material will be a mix of computer
architecture, operating systems and programming abstractions, but the real
center of the course focuses on how to efficiently combine simple programs
to do surprisingly complex things, with guarantees of correctness. One unifying theme concerns ways of sharing information when programs need to cooperate, for example using Linux pipes, shared files, mapped files (shared memory), threads, and networking. This creates concurrency and coordination
challenges, and we'll learn about ways to ensure correctness using the
monitor synchronization pattern.
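As a preview of the monitor pattern (a sketch of my own, not course handout code), here is a bounded buffer in modern C++: one mutex protects the shared state, and condition variables let producers and consumers wait for "room available" or "data available" without busy-waiting.

    // A classic monitor: mutex + condition variables guarding a bounded queue.
    #include <condition_variable>
    #include <cstddef>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <thread>

    class BoundedBuffer {
    public:
        explicit BoundedBuffer(size_t capacity) : capacity_(capacity) {}

        void put(int item) {
            std::unique_lock<std::mutex> lock(m_);
            not_full_.wait(lock, [&] { return q_.size() < capacity_; });
            q_.push(item);
            not_empty_.notify_one();          // wake a waiting consumer
        }

        int get() {
            std::unique_lock<std::mutex> lock(m_);
            not_empty_.wait(lock, [&] { return !q_.empty(); });
            int item = q_.front();
            q_.pop();
            not_full_.notify_one();           // wake a waiting producer
            return item;
        }

    private:
        std::mutex m_;                        // the monitor lock
        std::condition_variable not_full_, not_empty_;
        std::queue<int> q_;
        size_t capacity_;
    };

    int main() {
        BoundedBuffer buf(4);
        std::thread producer([&] { for (int i = 0; i < 8; ++i) buf.put(i); });
        std::thread consumer([&] { for (int i = 0; i < 8; ++i) std::cout << buf.get() << " "; });
        producer.join();
        consumer.join();
        std::cout << "\n";
    }

The wait calls take a predicate, so spurious wakeups are handled correctly.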
- In years when I don't teach cs4414, my fall course is cs5412, an MEng-oriented treatment of cloud computing and smart IoT systems, again with
an emphasis on edge issues (we do look at big data issues too, but the edge
creates demanding response deadlines and real-time consistency puzzles and
we spend a lot of time on those). In the most recent offering my focus was on
platforms that allow you to apply ML to IoT data, but we look at the cloud
as a whole, not just this narrow topic. This course is updated pretty
frequently because the cloud is such a rapidly evolving environment.
Don't take it without a lot of prior systems work: the cloud brings
everything together in one massive environment, so you would have a tough
time if you don't already know how to write code in languages like Python and C++, write scripts in bash or other scripting languages, perhaps build GUIs in Javascript/Flask, and work with databases via SQL interfaces such as those in Spark, plus frameworks like PyTorch and TensorFlow, etc. You don't have to be a cloud wizard to
take cloud computing, but you definitely do need to walk in with a strong
comfort level around systems topics!
Video links:
- I keep some videos and pptx files about our work
here.
- SOSP '15
History Day talk on fault-tolerance and consistency, the CATOCS
controversy, and the modern-day CAP conjecture. My video is
here and an accompanying essay is
here.
- Robbert van Renesse and me discussing how we got into this area of
research: here.
My Textbook (last revised in 2012):
Guide to Reliable Distributed Systems: Building High-Assurance
Applications and Cloud-Hosted Services. Click
here to get to my cloud computing course, which has slide sets and other
materials that include some lectures strongly tied to content from the book. You are welcome to use these in
your own courses if you like. The 2018 slide set was one of the outcomes of my 2016-2017 sabbatical, during which I visited widely and, hopefully, came home with an updated appreciation of the contemporary perspectives seen in industry. But this means that by now,
I've departed significantly from the treatment in the book; earlier
slide sets that are closer to the book treatment can be found in
http://www.cs.cornell.edu/courses/cs5412/XXXXsp, where XXXX would be the
year. There was no 2013 or 2017 offering.
The bad news is that the material evolves at a breathtaking pace, which is why I keep revising the slides. Naturally, this also means that the book is already out of date. I don't have the time to revise it right now.
Older work. I've really worked in Cloud Computing for most
of my career, although it obviously wasn't called cloud computing in the early
days. As a result, our papers in this area date back to 1985.
Some examples of mission-critical systems on which my software was used in
the past include the New York Stock Exchange and Swiss Exchange, the French Air
Traffic Control system, the AEGIS warship and a wide range of applications in
settings like factory process control and telephony. In fact, every stock quote
or trade on the NYSE from 1995 until early 2006 was reported to the overhead
trading consoles through software I personally implemented - a cool (but also
scary) image, for me at least! During the ten years this system was running, many computers crashed during the trading day and many network problems occurred, but the design we developed and implemented reconfigured itself automatically and kept the overall system up, without exception. The exchange didn't have a single trading disruption during the entire period. As far as I know, the other organizations listed above have similar
stories to report.
Today, these kinds of ideas are gaining "mainstream" status. For example,
IBM's WebSphere 6.0 product includes a multicast layer used to replicate data
and other runtime state for high-availability web service applications and web
sites. Although IBM developed its own implementation of this technology, we've
been told by the developers that the architecture was based on Cornell's Horus
and Ensemble systems, described more fully below. The CORBA architecture
includes a fault-tolerance mechanism based on some of the same ideas. And we've
also worked with Microsoft on the technology at the core of the next generation
of that company's clustering product. So, you'll find Cornell's research not
just on these web pages, but also on web sites worldwide and in some of the
world's most ambitious data centers and high availability computing systems.
In fact we still have very active dialogs with many of these companies:
Cisco, IBM, Intel, Microsoft, Amazon, and others. An example of a more recent
dialog is this: a few years ago we worked with Cisco to invent a new continuous
availability option for their core Internet routers, the CRS-1 series. You can
read about this work
here.
My group often works with vendors and industry researchers. We maintain a
very active dialog with the US government and military on research challenges
emerging from the future-generation communication systems now being planned by organizations like the Air Force and the Navy. We've even worked on new ways of
controlling the electric power grid, but not in time to head off the big
blackout in 2003! Looking to the future, we are focused on needs arising in
financial systems, large-scale military systems, and even health-care networks.
(In this connection, I should perhaps mention that although we do get research
support from the government and the US military, none of our research is
classified or even sensitive, and all of it focuses on widely used commercial
standards and platforms. Most of our software is released for free, under open
source licenses.)
I'm just one of many members of a group in this area at Cornell. My closest
colleagues and co-leaders of the group are Robbert van Renesse and Hakim
Weatherspoon. But the systems group is very strong and broad right now, and the
three of us have great relationships and collaborations with many other systems faculty here at Cornell: both in the systems area within CS, and also colleagues in ECE (where we have great ties), MAE, and IS, as well as down in New York City, where a few faculty are members of our fast-growing New York City technology "outpost" on Roosevelt Island.
Four generations of reliable distributed systems research!
Overall, our group has developed three generations of technology and is now working on a fourth-generation system: the Isis Toolkit, developed mostly during 1987-1993; the Horus system, developed from 1990 until around 1995; and the Ensemble system, 1995-1999. Right now we're developing a number of new systems, including Isis2, Gradient, and the reliable TCP solution mentioned above, and working with others to integrate those solutions into settings where reliability, security, consistency and scalability are make-or-break requirements.
Older Research web pages:
Live Objects,
Quicksilver, Maelstrom, Ricochet and Tempest projects
Ensemble project
Horus project
Isis Toolkit (really
old stuff! This is from the very first version of Isis). A collection of papers on Isis that Robbert van Renesse and I edited may still be available -- it was called Reliable Distributed Computing with the Isis Toolkit and appeared in the IEEE Press Computer Science series.
Graduate Studies in Computer Science at Cornell: At this
time of the year, we get large numbers of inquiries about our PhD program. I
want to recommend that people interested in the program not contact faculty
members like me directly with routine questions like "can your research group
fund me".
As you'll see from the web page, Cornell does admissions by means of a
committee, so individual faculty members don't normally play a role. This is
different from many other schools -- I realize that at many places, each faculty
member admits people into her/his own group. But at Cornell, we admit you first,
then you come here, and then you affiliate with a research group after a while.
Funding is absolutely guaranteed for people in the MS/PhD program during the
whole time they are at Cornell. On the other hand, students in the MEng program
generally need to pay their own way.
Obviously, some people have more direct, specific questions, and there is no
problem sending those to me or to anyone else. But as for the generic "can I
join your research group?" the answer is that while I welcome people into the
group if they demonstrate good ideas and talent in my area, until you are here
and take my graduate course and spend time talking with me and my colleagues,
how can we know if the match is good? And most such inquiries are from people
who haven't yet figured out quite how many good projects are underway at
Cornell. Perhaps, on arrival, you'll take Andrew Myers's course in language-based security and will realize this is your passion. So at Cornell, we urge you to
take time to find out what areas we cover and who is here, to take some courses,
and only then affiliate with a research group. But please knock on my door any
time you like! I'm more than happy to talk to any student in the department
about anything we're doing here!
Photo credit: Dave Burbank