CS 614:
Advanced Course in Computer Systems
Spring 2001
(Ken Birman)
CS 614 is a graduate level course in computing
systems, with a strong focus on operating systems and distributed
computing. The course is aimed at PhD
students seeking broad background in the areas of importance to modern systems
researchers, or with an interest in possible research topics. The course involves a considerable amount of
reading, and all students who take the class are expected to read all the papers
for the experience to be satisfactory.
It is hoped that the class will be small, in which
case we’ll share the teaching load.
Students in the class will be asked to present one or more of the
lectures (I'll do some too), so that every student gives at least one or two presentations.
Each lecture will consist of a formal presentation, normally covering
several papers, and is expected to go into considerable depth by focusing on some
aspect of the material and treating it thoroughly. We'll be using PowerPoint slides (we can make the slides
available on the web if anyone wants them).
A presentation will normally last perhaps 30 to 45 minutes. The lecture period will be followed by a
broader discussion that I’ll lead. All
students in the class are expected to participate.
In lieu of homework assignments, students in the
class will be asked to do some sort of operating systems
research project. You can select a
topic of interest to you; some topics may be more theoretical, but most should
be fairly practical. At the end of the
semester, you’ll write up the results of your work in the form of a short paper
conforming to the format and submission rules used by the Symposium on Operating
Systems Principles, which is the major conference in this area. Many of the papers we’ll be reading were
originally published in this conference.
Tips on preparing a presentation appear at the end of
this document.
The topics we’ll cover span a range of hot areas
within modern systems, while avoiding database research (we do touch on
transactional computing models as used in systems, but we won't do this in the
context of databases). In particular,
we’ll study some of the major results in the theory of distributed computing
(covering topics such as Byzantine Agreement, asynchronous consensus, time and
clocks and ordering, consistent cuts and probabilistic protocols). We’ll look at systems issues relating to
high performance communication, including various approaches to reducing message
latency and improving response times.
We’ll study the problem of replication and consistency in distributed
systems, looking at virtual synchrony, Paxos, and some of the other major
systems and models for the area, and we’ll look at the way that operating
systems are typically structured to accommodate protocols. We’ll then shift attention to language-level
issues including language embeddings of transactions and replication, mobility
and language-based security, and object-oriented computing architectures.
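
To give a flavor of the "time and clocks and ordering" material (Lamport's 1978 paper appears on the reading list below), here is a minimal, illustrative sketch of a logical clock. The class and method names are my own, chosen for the example, and are not taken from any of the assigned papers.

```python
# A minimal sketch of Lamport's logical clocks: each process keeps a counter,
# increments it on local events, stamps outgoing messages, and advances past
# any larger timestamp it receives. This preserves the "happened-before" order.

class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        # Timestamp attached to an outgoing message.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Advance past the sender's timestamp, then count the receive event.
        self.time = max(self.time, msg_time) + 1
        return self.time


if __name__ == "__main__":
    p, q = LamportClock(), LamportClock()
    t = p.send()           # p sends a message stamped t = 1
    q.receive(t)           # q's clock jumps past t, preserving causal order
    print(p.time, q.time)  # prints: 1 2
```
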
In each of these areas, I’ll identify two to four
published papers, mostly from the ACM online archives or from readily
accessible journals and conferences.
I’m hoping to avoid using real paper, so as much as possible we’ll work
from URLs and scanned documents. A
typical lecture will be drawn out of one or more papers, but you really are
expected to read all of the relevant papers (and even some of the background
papers we don’t include), so the real challenge is to distill a tremendous
amount of information into a pithy presentation of the right length and
depth. During the discussion following
each presentation, we’ll try to understand the broader context and importance
of each result.
The papers we'll be reading are dense, technically
meaty papers, and the issues they raise are very broad. Expect to be challenged both by the amount
of reading and by the difficulty of synthesis – placing work into a broader
context that makes sense.
There is no point in taking cs614 if you lack a
solid background in the basics.
Accordingly, we’ll assume that the students in the class have all taken
(and mastered) the material covered in a conventional operating systems class,
an architecture class, and an undergraduate database class – cs314, cs414 and
cs432.
Schedule (date, presenter, topic, readings)

1/23  Organizational meeting

1/25  Ken: Byzantine Agreement [kpb1]
  - Michael J. Fischer, Nancy A. Lynch, and Michael Merritt. Easy impossibility proofs for distributed consensus problems. Distributed Computing, 1(1):26-39, 1986.
  - M. Rabin. Randomized Byzantine Generals. Proceedings 23rd FOCS (1983), 403-409.
  - Leslie Lamport, Robert Shostak and Marshall Pease. The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems 4(3):382-401, July 1982.
  - Vassos Hadzilacos and Sam Toueg. A modular approach to fault-tolerant broadcast and related problems. Dept. of Computer Science TR 94-1425, Cornell University (May 1994).
1/30  Ken
  - Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374-382, April 1985.
  - The weakest failure detector for solving consensus; Tushar Deepak Chandra, Vassos Hadzilacos and Sam Toueg; J. ACM 43, 4 (Jul. 1996), Pages 685-722.

2/1  Tony Faradjian
  - Time, Clocks and the Ordering of Events in a Distributed System. Leslie Lamport. Communications of the ACM 21:7 (July 1978), 558-565.

2/6  Ken
  - Using Time Instead of Timeout for Fault-Tolerant Distributed Systems. Leslie Lamport. ACM Transactions on Programming Languages and Systems 6:2 (April 1984), 254-280.

2/8  Ken
  - Optimal clock synchronization; T. K. Srikanth and Sam Toueg; J. ACM 34, 3 (Jul. 1987), Pages 626-645.
  - Barbara Simons, Jennifer Welch, Nancy Lynch. An Overview of Clock Synchronization. Springer-Verlag LNCS 448, 84-96, 1990.
  - Fred B. Schneider. Understanding protocols for Byzantine Clock Synchronization. Dept. of Computer Science TR 87-859, Cornell University (August 1987).
  - Probabilistic Internal Clock Synchronization. Flaviu Cristian and Christof Fetzer.
  - Lower bounds for convergence function based clock synchronization; Christof Fetzer and Flaviu Cristian; Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing, 1995, Pages 137-143.
2/13  Ken
  - Distributed snapshots: determining global states of distributed systems; K. Mani Chandy and Leslie Lamport; ACM Trans. Comput. Syst. 3, 1 (Feb. 1985), Pages 63-75.

2/15  Sergio Wong
  - Distributed programming in Argus; Barbara Liskov; Commun. ACM 31, 3 (Mar. 1988), Pages 300-312.
  - Implementation of Argus; B. Liskov, D. Curtis, P. Johnson and R. Scheifler; Proceedings of the Eleventh ACM Symposium on Operating systems principles, 1987, Pages 111-122.
  - An Introduction to Nested Transactions. J. Eliot B. Moss. NCSTRL Document ID ncstrl.umassa_cs/UM-CS-1986-041.
  - Experience with transactions in QuickSilver; Frank Schmuck and Jim Wyllie; Proceedings of the thirteenth ACM symposium on Operating systems principles, 1991, Pages 239-253.
  - Lightweight recoverable virtual memory; M. Satyanarayanan, Henry H. Mashburn, Puneet Kumar, David C. Steere and James J. Kistler; Proceedings of the fourteenth ACM symposium on Operating systems principles, 1993, Pages 146-160.
2/20  Robbert van Renesse
  - Distributed process groups in the V Kernel; David R. Cheriton and Willy Zwaenepoel; ACM Trans. Comput. Syst. 3, 2 (May 1985), Pages 77-107.
  - The process group approach to reliable distributed computing; Kenneth P. Birman; Commun. ACM 36, 12 (Dec. 1993), Pages 37-53.
  - Reliable communication in the presence of failures; Kenneth P. Birman and Thomas A. Joseph; ACM Trans. Comput. Syst. 5, 1 (Feb. 1987), Pages 47-76.
  - Lightweight causal and atomic group multicast; André Schiper, Kenneth Birman and Pat Stephenson; ACM Trans. Comput. Syst. 9, 3 (Aug. 1991), Pages 272-314.
  - Horus: a flexible group communication system; Robbert van Renesse, Kenneth P. Birman and Silvano Maffeis; Commun. ACM 39, 4 (Apr. 1996), Pages 76-83.
  - Building reliable, high-performance communication systems from components; Xiaoming Liu, Christoph Kreitz, Robbert van Renesse, Jason Hickey, Mark Hayden, Kenneth Birman and Robert Constable; Proceedings of the seventeenth ACM symposium on Operating systems principles, 1999, Pages 80-92.

2/22  Sabina Petride
  - Implementing fault-tolerant services using the state machine approach: a tutorial; Fred B. Schneider; ACM Comput. Surv. 22, 4 (Dec. 1990), Pages 299-319.
  - The part-time parliament; Leslie Lamport; ACM Trans. Comput. Syst. 16, 2 (May 1998), Pages 133-169.
  - Revisiting the Paxos Algorithm. R. De Prisco, Butler Lampson, Nancy Lynch. Workshop on Distributed Algorithms (WDAG'97, Saarbrucken, Germany, September 1997), volume 1320 of Lecture Notes in Computer Science, pages 111-125. Springer-Verlag, 1997.
2/27  Ken
  - Group Communication Specifications: A Comprehensive Study. Roman Vitenberg, Idit Keidar, Gregory V. Chockler and Danny Dolev. Accepted for publication, Computing Surveys.
  - On the impossibility of group membership; Tushar Deepak Chandra, Vassos Hadzilacos, Sam Toueg and Bernadette Charron-Bost; Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing, 1996, Pages 322-330.

3/1  Kevin O'Neill
  - The Design of the Transis System. Danny Dolev and Dahlia Malkhi. Technical report, Hebrew University (Jerusalem), 1995.
  - Totally Ordered Broadcast in the Face of Network Partitions: Exploiting Group Communication for Replication in Partitionable Networks. Idit Keidar and Danny Dolev.

3/6  Abhinandan Das
  - Multicast routing in datagram internetworks and extended LANs; Stephen E. Deering and David R. Cheriton; ACM Trans. Comput. Syst. 8, 2 (May 1990), Pages 85-110.
  - Scalable Reliable Multicast (SRM) and Application Level Framing (ALF) (several papers).
  - Reliable Multicast Transport Protocol (RMTP) (paper 1; paper 2).
  - PGM: Pretty Good Multicast (PGM) Transport Protocol Specification. Tony Speakman.
3/8  Kate Jenkins
  - Epidemic algorithms for replicated database maintenance; Alan Demers, Dan Greene, Carl Hauser, Wes Irish and John Larson; Proceedings of the Sixth Annual ACM Symposium on Principles of distributed computing, 1987, Pages 1-12.
  - Bimodal multicast; Kenneth P. Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu and Yaron Minsky; ACM Trans. Comput. Syst. 17, 2 (May 1999), Pages 41-88.
  - Managing update conflicts in Bayou, a weakly connected replicated storage system; D. B. Terry, M. M. Theimer, Karin Petersen, A. J. Demers, M. J. Spreitzer and C. H. Hauser; Proceedings of the fifteenth ACM symposium on Operating systems principles, 1995, Pages 172-182.
  - Flexible update propagation for weakly consistent replication; Karin Petersen, Mike J. Spreitzer, Douglas B. Terry, Marvin M. Theimer and Alan J. Demers; Proceedings of the sixteenth ACM symposium on Operating systems principles, 1997, Pages 288-301.

3/13  Jeff Hoy
  - Performance of the Firefly RPC; Michael D. Schroeder and Michael Burrows; ACM Trans. Comput. Syst. 8, 1 (Feb. 1990), Pages 1-17.
  - Lightweight remote procedure call; Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska and Henry M. Levy; ACM Trans. Comput. Syst. 8, 1 (Feb. 1990), Pages 37-55.
  - Scheduler activations: effective kernel support for the user-level management of parallelism; Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska and Henry M. Levy; ACM Trans. Comput. Syst. 10, 1 (Feb. 1992), Pages 53-79.
3/15  Mike Frei: Novel Communications Architectures [kpb16]
  - Active messages: a mechanism for integrating communication and computation; Thorsten von Eicken, David E. Culler, Seth Copen Goldstein and Klaus Erik Schauser; 25 years of the international symposia on Computer architecture (selected papers), 1998, Pages 430-440.
  - U-Net: a user-level network interface for parallel and distributed computing (includes URL); T. von Eicken, A. Basu, V. Buch and W. Vogels; Proceedings of the fifteenth ACM symposium on Operating systems principles, 1995, Pages 40-53.

3/20  No class: spring break

3/22

3/27  Brian Cody: Modularity and costs [kpb17]
  - N. C. Hutchinson and L. L. Peterson. The x-Kernel: An architecture for implementing network protocols. IEEE Transactions on Software Engineering, 17(1):64-76, Jan. 1991.
  - Protocol implementation using integrated layer processing; Torsten Braun and Christophe Diot; Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, 1995, Pages 151-161.
  - Masking the overhead of protocol layering; Robbert van Renesse; Conference proceedings on Applications, technologies, architectures, and protocols for computer communications, 1996, Pages 96-104.

3/29  Panos Papadimitratos
  - Rover: a toolkit for mobile information access; A. D. Joseph, A. F. de Lespinasse, J. A. Tauber, D. K. Gifford, and M. F. Kaashoek; Proceedings of the fifteenth ACM symposium on Operating systems principles, December 3-6, 1995, Copper Mountain, CO, USA, Pages 156-171.
  - Mobile Computing with the Rover Toolkit; Anthony D. Joseph, Joshua A. Tauber, and M. Frans Kaashoek; IEEE Transactions on Computers: Special issue on Mobile Computing, 46(3), March 1997.
  - Isolation-Only Transactions for Mobile Computing; Lu, Q., Satyanarayanan, M.; Operating Systems Review, Apr. 1994, Vol. 28, No. 2, pp. 81-87.
  - Managing update conflicts in Bayou, a weakly connected replicated storage system; D. B. Terry, M. M. Theimer, Karin Petersen, A. J. Demers, M. J. Spreitzer and C. H. Hauser; Proceedings of the fifteenth ACM symposium on Operating systems principles, 1995, Pages 172-182.
  - Flexible update propagation for weakly consistent replication; Karin Petersen, Mike J. Spreitzer, Douglas B. Terry, Marvin M. Theimer and Alan J. Demers; Proceedings of the sixteenth ACM symposium on Operating systems principles, 1997, Pages 288-301.
4/3  Riccardo Pucella: Tuple spaces [kpb19]
  - Linda in context; Nicholas Carriero and David Gelernter; Commun. ACM 32, 4 (Apr. 1989), Pages 444-458.
  - Generative communication in Linda; David Gelernter; ACM Trans. Program. Lang. Syst. 7, 1 (Jan. 1985), Pages 80-112.
  - The Jini architecture for network-centric computing; Jim Waldo; Commun. ACM 42, 7 (Jul. 1999), Pages 76-82.
  - JavaSpaces Technology, Sun Microsystems Inc.

4/5  Matthew Fluet
  - Rick Rashid. The Mach Microkernel.
  - Experiences with the Amoeba distributed operating system; Andrew S. Tanenbaum, Robbert van Renesse, Hans van Staveren, Gregory J. Sharp and Sape J. Mullender; Commun. ACM 33, 12 (Dec. 1990), Pages 46-63.

4/10  Rohit Fernandes
  - Extensibility, safety and performance in the SPIN operating system; B. N. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. E. Fiuczynski, D. Becker, C. Chambers and S. Eggers; Proceedings of the fifteenth ACM symposium on Operating systems principles, 1995, Pages 267-283.
  - Exokernel: an operating system architecture for application-level resource management; D. R. Engler, M. F. Kaashoek and J. O'Toole; Proceedings of the fifteenth ACM symposium on Operating systems principles, 1995, Pages 251-266.
  - Application performance and flexibility on Exokernel systems; M. Frans Kaashoek, Dawson R. Engler, Gregory R. Ganger, Héctor M. Briceño, Russell Hunt, David Mazières, Thomas Pinckney, Robert Grimm, John Jannotti and Kenneth Mackenzie; Proceedings of the sixteenth ACM symposium on Operating systems principles, 1997, Pages 52-65.
4/12  Clive Saha: Scalable clusters [kpb22]
  - Vogels, W., Dumitriu, D., Birman, K., Gamache, R., Short, R., Vert, J., Massa, M., Barrera, J., and Gray, J. "The Design and Architecture of the Microsoft Cluster Service -- A Practical Approach to High-Availability and Scalability." Proceedings of the 28th Symposium on Fault-Tolerant Computing, Munich, Germany, June 1998.
  - Frangipani: a scalable distributed file system; Chandramohan A. Thekkath, Timothy Mann and Edward K. Lee; Proceedings of the sixteenth ACM symposium on Operating systems principles, 1997, Pages 224-237.

4/17  Ashish Motivala
  - Using Group Communication Technology to Implement a Reliable and Scalable Distributed IN Coprocessor. Roy Friedman and Ken Birman. NCSTRL Document ID ncstrl.cornell/TR96-1605.
  - Manageability, availability and performance in Porcupine: a highly scalable, cluster-based mail service; Yasushi Saito, Brian N. Bershad and Henry M. Levy; Proceedings of the seventeenth ACM symposium on Operating systems principles, 1999, Pages 1-15.

4/19  Sunny Gleason
  - Measurements of a distributed file system; Mary G. Baker, John H. Hartman, Michael D. Kupfer, Ken W. Shirriff and John K. Ousterhout; Proceedings of the thirteenth ACM symposium on Operating systems principles, 1991, Pages 198-212.
  - File system usage in Windows NT 4.0; Werner Vogels; Proceedings of the seventeenth ACM symposium on Operating systems principles, 1999, Pages 93-109.

4/24  Ke Wang
  - Integrating security in a large distributed system; M. Satyanarayanan; ACM Trans. Comput. Syst. 7, 3 (Aug. 1989), Pages 247-280.
  - Scale and performance in a distributed file system; John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham and Michael J. West; ACM Trans. Comput. Syst. 6, 1 (Feb. 1988), Pages 51-81.
  - Caching in the Sprite network file system; Michael N. Nelson, Brent B. Welch and John K. Ousterhout; ACM Trans. Comput. Syst. 6, 1 (Feb. 1988), Pages 134-154.
4/26  Kamen Yotov
  - The design and implementation of a log-structured file system; Mendel Rosenblum and John K. Ousterhout; ACM Trans. Comput. Syst. 10, 1 (Feb. 1992), Pages 26-52.
  - The Zebra striped network file system; John H. Hartman and John K. Ousterhout; Proceedings of the fourteenth ACM symposium on Operating systems principles, 1993, Pages 29-.

5/1  Yong Yao
  - File-system development with stackable layers; John S. Heidemann and Gerald J. Popek; ACM Trans. Comput. Syst. 12, 1 (Feb. 1994), Pages 58-89.
  - The Echo Distributed File System. Andrew D. Birrell, Andy Hisgen, Chuck Jerian, Timothy Mann, and Garret Swart.

5/3  Mark Zarins: Nomadic and mobile computing [kpb28]
  - Primarily Disconnected Operation: Experiences with Ficus. John S. Heidemann, Thomas W. Page, Jr., Richard G. Guy, and Gerald J. Popek.
  - Disconnected operation in the Coda File System; James J. Kistler and M. Satyanarayanan; ACM Trans. Comput. Syst. 10, 1 (Feb. 1992), Pages 3-25.
How to prepare a cs614
presentation
Each time we meet, we’ll start with a fairly short presentation
by some student (or by Ken), lasting perhaps 45 minutes, after which there will
be a discussion lasting an additional half hour or so. Preparing the lecture is a significant
challenge and you should expect to spend as much as a week or two working on it
before the date of your presentation.
For each date, there are several suggested
references. Any one of these papers
could be the subject of a full-length lecture in itself, and several would take
multiple lectures to cover in any depth.
Accordingly, you face a tough choice: either learn to speak at an
extremely rapid rate, which will be completely incomprehensible, or select
within the material, focusing on a core aspect or idea and leaving the
surrounding material for others to either absorb simply by virtue of having
done the readings, or to raise during the discussion.
Your choice of material and approach to the
presentation should satisfy several properties:
1)
Within
45 minutes you only have time to convey a very small number of technical
points. For example, you probably have
time to state a single theorem and to present its proof, but even if the paper
contains many theorems, unless they are very simple you won’t have time to talk
about more than one (perhaps two).
2)
Every
talk needs to be placed in context. You
need to plan on starting by spending at least a few minutes (10 would be a good
estimate) explaining what the problem attacked by the authors was, why this was
an important problem (what alternative ideas were being advanced at the time?)
and what prior work people need to be aware of. You should say enough about this prior work to place the results
you’ll focus on into a broader setting.
3)
You
should tell us what you’ll be telling us, but in a non-technical way. For example, if you are presenting the
“consistent cuts” paper, give some intuition into what a consistent cut is all
about before trying to define the idea formally. As you move into the formal material, remind
us of this intuition and show us how the formal definition captures the
intuitive one. Your job is to make the
formalism seem natural. If you are
talking about a systems result, try and talk about the thing the authors are
doing from a high level – why did this excite them, and the community? What was the key question they were
interested by, and what was their “angle”?
4)
Now,
perhaps 10 to 15 minutes into your “slot” you’ll get to the more detailed
material. This is where you might
briefly explain a formalism and present a theorem and its proof, or lay out the
major architectural pieces of a system and explain how they were put together.
5)
In
the second half of the technical part of your talk, you might illustrate the
way that a proof works using a small example, or present the experimental data
associated with a systems paper.
Systems people tend to be nearly obsessive about experimental data. It should be presented lovingly with
attention to the insights that can only be gained by really building something
and putting it through its paces. Where
are the fundamental costs in an approach?
How well does it scale? Are
there little warts and hidden problems that the data reveals?
6)
In
the final part of your talk, you should wrap things together by reminding us
what the original premise of the paper you presented was, how the authors
attacked the problem, and what the data revealed about the limitations and
successes of their approach. Express
your opinions at this stage. For a
theory talk, try and put the results back into the broader context. Do they say something broad and powerful, or
is the main insight narrow?
Your talk will need to be on PowerPoint slides (or some similar system, if you
prefer something else). PowerPoint makes suggestions about font sizes, but in fact you can certainly pack more on a
slide than is possible using the initial choices. Keep in mind that your audience has to be able to read the slides
when projected on a screen at a sensible size!
Either make plastic slides, or have your material on a laptop that can
connect to the overhead display system.
PLEASE DON’T MAKE US WAIT 10 MINUTES WHILE YOU FIDDLE WITH CONNECTIONS
AT THE START OF CLASS. COME EARLY AND
MAKE SURE YOUR DISPLAY SETUP WILL WORK.
DO IT THE DAY BEFORE IF POSSIBLE.
People
who have never given formal talks before often make one of two kinds of
mistakes: they estimate their own pace incorrectly, or they simply make far too
many slides. Generally, figure that you
can cover one slide with text on it per minute. About half of your material should be graphics or illustrations
of some sort – people get bored with endless text.
So,
for a 45-minute talk, you would probably make up something like 30 to 35 slides
of actual text and an additional 10 to 15 that are mostly pictures you'll explain
during the talk. Often you can copy
some of this material right out of the online versions of papers or from the
web site of the author. However, we
probably won’t have a network connection in the lecture room, so your slides do
need to be self contained.
A
fast speaker who is comfortable in front of a group and familiar with her
slides could perhaps use as many as 60 slides for a 45 minute talk. But she would need to have relatively little
text on each slide to get away with this.
Feel
free to stray from the papers I’ve suggested.
But do read the papers I suggested.
It may be wise to start reading them two weeks early, since many are
tough, high-content papers with a lot of meat to them. People attending the class really should
read them too, but probably will do so the day before your talk and won’t have
time to really pick away at the material and think hard about it. So your job is to take them to a deeper
level of understanding than they will have gotten in reading the papers quickly
over a few beers. And their job is to
have read the papers carefully enough to contribute to a spirited discussion in
class.
Each
student is expected to undertake a project as part of cs614. A typical project would result in a short
paper of the sort we’ll be reading – think in terms of the format and length
associated with papers that appear in the top ACM SIGOPS conference, Symposium
on Operating Systems Principles. These
papers typically take on some question or new idea and present it in a
systematic manner, including material appropriate to evaluating the idea. Your idea might be about operating systems
architecture, protocols, network-level mechanisms (we won’t have time to cover
these in cs614, unfortunately, but they are “fair game” for your project),
etc. If your work is more formal, a
treatment focused on a theoretical topic would be fine, in which case the
better conference to look at for ideas and format is ACM Principles of
Distributed Computing (PODC). For practical
work, you might undertake an implementation, use simulation, or evaluate
existing systems in an aggressive way.
The hope is that your cs614 project will be the sort of work that could be published,
and that could lead to further research.
[kpb1]This is a big – huge, really – topic for the theory community. I picked a few representative papers but there are probably 200 or 300 papers, literally, in the area. The first paper uses elegant geometrical symmetry arguments to demonstrate the various Byzantine impossibility results. The second one is a ground-breaking paper that overcomes the basic Byzantine limits using a probabilistic algorithm (a theme to which we’ll return later in the course, because it represents the most important emerging area in scalable distributed computing – although the Byzantine model is not really one we use very much). The last papers include the original one introducing the problem and a survey by Hadzilacos and Toueg looking at a variety of solutions, including some of the probabilistic ones that people currently favor.
[kpb2] Whereas the Byzantine work was mostly done in the synchronous model, consensus is posed in a purely asynchronous one. The general view is that impossibility results for the asynchronous model hold in the real world, although this is debatable (and we should debate it). The FLP theorem is quite hard to understand – we’ll need to look at it carefully to gain intuition into the way it works. The Chandra/Toueg result is the best known for the area, but depends on a type of failure detector that, in practice, can only be approximated.
[kpb3] This paper introduces the idea of using “logical” time rather than physical time when designing distributed programs. We’ll be using this concept (indirectly) in many ways throughout the course, so we’ll spend a whole lecture on it.
[kpb4] To finish off the whole issue of time, we'll look at a more subtle paper that combines logical and real time. This paper basically introduces the "state machine" approach to fault-tolerance, although it also does some other things with time.
[kpb5] Overall, clock synchronization is a natural topic to consider in light of the preceding papers. The lecture should probably focus on the surveys, although I’ve included two very good synchronization algorithms.
[kpb7] The idea of a consistent cut, and its applications, is a natural outgrowth of the idea of logical clocks.
[kpb8] Transactions were the hot topic in distributed computing during the early 1980s. But over time the luster faded, and by the time of Schmuck's paper on QuickSilver only a few pieces of the model were being retained. There are too many papers for this one meeting, so we'll need to focus on some sub-topics.
[kpb9] Group communication emerged for two reasons: as a natural way to group server programs into services, but also (simultaneously) as a way to take the transactional model apart, retaining a form of transaction-like semantics over a set of processes without the negatives of the transactional model itself (which by this time was seen as costly, slow and restrictive).
[kpb10] By now, we’re back to theory: the style of replication supported by groups is called “state machine” replication. Fred wrote a great tutorial on this. Meanwhile, Lamport developed his own approach to implementing state machine replication in groups. A puzzle: how does this relate to what we just discussed in the previous lecture? (Hint: there is a paper by Dolev, Keidar and, I think, Chockler on precisely this question).
[kpb11] We’ll look at the problem of membership agreement. For a while, people thought that membership protocols evaded consensus. By now we know better and hence know that group membership is “impossible” in a purely asynchronous model. Question: is group membership possible in a practical model? By the way, there is a flaw in Aleta’
[kpb12] A sample of some of the other work on multicast. Like transactions there are dozens of systems and papers in this area.
[kpb13] Meanwhile, the network community was inventing the same idea, but with a focus on scalability and a weak notion of reliability. When presenting this paper, it might be useful to pull up a recent technical report by Oznur Ozkasap, me and Zhen Xiao giving some overhead measurements for SRM. Network multicast remains a very hot research area. PGM is one focus of activity, and RMTP-II is a second one, but both are proprietary. SRM has issues but is free and hence widely used.
[kpb14] Our recent focus here at Cornell mixes scalable multicast with some ideas about gossip that were first studied in depth by Demers while he was at Xerox Parc.
[kpb15] Back to the systems community. Their focus, as one might have guessed, is on performance... at the microsecond and cache-stall level. These papers are mostly focused on scheduling issues.
[kpb16] Von Eicken's work was revolutionary – he tossed out the whole traditional stack to get higher speed, and it worked too! Microsoft ships this approach as its new "Winsock" architecture for Windows 2000.
[kpb17] A tough call for me – I was tempted to assign my own paper, "The Next Generation Internet: Unsafe at Any Speed?" But although I'm fond of my own papers, I decided we should follow the community, which has spent the last few years struggling with perspectives involving modularity.
[kpb18] Ben proposed to focus on nomadic and mobile computing, so these references are changed from the original handout.
[kpb19] Another cute idea. Core of Jini, Sun's new architecture for Java.
[kpb20] An endless debate for the OS community. Seems to have ended inconclusively.
[kpb21] The OS answer to Java?
[kpb22] Over the past 3 years, a major research focus. But waiting for some sort of breakthrough. The work seems sort of dull.
[kpb23] A glimpse of some past Cornell work on this topic, side by side with an “award winning” paper from 1999.
[kpb24] Everyone needs to know their basic numbers....
[kpb25] Historically, a huge topic for the OS community
[kpb26] The interest in performance led to these novel architectures. There are others but these, and RAID, were the main ones. RAID could have been a good topic for us, but we are short on lecture slots!
[kpb27] People played with modularity in the file system, but it seems to have become a dead end.
[kpb28] Nomadic and mobile computing: the next frontier. So far, much of the work focuses on the usual paradigm and tries to emulate it. Time for something completely different? I considered doing some cache consistency papers at this point but we ran out of time...