CS614: Advanced Course in Computer Systems

Spring 2004

Instructor: Ken Birman
4119B Upson, x5-9199
Ken can be found in his office daily, 10am-2pm except at lunch time

TA: Kevin Walsh, Upson 5138 Phone: x5-9201
Kevin's office hours are Tuesday and Thursday before class, 10-11:30

Classes: Tuesdays and Thursdays, 11:40 - 12:55am; OH165

CS 614 is a graduate level course in computing systems, with a strong focus on operating systems and distributed computing. The course is aimed at PhD students seeking broad background in the areas of importance to modern systems researchers, or having an interest in possible research topics. The course involves a considerable amount of reading, and all students who take the class are expected to read the papers for the experience to be satisfactory. Most papers are chosen from the best conferences and journals in systems, and most are available on-line. The papers we�ll be reading are dense, technically meaty papers and the issues they raise are very broad. Expect to be challenged both by the amount of reading and by the difficulty of synthesis � placing work into a broader context that makes sense.

The topics we�ll cover span a range of hot areas within modern systems. The course has three parts. We'll start out with design and performance considerations of operating systems. We'll then shift our attention to the design, performance, and scalability of distributed systems. Finally, we'll look into issues of fault tolerance. Each lecture features two papers. The papers were selected to represent the topic of the lecture, and are often related in some way or another. For example, one paper may be theoretical, while the other practical. Or one paper may propose a paradigm, while the other critiques it. Or the paper may propose alternative paradigms for the same problem.

It is hoped that the class will be small, in which case we�ll share the teaching load. Students in the class will be asked to present one or more of the lectures � I�ll do some too � in such a manner as to ensure that all the students do at least one presentation. Each lecture will consist of a formal presentation, and is expected to go into considerable depth by focusing on some aspect of the material and treating it thoroughly. We�ll be using Powerpoint slides (we'll make the slides available on the web). A presentation will normally last perhaps 30 to 45 minutes. (Each presenter is required to meet with either me or the TA before class in order to discuss the slides.) The lecture period will be followed by a broader discussion that I�ll lead (or the TA should I not be able to make class). During the discussion following each presentation, we�ll try to understand the broader context and importance of each result. All students in the class are expected to participate.

Thorough preparation is mandatory, even if you are not presenting the paper. Before each class, students are required to email to me (ken@cs.cornell.edu) two paragraphs discussing the contributions and at least one major weakness of each of the two papers (one paragraph for each paper). If email is impossible, submit the two paragraphs on paper to me in class.

In addition to reading papers and lecturing, each student will do a project and write a high-quality paper about the project. The paper has to conform to the format and submission rules used by the Symposium on Operating Systems Principles, which is the major conference in this area. Many of the papers we�ll be reading were originally published in this conference. Tips on preparing a paper appear appear here. The paper will be done in four stages:

In the first stage, choose an area based on your interest and background reading, such as replication or multicast. Email your choice to me by January 31, 2003.
In the second stage, do a literature survey, and write a short survey paper (2-3 pages) describing the literature in the field. This paper is due February 20th. Also choose the topic of your paper, and discuss this with me before February 25th.
In the third stage, write a full paper on your topic. This is paper due April 15th. Your first-pass paper will be circulated among the class for anonymous peer review. The reviews are due April 22th.
In the fourth and final stage, revise your paper based on peer reviews. This final paper is due May 8th.

Grading

Grading will be based on the following:

15% Paper summaries.
15% Class participation. I will keep track of this.
10% Survey paper.
15% Project.
15% Final paper, first pass.
10% Review of papers.
20% Final paper + rebuttal, second pass.

Prerequisites

There is no point in taking cs614 if you lack a solid background in the basics. Accordingly, we�ll assume that the students in the class have all taken (and mastered) the material covered in a conventional operating systems class and an architecture class � cs314, cs414. If you didn't earn an A in undergraduate operating systems, you may feel uncomfortable in cs614. An undergraduate database class is also highly recommended (cs432).

Who (click for slides)

Date

Topic

Required Reading

Suggested Reading

Ken Tu 1/27 Organizational Meeting none none

Ken Th 1/29 Concurrency, Threads, and Events. Using Threads in Interactive Systems. Hauser et al. 14th SOSP, Dec 1993. Goal-oriented programming, or composition using events,or threads considered harmful. Van Renesse. ACM SIGOPS European Workshop, Sep 1998.
Eraser. A Dynamic Data Race Detector for Multi-Threaded Programs. Savage et al. 16th SOSP, 1997.

SEDA: An Architecture for Well Conditioned, Scalable Internet Services. Welsh et al. 18th SOSP, Oct 2001.

William Josephson Tu 2/3 File Systems Soft Updates: A Solution to the Metadata Update problem in File Systems. Ganger et al. ACM TOCS 18(2). May 2000. A Fast File System for UNIX. McKusick et al. ACM TOCS 2(3), Aug 1984.
The Zebra striped network file system; Hartman and Ousterhout; 14th SOSP, 1993.

When to forget in the Elephant file system. Santry et al. 17th SOSP, 1999.

The Design and Implementation of a Log-Structured File System. Rosenblum and Ousterhout. 13th SOSP, Oct 1991.

Peter Sirokman Th 2/5 OS Kernels The Performance of �-Kernel-based Systems. H�rtig et al. 16th SOSP, Oct 1997. The UNIX Time-Sharing System. Richie and Thompson.
EMERALDS: A Small-Memory Real-Time Microkernel. Zuberi et al. 17th SOSP, 1999.

The Flux OSKit: A Substrate for OS and Language Research. Ford et al. 16th SOSP, Oct 1997.

Art Munson Tu 2/10 Virtual Memory Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Machines, Rashid et al. IEEE TOC C-37(8), Aug 1988. Integrating Segmentation and Page Protection for Safe, Efficient and Transparent Software Extensions. Chiueh et al. 17th SOSP, 1999.
Sharing and Protection in a Single Address Space Operating System. Chase et al. ACM TOCS, November 1994.

Virtual Memory Primitives for User Programs. Appel and Li. 4th ASPLOS, April 1991.

Mingsheng Hong Th 2/12 VM and I/O IO-Lite: A Unified I/O Buffering and Caching System. Pai et al. 3rd OSDI, Feb 1999. Fbufs: A high bandwidth cross-domain transfer facility. Druschel and Peterson. 14th SOSP, 1993.
A Trace-Driven Comparison of Algorithms for Parallel Prefetching and Caching. Kimbrel et al. 2nd OSDI, Oct 1996.

Software prefetching and caching for Translation Lookaside Buffers. Bala et al. 1st OSDI, Nov 1994.

Bernard Wong Tu 2/17 Multi-Processors Disco: Running Commodity Operating Systems on Scalable Multiprocessors. Bugnion et al. 16th SOSP, 1997. Towards Transparent and Efficient Software Distributed Shared Memory. Scales and Gharachorloo. 16th SOSP, 1997.
Performance Isolation: Sharing and Isolation in Shared-Memory Multiprocessors. Verghese et al. 8th ASPLOS, Oct 1998.

Tornado: maximizing locality and concurrency in a shared memory multiprocessor operating system. Gamsa et al. 3rd OSDI, Feb 1999.

Mingsheng Hong Th 2/19 Extensible Kernels Exokernel: an operating system architecture for application-level resource management. Engler et al. 15th SOSP, Dec 1995. Application performance and flexibility on Exokernel systems. Kaashoek et al. 16th SOSP, 1997.
Microkernels meet Recursive Virtual Machines. Ford et al. 2nd OSDI, Oct 1996.

Extensibility, Safety and Performance in the SPIN Operating System. Bershad et al. 15th SOSP, Dec 1995.

Add Distribution/Networking

Benyah Shaparenko Tu 2/24 RPC Implementing Remote Procedure Calls. Birrell and Nelson. ACM TOCS 2(1), Feb. 1984.
RPC in the x-Kernel: Evaluating New Design Techniques. Peterson et al. 12th SOSP, Nov. 1989.

Performance of Firefly RPC. Schroeder and Burrows. ACM TOCS 8(1), 1990.

Vidhyashankar Venkataraman

Th 2/26

Faster!

U-Net: A User-Level Network Interface for Parallel and Distributed Computing. Von Eicken, Basu, Buch and Werner Vogels. 15th SOSP, Dec1995,
Evaluation of the Virtual Interface Architecture (VIA)

Lightweight remote procedure call; Bershad et al. ACM TOCS 8(1), Feb 1990.

Dan Williams Tu 3/2 Network Objects Network Objects. Birrell et al. 14th SOSP, Dec 1993. Fine-Grained Mobility in the Emerald System
Jul et al. ACM TOCS 6(1), Feb 1988.
The Jini architecture for network-centric computing. Waldo. CACM 42(7), Jul 1999.

Linda in Context. Carriero and Gelernter. CACM 32(4), Apr 1989.

Ken Th 3/4 Network File Systems Serverless Network File System. Anderson et al. 15th SOSP, Dec 1995. Frangipani a scalable distributed file system; Thekkath et al. 16th SOSP, 1997.
Measurements of a distributed file system; Baker et al. 13th SOSP, 1991.

Cooperative Caching: Using Remote Client Memory to Improve File System Performance. Dahlin et al. 1st OSDI, Nov 1994.

A low-bandwidth network file system. Muthitacharoen, Chen, Mazi�res18th SOSP

William Josephson Tu 3/9 System Design Hints for Computer Systems Design. Lampson. ACM OSR 15(5), Oct 1983. The Impact of Architectural Trends on Operating System Performance. Rosenblum et al. 15th SOSP, 1995.
Interposition Agents: Transparently Interposing User Code at the System Interface. Jones. 14th SOSP, 1993.

End-to-End Arguments in System Design. Saltzer et al. ACM TOCS 2(4), Nov 1984.

Maya Haridasan Th 3/11 Clusters / Grid Computing A Case for NOW (Networks of Workstations). Anderson et al. IEEE Micro, Feb. 1995. Locality-Aware Request Distributiom in Cluster-Based Network Servers. Pai et al. 8th ASPLOS, Oct. 1998.
Manageability, availability and performance in Porcupine: a highly scalable, cluster-based mail service; Saito et al. 17th SOSP, 1999.

Wide-Area Computing: Resource Sharing on a Large Scale. Grimshaw et al. IEEE Computer 32(5), May 1999.

Prashanth Reddy Tu 3/16 Multimedia / Real-Time Scheduling and IPC Mechanisms for Continuous Media. Govindan and Anderson. 13th SOSP, Oct 1991. An Active Service Framework and its Application to Real-time Multimedia Transcoding. Amir et al. ACM SIGCOMM, Sep 1998.
Handling Audio and Video Streams in a Distributed Environment. Jones and Hopper. 14th SOSP, 1993.

A High Performance Multi-Structured File System Design. Muller and Pasquale. 13th SOSP, 1991.

User-specified Adaptive Scheduling in a Streaming Media Network. Hicks, Nagarajan, and Van Renesse. OpenArch, April 2003.

Aseem Bajaj Th 3/18 Publish/Subscribe The Information Bus---an Architecture for Extensible Distributed Systems. Oki et al. 14th SOSP, Dec. 1993. Mesh-based Content Routing using XML. Snoeren et al. 18th SOSP, Oct 2001.
Design and evaluation of a wide-area event notification service. Carzaniga et al. TOCS 19(3), Aug 2001.

Matching Events in a Content-based Subscription System. Aguilera et al. 18th PODC, 1999.

Tu 3/23 Spring Break none none

Th 3/25

Mohamed Elhawary Tu 3/30 Mobility Disconnected operation in the Coda File System; Kistler and Satyanarayanan; ACM TOCS 10(1), Feb 1992. Agile Application-Aware Adaptation for Mobility. B. Noble et al. 16th SOSP, Oct 1997.
Rover: a toolkit for mobile information access, Joseph et al. 15th SOSP, 1995.

Flexible Update Propagation for Weakly Consistent Replication. Petersen et al. 16th SOSP, Oct 1997.

Mahesh Balakrishnan Th 4/1 Large Scale Sharing File and storage systems: The Google file system. Ghemawat, Gobioff, Leung. SOSP-19, Oct 2003.   A Distributed Decentralized Information Storage and Retrieval System. Ian Clark. University of Edinburgh, 1999 (Original unpublished technical report on Freenet).
Why Gnutella can't scale, no really? Ritter, Feb 2001.

Storage management and caching in PAST, a large scale, persistent peer-to-peer storage utility. Rowstron and Druschel. 18th SOSP, Oct 2001.

Milo Polte Tu 4/6 Epidemic Techniques Epidemic algorithms for replicated database maintenance; Demers et al.   6th PODC, 1987. Bimodal Multicast. Birman et al. ACM TOCS 17(2), May 1999.
The REFDBMS Distributed Bibliographic Database System. Golding et al. 1994 Usenix Winter Technical Conference , Jan 1994.

Astrolabe: A Robust and Scalable Technology for Distributed System Monitoring, Management, and Data Mining. Van Renesse et al. ACM TOCS 2003 (to appear).

Hitesh Ballani Th 4/8 Application-Level Multicast Routing Enabling Conferencing Applications on the Internet using an Overlay Multicast Architecture, Chu et al. ACM SIGCOMM, Aug 2001. SCRIBE: A large-scale and decentralized application-level multicast infrastructure. Castro et al. IEEE Journal on Selected Areas in communications (JSAC), 2002.
SelectCast. Bozdog et al. Draft, 2003.

Overcast: Reliable Multicasting with an Overlay Network Jannotti et al. 4th OSDI, Dec 2000.

Add Fault Tolerance

Biswanath Panda Tu 4/13 Ordering and Consistent Cuts Time, Clocks, and the Ordering of Events in a Distributed System. Lamport. CACM 21(7). July 1978.

Distributed snapshots: Determining global states of distributed systems. Chandy, Lamport. ACM TOCS 3(1), 1985, 63-75.

Maya Haridasan Th 4/15 Time Optimal Clock Synchronization. Srikanth and Toueg. JACM 34(3), July 1987. Understanding protocols for Byzantine Clock Synchronization. Schneider. Cornell CS TR 87-859. Aug 1987.
Using Time Instead of Timeout for Fault-Tolerant Distributed Systems. Lamport. ACM TOPLAS 6:2, 1974

Probabilistic Internal Clock Synchronization. Cristian and Fetzer.

Krzysztof Ostrowski Tu 4/20 Consensus Impossibility of Distributed Consensus with One Faulty Process. Fisher et al. JACM 32(2), Apr 1985. Revisiting the Paxos Algorithm De Prisco et al. WDAG, Sep 1997.
The weakest failure detector for solving consensus; Chandra et al. J. ACM 43, 4, Jul. 1996.

Paxos Made Simple. Lamport. ACM SIGACT NEWS 32(4). Dec. 2001.

Brice Daniels Th 4/22 Quorums and Voting Weighted Voting for Replicated Data. Gifford. 7th SOSP, Dec 1979.
Nonblocking Commit Protocols. Skeen. ACM SIGMOD, May 1981.
The Generalized Tree Quorum Protocol: An Efficient Approach for Managing Replicated Data. Agrawal and El Abbadi. ACM TODS 17(4), Dec 1992.
Memory Coherency in Shared Virtual Memory Systems. ACM TOCS 7(4), Nov 1989.

Consistency in Partitioned Networks. Davidson et al. ACM Computing Surveys 17(3), Sep 1985.

Bernard Wong Tu 4/27 Practical Replication The Costs and Limits of Availability for Replicated Services. Yu and Vahdat. 18th SOSP, Oct 2001. Maintaining Availability in Partitioned Replicated Databases. Abbadi and Toueg. ACM TODS 14(2), Jun 1989.

Dangers of Replication and a Solution. Gray et al. ACM SIGMOD, Jun 1996.

Krzysztof Ostrowski Th 4/29 Virtual Synchrony The Process Group Approach to Reliable Distributed Computing. Birman. CACM, Dec 1993, 36(12):37-53. A modular approach to fault-tolerant broadcast and related problems. Hadzilacos and Toueg. Cornell CS TR 94-1425,May 1994.
Building reliable, high-performance communication systems from components. Liu et al. 17th SOSP, 1999.

Understanding the Limitations of Causally and Totally Ordered Communication. Cheriton and Skeen. 14th SOSP, 1993.

Rohan Murthy Tu 5/4 Message Logging
(was transactions)
A Survey of Rollback-Recovery Protocols in Message-Passing Systems. Elnohazy et al. ACM Computing Surveys, 34(3), Sep 2002 Experience with Transactions in Quicksilver. Schmuck and Wyllie. 13th SOSP, Oct 1991.
Distributed programming in Argus; Liskov; CACM 31, 3, Mar. 1988.

Isolation-Only Transactions for Mobile Computing, Lu and Satyanarayanan. ACM OSR 24(2), Apr 1994.

Radu Haiduc Th 5/6 Byzantine Techniques The Byzantine Generals Problem. Lamport et al. ACM TOPLAS 4, 1982. Randomized Byzantine Generals. Rabin. FOCS, 1983.
Byzantine Quorum Systems. Malkhi and Reiter.

Practical Byzantine Fault Tolerance. Castro and Liskov. 3rd OSDI, Feb 1999.

How to prepare a cs614 presentation

Each time we meet, we�ll start with a fairly short presentation, lasting perhaps 45 minutes, after which there will be a discussion lasting an additional half hour or so. Preparing the lecture is a significant challenge and you should expect to spend as much as a week or two working on it before the date of your presentation.

For each date, there are several suggested references. Any one of these papers could be the subject of a full-length lecture in itself, and several would take multiple lectures to cover in any depth. Accordingly, you face a tough choice: either learn to speak at an extremely rapid rate, which will be completely incomprehensible, or select within the material, focusing on a core aspect or idea and leaving the surrounding material for others to either absorb simply by virtue of having done the readings, or to raise during the discussion.

Your choice of material and approach to the presentation should satisfy several properties:

1) Within 45 minutes you only have time to convey a very small number of technical points. For example, you probably have time to state a single theorem and to present its proof, but even if the paper contains many theorems, unless they are very simple you won�t have time to talk about more than one (perhaps two).

2) Every talk needs to be placed in context. You need to plan on starting by spending at least a few minutes (10 would be a good estimate) explaining what the problem attacked by the authors was, why this was an important problem (what alternative ideas were being advanced at the time?) and what prior work people need to be aware of. You should say enough about this prior work to place the results you�ll focus on into a broader setting.

3) You should tell us what you�ll be telling us, but in a non-technical way. For example, if you are presenting the �consistent cuts� paper, give some intuition into what a consistent cut is all about before trying to define the idea formally. As you move into the formal material, remind us of this intuition and show us how the formal definition captures the intuitive one. Your job is to make the formalism seem natural. If you are talking about a systems result, try and talk about the thing the authors are doing from a high level � why did this excite them, and the community? What was the key question they were interested by, and what was their �angle�?

4) Now, perhaps 10 to 15 minutes into your �slot� you�ll get to the more detailed material. This is where you might briefly explain a formalism and present a theorem and its proof, or lay out the major architectural pieces of a system and explain how they were put together.

5) In the second half of the technical part of your talk, you might illustrate the way that a proof works using a small example, or present the experimental data associated with a systems paper. Systems people tend to be nearly obsessive about experimental data. It should be presented lovingly with attention to the insights that can only be gained by really building something and putting it through its paces. Where are the fundamental costs in an approach? How well does it scale? Are their little warts and hidden problems that the data reveals?

6) In the final part of your talk, you should wrap things together by reminding us what the original premise of the paper you presented was, how the authors attacked the problem, and what the data revealed about the limitations and successes of their approach. Express your opinions at this stage. For a theory talk, try and put the results back into the broader context. Do they say something broad and powerful, or is the main insight narrow?

Your talk will need to be on Powerpoint slides (or some similar system, if you prefer something else). Powerpoint makes suggestions about font sizes but in fact you can certainly pack more on a slide than is possible using the intial choices. Keep in mind that your audience has to be able to read the slides when projected on a screen at a sensible size! Either make plastic slides, or have your material on a laptop that can connect to the overhead display system.

PLEASE DON�T MAKE US WAIT 10 MINUTES WHILE YOU FIDDLE WITH CONNECTIONS AT THE START OF CLASS. COME EARLY AND MAKE SURE YOUR DISPLAY SETUP WILL WORK. DO IT THE DAY BEFORE IF POSSIBLE.

People who have never given formal talks before often make one of two kinds of mistakes: they estimate their own pace incorrectly, or they simply make far too many slides. Generally, figure that you can cover one slide with text on it per minute. About half of your material should be graphics or illustrations of some sort � people get bored with endless text.

So, for a 45 minute talk, you would probably make up something like 30 to 35 slides of actual text and an addition 10 to 15 that are mostly pictures you�ll explain during the talk. Often you can copy some of this material right out of the online versions of papers or from the web site of the author. However, we probably won�t have a network connection in the lecture room, so your slides do need to be self contained.

A fast speaker who is comfortable in front of a group and familiar with her slides could perhaps use as many as 60 slides for a 45 minute talk. But she would need to have relatively little text on each slide to get away with this.

Feel free to stray from the papers I�ve suggested. But do read the papers I suggested. It may be wise to start reading them two weeks early, since many are tough, high-content papers with a lot of meat to them. People attending the class really should read them too, but probably will do so the day before your talk and won�t have time to really pick away at the material and think hard about it. So your job is to take them to a deeper level of understanding than they will have gotten in reading the papers quickly over a few beers. And their job is to have read the papers carefully enough to contribute to a spirited discussion in class.

Who (click for slides)	Date	Topic	Required Reading	Suggested Reading
Ken	Tu 1/27	Organizational Meeting	none	none
Ken	Th 1/29	Concurrency, Threads, and Events.	Using Threads in Interactive Systems. Hauser et al. 14th SOSP, Dec 1993.	Goal-oriented programming, or composition using events,or threads considered harmful. Van Renesse. ACM SIGOPS European Workshop, Sep 1998. Eraser. A Dynamic Data Race Detector for Multi-Threaded Programs. Savage et al. 16th SOSP, 1997.
			SEDA: An Architecture for Well Conditioned, Scalable Internet Services. Welsh et al. 18th SOSP, Oct 2001.
William Josephson	Tu 2/3	File Systems	Soft Updates: A Solution to the Metadata Update problem in File Systems. Ganger et al. ACM TOCS 18(2). May 2000.	A Fast File System for UNIX. McKusick et al. ACM TOCS 2(3), Aug 1984. The Zebra striped network file system; Hartman and Ousterhout; 14th SOSP, 1993. When to forget in the Elephant file system. Santry et al. 17th SOSP, 1999.
			The Design and Implementation of a Log-Structured File System. Rosenblum and Ousterhout. 13th SOSP, Oct 1991.
Peter Sirokman	Th 2/5	OS Kernels	The Performance of �-Kernel-based Systems. H�rtig et al. 16th SOSP, Oct 1997.	The UNIX Time-Sharing System. Richie and Thompson. EMERALDS: A Small-Memory Real-Time Microkernel. Zuberi et al. 17th SOSP, 1999.
			The Flux OSKit: A Substrate for OS and Language Research. Ford et al. 16th SOSP, Oct 1997.
Art Munson	Tu 2/10	Virtual Memory	Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Machines, Rashid et al. IEEE TOC C-37(8), Aug 1988.	Integrating Segmentation and Page Protection for Safe, Efficient and Transparent Software Extensions. Chiueh et al. 17th SOSP, 1999. Sharing and Protection in a Single Address Space Operating System. Chase et al. ACM TOCS, November 1994.
			Virtual Memory Primitives for User Programs. Appel and Li. 4th ASPLOS, April 1991.
Mingsheng Hong	Th 2/12	VM and I/O	IO-Lite: A Unified I/O Buffering and Caching System. Pai et al. 3rd OSDI, Feb 1999.	Fbufs: A high bandwidth cross-domain transfer facility. Druschel and Peterson. 14th SOSP, 1993. A Trace-Driven Comparison of Algorithms for Parallel Prefetching and Caching. Kimbrel et al. 2nd OSDI, Oct 1996.
			Software prefetching and caching for Translation Lookaside Buffers. Bala et al. 1st OSDI, Nov 1994.
Bernard Wong	Tu 2/17	Multi-Processors	Disco: Running Commodity Operating Systems on Scalable Multiprocessors. Bugnion et al. 16th SOSP, 1997.	Towards Transparent and Efficient Software Distributed Shared Memory. Scales and Gharachorloo. 16th SOSP, 1997. Performance Isolation: Sharing and Isolation in Shared-Memory Multiprocessors. Verghese et al. 8th ASPLOS, Oct 1998.
			Tornado: maximizing locality and concurrency in a shared memory multiprocessor operating system. Gamsa et al. 3rd OSDI, Feb 1999.
Mingsheng Hong	Th 2/19	Extensible Kernels	Exokernel: an operating system architecture for application-level resource management. Engler et al. 15th SOSP, Dec 1995.	Application performance and flexibility on Exokernel systems. Kaashoek et al. 16th SOSP, 1997. Microkernels meet Recursive Virtual Machines. Ford et al. 2nd OSDI, Oct 1996.
			Extensibility, Safety and Performance in the SPIN Operating System. Bershad et al. 15th SOSP, Dec 1995.

Add Distribution/Networking
Benyah Shaparenko	Tu 2/24	RPC	Implementing Remote Procedure Calls. Birrell and Nelson. ACM TOCS 2(1), Feb. 1984.	RPC in the x-Kernel: Evaluating New Design Techniques. Peterson et al. 12th SOSP, Nov. 1989.
			Performance of Firefly RPC. Schroeder and Burrows. ACM TOCS 8(1), 1990.
Vidhyashankar Venkataraman	Th 2/26	Faster!	U-Net: A User-Level Network Interface for Parallel and Distributed Computing. Von Eicken, Basu, Buch and Werner Vogels. 15th SOSP, Dec1995,	Evaluation of the Virtual Interface Architecture (VIA)
			Lightweight remote procedure call; Bershad et al. ACM TOCS 8(1), Feb 1990.
Dan Williams	Tu 3/2	Network Objects	Network Objects. Birrell et al. 14th SOSP, Dec 1993.	Fine-Grained Mobility in the Emerald System Jul et al. ACM TOCS 6(1), Feb 1988. The Jini architecture for network-centric computing. Waldo. CACM 42(7), Jul 1999.
			Linda in Context. Carriero and Gelernter. CACM 32(4), Apr 1989.
Ken	Th 3/4	Network File Systems	Serverless Network File System. Anderson et al. 15th SOSP, Dec 1995.	Frangipani a scalable distributed file system; Thekkath et al. 16th SOSP, 1997. Measurements of a distributed file system; Baker et al. 13th SOSP, 1991.
			Cooperative Caching: Using Remote Client Memory to Improve File System Performance. Dahlin et al. 1st OSDI, Nov 1994.
			A low-bandwidth network file system. Muthitacharoen, Chen, Mazi�res18th SOSP
William Josephson	Tu 3/9	System Design	Hints for Computer Systems Design. Lampson. ACM OSR 15(5), Oct 1983.	The Impact of Architectural Trends on Operating System Performance. Rosenblum et al. 15th SOSP, 1995. Interposition Agents: Transparently Interposing User Code at the System Interface. Jones. 14th SOSP, 1993.
			End-to-End Arguments in System Design. Saltzer et al. ACM TOCS 2(4), Nov 1984.
Maya Haridasan	Th 3/11	Clusters / Grid Computing	A Case for NOW (Networks of Workstations). Anderson et al. IEEE Micro, Feb. 1995.	Locality-Aware Request Distributiom in Cluster-Based Network Servers. Pai et al. 8th ASPLOS, Oct. 1998. Manageability, availability and performance in Porcupine: a highly scalable, cluster-based mail service; Saito et al. 17th SOSP, 1999.
			Wide-Area Computing: Resource Sharing on a Large Scale. Grimshaw et al. IEEE Computer 32(5), May 1999.
Prashanth Reddy	Tu 3/16	Multimedia / Real-Time	Scheduling and IPC Mechanisms for Continuous Media. Govindan and Anderson. 13th SOSP, Oct 1991.	An Active Service Framework and its Application to Real-time Multimedia Transcoding. Amir et al. ACM SIGCOMM, Sep 1998. Handling Audio and Video Streams in a Distributed Environment. Jones and Hopper. 14th SOSP, 1993. A High Performance Multi-Structured File System Design. Muller and Pasquale. 13th SOSP, 1991.
			User-specified Adaptive Scheduling in a Streaming Media Network. Hicks, Nagarajan, and Van Renesse. OpenArch, April 2003.
Aseem Bajaj	Th 3/18	Publish/Subscribe	The Information Bus---an Architecture for Extensible Distributed Systems. Oki et al. 14th SOSP, Dec. 1993.	Mesh-based Content Routing using XML. Snoeren et al. 18th SOSP, Oct 2001. Design and evaluation of a wide-area event notification service. Carzaniga et al. TOCS 19(3), Aug 2001.
			Matching Events in a Content-based Subscription System. Aguilera et al. 18th PODC, 1999.
	Tu 3/23	Spring Break	none	none
	Th 3/25
Mohamed Elhawary	Tu 3/30	Mobility	Disconnected operation in the Coda File System; Kistler and Satyanarayanan; ACM TOCS 10(1), Feb 1992.	Agile Application-Aware Adaptation for Mobility. B. Noble et al. 16th SOSP, Oct 1997. Rover: a toolkit for mobile information access, Joseph et al. 15th SOSP, 1995.
			Flexible Update Propagation for Weakly Consistent Replication. Petersen et al. 16th SOSP, Oct 1997.
Mahesh Balakrishnan	Th 4/1	Large Scale Sharing	File and storage systems: The Google file system. Ghemawat, Gobioff, Leung. SOSP-19, Oct 2003.	A Distributed Decentralized Information Storage and Retrieval System. Ian Clark. University of Edinburgh, 1999 (Original unpublished technical report on Freenet). Why Gnutella can't scale, no really? Ritter, Feb 2001.
			Storage management and caching in PAST, a large scale, persistent peer-to-peer storage utility. Rowstron and Druschel. 18th SOSP, Oct 2001.
Milo Polte	Tu 4/6	Epidemic Techniques	Epidemic algorithms for replicated database maintenance; Demers et al. 6th PODC, 1987.	Bimodal Multicast. Birman et al. ACM TOCS 17(2), May 1999. The REFDBMS Distributed Bibliographic Database System. Golding et al. 1994 Usenix Winter Technical Conference , Jan 1994.
			Astrolabe: A Robust and Scalable Technology for Distributed System Monitoring, Management, and Data Mining. Van Renesse et al. ACM TOCS 2003 (to appear).
Hitesh Ballani	Th 4/8	Application-Level Multicast Routing	Enabling Conferencing Applications on the Internet using an Overlay Multicast Architecture, Chu et al. ACM SIGCOMM, Aug 2001.	SCRIBE: A large-scale and decentralized application-level multicast infrastructure. Castro et al. IEEE Journal on Selected Areas in communications (JSAC), 2002. SelectCast. Bozdog et al. Draft, 2003.
			Overcast: Reliable Multicasting with an Overlay Network Jannotti et al. 4th OSDI, Dec 2000.
Add Fault Tolerance
Biswanath Panda	Tu 4/13	Ordering and Consistent Cuts	Time, Clocks, and the Ordering of Events in a Distributed System. Lamport. CACM 21(7). July 1978.
			Distributed snapshots: Determining global states of distributed systems. Chandy, Lamport. ACM TOCS 3(1), 1985, 63-75.
Maya Haridasan	Th 4/15	Time	Optimal Clock Synchronization. Srikanth and Toueg. JACM 34(3), July 1987.	Understanding protocols for Byzantine Clock Synchronization. Schneider. Cornell CS TR 87-859. Aug 1987. Using Time Instead of Timeout for Fault-Tolerant Distributed Systems. Lamport. ACM TOPLAS 6:2, 1974
			Probabilistic Internal Clock Synchronization. Cristian and Fetzer.
Krzysztof Ostrowski	Tu 4/20	Consensus	Impossibility of Distributed Consensus with One Faulty Process. Fisher et al. JACM 32(2), Apr 1985.	Revisiting the Paxos Algorithm De Prisco et al. WDAG, Sep 1997. The weakest failure detector for solving consensus; Chandra et al. J. ACM 43, 4, Jul. 1996.
			Paxos Made Simple. Lamport. ACM SIGACT NEWS 32(4). Dec. 2001.
Brice Daniels	Th 4/22	Quorums and Voting	Weighted Voting for Replicated Data. Gifford. 7th SOSP, Dec 1979.	Nonblocking Commit Protocols. Skeen. ACM SIGMOD, May 1981. The Generalized Tree Quorum Protocol: An Efficient Approach for Managing Replicated Data. Agrawal and El Abbadi. ACM TODS 17(4), Dec 1992. Memory Coherency in Shared Virtual Memory Systems. ACM TOCS 7(4), Nov 1989.
			Consistency in Partitioned Networks. Davidson et al. ACM Computing Surveys 17(3), Sep 1985.
Bernard Wong	Tu 4/27	Practical Replication	The Costs and Limits of Availability for Replicated Services. Yu and Vahdat. 18th SOSP, Oct 2001.	Maintaining Availability in Partitioned Replicated Databases. Abbadi and Toueg. ACM TODS 14(2), Jun 1989.
			Dangers of Replication and a Solution. Gray et al. ACM SIGMOD, Jun 1996.
Krzysztof Ostrowski	Th 4/29	Virtual Synchrony	The Process Group Approach to Reliable Distributed Computing. Birman. CACM, Dec 1993, 36(12):37-53.	A modular approach to fault-tolerant broadcast and related problems. Hadzilacos and Toueg. Cornell CS TR 94-1425,May 1994. Building reliable, high-performance communication systems from components. Liu et al. 17th SOSP, 1999.
			Understanding the Limitations of Causally and Totally Ordered Communication. Cheriton and Skeen. 14th SOSP, 1993.
Rohan Murthy	Tu 5/4	Message Logging (was transactions)	A Survey of Rollback-Recovery Protocols in Message-Passing Systems. Elnohazy et al. ACM Computing Surveys, 34(3), Sep 2002	Experience with Transactions in Quicksilver. Schmuck and Wyllie. 13th SOSP, Oct 1991. Distributed programming in Argus; Liskov; CACM 31, 3, Mar. 1988. Isolation-Only Transactions for Mobile Computing, Lu and Satyanarayanan. ACM OSR 24(2), Apr 1994.
Radu Haiduc	Th 5/6	Byzantine Techniques	The Byzantine Generals Problem. Lamport et al. ACM TOPLAS 4, 1982.	Randomized Byzantine Generals. Rabin. FOCS, 1983. Byzantine Quorum Systems. Malkhi and Reiter.
			Practical Byzantine Fault Tolerance. Castro and Liskov. 3rd OSDI, Feb 1999.

CS614: Advanced Course in Computer Systems

Spring 2004

Instructor: Ken Birman 4119B Upson, x5-9199 Ken can be found in his office daily, 10am-2pm except at lunch time

TA: Kevin Walsh, Upson 5138 Phone: x5-9201 Kevin's office hours are Tuesday and Thursday before class, 10-11:30

Grading

Prerequisites

Who (click for slides)

Date

Topic

Required Reading

Suggested Reading

Add Fault Tolerance

Instructor: Ken Birman
4119B Upson, x5-9199
Ken can be found in his office daily, 10am-2pm except at lunch time

TA: Kevin Walsh, Upson 5138 Phone: x5-9201
Kevin's office hours are Tuesday and Thursday before class, 10-11:30