ENSEMBLE GROUPWARE SYSTEM

K. BIRMAN, K. GUO, M. HAYDEN, T. HICKEY, R. FRIEDMAN,
S. MAFFEIS, R. VAN RENESSE, A. VAYSBURD, W. VOGELS.

PROJECT SUMMARY

Under the direction of Ken Birman and Robbert van Renesse, Cornell's Horus research project is developing a new generation of groupware communication tools. Very briefly, this effort seeks to introduce guarantees such as reliability, high availability, fault-tolerance, consistency, security and real-time responsiveness into applications that run on modern networks.

Our broad approach is to focus on what we call process group communication structures that arise in many settings involving cluster-style computing, scaleable servers, groupware and conferencing, distributed systems management, fault-tolerance and other execution guarantees, replicated data or computing, distributed coordination or control, and so forth. Group computing has been employed successfully in a great variety of real-world settings. Our emphasis is on the underlying communication systems support for this model, on simplifying and standardizing the interfaces within our support environment, and on making the model as transparent (and hence as easy to use) as possible.

This short introduction to our effort starts with an overview, and then provides a series of subsections amplifying on technical points raised.

Horus/C, Electra and the HOT object-oriented tools.

Up to the present, our project has developed two related software systems, both solving the group communication problem, but differing in focus. Our first is called Horus/C, and was developed over a five-year period that started in 1991. In building this system, our intention was to reimplement the mechanisms first used in the much earlier Isis Toolkit, but in a manner that would demonstrate greater flexibility and performance than was possible with Isis.

Horus/C achieved these goals through an architecture that supports group communication and the virtual synchrony runtime model, but without imposing the model on applications that need something weaker. The architecture is layered, and the layers supporting a particular group communication application are selected at runtime. By hiding the system behind standard interfaces, we have demonstrated a good degree of transparency: this tactic allows us to slip groupware mechanisms into applications that were not designed with group communication as an explicit goal. For example, Horus supports an interface to the CORBA architecture called Electra. Electra makes Horus a highly modular extension to a widely accepted industry standard.

Electra in turn is implemented over another object-oriented interface to Horus called HOT. HOT and Electra are quite mature and have been adopted by a number of users, primarily Isis Toolkit users who have migrated to Horus to benefit from its greater performance and flexibility. HOT can also be used independently of Electra, as a C++ toolkit for Horus.
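The layered architecture described above, in which the layers for a given group are selected at runtime, can be illustrated with a small sketch. This is a toy written in Python for brevity, not Horus code: the layer names (Sequence, Checksum), the REGISTRY table and the build_stack, send and deliver helpers are all hypothetical. The sketch only shows the general idea that each layer adds a header on the way down and removes it on the way up, and that a stack is assembled from a list of layer names chosen when the application starts.

```python
from typing import List, Tuple

Message = Tuple[List[tuple], bytes]  # (header stack, payload)


class Layer:
    """A micro-protocol: pushes one header going down, pops it going up."""
    tag = "noop"

    def header(self, body: bytes):
        return None

    def check(self, value, body: bytes) -> None:
        pass

    def down(self, msg: Message) -> Message:
        headers, body = msg
        return headers + [(self.tag, self.header(body))], body

    def up(self, msg: Message) -> Message:
        headers, body = msg
        tag, value = headers[-1]
        if tag != self.tag:
            raise ValueError(f"expected {self.tag} header, got {tag}")
        self.check(value, body)
        return headers[:-1], body


class Sequence(Layer):
    """Hypothetical layer: stamps each outgoing message with a counter."""
    tag = "sequence"

    def __init__(self) -> None:
        self.next_seq = 0

    def header(self, body: bytes):
        seq = self.next_seq
        self.next_seq += 1
        return seq


class Checksum(Layer):
    """Hypothetical layer: detects a payload altered in transit."""
    tag = "checksum"

    def header(self, body: bytes):
        return sum(body) % 256

    def check(self, value, body: bytes) -> None:
        if value != sum(body) % 256:
            raise ValueError("payload corrupted in transit")


# Assumed registry: maps configuration names to layer classes.
REGISTRY = {"sequence": Sequence, "checksum": Checksum}


def build_stack(names: List[str]) -> List[Layer]:
    """Assemble a stack, top layer first, from names chosen at runtime."""
    return [REGISTRY[name]() for name in names]


def send(stack: List[Layer], body: bytes) -> Message:
    msg: Message = ([], body)
    for layer in stack:            # top of the stack to the bottom
        msg = layer.down(msg)
    return msg


def deliver(stack: List[Layer], msg: Message) -> bytes:
    for layer in reversed(stack):  # bottom of the stack to the top
        msg = layer.up(msg)
    return msg[1]
```

An application that needs only weak guarantees could call build_stack with an empty list and pay for no headers at all, while a more demanding one could request both layers. The rest of the application code would be the same in either case.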

Horus/C has demonstrated sub-millisecond end-to-end latency and sustained throughput as high as 5,000 null multicasts per second within a group of five processes running on standard Sun SPARCstation 10s over Ethernet. These figures are roughly ten times better than those of the Isis Toolkit. However, it should be noted that the Horus/C environment is more narrowly focused than the older Isis system. Isis evolved, over time, to become a form of distributed operating system, complete with its own name servers, spooling services, and so forth. Horus/C only provides support for process groups and group communication and has not been used to implement such a full-scale distributed computing environment.

The "/C" extension denotes the choice of implementation language: this version of our system was coded in C, and in fact was targeted primarily at UNIX environments with high speed Ethernet or ATM communication substrates. The expected application is a cluster-style server, possibly with real-time and other demanding performance-intensive requirements. For example, recent work has demonstrated how to build a telephone switch coprocessor using Horus and an SP2 or a similar cluster multicomputer. Horus orchestrates the management issues that arise as nodes are swapped on and off line, and the overall coprocessor is able to sustain 22,000 SS7 telephone-call routing requests per second, without disruption even when failures or recoveries occur. Such performance is good enough to impress the telecommunications community. Further work on this problem will demonstrate scalability of both coprocessor memory resources and computing performance as a function of the number of nodes, and will extend these results into other server settings, such as Web and file-system servers.

Ensemble System.

As Horus/C has matured, we have also encountered issues that recently led to a complete reimplementation of the system using a subset of the ML programming language. To avoid confusion, we have begun to call this version of our system Ensemble. The subset of ML employed for this work translates directly into C, which can then be compiled in a normal manner, and makes no use of ML's garbage collection features. Thus, the choice of ML has no negative performance implications, and the code itself looks like C++. We have interfaces for coding applications in C, C++, Tcl/Tk, Java, ML, etc.

By moving to ML, we have made it possible to use formal verification tools to prove the correctness of critical Horus protocols and algorithms. The ML version of the system is also amenable to semi-automated protocol optimizations, which have slashed overhead and latency for heavily used protocols: latency is as low as 75 microseconds on ATM, and throughput as high as 80,000 multicasts per second. We increasingly think of "protocol compilation" as a part of our task.

An important shift in emphasis has occurred, however, that may lead Ensemble away from Horus/C over time. As noted above, the focus of Horus/C is on cluster-style architectures in which problems like server replication and load-balanced fault-tolerance are major goals. With Ensemble, we hope to broaden the applicability of our work by supporting PC applications. These would include clusters of PC's used to support servers, but also direct groupings of applications running on PC clients. In addition, whereas server applications are typically targeted for LAN's, Ensemble will also support scalable WAN-based groupware applications. This shift in thinking leads us to characterize Ensemble as a "groupware programming environment," and to emphasize in our research such problems as media transport protocols and embedding Ensemble into media viewers. Over time, we seek to develop a comprehensive set of tools for building multimedia conferencing systems and applications in a variety of communication environments. The Ensemble reliability, fault-tolerance and security properties bring substantial benefits to such applications, while the virtual synchrony model supports easy management and dynamic reconfiguration of the applications themselves.

As a groupware programming environment, Ensemble would most likely be used from the Java language, and accessed as a plug-in to Netscape or Internet Explorer.

Project Vision for the Next Five Years

The vision of our effort revolves around the seamless, highly transparent, introduction of "strong properties" into network applications developed using standard tools and programming practices. A fundamental premise underlying our work is that most critical applications are being developed using conventional off-the-shelf building blocks and combined into applications using standard techniques. We have come to believe that even the most critical applications are essentially forced to do this because this represents the only practical way to take advantage of modern computing technology. The challenge, as we see it, is to "harden" systems constructed in this manner.

Thus, the Ensemble effort has a dual mission. On the one hand, we need to improve our understanding of the fundamental scientific issues that underlie reliability, consistency, guaranteed real-time performance, and security in distributed settings. The resulting tools need to be powerful and appealing when used directly, as is likely to be the case in applications explicitly developed as groupware systems or computer-supported collaborative work systems. At the same time, however, we need to be able to slide fault-tolerance in "under the surface" in applications that depend critically upon a specific server or data object that should be replicated for high availability. We need a way to introduce security mechanisms, also transparently, so as to protect an application against a potentially hostile environment, or a critical system against a possibly untrustworthy application.

If we succeed at Cornell, Ensemble may become popular in a way that previous group communication tools have not. Whereas systems like Isis or Horus/C were fairly difficult to use, and in the case of Isis, were also rigid in their interpretation of "reliability", Ensemble should be easy to use and flexible. The freeware licensing of Ensemble minimizes the practical and legal barriers to its widespread adoption. Such a success would also pave the way for future standards in the area of groupware tools, or for their introduction into standard operating systems as a common feature, like file systems, TCP/IP communication, and Web technologies.

CLUSTER COMPUTING AND GROUPWARE APPLICATIONS

The short sections that follow amplify briefly upon the technical topics raised above.

Application domain.

We have emphasized that our work on Ensemble is directed towards support for conferencing and groupware applications in conventional PC environments. In fact, we focus on that subset of applications in which guarantees or "strong properties" are likely to be important. Examples of these include applications for business conferencing on sensitive issues, military intelligence analysis, collaboration on engineering and business tasks, power systems management, air traffic control, and telecommunications switching system management. All of these applications exhibit a mixture of reliability and security needs that exceed the capabilities of the Web and associated network technologies.

Success in tackling these problems involves the resolution of issues at multiple levels of the system. Close to the user, our work focuses on communications technologies that support groups of individuals who collaborate in rich, computer-supported environments. Within a server, these technologies offer support for cluster management and communication services of the sorts enumerated in the call for position papers. Close to the network, the same communication tools provide a means of interconnecting servers in WANs and of performing network management and control.

An important issue for us will be to arrive at an appropriate embedding of Ensemble into the new generation of application-oriented tools for active displays, such as Java. To accomplish this, we expect to support Ensemble as a plug-in for Netscape and Internet Explorer, in addition to more traditional library or kernel embeddings of the technology. We will need to develop appropriate Java interfaces for accessing Ensemble's groupware features, and for integrating these with continuous media and various media processing applications. Protocol work will be required at the level of guaranteed quality of service for groupware configurations of the system. We anticipate that our effort will invest significant resources on these topics in coming years.

Status of Isis Toolkit Effort

We suggested that the success of the Isis Toolkit points to an emerging market for groupware solutions. A first generation groupware system, the Isis Toolkit (SDK), became popular during the late 1980's through a free public distribution, and later migrated into commercial use. Isis Distributed Systems, the company that commercialized this technology, has used it in the communications infrastructure of the New York Stock Exchange, the new Swiss Electronic Bourse, the next generation French Air Traffic Control System, AMD's FAB-25 factory-floor process control system, Hiper-D (a prototype of the next generation naval AEGIS system), MCI's back-office billing system, and hundreds of other demanding systems. The company has also built a cluster multiprocessor, RADIO, using Isis; this machine was selected as best new product of the year by Interop 95 and PC magazine in 1995.

However, the need today is for scalability and flexibility beyond the capabilities of the Isis solutions. Pressing requirements for security, group communication, and reliability are now seen in settings where Isis could not easily be used, such as the Web. Moreover, we believe that Isis is too complex to use, and requires a mindset too different from "standard" ways of developing distributed systems. The Horus/C and Ensemble systems respond to these needs.

Virtual Synchrony Execution Model.

Although this brief discussion cannot include details, the essence of our approach is to use a rigorous, mathematically justifiable model called "virtual synchrony." Virtual synchrony is a conceptual structure that makes it easy to build groups of cooperating programs, for example to replicate a critical service or type of data, or to subdivide a costly computational task. This model is presented through a set of easily used software tools, or hidden under existing application program interfaces. The benefit of virtual synchrony is that the designer faces a simpler runtime model, and can draw on powerful theoretical tools to reason about the application and to prove that it offers desired properties.

As the term suggests, a virtually synchronous system is one that mimics a synchronous one. The user thinks in terms of a very simple, step by step, execution model. The true execution is less stringently ordered, but only in ways that would not be evident to a typical user. Virtual synchrony, particularly in a primary-component partitioning model, is known to provide strong guarantees of consistency in replicated data, and is a key to our work. Our experience suggests that virtual synchrony is a completely tractable property for use in "real" distributed settings, despite well known limits on the ability of such systems to tolerate failures.
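The step-by-step model that the programmer sees can be made concrete with a small simulation. The sketch below is a single-process toy in Python, not Ensemble code, and every name in it (Replica, SimulatedGroup, join, fail, multicast) is hypothetical. It assumes the simplest possible setting: the simulated runtime hands every membership change ("view") and every multicast to all current members in one agreed order, and copies state to a joining member. A real system must achieve the same appearance over an asynchronous network, which is where the difficulty lies.

```python
from typing import Dict, List, Tuple


class Replica:
    """One group member holding a copy of the replicated data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.view: Tuple[str, ...] = ()
        self.state: Dict[str, str] = {}
        self.log: List[tuple] = []   # events in the order they were delivered

    def on_view(self, members: Tuple[str, ...]) -> None:
        self.view = members
        self.log.append(("view", members))

    def on_multicast(self, key: str, value: str) -> None:
        self.state[key] = value
        self.log.append(("msg", key, value))


class SimulatedGroup:
    """Stand-in for the runtime: delivers events to all members in one order."""

    def __init__(self) -> None:
        self.members: List[Replica] = []

    def _install_view(self) -> None:
        view = tuple(r.name for r in self.members)
        for r in self.members:
            r.on_view(view)

    def join(self, replica: Replica) -> None:
        if self.members:
            # State transfer: the newcomer starts from an existing copy.
            replica.state = dict(self.members[0].state)
        self.members.append(replica)
        self._install_view()

    def fail(self, name: str) -> None:
        self.members = [r for r in self.members if r.name != name]
        self._install_view()

    def multicast(self, key: str, value: str) -> None:
        for r in self.members:
            r.on_multicast(key, value)


group = SimulatedGroup()
a, b, c = Replica("a"), Replica("b"), Replica("c")
group.join(a)
group.join(b)
group.multicast("AA12", "climb")
group.join(c)
group.multicast("UA7", "hold")
group.fail("a")
group.multicast("AA12", "descend")
```

After this run, the surviving replicas b and c hold identical state, and from the moment c joined they have seen exactly the same sequence of views and messages. The failed member a never sees the final update, which matches the intuition that a message is delivered only to the members of the view in which it is sent.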

Security in Distributed Systems.

Our group cooperates closely with Danny Dolev's Transis project at the Hebrew University of Jerusalem, and Bob Constable's NuPrl effort at Cornell. Jointly, these collaborations are attacking issues of security and trust in the context of groupware systems layered over Ensemble.

To date, we have developed Ensemble protocol layers that can use Fortezza or Kerberos to encrypt sensitive data, authenticate process membership in process groups, and sign messages. In future work, we plan to explore high availability and partitioned operation issues as they arise in the Fortezza and Kerberos architectures, the use of split secrets to spread sensitive information within a group so that compromised members are unable to reveal the secret, and other related issues in the security of highly available, highly reliable distributed systems. We are also investigating the challenges of proving properties of the most critical protocol layers within Ensemble using NuPrl, a tool for automating mathematical proofs.
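The idea of a split secret can be illustrated with the simplest scheme of this kind, an all-or-nothing XOR split, in which every share is needed to rebuild the secret and any smaller set of shares is statistically independent of it. This is only an assumption chosen for illustration: the text does not say which scheme the project intends to use, and a threshold scheme such as Shamir's secret sharing would be the natural choice when the group must tolerate missing members. The function names split_secret and combine_shares are hypothetical.

```python
import secrets
from functools import reduce
from typing import List


def xor_bytes(left: bytes, right: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(left, right))


def split_secret(secret: bytes, members: int) -> List[bytes]:
    """Split a secret into one share per group member (all shares required)."""
    if members < 2:
        raise ValueError("need at least two members to split a secret")
    # The first members-1 shares are uniformly random pads.
    shares = [secrets.token_bytes(len(secret)) for _ in range(members - 1)]
    # The last share is the secret XORed with every pad.
    shares.append(reduce(xor_bytes, shares, secret))
    return shares


def combine_shares(shares: List[bytes]) -> bytes:
    """Rebuild the secret; only correct when every share is present."""
    return reduce(xor_bytes, shares)


group_key = b"group session key"
shares = split_secret(group_key, 4)
recovered = combine_shares(shares)
```

With this scheme a single compromised member holds only a random-looking string. The price is that the loss of any one share makes the secret unrecoverable, which is why availability and partitioned operation have to be considered together with secrecy.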

Anticipated Java Embedding of Ensemble.

From Java, Ensemble will let a set of applets (running on different workstations) form a group within which they can consistently and fault-tolerantly handle replicated data that is updated dynamically in real-time, cooperate to share work in a load-balanced manner, and automatically reconfigure when a machine, communication link, or application fails. Such functionality might arise in conferencing applications, or in applications that monitor a data feed that emits real-time updates.

To understand how this might solve the problems identified earlier, consider a potential air-traffic application involving cooperation by a group of controllers as they direct a set of flights. The controllers run applications belonging to one or more communication groups, within which routing updates are communicated. These applications display the status of the airspace through Web-like interfaces on the control consoles. A Java applet representing a given flight, for example, flies slowly across the screen, displaying more detail if the controller clicks upon it, and popping up advisory messages as needed. Controllers are given point-and-click access to multimedia conferencing services that present audio and video, for use in coordinating routing changes. The servers maintaining flight control data and providing critical advisories are replicated, running on clusters of nodes managed for high availability, load balancing, and fault-tolerance.

Within this example, one sees many uses of communication groups. Yet the group "structuring" corresponds to different communication requirements in different uses, and would therefore require different protocol stacks. For example, some applications need virtual synchrony; others real-time communication or security, and some a mixture of these. Ensemble responds to such requirements with uniform interfaces and flexible semantics.
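One way the replicated servers in the air-traffic example could share work in a load-balanced manner is to let each member compute the same assignment locally from the current membership view. The sketch below, in Python, is an assumption made for illustration and is not drawn from Ensemble: it uses rendezvous (highest random weight) hashing, and the names owner and assign, the server names and the flight identifiers are hypothetical. It relies on the property discussed earlier that all members agree on the view, so no extra messages are needed to agree on who handles which flight.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def owner(task: str, view: Tuple[str, ...]) -> str:
    """Return the member responsible for a task under rendezvous hashing."""
    def score(member: str) -> bytes:
        return hashlib.sha256(f"{member}|{task}".encode()).digest()
    # Every member evaluates the same scores, so all pick the same owner.
    return max(view, key=score)


def assign(tasks: Iterable[str], view: Tuple[str, ...]) -> Dict[str, List[str]]:
    """Partition tasks among the members of the current view."""
    shares: Dict[str, List[str]] = {member: [] for member in view}
    for task in sorted(tasks):
        shares[owner(task, view)].append(task)
    return shares


flights = [f"FL{i}" for i in range(50)]
before = assign(flights, ("s1", "s2", "s3"))
# After s2 fails, a new view is installed and each survivor recomputes.
after = assign(flights, ("s1", "s3"))
```

With this choice of hashing, a flight changes hands only if its previous owner has left the view, so a failure disturbs no more of the assignment than it must. A simple round-robin split would also work but would reshuffle most flights whenever the membership changes.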

ONLINE INFORMATION

[http://www.cs.cornell.edu/Info/Projects/HORUS.html]

REFERENCES

Building Reliable and Secure Network Applications. K. Birman, Prentice Hall, forthcoming (Oct. 1996). 550pp.

This book provides a comprehensive review of the tools used to build modern network applications, focusing on their reliability and security properties, as well as the best available technologies for introducing reliability and security into real networks.

Software for Reliable Networks. K. Birman and R. van Renesse. Scientific American 274:5 (May 1996), 64-69.

A discussion of fault-tolerance in networked software, and the approach used by the Horus effort to overcome failures.

Horus: A Flexible Group Communications System. R. van Renesse, K. Birman and S. Maffeis. Commun. of the ACM 39:4 (Apr. 1996), 76-83.

This paper gives a technical overview of Horus with a focus on the CORBA embedding of the system, which we call Electra.

SOFTWARE RELEASES

The Horus project has produced two generations of software. The first generation, Horus/C, is stable and available for general use, but is oriented towards UNIX machines and focuses exclusively on cluster-style server applications. The new system, Ensemble, is aimed primarily at PC's and clusters of PC-servers. In addition to the cluster-server applications for which Horus/C would be most useful, Ensemble tackles the groupware and conferencing problems cited above.

There are no licensing fees for research use of Horus/C. Horus/C commercial rights, however, have been exclusively licensed by Cornell University to Isis Distributed Systems, which is developing a commercial product in this area. Contact rcbc@isis.com (Dr. Robert Cooper) for details concerning the commercial product offering, support, or other services.

Cornell University is making Ensemble available at no fee, in source form, for both research and commercial use. Availability is scheduled for late fall, 1996.

Last modified: Sun Jul 21 11:27:45 EDT 2002