Under the direction of Ken
Birman and Robbert van Renesse, Cornell's Horus research project
is developing a new generation of groupware communication tools.
Very briefly, this effort seeks to introduce guarantees such as
reliability, high availability, fault-tolerance, consistency,
security and real-time responsiveness into applications that run
on modern networks.
Our broad approach is to focus
on what we call process group communication structures
that arise in many
settings involving cluster-style computing, scaleable servers,
groupware and conferencing, distributed systems management, fault-tolerance
and other execution guarantees, replicated data or computing,
distributed coordination or control, and so forth. Group computing
has been employed successfully in a great variety of real-world
settings. Our emphasis is on the underlying communication systems
support for this model, on simplifying and standardizing the interfaces
within our support environment, and on making the model as transparent
(and hence as easy to use) as possible.
This short introduction to
our effort starts with an overview, and then provides a series
of subsections amplifying on technical points raised.
Horus/C, Electra and the HOT object-oriented tools.
Up to the present, our project
has developed two related software systems, both solving the group
communication problem, but differing in focus. Our first is called
Horus/C, and was developed over a five year period that started
in 1991. In building this system, our intention was to reimplement
the mechanisms first used in the much earlier Isis Toolkit, but
in a manner that would demonstrate greater flexibility and performance
than was possible with Isis.
Horus/C achieved these goals through an architecture that supports group communication and the virtual synchrony runtime model, but without imposing the model on applications that need something weaker. The architecture is layered, and the layers supporting a particular group communication application are selected at runtime. By hiding the system behind standard interfaces, we have demonstrated a good degree of transparency: this tactic allows us to slip groupware mechanisms into applications that were not designed with group communication as an explicit goal. For example, Horus supports an interface to the CORBA architecture called Electra. Electra makes Horus a highly modular extension to a widely accepted industry standard.
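The idea of selecting layers at runtime can be illustrated with a minimal sketch. This is not the Horus/C interface: the names `Layer`, `SequenceLayer`, `ChecksumLayer`, `REGISTRY`, and `build_stack` are hypothetical, and the two toy layers merely stand in for real protocol layers. The sketch only shows how an application that needs less can ask for a shorter stack.

```python
from typing import Dict, List, Type


class Layer:
    """Hypothetical base layer: transforms a message on the way down
    (send) and undoes that transformation on the way up (deliver)."""

    def down(self, msg: bytes) -> bytes:
        return msg

    def up(self, msg: bytes) -> bytes:
        return msg


class SequenceLayer(Layer):
    """Prepends a 4-byte sequence number on send; strips it on delivery."""

    def __init__(self) -> None:
        self.next_seq = 0

    def down(self, msg: bytes) -> bytes:
        header = self.next_seq.to_bytes(4, "big")
        self.next_seq += 1
        return header + msg

    def up(self, msg: bytes) -> bytes:
        return msg[4:]


class ChecksumLayer(Layer):
    """Appends a 1-byte additive checksum; verifies and removes it on delivery."""

    def down(self, msg: bytes) -> bytes:
        return msg + bytes([sum(msg) % 256])

    def up(self, msg: bytes) -> bytes:
        body, check = msg[:-1], msg[-1]
        if sum(body) % 256 != check:
            raise ValueError("corrupted message")
        return body


# Layers are looked up by name, so the stack is chosen when the
# application starts rather than when it is compiled.
REGISTRY: Dict[str, Type[Layer]] = {"seq": SequenceLayer, "checksum": ChecksumLayer}


def build_stack(names: List[str]) -> List[Layer]:
    return [REGISTRY[name]() for name in names]


def send(stack: List[Layer], msg: bytes) -> bytes:
    for layer in stack:            # top of the stack first
        msg = layer.down(msg)
    return msg


def deliver(stack: List[Layer], wire: bytes) -> bytes:
    for layer in reversed(stack):  # bottom of the stack first
        wire = layer.up(wire)
    return wire
```

An application wanting both properties would call `build_stack(["seq", "checksum"])`, while one that needs neither would pass an empty list and pay no per-message overhead.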
Electra in turn is implemented
over another object-oriented interface to Horus called HOT. HOT
and Electra are quite mature and have been adopted by a number
of users, primarily Isis Toolkit users who have migrated towards
Horus to benefit from its greater performance and flexibility.
HOT can also be used independently from Electra, as a C++ toolkit
for Horus.
Horus/C has demonstrated sub-millisecond
end-to-end latency, and sustained throughputs of as high as 5,000
null multicasts per second, within a group of five processes executed
on standard SUN Sparcstation 10's over Ethernet. These numbers
are roughly ten times better than those of the Isis Toolkit. However,
it should be noted that the Horus/C environment is more focused
in comparison to the older Isis system. Isis evolved, over time,
to become a form of distributed operating system, complete with
its own name servers, spooling services, and so forth. Horus/C
only provides support for process groups and group communication
and has not been used to implement such a full-scale distributed
computing environment.
The "/C" extension
denotes the choice of implementation language: this version of
our system was coded in C, and in fact was targeted primarily
at UNIX environments with high speed Ethernet or ATM communication
substrates. The expected application is a cluster-style server,
possibly with real-time and other demanding performance-intensive
requirements. For example, recent work has demonstrated how to
build a telephone switch coprocessor using Horus and an SP2 or
a similar cluster multicomputer. Horus orchestrates the management
issues that arise as nodes are swapped on and off line, and the
overall coprocessor is able to sustain 22,000 SS7 telephone-call
routing requests per second, without disruption even when failures
or recoveries occur. Such performance is good enough to impress
the telecommunications community. Further work on this problem
will demonstrate scalability of both coprocessor memory resources
and computing performance as a function of the number of nodes,
and will extend these results into other server settings, such
as Web and file-system servers.
Ensemble System.
As Horus/C has matured, we
have also encountered issues that recently led to a complete
reimplementation of the system using a subset of the ML programming
language. To avoid confusion, we have begun to call this version
of our system Ensemble. The subset of ML employed for
this work translates directly into C, which can then be compiled
in a normal manner, and makes no use of ML's garbage collection
features. Thus, the choice of ML has no negative performance
implications, and the code itself looks like C++. We have interfaces
for coding applications in C, C++, Tcl/Tk, Java, ML, etc.
By moving to ML, we have made
it possible to use formal verification tools to prove the correctness
of critical Horus protocols and algorithms. The ML version of
the system is also amenable to semi-automated protocol optimizations,
which have slashed overhead and latency for heavily used protocols:
latency is as low as 75us on ATM, and throughput as high as 80,000
multicasts per second. We increasingly think of "protocol
compilation" as a part of our task.
An important shift in emphasis
has occurred, however, that may lead Ensemble away from Horus/C
over time. As noted above, the focus of Horus/C is on cluster-style
architectures in which problems like server replication and load-balanced
fault-tolerance are major goals. With Ensemble, we hope to broaden
the applicability of our work by supporting PC applications.
These would include clusters of PC's used to support servers,
but also direct groupings of applications running on PC clients.
In addition, whereas server applications are typically targeted
for LAN's, Ensemble will also support scaleable WAN-based groupware
applications. This shift in thinking leads us to characterize
Ensemble as a "groupware programming environment," and
to emphasize in our research such problems as media transport
protocols and embedding Ensemble into media viewers. Over time,
we seek to develop a comprehensive set of tools for building multimedia
conferencing systems and applications in a variety of communication
environments. The Ensemble reliability, fault-tolerance and security
properties bring substantial benefits to such applications, while
the virtual synchrony model supports easy management and dynamic
reconfiguration of the applications themselves.
As a groupware programming
environment, Ensemble would most likely be used from the Java
language, and accessed as a plug-in to Netscape or Internet Explorer.
Project Vision for the Next Five Years
The vision of our effort revolves
around the seamless, highly transparent, introduction of "strong
properties" into network applications developed using standard
tools and programming practices. A fundamental premise underlying
our work is that most critical applications are being developed
using conventional off-the-shelf building blocks and combined
into applications using standard techniques. We have come to
believe that even the most critical applications are essentially
forced to do this because this represents the only practical way
to take advantage of modern computing technology. The challenge,
as we see it, is to "harden" systems constructed in
this manner.
Thus, the Ensemble effort
has a dual mission. On the one hand, we need to improve our understanding
of the fundamental scientific issues that underlie reliability,
consistency, guaranteed real-time performance, and security in
distributed settings. The resulting tools need to be powerful
and appealing when used directly, as is likely to be the case
in applications explicitly developed as groupware systems or computer-supported
collaborative work systems. At the same time, however, we need
to be able to slide fault-tolerance in "under the surface"
in applications that depend critically upon a specific server
or data object that should be replicated for high availability.
We need a way to introduce security mechanisms, also transparently,
so as to protect an application against a potentially hostile
environment, or a critical system against a possibly untrustworthy
application.
If we succeed at Cornell,
Ensemble may become popular in a way that previous group communication
tools have not. Whereas systems like Isis or Horus/C were fairly
difficult to use, and in the case of Isis, were also rigid in
their interpretation of "reliability", Ensemble should
be easy to use and flexible. The freeware licensing of Ensemble
minimizes the practical and legal barriers to its widespread adoption.
And, such a success would pave the way for future standards in
the area of groupware tools, or their introduction into standard
operating systems as a common feature, like file systems, TCP/IP
communication, and Web technologies.
The short sections that follow
amplify briefly upon the technical topics raised above.
Application domain.
We have emphasized that our
work on Ensemble is directed towards support for conferencing
and groupware applications in conventional PC environments. In
fact, we focus on that subset of applications in which guarantees
or "strong properties" are likely to be important.
Examples of these include applications for business conferencing
on sensitive issues, military intelligence analysis, collaboration
on engineering and business tasks, power systems management, air
traffic control, and telecommunications switching system management.
All of these applications exhibit a mixture of reliability and
security needs that exceed the capabilities of the Web and associated
network technologies.
Success in tackling these
problems involves the resolution of issues at multiple levels
of the system. Close to the user, our work focuses on communications
technologies that support groups of individuals who collaborate
in rich, computer-supported environments. Within a server, these
technologies support cluster management and communication services
of the sorts enumerated in the call for position papers. Close
to the network, the same communication tools provide mechanisms for
interconnecting servers in WANs and for network management and control.
An important issue for us
will be to arrive at an appropriate embedding of Ensemble into
the new generation of application-oriented tools for active displays,
such as Java. To accomplish this, we expect to support Ensemble
as a plug-in for Netscape and Internet Explorer, in addition to
more traditional library or kernel embeddings of the technology.
We will need to develop appropriate Java interfaces for accessing
Ensemble's groupware features, and for integrating these with
continuous media and various media processing applications. Protocol
work will be required at the level of guaranteed quality of service
for groupware configurations of the system. We anticipate that
our effort will invest significant resources on these topics in
coming years.
Status of Isis Toolkit Effort
We suggested that the success
of the Isis Toolkit points to an emerging market for groupware
solutions. A first generation groupware system, the Isis Toolkit
(SDK), became popular during the late 1980's through a free public
distribution, and later migrated into commercial use. Isis Distributed
Systems, the company that commercialized this technology, has
used it in the communications infrastructure of the New York Stock
Exchange, the new Swiss Electronic Bourse, the next generation
French Air Traffic Control System, AMD's FAB-25 factory-floor
process control system, Hiper-D (a prototype of the next generation
naval AEGIS system), MCI's back-office billing system, and hundreds
of other demanding systems. The company has also built a cluster
multiprocessor, RADIO, using Isis; this machine was selected as
best new product of the year by Interop 95 and PC magazine in
1995.
However, the need today is
for scalability and flexibility beyond the capabilities of the
Isis solutions. Pressing requirements for security, group communication,
and reliability are now seen in settings where Isis could not
easily be used, such as the Web. Moreover, we believe that Isis
is too complex to use, and requires a mindset too different from
"standard" ways of developing distributed systems.
The Horus/C and Ensemble systems respond to these needs.
Virtual Synchrony Execution Model.
Although this brief discussion
can't include details, the essence of our approach is to use a
rigorous, mathematically justifiable model called "virtual
synchrony." Virtual synchrony is a conceptual structure that
makes it easy to build groups of cooperating programs, for example
to replicate a critical service or type of data, or to subdivide
a costly computational task. This model is presented through
a set of easily used software tools, or hidden under existing
application program interfaces. The benefit of virtual synchrony
is that the designer faces a simpler runtime model, and can draw
on powerful theoretical tools to reason about his application
and to prove that it offers desired properties.
As the term suggests, a virtually
synchronous system is one that mimics a synchronous one. The
user thinks in terms of a very simple, step by step, execution
model. The true execution is less stringently ordered, but only
in ways that would not be evident to a typical user. Virtual
synchrony, particularly in a primary-component partitioning model,
is known to provide strong guarantees of consistency in replicated
data, and is a key to our work. Our experience suggests that virtual
synchrony is a completely tractable property for use in "real"
distributed settings, despite well known limits on the ability
of such systems to tolerate failures.
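The step-by-step model that the user reasons about can be sketched as a toy, single-process simulation. It is only an illustration of the view the programmer is given: membership changes and updates appear to every member as one shared sequence of events. It does not model the less strictly ordered protocols that actually implement the guarantee, and the `Replica` and `Group` classes are hypothetical names, not part of Horus or Ensemble.

```python
class Replica:
    """One group member holding a copy of the replicated data."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.log = []  # events in the order this replica observed them

    def on_view(self, view_id, members):
        self.log.append(("view", view_id, tuple(members)))

    def on_update(self, key, value):
        self.state[key] = value
        self.log.append(("update", key, value))


class Group:
    """Toy model: every event is delivered to all current members
    before the next event starts."""

    def __init__(self):
        self.view_id = 0
        self.members = []

    def _install_view(self):
        self.view_id += 1
        names = sorted(m.name for m in self.members)
        for m in self.members:
            m.on_view(self.view_id, names)

    def join(self, replica):
        if self.members:
            # State transfer: the joiner starts from a current copy.
            replica.state = dict(self.members[0].state)
        self.members.append(replica)
        self._install_view()

    def fail(self, replica):
        # The failed member simply stops receiving events.
        self.members.remove(replica)
        self._install_view()

    def multicast(self, key, value):
        for m in self.members:
            m.on_update(key, value)
```

In this model, two members that are both in the group from some view onward observe exactly the same events from that point on, which is why their copies of the data cannot diverge.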
Security in Distributed Systems.
Our group cooperates closely
with Danny Dolev's Transis project at the Hebrew University of
Jerusalem, and Bob Constable's NuPrl effort at Cornell. Jointly,
these collaborations are attacking issues of security and trust
in the context of groupware systems layered over Ensemble.
To date, we have developed
Ensemble protocol layers that can use Fortezza or Kerberos to
encrypt sensitive data, authenticate process membership in process
groups, and sign messages. Looking ahead, we are exploring high
availability and partitioned operation issues as they arise in
the Fortezza and Kerberos architectures, the use of split secrets
to spread sensitive information within a group so that compromised
members are unable to reveal the secret, and other related issues
in the security of highly available, highly reliable distributed
systems. We are also investigating the challenges of proving
properties of the most critical protocol layers within Ensemble
using NuPrl, a tool for automating mathematical proofs.
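As a rough illustration of the split-secret idea, the sketch below uses simple XOR splitting, in which all shares are required to rebuild the secret and any smaller set of shares is statistically independent of it. This scheme is an assumption chosen for brevity; the text does not say which scheme the project uses, and threshold schemes such as Shamir's allow reconstruction from a subset of members. The function names are hypothetical.

```python
import secrets
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_secret(secret: bytes, n: int) -> List[bytes]:
    """Split `secret` into n shares, one per group member.

    The first n-1 shares are uniformly random; the last is the secret
    XORed with all of them, so only the full set reveals the secret.
    """
    if n < 2:
        raise ValueError("need at least two shares")
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    last = reduce(xor_bytes, shares, secret)
    return shares + [last]


def combine(shares: List[bytes]) -> bytes:
    """Rebuild the secret by XORing every share together."""
    return reduce(xor_bytes, shares)
```

For example, `combine(split_secret(b"group key", 4))` returns `b"group key"`, while a compromised member holding a single share learns nothing about the key from that share alone.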
Anticipated Java Embedding of Ensemble.
From Java, Ensemble will let
a set of applets (running on different workstations) form a group
within which they can consistently and fault-tolerantly handle
replicated data that is updated dynamically in real-time, cooperate
to share work in a load-balanced manner, and automatically reconfigure
when a machine, communication link, or application fails. Such
functionality might arise in conferencing applications, or in
applications that monitor a data feed that emits real-time updates.
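One common way to obtain load-balanced work sharing from a group abstraction is for every member to compute the same assignment from the same membership view, so that no extra coordination messages are needed and a failure simply triggers recomputation over the new view. The sketch below illustrates that pattern in Python rather than Java; it assumes all members see identical views, and the function names and the hash-based rule are illustrative choices, not Ensemble's interface.

```python
import zlib
from typing import Dict, List


def owner(task_id: str, view: List[str]) -> str:
    """Return the member responsible for `task_id` in this view.

    Members are ranked by sorted name, and a checksum of the task id
    selects a rank. CRC32 is used because it gives the same value in
    every process, unlike Python's built-in hash() for strings.
    """
    ranked = sorted(view)
    return ranked[zlib.crc32(task_id.encode()) % len(ranked)]


def assignment(tasks: List[str], view: List[str]) -> Dict[str, List[str]]:
    """Compute the full task plan for a view; identical at every member."""
    plan: Dict[str, List[str]] = {member: [] for member in view}
    for task in tasks:
        plan[owner(task, view)].append(task)
    return plan
```

When a new view is installed after a failure, each survivor calls `assignment` again with the new member list and picks up whatever tasks now map to it.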
To understand how this might
solve the problems identified earlier, consider a potential air-traffic
application involving cooperation by a group of controllers as
they direct a set of flights. The controllers run applications
belonging to one or more communication groups, within which routing
updates are communicated. These applications display the status
of the airspace through Web-like interfaces on the control consoles.
A Java applet representing a given flight, for example, flies
slowly across the screen, displaying more detail if the controller
clicks upon it, and popping up advisory messages as needed. Controllers
are given point-and-click access to multimedia conferencing services
that deliver audio and video, for use in coordinating routing
changes. The servers maintaining flight control data and providing
critical advisories are replicated, running on clusters of nodes
managed for high availability, load balancing, and fault-tolerance.
Within this example, one sees many uses of communication groups. Yet the group "structuring" corresponds to different communication requirements in different uses, and would therefore require different protocol stacks. For example, some applications need virtual synchrony, others need real-time communication or security, and some need a mixture of these. Ensemble responds to such requirements with uniform interfaces and flexible semantics.
The Horus project has produced
two generations of software. The first generation, Horus/C, is
stable, and available for general use, but is oriented towards
UNIX machines and focuses exclusively on cluster-style server
applications. The new system, Ensemble, is focused primarily
toward PC's and clusters of PC-servers. In addition to the cluster-server
applications for which Horus/C would be most useful, Ensemble
tackles the groupware and conferencing problems cited above.
There are no licensing fees
for research use of Horus/C. Horus/C commercial rights, however,
have been exclusively licensed by Cornell University to Isis Distributed
Systems, which is developing a commercial product in this area.
Contact rcbc@isis.com (Dr. Robert Cooper) for details concerning
the commercial product offering, support, or other services.
Cornell University is making
Ensemble available at no fee, in source form, for both research
and commercial use. Availability is scheduled for late
fall, 1996.
Last modified: Sun Jul 21 11:27:45 EDT 2002