NOTES ON "IMPLEMENTING REMOTE PROCEDURE CALLS"
Christopher Stein
Writing distributed applications has historically been a very difficult endeavour, undertaken only by a select group of communications experts. The authors lament this and the corresponding underutilisation of network resources. They aim to bring distributed computing to the masses - with RPC. This paper describes a real system that was built during the early 1980s at Xerox PARC. The authors are very clear about their primary system design goal: they want to "make distributed computation easy", and they want to do this in a secure and efficient manner.
Implementing an RPC facility is non-trivial. The authors identify several issues:
They elect not to discuss all of these issues - claiming that it is not possible in a single paper. Rather, they choose to give a general overview of the system and discuss the binding mechanism and transport protocol in depth. A discussion of other design decisions is deferred to later papers.
The driving design principle...The semantics of RPC should be as close as possible to that of LPC.
ENVIRONMENTAL ASSUMPTIONS
An important decision...
For better or worse, no time-outs. The authors acknowledge that this choice runs contrary to much distributed systems wisdom, but argue that it is in the interests of consistency with LPC and of keeping things simple. On page 43 the authors make an important statement - "Designing a new time-out mechanism...would needlessly complicate the programmer's world". Are time-outs really a "needless" complication? Or are they indispensable for designers of reliable, deterministic systems? Are the authors trying to drive a square peg into a round hole?
BINDING
To export an interface, the server-stub calls ExportInterface in the RPCRuntime, which registers the interface with the Grapevine database. The Grapevine database is then used to locate the exporter of the desired interface.
Interfaces are tuples - (type, instance).
The Grapevine database is keyed by RName. There are two varieties of entries: groups and individuals.
Group entry: [ RName, {list of members' RNames}]
Individual entry: [ RName, connect-site]
Each interface has two entries: one for the type and one for the instance.
Type entry: [ Type, Group of exported Instances of this type]
Instance entry: [ Instance, Grapevine Individual that last exported this instance]
A type entry is a group entry and an instance entry is an individual entry.
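The entry layout above can be sketched as a small in-memory model. This is only an illustration, not the Grapevine API: the class ToyGrapevine, its methods, and the policy of picking the first instance in sorted order when none is named are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class GroupEntry:
    """Group entry: [RName, {members' RNames}]."""
    rname: str
    members: set = field(default_factory=set)


@dataclass
class IndividualEntry:
    """Individual entry: [RName, connect-site]."""
    rname: str
    connect_site: str


class ToyGrapevine:
    """Hypothetical in-memory stand-in for the database, keyed by RName."""

    def __init__(self):
        self.entries = {}

    def export(self, type_name, instance_name, connect_site):
        # Instance entry: records the site that last exported this instance.
        self.entries[instance_name] = IndividualEntry(instance_name, connect_site)
        # Type entry: a group whose members are the exported instances.
        group = self.entries.setdefault(type_name, GroupEntry(type_name))
        group.members.add(instance_name)

    def lookup(self, type_name, instance_name=None):
        """Return the connect-site for the interface (type, instance)."""
        group = self.entries.get(type_name)
        if not isinstance(group, GroupEntry) or not group.members:
            raise LookupError(f"no exporter for type {type_name!r}")
        if instance_name is None:
            # Arbitrary policy for this sketch: first member in sorted order.
            instance_name = sorted(group.members)[0]
        elif instance_name not in group.members:
            raise LookupError(f"{instance_name!r} does not export {type_name!r}")
        return self.entries[instance_name].connect_site
```

Re-exporting an instance from a different machine overwrites its individual entry, which matches the "last exported" wording of the instance entry.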
RPCRuntime records information about an export in a table on the exporting machine. This table consists of (interface name, dispatcher, identifier).
The identifier...
Therefore, the binding is broken implicitly if the server goes down. Is this the correct semantics? The authors believe so. What about building networks that must tolerate arbitrary server failures transparently?
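A rough sketch of how an export table can make a binding break implicitly. The notes do not spell out how the identifier is formed, so this sketch simply assumes that it is unique per export and never reused across restarts. ToyRPCRuntime, boot_id, and the dispatcher signature are hypothetical names for illustration only.

```python
import itertools


class ToyRPCRuntime:
    """Hypothetical server-side runtime holding the export table."""

    # Assumption: some source of values that differ on every restart.
    _boot_counter = itertools.count(1)

    def __init__(self):
        # Constructing a new runtime models a server restart: empty table.
        self.boot_id = next(ToyRPCRuntime._boot_counter)
        self._export_seq = itertools.count(1)
        self.export_table = {}  # interface name -> (dispatcher, identifier)

    def export_interface(self, name, dispatcher):
        # Table row: (interface name, dispatcher, identifier).
        identifier = (self.boot_id, next(self._export_seq))
        self.export_table[name] = (dispatcher, identifier)
        return identifier

    def dispatch(self, name, identifier, proc, *args):
        entry = self.export_table.get(name)
        # A caller holding an identifier from before a restart no longer
        # matches the table, so its binding is rejected.
        if entry is None or entry[1] != identifier:
            raise ConnectionError("binding is no longer valid")
        dispatcher, _ = entry
        return dispatcher(proc, *args)
```

In this model nobody has to notify the caller of a crash: the stale identifier is detected on the next call after the restart.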
AN AD HOC TRANSPORT PROTOCOL FOR RPC
PUP isn't good enough for these guys. Why not?
Protocol design goal: minimize the elapsed real-time between initiating a call and getting results.
Exactly-once semantics: if the procedure returns a result, it has executed exactly once.
The caller keeps retransmitting the call until an acknowledgement is received. In cases where the remote procedure is executed but the ack is lost, the callee must know not to execute the procedure a second time. This is at-most-once semantics. It is effected by a 32-bit call id.
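The duplicate-suppression idea can be sketched as follows. This is a simplified model, not the paper's packet protocol: it assumes each caller issues strictly increasing call ids and has one call outstanding at a time, and it simulates a lost ack by discarding the reply. ToyCallee and call_with_retries are hypothetical names.

```python
class ToyCallee:
    """Hypothetical callee that suppresses duplicate executions by call id."""

    def __init__(self, procedures):
        self.procedures = procedures  # procedure name -> callable
        self.last_seen = {}           # caller -> (call id, saved result)
        self.executions = 0           # how many times a procedure really ran

    def handle(self, caller, call_id, proc, *args):
        seen = self.last_seen.get(caller)
        if seen is not None:
            last_id, last_result = seen
            if call_id == last_id:
                # Retransmission: resend the saved result, do not re-execute.
                return last_result
            if call_id < last_id:
                raise ValueError("stale call id")
        result = self.procedures[proc](*args)
        self.executions += 1
        self.last_seen[caller] = (call_id, result)
        return result


def call_with_retries(callee, caller, call_id, proc, args, lost_acks=0):
    """Resend the same call id until a reply gets through.

    The first `lost_acks` replies are dropped to simulate a lossy network.
    Returns (result, number of transmissions).
    """
    attempts = 0
    while True:
        attempts += 1
        result = callee.handle(caller, call_id, proc, *args)
        if attempts > lost_acks:
            return result, attempts
```

With lost_acks=2 the call is transmitted three times, but the callee's executions counter only advances once.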
Security...Grapevine can provide a private-key DES authentication service.
THINGS TO THINK ABOUT...
The authors want to make RPC semantically identical to local procedure call. However, there are fundamental differences between a local and a remote procedure call. What are these differences? Are the authors denying important realities?
The Xerox PARC network was rather homogeneous, consisting mostly of Dorados. In a heterogeneous environment things are not so easy: marshaling of arguments becomes more difficult. For example, machines may have different representations for floating point.
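A small illustration of the representation problem, using Python's standard struct module. It only covers byte order. Machines with different floating-point formats altogether would need a real conversion step. The helper names marshal_float and unmarshal_float are made up for this sketch, and the choice of big-endian IEEE 754 doubles as the wire format is an assumption.

```python
import struct


def marshal_float(x):
    # Agreed wire format (an assumption): 8-byte IEEE 754 double, big-endian.
    return struct.pack("!d", x)


def unmarshal_float(data):
    # Decode from the agreed wire format, whatever the local byte order is.
    return struct.unpack("!d", data)[0]


def naive_receive(raw_little_endian_bytes):
    # What goes wrong without an agreed format: a big-endian receiver
    # reinterpreting bytes that a little-endian sender copied from memory.
    return struct.unpack(">d", raw_little_endian_bytes)[0]
```

Sending struct.pack("<d", 1.5) to naive_receive yields a value that is not 1.5, whereas a marshal_float / unmarshal_float round trip preserves it.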
The authors claim that RPC is characterized by fast request-response where the data transfers are typically small. What about networks where the marshaled arguments are complex objects?
References:
1. "Building Secure and Reliable Network Applications," Ken Birman, 1996 esp. pp. 56-81