Notes by Walter Bell, March 1999. Previous notes by Kevin LoGuidice, March 1998
Introduction
RPC has proven to be a powerful paradigm, giving cross-address-space calls the semantics of a normal procedure call. Although originally designed for machine-to-machine communication, most cross-address-space invocations take place between domains on the same machine rather than between computers, as one might expect in client-server systems. As a result, the conventional RPC communication mechanism incurs unnecessary overhead, including needless scheduling, excessive run-time indirection, redundant copying, lock contention, and unnecessary access validation. The authors claim that this lightweight approach can be implemented in an RPC package as a special case without removing any of the transparency of the RPC system or making uncommon cases (domain termination, large argument sizes, etc.) difficult to handle.
Goal
A lightweight, fast communication facility for cross-address-space invocation, based on the observation that most RPC calls are simple (small, simple arguments rather than large byte streams or linked structures).
Benefits
- Keeps the familiar semantics of procedure calls, with improved performance over conventional RPC
- A safe, transparent communication alternative for small-kernel operating systems.
- Simple control transfer: client’s thread executes procedure in server.
- Simple data transfer: param-passing mechanism is similar to that used by procedure call.
- Simple stubs: simple control/data transfer model generates highly optimized stubs.
- Concurrency Support: avoids shared data structure bottlenecks and benefits from the speedup of a multiprocessor.
Conventional RPC Overhead
RPC systems are built for the general case of machine-to-machine communication; in the cross-domain case this generality is unneeded and incurs a substantial overhead that cannot easily be optimized away on the common path:
- Stubs: a general interface and execution path supporting both cross-domain and cross-machine calls, even though the cross-machine case is infrequent
- Message buffers: message transfer can involve a copy through the kernel, requiring two copy operations on call and two on return
- Access validation: Kernel validates on call and return
- Message Transfer: Flow control of message queues is often necessary
- Scheduling: the indirect hand-off between client and server threads is slow as a result of scheduler locking
- Context Switching: Virtual Memory context switch from client to server and back again
- Dispatching: Single receiver thread in server interpreting message and dispatching.
Lightweight Approach
Binding
At a high level, binding looks similar to conventional RPC, but the underlying mechanism is quite different because of the interaction between the client, server, and kernel (a rough sketch of the binding-time structures follows the list below):
- RPC Server exports interface through a clerk in the RPC run-time which can be seen in every domain
- Client binds to a particular interface by making an import call to the kernel
- For each procedure in the interface, described by a procedure descriptor (PD), the kernel allocates a number of A-stacks (argument stacks, mapped into both domains) and linkage records (which record where the call will return to in the client).
- Kernel returns a binding object to the client, which is used in every call.
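The following is a minimal sketch, in C, of the per-procedure state that binding might set up. All names, sizes, and fields here are illustrative assumptions, not taken from the paper's actual implementation.

    /* Hypothetical binding-time state for LRPC; names and layout are
     * assumptions for illustration only. */
    #include <stddef.h>
    #include <stdint.h>

    #define ASTACK_SIZE 4096            /* assumed size of one argument stack */

    typedef struct linkage {            /* kernel-private: where the call returns */
        uintptr_t caller_return_addr;
        uintptr_t caller_sp;
    } linkage_t;

    typedef struct astack {             /* mapped read/write into both domains */
        uint8_t    data[ASTACK_SIZE];
        linkage_t *linkage;             /* one linkage record per A-stack */
        int        in_use;              /* claimed by at most one thread at a time */
    } astack_t;

    typedef struct proc_desc {          /* PD: one per procedure in the interface */
        void     (*server_entry)(void *astack);   /* address of the server stub */
        size_t     astack_size;
        astack_t  *astacks;             /* list of A-stacks for this procedure */
        int        n_astacks;
    } proc_desc_t;

    typedef struct binding_object {     /* returned to the client by the import call */
        int            remote;          /* nonzero => server is on another machine */
        struct domain *server_domain;   /* domain the interface was imported from */
        proc_desc_t   *pds;             /* one PD per procedure */
        int            n_procs;
    } binding_t;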
Call
High level of integration between Client, Kernel, and Server.
Client
- Client stub dequeues A-stack
- Arguments are copied onto A-Stack
- Registers are loaded with address of A-Stack, Binding Object, and procedure ID
- Traps into the kernel (a sketch of this stub path follows below)
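Below is a sketch of the logical fast path of a client stub for a hypothetical procedure add(int, int), reusing the structures sketched under Binding. kernel_trap() and PROC_ADD are placeholders, and the real stubs are generated directly in machine code rather than C.

    /* Hypothetical client stub fast path; placeholder names throughout. */
    enum { PROC_ADD = 0 };                       /* assumed procedure id */

    extern void kernel_trap(binding_t *bo, int proc_id, astack_t *as);  /* placeholder */

    static astack_t *dequeue_astack(proc_desc_t *pd)
    {
        for (int i = 0; i < pd->n_astacks; i++)
            if (!pd->astacks[i].in_use) {        /* a real stub would claim this atomically */
                pd->astacks[i].in_use = 1;
                return &pd->astacks[i];
            }
        return NULL;                             /* uncommon case: must wait for a free A-stack */
    }

    int add_stub(binding_t *bo, int a, int b)
    {
        proc_desc_t *pd = &bo->pds[PROC_ADD];
        astack_t *as = dequeue_astack(pd);
        if (as == NULL)
            return -1;                           /* simplification: give up instead of waiting */

        /* Arguments are copied exactly once, onto the shared A-stack. */
        int *args = (int *)as->data;
        args[0] = a;
        args[1] = b;

        /* Load the A-stack address, Binding Object, and procedure id
         * (into registers, in the real stub) and trap into the kernel. */
        kernel_trap(bo, PROC_ADD, as);

        /* On return the result has been written back to the same A-stack. */
        int result = args[0];
        as->in_use = 0;                          /* requeue the A-stack */
        return result;
    }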
Kernel
- Verifies the Binding Object and procedure ID, and locates the correct PD.
- Verifies the A-Stack and locates linkage
- Ensures that no other thread is using that A-Stack/linkage pair
- Records the caller’s return address and current SP in linkage
- Pushes the linkage record onto the top of a stack of linkage records kept in the thread's control block
- Locates an execution stack (E-stack) in the server's domain
- Updates the thread's user SP to run off the new execution stack
- Reloads the processor's virtual memory registers with those of the server domain
- Performs an upcall into the server's stub at the address specified in the PD (the kernel-side call path is sketched below)
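A sketch of the kernel-side call path corresponding to the steps above; the thread, domain, E-stack, and upcall primitives are placeholders invented for illustration, not a real kernel API.

    /* Hypothetical kernel call path for LRPC. */
    typedef struct domain { void *vm_context; } domain_t;

    typedef struct thread {
        linkage_t *linkage_stack[16];   /* per-thread stack of linkage records */
        int        linkage_top;
        uintptr_t  user_sp;
    } thread_t;

    extern thread_t *current_thread(void);                                /* placeholder */
    extern uintptr_t alloc_estack(domain_t *server);                      /* placeholder */
    extern void      load_vm_context(void *vm_context);                   /* placeholder */
    extern void      upcall(uintptr_t entry, uintptr_t sp, astack_t *as); /* placeholder */

    /* caller_ra and caller_sp come from the trap frame. */
    void lrpc_call(binding_t *bo, int proc_id, astack_t *as,
                   uintptr_t caller_ra, uintptr_t caller_sp)
    {
        /* Verify the Binding Object and procedure id, and locate the PD. */
        if (bo == NULL || proc_id < 0 || proc_id >= bo->n_procs)
            return;                                  /* reject the call */
        proc_desc_t *pd = &bo->pds[proc_id];

        /* Verify the A-stack and locate its linkage record; a real kernel
         * would also check that no other thread holds this pair. */
        linkage_t *lk = as->linkage;

        /* Record where the call will return to in the client, and push the
         * linkage onto the stack kept in the thread's control block. */
        lk->caller_return_addr = caller_ra;
        lk->caller_sp          = caller_sp;
        thread_t *t = current_thread();
        t->linkage_stack[t->linkage_top++] = lk;

        /* Redirect the same thread onto an execution stack in the server's
         * domain -- no scheduler hand-off is involved. */
        domain_t *server = bo->server_domain;
        t->user_sp = alloc_estack(server);

        /* Switch the virtual memory context to the server and upcall into
         * the server stub named in the PD. */
        load_vm_context(server->vm_context);
        upcall((uintptr_t)pd->server_entry, t->user_sp, as);
    }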
Server
- Server procedure executes and can directly access parameters via the A-stack
- Procedure returns through its own stub and traps into the kernel
- Kernel switches the thread back to the client (the return path is sketched below)
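The matching kernel return path might look roughly like this; again, the lookup and resume primitives are placeholders.

    /* Hypothetical kernel return path: pop the linkage and switch back. */
    extern domain_t *client_domain_of(linkage_t *lk);        /* placeholder lookup */
    extern void      return_to_user(uintptr_t return_addr);  /* placeholder resume */

    void lrpc_return(void)
    {
        thread_t *t = current_thread();
        linkage_t *lk = t->linkage_stack[--t->linkage_top];

        /* Restore the client's stack pointer and address space; results are
         * already on the shared A-stack, so nothing is copied on return. */
        t->user_sp = lk->caller_sp;
        load_vm_context(client_domain_of(lk)->vm_context);
        return_to_user(lk->caller_return_addr);
    }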
Additional
- There is little locking in the LRPC mechanism, which enables multiple simultaneous LRPC calls on multiprocessor machines
- Idle processors on a multiprocessor machine can be used to improve throughput and lower call latency
- Stubs are generated automatically and are simple enough to be emitted directly as machine code (and can therefore be highly optimized)
- Transparency is preserved: a bit in the Binding Object indicates whether the server is remote, and the stub uses conventional RPC or LRPC accordingly (see the sketch below).
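One way the transparency check could look, as a sketch under the same assumptions (rpc_add() stands in for the conventional cross-machine RPC path):

    /* The stub first tests the remote bit in the Binding Object and falls
     * back to conventional RPC when the server is on another machine. */
    extern int rpc_add(binding_t *bo, int a, int b);   /* placeholder: full RPC path */

    int add(binding_t *bo, int a, int b)
    {
        if (bo->remote)
            return rpc_add(bo, a, b);   /* cross-machine: conventional RPC */
        return add_stub(bo, a, b);      /* same machine: LRPC fast path */
    }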
Performance
- Arguments are copied only once (onto the A-stack), as opposed to four times in RPC (client stub -> message, message -> kernel buffer, kernel buffer -> message in the server domain, message -> server stub)
- In general the safety is the same as RPC, but some safety can be traded for speed (e.g., parameters on the shared A-stack are not guaranteed to be immutable during the call).
- Domain switching is roughly 3 times faster than RPC.
- TLB misses are minimized in LRPC (yet still account for much of the delay)
- No apparent limiting factor for calls-per-second on multiprocessor system.
Questions
- Do these assumptions about RPC usage still hold? Has the presence of larger networks impacted the use of RPC mechanisms?
- Can a server control the degree of concurrency for LRPC?
- What are the implications of migrating a resource from a remote server to a local server and vice versa?
- Do you feel the efficiency of LRPC is justified in terms of memory management costs?
- Do you feel that clients and servers have a higher degree of risk of mutual interference than with RPC?