Notes by Walter Bell, March 1999. Previous notes by Kevin LoGuidice, March 1998
Introduction
RPC has proven to be a powerful paradigm, giving cross-address-space calls the semantics of a normal procedure call. Although originally designed for machine-to-machine communication, most cross-address-space invocations take place between domains on the same machine rather than between computers, as one might expect in client-server systems. As a result, the conventional RPC communication mechanism incurs unnecessary overhead, including needless scheduling, excessive run-time indirection, redundant copying, lock contention, and unnecessary access validation. The authors claim that this lightweight approach can be implemented in an RPC package as a special case without removing any of the transparency of the RPC system or making uncommon cases (domain termination, large argument sizes, etc.) difficult to handle.
Goal
A lightweight, fast communication facility for cross-address-space invocation, based on the observation that most RPC calls are simple (small, simple arguments rather than large byte streams or linked structures).
Benefits
- Keeps the familiar semantics of procedure calls, with improved performance over conventional RPC
- A safe, transparent communication alternative for small-kernel operating systems.
- Simple control transfer: client’s thread executes procedure in server.
- Simple data transfer: param-passing mechanism is similar to that used by procedure call.
- Simple stubs: simple control/data transfer model generates highly optimized stubs.
- Concurrency Support: avoids shared data structure bottlenecks and benefits from the speedup of a multiprocessor.
Conventional RPC Overhead
RPC systems are built for the general case of machine-to-machine communication; in the cross-domain case this generality is unneeded and incurs a substantial overhead that cannot easily be optimized away on the common path:
- Stubs: a general interface and execution path supporting both cross-domain and cross-machine calls, even though the cross-machine case is infrequent
- Message buffers: message transfer can involve a copy through the kernel, requiring two copy operations on call and two on return
- Access validation: Kernel validates on call and return
- Message Transfer: Flow control of message queues is often necessary
- Scheduling: the indirect hand-off between client and server threads is slow as a result of scheduler locking
- Context Switching: Virtual Memory context switch from client to server and back again
- Dispatching: Single receiver thread in server interpreting message and dispatching.
Lightweight Approach
Binding
At a high level, binding looks similar to conventional RPC, but the underlying mechanism is quite different because of the interaction between the client, server, and kernel (a rough sketch of the binding-time structures follows the list below):
- RPC Server exports interface through a clerk in the RPC run-time which can be seen in every domain
- Client binds to a particular interface by making an import call to the kernel
- For each procedure in the interface, described by a procedure descriptor (PD), the kernel allocates a number of A-stacks (argument stacks, mapped into both domains) and linkage records (which record where the call will return to in the client).
- Kernel returns a binding object to the client, which is used in every call.
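The following is a minimal sketch, in C, of the per-procedure state that binding might set up. All names, sizes, and fields here are illustrative assumptions, not taken from the paper's actual implementation.

    /* Hypothetical binding-time state for LRPC; names and layout are
     * assumptions for illustration only. */
    #include <stddef.h>
    #include <stdint.h>

    #define ASTACK_SIZE 4096            /* assumed size of one argument stack */

    typedef struct linkage {            /* kernel-private: where the call returns */
        uintptr_t caller_return_addr;
        uintptr_t caller_sp;
    } linkage_t;

    typedef struct astack {             /* mapped read/write into both domains */
        uint8_t    data[ASTACK_SIZE];
        linkage_t *linkage;             /* one linkage record per A-stack */
        int        in_use;              /* claimed by at most one thread at a time */
    } astack_t;

    typedef struct proc_desc {          /* PD: one per procedure in the interface */
        void     (*server_entry)(void *astack);   /* address of the server stub */
        size_t     astack_size;
        astack_t  *astacks;             /* list of A-stacks for this procedure */
        int        n_astacks;
    } proc_desc_t;

    typedef struct binding_object {     /* returned to the client by the import call */
        int            remote;          /* nonzero => server is on another machine */
        struct domain *server_domain;   /* domain the interface was imported from */
        proc_desc_t   *pds;             /* one PD per procedure */
        int            n_procs;
    } binding_t;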
Call
High level of integration between Client, Kernel, and Server.
Client
- Client stub dequeues A-stack
- Arguments are copied onto A-Stack
- Registers are loaded with address of A-Stack, Binding Object, and procedure ID
- Traps into the kernel (a sketch of this stub path follows below)
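Below is a sketch of the logical fast path of a client stub for a hypothetical procedure add(int, int), reusing the structures sketched under Binding. kernel_trap() and PROC_ADD are placeholders, and the real stubs are generated directly in machine code rather than C.

    /* Hypothetical client stub fast path; placeholder names throughout. */
    enum { PROC_ADD = 0 };                       /* assumed procedure id */

    extern void kernel_trap(binding_t *bo, int proc_id, astack_t *as);  /* placeholder */

    static astack_t *dequeue_astack(proc_desc_t *pd)
    {
        for (int i = 0; i < pd->n_astacks; i++)
            if (!pd->astacks[i].in_use) {        /* a real stub would claim this atomically */
                pd->astacks[i].in_use = 1;
                return &pd->astacks[i];
            }
        return NULL;                             /* uncommon case: must wait for a free A-stack */
    }

    int add_stub(binding_t *bo, int a, int b)
    {
        proc_desc_t *pd = &bo->pds[PROC_ADD];
        astack_t *as = dequeue_astack(pd);
        if (as == NULL)
            return -1;                           /* simplification: give up instead of waiting */

        /* Arguments are copied exactly once, onto the shared A-stack. */
        int *args = (int *)as->data;
        args[0] = a;
        args[1] = b;

        /* Load the A-stack address, Binding Object, and procedure id
         * (into registers, in the real stub) and trap into the kernel. */
        kernel_trap(bo, PROC_ADD, as);

        /* On return the result has been written back to the same A-stack. */
        int result = args[0];
        as->in_use = 0;                          /* requeue the A-stack */
        return result;
    }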
Kernel
- Verifies the Binding Object and procedure ID, and locates the correct PD.
- Verifies the A-Stack and locates linkage
- Ensures that no other thread is using that A-Stack/linkage pair
- Records the caller’s return address and current SP in linkage
- Pushes the linkage record onto the top of a stack of linkage records kept in the thread's control block
- Locates an execution stack (E-stack) in the server's domain
- Updates the thread's user SP to run off the new execution stack
- Reloads the processor's virtual memory registers with those of the server domain
- Performs an upcall into the server's stub at the address specified in the PD (the kernel-side call path is sketched below)
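A sketch of the kernel-side call path corresponding to the steps above; the thread, domain, E-stack, and upcall primitives are placeholders invented for illustration, not a real kernel API.

    /* Hypothetical kernel call path for LRPC. */
    typedef struct domain { void *vm_context; } domain_t;

    typedef struct thread {
        linkage_t *linkage_stack[16];   /* per-thread stack of linkage records */
        int        linkage_top;
        uintptr_t  user_sp;
    } thread_t;

    extern thread_t *current_thread(void);                                /* placeholder */
    extern uintptr_t alloc_estack(domain_t *server);                      /* placeholder */
    extern void      load_vm_context(void *vm_context);                   /* placeholder */
    extern void      upcall(uintptr_t entry, uintptr_t sp, astack_t *as); /* placeholder */

    /* caller_ra and caller_sp come from the trap frame. */
    void lrpc_call(binding_t *bo, int proc_id, astack_t *as,
                   uintptr_t caller_ra, uintptr_t caller_sp)
    {
        /* Verify the Binding Object and procedure id, and locate the PD. */
        if (bo == NULL || proc_id < 0 || proc_id >= bo->n_procs)
            return;                                  /* reject the call */
        proc_desc_t *pd = &bo->pds[proc_id];

        /* Verify the A-stack and locate its linkage record; a real kernel
         * would also check that no other thread holds this pair. */
        linkage_t *lk = as->linkage;

        /* Record where the call will return to in the client, and push the
         * linkage onto the stack kept in the thread's control block. */
        lk->caller_return_addr = caller_ra;
        lk->caller_sp          = caller_sp;
        thread_t *t = current_thread();
        t->linkage_stack[t->linkage_top++] = lk;

        /* Redirect the same thread onto an execution stack in the server's
         * domain -- no scheduler hand-off is involved. */
        domain_t *server = bo->server_domain;
        t->user_sp = alloc_estack(server);

        /* Switch the virtual memory context to the server and upcall into
         * the server stub named in the PD. */
        load_vm_context(server->vm_context);
        upcall((uintptr_t)pd->server_entry, t->user_sp, as);
    }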
Server
- Server procedure executes and can directly access parameters via the A-stack
- Procedure returns through its own stub and traps into the kernel
- Kernel switches the thread back to the client (the return path is sketched below)
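The matching kernel return path might look roughly like this; again, the lookup and resume primitives are placeholders.

    /* Hypothetical kernel return path: pop the linkage and switch back. */
    extern domain_t *client_domain_of(linkage_t *lk);        /* placeholder lookup */
    extern void      return_to_user(uintptr_t return_addr);  /* placeholder resume */

    void lrpc_return(void)
    {
        thread_t *t = current_thread();
        linkage_t *lk = t->linkage_stack[--t->linkage_top];

        /* Restore the client's stack pointer and address space; results are
         * already on the shared A-stack, so nothing is copied on return. */
        t->user_sp = lk->caller_sp;
        load_vm_context(client_domain_of(lk)->vm_context);
        return_to_user(lk->caller_return_addr);
    }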
Additional
- There is little locking in the LRPC mechanism, which enables multiple simultaneous LRPC calls on multiprocessor machines
- Idle processors on a multiprocessor machine can be used to improve throughput and lower call latency
- Stubs are generated automatically and are simple enough to be emitted directly as machine code (and can therefore be highly optimized)
- Transparency is preserved: a bit in the Binding Object indicates whether the server is remote, and the stub uses conventional RPC or LRPC accordingly (see the sketch below).
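One way the transparency check could look, as a sketch under the same assumptions (rpc_add() stands in for the conventional cross-machine RPC path):

    /* The stub first tests the remote bit in the Binding Object and falls
     * back to conventional RPC when the server is on another machine. */
    extern int rpc_add(binding_t *bo, int a, int b);   /* placeholder: full RPC path */

    int add(binding_t *bo, int a, int b)
    {
        if (bo->remote)
            return rpc_add(bo, a, b);   /* cross-machine: conventional RPC */
        return add_stub(bo, a, b);      /* same machine: LRPC fast path */
    }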
Performance
- Arguments are copied only once (onto the A-stack), as opposed to four times in RPC (client stub -> message, message -> kernel buffer, kernel buffer -> message in the server domain, message -> server stub)
- In general the safety is the same as RPC, but some safety can be traded for speed (e.g., parameters on the shared A-stack are not guaranteed to be immutable during the call).
- Domain switching is roughly 3 times faster than RPC.
- TLB misses are minimized in LRPC (yet still account for much of the delay)
- No apparent limiting factor for calls-per-second on multiprocessor system.
Questions
- Do these assumptions about RPC usage still hold? Has the presence of larger networks impacted the use of RPC mechanisms?
- Can a server control the degree of concurrency for LRPC?
- What are the implications of migrating a resource from a remote server to a local server and vice versa?
- Do you feel the efficiency of LRPC is justified in terms of memory management costs?
- Do you feel that clients and servers have a higher degree of risk of mutual interference than with RPC?