Assignment 4

Assignment 4 (due on Thursday, 30 November at 10 pm)

In this assignment, you will modify an existing storage server to implement RMI and security. This assignment must be done in groups of two. If you are without a partner, get in touch with me during class. The storage server is written in Java, and you have to add code without significantly changing the existing code. More importantly, you are not supposed to change the API.

What to do?

Part I: RMI (Remote Method Invocation)

The current implementation of the storage server uses a very simple Remote Procedure Call (RPC) implementation. The implementation is simple for three reasons:

1. The implementation includes hand-written code to convert objects that must travel to the server (and back) to an array of bytes that can be transported across the network. Converting data objects into a flat byte stream is called marshaling; the reverse process is called unmarshaling. Full-fledged RPC systems provide a stub compiler that can generate marshaling and unmarshaling code from an interface description.

2. The network address of the server is more or less hardwired into the client code, which is undesirable. Normally, the server address is obtained from a name server through a process called binding.

3. The storage server relies on the User Datagram Protocol (UDP) to deliver request and reply messages. UDP, however, is an unreliable protocol. RPC systems deliver messages reliably.
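To make the first point concrete, here is a minimal sketch of hand-written marshaling and unmarshaling. The class ReadRequest and its fields are hypothetical and are not taken from the storage server distribution; the server's actual wire format may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical request type, used only for illustration; it is not part of
// the storage server distribution.
class ReadRequest {
    final long unitId;
    final long start;
    final long len;

    ReadRequest(long unitId, long start, long len) {
        this.unitId = unitId;
        this.start = start;
        this.len = len;
    }

    // Marshal: flatten the fields into a byte array in a fixed order.
    byte[] marshal() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(unitId);
            out.writeLong(start);
            out.writeLong(len);
        }
        return bytes.toByteArray();
    }

    // Unmarshal: read the fields back in exactly the same order.
    static ReadRequest unmarshal(byte[] wire) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            long unitId = in.readLong();
            long start = in.readLong();
            long len = in.readLong();
            return new ReadRequest(unitId, start, len);
        }
    }
}
```

Every new field requires matching changes in both methods, which is exactly the bookkeeping that a stub compiler takes off your hands.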

Java's RMI mechanism is an object-oriented flavor of RPC. The RMI distribution includes a stub compiler, rmic, which you can use to generate marshaling and unmarshaling code. In RMI's terminology, marshaling is called serialization and unmarshaling is called deserialization. Binding takes place by means of a server called the RMI registry. Internally, RMI uses TCP/IP to transport messages reliably.

Briefly, RMI works as follows. The implementer of the server writes a Java interface that defines the methods that the server exports to its clients. The implementer also provides a class that implements this interface. Finally, the implementer generates marshaling and unmarshaling code using the stub compiler. The service can now be made available by creating an instance of the server class and by registering this instance with the RMI registry under a well-known name. (This is done using Naming.rebind().)

Clients can query the RMI registry for the name of the service (using Naming.lookup()). If the registry knows the service, it will return to the client an object that implements the service interface. Such an instance is, quite appropriately, called a remote object. The client can invoke the methods defined in the server interface on the remote object. The method's actual parameters will automatically be marshaled into a message and transmitted to the server. At the server, a (popup!) thread is created that unmarshals the message and invokes the server's implementation of the method. After the method has executed, the return value is marshaled into a reply message that is sent back to the client. At the client, the result is unmarshaled and returned to the caller of the method.
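The steps above can be sketched in a single program. This is only an illustration under several assumptions: the Echo interface, the service name, and the port are made up; the registry is started inside the same virtual machine instead of with the rmiregistry tool; and the code targets a JDK (5 or later) that generates stubs at run time, so rmic is not invoked.

```java
import java.rmi.Naming;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

class RmiSketch {
    // Hypothetical service interface. Every remote method declares RemoteException.
    public interface Echo extends Remote {
        String echo(String message) throws RemoteException;
    }

    // Server-side implementation. Extending UnicastRemoteObject exports the
    // object when it is constructed.
    static class EchoImpl extends UnicastRemoteObject implements Echo {
        EchoImpl() throws RemoteException {
            super();
        }

        @Override
        public String echo(String message) throws RemoteException {
            return "echo: " + message;
        }
    }

    // Port and service name are arbitrary choices for this sketch.
    static final int PORT = 1099;
    static final String URL = "rmi://127.0.0.1:" + PORT + "/EchoService";

    static String roundTrip(String message) throws Exception {
        // Make stubs point at the loopback address so the sketch works on one machine.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        Registry registry = LocateRegistry.createRegistry(PORT); // stand-in for rmiregistry
        EchoImpl server = new EchoImpl();
        Naming.rebind(URL, server);                  // server: register under a well-known name
        try {
            Echo remote = (Echo) Naming.lookup(URL); // client: obtain the remote object
            return remote.echo(message);             // arguments and result travel over TCP
        } finally {
            // Clean up so that the virtual machine can exit.
            Naming.unbind(URL);
            UnicastRemoteObject.unexportObject(server, true);
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello"));
    }
}
```

In a real deployment the server and the client are separate programs, and only the interface is shared between them.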

So far, RMI does not differ from traditional RPC systems. The main difference is that RMI allows clients to pass objects as parameters to a remote object's methods. As with normal method calls, the caller of the method may supply an instance of a subclass of the method's formal argument. This is an interesting feature, because in a distributed environment the callee (i.e., the server) may not have access to the subclass's code. In this situation, RMI will automatically fetch the subclass's code from the client. This way, client code is shipped to the server.

The details of using RMI are explained in the RMI documentation. Read this documentation carefully. Work your way through the example.

In the RMI part of this project, you must modify the storage server to use RMI instead of UDP. You should do this without modifying the application programming interface (API) in class il.ac.idc.storage.StorageServer. This interface includes a new method

public byte[] rexec(Credentials cred, ClientCode cc).

This method allows client code to be moved to the server. ClientCode.java is provided in the new storage server distribution. This class defines a single (abstract) method

public abstract byte[] run(OnServer server).

The role of OnServer will be explained shortly. Clients can define a subclass of ClientCode and can pass an instance of this subclass as a parameter to rexec(). This subclass must implement the run() method, which will contain client-specific code that can operate on storage units (see below). The server's RMI class loader will download the subclass's code when the implementation of rexec() invokes cc's run method. After the code has been downloaded, the client's run() method is invoked.

The client's run() method can operate on the data storage units managed by the server by invoking the following two methods on server:

• byte[] readUnit(StorageID sid, long start, long len)

• void writeUnit(StorageID sid, long start, long len, byte[] data)

Both methods are defined in the new interface OnServer. You must implement this interface.
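As an illustration of what client code might look like, the sketch below defines a ClientCode subclass that sums the bytes of a storage unit on the server and returns only the 8-byte result. The declarations of StorageID, OnServer, and ClientCode shown here are stand-ins written so that the example compiles by itself; the real ones come from the distribution and may differ (for example, in the exceptions they declare). ByteSum and InMemoryServer are hypothetical names.

```java
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class RexecSketch {
    // Stand-ins for types from the distribution; the real declarations may differ.
    static final class StorageID implements Serializable {
        final long id;
        StorageID(long id) { this.id = id; }
    }

    interface OnServer {
        byte[] readUnit(StorageID sid, long start, long len);
        void writeUnit(StorageID sid, long start, long len, byte[] data);
    }

    abstract static class ClientCode implements Serializable {
        public abstract byte[] run(OnServer server);
    }

    // Client-defined code that would be passed to rexec() and run on the server.
    static class ByteSum extends ClientCode {
        private final StorageID sid;
        private final long len;

        ByteSum(StorageID sid, long len) {
            this.sid = sid;
            this.len = len;
        }

        @Override
        public byte[] run(OnServer server) {
            long sum = 0;
            for (byte b : server.readUnit(sid, 0, len)) {
                sum += b & 0xFF; // treat each byte as unsigned
            }
            return ByteBuffer.allocate(Long.BYTES).putLong(sum).array();
        }
    }

    // Toy in-memory OnServer so that the sketch runs without a real server.
    static class InMemoryServer implements OnServer {
        private final Map<Long, byte[]> units = new HashMap<>();

        void create(StorageID sid, int size) {
            units.put(sid.id, new byte[size]);
        }

        @Override
        public byte[] readUnit(StorageID sid, long start, long len) {
            return Arrays.copyOfRange(units.get(sid.id), (int) start, (int) (start + len));
        }

        @Override
        public void writeUnit(StorageID sid, long start, long len, byte[] data) {
            System.arraycopy(data, 0, units.get(sid.id), (int) start, (int) len);
        }
    }
}
```

The point of shipping ByteSum to the server is that the unit's contents never cross the network; only the small result does.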

Part II: Security

The ability to execute client code on the server introduces serious security concerns. Unless you take precautions, the client code can easily attack the server. For example, the client's code may be able to delete any file that the server has access to or it may hang up the server. Besides the security problems introduced by mobile code, the storage server suffers from several other, more basic problems.

• Clients are not authenticated. Any process that knows how to talk to the storage server can create, read, write, or delete a storage unit.

• The server does not perform access control. Even if the server knows to which client it is talking, there is nothing to prevent that client from accessing storage units created by other clients. The storage server does not keep track of which clients have access to which storage units.

• Message integrity is not guaranteed. It is quite easy for a malicious process to modify messages transmitted by other processes.

To address these problems, you will use a variety of security techniques: secret-key cryptography, capabilities, code signing, and privileged code.

Secret-key cryptography allows two (or more) parties to communicate data in a confidential manner by encrypting and decrypting that data. Encryption and decryption are performed by means of an encryption algorithm that takes two inputs: the data to be encrypted or decrypted and a secret key:

cipher_text = encrypt(key, plain_text)

plain_text = decrypt(key, cipher_text)

All communicating parties share the key. The encryption algorithm is usually known to the entire world, but the key should be kept secret. The main problem with secret-key cryptography is key distribution. The communicating parties need to agree on a secret key before they can communicate securely. To distribute this key in a secure manner to all parties, however, a secure channel is needed... In this project, we will not address this problem. In practice, it is usually solved by introducing a trusted third party or by using another type of cryptography, public-key cryptography, to distribute a key.
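For illustration, the encrypt and decrypt operations can be sketched with the standard javax.crypto API. This is an assumption made for the example: the distribution's CryptKey class has its own interface, which is not shown here, and the choice of AES in GCM mode with a 128-bit key is ours.

```java
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class SecretKeySketch {
    static final int IV_BYTES = 12;  // recommended IV size for GCM
    static final int TAG_BITS = 128; // size of the authentication tag

    // Create a fresh 128-bit AES key; all communicating parties must share it.
    static SecretKey newKey() throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        return gen.generateKey();
    }

    // cipher_text = encrypt(key, plain_text). A fresh random IV is prepended
    // to the output so that the receiver can decrypt.
    static byte[] encrypt(SecretKey key, byte[] plainText) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] body = cipher.doFinal(plainText);
        byte[] out = new byte[IV_BYTES + body.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(body, 0, out, IV_BYTES, body.length);
        return out;
    }

    // plain_text = decrypt(key, cipher_text). Throws an exception if the key
    // is wrong or the cipher text was modified.
    static byte[] decrypt(SecretKey key, byte[] cipherText) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, cipherText, 0, IV_BYTES));
        return cipher.doFinal(cipherText, IV_BYTES, cipherText.length - IV_BYTES);
    }
}
```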

Secret-key cryptography can be used to preserve the confidentiality of messages that travel between a client and the storage server. We will assume, however, that confidentiality is not required, so you do not have to encrypt your messages. Instead, we will use secret-key cryptography for authentication purposes. Authentication is usually achieved by showing that you have or know something. In this project, you must arrange for every client to share a key with the server. A client can authenticate itself to the storage server by showing that it knows the secret key that it shares with the server. (Clearly, different clients must be given different keys.)

One way a client can show that it knows a key is by sending that key to the server. This is a bad idea: anyone listening in on the conversation can obtain the key. A better way is for the server to send a challenge (a random number) to the client and for the client to reply with the encrypted value of the challenge. The server can decrypt this reply and check that the decrypted value equals its challenge. This way, the client can show that it knows the secret key without revealing the key.
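A minimal sketch of this challenge-response exchange follows. It assumes the standard javax.crypto API rather than the distribution's CryptKey class, and it uses a 16-byte challenge so that the challenge is exactly one AES block; the class and method names are hypothetical.

```java
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;

class ChallengeSketch {
    static final int BLOCK = 16; // one AES block

    // Server, step 1: pick a fresh random challenge and send it to the client.
    static byte[] newChallenge() {
        byte[] challenge = new byte[BLOCK];
        new SecureRandom().nextBytes(challenge);
        return challenge;
    }

    // Client: prove knowledge of the key by encrypting the challenge.
    // The input is a single random block, so no chaining mode or padding is needed.
    static byte[] respond(SecretKey key, byte[] challenge) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/ECB/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher.doFinal(challenge);
    }

    // Server, step 2: decrypt the reply and compare it with the challenge.
    static boolean verify(SecretKey key, byte[] challenge, byte[] response) throws Exception {
        if (response.length != BLOCK) {
            return false; // a malformed reply cannot be a valid response
        }
        Cipher cipher = Cipher.getInstance("AES/ECB/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key);
        return MessageDigest.isEqual(challenge, cipher.doFinal(response));
    }
}
```

The server must use each challenge only once; otherwise a recorded response could be reused.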

The problem with the scheme above is that it requires a three-way handshake between the client and the server. We do not want to perform such a handshake for each request issued by the client. A simpler scheme, one that uses only a single message, can be used if the client and server clocks are loosely synchronized. The main problem with this simpler scheme is that it is sensitive to replay attacks: another process can record the message sent by the client to the server. Later, this third process can replay the message to the server. If you implement this scheme, you must deal with this type of attack.
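One possible shape for such a single-message scheme is sketched below. It is not the required design. It swaps encryption for a keyed hash (HMAC-SHA256 from javax.crypto) over a timestamp and a client-chosen nonce, rejects messages outside an assumed 30-second clock-skew window, and remembers (timestamp, nonce) pairs to detect replays. All names and the window size are assumptions.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.HashSet;
import java.util.Set;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class TimestampAuthSketch {
    static final long MAX_SKEW_MS = 30_000; // assumed bound on clock difference

    // Client: compute HMAC(key, timestamp || nonce) and send it with the request.
    static byte[] authenticator(byte[] key, long timestampMs, long nonce) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(ByteBuffer.allocate(16).putLong(timestampMs).putLong(nonce).array());
    }

    // Server state: (timestamp, nonce) pairs already accepted. A real server
    // would keep one set per client and evict entries older than the skew window.
    private final Set<String> seen = new HashSet<>();

    // Server: accept only fresh, correctly authenticated, never-seen messages.
    boolean accept(byte[] key, long timestampMs, long nonce, byte[] tag, long nowMs)
            throws Exception {
        if (Math.abs(nowMs - timestampMs) > MAX_SKEW_MS) {
            return false; // too old, or too far in the future
        }
        if (!MessageDigest.isEqual(tag, authenticator(key, timestampMs, nonce))) {
            return false; // sender does not know the key
        }
        return seen.add(timestampMs + ":" + nonce); // false if this is a replay
    }
}
```

In a complete design the keyed hash would also cover the request itself, so that a valid authenticator cannot be attached to a different request.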

An access control mechanism allows one to specify who has access to what. One way to do this is to store an Access Control List (ACL) with each object. An ACL lists who has access to the resource that the ACL is associated with. Before a process is given access to the resource, the operating system checks if the user on whose behalf the process runs is listed in the ACL (explicitly or implicitly, as a member of some group). Windows NT and Solaris use ACLs to protect files.

In this project, we will use another mechanism, capabilities. A capability is like a ticket: if you have it you get in, otherwise you are kept out. The difference between these software tickets and paper tickets is that copying is easy and legal. The owner of a capability is allowed to copy its capability and to hand the copy to another party. From then on, the receiver of the copy can access the resource too.

You are to modify the storage server so that it creates a capability for each storage unit that it creates. This capability must be returned to the creating process. When a process wishes to access a storage unit, it must present its capability for that storage unit to the storage server. Since other processes may listen in on the conversation, capabilities must be encrypted, or else other processes can use them. When the server receives a request to access a storage unit, it must check that the capability stored in that request gives access to the storage unit.

Capabilities are conveniently represented as (large) random numbers. If the set from which the random number is chosen is sufficiently large, then the probability that an adversary can guess a capability is negligible.
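As a sketch of this idea, the class below issues a 128-bit random capability per storage unit and checks a presented capability against the stored one. With 128 bits, a single guess succeeds with probability 2^-128. The class name, the use of a long as unit identifier, and the in-memory table are assumptions made for the example.

```java
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

class CapabilitySketch {
    static final int CAP_BYTES = 16; // 128 bits

    // SecureRandom, not java.util.Random: capabilities must be unpredictable.
    private final SecureRandom random = new SecureRandom();
    private final Map<Long, byte[]> capabilityByUnit = new HashMap<>();

    // Called when a storage unit is created; the result goes back to the creator.
    byte[] issue(long unitId) {
        byte[] capability = new byte[CAP_BYTES];
        random.nextBytes(capability);
        capabilityByUnit.put(unitId, capability.clone());
        return capability;
    }

    // Called on every access request before the unit is touched.
    boolean check(long unitId, byte[] presented) {
        byte[] expected = capabilityByUnit.get(unitId);
        return expected != null
                && presented != null
                && MessageDigest.isEqual(expected, presented);
    }
}
```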

The server should not execute malicious client code. In particular, client code should be able to access only those storage units for which it can present a capability. SDK 2 provides a mechanism that assigns permissions to protection domains. When a Java virtual machine downloads a class, it creates a new protection domain and runs the downloaded code in that protection domain. The thread that executes the code in the new protection domain can do everything that is allowed by the permissions of the protection domain. When the thread enters another protection domain - e.g., when client code invokes a server routine - it executes with the intersection of the permissions of both domains.

A policy file assigns permissions to downloaded code. Code that originates from different sources, called code bases, can be assigned different permissions. Permissions can also depend on who signed the code that is downloaded. You must write a sensible policy file for the virtual machine that executes the storage server code (and the downloaded client code). This policy file should give the server full access to the files it manages. Client code should run in a very restricted mode. In particular, a client should not be able to corrupt server data structures and should be given only restricted access to the files that contain storage units.

Code signing is based on public-key cryptography. With this type of cryptography, the sender and the recipient of a message need not share a secret key (as with secret-key cryptography). With public-key cryptography, each party generates a key pair, which consists of a private key and a public key. The public key is given to anyone who wants it. You can publish your public key on your web page. The private key, in contrast, should not be given away.

Public-key cryptography can be used to encrypt data in the following way:

• The sender of a message encrypts the message using the recipient's public key PUBKr (which is known to everyone):

cipher_text = encrypt(PUBKr, plain_text)

• The receiver of the message decrypts the message using his or her private key PRIVKr:

plain_text = decrypt(PRIVKr, cipher_text)
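These two operations can be sketched with the standard java.security and javax.crypto APIs. The choice of 2048-bit RSA with OAEP padding is an assumption for the example, as are the class and method names. RSA can encrypt only short messages directly (at most 190 bytes with these parameters).

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;

class PublicKeySketch {
    static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    // The recipient generates the pair once and publishes only the public key.
    static KeyPair newKeyPair() throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        return gen.generateKeyPair();
    }

    // cipher_text = encrypt(PUBKr, plain_text)
    static byte[] encrypt(PublicKey pubKr, byte[] plainText) throws Exception {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, pubKr);
        return cipher.doFinal(plainText);
    }

    // plain_text = decrypt(PRIVKr, cipher_text)
    static byte[] decrypt(PrivateKey privKr, byte[] cipherText) throws Exception {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, privKr);
        return cipher.doFinal(cipherText);
    }
}
```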

Public-key cryptography can also be used to sign data. Here the goal is not to hide the data, but to prove that a particular person or organization sent the data. This is done by attaching a digital signature to the data. A sender can sign her message by encrypting that message with her private key; the result of that operation is the digital signature. Clearly, only the sender can produce the digital signature, because only the sender knows her private key. After creating the signature, the sender sends both her message and the signature to the receiver. The receiver verifies the sender by decrypting the signature using the sender's public key. The result should match the message that the sender sent along with her signature.

Summarizing, this is how signing works (in theory). (PRIVKs is the private key of the sender; PUBKs is the public key of the sender.)

• Sender computes: signature = encrypt(PRIVKs, plain_text)

• Send to receiver: plain_text and signature

• Receiver computes: x = decrypt(PUBKs, signature)

• If x equals plain_text, then the receiver knows the owner of PUBKs sent the message.

Public-key operations are expensive. In practice, the sender would not sign her entire message, but a small hash of that message. Computing the hash is not expensive and since the hash is small, signing it isn't very expensive either.
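The hash-then-sign approach is what the standard java.security.Signature class does. The sketch below assumes the "SHA256withRSA" algorithm, which hashes the message with SHA-256 and signs the hash with an RSA private key; the class and method names are hypothetical. For signing code, you will use the keytool and jarsigner tools rather than this API directly.

```java
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

class SigningSketch {
    static final String ALGORITHM = "SHA256withRSA";

    // Sender: hash the message and sign the hash with the private key PRIVKs.
    static byte[] sign(PrivateKey privKs, byte[] message) throws Exception {
        Signature signer = Signature.getInstance(ALGORITHM);
        signer.initSign(privKs);
        signer.update(message);
        return signer.sign();
    }

    // Receiver: check the signature against the message using the public key PUBKs.
    static boolean verify(PublicKey pubKs, byte[] message, byte[] signature) throws Exception {
        Signature verifier = Signature.getInstance(ALGORITHM);
        verifier.initVerify(pubKs);
        verifier.update(message);
        return verifier.verify(signature);
    }
}
```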

What to submit?

You should submit the following things as a part of this assignment.

1. The entire directory of the assignment.

2. A file called README.txt where you give a tutorial on how to compile and run the modified storage server. This file should also contain the names, netids and cornellids of all the individuals in the group.

3. A file called CODE.txt where you explain your code: what you implemented and where you implemented it.

4. A file called DESIGN. This should have a detailed description of your design: algorithms and protocols for security, implementation of capabilities, RMI and mobile code; possible improvements and discussion. You may submit this file in any of three formats: Word, PDF or PostScript. You are advised to use figures to describe your design.

How will you be graded?

The following will play a crucial role in your grade for this assignment.

1. Correctness of the storage server implementation.

2. The design chosen for implementation: efficiency!! (So do not ignore the DESIGN file)

3. Clarity of the Java programs (comments!!!) and how well CODE.txt matches your code.

4. Ease of using the README to test your programs and results.

Storage Server Code and Documentation?

Storage server code and its documentation

Software:

For this assignment, you will use Sun's SDK 2 Java implementation. The documentation can be found here. You will have to use the following tools: javac, java, rmic, rmiregistry, keytool and jarsigner. The storage server code for this project does not include any of the optimizations implemented earlier. This version includes:

• A modified interface: StorageServer. This new interface adds the rexec() method.

• Class il.ac.idc.storage.CryptKey. This class allows you to create secret keys and to encrypt and decrypt data by means of those keys.

• Class il.ac.idc.storage.ClientCode. Clients can create subclasses of this class and pass instances of this subclass to rexec().

• Interface OnServer. Client code that gets shipped to the server uses this interface to access data storage units.

Note: Modifications to the assignment specifications will be listed on this page, as well as the FAQ page!!!