Assignment 4

Assignment 3 (due on Thursday, 9 November at 10 pm)

In this assignment, you will be modifying an existing storage server to make it more efficient. This assignment must to be done in groups of two. If you are without a partner, get in touch with me during class. The storage server has been written in java, and you have to add code, without significantly changing the existing one.

What to do?

Part I: Client-Side Caching

Probably the biggest performance problem with the Storage Server is that every time a client needs to access data, it has to go back to the remote storage server. In most applications, data that is used once will be used soon again. Rather than run back over the network to the storage server, it would be better for the client to cache frequently used data.

One of the main issues in designing a cache is dealing with cache coherency. The issue arises when two different processes have cached copies of the same data, and one of them changes the data. How does the other learn of the modification?

A simple way of handling this is to make sure that processes do not cache blocks that are cached by other processors. So process needs to learn about someone else’s access to a piece of data that it is caching. When this occurs, the cached data may have to be invalidated and the client may have to refetch the data from the server.

Ideally, you could do this in a more fine-grained manner by keeping track of whether a certain block is being read or written. Lots of processes can safely share data if all of them are reading, but not if even one of them is writing. Thus, the server would send out cache-invalidation messages when one process wanted to write to a cached block that is shared by other processes, or when a block, which is cached for a writer process, is to be read by another process.

The process of having the server notify the clients of changes to the cache is known as a callback. You will need to build a callback mechanism in order to handle cache invalidations.

(Note: it is possible to do cache invalidation without callbacks -- by having clients periodically ping the server. Both approaches have problems and advantages. In any case, you should implement a callback-based mechanism.)

To get some more insight into caching, take a look at chapter 17, in particular, 17.3, which talks about caching. You might also want to look at 17.6.3.3 and 17.6.4.3, which talks about how caching is handled in the Andrew and Sprite distributed file systems.

Part II: Indexing

In the present implementation, the server maintains for every storage unit an index file that contains a list of variable-length block boundaries. These block boundaries describe a mapping between logical storage unit addresses and addresses on the file used to back the storage unit. Whenever a disk-access is attempted, the server does a linear search through the list to see if the request matches an existing block. If not, it creates a new block as necessary. (It doesn't create a new block on a read. The errors you get for reading uninitialized data come when you try to read data that doesn't belong to any block.)

This scheme is problematic in a number of ways:

· Accessing a memory location takes time proportional to the number of blocks in the file.

· Blocks are created on every write to a new location, meaning that blocks can be very small. This could potentially have a lot of overhead for indexing and a lot of wasted space.

· In order to do a search, you need to look through all the blocks. To do this efficiently, you need to load all of the blocks in memory. This puts a pretty severe limit on the amount of data (or at least, the number of blocks) you can have in a storage unit.

For some insight into how these problems are solved in traditional file systems, take a look at Chapter 11 of Operating Systems Concepts (OSC). Note that regular files are pretty different from storage units -- storage units can have data scattered all around them, with holes where no data is stored in the middle. Files don't have these holes.

Part III: Deallocation of Memory Space

The current storage server never deallocates data. This means that a StorageUnit will eventually get filled up with chunks of memory that were used once, but are not being used anymore. The server keeps data on the disk to back all of the initialized data, leading to a possibly unbounded waste of space.

This problem is difficult to fix transparently i.e. without changing the interface -- so you will need to add explicit methods for clients to allocate and deallocate chunks of memory. Rather than allocating memory on the fly as it is used, users will have to use a separate method to declare what memory they want, and then free that memory when they are done.

The semantics of the allocation and deallocation should be as follows. As soon as a piece of memory is allocated it can be used. Memory should remain usable until some process has deallocated it. Processes should only attempt to deallocate a range of memory that has been allocated previously. The same range of memory can be doubly allocated, but then it needs to be doubly deallocated before it is freed. Allocations are allowed to overlap arbitrarily.

The storage server doesn't have to complain if a process accesses memory that hasn't previously been allocated. This is important because it gives you a lot more flexibility in coming up with a reasonable implementation. The requirements are only that allocated data is accessible, and that the storage server does a reasonable job of reclaiming data that is no longer needed.

The interface that you will have to implement is in the newest distribution, and its called AllocStorageServer.java. This file is present in versions 1.3 (and higher) of the storage server.

What to submit?

You should submit the following things as a part of this assignment.

1. The entire directory of the assignment.

2. A file called README.txt where you give a tutorial on how to compile and run the modified storage server. This file should also contain the names, netids and cornellids of all the individuals in the group.

3. A file called CODE.txt where you explain your code: what you implemented and where did you do it.

4. A file called DESIGN. This should have a detailed description of your design; the optimizations you made and why you expect it to perform better. You could submit this file in any of the three formats: Word, Pdf or Postscript. It is advised that you use figures to describe your design.

How will you be graded?

The following will play a crucial role in your grades for this assignment.

1. Correctness of the storage server implementation.

2. The design chosen for implementation: efficiency!! (So do not ignore the DESIGN file)

3. Clarity of the java programs (comments!!!). The correlation of CODE.txt with your code.

4. Ease of using the README to test your programs and results.

Storage Server Code and Documentation?

Storage server code and its documentation…

The extra interface in this phase of the project

The only extra code that you are given to start with is the interface AllocStorageServer. Everything else is just a copy of the previous storage server code. You should avoid modifying existing classes. Instead of changing ThreadsafeUDPStorageServer, you should write a new class, with some other name. You can choose whatever name you want for you new classes, except for two cases:

· The counterpart of ThreadsafeUPDStorageServer should be called MyThreadsafeUDPStorageServer

· The counterpart to ServerStart should be called MyServerStart

All of your new classes should be stored in a directory called indexcache at the top level of the storage server directory hierarchy

Note: Modifications to the assignment specifications will be listed on this page, as well as the FAQ page!!!