Programming Assignment 1 (due Wednesday, 27 February, 11 pm)

In this assignment, you will implement a caching web proxy.  This assignment must be done in groups of two.  Submissions from groups with fewer than two members will not be accepted.  The assignment must be written in Java, specifically JDK 1.2.

What is a Web Proxy?

A web proxy is an intermediary between the web browser and the servers on the Internet.  All requests from the browser go to the proxy, which may act upon a request before forwarding it to the web server.  The response from the server is sent to the proxy, which might process the response before sending it to the browser (see Figure below).

[Figure: the web browser sends its requests to the proxy, which forwards them to the web servers and relays the responses back to the browser.]

A web proxy has a number of uses.  A proxy is commonly used for interaction between web browsers inside a firewall and servers on the Internet.  Proxies also help in providing access control for certain documents or protocols based on the IP address.  For example, you may want to allow a client to use HTTP but disallow the use of FTP.  Further, Jeanna might want to restrict access to this web page to users at Cornell.  Proxies are also useful in providing web access on devices with low power or a low-resolution display.

Another important use of a web proxy is caching.  Frequently accessed web pages are cached by the proxy to reduce access time on further requests.  A copy of the requested web page is temporarily stored at the proxy and is served in response to further requests for the same web page.  In this assignment you will implement a web proxy and incorporate a simple caching scheme.

What to do?

The web proxy will have to be developed in three phases. 

Phase 1:

In the first phase you will implement a simple web proxy.  The proxy will take a request from the browser, parse it, and send it to the web server.  The response received by the proxy will be sent back to the browser.  To do this, the proxy should open a listening socket on startup and wait for incoming requests.  On getting a request from the browser, the proxy should parse the HTTP request to determine the destination server, and open a connection to it.  It should then send the request, process the reply, and send it back to the browser.  The port number for the proxy should be a command line argument.  In this assignment you are expected to implement only GET requests.
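The parsing step can be sketched as follows.  A browser that is configured to use a proxy puts the full URL in the request line (for example "GET http://www.example.com/index.html HTTP/1.0"), so the destination server can be read from that line.  This is only a minimal sketch: the class name RequestLine and its fields are hypothetical, it handles nothing but GET, and it relies on java.net.URI, which is newer than JDK 1.2, so you would need to adapt it to the classes available to you.

```java
import java.net.URI;

/** Hypothetical helper: splits a proxy-style GET request line into its parts. */
class RequestLine {
    final String host;  // destination server name
    final int port;     // destination port, 80 if the URL gives none
    final String path;  // path (plus query string) to request from the server

    private RequestLine(String host, int port, String path) {
        this.host = host;
        this.port = port;
        this.path = path;
    }

    /** Parses a line such as "GET http://www.example.com/index.html HTTP/1.0". */
    static RequestLine parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3 || !parts[0].equals("GET")) {
            throw new IllegalArgumentException("unsupported request line: " + line);
        }
        URI uri = URI.create(parts[1]);
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("request target is not an absolute URL: " + line);
        }
        int port = (uri.getPort() == -1) ? 80 : uri.getPort();
        String path = uri.getRawPath();
        if (path == null || path.length() == 0) {
            path = "/";  // "http://host" with no path means the root document
        }
        if (uri.getRawQuery() != null) {
            path = path + "?" + uri.getRawQuery();
        }
        return new RequestLine(uri.getHost(), port, path);
    }
}
```

With the host and port in hand, the proxy can open a java.net.Socket to the server, send the request, and copy the reply back to the browser.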

Phase 2:

The web proxy developed in the previous phase should be extended to handle multiple simultaneous requests.  As built in Phase 1, the proxy will not process a request until it has serviced the previous one.  This is undesirable and gives poor performance.  A more efficient technique is to spawn a new thread for every new request.  In this phase you will add multithreading to the web proxy of Phase 1.  Concurrency could also be added using the select system call in UNIX, but threads are strongly recommended in this assignment.
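One possible shape for the thread-per-request design is sketched below.  The names ThreadedAcceptLoop and ConnectionHandler are hypothetical; the handler stands in for whatever request-handling code you wrote in Phase 1.  The sketch uses lambda syntax, which is much newer than JDK 1.2, so treat it as an outline of the structure rather than code to submit.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical skeleton: accepts connections and gives each one its own thread. */
class ThreadedAcceptLoop implements Runnable {
    /** Hypothetical callback standing in for the Phase 1 request-handling code. */
    interface ConnectionHandler {
        void handle(Socket client) throws IOException;
    }

    private final ServerSocket listener;
    private final ConnectionHandler handler;

    ThreadedAcceptLoop(ServerSocket listener, ConnectionHandler handler) {
        this.listener = listener;
        this.handler = handler;
    }

    public void run() {
        while (!listener.isClosed()) {
            Socket accepted;
            try {
                accepted = listener.accept();  // blocks until a browser connects
            } catch (IOException e) {
                break;  // the listener was closed or failed; stop accepting
            }
            final Socket client = accepted;
            // One thread per connection, so a slow server cannot stall other requests.
            new Thread(() -> serve(client)).start();
        }
    }

    /** Runs the handler and always closes the browser connection afterwards. */
    private void serve(Socket client) {
        try {
            handler.handle(client);
        } catch (IOException e) {
            System.err.println("request failed: " + e.getMessage());
        } finally {
            try {
                client.close();
            } catch (IOException ignored) {
                // nothing useful can be done if close fails
            }
        }
    }
}
```

Because the accept loop returns to accept() immediately after starting a thread, a second browser request is picked up while the first is still waiting on its server.  Any state shared between threads, such as the Phase 3 cache, must be synchronized.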

Phase 3:

Finally, you will add caching to your proxy.  Requested web pages will be temporarily stored at the proxy to satisfy further requests for the same web page.  For uniformity, an LRU (least recently used) cache replacement strategy should be implemented.  One way to implement the cache is with a hash table.  However, you are free to use any scheme of your choice.  The cache size, in KB, should be specified as a parameter to the proxy program.
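As one illustration of LRU replacement with a size limit in KB, the sketch below keeps pages in a hash table that also remembers access order and evicts the least recently used pages until a new page fits.  The class name LruCache and its methods are hypothetical, and treating 1 KB as 1024 bytes of page body is an assumption.  It relies on java.util.LinkedHashMap and generics, neither of which exists in JDK 1.2, so under that JDK you would have to maintain the access order yourself (for example with a Hashtable plus a linked list).

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache of page bodies, bounded by total size in bytes. */
class LruCache {
    private final long capacityBytes;
    private long usedBytes = 0;
    // accessOrder = true: iteration runs from least to most recently used entry.
    private final LinkedHashMap<String, byte[]> entries =
            new LinkedHashMap<String, byte[]>(16, 0.75f, true);

    LruCache(int capacityKb) {
        this.capacityBytes = capacityKb * 1024L;  // assumption: 1 KB = 1024 bytes
    }

    /** Returns the cached body, or null on a miss.  A hit marks the page as recently used. */
    synchronized byte[] get(String url) {
        return entries.get(url);
    }

    /** Reports whether a page is cached without changing the LRU order. */
    synchronized boolean contains(String url) {
        return entries.containsKey(url);
    }

    /** Stores a page, evicting least recently used pages until it fits. */
    synchronized void put(String url, byte[] body) {
        byte[] old = entries.remove(url);
        if (old != null) {
            usedBytes -= old.length;
        }
        if (body.length > capacityBytes) {
            return;  // larger than the whole cache, so it is never stored
        }
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (usedBytes + body.length > capacityBytes && it.hasNext()) {
            usedBytes -= it.next().getValue().length;
            it.remove();  // drops the least recently used page
        }
        entries.put(url, body);
        usedBytes += body.length;
    }

    /** Total bytes of page bodies currently held. */
    synchronized long usedBytes() {
        return usedBytes;
    }
}
```

The methods are synchronized because, after Phase 2, several request threads may read and update the cache at the same time.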

 

Your proxy should start when the following command is given on the command line:

java web_proxy -c <cache_size> -p <port number>
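Reading these two options could look something like the sketch below.  The class name ProxyOptions is hypothetical, and accepting the two options in either order and rejecting anything else are assumptions about behavior the assignment does not spell out.

```java
/** Hypothetical holder for the two command-line options of the proxy. */
class ProxyOptions {
    static final String USAGE = "usage: java web_proxy -c <cache_size> -p <port number>";

    final int cacheSizeKb;
    final int port;

    private ProxyOptions(int cacheSizeKb, int port) {
        this.cacheSizeKb = cacheSizeKb;
        this.port = port;
    }

    /** Parses "-c <cache_size> -p <port number>" (in either order). */
    static ProxyOptions parse(String[] args) {
        int cache = -1;
        int port = -1;
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException(USAGE);
        }
        for (int i = 0; i < args.length; i += 2) {
            if (args[i].equals("-c")) {
                cache = Integer.parseInt(args[i + 1]);
            } else if (args[i].equals("-p")) {
                port = Integer.parseInt(args[i + 1]);
            } else {
                throw new IllegalArgumentException(USAGE);
            }
        }
        // Both options are required; the port must be a valid TCP port number.
        if (cache < 0 || port < 1 || port > 65535) {
            throw new IllegalArgumentException(USAGE);
        }
        return new ProxyOptions(cache, port);
    }
}
```

A main method could call ProxyOptions.parse(args), print the usage message and exit if an IllegalArgumentException is thrown, and otherwise open the listening socket on the given port.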

 

What to submit?

You should submit the following for this assignment.

1.      Complete Java implementation of the proxy.

2.      A file called README.txt in which you explain how to compile and run your program.  This file should also contain the names, NetIDs and Cornell IDs of all the individuals in the group.  A template for README.txt will be provided soon.

3.      A file called DESIGN.txt describing the design of your proxy.  The design of all three phases should be explained in this file.  You should also mention any external references or code used.  A template for DESIGN.txt will be provided soon.

 

How will you be graded?

The following will play a crucial role in your grades for this assignment.

1.      Correct implementation of the basic web proxy (phase 1).  [30 points]

2.      Successful handling of multiple requests using multithreading (phase 2).  [15 points]

3.      Correct implementation of the LRU caching strategy (phase 3).  [30 points]

4.      Clarity of your Java programs (comments!).

5.      Ease of using the README.txt to test your programs and results.  We will not make extra effort to get your program running.

6.      Design decisions as described in DESIGN.txt.

Items 4, 5 and 6 carry a total of 25 points.  If you are unable to complete this assignment, you should state specifically why you think the program did not work, and what your approach for the remaining phases would be.  This could earn you some extra points.

  

Resources

1.      A tutorial on how to use threads in Java.

2.      Help on using sockets in Java.

3.      You will need some basic HTTP knowledge for this assignment.  We expect you to be able to handle only GET requests.  RFC 2616 gives the HTTP/1.1 specification in detail.

4.      You should test your proxy with an available browser.  For Netscape, go to Edit->Preferences->Advanced->Proxies->Manual Proxy Configuration.  For Internet Explorer, choose Tools->Options->Connections->LAN Settings->Use a Proxy Server.  In either case specify the address and port number.