Programming Assignment 1 (due Wednesday, 27 February, 11 pm)
In this assignment, you will implement a caching web proxy. The assignment must be done in groups of two. Assignments submitted by groups with fewer than two members will not be accepted. The assignment must be done in Java, specifically JDK 1.2.
What is a Web Proxy?
A web proxy is an intermediary between the web browser and the servers on the Internet. All requests from the browser go to the proxy, which may act upon a request before forwarding it to the web server. The response from the server is sent to the proxy, which might process the response before sending it to the browser (see Figure below).
There are a number of uses of a web proxy. A proxy is commonly used for interaction between web browsers inside a firewall and servers on the Internet. Proxies also help in providing access control for certain documents or protocols based on the IP address. For example, you might want to allow a client to use HTTP but disallow the use of FTP. Further, Jeanna might want to restrict access to this web page to users at Cornell. Proxies are also useful in providing web access on devices with low power or a low-resolution display. Another important use of the web proxy is caching. Frequently accessed web pages are cached by the proxy to reduce access time on further requests. A copy of the requested web page is temporarily stored at the web proxy and is served in response to further requests for the same web page. In this assignment you will implement a web proxy and incorporate a simple caching scheme.
What to do?
The web proxy will be developed in three phases.
Phase 1:
In the first phase you will implement a simple web proxy. The proxy will take requests from the browser, parse each request, and send it to the web server. The response received by the proxy will be sent back to the browser. For this functionality, the proxy should open a listening socket on startup and wait for incoming requests. On getting a request from the browser, the proxy should parse the HTTP request to determine the destination server, and open a connection to it. It should then send the request, process the reply, and send it back to the browser. The port number for the proxy should be a command line argument. In this assignment you are expected to implement only GET requests.
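As an illustration of the parsing step described above, here is a minimal sketch of how the destination server might be extracted from the first line of a request. It assumes the browser sends the absolute-URL form used with proxies (for example `GET http://host:port/path HTTP/1.0`) and falls back to port 80 when none is given. The class name `RequestLine` and its fields are hypothetical, not part of the assignment, and the sketch does not handle request headers or IPv6 literal addresses.

```java
import java.util.StringTokenizer;

// Hypothetical helper (not part of the assignment): splits a proxy-style
// request line such as "GET http://host:port/path HTTP/1.0" into its parts.
class RequestLine {
    final String method;
    final String host;
    final int port;
    final String path;
    final String version;

    private RequestLine(String method, String host, int port, String path, String version) {
        this.method = method;
        this.host = host;
        this.port = port;
        this.path = path;
        this.version = version;
    }

    static RequestLine parse(String line) {
        // A request line has exactly three whitespace-separated tokens.
        StringTokenizer tokens = new StringTokenizer(line);
        if (tokens.countTokens() != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        String method = tokens.nextToken();
        String target = tokens.nextToken();
        String version = tokens.nextToken();

        // Assumption: the browser sends an absolute http:// URL to the proxy.
        String prefix = "http://";
        if (!target.regionMatches(true, 0, prefix, 0, prefix.length())) {
            throw new IllegalArgumentException("Expected an absolute http:// URL: " + target);
        }
        String rest = target.substring(prefix.length());

        // Everything before the first '/' is host[:port]; the remainder is the path.
        int slash = rest.indexOf('/');
        String authority = slash < 0 ? rest : rest.substring(0, slash);
        String path = slash < 0 ? "/" : rest.substring(slash);

        int colon = authority.indexOf(':');
        String host = colon < 0 ? authority : authority.substring(0, colon);
        int port = colon < 0 ? 80 : Integer.parseInt(authority.substring(colon + 1));
        if (host.length() == 0) {
            throw new IllegalArgumentException("Missing host in: " + target);
        }
        return new RequestLine(method, host, port, path, version);
    }
}
```

With a parsed line in hand, the proxy could open `new Socket(r.host, r.port)` and send a request for `r.path` to that server. Whether to rewrite the request line before forwarding it is a design choice left to you.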
Phase 2:
The web proxy should be able to handle multiple simultaneous requests. The proxy developed in Phase 1 will not process a request until it has serviced the previous one. This is undesirable and will give poor performance. A more efficient technique is to spawn a new thread for every new request. In this phase you will add multithreading to the web proxy of Phase 1. Concurrency could also be added using the select system call in UNIX, but threads are strongly recommended in this assignment.
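The thread-per-request idea can be sketched as an accept loop that hands each new connection to its own thread. This is only one possible shape, not a required design. The names `ThreadedAcceptLoop` and `Handler` are hypothetical, and the handler stands in for whatever Phase 1 code services a single browser connection.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch: one thread per accepted connection.
class ThreadedAcceptLoop {

    // Stand-in for the Phase 1 logic that services a single connection.
    interface Handler {
        void handle(Socket client) throws IOException;
    }

    // Accepts connections until the listener is closed, at which point
    // accept() throws an IOException and this method exits.
    static void serve(ServerSocket listener, final Handler handler) throws IOException {
        while (true) {
            final Socket client = listener.accept();
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    try {
                        handler.handle(client);
                    } catch (IOException e) {
                        System.err.println("Request failed: " + e.getMessage());
                    } finally {
                        // Always release the socket, even if the handler failed.
                        try {
                            client.close();
                        } catch (IOException ignored) {
                            // Nothing useful can be done if close fails.
                        }
                    }
                }
            });
            worker.start();
        }
    }
}
```

Because the accept loop returns to `accept()` as soon as the worker has started, a slow server response on one connection does not delay other browser requests. Any state shared between workers, such as a cache, needs its own synchronization.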
Phase 3:
Finally, you will add caching to your proxy. Requested web pages will be temporarily stored at the proxy to satisfy further requests for the same web page. For uniformity, an LRU (least recently used) cache replacement strategy should be implemented. One way to implement the cache is using a hash table. However, you are free to use any scheme of your choice. The cache size, in KB, should be specified as a parameter to the proxy program.
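One possible shape for such a cache, shown here only as a sketch, pairs a hash table keyed by URL with a doubly linked list ordered by recency. The class name `LruPageCache` and its methods are hypothetical. The sketch assumes the size limit counts only the bytes of the stored page bodies, and that a page larger than the whole cache is simply not stored. It uses generics for readability, which postdate JDK 1.2. Under JDK 1.2 the type parameters would have to be removed and casts added.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an LRU cache bounded by the total size of stored pages.
class LruPageCache {

    private static final class Node {
        final String url;
        final byte[] body;
        Node prev;
        Node next;

        Node(String url, byte[] body) {
            this.url = url;
            this.body = body;
        }
    }

    private final Map<String, Node> index = new HashMap<String, Node>();
    private final Node head = new Node(null, new byte[0]); // sentinel, most recent end
    private final Node tail = new Node(null, new byte[0]); // sentinel, least recent end
    private final long capacityBytes;
    private long usedBytes = 0;

    LruPageCache(int cacheSizeKb) {
        this.capacityBytes = cacheSizeKb * 1024L;
        head.next = tail;
        tail.prev = head;
    }

    // Returns the cached page, or null on a miss. A hit makes the page most recent.
    synchronized byte[] get(String url) {
        Node n = index.get(url);
        if (n == null) {
            return null;
        }
        unlink(n);
        linkFirst(n);
        return n.body;
    }

    // Stores a page, evicting least recently used pages until it fits.
    synchronized void put(String url, byte[] body) {
        Node old = index.remove(url);
        if (old != null) {
            unlink(old);
            usedBytes -= old.body.length;
        }
        if (body.length > capacityBytes) {
            return; // assumption: pages larger than the whole cache are not stored
        }
        while (usedBytes + body.length > capacityBytes) {
            Node victim = tail.prev; // least recently used entry
            unlink(victim);
            index.remove(victim.url);
            usedBytes -= victim.body.length;
        }
        Node n = new Node(url, body);
        linkFirst(n);
        index.put(url, n);
        usedBytes += body.length;
    }

    synchronized long sizeInBytes() {
        return usedBytes;
    }

    private void unlink(Node n) {
        n.prev.next = n.next;
        n.next.prev = n.prev;
    }

    private void linkFirst(Node n) {
        n.next = head.next;
        n.prev = head;
        head.next.prev = n;
        head.next = n;
    }
}
```

The methods are synchronized because, with the multithreading of Phase 2, several request threads may read and update the cache at the same time.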
Your proxy should start when the following command is given on the command line:
java web_proxy -c <cache_size> -p <port number>
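A minimal sketch of reading these two options is given below. The helper class `ProxyOptions` is hypothetical. The sketch assumes the options may appear in either order, that a cache size of 0 is allowed, and that the port must lie between 1 and 65535. None of these details are specified by the assignment.

```java
// Hypothetical helper: reads "-c <cache_size> -p <port number>" from args.
class ProxyOptions {
    static final String USAGE = "usage: java web_proxy -c <cache_size> -p <port number>";

    final int cacheSizeKb;
    final int port;

    private ProxyOptions(int cacheSizeKb, int port) {
        this.cacheSizeKb = cacheSizeKb;
        this.port = port;
    }

    static ProxyOptions parse(String[] args) {
        // Every flag must be followed by exactly one value.
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException(USAGE);
        }
        int cache = -1;
        int port = -1;
        for (int i = 0; i < args.length; i += 2) {
            String flag = args[i];
            String value = args[i + 1];
            if ("-c".equals(flag)) {
                cache = Integer.parseInt(value);
            } else if ("-p".equals(flag)) {
                port = Integer.parseInt(value);
            } else {
                throw new IllegalArgumentException("Unknown option " + flag + "; " + USAGE);
            }
        }
        // Rejects missing options as well as out-of-range values.
        if (cache < 0 || port < 1 || port > 65535) {
            throw new IllegalArgumentException(USAGE);
        }
        return new ProxyOptions(cache, port);
    }
}
```

The `main` method of `web_proxy` could call `ProxyOptions.parse(args)`, print the usage message if an `IllegalArgumentException` is thrown, and otherwise open the listening socket on the requested port.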
What to submit?
You should submit the following for this assignment.
1. Complete Java implementation of the proxy.
2. A file called README.txt in which you give a tutorial on how to compile and run your program. This file should also contain the names, NetIDs and Cornell IDs of all the individuals in the group. A template for README.txt will be provided soon.
3. A file called DESIGN.txt describing the design of your proxy. The design of all three phases should be explained in this file. You should also mention any external references or code used. A template for DESIGN.txt will be provided soon.
How will you be graded?
The following will play a crucial role in your grade for this assignment.
1. Correct implementation of the basic web proxy (phase 1). [30 points]
2. Successful handling of multiple requests using multithreading (phase 2). [15 points]
3. Correct implementation of the LRU caching strategy (phase 3). [30 points]
4. Clarity of your Java programs (comments!).
5. Ease of using the README.txt to test your programs and results. We will not make extra effort to get your program running.
6. Design decisions as described in DESIGN.txt.
Items 4, 5 and 6 carry a total of 25 points. If you are unable to complete this assignment, you should specifically mention why you think the program did not work, and what your approach for the remaining phases would be. This could earn you some extra points.
1. A tutorial on how to use threads in Java.
2. Help on using sockets in Java.
3. You will need some basic HTTP knowledge for this assignment. We expect you to be able to handle only GET requests. RFC 2616 gives the HTTP/1.1 specification in detail.
4. You should test your proxy with an available browser. For Netscape, go to Edit->Preferences->Advanced->Proxies->Manual Proxy Configuration. For Internet Explorer, choose Tools->Options->Connections->LAN Settings->Use a Proxy Server. In either case, specify the proxy's address and port number.