CS 501
Software Engineering
Spring 2006

Project Suggestion: Data Tracking System for the Web Library

William Arms, wya@cs.cornell.edu.

The Web Library

The Web Library is a Cornell project to build a research library based on the Web crawls that have been collected by the Internet Archive since 1996. It is described at: http://www.infosci.cornell.edu/SIN/WebLib/index.html.

The Web Library is planned to contain 10 billion Web pages, occupying 240 TB of disk storage, by the end of 2007. To achieve this, millions of files have to be transferred, indexed, and stored. In fall 2005, two M.Eng. students designed and began implementing a system to manage these files and their transfer to Cornell. Their report is at:
   Kohli, S., Sanghi, L., Data Monitoring and Tracking. December 2005.
The aim of this project is to build on the earlier work and to create a production system.

