CS 501
Software Engineering
Spring 2002

Project Concepts

The Eprint Archives


Clients

Prof. Paul Ginsparg, Physics and Computer Science (ginsparg@cornell.edu)
Simeon Warner, Computer Science (simeon@cs.cornell.edu)

Prof. William Arms, Computer Science (wya@cs.cornell.edu) [CoRR project only]

Project outlines

The science eprint archives, http://www.arxiv.org/, were founded by Paul Ginsparg at Los Alamos National Laboratory before the development of the web.  They are now a central part of the publication of research in physics and  related fields.  Last year, Ginsparg and his colleague Simeon Warner moved to Cornell.  Two projects have been suggested:

A full-text search service for the archives

The objective is to add a full-text search service to the archives. Much of the work will be clarifying the requirements, finding a suitable back-end search engine, and devising how to interface it to the archives with a suitable user interface. The development component would then consist of the programs and scripts needed to provide a web interface and to allow the maintainers of the archives to build and incrementally update the indexes.
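To make "build and incrementally update the indexes" concrete, here is a minimal sketch of an in-memory inverted index. It is an illustration only, not a proposed design: the class name InvertedIndex, its methods, and the paper identifiers are all hypothetical, and a real service would most likely rely on an existing back-end engine chosen with the client. Python is used for brevity, although much of the arXiv is coded in Perl.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())


class InvertedIndex:
    """Hypothetical toy index: maps each term to the papers containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of papers containing it
        self.terms_of = {}                # paper id -> terms indexed for it

    def add(self, paper_id, text):
        """Index a paper; re-adding an existing id replaces its old entry."""
        self.remove(paper_id)
        terms = set(tokenize(text))
        self.terms_of[paper_id] = terms
        for term in terms:
            self.postings[term].add(paper_id)

    def remove(self, paper_id):
        """Drop a paper from the index; unknown ids are ignored."""
        for term in self.terms_of.pop(paper_id, set()):
            ids = self.postings[term]
            ids.discard(paper_id)
            if not ids:
                del self.postings[term]

    def search(self, query):
        """Return the sorted ids of papers containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        result = None
        for term in terms:
            ids = self.postings.get(term, set())
            result = set(ids) if result is None else result & ids
        return sorted(result)


# Example usage with made-up paper ids and text.
index = InvertedIndex()
index.add("paper-1", "String theory and black hole entropy")
index.add("paper-2", "Open archives and full-text search")
matches = index.search("full-text search")
```

Because add() first removes any earlier entry for the same id, a revised paper can be re-indexed on its own, without rebuilding the whole index. That is the property the maintainers would need from whatever engine is selected.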

The computer science archives (CoRR)

Some years ago, a set of archives was created for computer science.  They are known as CoRR.  Professor Joe Halpern of Cornell, in conjunction with the Publications Board of the ACM, was the driving force behind their creation.  CoRR has not been as successful as hoped, at least in part because the design of the archives was tailored to the research processes of physics.  The objective of this project is to study how the requirements of computer science research differ and to make appropriate modifications.  A substantial portion of the work will be in creating a new user interface.

Technical

You can select the technical environment for this project in consultation with the client.  Much of the arXiv is coded in Perl.



William Y. Arms

(wya@cs.cornell.edu)
Last changed: January 18, 2002