CS 501 Software Engineering: Project Suggestion

CS 501
Software Engineering
Spring 2006

Project Suggestions: Legal Information Institute

Client

Tom Bruce, Director, Legal Information Institute (trb2@cornell.edu).

The Legal Information Institute

Cornell Law School's Legal Information Institute (LII) is one of the most highly ranked sources of legal information on the Web. It is also one of the most heavily used web sites at Cornell.

In previous years, there have been several successful CS 501 projects for the Legal Information Institute. This year, two projects have been proposed.

The United States Code

The United States Code, the compilation of all laws enacted by Congress, can be thought of as a tree structure. At the top level it is divided into fifty Titles. Each Title has distinct (and possibly different) subdivisions, terminating at the "leaf" level with documents known as "sections", which contain the actual text of the Code. The LII offers the US Code on the Web as a series of HTML documents derived (via XSLT) from a single XML document per title; each Web page offered in this way is either an HTML table of contents representing some intermediate branch of the tree, or an HTML document containing the text of a single section. The collection is very popular, drawing in the vicinity of half a million hits per day.
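The tree described above can be modelled in a few lines. The following is a minimal sketch, not the LII's actual data model: the class name `CodeNode`, its fields, the function `sections_under`, and the labels in the example are all hypothetical. It shows how the leaf sections beneath any branch could be collected in document order, which is the traversal an aggregated printout would need.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class CodeNode:
    """One node of the Code tree (hypothetical model).

    A node is either a Title or intermediate subdivision (has children)
    or a leaf section (is_section=True, carries the text).
    """
    label: str
    text: str = ""
    is_section: bool = False
    children: List["CodeNode"] = field(default_factory=list)


def sections_under(node: CodeNode) -> Iterator[CodeNode]:
    """Yield every leaf section beneath `node`, in document order."""
    if node.is_section:
        yield node
        return
    for child in node.children:
        yield from sections_under(child)


# Placeholder tree: one Title with two subdivisions and three sections.
title = CodeNode("Title A", children=[
    CodeNode("Chapter 1", children=[
        CodeNode("Section 1", text="text of section 1", is_section=True),
        CodeNode("Section 2", text="text of section 2", is_section=True),
    ]),
    CodeNode("Chapter 2", children=[
        CodeNode("Section 3", text="text of section 3", is_section=True),
    ]),
])

labels = [s.label for s in sections_under(title)]
```

Calling `sections_under` on an intermediate node such as `title.children[0]` yields only the sections of that branch, so the same function serves both whole-Title and partial requests.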

Frequently, visitors to the site wish to print out more than a single section at a time -- often, they want some aggregation that corresponds to a branch of the "tree" described above. We would like to offer this possibility as a "print on demand" service that would aggregate all of a "subtree" into a single PDF document that the user may then print and place in a looseleaf binder. We may charge a fee for this service.

There are a number of challenges associated with this project that demand careful work in the feasibility-study stage in order to find a usable combination of software capability and policy or business-rule constraints. First, some aggregations are potentially very large; we need either to limit what users may do or to develop a system with the capacity to handle them. Second, demand is likely to be high; it will be useful to build either a very high-performance system or one that caches conversions for reuse (or perhaps builds them in advance). Third, formatting XML into PDF is not without its challenges. Finally, the system will either need to integrate with a legacy shopping-cart system, or we will need to select a new one that will be usable for existing LII sale items.
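The first two challenges (a size limit and cached conversions) could be combined along the following lines. This is only a sketch under stated assumptions: `ConversionCache`, `AggregationTooLarge`, and the limit of 500 sections are hypothetical, and the `render` callable stands in for whatever XML-to-PDF step the team selects. The cache key is a hash of the section texts, so a cached document is reused only while the underlying text is unchanged.

```python
import hashlib
from typing import Callable, Dict, List

MAX_SECTIONS = 500  # hypothetical policy limit, to be set by the feasibility study


class AggregationTooLarge(Exception):
    """Raised when a request exceeds the configured section limit."""


class ConversionCache:
    """Cache rendered documents keyed by a hash of their section texts."""

    def __init__(self, render: Callable[[List[str]], bytes],
                 max_sections: int = MAX_SECTIONS) -> None:
        self._render = render          # stand-in for the real PDF converter
        self._max_sections = max_sections
        self._store: Dict[str, bytes] = {}
        self.render_calls = 0          # counts cache misses, for illustration

    @staticmethod
    def key_for(section_texts: List[str]) -> str:
        digest = hashlib.sha256()
        for text in section_texts:
            digest.update(text.encode("utf-8"))
            digest.update(b"\x00")     # separator so section boundaries matter
        return digest.hexdigest()

    def get(self, section_texts: List[str]) -> bytes:
        if len(section_texts) > self._max_sections:
            raise AggregationTooLarge(
                f"{len(section_texts)} sections exceeds limit of {self._max_sections}")
        key = self.key_for(section_texts)
        if key not in self._store:
            self.render_calls += 1
            self._store[key] = self._render(section_texts)
        return self._store[key]


def fake_render(section_texts: List[str]) -> bytes:
    """Placeholder converter: joins the texts instead of producing a PDF."""
    return "\n".join(section_texts).encode("utf-8")


cache = ConversionCache(fake_render, max_sections=3)
first = cache.get(["section one", "section two"])
second = cache.get(["section one", "section two"])  # served from the cache
```

An in-memory dictionary is used here only to keep the sketch short; a production service would more plausibly keep rendered files on disk and could fill the same store in advance for popular branches.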

Plea Rolls

During the 13th and 14th centuries, the actions of the English law courts were recorded on so-called "plea rolls", actual rolls of parchment with writings describing the actions taken by the court. Over the intervening centuries the rolls have suffered some deterioration, particularly in outer layers of the roll that were more exposed to outside air, but also from "bleed through" of ink from one side of the parchment to the other and from one layer of the rolled document to the next. The project, generally, is to make TIFF images of the documents more legible to scholars. There are about 2400 such images; they are large (64M) high-quality TIFFs.

We can see two possible approaches, either or both of which might be pursued by the project team (we are also open to suggestions). The first is to build a front-end for an open-source software package such as GIMP that would provide the scholar with a simple-to-use set of image transformation tools via an intuitive interface. The emphasis here would be on selecting a set of tools especially for this task and making them as easy as possible for an imaging novice to manipulate and, if necessary, reverse. The second, related effort would be to build a series of tools that could improve the images themselves, perhaps by analyzing glyphs in the existing documents and matching them to "partial glyphs" that are indistinct or incomplete -- a sort of information-based interpolation, as opposed to simple image manipulation. The clients would be Thomas R. Bruce of the Legal Information Institute, Terry Martin of the Harvard Law School Library, and Charles Donahue, a legal historian at Harvard (http://www.law.harvard.edu/faculty/directory/facdir.php?id=14).
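For the first approach, the requirement that a novice be able to apply transformations and, if necessary, reverse them suggests an edit history with undo. The sketch below is an illustration only: it does not use GIMP or any imaging library, it represents an image as a small grid of grayscale values (0 to 255), and the names `EditSession`, `invert`, and `threshold` are hypothetical.

```python
from typing import Callable, List

Image = List[List[int]]            # rows of grayscale pixel values, 0..255
Operation = Callable[[Image], Image]


def invert(img: Image) -> Image:
    """Return a negative of the image."""
    return [[255 - p for p in row] for row in img]


def threshold(cutoff: int) -> Operation:
    """Build an operation mapping pixels >= cutoff to white, others to black."""
    def apply(img: Image) -> Image:
        return [[255 if p >= cutoff else 0 for p in row] for row in img]
    return apply


class EditSession:
    """Apply operations to an image while keeping every step reversible."""

    def __init__(self, original: Image) -> None:
        self._history: List[Image] = [original]

    @property
    def current(self) -> Image:
        return self._history[-1]

    def apply(self, op: Operation) -> None:
        # Operations return new images, so earlier states stay untouched.
        self._history.append(op(self.current))

    def undo(self) -> None:
        # The original image is never discarded.
        if len(self._history) > 1:
            self._history.pop()


session = EditSession([[0, 100], [200, 255]])
session.apply(invert)
after_invert = session.current
session.apply(threshold(128))
after_threshold = session.current
session.undo()
```

Keeping a full snapshot per step is the simplest design but would be costly for images of the size described; an alternative is to record only the list of operations and replay them from the original when the user undoes a step.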

William Y. Arms
(wya@cs.cornell.edu)
Last changed: January 18, 2006