CS 501 Project Concepts: Legal Information Institute
Thomas R. Bruce, Co-Director of the Legal Information Institute (trb2@cornell.edu)
Cornell's Legal Information Institute is the premier open access legal service on the Internet (http://www.law.cornell.edu/). Two projects have been suggested:
The United States Code is released to the general public by the US House of Representatives on its Web site (at http://uscode.house.gov/download.htm). This is a fairly plain-vanilla ASCII version to which the Legal Information Institute adds value (visible at http://www4.law.cornell.edu/uscode/). An earlier CS 501 project team and a later student project developed programs for the Legal Information Institute that convert the raw ASCII output of the House of Representatives to XML, for subsequent reuse in various settings. The US Code collection is a flagship of the Legal Information Institute and currently gets about half a million hits daily.
The new project is to create PDF versions from the XML. Creating PDF is not hard. The hard parts are building a user interface and delivery system that allows this at arbitrary levels of structure (up to an entire title), finding a way either to pre-build and cache the PDF versions or to build them on demand quickly enough for on-the-fly ordering, and tying the whole thing in with some kind of catalog or shopping cart system. In other words, the user should be able to walk up to the system via a browser, say "gimme that chunk of the Code", possibly pay for the service, and then get the chunk in reasonable time without the system having to prebuild the whole thing at every level.
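One way to picture the "build on demand, then cache" option is the minimal Python sketch below. It is only an illustration under stated assumptions: `PdfCache`, `cache_key`, the `render` callable, and the `source_version` label are hypothetical names, and the actual XML-to-PDF step is left to whatever tool the team chooses.

```python
import hashlib
from pathlib import Path
from typing import Callable, Sequence


def cache_key(title: str, path: Sequence[str], source_version: str) -> str:
    """Derive a stable file name from the requested chunk of the Code.

    `path` names the structural level below the title (for example a
    chapter, or a chapter and a section); an empty path means the whole
    title. `source_version` is an assumed label for the XML release, so
    that a new release never serves stale PDFs.
    """
    raw = "/".join((source_version, title, *path))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class PdfCache:
    """Build a PDF the first time a chunk is requested, then reuse it."""

    def __init__(self, cache_dir: str, render: Callable[[str, Sequence[str]], bytes]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical renderer: takes (title, path), returns PDF bytes.
        self.render = render

    def get(self, title: str, path: Sequence[str], source_version: str) -> Path:
        target = self.cache_dir / (cache_key(title, path, source_version) + ".pdf")
        if not target.exists():
            data = self.render(title, path)
            # Write to a temporary name first so a half-written file
            # is never served to another request.
            tmp = target.with_suffix(".tmp")
            tmp.write_bytes(data)
            tmp.replace(target)
        return target
```

With this shape, nothing is prebuilt: each chunk is rendered once, at whatever level the user asked for, and later requests for the same chunk are served from disk. Popular chunks could still be warmed ahead of time by calling `get` in a batch job.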
Users of legal information, including the Legal Information Institute, need an open-source search engine that is fast, good, and law-specific. The concept is to take an existing open-source engine such as SWISH-E or SINO, modify it to run on a Beowulf cluster, and then add refinements that are useful in law, such as better treatment of punctuation characters and possibly extra weight for citation matches and the like.
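The two law-specific refinements mentioned above can be illustrated with a small, self-contained Python sketch. It does not use SWISH-E or SINO and does not reflect their internals. The citation pattern, the `usc:` token format, and the weight of 3.0 are all assumptions chosen for illustration.

```python
import re

# Assumed pattern for US Code citations such as "17 U.S.C. § 107" or
# "17 USC 107". Real citation forms are more varied than this.
CITATION = re.compile(
    r"\b(\d+)\s+U\.?\s?S\.?\s?C\.?\s*(?:§+\s*)?(\d+[a-z]?)",
    re.IGNORECASE,
)
WORD = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Return citation tokens first, then ordinary lowercase word tokens.

    Citations are collapsed to one normalized token so that the
    punctuation in "U.S.C." and "§" does not split them apart.
    """
    tokens = []

    def keep(match):
        tokens.append(f"usc:{match.group(1)}:{match.group(2).lower()}")
        return " "  # remove the citation from the remaining text

    rest = CITATION.sub(keep, text)
    tokens.extend(WORD.findall(rest.lower()))
    return tokens


def score(query, document, citation_weight=3.0):
    """Count shared tokens, giving citation matches extra weight."""
    doc_tokens = set(tokenize(document))
    total = 0.0
    for token in set(tokenize(query)):
        if token in doc_tokens:
            total += citation_weight if token.startswith("usc:") else 1.0
    return total
```

Under these assumptions, the queries "17 USC 107" and "17 U.S.C. § 107" produce the same token, so differently punctuated citations match each other, and a document that cites the section outranks one that merely shares ordinary words with the query.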
You can select the technical environment for this project. Work in this area is typically carried out in Perl.
William Y. Arms
(wya@cs.cornell.edu)
Last changed: January 22, 2002