CS 5150 Software Engineering: Project Suggestion

CS 5150
Software Engineering
Fall 2011

Project Suggestion:
Legal Information Institute

Notice

Kobi Acquay is assembling a team of people who would like to work on the first of these two projects, Incentives for Wiki Participation. If you are interested, send him email: kaa49@cornell.edu.

Legal Information Institute

Client

Thomas R. Bruce, Director, Legal Information Institute, Cornell Law School
Email: <tom@liicornell.org>

Project

The Legal Information Institute (LII) is the single most visible web site at Cornell, accounting for approximately 65% of Cornell's unique web visitors each year. In past years, several successful CS 5150 projects have developed software for the LII.

The LII proposes two separate projects:

A. Incentives for Wiki Participation

A major section of the site -- the WEX legal encyclopedia and dictionary -- is devoted to providing explanations of law and legal information to interested professionals and lay people. WEX is wiki-based, with participation restricted to experts (not all of them lawyers). WEX articles consistently rank very high in Google, with many in first and second place for searches on terms involving legal expertise, and they are referred to regularly in the press, in blogs, and in tweet streams as people learn about and debate legal issues.

The LII has designed a system of incentives to encourage lawyers and other legal experts to contribute and maintain contributions to WEX. It is a diverse system aimed at different expert cultures (lawyers, law librarians, legal academics, judges, law students, and business experts, all of whom need different inducements). Ideas for incentives are drawn from many sources, including games like World of Warcraft, airline frequent-flyer programs, location-based "games" like FourSquare, merit-badge systems, and the like. Those ideas have been distilled in a way that should encourage rapid development of a specification that can be turned into software. We wish to integrate the resulting incentive system with various parts of the LII site, including authentication systems, user profiles, and the LII's lawyer directory and content-management systems.

Coding will be in PHP and will involve Drupal module development or extension. There may be a significant UI component to the project, involving the presentation of awards in the user-profile and acknowledgement systems.

B. Semantic Search Framework

For many years, the LII has been a leader in adapting new Web technologies to the business of making law freely available to everyone. In particular, it has been actively involved in the process of identifying and structuring useful metadata that will make it easier for a global audience of ordinary people to find and understand the law. We now wish to find ways to apply that knowledge to semantic search (whatever that might be). This can be particularly helpful to non-lawyer users who lack the background knowledge needed to create good queries and to know where it makes most sense to launch them.

There are numerous ways in which metadata or other knowledge, encoded in RDF and placed in a triple store, might be used to improve the search experience for users of the LII site. One obvious choice is to expand a user's query with semantically related terms drawn from a SKOS-encoded thesaurus. Another is to direct particular searches toward different collections contained within a full-text index, or to decide which results ought to be privileged in the result set, based on what different patterns of search terms suggest about what the user might be seeking.
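As a rough illustration of thesaurus-based query expansion, the sketch below holds a tiny SKOS-style thesaurus as plain (subject, predicate, object) tuples and expands a search term with the labels of its own concept and of any related concept. The concept identifiers, the labels, and the function names are invented for the example; a real implementation would query a triple store rather than a Python list.

```python
# Hypothetical miniature thesaurus as (subject, predicate, object) triples.
# Concept IDs and labels are illustrative only, not taken from LII data.
TRIPLES = [
    ("ex:eviction", "skos:prefLabel", "eviction"),
    ("ex:eviction", "skos:altLabel", "ejectment"),
    ("ex:eviction", "skos:related", "ex:tenancy"),
    ("ex:tenancy", "skos:prefLabel", "tenancy"),
    ("ex:tenancy", "skos:altLabel", "leasehold"),
]

LABEL_PREDICATES = {"skos:prefLabel", "skos:altLabel"}


def labels_of(concept):
    """Return every preferred or alternative label attached to a concept."""
    return {o for s, p, o in TRIPLES if s == concept and p in LABEL_PREDICATES}


def concepts_for(term):
    """Return the concepts that carry the given term as a label."""
    wanted = term.lower()
    return {s for s, p, o in TRIPLES if p in LABEL_PREDICATES and o == wanted}


def expand_term(term):
    """Expand a term with labels of its concept and of skos:related concepts.

    Only the stored direction of skos:related is followed; this sketch does
    not infer the symmetric statement.
    """
    expanded = {term.lower()}
    for concept in concepts_for(term):
        expanded |= labels_of(concept)
        for s, p, o in TRIPLES:
            if s == concept and p == "skos:related":
                expanded |= labels_of(o)
    return sorted(expanded)
```

A term with no matching concept comes back unchanged, so the expansion step can be applied to every query without special cases.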

We would like to build a software framework that will allow for the integration of knowledge encoded in a triple store with search via the Solr search engine. The idea is to make it simpler for a designer who understands the data modeled in the triple store, and a related full-text corpus, to integrate inferences from the semantic data into the mechanics of a full-text search.
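One possible point of contact between the two systems is the construction of the Solr request itself. The sketch below assumes that expanded terms and target collections have already been derived from the triple store, and turns them into the standard Solr `q` and `fq` parameters. The `collection` field name and the base URL in the usage note are assumptions made for the example, not details of the LII index.

```python
from urllib.parse import urlencode


def quote_term(term):
    """Wrap a term in double quotes, escaping backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_solr_params(terms, collections=(), rows=10):
    """Build Solr request parameters from expanded terms and collections.

    `collection` is a hypothetical index field used to restrict the search.
    """
    params = [
        ("q", " OR ".join(quote_term(t) for t in terms)),
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    if collections:
        # A filter query narrows the result set without affecting scoring.
        params.append(("fq", "collection:(" + " OR ".join(collections) + ")"))
    return params


def build_solr_url(base_url, terms, collections=()):
    """Return a full request URL for Solr's /select handler."""
    query_string = urlencode(build_solr_params(terms, collections))
    return base_url.rstrip("/") + "/select?" + query_string
```

For instance, `build_solr_url("http://localhost:8983/solr/lii", ["eviction", "ejectment"], ["wex"])` would search for either term within a single collection; the host and core name there are placeholders.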

In general, the framework should provide for:

Identification of a particular user query as belonging to one or more classes of query that could benefit from additional information available from the triple store. This might be a matter of simple pattern matching, or a matter of drawing inferences about the query based on RDF-encoded information.
The application of one or more tools/filters to the search process, to implement:
1. database selection (or selection of some other type of target population for the query), based on inferences about the query or about the data itself. This might include referral to non-LII resources known to the system.
2. query expansion/substitution via a thesaurus or a more sophisticated inferencing regime.
3. triggering of additional queries/searches useful in populating, e.g., recommended-resource sidebars, identifying materials that provide context for search results, and so on.
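The requirements above might be sketched as follows, taking the simple pattern-matching route for classification and then applying small tools that select a collection and queue an additional search. The class names, the regular expressions, the collection names (`uscode`, `wex`), and the sidebar query text are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical query classes, each recognized by a simple pattern.
CLASS_PATTERNS = {
    "statute_citation": re.compile(r"\b\d+\s+U\.?S\.?C\.?\s+\d+", re.IGNORECASE),
    "definition": re.compile(r"^\s*(what is|define)\b", re.IGNORECASE),
}


def classify(query):
    """Return the names of every class whose pattern matches the query."""
    return [name for name, pat in CLASS_PATTERNS.items() if pat.search(query)]


@dataclass
class SearchPlan:
    """What the framework has decided to do for one incoming query."""
    query: str
    collections: list = field(default_factory=list)
    extra_queries: list = field(default_factory=list)


def select_collection(plan, query_class):
    """Tool 1: choose a target collection for the query class."""
    targets = {"statute_citation": "uscode", "definition": "wex"}
    if query_class in targets:
        plan.collections.append(targets[query_class])


def add_sidebar_query(plan, query_class):
    """Tool 3: queue an additional search to populate a sidebar."""
    if query_class == "statute_citation":
        plan.extra_queries.append(("sidebar", f"articles citing {plan.query}"))


TOOLS = [select_collection, add_sidebar_query]


def plan_search(query):
    """Classify the query, then apply each tool for each matching class."""
    plan = SearchPlan(query=query)
    for query_class in classify(query):
        for tool in TOOLS:
            tool(plan, query_class)
    return plan
```

A query that matches no class yields an empty plan, which the caller can treat as an ordinary full-text search. Inference-based classification would replace `classify` without changing the tools.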

The framework should allow its user (in this case, an LII software engineer or "search designer") to specify a sequence of actions, perhaps one or more SPARQL queries, that the framework applies to any incoming query fitting a classification the designer has defined, with a number of possible presentation targets as suggested in item 3 above. Development of an exact specification for this project is likely to prove a significant challenge. We are agnostic about how it is coded.
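To make the idea of a designer-specified sequence of actions concrete, here is one hedged sketch in which each rule ties a query classification to a list of SPARQL templates, each with a presentation target. The `Rule` and `Action` types, the target names, and the `run_sparql` callable (a stand-in for whatever triple-store client is eventually chosen) are assumptions for the example, not a proposed specification.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class Action:
    sparql: Template  # SPARQL text with a $term placeholder
    target: str       # hypothetical presentation target, e.g. "sidebar"


@dataclass(frozen=True)
class Rule:
    query_class: str  # classification the designer has defined
    actions: tuple    # Actions to run, in order


# Illustrative template: labels of concepts related to the query term.
RELATED_TERMS = Template('''
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?label WHERE {
  ?concept skos:prefLabel "$term" .
  ?concept skos:related ?other .
  ?other skos:prefLabel ?label .
}
''')


def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def apply_rules(query, query_classes, rules, run_sparql):
    """Run every action of every matching rule, grouping results by target.

    `run_sparql` is supplied by the caller: it takes SPARQL text and
    returns a list of results.
    """
    output = {}
    for rule in rules:
        if rule.query_class not in query_classes:
            continue
        for action in rule.actions:
            sparql = action.sparql.substitute(term=escape_literal(query))
            output.setdefault(action.target, []).extend(run_sparql(sparql))
    return output
```

Keeping the rules as plain data separates the designer's decisions from the execution mechanics, so the same rule set could be run against a test double or a live triple store.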

wya@cs.cornell.edu
Last changed: August 2011