CS 5150
Software Engineering
Fall 2013
Project Suggestion:
CED²AR XML repository
Client: Bill Block

Technical advisor: Ben Perry, Cornell Institute for Social and Economic Research

Summary

The Comprehensive Extensible Data Documentation and Access Repository (CED²AR) is designed to improve the discoverability of both public and restricted data from the federal statistical system. CED²AR is a National Science Foundation funded project, developed by the Cornell Institute for Social and Economic Research and the Cornell Labor Dynamics Institute. The project is based on leading metadata standards and is designed to ingest documentation flexibly from a variety of sources.

Technical Summary

The CED²AR system is currently released as version 1.0. It ingests metadata from a number of heterogeneous data sources into an XML repository. This repository is based on the Data Documentation Initiative (DDI) schema (version 2.5) and serves as the backend to a search and discovery API. A user interface has also been developed that interacts with the API.

Scope

The CED²AR project currently needs a way to update the XML repository automatically and to provide version control for those updates. The proposed software must validate the XML against the DDI schema, then log the differential changes between the existing XML and the updated XML. These changes can be stored in our PostgreSQL database. In addition to logging changes, the software must store annotations made by the author who is updating the XML. Finally, the software must be able to reconstruct older versions of a specific XML file from the incremental changes.

Opportunity

This project has the potential to make an enormous impact on the way research is conducted using data from the federal statistical system. It is also an opportunity to work closely with a number of Cornell faculty members in Economics and Information Science.
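
As an illustration of the versioning requirement described under Scope, the following is a minimal sketch in Python, not a proposed design. It keeps the latest XML text plus one reverse delta per update, so that any older version can be rebuilt by applying deltas backwards. All names (`VersionedDocument`, `Revision`, `make_delta`, `apply_delta`) are hypothetical. The in-memory list stands in for the PostgreSQL tables, and the well-formedness check stands in for validation against the DDI 2.5 schema, which would need an XSD-aware validator.

```python
import difflib
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Revision:
    """One logged update: who made it, their annotation, and a reverse delta."""
    author: str
    annotation: str
    # Edits that turn the NEW text back into the PREVIOUS text.
    reverse_delta: list


def make_delta(src, dst):
    """Return the line edits needed to turn `src` lines into `dst` lines."""
    matcher = difflib.SequenceMatcher(a=src, b=dst, autojunk=False)
    return [
        (i1, i2, dst[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(src, delta):
    """Apply edits produced by make_delta to a copy of `src`."""
    out = list(src)
    # Work from the end of the file so earlier indices stay valid.
    for i1, i2, replacement in reversed(delta):
        out[i1:i2] = replacement
    return out


class VersionedDocument:
    """Hypothetical in-memory stand-in for one XML file and its change log."""

    def __init__(self, xml_text):
        self._check(xml_text)
        self.current = xml_text.splitlines()
        self.history = []  # history[k] turns version k+1 back into version k

    @staticmethod
    def _check(xml_text):
        # Assumption: only well-formedness is checked here. A real system
        # would validate against the DDI 2.5 XSD instead.
        ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML

    def update(self, new_xml, author, annotation):
        """Validate the new text, log the change, and make it current."""
        self._check(new_xml)
        new_lines = new_xml.splitlines()
        delta = make_delta(new_lines, self.current)
        self.history.append(Revision(author, annotation, delta))
        self.current = new_lines

    def version(self, n):
        """Rebuild version n (0 is the original, len(history) the latest)."""
        lines = self.current
        for rev in reversed(self.history[n:]):
            lines = apply_delta(lines, rev.reverse_delta)
        return "\n".join(lines)
```

Storing reverse deltas keeps the most recent version immediately available, which is the common case for a search backend, while older versions cost one delta application per step back. A line-based diff is only one option: a tree-aware XML diff would give more meaningful change records, at the cost of more complexity.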
wya@cs.cornell.edu
Last changed: August 2013