CS 5150
Software Engineering
Fall 2013

Project Suggestion:
CED²AR XML repository


 


Client

Bill Block
Director, Cornell Institute for Social and Economic Research
Executive Director, Social Science History Association
<block@cornell.edu>

Technical advisor

Ben Perry, Cornell Institute for Social and Economic Research
<bap63@cornell.edu>

Summary

The Comprehensive Extensible Data Documentation and Access Repository (CED²AR) is designed to improve the discoverability of both public and restricted data from the federal statistical system. CED²AR is a National Science Foundation-funded project, developed by the Cornell Institute for Social and Economic Research and the Cornell Labor Dynamics Institute. The CED²AR project is based on leading metadata standards and is designed to be flexible enough to ingest documentation from a variety of sources.

Technical Summary

The CED²AR system is currently released as version 1.0. It ingests metadata from a number of heterogeneous data sources into an XML repository. This repository is based on the Data Documentation Initiative (DDI) schema (version 2.5) and serves as the backend to a search and discovery API. A user interface that interacts with the API has also been developed.
Technologies extensively used: Java, XQuery, BaseX, DDI 2.5 XML Schema, jQuery, PostgreSQL.

Scope

Currently, the CED²AR project needs a way to update the XML repository automatically and to provide version control for those changes. The proposed software must validate the XML against the DDI schema, then log the differential changes between the existing and the updated XML. These changes can be stored in our PostgreSQL database. In addition to logging changes, the software must store annotations made by the author who is updating the XML. Finally, the software must be able to reconstruct older versions of a specific XML file from the incremental changes.
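
To illustrate the version-control requirement, the following is a minimal sketch, not a proposed design. It keeps the latest XML in full, logs a line-based reverse delta together with the author's annotation for each update, and rebuilds older versions by undoing the logged changes. It is written in Python for brevity, although the project's own stack is Java. The names VersionedDocument, Change, reverse_delta and apply_delta are hypothetical. An in-memory list stands in for the PostgreSQL table, and a well-formedness check stands in for validation against the DDI 2.5 schema, which would need an XSD validator that the Python standard library does not provide.

```python
import difflib
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Change:
    """One logged update: a reverse delta plus the author's annotation."""
    author: str
    annotation: str
    # Each op is (start, end, old_lines): replace new_lines[start:end] with old_lines.
    reverse_ops: list


def reverse_delta(old_lines, new_lines):
    """Return the edits that turn new_lines back into old_lines."""
    matcher = difflib.SequenceMatcher(a=new_lines, b=old_lines, autojunk=False)
    return [(i1, i2, old_lines[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_delta(lines, ops):
    """Apply edits from the end of the file backwards so indices stay valid."""
    result = list(lines)
    for start, end, replacement in reversed(ops):
        result[start:end] = replacement
    return result


class VersionedDocument:
    """Holds the latest XML plus a log of reverse deltas (hypothetical design)."""

    def __init__(self, xml_text):
        ET.fromstring(xml_text)  # well-formedness only; NOT DDI schema validation
        self.current = xml_text.splitlines()
        self.log = []  # assumption: stands in for a PostgreSQL table

    def update(self, new_xml, author, annotation):
        ET.fromstring(new_xml)  # raises ET.ParseError if the update is malformed
        new_lines = new_xml.splitlines()
        delta = reverse_delta(self.current, new_lines)
        self.log.append(Change(author, annotation, delta))
        self.current = new_lines

    def version(self, n):
        """Rebuild version n (0 = original) by undoing the newer changes."""
        lines = self.current
        for change in reversed(self.log[n:]):
            lines = apply_delta(lines, change.reverse_ops)
        return "\n".join(lines)
```

Storing reverse deltas keeps the most recent version directly readable, at the cost of replaying the log to reach older versions. A line-based diff is the simplest choice here; a tree-aware XML diff may suit DDI documents better, and that decision is left open.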

Opportunity

This project has the potential to make an enormous impact on the way research is conducted using data from the federal statistical system. It is also an opportunity to work closely with a number of Cornell faculty members in Economics and Information Science.

 




wya@cs.cornell.edu
Last changed: August 2013