CS 502
Architecture of Web Information Systems
Spring 2003

Projects

 

Philosophy

There will be two programming projects during the semester with tentative due dates March 28 and May 14.  The assignments are designed to give students some practical experience in dealing with the technologies that make the Web and digital libraries work.  In general, the assignments will require students to understand relevant protocol or specification documents and write a moderate amount of Java code that demonstrates an understanding of those specifications.

These assignments are not mainly a test of your programming skills.  Rather, they are meant to encourage you to read protocol specifications and understand the APIs that implement them.  In the real world this is not done in isolation.  Thus, students are expected to work in groups on these assignments.  At the beginning of the semester the class will break up into groups of three that will remain together for the remainder of the semester.  Members of the group are expected to share information, jointly understand protocol documents and APIs, and write the final code product.  Every member of the group will receive the same grade for the assignment, and it is the responsibility of the group to ensure that the work is apportioned fairly.

Prerequisites

The assignments assume that students can program in Java and understand how to download and use class libraries.  No Java or programming tutorials will be offered.

Grading

This is not a programming course.  Imaginative algorithms or data structures will neither be required nor play a role in grading.  Instead, grading will be based on completion of the assigned task and demonstrated understanding of the concepts and protocols underlying the assignment.  Nevertheless, assignments should demonstrate good programming practices and documentation commensurate with the 500 level of this course.

Programming Environment

Metrowerks CodeWarrior and Borland JBuilder are the preferred environments for developing and testing programs. One of the two is required for submitting assignments.  CodeWarrior is in all the CIT and CSUG labs.  A personal version of JBuilder is available for free at http://www.borland.com/jbuilder/personal/. The assignments have been tested in both environments.

Tools

Working with XML, XSLT, and the like is considerably easier if you don't have to worry about syntactic details.  Fortunately, there are a number of excellent tools available to avoid this.  Two that I recommend are xmlspy and Stylus Studio.  I will be using the latter throughout the course.  You may download it onto your personal machine for a 30-day free trial, which may be renewed.  Both tools are Windows-only (sorry Mac users). 

Submitting Assignments

All assignments are due by MIDNIGHT on the due date.  NO LATE ASSIGNMENTS WILL BE ACCEPTED.

To identify your assignments and make grading easier, assignments MUST conform to the following guidelines:

Submissions that fail to conform to these guidelines will be rejected.

Assignment 0 - Due 2/7/2003

The purpose of this assignment is to ensure that you are familiar with the assignment submission process.  It will not be graded, but your submission of it registers the existence of your project group.

Resources for assignment 0

Directions for assignment 0

Write a Java program that prints two lines to the console:

  1. firstlastassignment0
  2. The three names of group members separated by commas
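A minimal sketch of Assignment 0 follows. It prints the first line exactly as it appears in the directions above; if that line is meant as a template built from your own names, substitute accordingly. The three member names are hypothetical placeholders, and the lines are built in a separate method only so they can be checked without capturing the console.

```java
// Minimal sketch for Assignment 0 (assumes Java 8 or later for String.join).
// The first output line is copied verbatim from the directions; the member
// names are hypothetical placeholders to be replaced with your own.
class Assignment0 {

    // Hypothetical member names: substitute your group's three names.
    static final String[] MEMBERS = {"Member One", "Member Two", "Member Three"};

    // Builds the two required lines, in order.
    static String[] lines() {
        String first = "firstlastassignment0";
        String second = String.join(", ", MEMBERS);
        return new String[] {first, second};
    }

    public static void main(String[] args) {
        for (String line : lines()) {
            System.out.println(line);
        }
    }
}
```

From the command line, `javac Assignment0.java` followed by `java Assignment0` prints the two lines; CodeWarrior and JBuilder run the same `main` method.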

Assignment 1 - Due 3/28/2003

This assignment will require you to work with a number of web and information technologies, including the HTTP protocol, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), XML parsing, and RDF.  You will harvest Dublin Core metadata from the Web in XML, transform that metadata into RDF/XML, build an in-memory model of the RDF, and transform the metadata based on a new schema.

Resources for assignment 1

Directions for assignment 1

Your group will write a single Java program that takes no arguments and does the following:

  1. Harvest metadata from baseURL http://services.nsdl.org:8080/nsdloai/OAI.  You should restrict your harvest to the set 'pri' and metadata format 'nsdl_dc'.  You can do a single harvest, ignoring the resumptionToken (which indicates that there is another group of records to harvest for this request).
  2. Transform the resulting xml document via XSLT to the simple RDF/XML format defined in Expressing Simple Dublin Core in RDF/XML.  Note two things:
  3. Construct an in-memory RDF model from the resulting RDF/XML using the Jena Semantic Web Toolkit.
  4. The DC properties have been criticized because they are a simple flat list.  Semantically, the properties can be partitioned among the four entities in the IFLA FRBR entity model: work, manifestation, expression, and item.
  5. Write out the resulting RDF/XML.
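Step 1 can be sketched with nothing but the JDK. The code below builds an OAI-PMH `ListRecords` request for the given base URL, set, and metadata format, fetches it with a single HTTP GET, and checks for a non-empty resumptionToken. It assumes the repository speaks OAI-PMH 2.0 (hence the namespace constant) and Java 9 or later; the class and method names are illustrative, and the Jena step is not shown because it needs the Jena library on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Sketch of step 1, assuming an OAI-PMH 2.0 repository. Names are illustrative.
class OaiHarvest {
    static final String BASE_URL = "http://services.nsdl.org:8080/nsdloai/OAI";
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    // Builds a ListRecords request restricted to one set and one metadata format.
    static String listRecordsUrl(String baseUrl, String metadataPrefix, String set)
            throws Exception {
        return baseUrl
                + "?verb=ListRecords"
                + "&metadataPrefix=" + URLEncoder.encode(metadataPrefix, "UTF-8")
                + "&set=" + URLEncoder.encode(set, "UTF-8");
    }

    // Performs a single HTTP GET and returns the raw response body (Java 9+).
    static byte[] fetch(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        try (InputStream in = conn.getInputStream()) {
            return in.readAllBytes();
        } finally {
            conn.disconnect();
        }
    }

    // Parses with namespaces enabled so OAI elements are matched by namespace URI.
    static Document parse(byte[] xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
    }

    // Number of <record> elements in one ListRecords response.
    static int countRecords(Document doc) {
        return doc.getElementsByTagNameNS(OAI_NS, "record").getLength();
    }

    // True if a non-empty resumptionToken is present, i.e. more records exist.
    // The assignment allows you to ignore it and stop after one request.
    static boolean hasMoreRecords(Document doc) {
        NodeList tokens = doc.getElementsByTagNameNS(OAI_NS, "resumptionToken");
        return tokens.getLength() > 0
                && !tokens.item(0).getTextContent().trim().isEmpty();
    }

    public static void main(String[] args) throws Exception {
        String url = listRecordsUrl(BASE_URL, "nsdl_dc", "pri");
        Document doc = parse(fetch(url));
        System.out.println(url);
        System.out.println(countRecords(doc) + " records, more available: " + hasMoreRecords(doc));
    }
}
```

The parsed `Document` (or the raw bytes) is what step 2 would hand to the XSLT transformation.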

Guidance for assignment 1

This assignment really doesn't require a significant amount of programming.  The bulk of the work is understanding the schema design, protocol specifications, APIs, and tools such as XSLT.  Much of the material will be introduced in lecture over the next few weeks.  I'd recommend, however, that you get an early start by looking at and downloading the relevant resources and experimenting with them (e.g., the tutorial for Jena is extremely helpful). 

Assignment 2 - Due week of 5/12 (see below)

This assignment will require you to create a digital object encoded in the METS format.  (Paths throughout the assignment assume that you have installed Fedora on your C drive.)  You can look at the sample digital objects that come with Fedora as examples (see C:\mellon\dist\server\demo).  A good sample digital object that can be used as a starting point for this assignment is found in C:\mellon\dist\server\demo\local-server-demos\document-transform-demo\obj-document-fedoraAPIA.xml.  We suggest you make a copy of this object and use it as a model.

Your submission of this project will be via a 45-minute presentation, scheduled on May 12-15.  To schedule, send an email request to anat@cs.cornell.edu for one of the available slots in the table below.  Your email should have the subject line 'CS502 FINAL PROJECT SCHEDULE', and the body should include the names of the team members, a primary requested time, and a secondary requested time.  Please select only times that are open in the following table.

Monday May 12  
9:00  
10:00 Taewoo Kim, Alan Leung, Agya Soni, Yaw Shin Yeo
11:00 Jon Aizen, Murali Kumar, Yeong Cheah
12:00 Ross Housewright, Naqi Khan, and Gregg Herlacher
13:00  
14:00 Atit Patel, Andrew Carter, Benjamin Kraus
15:00 Aleksandr Kirshenbaum, Ramiro Rodriguez
16:00  
17:00  
   
Tuesday May 13  
9:00  
10:00  
11:00  
12:00 Robert Iwan, Brian Babey, Karan Suri
13:00  
14:00  
15:00 Jinen Kamdar, Euwyn Poon & Heng-Scheng Chuang
16:00 Greg Truax, Peter Burns, Ilya Ryzhov
17:00 Elliott Davis, Jason Keller, Sameer Parwani
   
Wednesday May 14  
9:00 Justin Yang, Bob Campbell, Edward Fu
10:00 Edmund Fung, Prashanth Chandrasekar, Priyanka Atul Nishar
11:00 Shahed Masud, Andrew Naumov, Michael Sorokorensky
12:00 Cheng, San-Yiu, Chen, Henry, Young, Pi-Yu
13:00 Reza Shakoori, Farez Alibay, Andrew Lee
14:00 Jacob Hoffman-Andrews, Grant Pitel, Scott Selikoff
15:00 Beiying Zhang, Kiran Kannancheri, Bing Pan 
16:00 Derrick Yuen, Ashley Lin, Cesar Ho
17:00 Danny Falkov, Gabor Foldes, Rohan Murty

 

Resources for assignment 2

  1. Payette, Sandra and Thornton Staples, "The Mellon Fedora Project: Digital Library Architecture Meets XML and Web Services," http://www.fedora.info/documents/ecdl2002final.pdf.  This is a technical overview of the FEDORA architecture.
  2. FEDORA project site - http://www.fedora.info.  Of particular interest are the sample objects at http://www.fedora.info/techdoc.shtml
  3. FEDORA beta release software (location to be announced)
  4. Amazon.com Associates XML interface http://www.perfectxml.com/articles/XML/TheXMLWeb.asp - the syntax for issuing search requests to Amazon and retrieving XML formatted results.

Directions for assignment 2

  1. Go to the Fedora software URL (http://www.fedora.info/release/0.95/) to download the zip file with the pre-defined behavior objects to use with this assignment.  This will require the Windows 2000 or Windows XP operating system on your machine.  You can do your development work on any machine you wish, but the presentation of your work will require that you bring your finished work in on a laptop.  Please contact me as soon as possible if this is a problem.
  2. Design a Fedora digital object that has three disseminators.  Refer to the demo objects found in the directory C:\mellon\dist\server\demo\.  Using an XML editor, you must create a new digital object with the following disseminators:
    1. Disseminator 1:  This disseminator will have a method that will produce an HTML page displaying a static "booklist" that is stored as a datastream in the digital object.  The booklist dissemination should contain information about a set of books pertaining to a topic of interest to you (e.g., dogs, computer scientists, flowers).  This information should include basic metadata for each book (e.g., title, author, publisher), and an image of the book cover if available.
    2. Disseminator 2:  This disseminator will have a method that will produce XML that encodes a "booklist" that is dynamically generated.  The booklist should reflect a topic that is relevant to the "subject" element of a Dublin Core record that is stored as a datastream in the digital object.  The Dublin Core record should be an inline XML datastream in the digital object.
    3. Disseminator 3:  This disseminator will have a method that will produce an HTML page displaying the dynamic booklist generated via Disseminator #2.  
  3. You do not have to create a special web page to access your object disseminations.  Instead, you may use the built-in displays that are rendered via the operations in Fedora's Access service (API-A-LITE).  From a web browser you can issue simple URLs to view and disseminate your digital object.  See the Fedora documentation for the URL syntax: http://www.fedora.info:8080/userdocs/apialite/index.html
  4. (OPTIONAL)  If you get inspired, you can write an HTML-based front-end that acts as a browsing application for objects in your repository.  For example, you can create a collection of booklist objects in the repository, with each object representing a different subject area.  You can then use the API-A-LITE search operation to query the repository from your front-end application to provide a list of all objects of the "booklist" type.  (Hint: you can use the PROFILE attribute on the root METS element in the digital object XML to provide a type label such as "fedora:booklist".)  You may want to present a result list from the search query, then provide a link on each result item that will launch a dissemination on that object.

 Hints for implementing your digital object and disseminators:

Disseminator 1:  The basic goals of this disseminator are for you to (1) design an appropriate XML format for a booklist, (2) create a datastream that represents the booklist, and (3) hook up a disseminator that transforms the datastream from XML to HTML for presentation in a web browser.  Take a look at Amazon.com to see the kind of information that is relevant to book browsing.  Again, the booklist dissemination should be sourced from an XML file that you create and embed as a datastream in the Fedora digital object.  You should also create an XSLT stylesheet as another datastream in the digital object.  This stylesheet will be used to perform an XSLT transformation on the XML.  Use the demo object found in the document-transform-demo directory as guidance.  The disseminator will point to a behavior mechanism object that represents the Saxon XSLT service that is installed with Fedora.  The Saxon service is a Java servlet that takes two arguments (a URL for an XML file and a URL for an XSLT stylesheet for transforming that XML).  We have set up the behavior mechanism object so that this servlet is automatically called by the Fedora repository at runtime.  So your task is to assemble a digital object so that the disseminator is properly configured with these behavior objects:

a.  Behavior Definition Object (bdef_staticBookList.xml) - contains abstract definition of method "viewStaticBookList"

b. Behavior Mechanism Object (bmech_staticBookList.xml) - contains bindings for running the "viewStaticBookList" method using the local Saxon XSLT service that is pre-installed with Fedora
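Before embedding the booklist XML and its stylesheet as datastreams, it can help to debug the transformation locally. The sketch below applies an XSLT 1.0 stylesheet to an XML string using the JDK's built-in processor (JAXP), not the Saxon servlet that Fedora calls at runtime. The `<booklist>`/`<book>` format and the stylesheet are hypothetical examples, not a required schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Local test harness for a Disseminator 1 stylesheet, using JAXP's XSLT
// processor. The booklist format below is a hypothetical example.
class BookListXslt {

    static final String SAMPLE_BOOKLIST =
        "<booklist topic='dogs'>"
      + "<book><title>Title One</title><author>Author One</author>"
      + "<publisher>Publisher One</publisher></book>"
      + "</booklist>";

    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/booklist'>"
      + "<html><body><h1>Books about <xsl:value-of select='@topic'/></h1><ul>"
      + "<xsl:for-each select='book'>"
      + "<li><xsl:value-of select='title'/> by <xsl:value-of select='author'/>"
      + " (<xsl:value-of select='publisher'/>)</li>"
      + "</xsl:for-each>"
      + "</ul></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies an XSLT 1.0 stylesheet to an XML document, both given as strings.
    static String transform(String xml, String xslt) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(SAMPLE_BOOKLIST, SAMPLE_XSLT));
    }
}
```

Once the output looks right in a browser, the same XML and XSLT files become the two datastreams whose URLs the Saxon service receives.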

Disseminator 2: The basic goal of this disseminator is for you to design a service that can produce a dynamic booklist.  This will require that you write a Java Servlet and make sure that the service is described by Fedora behavior objects.  Your disseminator will point to these behavior objects.  In particular a behavior mechanism object will contain metadata that will record how to run your Java servlet via a URL.  These behavior objects have been set up already for you (described below).  So, your main task is writing the Java servlet.  Once your servlet works in a stand-alone mode, you can plug its binding info into the proper behavior mechanism object.  Your servlet must achieve the following tasks:

  1. Receive the URL of a Dublin Core record as an input parameter and return XML that represents a list of books pertaining to the subject element of the Dublin Core.  Remember that the Dublin Core record will be a datastream that is stored in your digital object.
  2. Parse the Dublin Core record to obtain a topic via subject element(s).
  3. Construct an HTTP request to the Amazon.com XML interface to get information on books pertaining to the topic.  (See the section below entitled "URL Syntax for Amazon and Library".)
  4. Process the Amazon XML results to obtain ISBN numbers for the books (you may limit results to 5 books if the result set is large).
  5. Construct HTTP requests to the Cornell Library Catalog to see if the books are available at your campus library.  (See the section below entitled "URL Syntax for Amazon and Library".)
  6. Return the booklist as XML.  For any books that are available at the Cornell Library, include a URL for the page that displays the library catalog information.
  7. Install your Java servlet as a local service under Fedora.  Put your class files in a directory under the Tomcat webapps area of the Fedora installation (see C:\mellon\dist\server\tomcat41\webapps).  Create your own sub-directory (e.g., booklist) and follow the example of how the "saxon" servlet is configured.  This will enable you to run your Java servlet right out of Fedora's Tomcat container when the Fedora server is running (e.g., http://localhost:8080/booklist/).
  8. Use the following pre-existing Fedora behavior objects to represent your new Java servlet within a disseminator:
    1. Behavior Definition Object (bdef_getBookList.xml) - contains abstract definition of method "getBookList".
    2. Behavior Mechanism Object (bmech_getBookList.xml) - contains binding information to run your Java servlet implementation of the method "getBookList".  The key thing here is to make sure that the WSDL metadata in the Behavior Mechanism Object properly points to your Java servlet.  Look for the http:address element in the WSDL metadata.  This is the base URL for your servlet if you have properly installed it under Fedora.

Example:                    

<http:address location="http://localhost:8080/booklist/"/>

 Look for the http:operation element in the WSDL metadata.  This is a relative URL for binding to the "getBookList" method of your Java servlet.  The best thing to do is just make sure your servlet can be called with this URL syntax.

 Example:                    

<http:operation location="getbooklist?dc=(DC)"/>

  The parameter ?dc=(DC) is placeholder syntax.  Your servlet should be able to receive a URL for a Dublin Core record as the value of the dc parameter.  At runtime, the Fedora repository will automatically replace "(DC)" with a callback URL for the Dublin Core record in your digital object (if you have created your digital object disseminator properly).
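The servlet tasks that do not touch the network (reading the subject from the Dublin Core record, and emitting the booklist as XML) can be prototyped as plain Java before wrapping them in a servlet. The sketch below matches `subject` elements by the standard Dublin Core element namespace, so it works whether or not the record is wrapped in an `oai_dc:dc` root. In a servlet's `doGet` you would read the callback URL with `request.getParameter("dc")`, fetch it, pass the bytes to `subjects(...)`, and write the result of `toXml(...)` after calling `response.setContentType("text/xml")`. The `Book` class and the `<booklist>`, `<book>`, and `<libraryUrl>` names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Servlet-independent core of Disseminator 2. Element names are hypothetical.
class BookListCore {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Hypothetical record for one book; libraryUrl is null if not in the catalog.
    static final class Book {
        final String isbn;
        final String title;
        final String libraryUrl;

        Book(String isbn, String title, String libraryUrl) {
            this.isbn = isbn;
            this.title = title;
            this.libraryUrl = libraryUrl;
        }
    }

    // Returns the trimmed, non-empty text of every dc:subject element.
    static List<String> subjects(byte[] dcXml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().parse(new ByteArrayInputStream(dcXml));
        NodeList nodes = doc.getElementsByTagNameNS(DC_NS, "subject");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            String s = nodes.item(i).getTextContent().trim();
            if (!s.isEmpty()) {
                result.add(s);
            }
        }
        return result;
    }

    // Serializes the books into a hypothetical <booklist> document.
    static String toXml(String topic, List<Book> books) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("booklist");
        root.setAttribute("topic", topic);
        doc.appendChild(root);
        for (Book b : books) {
            Element book = doc.createElement("book");
            book.setAttribute("isbn", b.isbn);
            Element title = doc.createElement("title");
            title.setTextContent(b.title);
            book.appendChild(title);
            if (b.libraryUrl != null) {           // only books found at the library
                Element lib = doc.createElement("libraryUrl");
                lib.setTextContent(b.libraryUrl);
                book.appendChild(lib);
            }
            root.appendChild(book);
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Keeping this logic out of the servlet class lets you test it without starting Fedora's Tomcat; the servlet itself then only handles the `dc` parameter and the HTTP response.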

Disseminator 3:  The basic goal of this disseminator is for you to pipe the results of one disseminator into another disseminator.  Observing that a datastream in a digital object is represented as a URL, and that a dissemination on a digital object can be run via a URL, think about how you can get the XML disseminated from Disseminator #2 to be the input for Disseminator #3.  Once you have figured this out, your task is to have Disseminator #3 transform the booklist XML into HTML, so the behavior objects look very similar to those in Disseminator #1.  Create this final disseminator configured with these behavior objects:

a.  Behavior Definition Object (bdef_displayBookList.xml) - contains abstract definition of method "displayBookList"

b. Behavior Mechanism Object (bmech_displayBookList.xml) - contains bindings for running the "displayBookList" method using the local Saxon XSLT service that is pre-installed with Fedora

 URL Syntax for Amazon and Library:

 Here's the amazon.com URL syntax for the topic "dreamcatcher".  Just replace the value of "&search=" with your subject word.

 http://rcm.amazon.com/e/cm?t=encyclozine&l=st1&search=dreamcatcher&mode=books&p=102&o=1&f=xml

 ...and here's the Cornell library catalog URL syntax.  Just replace the value of Search_Arg with the ISBN number for a book (which is called ASIN in the amazon.com XML response).

 http://catalog.library.cornell.edu/cgi-bin/Pwebrecon.cgi?Search_Arg=0743211383&Search_Code=FT%2A&CNT=50
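The two URL templates above can be filled in programmatically. The sketch below substitutes a URL-encoded subject word into the `search=` parameter and an ISBN into `Search_Arg`, reproducing the two example URLs exactly for the example inputs. It also collects up to a limit of ASIN values from an Amazon XML response; because the layout of that response is not described here, the name of the element holding the ASIN is a parameter, and any name you pass (such as `Asin` in the test data) is an assumption you should check against a real response.

```java
import java.io.ByteArrayInputStream;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Fills in the Amazon and Cornell catalog URL templates and extracts ASINs.
// The ASIN element name is passed in because the response layout is assumed.
class CatalogUrls {
    static final String AMAZON_PREFIX =
        "http://rcm.amazon.com/e/cm?t=encyclozine&l=st1&search=";
    static final String AMAZON_SUFFIX = "&mode=books&p=102&o=1&f=xml";
    static final String LIBRARY_PREFIX =
        "http://catalog.library.cornell.edu/cgi-bin/Pwebrecon.cgi?Search_Arg=";
    static final String LIBRARY_SUFFIX = "&Search_Code=FT%2A&CNT=50";

    // Substitutes the URL-encoded subject word into the search= parameter.
    static String amazonUrl(String topic) throws Exception {
        return AMAZON_PREFIX + URLEncoder.encode(topic, "UTF-8") + AMAZON_SUFFIX;
    }

    // Substitutes an ISBN (Amazon's ASIN) into the Search_Arg parameter.
    static String libraryUrl(String isbn) throws Exception {
        return LIBRARY_PREFIX + URLEncoder.encode(isbn, "UTF-8") + LIBRARY_SUFFIX;
    }

    // Returns up to 'limit' non-empty values of elementName, in document order.
    static List<String> asins(byte[] amazonXml, String elementName, int limit)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(amazonXml));
        NodeList nodes = doc.getElementsByTagName(elementName);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength() && result.size() < limit; i++) {
            String value = nodes.item(i).getTextContent().trim();
            if (!value.isEmpty()) {
                result.add(value);
            }
        }
        return result;
    }
}
```

A limit of 5 matches the suggestion in task 4 of Disseminator 2. Multi-word subjects are encoded with `+` for spaces (e.g., `search=border+collies`), which is how `URLEncoder` encodes form data.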

Presenting your work

As mentioned above, you will present your work in a demonstration session.  (Please schedule your session as soon as possible.)  You will have 45 minutes to:

Please come to the demo with your work set up on a laptop that can be plugged into a network connection and projected onto a screen.  Please let us know as soon as possible if there is some problem with this.