CS 502
Architecture of Web Information Systems
Spring 2003
Projects
Philosophy
There will be two programming projects during the semester with tentative due
dates March 28 and May 14.
The assignments are designed to give students some practical experience in
dealing with the technologies that make the Web and digital libraries
work. In general, the assignments will require students to understand
relevant protocol or specifications documents and write a moderate
amount of Java code that demonstrates an understanding of those
specifications.
These assignments are not mainly a test of your programming skills. Rather,
they are meant to encourage you to read protocol specifications and understand the APIs
that implement them. In
the real world this is not done in isolation. Thus, students are
expected to work in groups on these assignments. At the beginning of the
semester the class will break up into groups of three that will remain together
for the remainder of the semester. Members of the group are expected to
share information, jointly understand protocol documents and APIs, and write the
final code product. Every member of the group will receive the same grade
for the assignment and it is the responsibility of the group to ensure that the
work is apportioned fairly.
Prerequisites
The assignments assume that students can program in Java and understand how
to download and use class libraries. No Java or programming tutorials will
be offered.
Grading
This is not a programming course. Imaginative algorithms or data
structures will not be required or play a role in grading. Instead,
grading will be based on completion of the assigned task and demonstrated
understanding of the concepts and protocols underlying the assignment.
Nevertheless, assignments should demonstrate good programming practices and
documentation commensurate with the 500 level of this course.
Programming Environment
Metrowerks CodeWarrior and Borland JBuilder are the preferred environments
for developing and testing programs. One of the two is required for
submitting assignments. CodeWarrior is in all the CIT and CSUG labs.
A personal version of JBuilder is available for free at http://www.borland.com/jbuilder/personal/.
The assignments have been tested in both environments.
Tools
Working with XML, XSLT, and the like is considerably easier if you don't have
to worry about syntactic details. Fortunately, there are a number of
excellent tools available to avoid this. Two that I recommend are
xmlspy and
Stylus Studio.
I will be using the latter throughout the course. You may download it onto
your personal machine for a 30-day free trial, which may be renewed. Both
tools are Windows-only (sorry Mac users).
Submitting Assignments
All assignments are due by MIDNIGHT on the due
date. NO LATE ASSIGNMENTS WILL BE ACCEPTED.
To identify your assignments and make grading easier, assignments MUST
conform to the following guidelines:
- Each group should identify a group leader when they form. That
person's name will serve as the "firstlast" in the remainder of these
instructions.
- The CodeWarrior or JBuilder project must be named as firstlastassignment#
(e.g., CarlLagozeAssignment1)
- The first executable line of the program should be
System.out.println("firstlastassignment#")
- The assignment must be submitted as an email to both lagoze@cs.cornell.edu and
sv69@cornell.edu with:
- one attachment consisting of a zip file containing the project directory
tree. The name of the zip file should be "firstlastassignment#.zip". An example of such a zip file is
here.
- the subject line of the email should be "CS502 firstlastassignment#"
Submissions that fail to conform to these guidelines will be rejected.
Assignment 0
The purpose of this assignment is to ensure that you are familiar with the
assignment submission process. It will not be graded but your
submission of it registers the existence of your project group.
Resources for assignment 0
- Sample submission zip file - here.
Directions for assignment 0
Write a Java program that prints to the console two lines:
- firstlastassignment0
- The three names of group members separated by commas
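A minimal sketch of such a program follows. The group leader name (CarlLagoze, borrowed from the naming example in the submission guidelines) and the three member names are placeholders, not real group data; the helper method is only an illustration.

```java
// Sketch of the assignment 0 program. "CarlLagoze" and the member names
// below are placeholders; substitute your own group leader and members.
class Assignment0 {
    // Project identifier in the firstlastassignment# form.
    static final String PROJECT_ID = "CarlLagozeAssignment0";

    // Joins the member names with commas for the second output line.
    static String memberLine(String... members) {
        return String.join(", ", members);
    }

    public static void main(String[] args) {
        // First executable line: print the project identifier.
        System.out.println(PROJECT_ID);
        System.out.println(memberLine("First Member", "Second Member", "Third Member"));
    }
}
```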
Assignment 1
This assignment will require you to work with a number of web and information
technologies, including the HTTP protocol, the Open Archives Protocol for
Metadata Harvesting, XML parsing, and RDF. You will harvest Dublin Core
metadata from the Web in XML, transform that metadata into RDF/XML, build an
in-memory model of the RDF, and transform the metadata based on a new
schema.
Resources for assignment 1
Directions for assignment 1
Your group will write a single Java program that takes no arguments and does
the following:
- Harvest metadata from baseURL
http://services.nsdl.org:8080/nsdloai/OAI. You should restrict your
harvest to the set 'pri' and metadata format 'nsdl_dc'. You can do a
single harvest, ignoring the resumptionToken (indicating that there is another
group of records to harvest for this request).
- Transform the resulting xml document via XSLT to the simple RDF/XML format
defined in
Expressing Simple Dublin Core in RDF/XML. Note two things:
- You will lose some information in the translation process since nsdl_dc
is a qualified form of Dublin Core.
- Since your harvest will include multiple metadata records, your
resulting RDF should have multiple rdf:Description elements.
- Construct an in-memory RDF model from the resulting RDF/XML using the Jena Semantic Web Toolkit.
- The DC properties have been criticized because they are a simple flat
list. Semantically the properties can be partitioned among the four
entities in the IFLA FRBR entity model: work, manifestation, expression,
and item.
- Write a new RDF schema (expressed in RDF/XML) that expresses the four
classes of resources expressed by the FRBR, expresses properties to
associate the FRBR entities with the described resource, and then associates
the respective Dublin Core properties with the proper FRBR entity via domain
constraints. You should include the schema in your submitted zip file
with the name dc_frbr.rdfs.
- Transform your in-memory models according to the rules defined by this
schema (note that Jena does not have schema support so there is no
verification mechanism for this transform).
- Write out the resulting RDF/XML.
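The harvest step in the list above amounts to a single OAI-PMH ListRecords request. The sketch below builds that request URL from the baseURL, set, and metadata format given in the directions, and includes a helper that reads a response body into a string. The class and method names are illustrative, not part of the assignment, and error handling is kept minimal.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLEncoder;

// Illustrative helper for a single OAI-PMH harvest (no resumptionToken handling).
class OaiHarvest {
    // Builds a ListRecords request restricted to one set and one metadata format.
    static String listRecordsUrl(String baseUrl, String set, String metadataPrefix) {
        return baseUrl + "?verb=ListRecords"
                + "&set=" + encode(set)
                + "&metadataPrefix=" + encode(metadataPrefix);
    }

    // URL-encodes a query parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Issues an HTTP GET and returns the response body as text.
    static String fetch(String url) throws IOException {
        StringBuilder body = new StringBuilder();
        try (InputStream in = new URL(url).openStream();
             BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String url = listRecordsUrl(
                "http://services.nsdl.org:8080/nsdloai/OAI", "pri", "nsdl_dc");
        System.out.println(url);
        // Pass the URL to fetch(url) to perform the actual harvest.
    }
}
```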
Guidance for assignment 1
This assignment really doesn't require a significant amount of programming.
The bulk of the work is understanding the schema design, protocol specifications, APIs, and
tools such as XSLT. Much of the material will be introduced in lecture
over the next few weeks. I'd recommend, however, that you get an early
start by looking at and downloading the relevant resources and experimenting
with them (e.g., the tutorial for Jena is extremely helpful).
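One way to start experimenting with XSLT is to run a small transformation from Java with the standard javax.xml.transform API. The sketch below turns a tiny, made-up OAI-PMH response into RDF/XML with one rdf:Description per record. Several things here are assumptions: the sample record and its "container" element are invented stand-ins for real nsdl_dc data, the OAI item identifier is used as the rdf:about value, and only elements in the Dublin Core namespace are carried over.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DcToRdfSketch {
    // Stylesheet: one rdf:Description per OAI record, keeping only dc:* elements.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:oai='http://www.openarchives.org/OAI/2.0/'"
        + " xmlns:dc='http://purl.org/dc/elements/1.1/'"
        + " xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
        + " exclude-result-prefixes='oai'>"
        + "<xsl:template match='/'>"
        + "<rdf:RDF>"
        + "<xsl:for-each select='//oai:record'>"
        // Assumption: the OAI identifier stands in for the resource URI.
        + "<rdf:Description rdf:about='{oai:header/oai:identifier}'>"
        + "<xsl:for-each select='oai:metadata//dc:*'>"
        + "<xsl:element name='{name()}' namespace='{namespace-uri()}'>"
        + "<xsl:value-of select='.'/>"
        + "</xsl:element>"
        + "</xsl:for-each>"
        + "</rdf:Description>"
        + "</xsl:for-each>"
        + "</rdf:RDF>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Made-up input; "container" is a hypothetical stand-in for the nsdl_dc root.
    static final String SAMPLE =
        "<OAI-PMH xmlns='http://www.openarchives.org/OAI/2.0/'><ListRecords><record>"
        + "<header><identifier>oai:example:1</identifier></header>"
        + "<metadata><container xmlns:dc='http://purl.org/dc/elements/1.1/'>"
        + "<dc:title>Sample title</dc:title><dc:creator>Sample author</dc:creator>"
        + "</container></metadata></record></ListRecords></OAI-PMH>";

    // Applies the stylesheet text to the XML text and returns the result.
    static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE, STYLESHEET));
    }
}
```

The same transform method accepts any stylesheet, so it can be reused while refining the mapping against real harvested records.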
Assignment 2 - Due week of 5/12 (see below)
This assignment will require you to create a digital object encoded in the
METS format. (Paths throughout the assignment assume that you have
installed Fedora on your C drive). You can look at the sample digital
objects that come with Fedora as examples. (See C:\mellon\dist\server\demo).
A good sample digital object that can be used as a starting point for this
assignment is found in: C:\mellon\dist\server\demo\local-server-demos\document-transform-demo\obj-document-fedoraAPIA.xml.
We suggest you make a copy of this object and use it as a model.
Your submission of this project will be via a 45-minute presentation, which
will be scheduled on May 12-15. To schedule, send an email to
anat@cs.cornell.edu requesting one
of the available slots in the table below. Your email should have as
subject line 'CS502 FINAL PROJECT SCHEDULE' and the body should include the
names of the team members, a primary requested time, and a secondary requested
time. Please select only times that are open in the following table.
Monday May 12
9:00 |
10:00 | Taewoo Kim, Alan Leung, Agya Soni, Yaw Shin Yeo
11:00 | Jon Aizen, Murali Kumar, Yeong Cheah
12:00 | Ross Housewright, Naqi Khan, and Gregg Herlacher
13:00 |
14:00 | Atit Patel, Andrew Carter, Benjamin Kraus
15:00 | Aleksandr Kirshenbaum, Ramiro Rodriguez
16:00 |
17:00 |
Tuesday May 13
9:00 |
10:00 |
11:00 |
12:00 | Robert Iwan, Brian Babey, Karan Suri
13:00 |
14:00 |
15:00 | Jinen Kamdar, Euwyn Poon & Heng-Scheng Chuang
16:00 | Greg Truax, Peter Burns, Ilya Ryzhov
17:00 | Elliott Davis, Jason Keller, Sameer Parwani
Wednesday May 14
9:00 | Justin Yang, Bob Campbell, Edward Fu
10:00 | Edmund Fung, Prashanth Chandrasekar, Priyanka Atul Nishar
11:00 | Shahed Masud, Andrew Naumov, Michael Sorokorensky
12:00 | Cheng, San-Yiu, Chen, Henry, Young, Pi-Yu
13:00 | Reza Shakoori, Farez Alibay, Andrew Lee
14:00 | Jacob Hoffman-Andrews, Grant Pitel, Scott Selikoff
15:00 | Beiying Zhang, Kiran Kannancheri, Bing Pan
16:00 | Derrick Yuen, Ashley Lin, Cesar Ho
17:00 | Danny Falkov, Gabor Foldes, Rohan Murty
Resources for assignment 2
- Payette, Sandra and Thornton Staples, "The Mellon Fedora Project:
Digital Library Architecture Meets XML and Web Services," http://www.fedora.info/documents/ecdl2002final.pdf.
This is a technical overview of the FEDORA architecture
- FEDORA project site - http://www.fedora.info.
Of particular interest are the sample objects at http://www.fedora.info/techdoc.shtml.
- FEDORA beta release software (location to be announced)
- Amazon.com Associates XML interface http://www.perfectxml.com/articles/XML/TheXMLWeb.asp
(the syntax for issuing search requests to Amazon and retrieving XML-formatted
results).
Directions for assignment 2
- Go to the Fedora software URL (http://www.fedora.info/release/0.95/) to
download the zip file with the pre-defined behavior objects to use with this
assignment. This will require the Windows 2000 or Windows XP operating
system on your machine. You can do your
development work on any machine you wish, but the presentation of your work
will require that you bring your finished work in on a laptop.
Please contact me as soon as possible if this is a problem.
- Design a Fedora digital object that has three disseminators.
Refer to the demo objects found in the directory C:\mellon\dist\server\demo\.
Using an XML editor, you must create a new digital object with the
following disseminators:
- Disseminator 1: This disseminator will have a method that will produce an
HTML page displaying a static "booklist" that is stored as a datastream in
the digital object. The booklist dissemination should contain information
about a set of books pertaining to a topic of interest to you (e.g., dogs,
computer scientists, flowers). This information should include basic
metadata for each book (e.g., title, author, publisher), and an image of
the book cover if available.
- Disseminator 2: This disseminator will have a method that will produce XML
that encodes a "booklist" that is dynamically generated. The booklist
should reflect a topic that is relevant to the "subject" element of a
Dublin Core record that is stored as a datastream in the digital object.
The Dublin Core record should be an inline XML datastream in the digital
object.
- Disseminator 3: This disseminator will have a method that will produce an
HTML page displaying the dynamic booklist generated via Disseminator #2.
- You do not have to create a special web page to access your object
disseminations. Instead, you may use the built-in displays that are
rendered via the operations in Fedora's Access service (API-A-LITE). From
a web browser you can issue simple URLs to view and disseminate your
digital object. See the Fedora documentation for the URL syntax:
http://www.fedora.info:8080/userdocs/apialite/index.html
- (OPTIONAL) If you get inspired, you can write an HTML-based front-end that
acts as a browsing application for objects in your repository. For example,
you can create a collection of booklist objects in the repository, with each
object representing a different subject area. You can then use the
API-A-LITE search operation to query the repository from your front-end
application to provide a list of all objects of the "booklist" type. (Hint:
you can use the PROFILE element on the root METS element in the digital
object XML to provide a type label such as "fedora:booklist".) You may
want to present a result list from the search query, then provide a link on
each result item that will launch a dissemination on that object.
Hints for implementing your digital object and disseminators:
Disseminator 1: The basic
goals of this disseminator are for you to (1) design an appropriate XML format
for a booklist, (2) create a datastream that represents the booklist, and (3)
hook up a disseminator that transforms the datastream from XML to HTML for
presentation in a web browser. Take a look at Amazon.com to see the kind
of information that is relevant to book browsing. Again, the
booklist dissemination should be sourced from an XML file that you create and
embed as a datastream in the Fedora digital object. You should also create
an XSLT stylesheet as another datastream in the digital object. This
stylesheet will be used to perform an XSLT transformation on the XML. Use
the demo object found in the document-transform-demo directory as guidance.
The disseminator will point to a behavior mechanism object that represents the
Saxon XSLT service that is installed with Fedora. The Saxon service is a
Java servlet that takes two arguments (a URL for an XML file and a URL for an
XSLT stylesheet for transforming that XML). We have set up the behavior
mechanism object to have this servlet be automatically called by the Fedora
repository at runtime. So your task is to assemble a digital object so
that the disseminator is properly configured with these behavior objects:
a. Behavior Definition Object (bdef_staticBookList.xml): contains the
abstract definition of the method "viewStaticBookList"
b. Behavior Mechanism Object (bmech_staticBookList.xml): contains bindings
for running the "viewStaticBookList" method using the local Saxon XSLT
service that is pre-installed with Fedora
Disseminator 2: The basic goal of
this disseminator is for you to design a service that can produce a dynamic
booklist. This will require that you
write a Java Servlet and make sure that the service is described by Fedora
behavior objects. Your disseminator
will point to these behavior objects. In
particular a behavior mechanism object will contain metadata that will record
how to run your Java servlet via a URL. These
behavior objects have been set up already for you (described below).
So, your main task is writing the Java servlet.
Once your servlet works in a stand-alone mode, you can plug its binding
info into the proper behavior mechanism object.
Your servlet must achieve the following tasks:
- Receive
the URL of a Dublin Core record as an input parameter and return XML that
represents a list of books pertaining to the subject element of the Dublin
Core. Remember that the Dublin
Core record will be a datastream that is stored in your digital object.
- Parse
the Dublin Core record to obtain a topic via subject element(s).
- Construct an HTTP request to the Amazon.com XML interface to get
information on books pertaining to the topic. (See the section below
entitled "URL Syntax for Amazon and Library".)
- Process the Amazon XML results to obtain ISBN numbers for the books (you
can choose to limit results to 5 books if you have a large result set).
- Construct HTTP requests to the Cornell Library Catalog to see if the books
are available at your campus library. (See the section below entitled
"URL Syntax for Amazon and Library".)
- Return
the booklist as XML. For any
books that are available at the Cornell Library, include a URL for the page
that displays the library catalog information.
- Install your Java servlet as a local service under Fedora. Put your class
files in a directory under the Tomcat webapps area of the Fedora
installation (see C:\mellon\dist\server\tomcat41\webapps). Create your own
sub-directory (e.g., booklist) and follow the example of how the "saxon"
servlet is configured. This will enable you to run your Java servlet right
out of Fedora's Tomcat container when the Fedora server is running
(e.g., http://localhost:8080/booklist/).
- Use the following pre-existing Fedora behavior objects to represent your
new Java servlet within a disseminator:
- Behavior Definition Object (bdef_getBookList.xml): contains the abstract
definition of the method "getBookList".
- Behavior Mechanism Object (bmech_getBookList.xml): contains binding
information to run your Java servlet implementation of the method
"getBookList". The key thing here is to make sure that the WSDL metadata
in the Behavior Mechanism Object properly points to your Java servlet.
Look for the http:address element in the WSDL metadata. This is the base
URL for your servlet if you have properly installed it under Fedora.
Example:
<http:address location="http://localhost:8080/booklist/"/>
Look for the http:operation element in the WSDL metadata. This is a relative
URL for binding to the "getBookList" method of your Java servlet. The best
thing to do is just make sure your servlet can be called with this URL syntax.
Example:
<http:operation location="getbooklist?dc=(DC)"/>
(The parameter ?dc=(DC) is a placeholder syntax. Your servlet should be able
to receive a URL for a Dublin Core record as the value for the dc parameter.
At runtime, the Fedora repository will automatically replace "(DC)" with a
callback URL for the Dublin Core record that's in your digital object, if
you have created your digital object disseminator properly.)
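The parsing task for the servlet (obtaining a topic from the subject elements of the Dublin Core record) can be tried out in plain Java before any servlet code is written. The sketch below uses the standard DOM parser and looks up subject elements by the Dublin Core namespace. The class name, method names, and sample record are illustrative assumptions; the actual record in your digital object may be structured differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DcSubjects {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Returns the non-empty text of every dc:subject element in the record.
    static List<String> subjects(InputSource source) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for namespace-based lookup
            Document doc = factory.newDocumentBuilder().parse(source);
            NodeList nodes = doc.getElementsByTagNameNS(DC_NS, "subject");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getTextContent().trim();
                if (!text.isEmpty()) {
                    result.add(text);
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse Dublin Core record", e);
        }
    }

    // Convenience wrapper for testing with XML held in a string.
    static List<String> subjectsFromString(String xml) {
        return subjects(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        // Made-up record for local testing only.
        String sample =
            "<record xmlns:dc='http://purl.org/dc/elements/1.1/'>"
            + "<dc:title>Sample</dc:title>"
            + "<dc:subject>dogs</dc:subject>"
            + "<dc:subject>flowers</dc:subject>"
            + "</record>";
        System.out.println(subjectsFromString(sample)); // prints [dogs, flowers]
    }
}
```

Inside the servlet, the value of the dc parameter is a URL, so one option is to call subjects(new InputSource(dcUrl)) and let the parser retrieve the record from that URL.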
Disseminator 3:
The basic goal of this disseminator is for you to pipe the results of one
disseminator into another disseminator. Observing
that a datastream in a digital object is represented as a URL, and observing
that a dissemination on digital objects can be run via a URL, think about how
you can get the XML disseminated from Disseminator #2 to be the input for
Disseminator #3. Given that
you can figure this out, your task is to have Disseminator #3 transform the
booklist XML into HTML. So, the
behavior objects look very similar to those in Disseminator #1.
So your task is to create this final disseminator configured with these
behavior objects:
a. Behavior Definition Object (bdef_displayBookList.xml): contains the
abstract definition of the method "displayBookList"
b. Behavior Mechanism Object (bmech_displayBookList.xml): contains bindings
for running the "displayBookList" method using the local Saxon XSLT service
that is pre-installed with Fedora
URL Syntax for Amazon and Library:
Here's the amazon.com URL syntax for the topic "dreamcatcher".
Just replace the value of "&search=" with your subject word.
http://rcm.amazon.com/e/cm?t=encyclozine&l=st1&search=dreamcatcher&mode=books&p=102&o=1&f=xml
...and here's the Cornell library catalog URL syntax.
Just replace Search_Arg with the ISBN
number for a book (which is called ASIN in the amazon.com xml response).
http://catalog.library.cornell.edu/cgi-bin/Pwebrecon.cgi?Search_Arg=0743211383&Search_Code=FT%2A&CNT=50
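The two URL patterns above can be wrapped in small helper methods so that the subject word and ISBN are URL-encoded before being substituted. This is only a sketch: the class and method names are made up, and all other query parameters are copied unchanged from the example URLs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative URL builders based on the two example URLs in this section.
class BooklistUrls {
    // Amazon search URL for a subject word (the value of the search parameter).
    static String amazonSearchUrl(String subject) {
        return "http://rcm.amazon.com/e/cm?t=encyclozine&l=st1&search=" + encode(subject)
                + "&mode=books&p=102&o=1&f=xml";
    }

    // Cornell catalog lookup URL for an ISBN (the ASIN value from Amazon).
    static String libraryLookupUrl(String isbn) {
        return "http://catalog.library.cornell.edu/cgi-bin/Pwebrecon.cgi?Search_Arg="
                + encode(isbn) + "&Search_Code=FT%2A&CNT=50";
    }

    // URL-encodes a query parameter value as UTF-8 (spaces become '+').
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(amazonSearchUrl("dreamcatcher"));
        System.out.println(libraryLookupUrl("0743211383"));
    }
}
```

Encoding matters for multi-word subjects such as "computer scientists", which would otherwise produce an invalid query string.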
Presenting your work
As mentioned above, you will present your work in a demonstration session.
(Please schedule your session as soon as possible.) You will have 45 minutes to:
- Describe the design and construction of your objects
- Show the corresponding METS and WSDL documents
- Demonstrate the required disseminations and show any related optional
work.
Please come to the demo with your work set up on a laptop that can be plugged
into a network connection and projected onto a screen. Please let us know
as soon as possible if there is some problem with this.