Gloss for Slide 1: OpCit and the Cornell Digital Library Research Group
-----------------------------------------------------------------------

Part I has 4 bullets under Achievements and Shared Activity:

1. Defines the role we chose to play in the OpCit project. You were doing the arXiv analysis; we chose to take a higher-level architectural view of reference linking, with the hope, of course, that the goals would be compatible.

2. We designed the API from the top down, but we always knew that parsing the reference strings at the end of an online paper would have to be done in order to implement the API. For this, Carr's deciter software was the linchpin; our API was implemented on top of it. One of the methods in the API is "getRefList()", which returns an XML file that essentially encapsulates what the deciter parses out of the reference strings. Of course, we added other methods as well, such as "getLinkedText()".

3. Status of the Reference API: it is sufficiently coded to be usable at this point, though buggy -- I would call it alpha status, but usable. There is a lot of promise here. Next, we need to do something about collecting citation information (Southampton is way ahead of us here), continue to evaluate and improve quality, write applications and tools, and handle more online journals.

4. Performance as of the end of 2000 is based on 89 papers analyzed during the second half of 2000. The extracted data was graded (by hand, groan) against objective, quantitative accuracy criteria.

   Reference analysis is done per reference string and averaged over all items (item = an analyzed online paper). The accuracy metric for reference data is the number of elements correctly extracted divided by the total number of elements in the reference string. Elements include: title, each author's name, year of publication, and any URLs included right in the reference string. Note that I am currently ignoring journal name and page number: we are interested in getting our hands on the "work", not the "item". I believe Southampton's work is aimed at recognizing the "item". ASK IF YOU HAVE A QUESTION ABOUT THIS.

   Item analysis is done per item and averaged over all items (of which there were 89 as of this writing). The accuracy metric is again the number of elements correctly extracted divided by the total number of elements. The elements include: title of the item, each author's name, year of publication, the context of each reference, and the average reference accuracy for the references in the item. For this metric, we are running at 82.42%. (A small worked sketch of the calculation appears at the end of these notes.)

Part II has 3 bullets for Implementability:

1. Applications and tools are meant to be built on top of the Reference Linking API. The basic call shown here analyzes the item located at the specified URL. Once the constructor returns, the URL has been fetched and partially analyzed, and the Surrogate object is returned to the caller. If there are problems, the Surrogate is null and error messages can be found on syserr (System.err in Java). With the surrogate in hand, the caller can then invoke methods such as "s.getLinkedText()". (A usage sketch appears at the end of these notes.)

2. The software seems to be portable, as it should be, since it is written in Java. On Microsoft machines, one must use Sun's JDK 1.2. One Java file contains the configuration (i.e., directories and filenames) and must be edited in order to set up the software on a new machine. (A sketch of such a configuration file also appears at the end of these notes.)

3. The jar files include the Soton Harvester (Carr's deciter software), an XML parser, the JTidy conversion tool, and an XSLT processor.
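To make the item-level accuracy metric in Part I, bullet 4 concrete, here is a minimal Java sketch. None of these names (AccuracySketch, ElementCounts, itemAccuracy, averageAccuracy) come from the reference linking code; they are hypothetical, and the numbers are toy data, not the 89-paper evaluation.

    // Hypothetical illustration of "correct elements / total elements,
    // averaged over all items"; names and numbers are made up for the sketch.
    public class AccuracySketch {

        // Per-item tallies: elements extracted correctly vs. elements present.
        static class ElementCounts {
            int correct;
            int total;
            ElementCounts(int correct, int total) {
                this.correct = correct;
                this.total = total;
            }
        }

        // Accuracy for one item: correct elements divided by total elements.
        static double itemAccuracy(ElementCounts c) {
            return (double) c.correct / c.total;
        }

        // Overall figure: per-item accuracies averaged over all items
        // (89 items in the evaluation described in the notes).
        static double averageAccuracy(ElementCounts[] items) {
            double sum = 0.0;
            for (int i = 0; i < items.length; i++) {
                sum += itemAccuracy(items[i]);
            }
            return sum / items.length;
        }

        public static void main(String[] args) {
            ElementCounts[] items = {
                new ElementCounts(8, 10),   // 80% for this item
                new ElementCounts(9, 10),   // 90%
                new ElementCounts(7, 10)    // 70%
            };
            System.out.println(averageAccuracy(items));  // about 0.8
        }
    }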
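A hedged sketch of the "basic call" from Part II, bullet 1. The notes name only Surrogate, getLinkedText() and getRefList(); the package, constructor signature, return types and the stub class below are assumptions added so the sketch compiles on its own. The real class ships in the project's jar files.

    // Usage sketch only; the stub Surrogate stands in for the real class
    // from the reference linking jars, and its details are assumptions.
    public class ReferenceLinkingSketch {

        // Stand-in for the real Surrogate class.
        static class Surrogate {
            private final String url;

            // Per the notes: the constructor fetches the URL and partially
            // analyzes the item before returning.
            Surrogate(String url) {
                this.url = url;
                // ... fetching and partial analysis would happen here ...
            }

            // Item text with its references turned into live links
            // (assumed to be returned as a String).
            String getLinkedText() { return "<linked text for " + url + ">"; }

            // XML encapsulating what the deciter parsed out of the
            // reference strings (again assumed to be a String here).
            String getRefList() { return "<reflist/>"; }
        }

        public static void main(String[] args) {
            String url = "http://example.org/some-online-paper";  // hypothetical item URL

            // The basic call: analyze the item located at the specified URL.
            Surrogate s = new Surrogate(url);

            // The notes say the Surrogate is null on failure, with messages on
            // System.err; a plain Java constructor cannot itself return null,
            // so the real code presumably wraps construction somewhere.  The
            // check is kept here to match the notes.
            if (s == null) {
                System.err.println("Analysis failed; see earlier messages.");
                return;
            }

            System.out.println(s.getLinkedText());
            System.out.println(s.getRefList());
        }
    }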
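Part II, bullet 2 mentions one Java file that holds the configuration (directories and filenames) and must be edited per machine. The file's real name, fields and values are not given in these notes; the sketch below is purely hypothetical and only illustrates the kind of constants that would be edited for a new installation.

    // Hypothetical configuration sketch; the actual file in the software
    // has its own name and fields that are not shown in these notes.
    public class ConfigurationSketch {

        // Directory where fetched items and intermediate files are written.
        public static final String WORK_DIR = "/usr/local/reflink/work";

        // Location of the jar files (Soton Harvester, XML parser, JTidy,
        // XSLT processor).
        public static final String LIB_DIR = "/usr/local/reflink/lib";

        // Example of a filename the software would need at run time.
        public static final String LOG_FILE = WORK_DIR + "/reflink.log";
    }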