STARTS Reference Implementation - Release 1.1 
Extended Attribute Set Support
Introduction
Description of Changes
Issues Not Addressed
Demonstration
Direct questions to help@ncstrl.org
Introduction:
This version of the STARTS reference implementation was inspired by the
desire to support the Dublin
Core attribute set in addition to the basic-1 attribute set that was
developed with the STARTS protocol. While conceiving of the changes required to allow the
introduction of the Dublin Core attribute set to the existing STARTS reference
implementation, we began to consider an attribute set as an abstraction in itself.
Influenced by the proposed container architecture described in the Warwick Framework, we now regard an attribute set as a first-class object that has a relationship
with a STARTS source, but is not part of the source itself. Consistent with this
perspective, we created STARTS 1.1, which is a first step in making the reference
implementation compliant with the evolving notion of metadata packages in a container
architecture. This version of the reference implementation achieves the following specific
goals:
- Attribute sets are first-class objects that can be linked to one or more
STARTS sources.
- STARTS sources can support one or more attribute sets
- STARTS queries can handle explicit attribute set qualifiers on fields
- STARTS results can report fields from one or more attribute sets
Description of Changes:
The STARTS 1.0 protocol permits the use of attribute sets other than
basic-1. The reference implementation provided for this via variables and methods in the
source classes, most notably through static arrays that define attribute-set/field pairs.
This embedding of attribute-set-related data and methods in STARTS source classes did not
recognize the attribute set as a significant entity. Also, while the source classes
allowed for the definition of fields from different attribute sets, other parts of the
system could not support the use of alternate attribute sets in STARTS queries. For
instance, the parser grammar (see classes YYparse and YYlex) was unable to accommodate
queries that specified an attribute set other than basic-1. In particular, the lexer
defined a token called "BASIC1-FIELD" and scanned for each of the basic-1
fields.
To rectify these problems and achieve the goals stated in the
introduction, the following changes to STARTS were made:
1. Make the parser more generic, and not tied to specific
attribute-set fields.
The lexical analyzer and parser grammar were changed to identify an
attribute-set and a field, but not particular values for these two entities. It is
preferable to have a generic parser, one that can detect syntactical patterns without
pre-defining too much. Again, attribute-set definitions are more appropriate in
attribute-set objects, not in parsing machinery.
Problems:
- The explicit "BASIC1-FIELD" token required that the parser be
re-built every time a new attribute set or attribute-set field was added. Any new fields
for an existing attribute set would have to be added to the lexical specification file. If
a new attribute set were to be added (for example a Dublin Core attribute set), new tokens
would have to be defined in both the lexical and parser specification files, and
individual fields would have to be added. Also, the current grammar was not poised to
handle the same field name existing in multiple attribute sets.
- The definition of specific attribute-set/field combinations in the
parser resulted in some fairly brusque parser errors when an invalid field was entered in
a query.
Changes:
- Removed explicit fields from the lexer specification file. The parser now
identifies a field without scanning for particular values of the field (e.g.
"author").
- Eliminated the "BASIC1-FIELD" token from the parser grammar. The parser
now identifies a field by reducing unquoted strings (token UNQUOTED_STRING) to either:
- field_spec
- [attribute_set field_spec]
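The effect of this change can be sketched in plain Java. This is a hand-written illustration, not the generated YYparse/YYlex code: the class name FieldSpecSketch, the regular expressions, and the choice to fall back to basic-1 for an unqualified field are all assumptions made for the example. The point is only that a field is recognized by its shape, never by comparison against a fixed list of names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: recognizes a field by shape only, not by a list of known names. */
final class FieldSpecSketch {
    // Assumption for this example: an unqualified field falls back to basic-1.
    static final String DEFAULT_ATTR_SET = "basic-1";

    // "[attribute_set field_spec]": two bracket-free tokens inside square brackets.
    private static final Pattern QUALIFIED =
        Pattern.compile("\\[\\s*([^\\s\\[\\]]+)\\s+([^\\s\\[\\]]+)\\s*\\]");
    // Bare "field_spec": one unquoted token with no brackets or whitespace.
    private static final Pattern BARE = Pattern.compile("[^\\s\\[\\]\"]+");

    /** Returns {attributeSet, field}; throws if the text has neither shape. */
    static String[] parse(String text) {
        String t = text.trim();
        Matcher m = QUALIFIED.matcher(t);
        if (m.matches()) {
            return new String[] { m.group(1), m.group(2) };
        }
        if (BARE.matcher(t).matches()) {
            return new String[] { DEFAULT_ATTR_SET, t };
        }
        throw new IllegalArgumentException("Not a field specification: " + text);
    }
}
```

Whether "author" or "IDENTIFIER" is a valid field is deliberately not decided here; that question belongs to the attribute-set objects.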
2. Make attribute sets first-class objects with their own
variables and methods.
An attribute set is now a first-class object. Each attribute set is
defined as a sub-class of the abstract class named AttrSet. With this new
abstraction, objects can be created that can carry around information about each attribute
set in general (static data), as well as attribute-set information in the context of a
specific source. Using attribute-set objects, we can discover what fields are part of the
attribute set, and whether these fields are supported in the context of a particular
source. Also, we can obtain WAIS translations of attribute-set fields in the context of a
particular source. The previous STARTS implementation embedded attribute set information
and behaviors inside source objects. This limited the flexibility and extensibility of
both existing and future attribute sets. The new design should allow the system to better
handle the integration of additional attribute sets. Also, it positions STARTS to evolve
with metadata developments such as the Warwick Framework's container architecture, which supports multiple metadata "packages" for
a digital object.
Changes to Java Classes:
Created classes AttrSet (abstract), with sub-classes
AttrSetBasic1 and AttrSetDcore1:
- Class variable AttrSetFields - a 2D array of arrays
containing all valid fields for the attribute set
- Variable attrSetFieldsTable - a hashtable containing all
valid fields for the attribute set, loaded from the static array AttrSetFields.
- Variable sourceFields - a hashtable containing the
attribute-set fields supported by a source in the context of a query object.
- Variable fieldsXlate - a hashtable containing the
attribute-set fields and their WAIS translations for all fields supported by a source in
the context of a query object.
- Constructor method - receives a source-specific field array, with
source-specific field translations.
- Method attrSetFieldSupported_p - to determine whether a
field is part of the attribute set
- Method FieldSupportBySource - to determine whether a
particular field is supported by a particular source in the context of a query object.
- Method TranslateField to translate an attribute-set field
specification to a WAIS field specification in the context of a query object.
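The shape of these classes can be sketched roughly as follows. This is a simplified illustration, not the shipped code: the "Sketch" class names, the constructor signature, the three-field basic-1 list, and the "en" language value in the usage note are assumptions, and the methods are implemented in the base class for brevity even though the release describes them as implemented in the sub-classes.

```java
import java.util.Hashtable;

/** Simplified sketch of the attribute-set abstraction; signatures are assumptions. */
abstract class AttrSetSketch {
    private final String name;
    // Every valid field of the attribute set.
    protected final Hashtable<String, Boolean> attrSetFieldsTable = new Hashtable<>();
    // Fields of this attribute set that the owning source supports.
    protected final Hashtable<String, Boolean> sourceFields = new Hashtable<>();
    // Supported field -> WAIS translation.
    protected final Hashtable<String, String> fieldsXlate = new Hashtable<>();

    /**
     * allFields: every field defined by the attribute set.
     * sourceSpecific: rows of {attribute-set, field, field translation, languages supported}.
     */
    protected AttrSetSketch(String name, String[] allFields, String[][] sourceSpecific) {
        this.name = name;
        for (String f : allFields) {
            attrSetFieldsTable.put(f, Boolean.TRUE);
        }
        for (String[] row : sourceSpecific) {
            // Keep only rows for this attribute set that name a valid field.
            if (row[0].equals(name) && attrSetFieldsTable.containsKey(row[1])) {
                sourceFields.put(row[1], Boolean.TRUE);
                fieldsXlate.put(row[1], row[2]);
            }
        }
    }

    String getName() { return name; }

    /** Is the field part of the attribute set at all? */
    boolean attrSetFieldSupported_p(String field) { return attrSetFieldsTable.containsKey(field); }

    /** Is the field supported by the source this object was created for? */
    boolean FieldSupportBySource(String field) { return sourceFields.containsKey(field); }

    /** WAIS translation of the field, or null if the source does not support it. */
    String TranslateField(String field) { return fieldsXlate.get(field); }
}

/** Illustrative subset of basic-1 only; the real field list is longer. */
final class AttrSetBasic1Sketch extends AttrSetSketch {
    static final String[] FIELDS = { "title", "author", "linkage" };

    AttrSetBasic1Sketch(String[][] sourceSpecific) {
        super("basic-1", FIELDS, sourceSpecific);
    }
}
```

Constructed with the rows {"basic-1", "author", "au", "en"} and {"basic-1", "title", "ti", "en"}, such an object reports linkage as part of basic-1 but not supported by the source, and translates author to "au".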
Class CSTRSourceDescription and Class
LINUXSourceDescription:
- Eliminated the fieldsSupported static array variable which
listed all possible fields in an attribute set and flagged those fields that were
supported by the source. The full list of attribute-set fields pertains to the attribute
set, not the source, so this information was moved to the sub-classes of AttrSet.
- Modified the format of the fieldsTranslation static array
so that it defines all things about fields that are source-specific. Only those fields
that are supported by the source should have entries in this array. This array is passed
to the constructor method as part of the initialization of new attribute set objects. Each
instance of an attribute set will carry around this source-specific information in the
context of a query. The new format is a 2D array-of-arrays: (<attribute-set>,
<field>, <field translation>, <languages supported>)
- The static hashtable variable attrSetTable was added to the
source classes. This is a static variable that is initialized when the source class is
loaded. The hashtable key is the attribute-set name, and the hashtable value is an
instantiation of the respective attribute set. So, a source object carries around an
attribute-set object for each attribute set supported by the source.
- The method TranslateField was commented out. It is now a
member of the attribute-set classes. Source-specific instances of attribute-set classes
contain a translation hashtable that is used by the new TranslateField
method.
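A source class under this scheme might be laid out as in the sketch below. The class names SourceSketch and SourceAttrSet, the example rows, and the "en" language values are assumptions for illustration, not the contents of CSTRSourceDescription or LINUXSourceDescription. Deriving the set of supported attribute sets from the rows of the array is likewise a simplification; the real classes may instantiate each attribute set explicitly.

```java
import java.util.Hashtable;

/** Minimal stand-in for an attribute-set object (hypothetical). */
final class SourceAttrSet {
    private final Hashtable<String, String> fieldsXlate = new Hashtable<>();

    SourceAttrSet(String attrSetName, String[][] fieldsTranslation) {
        for (String[] row : fieldsTranslation) {
            if (row[0].equals(attrSetName)) {
                fieldsXlate.put(row[1], row[2]);
            }
        }
    }

    /** WAIS translation of the field for this source, or null if unsupported. */
    String TranslateField(String field) { return fieldsXlate.get(field); }
}

/** Sketch of a source class; the rows are example values only. */
final class SourceSketch {
    // {attribute-set, field, field translation, languages supported}
    static final String[][] fieldsTranslation = {
        { "basic-1", "author",     "au", "en" },
        { "basic-1", "title",      "ti", "en" },
        { "dcore-1", "IDENTIFIER", "id", "en" },
    };

    // Key: attribute-set name. Value: attribute-set object built for this source.
    static final Hashtable<String, SourceAttrSet> attrSetTable = new Hashtable<>();
    static {
        // Runs once, when the class is loaded.
        for (String[] row : fieldsTranslation) {
            if (!attrSetTable.containsKey(row[0])) {
                attrSetTable.put(row[0], new SourceAttrSet(row[0], fieldsTranslation));
            }
        }
    }

    static boolean AttrSetSupported(String name) { return attrSetTable.containsKey(name); }

    static SourceAttrSet GetAttrSet(String name) { return attrSetTable.get(name); }
}
```

With this layout, translating a field becomes a lookup through the attribute-set object, for example SourceSketch.GetAttrSet("dcore-1").TranslateField("IDENTIFIER").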
Class WAISSourceDescription:
- The invocation of the TranslateField method was modified to
reflect the method's new membership (in the AttrSet class and its sub-classes). The TranslateField
method must be called in the proper context, namely in the context of a particular
attribute set, in the context of a particular source, in the context of a query. The new
method call to translate a field is:
String translatedField = query.source.GetAttrSet(attrSet).TranslateField(field);
Class SourceDescription
- Added the instance variable attrSetsTable that is the
hashtable that contains attribute set objects for each attribute set supported by the
source.
- Added method AttrSetSupported to determine whether an
attribute set is supported by the source. This method does a lookup in the attrSetsTable.
- Added method GetAttrSet to return a specific attribute set
object that was instantiated in the context of the source.
- Commented out abstract method TranslateField that is now an
abstract method of the AttrSet class and implemented in the specific
attribute-set sub-classes.
- Modified method SourceToSOIF to include a new SOIF
attribute named AttributeSetsSupported that lists all attribute sets
supported by the source. When source metadata is requested, the list of supported
attribute sets will now be included in the SOIF object template
"SMetaAttributes."
- Commented out method FieldSupported_p that determined
whether a specific attribute-set/field pair was supported by the source. This method has
been replaced by the method FieldSupportBySource, which is an abstract method
in the class AttrSet and implemented in the specific attribute-set
sub-classes.
Class Field
- Added method AttrSetSupported_p to determine whether the
specified attribute set is supported by the source. (This invokes the method AttrSetSupported
in the context of the source and query.)
- Added method FieldinAttrSet to determine whether the field
is supported by the specified attribute set. Depending on what attribute-set/field
combination is being evaluated, the method must be called in the context of the
appropriate attribute-set object in the context of the source and the query.
- Modified method Supported_p to invoke the new attribute-set
method FieldSupportBySource that determines whether the field is supported
by the source in the context of the attribute-set object, source and query.
public boolean Supported_p() {
    return query.source.GetAttrSet(GetAttributeSet()).FieldSupportBySource(GetFieldName());
}
- Modified method Check to do three validations for the query
input, using the above three methods:
- Is the attribute set supported by the source? (AttrSetSupported_p)
- Is the field part of the specified attribute set? (FieldinAttrSet)
- Is the attribute-set field supported by the source in the context of
the query? (Supported_p)
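The order of the three validations can be sketched as below. This is an illustrative stand-alone class, not the Field.Check method itself: FieldCheckSketch, its Result enum, and the example data are assumptions. Returning a distinct result for each failed step is one way to report precisely why a field was rejected.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the three-step field validation. */
final class FieldCheckSketch {
    enum Result { OK, ATTR_SET_NOT_SUPPORTED, FIELD_NOT_IN_ATTR_SET, FIELD_NOT_SUPPORTED_BY_SOURCE }

    // Attribute set -> every field it defines.
    private final Map<String, Set<String>> allFields;
    // Attribute set -> fields the source supports (only supported sets have an entry).
    private final Map<String, Set<String>> sourceFields;

    FieldCheckSketch(Map<String, Set<String>> allFields, Map<String, Set<String>> sourceFields) {
        this.allFields = allFields;
        this.sourceFields = sourceFields;
    }

    Result check(String attrSet, String field) {
        // 1. Is the attribute set supported by the source?
        Set<String> supported = sourceFields.get(attrSet);
        if (supported == null) return Result.ATTR_SET_NOT_SUPPORTED;
        // 2. Is the field part of the specified attribute set?
        Set<String> defined = allFields.get(attrSet);
        if (defined == null || !defined.contains(field)) return Result.FIELD_NOT_IN_ATTR_SET;
        // 3. Is the attribute-set field supported by the source?
        if (!supported.contains(field)) return Result.FIELD_NOT_SUPPORTED_BY_SOURCE;
        return Result.OK;
    }

    /** Example data only: a source that supports part of basic-1 and one dcore-1 field. */
    static FieldCheckSketch example() {
        Map<String, Set<String>> all = new HashMap<>();
        all.put("basic-1", new HashSet<>(Arrays.asList("title", "author", "linkage")));
        all.put("dcore-1", new HashSet<>(Arrays.asList("IDENTIFIER")));
        Map<String, Set<String>> src = new HashMap<>();
        src.put("basic-1", new HashSet<>(Arrays.asList("title", "author")));
        src.put("dcore-1", new HashSet<>(Arrays.asList("IDENTIFIER")));
        return new FieldCheckSketch(all, src);
    }
}
```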
Class Document
- Modified method DocumentToSOIF to incorporate the
attribute-set name in the SOIF attribute "FIELD". Previously the SQRDocument
object (i.e., query result document) did not qualify a reported field with its
attribute-set name. With the extended attribute-set support in STARTS 1.1, any
attribute-set/field combination can be specified as a desired "answer field" via
the query input or by setting the default variable answerFields in Class ServerConfiguration.
The query result document will now report FIELDS with their attribute-set qualifiers, for
example:
@SQRDocument{
Version{10}: STARTS 1.1
[basic-1 author]{12}: Lagoze, Carl
[basic-1 title]{99}: dkfjkdjfkdjfkjdjfdkjfkdjkfjdkjfdkjfk
[dcore-1 IDENTIFIER]{99}: CORNELLCS:jfkdjfkdjkfjdjfkdjfkdj
In this example we see that the requested "answer fields"
are author and title from the basic-1 attribute set, and IDENTIFIER from the dcore-1
(Dublin Core) attribute set. Although the mixing of fields from different attribute sets
in the "answer field" specification may not seem practical in the example above,
we can envision cases where this would be practical. For instance, a source may have been
created with MARC records, but the mapping to another attribute set has been enabled (such
as a mapping to Dublin Core). If the user opts to primarily "speak" Dublin Core
to the system, but there are MARC fields that do not map to the Dublin Core attribute set,
then the ability to mix and match attribute-set/field combinations, as in the above
example, becomes useful. Essentially, this provides the ability to operate with a
preferred attribute set across multiple sources, even if the underlying documents/document
surrogates were created using another attribute-set template.
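The qualified field lines in the SQRDocument example above could be produced along the following lines. This is a sketch only, not the DocumentToSOIF method: the class and method names are hypothetical, and it assumes, consistent with the Version{10} line, that the number in braces is the length of the value, counted here in UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that formats one attribute-set-qualified result field. */
final class SoifFieldSketch {
    /** Builds a line of the form "[attrSet field]{size}: value". */
    static String qualifiedLine(String attrSet, String field, String value) {
        // Assumption: the size in braces is the byte length of the value in UTF-8.
        int size = value.getBytes(StandardCharsets.UTF_8).length;
        return "[" + attrSet + " " + field + "]{" + size + "}: " + value;
    }
}
```

For the author field in the example, qualifiedLine("basic-1", "author", "Lagoze, Carl") yields the line "[basic-1 author]{12}: Lagoze, Carl".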
Class CSTRDocument and Class LINUXDocument
- Changed the static hashtable transTable to map WAIS fields
to actual document tags, instead of attribute-set fields to document tags. The mapping
from attribute-set fields to document tags was problematic because it assumed the
implementation of only one attribute set, namely basic-1. Upon enhancing the system to
incorporate additional attribute sets such as Dublin Core (dcore-1), this translation
table became quirky. The first problem was that the original hashtable did not define the
attribute set as part of its key. Even if we created a concatenated key of attribute-set
and field, this mapping would still be awkward, since multiple attribute sets may define
the same field, and thus there is not a one-to-one mapping between a field and a document
tag. Instead, several attribute-set fields can map to one document tag: for example, "basic-1 linkage"
equates to "ID" and "dcore-1 IDENTIFIER" also equates to "ID".
Upon further consideration of this translation table, we concluded that a better design is to map WAIS
fields to document tags. Attribute-set/field combinations have already been translated to
WAIS fields by the time we get to the point of creating a WAIS result document.
Furthermore, one WAIS field will map to one document tag for a given source. Finally, this
makes sense because the source document classes (CSTRDocument and LINUXDocument) are
already sub-classes of WAISResultDocument. The new transTable correlates
WAIS fields to document tags, irrespective of attribute sets:
static Hashtable transTable = new Hashtable();
static {
    transTable.put("ti", "TITLE");
    transTable.put("au", "AUTHOR");
    transTable.put("dm", "ENTRY");
    transTable.put("id", "ID");
    transTable.put("bd", "BODY");
}
The method GetFieldValueFromDB uses this hashtable.
Changes were made to Class WAISResultDocument to ensure that a WAIS field is
passed to this method.
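The resulting two-step lookup, from attribute-set field to WAIS field and then from WAIS field to document tag, can be sketched as follows. TagLookupSketch and its documentTag method are hypothetical, and the assumption that both "basic-1 linkage" and "dcore-1 IDENTIFIER" translate to the WAIS field "id" is inferred from the discussion above rather than taken from the source tables.

```java
import java.util.Hashtable;

/** Hypothetical sketch of the two-step translation used when building a result document. */
final class TagLookupSketch {
    // "attribute-set field" -> WAIS field (example rows; the "id" mappings are assumed).
    static final Hashtable<String, String> fieldsXlate = new Hashtable<>();
    // WAIS field -> document tag, independent of any attribute set.
    static final Hashtable<String, String> transTable = new Hashtable<>();
    static {
        fieldsXlate.put("basic-1 linkage", "id");
        fieldsXlate.put("dcore-1 IDENTIFIER", "id");
        fieldsXlate.put("basic-1 author", "au");
        transTable.put("id", "ID");
        transTable.put("au", "AUTHOR");
    }

    /** Returns the document tag for an attribute-set field, or null if there is none. */
    static String documentTag(String attrSet, String field) {
        // Step 1: attribute-set field -> WAIS field.
        String waisField = fieldsXlate.get(attrSet + " " + field);
        if (waisField == null) {
            return null;
        }
        // Step 2: WAIS field -> document tag.
        return transTable.get(waisField);
    }
}
```

Two attribute-set fields that share a WAIS field reach the same tag through a single transTable entry, so the second table needs no knowledge of attribute sets.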
Class WAISResultDocument
- Attribute-set fields are now converted to WAIS fields prior to invoking
the method GetFieldValueFromDB in the context of a source result document.
(See discussion above in description of Classes CSTRDocument and LINUXDocument
changes.)
Issues Not Addressed:
- Modifiers were not made attribute-set specific. They currently default to
basic-1 modifiers. More thought is required on the meaning and functionality of
attribute-set specific modifiers.
- Multiple "meta" attribute sets for describing a source
were not implemented. Currently, there is just one such "meta" set, namely
mbasic-1. (See MetaAttributeSet in the SOIF object SMetaAttributes.)
- The STARTS query syntax was not modified to support Dublin Core scheme
qualifiers for fields. See section 5.0 of the Dublin
Core Metadata Report for details on the use of schemes.