STARTS 1.1: changes for adding Attribute Sets

STARTS Reference Implementation - Release 1.1

Extended Attribute Set Support

Introduction:

This version of the STARTS reference implementation was motivated by the desire to support the Dublin Core attribute set in addition to the basic-1 attribute set that was developed with the STARTS protocol. While planning the changes needed to introduce the Dublin Core attribute set into the existing STARTS reference implementation, we began to consider an attribute set as an abstraction in itself. Influenced by the proposed container architecture described in the Warwick Framework, we now regard an attribute set as a first-class object that has a relationship with a STARTS source, but is not part of the source itself. Consistent with this perspective, we created STARTS 1.1, which is a first step in making the reference implementation compliant with the evolving notion of metadata packages in a container architecture. This version of the reference implementation achieves the following specific goals:

Attribute sets are first-class objects that can be linked to one or more STARTS sources.
STARTS sources can support one or more attribute sets.
STARTS queries can handle explicit attribute-set qualifiers on fields.
STARTS results can report fields from one or more attribute sets.

Description of Changes:

The STARTS 1.0 protocol permits the use of attribute sets other than basic-1. The reference implementation provided for this via variables and methods in the source classes, most notably through static arrays that define attribute-set/field pairs. This embedding of attribute-set-related data and methods in STARTS source classes did not recognize the attribute set as a significant entity. Also, while the source classes allowed for the definition of fields from different attribute sets, other parts of the system could not support the use of alternate attribute sets in STARTS queries. For instance, the parser grammar (see classes YYparse and YYlex) was unable to accommodate queries that specified an attribute set other than basic-1. In particular, the lexer defined a token called "BASIC1-FIELD" and scanned for each of the basic-1 fields.

To rectify these problems and achieve the goals stated in the introduction, the following changes to STARTS were made:

1. Make the parser more generic, and not tied to specific attribute-set fields.

The lexical analyzer and parser grammar were changed to identify an attribute set and a field, but not particular values for these two entities. It is preferable to have a generic parser, one that can detect syntactic patterns without pre-defining too much. Again, attribute-set definitions belong in attribute-set objects, not in the parsing machinery.

Problems:

The explicit "BASIC1-FIELD" token required that the parser be rebuilt every time a new attribute set or attribute-set field was added. Any new fields for an existing attribute set would have to be added to the lexical specification file. If a new attribute set were to be added (for example, a Dublin Core attribute set), new tokens would have to be defined in both the lexical and parser specification files, and individual fields would have to be added. Also, the previous grammar could not handle the same field name existing in multiple attribute sets.
The definition of specific attribute-set/field combinations in the parser resulted in some fairly brusque parser errors when an invalid field was entered in a query.

Changes:

Removed explicit fields from the lexer specification file. The parser now identifies a field without scanning for particular values of the field (e.g. "author").

Eliminated the "BASIC1-FIELD" token from the parser grammar. The parser now identifies a field by reducing unquoted strings (token UNQUOTED_STRING) to either:

- field_spec

- [attribute_set field_spec]
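
The two forms above can be illustrated outside the generated parser. The following is a minimal sketch, not the YYparse/YYlex code: the class name FieldSpec, its parse method, and the rule that an unqualified field is treated as a basic-1 field are assumptions made for illustration only. As in the revised grammar, the sketch recognizes the shape of a field specification without knowing any particular attribute-set or field names.

```java
// Hypothetical illustration; not the generated YYparse/YYlex code.
final class FieldSpec {
    // Assumption: an unqualified field is treated as a basic-1 field.
    static final String DEFAULT_ATTR_SET = "basic-1";

    final String attributeSet;
    final String fieldName;

    private FieldSpec(String attributeSet, String fieldName) {
        this.attributeSet = attributeSet;
        this.fieldName = fieldName;
    }

    /** Accepts "field" or "[attribute_set field]"; the values themselves are not checked. */
    static FieldSpec parse(String text) {
        String s = text.trim();
        if (s.startsWith("[") && s.endsWith("]")) {
            // Qualified form: exactly two whitespace-separated names inside the brackets.
            String[] parts = s.substring(1, s.length() - 1).trim().split("\\s+");
            if (parts.length != 2 || parts[0].isEmpty()) {
                throw new IllegalArgumentException("expected [attribute_set field]: " + text);
            }
            return new FieldSpec(parts[0], parts[1]);
        }
        // Unqualified form: a single name with no whitespace or brackets.
        if (s.isEmpty() || s.matches(".*[\\s\\[\\]].*")) {
            throw new IllegalArgumentException("expected a single unquoted field: " + text);
        }
        return new FieldSpec(DEFAULT_ATTR_SET, s);
    }
}
```

Whether the attribute set and field actually exist is deliberately left to later validation against attribute-set objects, which is the division of labor this change aims for.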

2. Make attribute sets first-class objects with their own variables and methods.

An attribute set is now a first-class object. Each attribute set is defined as a sub-class of the abstract class named AttrSet. With this new abstraction, objects can be created that carry information about each attribute set in general (static data), as well as attribute-set information in the context of a specific source. Using attribute-set objects, we can discover what fields are part of the attribute set, and whether these fields are supported in the context of a particular source. Also, we can obtain WAIS translations of attribute-set fields in the context of a particular source. The previous STARTS implementation embedded attribute-set information and behaviors inside source objects. This limited the flexibility and extensibility of both existing and future attribute sets. The new design should allow the system to better handle the integration of additional attribute sets. Also, it positions STARTS to evolve with metadata developments such as the Warwick Framework's container architecture, which supports multiple metadata "packages" for a digital object.

Changes to Java Classes:

New Class AttrSet
New Class AttrSetBasic1
New Class AttrSetDcore1
Class CSTRSourceDescription
Class LINUXSourceDescription
Class WAISSourceDescription
Class SourceDescription
Class Field
Class Document
Class CSTRDocument
Class LINUXDocument
Class WAISResultDocument

Created class AttrSet (abstract), with sub-classes AttrSetBasic1 and AttrSetDcore1:

Class variable AttrSetFields - a 2D array of arrays containing all valid fields for the attribute set
Variable attrSetFieldsTable - a hashtable containing all valid fields for the attribute set, loaded from the static array AttrSetFields.
Variable sourceFields - a hashtable containing the attribute-set fields supported by a source in the context of a query object.
Variable fieldsXlate - a hashtable containing the attribute-set fields and their WAIS translations for all fields supported by a source in the context of a query object.
Constructor method - receives a source-specific field array, with source-specific field translations.
Method attrSetFieldSupported_p - to determine whether a field is part of the attribute set
Method FieldSupportBySource - to determine whether a particular field is supported by a particular source in the context of a query object.
Method TranslateField - to translate an attribute-set field specification to a WAIS field specification in the context of a query object.
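
The members listed above might fit together roughly as follows. This is a simplified sketch and not the released source: the method bodies are assumed, the field lists are abbreviated to fields mentioned in this document, AttrSetFields is flattened to a one-dimensional array, TranslateField is implemented once in the base class for brevity, and the sample translations and language codes used with it are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: member names follow the list above; bodies are assumed.
abstract class AttrSet {
    private final String name;
    // All valid fields for the attribute set.
    protected final Set<String> attrSetFieldsTable = new HashSet<>();
    // Fields of this attribute set that the source supports.
    protected final Set<String> sourceFields = new HashSet<>();
    // Supported field -> WAIS translation for the source.
    protected final Map<String, String> fieldsXlate = new HashMap<>();

    /**
     * @param allFields  every field defined by the attribute set
     * @param sourceRows source-specific rows of
     *                   (attribute-set, field, field translation, languages supported)
     */
    protected AttrSet(String name, String[] allFields, String[][] sourceRows) {
        this.name = name;
        for (String field : allFields) {
            attrSetFieldsTable.add(field);
        }
        for (String[] row : sourceRows) {
            // Keep only rows for this attribute set that name a valid field.
            if (row[0].equals(name) && attrSetFieldsTable.contains(row[1])) {
                sourceFields.add(row[1]);
                fieldsXlate.put(row[1], row[2]);
            }
        }
    }

    String getName() { return name; }

    boolean attrSetFieldSupported_p(String field) { return attrSetFieldsTable.contains(field); }

    boolean FieldSupportBySource(String field) { return sourceFields.contains(field); }

    /** Returns the WAIS field, or null when the source does not support the field. */
    String TranslateField(String field) { return fieldsXlate.get(field); }
}

class AttrSetBasic1 extends AttrSet {
    static final String[] AttrSetFields = {"title", "author", "linkage"}; // abbreviated
    AttrSetBasic1(String[][] sourceRows) { super("basic-1", AttrSetFields, sourceRows); }
}

class AttrSetDcore1 extends AttrSet {
    static final String[] AttrSetFields = {"IDENTIFIER"}; // abbreviated
    AttrSetDcore1(String[][] sourceRows) { super("dcore-1", AttrSetFields, sourceRows); }
}
```

The point of the sketch is the separation of concerns: the list of valid fields lives with the attribute set, while support flags and translations arrive from the source through the constructor.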

Class CSTRSourceDescription and Class LINUXSourceDescription:

Eliminated the fieldsSupported static array variable which listed all possible fields in an attribute set and flagged those fields that were supported by the source. The full list of attribute-set fields pertains to the attribute set, not the source, so this information was moved to the sub-classes of AttrSet.
Modified the format of the fieldsTranslation static array so that it defines everything about fields that is source-specific. Only those fields that are supported by the source should have entries in this array. This array is passed to the constructor method as part of the initialization of new attribute-set objects. Each instance of an attribute set will carry this source-specific information in the context of a query. The new format is a 2D array-of-arrays: (<attribute-set>, <field>, <field translation>, <languages supported>)
The static hashtable variable attrSetTable was added to the source classes. This is a static variable that is initialized when the source class is loaded. The hashtable key is the attribute-set name, and the hashtable value is an instantiation of the respective attribute set. So, a source object carries around an attribute-set object for each attribute set supported by the source.
The method TranslateField was commented out. It is now a member of the attribute-set classes. Source-specific instances of attribute-set classes contain a translation hashtable that is used by the new TranslateField method.
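
To make the new array format and the static table concrete, here is a minimal sketch under stated assumptions. The class SourceSketch is a hypothetical stand-in for a source class, the rows are illustrative rather than copied from the released code, and a plain field-to-translation map stands in for each attribute-set object so that the example is self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a source class such as CSTRSourceDescription.
class SourceSketch {
    // (<attribute-set>, <field>, <field translation>, <languages supported>)
    // Illustrative rows; only fields the source supports are listed.
    static final String[][] fieldsTranslation = {
        {"basic-1", "title",      "ti", "en"},
        {"basic-1", "author",     "au", "en"},
        {"basic-1", "linkage",    "id", "en"},
        {"dcore-1", "IDENTIFIER", "id", "en"},
    };

    // Key: attribute-set name. Value: field -> translation for that set
    // (a plain map stands in for the attribute-set object in this sketch).
    static final Map<String, Map<String, String>> attrSetTable = new LinkedHashMap<>();

    // Runs once, when the class is loaded.
    static {
        for (String[] row : fieldsTranslation) {
            attrSetTable
                .computeIfAbsent(row[0], k -> new LinkedHashMap<>())
                .put(row[1], row[2]);
        }
    }
}
```

Because the table is keyed by attribute-set name, the same field name could appear under two attribute sets without collision, which the old flat array could not express.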

Class WAISSourceDescription:

The invocation of the TranslateField method was modified to reflect the method's new membership (in the AttrSet class and its sub-classes). The TranslateField method must be called in the proper context, namely in the context of a particular attribute set, in the context of a particular source, in the context of a query. The new method call to translate a field is:

String translatedField = query.source.GetAttrSet(attrSet).TranslateField(field);

Class SourceDescription:

Added the instance variable attrSetsTable that is the hashtable that contains attribute set objects for each attribute set supported by the source.
Added method AttrSetSupported to determine whether an attribute set is supported by the source. This method does a lookup in the attrSetsTable.
Added method GetAttrSet to return a specific attribute set object that was instantiated in the context of the source.
Commented out abstract method TranslateField that is now an abstract method of the AttrSet class and implemented in the specific attribute-set sub-classes.
Modified method SourceToSOIF to include a new SOIF attribute named AttributeSetsSupported that lists all attribute sets supported by the source. When source metadata is requested, the list of supported attribute sets will now be included in the SOIF object template "SMetaAttributes."
Commented out method FieldSupported_p that determined whether a specific attribute-set/field pair was supported by the source. This method has been replaced by the method FieldSupportBySource which is an abstract method in the class AttrSet and implemented in the specific attribute-set sub-classes.

Class Field:

Added method AttrSetSupported_p to determine whether the specified attribute set is supported by the source. (This invokes the method AttrSetSupported in the context of the source and query.)
Added method FieldinAttrSet to determine whether the field is supported by the specified attribute set. Depending on what attribute-set/field combination is being evaluated, the method must be called in the context of the appropriate attribute-set object in the context of the source and the query.
Modified method Supported_p to invoke the new attribute-set method FieldSupportBySource that determines whether the field is supported by the source in the context of the attribute-set object, source and query.

public boolean Supported_p() {
    return query.source.GetAttrSet(GetAttributeSet()).FieldSupportBySource(GetFieldName());
}

Modified method Check to do three validations for the query input, using the above three methods:

- Is the attribute set supported by the source? (AttrSetSupported_p)

- Is the field part of the specified attribute set? (FieldinAttrSet)

- Is the attribute-set field supported by the source in the context of the query? (Supported_p)
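
The order of the three validations can be sketched as follows. This is an assumed illustration, not the released Check method: the class FieldCheckSketch, its Result enum, and the sample data are hypothetical. The attribute-set test comes first because the later tests need an attribute-set object for the source, which presumably does not exist for an unsupported attribute set.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the three validations; data and names are assumed.
class FieldCheckSketch {
    enum Result { OK, ATTR_SET_UNSUPPORTED, FIELD_NOT_IN_ATTR_SET, FIELD_UNSUPPORTED_BY_SOURCE }

    // Fields defined by each attribute set (abbreviated).
    static final Map<String, Set<String>> DEFINED = new HashMap<>();
    // Fields the source supports, per supported attribute set.
    static final Map<String, Set<String>> SUPPORTED = new HashMap<>();

    static {
        DEFINED.put("basic-1", new HashSet<>(Arrays.asList("title", "author", "linkage")));
        DEFINED.put("dcore-1", new HashSet<>(Arrays.asList("IDENTIFIER")));
        // This sample source supports only part of basic-1 and no dcore-1.
        SUPPORTED.put("basic-1", new HashSet<>(Arrays.asList("title", "author")));
    }

    /** Runs the checks in order and reports the first one that fails. */
    static Result check(String attrSet, String field) {
        // 1. Is the attribute set supported by the source?
        if (!SUPPORTED.containsKey(attrSet)) {
            return Result.ATTR_SET_UNSUPPORTED;
        }
        // 2. Is the field part of the specified attribute set?
        if (!DEFINED.getOrDefault(attrSet, Collections.<String>emptySet()).contains(field)) {
            return Result.FIELD_NOT_IN_ATTR_SET;
        }
        // 3. Is the attribute-set field supported by the source?
        if (!SUPPORTED.get(attrSet).contains(field)) {
            return Result.FIELD_UNSUPPORTED_BY_SOURCE;
        }
        return Result.OK;
    }
}
```

Reporting which check failed, rather than a single yes/no answer, allows a more specific error message than the parser-level errors described earlier.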

Class Document:

Modified method DocumentToSOIF to incorporate the attribute-set name in the SOIF attribute "FIELD". Previously the SQRDocument object (i.e., query result document) did not qualify a reported field with its attribute-set name. With the extended attribute-set support in STARTS 1.1, any attribute-set/field combination can be specified as a desired "answer field" via the query input or by setting the default variable answerFields in Class ServerConfiguration. The query result document will now report FIELDS with their attribute-set qualifiers, for example:

@SQRDocument{

Version{10}: STARTS 1.1

[basic-1 author]{12}: Lagoze, Carl

[basic-1 title]{99}: dkfjkdjfkdjfkjdjfdkjfkdjkfjdkjfdkjfk

[dcore-1 IDENTIFIER]{99}: CORNELLCS:jfkdjfkdjkfjdjfkdjfkdj

In this example we see that the requested "answer fields" are author and title from the basic-1 attribute set, and IDENTIFIER from the dcore-1 (Dublin Core) attribute set. Although the mixing of fields from different attribute sets in the "answer field" specification may not seem practical in the example above, we can envision cases where it would be. For instance, a source may have been created with MARC records, but a mapping to another attribute set (such as Dublin Core) has been enabled. If the user opts to primarily "speak" Dublin Core to the system, but there are MARC fields that do not map to the Dublin Core attribute set, then the ability to mix and match attribute-set/field combinations, as in the above example, becomes useful. Essentially, this provides the ability to operate with a preferred attribute set across multiple sources, even if the underlying documents or document surrogates were created using another attribute-set template.
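
A qualified result line of the kind shown in the example could be produced along these lines. This is a sketch and not the DocumentToSOIF method: the class and method names are hypothetical, the ": " separator simply mirrors the example above, and counting the value size in UTF-8 bytes is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not the DocumentToSOIF method itself.
class SoifFieldSketch {
    /** Builds "[attribute-set field]{size}: value", where size is the value's byte length. */
    static String qualifiedLine(String attrSet, String field, String value) {
        // Assumption: the size counts the bytes of the value in UTF-8.
        int size = value.getBytes(StandardCharsets.UTF_8).length;
        return "[" + attrSet + " " + field + "]{" + size + "}: " + value;
    }
}
```

For instance, qualifiedLine("basic-1", "author", "Lagoze, Carl") yields the author line of the example, since the value is 12 bytes long.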

Class CSTRDocument and Class LINUXDocument:

Changed the static hashtable transTable to map WAIS fields to actual document tags, instead of attribute-set fields to document tags. The mapping from attribute-set fields to document tags was problematic because it assumed the implementation of only one attribute set, namely basic-1. Once the system was enhanced to incorporate additional attribute sets such as Dublin Core (dcore-1), this translation table became awkward. The first problem was that the original hashtable did not include the attribute set in its key. Even with a concatenated key of attribute set and field, the mapping would remain awkward: multiple attribute sets may define equivalent fields, so there is no one-to-one mapping between a field and a document tag. Instead, the relationship is many-to-one; for example, "basic-1 linkage" equates to "ID" and "dcore-1 IDENTIFIER" also equates to "ID". On further consideration, we concluded that a better design is to map WAIS fields to document tags. Attribute-set/field combinations have already been translated to WAIS fields by the time a WAIS result document is created. Furthermore, one WAIS field maps to one document tag for a given source. Finally, this makes sense because the source document classes (CSTRDocument and LINUXDocument) are already sub-classes of WAISResultDocument. The new transTable correlates WAIS fields to document tags, irrespective of attribute sets:

static Hashtable transTable = new Hashtable();

static {
    transTable.put("ti", "TITLE");
    transTable.put("au", "AUTHOR");
    transTable.put("dm", "ENTRY");
    transTable.put("id", "ID");
    transTable.put("bd", "BODY");
}

The method GetFieldValueFromDB uses this hashtable. Changes were made to Class WAISResultDocument to ensure that a WAIS field is passed to this method.

Class WAISResultDocument:

Attribute-set fields are now converted to WAIS fields prior to invoking the method GetFieldValueFromDB in the context of a source result document. (See discussion above in description of Classes CSTRDocument and LINUXDocument changes.)
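
The resulting two-step lookup can be sketched as below. This is an assumed illustration, not the WAISResultDocument code: the class name and the first table are hypothetical, and the first step is flattened into a single map here, whereas in the design described above it is performed by the attribute-set object of the source. The second table reuses entries from transTable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two translation steps; not the released code.
class TranslationChainSketch {
    // Step 1: "<attribute-set> <field>" -> WAIS field (assumed sample pairs).
    static final Map<String, String> toWais = new HashMap<>();
    // Step 2: WAIS field -> document tag (entries taken from transTable).
    static final Map<String, String> transTable = new HashMap<>();

    static {
        toWais.put("basic-1 author", "au");
        toWais.put("basic-1 linkage", "id");
        toWais.put("dcore-1 IDENTIFIER", "id");

        transTable.put("ti", "TITLE");
        transTable.put("au", "AUTHOR");
        transTable.put("id", "ID");
    }

    /** Returns the document tag for an attribute-set field, or null if it cannot be translated. */
    static String documentTag(String attrSet, String field) {
        String waisField = toWais.get(attrSet + " " + field);
        return waisField == null ? null : transTable.get(waisField);
    }
}
```

Both "basic-1 linkage" and "dcore-1 IDENTIFIER" resolve to the tag "ID" through the single WAIS field "id", so the second table stays one-to-one no matter how many attribute sets are added.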

Issues Not Addressed:

Modifiers were not made attribute-set specific. They currently default to basic-1 modifiers. More thought is required on the meaning and functionality of attribute-set specific modifiers.
Multiple "meta" attribute sets for describing a source were not implemented. Currently, there is just one such "meta" set, namely mbasic-1. (See MetaAttributeSet in the SOIF object SMetaAttributes.)
The STARTS query syntax was not modified to support Dublin Core scheme qualifiers for fields. See section 5.0 of the Dublin Core Metadata Report for details on the use of schemes.