The Santa Fe Convention: The Open Archives Dienst Subset
Introduction
Protocol Features
Unique Identifiers
Partitions
Verbs and Versions
HTTP embedding of Dienst requests
Dates
Protocol Messages
Disseminate
List-Contents
List-Meta-Formats
List-Partitions
Structure
Any changes to this specification after February 15, 2000 will be noted here.
This document describes the portion of the Dienst protocol that is used for basic interoperability among archives in the Open Archives Initiative, as recommended in its Santa Fe Convention. The goal of the Open Archives Initiative is to provide the mechanisms for interoperability among distributed e-print archives. The protocol described in this document allows harvesting of metadata for uniquely identified records in an archive. The word document is purposely avoided and the notion of a record is purposely imprecise. Some archives may provide access only to metadata; others may also provide access to full content in some form; still others may provide further services associated with the metadata and content, such as access to the full content in various manifestations (formats) or structural decompositions (e.g., individual pages or chapters).
The protocol described in this document is a subset of the full Dienst protocol, which provides for communications with services in a distributed digital library. When this subset of Dienst needs to be differentiated from the full Dienst protocol, it will be referred to as the Open Archives Dienst Subset for the remainder of this document. Readers will notice the use of the word Repository in the Dienst protocol requests. This follows from the use of the term Repository in the broader Dienst system in lieu of the term Archive.
All archives participating in the Open Archives Initiative have a unique archive identifier. This identifier is restricted to alphanumeric characters. Registration of this identifier is part of the Open Archives registration process for data providers, described in Step 6 of the Santa Fe Convention of the Open Archives Initiative. All records in an archive have a unique record identifier - unique within the scope of that archive. These two identifiers - the unique archive identifier and the unique record identifier - can then be concatenated (separated by any printable non-alphanumeric character) to form a unique full identifier (referred to as a fullID in the protocol documentation). For example, the unique archive identifier handlecorp can be combined with the unique record identifier 11223 and separated with the / (slash) character to form the full identifier handlecorp/11223. The full identifier is then used and returned by Dienst requests.
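The identifier rules above can be illustrated with a minimal Python sketch. The function names make_full_id and split_full_id are hypothetical and are not part of the protocol; the sketch also assumes that a full identifier is split at its first non-alphanumeric character, which follows from the archive identifier being alphanumeric.

```python
import re

# Archive identifiers are restricted to alphanumeric characters.
ARCHIVE_ID = re.compile(r"[A-Za-z0-9]+")


def make_full_id(archive_id: str, record_id: str, separator: str = "/") -> str:
    """Join an archive identifier and a record identifier into a fullID."""
    if not ARCHIVE_ID.fullmatch(archive_id):
        raise ValueError("archive identifier must be alphanumeric")
    # The separator is one printable, non-alphanumeric character.
    if len(separator) != 1 or separator.isalnum() or not separator.isprintable():
        raise ValueError("separator must be one printable non-alphanumeric character")
    return f"{archive_id}{separator}{record_id}"


def split_full_id(full_id: str) -> tuple[str, str, str]:
    """Split a fullID into (archive id, separator, record id)."""
    match = re.match(r"([A-Za-z0-9]+)([^A-Za-z0-9])(.*)", full_id)
    if match is None:
        raise ValueError(f"not a full identifier: {full_id!r}")
    archive_id, separator, record_id = match.groups()
    return archive_id, separator, record_id


print(make_full_id("handlecorp", "11223"))
print(split_full_id("handlecorp/11223"))
```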
The Dienst protocol defines the notion of a partition within an archive. A partition is an administrator-defined subset of the archive. Each partition has a (one token) name and a (possibly) longer description. Depending on the policy of an archive an individual record may exist in one or more partitions. Note that there is, in general, no way to predict the partition in which a record appears from its full identifier, or even given full knowledge of the record.
An archive may have one or more partition hierarchies. For example, an administrator may decide to partition an archive into two hierarchies, one based on institutional affiliation and one based on subjects as follows:
The partition hierarchies in an archive are available via the List-Partitions request.
The List-Contents verb includes, as an argument, a partition specification. Partition specifications are expressed in the following grammar where partitionname is the short one token name for the partition:
partitionspec := partitionlist
partitionlist := partitionsel | partitionsel;partitionlist
partitionsel := partitionname
partitionname := [A-Za-z0-9-_]+
Example:
Institutions;Florida;Frenetics
Where Florida is the short name for the partition Valley View University of Florida and Frenetics is the short name for the partition Department of Frenetics.
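As an illustration of the grammar, a partition specification can be validated and split with a few lines of Python. This is only a sketch: the function name parse_partitionspec is an assumption, and the character class is written with the hyphen last so that it is read as a literal.

```python
import re

# partitionname := [A-Za-z0-9-_]+  (hyphen placed last so it is a literal)
PARTITION_NAME = re.compile(r"[A-Za-z0-9_-]+")


def parse_partitionspec(spec: str) -> list[str]:
    """Split a partitionspec into its partition names, in the order given."""
    names = spec.split(";")
    for name in names:
        if not PARTITION_NAME.fullmatch(name):
            raise ValueError(f"invalid partition name: {name!r}")
    return names


print(parse_partitionspec("Institutions;Florida;Frenetics"))
```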
Individual Dienst protocol requests are called Verbs. There may be more than one version of a verb, with each version differing in syntax or semantics. A version takes the form of two integers, separated by a period. This version applies to the individual verb, not the protocol as a whole. (The protocol as a whole does not have a version number. The date on the protocol document indicates the set of verbs that are defined as of that date.) Including a version number in the message allows for backward-compatible extension to the Dienst system.
An archive might support verbs in various versions. An archive receiving a message with an older version number must either reply using the old syntax and semantics, or reply with an error. If an archive receives a message with a newer version number, then it must return an error.
Software supporting the Open Archives Dienst Subset may or may not be versioned. If a software version number exists, that number is independent of the Dienst protocol verbs and versions of those verbs that the software supports.
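The version rules for verbs can be sketched as follows. The function names are hypothetical, and comparing versions as pairs of integers (so that 1.10 is newer than 1.9) is an assumption; the text above does not define an ordering explicitly.

```python
def parse_version(text: str) -> tuple[int, int]:
    """Parse a verb version such as "1.2" into a pair of integers."""
    major, minor = text.split(".")
    return int(major), int(minor)


def should_answer(requested: str, implemented: list[str]) -> bool:
    """Return True if the archive may answer, False if it must return an error."""
    wanted = parse_version(requested)
    known = {parse_version(version) for version in implemented}
    # A version newer than anything implemented must be rejected.
    if wanted > max(known):
        return False
    # An older version is answered only if the archive still implements it;
    # otherwise the archive replies with an error.
    return wanted in known


print(should_answer("1.0", ["1.0", "2.0"]))
print(should_answer("3.0", ["1.0", "2.0"]))
```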
Dienst protocol requests are expressed as URLs embedded in HTTP requests. A typical implementation uses a standard Web server, such as Apache, that is configured to dispatch Dienst URLs to the software implementing these requests. The remainder of this section describes the aspects of the protocol that are specific to the HTTP embedding.
All messages are encoded into URLs where the path portion of the URL consists of the following tokens, in the following order:
The separator between tokens in the path is the slash, except that the separator before the keyword arguments is a question mark.
If the Repository service implemented the Shred verb, and if version 1.2 of that verb accepted two keyword arguments (delay and volume), then an example request is:
/Dienst/Repository/1.2/Shred?delay=9&volume=7.4.
The full URL for this request at a particular Web server might be:
http://bar.com/Dienst/Repository/1.2/Shred?delay=9&volume=7.4.
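A small sketch of how a client might assemble such a request is shown here. The token order (the literal Dienst, the service name, the verb version, the verb, any fixed arguments, then keyword arguments) is inferred from the Shred example above, and build_request_path is a hypothetical helper. Argument values are assumed to be escaped already.

```python
def build_request_path(service, version, verb, fixed_args=(), keyword_args=None):
    """Assemble the path and query portion of a Dienst request URL."""
    # Path tokens are separated by slashes.
    tokens = ["Dienst", service, version, verb, *fixed_args]
    path = "/" + "/".join(str(token) for token in tokens)
    # Keyword arguments follow a question mark, joined with "&".
    if keyword_args:
        query = "&".join(f"{key}={value}" for key, value in keyword_args.items())
        path += "?" + query
    return path


path = build_request_path(
    "Repository", "1.2", "Shred", keyword_args={"delay": 9, "volume": 7.4}
)
print("http://bar.com" + path)
```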
The syntax rules for URIs reserve a few characters for special roles in certain contexts and require that, if these characters are used in any other way, they be written as an escape sequence: a percent sign followed by the character code in hexadecimal. The reserved characters are:
Character | Role | Escape Sequence |
/ | Path Component Separator | %2F |
? | Query Component Separator | %3F |
# | Fragment Identifier | %23 |
= | Name/Value Separator | %3D |
& | Argument Separator in Query Component | %26 |
: | Host Port Separator | %3A |
; | Authority Namespace Separator | %3B |
Finally, the space character may not appear anywhere in a URL. It must be written as "+" or as the percent-sign escape sequence %20.
As a result, these characters must be escaped within a Dienst protocol request whenever their use does not correspond to their established URI role. Note that in the examples used throughout this document, special character escaping is shown.
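One way to apply these escaping rules in a client is with Python's standard urllib.parse module, as in the sketch below. The helper name escape_token is an assumption; passing safe="" makes quote() escape the slash as well, which it would otherwise leave untouched.

```python
from urllib.parse import quote, unquote


def escape_token(value: str) -> str:
    """Percent-escape every reserved character in a single argument value."""
    # safe="" forces "/" to be escaped too; letters, digits and "_.-~" are kept.
    return quote(value, safe="")


print(escape_token("#oams"))
print(escape_token("a/b?c=d&e:f;g"))
print(unquote("%23oams"))
```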
Responses to messages are formatted as HTTP responses, with appropriate HTTP header fields. The return type specified for each protocol request in this document will, therefore, correspond to the MIME type included in the HTTP Content-Type header field.
The responses to all Open Archives Dienst Subset requests are structured streams with MIME type text/xml. An appendix to this document lists the DTD (Document Type Definition) for every verb. All XML responses to Dienst protocol requests have the following uniform features.
Status codes and error returns correspond to those defined for HTTP (refer to that protocol documentation). A normal response from a Dienst message in HTTP is signaled with the 200 reply code. Error returns are signaled with the appropriate 4xx code as specified in the HTTP protocol. The use of HTTP error codes is as follows:
For each error return, the HTTP reason-phrase returned with the code should provide additional information useful to a human reader.
All dates in the protocol (requests and responses) are encoded using the "Complete date" variant of ISO 8601. This format is CCYY-MM-DD, where CC is the century, YY is the year, MM is the month of the year between 01 (January) and 12 (December), and DD is the day of the month between 01 and 28, 29, 30, or 31, depending on the length of the month and whether the year is a leap year.
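The date format can be checked and produced with the Python standard library, as in this sketch. The function names are hypothetical. A regular expression is used before constructing the date so that abbreviated forms such as 1998-1-15 are rejected; the date constructor then rejects impossible days such as February 30.

```python
import re
from datetime import date

# CCYY-MM-DD: four-digit year, two-digit month, two-digit day.
COMPLETE_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_complete_date(text: str) -> date:
    """Parse a CCYY-MM-DD string, rejecting anything else."""
    match = COMPLETE_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a complete date: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() raises ValueError for days that do not exist in the given month.
    return date(year, month, day)


def format_complete_date(value: date) -> str:
    """Format a date as CCYY-MM-DD."""
    return value.isoformat()


print(format_complete_date(parse_complete_date("1998-01-15")))
```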
This section lists the messages (verbs) implemented by the Open Archives Dienst Subset. Each message has a Name (which is used for purposes of discussion), a Verb (a unique name for the message, used in the protocol to name the message), a Version, a list of Fixed arguments, a list of Keyword arguments, a Return MIME type and return status codes. The documentation for every message includes an example request and response (where appropriate) and the meaning of HTTP error codes that may be returned. These examples uniformly use the full identifier handlecorp/970101.
To make reading of this document easier, the DTDs for responses to verbs that return text/xml are separated from the main body of the document into an appendix.
Verb: Disseminate
Version: 1.0
Fixed args: fullID, meta-format, content-type
Keyword args: none
Return MIME type: text/xml
Return Status Codes: 200, 400, 404
Request the metadata in a specific format from a record.
In addition to the fullID, the required fixed arguments are:
Example Request:
/Dienst/Repository/1.0/Disseminate/handlecorp/970101/%23oams/xml
Example Response:
<?xml version="1.0" encoding="UTF-8"?>
<Disseminate version="1.0">
  <oams:oams xmlns:oams="http://www.openarchives.org/sfc/sfc_oams.htm">
    <oams:title>A protocol for Interoperable Archives</oams:title>
    <oams:accession date="1994-06-24" />
    <oams:fullId>ncstrl.cornell/TR94-1418</oams:fullId>
    <oams:author>
      <oams:name>James R. Davis</oams:name>
      <oams:organization>Xerox</oams:organization>
    </oams:author>
    <oams:author>
      <oams:name>Carl Lagoze</oams:name>
      <oams:organization>Cornell</oams:organization>
    </oams:author>
  </oams:oams>
</Disseminate>
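A harvester might read such a response with Python's xml.etree.ElementTree, as in the sketch below. The function name summarize_oams and the shape of the returned dictionary are assumptions made for illustration; the sample is an abbreviated copy of the response above.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the example response.
OAMS_NS = {"oams": "http://www.openarchives.org/sfc/sfc_oams.htm"}

SAMPLE = """<Disseminate version="1.0">
  <oams:oams xmlns:oams="http://www.openarchives.org/sfc/sfc_oams.htm">
    <oams:title>A protocol for Interoperable Archives</oams:title>
    <oams:accession date="1994-06-24" />
    <oams:author><oams:name>James R. Davis</oams:name></oams:author>
    <oams:author><oams:name>Carl Lagoze</oams:name></oams:author>
  </oams:oams>
</Disseminate>"""


def summarize_oams(xml_text: str) -> dict:
    """Extract title, accession date and author names from a Disseminate response."""
    root = ET.fromstring(xml_text)
    record = root.find("oams:oams", OAMS_NS)
    title = record.findtext("oams:title", default="", namespaces=OAMS_NS)
    return {
        "title": " ".join(title.split()),  # collapse line breaks inside the title
        "accession": record.find("oams:accession", OAMS_NS).get("date"),
        "authors": [
            author.findtext("oams:name", default="", namespaces=OAMS_NS)
            for author in record.findall("oams:author", OAMS_NS)
        ],
    }


print(summarize_oams(SAMPLE))
```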
Verb: List-Contents
Version: 4.0
Fixed args: none
Keyword args: partitionspec, file-after, meta-format
Return MIME type: text/xml
Return Status Codes: 200, 400
Return a structured list of the full identifiers for records stored in this archive. Without any arguments the list includes all stored records.
The meaning of the keyword arguments is as follows:
Example Request:
List the full identifiers of records added or modified after January 15, 1998 in the high energy (hep) partition within the physics partition.
/Dienst/Repository/4.0/List-Contents?partitionspec=physics;hep&file-after=1998-01-15
Example Response:
<?xml version="1.0" encoding="UTF-8"?>
<List-Contents version="4.0">
  <record>arXiv:hep-th/9801001</record>
  <record>arXiv:hep-th/9801002</record>
</List-Contents>
Example Request:
List the Open Archive Metadata Set (oams) metadata along with the full identifiers.
/Dienst/Repository/4.0/List-Contents?partitionspec=physics;hep&meta-format=oams&file-after=1998-01-15
Example Response:
Note that every record includes an oams metadata record. If another meta-format were requested (e.g., rfc1807) there might be instances where an empty metadata record was returned (with no data between the metadata format tags) indicating that there is no metadata in that format for the record.
<?xml version="1.0" encoding="UTF-8"?>
<List-Contents version="4.0">
  <record>
    ncstrl.cornell/TR94-1418
    <oams:oams xmlns:oams="http://www.openarchives.org/sfc/sfc_oams.htm">
      <oams:title>A protocol for Interoperable Archives</oams:title>
      <oams:accession date="1994-06-24" />
      <oams:fullId>ncstrl.cornell/TR94-1418</oams:fullId>
      <oams:author>
        <oams:name>James R. Davis</oams:name>
        <oams:organization>Xerox</oams:organization>
      </oams:author>
      <oams:author>
        <oams:name>Carl Lagoze</oams:name>
        <oams:organization>Cornell</oams:organization>
      </oams:author>
    </oams:oams>
  </record>
  <record>
    hdl://cnri.dlib/june96-varian
    <oams:oams xmlns:oams="http://www.openarchives.org/sfc/sfc_oams.htm">
      <oams:title>Pricing Electronic Journals</oams:title>
      <oams:accession date="1996-06-24" />
      <oams:fullId>hdl://cnri.dlib/june96-varian</oams:fullId>
      <oams:author>
        <oams:name>Hal R. Varian</oams:name>
        <oams:organization>UC Berkeley</oams:organization>
      </oams:author>
    </oams:oams>
  </record>
</List-Contents>
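Both forms of the List-Contents response can be read with the same few lines of Python. In this sketch (the function name record_ids is hypothetical), the identifier is taken to be the text at the start of each record element, which precedes the embedded metadata when a meta-format was requested.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<List-Contents version="4.0">
  <record>arXiv:hep-th/9801001</record>
  <record>arXiv:hep-th/9801002</record>
</List-Contents>"""


def record_ids(xml_text: str) -> list[str]:
    """Return the full identifier of every record in a List-Contents response."""
    root = ET.fromstring(xml_text)
    # Element.text is the text before the first child element, so this also
    # works when a metadata element follows the identifier.
    return [(record.text or "").strip() for record in root.findall("record")]


print(record_ids(SAMPLE))
```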
Verb: List-Meta-Formats
Version: 1.0
Fixed args: none
Keyword args: none
Return MIME type: text/xml
Return Status Codes: 200, 400
Returns the metadata formats that are supported by this archive. Note that the fact that a metadata format is supported does not mean that it is available for all records in that archive. For each metadata format, the following information is returned:
Example Request:
/Dienst/Repository/1.0/List-Meta-Formats
Example Response:
<?xml version="1.0" encoding="UTF-8"?>
<List-Meta-Formats version="1.0">
  <meta-format name="rfc1807" namespace="http://info.internet.isi.edu/in-notes/rfc/files/rfc1807.txt" />
  <meta-format name="dc" namespace="http://purl.org/dc" />
  <meta-format name="oams" namespace="http://www.openarchives.org/sfc/sfc_oams.htm" />
</List-Meta-Formats>
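As a sketch, a client could turn this response into a mapping from format name to namespace. The function name meta_formats is an assumption, and the sample is an abbreviated copy of the response above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<List-Meta-Formats version="1.0">
  <meta-format name="dc" namespace="http://purl.org/dc" />
  <meta-format name="oams" namespace="http://www.openarchives.org/sfc/sfc_oams.htm" />
</List-Meta-Formats>"""


def meta_formats(xml_text: str) -> dict[str, str]:
    """Map each supported metadata format name to its namespace."""
    root = ET.fromstring(xml_text)
    return {
        element.get("name"): element.get("namespace")
        for element in root.findall("meta-format")
    }


print(meta_formats(SAMPLE))
```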
Verb: List-Partitions
Version: 2.0
Fixed args: none
Keyword args: none
Return MIME type: text/xml
Return Status Codes: 200, 400
Return a structured list of the administrator-defined partitions for this archive. The list contains the hierarchy of partitions and sub-partitions. For each partition, both the short name and the long description are returned. Depending on the policy for a particular archive, a record may be a member of more than one partition.
Example Request:
/Dienst/Repository/2.0/List-Partitions
Example Response:
The following response indicates a partition hierarchy with two top-level partitions - Oceanside and ValleyView - each with partition hierarchies within them.
<?xml version="1.0" encoding="UTF-8"?>
<List-Partitions version="2.0">
  <partition name="Oceanside">
    <display>Oceanside University of Nebraska</display>
    <partition name="CompEnt">
      <display>Department of Computational Entomology</display>
    </partition>
    <partition name="MetPhen">
      <display>Department of Metaphysical Phenomenology</display>
    </partition>
  </partition>
  <partition name="ValleyView">
    <display>Valley View University of Florida</display>
    <partition name="Fren">
      <display>Department of Frenetics</display>
    </partition>
    <partition name="Hist">
      <display>Department of Histrionics</display>
    </partition>
  </partition>
</List-Partitions>
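The nested response lends itself to a recursive walk. The sketch below builds a semicolon-separated path for each partition, in the style of the partitionspec grammar, and maps it to the partition's description. The function name partition_specs is hypothetical, and treating the joined names as a usable partitionspec is an assumption based on the List-Contents examples. The sample is an abbreviated copy of the response above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<List-Partitions version="2.0">
  <partition name="Oceanside">
    <display>Oceanside University of Nebraska</display>
    <partition name="CompEnt">
      <display>Department of Computational Entomology</display>
    </partition>
  </partition>
  <partition name="ValleyView">
    <display>Valley View University of Florida</display>
  </partition>
</List-Partitions>"""


def partition_specs(xml_text: str) -> dict[str, str]:
    """Map each partition path (names joined by ";") to its description."""
    specs: dict[str, str] = {}

    def walk(element, prefix):
        # Only direct children are visited here; recursion handles nesting.
        for child in element.findall("partition"):
            path = prefix + [child.get("name")]
            specs[";".join(path)] = child.findtext("display", default="").strip()
            walk(child, path)

    walk(ET.fromstring(xml_text), [])
    return specs


for spec, description in partition_specs(SAMPLE).items():
    print(spec, "=", description)
```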
Verb: Structure
Version: 2.0
Fixed args: fullID
Keyword args: view
Return MIME type: text/xml
Return Status Codes: 200, 400, 404
This verb returns a structured response that describes the metadata formats available for a record. A client may use this information as the basis for metadata requests using the Disseminate verb.
There is one required keyword argument that can only take one value (the same verb in the full Dienst protocol has more keyword arguments that take more values):
Example Request:
/Dienst/Repository/2.0/Structure/handlecorp/970101?view=%23
Example Response:
<?xml version="1.0" encoding="UTF-8"?>
<Structure version="2.0">
  <meta-formats>
    <rfc1807 />
    <dc />
  </meta-formats>
</Structure>
This response says that the record can disseminate two metadata formats: rfc1807 and dc (Dublin Core).
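A client preparing Disseminate requests might read the available formats from a Structure response as in this sketch. The function name available_formats is an assumption; the format names are taken from the element names inside meta-formats, as in the example response.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Structure version="2.0">
  <meta-formats>
    <rfc1807 />
    <dc />
  </meta-formats>
</Structure>"""


def available_formats(xml_text: str) -> list[str]:
    """List the metadata formats a record can disseminate."""
    container = ET.fromstring(xml_text).find("meta-formats")
    if container is None:
        return []
    # Each child element's tag is the name of one available format.
    return [child.tag for child in container]


print(available_formats(SAMPLE))
```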