Open Archives Software 2000-03-07 13:54:51 -0500

Open Archives Reference Software
Description and Installation

Introduction

This document describes the use and installation of the Open Archives Software (OA Software) subset of the Dienst software.  This software provides a front end that is simple to install and use for archives that choose to support the Open Archives Subset of the Dienst Protocol.  This protocol provides a mechanism for harvesting common metadata - the Open Archives Metadata Set - and archive-specific metadata from records (e.g., documents) in participating archives.  The Open Archives Initiative home page provides complete information on participation in the initiative.

Software Overview

The OA Software is a small set of Perl files that manage the dispatching of protocol requests defined by the Open Archives Subset.  The OA Software is intended for use in conjunction with site-specific software that manages the individual archive.  Use of the OA Software will require programming to establish the actual functional interface between the dispatched protocol requests and the individual archive.

Organizations wishing to participate in the Open Archives Initiative that do not already have archive software should look at the full Dienst software release.

The Open Archives software is designed and written to be run in conjunction with an HTTP server.  (In fact, the protocol is designed to be embedded in URLs carried in HTTP requests).  The installation instructions support two mechanisms for linking an HTTP server to the Open Archives Software:

  1. Using standard CGI, which is supported by virtually all HTTP servers (although the OA Software is intended to be run with the Apache HTTP server).

  2. Using mod_perl, which embeds a persistent Perl interpreter and the Open Archives Software in an Apache HTTP server.  This significantly speeds up the handling of protocol requests by avoiding the overhead of starting the Perl interpreter at each request.  mod_perl is supported only on various flavors of UNIX (e.g., Linux, Solaris, HP-UX).

System Requirements

Operating System

All of the Perl code in the Open Archives Software will run on any computer system that supports Perl (many flavors of UNIX, many flavors of Windows, MacOS).  However, use of the OA Software in conjunction with an HTTP server requires URL rewriting (in order to redirect Open Archives protocol requests to the OA Software).  To the best of our knowledge, URL rewriting is available only through the mod_rewrite module in Apache.  While Apache is supported on flavors of both UNIX and WIN32, the following caveat for WIN32 exists (lifted from the Apache Windows Web Page):

Warning: Apache on NT has not yet been optimized for performance. Apache still performs best, and is most reliable on Unix platforms. Over time we will improve NT performance. Folks doing comparative reviews of webserver performance are asked to compare against Apache on a Unix platform such as Solaris, FreeBSD, or Linux.

Furthermore, installers of the software who wish to exploit the performance gains offered by mod_perl can only do so on UNIX systems.

Hardware

Any system that is capable of hosting a Web server should be capable of running the Open Archives Software.  That includes a standard desktop workstation (e.g., Sun, IBM, etc.) or a garden variety Pentium-class PC running Linux (e.g., 400 MHz processor, 128 MB of memory, Ethernet connection, multi-gigabyte hard disk).  The capacity of the system, in terms of both processor and memory, will determine its ability to handle a very high volume of HTTP and Open Archives protocol requests.  Disk space consumption by the actual Dienst software is minimal - in the several megabyte range.  We expect that actual hardware requirements for running an archive site will depend on the archive-specific software rather than the Open Archives Software itself.

Software

The following software is required for installation and execution of the Open Archives Software:

Organization of the Code

The physical organization of the OA Software is as follows.  The code is organized into five directories:

Instructions on changes to these files at installation time are provided in the installation section.

Installing the Code

Follow the steps below to install the Open Archives Software.  Note that the installation process assumes that your site has not already installed, and is not already running, an Apache HTTP server.  If you are already running Apache, you will need to modify the configuration file for that server as described below:

  1. Download the latest version of the Apache source into a temporary directory.
    (Skip this step if you already have Apache installed and running at your site).  The Apache source is located here.  Untar the Apache source file and create the Apache source directory (the full path of that directory will be called apache_src in the following steps).

  2. Download the latest version of mod_perl source into the same temporary directory.
    (Skip this step if you already have an Apache server built with mod_perl, or if you wish to use standard CGI - which will degrade performance of your Open Archives Server). The mod_perl source is located here.  Untar the mod_perl source file and create the mod_perl source directory (the full path of that directory will be called mod_perl_src in the following steps).

  3. Build mod_perl.
    In the mod_perl_src directory run the following commands:

      perl Makefile.PL \
        APACHE_SRC=apache_src \
        DO_HTTPD=1 \
        USE_APACI=1 \
        PREP_HTTPD=1 \
        EVERYTHING=1
      make
      make install

    Note that to run these commands you will need write access to your Perl installation (in most cases this means that you have root access to your machine).  Note that detailed information on installing mod_perl is available in the installation files in the mod_perl source directory.

  4. Build Apache.
    Choose the directory into which you wish to install Apache (this will be called apache_run for the remainder of this document).  In the apache_src directory run the following commands:

      ./configure \
        --prefix=apache_run \
        --activate-module=src/modules/perl/libperl.a \
        --enable-module=rewrite \
        --enable-shared=rewrite
      make
      make install

    Note that detailed information on installing Apache is available in the installation files in the Apache source directory.  

  5. Test Apache.
    Go to apache_run/conf and edit the httpd.conf file.  Find the line that says:
        
      Port xxxx

    where xxxx is a number like 8090 and either leave it or change it to the port on which you want to run your Apache HTTP server.  Now go to apache_run/bin (where the default Apache layout installs apachectl) and run the command:

      ./apachectl start

    Your Apache server should start.  If it doesn't, refer to the Apache documentation for help.

  6. Download the Open Archives Software.
    The OA Software is available here.  Untar the source into a directory that is readable by the Apache Web server installed above.  This directory will be called OA_src for the remainder of this document.  The directory tree below OA_src should look like that described above.

  7. Configure the Open Archives Software.
    You must modify two files in order to configure the Open Archives Software:

  8. Configure the Apache Server to use the Open Archives Software.
    Go to apache_run/conf and edit the httpd.conf file.  Add the following lines at the end of the file.

    RewriteEngine on
    RewriteRule ^/Dienst(.*) OA_src/Main/dienst.pl
    <Directory OA_src/Main>
      SetHandler perl-script
      PerlHandler Apache::Registry
      Options ExecCGI
      allow from all
      PerlSendHeader Off
    </Directory>

    Note that the line allow from all specifies that all clients can execute the Open Archives protocol requests.  If you want more constrained access consult the Apache documentation.

    Now go to apache_run/bin (where the default Apache layout installs apachectl) and run the command:

      ./apachectl restart

    to restart your Apache Server with the configuration changes.  

  9. Perform Basic Installation Tests.
    Edit the file OA_src/Tests/InstallTest.htm and change all occurrences of the string host:port to the actual host and port of your Apache Server.  Open this file in a Web browser and check that each link successfully returns an XML document with a root element that has the same name as the verb of the respective request.  The root element should have a single attribute named version, whose value is the version of the verb of the respective request.  For example, a Disseminate verb with version 1.0 will produce text/xml content wrapped in a tag like
        <Disseminate version="1.0">
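
The checks in step 9 can also be scripted.  The following is a minimal shell sketch and is not part of the OA Software distribution.  The helper names set_host_port, root_element, root_version and check_response are hypothetical.  The XML handling is a naive sed-based approximation: it assumes that the root tag uses double-quoted attributes and that any XML declaration, DOCTYPE or comment preceding it contains no ">" character.

```shell
# Sketch only: these helper names are hypothetical, not part of the OA Software.

# Write a copy of an HTML test file to stdout with every "host:port"
# placeholder replaced by the real host and port.
#   $1 = path to the test file, $2 = actual host:port (e.g. myhost:8090)
set_host_port() {
    sed "s/host:port/$2/g" "$1"
}

# Print the name of the root element of the XML document on stdin.
# Naive: joins all lines, drops <?...?> and <!...> constructs, then
# takes the name of the first remaining tag.
root_element() {
    { tr '\n' ' '; echo; } |
    sed -n -e 's/<?[^>]*?>//g' \
           -e 's/<![^>]*>//g' \
           -e 's/^[^<]*<\([A-Za-z_][A-Za-z0-9_.-]*\).*/\1/p'
}

# Print the value of the version attribute on the root element
# (prints nothing if the root element has no such attribute).
root_version() {
    { tr '\n' ' '; echo; } |
    sed -n -e 's/<?[^>]*?>//g' \
           -e 's/<![^>]*>//g' \
           -e 's/^[^<]*<[A-Za-z_][A-Za-z0-9_.-]*[^>]* version="\([^"]*\)"[^>]*>.*/\1/p'
}

# Succeed (exit status 0) only if the XML on stdin has the expected
# root element name and version attribute.
#   $1 = expected verb, $2 = expected version
check_response() {
    oa_verb=$1
    oa_version=$2
    oa_response=$(cat)
    [ "$(printf '%s\n' "$oa_response" | root_element)" = "$oa_verb" ] &&
    [ "$(printf '%s\n' "$oa_response" | root_version)" = "$oa_version" ]
}
```

For example, assuming the response from one of the test links has been saved as response.xml (a hypothetical file name), running check_response Disseminate 1.0 < response.xml exits with status 0 when the root element and its version attribute match the request.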

Linking the Open Archives Software to your Archive

You now need to do the actual programming task of linking the Open Archives Software to the archive software that you are running at your site.  All Open Archives Protocol requests are dispatched to the set of subroutines in the file OA_src/Services/Repository/Repository_stubs.pl.  This file is commented to show the points at which each protocol request is handled and the points at which you should insert the linkages to your own archive code.  When customizing the code for your individual site you should refer to:

  1. The Open Archives Protocol Document for the description of each protocol request and DTD of each protocol response.
  2. The documentation of the Perl XML::Writer module for the tools to package XML responses.  This package is used to formulate the basic XML responses in the distributed Open Archives Software.
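
As a first orientation in Repository_stubs.pl, it can help to list the subroutines it defines before reading the comments around each one.  The following is a minimal shell sketch.  The helper name list_stubs is hypothetical, and the pattern assumes that each subroutine is declared with the keyword sub at the start of a line (optionally indented), which is common Perl style but has not been checked against the distributed file.

```shell
# Sketch only: list_stubs is a hypothetical helper, not part of the OA Software.

# Print one "line-number:name" pair for each Perl subroutine
# definition found in the given file.
#   $1 = path to a Perl source file
list_stubs() {
    grep -n '^[[:space:]]*sub[[:space:]][[:space:]]*[A-Za-z_]' "$1" |
    sed 's/^\([0-9]*\):[[:space:]]*sub[[:space:]]*\([A-Za-z_][A-Za-z0-9_:]*\).*/\1:\2/'
}
```

Running list_stubs OA_src/Services/Repository/Repository_stubs.pl should then print the line number and name of each subroutine, which gives a checklist of the places where site-specific code may need to be linked in.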