STARTS
Stanford Protocol Proposal for Internet Search and Retrieval

Reference Implementation


Modification and Customization

Providing access to other sources

The StartsServer is structured in a manner that permits easy addition or modification of WAIS sources. The reference implementation provides access to two sample sources, registered under the names cstr and linux.

Steps for changing the sources are as follows:

  1. Follow the instructions for freeWAIS-sf for creating and indexing a new set of documents (let's call the wais database for these documents db) using the "fields" type for waisindex. A quick summary of the steps is:
    1. Place the documents themselves within a directory tree - let's call that directory dbdocs.
    2. Create a directory in which the wais indexes should be placed - let's call that directory dbind.
    3. Create the wais field description file (in this case db.fmt) in the dbind directory. This file must include fields that semantically match the required STARTS fields, which are title, linkage, and date/time last modified. Refer to the STARTS specification for a complete list of fields as a guide to what you might want to specify in your wais field description file.
    4. Index the sources using the waisindex utility.
  2. Create a new StartsServer class, in the package resource, to represent the new source. This class should extend the class WAISSourceDescription, which is the abstract superclass of all varieties of WAIS sources. The source description class defines, among other things, where the indexing files are and how STARTS fields map to source fields. Take a look at the classes CSTRSourceDescription and LINUXSourceDescription for examples of how to create this sub-class.
  3. Create a new StartsServer class, in the package resource, to represent the documents in the new source. This class should extend the class WAISDocuments, which is the abstract superclass of all varieties of WAIS documents. The main function of the document class is to extract document information (e.g., title, author) from the document files you have indexed with WAIS. The code to do this is idiosyncratic to each type of document. Take a look at the classes CSTRDocuments and LINUXDocuments for examples of how to create this sub-class.
  4. Modify the class ResourceDescription, in the package STARTSConfiguration, to provide access to the new source(s). Specifically, you should modify the static code that loads the sources for the resource. In the reference implementation this is coded as:
     // Load the sources hashtable
     static {
        sources.put("cstr", new CSTRSourceDescription());
        sources.put("linux", new LINUXSourceDescription());
     }

You should modify this code so that the keys in the hashtable are the names of your sources and each value is an instance of the class that describes that source.
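
The steps above can be sketched roughly as follows. This is only an illustration and not the reference implementation's code: the real WAISSourceDescription and WAISDocuments classes are not reproduced in this document, so the stand-in classes below (WaisSourceSketch, WaisDocumentsSketch), their methods, the native field names (ti, url, date), and the "Title:" header convention are all assumptions made for the example. The names db and dbind follow the placeholder names used in the steps.

```java
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for WAISSourceDescription. The real class's members
// are not shown in this document; these two methods are assumptions.
abstract class WaisSourceSketch {
    abstract String indexDirectory();            // where the wais index files live
    abstract Map<String, String> fieldMapping(); // STARTS field -> native wais field
}

// Hypothetical stand-in for WAISDocuments.
abstract class WaisDocumentsSketch {
    abstract String extractTitle(String documentText);
}

// Step 2: a source description for the new database "db".
class DbSourceSketch extends WaisSourceSketch {
    @Override
    String indexDirectory() {
        return "dbind";
    }

    @Override
    Map<String, String> fieldMapping() {
        Map<String, String> mapping = new LinkedHashMap<>();
        // Right-hand names are assumed entries of a db.fmt file.
        mapping.put("title", "ti");
        mapping.put("linkage", "url");
        mapping.put("date/time-last-modified", "date");
        return mapping;
    }
}

// Step 3: document-specific extraction. Assumes each document carries a
// "Title: ..." header line; real document formats will differ.
class DbDocumentsSketch extends WaisDocumentsSketch {
    @Override
    String extractTitle(String documentText) {
        for (String line : documentText.split("\n")) {
            if (line.startsWith("Title:")) {
                return line.substring("Title:".length()).trim();
            }
        }
        return ""; // no title header found
    }
}

// Step 4: register the new source under its name, mirroring the static
// block shown above.
class ResourceSketch {
    static final Hashtable<String, WaisSourceSketch> sources = new Hashtable<>();

    static {
        sources.put("db", new DbSourceSketch());
    }

    public static void main(String[] args) {
        WaisSourceSketch source = sources.get("db");
        System.out.println(source.indexDirectory());
        System.out.println(source.fieldMapping());
    }
}
```

In this sketch a query naming the source db would look up its description in the hashtable and then consult the field mapping to translate STARTS field names into the fields declared in the wais field description file.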

Using a native search engine other than freeWAIS

Using the reference implementation to support another native search engine (not freeWAIS-sf) is, by nature, a more complicated task. However, StartsServer is structured in a manner that allows this to be done by sub-classing rather than by rewriting the source-independent and engine-independent pieces of the code. All WAIS-specific code is isolated in the package wais. The two core classes in this package are:

  1. WAISSourceDescription - an abstract sub-class of the abstract class resource.SourceDescription. This class describes generic attributes and methods of a wais source.
  2. WAISResultDocument - an abstract sub-class of the abstract class results.Document. This class describes generic attributes and methods of a wais document that is part of a query results set.

You will need to create two analogous sub-classes for the engine to which you wish to provide access: one extending resource.SourceDescription and one extending results.Document. You can then add new sources for this engine, in a manner similar to that described above.
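
The layering described here might look roughly like the sketch below. It is an illustration under stated assumptions, not the reference implementation: SourceDescriptionSketch and ResultDocumentSketch are hypothetical stand-ins for resource.SourceDescription and results.Document (whose real members are not shown in this document), "otherengine" is an invented engine name, and the "title|url" hit format is made up for the example.

```java
// Engine-independent layer: hypothetical stand-ins for
// resource.SourceDescription and results.Document.
abstract class SourceDescriptionSketch {
    abstract String engineName();
    abstract String sourceName();
}

abstract class ResultDocumentSketch {
    abstract String title();
    abstract String linkage();
}

// Engine layer: the two abstract sub-classes you would write for a new
// native engine, playing the role that WAISSourceDescription and
// WAISResultDocument play for freeWAIS-sf.
abstract class OtherEngineSourceSketch extends SourceDescriptionSketch {
    @Override
    final String engineName() {
        return "otherengine"; // invented engine name
    }
}

abstract class OtherEngineResultDocumentSketch extends ResultDocumentSketch {
    protected final String rawHit; // one result line as returned by the engine

    OtherEngineResultDocumentSketch(String rawHit) {
        this.rawHit = rawHit;
    }
}

// Source layer: concrete classes for one source served by that engine.
class ManualsSourceSketch extends OtherEngineSourceSketch {
    @Override
    String sourceName() {
        return "manuals";
    }
}

class ManualsResultDocumentSketch extends OtherEngineResultDocumentSketch {
    ManualsResultDocumentSketch(String rawHit) {
        super(rawHit);
    }

    // Assumes hits look like "title|url"; a real engine's format will differ.
    @Override
    String title() {
        int bar = rawHit.indexOf('|');
        return bar < 0 ? rawHit : rawHit.substring(0, bar);
    }

    @Override
    String linkage() {
        int bar = rawHit.indexOf('|');
        return bar < 0 ? "" : rawHit.substring(bar + 1);
    }
}
```

The point of the three layers is that code written against the engine-independent classes never needs to know which engine or source produced a result; only the middle and bottom layers change when a new engine is added.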

Carl Lagoze
lagoze@cs.cornell.edu