STARTS
Stanford Protocol Proposal for Internet Search and Retrieval
Reference Implementation
Modification and Customization
The StartsServer is structured in a manner that permits
easy addition or modification of the WAIS sources. The reference
implementation provides access to two sample sources:
- cstr - a subset of computer science technical reports
available through the NCSTRL
server at Cornell University.
- linux - a subset of the comp.os.linux.announce
archives available at sunSITE.
Steps for changing the sources are as follows:
- Follow the instructions for freeWAIS-sf
for creating and indexing a new set of documents (let's
call the wais database for these documents db)
using the "fields" type for waisindex. A
quick summary of the steps is:
- Place the documents themselves within a directory
tree - let's call that directory dbdocs.
- Create a directory in which the wais indexes
should be placed - let's call that directory dbind.
- Create the wais field description file (in this
case db.fmt) in the dbind directory.
This file must include fields that semantically
match the required STARTS fields, which are title,
linkage, and date/time last modified.
Refer to the STARTS
specification for a complete list of fields
as a guide to what you might want to specify in
your wais field description file.
- Index the sources using the waisindex
utility.
- Create a new StartsServer class, in the package resource, to represent
the new source. This class should extend the class WAISSourceDescription,
which is the abstract superclass for all varieties of
WAIS sources. The source description class does things
such as define where the indexing files are, what the
mapping from STARTS fields to source fields is, etc. Take
a look at the classes CSTRSourceDescription
and LINUXSourceDescription
for examples of how to create this sub-class.
- Create a new StartsServer class, in the package resource, to represent
the documents in the new source. This class should extend
the class WAISDocuments,
which is the abstract superclass for all varieties of
WAIS documents. The main function of the document class
is to extract document information (e.g., title, author)
from the document files you have indexed with WAIS. The
code to do this is idiosyncratic to each type of
document. Take a look at the classes CSTRDocuments
and LINUXDocuments
for examples of how to create this sub-class.
- Modify the class ResourceDescription,
in the package STARTSConfiguration,
to provide access to the new source(s). Specifically, you
should modify the static code that loads the
sources for the resource. In the reference implementation
this is coded as:
    // Load the sources hashtable
    static {
        sources.put("cstr", new CSTRSourceDescription());
        sources.put("linux", new LINUXSourceDescription());
    }
You should modify this code so that the keys in the hashtable
correspond to the names of your sources, and each value is an
instance of the class that describes that source.
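The steps above can be sketched in Java as follows. This is only an illustration under stated assumptions: the real members of WAISSourceDescription and WAISDocuments are not shown in this document, so the stand-in abstract classes below, their method names, the source name "db", the db.fmt field names, and the "Title:" line format are all hypothetical. The STARTS field name spellings used as map keys are also assumed.

```java
import java.util.Hashtable;
import java.util.Map;

// Stand-in for the real abstract class of the same name. Its actual members
// are not shown in this document, so both methods are assumptions.
abstract class WAISSourceDescription {
    // Directory that holds the WAIS index files for this source.
    abstract String indexDirectory();

    // Mapping from STARTS field names to the field names used in db.fmt.
    abstract Map<String, String> fieldMapping();
}

// Stand-in for the real abstract class of the same name (shape assumed).
abstract class WAISDocuments {
    // Extracts the title from the raw text of one indexed document.
    abstract String extractTitle(String rawDocument);
}

// Hypothetical description of the new source "db".
class DBSourceDescription extends WAISSourceDescription {
    @Override
    String indexDirectory() {
        return "dbind";
    }

    @Override
    Map<String, String> fieldMapping() {
        Map<String, String> mapping = new Hashtable<>();
        // The right-hand values are invented db.fmt field names.
        mapping.put("title", "ti");
        mapping.put("linkage", "url");
        mapping.put("date/time-last-modified", "date");
        return mapping;
    }
}

// Hypothetical document class for "db". It assumes that each document
// carries a line of the form "Title: ...".
class DBDocuments extends WAISDocuments {
    @Override
    String extractTitle(String rawDocument) {
        for (String line : rawDocument.split("\n")) {
            if (line.startsWith("Title:")) {
                return line.substring("Title:".length()).trim();
            }
        }
        return ""; // no title line found
    }
}

// Mirrors the static block in ResourceDescription, with the key changed
// to the name of the new source.
class ResourceDescriptionSketch {
    static final Hashtable<String, WAISSourceDescription> sources = new Hashtable<>();

    // Load the sources hashtable
    static {
        sources.put("db", new DBSourceDescription());
    }
}
```

In the actual code base the two new classes would extend the existing abstract classes rather than these stand-ins, and the extraction logic would follow whatever layout your documents use, as CSTRDocuments and LINUXDocuments do for theirs.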
Using a native search engine other than freeWAIS
Using the reference implementation to support another native
search engine (not freeWAIS-sf) is, by nature, a more complicated
task. However, StartsServer is structured in a manner that
allows this to be done via sub-classing rather than by rewriting
the source- and engine-independent pieces of the code. All
wais-specific code is isolated in the package wais. The two core classes in
this package are:
- WAISSourceDescription - an abstract sub-class of the
abstract class resource.SourceDescription.
This class describes generic attributes and methods of a
wais source.
- WAISResultDocument - an abstract sub-class of the
abstract class results.Document.
This class describes generic attributes and methods of a
wais document that is part of a query results set.
You will need to create two such sub-classes for the
engine to which you wish to provide access. You can then
add new sources for this engine, in a manner similar to
that described above.
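The layering described above might look like the following Java sketch. It is a rough outline under assumptions, not the structure of the actual code: SourceDescription and Document here are stand-ins for resource.SourceDescription and results.Document, whose real members are not shown in this document, and the engine name "MyEngine", the source name "news", and every method below are hypothetical.

```java
// Stand-ins for resource.SourceDescription and results.Document. The real
// classes' members are not shown here, so these methods are assumptions.
abstract class SourceDescription {
    abstract String sourceName();
}

abstract class Document {
    abstract String title();
}

// Engine-level classes for a hypothetical engine called "MyEngine". They
// play the role that the classes in the package wais play for freeWAIS-sf.
abstract class MyEngineSourceDescription extends SourceDescription {
    // Engine-wide attribute shared by every source of this engine.
    String engineName() {
        return "MyEngine";
    }
}

abstract class MyEngineResultDocument extends Document {
    // Raw record returned by the engine for one hit (format assumed).
    protected final String rawRecord;

    MyEngineResultDocument(String rawRecord) {
        this.rawRecord = rawRecord;
    }
}

// A concrete source for that engine, added in the same way as a new
// WAIS source.
class NewsSourceDescription extends MyEngineSourceDescription {
    @Override
    String sourceName() {
        return "news";
    }
}

class NewsResultDocument extends MyEngineResultDocument {
    NewsResultDocument(String rawRecord) {
        super(rawRecord);
    }

    @Override
    String title() {
        // Assumes the first line of the record is the title.
        int end = rawRecord.indexOf('\n');
        return end < 0 ? rawRecord : rawRecord.substring(0, end);
    }
}
```

Keeping the engine-specific behaviour in the two abstract classes means that each additional source for the same engine only needs its own small concrete sub-classes.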
Carl Lagoze
lagoze@cs.cornell.edu