STARTS
Stanford Protocol Proposal for Internet Search and Retrieval
Reference Implementation
Implementation Overview
The STARTS reference implementation is composed of five major
parts:
- A short Perl CGI script that interfaces between a WWW
server and the remainder of the reference implementation.
- The core of the reference implementation written in Java,
which will be referred to as StartsServer. This
Java code operates as a stand-alone server that accepts
four service requests that correspond to the services
provided by STARTS:
- QUERY - Takes a SOIF input of type SQuery
and returns a SOIF output of type SQResults
and zero or more SOIF outputs of type SQDocument
(depending on the number of "hits" in
the result set).
- SOURCEMETA - Returns a SOIF output of type
SMetaAttributes containing metadata for
the respective source.
- SOURCECONTENT - Returns a SOIF output of
type SContentSummary containing data about
the contents of the respective source.
- RESOURCEMETA - Returns a SOIF output of
type SResource containing metadata for the
respective resource.
- A modified version of the freeWAIS waissearch
utility with which the Java STARTS server communicates as
a native method. The modifications to waissearch
are of three types:
- Rather than acting as a stand-alone program, it
is a function that takes an ASCII WAIS query and
returns an array of strings, each of which is a
"hit" for the query.
- Argument and return types are conversions of Java
types as required for Java native methods.
- The data returned for a search "hit"
has been modified to include the pathname of the
respective document (this allows mapping from the
"hit" to the actual document so that
data such as author, title, etc. can be extracted
by StartsServer for the STARTS query
return).
- The unmodified freeWAIS-sf
search engine that runs as a stand-alone server.
- Two sets of document sources.
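The four services and their SOIF types can be pictured as a small lookup table. The following is only a sketch of that mapping, not code from the reference implementation: the enum name StartsService and its methods are hypothetical, and only the service and type names come from the list above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical table of the four STARTS services described above.
 * The class and method names are illustrative assumptions; the actual
 * StartsServer source is not shown in this document.
 */
enum StartsService {
    // QUERY is the only service that takes a SOIF input.
    QUERY("SQuery", "SQResults", "SQDocument"),
    SOURCEMETA(null, "SMetaAttributes"),
    SOURCECONTENT(null, "SContentSummary"),
    RESOURCEMETA(null, "SResource");

    private final String inputType;          // null when no SOIF input is taken
    private final List<String> outputTypes;  // SOIF types the service may return

    StartsService(String inputType, String... outputTypes) {
        this.inputType = inputType;
        this.outputTypes = Arrays.asList(outputTypes);
    }

    /** True only for services that expect a SOIF object as input. */
    boolean requiresInput() {
        return inputType != null;
    }

    String inputType() {
        return inputType;
    }

    List<String> outputTypes() {
        return outputTypes;
    }

    /** Case-insensitive lookup; throws IllegalArgumentException for unknown names. */
    static StartsService parse(String name) {
        return valueOf(name.trim().toUpperCase(Locale.ROOT));
    }
}
```

With this table, StartsService.parse("query").requiresInput() is true, while the three metadata services report no input type.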
In summary, the control flow of the reference
implementation is:
- The WWW server receives a request. For all but
the QUERY service this is simply a GET
on a URL that is mapped by the WWW server to the
Perl CGI script. For the QUERY service
this is a POST request, where the input of
the POST is the SQuery SOIF that
specifies the query. The WWW server maps this POST
request to the Perl CGI script.
- The Perl CGI script turns the WWW request into a
STARTS service request, which is one of the four
defined above, and sends this request via a
socket to the Java StartsServer.
- The Java StartsServer receives the request
via a socket and processes it. For all but the QUERY
service, this processing is done internally by the StartsServer,
with data drawn, when necessary, from the
freeWAIS indexing files (dictionary, inverted
index, etc.). For the QUERY service, the StartsServer
makes a native method call to the modified waissearch
utility, which sends the query (translated by StartsServer)
to the freeWAIS engine.
- The freeWAIS engine processes
the query and returns the query
"hits" to the modified waissearch
utility.
- waissearch returns the query
"hits" as a string array to StartsServer.
- StartsServer processes the hits (e.g.,
extracting required information from the
documents) and writes the constructed SOIF(s) to
the WWW socket.
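The server-side part of this flow can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the actual StartsServer code: the WaisSearch interface stands in for the native-method bridge to the modified waissearch, the class StartsFlowSketch and its methods are hypothetical, the translation step is a placeholder, and the returned strings are labels rather than real SOIF objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in (assumption) for the native method that wraps the modified waissearch. */
interface WaisSearch {
    /** Takes an ASCII WAIS query and returns one string per hit. */
    String[] search(String waisQuery);
}

/** Hypothetical sketch of how a service request might be dispatched. */
final class StartsFlowSketch {
    private final WaisSearch wais;

    StartsFlowSketch(WaisSearch wais) {
        this.wais = wais;
    }

    /**
     * Handles one service request and returns labels for the SOIF objects
     * that would be written back. sQuerySoif is used only for QUERY.
     */
    List<String> handle(String service, String sQuerySoif) {
        List<String> out = new ArrayList<>();
        switch (service) {
            case "QUERY": {
                // Translate the SQuery, then hand it to the search bridge.
                String[] hits = wais.search(translate(sQuerySoif));
                out.add("SQResults hits=" + hits.length);
                for (String hit : hits) {
                    // Each hit carries the document pathname, per the text above.
                    out.add("SQDocument " + hit);
                }
                break;
            }
            case "SOURCEMETA":
                out.add("SMetaAttributes");
                break;
            case "SOURCECONTENT":
                out.add("SContentSummary");
                break;
            case "RESOURCEMETA":
                out.add("SResource");
                break;
            default:
                throw new IllegalArgumentException("unknown service: " + service);
        }
        return out;
    }

    /** Placeholder: the real SQuery-to-WAIS translation is not described here. */
    static String translate(String sQuerySoif) {
        return sQuerySoif.trim();
    }
}
```

Passing a fake WaisSearch (for example, a lambda that returns a fixed array of pathnames) lets the dispatch logic be exercised without a running freeWAIS engine.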
Carl Lagoze
lagoze@cs.cornell.edu