STARTS
Stanford Protocol Proposal for Internet Search and Retrieval
Reference Implementation
Implementation Notes
This page describes some features of the STARTS reference
implementation that are of interest to developers and testers.
The Java StartsServer application is organized into
nine packages and an unpackaged main class. The main
class is called Server. The packages are as follows:
- http - code specific to the http protocol.
- parsing - classes used for parsing ranking and
filter expressions in STARTS queries.
- query - classes used for processing of STARTS queries
that is independent of the search engine and source.
- regexp - methods for manipulating Perl-like
regular expressions.
- resource - classes specific to the resource and
source used in the reference implementation.
- soif - classes for reading and outputting SOIF
formatted data.
- STARTSConfiguration - general server configuration
data.
- wais - classes specific to the freeWAIS search
engine.
Note: no resource-specific or search-engine-specific code is located
outside the wais and resource packages. Full documentation on these
packages is available in javadoc format.
The reference implementation was built with, and/or uses at runtime,
a number of publicly available software packages:
- freeWAIS-sf search engine
freeWAIS-sf as used in the STARTS reference
implementation is unchanged from the distributed version.
- jb - Java Bison Parser generator from the CU Arcadia
Project.
jb is used to generate Java source files from .lex and .y
files. The respective .y and .lex definitions are
included in the StartsServer distribution, in
the parsing/ranking and parsing/filter
directories (containing the scanning and parsing
definitions for ranking and filter expressions
respectively). These directories also contain the
generated Java source files: YYparse.java, YYlex.java,
and YYtokentypes.java. If you want to regenerate these
files, you will need to read the jb documentation,
available on the jb - Java Bison Parser page. Also note
that after you generate the files using jb, you will have
to make a few manual changes to integrate the files into
the StartsServer code:
- Add to each file the proper package
definition: package parsing.filter
for the generated filter expression files and package
parsing.ranking for the generated ranking
expression files.
- Replace the import jbf.* statement in
all files with import parsing.*.
- In the generated YYparse.java file for
both filter and ranking expressions, change the
declaration of yyval from protected
to public.
- Jonathan Payne's Regular
Expression Package for Java
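The three manual changes to the jb-generated files listed above could also be scripted. The following is only a sketch of that idea, not part of the StartsServer distribution: the class name JbOutputPatcher is hypothetical, and it assumes that yyval is declared on a single line beginning with the protected modifier and that the generated file does not already carry a package statement.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: applies the three manual edits to one jb-generated file. */
final class JbOutputPatcher {
    // Matches the wildcard import emitted by jb.
    private static final Pattern JBF_IMPORT =
            Pattern.compile("import\\s+jbf\\.\\*\\s*;");

    // Assumption: yyval is declared as "protected <type> yyval ...;".
    private static final Pattern YYVAL_DECL =
            Pattern.compile("\\bprotected(\\s+[^;{}()]*\\byyval\\b)");

    /**
     * @param source      text of YYparse.java, YYlex.java or YYtokentypes.java
     * @param packageName "parsing.filter" or "parsing.ranking"
     * @return the source with the package line added, the import replaced,
     *         and the yyval declaration made public
     */
    static String patch(String source, String packageName) {
        String patched = JBF_IMPORT.matcher(source).replaceAll("import parsing.*;");
        // Only YYparse.java declares yyval; other files pass through unchanged here.
        patched = YYVAL_DECL.matcher(patched).replaceAll("public$1");
        return "package " + packageName + ";\n\n" + patched;
    }
}
```

Calling patch once per generated file, with parsing.filter or parsing.ranking as the package name, would cover all three steps; the result should still be reviewed by hand before it is compiled.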
freeWAIS does not natively handle separate ranking and filter
expressions, so this behavior is emulated. Both the STARTS filter and
ranking expressions are translated to WAIS queries. The two
distinguishing components of STARTS ranking expressions are
handled as follows:
- The list operator is translated to an
"ored" set of terms - for example, the STARTS
ranking expression list((body-of-text
"distributed")(body-of-text
"database")) is translated to the WAIS
query (bd=distributed) or (bd=database).
- The weighted ranking syntax in STARTS is converted
to term repetition: each term is repeated according to the
integer ratio of its weight to the lowest weight. For example, the
STARTS ranking expression list(("distributed"
0.7)("database" 0.3)) is translated to the
WAIS query (distributed or distributed or database),
since 0.7 is roughly twice 0.3.
Following this translation, both the filter and ranking
expressions are submitted to the WAIS engine. All
"hits" from the filter expression are returned
with their scores modified as follows: if the
"hit" appears in the result set from the
ranking expression, its score is set to the score from the
ranking expression; otherwise, its score is set to 0.
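The weight-to-repetition rule and the score merging described above can be sketched as follows. This is an illustration only, not the StartsServer code: the class and method names are hypothetical, document identifiers are assumed to be strings, scores are assumed to be doubles, and the weight ratio is assumed to be truncated to an integer (the text does not say whether it is truncated or rounded).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the ranking translation and score merging. */
final class RankingTranslationSketch {

    /** Turns weighted terms into an "ored" WAIS query by repeating terms. */
    static String weightedListToWais(Map<String, Double> termWeights) {
        double lowest = Double.MAX_VALUE;
        for (double w : termWeights.values()) {
            lowest = Math.min(lowest, w);
        }
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Double> e : termWeights.entrySet()) {
            // Assumption: truncate the ratio; the epsilon guards against
            // floating-point results such as 2.9999999999999996.
            int repeats = Math.max(1, (int) (e.getValue() / lowest + 1e-9));
            for (int i = 0; i < repeats; i++) {
                parts.add(e.getKey());
            }
        }
        return "(" + String.join(" or ", parts) + ")";
    }

    /**
     * Returns every filter hit, in filter order. A hit takes its score from
     * the ranking result set if it appears there, and 0 otherwise.
     */
    static Map<String, Double> mergeScores(List<String> filterHits,
                                           Map<String, Double> rankingScores) {
        Map<String, Double> merged = new LinkedHashMap<>();
        for (String doc : filterHits) {
            merged.put(doc, rankingScores.getOrDefault(doc, 0.0));
        }
        return merged;
    }
}
```

With weights of 0.7 and 0.3, the first term is repeated twice and the second appears once. Note that documents matched only by the ranking expression never reach the output, because the filter hits alone decide which documents are returned.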
Let's review the components of the reference implementation:
- The httpd server, which accepts HTTP STARTS
requests -- communicates using CGI with...
- The Perl CGI script, which bridges
from HTTP requests to STARTS requests -- communicates via
a TCP socket (default 6789) with...
- The StartsServer application, which does all
STARTS processing except for the actual searching --
communicates via a Java native method call with...
- The modified freeWAIS waissearch code, which takes
the ASCII WAIS search string from StartsServer and
turns it into a WAIS (quasi-Z39.50) query, and which
turns the WAIS results into an ASCII result list for use
by StartsServer -- communicates via a TCP socket
(hard-coded as 5000 in StartsServer) with...
- The freeWAIS server.
These components are all intended to be run on a Solaris
machine, since freeWAIS has not been ported to NT. Yes, I
know that I could run the freeWAIS server on Solaris and
communicate with it from NT via sockets. However, I would still
need to port the waissearch component to NT.
BUT, I wanted to use Symantec Café as my development
environment, and this only runs on a Windows platform. To make
the code run on both platforms, NT and Solaris, I've done the
following:
- Isolated a few places in the StartsServer
code where the code varies according to the platform it
is run on.
- Supplied a dummy server to bridge
between an NT resident StartsServer and Solaris
resident freeWAIS.
- Supplied a dummy client for
talking to the StartsServer application
independent of an HTTP server.
Code Differences between NT and Solaris StartsServer
There are a few code fragments that are
specific to the platform that StartsServer is running on.
These fragments are all delimited by the comments:
/* !!!!! PLATFORM SPECIFIC CODE !!!!! */
/* !!!!! END PLATFORM SPECIFIC CODE !!!!! */
The NT specific code is preceded by
the comment:
/* !!!!! NT VERSION !!!!! */
The Solaris specific code is preceded by
the comment:
/* !!!!! SOLARIS VERSION !!!!! */
These code fragments are located in the
following files:
You should go through these files and
uncomment the code for the appropriate platform and comment out
the code for the other platform.
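To show how the delimiter comments fit together, here is a made-up fragment with the NT version active and the Solaris version commented out. The class, method, and strings are hypothetical and do not come from the StartsServer source; only the comment markers are taken from the description above. Line comments are used for the inactive branch because Java block comments do not nest.

```java
/** Hypothetical fragment illustrating the platform delimiter comments. */
final class PlatformFragmentSketch {

    /** Describes how the search string would leave StartsServer on this platform. */
    static String describeSearchTransport() {
        String transport;
        /* !!!!! PLATFORM SPECIFIC CODE !!!!! */
        /* !!!!! NT VERSION !!!!! */
        transport = "TCP socket to the bridge server on port 6790";
        /* !!!!! SOLARIS VERSION !!!!! */
        // transport = "native call into the waissearch code";
        /* !!!!! END PLATFORM SPECIFIC CODE !!!!! */
        return transport;
    }
}
```

Switching platforms then means commenting out the line under the NT marker and uncommenting the line under the Solaris marker.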
Dummy Server Bridge
In the main StartsServer source directory, you will
find a Java source file called DummyServer.java. This is a simple
server that listens on port 6790. This corresponds to the port
opened in the NT specific code in
wais/WAISSourceDescription.java. This dummy server accepts the
ASCII WAIS search strings over the socket and then uses a native
call to talk to the waissearch code.
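The general shape of such a bridge can be sketched as below. This is not DummyServer.java: the class name is hypothetical, the one-line-per-request framing is an assumption (the actual wire format is not described here), and the native waissearch call is replaced by a plain function argument so that the sketch runs without freeWAIS.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical sketch of a socket bridge in the style of the dummy server. */
final class BridgeSketch {
    static final int BRIDGE_PORT = 6790; // port named in the text above

    /** Accepts one connection, reads one query line, and writes back the result. */
    static void serveOnce(ServerSocket listener, Function<String, String> search)
            throws IOException {
        try (Socket client = listener.accept();
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     client.getInputStream(), StandardCharsets.US_ASCII));
             PrintWriter out = new PrintWriter(new OutputStreamWriter(
                     client.getOutputStream(), StandardCharsets.US_ASCII), true)) {
            String query = in.readLine(); // assumption: one ASCII line per request
            if (query != null) {
                out.println(search.apply(query)); // stand-in for the native call
            }
        }
    }

    /** Loopback demo: runs the bridge on an ephemeral port and sends it one query. */
    static String loopback(String query, Function<String, String> search) {
        try (ServerSocket listener = new ServerSocket(0)) {
            Thread server = new Thread(() -> {
                try {
                    serveOnce(listener, search);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            server.start();
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(),
                                       listener.getLocalPort());
                 PrintWriter out = new PrintWriter(new OutputStreamWriter(
                         s.getOutputStream(), StandardCharsets.US_ASCII), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                         s.getInputStream(), StandardCharsets.US_ASCII))) {
                out.println(query);
                return in.readLine();
            } finally {
                server.join();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

A long-running variant would open new ServerSocket(BRIDGE_PORT) and call serveOnce in a loop. Passing the search step in as a function keeps the socket handling testable without the native library.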
Dummy Client
In the main StartsServer source directory, you will
find a Perl script client.pl that you can run under Perl5
(on either Solaris or NT) to talk to StartsServer. This
script takes two arguments, the <host> and <port>
of StartsServer. The script will prompt you for the filter
and ranking expressions and for the source on which the search
should be done (cstr or linux).
To summarize, to test StartsServer on an NT machine
(under Café) do the following:
- Copy the StartsServer directory tree over to
Solaris.
- Start up freeWAIS on Solaris on port 5000 with the
argument to the -d option being the location of the index
files (e.g. waisserver -p 5000 -d <indexdir>).
- Start up the bridge server (DummyServer) on
Solaris and it will listen on port 6790.
- Make sure the code fragments in the StartsServer
version on your NT machine are set to the NT code.
- Start up StartsServer in the Café debugger.
- Use the Perl script client.pl to talk to the
running StartsServer application (the arguments should
be the <host> and <port> of the StartsServer
application).
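Before running the client, it can help to confirm that each process in the chain is actually listening. The sketch below is a hypothetical helper, not part of the distribution: it simply tries a TCP connection to each port mentioned on this page (5000 for freeWAIS, 6790 for the bridge, and the default 6789 for StartsServer). The host names passed in are whatever your Solaris and NT machines are called.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical pre-flight check for the NT test setup. */
final class PortCheckSketch {
    static final int FREEWAIS_PORT = 5000;      // waisserver -p 5000
    static final int BRIDGE_PORT = 6790;        // DummyServer on Solaris
    static final int STARTS_SERVER_PORT = 6789; // StartsServer default

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host not resolvable
        }
    }

    /** Builds a one-line-per-service status report for the two machines. */
    static String summary(String solarisHost, String ntHost, int timeoutMillis) {
        return "freeWAIS     " + solarisHost + ":" + FREEWAIS_PORT + " "
                + isListening(solarisHost, FREEWAIS_PORT, timeoutMillis) + "\n"
                + "DummyServer  " + solarisHost + ":" + BRIDGE_PORT + " "
                + isListening(solarisHost, BRIDGE_PORT, timeoutMillis) + "\n"
                + "StartsServer " + ntHost + ":" + STARTS_SERVER_PORT + " "
                + isListening(ntHost, STARTS_SERVER_PORT, timeoutMillis) + "\n";
    }
}
```

A successful connection only shows that something is listening on the port, not that it is the expected program, so this complements the steps above rather than replacing them.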
Carl Lagoze
lagoze@cs.cornell.edu