STARTS
Stanford Protocol Proposal for Internet Search and Retrieval
Reference Implementation
Implementation Notes
This page describes some features of the STARTS reference implementation of interest to
developers and testers. Information contained within is:
STARTS Release 1.1: Extended Attribute Set Support, with Dublin Core demonstration
The Java StartsServer application is organized into nine packages and a
unpackaged main class. The main class is called Server. The packages
are as follows:
- http - code specific to the HTTP protocol.
- parsing - classes used for parsing ranking and filter expressions in STARTS
queries.
- query - classes used for processing of STARTS queries that is not specific to
any engine or source.
- regexp - methods for manipulating Perl-like regular expressions.
- resource - classes specific to the resource and source used in the reference
implementation.
- soif - classes for reading and outputting SOIF formatted data.
- STARTSConfiguration - general server configuration data.
- wais - classes specific to the freeWAIS search engine.
Note: no resource-specific or search-engine-specific code is located outside the wais
and resource packages. Full documentation on these
packages is available in javadoc format.
The reference implementation was built with, and/or uses at runtime, a number of publicly
available software packages:
- freeWAIS-sf search engine
freeWAIS-sf as used in the STARTS reference implementation is unchanged from the
distributed version.
- jb - Java Bison Parser
generator from the CU Arcadia Project.
jb is used to generate Java source files from .lex and .y files. The respective .y and
.lex definitions are included in the StartsServer distribution, in the parsing/ranking
and parsing/filter directories (holding the scanning and parsing
definitions for ranking and filter expressions respectively). These directories also
contain the generated Java source files: YYparse.java, YYlex.java, and YYtokentypes.java.
If you want to regenerate these files, you will need to read the jb documentation
(jb - Java Bison Parser). Also note that after you generate the files using jb, you will
have to make a few manual changes to integrate the files into the StartsServer code:
- Add to each file the proper package definition - this is package
parsing.filter for the generated filter expression files and package
parsing.ranking for the generated ranking expression files.
- Replace the import jbf.* statement in all files with import parsing.*.
- In the generated YYparse.java file for both filter and ranking expressions,
change the declaration of yyval from protected to public.
- Jonathan Payne's Regular
Expression Package for Java
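The three manual changes to the jb-generated files listed above could also be scripted. The following is only a sketch of that idea, not part of the distribution: the class name JbPostProcessSketch, the method patch, and the assumed shape of the yyval declaration (a single line beginning with protected) are all hypothetical, so check the output against your generated files before relying on it.

```java
import java.util.regex.Pattern;

final class JbPostProcessSketch {

    // Assumption: yyval is declared on one line that starts with "protected".
    // The real declaration in YYparse.java may look different.
    private static final Pattern YYVAL_DECL =
        Pattern.compile("(?m)^(\\s*)protected(\\s+[^;=()\\n]*\\byyval\\b)");

    /**
     * Applies the manual integration steps to the text of one generated file.
     *
     * @param source      contents of YYparse.java, YYlex.java, or YYtokentypes.java
     * @param packageName "parsing.filter" or "parsing.ranking"
     * @param isParser    true only for YYparse.java, where yyval must become public
     */
    static String patch(String source, String packageName, boolean isParser) {
        // Step 2: replace the jbf import with the StartsServer parsing package.
        String patched = source.replace("import jbf.*;", "import parsing.*;");
        // Step 3: widen the visibility of yyval in the parser class.
        if (isParser) {
            patched = YYVAL_DECL.matcher(patched).replaceAll("$1public$2");
        }
        // Step 1: prepend the package definition.
        return "package " + packageName + ";\n\n" + patched;
    }
}
```

You would read each generated file into a string, call patch with the matching package name, and write the result back to the same file.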
freeWAIS does not natively handle separate ranking and filter expressions, so the
reference implementation emulates them. Both the STARTS filter
and ranking expressions are translated to WAIS queries. The two distinguishing components
of STARTS ranking expressions are handled as follows:
- The list operator is translated to an "ored" set of terms - for
example, the STARTS ranking expression list((body-of-text
"distributed")(body-of-text "database")) is translated to the WAIS
query (bd=distributed) or (bd=database).
- The weighted ranking syntax in STARTS is converted to term repetition: each term is
repeated according to the integer factor of its weight relative to the lowest weight.
That is, the STARTS ranking expression list(("distributed" 0.7)("database"
0.3)) is translated to the WAIS query (distributed or distributed or database).
Following this translation, both the filter and ranking expressions are submitted to the
WAIS engine. All "hits" from the filter expression are returned with their
scores modified as follows. If the "hit" appears in the result set from the
ranking expression, the score is set to that of the ranking expression. Otherwise, the
score is set to 0.
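The weight translation and the score merging described above can be sketched as follows. This is an illustration, not the code from the wais package: the class and method names are hypothetical, document identifiers are represented as plain strings, and the "integer factor" is assumed to mean rounding the weight ratio down, which agrees with the 0.7 / 0.3 example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RankingTranslationSketch {

    /**
     * Turns weighted terms into an "ored" WAIS query by repeating each term
     * floor(weight / lowest weight) times. Weights are assumed to be positive.
     */
    static String weightedListToWais(Map<String, Double> weightedTerms) {
        double lowest = Double.MAX_VALUE;
        for (double weight : weightedTerms.values()) {
            lowest = Math.min(lowest, weight);
        }
        List<String> repeated = new ArrayList<>();
        for (Map.Entry<String, Double> entry : weightedTerms.entrySet()) {
            int copies = (int) Math.floor(entry.getValue() / lowest);
            for (int i = 0; i < copies; i++) {
                repeated.add(entry.getKey());
            }
        }
        return "(" + String.join(" or ", repeated) + ")";
    }

    /**
     * Returns every filter hit, scored with the ranking score when the same
     * document also appears in the ranking result set, and with 0 otherwise.
     */
    static Map<String, Integer> mergeScores(Map<String, Integer> filterHits,
                                            Map<String, Integer> rankingHits) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        for (String docId : filterHits.keySet()) {
            merged.put(docId, rankingHits.getOrDefault(docId, 0));
        }
        return merged;
    }
}
```

Documents that match only the ranking expression are dropped, because the filter result set alone decides which documents are returned.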
Let's review the components of the reference implementation:
- The httpd server, which accepts HTTP STARTS requests -- communicates using CGI
with...
- The perl CGI script, which bridges from HTTP requests to STARTS requests
-- communicates via a TCP socket (default 6789) with...
- The StartsServer application, which does all STARTS processing except for the
actual searching -- communicates via a Java native method call with...
- The modified freeWAIS waissearch code, which takes the ASCII WAIS search string
from StartsServer and turns it into a WAIS (quasi-Z39.50) query, and which turns
the WAIS results into an ASCII result list for use by StartsServer -- communicates
via a TCP socket (hard-coded as 5000 in StartsServer) with ...
- The freeWAIS server.
These components are all intended to be run on a Solaris machine, since freeWAIS
has not been ported to NT. Yes, I know that I could run the freeWAIS server on
Solaris and communicate with it from NT via sockets. However, I would still need to port
the waissearch component to NT.
BUT, I wanted to use Symantec Café as my development environment, and this only runs
on a Windows platform. To make the code run on both platforms, NT and Solaris, I've done
the following:
- Isolated a few places in the StartsServer
code where the code varies according to the platform it is run on.
- Supplied a dummy server to bridge between an NT
resident StartsServer and Solaris resident freeWAIS.
- Supplied a dummy client for talking to the StartsServer
application independently of an HTTP server.
Code Differences between NT and Solaris StartsServer
There are a few code fragments that are specific to the platform
that StartsServer is running on. These fragments are all delimited by the comments:
/* !!!!! PLATFORM SPECIFIC CODE !!!!! */
/* !!!!! END PLATFORM SPECIFIC CODE !!!!! */
The NT specific code is preceded by the comment:
/* !!!!! NT VERSION !!!!! */
The Solaris specific code is preceded by the comment:
/* !!!!! SOLARIS VERSION !!!!! */
These code fragments are located in the following files:
You should go through these files and uncomment the code for the
appropriate platform and comment out the code for the other platform.
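As an illustration of the marker layout, a platform-specific fragment set to the Solaris version might look roughly like this. The class, the method, and the returned strings are hypothetical placeholders; only the marker comments are taken from the StartsServer sources.

```java
final class PlatformSwitchSketch {

    /** Hypothetical method: names the transport used to reach the waissearch code. */
    static String waisTransport() {
        /* !!!!! PLATFORM SPECIFIC CODE !!!!! */

        /* !!!!! NT VERSION !!!!! */
        // return "socket to DummyServer on port 6790";

        /* !!!!! SOLARIS VERSION !!!!! */
        return "native method call";

        /* !!!!! END PLATFORM SPECIFIC CODE !!!!! */
    }
}
```

To switch this fragment to NT, you would uncomment the line under the NT marker and comment out the line under the Solaris marker.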
Dummy Server Bridge
In the main StartsServer source directory, you will find a Java source file
called DummyServer.java. This is a simple server that listens on port 6790. This
corresponds to the port opened in the NT specific code in wais/WAISSourceDescription.java.
This dummy server accepts the ASCII WAIS search strings over the socket and then uses a
native call to talk to the waissearch code.
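The general shape of such a bridge can be sketched as below. This is not the DummyServer.java source. The sketch assumes a line-based exchange (one query line in, one result line out), which may not match the real wire format, and it replaces the native waissearch call with a stand-in function. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

final class BridgeServerSketch {

    /** Reads one query line from the client and writes the search result back as one line. */
    static void handle(Reader from, Writer to, Function<String, String> search) throws IOException {
        BufferedReader in = new BufferedReader(from);
        PrintWriter out = new PrintWriter(to, true);
        String query = in.readLine();
        if (query != null) {
            out.println(search.apply(query));
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the native waissearch call, which is not reproduced here.
        Function<String, String> search = query -> "no results for: " + query;
        // 6790 is the port the NT-specific StartsServer code connects to.
        try (ServerSocket listener = new ServerSocket(6790)) {
            while (true) {
                try (Socket client = listener.accept()) {
                    handle(new InputStreamReader(client.getInputStream(), StandardCharsets.US_ASCII),
                           new OutputStreamWriter(client.getOutputStream(), StandardCharsets.US_ASCII),
                           search);
                }
            }
        }
    }
}
```

Keeping the per-connection logic in handle, separate from the socket loop, lets it be exercised with in-memory readers and writers without opening a port.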
Dummy Client
In the main StartsServer source directory, you will find a Perl script client.pl
that you can run under Perl5 (on either Solaris or NT) to talk to StartsServer.
This script takes two arguments, the <host> and <port> of StartsServer.
The script will prompt you for the filter and ranking expression and the source on which
the search should be done (cstr or linux).
To summarize, to test StartsServer on an NT machine (under Café) do the
following:
- Copy the StartsServer directory tree over to Solaris.
- Start up freeWAIS on Solaris on port 5000 with the argument to the -d option
being the location of the index files (e.g. waisserver -p 5000 -d <indexdir>).
- Start up the bridge server (DummyServer) on Solaris and it will listen on port
6790.
- Make sure the code fragments in the StartsServer version on your NT machine are
set to the NT code.
- Start up StartsServer in the Café debugger.
- Use the Perl script client.pl to talk to the running StartsServer
application (arguments should be <host> <port> of the StartsServer
application.)
Send questions to help@ncstrl.org