Toward a Taxonomy of Logical Document Structures

In Electronic Publishing and the Information Superhighway: Proceedings of the Dartmouth Institute for Advanced Graduate Studies (DAGS '95), pages 124-133, Boston, May 1995.

Abstract
The automated discovery of logical structure in text documents is an important problem that has recently received a good deal of attention. Solving it can enable the creation of flexible and sophisticated document manipulation tools that will greatly increase the impact of electronic documents. This paper addresses aspects of the nature of these logical structures, in order to develop categories of structures that reflect how structures vary in their requirements for discovery and in their significance for applications. A complete taxonomy is not developed, but relevant attributes are identified in three forms of categorization: fundamental, based on structure definitions; discovery, based on the observables required to find structures; and usage, based on the roles structures play in applications. The attributes themselves are independent of the choice of particular logical structures to consider in a given application, and their direct implications are discussed.
