Marking up document structure





Standard Generalized Markup Language (SGML) marks up the logical structure of a document in a layout-independent manner [Gol90][HPR92][Org90][SGM86]. A Document Type Definition (DTD) encapsulates the logical structure of a specific class of documents. Thus, SGML provides a notation both for describing classes of structured documents and for encoding documents that belong to the described classes. An advantage of SGML and other grammar-based document representations is that multiple applications can operate on a single document source file. The International Committee on Accessible Documents (ICAD) has been working on defining an accessible DTD, but at present their work does not encompass mathematical content.
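
The idea of one structured source serving several applications can be illustrated with a minimal Python sketch. It is not SGML and does not use any SGML tool: the element names, the CONTENT_MODEL table (a greatly simplified stand-in for a DTD), and the functions conforms, outline, and word_count are all hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass, field

# Hypothetical content model standing in for a DTD: each element name maps
# to the set of child element names it may contain.
CONTENT_MODEL = {
    "article": {"title", "section"},
    "section": {"title", "para"},
    "title": set(),
    "para": set(),
}

@dataclass
class Element:
    name: str
    text: str = ""
    children: list = field(default_factory=list)

def conforms(node):
    """Return True if node and all of its descendants obey CONTENT_MODEL."""
    allowed = CONTENT_MODEL.get(node.name)
    if allowed is None:
        return False
    return all(c.name in allowed and conforms(c) for c in node.children)

def outline(node, depth=0):
    """Application 1: list the titles, indented by nesting depth."""
    lines = []
    for child in node.children:
        if child.name == "title":
            lines.append("  " * depth + child.text)
        else:
            lines.extend(outline(child, depth + 1))
    return lines

def word_count(node):
    """Application 2: count the words in the text of the whole tree."""
    return len(node.text.split()) + sum(word_count(c) for c in node.children)

# A single document source, encoded once as a tree of logical units.
doc = Element("article", children=[
    Element("title", "Electronic documents"),
    Element("section", children=[
        Element("title", "Marking up document structure"),
        Element("para", "SGML marks up logical structure."),
    ]),
])
```

Here conforms(doc) checks the tree against the content model, while outline(doc) and word_count(doc) are two unrelated applications that read the same encoded document without knowing anything about its layout.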

Though many government agencies now use SGML to mark up a variety of documents, it still has very little support for marking up technical content, e.g., mathematics. There is ongoing work to remedy this situation. Over the last year, the SGML-Math committee has been working on a math DTD for SGML. This work is not yet complete, but it has raised a few interesting issues. The main point of discussion has been whether it is possible to design a math DTD that captures semantic information about the mathematical constructs being marked up. Though it would be useful to have the full semantic content of each mathematical construct available when processing the document, e.g., in our case when producing audio renderings, this seems almost unattainable. There is as yet no firm agreement on this point, but the trend seems to be towards a math DTD that captures layout as embodied by TeX. A DTD that captured full mathematical semantics would make it difficult to invent new notation. TeX, by capturing only the layout constructs used to build up written mathematics, side-steps this issue, and the resulting system makes it easy to invent new notation. However, capturing only layout also makes it more difficult to recognize the mathematical content being expressed. Some of the problems present in (La)TeX are being addressed by ongoing work on the project.

Significant work has been carried out in the context of structure-sensitive editors for documents. This work has focused on the design of appropriate document encodings that capture high-level structure unambiguously. Another topic of interest has been the capture of hypertext links within structured documents. The logical structure of a document is typically captured using a tree-like representation consisting of hierarchical units. The challenge of integrating this model with the notion of hypertext links has been successfully addressed by the design of HyperText Markup Language (HTML), an SGML-based markup system for encoding structured hypertext documents. Finally, the aim of achieving the best of both worlds, i.e., the power afforded by a grammar-based markup system and the user interface provided by WYSIWYG (what you see is what you get) systems, has led to work on providing multiple synchronized views of a document [Har88]. See [Bro88][QNA90][PS88][Ver90][BG90][CJ90][KS84][Ass86][Kat87][FS89][SFR92][Lev88][FBN+90][SF90][SF88][PI88][KLMN90][BB90][LG90][QV92] for details on relevant work in this area.






TV Raman
Thu Mar 9 20:10:41 EST 1995