We refine the quasi-prefix form by adding the following subtypes. This makes recognizing and handling complex mathematical content cleaner.
We first introduce object math subformula, which captures subexpressions appearing within the [tex2html_wrap5306] and [tex2html_wrap5308] of (La)TeX. Object math subformula can be thought of as the math equivalent of object text block, described in s:high-level-models. It has the following structure:
We need object math subformula to represent expressions of the form:
[displaymath5302]
[displaymath5303]
In representing each of the above examples, object math subformula is essential in capturing the expression to which the overbrace/underbrace applies.
To enable recognition of written mathematics, tokens have to be classified appropriately. Our classification of tokens when processing written mathematics is inspired by Appendix F of The TeXbook [Knu84].
The symbols divide naturally into groups based on their mathematical class (Ord, Op, Bin, Rel, Open, Close, or Punct), [tex2html_wrap5310]
We introduce subtypes of object math object to correspond to each token type:
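As an illustration of this kind of token classification, the following Python sketch maps a handful of tokens to the seven TeX math classes named above. It is not the system's own code: the names `MathClass`, `DEFAULT_CLASSES` and `classify` are hypothetical, the table lists only a few familiar TeX assignments, and falling back to Ord for unlisted tokens is an assumption made for the sketch.

```python
from enum import Enum


class MathClass(Enum):
    """The seven TeX math classes used to classify tokens."""
    ORD = "Ord"
    OP = "Op"
    BIN = "Bin"
    REL = "Rel"
    OPEN = "Open"
    CLOSE = "Close"
    PUNCT = "Punct"


# A small, deliberately incomplete table of standard TeX class assignments.
DEFAULT_CLASSES = {
    "+": MathClass.BIN,
    r"\times": MathClass.BIN,
    "=": MathClass.REL,
    "<": MathClass.REL,
    "(": MathClass.OPEN,
    ")": MathClass.CLOSE,
    ",": MathClass.PUNCT,
    r"\sum": MathClass.OP,
}


def classify(token):
    """Return the class of a token; unlisted tokens default to Ord (assumption)."""
    return DEFAULT_CLASSES.get(token, MathClass.ORD)
```

With this table, `classify("+")` yields `MathClass.BIN`, while a plain letter such as `"x"` falls through to `MathClass.ORD`.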
Written mathematical notation uses juxtaposition as an infix operator. Juxtaposition, as in [tex2html_wrap5340], mostly denotes multiplication, but can mean function application in certain contexts, as in [tex2html_wrap5342]. We introduce a new operator to represent juxtaposition, and to define it precisely, we also assert that all mathematical variables are single letters. Thus, [tex2html_wrap5344] is represented as the juxtaposition of three ordinary objects. This assertion is not specific to our internal representation; rather, it specifies the concrete syntax used in the electronic markup and reflects the choice made in the design of TeX. We do allow mathematical variables made up of more than one character, but these should be clearly marked up as such, e.g., as [tex2html_wrap5346], by using \mbox, as in $\mbox{cab}=cab$.
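The single-letter convention can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the recognizer itself: the tuple-based representation and the names `tokenize` and `group_juxtaposition` are hypothetical, and the tokenizer handles only letters, `\mbox{...}` groups, control sequences and single characters.

```python
import re

# \mbox{...} groups first, then other control sequences, then single characters.
TOKEN = re.compile(r"\\mbox\{([^{}]*)\}|\\[A-Za-z]+|\S")


def tokenize(math_source):
    """Split math markup into tokens; each bare letter is its own variable."""
    tokens = []
    for match in TOKEN.finditer(math_source):
        if match.group(1) is not None:
            # \mbox{cab} marks a single multi-character variable.
            tokens.append(("variable", match.group(1)))
        elif match.group(0).isalpha():
            tokens.append(("variable", match.group(0)))
        else:
            tokens.append(("other", match.group(0)))
    return tokens


def group_juxtaposition(tokens):
    """Wrap each run of two or more adjacent variables in a juxtaposition node."""
    result, run = [], []
    for token in tokens + [None]:  # None flushes the final run
        if token is not None and token[0] == "variable":
            run.append(token[1])
            continue
        if len(run) > 1:
            result.append(("juxtaposition", run))
        elif run:
            result.append(("variable", run[0]))
        run = []
        if token is not None:
            result.append(token)
    return result


parsed = group_juxtaposition(tokenize(r"\mbox{cab}=cab"))
# parsed == [("variable", "cab"), ("other", "="),
#            ("juxtaposition", ["c", "a", "b"])]
```

The left-hand side is kept as one variable because of the \mbox markup, whereas the unmarked letters on the right-hand side become the juxtaposition of three separate objects.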
The classification of a math object is defined using the following command: (define-math-classification token classification)
In certain special cases, the predefined classification shown above can be modified. A good example of this is recognizing a mathematical text that consistently uses the letters [tex2html_wrap5348], [tex2html_wrap5350] and [tex2html_wrap5352] to denote functions. Using the predefined classification, the recognizer would treat [tex2html_wrap5354] as object ordinary, leading to [tex2html_wrap5356] being represented as the juxtaposition of two objects, namely, [tex2html_wrap5358] and [tex2html_wrap5360]. Declaring [tex2html_wrap5362] to be a mathematical function by executing (define-math-classification f mathematical-function-name)
results in occurrences of [tex2html_wrap5364] being treated as a function. Hence, [tex2html_wrap5366] is correctly recognized as a function application. Note that the correct interpretation of such notation is more important for browsing than for speaking the expression.
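The effect of such a declaration can be mimicked with a small Python sketch. It is a hypothetical model rather than the actual implementation of define-math-classification: the registry `_classification`, the helper `interpret`, the default of `"ordinary"` for undeclared tokens, and the argument `"x"` are all assumptions introduced for illustration.

```python
# Hypothetical registry of user-declared classifications: token -> class name.
_classification = {}


def define_math_classification(token, classification):
    """Record (or override) the classification of a token."""
    _classification[token] = classification


def classification_of(token):
    """Undeclared tokens are treated as ordinary objects (assumption)."""
    return _classification.get(token, "ordinary")


def interpret(head, argument):
    """Interpret `head` immediately followed by a parenthesized `argument`."""
    if classification_of(head) == "mathematical-function-name":
        return ("function-application", head, argument)
    return ("juxtaposition", head, argument)


before = interpret("f", "x")
# before == ("juxtaposition", "f", "x")

define_math_classification("f", "mathematical-function-name")
after = interpret("f", "x")
# after == ("function-application", "f", "x")
```

Keeping the declarations in a table that is consulted at recognition time means that the same markup can be read differently from one document to the next, without changing the recognizer.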