PPG is a parser generator for extensible grammars, based on the
CUP
parser generator. It provides the ability to extend an existing base
language grammar written in CUP or PPG with localized, easily
maintained changes.
Distribution of PPG
PPG was designed and written by Michael Brukman, Andrew C. Myers, and Chinawat
Isradisaikul at Cornell University. It is distributed as part of the
Polyglot extensible
compiler framework.
PPG syntax
The PPG grammar syntax extends that of CUP with several new declarations:
- The first line of a PPG specification is where filename
is a relative path to the inherited specification. A PPG spec can
include a CUP spec or another PPG spec. It is an error for the
chain of included files to contain a cycle.
- A PPG grammar specification can modify the inherited grammar
using the following commands:
- precedence [ left | right |
nonassoc ] tokenlist;
is the syntax
for specifying new precedence rules, which is exactly as they
are specified in CUP. Specification of new precedence rules
in a PPG specification replaces the precedence rules in the
inherited grammar with these.
- precedence;
deletes all precedence rules
from the inherited grammar, and is mutually exclusive with
the syntax above: to add new precedence rules use the above
syntax, use of this statement will only remove precedence
rules.
- drop { symbol }
where symbol is an inherited terminal or
nonterminal.
The specified symbol is removed from grammar, and
if a nonterminal, all productions where it is on the
left-hand side are also eliminated. It is an error to drop
a non-terminal and not drop productions where it is mentioned on the
right-hand side.
- drop { S ::= <
productions> ; }
where S
is an inherited nonterminal, and productions
are inherited productions. The specified productions are not
inherited from the base grammar. The nonterminal remains in the
grammar, even if ALL of its productions are dropped in this way;
drop { S } must be used if the
nonterminal is to be eliminated.
- override S ::= <
productions> ;
The specified productions replace productions of S.
- extend S ::= <
productions> ;
The specified productions are added to the nonterminal S.
- transfer S to
A1 { rhs1 } to
An { rhsn }
where
rhsi is one or more right-hand sides
of productions of S. Each of the nonterminals
Ai is extended with productions as specified,
and the transferred productions are not inherited
by S, so a single production can be transferred to
multiple nonterminals. Note that S may be one of the
Ai, which has the effect of retaining the
productions in S.
- New terminals, nonterminals, and productions can be defined
as in CUP and extend the base grammar.
- PPG supports multiple start symbols in grammars. To specify
which symbols may be used as start symbols, PPG provides the
following syntax:
start with S1
func1
...
start with Sn
funcn
where each Si is a non-terminal symbol that
you would like to start the parser with, every time you call
the function funcn. PPG will automatically
generate a new nonterminal symbol for each of these productions
to differentiate between the different nonterminals that the
parsing may start with now, and patch the grammar accordingly.
After the parser class is instantiated, any of the
funci can be called on it to parse the appropriate
part of the grammar.
Note: if you are using multiple inheritance, make sure to specify the
symbols class you are planning to use with the -symbols
switch. See the section Running PPG.
Running PPG
The syntax for invocation of PPG is
PPG [-symbols <class name>]
<grammar file>
where
- -symbols <class name>
is an optional switch which specifies the name of the constant class,
as in CUP, and is only applicable if you are using multiple inheritance.
You should specify the same class name to PPG as you will to CUP
when you invoke it on the resulting output, because PPG will use that
class name. The default class name is "sym", as it is in CUP.
- <grammar file>
is either a standard CUP file or a PPG-based grammar inheritance file.
The inherited base grammar file is searched for relative to the location
of the PPG source file.
Semantics of PPG
PPG is declarative: the order in which commands are specified is not
significant. The resulting productions for any nonterminal N is as follows:
N' = N – drop(N) – transferL(N) + override(N) + transferR(N) +
extend(N) + newProductions(N)
The resulting available set of terminals is simply:
T' = T – drop(T) + newTerminals
where T is the set of terminals from base grammar, drop(T) is the
set of dropped terminals using the drop command, newTerminals
are terminals defined using CUP syntax. The resulting set of nonterminals
is similar.
For precedence rules:
if "
precedence;" is specified by the PPG grammar, thus
not inheriting any precedence rules from the base grammar, or
where any newly defined precedence rules override any inherited precedence
rules from the base grammar.
Debugging PPG grammars
PPG has a nearly unique feature that helps debug grammars: it generates
explanatory counterexamples showing why grammar conflicts arise, either
because the grammar is ambiguous or because it is simply not LALR. A
counterexample is either a single string of nonterminals and terminals that can
be parsed in two ways, or a pair of such strings that are identical up to the
point of conflict. PPG both reports the string or strings and also
shows how they are derived.
For more information, see the following paper: