5 Pattern Matching
Pattern matching provides a concise, convenient way to bind parts of
large objects to new local variables. Two Cyclone constructs use
pattern matching, let declarations and switch statements. Although
the latter are more common, we first explain patterns with
let declarations because they have fewer
complications. Then we describe all the pattern
forms. Then we describe switch
statements.
You must use patterns to access values carried by
tagged unions, including exceptions. In
other situations, patterns make code more readable and less verbose.
Note that this section does not include rules for matching against
unique pointers; this is explained in
Section 8.4.3113Pattern Matching on Unique Pointerssubsubsection.8.4.3.
5.1 Let Declarations
In Cyclone, you can write
let x = e;
as a local declaration. The meaning is the same as t x = e;
where t is the type of e. In other words,
x is bound to the new variable. Patterns are much more
powerful because they can bind several variables to different parts of
an aggregate object. Here is an example:
struct Pair { int x; int y; };
void f(struct Pair pr) {
let Pair(fst,snd) = pr;
...
}
The pattern has the same structure as a struct Pair with parts
being variables. Hence the pattern is a match for pr and the
variables are initialized with the appropriate parts of pr. Hence
``let Pair(fst,snd) = pr'' is equivalent to
``int fst =pr.x; int snd = pr.y''. A let-declaration's
initializer is evaluated only once.
Patterns may be as structured as the expressions against which they
match. For example, given type
struct Quad { struct Pair p1; struct Pair p2; };
patterns for matching against an expression of type struct Quad could
be any of the following (and many more because of constants and
wildcards---see below):
-
Quad(Pair(a,b),Pair(c,d))
- Quad(p1, Pair(c,d))
- Quad(Pair(a,b), p2)
- Quad(p1,p2)
- q
In general, a let-declaration has the form ``let p = e;'' where p is a
pattern and e is an expression. In our example, the match
always succeeds, but in general patterns can have compile-time errors
or run-time errors.
At compile-time, the type-checker ensures that the pattern makes sense
for the expression. For example, it rejects ``let Pair(fst,snd) = 0''
because 0 has type int but the pattern only makes sense for type
struct Pair.
Certain patterns are type-correct, but they may not match run-time
values. For example, constants can appear in patterns, so ``let
Pair(17,snd) = pr;'' would match only when pr.x is 17.
Otherwise the exception Match_Exception is thrown. Patterns
that may fail are rarely useful and poor style in let-declarations;
the compiler emits a warning when you use them. In switch statements,
possibly-failing patterns are the norm---as we explain below, the
whole point is that one of the cases' patterns should match.
5.2 Pattern Forms
So far, we have seen three pattern forms: variables patterns, struct
patterns, and constant patterns. We now describe all the pattern
forms.1 For each
form, you need to know:
-
The syntax
- The types of expressions it can match against (to avoid a
compile-time error)
- The expressions the pattern matches against (other expressions
cause a match failure)
- The bindings the pattern introduces, if any.
There is one compile-time rule that is the same for all forms: All
variables (and type variables) in a pattern must be distinct. For
example, ``let Pair(fst,fst) = pr;'' is not allowed.
You may want to read the descriptions for variable and struct patterns
first because we have already explained their use informally.
-
Variable patterns
-
Syntax: an identifer
- Types for match: all types
- Expressions matched: all expressions
- Bindings introduced: the identifier is bound to the expression
being matched
- Wildcard patterns
-
Syntax: _ (underscore, note this use is completely
independent of _ for type
inference)
- Type for match: all types
- Expressions matched: all expressions
- Bindings introduced: none. Hence it is like a
variable pattern that uses a fresh identifier. Using _
is better style because it indicates the value matched is not
used. Notice that ``let _ = e;'' is equivalent to
e.
- As patterns
-
Syntax: x as p where x
is an identifier and p is a pattern.
- Types for match: all types
- Expressions matched: all expressions
- Bindings introduced: if the expression matches the pattern p,
then its value is bound to x. Thus, a variable pattern is
simply shorthand for ``x as _''.
- Reference patterns
-
Syntax: *x (i.e., the * character followed by an
identifier)
- Types for match: all types
- Expressions matched: all expressions. (Very subtle notes: Currently,
reference patterns may only appear inside of other patterns so
that the compiler can determine the region for the pointer type
assigned to x. They also may not occur under a
datatype pattern that has existential types unless there is a
pointer pattern in-between.)
- Bindings introduced: x is bound to the address of
the expression being matched. Hence if matched against a value
of type t in region `r, the type of x is
t@`r.
- Numeric constant patterns
-
Syntax: An int, char, or float constant
- Types for match: numeric types
- Expressions matched: numeric values such that ==
applied to the value and the pattern yields true. (Standard C
numeric promotions apply. Note that comparing floating point
values for equality is usually a bad idea.)
- Bindings introduced: none
- NULL constant patterns
-
Syntax: NULL
- Types for match: nullable pointer types, including ? types
- Expressions matched: NULL
- Bindings introduced: none
- Enum patterns
-
Syntax: an enum constant
- Types for match: the enum type containing the constant
- Expressions matched: the constant
- Bindings introduced: none
- Tuple patterns
-
Syntax: $(p1,...,pn[,...]) where p1,...,pn are patterns.
The trailing comma and ellipses (...) are optional.
- Types for match: if no ellipses, then tuple types with exactly
n fields, where pi matches
the type of the tuple's ith field. If the ellipses are present,
then matches a tuple with at least n fields.
- Expressions matched: tuples where the ith field matches pi for
i between 1 and n.
- Bindings introduced: bindings introduced by p1, ...,
pn.
- Struct patterns
-
Syntax: There are three forms:
-
X(p1,...,pn[,...]) where X is the name of a struct
with n fields and p1,...,pn are patterns. This syntax is
shorthand for
X{.f1 = p1, ..., .fn = pn [,...]}
where fi is the
ith field in X.
X{.f1 = p1, ..., .fn = pn [,...]}
where the fields of X
are f1, ..., fn but not necessarily in that order
{.f1 = p1, ..., .fn = pn [,...]}
which is the same as above
except that struct name X has been omitted.
- Types for match: struct X (or instantiations when
struct X is polymorphic) such that pi matches the type of
fi for i between 1 and n. If the ellipses are not
present, then each member of the struct must have a pattern.
- Expressions matched: structs where the value in fi matches pi
for i between 1 and n.
- Bindings introduced: bindings introduced by p1,...,pn
- Tagged Union patterns
-
Syntax: There are two forms:
-
X{.fi = p}
where the members of X
are f1, ..., fn and fi is one of those members.
{{.f1 = p
which is the same as above
except that union name X has been omitted.
- Types for match: union X (or instantiations when
union X is polymorphic) such that p matches the type of
fi.
- Expressions matched: tagged unions where the last member
written was fi and the value of that member matches p.
- Bindings introduced: bindings introduced by p.
- Pointer patterns
-
Syntax: &p where p is a pattern
- Types for match: pointer types, including ? types.
Also datatype Foo @ (or instantiations of it) when the pattern
is &Bar(p1,...,pn) and Bar is a
variant of datatype Foo and pi matches the type of the ith
value carried by Bar.
- Expressions matched: non-null pointers where the value pointed
to matches p. Note this explanation includes the case where the
expression has type datatype Foo @ and the pattern is
&Bar(p1,...,pn) and the current variant of the expression is
``pointer to Bar''.
- Bindings introduced: bindings introduced by p
- Datatype patterns
-
Syntax: X if X is a variant that carries no values.
Else X(p1,...,pn[,...]) where X is the name of a variant
and p1, ..., pn are patterns. As with tuple and struct patterns,
the ellipses are optional.
- Types for match: datatype Foo (or instantiations of
it).
- Expressions matched: If X is non-value-carrying, then
X. If X is value-carrying, then values created from
the constructor X such that pi matches the ith field.
- Bindings introduced: bindings introduced by p1,...,pn
5.3 Switch Statements
In Cyclone, you can switch on a value of any type and the case
``labels'' (the part between case and the colon) are patterns. The
switch expression is evaluated and then matched against each pattern
in turn. The first matching case statement is executed. Except for
some restrictions, Cyclone's switch statement is therefore a powerful
extension of C's switch statement.
Restrictions
-
You cannot implicitly ``fall-through'' to the next case.
Instead, you must use the fallthru; statement, which has
the effect of transferring control to the beginning of the next
case. There are two exceptions to this restriction: First, adjacent
cases with no intervening statement do not require a fall-through.
Second, the last case of a switch does not require a fall-through
or break.
- The cases in a switch must be exhaustive; it is a
compile-time error if the compiler determines that it could be that
no case matches. The rules for what the compiler determines are
described below.
- A case cannot be unreachable. It is a compile-time error
if the compiler determines that a later case may be subsumed by an
earlier one. The rules for what the compiler determines are
described below. (C almost has this restriction because case labels
cannot be repeated, but Cyclone is more restrictive. For example, C
allows cases after a default case.)
- The body of a switch statement must be a sequence of case
statements and case statements can appear only in such a
sequence. So idioms like Duff's device
(such as ``switch(i%4) while(i-- >=0) { case 3: ... }'')
are not supported.
- A constant case label must be a constant, not a constant
expression. That is, case 3+4: is allowed in C, but not in
Cyclone. Cyclone supports this feature with a separate construct:
switch "C" (e) { case 3+4: ... }. This construct is much
more like C's switch: The labels must be constant
numeric expressions and fallthru is never required.
An Extension of C
Except for the above restrictions, we can see Cyclone's switch is an
extension of C's switch. For example, consider this code (which has
the same meaning in C and Cyclone):
int f(int i) {
switch(i) {
case 0: f(34); return 17;
case 1: return 17;
default: return i;
}
}
In Cyclone terms, the code tries to match against the constant 0. If
it does not match (i is not 0), it tries to match against the pattern
1. Everything matches against default; in fact, default is just
alternate notation for ``case _'', i.e., a case with a
wildcard pattern. For performance reasons,
switch statements that are legal C switch statements are translated to
C switch statements. Other switch statements are translated to,
``a mess of tests and gotos''.
We now discuss some of the restrictions in terms of the above example.
Because there is no ``implicit fallthrough'' in non-empty cases, the
return statement in case 0 cannot be omitted. However, we can replace
the ``return 17;'' with ``fallthru;'' a special Cyclone statement that
immediately transfers control to the next case. fallthru does not
have to appear at the end of a case body, so it acts more like a goto
than a fallthrough. As in our example, any case that matches all
values of the type switched upon (e.g., default:,
case _:,
case x:) must appear last, otherwise later cases would be
unreachable. (Note that other types may have even more such patterns.
For example Pair(x,y) matches all values of type
struct Pair int x; int y;).
Much More Powerful
Because Cyclone case labels are patterns, a switch statement can match
against any expression and bind parts of the expression to variables.
Also, fallthru can (in fact, must) bind values to the next
case's pattern variables. This silly example demonstrates all of
these features:
extern int f(int);}
int g(int x, int y) {
// return f(x)*f(y), but try to avoid using multiplication
switch($(f(x),f(y))) {
case $(0,_): fallthru;
case $(_,0): return 0;
case $(1,b): fallthru(b+1-1);
case $(a,1): return a;
case $(a,b): return a*b;
}
}
The only part of this example using a still-unexplained feature is
``fallthru(b)'', but we explain the full example anyway. The switch
expression has type $(int,int), so all of the cases must
have patterns that match this type. Legal case forms for this type
not used in the example include ``case $(_,id):'',
``case $(id,_):'',
``case id:'',
``case _:'',
and
``default:''.
The code does the following:
-
It evaluates the pair $(f(x), f(y)) and stores the
result on the stack.
- If f(x) returned 0, the first case matches, control jumps to the second
case, and 0 is returned.
- Else if f(y) returned 0, the second case matches and 0 is returned.
- Else if f(x) returned 1, the third case matches, b is assigned the value
f(y) returned, control jumps to the fourth case after assigning b+1-1 to a,
and a (i.e., b + 1 - 1, i.e., b, i.e., f(y)) is returned.
- Else if f(y) returned 1, the fourth case matches, a is assigned the value
f(x) returned, and a is returned.
- Else the last case matches, a is assigned the value f(x) returned, b is
assigned the value f(y) returned, and a*b is returned.
Note that the switch expression is evaluated only once.
Implementation-wise, the result is stored in a compiler-generated
local variable and the value of this variable is used for the ensuring
pattern matches.
The general form of fallthrus is as follows: If the next case has no
bindings (i.e., identifiers in its pattern), then you must write
fallthru;. If the next case has n bindings, then you must
write fallthru(e1,...,en) where each ei is an expression with
the appropriate type for the ith binding in the next case's pattern,
reading from left to right. (By appropriate type, we mean the type of
the expression that would be bound to the ith binding were the next
case to match.) The effect is to evaluate e1 through en, bind them to
the identifiers, and then goto the body of the next case.
fallthru is not allowed in the last case of a switch, not
even if there is an enclosing switch.
We repeat that fallthru may appear anywhere in a case body, but it is
usually used at the end, where its name makes the most sense. ML
programmers may notice that fallthru with bindings is strictly more
expressive than or-patterns, but more verbose.
Case Guards
We have withheld the full form of Cyclone case labels. In addition to
case p: where p is a pattern, you may write case p && e: where
p is a pattern and e is an expression of type int. (And since
e1 && e2 is an expression, you can write
case p && e1 && e2: and so on.) Let's call e the case's
guard.
The case matches if p matches the expression in the switch and e
evaluates to a non-zero value. e is evaluated only if p matches and
only after the bindings caused by the match have been properly
initialized. Here is a silly example:
extern int f(int);
int g(int a, int b) {
switch ($(a,b-1)) {
case $(0,y) && y > 1: return 1;
case $(3,y) && f(x+y) == 7 : return 2;
case $(4,72): return 3;
default: return 3;
}
}
The function g returns 1 if a is 0 and b is greater than 2. Else if x
is 3, it calls the function f (which of course may do arbitrary
things) with the sum of a and b. If the result is 7, then 2 is
returned. In all other cases (x is not 3 or the call to f does not
return 7), 3 is returned.
Case guards make constant patterns unnecessary (we can replace case 3:
with case x && x==3:, for example), but constant patterns are
better style and easier to use.
Case guards are not interpreted by the compiler when doing
exhaustiveness and overlap checks, as explained below.
Exhaustiveness and Useless-Case Checking
As mentioned before, it is a compile-time error for the type of the
switch expression to have values that none of the case patterns match
or for a pattern not to match any values that earlier patterns do not
already match. Rather than explain the precise rules, we currently
rely on your intuition. But there are two rules to guide your
intuition:
-
In terms of exhaustiveness checking, the compiler acts as if any
case guard might evaluate to false.
- In terms of exhaustiveness checking, numeric constants cannot
make patterns exhaustive. Even if you list out all 256 characters,
the compiler will act as though there is another possibility you
have not checked.
We emphasize that checking does not just involve the ``top-level'' of
patterns. For example, the compiler rejects the switch below because
the third case is redundant:
enum Color { Red, Green };
void f(enum Color c1, enum Color c2) {
switch ($(c1,c2)) {
case $(Red,x): return;
case $(x,Green): return;
case $(Red,Green): return;
default: return;
}
}
Rules for No Implicit Fall-Through
As mentioned several times now, Cyclone differs from C in that a case
body may not implicitly fall-through to the next case. It is a
compile-time error if such a fall-through might occur. Because the
compiler cannot determine exactly if an implicit fall-through could
occur, it uses a precise set of rules, which we only sketch here. The
exact same rules are used to ensure that a function (with return type
other than void) does not ``fall off the bottom.'' The rules
are very similar to the rules for ensuring that Java methods do not
``fall off the bottom.''
The general intuition is that there must be a break, continue, goto,
return, or throw along all control-flow paths. The value of
expressions is not considered except for numeric constants and logical
combinations (using &&, ||, and ? :) of
such constants. The statement try s catch ... is checked as
though an exception might be thrown at any point while s executes.