4 Tagged Unions and Datatypes
In addition to struct, enum, and union,
Cyclone provides @tagged union and datatype
declarations as ways to construct new aggregate types. Like a
union type, each @tagged union and datatype
has a number of variants (or members). Unlike conventional
unions, an object of a @tagged union or
datatype type is exactly one variant, we can detect (or
discriminate) that variant at run-time, and the language prevents
using an object as though it had a different variant.
The difference between @tagged unions and datatypes
is that the former look and behave much like traditional unions,
whereas the latter look and behave more like the algebraic datatypes
found in functional languages such as ML. Furthermore, datatypes
can be either closed or @extensible. A closed datatype's
members are specified all together when the datatype is declared,
whereas an @extensible datatype supports adding new members
after the fact (much like adding a new sub-class to a class-based
OO language.)
In this section, we first discuss @tagged unions,
then closed datatypes, and finally @extensible datatypes.
4.1 Tagged Unions
A tagged union declaration looks just like a C union,
except that it you must specify the @tagged qualifier
when declaring it. For example:
@tagged union Foo {
int i;
double d;
char *@fat s;
};
The primary difference with C unions is that a tagged union
includes a hidden tag. The tag indicates which member was
last written. So, for example:
union Foo x;
x.i = 3;
x.s = "hello";
causes the hidden tag to first indicate that the i
member was written, and then is updated to record that the
s member was written.
When you attempt to read a member of a tagged union, a
check is done on the hidden tag to ensure that this was
the last member written, and thus the union contains a
valid object of that member's type. If some other member
was last updated, then a Match_Exception will
be thrown.
You can test the hidden tag of a tagged union by using
the tagcheck operation. For example:
void printFoo(union Foo x) {
if (tagcheck(x.i))
printf("%d",x.i);
else if (tagcheck(x.d))
printf("%g",x.d);
else if (tagcheck(x.s))
printf("%s",x.s);
}
Alternatively, you can use pattern matching (described
in the next section) which will ensure that you cover
all of the cases properly. For instance, the function
above may be rewritten as:
void printFoo(union Foo x) {
switch (x) {
case {.i = i}: printf("%d",i); break;
case {.d = d}: printf("%g",d); break;
case {.s = s}: printf("%s",s); break;
}
}
If we failed to leave out one of the cases in the
pattern match, then the compiler would warn us. This
is particularly helpful when you add new variants to
a tagged union, for then the compiler pinpoints the
spots that you need to update in your code. Therefore,
we encourage the use of pattern matching where possible.
4.2 Datatypes
At its simplest, a datatype looks just like an enum
declaration. For example, we could say:
datatype Color { Red, Green, Blue };
As with enum, the declaration creates a type (called
datatype Color) and three constants Red, Green, and
Blue. Unlike enum, these constants do not have type
datatype Color. Instead, each variant has its own type,
namely datatype Color.Red, datatype Color.Green, and
datatype Color.Blue. However, a pointer to one of these values
can be treated as a sub-type of a pointer to a
datatype Color. So you can write:
datatype Color.Red red = Red;
datatype Color *c = &red;
In this simple example, we are splitting hairs, but we will soon find
all these distinctions useful.
Unlike enum, datatype variants may carry any fixed
number of values, as in this example:
datatype Shape {
Point,
Circle(float),
Ellipse(float,float),
Polygon(int,float),
};
A Point has no accompanying information, a Circle has a
radius, an Ellipse has two axis lengths, and a (regular)
Polygon has a number of sides and a radius. (The value fields
do not have names, so it is often better style to have a variant carry
one value of a struct type, which of course has named members.) This
example creates five types: datatype Shape,
datatype Shape.Point, datatype Shape.Circle,
datatype Shape.Ellipse, and datatype Shape.Polygon. Like in
our previous example, datatype Shape.Point* is a subtype of
datatype Shape* and Point is a constant of
type datatype Shape.Point.
Variants that carry one or more values are treated differently.
Circle becomes a constructor; given a float it produces an
object of type datatype Shape.Circle, for example Circle(3.0).
Similarly, Ellipse(0,0) has type datatype Shape.Ellipse
(thanks to implicit casts from int to float for 0) and
Polygon(7,4.0) has type datatype Shape.Polygon. The
arguments to a constructor can be arbitrary expressions of the correct
type, for example, Ellipse(rand(), sqrt(rand())).
Here are some examples which allocate a Point and Circle
respectively, but then use subtyping to treat the resulting values
as if they are Shape pointers:
datatype Shape *s1 = new Point;
datatype Shape *s2 = new Circle(3.0);
Datatypes are particularly useful for building recursive structures.
For example, a small language of arithmetic expressions might look
like this:
enum Unops { Negate, Invert};
enum Binops { Add, Subtract, Multiply, Divide };
typedef datatype Exp *@notnull exp_t;
datatype Exp {
Int(int),
Float(float),
Unop(enum Unops, exp_t),
Binop(enum Binops, exp_t, exp_t)
};
A function returning an expression representing the multiplication of
its parameter by two can be written like this:
exp_t double_exp(datatype Exp e) {
return new Binop(Multiply, e, new Int(2));
}
Accessing Datatype Variants
Given a value of a datatype
type, such as datatype Shape, we do not know which variant it is.
It could be a Circle or Shape, etc. In Cyclone, we
use pattern matching to determine which variant a given datatype
value actually is, and to extract the arguments that were used to
build the datatype value. For example, here is how you could define
isCircle:
bool isCircle(datatype Shape *s) {
switch(s) {
case &Circle(r): return true;
default: return false;
}
}
When a switch statement's argument is a pointer to a datatype,
the cases describe variants. One variant of datatype Shape * is a
pointer to a Circle, which carries one value. The
corresponding pattern has & for the pointer, Circle for
the constructor name, and one identifier for each value carried by
Circle. The identifiers are binding occurrences (declarations,
if you will), and the initial values are the values of the fields of
the Circle at which s points. The scope is the extent
of the case clause.
Here is another example:
[The reader is asked to indulge compiler-writers who have
forgotten basic geometry.]
extern area_of_ellipse(float,float);
extern area_of_poly(int,float);
float area(datatype Shape *s) {
float ans;
switch(s) {
case &Point:
ans = 0;
break;
case &Circle(r):
ans = 3.14*r*r;
break;
case &Ellipse(r1,r2):
ans = area_of_ellipse(r1,r2);
break;
case &Polygon(sides,r):
ans = area_of_poly(sides,r);
break;
}
return ans;
}
The cases are compared in order against s. The following are
compile-time errors:
-
It is possible that a member of the datatype type matches
none of the cases. Note that default matches everything.
- A case is useless because it could only match if one of the
earlier cases match. For example, a default case at the end of the
switch in area would be an error.
As you can discover in Section 575Pattern Matchingsection.5, Cyclone has much
richer pattern-matching support than we have used here.
Polymorphism and Datatypes
A datatype declaration may be
polymorphic over types and regions just like a struct
or union definition (see the section on
polymorphism). For example, here is a
declaration for binary trees where the leaves can hold some
BoxKind `a:
datatype Tree<`a> {
Leaf(`a);
Node(datatype Tree<`a>*, datatype Tree<`a>*);
};
In the above example, the root may be in any region, but all children
will be in the heap. The following version allows the children to be in any
region, but they must all be in the same region. (The root can still
be in a different region.)
datatype Tree<`a,`r> {
Leaf(`a);
Node(datatype Tree<`a,`r> *`r,
datatype Tree<`a,`r> *`r);
};
Future
-
Currently, given a value of a variant type (e.g.,
datatype Shape.Circle), the only way to access the fields is
with pattern-matching even though the variant is known. We may
provide a tuple-like syntax in the future.
4.3 Extensible Datatypes
We now explain how an @extensible datatype type differs
from a datatype.
The main difference is that later declarations may continue to
add variants. Extensible datatypes are useful for allowing clients to
extend data structures in unforeseen ways. For example:
@extensible datatype Food;
datatype Food { Banana; Grape;
Pizza(list_t<datatype Food*>) };
datatype Food { Candy; Broccoli };
After these declarations, Pizza(new List(new Broccoli, NULL)) is a
well-typed expression.
If multiple declarations include the same variants, the variants must
have the same declaration (the number of values, types for the values,
and the same existential type variables).
Because different files may add different variants and Cyclone
compiles files separately, no code can know (for sure) all the
variants of an @extensible datatype.
Hence all pattern-matches against a
value of an @extensible datatype
type must end with a case that matches
everything, typically default.
There is one built-in @extensible datatype type:
@extensible datatype exn is the
type of exceptions. Therefore, you declare new exn
constructors like this:
datatype exn {BadFilename(string)};
The implementation of @extensible datatype
types is very similar to that of
datatype types, but variant tags cannot be represented as
small integers because of separate compilation. Instead, these
tags are represented as pointers to unique locations in static
data.