12 Advanced Features
The features in this section are largely independent of the rest of
Cyclone. It is probably safe to skip them when first learning the
language, but it is valuable to learn them at some point because they
add significant expressiveness to the language.
12.1 Existential Types
The implementation of a struct type can have
existentially bound type variables (as well as region
variables, tag variables, and so on). Here is a useless example:
struct T { <`a> `a f1; `a f2; };
Values of type struct T have two fields with the same (boxed)
type, but there is no way to determine what the type is. Different
values can use different types. To create
such a value, expressions of any appropriate type suffice:
struct T x = T{new 3, new 4};
Optionally, you can explicitly give the type being used for
`a:
struct T x = T{<int*@notnull> new 3, new 4};
As with other lists of type variables, multiple existentially bound
types should be comma-separated.
Once a value of an existential variant is created, there is no way to
determine the types at which it was used. For example,
T("hi","mom") and T(8,3) both have type
struct T.
The only way to read fields of a struct with existentially
bound type variables is pattern matching. That is, the
field-projection operators (. and ->) will
not type-check. The pattern can give names to the correct
number of type variables or have the type-checker generate names for
omitted ones.
Continuing our useless example, we can write:
void f(struct T t) {
let T{<`b> x,y} = t;
x = y;
}
We can now see why the example is useless; there is really nothing
interesting that f can do with the fields of t. In
other words, given T("hi","mom"), no code will ever be
able to use the strings "hi" or "mom". In any case,
the scope of the type `b is the same as the scope of the
variables x and y. There is one more restriction:
For subtle reasons, you cannot use a reference pattern (such as
*x) for a field of a struct that has existentially
bound type variables.
Useful examples invariably use function pointers. For a realistic
library, see fn.cyc in the distribution. Here is a smaller (and
sillier) example; see the following two sections for an
explanation of why the regions(`a) > `r stuff is necessary.
int f1(int x, int y) { return x+y; }
int f2(string x, int y) {printf("%s",x); return y; }
struct T<`r::R> {<`a> : regions(`a) > `r
`a f1;
int f(`a, int);
};
void g(bool b) {
struct T<`H> t;
if(b)
t = Foo(37,f1);
else
t = Foo("hi",f2);
let T{<`b> arg,fun} = t;
`b x = arg;
int (*f)(`b,int) = fun;
f(arg,19);
}
We could replace the last three lines with fun(arg)---the
compiler would figure out all the types for us. Similarly, the
explicit types above are for sake of explanation; in practice, we tend
to rely heavily on type inference when using these advanced typing
constructs.
12.2 The Truth About Effects, Capabilities, and Region Bounds
An effect or capability is a set of (compile-time)
region names. We use effect to refer to the region names that
must be ``live'' for some expression to type-check and
capability to refer to the region names that are ``live'' at
some program point. A region bound indicates that all the
regions in a set outlive one particular region. Each program point
has a set of ``known region bounds''.
The intuition is that a program point's capability and region bounds
must imply that an expression's effect describes live regions, else
the expression does not type-check. As we'll see, default effects for
functions were carefully designed so that most Cyclone code runs no
risk of such an ``effect check'' ever failing. But using existential
types effectively requires a more complete understanding of the
system, though perhaps not as complete as this section presents.
The form of effects or capabilities is described as follows:
-
{} is the empty set. At most the heap region
is accessed by an expression having this effect.
- {`r} is the set containing exactly the region name
`r.
- e1 + e2 is the set containing the effects e1 and e2.
That is, we write + for set-union.
- regions(t), where t is a type is the set
containing all of the region names contained in t and
regions(`a) for all type variables `a contained in
t.
The description of regions(t) appears circular, but in fact
if we gave the definition for each form of types, it would not be.
The point is that regions(`a) is an ``atomic effect'' in the
sense that it stands for a set of regions that cannot be further
decomposed without knowing `a. The primary uses of
regions(t) are descibed below.
The form of a region bound is e > r where e is an
effect and r is a region name.
We now describe the capability at each program point. On function
entry, the capability is the function's effect (typically the regions
of the parameters and result, but fully described below). In
a local block or a growable-region statement, the capability is the
capability of the enclosing context plus the block/statement's region
name.
The known region bounds at a program point are described similarly: On
function entry, the bounds are the function prototype's explicit
bounds (typically none, but fully described below). In a local block
or a growable-region statement, we add e > `r where
e is the capability of the enclosing context and `r
is the block/statement's region name. That is, we add that the set of
region names for the enclosing context describes only regions that
will outlive the region described by `r. (As usual, the
compiler generates `r in the common case that none is
explicitly provided.) Creating a dynamic region does not introduce
any region bounds, but opening one does. Creating a
resettable growable region does not introduce any bounds.
We can now describe an expression's effect: If it reads or writes to
memory described by a region name `r, then the effect
contains {`r}. If it calls a function with effect
e, then the effect conatins e. Every function
(type) has an effect, but programmers almost never write down an
explicit effect. To do so, one puts ``; e'' at the end of
the parameter list, wehre e is an effect. For example, we
could write:
`a id(`a x; {}) { return x; }
because the function does not access any memory.
If a function takes parameters of types t1,...,tn and
returns a value of type t, the default effect is simply
regions(t1)+...+regions(tn)+regions(t). In English, the
default assumption is that a function may dereference any pointers it
is passed, so the corresponding regions must be live. In our example
above, the default effect would have been regions(`a). If
the caller had instantiated `a with int*`r, then
with the default effect, the type-checker would require `r to
be live, but with the explicit effect {} it would not.
However, dangling pointers can be created only when using existential
types, so the difference is rarely noticed.
By default, a function (type) has no region bounds. That is, the
function does not assume any ``outlives'' relationships among the
regions it accesses. Adding explicit outlives relationships enables
more subtyping in the callee and more stringent requirements at the
call site (namely that the relationship holds).
We can describe when a capability e and a set of region
bounds b imply an effect, although your intuition probably
suffices. A ``normalized effect'' is either {} or unions
of ``atomic effects'', where an atomic effect is either
{`r} or regions(`a). For any effect e1,
we can easily compute an equivalent normalized effect e2.
Now, e and b imply e1 if they imply every
{`r} and regions(`a) in e2. To imply
{`r} (or regions(`a)), we need {`r} (or
regions(`a)) to be in (a normalized effect of) e) or
for b to contain some e3 > `r2 such that e
and b imply `r2 and e3 and b imply
{`r} (or regions(`a)). Or something like that.
All of these complications are unnecessary except for existential
types, to which we now return. Explicit bounds are usually necessary
to make values of existential types usable. To see why, consider the
example from the previous section, with the struct
declaration changed to remove the explicit bound:
struct T<`r::R> {<`a> : regions(`a) > `r
`a f1;
int f(`a, int);
};
(So the declaration of t should just have type struct
T.) Now the function call f(arg,19) at the end of
g will not type-check because the (default) effect for
f includes regions(`b), which cannot be established
at the call site. But with the bound, we know that
regions(`b) > `H, which is sufficient to prove the call
won't read through any dangling pointers.
12.3 Interprocedural Memory Initialization
We currently have limited support for functions that initialize
parameters. if you have an *@notnulll1 parameter (pointing into any region),
you can use an attribute __attribute__((initializes(1))) (where it's
the first parameter, use a different number otherwise) to indicate
that the function initializes through the parameter.
Obviously, this affects the definite-assignment analysis for the
callee and the call-site. In the callee, we know the parameter is
initialized, but not what it points to. The memory pointed to must be
initialized before returning. Care must be taken to reject this code:
void f(int *@notnull*@notnull x) __attribute__((initializes(1))) {
x = new (new 0);
return x;
}
In the caller, the actual argument must point to a known location.
Furthermore, this location must not be reachable from any other actual
arguments, i.e., there must be no aliases available to the callee.
Two common idioms not yet supported are:
-
The parameter is
initialized only if the return value satisfies some predicate; for
example, it is 0.
- The caller can pass NULL, meaning do not initialize through this
parameter.