Advanced Features

12 Advanced Features

The features in this section are largely independent of the rest of Cyclone. It is probably safe to skip them when first learning the language, but it is valuable to learn them at some point because they add significant expressiveness to the language.

12.1 Existential Types

The implementation of a struct type can have existentially bound type variables (as well as region variables, tag variables, and so on). Here is a useless example:

  struct T { <`a> `a f1; `a f2; };

Values of type struct T have two fields with the same (boxed) type, but there is no way to determine what the type is. Different values can use different types. To create such a value, expressions of any appropriate type suffice:

  struct T x = T{new 3, new 4};

Optionally, you can explicitly give the type being used for `a:

  struct T x = T{<int*@notnull> new 3, new 4};

As with other lists of type variables, multiple existentially bound types should be comma-separated.

Once a value of an existential variant is created, there is no way to determine the types at which it was used. For example, T("hi","mom") and T(8,3) both have type struct T.

The only way to read fields of a struct with existentially bound type variables is pattern matching. That is, the field-projection operators (. and ->) will not type-check. The pattern can give names to the correct number of type variables or have the type-checker generate names for omitted ones. Continuing our useless example, we can write:

  void f(struct T t) {
    let T{<`b> x,y} = t;
    x = y;
  }

We can now see why the example is useless; there is really nothing interesting that f can do with the fields of t. In other words, given T("hi","mom"), no code will ever be able to use the strings "hi" or "mom". In any case, the scope of the type `b is the same as the scope of the variables x and y. There is one more restriction: For subtle reasons, you cannot use a reference pattern (such as *x) for a field of a struct that has existentially bound type variables.

Useful examples invariably use function pointers. For a realistic library, see fn.cyc in the distribution. Here is a smaller (and sillier) example; see the following two sections for an explanation of why the regions(`a) > `r stuff is necessary.

  int f1(int x, int y) { return x+y; }
  int f2(string x, int y) {printf("%s",x); return y; }
  struct T<`r::R> {<`a> : regions(`a) > `r 
    `a f1; 
    int f(`a, int); 
  };
  void g(bool b) {
    struct T<`H> t;
    if(b)
      t = Foo(37,f1);
    else
      t = Foo("hi",f2);
    let T{<`b> arg,fun} = t;
    `b x = arg;
    int (*f)(`b,int) = fun;
    f(arg,19);
  }

We could replace the last three lines with fun(arg)---the compiler would figure out all the types for us. Similarly, the explicit types above are for sake of explanation; in practice, we tend to rely heavily on type inference when using these advanced typing constructs.

12.2 The Truth About Effects, Capabilities, and Region Bounds

An effect or capability is a set of (compile-time) region names. We use effect to refer to the region names that must be ``live'' for some expression to type-check and capability to refer to the region names that are ``live'' at some program point. A region bound indicates that all the regions in a set outlive one particular region. Each program point has a set of ``known region bounds''.

The intuition is that a program point's capability and region bounds must imply that an expression's effect describes live regions, else the expression does not type-check. As we'll see, default effects for functions were carefully designed so that most Cyclone code runs no risk of such an ``effect check'' ever failing. But using existential types effectively requires a more complete understanding of the system, though perhaps not as complete as this section presents.

The form of effects or capabilities is described as follows:

{} is the empty set. At most the heap region is accessed by an expression having this effect.
{`r} is the set containing exactly the region name `r.
e1 + e2 is the set containing the effects e1 and e2. That is, we write + for set-union.
regions(t), where t is a type is the set containing all of the region names contained in t and regions(`a) for all type variables `a contained in t.

The description of regions(t) appears circular, but in fact if we gave the definition for each form of types, it would not be. The point is that regions(`a) is an ``atomic effect'' in the sense that it stands for a set of regions that cannot be further decomposed without knowing `a. The primary uses of regions(t) are descibed below.

The form of a region bound is e > r where e is an effect and r is a region name.

We now describe the capability at each program point. On function entry, the capability is the function's effect (typically the regions of the parameters and result, but fully described below). In a local block or a growable-region statement, the capability is the capability of the enclosing context plus the block/statement's region name.

The known region bounds at a program point are described similarly: On function entry, the bounds are the function prototype's explicit bounds (typically none, but fully described below). In a local block or a growable-region statement, we add e > `r where e is the capability of the enclosing context and `r is the block/statement's region name. That is, we add that the set of region names for the enclosing context describes only regions that will outlive the region described by `r. (As usual, the compiler generates `r in the common case that none is explicitly provided.) Creating a dynamic region does not introduce any region bounds, but opening one does. Creating a resettable growable region does not introduce any bounds.

We can now describe an expression's effect: If it reads or writes to memory described by a region name `r, then the effect contains {`r}. If it calls a function with effect e, then the effect conatins e. Every function (type) has an effect, but programmers almost never write down an explicit effect. To do so, one puts ``; e'' at the end of the parameter list, wehre e is an effect. For example, we could write:

`a id(`a x; {}) { return x; }

because the function does not access any memory.

If a function takes parameters of types t1,...,tn and returns a value of type t, the default effect is simply regions(t1)+...+regions(tn)+regions(t). In English, the default assumption is that a function may dereference any pointers it is passed, so the corresponding regions must be live. In our example above, the default effect would have been regions(`a). If the caller had instantiated `a with int*`r, then with the default effect, the type-checker would require `r to be live, but with the explicit effect {} it would not. However, dangling pointers can be created only when using existential types, so the difference is rarely noticed.

By default, a function (type) has no region bounds. That is, the function does not assume any ``outlives'' relationships among the regions it accesses. Adding explicit outlives relationships enables more subtyping in the callee and more stringent requirements at the call site (namely that the relationship holds).

We can describe when a capability e and a set of region bounds b imply an effect, although your intuition probably suffices. A ``normalized effect'' is either {} or unions of ``atomic effects'', where an atomic effect is either {`r} or regions(`a). For any effect e1, we can easily compute an equivalent normalized effect e2. Now, e and b imply e1 if they imply every {`r} and regions(`a) in e2. To imply {`r} (or regions(`a)), we need {`r} (or regions(`a)) to be in (a normalized effect of) e) or for b to contain some e3 > `r2 such that e and b imply `r2 and e3 and b imply {`r} (or regions(`a)). Or something like that.

All of these complications are unnecessary except for existential types, to which we now return. Explicit bounds are usually necessary to make values of existential types usable. To see why, consider the example from the previous section, with the struct declaration changed to remove the explicit bound:

  
struct T<`r::R> {<`a> : regions(`a) > `r 
  `a f1; 
  int f(`a, int); 
};

(So the declaration of t should just have type struct T.) Now the function call f(arg,19) at the end of g will not type-check because the (default) effect for f includes regions(`b), which cannot be established at the call site. But with the bound, we know that regions(`b) > `H, which is sufficient to prove the call won't read through any dangling pointers.

12.3 Interprocedural Memory Initialization

We currently have limited support for functions that initialize parameters. if you have an *@notnulll1 parameter (pointing into any region), you can use an attribute __attribute__((initializes(1))) (where it's the first parameter, use a different number otherwise) to indicate that the function initializes through the parameter.

Obviously, this affects the definite-assignment analysis for the callee and the call-site. In the callee, we know the parameter is initialized, but not what it points to. The memory pointed to must be initialized before returning. Care must be taken to reject this code:

void f(int *@notnull*@notnull x) __attribute__((initializes(1))) { 
  x = new (new 0); 
  return x; 
}

In the caller, the actual argument must point to a known location. Furthermore, this location must not be reachable from any other actual arguments, i.e., there must be no aliases available to the callee.

Two common idioms not yet supported are:

The parameter is initialized only if the return value satisfies some predicate; for example, it is 0.
The caller can pass NULL, meaning do not initialize through this parameter.