3 Pointers
As in C, one should think of Cyclone pointers as just addresses.
Operations on pointers,
such as *x, x->f, and x[e], behave the same
as in C, with the exception that run-time checks sometimes precede
memory accesses. (Exactly when and where these checks occur is
described below.) However, Cyclone prevents memory errors such as
dereferencing dangling pointers or indexing outside an array's bounds,
so it may reject some operations on pointers that C would accept.
In order to enforce memory safety and distinguish between different
uses of pointers, Cyclone pointer types include additional qualifiers
when compared to their C counterparts. These qualifiers are
described briefly below and in more detail throughout this section:
-
@nullable: Pointers with this qualifier may be NULL.
This qualifier is present by default and overridden by the @notnull
qualifier. A dereference of a @nullable pointer will generally
be preceded by a NULL-check.
- @notnull: Pointers with this qualifier may never be
NULL, and thus never need to be checked for NULL
upon dereference. This qualifier is not present by default and must be
put in explicitly. The qualifier may be abbreviated by using ``@''
in place of the usual pointer ``*''. So, for instance,
the type ``int *@notnull'' can be abbreviated by
``int @''. Currently, the @notnull qualifier cannot be
used on pointers with the @fat qualifier.
- @thin: Pointers with this qualifier have the same
representation as in C (i.e., a single machine word.) However,
arithmetic on thin pointers is not supported except when the
pointer is also qualified as @zeroterm (see below).
This qualifier is present by default and overridden by the @fat
qualifier.
- @fat: Pointers with this qualifier consist of a
thin pointer plus additional information needed to support safe
pointer arithmetic and dereferencing. (The current implementation
uses three words in total.) Each dereference of a fat
pointer incurs both a NULL-check and a bounds check to
ensure that the pointer points to a valid object. The
@fat qualifier
cannot be used with the @notnull or @numelts
qualifiers (though we expect to change this in the future.)
The numelts operation may be applied to fat pointers to
determine the number of elements in the (forward) sequence that
may be safely dereferenced. Finally, the qualifier may be
abbreviated by using ``?''
in place of the usual pointer ``*''. So, for instance,
the type ``int *@fat'' can be abbreviated by
``int ?''.
- @numelts(e): The term e must be a static
expression (i.e., a constant expression or one involving valueof)
and indicates an upper bound on the number of objects that
that the pointer refers to. For example, if p has type
T *@numelts(42), then either p is NULL
or else for 0 £ i < e, the expression
p[i] is guaranteed to
contain a T object. This qualifier may not be used
in conjunction with @fat. If omitted on a @thin
pointer, then @numelts(1) is inserted by default. This
qualifier can be abbreviated by writing the bounds expression e
in curly braces. For instance, the type ``int *@numelts(42)''
can be abbreviated by ``int *{42}''.
- @zeroterm: This qualifier is used for zero-terminated
sequences, such as C-style strings, and provides an alternative to
fat pointers for doing safe pointer arithmetic without knowing
bounds statically. This qualifier can only be used on pointers whose
element type admits zero or NULL as a value, including
integral types, and @nullable pointer types. Arithmetic
in the forward direction is possible with zero-terminated pointers
(e.g., p++) as is a subscript with a positive index
(e.g., p[i]). However, the compiler inserts code to ensure
that the index does not step over the final zero. When updating
a zero-terminated array, the compiler also ensures that the final
zero is not overwritten with a non-zero value. It is generally best
to coerce a thin, zero-terminated pointer to a fat, zero-terminated
pointer to avoid these overheads. This qualifier is
only present by default for char pointers. It can be
overridden with the @nozeroterm qualifier. This qualifier
may also be used on array types.
- @nozeroterm:
This qualifier is present by default
on all pointer types except for char pointers. It is
used to override the implicit @zeroterm qualfier for
char pointers. This qualifier may also be used on array types.
- @region(`r): This qualifier is used to indicate the
region into which a pointer points (in this case region `r).
The qualifier may be abbreviated by simply writing the region name
after any other Cyclone qualifiers.
For instance, the type ``int *@notnull @region`r''
may be abbreviated as ``int @`r''. The rules about default
region annotations are context-dependent and therefore described below.
3.1 Pointer Subtyping
Some pointer types may be safely used in contexts where another
pointer type is expected. In particular, T*@notnull is a
subtype of T*@nullable which means that a not-null pointer
can be passed anywhere a possibly-null pointer is expected.
Similarly, a T*@numelts(42) pointer can be passed anywhere a
T*@numelts(30) pointer is expected, because the former
describes sequences that have at least 42 elements, which satisifes
the constraint that it has at least 30 elements.
In addition, T*@region(`r) is a subtype of T*@region(`s)
when region `r outlives region `s. The
heap region (`H) outlives every region so you can safely
use a heap pointer anywhere another region is expected. Outer
blocks and outer regions outlive inner blocks and regions. For
example the following code is type-correct:
void foo(int x) {
int *@region(`foo) y = &x;
L:{
int *@region(`L) z = y;
}
}
because region `foo outlives region `L. By
default, regions passed in to a function outlive any regions
defined in the function (because they will live across the
function call). Finally, you can specify outlives relations
among region parameters within a function's prototype. The
following code specifies that input region `r outlives
input region `s so it's safe to treat `r pointers
as if they were `s pointers:
void bar(int *@region(`r) x,
int *@region(`s) y : {`s} > `r);
In general, the outlives relation is specified after the function
arguments by separating the relations with a colon (:) and
giving a comma-separated list of primitive outlives relations.
These outlives relations are of the form ``{`r1,...,`rn} > `r''
and specify that region `r outlives all of the regions
`r1 through `rn.
Finally, when T is a subtype of S, then
T* is a subtype of const S*. So, for
instance, if we declare:
// nullable int pointers
typedef int * nintptr_t;
// not-nullable int pointers
typedef int *@notnull intptr_t;
then intptr_t * is a subtype of const nintptr_t *.
Note, however, that ``const'' is important to get this
kind of deep subtyping.
The following example shows what could go wrong if we allowed
deep subtyping without the const:
void f(int *@notnull *@notnull x) {
int *@nullable *@notnull y = x;
// would be legal if int *@nullable *@notnull
// was a subtype of int *@notnull *@notnull.
*y = NULL;
// legal because *y has type int *@nullable
**x;
// seg faults even though the type of *x is
// int *@notnull
}
3.2 Pointer Coercions
In addition to pointer subtyping, Cyclone provides a number of
coercions which allow you to convert a pointer value from one
type to another. For instance, you can coerce a thin pointer
with 42 elements to a fat pointer:
int arr[42];
int *@thin @numelts(42) p = arr;
int *@fat pfat = p;
As another example, you can coerce a thin, zero-terminated pointer
to a fat, zero-terminated pointer:
int strlen(char *@zeroterm s) {
char *@fat @zeroterm sfat = s;
return numelts(s);
}
In both cases, the compiler inserts code to convert from the
thin representation to an appropriate fat representation. In
the former case, the bounds information can be calculated
statically. In the latter case, the bounds information is
calculated dynamically (by looking for the zero that terminates
the sequence.) In both cases, the coercion is guaranteed to
succeed, so the compiler does not emit a warning.
In other cases, a coercion can cause a run-time exception to be
thrown. For instance, if you attempt to coerce a @nullable
pointer to a @notnull pointer, and the value happens to be
NULL, then the exception Null_Exception is thrown.
In general, the compiler will warn you when you try to coerce
from one pointer representation to another where a run-time
check must be inserted, and that check might fail. A dataflow
analysis is used to avoid some warnings, but in general,
it's not smart enough to get rid of all of them. In these
cases, you can explicitly cast the pointer from one representation
to the other, and the compiler will not generate a warning
(though it will still insert the run-time check to ensure
safety.)
Here is a list of some of the coercions that are possible:
-
T can be coerced to S when T is a subtype of
S.
- T*@nullable
can be coerced to T*@notnull but might
throw an exception at run-time.
- T*@thin@numelts(c) can be coerced to T*@fat
when c is a constant expression.
- T*@fat can be coerced to T*@thin @numelts(c)
when c is a constant expression, but might throw an exception
at run-time.
- T*@thin@zeroterm
can be coerced to T*@fat@zeroterm
and vice versa.
- T*@thin@zeroterm can be coerced to const T*@fat@nozeroterm.
- T*@thin@zeroterm can be coerced to T*@fat@nozeroterm, but access to the trailing zero is lost.
3.3 Default Region Qualifiers
The rules the compiler uses for filling in @region qualifiers
when they are omitted from pointer types are a little complicated, but
they are designed to avoid clutter in the common case:
-
In function-argument types, a fresh (polymorphic) region name is used.
- In function-return types, `H is used.
- In type definitions, including typedef, `H is used.
- In function bodies, unification is used to infer the region
based on how the location assigned the pointer type is used in the
function.
Thus, be warned that
typedef int * foo_t;
void g(foo_t);
is different than
void g(int *);
The reason is clear when we fill in the default region qualifiers.
In the first case, we have:
typedef int *@region(`H) foo_t;
void g(foo_t);
whereas in the second case we have:
void g(int *@region(`r));
3.4 Static Expression Bounds
The bound for the @numelts qualifier
must be a static expression. A static
expression is either a constant expression, or an expression involving
valueof(T) for a type-level expression T. The
valueof construct is used to connect the value of a run-time
integer to the static bound on an array. For example, the following
function takes in an integer num and pointer to a sequence
of num integers and returns the sum of the sequence:
int sum(tag_t<`n> num,
int *@notnull @numelts(valueof(`n)) p) {
int a = 0;
for (unsigned i = 0; i < num; i++)
a += p[i];
}
The type of num is specified as tag_t<`n>. This
simply means that num holds an integer value, called `n,
and the number of elements of p is equal to n.
This form of dependency is common enough that it can be abbreviated
as follows:
int sum(tag_t num, int p[num]);
and the compiler will fill in the missing information.