11 Definite Assignment
It is unsafe to allow memory to be used as a value of a particular
type just because the memory has been allocated at that type. In
other words, you cannot use memory that has not been properly
initialized. Most safe languages enforce this invariant by making
allocation and initialization a single operation. This solution is
undesirable in Cyclone for at least two reasons:
-
Many idioms require declaring variables in a wider scope than is
convenient for initializing the variable.
- C code, which we wish to port to Cyclone, is full of separated
allocation and initialization, including all heap-allocated storage
(i.e., malloc).
Inspired by Java's rules for separate declaration and initialization
of local variables, Cyclone has a well-defined, sound system for
checking that memory is written before it is used. The rules are more
complicated than in Java because we support pointers to uninitialized
memory, as is necessary for malloc, and because C's
order-of-evaluation is not completely specified.
Here we begin with idioms that the analysis does and does not permit.
With a basic sense of the idea, we expect programmers can generally
not worry about the exact rules of the analysis. However, when the
compiler rejects code because memory may be uninitialized, the
programmer needs to know how to rewrite the code in order to pass the
analysis. For this reason, we also give a more complete description
of the rules.
We begin with examples not involving pointers. If you are familiar
with Java's definite assignment, you can skip this part, but note that
struct and tuple fields are tracked separately. So you can use
an initialized field before another field of the same object is
initialized. (Java does not allow separate allocation and
initialization of object fields. Rather, it inserts null or 0 for
you.)
Finally, we do allow uninitialized numeric values to be accessed.
Doing so is dangerous and error-prone, but does not compromise type
safety, so we allow it.
The following code is accepted:
extern int maybe();
int f() {
int *x, *y, *z;
if(maybe())
x = new 3;
else
x = new 4;
while(1) {
y = x;
break;
}
if(z = new maybe() && maybe() && q = new maybe())
return q;
else
return z;
}
In short, the analysis checks that every control-flow path between a
variable's declaration and use includes an assignment to the variable.
More generally, the analysis works on memory locations, not just
variables. The analysis knows that loop bodies and conditional
branches are only executed if the value of certain expressions are 0
or not 0.
The following code is safe, but is not accepted:
extern int maybe();
int f() {
int * x = new 1;
int * y;
int b = maybe();
if(b)
y = 2;
if(b)
return y;
return 0;
}
The problem is that the analysis does not know that the second
if-guard is true only if the first one is. General support for
such ``data correlation'' would require reasoning about two different
expressions at different times evaluating to the same value.
Unlike Java, Cyclone supports pointers to uninitialized memory. The
following code is accepted:
extern int maybe();
int f() {
int * x;
int * z;
int ** y;
if(maybe()) {
x = new 3;
y = &x;
} else {
y = &z;
z = new 3;
}
return *y;
}
The analysis does not know which branch of the if will be taken, so
after the conditional it knows that either ``x is initialized
and y points to x'' or ``z is initialized and
y points to z.'' It merges this information to
``y points to somewhere initialized,'' so the function returns
an initialized value, as required. (It is safe to return
uninitialized ints, but we reject such programs anyway.)
However, this code is rejected even though it is safe:
extern int maybe();
int f() {
int * x;
int * z;
int ** y;
if(maybe()) {
y = &x;
} else {
y = &z;
}
x = new 3;
z = new 3;
return *y;
}
The problem is that the analysis loses too much information after the
conditional. Because y may allow (in fact, does allow)
access to uninitialized memory and the analysis does not know exactly
where y points, the conditional is rejected.
A compelling use of pointers to uninitialized memory is porting C code
that uses malloc, such as the following (the cast is not
necessary in Cyclone):
struct IntPair { int x; int y; };
struct IntPair * same(int z) {
struct IntPair * ans =
(struct IntPair *)malloc(sizeof(struct IntPair));
ans->x = z;
ans->y = z;
return ans;
}
There is limited support for passing a pointer to uninitialized memory
to a function that initializes it. See
Section 12150Advanced Featuressection.12.
Certain expression forms require their arguments to be fully
initialized (that is, everything reachable from the expression must be
initialized) even though the memory is not all immediately used.
These forms are the expression in ``let p = e'' and the
argument to switch. We hope to relax these restrictions in
the future.
You should now know enough to program effectively in Cyclone without
immediately initializing all memory. For those wanting a more
complete view of the language definition (i.e., what the analysis does
and does not accept), we now go into the details. Note that the
analysis is sound and well-specified---there is never a reason that
the compiler rejects your program for unexplainable reasons.
For each local variable and for each program point that allocates
memory, the analysis tracks information about each field. We call
each such field a place. For example, in this code:
struct B { int * x; $(int*,int*) y;};
void f() {
struct B b;
struct B * bp = malloc(sizeof(B));
...
}
the places are b.x, b.y[0], b.y[1], bp,
<1>.x, <1>.y[0], and <1>.y[1] where we use
<1> to stand for the malloc expression (a program point
that does allocation). An initialization state can be ``must
point to P'' where P is a path. For example, after the second
declaration above, we have ``bp must point to <1>.'' An
ensuing assignment of the form ``bp->x = new 3'' would therefore
change the initialization state of <1>.x. If there is not a
unique path to which a place definitely points, then we keep track of
the place's initialization level and escapedness. A
place is escaped if we do not know exactly all of the places that must
point to it. For example, both of the following fragments would cause
all the places starting with <1> to be escaped afterwards
(assuming bp must point to <1>):
struct B * bp2; some_fun(bp);
if(maybe())
bp2 = bp;
Note that if ``p must point to P,'' then p is implicitly unescaped
because we cannot know that p points to P if we don't know all the
pointers to p. The initialization level is either None or All.
All means p and everying reachable from p (following as many pointers
as you want) is initialized.
Note that our choice of tracking ``must point to'' instead of ``must
alias'' forces us to reject some safe programs, such as this one:
int f() {
int * x, int *y;
int **p1;
if(maybe())
p1 = &x;
else
p1 = &y;
*p1 = new 7;
return *p1;
}
Even though p1 has not escaped, our analysis must give it
initialization-level None. Moreover, x and y escape
before they are initialized, so the conditional is rejected.
For safety reasons, once a place is escaped, any assignment to it must
be a value that is fully initialized, meaning everything reachable
from the value is initialized. This phenomenon is why the first
function below is accepted but not the second (the list_t
typedefs is defined in the List library):
list_t<`a,`H> copy(list_t<`a> x) {
struct List *@notnull result, *@notnull prev;
if (x == NULL) return NULL;
result = new List{.hd=x->hd,.tl=NULL};
prev = result;
for (x=x->tl; x != NULL; x=x->tl) {
struct List *@notnull temp = malloc(sizeof(struct List));
temp->hd = x->hd;
temp->tl = NULL;
prev->tl = temp;
prev = temp;
}
return result;
}
list_t<`a,`r2> rcopy(region_t<`r2> r2, list_t<`a> x) {
struct List *@notnull result, *@notnull prev;
if (x == NULL) return NULL;
result = rnew(r2) List{.hd=x->hd,.tl=NULL};
prev = result;
for (x=x->tl; x != NULL; x=x->tl) {
prev->tl = malloc(sizeof(struct List));
prev->tl->hd = x->hd;
prev->tl->tl = NULL;
prev = prev->tl;
}
return result;
}
In the for body, we do not know where prev must point (on the first
loop iteration it points to the first malloc site, but on ensuing
iterations it points to the second). Hence prev->tl may be
assigned only fully initialized objects.