Labels can be used to annotate code, and the annotated code can be checked statically to verify that it contains no information leaks. In this section, we discuss some issues related to static analysis of annotated code, though we defer issues of how to extend a programming language till the next section. In Section 4.1, we explain the importance of static checking to information flow control, and the problem of implicit flows [DD77]. In Section 4.2, we describe a simple way to prevent implicit flows from leaking information by using static analysis.
4.1 Static vs. Dynamic Checking
Information flow checks can be viewed as an extension to type checking. For both kinds of static analysis, the compiler determines that certain operations are not permitted to be performed on certain data values. Type checks may be performed at compile time or at run time, though compile-time checks are obviously preferred when applicable because they impose no run-time overhead. Access control checks are usually performed at run time, although some access control checks may be performed at compile time [JL78, RSC92]. In general, it seems that some access control checks must be performed dynamically in order to give the system sufficient flexibility.
By contrast, fine-grained information flow control is practical only with some static analysis, which may seem odd; after all, any check that can be performed by the compiler can be performed at run time as well. The difficulty with run-time checks is exactly the fact that they can fail. In failing, they may communicate information about the data that the program is running on. Unless the information flow model is properly constructed, the fact of failure (or its absence) can serve as a covert channel. By contrast, the failure of a compile-time check reveals no information about the actual data passing through a program. A compile-time check only provides information about the program that is being compiled. Similarly, link-time and load-time checks provide information only about the program, and may be considered to be static checks for the purposes of this work.
Figure 3: Implicit information flow
x := 0
if b then
x := 1
end
Implicit information flows [DD77] are difficult to prevent
without static analysis.
For example, consider
the segment of code shown in Figure 3 and assume
that the storage locations b
and x
belong to different
security classes b and x, respectively. (We will
follow the literature in
using the notation e
to refer to the label of the expression e
.)
In particular, assume b
is more sensitive than x
(more generally,
b
x
), so
data should not flow from b
to x
. However, the code segment
stores 1 into x
if b
is true, and 0 into x
if b
is false; x
effectively contains the value of b
. A run-time check can easily detect
that the assignment x := 1
communicates information improperly, and abort
the program at this point. Consider,
however, the case where b
is false: no assignment to x
occurs within
the context in which b
affects the flow of control.
The fact of the program's aborting
or continuing implicitly communicates information about the value of b
,
which can be used in at least the case where b
is false.
We could imagine inspecting the body of the if
statement at run time to
see whether it contains disallowed operations, but in general this requires
evaluating all possible execution paths of the program, which is clearly
infeasible. Another possibility is to restrict all writes that
follow the if
statement on the grounds that once the process has observed
b
, it is irrevocably tainted. However, this approach seems too
restrictive to be practical.
The advantage of compile-time checking is that in effect, static
analysis efficiently constructs proofs that no possible execution path
contains disallowed operations.
To provide flexibility, some information flow control checks are desirable at run time; such checks are allowed so long as their success or failure does not communicate information improperly--which must itself be checked statically! We examine run-time information flow checks later, in Section 5.10.
As described in Section 3.1,
when a data value
is extracted from a slot, it acquires the slot label.
Furthermore, to ensure that writing
to a slot does not leak information, the label on the slot must be
more restrictive than the label on the data value being written:
x := v
is legal only if
v
x
.
However, while this restriction condition is necessary,
it is not sufficient for avoiding information
leaks, because of the possibility of implicit information flows.
The code in Figure 3 provides an example.
The variables x
and b
are
both slots. The expressions 0 and 1 do not give any information
about other data, so they are labeled by .
Therefore the assignment x := 1
appears to be legal.
However, earlier we showed that this assignment
may be an information leak.
Therefore, our simple rule is not sufficient.
The problem becomes clearer if we rewrite the if
statement:
x := (if b then 1 else x)
Clearly, the value of x
after the statement completes is dependent on
the value of b
, and the assignment is legal only if
bx
.
In general, the flow of control within a program depends on the values of certain expressions. At any given point during execution, various values vi have been observed in order to decide to arrive at the current program location. Any mutation that occurs can potentially leak information about the observed values vi, so the slot that is being mutated must be at least as restricted as the labels on all these variables:
The label
i vi
can be determined through straightforward
static analysis of the program's basic block diagram, and will be
called the basic block label, B. The basic block label indicates
information that might
be inferred by knowing that the program is
executing this basic block. Using the basic block label, we can write
the correct rule for checking assignment: assignment to a variable x
with a value v
is permitted only if
v
Bx
.
Intuitively, a basic block label must include the labels of all
values that were observed to reach that point in the execution.
For example, consider the basic block diagram
shown in Figure 4, which corresponds to the code of
Figure 3;
each basic block, represented as a box in the
diagram, is characterized by a single
basic block label, and has one or two exit points.
Here the basic block for x := 1
has label b
because the value of b
had to be observed to
reach that point in the program. However, the label of the ``final''
block is because at that point the program has no
knowledge of any values. It is true that the program could
discover information by performing tests that read
from slots (e.g., x
); however, the basic
block label captures what the program knows without any further reading.
Labels of basic blocks are derived as follows. The decision about which exit point to follow from a block Bi is made based on the observation of some value vi. The label B for a particular basic block B is the join of some of the labels vi. A label vi is included in the join if it is possible to reach B from Bi, and it is also possible to reach the final node from Bi without passing through B (if all paths from Bi to the final node pass through B, then arriving at B conveys no information about vi.) The set of vi that must be included in each B can be efficiently computed using standard compiler techniques. This rule for basic block label propagation is equivalent to the rule of Denning and Denning [DD77].
Now, consider the execution of a ``while
'' statement, which creates a loop
in the basic block diagram.
This situation is illustrated in Figure 5.
Note that for the final basic block, we obtain
B =
by reasoning in the same way as we did for the
``if
'' statement.
This
labeling might seem strange, since when we arrive at the final block, we
know the value of b
. However, arriving at the final block gives no
information about the value of b
before the code started, and there is
no way to use code of this sort to improperly transmit information.
This labeling rule holds as long as all programs terminate, or at least as long as there is no way to derive information from the non-termination of a program [DD77, AR80]. The way one decides that a program has not terminated is to time its execution, either explicitly or through asynchronous communication with another thread. We do not address timing channels in this paper.
If the language allows the raising of exceptions and of return statements,
the returned value must also be labeled by the label of the basic block
that contains the return
or raise
. This fact can be seen
clearly by converting a procedure that uses a return statement into
one that uses boolean variables to keep track of control flow.
Next: 5 Application to a Language Up: A Model for Decentralized Previous: 3 Decentralized Information Flow
Andrew C. Myers, Barbara Liskov