Information Flow
Lecturer: Professor Fred B. Schneider
Lecture notes by
Lynette I. Millett
We have mostly discussed access control policies. In fact, it can be
tempting to identify security with access control, but this would be a
mistake. Historically, security mechanisms were developed for the
needs of the defense establishment. Their primary concern was secrecy,
with a secondary concern being integrity and denial of
service. Secrecy can be implemented using access control, but the two
are very different. For example, consider a tax preparation program. We
would like some guarantee that the program won't misuse an
individual's tax data. We would also like to guarantee that an
individual cannot gain access to proprietary information in the
program's database. To do these things, we need to restrict
information flow--not access by subjects to objects.
Information flow can be seen as a way of preserving secrecy. The
information flow problem is surprisingly difficult. Interest in this
topic has been renewed recently, due primarily to the Internet and
World Wide Web. We discuss in this class some of what has been done in the
past to spare you the embarrassment of 'reinventing the wheel' in the
context of the web.
Given a program with assignment statements, say y := x, information
flows from the right-hand side to the left-hand side (from x to y). We
define the relation x->y to mean that information flows from x to y.
The problem is to determine whether the relation holds (or doesn't hold) between
variables of a given program.
What does information flow actually mean? There are
entire courses on information theory. Thus, here we can only discuss
it informally. Think of information as "the enemy of uncertainty."
Information reduces uncertainty. In other words, we reduce uncertainty
only by acquiring information. If the relation x->y holds, then
knowing the value of y leaves you with less uncertainty about the
value of x. Note carefully this does not say that if you know y
then you know x exactly. (As an example, think of the card game
blackjack: if you know all (or some) of the previous cards that have
turned up, then it is possible to have some idea of what the next card
might be (without knowing it exactly).)
Here are some examples of information flow:
- y := x; Here, there is information flow from x to y (x->y).
- z := x; y := z; Information flow is transitive. x->y since x->z and z->y.
- Assume that the variables x and y are natural numbers. Is there
information flow in the assignment z := x + y ? If z = 0, then we know that
both x and y are 0. If z is 1, then we know that one of x and y is 1 and
the other is 0. And in every case the value of z gives us an upper bound
on the values of x and y.
Therefore, x->z and y->z.
- Assume that the variables x and y are integers. Is there information
flow in the assignment z := x XOR y ? Suppose that z = 1001. In fact,
we can't tell anything about x or y from z. The XOR operator is interesting
this way. Without knowing both z and x (respectively, z and y), it is
impossible to know anything about y (respectively, x) in this program.
The conclusion we can draw is that information does not always flow across
assignment statements.
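The contrast between z := x + y and z := x XOR y can be checked by brute force; here is a minimal Python sketch (the 4-bit value range is an assumption for illustration) that enumerates which (x, y) pairs remain possible after observing z:

```python
from itertools import product

def consistent_pairs(op, z, bits=4):
    """All (x, y) pairs over `bits`-bit values that could have produced z."""
    vals = range(2 ** bits)
    return [(x, y) for x, y in product(vals, vals) if op(x, y) == z]

# Sum: observing z narrows down (x, y).  z = 0 forces x = y = 0.
assert consistent_pairs(lambda x, y: x + y, 0) == [(0, 0)]
# z = 1 forces one of x, y to be 1 and the other 0.
assert consistent_pairs(lambda x, y: x + y, 1) == [(0, 1), (1, 0)]

# XOR: for any observed z, every value of x remains possible,
# so z alone tells us nothing about x.
pairs = consistent_pairs(lambda x, y: x ^ y, 0b1001)
assert sorted({x for x, _ in pairs}) == list(range(16))
```

For the sum, the set of consistent pairs shrinks as z shrinks; for XOR it never does, which is why x->z holds for the sum but not for XOR alone.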
We should also keep in mind that programs do not
just consist of assignment statements. Consider the following program
fragment:
y := 0;
if x = 1 then y := 1.
Does information flow from x to y (x->y) ? In fact, if y = 1 after
execution terminates, then we infer that x = 1 holds, and if y = 0,
then we conclude that x is not 1. Uncertainty about x is reduced by
knowing y. This is referred to as an implicit flow. Explicit flows
occur when values are 'pushed' across assignment statements. The
upshot of implicit flows is that even if we allowed only assignments
whose right-hand sides are constants, information flow would still be
possible.
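The fragment above can be run directly; in this Python sketch every right-hand side is a constant, yet the returned value reveals a bit of x:

```python
def copy_via_implicit_flow(x):
    """No statement assigns x (or anything derived from it) to y; every
    right-hand side is a constant.  Yet y ends up equal to x's low bit."""
    y = 0
    if x % 2 == 1:
        y = 1
    return y

assert copy_via_implicit_flow(7) == 1   # odd x: the branch taken reveals it
assert copy_via_implicit_flow(4) == 0   # even x: the branch skipped reveals it
```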
Defining Information Flow Policies
What are reasonable information flow policies? For each variable x,
write class(x) for its information flow class. An
information flow policy restricts flow between certain classes and is a
relation on the set of information flow classes. (Think of
classes as: top secret, secret, confidential, etc.) A policy might be:
no information flows from secret to unclassified.
Given a program, it is possible to write down all the information
flow relations (i.e., ->) that hold between its variables.
A policy x <= y means: it is permissible for
information to flow from class x to class y. We say a
system is secure with respect to such a policy if, whenever x->y, then
x <= y.
What are reasonable restrictions on the <= relation? It turns out
that enforcement is easier if we insist that <= forms a lattice. What
does this mean? Let SC be the set of security classes x,
y, etc. If (SC, <=) forms a lattice, then the following
statements hold.
- a <= a for all
a.
- a <= b AND b <= c implies a <=
c. This is a restriction on the kinds of policies we can
express. For instance, imagine a policy where one can downgrade the
confidentiality level of information, but only to a certain level
(e.g. from secret to confidential, but no further). This policy is not
possible if we require that <= be transitive.
- a <= b AND b <= a implies a =
b.
- There is a least upper bound (lub) for every pair of
classes. That is, for each pair a, b there is some
c such that a <= c, b <= c, and
there is nothing 'lower' than c in the lattice that is 'higher'
than both a and b.
- There is a greatest lower bound (glb) for every pair of classes.
(The definition is analogous to that of the lub, with 'higher' and
'lower' exchanged.)
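A standard concrete instance is the powerset lattice: classes are sets of compartments, <= is subset, lub is union, and glb is intersection. A small Python sketch (the compartment names are illustrative) that checks the lattice properties above by exhaustion:

```python
from itertools import combinations

# Security classes as sets of compartments: subset is <=, union is lub,
# intersection is glb.  The compartment names are illustrative.
compartments = {"secret", "crypto"}
classes = [frozenset(s) for r in range(len(compartments) + 1)
           for s in combinations(sorted(compartments), r)]

leq = lambda a, b: a <= b          # the partial order
lub = lambda a, b: a | b           # least upper bound
glb = lambda a, b: a & b           # greatest lower bound

for a in classes:
    assert leq(a, a)                                   # reflexive
    for b in classes:
        if leq(a, b) and leq(b, a):
            assert a == b                              # antisymmetric
        assert leq(a, lub(a, b)) and leq(b, lub(a, b)) # lub is an upper bound
        assert leq(glb(a, b), a) and leq(glb(a, b), b) # glb is a lower bound
        for c in classes:
            if leq(a, b) and leq(b, c):
                assert leq(a, c)                       # transitive
```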
Why is the lattice assumption useful? (Note that the lub and glb
properties come 'for free.' It is always possible to add elements to a
lattice (top and bottom, for example) to satisfy the lub and glb
requirements.) It turns out that having a lattice will allow us to
compute some things very efficiently. Recall the example where x and y
are natural numbers and we assign z := x + y. We would like to
analyze the expression x + y (as opposed to examining each individual
variable) in testing to see if execution of z := x + y should be
allowed.
The existence of a lattice implies: if x1 <= y, x2 <= y, . . .,
xn <= y, then there exists some x where
x = x1 lub x2 lub x3 . . . lub xn and x <= y.
Therefore, the flows x1->y, x2->y, . . ., xn->y are all authorized
if and only if x1 lub x2 lub x3 . . . lub xn <= y.
And, to check whether a policy
is satisfied, it is only necessary to compute one least upper bound,
rather than to check a set of <= relations. If the latter computation
is expensive, it is useful to only have to do it once.
Consider the example: y := x1 + ( x2 *
x3 ). We don't need to check that x1 <=
y, x2 <= y, and so on. It suffices to
find the least upper bound of the variables on the right-hand side and
check that it is <= y.
An analogous result holds for the greatest lower bound. The
relations x->y1, x->y2, . . ., x->yn
are authorized if and only if x <= y1 glb
y2 glb y3 . . . glb
yn. Consider, as an example:
if x then y1 := 0; y2 := 0; y3 := 0
Executing this statement causes the flows
x->y1, x->y2, and x->y3. These flows are
all authorized if
x <= y1 glb y2
glb y3.
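Both checks can be written down directly; a sketch in Python, with classes modeled as frozensets ordered by inclusion (the class assignments below are illustrative):

```python
# Classes as frozensets: subset is <=, union is lub, intersection is glb.
LOW, HIGH = frozenset(), frozenset({"secret"})

def assignment_ok(rhs_classes, target_class):
    """y := f(x1, ..., xn) is allowed iff lub of the xi's classes <= y's."""
    lub = frozenset().union(*rhs_classes)
    return lub <= target_class

def conditional_ok(guard_class, target_classes):
    """if x then y1 := ...; ...; yn := ... is allowed iff x's class <=
    glb of the targets' classes."""
    glb = frozenset.intersection(*target_classes)
    return guard_class <= glb

# z := x + y with x low and y high requires z to be high:
assert assignment_ok([LOW, HIGH], HIGH)
assert not assignment_ok([LOW, HIGH], LOW)
# if x (high) then y1 := 0; y2 := 0 requires both targets to be high:
assert conditional_ok(HIGH, [HIGH, HIGH])
assert not conditional_ok(HIGH, [HIGH, LOW])
```

Each check costs one union (or intersection) plus one comparison, rather than n separate <= tests.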
Security Mechanisms for Flow Control
Let F be the set of all possible flows in a system S. Let P, a subset
of F, be the set of all flows authorized by some policy. Let E, a subset
of F, be the set of actual flows that arise during execution of S given
the security enforcement mechanism in operation. A system is
secure if E is a subset of P (enforcement obeys policy). The
problem is that under this definition of security, making E be the
empty set is sufficient. We would also like the enforcement mechanism
to be precise: E = P. Building precise mechanisms is rarely
possible, but still we strive for enforcement mechanisms that don't
rule out permissible executions, so E will be as close to P as
possible.
Run-Time Mechanisms
One way to implement an information flow policy is to enforce it at
run time. The general idea is: before each
assignment statement executes, check that the flow this action causes
is ok. Before the statement y := f(x1, x2,....,
xn) we need to check that x1 -> y, x2
-> y, ..., xn -> y are allowed. (Note: Depending on the
meaning of f, there might not be information flow from some
xi to y. Since we pretend there is, our enforcement
mechanism is conservative and not precise.) In other words, we need
to check that x1 lub x2 ... lub
xn <= y. But is checking at each assignment
statement sufficient? Consider:

if x = 1 then y := 1
Here, before executing the assignment to y we would check that
1 <= y, but this check is not sufficient. Checking only
before the assignment statement is too simple.
Assume that low <= high is the
policy. The above program violates this policy. If, at the end of
execution, y = 1, then we can infer that either y = 1 before execution
or x = 1. Thus, after execution completes we know something about x,
and information flow has occurred. The mechanism we postulated would
only check whether 1 -> y is allowed before the assignment to y, and that
check succeeds (since the constant "1" has security class 'low').
One approach to solving this problem might be to add more checks. But
even announcing that a violation occurs at a particular point in the
program can violate the policy. Recall that the usual way to implement
enforcement is to halt program execution. This halt can be observed
and, therefore, conveys information. A solution might be to avoid
stopping execution by replacing the problematic assignment statement
with 'skip.' In this case, however, we change the expected behavior of
the program, and that could be observed. Even leaking one bit of
information in the right context can be significant. For example, one
bit can be used to code a rich collection of variable values:
if x1 = v1 and x2 = v2 and . . .
then y := 1
A Run-time Flow Restriction Mechanism
Consider a simple machine with a single accumulator. Let acc represent
the accumulator and pc the program counter. For every memory word w,
assume that there is a tag, class(w), associated with it, representing
its flow class. In the example
just before the assignment statement we would need to know the class
of x to determine if there's a flow violation. However, actually
reaching that point in the program already conveys
information about x (viz, "x = 1"). Recall that processors only
fetch, decode and execute instructions, they don't know about context
(if, while, etc.). Fetch gets the instruction that the pc points
to. Thus, at the point before the assignment to y in the above
example, the pc contains the information that x = 1. The program
counter is an implicit variable. But, it can be loaded (branch does
this), stored (jsr does this) and have information flow into it. We
therefore need to consider information classes for the pc and (for
similar reasons) the accumulator. We postulate a simple machine
language and describe what needs to happen upon execution of the
available instructions. Keep in mind that by virtue of executing a
particular instruction, we know we got to that instruction. Thus,
information flows into and out of the program counter.
- Load x -- This command puts the value of x into the
acc (acc := x). We also need to update the class of the acc:
class(acc) := class(x) lub class(pc). The acc has information from
the pc flowing into it, since this instruction can really be seen as
equivalent to acc,pc := x,pc+1.
- Mult x -- Here, acc := acc * x and class(acc) := class(x)
lub class(acc) lub class(pc).
- Store y -- If class(acc) lub class(pc) <= class(y) then
y := acc; otherwise the instruction is a no-op. (We can't store the
value if doing so would cause an information flow violation.)
- Branch label -- pc := label.
- BZ label -- This is the conditional branch. First
class(pc) := class(pc) lub class(acc); then, if acc = 0, pc := label.
We can now code another version of the previous problematic statement, this
time conditional on x = 0. Again, suppose that the class of x is 'high'
and the class of y is 'low.' The statement

if x = 0 then y := 1

compiles to

Load x
BZ Label
.
.
.
Label: Load "1"    (load the constant 1)
Store y
After executing 'Load x' the accumulator's class is 'high,' and after
executing the BZ instruction the pc's class is 'high' as well. After
execution of the instruction labeled 'Label' the accumulator's class is
again 'high' (since the pc was 'high' and a lub was computed there). The
final store instruction therefore becomes a no-op, because it is not
permitted to store a 'high' value into a 'low' variable.
Observe that once the pc's class is 'high', everything thereafter is 'high'.
We have seen something similar before with MLS, where it is not possible to
downgrade security levels. A solution is to use richer mechanisms that
allow more structural forms of control. Consider an if-then-else construction:

if B then S1 else S2
At the points right after the 'then' and right after the 'else' we can
infer information about B. Information flows B->pc. However, after S2
(at the join) this information is lost, and we can no longer infer information
about B by examining the pc. We need to include this in our execution
model. Richer control structures provide a way to downgrade the
program counter.
Keep in mind that there could be nested ifs in this structure. Thus, we
should keep track of the pc classes using a stack instead of a single
additional variable.
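This pc-class stack can be sketched in a few lines of Python (the class names and the Checker interface are hypothetical, for illustration):

```python
HIGH, LOW = frozenset({"high"}), frozenset()

class Checker:
    """Track the pc's class with a stack, so it can be restored
    (downgraded) at the join point of each structured if."""
    def __init__(self, classes):
        self.classes = classes        # maps variable name -> flow class
        self.pc_stack = [LOW]

    def enter_if(self, guard):
        # Inside the branches, the pc also carries the guard's class.
        self.pc_stack.append(self.pc_stack[-1] | self.classes[guard])

    def exit_if(self):
        self.pc_stack.pop()           # at the join, restore the outer class

    def store_allowed(self, target):
        return self.pc_stack[-1] <= self.classes[target]

c = Checker({"x": HIGH, "y": LOW, "z": LOW})
assert c.store_allowed("y")           # outside any if: store to 'low' is ok
c.enter_if("x")
assert not c.store_allowed("y")       # inside if x: pc is high, store blocked
c.exit_if()
assert c.store_allowed("z")           # after the join: pc is low again
```

The stack handles nesting for free: each nested if pushes one entry, and each join pops exactly its own.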
Compile-time Flow Restriction Mechanism
A problem with run-time mechanisms is that either special hardware
support is needed or efficiency is sacrificed. We now discuss a
compile-time mechanism.
Consider a procedure P that has some arguments and local variables:
proc P(x1, x2, ..., xm, var y1, y2, ..., yn);
var z1, z2, ..., zp;
...
end P
(x1, x2, ..., xm are value
parameters; y1, y2, ..., yn are
value/result parameters; and z1, z2, ...,
zp are local variables.) We would like to compile this so
that the appropriate checks are built-in. We assume that the
declarations say what kind of flows are allowed, e.g.: var
x1: integer class {x} means that x->x1 is
allowed. We say a procedure is secure if and only if the body
of the procedure satisfies the specifications given by the
declarations. Consider a procedure to return the max of two integers:
proc MAX(x: int class {x}, y: int class {y}, var m: int class {x, y})
if x > y then m := x else m := y
end MAX
The specification for this procedure, which we infer from the variable
declarations, is: x <= m and
y <= m.
A body of a procedure satisfies a specification according to the
following inductive definitions.
- The statement e := f(a1, a2, ...,
ap)
is secure if a1 lub a2 ...lub
ap <= e.
- S1 ; S2 is secure if S1 is
secure and S2 is secure.
- if e then S1 else S2 is secure if S1
and S2 are secure and e <= T1 glb
T2, where T1 is the glb of the classes of all the
targets of the assignment statements contained in S1, and
T2 is the glb of the classes of the targets of assignment statements
in S2.
Similar rules can be defined for while statements and call statements.
We can now use the compiler to do a kind of typechecking to be sure that
particular procedures are secure.
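The inductive rules lend themselves to a recursive check over program syntax. A Python sketch, assuming a small illustrative AST encoding (tuples tagged "assign", "seq", "if") and classes modeled as frozensets with lub = union and glb = intersection:

```python
def targets(stmt):
    """The glb of the classes of all assignment targets in stmt."""
    kind = stmt[0]
    if kind == "assign":                      # ("assign", target_cls, [arg_cls])
        return stmt[1]
    if kind == "seq":                         # ("seq", s1, s2)
        return targets(stmt[1]) & targets(stmt[2])
    if kind == "if":                          # ("if", guard_cls, s1, s2)
        return targets(stmt[2]) & targets(stmt[3])

def secure(stmt):
    kind = stmt[0]
    if kind == "assign":                      # lub of the args <= target
        return frozenset().union(*stmt[2]) <= stmt[1]
    if kind == "seq":
        return secure(stmt[1]) and secure(stmt[2])
    if kind == "if":                          # e <= glb of branch targets
        return (secure(stmt[2]) and secure(stmt[3])
                and stmt[1] <= targets(stmt[2]) & targets(stmt[3]))

X, Y, M = frozenset({"x"}), frozenset({"y"}), frozenset({"x", "y"})
# MAX: if x > y then m := x else m := y   (the guard's class is x lub y)
max_body = ("if", X | Y, ("assign", M, [X]), ("assign", M, [Y]))
assert secure(max_body)
# Assigning under the same guard to a target below x lub y is rejected:
assert not secure(("if", X | Y, ("assign", X, [X]), ("assign", X, [X])))
```

This is the "typechecking" flavor of the mechanism: everything is decided from the declared classes, with no run-time tags at all.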