Information Flow
Lecturer: Professor Fred B. Schneider
Lecture notes by
Lynette I. Millett
We have mostly discussed access control policies. In fact, it can be
tempting to identify security with access control, but this would be a
mistake. Historically, security mechanisms were developed for the
needs of the defense establishment. Their primary concern was secrecy,
with a secondary concern being integrity and denial of
service. Secrecy can be implemented using access control, but the two
are very different. For example, consider a tax preparation program. We
would like some guarantee that the program won't misuse an
individual's tax data. We would also like to guarantee that an
individual cannot gain access to proprietary information in the
program's database. To do these things, we need to restrict
information flow--not access by subjects to objects.
Information flow can be seen as a way of preserving secrecy. The
information flow problem is surprisingly difficult. Interest in this
topic has been renewed recently, due primarily to the Internet and
World Wide Web. We discuss in this class some of what has been done in the
past to spare you the embarrassment of 'reinventing the wheel' in the
context of the web.
Given a program with assignment statements, say y := x, information
flows from the right-hand side to the left-hand side (from x to y). We
define the relation x->y to mean that information flows from x to y.
The problem is to determine whether the relation holds (or doesn't hold) between
variables of a given program.
What does information flow actually mean? There are
entire courses on information theory. Thus, here we can only discuss
it informally. Think of information as "the enemy of uncertainty."
Information reduces uncertainty. In other words, we reduce uncertainty
only by acquiring information. If the relation x->y holds, then
knowing the value of y leaves you with less uncertainty about the
value of x. Note carefully this does not say that if you know y
then you know x exactly. (As an example, think of the card game
blackjack: if you know all (or some) of the previous cards that have
turned up, then it is possible to have some idea of what the next card
might be (without knowing it exactly).)
Here are some examples of information flow:
- y := x; Here, there is information flow from x to y (x->y).
- z := x; y := z; Information flow is transitive. x->y since x->z and z->y.
- Assume that the variables x and y are natural numbers. Is there
information flow in the assignment z := x + y ? If z = 0, then we know that
both x and y are 0. If z is 1, then we know that one of x and y is 1 and
the other is 0. And in every case the value of z gives us an upper bound
on the values of x and y.
Therefore, x->z and y->z.
- Assume that the variables x and y are integers. Is there information
flow in the assignment z := x XOR y ? Suppose that z = 1001. In fact,
we can't tell anything about x or y from z. The XOR operator is interesting
this way. Without knowing both z and x (respectively, z and y), it is
impossible to know anything about y (respectively, x) in this program.
The conclusion we can draw is that information does not always flow across
assignment statements.
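The contrast between z := x + y and z := x XOR y can be checked by brute force; here is a minimal Python sketch (the 4-bit value range is an assumption for illustration) that enumerates which (x, y) pairs remain possible after observing z:

```python
from itertools import product

def consistent_pairs(op, z, bits=4):
    """All (x, y) pairs over `bits`-bit values that could have produced z."""
    vals = range(2 ** bits)
    return [(x, y) for x, y in product(vals, vals) if op(x, y) == z]

# Sum: observing z narrows down (x, y).  z = 0 forces x = y = 0.
assert consistent_pairs(lambda x, y: x + y, 0) == [(0, 0)]
# z = 1 forces one of x, y to be 1 and the other 0.
assert consistent_pairs(lambda x, y: x + y, 1) == [(0, 1), (1, 0)]

# XOR: for any observed z, every value of x remains possible,
# so z alone tells us nothing about x.
pairs = consistent_pairs(lambda x, y: x ^ y, 0b1001)
assert sorted({x for x, _ in pairs}) == list(range(16))
```

For the sum, the set of consistent pairs shrinks as z shrinks; for XOR it never does, which is why x->z holds for the sum but not for XOR alone.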
We should also keep in mind that programs do not
just consist of assignment statements. Consider the following program
fragment:
y := 0;
if x = 1 then y := 1.
Does information flow from x to y (x->y) ? In fact, if y = 1 after
execution terminates, then we infer that x = 1 holds, and if y = 0,
then we conclude that x is not 1. Uncertainty about x is reduced by
knowing y. This is referred to as an implicit flow. Explicit flows
occur when values are 'pushed' across assignment statements. The
upshot of implicit flows is that even if we allowed only assignments
whose right-hand sides are constants, information flow would still be
possible.
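The fragment above can be run directly; in this Python sketch every right-hand side is a constant, yet the returned value reveals a bit of x:

```python
def copy_via_implicit_flow(x):
    """No statement assigns x (or anything derived from it) to y; every
    right-hand side is a constant.  Yet y ends up equal to x's low bit."""
    y = 0
    if x % 2 == 1:
        y = 1
    return y

assert copy_via_implicit_flow(7) == 1   # odd x: the branch taken reveals it
assert copy_via_implicit_flow(4) == 0   # even x: the branch skipped reveals it
```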
Defining Information Flow Policies
What are reasonable information flow policies? For each variable x,
write class(x) for its information flow class. An
information flow policy restricts flow between certain classes and is a
relation on the set of information flow classes. (Think of
classes as: top secret, secret, confidential, etc.) A policy might be:
no information flows from secret to unclassified.
Given a program, it is possible to write down all the information
flow relations (i.e., ->) that hold between its variables.
A policy x <= y means: it is permissible for
information to flow from class x to class y. We say a
system is secure with respect to such a policy if, whenever x->y, then
x <= y.
What are reasonable restrictions on the <= relation? It turns out
that enforcement is easier if we insist that <= forms a lattice. What
does this mean? Let SC be the set of security classes x,
y, etc. If (SC, <=) forms a lattice, then the following
statements hold.
- a <= a for all
a.
- a <= b AND b <= c implies a <=
c. This is a restriction on the kinds of policies we can
express. For instance, imagine a policy where one can downgrade the
confidentiality level of information, but only to a certain level
(e.g. from secret to confidential, but no further). This policy is not
possible if we require that <= be transitive.
- a <= b AND b <= a implies a =
b.
- There is a least upper bound (lub) for every pair of
classes. That is, for each pair a, b there is some
c such that a <= c, b <= c, and
there is nothing 'lower' than c in the lattice that is 'higher'
than both a and b.
- There is a greatest lower bound (glb) for every pair of classes.
(The definition is analogous to that of the lub, with 'higher' and
'lower' exchanged.)
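A standard concrete instance is the powerset lattice: classes are sets of compartments, <= is subset, lub is union, and glb is intersection. A small Python sketch (the compartment names are illustrative) that checks the lattice properties above by exhaustion:

```python
from itertools import combinations

# Security classes as sets of compartments: subset is <=, union is lub,
# intersection is glb.  The compartment names are illustrative.
compartments = {"secret", "crypto"}
classes = [frozenset(s) for r in range(len(compartments) + 1)
           for s in combinations(sorted(compartments), r)]

leq = lambda a, b: a <= b          # the partial order
lub = lambda a, b: a | b           # least upper bound
glb = lambda a, b: a & b           # greatest lower bound

for a in classes:
    assert leq(a, a)                                   # reflexive
    for b in classes:
        if leq(a, b) and leq(b, a):
            assert a == b                              # antisymmetric
        assert leq(a, lub(a, b)) and leq(b, lub(a, b)) # lub is an upper bound
        assert leq(glb(a, b), a) and leq(glb(a, b), b) # glb is a lower bound
        for c in classes:
            if leq(a, b) and leq(b, c):
                assert leq(a, c)                       # transitive
```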
Why is the lattice assumption useful? (Note that the lub and glb
properties come 'for free.' It is always possible to add elements to a
lattice (top and bottom, for example) to satisfy the lub and glb
requirements.) It turns out that having a lattice will allow us to
compute some things very efficiently. Recall the example where x and y
are natural numbers and we assign z := x + y. We would like to
analyze the expression x + y (as opposed to examining each individual
variable) in testing to see if execution of z := x + y should be
allowed.
The existence of a lattice implies: if x1 <= y, x2 <= y, . . .,
xn <= y, then there exists some x where
x = x1 lub x2 lub x3 . . . lub xn and x <= y.
Therefore, the flows x1->y, x2->y, . . ., xn->y are all authorized
if and only if x1 lub x2 lub x3 . . . lub xn <= y.
And, to check whether a policy
is satisfied, it is only necessary to compute one least upper bound,
rather than to check a set of <= relations. If the latter computation
is expensive, it is useful to only have to do it once.
Consider the example: y := x1 + ( x2 *
x3 ). We don't need to check that x1 <=
y, x2 <= y, and so on. It suffices to
find the least upper bound of the variables on the right-hand side and
check that it is <= y.
An analogous result holds for the greatest lower bound. The
relations x->y1, x->y2, . . ., x->yn
are authorized if and only if x <= y1 glb
y2 glb y3 . . . glb
yn. Consider, as an example:
if x then y1 := 0; y2 := 0; y3 := 0
Executing this statement causes the flows
x->y1, x->y2, and x->y3. These flows are
all authorized if
x <= y1 glb y2
glb y3.
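Both checks can be written down directly; a sketch in Python, with classes modeled as frozensets ordered by inclusion (the class assignments below are illustrative):

```python
# Classes as frozensets: subset is <=, union is lub, intersection is glb.
LOW, HIGH = frozenset(), frozenset({"secret"})

def assignment_ok(rhs_classes, target_class):
    """y := f(x1, ..., xn) is allowed iff lub of the xi's classes <= y's."""
    lub = frozenset().union(*rhs_classes)
    return lub <= target_class

def conditional_ok(guard_class, target_classes):
    """if x then y1 := ...; ...; yn := ... is allowed iff x's class <=
    glb of the targets' classes."""
    glb = frozenset.intersection(*target_classes)
    return guard_class <= glb

# z := x + y with x low and y high requires z to be high:
assert assignment_ok([LOW, HIGH], HIGH)
assert not assignment_ok([LOW, HIGH], LOW)
# if x (high) then y1 := 0; y2 := 0 requires both targets to be high:
assert conditional_ok(HIGH, [HIGH, HIGH])
assert not conditional_ok(HIGH, [HIGH, LOW])
```

Each check costs one union (or intersection) plus one comparison, rather than n separate <= tests.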
Security Mechanisms for Flow Control
Let F be the set of all possible flows in a system S. Let P, a subset
of F, be the set of all flows authorized by some policy. Let E, a subset
of F, be the set of actual flows that arise during execution of S given
the security enforcement mechanism in operation. A system is
secure if E is a subset of P (enforcement obeys policy). The
problem is that under this definition of security, making E be the
empty set is sufficient. We would also like the enforcement mechanism
to be precise: E = P. Building precise mechanisms is rarely
possible, but still we strive for enforcement mechanisms that don't
rule out permissible executions, so E will be as close to P as
possible.
Run-Time Mechanisms
One way to implement an information flow policy is to enforce it at
run time. The general idea is: before each
assignment statement executes, check that the flow this action causes
is ok. Before the statement y := f(x1, x2,....,
xn) we need to check that x1 -> y, x2
-> y, ..., xn -> y are allowed. (Note: Depending on the
meaning of f, there might not be information flow from some
xi to y. Since we pretend there is, our enforcement
mechanism is conservative and not precise.) In other words, we need
to check that x1 lub x2 ... lub
xn <= y. But is checking at each assignment
statement sufficient? Consider:

if x = 1 then y := 1
Here, before executing the assignment to y we would check that
1 <= y, but this check is not sufficient. Checking only
before the assignment statement is too simple.
Assume that low <= high is the
policy. The above program violates this policy. If, at the end of
execution, y = 1, then we can infer that either y = 1 before execution
or x = 1. Thus, after execution completes we know something about x,
and information flow has occurred. The mechanism we postulated would
only check whether 1 -> y is allowed before the assignment to y, and that
check succeeds (since the constant "1" has security class 'low').
One approach to solving this problem might be to add more checks. But
even announcing that a violation occurs at a particular point in the
program can violate the policy. Recall that the usual way to implement
enforcement is to halt program execution. This halt can be observed
and, therefore, conveys information. A solution might be to avoid
stopping execution by replacing the problematic assignment statement
with 'skip.' In this case, however, we change the expected behavior of
the program, and that could be observed. Even leaking one bit of
information in the right context can be significant. For example, one
bit can be used to code a rich collection of variable values:
if x1 = v1 and x2 = v2 and . . .
then y := 1
A Run-time Flow Restriction Mechanism
Consider a simple machine with a single accumulator. Let acc represent
the accumulator and pc the program counter. For every memory word w,
assume that there is a tag, class(w), associated with it, representing
its flow class. In the example
just before the assignment statement we would need to know the class
of x to determine if there's a flow violation. However, actually
reaching that point in the program already conveys
information about x (viz, "x = 1"). Recall that processors only
fetch, decode and execute instructions, they don't know about context
(if, while, etc.). Fetch gets the instruction that the pc points
to. Thus, at the point before the assignment to y in the above
example, the pc contains the information that x = 1. The program
counter is an implicit variable. But, it can be loaded (branch does
this), stored (jsr does this) and have information flow into it. We
therefore need to consider information classes for the pc and (for
similar reasons) the accumulator. We postulate a simple machine
language and describe what needs to happen upon execution of the
available instructions. Keep in mind that by virtue of executing a
particular instruction, we know we got to that instruction. Thus,
information flows into and out of the program counter.
- Load x -- This command puts the value of x into the
acc (acc := x). We also need to update the class of the acc:
class(acc) := class(x) lub class(pc). The acc has information from
the pc flowing into it, since this instruction can really be seen as
equivalent to acc,pc := x,pc+1.
- Mult x -- Here, acc := acc * x and class(acc) := class(x)
lub class(acc) lub class(pc).
- Store y -- If class(acc) lub class(pc) <= class(y) then
y := acc; otherwise the instruction is a no-op. (We can't store the
value if doing so would cause an information flow violation.)
- Branch label -- pc := label.
- BZ label -- This is the conditional branch. First
class(pc) := class(pc) lub class(acc); then, if acc = 0, pc := label.
We can now code another version of the previous problematic statement, this
time conditional on x = 0. Again, suppose that the class of x is 'high'
and the class of y is 'low.' The statement

if x = 0 then y := 1

compiles to

Load x
BZ Label
.
.
.
Label: Load "1"    (load the constant 1)
Store y
After executing 'Load x' the accumulator's class is 'high,' and after
executing the BZ instruction the pc's class is 'high' as well. After
execution of the instruction labeled 'Label' the accumulator's class is
again 'high' (since the pc was 'high' and a lub was computed there). The
final store instruction therefore becomes a no-op, because it is not
permitted to store a 'high' value into a 'low' variable.
Observe that once the pc's class is 'high', everything thereafter is 'high'.
We have seen something similar before with MLS, where it is not possible to
downgrade security levels. A solution is to use richer mechanisms that
allow more structural forms of control. Consider an if-then-else construction:

if B then S1 else S2
At the points right after the 'then' and right after the 'else' we can
infer information about B. Information flows B->pc. However, after S2
(at the join) this information is lost, and we can no longer infer information
about B by examining the pc. We need to include this in our execution
model. Richer control structures provide a way to downgrade the
program counter.
Keep in mind that there could be nested ifs in this structure. Thus, we
should keep track of the pc classes using a stack instead of a single
additional variable.
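This pc-class stack can be sketched in a few lines of Python (the class names and the Checker interface are hypothetical, for illustration):

```python
HIGH, LOW = frozenset({"high"}), frozenset()

class Checker:
    """Track the pc's class with a stack, so it can be restored
    (downgraded) at the join point of each structured if."""
    def __init__(self, classes):
        self.classes = classes        # maps variable name -> flow class
        self.pc_stack = [LOW]

    def enter_if(self, guard):
        # Inside the branches, the pc also carries the guard's class.
        self.pc_stack.append(self.pc_stack[-1] | self.classes[guard])

    def exit_if(self):
        self.pc_stack.pop()           # at the join, restore the outer class

    def store_allowed(self, target):
        return self.pc_stack[-1] <= self.classes[target]

c = Checker({"x": HIGH, "y": LOW, "z": LOW})
assert c.store_allowed("y")           # outside any if: store to 'low' is ok
c.enter_if("x")
assert not c.store_allowed("y")       # inside if x: pc is high, store blocked
c.exit_if()
assert c.store_allowed("z")           # after the join: pc is low again
```

The stack handles nesting for free: each nested if pushes one entry, and each join pops exactly its own.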
Compile-time Flow Restriction Mechanism
A problem with run-time mechanisms is that either special hardware
support is needed or efficiency is sacrificed. We now discuss a
compile-time mechanism.
Consider a procedure P that has some arguments and local variables:
proc P(x1, x2, ..., xm, var y1, y2, ..., yn);
var z1, z2, ..., zp;
...
end P
(x1, x2, ..., xm are value
parameters; y1, y2, ..., yn are
value/result parameters; and z1, z2, ...,
zp are local variables.) We would like to compile this so
that the appropriate checks are built-in. We assume that the
declarations say what kind of flows are allowed, e.g.: var
x1: integer class {x} means that x->x1 is
allowed. We say a procedure is secure if and only if the body
of the procedure satisfies the specifications given by the
declarations. Consider a procedure to return the max of two integers:
proc MAX(x: int class {x}, y: int class {y}, var m: int class {x, y})
if x > y then m := x else m := y
end MAX
The specification for this procedure, which we infer from the variable
declarations, is: x <= m and
y <= m.
A body of a procedure satisfies a specification according to the
following inductive definitions.
- The statement e := f(a1, a2, ...,
ap)
is secure if a1 lub a2 ...lub
ap <= e.
- S1 ; S2 is secure if S1 is
secure and S2 is secure.
- if e then S1 else S2 is secure if S1
and S2 are secure and e <= T1 glb
T2, where T1 is the glb of the classes of all the
targets of the assignment statements contained in S1, and
T2 is the glb of the classes of the targets of assignment statements
in S2.
Similar rules can be defined for while statements and call statements.
We can now use the compiler to do a kind of typechecking to be sure that
particular procedures are secure.
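The inductive rules lend themselves to a recursive check over program syntax. A Python sketch, assuming a small illustrative AST encoding (tuples tagged "assign", "seq", "if") and classes modeled as frozensets with lub = union and glb = intersection:

```python
def targets(stmt):
    """The glb of the classes of all assignment targets in stmt."""
    kind = stmt[0]
    if kind == "assign":                      # ("assign", target_cls, [arg_cls])
        return stmt[1]
    if kind == "seq":                         # ("seq", s1, s2)
        return targets(stmt[1]) & targets(stmt[2])
    if kind == "if":                          # ("if", guard_cls, s1, s2)
        return targets(stmt[2]) & targets(stmt[3])

def secure(stmt):
    kind = stmt[0]
    if kind == "assign":                      # lub of the args <= target
        return frozenset().union(*stmt[2]) <= stmt[1]
    if kind == "seq":
        return secure(stmt[1]) and secure(stmt[2])
    if kind == "if":                          # e <= glb of branch targets
        return (secure(stmt[2]) and secure(stmt[3])
                and stmt[1] <= targets(stmt[2]) & targets(stmt[3]))

X, Y, M = frozenset({"x"}), frozenset({"y"}), frozenset({"x", "y"})
# MAX: if x > y then m := x else m := y   (the guard's class is x lub y)
max_body = ("if", X | Y, ("assign", M, [X]), ("assign", M, [Y]))
assert secure(max_body)
# Assigning under the same guard to a target below x lub y is rejected:
assert not secure(("if", X | Y, ("assign", X, [X]), ("assign", X, [X])))
```

This is the "typechecking" flavor of the mechanism: everything is decided from the declared classes, with no run-time tags at all.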