Verification in Coq

Topics:

verification of functions

extraction of OCaml code

verification of data structures

verification of compilers

Require Import List Arith Bool.
Import ListNotations.

Verification of Functions

A function is correct if it satisfies its specification. So to verify a function in Coq, we need to

code the function,
state a theorem that says that function satisfies its specification, and
prove the theorem.

Verifying Factorial

Let's try that with the factorial function. Here's an implementation of it in Coq:

Fixpoint fact (n : nat) :=
  match n with
  | 0 => 1
  | S k => n * (fact k)
  end.

As we learned before, the function has to pattern match against n and recursively call itself on k to demonstrate to Coq that the recursive call will eventually terminate.

What would be a reasonable specification for fact? If we were just going to document it in a comment, we might write something like this:

  (** [fact n] is [n] factorial, i.e., [n!].
      Requires: [n >= 0]. *)

In OCaml, that precondition would be necessary. In Coq, since we are computing on natural numbers, it would be redundant.

But how can we formally state in Coq that fact n is n factorial? There is no factorial operator in most programming languages, including Coq. So we can't just write something like the following:

Theorem fact_correct : forall (n : nat), fact n = n!.

Instead, we need another way to express n!.

Whenever we want to define the meaning of an operator for use in a logic, we need to write down axioms and inference rules for it. We've already seen that in two ways:

With logical connectives, like /\, we saw that axioms and inference rules could define how to introduce and eliminate connectives. For example, from a proof of A /\ B, we could conclude A. Hence A /\ B -> A.
With rings and fields, we saw how axioms (we didn't need inference rules) could define equalities involving operators. For example, 0 * x = 0 allowed us to replace any multiplication by 0 simply with 0 itself.

So, let's define the factorial operator in a similar way:

0! = 1.
If a! = b then (a+1)! = (a+1)*b.

The first line, which is an axiom, defines how the factorial operator behaves when applied to zero. The second line, which is an inference rule, defines how the operator behaves when applied to a successor of a natural number.

Another way to think about that definition is that it defines a relation. Call it the "factorial of" relation:

The factorial of 0 is 1.
If the factorial of a is b, then the factorial of a + 1 is a + 1 times b.

Together, the axiom and inference rule give us a way to "grow" the relation. We start from a "seed", which is the axiom: we know that the factorial of 0 is 1. From there we can apply the inference rule, and conclude that the factorial of 0+1 is 0 + 1 times 1, i.e., that the factorial of 1 is 1. We can keep doing that with the inference rule to determine the factorial of any number.

Let's code up that relation in Coq. We're going to define a proposition factorial_of that is parameterized on two natural numbers, a and b. We want factorial_of a b to be a provable proposition whenever a! = b.

Inductive factorial_of : nat -> nat -> Prop :=
  | factorial_of_zero : factorial_of 0 1
  | factorial_of_succ : forall (a b : nat),
    factorial_of a b -> factorial_of (S a) ((S a) * b).

This definition resembles the definition of an inductive type, which we've done before. But here we are inductively defining a proposition. That proposition, factorial_of, is parameterized on two natural numbers. There are two ways to construct an instance of this parameterized proposition. The first is to use the factorial_of_zero constructor, which corresponds to the axiom we talked about above. The second is the factorial_of_succ constructor, which corresponds to the inference rule.

Another way to think about this definition is in terms of evidence. The factorial_of_zero constructor provides (by definition) the evidence that the factorial of 0 is 1. The factorial_of_succ constructor provides (again by definition) a way of transforming evidence that the factorial of a is b into evidence that the factorial of S a is (S a) * b.
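To make the evidence view concrete, here is a small sketch of the evidence that the factorial of 3 is 6, written directly as a nested application of the two constructors. The name factorial_of_3_6 is ours and not part of the development. Coq accepts the term because 1 * 1, 2 * 1, and 3 * 2 compute to 1, 2, and 6.

(* Hypothetical example name; relies on factorial_of as defined above. *)
Example factorial_of_3_6 : factorial_of 3 6.
Proof.
  (* Seed: factorial_of 0 1.  Each factorial_of_succ grows it by one step. *)
  exact (factorial_of_succ 2 2
          (factorial_of_succ 1 1
            (factorial_of_succ 0 1 factorial_of_zero))).
Qed.

Reading the term from the inside out mirrors how the relation "grows" from its seed.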

Now that we have a formalization of the factorial operation, we can state a theorem that says fact satisfies its specification:

Theorem fact_correct : forall (n : nat),
  factorial_of n (fact n).

In other words, the factorial of n is the same value that fact n computes. So fact is computing the correct function. Note that we don't have to mention the precondition because of the type of n.

To prove the theorem, we'll need induction.

Proof.
  intros n.
  induction n as [ | k IH].
  - simpl. apply factorial_of_zero.
  - simpl. apply factorial_of_succ. assumption.
Qed.

That concludes our verification of fact: we coded it in Coq, wrote a specification for it in Coq, and proved that it satisfies its specification.
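As a quick sanity check, we can run fact on a concrete input and then instantiate the theorem at that input. This is only a sketch, and the example names are ours. The second example works because fact 5 computes to 120, so the statement of fact_correct 5 is the same proposition as factorial_of 5 120.

(* Hypothetical example names; fact and fact_correct are defined above. *)
Example fact_5 : fact 5 = 120.
Proof. reflexivity. Qed.

(* The general theorem, specialized to n = 5. *)
Example factorial_of_5_120 : factorial_of 5 120.
Proof. exact (fact_correct 5). Qed.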

A Reflection on Formalization

If you stop to reflect on what we just did, it has the potential to seem unsatisfying. The skeptic might exclaim, "All you did was say the same thing twice! You coded up fact once as a Coq program, a second time as a Coq proposition, and proved that the two are the same. Isn't that rather trivial and obvious?"

As a response, first, note that we did this verification for a very simple function. It shouldn't be surprising that the formalization of a simple function ends up looking relatively redundant with respect to the program that computes the function.

Second, note that technically the skeptic is wrong: we didn't say the same thing twice. We expressed the idea of the factorial operation in two subtly different ways. The first way, fact, specifies a computation that takes a (potentially large) natural number and continues to recurse on smaller and smaller numbers until it reaches a base case. The second way, factorial_of, specifies a mathematical relation that starts with the base case of 0 and can build up from there to reach larger numbers.

A lot of formal verification has that flavor: express a computation, express a mathematical formalization of the computation, then prove that the two are the same. Or, prove that the two are similar enough: often, the exact details of the computation are irrelevant to the mathematical formalization. It doesn't typically matter, for example, which order the sides of a binary operator are evaluated in, so even though the computation might be explicit, the mathematical formalization need not be. (Side effects would, of course, complicate that analysis.)

Testing and verification are alike in that sense of potential redundancy. With testing, you write down information---inputs and outputs---that you hope is redundant, because the program already encodes the algorithm required to transform those inputs into those outputs. It's only when you are surprised, i.e., the test case fails to agree with the program, that you appreciate the value of saying things twice. By saying the same thing twice, but differently, you make it more likely to expose any errors because you detect the inconsistency.

Verifying Tail-Recursive Factorial

Next, let's verify a different implementation of the factorial operation. This is the tail-recursive implementation. As we learned much earlier, this implementation is more space efficient than the naive recursive implementation.

Fixpoint fact_tr_acc (n : nat) (acc : nat) :=
  match n with
  | 0 => acc
  | S k => fact_tr_acc k (n * acc)
  end.

Definition fact_tr (n : nat) :=
  fact_tr_acc n 1.
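Before proving anything, it can help to run the function on small inputs. The checks below are a sketch with example names of our own choosing. The last one starts the accumulator at 10 instead of 1, and the result is scaled by 10 accordingly.

(* Hypothetical example names; fact_tr and fact_tr_acc are defined above. *)
Example fact_tr_5 : fact_tr 5 = 120.
Proof. reflexivity. Qed.

(* The accumulator takes the values 1, 3, 6, 6 as n counts down 3, 2, 1, 0. *)
Example fact_tr_acc_3_1 : fact_tr_acc 3 1 = 6.
Proof. reflexivity. Qed.

(* Starting from 10, the accumulator takes the values 10, 30, 60, 60. *)
Example fact_tr_acc_3_10 : fact_tr_acc 3 10 = 60.
Proof. reflexivity. Qed.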

To verify the correctness of fact_tr, we'll prove the same kind of theorem as we did for fact. For the most part, the proof proceeds easily:

Theorem fact_tr_correct : forall (n : nat),
  factorial_of n (fact_tr n).
Proof.
  intros n. unfold fact_tr.
  induction n as [ | k IH].
  - simpl. apply factorial_of_zero.
  - simpl.

At this point, we have a k * 1 that we'd like to simplify to just k. There's already a library theorem that can do the job for us:

Check mult_1_r.

We continue the proof using it:

    rewrite mult_1_r.
    destruct k as [ | m].
    -- simpl. rewrite <- mult_1_r.
       apply factorial_of_succ. apply factorial_of_zero.
    --

At this point we'd like to apply factorial_of_succ, but we're stuck: the goal doesn't have the right shape, because the second argument to fact_tr_acc is not 1, and there is no multiplication. We'd like to replace fact_tr_acc (S m) (S (S m)) with (S (S m)) * fact_tr_acc (S m) 1. Let's abort the current proof, and factor out a helper lemma for that purpose.

Abort.

Nothing about the lemma we just realized we needed is actually specific to S (S m): that expression might as well be any natural number, because fact_tr_acc just uses it as the base value of the accumulator. So we can state and prove a slightly more general lemma:

Lemma fact_tr_acc_mult : forall (n m : nat),
  fact_tr_acc n m = m * fact_tr_acc n 1.

The proof starts off relatively easy. Just before we get to the point of using the inductive hypothesis, we'll use a new tactic, replace, which replaces one expression with another, and generates a new subgoal requiring us to prove that the two expressions are in fact equal.

Proof.
  intros n m. induction n as [ | k IH].
  - simpl. ring.
  - replace (fact_tr_acc (S k) m) with (fact_tr_acc k ((S k) * m)).
    --

Unfortunately we're now stuck and unable to use the inductive hypothesis. The problem is that hypothesis is:

IH: fact_tr_acc k m = m * fact_tr_acc k 1

but the goal has the expression:

fact_tr_acc k (S k * m)

The left-hand side of the inductive hypothesis doesn't match that goal, because IH has just m, whereas the goal has S k * m.

But, looking at IH, there does seem to be hope. There's no reason IH needs to be "hard-coded" for a specific m. It really would hold for any m. The root of the problem is that we really want m to be universally quantified in IH, but we already used intros to get rid of that quantification. So, let's start over, and not be so eager to introduce m.

Abort.

Lemma fact_tr_acc_mult : forall (n m : nat),
  fact_tr_acc n m = m * fact_tr_acc n 1.
Proof.
  intros n.
  induction n as [ | k IH].
  - intros p. simpl. ring.
  - intros p.
    replace (fact_tr_acc (S k) p) with (fact_tr_acc k ((S k) * p)).
    --

This time when we get here in the proof, the inductive hypothesis is more general than last time:

IH: forall m : nat, fact_tr_acc k m = m * fact_tr_acc k 1

And that means it's applicable, letting m be S k * p.

       rewrite IH. simpl. rewrite mult_1_r.

Now we'd again like to use IH, this time on the right-hand side, but rewrite IH just causes the left-hand side to change. We can help Coq figure out where we want to use IH by telling it what we want the universally quantified m to be; in this case, S k. The syntax for that is as follows:

       rewrite IH with (m := S k).

After that, the proof is quickly finished.

       ring.
    -- simpl. trivial.
Qed.

Using that lemma, we can successfully verify fact_tr:

Theorem fact_tr_correct : forall (n : nat),
  factorial_of n (fact_tr n).
Proof.
  intros n. unfold fact_tr.
  induction n as [ | k IH].
  - simpl. apply factorial_of_zero.
  - simpl. rewrite mult_1_r.
    destruct k as [ | m].
    -- simpl. rewrite <- mult_1_r.
       apply factorial_of_succ. apply factorial_of_zero.
    -- rewrite fact_tr_acc_mult.
       apply factorial_of_succ. assumption.
Qed.
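As an aside, the generalization that made fact_tr_acc_mult go through can also be obtained after introducing both variables, by putting m back into the goal before starting the induction. The sketch below assumes the standard revert tactic and the by clause of replace, neither of which we have used so far. The primed lemma name is ours and only avoids clashing with the lemma above.

(* Hypothetical variant of fact_tr_acc_mult: same statement, different script. *)
Lemma fact_tr_acc_mult' : forall (n m : nat),
  fact_tr_acc n m = m * fact_tr_acc n 1.
Proof.
  intros n m.
  (* Move m back into the goal so that IH is quantified over every m. *)
  revert m.
  induction n as [ | k IH]; intros m.
  - simpl. ring.
  - (* Unfold one step of fact_tr_acc on each side of the equation. *)
    replace (fact_tr_acc (S k) m) with (fact_tr_acc k (S k * m)) by reflexivity.
    replace (fact_tr_acc (S k) 1) with (fact_tr_acc k (S k * 1)) by reflexivity.
    (* Use IH at two different accumulators. *)
    rewrite (IH (S k * m)).
    rewrite (IH (S k * 1)).
    ring.
Qed.

Writing IH applied to an explicit argument, as in IH (S k * 1), plays the same role as the with (m := ...) clause used earlier.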

Our hypothetical skeptic from before is not likely to be so skeptical of what we did here. After all, it's not so obvious that fact_tr is correct, or that it computes the factorial_of relation. Nonetheless, we have successfully proved its correctness.

Another Way to Verify Tail-Recursive Factorial

Our previous two verifications of factorial have both proved that an implementation of the factorial operation is correct. Our technique was to state a mathematical relation describing factorial, then prove that the implementation computed that relation.

Let's explore another technique now; a technique that can be easier to use. Instead of using the mathematical relation, let's just prove that the two implementations are equivalent. That is, fact and fact_tr compute the same function.

Before launching into that proof, let's pause to ask: what would it accomplish? The answer is that we'd be showing that a complicated and not-obviously-correct implementation, fact_tr, is equivalent to a simple and more-obviously-correct implementation, fact. So if we believe that fact is correct, we could then also believe that fact_tr is correct.

This technique of proving correctness with respect to a reference implementation is quite useful. (In fact, the verification of the seL4 microkernel used it to great effect.)

Without further ado, here is the theorem and its proof. It uses a helper lemma that we'll just go ahead and state first. You'll notice how much easier these are to prove than our previous verification of fact_tr!

Lemma fact_helper : forall (n acc : nat),
  fact_tr_acc n acc = (fact n) * acc.
Proof.
  intros n.
  induction n as [ | k IH]; intros acc.
  - simpl. ring.
  - simpl. rewrite IH. ring.
Qed.

Theorem fact_tr_is_fact: forall n:nat,
  fact_tr n = fact n.
Proof.
  intros n. unfold fact_tr. rewrite fact_helper. ring.
Qed.
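To spell out how trust transfers from the reference implementation, here is a sketch of a corollary that recovers the correctness of fact_tr from the equivalence theorem together with fact_correct. The corollary name is ours.

(* Hypothetical corollary name; uses fact_tr_is_fact and fact_correct from above. *)
Corollary fact_tr_correct_via_fact : forall (n : nat),
  factorial_of n (fact_tr n).
Proof.
  intros n.
  (* Replace the complicated implementation by the simple one... *)
  rewrite fact_tr_is_fact.
  (* ...and reuse the proof we already have for the simple one. *)
  apply fact_correct.
Qed.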

That concludes our verification of the factorial operation.

Extraction

Coq makes it possible to extract OCaml code (or Haskell or Scheme) from Coq code. That makes it possible for us to

write Coq code,
prove the Coq code is correct, and
extract OCaml code that can be compiled and run more efficiently than the original Coq code.

Let's extract fact_tr as an example.

Require Import Extraction.
Extraction Language OCaml.
Extraction "fact.ml" fact_tr.

That produces the following file:

type nat =
| O
| S of nat

(** val add : nat -> nat -> nat **)

let rec add n m =
  match n with
  | O -> m
  | S p -> S (add p m)

(** val mul : nat -> nat -> nat **)

let rec mul n m =
  match n with
  | O -> O
  | S p -> add m (mul p m)

(** val fact_tr_acc : nat -> nat -> nat **)

let rec fact_tr_acc n acc =
  match n with
  | O -> acc
  | S k -> fact_tr_acc k (mul n acc)

(** val fact_tr : nat -> nat **)

let fact_tr n =
  fact_tr_acc n (S O)

As you can see, Coq has preserved the nat type in this extracted code. Unfortunately, computation on natural numbers is not efficient. (Addition requires linear time; multiplication, quadratic!)

We can direct Coq to extract its own nat type to OCaml's int type as follows:

Extract Inductive nat =>
int [ "0" "succ" ] "(fun fO fS n -> if n=0 then fO () else fS (n-1))".
Extract Inlined Constant Init.Nat.mul => "( * )".

The first command says to

use int instead of nat in the extracted code,
use 0 instead of O and succ instead of S (the succ function is in OCaml's Stdlib module, formerly named Pervasives, and is fun x -> x + 1), and
use the provided function to emulate pattern matching over the type.

The second command says to use OCaml's integer ( * ) operator instead of Coq's natural-number multiplication operator.

After issuing those commands, the extraction looks cleaner:

Extraction "fact.ml" fact_tr.

(** val fact_tr_acc : int -> int -> int **)

let rec fact_tr_acc n acc =
  (fun fO fS n -> if n=0 then fO () else fS (n-1))
    (fun _ -> acc)
    (fun k -> fact_tr_acc k (( * ) n acc))
    n

(** val fact_tr : int -> int **)

let fact_tr n =
  fact_tr_acc n (succ 0)

There is, however, a tradeoff. The original version we extracted worked (albeit inefficiently) for arbitrarily large numbers without any error. But the second version is subject to integer overflow errors. So the proofs of correctness that we did for fact_tr are no longer completely applicable: they hold only up to the limits of the types we substituted during extraction.

Do we truly care about the limits of machine arithmetic? Maybe, maybe not. For the sake of this little example, we might not. If we were verifying software to control the flight dynamics of a space shuttle, maybe we would. The Coq standard library does contain a module for 31-bit integers and operators on them, which we could use if we wanted to precisely model what would happen on a particular architecture.
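When experimenting with extraction directives like the ones above, it can be convenient to look at the generated code without writing a file. As a small sketch: the first command below prints just the named definition, and the second prints it together with everything it depends on.

(* Prints only the extracted code for fact_tr itself. *)
Extraction fact_tr.

(* Prints fact_tr along with its dependencies, such as fact_tr_acc. *)
Recursive Extraction fact_tr.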

Verification of Data Structures

We've now seen how to verify individual functions. But what about a collection of related functions, e.g., a data structure? Now we must be concerned with not just the individual functions, but also how they interact. For example, we expect push and peek to interact in certain ways with a stack, or hd and cons with a list:

peek (push x s) = x
hd (h :: t) = h

We can specify the behavior of a data structure by writing down equations like those. This style of specification is called algebraic specification.

When we discussed testing earlier in the semester, we categorized the operations of a data structure whose representation type is t into

creators, which create values of type t from scratch,
producers, which take values of type t as input and return values of type t as output, and
observers, which take values of type t as input and return values of some other type as output.

With algebraic specification, we want to write down equations that characterize all the possible interactions between creators, producers, and observers.

Algebraic Specification of Lists

As an example, let's write an algebraic specification of lists, then verify the correctness of Coq's list implementation with respect to that specification.

Our only creator will be nil. The producers will be ::, ++, and tl. The observers will be hd and length. The hd operation will take an extra argument compared to OCaml's hd operation, which will be a "default" value to return if the list is empty. We could, of course, include other producers and observers in our specification, such as map or mem, but the ones we have chosen are enough for this example.

These are the equations we expect to hold:

hd x nil = x
hd _ (x::_) = x

tl nil = nil
tl (_::xs) = xs

nil ++ xs = xs
xs ++ nil = xs
(x :: xs) ++ ys = x :: (xs ++ ys)
lst1 ++ (lst2 ++ lst3) = (lst1 ++ lst2) ++ lst3

length nil = 0
length (_ :: xs) = 1 + length xs
length (xs ++ ys) = length xs + length ys

Below, we state each of those equations as a theorem, and prove the theorem. The proofs themselves do not contain any new concepts about Coq, so we pass over them without much comment.

hd x nil = x

Theorem hd_nil : forall (A:Type) (x:A),
hd x nil = x.
Proof. trivial. Qed.

hd _ (h :: _) = h

Theorem hd_cons : forall (A:Type) (x h : A) (t : list A),
hd x (h :: t) = h.
Proof. trivial. Qed.

tl nil = nil

Theorem tl_nil : forall (A:Type),
@tl A nil = nil.
Proof. trivial. Qed.

tl (_ :: xs) = xs

Theorem tl_cons : forall (A:Type) (x : A) (xs : list A),
tl (x :: xs) = xs.
Proof. trivial. Qed.

nil ++ xs = xs

Theorem nil_app : forall (A:Type) (xs : list A),
nil ++ xs = xs.
Proof. trivial. Qed.

xs ++ nil = xs

Theorem app_nil : forall (A:Type) (xs : list A),
  xs ++ nil = xs.
Proof.
  intros A xs.
  induction xs as [ | h t IH]; simpl.
  - trivial.
  - rewrite IH. trivial.
Qed.

(x :: xs) ++ ys = x :: (xs ++ ys)

Theorem cons_app : forall (A:Type) (x : A) (xs ys : list A),
(x :: xs) ++ ys = x :: (xs ++ ys).
Proof. trivial. Qed.

lst1 ++ (lst2 ++ lst3) = (lst1 ++ lst2) ++ lst3

Theorem app_assoc : forall (A:Type) (lst1 lst2 lst3 : list A),
  lst1 ++ (lst2 ++ lst3) = (lst1 ++ lst2) ++ lst3.
Proof.
  intros A lst1 lst2 lst3.
  induction lst1 as [ | h t IH]; simpl.
  - trivial.
  - rewrite IH. trivial.
Qed.

length nil = 0

Theorem length_nil : forall (A:Type),
@length A nil = 0.
Proof. trivial. Qed.

length (_ :: xs) = 1 + length xs

Theorem length_cons : forall (A:Type) (x:A) (xs : list A),
length (x :: xs) = 1 + length xs.
Proof. trivial. Qed.

length (xs ++ ys) = length xs + length ys

Theorem length_app : forall (A:Type) (xs ys : list A),
  length (xs ++ ys) = length xs + length ys.
Proof.
  intros A xs ys.
  induction xs as [ | h t IH]; simpl.
  - trivial.
  - rewrite IH. trivial.
Qed.
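One payoff of an algebraic specification is that clients can reason with the equations alone, without unfolding the implementation. As a sketch, the example below (whose name is ours) uses only length_app from above plus arithmetic to show that swapping the arguments of ++ does not change the length.

(* Hypothetical example name; uses only the length_app equation proved above. *)
Example length_app_comm : forall (A : Type) (xs ys : list A),
  length (xs ++ ys) = length (ys ++ xs).
Proof.
  intros A xs ys.
  (* Rewrite each side with the specification equation for ++. *)
  rewrite length_app. rewrite length_app.
  (* What remains is commutativity of addition on nat. *)
  ring.
Qed.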

Algebraic Specification of Stacks

As a second example, let's specify, implement, and verify stacks. The creator is empty, the producers are push and pop, and the observers are is_empty, peek, and size. (You might quibble with whether pop is a producer or observer; it's not really important, though.)

is_empty empty = true
is_empty (push _ _) = false
peek empty = None
peek (push x _) = Some x
pop empty = None
pop (push _ s) = Some s
size empty = 0
size (push _ s) = 1 + size s

Module MyStack.

AF: We will represent a stack as a list. The head of the list is the top of the stack.

Definition stack (A:Type) := list A.

Definition empty {A:Type} : stack A := nil.

Definition is_empty {A:Type} (s : stack A) : bool :=
  match s with
  | nil => true
  | _::_ => false
  end.

Definition push {A:Type} (x : A) (s : stack A) : stack A :=
  x :: s.

Definition peek {A:Type} (s : stack A) : option A :=
  match s with
  | nil => None
  | x::_ => Some x
  end.

Definition pop {A:Type} (s : stack A) : option (stack A) :=
  match s with
  | nil => None
  | _::xs => Some xs
  end.

Definition size {A:Type} (s : stack A) : nat :=
  length s.

Now that we've implemented all the stack operations, we'll verify their correctness. All the proofs are trivial, because the implementation is so simple.

is_empty empty = true

Theorem empty_is_empty : forall (A:Type),
@is_empty A empty = true.
Proof. trivial. Qed.

is_empty (push _ _) = false

Theorem push_not_empty : forall (A:Type) (x:A) (s : stack A),
is_empty (push x s) = false.
Proof. trivial. Qed.

peek empty = None

Theorem peek_empty : forall (A:Type),
@peek A empty = None.
Proof. trivial. Qed.

peek (push x _) = Some x

Theorem peek_push : forall (A:Type) (x:A) (s : stack A),
peek (push x s) = Some x.
Proof. trivial. Qed.

pop empty = None

Theorem pop_empty : forall (A:Type),
@pop A empty = None.
Proof. trivial. Qed.

pop (push _ s) = Some s

Theorem pop_push : forall (A:Type) (x:A) (s : stack A),
pop (push x s) = Some s.
Proof. trivial. Qed.

size empty = 0

Theorem size_empty : forall (A:Type),
@size A empty = 0.
Proof. trivial. Qed.

size (push x s) = 1 + size s

Theorem size_push : forall (A:Type) (x:A) (s : stack A),
size (push x s) = 1 + size s.
Proof. trivial. Qed.

End MyStack.
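Here is a sketch of how a client outside the module might use the stack, first by computing with it and then by reasoning with the specification equations. The example names are ours, and the qualified names assume the module has been closed as above.

(* Hypothetical example names; all stack operations come from MyStack above. *)
Example peek_two_pushes :
  MyStack.peek (MyStack.push 2 (MyStack.push 1 MyStack.empty)) = Some 2.
Proof. reflexivity. Qed.

(* Popping removes only the most recently pushed element. *)
Example pop_two_pushes : forall (A : Type) (x y : A) (s : MyStack.stack A),
  MyStack.pop (MyStack.push x (MyStack.push y s)) = Some (MyStack.push y s).
Proof. intros A x y s. apply MyStack.pop_push. Qed.

(* Two pushes increase the size by two; proved from size_push alone. *)
Example size_two_pushes : forall (A : Type) (x y : A) (s : MyStack.stack A),
  MyStack.size (MyStack.push x (MyStack.push y s)) = 2 + MyStack.size s.
Proof.
  intros A x y s.
  rewrite MyStack.size_push. rewrite MyStack.size_push.
  reflexivity.
Qed.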

To extract our stack implementation to OCaml, it will help to additionally declare to Coq that we want to extract its booleans, options, and lists to OCaml's own built-in types for those.

Extract Inductive bool => "bool" [ "true" "false" ].
Extract Inductive option => "option" [ "Some" "None" ].
Extract Inductive list => "list" [ "[]" "(::)" ].
Extract Inlined Constant length => "List.length".

Extraction "mystack.ml" MyStack.

Verification of a Compiler

One of the big success stories of Coq verification is the CompCert C compiler. Its source language is ISO C99. It is an optimizing compiler that targets PowerPC, ARM, RISC-V, and x86 processors. The correctness proofs establish that the executable code it produces will behave exactly as it should according to the semantics of the C source code.

Let's get a sense of what would be required to verify a compiler. We'll take a tiny source language, compile it into a tiny bytecode language, and verify the correctness of that compilation. We'll only worry here about the backend of the compiler, not about the frontend (including parsing). CompCert was originally only a verified backend, too, but in the last few years even the frontend has been verified.

Module Compiler.

As the source language, we'll use arithmetic expressions that have only integer constants and addition:

e ::= i | e + e

In OCaml, we could represent that with this AST type:

type expr = 
 | Const of int 
 | Plus of expr * expr

In Coq, the type is very similar, though we'll use nat instead of int:

Inductive expr : Type :=
| Const : nat -> expr
| Plus : expr -> expr -> expr.

The dynamic semantics of expressions is straightforward:

i ==> i
e1 + e2 ==> i
  if e1 ==> i1
  and e2 ==> i2
  and i = i1 + i2

And it's easily implementable. Here's a big-step interpreter:

Fixpoint eval_expr (e : expr) : nat :=
  match e with
  | Const i => i
  | Plus e1 e2 => (eval_expr e1) + (eval_expr e2)
  end.

Here are a couple test cases for our interpreter:

Example source_test_1 : eval_expr (Const 42) = 42.
Proof. trivial. Qed.

Example source_test_2 : eval_expr (Plus (Const 2) (Const 2)) = 4.
Proof. trivial. Qed.

As a target language, let's use something similar to what Java and OCaml use for bytecode. They are based on a stack machine model, in which bytecode instructions manipulate a stack. Our tiny little bytecode language will have the following instruction set:

instr ::= PUSH i | ADD

A program is just a sequence of instructions. For example, the following program pushes 2 on the stack, pushes 2 again, then adds the two values on the stack. Adding causes two values to be popped, and the sum pushed back onto the stack.

PUSH 2
PUSH 2
ADD

We'll implement this stack language in Coq as follows. An instr is a machine instruction. A program prog is a list of instructions.

Inductive instr : Type :=
| PUSH : nat -> instr
| ADD : instr.

Definition prog := list instr.

Now we can write an interpreter for the target language. Evaluation of a program takes in an initial stack, and returns the final stack. But since evaluation could fail (if we try to ADD when there aren't at least two values on the stack), we wrap the return in an option, and return None if an error occurs.

Definition stack := list nat.

Fixpoint eval_prog (p : prog) (s : stack) : option stack :=
  match p,s with
    | PUSH n :: p', s => eval_prog p' (n :: s)
    | ADD :: p', x :: y :: s' => eval_prog p' (x + y :: s')
    | nil, s => Some s
    | _, _ => None
  end.

Here are a couple unit tests for the target language interpreter.

Example target_test_1 : eval_prog [PUSH 42] [] = Some [42].
Proof. trivial. Qed.

Example target_test_2 : eval_prog [PUSH 2; PUSH 2; ADD] [] = Some [4].
Proof. trivial. Qed.
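It is also worth exercising the failure case and the treatment of values deeper in the stack. The following tests are a sketch, with names of our own choosing. The first two run ADD with fewer than two values available, and the third shows that ADD touches only the top two values.

(* Hypothetical test names; eval_prog is the interpreter defined above. *)
Example target_test_3 : eval_prog [ADD] [] = None.
Proof. reflexivity. Qed.

(* After PUSH 1 the stack holds a single value, so ADD still fails. *)
Example target_test_4 : eval_prog [PUSH 1; ADD] [] = None.
Proof. reflexivity. Qed.

(* Stack contents: [] , [1] , [2; 1] , [3; 2; 1] , then ADD gives [5; 1]. *)
Example target_test_5 :
  eval_prog [PUSH 1; PUSH 2; PUSH 3; ADD] [] = Some [5; 1].
Proof. reflexivity. Qed.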

Now we're ready to translate from the source language to the target language.

To translate a constant c, we just push c onto the stack.
To translate an addition e1 + e2, we translate e2, translate e1, then append the instructions together, followed by an ADD instruction.

The function below, compile e, produces a program p, such that evaluation of p leaves a single new value at the top of the stack, and that value would be the result of evaluating e.

Fixpoint compile (e : expr) : prog :=
  match e with
    | Const n => [PUSH n]
    | Plus e1 e2 => compile e2 ++ compile e1 ++ [ADD]
  end.

Here are a couple unit tests for our compiler:

Example compile_test_1 : compile (Const 42) = [PUSH 42].
Proof. trivial. Qed.

Example compile_test_2 : compile (Plus (Const 2) (Const 3))
  = [PUSH 3; PUSH 2; ADD].
Proof. trivial. Qed.
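A nested expression shows how the translation of a subexpression ends up as a contiguous block inside the larger program. This extra test is a sketch, and its name is ours.

(* Hypothetical test name.  The right operand, 2 + 3, is compiled first,
   giving PUSH 3; PUSH 2; ADD, followed by the code for the left operand
   and the outer ADD. *)
Example compile_test_3 :
  compile (Plus (Const 1) (Plus (Const 2) (Const 3)))
  = [PUSH 3; PUSH 2; ADD; PUSH 1; ADD].
Proof. reflexivity. Qed.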

Those tests demonstrate that the compiler produces some programs that do seem to correspond to the input expression. But we haven't really tested the postcondition of compile: we want to know whether both sides of the = in those test cases above evaluate to the same value. So let's check that, too.

Example post_test_1 :
  eval_prog (compile (Const 42)) [] = Some [eval_expr (Const 42)].
Proof. trivial. Qed.

Example post_test_2 :
  eval_prog (compile (Plus (Const 2) (Const 3))) []
  = Some [eval_expr (Plus (Const 2) (Const 3))].
Proof. trivial. Qed.

So far, so good. But as Dijkstra put it, "Program testing can be used to show the presence of bugs, but never to show their absence!" How could we show that the compiler is correct for every input expression? WE PROVE IT!

The following theorem is a specification that says what it means for compile to be correct. In particular, it says these two computations produce the same result:

Compiling e then evaluating the resulting program according to the semantics of the target language, starting with the empty stack.
Evaluating e according to the semantics of the source language, then pushing the result on the empty stack and wrapping it with Some.

Theorem compile_correct : forall (e:expr),
  eval_prog (compile e) [] = Some [eval_expr e].
Abort.

Proving the theorem will require a helper lemma about the associativity of list append.

Lemma app_assoc_4 : forall (A:Type) (l1 l2 l3 l4 : list A),
  l1 ++ (l2 ++ l3 ++ l4) = (l1 ++ l2 ++ l3) ++ l4.
Proof.
  intros A l1 l2 l3 l4.
  replace (l2 ++ l3 ++ l4) with ((l2 ++ l3) ++ l4);
  rewrite app_assoc; trivial.
Qed.

We'll also need a helper lemma that generalizes the main theorem. Specifically, it says that there could be additional instructions p in the program, and additional values s on the stack, but those won't keep the expression e from being compiled and executed correctly. The proof uses the same technique of a generalized inductive hypothesis, where we won't introduce all the variables right away, as we used when verifying fact_tr above.

Lemma compile_helper : forall (e:expr) (s:stack) (p:prog),
  eval_prog (compile e ++ p) s = eval_prog p (eval_expr e :: s).
Proof.
  intros e.
  induction e as [n | e1 IH1 e2 IH2]; simpl.
  - trivial.
  - intros s p. rewrite <- app_assoc_4.
    rewrite IH2. rewrite IH1. simpl. trivial.
Qed.
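To see what the helper lemma says on a concrete input, consider the sketch below, whose name is ours. The compiled code for 2 + 3 is followed by one extra instruction, PUSH 7, and the initial stack already holds 10. The lemma lets us replace the compiled code by its value, 5, pushed on top of the existing stack, after which the remaining instruction runs as usual.

(* Hypothetical example name; compile_helper is the lemma just proved. *)
Example compile_helper_instance :
  eval_prog (compile (Plus (Const 2) (Const 3)) ++ [PUSH 7]) [10]
  = Some [7; 5; 10].
Proof.
  (* Here p is [PUSH 7] and s is [10]. *)
  rewrite compile_helper.
  trivial.
Qed.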

Theorem compile_correct : forall (e:expr),
  eval_prog (compile e) [] = Some [eval_expr e].
Proof.
  intros e.
  induction e as [n | e1 IH1 e2 IH2]; simpl.
  - trivial.
  - repeat rewrite compile_helper. simpl. trivial.
Qed.
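Since compile_helper already generalizes the main theorem, the theorem can also be obtained from it directly by taking the extra program and the extra stack to be empty. The sketch below does that without induction. The primed name is ours, and app_nil refers to the theorem about xs ++ nil that we proved in the section on lists.

(* Hypothetical alternative proof of compile_correct, with no induction. *)
Theorem compile_correct' : forall (e : expr),
  eval_prog (compile e) [] = Some [eval_expr e].
Proof.
  intros e.
  (* View compile e as compile e ++ nil, so that the helper applies. *)
  rewrite <- (app_nil instr (compile e)).
  rewrite compile_helper.
  (* Evaluating the empty program returns the stack unchanged. *)
  reflexivity.
Qed.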

End Compiler.

Now we have a verified compiler, and we can extract it (and the two tiny interpreters we also wrote) to OCaml!

Extract Inlined Constant Init.Nat.add => "( + )".
Extract Inlined Constant app => "( @ )".
Extraction "compiler.ml" Compiler.eval_expr Compiler.eval_prog Compiler.compile.

Summary

We've come to fulfillment of our purposes in learning Coq: verification. To reach this point, we had to learn how to program in Coq's functional programming language and how to prove in Coq's logic. Along the way we also learned about the correspondence between proofs and programs.

Forty years ago, verification techniques worked only for really short programs written in toy languages, and it was all done with pen and paper. Today, research projects are able to verify compilers and operating systems, and the computer can check the proofs. In another forty years, who knows?

Perhaps, at the end of this unit on formal methods, you find yourself wondering why we spent so much time on it. One reason is that it's an important (if niche) area in programming languages and software engineering. To be well educated in this field requires that you know something about it. Even if you never touch formal methods again, you now can talk from firsthand experience with other people in industry. Another reason is that the future of functional programming (hence programming in general) is headed toward languages with ever richer type systems, like Coq's. Coq's type system is sophisticated enough to express not just programs but theorems. Before the end of your career, it's a good bet there will be some mainstream language that has rich enough types to express correctness properties that are beyond today's type systems, even if not rich enough to state arbitrary propositions.

But even more importantly, the final reason we covered formal methods was to spend some time thinking about what it means for a program to be correct. One perspective on that issue, a perspective well covered by other introductory programming courses as well as this course, is testing. Unit testing is a cost effective way to ascertain whether a program has faults. Now, you've experienced another perspective: proof. Proving the correctness of a program is expensive, yet it offers guarantees beyond unit tests.

That's not at all ---no, not at all!--- to claim that formal methods are perfect. You might end up proving the wrong theorems. You might make assumptions that turn out to be invalid. There might even be faults in the programs you use to check your proofs. All of those could make your formal efforts futile.

But at the end of the day, we programmers (we happy programmers, we band pursuing the craft of code) are creating artifacts that are part proof and part art and, if we do our jobs right, altogether beautiful. Beauty is our business. Never lose sight of that.

Terms and concepts

algebraic specification
axiom
extraction
generalized inductive hypothesis
inference rule
redundancy
reference implementation
relation
specification
testing
verification

Tactics

replace