RedblackImplementation and Proof of Red-Black Trees

Required Reading

(1) General background on red-black trees:
  • Section 3.3 of Algorithms, Fourth Edition, by Sedgewick and Wayne, Addison Wesley 2011; or
  • Chapter 13 of Introduction to Algorithms, 3rd Edition, by Cormen, Leiserson, and Rivest, MIT Press 2009; or
  • Wikipedia.
(2) An explanation of the particular implementation we use here. Red-Black Trees in a Functional Setting, by Chris Okasaki. Journal of Functional Programming, 9(4):471-477, July 1999. https://www.cambridge.org/core/journals/journal-of-functional-programming/article/red-black-trees-in-a-functional-setting/62BC5EA75A2C95E3F6EE95AE3DADF0E5
(3) Optional reading: Efficient Verified Red-Black Trees, by Andrew W. Appel, September 2011. http://www.cs.princeton.edu/~appel/papers/redblack.pdf

Implementation

Red-black trees are a form of binary search tree (BST), but with balance. Recall that the depth of a node in a tree is the distance from the root to that node. The height of a tree is the depth of the deepest node. The insert or lookup function of the BST algorithm (Chapter SearchTree) takes time proportional to the depth of the node that is found (or inserted). To make these functions run fast, we want trees where the worst-case depth (or the average depth) is as small as possible.
In a perfectly balanced tree of N nodes, every node has depth less than or or equal to log N, using logarithms base 2. In an approximately balanced tree, every node has depth less than or equal to 2 log N. That's good enough to make insert and lookup run in time proportional to log N.
The trick is to mark the nodes Red and Black, and by these marks to know when to locally rebalance the tree. For more explanation and pictures, see the Required Reading above.
We will use the same framework as in Extract: keys are OCaml integers. We don't repeat the Extract commands, because they are imported implicitly.
From VFA Require Import Perm.
From VFA Require Import Extract.
From Coq Require Import Lists.List.
Export ListNotations.
From Coq Require Import Logic.FunctionalExtensionality.
Require Import ZArith.
Open Scope Z_scope.

Definition key := int.

Inductive color := Red | Black.

Section TREES.
Variable V : Type.
Variable default: V.

Inductive tree : Type :=
 | E : tree
 | T: colortreekeyVtreetree.

Definition empty_tree := E.
lookup is exactly as in our (unbalanced) search-tree algorithm in Extract, except that the T constructor carries a color component, which we can ignore here.
Fixpoint lookup (x: key) (t : tree) : V :=
  match t with
  | Edefault
  | T _ tl k v trif ltb x k then lookup x tl
                         else if ltb k x then lookup x tr
                         else v
  end.
The balance function is copied directly from Okasaki's paper. Now, the nice thing about machine-checked proof in Coq is that you can prove this correct without actually understanding it! So, do read Okasaki's paper, but don't worry too much about the details of this balance function.
Definition balance rb t1 k vk t2 :=
 match rb with RedT Red t1 k vk t2
 | _
 match t1 with
 | T Red (T Red a x vx b) y vy c
      T Red (T Black a x vx b) y vy (T Black c k vk t2)
 | T Red a x vx (T Red b y vy c) ⇒
      T Red (T Black a x vx b) y vy (T Black c k vk t2)
 | amatch t2 with
            | T Red (T Red b y vy c) z vz d
                T Red (T Black t1 k vk b) y vy (T Black c z vz d)
            | T Red b y vy (T Red c z vz d) ⇒
                T Red (T Black t1 k vk b) y vy (T Black c z vz d)
            | _T Black t1 k vk t2
            end
  end
 end.
(In contrast, Sedgewick has proposed left-leaning red-black trees, which have a simpler balance function (but a more complicated invariant). He does this in order to make the proof of correctness easier: there are fewer cases in the balance function, and therefore fewer cases in the case-analysis of the proof of correctness of balance. But as you will see, our proofs about balance will have automated case analyses, so we don't care how many cases there are!)
The insert operation uses a helper function ins to do the insertion, and another function makeBlack to color the resulting tree's root black.
Definition makeBlack t :=
  match t with
  | EE
  | T _ a x vx bT Black a x vx b
  end.

Fixpoint ins x vx s :=
 match s with
 | ET Red E x vx E
 | T c a y vy bif ltb x y then balance c (ins x vx a) y vy b
                  else if ltb y x then balance c a y vy (ins x vx b)
                  else T c a x vx b
 end.

Definition insert x vx s := makeBlack (ins x vx s).

Verification

Now that the implementation has been defined, we want to verify certain properties:
  • SearchTree: Red-black trees should satisfy the SearchTree representation invariant: the keys in each left subtree are all less than the node's key, and the keys in each right subtree are greater.
  • Abs: Red-black trees should be correct w.r.t. the abstraction relation Abs, which relates trees to maps.
  • Balanced: For sake of performance, red-black trees should be balanced. Specifically, they should satisfy the red-black invarinats: there is the same number of black nodes on any path from the root to each leaf; and there are never two consecutive red nodes.
We'll address the properties in that order. But first, we need to learn a little more about proof automation, because the proofs of those properties would be too hard to manage without automation.

Proof Automation for Case-Analysis Proofs.

Several of the proofs for red-black trees require a big case analysis over all the clauses of the balance function. These proofs are very tedious to do by hand, but are easy to automate. Let's get some practice by automating a proof that insertion produces a non-empty tree.
Lemma T_neq_E:
  c l k v r, T c l k v rE.
  unfold not. intros. discriminate.
Qed.

Lemma ins_not_E: x vx s, ins x vx sE.
Proof.
intros. destruct s; simpl.
apply T_neq_E.
remember (ins x vx s1) as a1.
unfold balance.
Let's destruct on the topmost case, ltb x k. We can use destruct instead of bdestruct because we don't need to remember whether x<k or xk.
destruct (ltb x k).
The topmost test is match c with..., so just destruct c.
destruct c.
apply T_neq_E.
The topmost test is match a1 with..., so just destruct a1.
destruct a1.
The topmost test is match s2 with..., so just destruct s2.
destruct s2.
unfold not. intros. discriminate.
How long will this go on? A long time! But it will terminate. Just keep typing. Better yet, let's automate.
The following tactic applies whenever the current goal looks like match ?c with Red _ | Black _ end _ and what it does in that case is, destruct c.
match goal with
| ⊢ match ?c with Red_ | Black_ end_destruct c
end.
The following tactic applies whenever the current goal looks like, match ?s with E _ | T _ _ _ _ _ _ end _ and what it does in that case is, destruct s.
match goal with
    | ⊢ match ?s with E_ | T _ _ _ _ __ end_destruct s
end.
Let's apply that tactic again, and then try it on the subgoals, recursively. Recall that the repeat tactical keeps trying the same tactic on subgoals.
repeat match goal with
    | ⊢ match ?s with E_ | T _ _ _ _ __ end_destruct s
end.

match goal with
  | ⊢ T _ _ _ _ _Eapply T_neq_E
end.
Using what we've learned, let's start the proof all over again.
Abort.

Lemma ins_not_E: x vx s, ins x vx sE.
Proof.
intros. destruct s; simpl.
apply T_neq_E.
remember (ins x vx s1) as a1.
unfold balance.
This is the beginning of the big case analysis. This time, let's combine several tactics together:
repeat match goal with
  | ⊢ (if ?x then _ else _) ≠ _destruct x
  | ⊢ match ?c with Red_ | Black_ end_destruct c
  | ⊢ match ?s with E_ | T _ _ _ _ __ end_destruct s
end.
What we have left is 117 cases, every one of which can be proved the same way:
apply T_neq_E.
apply T_neq_E.
apply T_neq_E.
apply T_neq_E.
apply T_neq_E.
apply T_neq_E.
(* Only 111 cases to go... *)
Abort.
Let's start over again, adding that apply to the case analysis.
Lemma ins_not_E: x vx s, ins x vx sE.
Proof.
intros. destruct s; simpl.
apply T_neq_E.
remember (ins x vx s1) as a1.
unfold balance.

repeat match goal with
  | ⊢ (if ?x then _ else _) ≠ _destruct x
  | ⊢ match ?c with Red_ | Black_ end_destruct c
  | ⊢ match ?s with E_ | T _ _ _ _ __ end_destruct s
  | ⊢ T _ _ _ _ _Eapply T_neq_E
 end.
Qed.
Success! We'll use that same technique of repeatedly matching against the goal in a few more proofs, below.

Verifying Preservation of the Representation Invariant

The SearchTree property for red-black trees is exactly the same as for (unbalanced) search trees; we just ignore the color c of each node.
Inductive SearchTree' : ZtreeZProp :=
| ST_E : lo hi, lohiSearchTree' lo E hi
| ST_T : lo c l k v r hi,
    SearchTree' lo l (int2Z k) →
    SearchTree' (int2Z k + 1) r hi
    SearchTree' lo (T c l k v r) hi.

Inductive SearchTree: treeProp :=
| ST_intro: t lo hi, SearchTree' lo t hiSearchTree t.
Now we want to prove that the red-black operations preserve the representation invariant. Since lookup doesn't return a tree, all we need to worry about is empty_tree and insert.

Empty and SearchTree

The empty tree is easy:
Theorem empty_SearchTree : SearchTree empty_tree.
Proof.
  unfold empty_tree. apply ST_intro with (lo := 0) (hi := 0).
  constructor. omega.
Qed.
Verifying that insert preserves SearchTree will take more work.

Balance and SearchTree

Since the balance function is an important part of insert, let's start by proving that balance produces a search tree whose bounds are related to the bounds on its inputs:
Lemma balance_SearchTree:
 c s1 k kv s2 lo hi,
   SearchTree' lo s1 (int2Z k) →
   SearchTree' (int2Z k + 1) s2 hi
   SearchTree' lo (balance c s1 k kv s2) hi.
Proof.
intros.
unfold balance.
Use proof automation for this case analysis.
repeat match goal with
  | ⊢ SearchTree' _ (match ?c with Red_ | Black_ end) _destruct c
  | ⊢ SearchTree' _ (match ?s with E_ | T _ _ _ _ __ end) _destruct s
  end.
58 cases to consider!
* constructor; auto.
* constructor; auto.
* constructor; auto.
(* There's a pattern emerging, but, let's keep going for now. *)
* constructor; auto.
  constructor; auto. constructor; auto.
To prove this one, we have to do inversion on the proof goals above the line.
  inv H. inv H0. inv H8. inv H9.
  auto.
  constructor; auto.
  inv H. inv H0. inv H8. inv H9. auto.
  inv H. inv H0. inv H8. inv H9. auto.
There's a pattern here. Whenever we have a hypothesis above the line that looks like,
  • H: SearchTree' _ E _ - H: SearchTree' _ (T _ ) _
we should invert it. Let's build that idea into our proof automation.
Abort.

Lemma balance_SearchTree:
 c s1 k kv s2 lo hi,
   SearchTree' lo s1 (int2Z k) →
   SearchTree' (int2Z k + 1) s2 hi
   SearchTree' lo (balance c s1 k kv s2) hi.
Proof.
intros.
unfold balance.
Use proof automation for this case analysis.
repeat match goal with
  | ⊢ SearchTree' _ (match ?c with Red_ | Black_ end) _
             destruct c
  | ⊢ SearchTree' _ (match ?s with E_ | T _ _ _ _ __ end) _
             destruct s
  | H: SearchTree' _ E __inv H
  | H: SearchTree' _ (T _ _ _ _ _) __inv H
  end.
58 cases to consider!
* constructor; auto.
* constructor; auto. constructor; auto. constructor; auto.
* constructor; auto. constructor; auto. constructor; auto.
  constructor; auto. constructor; auto.
* constructor; auto. constructor; auto. constructor; auto.
  constructor; auto. constructor; auto.
* constructor; auto. constructor; auto. constructor; auto.
  constructor; auto. constructor; auto.
Do we see a pattern here? We can add that to our automation!
Abort.

Lemma balance_SearchTree:
 c s1 k kv s2 lo hi,
   SearchTree' lo s1 (int2Z k) →
   SearchTree' (int2Z k + 1) s2 hi
   SearchTree' lo (balance c s1 k kv s2) hi.
Proof.
intros.
unfold balance.
Use proof automation for this case analysis.
repeat match goal with
  | ⊢ SearchTree' _ (match ?c with Red_ | Black_ end) _
              destruct c
  | ⊢ SearchTree' _ (match ?s with E_ | T _ _ _ _ __ end) _
              destruct s
  | H: SearchTree' _ E __inv H
  | H: SearchTree' _ (T _ _ _ _ _) __inv H
  end;
 repeat (constructor; auto).
Qed.

Exercise: 2 stars, standard (ins_SearchTree)

You won't need proof automation of the kind we just used for balance_SearchTree to prove this one. You will need to apply balance_SearchTree twice. If you solved insert_SearchTree in SearchTree you could copy and adapt that proof.
Lemma ins_SearchTree: x vx s lo hi,
    loint2Z x
    int2Z x < hi
    SearchTree' lo s hi
    SearchTree' lo (ins x vx s) hi.
Proof.
(* FILL IN HERE *) Admitted.

Insert and SearchTree

Our goal is to prove that insert preserves the property of being a search tree. We'll eventually get there with the lemma insert_SearchTree below, but there are several helper lemmas to prove first.
Along the way you will likely need to use some library theorems about Z. You might want to briefly peruse the output of the following searches to familiarize yourself with what is available.
Hint: you don't need induction unless we have explicitly indicated it.
Search Z.le.
Search Z.min.
Search Z.max.

Exercise: 2 stars, standard (ins_E_SearchTree)

Lemma ins_E_SearchTree : x vx,
    SearchTree (ins x vx E).
Proof. (* FILL IN HERE *) Admitted.

Exercise: 4 stars, standard (ins_T_SearchTree)


Lemma expand_range_SearchTree':
  s lo hi,
    SearchTree' lo s hi
    lo' hi',
      lo'lohihi'
      SearchTree' lo' s hi'.
Proof. (* FILL IN HERE *) Admitted.

Lemma ins_T_SearchTree_aux : x vx c l k kv r lo hi,
  SearchTree' lo l (int2Z k)
  → SearchTree' (int2Z k + 1) r hi
  → SearchTree' (Z.min lo (int2Z x))
                (ins x vx (T c l k kv r))
                (Z.max hi (int2Z x + 1)).
Proof.
Hint: This is the hardest lemma. You'll need two applications of expand_range_SearchTree', and a few lemmas about Z.
(* FILL IN HERE *) Admitted.

Lemma ins_T_SearchTree : x vx c l k kv r,
    SearchTree (T c l k kv r) → SearchTree (ins x vx (T c l k kv r)).
Proof.
Hint: the only lemma you need is ins_T_SearchTree_aux.
(* FILL IN HERE *) Admitted.

Exercise: 3 stars, standard (insert_preserves_SearchTree)


Lemma ins_makeBlack_SearchTree : s,
  SearchTree sSearchTree (makeBlack s).
Proof. (* FILL IN HERE *) Admitted.

Theorem insert_SearchTree: x vx s,
    SearchTree sSearchTree (insert x vx s).
Proof.
Hint: use ins_makeBlack_SearchTree, ins_E_SearchTree, and ins_T_SearchTree.
(* FILL IN HERE *) Admitted.
That concludes our verification that the red-black implementation preserves the representation invariant. The main theorems we proved were the following:
Check empty_SearchTree.
Check insert_SearchTree.

Verifying Correctness w.r.t. the Abstraction Relation

Now we turn our attention to the abstraction relation. As we did in SearchTree and Extract, we will prove a relationship between red-black trees and (functional) maps.
Import IntMaps.
The abstraction relation Abs is the same as what we previously used, extended to include the color of nodes.
Definition combine {A} (pivot: Z) (m1 m2: total_map A) : total_map A :=
  fun xif Z.ltb x pivot then m1 x else m2 x.

Inductive Abs : treetotal_map VProp :=
| Abs_E : Abs E (t_empty default)
| Abs_T : a b c l k vk r,
      Abs l a
      Abs r b
      Abs (T c l k vk r) (t_update (combine (int2Z k) a b) (int2Z k) vk).

Empty and Abs

The empty tree is related to the empty map.

Exercise: 1 star, standard (empty_tree_relate)

Theorem empty_tree_relate :
  Abs empty_tree (t_empty default).
Proof. (* FILL IN HERE *) Admitted.

Lookup and Abs

If tree t abstracts to map m, then looking up a key in t yields the same value as looking up that key in m.

Exercise: 1 star, standard (lookup_relate)

Copy and adapt your proof from Extract.
Theorem lookup_relate : k t m,
    Abs t mlookup k t = m (int2Z k).
Proof.
(* FILL IN HERE *) Admitted.

Balance and Abs

The proof that insert is correct will take considerably more work because of the balance operation. In exercise balance_relate, soon to come, we'll prove that balance preserves the abstraction of a search tree. But first, here are two quick lemmas that will be helpful in our proof about balance.
Lemma Abs_helper:
  m' t m, Abs t m'm' = mAbs t m.
Proof.
   intros. subst. auto.
Qed.

Lemma SearchTree'_le:
  lo t hi, SearchTree' lo t hilohi.
Proof.
  induction 1; omega.
Qed.
We'll also make use of this Ltac tactic in balance_relate. The purpose of this tactic is to prove the equivalence of maps.
Ltac contents_equivalent_prover :=
  extensionality x; unfold t_update, combine, t_empty;
  repeat match goal with
   | ⊢ context [if ?A then _ else _] ⇒ bdestruct A
  end;
  auto;
  omega.
Here's an example of how that tactic could be used. You might want to stop and convince yourself that you understand why the tactic succeeds in proving the example.
Section ContentsExample.
Open Scope string_scope.
Example maps_equiv_ex :
  t_update (t_update (t_empty "") 2 "two") 1 "one"
=
  t_update (t_update (t_empty "") 1 "one") 2 "two".
Proof.
  contents_equivalent_prover.
Qed.
End ContentsExample.

Exercise: 4 stars, standard (balance_relate)

Now it's time to prove that balance preserves the abstraction of a search tree. We'll make heavy use of proof automation, similar to ins_not_E and balance_SearchTree above.
Theorem balance_relate:
  c l k vk r m,
    SearchTree (T c l k vk r) →
    Abs (T c l k vk r) m
    Abs (balance c l k vk r) m.

Proof.
  intros. inv H. unfold balance.
This is the main "loop" of the proof.
  repeat
    match goal with
    | _idtac (* this is just a "no op" to get us started *)
    end.
At this point you should have just 1 subgoal. Now you're going to augment that main loop by adding the following clauses, one at a time, to it.
  • 1. The subgoal has the form
    Abs (match c with Red ... | Black ...) ....
    We need to destruct c to make progress. So, replace the "no op" clause provided in the main loop with the following clause:
    | Abs (match ?c with Red _ | Black _ end) _ destruct c
    Step forward through the main loop; you should have 2 subgoals. If not, stop now and figure out what has gone wrong.
  • 2. The first subgoal is provable from the assumptions. So, add a new clause to the main loop below the clause you just added:
    | _ assumption
    Step forward through the main loop; you should be back down to 1 subgoal.
  • 3. The subgoal has the form
    Abs (match ... with E ... | T ... ...) ...
    Much like in step (1) above, we need to destruct the tree being matched to make progress. So, add a new clause to the main loop to do that. You should end up with 21 subgoals.
  • 4. The first subgoal has the form Abs (T ...) m. That is, we want to show that the tree abstracts to map m. Look at the hypotheses: one of them (call it H0) shows that a similar tree abstracts to the same m. To make progress, we can extract the evidence for why that similar tree abstracts to m. So, add a new clause that inverts the hypothesis using the inv tactic.
    If it seems like Coq diverges into an infinite loop, double check the pattern matching you wrote for the hypothesis. Make sure you're matching against something containing the T constructor.
    You still have 21 subgoals, but instead of being about arbitrary maps m, they should now contain information about the contents of those maps, e.g., t_update (combine ...) ....
  • 5. There is yet more information to be gained about the maps. Look at the hypotheses for the current subgoal: one of them should be of the form Abs E .... That is, it tells us something about a map being the abstraction of the empty tree. Empty trees were not covered by the clause we just added in step (4). So, add another clause to invert such hypotheses about the abstraction of empty trees. You will still have 21 subgoals, but now none of the hypotheses about Abs will contain information about the structure of the tree; that information has all been extracted.
  • 6. At this point every subgoal has the form Abs (T ...) (t_update ...). We'd like to apply the Abs_T constructor to make progress. But the map (t_update ...) does not have the right form. We therefore need to replace (t_update ...) with another map that has the right form. That's why we created the Abs_helper lemma above.
    Try eapply Abs_helper on the first subgoal. (Don't worry about the main loop yet.) You'll get two subgoals. The first you can prove with
             apply Abs_T. apply Abs_T.
             apply Abs_E. apply Abs_E.
             apply Abs_T. eassumption. eassumption.
           
    Step through that, one command at a time, to understand what it's doing. By the end, Coq has determined what the map needs to be in order for Abs_T to be applicable. The second subgoal which remains requires us to prove that map to be equivalent to the original map.
    Now undo those seven commands, and do this instead:
             repeat econstructoreassumption.
           
    That solves the first subgoal in exactly the same way.
    What we'd like to do now is use apply this technique to all of the 21 subgoals. Do that by adding this clause to your main loop:
             | ⊢ _ ⇒ eapply Abs_helper; [repeat econstructoreassumption | ]
    The tactical t1; [t2 | ] first applies t1 to a goal, then applies t2 to the first resulting subgoal, leaving the second resulting subgoal unchanged.
    You should still have 21 subgoals, all of which should have the form t_update ... = t_update .... So at this point, we're done with the abstraction relation, and all that's left is to prove the equality of some maps.
  • 7. The whole reason we created the contents_equivalent_prover tactic above was to prove the equality of maps. So add this clause to your main loop:
             | ⊢ _ = _ ⇒ contents_equivalent_prover
    Unfortunately, you'll notice that it doesn't solve any of the subgoals. The problem is that we still haven't extracted enough information from the hypotheses. The reason those maps are all equal has to do with inequality relationships amongst the keys being used in the maps. Those inequalities are part of the SearchTree' property. So, we need to invert all the hypotheses about SearchTree'.
    Add two clauses to the main loop to use inv to invert any hypothesis of the form SearchTree' ... (T ...) ... or SearchTree' ... E .... As with Abs above, if you encounter an infinite loop, make sure you are pattern matching correctly. You should be left with 15 subgoals.
  • 8. The final piece of information to extract for map equalities has to do with an additional property of SearchTree': whenever SearchTree' lo _ hi holds, it also holds that lo hi. So, add a clause to the main loop to apply SearchTree'_le in any such hypothesis when the subgoal has the form _ = _ .
Qed! That finishes the proof. You might notice that Coq takes awhile to process the Qed. command, because the resulting proof is quite large. Try printing it out with Print balance_relate. if you want to see for yourself. (But please don't leave that Print command in any homework you might submit.)
(* FILL IN HERE *) Admitted.

Insert and Abs

Now that we've taken care of balance, let's get back to insert.

Exercise: 2 stars, standard (ins_left_and_right)

The next two lemmas, ins_left and ins_right, show how the SearchTree property is preserved by ins. We'll soon need that information about the representation invariant.
Lemma ins_left : c l j u r k v,
    SearchTree (T c l j u r)
    → int2Z k < int2Z j
    → SearchTree (T c (ins k v l) j u r).
Proof. (* FILL IN HERE *) Admitted.

Lemma ins_right : c l j u r k v,
    SearchTree (T c l j u r)
    → int2Z j < int2Z k
    → SearchTree (T c l j u (ins k v r)).
Proof. (* FILL IN HERE *) Admitted.

Exercise: 3 stars, standard (ins_relate)

Now we prove the correctness of ins, which is the core of the insert operation.
Copy and adapt your proof of insert_relate from Extract. You'll use balance_relate twice, and ins_left and ins_right once each. Other than those, you shouldn't need any other theorems from this file.
Theorem ins_relate:
 k v t m,
    SearchTree t
    Abs t m
    Abs (ins k v t) (t_update m (int2Z k) v).
Proof.
(* FILL IN HERE *) Admitted.
The rest of the proof that insert is correct is straightforward, given ins_relate, so we'll go ahead and finish it rather than make it an exercise.
Lemma makeBlack_relate:
 t m,
    Abs t m
    Abs (makeBlack t) m.
Proof.
intros.
destruct t; simpl; auto.
inv H; constructor; auto.
Qed.

Theorem insert_relate:
 k v t m,
    SearchTree t
    Abs t m
    Abs (insert k v t) (t_update m (int2Z k) v).
Proof.
intros.
unfold insert.
apply makeBlack_relate.
apply ins_relate; auto.
Qed.
OK, we're almost done! We have proved all these main theorems:
Check empty_tree_relate.
Check lookup_relate.
Check insert_relate.
Together these imply that this implementation of red-black trees is correct, in that it respects the abstraction relation.

Optional: Verification of Elements

Exercise: 4 stars, standard, optional (elements)

Prove the correctness of the elements function. Because elements does not pay attention to colors, and does not rebalance the tree, then its proof should be a simple copy-paste from SearchTree, with only minor edits.
Fixpoint elements' (s: tree) (base: list (key*V)) : list (key * V) :=
 match s with
 | Ebase
 | T _ a k v belements' a ((k,v) :: elements' b base)
 end.

Definition elements (s: tree) : list (key * V) := elements' s nil.

Definition elements_property (t: tree) (m: total_map V) : Prop :=
   k v,
     (In (k,v) (elements t) → m (int2Z k) = v) ∧
     (m (int2Z k) ≠ default
      k', int2Z k = int2Z k'In (k', m (int2Z k)) (elements t)).

Theorem elements_relate:
  t m,
  SearchTree t
  Abs t m
  elements_property t m.
Proof.
(* FILL IN HERE *) Admitted.

Verification of Balance

Red-black trees are supposed to be more efficient than ordinary search trees, because they stay balanced. In a perfectly balanced tree, any two leaves have exactly the same depth, or the difference in depth is at most 1. In an approximately balanced tree, no leaf is more than twice as deep as another leaf. Red-black trees are approximately balanced. Consequently, no node is more then 2 log N deep, where N is the number of nodes in the tree, and the run time for insert or lookup is therefore O (log N).
We can't prove anything directly about the run time, because we don't have a cost model for Coq functions. But we can prove that the trees stay approximately balanced; this tells us important information about their efficiency.

The Red-Black Invariants

There are two red-black invariants. The global invariant requires the black height of every path in the tree to be the same. That is, there must be n black nodes on every path from the root to a leaf, for some n. The local invariant forbids a red node from having a red child.
We'll define an inductive proposition is_redblack to capture the notion that a subtree satisfies the red-black invariants. The proposition will take three arguments: the subtree, the color of the subtree's parent, and the black height that the subtree should have. That color is used to enforce the local invariant, and the height is of course used to enforce the global invariant.
So, is_redblack t c n should hold when t satisfies the red-black invariants, assuming that c is the color of t's parent, and n is the black height that t is supposed to have. If t happens to have no parent (i.e., it is the entire tree), then it will be colored black by the insert algorithm, so it won't actually matter what color its (non-existent) parent might purportedly have: red or black, it can't violate the local invariant. If t is a leaf, then it likewise won't matter what its parent color is, and its black height must be zero.
Inductive is_redblack : treecolornatProp :=
 | IsRB_leaf: c, is_redblack E c 0
 | IsRB_r: l k v r n,
          is_redblack l Red n
          is_redblack r Red n
          is_redblack (T Red l k v r) Black n
 | IsRB_b: c l k v r n,
          is_redblack l Black n
          is_redblack r Black n
          is_redblack (T Black l k v r) c (S n).
To satisfy the red-black invariant, then, is to satisfy is_redblack for some parent color and black height.
Definition rb (t : tree) : Prop :=
  c n, is_redblack t c n.

Empty and RB

The empty tree satisfies rb.

Exercise: 1 star, standard (empty_rb)

Theorem empty_rb : rb empty_tree.
Proof. (* FILL IN HERE *) Admitted.

Insert and RB

Our final goal is to prove that insertion preserves the red-black invariants. That will take us several lemmas to complete.

Exercise: 1 star, standard (is_redblack_toblack)

Lemma is_redblack_toblack : s n,
    is_redblack s Red nis_redblack s Black n.
Proof.
(* FILL IN HERE *) Admitted.

Exercise: 2 stars, standard (is_redblack_makeblack)

Lemma is_redblack_makeblack : s c n,
    is_redblack s Black n
    → n, is_redblack (makeBlack s) c n.
Proof.
(* FILL IN HERE *) Admitted.
nearly_redblack expresses, "the tree is a red-black tree, except that it's nonempty and it is permitted to have two consecutive red nodes at the root."
Inductive nearly_redblack : treenatProp :=
| nrRB_r: tl k kv tr n,
         is_redblack tl Black n
         is_redblack tr Black n
         nearly_redblack (T Red tl k kv tr) n
| nrRB_b: tl k kv tr n,
         is_redblack tl Black n
         is_redblack tr Black n
         nearly_redblack (T Black tl k kv tr) (S n).

Exercise: 4 stars, standard (ins_is_redblack)

You will need proof automation, in a similar style to the proofs of ins_not_E and balance_relate. This is challenging; don't be afraid to skip over it and come back.
Lemma ins_is_redblack:
  x vx s n,
    (is_redblack s Black nnearly_redblack (ins x vx s) n) ∧
    (is_redblack s Red nis_redblack (ins x vx s) Black n).
Proof.
induction s; intro n; simpl; split;
  intros; inv H; repeat constructor; auto.
*
destruct (IHs1 n); clear IHs1.
destruct (IHs2 n); clear IHs2.
specialize (H0 H6).
specialize (H2 H7).
clear H H1.
unfold balance.

(* FILL IN HERE *) Admitted.
Corollary ins_red :
  t k v n,
    (is_redblack t Red nis_redblack (ins k v t) Black n).
Proof.
  intros. apply ins_is_redblack. assumption.
Qed.

Corollary ins_black :
  t k v n,
    (is_redblack t Black nnearly_redblack (ins k v t) n).
Proof.
  intros. apply ins_is_redblack. assumption.
Qed.

Exercise: 3 stars, standard (insert_is_redblack)

You will find ins_red, ins_black, and is_redblack_makeblack useful here.
Lemma insert_is_redblack: k v t c n,
    is_redblack t c n
    c' n',
      is_redblack (insert k v t) c' n'.
Proof.
(* FILL IN HERE *) Admitted.

Exercise: 2 stars, standard (insert_rb)

Theorem insert_rb: k v t,
    rb trb (insert k v t).
Proof. (* FILL IN HERE *) Admitted.
End TREES.

Extracting and Measuring Red-Black Trees


Extraction "redblack.ml" empty_tree insert lookup elements.
You can run this inside the OCaml top level by:
#use "redblack.ml";;
#use "test_searchtree.ml";;
On my machine, in the byte-code interpreter this prints,
Insert and lookup 1000000 random integers in 0.889 seconds.
Insert and lookup 20000 random integers in 0.016 seconds.
Insert and lookup 20000 consecutive integers in 0.015 seconds.
You can compile and run this with the OCaml native-code compiler by:
$ ocamlopt -c redblack.mli redblack.ml
$ ocamlopt redblack.cmx -open Redblack test_searchtree.ml -o test_redblack
$ ./test_redblack
On my machine this prints,
Insert and lookup 1000000 random integers in 0.436 seconds.
Insert and lookup 20000 random integers in 0. seconds.
Insert and lookup 20000 consecutive integers in 0. seconds.

Success!

The benchmark measurements above (and in Extract) demonstrate that:
  • On random insertions, red-black trees are slightly faster than ordinary BSTs (red-black 0.436 seconds, vs ordinary 0.468 seconds)
  • On consecutive insertions, red-black trees are much faster than ordinary BSTs (red-black 0. seconds, vs ordinary 0.374 seconds)
In particular, red-black trees are almost exactly as fast on the consecutive insertions (0.015 seconds) as on the random (0.016 seconds).