A Calculational Proof of Andrew's Challenge
David Gries
Computer Science
Cornell University
Ithaca, NY 14853
August 1966
At the Marktoberdorf summer school in August 1996, Larry Paulson lectured on his mechanical theorem prover, Isabelle; Natarajan Shankar lectured on his mechanical theorem prover, PVS; and I lectured on calculational logic. Both Paulson and Shankar suggested that I try the calculational approach on Andrew's challenge, which is one of several difficult theorems used to benchmark mechanical theorem provers. Andrew's challenge is to prove the following theorem.
((E x A y |: p.x == p.y) ==
((E x |: q.x) == (A y |: p.y))) ==
((E x A y |: q.x == q.y) ==
((E x |: p.x) == (A y |: q.y)))
Remark. We use the notation (A x |: P) instead of the more usual E y . P, for reasons explained in [1]. (E y |: P)) may be abbreviated as (A x E y |: P). Also, we use == for equality over the bools and = for equality over any type (including the bools). Our precedences are, beginning with the tightest, ~ (not), =, $or and $and, => (implication) and <= (consequence), == . Finally, in order to eliminate parentheses, we write p.x instead of p(x) for application of function p to variable x. End of remark.
In proving Andrew's challenge using the calculational approach, I use theorems given in the text \cite[1] (or in its as-yet-unpublished second edition). The Appendix contains theorems used here that may be unfamiliar to the reader.
Now, == is both associative and symmetric, so we can rewrite Andrew's challenge as
P == Qwhere P and Q are defined by the following.
P: (E x A y |: p.x == p.y) == (E x |: p.x) == (A y |: p.y)where it is assumed that this formula is closed (so p.x and q.x contain no free variables other than x).
Q: (E x A y |: q.x == q.y) == (E x |: q.x) == (A y |: q.y)
This form gives the impression that perhaps P is valid (or invalid), regardless of p. If this is the case, then Q is also valid (or invalid). Hence, we try to prove P.
We don't have many theorems that deal with == as they appear in P, so we try to prove P by mutual implication, proving instead
(2) ((E x A y |: p.x == p.y) == (E x |: p.x)) \;\From\; ( A y |: p.y)
(3) ((E x A y |: p.x == p.y) == (E x |: p.x)) \;\Imp\; ( A y |: p.y)
Proof of (2) Assume (A y |: p.y) (E x A y |: p.x == p.y) == (E x |: p.x) = < Assumption, instantiated with y:=x and with y:=y, so p.x == true and p.y == true > (E x A y |: true == true) == (E x |: true) = < Identity of == (5); (A y |: true) == true > (E x |: true) == (E x |: true) ---Reflexivity of == (6) Q.E.D
Proof of (3) (3)} = < Contrapositive, X => Y == ~ Y => ~ X} ~ (A y |: p.y) => ~ ((E xA y |: p.x == p.y) == (E x |: p.x)) = < De Morgan (12) on antecedent; ~(X==Y) == X == ~Y and De Morgan (11) on the consequent > (E y |: ~p.y) => ((E x< strong>A y |: p.x == p.y) == (A x |: ~p.x))
By Metatheorem Witness (13), the last formula is a theorem iff the following one is.
~p.z => ((E x A y |: p.x==p.y) == (A x |: ~p.x))We calculate:
Assume ~ p.z, so also p.z == \False (E x A y |: p.x==p.y) = < Lemma (4) ---heading to change p.x to p.z> (E x |: (A y |: p.x==p.y) and p.x==p.z) =(E x |: (A y |: p.z==p.y) and p.x==p.z) = < Lemma (4)> (E x A y |: p.z==p.y) = < Assumption p.z==false; false == X == ~X> (E x A y |: ~p.y) = < Provided x doesn't occur free in X, (E x |: X) == X> (A y |: ~p.y) Q.E.D.
(4) Lemma. (A x |: S.x) == (A x |: S.x) and S.t Proof (A x | true: S.x) = < Zero of or (7)> (A x | true or x=t: S.x) = < Range split (10)> (A x | true: S.x) and (A x | x=t: S.x) = < One-point rule (9)> (A x | true: S.x) and S.t Q.E.D.
(5) Identity of ==: true == Q == Q (6) Reflexivity of ==: P == P (7) Zero of or: P or true == true (8) Substitution: X=Y and E[V:=X] == X=Y and E[V:=Y] (9) One-point rule: Provided x does not occur free in E, (A x | x=E: P) = P[x:=E] (10) Range split: (A x | R or S: P) = (A x | R: P) and (A x | S: P) (11) De Morgan: (E x | R: P) == (A x | R: ~P) (12) De Morgan: (A x | R: P) == (E x | R: ~P) (13) Metatheorem Witness: Suppose z does not occur free in P, Q, and R. Then (E x | R: P) => Q is a theorem iff (R and P)[x:=z] => Q is a theorem