Lecture 21: Structural induction

Idea behind structural induction

Consider the definition x \in Σ^* ::= ε \mid xa. I will refer to x ::= ε as "rule 1" and x ::= xa as "rule 2". This definition says that there are two kinds of strings: empty strings (formed using rule 1), and strings of the form xa, where x is a smaller string (formed using rule 2); these are the only kinds of strings.

If we want to prove that a property P holds on all strings (i.e. ∀x \in Σ^*, P(x)), we can do it by giving a proof for strings formed using rule 1 (let's call it proof 1), and another proof for strings formed using rule 2 (let's call it proof 2). In the second proof, where the string has the form xa, we may assume that P(x) holds for the smaller string x.

Why can we make this assumption? Suppose we have some complicated string, like εabc, and we want to conclude P(εabc). We build the string εabc by snapping together smaller strings using rules 1 and 2; we can imagine building a proof of P(εabc) by snapping together smaller proofs using proofs 1 and 2.

To show that εabc is a string, we first use rule 1 to show that ε is a string, then rule 2 to show that εa is a string (this assumes that ε is a string, but we just argued it was), and then rule 2 again to show that εab is a string (using the fact that εa is a string), and finally use rule 2 a third time to show that εabc is a string.
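The construction above can be mirrored in code. The following is a minimal sketch, assuming strings are represented by two hypothetical Python classes, `Empty` for rule 1 and `Snoc` for rule 2; these names and the helper `build` are illustrative and do not come from the notes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:
    """Rule 1: the empty string ε."""


@dataclass(frozen=True)
class Snoc:
    """Rule 2: the string xa, built from a smaller string x and a character a."""
    x: "Str"
    a: str


Str = Union[Empty, Snoc]


def build(chars: str) -> Str:
    """Apply rule 1 once, then rule 2 once per character, left to right."""
    s: Str = Empty()          # rule 1: ε is a string
    for a in chars:
        s = Snoc(s, a)        # rule 2: if s is a string, so is sa
    return s


# εabc is built as rule 1, then rule 2 three times.
# This is the same value as Snoc(Snoc(Snoc(Empty(), "a"), "b"), "c").
abc = build("abc")
```

Every value of this type is reached by finitely many applications of the two constructors, which is the property structural induction relies on.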

Similarly, we can use proof 1 to show that P(ε) holds, then use proof 2 to show that P(εa) holds (this assumes that P(ε) holds, but we just argued it does), and then use proof 2 again to show that P(εab) holds (using the fact that P(εa) holds), and finally use proof 2 a third time to show that P(εabc) holds.
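To make the "snapping together" of proofs concrete, here is a small sketch that lists which proof is used at each step for a given string. The representation (`Empty`, `Snoc`) and the helpers `show` and `proof_steps` are assumptions made for illustration; the function only records the order of the steps, it does not prove anything.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Empty:
    """Rule 1: the empty string ε."""


@dataclass(frozen=True)
class Snoc:
    """Rule 2: the string xa, built from a smaller string x and a character a."""
    x: "Str"
    a: str


Str = Union[Empty, Snoc]


def show(s: Str) -> str:
    """Render a string the way the notes write it, e.g. εabc."""
    return "ε" if isinstance(s, Empty) else show(s.x) + s.a


def proof_steps(s: Str) -> List[str]:
    """List the steps needed to conclude P(s), smallest string first."""
    if isinstance(s, Empty):
        return ["proof 1 gives P(ε)"]
    # Rule 2: first obtain P(x), then apply proof 2 to get P(xa).
    return proof_steps(s.x) + [f"proof 2 gives P({show(s)}) from P({show(s.x)})"]


abc = Snoc(Snoc(Snoc(Empty(), "a"), "b"), "c")
for step in proof_steps(abc):
    print(step)
# proof 1 gives P(ε)
# proof 2 gives P(εa) from P(ε)
# proof 2 gives P(εab) from P(εa)
# proof 2 gives P(εabc) from P(εab)
```

The recursion in `proof_steps` has the same shape as the definition of the set itself: one clause per rule, with a recursive call exactly where the rule uses a smaller string.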

In general, any element of an inductively defined set is built up by applying the rules defining the set, so if you provide a proof for each rule, you have given a proof for every element. Before you can build a complex structure, you have to build the parts, so while building the proof that some property holds on a complex structure, you can assume that you have already proved it for the subparts.

Structural induction step by step

In general, if an inductive set X is defined by a set of rules (rule 1, rule 2, etc.), then we can prove ∀x \in X, P(x) by giving a separate proof of P(x) for x formed by each of the rules. In the cases where the rule recursively uses elements y_1, y_2, \dots of the set being defined, we can assume P(y_1), P(y_2), \dots.

Example structures:

Example proof

Recall Σ^* is defined by x \in Σ^* ::= ε \mid xa and len : Σ^* → \N is given by len(ε) ::= 0 and len(xa) ::= 1 + len(x).
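The definition of len translates almost line for line into a recursive function. This is a sketch under the assumption that strings are modelled by hypothetical classes `Empty` (rule 1) and `Snoc` (rule 2); the function is called `length` here only to avoid shadowing Python's built-in `len`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:
    """Rule 1: the empty string ε."""


@dataclass(frozen=True)
class Snoc:
    """Rule 2: the string xa, built from a smaller string x and a character a."""
    x: "Str"
    a: str


Str = Union[Empty, Snoc]


def length(s: Str) -> int:
    """len from the notes: len(ε) = 0 and len(xa) = 1 + len(x)."""
    if isinstance(s, Empty):
        return 0                 # len(ε) = 0
    return 1 + length(s.x)       # len(xa) = 1 + len(x)


abc = Snoc(Snoc(Snoc(Empty(), "a"), "b"), "c")
print(length(abc))  # 3
```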

Claim: For all x \in Σ^*, len(x) \geq 0.

Proof: By induction on the structure of x. Let P(x) be the statement "len(x) \geq 0". We must prove P(ε), and P(xa) assuming P(x).

P(ε) case: we want to show len(ε) \geq 0. Well, by definition, len(ε) = 0 \geq 0.

P(xa) case: assume P(x). That is, len(x) \geq 0. We wish to show P(xa), i.e. that len(xa) \geq 0. Well, len(xa) = 1 + len(x) \geq 1 + 0 = 1 \geq 0.

Proofs on pairs

Often, we want to prove something about all pairs x and y, where x and y are both in an inductively defined set X. Pairs of elements of X are formed by pairs of rules of X, so one can give a proof for each pair of rules. For example, to prove ∀x,y \in Σ^*, len(cat(x,y)) = len(x) + len(y), you can give a proof for the case where x and y are both ε, a proof for the case when x = ε and y is of the form zc, a proof for the case when x = zc and y = ε, and a proof for the case where x = zc and y = wd.

What inductive assumptions can be made in these cases? You can inductively assume that P holds on any pair that is formed from a subpiece of x and a subpiece of y, and at least one of those subpieces needs to be smaller. For example, while proving P(zc,wd), you can assume P(z,wd), you can assume P(zc,w), and you can assume P(z,w). You can't assume P(zc,wd) (since that's what you're trying to prove). You can't assume P(c,d), because that doesn't even make sense: c and d are elements of Σ not Σ^*, and P is a property of pairs of strings, not pairs of characters. You can't assume P(εc, wd) because εc is not a subpiece of zc. You can't assume P(cat(z,w),w) because cat(z,w) is not a substructure of zc. You shouldn't assume P(w,z), although this can be justified using more advanced techniques.
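The rule for permitted assumptions can be sketched as a small enumeration. The code below assumes one reading of "subpiece": a string counts as a subpiece of itself, and every string it was built from by rule 2 is also a subpiece. The classes `Empty` and `Snoc` and the helpers `subpieces`, `allowed_assumptions` and `show` are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Empty:
    """Rule 1: the empty string ε."""


@dataclass(frozen=True)
class Snoc:
    """Rule 2: the string xa, built from a smaller string x and a character a."""
    x: "Str"
    a: str


Str = Union[Empty, Snoc]


def show(s: Str) -> str:
    """Render a string the way the notes write it, e.g. εac."""
    return "ε" if isinstance(s, Empty) else show(s.x) + s.a


def subpieces(s: Str) -> List[Str]:
    """s itself, followed by every string it was built from."""
    out = [s]
    while isinstance(s, Snoc):
        s = s.x
        out.append(s)
    return out


def allowed_assumptions(x: Str, y: Str) -> List[Tuple[Str, Str]]:
    """Pairs of subpieces of x and y, excluding the pair (x, y) itself."""
    return [(p, q) for p in subpieces(x) for q in subpieces(y) if (p, q) != (x, y)]


z, w = Snoc(Empty(), "a"), Snoc(Empty(), "b")
x, y = Snoc(z, "c"), Snoc(w, "d")      # x = zc and y = wd
allowed = allowed_assumptions(x, y)
for p, q in allowed:
    print(f"P({show(p)}, {show(q)})")
```

With x = zc and y = wd, the list contains (z, wd), (zc, w) and (z, w), and it omits (zc, wd), (w, z) and any pair involving εc, in line with the cases discussed above.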

Here is an example:

Claim: for all x and y in Σ^*, len(cat(x,y)) = len(x) + len(y).

Proof: Recall len(ε) ::= 0 and len(xa) ::= 1 + len(x). Recall also that cat(ε,ε) ::= ε, cat(ε,xa) ::= xa, cat(xa,ε) ::= xa and cat(xa,yb) ::= cat(xa,y)b.

We proceed by induction on the structure of x and y. Let P(x,y) be the statement len(cat(x,y)) = len(x) + len(y).

P(ε,ε) case: we want to show len(cat(ε,ε)) = len(ε) + len(ε). By definition, the left hand side is len(ε) = 0, and the right hand side is 0 + 0 = 0.

P(ε,xa) case: we want to show len(cat(ε,xa)) = len(ε) + len(xa). By definition, cat(ε,xa) = xa, so len(cat(ε,xa)) = len(xa). We also know len(ε) = 0, so the right hand side also simplifies to len(xa).

The P(xa,ε) case is symmetric to the P(ε,xa) case.
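For completeness, the P(xa,ε) case can be written out in the same style. This simply spells out the symmetric argument, using only cat(xa,ε) = xa and len(ε) = 0 from the definitions recalled at the start of the proof.

```latex
\begin{aligned}
len(cat(xa,ε)) &= len(xa) && \text{by definition of $cat$} \\
&= len(xa) + 0 \\
&= len(xa) + len(ε) && \text{by definition of $len$}
\end{aligned}
```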

In the P(xa,yb) case, we want to show that len(cat(xa,yb)) = len(xa) + len(yb). We may assume P(xa,y), i.e. that len(cat(xa,y)) = len(xa) + len(y). Using this, we have

\begin{aligned}
len(cat(xa,yb)) &= len(cat(xa,y)b) && \text{by definition of $cat$} \\
&= 1 + len(cat(xa,y)) && \text{by definition of $len$} \\
&= 1 + len(xa) + len(y) && \text{by inductive assumption} \\
&= len(xa) + (len(y) + 1) && \text{by rearranging} \\
&= len(xa) + len(yb) && \text{by definition of $len$}
\end{aligned}

This concludes the proof.
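The claim can also be spot-checked on small strings. The sketch below assumes the same hypothetical `Empty`/`Snoc` representation of strings and takes the recursive clause of cat to be cat(xa,yb) = cat(xa,y)b, as used in the last case of the proof. The names `length`, `cat` and `all_strings` are illustrative. A finite check like this is not a substitute for the induction; it only confirms that the definitions behave as the proof says on the sampled inputs.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Union


@dataclass(frozen=True)
class Empty:
    """Rule 1: the empty string ε."""


@dataclass(frozen=True)
class Snoc:
    """Rule 2: the string xa, built from a smaller string x and a character a."""
    x: "Str"
    a: str


Str = Union[Empty, Snoc]


def length(s: Str) -> int:
    """len(ε) = 0 and len(xa) = 1 + len(x)."""
    return 0 if isinstance(s, Empty) else 1 + length(s.x)


def cat(s: Str, t: Str) -> Str:
    """Concatenation, with one clause per pair of rules."""
    if isinstance(s, Empty) and isinstance(t, Empty):
        return Empty()                  # cat(ε, ε) = ε
    if isinstance(s, Empty):
        return t                        # cat(ε, xa) = xa
    if isinstance(t, Empty):
        return s                        # cat(xa, ε) = xa
    return Snoc(cat(s, t.x), t.a)       # cat(xa, yb) = cat(xa, y)b


def all_strings(alphabet: str, max_len: int) -> List[Str]:
    """Every string over `alphabet` with at most `max_len` characters."""
    layer: List[Str] = [Empty()]
    result = list(layer)
    for _ in range(max_len):
        layer = [Snoc(x, a) for x in layer for a in alphabet]
        result.extend(layer)
    return result


samples = all_strings("ab", 3)
# Check P(x, y) for every pair of sampled strings.
assert all(
    length(cat(x, y)) == length(x) + length(y)
    for x, y in product(samples, samples)
)
```

Note that `cat` recurses only on its second argument, which is why the proof needed the assumption P(xa,y) and none of the other permitted assumptions.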

Note that the structure of this proof very closely follows the structure of the function we were proving something about. In this case, we were proving a property of the cat function; cat(xa,yb) was defined in terms of cat(xa,y), and in the proof of P(xa,yb), we had to use the assumption P(xa,y). This is a common occurrence in proofs by structural induction.