Lecture 24: Pumping lemma

Reading: Pass and Tseng, Limits of Automata, MCS 15.8 The pigeonhole principle
last semester's notes
Proof that $\{0^n1^n \mid n \in \mathbb{N}\}$ is unrecognizable
Pumping lemma
Review exercises:
- use the pumping lemma to prove that the set of strings of balanced parentheses is not recognizable
- prove the pumping lemma

A non-recognizable set

Let $L = \{0^k1^k \mid k \in \mathbb{N}\} = \{ε, 01, 0011, 000111, \dots\}$ .

Claim: $L$ is not recognizable.

Proof: by contradiction. Suppose $L$ were recognizable. Then there is some DFA $M$ with $L = L(M)$. Let $n$ be the number of states of $M$, and let $x = 0^n1^n$. Clearly $x \in L$, so $M$ must accept $x$.

Let's consider what happens while $M$ is processing $x$. While processing the first $n$ characters, $M$ must pass through $n+1$ states $q_0, q_1, \dots, q_n$. Since there are only $n$ states to choose from, the pigeonhole principle says two of these states must be the same: there is a loop, with $q_i = q_j$ for some $i \lt j \leq n$.

Let $u$ be the part of $x$ that takes $M$ from $q_0$ to $q_i$, let $v$ be the part that takes $M$ from $q_i$ to $q_j$, and let $w$ be the rest of $x$, which takes $M$ from $q_j$ to the state it is in after reading all of $x$ (an accepting state, since $M$ accepts $x$). Note that since the loop happens within the first $n$ characters, $u$ and $v$ can consist only of 0's, and since $i \lt j$, $v$ contains at least one 0.

Now consider what happens if we run $M$ on the string $uvvw$. $M$ will transition to $q_i$ on $u$, and then go around the loop twice, ending up back at $q_j = q_i$. It will then process $w$, which takes it from $q_j$ to the same accepting state as before. Therefore $uvvw \in L(M)$.

However, since $v$ consists of one or more 0's, $uvvw$ has more 0's than 1's, so $uvvw \notin L$. This contradicts the assumption that $L(M) = L$, completing the proof.
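The loop argument above can be traced on a concrete machine. The following is a minimal sketch, assuming a hypothetical 3-state DFA for $0^*1^*$ (the names `DELTA`, `START`, `ACCEPTING`, `trace`, and `first_loop` are illustrative and not part of the notes). This machine accepts $0^n1^n$, and the sketch shows that it is then forced to accept $uvvw$ as well, even though $uvvw \notin L$.

```python
# Hypothetical 3-state DFA for 0*1* (an assumption for illustration).
# State 2 is a dead state reached when a 0 follows a 1.
DELTA = {
    (0, "0"): 0, (0, "1"): 1,
    (1, "0"): 2, (1, "1"): 1,
    (2, "0"): 2, (2, "1"): 2,
}
START, ACCEPTING = 0, {0, 1}


def trace(x):
    """Return the states q_0, q_1, ..., q_len(x) visited while reading x."""
    states = [START]
    for ch in x:
        states.append(DELTA[(states[-1], ch)])
    return states


def accepts(x):
    """True if the DFA ends in an accepting state after reading x."""
    return trace(x)[-1] in ACCEPTING


def first_loop(states):
    """Return (i, j) with i < j and states[i] == states[j], with j minimal."""
    seen = {}
    for j, q in enumerate(states):
        if q in seen:
            return seen[q], j
        seen[q] = j
    raise ValueError("no repeated state")


n = 3                                  # number of states of the DFA
x = "0" * n + "1" * n
i, j = first_loop(trace(x)[: n + 1])   # n + 1 states, at most n distinct
u, v, w = x[:i], x[i:j], x[j:]
pumped = u + v + v + w

print(accepts(x), accepts(pumped))
print(pumped.count("0"), pumped.count("1"))
```

Any DFA that accepts $0^n1^n$ could be substituted for `DELTA`; the point is only that the repeated state makes the extra copy of $v$ invisible to the machine.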

The pumping lemma

This same argument can be applied to many languages, and can be generalized into the so-called "pumping lemma":

Claim (pumping lemma): If $L$ is a DFA-recognizable language, then there exists some $n$ (often called the pumping length), such that for all $x \in L$ with $len(x) \geq n$, there exist strings $u$, $v$, and $w$ such that

$x = uvw$ ,
$len(uv) \leq n$ ,
$len(v) > 0$ , and
for all $k \geq 0$ , $uv^kw \in L$ .
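
Since the lemma is normally used to show that a language is not DFA-recognizable, it may help to write out its contrapositive, with the quantifiers flipped. This is just a restatement of the claim above, not an additional result: if

$$
\forall n \;\; \exists x \in L \text{ with } len(x) \geq n \;\; \forall u, v, w \text{ with } x = uvw,\ len(uv) \leq n,\ len(v) > 0 \;\; \exists k \geq 0 : \; uv^kw \notin L,
$$

then $L$ is not DFA-recognizable. In a proof, this means we do not get to choose $n$ or the split $u$, $v$, $w$; we only get to choose $x$ (depending on $n$) and $k$ (depending on the split).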

The proof is just like the proof above; we give it below.

This lemma is used to prove that languages are not DFA-recognizable. For example, we can use it to rewrite the proof above:

Claim: $L = \{0^n1^n \mid n \in \mathbb{N}\}$ is not DFA-recognizable.

Proof: by contradiction, assume that $L$ is DFA-recognizable. Then there exists some $n$ as in the pumping lemma. Let $x = 0^n1^n$. Clearly $x \in L$ and $len(x) \geq n$, so we can write $x$ as $uvw$ as in the pumping lemma. Since $len(uv) \leq n$, $v$ can only consist of 0's (the first $n$ characters of $x$ are 0's). It must have at least one 0, since $len(v) > 0$. The pumping lemma tells us that $uv^2w \in L$, but this is a contradiction, because $uv^2w$ has more 0's than 1's. Therefore $L$ is not DFA-recognizable.
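
The key step of this proof, that every allowed split of $0^n1^n$ breaks when pumped, can be sanity-checked by brute force for small $n$. The sketch below is only an illustration (the helper names `in_L`, `splits`, and `every_split_fails` are made up here), and checking finitely many values of $n$ is of course not a substitute for the proof.

```python
def in_L(s):
    """Membership test for L = {0^k 1^k | k >= 0}."""
    k = len(s) // 2
    return s == "0" * k + "1" * k


def splits(x, n):
    """Yield every (u, v, w) with x = uvw, len(uv) <= n, and len(v) > 0."""
    for j in range(1, min(n, len(x)) + 1):   # j = len(uv)
        for i in range(j):                    # i = len(u), so len(v) = j - i > 0
            yield x[:i], x[i:j], x[j:]


def every_split_fails(n, k=2):
    """True if u v^k w is outside L for every allowed split of x = 0^n 1^n."""
    x = "0" * n + "1" * n
    return all(not in_L(u + v * k + w) for u, v, w in splits(x, n))


print(all(every_split_fails(n) for n in range(1, 9)))
```

Pumping down with `k=0` works equally well here, since removing 0's also unbalances the string.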

Here is another example:

Claim: Let $L$ be the set of strings of digits and the symbols $+$ and $=$ that represent equations that are true. For example, "$1+1=2$" is in $L$, while "$3+5=9$" is not. Then $L$ is not DFA-recognizable.

Proof: by contradiction, assume that $L$ is DFA-recognizable. Then there exists some $n$ as in the pumping lemma. Let $x = "1^n+0=1^n"$, where $1^n$ denotes the digit 1 repeated $n$ times. Clearly $x \in L$ and $len(x) \geq n$, so we can write $x$ as $uvw$ as in the pumping lemma. Since $len(uv) \leq n$, $v$ can only consist of 1's (the first $n$ characters of $x$ are 1's). It must have at least one 1, since $len(v) > 0$. The pumping lemma tells us that $uv^0w = uw \in L$, but this is a contradiction, because $uw$ has a smaller number on the left-hand side of the equation than on the right-hand side, and therefore is not in $L$. Thus, $L$ is not DFA-recognizable.
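
The same brute-force check can be run for this language. The sketch below assumes a simplified reading of $L$ in which a string must have exactly the shape `a+b=c` with non-empty digit strings `a`, `b`, `c`; the notes allow more general equations, so `is_true_equation` is a stand-in that happens to agree with $L$ on the strings used in the proof. The helper names are hypothetical.

```python
def is_number(t):
    """True if t is a non-empty string of ASCII digits."""
    return t.isascii() and t.isdigit()


def is_true_equation(s):
    """Assumed membership test: s has the form a+b=c and a + b equals c."""
    left, eq, right = s.partition("=")
    a, plus, b = left.partition("+")
    if not (eq and plus and is_number(a) and is_number(b) and is_number(right)):
        return False
    return int(a) + int(b) == int(right)


def splits(x, n):
    """Yield every (u, v, w) with x = uvw, len(uv) <= n, and len(v) > 0."""
    for j in range(1, min(n, len(x)) + 1):
        for i in range(j):
            yield x[:i], x[i:j], x[j:]


def pumping_down_fails(n):
    """True if uw is not a true equation for every allowed split of 1^n+0=1^n."""
    x = "1" * n + "+0=" + "1" * n
    return all(not is_true_equation(u + w) for u, v, w in splits(x, n))


print(is_true_equation("1+1=2"), is_true_equation("3+5=9"))
print(all(pumping_down_fails(n) for n in range(1, 9)))
```

Note the split in which $v$ is all of $1^n$: then $uw$ begins with `+`, which this checker rejects as malformed, so that case is also outside the assumed language.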

Proof of the pumping lemma: This proof is almost the same as the special case given above. Assume $L$ is DFA-recognizable. Then there is some machine $M$ that recognizes $L$. Let $n$ be the number of states of $M$. Now, if $x$ is an arbitrary string in $L$ with length greater than or equal to $n$, then while processing the first $n$ characters, $M$ passes through $n+1$ states; since $M$ has only $n$ states, it must visit some state $q$ at least twice.

Let $u$ be the portion of $x$ that transitions $M$ from the start state to $q$ . Let $v$ be the portion of $x$ that transitions from $q$ back to $q$ , and let $w$ be the remainder of $x$ ; $w$ transitions $M$ from $q$ to some final state (since $x \in L$ , $\hat{δ}(q_0,uvw)$ must be a final state).

Clearly $x = uvw$ . $len(uv) \leq n$ since the loop must occur within the first $n$ characters of $x$ . $len(v) \gt 0$ because otherwise the loop is not a loop. Finally, while processing $uv^kw$ , $M$ transitions to $q$ on $u$ , then back to $q$ on each iteration of $v$ , and finally from $q$ to an accepting state on $w$ , and thus $M$ accepts $uv^kw$ . Therefore $uv^kw \in L(M) = L$ , completing the proof.
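
The proof is constructive: given the machine, it tells us how to find $u$, $v$, and $w$. As a rough sketch under stated assumptions, the code below applies that construction to a hypothetical 3-state DFA accepting binary strings that end in `01` (the machine and the names `decompose`, `run`, and `check_lemma` are illustrative choices, not taken from the notes) and checks the four conditions of the lemma on all short accepted strings.

```python
from itertools import product

# Hypothetical DFA (an assumption for illustration): accepts strings ending in "01".
# State 0: start / last char was 1 without a preceding 0; state 1: last char was 0;
# state 2: the last two chars were "01".
DELTA = {
    (0, "0"): 1, (0, "1"): 0,
    (1, "0"): 1, (1, "1"): 2,
    (2, "0"): 1, (2, "1"): 0,
}
START, ACCEPTING, N_STATES = 0, {2}, 3


def run(state, s):
    """Return the state reached from `state` after reading s."""
    for ch in s:
        state = DELTA[(state, ch)]
    return state


def decompose(x):
    """Split x into (u, v, w) around the first state that is visited twice."""
    seen = {START: 0}            # state -> number of characters read when first seen
    state = START
    for pos, ch in enumerate(x, start=1):
        state = DELTA[(state, ch)]
        if state in seen:
            i = seen[state]
            return x[:i], x[i:pos], x[pos:]
        seen[state] = pos
    raise ValueError("no state repeats while reading x")


def check_lemma(max_len, max_k=4):
    """Check the lemma's conditions for every accepted x with N_STATES <= len(x) <= max_len."""
    for length in range(N_STATES, max_len + 1):
        for chars in product("01", repeat=length):
            x = "".join(chars)
            if run(START, x) not in ACCEPTING:
                continue
            u, v, w = decompose(x)
            ok = (
                u + v + w == x
                and len(u + v) <= N_STATES
                and len(v) > 0
                and all(run(START, u + v * k + w) in ACCEPTING for k in range(max_k + 1))
            )
            if not ok:
                return False
    return True


print(decompose("001"))
print(check_lemma(8))
```

The check only tries $k$ up to `max_k`, so it illustrates the lemma rather than proving it; the proof above is what guarantees the property for every $k$.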