Lecture 24: Pumping lemma

A non-recognizable set

Let L = \{0^k1^k \mid k \in \mathbb{N}\} = \{ε, 01, 0011, 000111, \dots\}.

Claim: L is not recognizable.

Proof: by contradiction. Suppose L were recognizable. Then there is some DFA M with L = L(M). Let n be the number of states of M, and let x = 0^n1^n. Clearly x \in L, so M must accept x.

Let's consider what happens while M is processing x. While processing the first n characters, M must pass through n+1 states q_0, q_1, \dots, q_n. Since there are only n states to choose from, two of these n+1 states must be the same (the pigeonhole principle): M goes around a loop, with q_i = q_j for some 0 \leq i \lt j \leq n.

Let u be the part of x that takes M from q_0 to q_i, let v be the part that takes M from q_i to q_j, and let w be the remainder of x, which takes M from q_j to a final state (remember that M accepts x). Note that since the loop happens within the first n characters, u and v can consist only of 0's, and since i \lt j, v contains at least one 0.

Now consider what happens if we plug the string uvvw into M. M will transition to q_i on u, and then go around the loop twice on vv, ending up back at q_j = q_i. It will then process w, which takes it from q_j to the same final state as before, so uvvw is accepted. Therefore uvvw \in L(M).

However, since v consists of one or more 0's, uvvw has more 0's than 1's, so uvvw \notin L. This contradicts the assumption that L(M) = L, completing the proof.
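The loop argument can be traced on a concrete machine. The sketch below assumes a hypothetical 4-state DFA (the table `DELTA`, the state names, and the helper names `accepts`, `in_L` and `split_at_loop` are all made up for illustration; this machine accepts 0^a1^b with a, b \geq 1, so it is just one failed candidate for recognizing L). It finds the first repeated state while reading x = 0^n1^n, splits x into u, v, w at that loop, and checks that uvvw is accepted by the machine even though it is not in L.

```python
# Hypothetical DFA, assumed for illustration only: accepts 0^a 1^b with a, b >= 1.
DELTA = {
    ("A", "0"): "B", ("A", "1"): "D",
    ("B", "0"): "B", ("B", "1"): "C",
    ("C", "0"): "D", ("C", "1"): "C",
    ("D", "0"): "D", ("D", "1"): "D",
}
START, FINAL = "A", {"C"}
N_STATES = 4  # n in the proof: the number of states of M


def accepts(s):
    """Run the DFA on s and report whether it ends in a final state."""
    state = START
    for ch in s:
        state = DELTA[(state, ch)]
    return state in FINAL


def in_L(s):
    """Membership in L = {0^k 1^k}."""
    k, r = divmod(len(s), 2)
    return r == 0 and s == "0" * k + "1" * k


def split_at_loop(x):
    """Split x = u + v + w, where v takes the DFA around its first repeated state."""
    seen = {}  # state -> number of characters read when it was first visited
    state = START
    for pos in range(len(x) + 1):
        if state in seen:
            i, j = seen[state], pos  # q_i = q_j with i < j
            return x[:i], x[i:j], x[j:]
        seen[state] = pos
        if pos < len(x):
            state = DELTA[(state, x[pos])]
    raise ValueError("no repeated state: x is shorter than the number of states")


x = "0" * N_STATES + "1" * N_STATES
u, v, w = split_at_loop(x)
pumped = u + v + v + w

print("u, v, w =", repr(u), repr(v), repr(w))
print("M accepts x:", accepts(x))
print("M accepts uvvw:", accepts(pumped))
print("uvvw in L:", in_L(pumped))
```

For this particular machine the loop is the self-loop on state B, so v is a single 0. Any other DFA would give a different split, but the proof shows the same mismatch must appear.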

The pumping lemma

This same argument can be applied to many languages, and can be generalized into the so-called "pumping lemma":

Claim (pumping lemma): If L is a DFA-recognizable language, then there exists some n (often called the pumping length), such that for all x \in L with len(x) \geq n, there exist strings u, v, and w such that

  1. x = uvw,
  2. len(uv) \leq n,
  3. len(v) > 0, and
  4. for all k \geq 0, uv^kw \in L.

The proof is just like the proof above; we give it below.
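Since the lemma is an implication with several nested quantifiers, it can help to write out its contrapositive, which is the form a non-recognizability proof actually establishes. The following is only a restatement of the claim above in quantifier notation, using the same symbols:

```latex
\[
\Bigl(
  \forall n \;\; \exists x \in L \text{ with } \mathrm{len}(x) \geq n \;\;
  \forall u, v, w \text{ with } x = uvw,\ \mathrm{len}(uv) \leq n,\ \mathrm{len}(v) > 0 \;\;
  \exists k \geq 0 \,:\, uv^k w \notin L
\Bigr)
\implies L \text{ is not DFA-recognizable}
\]
```

In words: for every candidate pumping length n we get to choose x, but we must then handle every allowed split of x into u, v, w, and for each split we get to choose k.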

This lemma is used to prove that languages are not DFA-recognizable. For example, we can use it to rewrite the proof above:

Claim: L = \{0^n1^n \mid n \in \mathbb{N}\} is not DFA-recognizable.

Proof: by contradiction, assume that L is DFA-recognizable. Then there exists some n as in the pumping lemma. Let x = 0^n1^n. Clearly x \in L and len(x) \geq n, so we can write x as uvw as in the pumping lemma. Since len(uv) \leq n, v can only consist of 0's (the first n characters of x are 0's). It must have at least one 0, since len(v) > 0. The pumping lemma tells us that uv^2w \in L, but this is a contradiction, because uv^2w has more 0's than 1's. Therefore L is not DFA-recognizable.
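The key step of this proof is that every split allowed by the lemma fails, not just one convenient split. As a sanity check (not a proof, since it only covers small n), the sketch below enumerates every split x = uvw with len(uv) \leq n and len(v) > 0 and confirms that uv^2w falls outside L. The helper names `in_L`, `splits` and `every_split_fails` are made up for this illustration.

```python
def in_L(s):
    """Membership in L = {0^k 1^k}."""
    k, r = divmod(len(s), 2)
    return r == 0 and s == "0" * k + "1" * k


def splits(x, n):
    """Yield every (u, v, w) with x = u + v + w, len(uv) <= n and len(v) > 0."""
    for i in range(n):                               # i = len(u)
        for j in range(i + 1, min(n, len(x)) + 1):   # j = len(uv)
            yield x[:i], x[i:j], x[j:]


def every_split_fails(n):
    """True if pumping with k = 2 leaves L for every allowed split of 0^n 1^n."""
    x = "0" * n + "1" * n
    return all(not in_L(u + v * 2 + w) for u, v, w in splits(x, n))


for n in range(1, 8):
    print(n, every_split_fails(n))
```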

Here is another example:

Claim: Let L be the set of strings of digits and the symbols + and = that represent equations that are true. For example, "1+1=2" is in L, while "3+5=9" is not. L is not DFA-recognizable.

Proof: by contradiction, assume that L is DFA-recognizable. Then there exists some n as in the pumping lemma. Let x = "1^n+0=1^n", where 1^n stands for the digit 1 repeated n times. Clearly x \in L and len(x) \geq n, so we can write x as uvw as in the pumping lemma. Since len(uv) \leq n, v can only consist of 1's (the first n characters of x are 1's). It must have at least one 1, since len(v) > 0. The pumping lemma tells us that uv^0w = uw \in L, but this is a contradiction, because uw has a smaller number on the left-hand side of the equation than on the right-hand side (or, if v is all of 1^n, no number at all before the +), and therefore is not in L. Thus, L is not DFA-recognizable.
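The same kind of finite sanity check works here, this time pumping down with k = 0. The notes do not pin down the exact syntax of equations, so the membership test below rests on an assumption: a string is in L when it has exactly one =, each side is a +-separated sum of nonempty decimal numbers, and the two sides have equal value. The helper names are hypothetical.

```python
DIGITS = "0123456789"


def side_value(side):
    """Value of a '+'-separated sum of nonempty digit strings, or None if malformed."""
    terms = side.split("+")
    if any(t == "" or any(c not in DIGITS for c in t) for t in terms):
        return None
    return sum(int(t) for t in terms)


def in_L(s):
    """Assumed membership test: exactly one '=', both sides well formed and equal."""
    if s.count("=") != 1:
        return False
    left, right = s.split("=")
    lv, rv = side_value(left), side_value(right)
    return lv is not None and lv == rv


def splits(x, n):
    """Yield every (u, v, w) with x = u + v + w, len(uv) <= n and len(v) > 0."""
    for i in range(n):                               # i = len(u)
        for j in range(i + 1, min(n, len(x)) + 1):   # j = len(uv)
            yield x[:i], x[i:j], x[j:]


def every_split_fails(n):
    """True if removing v (k = 0) leaves L for every allowed split of 1^n+0=1^n."""
    x = "1" * n + "+0=" + "1" * n
    return in_L(x) and all(not in_L(u + w) for u, v, w in splits(x, n))


for n in range(1, 8):
    print(n, every_split_fails(n))
```

Under this reading, the split in which v is the entire block of 1's leaves a string beginning with +, which is rejected as malformed rather than as a false equation.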

Proof of the pumping lemma: This proof is almost the same as the special case given above. Assume L is DFA-recognizable. Then there is some machine M that recognizes L. Let n be the number of states of M. Now, if x is an arbitrary string in L with length greater than or equal to n, then while processing the first n characters, M passes through n+1 states, so it must visit some state q at least twice.

Let u be the portion of x that transitions M from the start state to q. Let v be the portion of x that transitions from q back to q, and let w be the remainder of x; w transitions M from q to some final state (since x \in L, \hat{δ}(q_0,uvw) must be a final state).

Clearly x = uvw. len(uv) \leq n since the loop must occur within the first n characters of x. len(v) \gt 0 because the two visits to q happen at different positions in x. Finally, for any k \geq 0, while processing uv^kw, M transitions to q on u, then back to q on each of the k copies of v, and finally from q to an accepting state on w, and thus M accepts uv^kw. Therefore uv^kw \in L(M) = L, completing the proof.