CMPS 260: CFL Pumping Lemma

Just as there is a pumping lemma for regular languages, there is one for context-free languages (CFLs). The lemma states a necessary (but not sufficient) condition for a language to be a CFL; hence, if we show that a given language fails to satisfy the condition, we have shown that it is not a CFL.

Here is the idea. For any context-free grammar (CFG) G, there is a constant m such that if w ∈ L(G) and |w| ≥ m, then every derivation tree of G whose yield is w must contain two nodes with the same label (i.e., the same nonterminal symbol), one of which is a descendant of the other. (The reason: a derivation tree with a long yield must be tall, and a root-to-leaf path containing more nonterminal nodes than G has nonterminal symbols must repeat one of them.) In the figure below, we depict such a derivation tree, in which two nodes labeled with the nonterminal symbol A occur on the same path from the root. The figure is meant to convey that the yield of the subtree rooted at the "lower" occurrence of A is x, while the yield of the subtree rooted at the "upper" occurrence of A is vxy. The yield of the entire tree is uvxyz.
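To make the decomposition concrete, here is a minimal Python sketch. It assumes a hypothetical toy grammar S → aSb | ab (which generates {a^n b^n : n ≥ 1}) and a hypothetical `Node` class whose leaves are terminal symbols. It finds two nested nodes with the same label and reads u, v, x, y, z off the yields. It simply takes the first repeated pair found by a depth-first search, so it illustrates the split itself, not the careful choice of nodes needed for the lemma's bound on |vxy|.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass(eq=False)  # nodes are compared by identity, not by value
class Node:
    label: str  # nonterminal symbol, e.g. "S"
    children: List[Union["Node", str]] = field(default_factory=list)


Tree = Union[Node, str]  # a leaf is a terminal symbol (a one-character str)


def yield_of(t: Tree) -> str:
    """Concatenate the leaves of t from left to right."""
    if isinstance(t, str):
        return t
    return "".join(yield_of(c) for c in t.children)


def find_repeat(t: Tree, ancestors: Tuple[Node, ...] = ()) -> Optional[Tuple[Node, Node]]:
    """Return (upper, lower): same label, lower a descendant of upper."""
    if isinstance(t, str):
        return None
    for a in ancestors:
        if a.label == t.label:
            return a, t
    for c in t.children:
        found = find_repeat(c, ancestors + (t,))
        if found is not None:
            return found
    return None


def around(t: Tree, target: Node) -> Optional[Tuple[str, str]]:
    """Return (left, right) with yield_of(t) == left + yield_of(target) + right."""
    if t is target:
        return "", ""
    if isinstance(t, str):
        return None
    for k, c in enumerate(t.children):
        found = around(c, target)
        if found is not None:
            left = "".join(yield_of(s) for s in t.children[:k]) + found[0]
            right = found[1] + "".join(yield_of(s) for s in t.children[k + 1:])
            return left, right
    return None


def decompose(root: Node) -> Tuple[str, str, str, str, str]:
    """Split yield_of(root) into u, v, x, y, z around a repeated nonterminal."""
    found = find_repeat(root)
    if found is None:
        raise ValueError("no nonterminal repeats on a root-to-leaf path")
    upper, lower = found
    u, z = around(root, upper)  # yield(root) = u + yield(upper) + z
    v, y = around(upper, lower)  # yield(upper) = v + yield(lower) + y
    return u, v, yield_of(lower), y, z


# Derivation tree for "aaabbb": S => aSb => aaSbb => aaabbb
tree = Node("S", ["a", Node("S", ["a", Node("S", ["a", "b"]), "b"]), "b"])
print(decompose(tree))  # ('', 'a', 'aabb', 'b', '')
```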

Consider the derivation tree obtained by taking the tree in the figure and replacing the subtree rooted at the upper occurrence of A by the subtree rooted at the lower occurrence of A. Because both subtrees are rooted at A, the result is still a valid derivation tree, and its yield is uxz (the v and y "disappear").

Now consider the derivation tree obtained by replacing the subtree rooted at the lower occurrence of A by a copy of the subtree rooted at the upper occurrence of A. The yield of that tree is uvvxyyz. Making the same replacement again gives a tree whose yield is uvvvxyyyz. Repeating the argument, and counting the original tree (i = 1) and the tree from the previous paragraph (i = 0), we see that for every i ≥ 0 there is a derivation tree whose yield is $uv^ixy^iz$.
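The subtree replacements can also be sketched in Python. This is a minimal illustration under stated assumptions: the toy grammar S → cAd, A → aAb | ab and the `Node` class are hypothetical, and the upper and lower occurrences of A are picked by hand. The helper `pump_tree(root, upper, lower, i)` (also hypothetical) builds the tree containing i nested copies of the "layer" between the two occurrences, so its yield should be u v^i x y^i z; i = 0 is the replacement that makes v and y disappear.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(eq=False)  # nodes are compared by identity
class Node:
    label: str  # nonterminal symbol
    children: List[Union["Node", str]] = field(default_factory=list)


Tree = Union[Node, str]  # a leaf is a terminal symbol


def yield_of(t: Tree) -> str:
    """Concatenate the leaves of t from left to right."""
    if isinstance(t, str):
        return t
    return "".join(yield_of(c) for c in t.children)


def replace(t: Tree, target: Node, new: Tree) -> Tree:
    """Copy t, substituting `new` for the subtree rooted at `target`."""
    if t is target:
        return new
    if isinstance(t, str):
        return t
    return Node(t.label, [replace(c, target, new) for c in t.children])


def pumped_subtree(upper: Node, lower: Node, i: int) -> Tree:
    """Subtree with i nested copies of the layer from upper down to lower."""
    if i == 0:
        return lower  # only the lower subtree remains: yield x
    return replace(upper, lower, pumped_subtree(upper, lower, i - 1))


def pump_tree(root: Node, upper: Node, lower: Node, i: int) -> Tree:
    """Tree whose yield is u v^i x y^i z (the original tree is not mutated)."""
    return replace(root, upper, pumped_subtree(upper, lower, i))


# Tree for "caabbd": u = "c", v = "a", x = "ab", y = "b", z = "d"
upper = Node("A", ["a", Node("A", ["a", "b"]), "b"])
lower = upper.children[1]
root = Node("S", ["c", upper, "d"])

for i in range(4):
    print(i, yield_of(pump_tree(root, upper, lower, i)))
# 0 cabd
# 1 caabbd
# 2 caaabbbd
# 3 caaaabbbbd
```

Because `replace` copies instead of mutating, the same `root` can be pumped with different values of i.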

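The reasoning above essentially proves the following lemma. (Conditions 2 and 3 take a little more care: one uses a derivation tree for w with the fewest possible nodes, which guarantees |vy| > 0, and picks the repeated nonterminal from near the bottom of a longest root-to-leaf path, which keeps |vxy| ≤ m.)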

Pumping Lemma for Context-Free Languages: Let G be a CFG and let L = L(G). Then there is a positive integer m (which depends on G) such that for every w ∈ L with |w| ≥ m, there exist strings u, v, x, y, and z satisfying these properties:

  1. w = uvxyz
  2. |vxy| ≤ m
  3. |vy| > 0
  4. for all i ≥ 0, $uv^ixy^iz \in L$
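Condition 4 quantifies over infinitely many i, so no program can check it exhaustively. The sketch below (the function names are hypothetical) checks conditions 1 through 3 exactly and condition 4 only for i up to a chosen bound. The example language is {a^n b^n : n ≥ 0}, which is context-free, so suitable factorings exist for it.

```python
from typing import Callable


def check_factoring(w: str, m: int, u: str, v: str, x: str, y: str, z: str,
                    in_lang: Callable[[str], bool], max_i: int = 5) -> bool:
    """Conditions 1-3 exactly; condition 4 only for 0 <= i <= max_i."""
    if u + v + x + y + z != w:        # condition 1: w = uvxyz
        return False
    if len(v + x + y) > m:            # condition 2: |vxy| <= m
        return False
    if len(v + y) == 0:               # condition 3: |vy| > 0
        return False
    # condition 4, bounded: u v^i x y^i z must stay in the language
    return all(in_lang(u + v * i + x + y * i + z) for i in range(max_i + 1))


def is_anbn(s: str) -> bool:
    """Membership in the example language {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


print(check_factoring("aabb", 4, "a", "a", "", "b", "b", is_anbn))   # True
print(check_factoring("aabb", 4, "", "aa", "", "", "bb", is_anbn))   # False (i = 0 gives "bb")
```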

The "canonical" non-context-free language is $L = \{a^nb^nc^n : n \ge 0\}$. We can use the Pumping Lemma to prove that it is not a CFL.
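Deciding membership in this language is easy for a program; what the Pumping Lemma shows is that no CFG generates it. A minimal Python membership test (the name `in_L` is hypothetical):

```python
def in_L(s: str) -> bool:
    """Membership test for L = {a^n b^n c^n : n >= 0}."""
    n, r = divmod(len(s), 3)
    return r == 0 and s == "a" * n + "b" * n + "c" * n


print([s for s in ["", "abc", "aabbcc", "aabbc", "abcabc", "acb"] if in_L(s)])
# ['', 'abc', 'aabbcc']
```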

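Suppose, toward a contradiction, that L is a CFL, so that L = L(G) for some CFG G, and let m be the constant that the lemma provides for G. Choose w = $a^mb^mc^m$; then w ∈ L and |w| = 3m ≥ m, so the lemma applies to w. Now we consider every possible way of factoring w into u, v, x, y, and z that satisfies the first three conditions listed in the lemma. For each of those ways, we show that the fourth condition does not hold.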
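For a fixed, small m, "every possible way of factoring w" can be enumerated by brute force. The sketch below (the function `factorings` is hypothetical) chooses the four cut points that separate u, v, x, y, z and keeps only the factorings satisfying conditions 2 and 3; condition 1 holds by construction. It only makes the quantifier concrete: the proof itself must work for whatever m the lemma supplies, which no finite enumeration covers.

```python
from typing import Iterator, Tuple

Factoring = Tuple[str, str, str, str, str]


def factorings(w: str, m: int) -> Iterator[Factoring]:
    """Yield every (u, v, x, y, z) with w == uvxyz, |vxy| <= m and |vy| > 0."""
    n = len(w)
    for a in range(n + 1):                              # v starts at index a
        for b in range(a, n + 1):                       # x starts at index b
            for c in range(b, n + 1):                   # y starts at index c
                for d in range(c, min(a + m, n) + 1):   # z starts at d; |vxy| = d - a
                    u, v, x, y, z = w[:a], w[a:b], w[b:c], w[c:d], w[d:]
                    if len(v) + len(y) > 0:             # condition 3
                        yield u, v, x, y, z


m = 2
w = "a" * m + "b" * m + "c" * m   # w = a^m b^m c^m
fs = list(factorings(w, m))
print(len(fs), "factorings of", w)
```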

Case 1: Either v or y contains two different "kinds" of symbols (e.g., a and b, or b and c). Then pumping (i.e., taking i to be 2 or greater) results in a string in which one of the substrings ba or cb occurs. (For example, suppose that v = $a^4b^9$. Then $v^2 = a^4b^9a^4b^9$, which has ba as a substring.) No such string is a member of L, because every member of L consists of a's, followed by b's, followed by c's.
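A quick Python check of Case 1, using the v = a^4 b^9 from the example plus a second mixed piece y = b^3 c^2 chosen here for illustration. The helper name `matches_abc_pattern` is hypothetical; it tests the "a's, then b's, then c's" shape that every member of L has.

```python
import re


def matches_abc_pattern(s: str) -> bool:
    """True iff s has the form a...a b...b c...c (any block may be empty)."""
    return re.fullmatch(r"a*b*c*", s) is not None


v = "a" * 4 + "b" * 9   # v = a^4 b^9, as in the example
y = "b" * 3 + "c" * 2   # illustrative mixed piece y = b^3 c^2

for i in range(2, 5):
    # repeating a mixed piece puts "ba" (or "cb") at the seam between copies
    print(i, "ba" in v * i, "cb" in y * i, matches_abc_pattern(v * i))
# 2 True True False
# 3 True True False
# 4 True True False
```

Since v^i (or y^i) is a substring of the pumped string, the pumped string also fails the pattern.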

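Case 2: Each of v and y contains at most one kind of symbol. (Remember that, by condition 3, one of them, but not both, can be the empty string.) Then pumping (i.e., taking i to be 2) increases the number of occurrences of at least one, and at most two, kinds of symbols (e.g., a and b), while leaving the number of occurrences of the third kind (e.g., c) unchanged. The resulting string therefore does not contain equal numbers of a's, b's, and c's, so it is not in L.

Every factoring of w satisfying conditions 1 through 3 falls into one of these two cases, so none of them satisfies condition 4. Thus L fails the condition stated in the lemma, and therefore L is not a CFL.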
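As a sanity check of the case analysis (not a proof, since the lemma's m can be arbitrarily large), the Python sketch below enumerates every factoring of a^m b^m c^m that satisfies conditions 1 through 3 for a few small values of m, confirms that pumping with i = 2 (the value both cases use) always leaves L, and counts how many factorings fall into each case. All function names are hypothetical.

```python
from typing import Dict, Iterator, Tuple


def in_L(s: str) -> bool:
    """Membership test for L = {a^n b^n c^n : n >= 0}."""
    n, r = divmod(len(s), 3)
    return r == 0 and s == "a" * n + "b" * n + "c" * n


def factorings(w: str, m: int) -> Iterator[Tuple[str, str, str, str, str]]:
    """Every (u, v, x, y, z) with w == uvxyz, |vxy| <= m and |vy| > 0."""
    n = len(w)
    for a in range(n + 1):
        for b in range(a, n + 1):
            for c in range(b, n + 1):
                for d in range(c, min(a + m, n) + 1):
                    if (b - a) + (d - c) > 0:  # |vy| > 0
                        yield w[:a], w[a:b], w[b:c], w[c:d], w[d:]


def case(v: str, y: str) -> int:
    """1 if v or y mixes two kinds of symbols, else 2 (the proof's case split)."""
    return 1 if len(set(v)) > 1 or len(set(y)) > 1 else 2


def check(m: int) -> Dict[int, int]:
    """Confirm u v^2 x y^2 z is not in L for every factoring of a^m b^m c^m."""
    w = "a" * m + "b" * m + "c" * m
    counts = {1: 0, 2: 0}
    for u, v, x, y, z in factorings(w, m):
        if in_L(u + v * 2 + x + y * 2 + z):
            raise AssertionError(f"pumping stayed in L for {(u, v, x, y, z)}")
        counts[case(v, y)] += 1
    return counts


for m in range(1, 7):
    print(m, check(m))
```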