CMPS 260: CFL Closure Properties

The theorem IDs correspond to Linz.

Theorem 8.3: CFLs are closed under union, concatenation, and star closure.
Proof:
Union: Suppose that G₁ and G₂ are CFGs that generate languages L₁ and L₂, respectively. Without loss of generality, assume that their start symbols are S₁ and S₂, respectively, and that the two grammars have no nonterminal symbols in common. Form a new grammar G that includes all the productions of G₁ and G₂. Also include a new nonterminal, S, to serve as the start symbol of G. Its productions are S ⟶ S₁ | S₂.

Obviously, if S₁ ⟹* w in G₁, then S ⟹ S₁ ⟹* w in G. Similarly for derivations in G₂. Hence L(G₁) ∪ L(G₂) ⊆ L(G). As for inclusion the other way, it is clear that every derivation in G begins with either the step S ⟹ S₁ or the step S ⟹ S₂, so S ⟹* w implies that either S₁ ⟹* w or S₂ ⟹* w.

Concatenation: Follow the construction for union (see above), except in the last step make the new grammar's start symbol have only one production: S ⟶ S₁ S₂.

For every x ∈ L(G₁) and every y ∈ L(G₂), there is a leftmost derivation in G that looks like this:

S ⟹ S₁S₂ ⟹* xS₂ ⟹* xy

Hence every string in the concatenation of L(G₁) and L(G₂) is in L(G). Going in the other direction, every leftmost derivation in G looks like what is shown above, and thus every string derivable from S is in L(G₁)·L(G₂).

Kleene/Star Closure: Take a CFG G that generates CFL L. Suppose that its start symbol is S. Now form a new CFG G' that includes all the productions of G, together with a new start symbol S' having the productions S' ⟶ λ | SS'. It is left to the reader to supply a proof that L(G') = (L(G))^*.
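The three constructions in this proof can be sketched in Python. This is only an illustration under stated assumptions: the dictionary representation of a grammar, the function names, and the bounded enumerator strings_up_to are all choices made here, not anything taken from Linz. Terminals are assumed to be single characters, and the caller is assumed to supply grammars with disjoint nonterminals.

```python
# A grammar is a dict: nonterminal -> list of right-hand sides (tuples of symbols).
# Any symbol that is not a key of the dict is treated as a terminal; () stands for λ.
# Assumption: terminals are single characters, so len() measures string length.

def union(g1, s1, g2, s2, new_start="S"):
    """Grammar for L(g1) ∪ L(g2): add S -> S1 | S2."""
    assert not (g1.keys() & g2.keys()), "nonterminals must be disjoint"
    assert new_start not in g1 and new_start not in g2
    return {**g1, **g2, new_start: [(s1,), (s2,)]}, new_start

def concatenation(g1, s1, g2, s2, new_start="S"):
    """Grammar for L(g1)·L(g2): add S -> S1 S2."""
    assert not (g1.keys() & g2.keys()), "nonterminals must be disjoint"
    assert new_start not in g1 and new_start not in g2
    return {**g1, **g2, new_start: [(s1, s2)]}, new_start

def star(g, s, new_start="S'"):
    """Grammar for (L(g))*: add S' -> λ | S S'."""
    assert new_start not in g
    return {**g, new_start: [(), (s, new_start)]}, new_start

def strings_up_to(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start.

    Computed as a least fixed point: keep adding strings to each
    nonterminal's set until a full pass adds nothing new.
    """
    lang = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                partial = {""}
                for sym in body:
                    options = lang[sym] if sym in grammar else {sym}
                    partial = {p + o for p in partial for o in options
                               if len(p + o) <= max_len}
                if not partial <= lang[nt]:
                    lang[nt] |= partial
                    changed = True
    return lang[start]

g1 = {"S1": [("a", "S1", "b"), ()]}    # {a^n b^n | n >= 0}
g2 = {"S2": [("c", "S2"), ("c",)]}     # {c^n | n >= 1}

print(sorted(strings_up_to(*union(g1, "S1", g2, "S2"), 3)))
print(sorted(strings_up_to(*concatenation(g1, "S1", g2, "S2"), 3)))
print(sorted(strings_up_to(*star(g2, "S2"), 3)))
```

Comparing these bounded sets with what one expects from L₁ ∪ L₂, L₁·L₂, and L₂^* is a sanity check on the constructions, not a substitute for the proofs.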

Theorem 8.4: CFL's are closed under neither intersection nor complement.
Proof: Consider the CFL's {aⁿbⁿc^m | n,m ≥ 0} and {aⁿb^mc^m | n,m ≥ 0}. (We can devise CFG's for each of them, or PDA's, to convince ourselves that each one is a CFL.)

Their intersection is the canonical non-CFL {aⁿbⁿcⁿ | n ≥ 0}. It follows that CFL's are not closed under intersection.

As for complement, consider, again, the canonical non-CFL. Its complement is

{a^kb^mcⁿ | k≠m ∨ k≠n} ∪ L((a+b+c)^*(ba + ca + cb)(a+b+c)^*)

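which is the union of two CFLs and hence a CFL. (The first operand of ∪ above is itself the union of the CFLs {a^kb^mcⁿ | k≠m} and {a^kb^mcⁿ | k≠n}. Every regular language is a CFL, so the second operand is a CFL.) Thus we have a CFL whose complement, {aⁿbⁿcⁿ | n ≥ 0}, is not a CFL, and it follows that CFLs are not closed under complement.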
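The two set identities used so far (the intersection of the two CFLs, and the description of the complement of the canonical non-CFL) can be checked by brute force on short strings. The sketch below uses plain membership tests rather than grammars; the function names are made up for this illustration, and a check over strings of bounded length is evidence, not a proof.

```python
import re
from itertools import product

SHAPE = re.compile(r"(a*)(b*)(c*)")    # strings of the form a^k b^m c^n

def counts(w):
    """Return (k, m, n) if w = a^k b^m c^n, else None."""
    match = SHAPE.fullmatch(w)
    return None if match is None else tuple(len(g) for g in match.groups())

def in_first(w):                       # {a^n b^n c^m | n,m >= 0}
    c = counts(w)
    return c is not None and c[0] == c[1]

def in_second(w):                      # {a^n b^m c^m | n,m >= 0}
    c = counts(w)
    return c is not None and c[1] == c[2]

def in_canonical(w):                   # {a^n b^n c^n | n >= 0}
    c = counts(w)
    return c is not None and c[0] == c[1] == c[2]

def in_claimed_complement(w):
    """Membership in {a^k b^m c^n | k≠m ∨ k≠n} ∪ L(Σ*(ba + ca + cb)Σ*)."""
    c = counts(w)
    unequal = c is not None and (c[0] != c[1] or c[0] != c[2])
    out_of_order = re.search(r"ba|ca|cb", w) is not None
    return unequal or out_of_order

def all_strings(max_len, alphabet="abc"):
    """Yield every string over the alphabet of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def check(max_len=6):
    """Verify both identities on every string of length <= max_len."""
    for w in all_strings(max_len):
        assert (in_first(w) and in_second(w)) == in_canonical(w)
        assert in_claimed_complement(w) == (not in_canonical(w))
    return True

check(6)
```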

A different way to prove the result is to show that, if the class of CFLs were closed under complement, then it would also be closed under intersection. Because we know that the consequent of that implication is false, it must be that the antecedent is false as well. (This is proof by contradiction: you show that a falsehood follows from the negation of what you want to prove.)

Assume, contrary to what we want to prove, that the class of CFL's is closed under complement, and let L₁ and L₂ be arbitrary languages:

      L₁ is a CFL ∧ L₂ is a CFL
⟹        < assumption that CFL's are closed under complement >
      L₁^c is a CFL ∧ L₂^c is a CFL
⟹        < CFL's are closed under union >
      L₁^c ∪ L₂^c is a CFL
⟹        < assumption that CFL's are closed under complement >
      (L₁^c ∪ L₂^c)^c is a CFL
=         < set theory: (A ∪ B)^c = A^c ∩ B^c >
      (L₁^c)^c ∩ (L₂^c)^c is a CFL
=         < set theory: A = (A^c)^c >
      L₁ ∩ L₂ is a CFL

Summarizing, the above is a proof by contradiction, which is based upon the logical tautology P ≡ (¬P ⇒ false). We assumed that CFL's were closed under complement, which is the negation of what we wanted to prove. From that assumption, we were able to show that the intersection of any two CFLs is a CFL, which contradicts a known result.

Decision Problems for CFL's

Theorem 8.6: There is an algorithm that determines whether a CFG provided to it as input generates the empty language.
Proof: Use the algorithm that identifies the useful (i.e., both fruitful and reachable-from-the-start-symbol) symbols in a CFG. The language that the grammar generates is empty iff the grammar's start symbol is not fruitful.
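The emptiness test can be sketched as follows. The grammar representation (a dict from each nonterminal to a list of right-hand sides, each a tuple of symbols) and the function names are assumptions made for this illustration. A nonterminal is marked fruitful as soon as one of its productions has a right-hand side made up entirely of terminals and already-fruitful nonterminals.

```python
# Assumed representation: nonterminal -> list of right-hand sides (tuples).
# A symbol that is not a key of the dict is treated as a terminal, so a
# nonterminal with no productions must still appear as a key with [].

def fruitful_nonterminals(grammar):
    """Return the set of nonterminals that derive at least one terminal string."""
    fruitful = set()
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            if nt in fruitful:
                continue
            # nt is fruitful if some body uses only terminals and fruitful symbols
            # (an empty body, i.e. a λ-production, qualifies trivially).
            if any(all(sym in fruitful or sym not in grammar for sym in body)
                   for body in bodies):
                fruitful.add(nt)
                changed = True
    return fruitful

def generates_empty_language(grammar, start):
    """L(G) is empty iff the start symbol is not fruitful."""
    return start not in fruitful_nonterminals(grammar)

# B can never get rid of itself, so S -> A B derives no terminal string.
g_empty = {"S": [("A", "B")], "A": [("a",)], "B": [("B", "b")]}
g_nonempty = {"S": [("a", "S", "b"), ()]}

print(generates_empty_language(g_empty, "S"))
print(generates_empty_language(g_nonempty, "S"))
```

Only fruitfulness matters for this question; reachability plays no role, because the start symbol is trivially reachable from itself.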

Theorem 8.7: There is an algorithm that determines whether a CFG provided to it as input generates an infinite language.
Proof: (sketch) Assuming that the grammar has no useless symbols, no unit productions (i.e., of the form A → B), and no λ-productions (all of which we can achieve algorithmically), answering this question boils down to whether the grammar has a "self-embedding" nonterminal, i.e., a nonterminal A such that A ⟹⁺ αAβ, where |αβ| > 0. This can be decided by determining whether there exists a cycle in the directed graph formed from the grammar by representing each nonterminal by a node and having an edge (A,B) iff the grammar has a production of the form A → αBβ (whose left-hand side is A and which includes B on the right-hand side). Because the grammar has neither unit productions nor λ-productions, every cycle in this graph yields a derivation A ⟹⁺ αAβ with |αβ| > 0.
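A minimal sketch of the cycle test follows. It assumes the input grammar has already been cleaned as described (no useless symbols, no unit productions, no λ-productions) and does not verify those preconditions; on an uncleaned grammar the answer may be wrong. The dict representation and the function name are choices made for this illustration.

```python
# Assumed representation: nonterminal -> list of right-hand sides (tuples).
# Any symbol that is not a key of the dict is treated as a terminal.

def is_infinite(grammar):
    """Return True iff the nonterminal dependency graph has a cycle.

    Precondition (not checked): no useless symbols, no unit productions,
    no λ-productions.
    """
    # Edge A -> B iff some production of A mentions nonterminal B.
    edges = {a: {sym for body in bodies for sym in body if sym in grammar}
             for a, bodies in grammar.items()}

    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited, on the DFS stack, finished
    color = dict.fromkeys(grammar, WHITE)

    def has_cycle_from(a):
        color[a] = GRAY
        for b in edges[a]:
            # Reaching a GRAY node means we found a back edge, hence a cycle.
            if color[b] == GRAY or (color[b] == WHITE and has_cycle_from(b)):
                return True
        color[a] = BLACK
        return False

    return any(color[a] == WHITE and has_cycle_from(a) for a in grammar)

g_infinite = {"S": [("a", "S", "b"), ("a", "b")]}          # {a^n b^n | n >= 1}
g_finite = {"S": [("A", "B")], "A": [("a",)], "B": [("b",), ("A", "A")]}

print(is_infinite(g_infinite))
print(is_infinite(g_finite))
```

The recursive depth-first search is fine for grammars of textbook size; a very large grammar would call for an iterative version.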

Interestingly, in contrast to the situation for regular languages, there is no algorithm by which to determine, for a given CFG G, whether L(G) = Σ*. Examples of other undecidable problems regarding CFG's/CFL's:

Given two CFL's, is their intersection empty?
Given a CFG, is it an ambiguous grammar?
Given CFG's G₁ and G₂, is it the case that L(G₁) = L(G₂) (or, similarly, that L(G₁) ⊆ L(G₂))?

Very succinct proofs exist for the first two problems listed above. Each proof relies on already knowing that the so-called Post Correspondence Problem is undecidable. The method of proof is called reduction. The idea is that Problem A reduces to Problem B if, given the ability to make use of an algorithm that solves B, it is possible to devise an algorithm that solves A. Suppose that Problem A is known to be undecidable. Then showing that A reduces to B proves that B, too, is undecidable.

The hardest part is in proving a "first" undecidable problem. Once you have such a problem, you can show other problems to be undecidable by the reduction technique outlined above.