CMPS 260 Spring 2024
HW #5: Context-free Grammars/Languages; Chomsky Normal Form
Sample Solutions

1. Let G1 be the following context-free grammar:

S ⟶ aS  |  aHb    (1) (2)
H ⟶ aaHb  |  a    (3) (4)

(a) Offer a precise description of L(G1), the language generated by G1.

Solution: L(G1) = { a^k b^n  |  n≥1, k≥2n }
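As a sanity check on this description, the following sketch enumerates every terminal string of bounded length that G1 derives and compares the result with the set { a^k b^n | n≥1, k≥2n }. The helper names (generate, described) and the length bound are illustrative choices, not part of the assignment, and agreement up to a bound is evidence rather than a proof.

```python
from collections import deque

# Productions of G1; the dictionary keys (uppercase letters) are the nonterminals.
G1 = {"S": ["aS", "aHb"], "H": ["aaHb", "a"]}

def generate(grammar, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    Assumes every production contributes at least one terminal (true of G1),
    so the search space is finite.
    """
    seen, found = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal, if any.
        pos = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if pos is None:
            found.add(form)
            continue
        for rhs in grammar[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Prune: terminals already present can never disappear.
            terminals = sum(ch not in grammar for ch in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

def described(max_len):
    """Strings a^k b^n with n >= 1 and k >= 2n, of length <= max_len."""
    return {"a" * k + "b" * n
            for n in range(1, max_len + 1)
            for k in range(2 * n, max_len - n + 1)}

assert generate(G1, "S", 12) == described(12)
```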

(b) Provide some evidence for the claim that G1 is unambiguous. To do so, choose a "generic" member of L(G1), describe its derivation from S, and argue that there is no other way to derive it from S.

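Solution: By observing the productions of the grammar, it is clear, with respect to any derivation from the start symbol S, that

(i) while S is present, only productions (1) and (2) apply; (1) retains S, whereas (2) replaces S by H, so (2) is applied exactly once, after all applications of (1);
(ii) once H is present, only productions (3) and (4) apply; (3) retains H, whereas (4) eliminates the last nonterminal, so (4) is applied exactly once, as the final step.

Thus, every derivation of a terminal string from the start symbol S must have this form, for some m≥0 and n≥1: m applications of (1), then one application of (2), then n−1 applications of (3), then one application of (4).

Summarizing, the derivation is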

S ⟹(1;m) a^m S ⟹(2;1) a^m aHb ⟹(3;n-1) a^{m+2n-1} H b^n ⟹(4;1) a^{m+2n} b^n

The ⟹(j;k) notation indicates k applications of production (j).

If, rather than applying (1) m times and (3) n−1 times, we instead applied (1) m' times and (3) n'−1 times, where either m'≠m or n'≠n (or both), the terminal string produced would be x' = a^{m'+2n'} b^{n'}, which is distinct from x = a^{m+2n} b^n. (Reason: If n'≠n, then x and x' differ in their number of occurrences of b. Meanwhile, if n'=n but m'≠m, then x and x' differ in their number of occurrences of a.)

It follows that for any string generated by G1, there is a unique derivation by which to produce it. Hence, G1 is unambiguous.

(c) Present an alternative CFG G'1 that generates the same language as G1 but whose only nonterminal symbol is S. Demonstrate that your grammar is ambiguous.

Solution: A grammar that generates L(G1) but has only one nonterminal symbol is as follows:

S ⟶ aS | aaSb | aab   (1) (2) (3)

To produce a^{m+2n} b^n, where m≥0 and n≥1, one could apply production (1) m times and production (2) n−1 times, in any order, followed by an application of (3). For any values of m and n satisfying m>0 ∧ n>1, there are multiple derivations of that string!

Note: A context-free grammar is ambiguous iff there are distinct derivation trees both yielding the same terminal string. For every grammar, there is a one-to-one correspondence between its derivation trees and its leftmost derivations. For linear grammars, such as those described in this problem (including the solution to part (c)), every derivation is a leftmost derivation. Hence, for linear grammars, the question of ambiguity reduces to the question of whether any string has distinct derivations.
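Because both grammars are linear, ambiguity can be probed simply by counting derivations. The sketch below does so for G1 and for the one-nonterminal grammar of part (c). The function name count_derivations is a hypothetical helper, and the code assumes that every production containing a nonterminal also contains at least one terminal (true of both grammars), which guarantees that the recursion terminates.

```python
from functools import lru_cache
from math import comb

G1 = {"S": ["aS", "aHb"], "H": ["aaHb", "a"]}
G1_PRIME = {"S": ["aS", "aaSb", "aab"]}

def count_derivations(grammar, start, target):
    """Count the derivations of target from start in a linear grammar."""
    @lru_cache(maxsize=None)
    def count(nonterminal, s):
        total = 0
        for rhs in grammar[nonterminal]:
            # A linear right-hand side has at most one nonterminal.
            pos = next((i for i, ch in enumerate(rhs) if ch in grammar), None)
            if pos is None:
                total += (rhs == s)
                continue
            prefix, suffix = rhs[:pos], rhs[pos + 1:]
            if (len(prefix) + len(suffix) <= len(s)
                    and s.startswith(prefix) and s.endswith(suffix)):
                inner = s[len(prefix):len(s) - len(suffix)]
                total += count(rhs[pos], inner)
        return total
    return count(start, target)

for m in range(4):
    for n in range(1, 5):
        x = "a" * (m + 2 * n) + "b" * n
        # G1: exactly one derivation per string.
        assert count_derivations(G1, "S", x) == 1
        # Part (c): one derivation per interleaving of m uses of (1)
        # with n-1 uses of (2).
        assert count_derivations(G1_PRIME, "S", x) == comb(m + n - 1, m)
```

For example, with m=2 and n=3 the string a^8 b^3 has one derivation in G1 but comb(4, 2) = 6 in the grammar of part (c).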


2. Let G2 be the following context-free grammar:

S ⟶ HS  |  K    (1) (2)
H ⟶ aHb  |  c    (3) (4)
K ⟶ cK  |  c    (5) (6)

Give a precise description of L(G2), the language generated by G2.

Solution:

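L(G2) = { (a^{i_k} c b^{i_k})* c^+   |   k = 0,1,2,...,n-1; n≥0, i_k≥0 for all k }

That is, each member consists of n≥0 blocks, the k-th of which is a^{i_k} c b^{i_k} (the exponents i_k may differ from block to block), followed by one or more occurrences of c.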

The star/asterisk superscript means, in effect, zero or more occurrences, just as in regular expressions.
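The description can be turned into a simple membership test. The sketch below rests on one observation that is not stated in the solution itself: a lone c is a block with exponent 0, so a string belongs to L(G2) exactly when it splits into blocks a^i c b^i and the last block is a plain c. The function name in_L_G2 is a hypothetical helper.

```python
def in_L_G2(w):
    """Decide membership in L(G2) by splitting w into blocks a^i c b^i."""
    pos, last_i = 0, None
    while pos < len(w):
        # Read the run of a's that opens the block.
        i = 0
        while pos < len(w) and w[pos] == "a":
            i += 1
            pos += 1
        # Every block has exactly one c in the middle.
        if pos == len(w) or w[pos] != "c":
            return False
        pos += 1
        # The block must close with exactly i b's.
        if w[pos:pos + i] != "b" * i:
            return False
        pos += i
        last_i = i
    # The final block must be a plain c (it supplies the trailing c^+).
    return last_i == 0

assert in_L_G2("c")                 # S => K => c
assert in_L_G2("acbc")              # one block (i = 1), then c
assert in_L_G2("aacbbcacbccc")      # blocks with i = 2, 0, 1, then ccc
assert not in_L_G2("acb")           # no trailing c
assert not in_L_G2("acbbc")         # unbalanced block
assert not in_L_G2("")
```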


3. Present a context-free grammar G3 that generates the language { a^j b^k c^{2k} b^j  |  j≥1, k≥0 }

Small Bonus: Present a context-free grammar G'3 that generates the language { a^j b^k c^{2k} d* b^j  |  j≥1, k≥0 }.

The star/asterisk superscript means, in effect, zero or more occurrences, just as in regular expressions.

Solution:

G3:
S ⟶ aSb  |  aHb   (1) (2)
H ⟶ bHcc  |  λ   (3) (4)

G'3 (bonus):
S ⟶ aSb  |  aHDb   (1) (2)
H ⟶ bHcc  |  λ   (3) (4)
D ⟶ dD  |  λ   (5) (6)
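To see how these productions build the required strings, the following sketch replays one derivation in each grammar by plain string rewriting. The function names and the parameter d (the number of d's in the bonus language) are illustrative choices; the production numbers in the comments are those shown above.

```python
def derive_g3(j, k):
    """Derive a^j b^k c^(2k) b^j in G3 (requires j >= 1, k >= 0)."""
    form = "S"
    for _ in range(j - 1):
        form = form.replace("S", "aSb")    # production (1)
    form = form.replace("S", "aHb")        # production (2)
    for _ in range(k):
        form = form.replace("H", "bHcc")   # production (3)
    return form.replace("H", "")           # production (4): H -> lambda

def derive_g3_bonus(j, k, d):
    """Derive a^j b^k c^(2k) d^d b^j in G'3 (requires j >= 1; k, d >= 0)."""
    form = "S"
    for _ in range(j - 1):
        form = form.replace("S", "aSb")    # production (1)
    form = form.replace("S", "aHDb")       # production (2)
    for _ in range(k):
        form = form.replace("H", "bHcc")   # production (3)
    form = form.replace("H", "")           # production (4)
    for _ in range(d):
        form = form.replace("D", "dD")     # production (5)
    return form.replace("D", "")           # production (6)

for j in range(1, 5):
    for k in range(0, 4):
        assert derive_g3(j, k) == "a" * j + "b" * k + "c" * (2 * k) + "b" * j
        for d in range(0, 3):
            expected = "a" * j + "b" * k + "c" * (2 * k) + "d" * d + "b" * j
            assert derive_g3_bonus(j, k, d) == expected
```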


4. Present a context-free grammar G4 that generates the language { a^j b^{j+k} a^k  |  j≥1, k≥0 }

Small Bonus: Present a context-free grammar G'4 that generates the language { a^j c* b^{j+k} a^k  |  j≥1, k≥0 }

The star/asterisk superscript means, in effect, zero or more occurrences, just as in regular expressions.

Solution: The small bit of insight that makes this problem easy is to recognize that the language described is just the same as { a^j b^j b^k a^k  |  j≥1, k≥0 }. For the bonus, the language is { a^j c* b^j b^k a^k  |  j≥1, k≥0 }

G4:
S ⟶ HK  (1)
H ⟶ aHb  |  ab  (2) (3)
K ⟶ bKa  |  λ  (4) (5)

G'4 (bonus):
S ⟶ HK  (1)
H ⟶ aHb  |  aCb  (2) (3)
K ⟶ bKa  |  λ  (4) (5)
C ⟶ cC  |  λ  (6) (7)


5. Consider the following Chomsky Normal Form grammar G5:

S ⟶ HS  |  a    (1) (2)
H ⟶ KH  |  b    (3) (4)
K ⟶ KS  |  b    (5) (6)

For each of the strings bbaba and babab, use the CYK algorithm to determine whether it is generated by G5.

Specifically, for each string, fill in the cells of the matrix pictured below so that the cell in row i and column j contains V_{i,j} = { X ∈ {S,H,K} : X ⟹^+ w_{i,j} }, where w_{i,j} is the substring of w (i.e., the string of interest) beginning with its i-th symbol and ending with its j-th symbol (inclusive). (Your answers are expected to include not only the answer to the question "Is w ∈ L(G5)?" but also the correctly filled in table.)

Recall that, for i satisfying 1≤i≤|w|, X ∈ V_{i,i} iff X → w_{i,i} is a production in the grammar. Meanwhile, for i and j satisfying 1≤i<j≤|w|, X ∈ V_{i,j} iff there exists k, where i≤k<j, and nonterminals Y and Z such that Y ∈ V_{i,k}, Z ∈ V_{k+1,j}, and X → YZ is a production in the grammar.

Solution:

    1       2       3       4       5
+-------+-------+-------+-------+-------+
|  H,K  |  H    |  S,K  |   H   |  S,K  | 1
|  (b)  | (bb)  | (bba) |(bbab) |(bbaba)|
+-------+-------+-------+-------+-------+
        |  H,K  |  S,K  |   H   |  S,K  | 2
        |  (b)  | (ba)  | (bab) |(baba) |
        +-------+-------+-------+-------+
                |   S   |   -   |   -   | 3
                |  (a)  |  (ab) | (aba) |
                +-------+-------+-------+
                        |  H,K  |  S,K  | 4
                        |  (b)  | (ba)  |
                        +-------+-------+
                                |   S   | 5
                                |  (a)  |
                                +-------+

    1       2       3       4       5
+-------+-------+-------+-------+-------+
|  H,K  |  S,K  |   H   |  K,S  |   H   | 1
|  (b)  | (ba)  | (bab) |(baba) |(babab)|
+-------+-------+-------+-------+-------+
        |   S   |   -   |   -   |   -   | 2
        |  (a)  | (ab)  | (aba) |(abab) |
        +-------+-------+-------+-------+
                |  H,K  |  S,K  |   H   | 3
                |  (b)  | (ba)  | (bab) |
                +-------+-------+-------+
                        |   S   |   -   | 4
                        |  (a)  |  (ab) |
                        +-------+-------+
                                |  H,K  | 5
                                |  (b)  |
                                +-------+
The first matrix above is for bbaba; the second is for babab.

The (1,5) cell in the matrix for bbaba includes S (the start symbol) and thus bbaba ∈ L(G5). However, the (1,5) cell in the matrix for babab does not include S and thus babab ∉ L(G5).
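The two matrices can be reproduced mechanically. Below is a minimal sketch of the CYK recurrence recalled in the problem statement, specialized to G5. The dictionary encoding of the productions and the function name cyk are illustrative choices; indices are 1-based to match the V_{i,j} notation.

```python
# Productions X -> YZ of G5, keyed by the pair (Y, Z).
RULES = {("H", "S"): {"S"}, ("K", "H"): {"H"}, ("K", "S"): {"K"}}
# Productions X -> terminal of G5, keyed by the terminal.
TERMINALS = {"a": {"S"}, "b": {"H", "K"}}

def cyk(w):
    """Return a dict V with V[i, j] = set of nonterminals deriving w_{i,j}."""
    n = len(w)
    V = {}
    for i in range(1, n + 1):
        V[i, i] = set(TERMINALS.get(w[i - 1], set()))
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):              # split point, i <= k < j
                for Y in V[i, k]:
                    for Z in V[k + 1, j]:
                        cell |= RULES.get((Y, Z), set())
            V[i, j] = cell
    return V

for w in ("bbaba", "babab"):
    table = cyk(w)
    top = table[1, len(w)]
    print(w, sorted(top), "S" in top)
```

The top cell is V[1, 5] in both cases; the string is in L(G5) exactly when that cell contains S.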


6. Use the standard construction to obtain a Chomsky Normal Form grammar that is equivalent to (i.e., generates the same language as) the context-free grammar shown below. Present the resultant CNF grammar.

S ⟶ bHaH  |  c
H ⟶ SaH  |  baa

Solution:

S ⟶ b' V_HaH  |  c
V_HaH ⟶ H V_aH
V_aH ⟶ a' H
H ⟶ S V_aH  |  b' V_aa
V_aa ⟶ a' a'
a' ⟶ a
b' ⟶ b
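One way to gain confidence in a CNF conversion is to compare the two grammars on all strings up to some length. The sketch below does this; the list-of-symbols encoding, the nonterminal names V_HaH, V_aH, V_aa, a', b', and the helper names are illustrative assumptions. Pruning by the length of the sentential form is valid here only because neither grammar has λ-productions, and agreement up to a bound is evidence, not a proof of equivalence.

```python
from collections import deque

ORIGINAL = {"S": [["b", "H", "a", "H"], ["c"]],
            "H": [["S", "a", "H"], ["b", "a", "a"]]}

CNF = {"S": [["b'", "V_HaH"], ["c"]],
       "V_HaH": [["H", "V_aH"]],
       "V_aH": [["a'", "H"]],
       "H": [["S", "V_aH"], ["b'", "V_aa"]],
       "V_aa": [["a'", "a'"]],
       "a'": [["a"]],
       "b'": [["b"]]}

def language(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    seen = {(start,)}
    queue = deque([(start,)])
    found = set()
    while queue:
        form = queue.popleft()
        # Expand the leftmost nonterminal, if any.
        pos = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if pos is None:
            found.add("".join(form))
            continue
        for rhs in grammar[form[pos]]:
            new = form[:pos] + tuple(rhs) + form[pos + 1:]
            # No lambda-productions: a form never gets shorter.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

def is_cnf(grammar):
    """Each right-hand side is one terminal or exactly two nonterminals."""
    return all(
        (len(rhs) == 1 and rhs[0] not in grammar)
        or (len(rhs) == 2 and all(sym in grammar for sym in rhs))
        for bodies in grammar.values() for rhs in bodies)

assert is_cnf(CNF)
assert language(ORIGINAL, "S", 12) == language(CNF, "S", 12)
```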


7. A linear grammar is a CFG in which every production's right-hand side contains at most one nonterminal symbol. A right-linear grammar is a linear grammar in which any such occurrence of a nonterminal symbol must be the last (i.e., rightmost) symbol of that right-hand side.

(a) Describe a construction that, given a right-linear grammar G_R and a linear grammar G_Lin, produces a linear grammar G such that L(G) = L(G_R) · L(G_Lin).

Solution: Because nonterminal symbols can be renamed, without loss of generality we can assume that no nonterminal symbol appears in both grammars. We can also assume that S_R is the start symbol of G_R and S_Lin is the start symbol of G_Lin.

Take G to be the grammar whose start symbol is S_R and whose productions include all those in the two grammars, except that any production in G_R of the form A ⟶ x, where x is a terminal string (possibly λ), is replaced by A ⟶ x S_Lin.

(b) Apply your construction to these particular grammars and show the resulting linear grammar.

G_R:
S_R ⟶ aaK | ca
H ⟶ cK | λ
K ⟶ ab S_R | bH

G_Lin:
S_Lin ⟶ caAdc | Ab
A ⟶ dc S_Lin d | a

Solution: The constructed grammar (with start symbol S_R; the modified right-hand sides are ca S_Lin in the first line and S_Lin in the second) is

S_R ⟶ aaK | ca S_Lin
H ⟶ cK | S_Lin
K ⟶ ab S_R | bH
S_Lin ⟶ caAdc | Ab
A ⟶ dc S_Lin d | a
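The construction of part (a) is mechanical enough to sketch in code. In the sketch below, a grammar is a dictionary from nonterminals to lists of right-hand sides, each a list of symbols, with the empty list standing for λ. This encoding and the function name concatenate are illustrative assumptions, not part of the assignment.

```python
def concatenate(g_r, s_r, g_lin, s_lin):
    """Build a linear grammar for L(g_r) . L(g_lin).

    g_r must be right-linear, g_lin linear, and the two grammars must
    share no nonterminal. Returns (productions, start symbol).
    """
    assert not set(g_r) & set(g_lin), "rename nonterminals first"
    g = {}
    for head, bodies in g_r.items():
        new_bodies = []
        for body in bodies:
            if any(sym in g_r for sym in body):
                new_bodies.append(list(body))             # unchanged
            else:
                # A -> x (x a terminal string) becomes A -> x S_Lin.
                new_bodies.append(list(body) + [s_lin])
        g[head] = new_bodies
    for head, bodies in g_lin.items():
        g[head] = [list(body) for body in bodies]
    return g, s_r

GR = {"S_R": [["a", "a", "K"], ["c", "a"]],
      "H": [["c", "K"], []],
      "K": [["a", "b", "S_R"], ["b", "H"]]}
GLIN = {"S_Lin": [["c", "a", "A", "d", "c"], ["A", "b"]],
        "A": [["d", "c", "S_Lin", "d"], ["a"]]}

G, START = concatenate(GR, "S_R", GLIN, "S_Lin")
for head, bodies in G.items():
    print(head, "->", " | ".join(" ".join(b) or "λ" for b in bodies))
```

Applied to the grammars of part (b), only the productions S_R ⟶ ca and H ⟶ λ change, which matches the grammar shown above.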