Step 1(a): For each symbol t in the grammar's terminal alphabet, introduce a new nonterminal t' and give it the production t' → t.
Step 1(b): In every production except those whose right-hand side is a single terminal symbol (such as those introduced in Step 1(a)), replace each occurrence of a terminal symbol t by the corresponding nonterminal t' introduced in Step 1(a).
After this step, because the grammar has no λ- or unit-productions, every production's right-hand side is either a single terminal symbol or a string of two or more nonterminal symbols.
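Step 1 can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions not made in the text: a production is represented as a pair (left-hand side, tuple of right-hand-side symbols), the function name `step1` is hypothetical, and the small grammar S → aSb | c is made up for the demonstration.

```python
def step1(productions, terminals):
    """Apply Steps 1(a) and 1(b) to a list of (lhs, rhs_tuple) productions."""
    # Step 1(a): choose the name t' for each terminal t.
    primed = {t: t + "'" for t in sorted(terminals)}
    result = []
    for lhs, rhs in productions:
        if len(rhs) == 1 and rhs[0] in terminals:
            # Right-hand side is a single terminal: leave it unchanged.
            result.append((lhs, rhs))
        else:
            # Step 1(b): replace every terminal t by t'.
            result.append((lhs, tuple(primed.get(s, s) for s in rhs)))
    # Step 1(a): add the productions t' -> t.
    result.extend((primed[t], (t,)) for t in sorted(terminals))
    return result


# Hypothetical grammar: S -> aSb | c
demo = [("S", ("a", "S", "b")), ("S", ("c",))]
for lhs, rhs in step1(demo, {"a", "b", "c"}):
    print(lhs, "->", " ".join(rhs))
# S -> a' S b'
# S -> c
# a' -> a
# b' -> b
# c' -> c
```

Note that, following Step 1(a) literally, a primed nonterminal is introduced for every terminal, even one (such as c here) that Step 1(b) never needs to replace.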
Step 2: Repeat the following until it no longer applies:
For each production A → Bβ, where β is a string of nonterminals of length two or more, replace it by the pair of productions A → BV_β and V_β → β, where V_β is a new nonterminal. (If β has length greater than two, the production V_β → β will itself be replaced by a subsequent application of this rule.)
For example, the production A → BCDE would be replaced, after two iterations of Step 2, by this set of productions:
A → BV_{CDE}
V_{CDE} → CV_{DE}
V_{DE} → DE
Of course, the names of newly-introduced nonterminals are arbitrary, but, for purposes of uniformity, we are using the convention that every newly-introduced nonterminal symbol is named V_α, where the intent is for newly-introduced productions to allow for the derivation V_α ⟹⁺ α.
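The repeated splitting in Step 2 can be sketched as a worklist loop. This is a minimal sketch under assumptions of our own: productions are pairs (left-hand side, tuple of right-hand-side symbols), the function name `step2` is hypothetical, and a new nonterminal is spelled as the string "V_{...}" formed by concatenating the symbols of β (which presumes that concatenation never makes two different strings β collide).

```python
from collections import deque


def step2(productions):
    """Split every right-hand side of length > 2 into a chain of pairs."""
    pending = deque(productions)
    result, seen = [], set()

    def emit(production):
        # Two productions can give rise to the same V_beta -> beta; keep one copy.
        if production not in seen:
            seen.add(production)
            result.append(production)

    while pending:
        lhs, rhs = pending.popleft()
        if len(rhs) > 2:
            head, beta = rhs[0], rhs[1:]
            v_beta = "V_{" + "".join(beta) + "}"
            emit((lhs, (head, v_beta)))     # A -> B V_beta
            pending.append((v_beta, beta))  # V_beta -> beta, may be split again
        else:
            emit((lhs, rhs))
    return result


for lhs, rhs in step2([("A", ("B", "C", "D", "E"))]):
    print(lhs, "->", " ".join(rhs))
# A -> B V_{CDE}
# V_{CDE} -> C V_{DE}
# V_{DE} -> D E
```

Queuing V_β → β for another pass, rather than recursing, mirrors the "repeat until it no longer applies" wording of the step.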
To illustrate the algorithm, we transform the CFG shown below into an equivalent CFG in Chomsky Normal Form. (As required, the given grammar has no λ- or unit-productions.)
S ⟶ aS | Sb | aAbA        (1) (2) (3)
A ⟶ ASbA | ab        (4) (5)
Step 1(a): The given grammar's terminal alphabet is {a,b}, so we introduce nonterminals a' and b' and productions a' → a and b' → b.
Step 1(b): Replacing each occurrence of a (respectively, b) in the productions of the given grammar by a' (resp., b'), we end up with
S ⟶ a'S | Sb' | a'Ab'A        (1') (2') (3')
A ⟶ ASb'A | a'b'        (4') (5')
a' ⟶ a
b' ⟶ b
Step 2: Productions (3') and (4') have right-hand sides of length greater than two, so we must replace them. Doing so, we end up with the following grammar:
S ⟶ a'S | Sb' | a'V_{Ab'A}
A ⟶ AV_{Sb'A} | a'b'
a' ⟶ a
b' ⟶ b
V_{Ab'A} ⟶ AV_{b'A}
V_{Sb'A} ⟶ SV_{b'A}
V_{b'A} ⟶ b'A
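As a sanity check, the final grammar can be tested mechanically against the two shapes that the algorithm is meant to produce: a right-hand side that is a single terminal, or one that is exactly two nonterminals. The sketch below is illustrative only: the function name `is_cnf` and the tuple encoding are assumptions of ours, the subscripted nonterminals are spelled as strings such as "V_{b'A}", and any symbol that appears on a left-hand side is treated as a nonterminal.

```python
def is_cnf(productions, terminals):
    """Return True if every production has the form A -> BC or A -> a."""
    nonterminals = {lhs for lhs, _ in productions}
    for _, rhs in productions:
        single_terminal = len(rhs) == 1 and rhs[0] in terminals
        two_nonterminals = len(rhs) == 2 and all(s in nonterminals for s in rhs)
        if not (single_terminal or two_nonterminals):
            return False
    return True


# The grammar obtained at the end of the worked example.
final_grammar = [
    ("S", ("a'", "S")), ("S", ("S", "b'")), ("S", ("a'", "V_{Ab'A}")),
    ("A", ("A", "V_{Sb'A}")), ("A", ("a'", "b'")),
    ("a'", ("a",)), ("b'", ("b",)),
    ("V_{Ab'A}", ("A", "V_{b'A}")),
    ("V_{Sb'A}", ("S", "V_{b'A}")),
    ("V_{b'A}", ("b'", "A")),
]
print(is_cnf(final_grammar, {"a", "b"}))  # True
```

The check is purely syntactic; it does not confirm that the new grammar generates the same language as the original one.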