HW #4 CMPS 260 Spring 2024 Sample Solutions

CMPS 260 Spring 2024
Homework #4: Regular Languages/Expressions & Finite Transducers
Sample Solutions

1. Let M = (Q, Σ, δ, q₀, F) be a DFA such that L = L(M) (i.e., M accepts language L).

Consider the language

L' = { x ∈ L | for all y,z ∈ Σ*, if x = yz and |z|>0 then y ∉ L }

In words, x is a member of L' if and only if x is a member of L but no proper prefix of x is a member of L.

Describe how to modify M so as to obtain a DFA M' that accepts L'.

Solution: To obtain M' from M, simply make every outgoing transition from every accepting state go to the dead state.
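The construction can be sketched in a few lines of Python. This is only an illustration: the dictionary encoding of δ, the fresh state name DEAD, and the example DFA (bit strings ending in 1, for which L' should be 0*1) are assumptions made here, not part of the problem.

```python
from itertools import product

def prefix_free_dfa(alphabet, delta, accepting, dead="DEAD"):
    """Build the transition function of M' from that of M.

    delta maps (state, symbol) -> state. Every transition leaving an
    accepting state is redirected to `dead`, a fresh state name assumed
    not to occur in M. Start state and accepting set stay unchanged.
    """
    states = {q for (q, _) in delta} | {dead}
    return {
        (q, a): dead if (q in accepting or q == dead) else delta[(q, a)]
        for q in states
        for a in alphabet
    }

def accepts(delta, start, accepting, word):
    q = start
    for a in word:
        q = delta[(q, a)]
    return q in accepting

# Hypothetical M: accepts bit strings ending in 1, so L' should be 0*1.
ALPHABET = "01"
DELTA = {("A", "0"): "A", ("A", "1"): "B", ("B", "0"): "A", ("B", "1"): "B"}
START, ACCEPTING = "A", {"B"}
DELTA_PRIME = prefix_free_dfa(ALPHABET, DELTA, ACCEPTING)

def in_L_prime(word):
    """Direct definition: word is in L but no proper prefix of it is."""
    return accepts(DELTA, START, ACCEPTING, word) and not any(
        accepts(DELTA, START, ACCEPTING, word[:i]) for i in range(len(word))
    )

# Compare M' against the definition of L' on all bit strings of length <= 8.
words = ["".join(t) for n in range(9) for t in product(ALPHABET, repeat=n)]
print(all(accepts(DELTA_PRIME, START, ACCEPTING, w) == in_L_prime(w) for w in words))
```

Such an exhaustive check on short strings is no substitute for the lemmas below, but it is a quick way to catch a faulty construction.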

That L(M') = L' follows from the following two lemmas.

Lemma 1.1: L' ⊆ L(M').
Let x ∈ L'. By definition of L', this means that x ∈ L(M) but no proper prefix of x is in L(M). This means that, in M, the sequence of transitions beginning at q₀ and spelling out x ends in an accepting state but otherwise does not involve any accepting states. (That is, the only accepting state along that path is the one at the end.) By the construction of M', the same sequence of transitions exists within it (because, for every nonaccepting state in M, all of its outgoing transitions are also found in M'). Hence, x ∈ L(M'). ■

Lemma 1.2: L(M') ⊆ L'.
Let x ∈ L(M'). Then, in M', the sequence of transitions beginning in q₀ and spelling out x ends in an accepting state. Now, every transition in that sequence emanates from a non-accepting state, because in M' all transitions emanating from accepting states go to the dead state. But all transitions emanating from non-accepting states in M' also exist in M. Hence, the same sequence of transitions exists in M, which means that M accepts x, too. Because none of the states along that path are accepting (except the one at the end), no proper prefix of x is accepted by M. It follows that x ∈ L'. ■

One "incorrect solution": A few students offered an answer that at first seems plausible but turns out to be flawed. Their construction was to obtain M' by changing to nonaccepting the status of every accepting state in M that can be reached by a nonempty sequence of transitions from some accepting state. Suppose, for example, that both x and xy are accepted by M, where |y| > 0. Then in M we have q₀ ⇝^x p ⇝^y r for some accepting states p and r. By the construction, r is nonaccepting in M', thereby correctly preventing M' from accepting xy. The flaw is that there could be a string z such that q₀ ⇝^z r (implying that z is accepted by M) but having no proper prefix that is accepted by M. Then we have z ∈ L' but z ∉ L(M') (the latter because state r is nonaccepting in M').
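A concrete instance of the flaw can be checked mechanically. The DFA below is a made-up example (an assumption, not taken from any student's answer) with L(M) = {a, b, ab}, so that x = a, y = b, and z = b play the roles described above.

```python
from itertools import product

ALPHABET = "ab"
# Hypothetical DFA M with L(M) = {a, b, ab}; "d" is the dead state.
DELTA = {
    ("q0", "a"): "p", ("q0", "b"): "r",
    ("p", "a"): "d",  ("p", "b"): "r",
    ("r", "a"): "d",  ("r", "b"): "d",
    ("d", "a"): "d",  ("d", "b"): "d",
}
START, ACCEPTING = "q0", {"p", "r"}

def run(word):
    q = START
    for ch in word:
        q = DELTA[(q, ch)]
    return q

def reachable_from_accepting():
    """States reachable from some accepting state by a nonempty path."""
    frontier = [DELTA[(q, a)] for q in ACCEPTING for a in ALPHABET]
    seen = set()
    while frontier:
        q = frontier.pop()
        if q not in seen:
            seen.add(q)
            frontier.extend(DELTA[(q, a)] for a in ALPHABET)
    return seen

# The flawed construction: r loses its accepting status, leaving only p.
flawed_accepting = ACCEPTING - reachable_from_accepting()

def in_L_prime(word):
    in_L = run(word) in ACCEPTING
    return in_L and all(run(word[:i]) not in ACCEPTING for i in range(len(word)))

words = ["".join(t) for n in range(4) for t in product(ALPHABET, repeat=n)]
missed = [w for w in words if in_L_prime(w) and run(w) not in flawed_accepting]
print(missed)  # ['b']
```

Here b ∈ L' (its only proper prefix, the empty string, is not accepted), yet the flawed machine rejects it because r was made nonaccepting.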

2. Let M and N be a pair of DFAs with the same input alphabet Σ. Let their state sets be, respectively, Q_M and Q_N, and let their sets of accepting states be, respectively, F_M and F_N. Suppose that MN is the "trimmed" DFA obtained by applying the Cartesian Product construction to M and N. (For MN to be trimmed means that its state set is that subset of Q_M × Q_N containing only those states reachable from its start state.)

(a) Describe how MN can be used to determine whether or not L(M) = L(N).

(b) Describe how MN can be used to determine whether or not L(M) ⊆ L(N).

Solution: For part (a), L(M) = L(N) iff for every state [p,q] in MN, p ∈ F_M iff q ∈ F_N. That is, either both p and q are accepting or both are non-accepting (in their respective DFAs).

For part (b), L(M) ⊆ L(N) iff for every state [p,q] in MN, p ∈ F_M implies q ∈ F_N. That is, if p is accepting (in M), then q must be accepting (in N).
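Both tests amount to one pass over the reachable states of MN. The sketch below assumes DFAs encoded as dictionaries from (state, symbol) to state; the two example machines (number of 1's divisible by four for M, even for N) are hypothetical and chosen so that L(M) ⊆ L(N) but L(M) ≠ L(N).

```python
def trimmed_product(alphabet, delta_m, start_m, delta_n, start_n):
    """Reachable states [p, q] of the Cartesian product of two DFAs."""
    start = (start_m, start_n)
    seen, frontier = {start}, [start]
    while frontier:
        p, q = frontier.pop()
        for a in alphabet:
            nxt = (delta_m[(p, a)], delta_n[(q, a)])
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def same_language(pairs, f_m, f_n):
    """Part (a): p is accepting iff q is accepting, for every [p, q]."""
    return all((p in f_m) == (q in f_n) for p, q in pairs)

def is_subset(pairs, f_m, f_n):
    """Part (b): p accepting implies q accepting, for every [p, q]."""
    return all(q in f_n for p, q in pairs if p in f_m)

ALPHABET = "01"
# Hypothetical M: number of 1's divisible by 4. Hypothetical N: number of 1's even.
DELTA_M = {(i, a): (i + int(a)) % 4 for i in range(4) for a in ALPHABET}
DELTA_N = {(i, a): (i + int(a)) % 2 for i in range(2) for a in ALPHABET}
F_M, F_N = {0}, {0}

PAIRS = trimmed_product(ALPHABET, DELTA_M, 0, DELTA_N, 0)
print(same_language(PAIRS, F_M, F_N))  # False
print(is_subset(PAIRS, F_M, F_N))      # True
```

Restricting attention to reachable states matters: an unreachable pair [p,q] with p accepting and q nonaccepting says nothing about any actual input string.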

3. Present a finite transducer that, given a bit string x as input, produces as output the bit string obtained by compressing every run in x to a run having length equal to the minimum of two and the length of that run. (A run is a substring of repeated symbols, extending to the left and right until reaching a different symbol or a string boundary.)

For example, if the input were 00001011111001 (i.e., 0⁴101⁵0²1), the resulting output would be 0²101²0²1. Notice how the runs of lengths four and five were truncated to runs of length two.

Solution:
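One machine meeting the specification can be simulated as follows. This is a sketch under assumed state names (a start state, plus a state (s, 1) or (s, 2) recording the current run's symbol s and whether one or at least two of its symbols have been read); it is not necessarily the machine of the original diagram.

```python
def compress_runs(bits):
    """Simulate a finite transducer that caps every run at length two."""
    state = "start"            # other states: (symbol, 1) and (symbol, 2)
    out = []
    for b in bits:
        if state == "start" or state[0] != b:
            state = (b, 1)     # a new run begins: emit its first symbol
            out.append(b)
        elif state[1] == 1:
            state = (b, 2)     # second symbol of the run: still emitted
            out.append(b)
        # otherwise: third or later symbol of the run, emit nothing, stay put
    return "".join(out)

print(compress_runs("00001011111001"))  # 001011001
```

The printed string is 0²101²0²1, in agreement with the example above. Five states suffice because the machine never needs to count past two.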

4. Present a finite transducer that performs addition on a pair of binary numerals written "backwards". For this purpose, the input alphabet will be Σ ∪ {#}, where Σ = { [0,0], [0,1], [1,0], [1,1]}. The output alphabet is {0,1}.

A string x#, where x ∈ Σ⁺, is intended to represent an instance of the binary addition problem, but with the bits written from least significant to most significant. The figure below shows an instance of a binary addition problem (on the left) and how that instance corresponds to the input string received and output string produced by the intended finite transducer (on the right).

  101101
+ 000111
  ------
  110100
Input: [1,1][0,1][1,1][1,0][0,0][1,0]#
Output: 001011

Solution: The machine will be in state 0 (respectively, 1) when the carry coming into the "current" column is zero (respectively one). The # is an "end-of-input" marker, which is significant here because if processing the last pair of bits results in the machine being in the "carry in is 1" state, then to produce the correct answer it is necessary for the machine to output 1 before the computation terminates.
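The two-state machine described above can be simulated directly. In this sketch the input is encoded as a Python list of bit pairs followed by the string "#" (an encoding chosen here for convenience), and it is assumed that reading # in the carry-0 state produces no output, which is consistent with the six-bit output in the example.

```python
def add_transducer(symbols):
    """Simulate the adder: the state is the carry into the current column."""
    carry = 0
    out = []
    for sym in symbols:
        if sym == "#":
            if carry == 1:
                out.append("1")    # flush the final carry
            break                  # assumed: nothing is emitted when carry is 0
        a, b = sym
        total = a + b + carry
        out.append(str(total % 2)) # sum bit for this column
        carry = total // 2         # next state
    return "".join(out)

example = [(1, 1), (0, 1), (1, 1), (1, 0), (0, 0), (1, 0), "#"]
print(add_transducer(example))  # 001011
```

Reversing the output gives 110100, the sum shown in the figure.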

5. Present a regular expression r such that the language that it describes, L(r), is the complement of the language L, where

L = { 0^{2m+1}1ⁿ | m≥0 ∧ n≥1 }.

L contains precisely those bit strings beginning with an odd number of 0's followed by at least one 1 and is described by the regular expression 0(00)^*11^*.

Solution #1: To solve such a problem, we should identify every kind of way that a string over {0,1} could fail to be a member of L. For each such kind of "failure", we should devise a regular expression that describes all the strings of that kind. The solution would then be the regular expression formed by taking the union of (i.e., applying the + operator upon) the regular expressions just mentioned.

Now, the members of L are those bit strings that are composed of an odd-length sequence of 0's followed by a non-empty sequence of 1's. A comprehensive list of bit strings that are "non-conforming" is

bit strings having no occurrences of 1. A regular expression for these is R₁ = 0^*.
bit strings having 10 as a substring. A regular expression for these is R₂ = (0+1)^*·10·(0+1)^*.
bit strings of the form 0^{2k}1ⁿ (that are of the correct form, except for the number of 0's being even). A regular expression for these is R₃ = (00)^*1^*.

A solution, then, is R₁ + R₂ + R₃, which in gory detail is

0^* + (0+1)^*·10·(0+1)^* + (00)^*1^*
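As a sanity check, the claim that this expression describes the complement of L can be tested exhaustively on short strings. The sketch below uses Python's re module, writing the union operator + as | and (0+1) as [01]; the length bound of 10 is an arbitrary choice.

```python
import re
from itertools import product

L = re.compile(r"0(00)*11*")                 # the language L itself
R = re.compile(r"0*|[01]*10[01]*|(00)*1*")   # R1 + R2 + R3

def mismatches(max_len):
    """Bit strings of length <= max_len that are in both or in neither."""
    bad = []
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            w = "".join(t)
            if bool(L.fullmatch(w)) == bool(R.fullmatch(w)):
                bad.append(w)
    return bad

print(mismatches(10))  # []
```

An empty list means that every bit string of length at most 10 lies in exactly one of L and L(R₁ + R₂ + R₃).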

Solution #2: A small number of students took the approach of devising a DFA M that accepts L, then flipping its accepting/nonaccepting states to obtain a DFA M_c accepting L^c (the complement of L). What remains is to translate M_c into a regular expression. As we saw earlier in the course, there is an algorithm by which to do that, but it typically produces very unwieldy regular expressions. Instead, here we rely upon intuition.

[Figure: DFA M accepting L; DFA M_c accepting L^c]

The "dead" state in M is "immortal" in M_c, which explains the name change. Suppose that we could devise, for each state q, a regular expression R_q whose language is precisely that set of strings that take the initial state to state q. There are three accepting states in M_c, namely Even, Odd, and Immortal. Thus, a solution would be

R_Even + R_Odd + R_Immortal

Clearly, R_Even = (00)^* is a good choice. The only transition into state Odd is the one labeled 0 coming from Even. Hence, R_Odd = R_Even·0 = (00)^*0.

The only transitions into Odd_1 are the ones labeled 1 from Odd and itself. Hence, R_{Odd_1} = R_Odd·11^* = (00)^*01⁺.

Finally, the immortal state has transitions into it from Even and Odd_1, respectively, labeled 1 and 0, plus transitions from itself on both 0 and 1. It follows that

R_Immortal = (R_Even·1 + R_{Odd_1}·0)(0+1)^* = ((00)^*1 + (00)^*01⁺0)(0+1)^*

Recall that a solution is R_Even + R_Odd + R_Immortal. Clearly, R_Even + R_Odd = 0^*. Thus, a simplified solution is 0^* + R_Immortal. In gory detail, that's

0^* + ((00)^*1 + (00)^*01⁺0)(0+1)^*
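The derivation can be checked by simulating M_c and comparing it with the final expression on all short bit strings. The transition table below is reconstructed from the description of the states in the text (including the transition from Odd back to Even on 0, which is implied by R_Even = (00)^*), so it should be read as an assumption about the diagram.

```python
import re
from itertools import product

# Transition function of M_c, reconstructed from the text (an assumption).
DELTA = {
    ("Even", "0"): "Odd",          ("Even", "1"): "Immortal",
    ("Odd", "0"): "Even",          ("Odd", "1"): "Odd_1",
    ("Odd_1", "0"): "Immortal",    ("Odd_1", "1"): "Odd_1",
    ("Immortal", "0"): "Immortal", ("Immortal", "1"): "Immortal",
}
ACCEPTING_MC = {"Even", "Odd", "Immortal"}

# 0^* + ((00)^*1 + (00)^*01+0)(0+1)^* in Python's regex syntax.
R = re.compile(r"0*|((00)*1|(00)*01+0)[01]*")

def accepted_by_mc(w):
    q = "Even"
    for ch in w:
        q = DELTA[(q, ch)]
    return q in ACCEPTING_MC

words = ["".join(t) for n in range(11) for t in product("01", repeat=n)]
print(all(accepted_by_mc(w) == bool(R.fullmatch(w)) for w in words))
```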

6. Present a regular expression describing the language over {a,b} containing precisely those strings having exactly two occurrences of bb and no occurrences of bbb. As an aid, provided is the regular expression

(a + ba)^*(b + λ)

describing the language over {a,b} containing precisely those strings having no occurrences of bb.

For a small bonus, present a second regular expression describing the language as described above, but allowing for there to be one occurrence of bbb that counts as two overlapping occurrences of bb.

Solution: (a + ba)^*·bba·(a + ba)^*·bb·(a + ab)^*

The logic is that every string of the desired form is composed, in order, of

a string of length zero or more in which every occurrence of b is immediately followed by a (giving rise to (a + ba)^*)
the string bba
a string of length zero or more in which every occurrence of b is immediately followed by a (giving rise to (a + ba)^*)
the string bb
a string of length zero or more in which every occurrence of b is immediately preceded by a (giving rise to (a + ab)^*)

Instead of thinking of the first occurrence of bb as having to be immediately followed by an a, we could instead think of the second occurrence of bb as having to be immediately preceded by an a. Taking this point of view, we get the equivalent solution

(a + ba)^*·bb·(a + ab)^*·abb·(a + ab)^*
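Both expressions can be compared against the defining property on all short strings. In this sketch the + operator is written as | and the helper names are our own; the length bound of 10 is arbitrary.

```python
import re
from itertools import product

R1 = re.compile(r"(a|ba)*bba(a|ba)*bb(a|ab)*")
R1_ALT = re.compile(r"(a|ba)*bb(a|ab)*abb(a|ab)*")

def count_bb(w):
    """Number of (possibly overlapping) occurrences of bb in w."""
    return sum(1 for i in range(len(w) - 1) if w[i:i + 2] == "bb")

def in_language(w):
    return count_bb(w) == 2 and "bbb" not in w

words = ["".join(t) for n in range(11) for t in product("ab", repeat=n)]
print(all(bool(R1.fullmatch(w)) == in_language(w) for w in words))
print(all(bool(R1_ALT.fullmatch(w)) == in_language(w) for w in words))
```

Agreement on every string of length at most 10 supports, but does not prove, that the two expressions are equivalent and correct.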

For the bonus problem, we form a regular expression that describes all strings in which bbb occurs exactly once. Such a string would be composed, in order, of

a string of length zero or more in which every occurrence of b is immediately followed by a (giving rise to (a + ba)^*)
the string bbb
a string of length zero or more in which every occurrence of b is immediately preceded by a (giving rise to (a + ab)^*)

The resulting regular expression is (a + ba)^*·bbb·(a + ab)^*.

Abbreviating our solution to the non-bonus part of this problem as R₁ and the regular expression immediately above as R₂, a complete answer to the bonus question is R₁ + R₂.

Notice that both R₁ and R₂ have R_lead = (a + ba)^* as their "leading" factor and R_trail = (a + ab)^* as their "trailing" factor. Let S₁ = bba·(a + ba)^*·bb and S₂ = bbb, so that R₁ = R_lead·S₁·R_trail and R₂ = R_lead·S₂·R_trail.

Then we can "factor" out the common leading and trailing factors and rewrite R₁ + R₂ as R_lead·(S₁ + S₂)·R_trail.

In gory detail (without using the abbreviations), that is

(a + ba)^*·(bba·(a + ba)^*·bb + bbb)·(a + ab)^*
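With overlapping occurrences counted, the bonus language is simply the set of strings containing exactly two occurrences of bb: a single bbb contributes two, while bbbb would already contribute three. The sketch below (helper names and the length bound are our own choices) checks the factored expression against that characterization.

```python
import re
from itertools import product

R_BONUS = re.compile(r"(a|ba)*(bba(a|ba)*bb|bbb)(a|ab)*")

def count_bb(w):
    """Number of occurrences of bb in w, overlapping ones included."""
    return sum(1 for i in range(len(w) - 1) if w[i:i + 2] == "bb")

words = ["".join(t) for n in range(11) for t in product("ab", repeat=n)]
print(all(bool(R_BONUS.fullmatch(w)) == (count_bb(w) == 2) for w in words))
```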

7. Present regular expressions for each of the following languages:

(a) { ba^{2k}b | k≥0 }
Solution: b(aa)^*b

(b) The set of strings over the alphabet {a,b} in which an even number of a's appear between any two occurrences of b.
Solution: a^*(b + aa)^* a^*

(c) { xby | x,y ∈ {a,c}^* ∧ #_a(y) is even }
Solution: (a+c)^* b (c + ac^*a)^*

Numerous students had a similar answer, except for having aa instead of ac^*a. The flaw there is that it "forces" a's to occur in adjacent pairs (without any c's in between). The same students typically made the same mistake (not surprisingly) in parts (d) and (e).
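The difference between the two answers shows up on a short string. The sketch below checks the solution to part (c) against the definition on all strings of length at most 7 and then searches for the shortest member of the language that the aa variant misses; the helper names and the length bound are our own choices.

```python
import re
from itertools import product

GOOD = re.compile(r"[ac]*b(c|ac*a)*")
FLAWED = re.compile(r"[ac]*b(c|aa)*")   # the variant with aa in place of ac*a

def in_language(w):
    """w = xby with x, y over {a,c} and an even number of a's in y."""
    x, sep, y = w.partition("b")
    return sep == "b" and "b" not in y and y.count("a") % 2 == 0

# Strings are generated in order of increasing length.
words = ["".join(t) for n in range(8) for t in product("abc", repeat=n)]
print(all(bool(GOOD.fullmatch(w)) == in_language(w) for w in words))
missed = [w for w in words if in_language(w) and not FLAWED.fullmatch(w)]
print(missed[0])  # baca
```

In baca the two a's following the b are separated by a c, which (c + aa)^* cannot generate.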

(d) The set of strings over the alphabet {a,b,c} in which an even number of a's appear between any two occurrences of b.
Solution: (a+c)^* (b + c + ac^*a)^* (a+c)^*
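The same kind of exhaustive check applies here. The sketch reads "between any two occurrences of b" as a condition on each pair of consecutive b's, which implies the condition for every pair; the helper name and the length bound are our own choices.

```python
import re
from itertools import product

D = re.compile(r"[ac]*(b|c|ac*a)*[ac]*")

def in_language(w):
    """Every segment strictly between consecutive b's has an even number of a's."""
    gaps = w.split("b")[1:-1]   # empty when w has fewer than two b's
    return all(g.count("a") % 2 == 0 for g in gaps)

words = ["".join(t) for n in range(8) for t in product("abc", repeat=n)]
print(all(bool(D.fullmatch(w)) == in_language(w) for w in words))
```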

(e) The set of strings over the alphabet {a,b,c} in which either

an even number of a's appear between any two occurrences of b, or
an even number of b's appear between any two occurrences of a

Solution: One solution is just that of part (d) unioned with the same expression, except that the roles of a and b are reversed:

(a+c)^* (b + c + ac^*a)^* (a+c)^* + (b+c)^* (a + c + bc^*b)^* (b+c)^*
