CMPS 260 Spring 2024
Homework #4: Regular Languages/Expressions & Finite Transducers
Sample Solutions

1. Let M = (Q, Σ, δ, q0, F) be a DFA such that L = L(M) (i.e., M accepts language L).

Consider the language

L' = { x ∈ L  |  for all y,z ∈ Σ*, if x = yz and |z|>0 then y ∉ L }

In words, x is a member of L' if and only if x is a member of L but no proper prefix of x is a member of L.

Describe how to modify M so as to obtain a DFA M' that accepts L'.

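Solution: To obtain M' from M, simply make every outgoing transition from every accepting state go to the dead state (a nonaccepting state, added if M lacks one, all of whose outgoing transitions lead back to itself).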

That L(M') = L' follows from the following two lemmas.

Lemma 1.1: L' ⊆ L(M').
Let x ∈ L'. By definition of L', this means that x ∈ L(M) but no proper prefix of x is in L(M). Thus, in M, the sequence of transitions beginning at q0 and spelling out x ends in an accepting state but otherwise does not involve any accepting states. (That is, the only accepting state along that path is the one at the end.) By the construction of M', the same sequence of transitions exists within it (because, for every nonaccepting state in M, all of its outgoing transitions are also found in M'). Hence, x ∈ L(M'). ■

Lemma 1.2: L(M') ⊆ L'.
Let x ∈ L(M'). Then, in M', the sequence of transitions beginning in q0 and spelling out x ends in an accepting state. Now, every transition in that sequence emanates from a non-accepting state, because in M' all transitions emanating from accepting states go to the dead state. But all transitions emanating from non-accepting states in M' also exist in M. Hence, the same sequence of transitions exists in M, which means that M accepts x, too. Because none of the states along that path are accepting (except the one at the end), no proper prefix of x is accepted by M. It follows that x ∈ L'. ■
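
The construction and the two lemmas can be checked by brute force on a small machine. The following is a minimal sketch, assuming a DFA stored as a dictionary from (state, symbol) pairs to states; the example machine, the function names, and the state name "DEAD" are all hypothetical and chosen only for illustration.

```python
from itertools import product

def redirect_accepting(delta, accepting, alphabet, dead="DEAD"):
    """Build the transition table of M' from that of M.

    Every transition leaving an accepting state is sent to `dead`, a
    nonaccepting trap state (the name "DEAD" is an assumption of this sketch).
    """
    new_delta = {
        (state, sym): (dead if state in accepting else target)
        for (state, sym), target in delta.items()
    }
    for sym in alphabet:
        new_delta[(dead, sym)] = dead  # the dead state loops on every symbol
    return new_delta

def accepts(delta, start, accepting, word):
    state = start
    for sym in word:
        state = delta[(state, sym)]
    return state in accepting

def in_l_prime(delta, start, accepting, word):
    """Test the definition of L' directly: word is in L, no proper prefix is."""
    return accepts(delta, start, accepting, word) and not any(
        accepts(delta, start, accepting, word[:i]) for i in range(len(word))
    )

# Hypothetical M over {0,1}: accepts the strings containing at least one 1.
ALPHABET = "01"
DELTA = {("s", "0"): "s", ("s", "1"): "t", ("t", "0"): "t", ("t", "1"): "t"}
START, ACCEPTING = "s", {"t"}
DELTA_PRIME = redirect_accepting(DELTA, ACCEPTING, ALPHABET)

# Strings (up to length 7) on which M' and the definition of L' disagree.
mismatches = [
    w
    for n in range(8)
    for w in map("".join, product(ALPHABET, repeat=n))
    if accepts(DELTA_PRIME, START, ACCEPTING, w)
    != in_l_prime(DELTA, START, ACCEPTING, w)
]
```

For this example L' is the language of 0*1, and the list of mismatches should be empty. A finite check of this kind only supports the lemmas; it does not replace them.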

One "incorrect solution": A few students offered an answer that at first seems plausible but turns out to be flawed. Their construction was to obtain M' by changing to nonaccepting the status of every accepting state in M that can be reached by a nonempty sequence of transitions from some accepting state. Suppose, for example, that both x and xy are accepted by M, where |y| > 0. Then in M we have q0 ⇝x p ⇝y r for some accepting states p and r. By the construction, r is nonaccepting in M', thereby correctly preventing M' from accepting xy. The flaw is that there could be a string z such that q0 ⇝z r (implying that z is accepted by M) but such that no proper prefix of z is accepted by M. Then we have z ∈ L' but z ∉ L(M') (the latter because state r is nonaccepting in M').
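
A concrete counterexample makes the flaw visible. The sketch below assumes a hypothetical four-state DFA over {a,b} accepting {a, ab, b}, in which the accepting state r is reached both through the accepting state p (on ab) and directly from q0 (on b); the state names and helper functions are illustrative only.

```python
def accepts(delta, start, accepting, word):
    state = start
    for sym in word:
        state = delta[(state, sym)]
    return state in accepting

ALPHABET = "ab"
DELTA = {
    ("q0", "a"): "p", ("q0", "b"): "r",
    ("p", "a"): "dead", ("p", "b"): "r",
    ("r", "a"): "dead", ("r", "b"): "dead",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
START, ACCEPTING = "q0", {"p", "r"}

def reachable_nonempty(source):
    """States reachable from `source` by one or more transitions."""
    seen, frontier = set(), [DELTA[(source, s)] for s in ALPHABET]
    while frontier:
        state = frontier.pop()
        if state not in seen:
            seen.add(state)
            frontier.extend(DELTA[(state, s)] for s in ALPHABET)
    return seen

# The flawed construction: demote every accepting state that is reachable
# from some accepting state by a nonempty path.
demoted = {f for f in ACCEPTING
           if any(f in reachable_nonempty(g) for g in ACCEPTING)}
flawed_accepting = ACCEPTING - demoted

def in_l_prime(word):
    """word is accepted by M and no proper prefix of it is."""
    return accepts(DELTA, START, ACCEPTING, word) and not any(
        accepts(DELTA, START, ACCEPTING, word[:i]) for i in range(len(word))
    )

z = "b"
z_in_l_prime = in_l_prime(z)                                  # True
z_accepted_by_flawed = accepts(DELTA, START, flawed_accepting, z)  # False
```

Here r is demoted because of the path from p, so the flawed machine rejects z = b even though b belongs to L'.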


2. Let M and N be a pair of DFAs with the same input alphabet Σ. Let their state sets be, respectively, QM and QN, and let their sets of accepting states be, respectively, FM and FN. Suppose that MN is the "trimmed" DFA obtained by applying the Cartesian Product construction to M and N. (For MN to be trimmed means that its state set is that subset of QM × QN containing only those states reachable from its start state.)

(a) Describe how MN can be used to determine whether or not L(M) = L(N).

(b) Describe how MN can be used to determine whether or not L(M) ⊆ L(N).

Solution: For part (a), L(M) = L(N) iff for every state [p,q] in MN, p ∈ FM iff q ∈ FN. That is, either both p and q are accepting or both are non-accepting (in their respective DFAs).

For part (b), L(M) ⊆ L(N) iff for every state [p,q] in MN, p ∈ FM implies q ∈ FN. That is, if p is accepting (in M), then q must be accepting (in N).
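
Both tests amount to a scan over the reachable pairs. The following is a minimal sketch, assuming each DFA is given as a dictionary from (state, symbol) to state; the three example machines and all function names are hypothetical.

```python
def reachable_pairs(delta_m, start_m, delta_n, start_n, alphabet):
    """State set of the trimmed product: pairs reachable from the start pair."""
    seen = {(start_m, start_n)}
    frontier = [(start_m, start_n)]
    while frontier:
        p, q = frontier.pop()
        for sym in alphabet:
            pair = (delta_m[(p, sym)], delta_n[(q, sym)])
            if pair not in seen:
                seen.add(pair)
                frontier.append(pair)
    return seen

def languages_equal(pairs, final_m, final_n):
    """Part (a): p is accepting iff q is accepting, for every reachable [p,q]."""
    return all((p in final_m) == (q in final_n) for p, q in pairs)

def language_subset(pairs, final_m, final_n):
    """Part (b): p accepting implies q accepting, for every reachable [p,q]."""
    return all(q in final_n for p, q in pairs if p in final_m)

ALPHABET = "01"
# M: even number of 1's.  N: all strings.  K: number of 1's mod 4 is 0 or 2.
DELTA_M = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}
DELTA_N = {("u", "0"): "u", ("u", "1"): "u"}
DELTA_K = {(i, s): (i + 1) % 4 if s == "1" else i
           for i in range(4) for s in ALPHABET}
FINAL_M, FINAL_N, FINAL_K = {"e"}, {"u"}, {0, 2}

pairs_mn = reachable_pairs(DELTA_M, "e", DELTA_N, "u", ALPHABET)
pairs_nm = reachable_pairs(DELTA_N, "u", DELTA_M, "e", ALPHABET)
pairs_mk = reachable_pairs(DELTA_M, "e", DELTA_K, 0, ALPHABET)
```

In this example L(M) is a proper subset of L(N), while M and K accept the same language even though they have different numbers of states.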


3. Present a finite transducer that, given a bit string x as input, produces as output the bit string obtained by compressing every run in x to a run having length equal to the minimum of two and the length of that run. (A run is a substring of repeated symbols, extending to the left and right until reaching a different symbol or a string boundary.)

For example, if the input were 00001011111001 (i.e., 0^4 1 0 1^5 0^2 1), the resulting output would be 0^2 1 0 1^2 0^2 1. Notice how the runs of lengths four and five were truncated to runs of length two.

Solution:
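
One machine meeting this specification can be sketched as a transition table. This is an illustrative five-state design, not necessarily the intended answer: the state names S, Z1, Z2, O1, O2 are assumptions, standing for "start", "one 0 seen in the current run", "two or more 0's seen", and likewise for 1's. Each entry maps (state, input symbol) to (next state, output string).

```python
# (state, input symbol) -> (next state, output string)
TRANSITIONS = {
    ("S", "0"): ("Z1", "0"),  ("S", "1"): ("O1", "1"),
    ("Z1", "0"): ("Z2", "0"), ("Z1", "1"): ("O1", "1"),
    ("Z2", "0"): ("Z2", ""),  ("Z2", "1"): ("O1", "1"),  # third 0 onward: no output
    ("O1", "0"): ("Z1", "0"), ("O1", "1"): ("O2", "1"),
    ("O2", "0"): ("Z1", "0"), ("O2", "1"): ("O2", ""),   # third 1 onward: no output
}

def compress_runs(bits):
    """Run the transducer on `bits` and return the concatenated output."""
    state, out = "S", []
    for sym in bits:
        state, emitted = TRANSITIONS[(state, sym)]
        out.append(emitted)
    return "".join(out)

example_output = compress_runs("00001011111001")
```

On the example input the machine emits 001011001, which is 0^2 1 0 1^2 0^2 1.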


4. Present a finite transducer that performs addition on a pair of binary numerals written "backwards". For this purpose, the input alphabet will be Σ ∪ {#}, where Σ = { [0,0], [0,1], [1,0], [1,1]}. The output alphabet is {0,1}.

A string x#, where x ∈ Σ+, is intended to represent an instance of the binary addition problem, but with the bits written from least significant to most significant. The figure below shows an instance of a binary addition problem (on the left) and how that instance corresponds to the input string received and output string produced by the intended finite transducer (on the right).

  101101
+ 000111
  ------
  110100
Input: [1,1][0,1][1,1][1,0][0,0][1,0]#
Output: 001011

Solution: The machine will be in state 0 (respectively, 1) when the carry coming into the "current" column is zero (respectively, one). The # is an "end-of-input" marker, which is significant here because if processing the last pair of bits results in the machine being in the "carry in is 1" state, then to produce the correct answer it is necessary for the machine to output 1 before the computation terminates. (If the machine is in the "carry in is 0" state when # arrives, it outputs nothing, as in the example above.)
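
The two-state machine just described can be simulated directly. In this sketch the input is assumed to be a Python list of bit pairs followed by the string "#"; that encoding and the function name are illustrative choices, not part of the problem statement.

```python
def add_reversed(symbols, end_marker="#"):
    """Simulate the adder: the state is the carry into the current column."""
    carry = 0
    out = []
    for item in symbols:
        if item == end_marker:
            if carry:
                out.append("1")  # flush the final carry
            break
        a, b = item
        total = a + b + carry
        out.append(str(total % 2))  # output bit for this column
        carry = total // 2          # next state
    return "".join(out)

example = [(1, 1), (0, 1), (1, 1), (1, 0), (0, 0), (1, 0), "#"]
example_output = add_reversed(example)
```

For the instance in the figure (101101 + 000111) the output is 001011, which is 110100 written backwards. On the input [1,1]# the machine emits 0 and then, on reading #, the pending carry 1.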


5. Present a regular expression r such that the language that it describes, L(r), is the complement of the language L, where

L = { 0^(2m+1) 1^n  |  m≥0 ∧ n≥1 }.

L contains precisely those bit strings beginning with an odd number of 0's followed by at least one 1 and is described by the regular expression 0(00)*11*.

Solution #1: To solve such a problem, we should identify every kind of way that a string over {0,1} could fail to be a member of L. For each such kind of "failure", we should devise a regular expression that describes all the strings of that kind. The solution would then be the regular expression formed by taking the union of (i.e., applying the + operator upon) the regular expressions just mentioned.

Now, the members of L are those bit strings that are composed of an odd-length sequence of 0's followed by a non-empty sequence of 1's. A comprehensive list of bit strings that are "non-conforming" is

  1. bit strings having no occurrences of 1. A regular expression for these is R1 = 0*.
  2. bit strings having 10 as a substring. A regular expression for these is R2 = (0+1)*·10·(0+1)*.
  3. bit strings of the form 0^(2k) 1^n (that are of the correct form, except for the number of 0's being even). A regular expression for these is R3 = (00)*1*.

A solution, then, is R1 + R2 + R3, which in gory detail is

0*  +  (0+1)*·10·(0+1)*  +  (00)*1*
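
A quick sanity check is to confirm that every short bit string matches exactly one of the two expressions. The sketch below uses Python's re module, where union is written | rather than +; the length bound of 10 is an arbitrary choice, so this is evidence rather than proof.

```python
import re
from itertools import product

L_REGEX = re.compile(r"0(00)*11*")
COMPLEMENT_REGEX = re.compile(r"0*|(0|1)*10(0|1)*|(00)*1*")

def all_bit_strings(max_len):
    for n in range(max_len + 1):
        for w in product("01", repeat=n):
            yield "".join(w)

# A string is a disagreement if it matches both expressions or neither.
disagreements = [
    w for w in all_bit_strings(10)
    if bool(L_REGEX.fullmatch(w)) == bool(COMPLEMENT_REGEX.fullmatch(w))
]
```

An empty list of disagreements means the two languages partition the bit strings of length at most 10.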

Solution #2: A small number of students took the approach of devising a DFA M that accepts L, then flipping its accepting/nonaccepting states to obtain a DFA Mc accepting Lc (the complement of L). What remains is to translate Mc into a regular expression. As we saw earlier in the course, there is an algorithm by which to do that, but it typically produces very unwieldy regular expressions. Instead, here we rely upon intuition.

[Figures: DFA M accepting L; DFA Mc accepting Lc]

The "dead" state in M is "immortal" in Mc, which explains the name change. Suppose that we could devise, for each state q, a regular expression Rq whose language is precisely that set of strings that take the initial state to state q. There are three accepting states in Mc, namely Even, Odd, and Immortal. Thus, a solution would be

REven + ROdd + RImmortal

Clearly, REven = (00)* is a good choice. The only transition into state Odd is the one labeled 0 coming from Even. Hence, ROdd = REven·0 = (00)*0.

The only transitions into Odd_1 are the ones labeled 1 from Odd and itself. Hence, ROdd_1 = ROdd·11* = (00)*011*.

Finally, the immortal state has transitions into it from Even and Odd_1, respectively, labeled 1 and 0, plus transitions from itself on both 0 and 1. It follows that

RImmortal = (REven·1  +  ROdd_1·0)(0+1)* = ((00)*1  +  (00)*011*0)(0+1)*

Recall that a solution is REven + ROdd + RImmortal. Clearly, REven + ROdd = 0*. Thus, a simplified solution is 0* + RImmortal. In gory detail, that's

0*  +  ((00)*1 + (00)*011*0)(0+1)*
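
The per-state expressions can be checked against a simulation of the automaton. In the sketch below the transition table is an assumption reconstructed from the description of the states (Even is taken to be the initial state, and Odd is taken to return to Even on 0), and union is written | as Python's re module requires.

```python
import re
from itertools import product

# Assumed transition table of M (and of Mc, which differs only in which
# states are accepting).
DELTA = {
    ("Even", "0"): "Odd",          ("Even", "1"): "Immortal",
    ("Odd", "0"): "Even",          ("Odd", "1"): "Odd_1",
    ("Odd_1", "0"): "Immortal",    ("Odd_1", "1"): "Odd_1",
    ("Immortal", "0"): "Immortal", ("Immortal", "1"): "Immortal",
}
STATE_REGEX = {
    "Even": r"(00)*",
    "Odd": r"(00)*0",
    "Odd_1": r"(00)*011*",
    "Immortal": r"((00)*1|(00)*011*0)(0|1)*",
}
COMPLEMENT = r"0*|((00)*1|(00)*011*0)(0|1)*"

def state_after(word):
    state = "Even"
    for sym in word:
        state = DELTA[(state, sym)]
    return state

words = ["".join(w) for n in range(9) for w in product("01", repeat=n)]

# Each Rq should match exactly the strings that lead to state q.
state_mismatches = [
    (w, q) for w in words for q, pattern in STATE_REGEX.items()
    if bool(re.fullmatch(pattern, w)) != (state_after(w) == q)
]
# The final expression should match exactly the strings Mc accepts.
complement_mismatches = [
    w for w in words
    if bool(re.fullmatch(COMPLEMENT, w)) != (state_after(w) != "Odd_1")
]
```

Both lists should come out empty for strings up to length 8, which is consistent with the derivation but is not a proof of it.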

6. Present a regular expression describing the language over {a,b} containing precisely those strings having exactly two occurrences of bb and no occurrences of bbb. As an aid, provided is the regular expression

(a + ba)*(b + λ)

describing the language over {a,b} containing precisely those strings having no occurrences of bb.

For a small bonus, present a second regular expression describing the language as described above, but allowing for there to be one occurrence of bbb that counts as two overlapping occurrences of bb.

Solution: (a + ba)*·bba·(a + ba)*·bb·(a + ab)*

The logic is that every string of the desired form is composed, in order, of

  1. a string of length zero or more in which every occurrence of b is immediately followed by a (giving rise to (a + ba)*)
  2. the string bba
  3. a string of length zero or more in which every occurrence of b is immediately followed by a (giving rise to (a + ba)*)
  4. the string bb
  5. a string of length zero or more in which every occurrence of b is immediately preceded by a (giving rise to (a + ab)*)

Instead of thinking of the first occurrence of bb as having to be immediately followed by an a, we could instead think of the second occurrence of bb as having to be immediately preceded by an a. Taking this point of view, we get the equivalent solution

(a + ba)*·bb·(a + ab)*·abb·(a + ab)*

For the bonus problem, we form a regular expression that describes all strings in which bbb occurs exactly once and bb occurs nowhere else. Such a string would be composed, in order, of

  1. a string of length zero or more in which every occurrence of b is immediately followed by a (giving rise to (a + ba)*)
  2. the string bbb
  3. a string of length zero or more in which every occurrence of b is immediately preceded by a (giving rise to (a + ab)*)

The resulting regular expression is (a + ba)*·bbb·(a + ab)*.

Abbreviating our solution to the non-bonus part of this problem as R1 and the regular expression immediately above as R2, a complete answer to the bonus question is R1 + R2.

Notice that R1 and R2 each have Rlead = (a + ba)* as the "leading" factor and Rtrail = (a + ab)* as the "trailing" factor. Let S1 = bba·(a + ba)*·bb and S2 = bbb, so that R1 = Rlead·S1·Rtrail and R2 = Rlead·S2·Rtrail.

Then we can "factor" out the common leading and trailing factors and rewrite R1 + R2 as Rlead·(S1 + S2)·Rtrail.

In gory detail (without using the abbreviations), that is

(a + ba)*·(bba·(a + ba)*·bb  +  bbb)·(a + ab)*
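
These expressions can be compared with a direct count of (possibly overlapping) occurrences of bb. The sketch below uses Python's re module with | for union; the function names and the length bound of 10 are illustrative choices. For the bonus language, it relies on the observation that a string has exactly two overlapping-counted occurrences of bb precisely when it has either two separate occurrences or a single bbb.

```python
import re
from itertools import product

R1 = r"(a|ba)*bba(a|ba)*bb(a|ab)*"
R1_ALT = r"(a|ba)*bb(a|ab)*abb(a|ab)*"
BONUS = r"(a|ba)*(bba(a|ba)*bb|bbb)(a|ab)*"

def count_bb(word):
    """Number of (possibly overlapping) occurrences of bb."""
    return sum(word[i:i + 2] == "bb" for i in range(len(word) - 1))

def main_spec(word):
    return count_bb(word) == 2 and "bbb" not in word

def bonus_spec(word):
    return count_bb(word) == 2

def disagreements(pattern, spec, max_len=10):
    """Strings up to max_len on which the pattern and the spec differ."""
    return [
        w
        for n in range(max_len + 1)
        for w in map("".join, product("ab", repeat=n))
        if bool(re.fullmatch(pattern, w)) != spec(w)
    ]

main_errors = disagreements(R1, main_spec)
alt_errors = disagreements(R1_ALT, main_spec)
bonus_errors = disagreements(BONUS, bonus_spec)
```

All three lists should be empty, indicating that the two main solutions agree with each other and with the specification on every string of length at most 10.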


7. Present regular expressions for each of the following languages:

(a) { b a^(2k) b  |  k≥0 }
Solution: b(aa)*b

(b) The set of strings over the alphabet {a,b} in which an even number of a's appear between any two occurrences of b.
Solution: a*(b + aa)* a*

(c) { xby  |  x,y ∈ {a,c}* ∧  #a(y) is even }, where #a(y) denotes the number of occurrences of a in y
Solution: (a+c)* b (c + ac*a)*

Numerous students had a similar answer, except for having aa instead of ac*a. The flaw there is that it "forces" a's to occur in adjacent pairs (without any c's in between). The same students typically made the same mistake (not surprisingly) in parts (d) and (e).
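
The difference between the two answers to part (c) shows up on a string such as baca, where the two a's after the b are separated by a c. The following sketch compares both expressions with a direct test of membership; the names and the length bound of 7 are illustrative.

```python
import re
from itertools import product

CORRECT = r"(a|c)*b(c|ac*a)*"
FLAWED = r"(a|c)*b(c|aa)*"   # forces the a's after b to occur in adjacent pairs

def in_language(word):
    """word = x b y with x, y over {a,c} and an even number of a's in y."""
    if word.count("b") != 1:
        return False
    y = word.split("b")[1]
    return y.count("a") % 2 == 0

words = ["".join(w) for n in range(8) for w in product("abc", repeat=n)]
correct_errors = [w for w in words
                  if bool(re.fullmatch(CORRECT, w)) != in_language(w)]
flawed_errors = [w for w in words
                 if bool(re.fullmatch(FLAWED, w)) != in_language(w)]
```

The correct expression should produce no errors, while the flawed one rejects baca and every other member whose a's are not adjacent in pairs.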

(d) The set of strings over the alphabet {a,b,c} in which an even number of a's appear between any two occurrences of b.
Solution: (a+c)* (b + c + ac*a)* (a+c)*
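
The answer to part (d) can be tested the same way. This sketch reads "between any two occurrences of b" as a condition on each segment lying between consecutive b's (which implies the condition for every pair of b's, since sums of even numbers are even); the function name and length bound are illustrative.

```python
import re
from itertools import product

PATTERN = r"(a|c)*(b|c|ac*a)*(a|c)*"

def even_as_between_bs(word):
    """Every segment strictly between two consecutive b's has an even
    number of a's. Segments before the first b and after the last b
    are unconstrained."""
    inner_segments = word.split("b")[1:-1]
    return all(seg.count("a") % 2 == 0 for seg in inner_segments)

mismatches = [
    w
    for n in range(8)
    for w in map("".join, product("abc", repeat=n))
    if bool(re.fullmatch(PATTERN, w)) != even_as_between_bs(w)
]
```

An empty list of mismatches supports the expression for all strings over {a,b,c} of length at most 7.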

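(e) The set of strings over the alphabet {a,b,c} in which either an even number of a's appear between any two occurrences of b or an even number of b's appear between any two occurrences of a.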

Solution: One solution is just that of part (d) unioned with the same expression, except that the roles of a and b are reversed:

(a+c)* (b + c + ac*a)* (a+c)*  +  (b+c)* (a + c + bc*b)* (b+c)*