1. Let M = (Q, Σ, δ, q0, F) be a DFA such that L = L(M) (i.e., M accepts language L).
Consider the language L' = { x ∈ L | no proper prefix of x is in L }.
In words, x is a member of L' if and only if x is a member of L but no proper prefix of x is a member of L.
Describe how to modify M so as to obtain a DFA M' that accepts L'.
Solution: To obtain M' from M, simply make every outgoing transition from every accepting state go to the dead state (adding such a state if M does not already have one).
That L(M') = L' follows from the following two lemmas.
Lemma 1.1: L' ⊆ L(M').
Let x ∈ L'. By definition of L', this means that x ∈ L(M)
but no proper prefix of x is in L(M). This in turn means that, in M,
the sequence of transitions beginning at q0 and
spelling out x ends in an accepting state but otherwise does not
involve any accepting states. (That is, the only accepting state
along that path is the one at the end.) By the construction of M',
the same sequence of transitions exists within it
(because, for every nonaccepting state in M, all of its outgoing
transitions are also found in M'). Hence, x ∈ L(M').
■
Lemma 1.2: L(M') ⊆ L'.
Let x ∈ L(M'). Then, in M', the sequence of transitions beginning
in q0 and spelling out x ends in an accepting state.
Now, every transition in that sequence emanates from a
non-accepting state, because in M' all transitions emanating from
accepting states go to the dead state. But all transitions emanating
from non-accepting states in M' also exist in M. Hence, the same
sequence of transitions exists in M, which means that M accepts x, too.
Because none of the states along that path are accepting
(except the one at the end), no proper prefix of x is accepted by M.
It follows that x ∈ L'. ■
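The construction and the two lemmas can be sanity-checked on a small machine. The following is a minimal sketch, assuming a DFA is stored as a dictionary from (state, symbol) pairs to states; the example machine, the helper names `run` and `make_prefix_free`, and the state name "DEAD" are all hypothetical and not part of the original solution.

```python
from itertools import product

def run(delta, start, accepting, word):
    """Return True iff the DFA (delta, start, accepting) accepts word."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting

def make_prefix_free(delta, accepting, alphabet, dead="DEAD"):
    """Redirect every transition leaving an accepting state to a dead state.

    Assumes the name given by `dead` is not already used by a state of M.
    """
    new_delta = dict(delta)
    for symbol in alphabet:
        new_delta[(dead, symbol)] = dead        # the dead state loops on itself
        for state in accepting:
            new_delta[(state, symbol)] = dead   # accepting states lose their exits
    return new_delta

# Hypothetical example: M accepts the bit strings containing at least one 1.
alphabet = "01"
delta = {("s", "0"): "s", ("s", "1"): "t", ("t", "0"): "t", ("t", "1"): "t"}
start, accepting = "s", {"t"}
delta_prime = make_prefix_free(delta, accepting, alphabet)

def in_L_prime(word):
    """Direct definition of L': in L, with no proper prefix in L."""
    return run(delta, start, accepting, word) and not any(
        run(delta, start, accepting, word[:i]) for i in range(len(word))
    )

# Compare M' against the definition of L' on every bit string of length < 7.
for n in range(7):
    for letters in product(alphabet, repeat=n):
        word = "".join(letters)
        assert run(delta_prime, start, accepting, word) == in_L_prime(word)
```

An exhaustive check up to a fixed length is of course not a proof; it only illustrates what the lemmas establish in general.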
One "incorrect solution": A few students offered an answer that at first seems plausible but turns out to be flawed. Their construction was to obtain M' by making nonaccepting every accepting state of M that can be reached by a nonempty sequence of transitions from some accepting state. Suppose, for example, that both x and xy are accepted by M, where |y| > 0. Then in M the string x leads from q0 to some accepting state p, and y leads from p to some accepting state r. By the construction, r is nonaccepting in M', which correctly prevents M' from accepting xy. The flaw is that there could be a string z that also leads from q0 to r (implying that z is accepted by M) but that has no proper prefix accepted by M. Then we have z ∈ L' but z ∉ L(M') (the latter because state r is nonaccepting in M').
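A concrete instance of this flaw may help. The sketch below uses a hypothetical four-state DFA over {0,1} (the states q0, p, r, dead and the helper names are assumptions chosen for illustration) in which r is reachable both from the accepting state p and directly from q0.

```python
def run(delta, start, accepting, word):
    """Return True iff the DFA (delta, start, accepting) accepts word."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting

# Hypothetical M: q0 -0-> p, q0 -1-> r, p -0-> r; every other transition goes to "dead".
states = ["q0", "p", "r", "dead"]
delta = {(s, c): "dead" for s in states for c in "01"}
delta[("q0", "0")] = "p"
delta[("q0", "1")] = "r"
delta[("p", "0")] = "r"
accepting = {"p", "r"}

def reachable_by_nonempty_path(source):
    """States reachable from source using at least one transition."""
    frontier = {delta[(source, c)] for c in "01"}
    seen = set()
    while frontier:
        state = frontier.pop()
        if state not in seen:
            seen.add(state)
            frontier |= {delta[(state, c)] for c in "01"}
    return seen

# The flawed construction: demote accepting states reachable from an accepting state.
demoted = {r for p in accepting for r in reachable_by_nonempty_path(p) if r in accepting}
flawed_accepting = accepting - demoted

z = "1"  # accepted by M; its only proper prefix, the empty string, is not
print(run(delta, "q0", accepting, z), run(delta, "q0", flawed_accepting, z))
# prints: True False
```

Here z = 1 belongs to L', yet the flawed machine rejects it because r was demoted on account of the path through p.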
2. Let M and N be a pair of DFAs with the same input alphabet Σ. Let their state sets be, respectively, QM and QN, and let their sets of accepting states be, respectively, FM and FN. Suppose that MN is the "trimmed" DFA obtained by applying the Cartesian Product construction to M and N. (For MN to be trimmed means that its state set is that subset of QM × QN containing only those states reachable from its start state.)
(a) Describe how MN can be used to determine whether or not L(M) = L(N).
(b) Describe how MN can be used to determine whether or not L(M) ⊆ L(N).
Solution: For part (a), L(M) = L(N) iff for every state [p,q] in MN, p ∈ FM iff q ∈ FN. That is, either both p and q are accepting or both are non-accepting (in their respective DFAs).
For part (b), L(M) ⊆ L(N) iff for every state [p,q] in MN, p ∈ FM implies q ∈ FN. That is, if p is accepting (in M), then q must be accepting (in N).
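Both tests reduce to inspecting the reachable pairs of the product. A minimal sketch follows, assuming each DFA is given as a dictionary from (state, symbol) to state; the function names and the two example machines are hypothetical.

```python
from collections import deque

def reachable_pairs(delta_m, start_m, delta_n, start_n, alphabet):
    """State set of the trimmed product DFA: pairs reachable from the start pair."""
    start = (start_m, start_n)
    seen, queue = {start}, deque([start])
    while queue:
        p, q = queue.popleft()
        for symbol in alphabet:
            nxt = (delta_m[(p, symbol)], delta_n[(q, symbol)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def languages_equal(pairs, f_m, f_n):
    """Part (a): p is accepting iff q is accepting, for every reachable [p,q]."""
    return all((p in f_m) == (q in f_n) for p, q in pairs)

def language_subset(pairs, f_m, f_n):
    """Part (b): p accepting implies q accepting, for every reachable [p,q]."""
    return all((p not in f_m) or (q in f_n) for p, q in pairs)

# Hypothetical example: M accepts strings whose number of 1's is a multiple of 4,
# N accepts strings with an even number of 1's, so L(M) is a proper subset of L(N).
alphabet = "01"
delta_m = {(i, c): (i + 1) % 4 if c == "1" else i for i in range(4) for c in alphabet}
delta_n = {(i, c): (i + 1) % 2 if c == "1" else i for i in range(2) for c in alphabet}
f_m, f_n = {0}, {0}

pairs = reachable_pairs(delta_m, 0, delta_n, 0, alphabet)
print(languages_equal(pairs, f_m, f_n), language_subset(pairs, f_m, f_n))
# prints: False True
```

Restricting attention to reachable pairs matters: an unreachable pair [p,q] with mismatched accepting status says nothing about any actual input string.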
3. Present a finite transducer that, given a bit string x as input, produces as output the bit string obtained by compressing every run in x to a run having length equal to the minimum of two and the length of that run. (A run is a substring of repeated symbols, extending to the left and right until reaching a different symbol or a string boundary.)
For example, if the input were 00001011111001 (i.e., 0^4 1 0 1^5 0^2 1), the resulting output would be 0010110011 (i.e., 0^2 1 0 1^2 0^2 1). Notice how the runs of lengths four and five were truncated to runs of length two.
Solution:
4. Present a finite transducer that performs addition on a pair of binary numerals written "backwards". For this purpose, the input alphabet will be Σ ∪ {#}, where Σ = { [0,0], [0,1], [1,0], [1,1]}. The output alphabet is {0,1}.
A string x#, where x ∈ Σ+, is intended to represent an instance of the binary addition problem, but with the bits written from least significant to most significant. The figure below shows an instance of a binary addition problem (on the left) and how that instance corresponds to the input string received and output string produced by the intended finite transducer (on the right).
Addition instance: 101101 + 000111 = 110100
Input: [1,1][0,1][1,1][1,0][0,0][1,0]#
Output: 001011
Solution: The machine will be in state 0 (respectively, 1) when the carry coming into the "current" column is zero (respectively one). The # is an "end-of-input" marker, which is significant here because if processing the last pair of bits results in the machine being in the "carry in is 1" state, then to produce the correct answer it is necessary for the machine to output 1 before the computation terminates.
5. Present a regular expression r such that the language that it describes, L(r), is the complement of the language L, where
L contains precisely those bit strings beginning with an odd number of 0's followed by at least one 1 and is described by the regular expression 0(00)*11*.
Solution #1: To solve such a problem, we should identify every kind of way that a string over {0,1} could fail to be a member of L. For each such kind of "failure", we should devise a regular expression that describes all the strings of that kind. The solution would then be the regular expression formed by taking the union of (i.e., applying the + operator upon) the regular expressions just mentioned.
Now, the members of L are those bit strings that are composed of an odd-length sequence of 0's followed by a non-empty sequence of 1's. A comprehensive list of bit strings that are "non-conforming" is
A solution, then, is R1 + R2 + R3, which in gory detail is
Solution #2: A small number of students took the approach of devising a DFA M that accepts L, then flipping its accepting/nonaccepting states to obtain a DFA Mc accepting Lc (the complement of L). What remains is to translate Mc into a regular expression. As we saw earlier in the course, there is an algorithm by which to do that, but it typically produces very unwieldy regular expressions. Instead, here we rely upon intuition.
[Figure: DFA M accepting L (left) and DFA Mc accepting Lc (right).]
The "dead" state in M is "immortal" in Mc, which explains the name change. Suppose that we could devise, for each state q, a regular expression Rq whose language is precisely that set of strings that take the initial state to state q. There are three accepting states in Mc, namely Even, Odd, and Immortal. Thus, a solution would be REven + ROdd + RImmortal.
Clearly, REven = (00)* is a good choice. The only transition into state Odd is the one labeled 0 coming from Even. Hence, ROdd = REven·0 = (00)*0.
The only transitions into Odd_1 are the ones labeled 1 from Odd and itself. Hence, ROdd_1 = ROdd·11* = (00)*011*.
Finally, the immortal state has transitions into it from Even and Odd_1, respectively, labeled 1 and 0, plus transitions from itself on both 0 and 1. It follows that RImmortal = (REven·1 + ROdd_1·0)·(0 + 1)* = ((00)*1 + (00)*011*0)·(0 + 1)*.
Recall that a solution is REven + ROdd + RImmortal. Clearly, REven + ROdd = 0*. Thus, a simplified solution is 0* + RImmortal. In gory detail, that's 0* + ((00)*1 + (00)*011*0)·(0 + 1)*.
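A brute-force comparison is a cheap way to gain confidence in an expression of this kind. The sketch below assumes that the transitions described above give RImmortal = (REven·1 + ROdd_1·0)·(0 + 1)*, and it translates the course notation into Python's `re` syntax, where union is written | rather than +.

```python
import re
from itertools import product

L = re.compile(r"0(00)*11*")
# Assumed complement: 0* + ((00)*1 + (00)*011*0)(0 + 1)*
COMPLEMENT = re.compile(r"0*|((00)*1|(00)*011*0)(0|1)*")

# Every bit string of length at most 10 must match exactly one of the two expressions.
for n in range(11):
    for letters in product("01", repeat=n):
        word = "".join(letters)
        in_l = L.fullmatch(word) is not None
        in_complement = COMPLEMENT.fullmatch(word) is not None
        assert in_l != in_complement, word
```

The check passes silently; it covers only strings up to the chosen length, so it supports the derivation rather than replacing it.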
6. Present a regular expression describing the language over {a,b} containing precisely those strings having exactly two occurrences of bb and no occurrences of bbb. As an aid, provided is the regular expression
describing the language over {a,b} containing precisely those strings having no occurrences of bb.
For a small bonus, present a second regular expression describing the language as described above, but allowing for there to be one occurrence of bbb that counts as two overlapping occurrences of bb.
Solution: (a + ba)*·bba·(a + ba)*·bb·(a + ab)*
The logic is that every string of the desired form is composed, in order, of
Instead of thinking of the first occurrence of bb as having to be immediately followed by an a, we could instead think of the second occurrence of bb as having to be immediately preceded by an a. Taking this point of view, we get the equivalent solution
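Both viewpoints can be compared against the definition of the language by brute force. In the sketch below, FIRST is the solution given above in Python's `re` syntax (| for union), while SECOND, namely (a + ba)*·bb·(a + ab)*·abb·(a + ab)*, is an assumed rendering of the "preceded by an a" viewpoint and may differ in form from the expression originally intended.

```python
import re
from itertools import product

FIRST = re.compile(r"(a|ba)*bba(a|ba)*bb(a|ab)*")
SECOND = re.compile(r"(a|ba)*bb(a|ab)*abb(a|ab)*")  # assumed form of the second viewpoint

def wanted(word):
    """Exactly two occurrences of bb and no occurrence of bbb."""
    count = sum(word[i:i + 2] == "bb" for i in range(len(word) - 1))
    return count == 2 and "bbb" not in word

# Check every string over {a,b} of length at most 10.
for n in range(11):
    for letters in product("ab", repeat=n):
        word = "".join(letters)
        assert (FIRST.fullmatch(word) is not None) == wanted(word), word
        assert (SECOND.fullmatch(word) is not None) == wanted(word), word
```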
For the bonus problem, we form a regular expression that describes all strings in which bbb occurs exactly once. Such a string would be composed, in order, of
The resulting regular expression is (a + ba)*·bbb·(a + ab)*.
Abbreviating our solution to the non-bonus part of this problem as R1 and the regular expression immediately above as R2, a complete answer to the bonus question is R1 + R2.
Notice that both R1 and R2 have Rlead = (a + ba)* as their "leading" factor and Rtrail = (a + ab)* as their "trailing" factor. Let S1 = bba·(a + ba)*·bb and S2 = bbb, so that R1 = Rlead·S1·Rtrail and R2 = Rlead·S2·Rtrail.
Then we can "factor" out the common leading and trailing factors and rewrite R1 + R2 as Rlead·(S1 + S2)·Rtrail.
In gory detail (without using the abbreviations), that is (a + ba)*·(bba·(a + ba)*·bb + bbb)·(a + ab)*.
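The factored expression Rlead·(S1 + S2)·Rtrail can be tested the same way. The sketch below assumes the bonus language consists of exactly those strings in which bb occurs twice when overlapping occurrences are counted, so that a lone bbb qualifies but bbbb does not; the name BONUS and the helper `wanted` are illustrative.

```python
import re
from itertools import product

# Rlead·(S1 + S2)·Rtrail, with | standing for the union operator.
BONUS = re.compile(r"(a|ba)*(bba(a|ba)*bb|bbb)(a|ab)*")

def wanted(word):
    """Exactly two occurrences of bb, counting overlapping ones."""
    return sum(word[i:i + 2] == "bb" for i in range(len(word) - 1)) == 2

# Check every string over {a,b} of length at most 10.
for n in range(11):
    for letters in product("ab", repeat=n):
        word = "".join(letters)
        assert (BONUS.fullmatch(word) is not None) == wanted(word), word
```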
7. Present regular expressions for each of the following languages:
(a) { b a^(2k) b | k ≥ 0 }
Solution: b(aa)*b
(b) The set of strings over the alphabet {a,b} in which an
even number of a's appear between any two occurrences of
b.
Solution:
a*(b + aa)* a*
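This answer can be checked against the definition on short strings. The sketch below is a hedged illustration: it reads "between any two occurrences of b" as a condition on consecutive b's (which is equivalent, since a sum of even counts is even), and the helper `wanted` is a hypothetical name.

```python
import re
from itertools import product

PATTERN = re.compile(r"a*(b|aa)*a*")  # a*(b + aa)*a* in Python syntax

def wanted(word):
    """Every block of a's lying between two consecutive b's has even length."""
    gaps = word.split("b")[1:-1]  # drop the parts before the first and after the last b
    return all(len(gap) % 2 == 0 for gap in gaps)

# Check every string over {a,b} of length at most 10.
for n in range(11):
    for letters in product("ab", repeat=n):
        word = "".join(letters)
        assert (PATTERN.fullmatch(word) is not None) == wanted(word), word
```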
(c) { xby |
x,y ∈ {a,c}* ∧
#a(y) is even }
Solution:
(a+c)* b
(c + ac*a)*
Numerous students had a similar answer, except for having aa instead of ac*a. The flaw there is that it "forces" a's to occur in adjacent pairs (without any c's in between). The same students typically made the same mistake (not surprisingly) in parts (d) and (e).
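The difference between the two answers shows up on a string such as cabaca, where the two a's following the b are separated by a c. The sketch below writes both expressions in Python's `re` syntax; the names CORRECT, FLAWED, and `wanted` are chosen for illustration.

```python
import re
from itertools import product

CORRECT = re.compile(r"(a|c)*b(c|ac*a)*")
FLAWED = re.compile(r"(a|c)*b(c|aa)*")   # forces the a's after b into adjacent pairs

def wanted(word):
    """word = xby with x, y over {a,c} and an even number of a's in y."""
    return word.count("b") == 1 and word.split("b")[1].count("a") % 2 == 0

# The correct expression agrees with the definition on all strings of length at most 7.
for n in range(8):
    for letters in product("abc", repeat=n):
        word = "".join(letters)
        assert (CORRECT.fullmatch(word) is not None) == wanted(word), word

witness = "cabaca"
print(CORRECT.fullmatch(witness) is not None, FLAWED.fullmatch(witness) is not None)
# prints: True False
```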
(d) The set of strings over the alphabet {a,b,c} in which an
even number of a's appear between any two occurrences of
b.
Solution:
(a+c)* (b + c + ac*a)* (a+c)*
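As with the earlier parts, the answer to (d) can be compared with the definition on short strings. This is a sketch under the same reading as before (the condition is checked between consecutive b's), with `wanted` as a hypothetical helper.

```python
import re
from itertools import product

PATTERN = re.compile(r"(a|c)*(b|c|ac*a)*(a|c)*")

def wanted(word):
    """Every stretch between two consecutive b's contains an even number of a's."""
    gaps = word.split("b")[1:-1]  # drop the parts before the first and after the last b
    return all(gap.count("a") % 2 == 0 for gap in gaps)

# Check every string over {a,b,c} of length at most 7.
for n in range(8):
    for letters in product("abc", repeat=n):
        word = "".join(letters)
        assert (PATTERN.fullmatch(word) is not None) == wanted(word), word
```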
(e) The set of strings over the alphabet {a,b,c} in which either