
CS500 Homework #1 Solutions

1. (Exercise 1.4 c, d, and e, and Exercise 1.13) Draw diagrams of DFAs that recognize
the following languages, and give regular expressions for each one.

(a) {w | w contains the substring 0101}

Figure 1: (0 + 1)∗0101(0 + 1)∗. Gray states are accepting.

(b) {w | w has length at least 3 and its third symbol is 0}

Figure 2: (0 + 1)(0 + 1)0(0 + 1)∗. Gray states are accepting.

(c) {w | w starts with 0 and has odd length, or starts with 1 and has even length}

Figure 3: 0((0 + 1)(0 + 1))∗ + 1(0 + 1)((0 + 1)(0 + 1))∗. Gray states are accepting.
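
As a quick sanity check on all three answers (not part of the original solutions), the following Python sketch translates each regular expression into Python's re syntax, writing union as | instead of +, and compares it against a direct restatement of the corresponding language on every word of length at most 8.

    import re
    from itertools import product

    # The three regular expressions above, with union written as | instead of +.
    patterns = {
        "a": re.compile(r"(0|1)*0101(0|1)*"),
        "b": re.compile(r"(0|1)(0|1)0(0|1)*"),
        "c": re.compile(r"0((0|1)(0|1))*|1(0|1)((0|1)(0|1))*"),
    }

    def in_language(part, w):
        # Direct restatement of each language's definition.
        if part == "a":
            return "0101" in w
        if part == "b":
            return len(w) >= 3 and w[2] == "0"
        return (w.startswith("0") and len(w) % 2 == 1) or \
               (w.startswith("1") and len(w) % 2 == 0)

    for n in range(9):
        for w in map("".join, product("01", repeat=n)):
            for part, pattern in patterns.items():
                assert bool(pattern.fullmatch(w)) == in_language(part, w), (part, w)
    print("all three regular expressions match their language definitions")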

2. Recall the definition δ ∗ (q, w1 w2 · · · wn ) = δ ∗ (δ(q, w1 ), w2 · · · wn ) for the effect on a DFA
of reading a word w. The language L recognized by a machine can then be written
L = {w ∈ Σ∗ | δ ∗ (q0 , w) ∈ F }
where q0 is the initial state and F is the set of accepting states.
Given a DFA M , we can define an equivalence ∼M on words u, v ∈ Σ∗ as follows:
u ∼M v if and only if δ ∗ (q0 , u) = δ ∗ (q0 , v)
and, as we discussed in class, given a language L we can define an equivalence ∼L as
u ∼L v if and only if (∀w : uw ∈ L ⇔ vw ∈ L)
Prove formally that if M recognizes L, then u ∼M v implies u ∼L v. Explain why the
converse might not be true—that is, why ∼M might be a “finer” equivalence, dividing
Σ∗ into more equivalence classes, than ∼L .
Answer. Suppose u ∼M v, so δ ∗ (q0 , u) = δ ∗ (q0 , v). By induction, it is easy to show
that δ ∗ (q, xy) = δ ∗ (δ ∗ (q, x), y) for any state q and any strings x and y. Therefore, for
any string w,
δ ∗ (q0 , uw) = δ ∗ (δ ∗ (q0 , u), w) = δ ∗ (δ ∗ (q0 , v), w) = δ ∗ (q0 , vw)
Thus δ ∗ (q0 , uw) ∈ F if and only if δ ∗ (q0 , vw) ∈ F , and if M recognizes L this means
that uw ∈ L if and only if vw ∈ L. Thus u ∼L v. 
The converse is not true, since if M is not the minimal DFA that recognizes L, it could
send u and v to separate states even though u ∼L v. In that case M has more states,
and ∼M divides Σ∗ into smaller pieces than ∼L .
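
For concreteness, here is a small Python sketch of δ∗ and of the fact just proved, on a DFA of my own choosing (it accepts binary words containing the substring 11); it is an illustration, not part of the original solution.

    def delta_star(delta, q, w):
        # Iterative form of the recursive definition delta*(q, w1 w2 ... wn).
        for a in w:
            q = delta[(q, a)]
        return q

    # Example DFA: states 0, 1, 2 record how much of the substring 11 has been seen;
    # state 2 is the only accepting state.
    delta = {
        (0, "0"): 0, (0, "1"): 1,
        (1, "0"): 0, (1, "1"): 2,
        (2, "0"): 2, (2, "1"): 2,
    }
    q0, F = 0, {2}

    u, v = "01", "00101"                   # both words end in state 1, so u ~_M v
    assert delta_star(delta, q0, u) == delta_star(delta, q0, v)
    for w in ["", "0", "1", "10", "110"]:  # so uw and vw always agree on membership
        assert (delta_star(delta, q0, u + w) in F) == (delta_star(delta, q0, v + w) in F)
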
3. Using the same notation as in the previous problem, prove that if u ∼L v, then for any
a ∈ Σ we have ua ∼L va. Show that this allows us to define a DFA M whose set
of states Q is the set of equivalence classes of ∼L ; define q0 , δ and F , and prove that
M recognizes L. Moreover, prove that M is the smallest DFA that recognizes L, and
that it is (up to isomorphism) the only DFA of this size that recognizes L.
Answer. Suppose u ∼L v; then for all w, uw ∈ L if and only if vw ∈ L. This includes
words w which begin with a, i.e., words of the form w = ax. Thus, for all x, uax ∈ L
if and only if vax ∈ L, so ua ∼L va. 
This has the following consequence: if I ask you which equivalence class ua is in,
you don’t need to know u; you only need to know u’s equivalence class. Therefore, if
q is an equivalence class, we can define δ(q, a) as the equivalence class containing ua
where u is any word in q. (If we didn’t have the above result, there might be several
such equivalence classes, in which case δ(q, a) would not be well-defined.) Similarly,
we let q0 be the equivalence class containing the empty word ε, and let F be the set
of equivalence classes containing words in L. (Note that the words in an equivalence
class are either all in L or all not in L; to see this, set w = ε.)

Once we set all this up, the state δ ∗ (q0 , w) is simply the equivalence class containing
w, which is in F if and only if w ∈ L; so M recognizes L.
Moreover, the result of #2 above shows that the number of states of any machine M
is at least the number of equivalence classes of ∼L . Therefore, if M is minimal, the
equivalence classes of ∼M are exactly the equivalence classes of ∼L ; then there is a
one-to-one correspondence between the states of M and equivalence classes, and any
such machine is isomorphic to the one described here. 
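
In practice the minimal DFA is usually obtained the other way around, by merging indistinguishable states of a given DFA. Here is a Python sketch of Moore's partition-refinement algorithm (my own illustration, assuming every state of the input DFA is reachable); its final blocks are exactly the equivalence classes of ∼L realized by the machine's states.

    def minimize(states, alphabet, delta, accepting):
        # Start from the partition {accepting, non-accepting} and refine it until
        # no block can be split any further.
        blocks = [b for b in (set(accepting), set(states) - set(accepting)) if b]
        changed = True
        while changed:
            changed = False
            new_blocks = []
            for block in blocks:
                # Two states stay together iff every symbol sends them to the same block.
                def signature(q):
                    return tuple(
                        next(i for i, b in enumerate(blocks) if delta[(q, a)] in b)
                        for a in alphabet
                    )
                groups = {}
                for q in block:
                    groups.setdefault(signature(q), set()).add(q)
                if len(groups) > 1:
                    changed = True
                new_blocks.extend(groups.values())
            blocks = new_blocks
        return blocks

    # Example: a DFA for "contains the substring 11" with one redundant state (3),
    # which behaves exactly like state 2 and is merged with it.
    states, alphabet, accepting = [0, 1, 2, 3], ["0", "1"], [2, 3]
    delta = {
        (0, "0"): 0, (0, "1"): 1,
        (1, "0"): 0, (1, "1"): 2,
        (2, "0"): 3, (2, "1"): 2,
        (3, "0"): 3, (3, "1"): 2,
    }
    print(minimize(states, alphabet, delta, accepting))  # blocks {0}, {1}, and {2, 3}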

4. What are the equivalence classes of ∼L if L = {w ∈ {0, 1}∗ |w contains the substring 101}?
Use this to draw the minimal DFA for L.
Answer. There are four equivalence classes:

(a) w does not contain 101, and does not end in 1 or 10 (this is the start state)
(b) w does not contain 101, but ends in 1
(c) w does not contain 101, but ends in 10
(d) w contains 101 (the accepting state).

Considering the transitions between these gives the following DFA:

Figure 4: The minimal DFA for (0 + 1)∗101(0 + 1)∗.
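
These four classes can be written out as an explicit transition table; the following Python sketch (my own, not part of the solutions) does so and checks it against the direct substring test on all words of length less than 12.

    from itertools import product

    delta = {
        ("a", "0"): "a", ("a", "1"): "b",  # a: no 101 yet, doesn't end in 1 or 10
        ("b", "0"): "c", ("b", "1"): "b",  # b: no 101 yet, ends in 1
        ("c", "0"): "a", ("c", "1"): "d",  # c: no 101 yet, ends in 10
        ("d", "0"): "d", ("d", "1"): "d",  # d: contains 101 (accepting sink)
    }

    def accepts(w):
        q = "a"
        for symbol in w:
            q = delta[(q, symbol)]
        return q == "d"

    for n in range(12):
        for w in map("".join, product("01", repeat=n)):
            assert accepts(w) == ("101" in w)
    print("the 4-state DFA agrees with the substring test")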

5. (Exercise 1.17c, modified) Give two proofs, one based on the pumping lemma and one
based on equivalence classes, that the language {a^(2^n) | n ∈ N} = {a, aa, aaaa, aaaaaaaa, . . .}
is not regular.
Proof 1. Suppose the pumping lemma holds for L. Then there would be a p such that
for all w ∈ L with |w| ≥ p, there would be some way to write w = xyz with 0 < |y| ≤ p
such that xy^i z ∈ L for all i ≥ 0. For a unary language, w = a^ℓ and y = a^k for some
k with 0 < k ≤ p, so xy^i z = a^(ℓ+(i−1)k). Then the pumping lemma becomes the claim
that for all ℓ ≥ p, if a^ℓ ∈ L, then a^(ℓ+(i−1)k) ∈ L for all i ≥ 0, for some k with
0 < k ≤ p. In particular, considering xy^2 z, the pumping lemma claims that a^(ℓ+k) ∈ L
for some k with 0 < k ≤ p.
Now suppose that ℓ ≥ p and a^ℓ ∈ L, and let a^m be the shortest word in L longer than
a^ℓ. For the pumping lemma to hold, we need the gap in their lengths m − ℓ to be less
than or equal to p; otherwise xy^2 z = a^(ℓ+k) cannot be in the language, since
ℓ < ℓ + k ≤ ℓ + p < m. Therefore, if the gaps between the lengths of words in a unary
language don't have a constant upper bound, that is, if the language has arbitrarily
large gaps, it cannot be regular.

For {a^(2^n)} in particular, if the adversary claims that the pumping lemma is true with
some value of p, let w be a^ℓ where ℓ is the smallest power of 2 greater than p. Then
the length gap between w and the next word in L, a^(2ℓ), is ℓ > p.

Proof 2. I claim that no two distinct words in L are equivalent to each other. To see this,
let u = a^(2^m) and v = a^(2^n) for any m ≠ n, and let w = u. Now uw = a^(2·2^m) = a^(2^(m+1)) ∈ L,
but vw = a^(2^m + 2^n) ∉ L, since the sum of two distinct powers of 2 is never a power of 2.
Thus u and v are not equivalent under ∼L for any distinct u, v ∈ L, so there are infinitely
many equivalence classes of ∼L and L is not regular.

6. (Problem 1.30) Show that for all p, the language

Lp = {w ∈ {0, 1}∗ | w represents a binary integer divisible by p}

is regular.
Answer. Let x be the number represented by the digits we've read so far. Reading
the next binary digit a = 0 or 1 changes x to 2x + a. To tell whether the number w
represents is a multiple of p, it suffices to know that number mod p, and thus to
keep track of x mod p as we read. Now, for any a,
(2x + a) mod p is a function of x mod p; this is obvious if you’re familiar with modular
arithmetic — if you’re not, consider

(2x + a) mod p = (2(x mod p) + a) mod p .

Therefore, Lp can be recognized by a DFA with p states, one for each possible value of
x mod p, where the state corresponding to remainder 0 is the accepting state. (The one
problem with this machine is that it accepts the empty word, but presumably ε represents
zero.) Note that a similar
construction works for numbers in any base; there’s nothing special about binary. 
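
A sketch of this machine in Python (my own illustration; I take the start state to be 0, so the empty word is treated as the number zero):

    def divisible_by(p, w):
        # The DFA's state is the value, mod p, of the digits read so far.
        state = 0
        for digit in w:
            state = (2 * state + int(digit)) % p   # the transition (2x + a) mod p
        return state == 0                          # accept iff the remainder is 0

    # Spot check against ordinary integer arithmetic for p = 7.
    for n in range(1, 200):
        assert divisible_by(7, format(n, "b")) == (n % 7 == 0)
    print("the mod-7 DFA agrees with integer arithmetic for n = 1, ..., 199")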

7. (Problem 1.39) Prove that for all k > 1, DFAs with k states are more powerful than
DFAs with k − 1 states. That is, come up with a family of languages Lk such that
Lk can be recognized by a DFA with k states but not by a DFA with k − 1 states.
Thus the DFAs form a strict hierarchy based on the number of states, which classifies
the regular languages according to their complexity.
Answer. The languages Lp from the previous question require p states whenever p
is odd, but here’s a simpler example: let Lk be the set of words of length at least
k − 1. It’s clear that Lk has k equivalence classes, corresponding to words of length
0, 1, 2, . . . , k − 2 and k − 1 or more. Thus Lk requires a DFA with k states. 
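
A sketch of this k-state DFA in Python (my own illustration): its state just counts how many symbols have been read, capped at k − 1.

    def make_counting_dfa(k):
        def accepts(w):
            state = 0                     # states 0, 1, ..., k-1: k states in all
            for _ in w:
                state = min(state + 1, k - 1)
            return state == k - 1         # accept iff at least k - 1 symbols were read
        return accepts

    accepts = make_counting_dfa(4)        # L_4: words of length at least 3
    assert not accepts("01") and accepts("010") and accepts("010110")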

8. (Problem 1.44 — a little harder) Now prove that NFAs can be exponentially more
compact than DFAs, i.e., that the exponential blowup in the number of states we get
when converting an NFA to a DFA is sometimes necessary. That is, give a family of
languages Lk such that Lk can be recognized by an NFA with k states, but whose
minimal DFA has Ω(2^k) states. Hint: Look at Example 1.14 in Sipser.

Figure 5: The NFA for Lk.

Answer. Let Lk be the language of words in {0, 1}∗ where the (k − 1)th-to-last symbol
is a 1. Example 1.14 is L4 . Lk can be recognized by an NFA with k states; the start
state q0 , and k − 1 states labeled k − 1, k − 2, . . . , 1. The NFA is shown in the figure.
We can stay in the start state as long as we want, but when we see a 1 we can guess
that it’s the (k − 1)th-to-last symbol, and then count down from there. If we were
right, we end in the accepting state at the end of the word.
But the minimal DFA for Lk has at least 2^(k−1) states. To see this, suppose that u and
v differ somewhere in their last k − 1 symbols. In that case, there is some j < k such
that the jth-to-last symbol of u is 1 but the jth-to-last symbol of v is 0 (or vice versa).
Then, if w is any word of length k − 1 − j, uw ∈ L but vw ∉ L, so u and v are not
equivalent. Therefore, there are at least 2^(k−1) equivalence classes, one for each final
sequence of k − 1 symbols.
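
The following Python sketch (my own, with the states named as above) simulates the NFA by tracking the set of states it could be in, which is exactly the subset construction; it checks that the NFA recognizes Lk and that, for k = 5, the construction reaches 2^(k−1) = 16 distinct state sets.

    from itertools import product

    def nfa_step(states, a, k):
        new_states = set()
        for q in states:
            if q == "q0":
                new_states.add("q0")          # stay in the start state on 0 or 1
                if a == "1":
                    new_states.add(k - 1)     # guess: this 1 is (k-1)th from the end
            elif q > 1:
                new_states.add(q - 1)         # count down on either symbol
            # state 1 has no outgoing transitions: the word should have ended there
        return new_states

    def nfa_accepts(w, k):
        states = {"q0"}
        for a in w:
            states = nfa_step(states, a, k)
        return 1 in states                    # state 1 is the accepting state

    k = 5
    # The NFA agrees with the definition of L_k on all short words...
    for n in range(2 * k):
        for w in map("".join, product("01", repeat=n)):
            assert nfa_accepts(w, k) == (len(w) >= k - 1 and w[-(k - 1)] == "1")

    # ...and the subset construction reaches 2^(k-1) distinct sets of states.
    seen, frontier = set(), [frozenset({"q0"})]
    while frontier:
        s = frontier.pop()
        if s not in seen:
            seen.add(s)
            frontier.extend(frozenset(nfa_step(s, a, k)) for a in "01")
    print(len(seen))                          # prints 16 = 2**(k - 1) for k = 5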

9. (Problem 1.24 plus...) Given a word w = w1 · · · wn, its reverse is w^R = wn · · · w1, and
the reverse of a language L is L^R = {w^R | w ∈ L}. Show that if L is regular, so is
L^R. However, show also that the number of states for the minimal DFA for L^R can be
exponentially larger than the number for L. Hint: use the same languages as in the
previous problem.
Answer. If L can be described by a regular expression, then so can L^R. Since a regular
expression is built from smaller ones using union, concatenation, and the Kleene star,
we can prove this by induction on the size of the expression as follows:

(L1 ∪ L2)^R = L1^R ∪ L2^R
(L1 L2)^R = L2^R L1^R
(L1^∗)^R = (L1^R)^∗

The base case is given by the fact that a^R = a for a single symbol a.
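
This induction translates directly into code. In the Python sketch below (my own representation, not from the text), a regular expression is a nested tuple ("union", r, s), ("concat", r, s), or ("star", r), or a bare symbol, and reverse applies the three identities recursively.

    def reverse(regex):
        if isinstance(regex, str):        # base case: a single symbol
            return regex
        op = regex[0]
        if op == "union":
            return ("union", reverse(regex[1]), reverse(regex[2]))
        if op == "concat":                # note the two factors swap places
            return ("concat", reverse(regex[2]), reverse(regex[1]))
        if op == "star":
            return ("star", reverse(regex[1]))
        raise ValueError("unknown operator: %r" % (op,))

    # Example: (01)* reversed is (10)*.
    print(reverse(("star", ("concat", "0", "1"))))   # ('star', ('concat', '1', '0'))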

In the previous problem we showed that Lk requires a DFA with Ω(2^k) states. However,
Lk^R can be recognized by a DFA with only k + 1 states: simply reverse the NFA shown
above and confirm that the (k − 1)th symbol is a 1. Thus Lk = (Lk^R)^R requires
exponentially more states than Lk^R does.

10. (Problem 1.41) Prove that the language

L = {w | w has an equal number of 01s and 10s}

is regular.

Answer. This is kind of a trick question: L is simply the language of words which
begin and end with the same symbol, together with the empty word, i.e.,

ε + 0 + 1 + 0(0 + 1)∗0 + 1(0 + 1)∗1 .

To see this, note that we get a 01 or 10 each time we switch from 0 to 1 or back again,
and if we start and end with the same symbol we switch back and forth an equal
number of times. 
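
A brute-force check of this characterization in Python (my own, not part of the solution), over all words of length at most 14:

    from itertools import product

    def count(pattern, w):
        return sum(1 for i in range(len(w) - 1) if w[i:i + 2] == pattern)

    for n in range(15):
        for w in map("".join, product("01", repeat=n)):
            in_L = count("01", w) == count("10", w)
            same_ends = len(w) <= 1 or w[0] == w[-1]
            assert in_L == same_ends
    print("equal numbers of 01s and 10s iff |w| <= 1 or w starts and ends the same way")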

11. (Problem 1.42 — Tricky!) For a language L, define L1/2 as

L1/2 = {x | xy ∈ L for some word y such that |y| = |x|}

That is, L1/2 is the set of “first halves” of L, namely the set of words x that can be
followed by words y of the same length giving a word xy in L. Prove that if L is
regular, so is L1/2 .
Answer. We can’t say things like “accept at the half-way point,” since there is no way
for the DFA to know when it is half-way through the input. What we have to do is
simulate two copies of the DFA in parallel; one moves forwards from the start state by
reading x, while the other non-deterministically moves backwards from an accepting
state by guessing y. Then if both states are equal, this means that we met in the
middle and xy ∈ L.
Formally, let M = (Q, Σ, δ, q0, F) be a DFA for L, and define a new machine, an NFA,
M′ = (Q′, Σ, δ′, q0′, F′) as follows. Let Q′ = Q × Q, corresponding to the two copies of
M. Let us expand our definition of an NFA slightly by letting it non-deterministically
start in any element of a starting set Q′0 = {(q0, qf) | qf ∈ F}; equivalently, let its start
state be q0′ and allow ε-transitions from q0′ to each q′ ∈ Q′0. Then define

δ′((q1, q2), a) = {(δ(q1, a), q3) | ∃b : δ(q3, b) = q2}

Note that b is the next symbol of y, read in reverse, which we guess to move backwards
from q2 to q3. Finally, define F′ = {(q, q) | q ∈ Q}. Then M′ recognizes L1/2.
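
Here is a Python sketch of this construction (my own illustration; the particular DFA M, which accepts binary words whose number of 1s is divisible by 3, is just a convenient example). It tracks the set of reachable pairs and compares the result against the definition of L1/2 by brute force.

    from itertools import product

    # The DFA M: states 0, 1, 2 count the number of 1s mod 3; state 0 is initial and accepting.
    Q = [0, 1, 2]
    delta = {(q, a): (q + int(a)) % 3 for q in Q for a in "01"}
    q0, F = 0, {0}

    def in_L_half(x):
        # Start pairs: the forward copy in q0, the backward copy guessing an accepting state.
        pairs = {(q0, qf) for qf in F}
        for a in x:
            pairs = {
                (delta[(q1, a)], q3)
                for (q1, q2) in pairs
                for q3 in Q
                for b in "01"
                if delta[(q3, b)] == q2       # q3 can move backward to q2 by guessing b
            }
        return any(q1 == q2 for (q1, q2) in pairs)

    def in_L_half_brute(x):
        # The definition: some y with |y| = |x| makes the number of 1s in xy divisible by 3.
        return any((x.count("1") + y.count("1")) % 3 == 0
                   for y in map("".join, product("01", repeat=len(x))))

    for n in range(8):
        for x in map("".join, product("01", repeat=n)):
            assert in_L_half(x) == in_L_half_brute(x)
    print("the pair-NFA simulation agrees with the definition of L1/2")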

You can generalize this in various ways; for instance, you can define Lmid as

Lmid = {y | ∃x, z : |x| = |y| = |z| and xyz ∈ L} .

Then Lmid is the set of middle thirds of words in L, and by guessing the first and last
thirds x and z we can prove that Lmid is regular if L is. (The converse is not true.)

12. Show that

L = {w ∈ {0, 1}∗ | w represents a prime number in binary}

is not regular. You may assume the following conjecture: there are an infinite number
of Mersenne primes, i.e., primes of the form 2^k − 1. You may also use the following
fact: if 2^k − 1 is prime, then k is prime (the converse is not true). Finally, you may also
use the Prime Number Theorem, which states that for some constant C, the number
π(x) of primes less than x obeys π(x) < Cx/ ln x.
Answer. Let's transform L into a unary language. Let L′ = L ∩ 1∗; since 1^k in binary
represents 2^k − 1, we have

L′ = {w | w represents a Mersenne prime}

If we can prove that L′ is not regular, then since the set of regular languages is closed
under intersection it follows that L is not regular.
As in question #5, it's enough to show that the set of lengths of words in L′ has
arbitrarily large gaps. (Since any finite language is regular, we also need to know that
L′ is infinite; for this we rely on the conjecture that there are an infinite number of
Mersenne primes.) First, since k must be prime for 2^k − 1 to be prime, we have

L′ ⊂ {1^k | k is prime}

If the gaps between adjacent primes were bounded above by some constant p, then the
number of primes less than x would obey π(x) ≥ x/p − 1, and this would contradict the
Prime Number Theorem, since for sufficiently large x we have x/p − 1 > Cx/ ln x. Thus
the set of primes has arbitrarily large gaps, and since words in L′ have prime length,
its gaps are at least as large. By the argument of question #5, L′, and therefore L, is
not regular. This completes the proof.
