Nothing Special   »   [go: up one dir, main page]

On The Language of Primitive Partial Words

Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

On the Language of Primitive Partial Words

Ananda Chandra Nayak and Kalpesh Kapoor


Department of Mathematics
Indian Institute of Technology Guwahati, Guwahati, India
{n.ananda,kalpesh}@iitg.ernet.in

Abstract. A partial word is a word which contains some holes known


as do not know symbols and such places can be replaced by any letter
from the underlying alphabet. We study the relation between language
of primitive partial words with the conventional language classes viz.
regular, linear and deterministic context-free in Chomsky hierarchy. We
give proofs to show that the language of primitive partial words over an
alphabet having at least two letters is not regular, not linear and not
deterministic context free language. Also we give a 2DPDA automaton
that recognizes the language of primitive partial words.
Keywords: Combinatorics on words, Partial word, Primitive word,
Regular, Linear, Deterministic Context-free, 2DPDA

Introduction

Let be a finite alphabet. We assume that is a nontrivial alphabet, which


means that it has at least two distinct symbols. A total word (referred to as simply a word) u = a0 a1 a2 . . . an1 of length n can be defined by a total function
u : {0, . . . , n 1} where each ai . We use string and word interchangeably. The set is the free monoid generated by which contains all the
strings. The length of a string u is the number of symbols contained in it and
denoted by |u|. The string with length zero (also referred to as empty string)
is denoted asS. The set of all words of length n over is denoted by n . We
define, = nN n where 0 = {} and, + = \ {}. A language L over
is a subset of .
A partial word u of length n over alphabet can be defined by a partial
function u : {0, . . . , n 1} . The partial word u contains some do not know
symbols known as holes along with the usual symbols. For 0 i < n, if u(i) is
defined, then we say i D(u) (the domain of u), otherwise i H(u) (the set of
holes). A word is a partial word without any holes. If u is a partial word of length
n over , then the companion of u is the total function u : {0, . . . , n 1}
{} defined by u (i) = u(i) if i D(u) and, u (i) = otherwise [4]. We
denote the language of partial words with arbitrary number of holes as p . The
set p is also a monoid under the operation of concatenation where serves as
the identity. Peter Leupold in [14] has given some connection between partial
words and languages in Chomsky hierarchy [17].

A. C. Nayak and K. Kapoor

The outline of the rest of the paper is as follows. The next section contains
some basic concepts and describes the hierarchy of languages of partial words. In
Section 3, we prove that the language of primitive partial words is not regular,
not linear and not deterministic context-free language. In Section 4, we give a
2DPDA automaton that accepts the language of primitive partial words and
partial Lyndon words.

The Language of Primitive Partial Words

A word u is said to be primitive if it cannot be represented as nontrivial power


of another word, that is, if u = v n then it implies u = v andn = 1. The language
containing all primitive words over an alphabet is represented as Q. The language
of primitive words plays a vital role in various fields such as coding theory, formal
languages and applications and combinatorics on words [20].
If w = xn and x is a primitive, then x is called as the primitive root of w.
Thus, for every total word there is a word which is considered as its root.
Lemma 1 ([20]). Every nonempty word w can be expressed uniquely in the
form w = xn , where n 1 and x is primitive.
Similarly, for each partial word a set of words can be considered
as root. For

example, a = {ab, a2 }. Thus, a ab and a a2 . Hence a = {ab, a}.


The language of primitive words Q has been extensively studied and many
facts have been proved about relation of Q with conventional formal language
classes. The language Q is known to be not regular [7], not linear [12], not DCFL
[16] but context sensitive language [13]. However, it is still an open question
whether the language of primitive words Q is context-free or not [19, 15, 8, 9].
We briefly mention two basic concepts, containment and compatibility, that
are required to extend the definition of primitivity to partial words [3]. If u and
v are two partial words of equal length then u is said to be contained in v, if all
elements in D(u) are also in D(v) and u(i) = v(i) for all i D(u). It is denoted
by u v.
The partial words u and v are called compatible if there exists a partial word
w such that u w and v w. The compatibility of u and v is denoted by u v.
Note that containment is not a symmetric relation where as compatibility is a
symmetric relation.
A partial word u is said to be primitive if there does not exists a word v such
that u v n , n 2. Note that if u is primitive and u v, then v is primitive
as well[3]. We denote Wi () as the set of words over alphabet with at most
i holes [4]. It is easy to see the following relation.
W0 () W1 () W2 () Wi ()
S
where W0 () = . We put W () = i0 Wi (). Let us denote the language
of primitive partial words with at most i holes as Qip . We know that Q Q =
where Q is the set of all non-primitive words. We denote the language of partial

On the Language of Primitive Partial Words

words with at most i holes as i and the language of partial words with arbitrary
number of holes as p over the alphabet {}. Thus, we define
p = 0 1 2 3
The root of a partial word w is defined as follows.

w = {p | p is primitive and total and there exists k such that w pk }

For a language L of partial words, we define L = { w | w L}.


Let us recall the relation between the finiteness of the set of root terms with
the set of regular languages.
Theorem 2 ([14]). A regular language has finite root if and only if it can be
described by a root term.
Theorem 3 ([14]). All context-free languages with finite root are regular.
We denote the language of primitive partial words as Qp . Therefore,
Qp = Q0p Q1p Q2p Q3p .
The language of partial words with at most i holes can be viewed in a similar
way as Chomsky hierarchy of formal languages. The following figure shows the
hierarchy of language of partial words.

i = Qip Qip
2

Q2p

1
Q1p

Q2p
0

Q0p = Q

Q1p

Fig. 1. The hierarchy of language of partial words

A. C. Nayak and K. Kapoor

In [5], a special language class of partial words CF p is defined, where


CF = {w | w is a partial word over {a, b} that has a critical factorization}
It has been proved that CF is not regular [2] but CF is context sensitive [6].
It is an open problem whether CF is context-free or not [5]. In this paper we
study a different language viz. the language of primitive partial words Qp and
its relation with the languages in Chomsky hierarchy.

Properties of the Language of Primitive Partial Words

In this section, we shall prove that the language of primitive partial words Qp
is not regular, not linear as well as not Deterministic Context-free Language
(DCFL). First we recall the pumping lemma for regular languages and linear
languages and also some of the properties of DCFL which are required to prove
our result.
Lemma 4 (Pumping Lemma for Regular Languages [10]). For a regular
language L, there exists an integer n > 0 such that for every word w L with
|w| n, there exist a decomposition of w as w = xyz such that the following
conditions holds.
(i) |y| > 0,
(ii) |xy| n, and
(iii) xy i z L for all i 0.
It is easy to observe that the language of primitive partial words with at
most i holes, Qip , is not regular. This is because pumping up a hole in a word
having at most i holes will give a word which is not in Qip . The following result
proves the claim in general. We use the similar idea as used in [7].
Lemma 5 ([7]). For any fixed integer k, there exist a positive integer m such
that the equation system (k j)xj + j = m, j = 0, 1, 2, . . . , k 1 has a nontrivial
solution with appropriate positive integers x1 , x2 , . . . , xj > 1.
Theorem 6. Qp is not regular.
Proof. Suppose that the language of primitive partial words Qp is regular. So
there exist a natural number n > 0 depending upon the number of states of
finite automaton for Qp .
Consider the partial word w = an am am b, m > n. Note that w is a primitive partial word over {}, where || 2 and a 6= b. Since w Qp
and |w| n, then it must satisfy the other conditions of pumping Lemma for
regular languages. So there exist a decomposition of w into x, y, z such that
w = xyz, |y| > 0 and xy i z Qp for all i 0.
Let x = ak , y = a(nj) , z = ajk am am b. Now choose i = xj and
since we know by Lemma 5 that for every j {0, 1, . . . , n 1}, there exists a positive integer xj > 1 such that xy xj z = ak a(nj)xj ajk am am b =
a(nj)xj +j am am b = am am am b (am b)3
/ Qp which is a contradiction.
Hence the language of primitive partial words Qp is not regular.
t
u

On the Language of Primitive Partial Words

Next, we prove that the language of primitive partial words Qp is not linear.
Let us recall the pumping lemma for linear languages.
Lemma 7 (Pumping Lemma for Linear Languages [11]). Let L be a linear
language. There exists an integer n such that any word p L with |p| n, admits
a factorization p = uvwxy satisfying the following conditions.
(i) uv m wxm y L m N,
(ii) |vx| > 0, and
(iii) |uvxy| n.
Theorem 8. The language of primitive partial words Qp is not linear.
Proof. Suppose that the language Qp is linear. Let n > 0 be an integer. Consider
s = an bm am n Qp be a partial word and m > n and a 6= b. Since |s| n,
so there exists a factorization of s into u, v, w, x, y such that it satisfies the
conditions of pumping Lemma for linear languages.
Let u = ai , v = aj , x = bk , y = bl such that i + j + k + l n, j + k > 0,w =
r m m q
a b a and i, j, k, l, r, q 0. So 0 < i + j + k + l n. Also, i + j + r =
s + k + l = n. Now it must be that uv t wxt y Qp for all t N. However,
uv t wxt y = ai+tj+r bm am q+tk+l = am bm am m (am bm )2
/ Qp . It will happen
that i + tj + r = m = q + tk + l because if we consider the left hand side equality
we get,
i + tj + r = m
i + j + r + (t 1)j = m
n + (t 1)j = m
mn
+1
t=
j
It is true that there will be some integers m, n and j such that t is an integer.
So, uv t wxt y = am bm am m (am bm )2 for some t. Hence Qp is not linear.
t
u
Next we prove that the language of primitive partial words is not deterministic context-free language. We will use the closure properties of DCFL. In
particular, the set of DCFLs is closed under complementation.
Theorem 9. Qp is not deterministic context-free.
Proof. Suppose that the language of primitive partial words Qp is deterministic context-free. As the set of DCFLs is closed under complementation, the
complement of Qp , that is, Qp (set of partial periodic words) is also a DCFL.
Also, we know that the intersection of a DCFL with a regular language is also
DCFL. Therefore, Qp {a b a b } = {an bm an bm | m, n N } is also a DCFL.
But the language {an bm an bm | m, n N } is not a Context Free Language
(CFL) which can be proved by using pumping lemma for CFLs. Hence it is
a contradiction that the language of primitive partial words Qp is a DCFL.
Therefore the language Qp is not deterministic context-free.
t
u

A. C. Nayak and K. Kapoor

2DPDA For Qp

In this section we define Partial Lyndon words denoted by Lp and prove that
the language of primitive partial words with 1-hole Q1p is accepted by Two-way
Pushdown Automaton (2DPDA). Next, we show that the language of primitive
partial words with at least two holes is accepted by a 2DPDA. Let us recall the
basic definition of 2DPDA and some of the results which are required in the
proof.
A 2DPDA is the same as ordinary DPDA but with an additional ability to
move its input head in both directions. We use Z0 as the bottom of stack and
one left end and one right end symbol as ` and a, respectively.
Definition 10 ([1]). A 2DPDA, P , consists of 7-tuple
P = hS, I, T, , s0 , Z0 , st i,
where
(a)
(b)
(c)
(d)

S is the states of the finite control.


I is the input alphabet (excluding ` and a).
T is the pushdown list alphabet (excluding Z0 ).
is a mapping on (S{st })(I {`, a})(T Z0 ). The value of (s, a, A) is,
0
0
0
if defined, is of one of the forms (s , d, push B), (s , d), or (s , d, pop) where
0
s S, B T and d {1, 0, +1}. We assume a 2DPDA makes no moves
from the final state st and certain other states may have no moves defined.
(e) s0 S is the initial state of the finite control.
(f ) Z0 is the special symbol that indicates the bottom of the pushdown list.
(g) st is one of the designated final state.
In the above definition (s, a, A) indicates the transition function of the machine when it is in the state s, the input head scans a symbol a, and the pushdown
list has the symbol A on top. There are three possible actions.
0
(s , d, push B) provided B 6= Z0
0
(s, a, A) = (s , d);
0
(s , d, pop)
if A 6= Z0
0

In these transitions, the machine enters in state s and moves its input head in
the direction d, where d = 1, +1 or 0, to indicate to move its head to left, right
or remain stationary, respectively. The operation push B means add the symbol
B on the top of pushdown list and pop means remove the topmost symbol from
the pushdown list.
Definition 11 ([3]). Let k and l be two positive integers satisfying k l.
For 0 i < k + l, we define the sequence of i relative to k, l as seqk,l (i) =
(i0 , i1 , . . . , in+1 ) where
(a) i0 = i = in+1

On the Language of Primitive Partial Words

(b) For 1 j n, ij 6= i

(c) For 1 j n + 1, ij is defined as ij =

ij1 + k
ij1 l

if 6= ij1 < l,
otherwise.

Definition 12 ([4]). Let k and l be two positive integers satisfying k l and


let z be a partial word of length (k + l). We say that z is (k,l)-special if there
exists 0 i gcd(k, l) such that seqk,l (i) = (i0 , i1 , . . . , in+1 ) contains at least
two positions that are holes of z while z(i0 )z(i1 ) . . . z(in+1 ) is not 1-periodic.
Lemma 13 ([4]). Let u and v be two nonempty partial words such that uv
contains at most one hole. The words u and v commute if and only if they are
contained in powers of the same words, that is, uv vu if there exists a word w
such that u wm and v wn for some integer m, n.
Lemma 14 ([4]). Let u and v be two nonempty partial words such that |u| |v|.
If uv vu and uv is not (|u|, |v|)-special, then there exists a word w such that
u wm and v wn for some integer m, n.
Definition 15. A partial word w + is primitive if and only if it is not
contained in two non-empty commuting words, that is, w Qp w =
6 and
u, v : (w uv and w vu {u, v}).
Several facts about primitive partial words are known; we recall some of them
which will be useful below.
Lemma 16 ([4]). Let u be a partial word with one hole. Then u is primitive if
and only if uu xuy for some partial words x, y implies x =  or y = .
Lemma 17 ([4]). Let u be a partial word with at least two holes.
I. If uu xuy for some partial words x, y implies x =  or y = , then u is
primitive.
II. If uu xuy for some nonempty partial words x and y satisfying |x| |y|,
then the following hold:
(a) If |x| = |y|, then u is not primitive.
(b) If u is not (|x|, |y|)-special, then u is not primitive (it is contained in
a power of a word of length |x|).
(c) If u is (|x|, |y|)-special, then u is not contained in a power of a word of
length |x|.
Definition 18. A primitive partial word w is a partial Lyndon word if and only
if it is minimal in its conjugate class (with respect to lexicographic order where
we assume a < b < < ).
Lemma 19. The cardinality of the class of conjugates of a primitive partial
word w of length n is n.

A. C. Nayak and K. Kapoor

Proof. Let w be a primitive partial word over the alphabet {} of length


n. So every new partial word which is generated by cyclic permutation of w is
a conjugate of w. Since w is primitive and we know that the primitive word
is closed under cyclic permutation [20], then each of the conjugate of w is also
primitive. So the number of such conjugates of w is n. This proves our claim. t
u
In [18], a 2DPDA automaton for the language of primitive partial words Qp
is presented. We show that a 2DPDA can also be constructed for Q1p by using
the same idea used in [18].
Theorem 20. Q1p is accepted by 2DPDA.
Proof. The informal idea is as follows: Let P be a 2DPDA. Let ` w a be the
input partial word with at most one hole augmented with two end markers. If
w = , P rejects. If not, then P moves its head towards a . P skips the last
symbol of w and pushes the remaining symbols of w onto its stack. Again P
moves to a, pushes all symbols of w onto the stack and pops one symbol. If we
0
write the input w = xw y with x, y {} (assuming |w| 2), the contents
of the stack will be:
0
0
w yxw Z0
The automaton P compares w with the pushdown contents one symbol at
a time. If the symbols match (assuming that a 6= b and = a for any a ),
the head moves right and P pops the pushdown. If in this way the entire word
0
0
w is completely scanned and is compatible to a factor of w yxw P rejects (by
assuming that the hole is compatible to any of the symbol in ).
If a mismatch is encountered P moves the input head back to ` and pushes
the symbols scanned during this move. Then P pops the topmost symbol from
the pushdown and repeats the process. If the pushdown becomes empty then P
accepts.
Hence the 2DPDA accepts w if and only if w is not compatible to a factor of
0
0
w yxw by using Lemma 16.
t
u
Observe that the above proof method does not work for the set of primitive
partial words with at least two holes. A counter example is given below.
Example 21. Let w = ab. So by the method described in Theorem 20 we have
0
0
0
x = a, y = and w = b. Now w yxw = bab and w is compatible to a
0
0
factor of w yxw . We know that if a partial word u is contained in another word
v and u is primitive then v is also primitive, that is, if v is not primitive then
u is also not primitive. We can see that a partial word is contained in a set of
words. If w = ab, then w {aaba, aabb, abba, abbb}.
Next we extend the Theorem 20 to show that the language Qp with at least
two holes is also recognizable by a 2DPDA by using a similar idea as in proof of
Theorem 20.
Theorem 22. The language of primitive partial words Qp with at least two holes
is accepted by a 2DPDA.

On the Language of Primitive Partial Words

Proof. Let A be a 2DPDA. Let ` w a be a partial word augmented with two


end markers which is the input to A. The informal idea is as follows: since w is
a partial word, first compute the set of words in which w is contained. Let w
is contained in the set {w1 , w2 , . . . , wk }. Now we will give input each of the wi ,
1 i k to A one at a time and if at least one of the wi is rejected by A, then
the partial word w is rejected and stop the process; otherwise continue the same
process.
If w = then A rejects.
Otherwise it computes the set of words {w1 , w2 , . . . , wk }. Suppose ` w1 a is the
input to A. Next, A advances its input head to a. Then A skips the last symbol
of w1 and pushes the remainder of w1 onto its pushdown store. Then A moves
its head to a again and pushes all of w1 onto its store and pop one symbol. If
00
we write w1 = xw y where x, y with |w1 | 2 the contents of pushdown are
of the form:
00
00
w yxw Z0
The automaton A compares w1 with the pushdown contents symbol by symbol. If the symbols match, the head moves right and A pops the pushdown. If in
this way w1 is completely scanned A rejects. If a mismatch occurs A moves the
input head back to ` and pushes the symbols scanned during this move. Then
A pops one symbol and repeats the process. If the pushdown is empty then A
chooses the next word w2 from the set and continue.
The automaton A accepts the partial word w if and only if each of the word
in the set {w1 , w2 , . . . , wk } is accepted by A.
t
u
Lemma 23. The language of partial Lyndon words Lp is accepted by 2DPDA.
Proof. The key idea is based on that a partial Lyndon word w is smaller than
each of its proper right factors. Let us describe an automaton B that accepts Lp .
Similar to the previous theorem, let ` w a be the input where w {}. B
scans the input w and push the symbols of w into the pushdown while scanning.
Then B compares the topmost symbol of stack with w. If B encounters a symbol
a in the stack and symbol b in w with a < b (assuming the comparison a < b <
. . . < ) or reach the bottom most symbol Z0 , then it rejects.
If the symbols in w and the topmost symbol in stack matches, in particular
a dont know symbol matches only with a dont know symbol, then B pops the
top symbol of stack and moves its head one position right. If the symbol on the
stack is greater than the symbol in the input then B returns to ` and push the
symbols onto stack during the move. If B reached the symbol ` in the input and
some symbols are left in the pushdown then B accepts. The procedure is then
repeated.
t
u

References
1. Aho, A.V., Hopcroft, J.E.: Design & Analysis of Computer Algorithms. Pearson
Education India (1974)

10

A. C. Nayak and K. Kapoor

2. Blanchet-Sadri, F., Zhang, J.: On the critical factorization theorem. Preprint


3. Blanchet-Sadri, F.: Primitive partial words. Discrete Applied Mathematics 148(3),
195213 (2005)
4. Blanchet-Sadri, F.: Algorithmic combinatorics on partial words. CRC Press (2007)
5. Blanchet-Sadri, F.: Open problems on partial words. In: New Developments in
Formal Languages and Applications, pp. 1158. Springer (2008)
6. Blanchet-Sadri, F., Davis, C., Dodge, J., Mercas, R., Moorefield, M.: Unbordered
partial words. Discrete Applied Mathematics 157(5), 890900 (2009)
7. D
om
osi, P., Horv
ath, G.: The language of primitive words is not regular: two
simple proofs. Bulletin of European Association for Theoretical Computer Science
87, 191194 (2005)
8. D
om
osi, P., Horv
ath, S., Ito, M., K
aszonyi, L., Katsura, M.: Formal languages
consisting of primitive words. In: Fundamentals of Computation Theory. pp. 194
203. Springer (1993)
9. D
om
osi, P., Ito, M., Marcus, S.: Marcus contextual languages consisting of primitive words. Discrete Mathematics 308(21), 48774881 (2008)
10. Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to automata theory, languages, and computation. ACM SIGACT News 32(1), 6065 (2001)
11. Horv
ath, G., Nagy, B.: Pumping lemmas for linear and nonlinear context-free languages. arXiv preprint arXiv:1012.0023 (2010)
12. Horv
ath, S.: Strong interchangeability and nonlinearity of primitive words. In:
Proc. Workshop AMAST Workshop on Algebraic Methods in Language Processing.
vol. 95, pp. 173178 (1995)
13. Kunimochi, Y.: A context sensitive grammar generating the set of all primitive
words. In: Algebras, Languages, Computations and their Applications. vol. 1562,
pp. 143145 (2007)
14. Leupold, P.: Languages of partial words. Grammars 7, 179192 (2004)
15. Leupold, P.: Primitive words are unavoidable for context-free languages. In: Language and Automata Theory and Applications, pp. 403413. Springer (2010)
16. Lischke, G.: Primitive words and roots of words. arXiv preprint arXiv:1104.4427
(2011)
17. Miller, G.A., Chomsky, N., Luce, D.R., Bush, R.R., Galanter, E.: Handbook of
mathematical psychology. Handbook of Mathematical Psychology (1963)
18. Petersen, H.: The ambiguity of primitive words. In: Symposium on Theoretical
Aspects of Computer Science, pp. 679690. Springer (1994)
19. Petersen, H.: On the language of primitive words. Theoretical Computer Science
161(1), 141156 (1996)
20. Shallit, J.: A second course in formal languages and automata theory. Cambridge
University Press (2008)

You might also like