CFL Pumping Lemma
CFL Pumping Lemma
CFL Pumping Lemma
Similar to regular-language PL, but you have to pump two strings in the middle of the string, in tandem (i.e., the same number of copies of each). Formally:
Outline of Proof of PL
Let there be a Chomsky-normal-form CFG for L with m variables. Pick n = 2m. Because CNF grammars have bodies of no more than 2 symbols, a string z of length n must have some path with at least m + 1 variables. Thus, some variable must appear twice on the path.
Compare with the DFA argument about a path longer than the number of states.
Focus on some path that is as long as any path in the tree. In this path, we can find a duplication of some variable A among the bottom m + 1 variables on the path.
Let the lower A derive w and the upper A derive vwx.
vx { I. By repeatedly replacing the lower A's tree by the upper A's tree, we see uviwxiy has a parse tree for all i > 1.
And replacing the upper by the lower shows the case i = 0; i.e., uwy is in L.
n and
S A
A A
S A A
w u y
u v v
x y x
Another formulation
Consider the derivation S* uAy*uvAxy*uvwxy
S
* A vAx
v A x
A path in a binary tree representing a derivation for a grammar in CNF either is empty or consists of a node, one of its descendents, and all nodes in between. Length of the path is the number of nodes it contains Height of a binary tree is the length of its longest path.
LEMMA: For any h u 1, a binary tree having more than 2h leaf nodes, must have a height greater than h + 1
* Aw
w
@ uviwxiy L
Pumping lemma constant for CFLs is 2n+1 where n is the number of variables in the grammar in CNF Why? Because the derivation tree for a sufficiently long string must have a height of at least n + 2
It has more than 2n leaf nodes, and therefore its height is greater than n + 1 If n non-terminals, string must have tree with height n+2 to be sufficiently long for the PL to be applied
Height of n+1 gets us to the level of the bottom-most non-leaf nodes (nodes labeled with a non-terminal)
S C1 C3 C4 A B C2
z = (ab)(b)(ab)(b)(a) =u v w x y
Root of vwx
C5 C6 a C7 C9 b
Cannot overlap w
@|vx| > 0
b A
C8
Consider a leaf and the n + 1 nodes above it: since there are only n variables, one must appear twice.
Example
k2
Example
L = {0 | k is any integer} is not a CFL.
Suppose it were. Then let n be the PL constant for L. Consider z = 0n2. We can write z = uvwxy, with |vwx| e n and |vx| > 0. Then uvvwxxy should be in L. But n2 < |uvvwxxy| e n2 + n < (n + 1)2, so there is no perfect square that |uvvwxxy| could be. By "proof by contradiction, L is not a CFL.
Example
L = {ww | w {0,1}*} is not a CFL. Suppose it were. Then let n be the PL constant for L. Choosing a string is less obvious for this language.
Try z = 0n10n1. But it can be pumped by dividing as follows:
0n1 0n1 000000 0 1 0 0000001 u v x y z
Use PL condition that the string can be pumped by dividing into z = uvwxy, where |vwx| e n. vwx must straddle the midpoint of z. Otherwise, if only in the first half of z, pumping up to uv2wx2y moves a 1 into the first position of the second half, so it cannot be of form ww. If in the second half, a 0 is moved into the last position of the first half, so cannot be of form ww. If vwx straddles the midpoint of z, pumping z down to uwy yields 0n1i0j1n, where i and j cannot both be n. This string cannot be of form ww. Contradiction!
Example
L= {aibjck | i < j < k} is not a CFL
Suppose it were. Then let n be the PL constant for L.
Example
L = {aibjck | 0 e i e j e k} is not a CFL
Suppose it were. Then let n be the PL constant for L. Consider z = anbncn. We can write z = uvwxy, with |vwx| e n and |vx| > 0, and uviwxiy L for every i u 0.. When both v and x contain only one type of symbol, v does not contain both as and bs and bs or cs and the same holds for x. Must divide into three sub-cases:
1. 2. No as. Then try pumping down to obtain uv0wx0y = uwy. Contains too few bs or cs. No bs. Then either as or cs must appear in v or x because both cant be the empty string. If as appear, then uv2wx2y contains more as than bs. If cs appear, then uv0wx0y contains more bs than cs. No cs. The string uv2wx2y contains more as or more bs than cs.
Consider z = anbn+1cn+2. We can write z = uvwxy, with |vwx| e n and |vx| > 0, and uviwxiy L for every i u 0.
This time must pump down as well as pump up. First we consider the case where vx contains at least one a. Then since |vwx| e n, vx can contain no c's. Therefore, uv2wx2y has at least n + 1 a's and exactly n + 2 c's, which is impossible for strings in L. If vx contains no a's, then it must contain either b or c. In this case, uv0wx0y = uwy has either fewer than n + 1 b's or fewer than n + 2 c's, but in either case exactly no a's. This is also impossible for strings in L. By "proof by contradiction," L is not a CFL.
3.
When either v or x contain more than one type of symbol, uv2wx2y will not contain symbols in the correct order. By "proof by contradiction," L is not a CFL.
Example
L = {xyx | x,y {a,b}* and |x| 1}is not a CFL
Suppose it were. Then let n be the PL constant for L.
Let z = anbnanbn. Then z = uvwxy for some u, v, w, x, and y, satisfying |vx| > 0, |vwx| e n, and uviwxiy L for every i u 0.
Suppose that vx contains either only a's from the first group or only b's from the last group. Then uv2wx2y is either an+i bnanbn or anbnanbn+i for some 0 < i e n, and in neither case can this string be in the form xyx for any x with |x| > 0. Otherwise, vx contains either a b from the first group or an a from the second. In this case uv0wx0y is either ai bjakbn or anbi ajbk where in either case i and k are positive and j < n. Neither of these strings can be in the form required for L either. By "proof by contradiction," L is not a CFL.
Say we want to check i { k. As long as 0's come in, count them on the stack. Ignore 1's. Pop the stack for each 2. As long as we have not just exposed the bottom-of-stack marker when the first 3 comes in, accept, and keep accepting as long as 3's come in. But we also have to accept, and keep accepting, as soon as we see that the input is not in L(0*1*2*3*).
Another Example
If L1 and L2 are CFLs, then if the CFLs are closed under complementation, then the language
L ! L1 L2
Is context free because the CFLs are closed under union But by DeMorgans Law,
L ! L1 L2
It would then follow that CFLs are closed under intersection. But we have already seen that they are not!
CYK Algorithm
Start with a CNF grammar for L. Build a two-dimensional table:
Row = length of a substring of w. Column = beginning position of the substring. Entry in row i and column j = set of variables that generate the substring of w beginning at position j and extending for i positions. These entries are denoted Xj,i+j-1 i.e., the subscripts are the first and last positions of the string represented, so the first row is X11,X22, ,Xnn, the second row is X12,X23, ,Xn-1,n, and so on.
Table
The horizontal axis corresponds to the positions of the string w = a1a2an Table entry Xij is the set of non* terminals A such that A aiai+1aj
We are particularly interested in whether S is in * Xn1 because that is the same as saying S w (that is, w is in L)
Basis: (row 1) Xii = the set of variables A such that A p a is a production, and a is the symbol at position i of w.
The grammar is in CNF, therefore the only way to derive a terminal is with a production of the form A p a, so Xii is the set of non-terminals such that A p ai is a production of G
Example
We'll use the algorithm to determine if the string w = aabbb is in the language generated by the grammar S p AB A p BB | a B p AB |b Note that w11 = a, so X11 is the set of all variables that immediately derive a, that is X11 = {A}. Since w22 = a, we also have X22 = {A}, and so on to get
X11 = {A}, X22 = {A}, X33 = {B}, X44 = {B}, X55 = {B}
Compute X12 : since X11 = {A} and X22 = {A}, X12 consists of all variables on the left side of a production whose right side is AA. None, so X12 is empty. Next X23 = {A | A pBB, B X22, B X33} so the required right side is AB, thus X23 = {S,B} Rest is easy:
X12 = *, X23 = {S,B}, X34 = {A}, X45 = {A}, X13 = {S,B}, X24 = {A}, X35 = {S,B}, X14 = {A}, X25 = {S,B}, X15 = {S,B}
S p AB A p BB | a B p AB |b
a 1,1 2,2
a 3,3
b 4,4
b 5,5
A
1,2
A
2,3
B
3,4
B
4,5
1,3
2,4
3,5
1,4
2,5
1,5
S p AB A p BB | a B p AB |b
1,1 2,2 3,3 4,4 5,5
a a b b b
B
3,4
Another Example
B B
A
1,2
A
2,3
B
4,5
B A A B A S, B S,B S,B
S, B
1,3 2,4
A
3,5
S, B
1,4
A
2,5
S, B
A
1,5
S, B
S, B
X p aXb | ab
b 5,5 5,6
b 6,6
S p AB | BC A p BA | a B p CC | b C p AB |a
b 1,1 1,2 1,3 1,4 1,5 2,2 2,3 2,4 2,5 a 3,3 3,4 3,5
Another Example
Test for string baaba
a 4,4 4,5 a 5,5 a
Preview of Undecidable CFL Problems Is a given CFG ambiguous? Is a given CFG inherently ambiguous? Is the intersection of two CFLs empty? Are two CFLs the same? Is a given CFL equal to *, where is the alphabet of the language?
Context-Sensitive Grammars
The next grammar type, more powerful than CFGs, is a "somewhat restricted" grammar A grammar is context-sensitive if all productions are of the form x p y, where x,y are in (V T)+ and |x| |y|
Fundamental property:
grammar is non-contracting--i.e., the length of successive sentential forms can never decrease
Regular Languages
Finite Automata
Why "context-sensitive"?
All productions can be rewritten in a normal form xAy p xvy Effectively, "A can be replaced by v only in the context of a preceding x and a following y"
Example
CSG for {anbncn | n 1} S Ab Ac bB aB p abc | aAbc p bA p Bbcc p Bb p aa | aaA
S aAbc abAc abBbcc aBbbcc aaAbbcc aabAbcc aabbAcc aabbBbccc aabBbbccc aaBbbbccc aaabbbccc
Linear-Bounded Automata
A limited TM in which tape use is restricted
Use only part of the tape occupied by the input I.e., has an unbounded tape, but the amount that can be used is a function of the input
Restrict usable part of tape to exactly the cells taken by the input
A and B are "messengers"- an A is created on the left, travels to the right to the first c, creates another b and c. Then sends B back to create the corresponding a. Similar to the way one would program a TM to accept the language.
10
11