Nothing Special   »   [go: up one dir, main page]

Lecture 7 - 8 & 9 - Chapter 4

Download as pdf or txt
Download as pdf or txt
You are on page 1of 50

Chapter 4

Context Free Language

Esubalew A.
Contents
 Context-free Grammars
 Derivation, Parse Tree, Ambiguity, Parsing Expression
 Transformation Methods
 Useful Substitution Rule, removing useless-productions,
Removing l-productions, removing unit productions
 Normal Forms
 Chomsky Norm Form (CNF)
 Grieback Normal Form (GNF)
 Pumping Lemmas for CFLs
 Properties CFLs
 Decision Algorithms for CFGs
 The design of the grammar is an initial phase of the design of a
programming language.
 CFG is used to specify the structure of legal programs.
 Formally a CFG G = (N, T, S, P), where:
 All productions are in the form A  X [A  N, X  (NUT)*]
 Each production must have a single non-terminal on its left hand
side.
 Examples:
 S  Sa|SS | A, A  aA|aa is CFG
 S  aA|ab, aA ab|l is not CFG
 A language L is said to be context-free iff there is a CFG G such
that L=L(G).
 Every regular language is a context-free language but every
context-free is not a regular language .
 This is because CFG is more expressive than RE
 L = {anbn | n>=1}, is an example language that can be expressed by
CFG but not by RE
 Context-free grammar is sufficient to describe most
programming languages.
 Example:
E E+E | E–E | E*E | E/E | -E
E (E)
E  id
 Where
 T= {+, -, *, / (,), id}, N= {E}
 S = {E}
 Production are shown above
 BNF (Backus Normal Form or Backus–Naur Form) a
notation techniques for context-free grammars, often used to
describe the syntax of languages used in computing
 Has many extensions and variants
 Extended Backus–Naur Form (EBNF)
 Augmented Backus–Naur Form (ABNF).
 A BNF specification is a set of derivation rules, written as
 <symbol> ::= expression
 BNF for valid arithmetic expression
<expr> ::= <expr> <op> <expr>
<expr> ::= ( <expr> )
<expr> ::= - <expr> Exercise
<expr> ::= id Write BNF nation for valid
<op> ::= + | - | * | / C identifier names
 A sequence of replacements of non-terminal symbols to obtain
strings/sentences is called a derivation
 If we have a grammar E  E+E then we can replace E by E+E
 In general a derivation step is A   if there is a production
rule A in a grammar
 where  and  are arbitrary strings of terminal and non-terminal
symbols
 Derivation of a string should start from a production with start
symbol in the left
 is a sentential form (terminals & non-terminals Mixed)
S   is a sentence if it contains
only terminal symbols
 Derivate string –(id+id) from G1
E  -E  -(E)  -(E+E)  -(id+E)  -(id+id) (LMD)
OR
E  -E  -(E)  -(E+E)  -(E+id)  -(id+id) (RMD)
 At each derivation step, we can choose any of the non-terminal in the
sentential form of G for the replacement.

 If we always choose the left-most non-terminal in each derivation


step, this derivation is called as left-most derivation(LMD).

 If we always choose the right-most non-terminal in each derivation


step, this derivation is called as right-most derivation(RMD).
 A parse tree can be seen as a graphical representation of a
derivation
 Inner nodes of a parse tree are non-terminal symbols.
 The leaves of a parse tree are terminal symbols.

E  -E E
 -(E) E
 -(E+E)
E
- E - E - E

( E ) ( E )

E E E + E
- E - E
 -(id+E)  -(id+id)
( E ) ( E )

E + E E + E

id id id
 An ambiguous grammar is one that produces more than one
LMD or more than one RMD for the same sentence.
E  E+E E  E*E
 id+E  E+E*E
 id+E*E  id+E*E
 id+id*E  id+id*E
 id+id*id  id+id*id
E
E
E * E
E + E

id * E + E id
E E
id id
id id
 For the most parsers, the grammar must be unambiguous.
 If a grammar unambiguous grammar then there are unique
selection of the parse tree for a sentence
 We should eliminate the ambiguity in the grammar during the
design phase of the compiler.
 An unambiguous grammar should be written to eliminate the
ambiguity.
 We have to prefer one of the parse trees of a sentence
(generated by an ambiguous grammar) to disambiguate that
grammar to restrict to this choice.
Parsing Expression
 BNF (Backus Normal Form or Backus–Naur Form) a
notation techniques for context-free grammars, often used to
describe the syntax of languages used in computing
 Has many extensions and variants
 Extended Backus–Naur Form (EBNF)
 Augmented Backus–Naur Form (ABNF).
 A BNF specification is a set of derivation rules, written as
 <symbol> ::= expression
 BNF for valid arithmetic expression
<expr> ::= <expr> <op> <expr>
<expr> ::= ( <expr> )
<expr> ::= - <expr>
<expr> ::= id
<op> ::= + | - | * | /
Parsing Expression…
 Example: Write BNF nation for valid C++ identifier names. Check
if _var2is valid identifier name.
 Answer:
 < id> : = < letter > < rest > < underscore > < rest >
 < rest > :: = < letter >< rest >|< underscore >< rest >|< digits > < rest
>| l
 <digits> :: = a|b|c…|z|A|B|…|Z
 <digits> ::= 0|1|…|9
 <underscore> :: = -
 Changing it to CFG :
I  LR|UR
R  LR|UR|DR|l
L  a|b|c…|z|A|B|…|Z
D  0|1|…|9
U_
Parsing Expression…
 To check if _var2 is valid identifier name show derivation or
there
Transformation Methods
 The productions of context-free grammars can be coerced into
a variety of forms without affecting the expressive power of the
grammar.

 In this course we will see four types of Transformation


methods
 Substitution Rule
 Removal of useless production
 Removal of l–production
 Removal of Unit Production
Useful substitution rule
 Theorem:
 Let G= (N, T, S, P) be a CFG. Suppose that P contains a
production of the form A  x1Bx2.
 Assume that AÎB and that B  y1|y2|…|yn is the set of all
productions in P which have B as the left side.
 Then we can find a CFG G’= (N,T,S, P’ ) where P contains all
productions in P except A  x1Bx2 which is replaced by A 
X1Y1X2 | X1Y2Y2|…|X1YnX2, such that L(G) =L(G).

 Useful substitution rule may lead to fewer number of terminals


and productions
Useful substitution rule…
 Example: consider G=({A,B},{a,b,c},A,P) with productions
A  a |aaA|abBc
B  abbA|b
 Using the substitution rule B on A  abBC can be replaced by
A  ababbAC & A  abbc
 Then G’ = ({A,B}, {a,b,c},A,P’) with productions
A  a |aaA|ababbAc| abbc
B  abb A|b

 Note that the productions B  abbA/b no longer play a part


in any derivation.
Removing Useless productions
 Definition:
 Let G = (N,T,S,P) be a CFG. A variable AeN is said to be usefull iff there
* *
is at least one weL(G) such that S xAy w with x, y e (NUT)*. A
variable that is not useful is called useless.
 Example 1: Given a CFG with productions S  A, B  bA,
A  Aa|l.
 The variable B is useless and so is the production B  Ba.
 Example 2: Eliminate useless symbols and productions from a CFG.
G = ( {S,A,B,C},{a,b},S,P) with P given as S  as|aAa|CB, A 
a, B  b, C  abc
 Answers:-B & C are useless symbols
- S  CB, B  b, & C  abC are useless productions.
Removing l-productions
 Let G= (N,T,S,P) be a CFG and let A  µ its production
 If µ = l, A  µ is called l- production
 If µÎl, for Aµ then G is said to be l-free grammar
*
 For any AeN , A  l then A is called l -generating or
nullable non-terminal.

 Theorem:
 Let G = (N,T,S,P) be any CFG with l –productions, then
there exists an equivalent l-free grammar G’ = (N,T,S,P’)
such that L(G’) = L(G)/{l }
Removing l-productions…
 Proof
 First find all nullable variables. Say Nn
 For all productions A  l, put A into Nn
 Repeat the following step until no further variables are added
to Nn.
For all productions B  A1 A2 ….An, where A1 A2 ……An are in
Nn, put B into Nn.
 Construct P’. How?
 for every production in P with one or more nullable appearing
in the right hand side of the production add to P’ l- free
production by eliminating one or more nullables.
 Any production of the form A  l will not be in P’
 Add any other production from P to P’.
Removing l-productions
 Example: Remove l-productions from the following CFG
S  AaC
A  a| l
C  b| l
 Answer:
 Nn = {A,C}.
 P’ contains the following productions
 S  Aac|Aa|aC|a, A  a, and C  b

 Exercise: Remove l-productions from the following CFG


S  Abac, A  BC, B  b| l, C  D|l, D  d
Removing Unit productions
 Any production of CFG of the form A  B, where A,B e N is
called unit production.
 Theorem: Let G = (N,T,S,P) be a CFG without l-production
then there exists a CFG G’ = (N’,T,S,P’) that does not have any
unit-productions and that is equivalent to G.
 Proof
*
 First for each A find all non-terminals B such that A B (use
dependency graph)
 Put all non-unit production from P to in P’
 For all B add A Y1 |Y2|……|Yn, where B  Y1
|Y2|……..Yn are productions in P with B on left.
Removing Unit productions
 Example: Remove all unit productions from, S  Aa|B, B 
A|bb, A  a|ac|B
 Solution:
 Draw dependency graph from the unit productions

 Add the original non-unit productions to P’


 S  Aa, B  bb, A  a|ac
 Add new production.
 From the graph we can see that S B, S A, B A, A B.
 Then the new rules are S  bb|a|bc, A  bb, B  a|bc
Removing Unit productions
 Finally, all productions P’ are
S  Aa | bb|a|bc
B  bb| a|bc
A  a|ac| bb
 Exercise: Remove all unit productions from a grammar with
the following productions
S A
A  BAB |aC|a|AB|BA| A,
B  C|A
Bb
C  B|ab
Normal Forms
 When working with context-free grammars, it is often
convenient to have them in simplified form.
 There are at least two importances of Normal Forms.
 Simplicity of proofs
 There are plenty of proofs around CFGs, including reducability and
equivalence to automata. Therefore, normal forms can be helpful
there.
 Enables parsing
 Normal forms can give us more structure to work with, resulting in
easier parsing algorithms, though PDAs can be used to parse words
with any grammar (this is often inconvenient)
 Among the simplified normal forms we will see
 Chomsky normal Form (CNF)
 Griebach Normal Form (GNF)
Chomsky normal Form (CNF)
 Let G = (N,T,P,S ) be CFG and A µ be production
 If µeN, then A µ is unit production.
 If |µ| >1, contains Substrings of terminals then A µ is called
a secondary production.
 If µ contains more than or equal to three non-terminals, then it
is called a tertiary production.

 Example: Given a context free grammar with following


productions: S  A|Abc|abc, A  ABC
S  A is unit production
S  Abc|abc are secondary productions
S  ABC is a tertiary production.
Chomsky normal Form (CNF)…
 Definition: A context free grammar is said to be in chomsky
normal form if all of its production rules are of the form: A 
a or A  BC Where A and B are none terminal and a is terminal

 Theorem: Any l-free context free language L can be


generated by CFG in CNF
 Proof: Let G= (N,T,P,S) be a CFG such that L=L(G). We
want to construct G’ in CNF such that L=L(G’).
1. Eliminate any unit production from p (previously seen)
2. Eliminate all secondary production
 For each aeT in the substring on the R.H.S, replace a by a new
terminal An and a new production Aa  a
Chomsky normal Form (CNF)…
3. Eliminate all tertiary productions
 For each production A  B1B2--- Bm, m>3 (Bi’s are non-
terminals in N and those introduced in step 2) replace by
 A  B 1B 1’
 B i  B 2B 2’
 .…
 Bm2  Bm-1Bm

 Example:- convert the following CFG to CNF


S  A|ABA
A  aA|a|B
B  bB|b
Chomsky normal Form (CNF)…
 Solution
1. Eliminate all unit production
 The grammar without unit- production
 S  aA|a|ABA|bB|b
 A  aA|a|bB|b
 B  bB|b
2. Eliminate all secondary production
 aA and bB are secondary productions: substitute a by C and b by D,
add productions C  a, and D  b. Now the grammar is
 S  CA|a|ABA|DB|b
 A  CA|a|DB|b
 B  DB|b
 Ca
 Db
Chomsky normal Form (CNF)…
 Solution ….
3. Eliminate all tertiary production
 S  ABA is tertiary productions.
 Replace it with productions:
S  AB’
B’  BA.
 Therefore, the grammar in CNF
 S  CA|a|AB’|DB|b
 A  CA|a|DB|b
 B  DB|b
 B’  BA
Ca
Db
Chomsky normal Form (CNF)…
 Exercises
 Change the grammar G with the following rules to its
equivalence Chomsky Normal Form.
S  ASA | aB
A  B |S
Bb|l
Griebach Normal Form (GNF)
 The normal form was established by Sheila Greibach and it
bears her name.
 A CFG is said to be in GNF if all productions have the form
A  aX, where aeT and xeT*
 A non-strict form allows one exception to this format restriction
for allowing the l–productions to be a member of the described
language.
 Theorem (GNF) - Any l-free CFL can be generated by a CFG
in GNF.

 A CFG in GNF should not contain left-recursive grammars.


 What is left-recursive grammars?
Griebach Normal Form (GNF)..
 A grammar is left recursive if it has a non-terminal A such that
there is a derivation A  A for some string 
 Eliminating Immediate Left Recursion
A A  |  where  does not start with A

A   A’ A   A’ | 
A’   A’ |  OR A’   A’ | 
In general,
A  A 1 | ... | A m | 1 | ... | n where 1 ... n do not start with A

 eliminate immediate left recursion


A  1 A’ | ... | n A’
A’  1 A’ | ... | m A’ |  an equivalent grammar
Griebach Normal Form (GNF)..
 Example: Remove left recursion from the grammar below
E  E+T | T
T  T*F | F
F  id | (E)
 Answer
E T E’
E’  +T E’ | 
T  F T’
T’  *F T’ | 
F  id | (E)
Griebach Normal Form (GNF)..
 Steps to convert a CFG into GNF
1. Reduce G to G’ in CNF
2. Rename the non-terminals in G’ as {A1,A2, ----, Am) with A1 as S
3. Convert productions into the following forms Ai  Aj,---- X, j>i and
A  aX; Ai, Aj,AeN, aeT,Xe(NUT)* (how?)
 While there exists Ai  AjX, j<i do
 apply substitution rule to get Ai  AjX, j> i
 If there is left recursive production eliminate it
4. After step tree, the productions are of the form
 Ai  Aj x, j>i
 Ai  aX,
 A’i  Ajx where Ai – is newly added non-terminal in step 3
Griebach Normal Form (GNF)..
 Steps to convert a CFG into GNF…
5. Remove productions of the form
 Ai  AjX, j>I and A’i  AjX Using substitution rule
 Example: Change the following grammar with these rules to
its equivalent GNF
S  AA|a
A  SS|b
 Step 1: Change it into CNF. It is already in CNF
 Step 2: Rename non-terminals S=A1 and A=A2 .Then the new
production is
A1  A2A2|a
A2  A1A1|b
Griebach Normal Form (GNF)..
 Step 3: for all Ai  Aj check if i< j or not
A1  A2A2|a 
A2  A1A1|b X

 So, apply substitution rule


A2  A2A2A1 |a A1|b 

 However, the newly obtained production contains left


recursion. Eliminate it
A2  aA1|b|aA1B2|bB2
B2  A2A2A1/A2A2A1B2
Griebach Normal Form (GNF)..
 Step 4: Group the productions
A1  A2A2|a
A2  aA1|b|aA1B2|bB2
B2  A2A2A1 |A2A2A1B2
 Step 5: Apply the substation rule to productions which are not
in GNF
Therefore, replacing all production with A2 as the first
element on RHS, the grammar is changed into GNF
A1  aA1A2|bA2|aA1B2A2|bB2A2|a
B2  aA1A2A1|bA2A1|aA1B2A2A1|bB2A2A1|
aA1A2A1B2 |bA2A1B2 |aA1B2A2A1B2 |bB2A2A1B2
A2  aA1|b|aA1B2|bB2
Griebach Normal Form (GNF)..
 Exercise 1 - convert the following CFG to GNF
 S  AS|AB
 A  BS|a
 B  AA|b

 Exercise 2 - convert the following CFG to GNF


S  Aa | b
A  Ac | Sd | f
Properties of CFLs
 Pumping Lemma for CFL's

 Used to prove a language is not CFL

 Theorem: Let L be any infinite CFL. Then there exists a


constant n, depending on L, such that if z is in L and |z|  n,
then we can write z = uvwxy such that
|vx|  1,
|vwx|  n, and
for all i  0, uviwxiy is in L.
 Proof

 Let G be a Chomsky normal-form grammar generating L - {}.


Pumping Lemma for CFL's…
 Proof…
 Since the size of the variables of a context-free grammar for L
is fixed and L is infinite. There exits a long sentence z in L such
that any parse tree for z must contain a long path of variables.
And there would be at least one variable that appear twice in
the path.
 For Chomsky normal-form grammar, there are only two types
of the production rules: ABC and Aa.
 Hence for a parse tree of sentence z in L having no path of
length greater than i, then the length of z is no more than 2 i-1.
 We consider the length of a path as the length of the internal
nodes of the path, not including the leaf.
S S S

a B
A

|Path|=k=1, T1 T2
a b
|word|=1 ≦ 2k-1
|path|=k=2 |path of T1| ≦ i
|word|=2 ≦ 2k-1
|word of T1| ≦ 2i-1

|path of T2| ≦ i

|word of T2| ≦ 2i-1

=> |path| ≦ i+1

|word| ≦ 2i
Pumping Lemma for CFL's…
 Proof…

 Suppose that G has k variables, and let n = 2k.


 If z is in L and |z|  n, then any parse tree of z must have
a path of length > k.
 Hence there must be at least one variables that appears at
least twice in the path of the parse tree.
 Let the two vertices v1 and v2 in the parse tree be the
vertices labeled with the same variable A, and the vertex
v1 is closer to the root than vertex v2 . Also they are the
vertices with the same label closest to leaves of the tree.
S

≦ k+1
A

u v w x y

≦ 2k = n

Since ABC, we have that |vx|≧1.

A * vAx and A*w, where |vwx| ≦n.


Pumping Lemma for CFL's…
 Proof ….

 Now, we have that A * vAx and A*w, where |vwx| ≦n.

 A * viwxi, for i=0, 1, 2, … .

 Hence S * uviwxiy in L, for i=0, 1, 2, … .


 Example 1: Show that L = {anbncn : n>0} is not CFL
 Soln: Suppose L be CFL. By pumping Lemma there exist mez+,
weL with |w|>m, where w=uvxyz, | vy|>1, |vxy|<m,
uvixyizeL, for all i >0
 Take w = am bm cm,,
 Let w=uvxyz= akaras lbmcm , where u=ak, v=ar, x=as , y=l, &
z = bm cm. Note that k+r+s=m
Pumping Lemma for CFL's…
 uvixyiZ= akariasbmcm, if i=0, ak+sbmcm is not elment of L, since k+s< m
 Hence, L is not CFL
 Example 2 : show that L= {ww:We{a,b}*} is not CFL
 Solution
 suppose L is a CFC. By pumping lemma there exist mez+, weL
with |w|>m, where w=uvxyz, | vy|>1, |vxy|<m, uvixyizeL,
for all i >0
 Take w=ambmambm= ak ar al bt ambm, where u=ak, v=ar, x=bl , y=, blt
& z = am bm, And then k+r=l+t=m
 uvixyiz= ak ari bl btiambm is not elment of L, if i=0
 Hence, L is not CFL

 Exercise: Show that language L= {anbj:n=j2} in not CFL


Closure Properties of for CFGs
 Theorem The family of CFLs is closed under union,
concatenation and star closure.
 Proof Ler G1=(N1,T1,P1,S1) and G2= (N2,T2,P2,S2)
 Union
Construct G= (N,T,P,S) such that L(G)=L(G1) U L(G2)
Where N=N1uN2uS
T=T1uT2
P={S  S1|S2}UP1UP2
 Concatenation
Construct G= (N,T,P,S) such that L(G)=L(G1)L(G2)
Where N=N1uN2uS
T=T1uT2
P={S  S1S2}UP1UP2
Closure Properties of for CFGs…
 Star closure (*)
Construct G= (N,T,P,S) such that L(G)=L(G1)*
Where N= N1
T=T1
S=S1
P=P1u{S  s1s1|l }
 Context- free languages are not closed under intersection &
complementation.
 Example 1
 Let G1 be a CFG with productions s1as|b|l, L1 =
L(G1)={anbn: n>0}
 Let G2 be a CFG with productions: s2  cs2|l,
L2=L(G2)={cm:m>0}
Closure Properties of for CFGs…
 Then
 L3=L1L2= {anbncm|m, n>0 }is CFL
 L4 = L1UL2 = {ab, c, aabb,cc, aaabbb, ccc, ….} is CFL
 L5 = L2* = = L2 is CFL
 Example 2:
 Let G4 be a CFG with productions: s4 as4|l, L4 = L(G4) = {am:
m >0}
 Let G5 be a CFG with productions:s5 bs5c|l, L5 =
L(G5)={bncn:n>0}
 Then
 L6=L4L5={ambncn:m,n>n} is CFL
 L3nL6={anbncn}is not CFC
Decision Algorithms for CFGs
 There is an algorithm to determine whether CFL is empty
 Proof. Let G= (N,T,P,S) be a CFG L= L(G)
 If s  l in p, LÎ0
 Otherwise, Let n=|N|. Draw all possible derivation trees whose
depth ˉn, L=0 iff there is no terminal strings, the language is empty
 There are algorithms to determine whether CFL, L is finite or
infinite
 Proof. Let G= (N,T,P,S) be a CFG L= L(G)
 Convert G to G’ in CNF
 Draw directed graph with non-terminals as vertices.
 If A  BC in P, then there are edges connoted A&B, A&C
 If the graph has no cycles, then L is finite.
Decision Algorithms for CFGs…
 Example : consider CFG with Production

S  AB, A  BC|a, B  CC|b, C  a


As it can be seen from directed graph with
non-terminals, the graph has no cycles. So L is Finite

 Let G be CFG with production

S  AB, A  BC|a, B  CC|b, C  AB|a


As it can be seen from directed graph with
non-terminals, the graph has cycles. So L is infinite

You might also like