Lecture 7 - 8 & 9 - Chapter 4
Lecture 7 - 8 & 9 - Chapter 4
Lecture 7 - 8 & 9 - Chapter 4
Esubalew A.
Contents
Context-free Grammars
Derivation, Parse Tree, Ambiguity, Parsing Expression
Transformation Methods
Useful Substitution Rule, removing useless-productions,
Removing l-productions, removing unit productions
Normal Forms
Chomsky Norm Form (CNF)
Grieback Normal Form (GNF)
Pumping Lemmas for CFLs
Properties CFLs
Decision Algorithms for CFGs
The design of the grammar is an initial phase of the design of a
programming language.
CFG is used to specify the structure of legal programs.
Formally a CFG G = (N, T, S, P), where:
All productions are in the form A X [A N, X (NUT)*]
Each production must have a single non-terminal on its left hand
side.
Examples:
S Sa|SS | A, A aA|aa is CFG
S aA|ab, aA ab|l is not CFG
A language L is said to be context-free iff there is a CFG G such
that L=L(G).
Every regular language is a context-free language but every
context-free is not a regular language .
This is because CFG is more expressive than RE
L = {anbn | n>=1}, is an example language that can be expressed by
CFG but not by RE
Context-free grammar is sufficient to describe most
programming languages.
Example:
E E+E | E–E | E*E | E/E | -E
E (E)
E id
Where
T= {+, -, *, / (,), id}, N= {E}
S = {E}
Production are shown above
BNF (Backus Normal Form or Backus–Naur Form) a
notation techniques for context-free grammars, often used to
describe the syntax of languages used in computing
Has many extensions and variants
Extended Backus–Naur Form (EBNF)
Augmented Backus–Naur Form (ABNF).
A BNF specification is a set of derivation rules, written as
<symbol> ::= expression
BNF for valid arithmetic expression
<expr> ::= <expr> <op> <expr>
<expr> ::= ( <expr> )
<expr> ::= - <expr> Exercise
<expr> ::= id Write BNF nation for valid
<op> ::= + | - | * | / C identifier names
A sequence of replacements of non-terminal symbols to obtain
strings/sentences is called a derivation
If we have a grammar E E+E then we can replace E by E+E
In general a derivation step is A if there is a production
rule A in a grammar
where and are arbitrary strings of terminal and non-terminal
symbols
Derivation of a string should start from a production with start
symbol in the left
is a sentential form (terminals & non-terminals Mixed)
S is a sentence if it contains
only terminal symbols
Derivate string –(id+id) from G1
E -E -(E) -(E+E) -(id+E) -(id+id) (LMD)
OR
E -E -(E) -(E+E) -(E+id) -(id+id) (RMD)
At each derivation step, we can choose any of the non-terminal in the
sentential form of G for the replacement.
E -E E
-(E) E
-(E+E)
E
- E - E - E
( E ) ( E )
E E E + E
- E - E
-(id+E) -(id+id)
( E ) ( E )
E + E E + E
id id id
An ambiguous grammar is one that produces more than one
LMD or more than one RMD for the same sentence.
E E+E E E*E
id+E E+E*E
id+E*E id+E*E
id+id*E id+id*E
id+id*id id+id*id
E
E
E * E
E + E
id * E + E id
E E
id id
id id
For the most parsers, the grammar must be unambiguous.
If a grammar unambiguous grammar then there are unique
selection of the parse tree for a sentence
We should eliminate the ambiguity in the grammar during the
design phase of the compiler.
An unambiguous grammar should be written to eliminate the
ambiguity.
We have to prefer one of the parse trees of a sentence
(generated by an ambiguous grammar) to disambiguate that
grammar to restrict to this choice.
Parsing Expression
BNF (Backus Normal Form or Backus–Naur Form) a
notation techniques for context-free grammars, often used to
describe the syntax of languages used in computing
Has many extensions and variants
Extended Backus–Naur Form (EBNF)
Augmented Backus–Naur Form (ABNF).
A BNF specification is a set of derivation rules, written as
<symbol> ::= expression
BNF for valid arithmetic expression
<expr> ::= <expr> <op> <expr>
<expr> ::= ( <expr> )
<expr> ::= - <expr>
<expr> ::= id
<op> ::= + | - | * | /
Parsing Expression…
Example: Write BNF nation for valid C++ identifier names. Check
if _var2is valid identifier name.
Answer:
< id> : = < letter > < rest > < underscore > < rest >
< rest > :: = < letter >< rest >|< underscore >< rest >|< digits > < rest
>| l
<digits> :: = a|b|c…|z|A|B|…|Z
<digits> ::= 0|1|…|9
<underscore> :: = -
Changing it to CFG :
I LR|UR
R LR|UR|DR|l
L a|b|c…|z|A|B|…|Z
D 0|1|…|9
U_
Parsing Expression…
To check if _var2 is valid identifier name show derivation or
there
Transformation Methods
The productions of context-free grammars can be coerced into
a variety of forms without affecting the expressive power of the
grammar.
Theorem:
Let G = (N,T,S,P) be any CFG with l –productions, then
there exists an equivalent l-free grammar G’ = (N,T,S,P’)
such that L(G’) = L(G)/{l }
Removing l-productions…
Proof
First find all nullable variables. Say Nn
For all productions A l, put A into Nn
Repeat the following step until no further variables are added
to Nn.
For all productions B A1 A2 ….An, where A1 A2 ……An are in
Nn, put B into Nn.
Construct P’. How?
for every production in P with one or more nullable appearing
in the right hand side of the production add to P’ l- free
production by eliminating one or more nullables.
Any production of the form A l will not be in P’
Add any other production from P to P’.
Removing l-productions
Example: Remove l-productions from the following CFG
S AaC
A a| l
C b| l
Answer:
Nn = {A,C}.
P’ contains the following productions
S Aac|Aa|aC|a, A a, and C b
a B
A
|Path|=k=1, T1 T2
a b
|word|=1 ≦ 2k-1
|path|=k=2 |path of T1| ≦ i
|word|=2 ≦ 2k-1
|word of T1| ≦ 2i-1
|path of T2| ≦ i
=> |path| ≦ i+1
|word| ≦ 2i
Pumping Lemma for CFL's…
Proof…
≦ k+1
A
u v w x y
≦ 2k = n