Nothing Special   »   [go: up one dir, main page]

Context Free Grammars: Unit - Iii

Download as pdf or txt
Download as pdf or txt
You are on page 1of 17

UNIT – III

Context Free Grammars

• Formal Languages, Grammars,


• Classification of Grammars,
• Chomsky Hierarchy Theorem,
• Context Free Grammar,
• Leftmost and Rightmost Derivations,
• Parse Trees,
• Ambiguous Grammars,
• Simplification of Context Free Grammars-Elimination of Useless Symbols,
• E- Productions and Unit Productions,
• Normal Forms for Context Free Grammars:
▪ Chomsky Normal Form
▪ Greibach Normal Form,
• Pumping Lemma, Closure Properties,
• Applications of Context Free Grammars.
Grammar: Grammar is a set of rules used to define a valid sentence in any language.
Mathematical representation of grammar: Grammar is defined as a 4-tuple G={V,T,P,S}
where
V - finite set of non-terminals
T - finite set of terminals
P – finite set of production rules
S – Start symbol
“Non terminals” are represented by capital letters A, B, C,… so on.
“Terminals” are represented by small letters a, b, c,… so on.
“Production rules” are of the form α  β, where
α contains non-terminal symbols or may also contain both terminals
and non-terminals with atleast one non-terminal.
β contains terminal, non-terminal symbols or any combination of both
terminals and non-terminals.
Formal Grammar: Grammar is an useful model when designing software that processes
data with recursive structure. i.e., It denotes the structure of data.
Formal Language: Formal language is an abstraction for general characteristics of
programming languages.

Classification of Grammars:
Chomsky Hierarchy: Noam Chomsky classified grammar into 4 types depending on
production rules (.
1. Type 0 – Unrestricted Grammar
2. Type 1 – Context Sensitive Grammar (CSG)
3. Type 2 – Context Free Grammar (CFG)
4. Type 3 – Regular Grammar
Type0 Grammar: It is also known as “unrestricted grammar” or “phase structured
grammar”. An unrestricted grammar is a formal grammar in which no restrictions are made
on LHS and RHS of production.
Mathematical representation of Type 0 grammar: Type 0 Grammar is defined as a 4-tuple
G={V,T,P,S} where
V - finite set of non-terminals
T - finite set of terminals
P – finite set of production rules of the form αAβ → αγβ
Where α – left context, β – right context,
α, β ∈ (V∪T)+ and γ ∈ (V∪T)*,
i.e., no epsilon (ɛ) productions are allowed on LHS but epsilon
(ɛ) productions are allowed on RHS.
S – Start symbol
The language that can be generated by making use of type 0 grammar is “recursive
enumerable language”. The machine format that is used to recognize strings of that particular
grammar is “turing machine”.
E.g: S→aSBA | abA
AB→BA
bB→bb
bA→ba
aA→aa
Type 1 Grammar: It is also known as “Context Sensitive Grammar (CSG)”. Context
Sensitive Grammar (CSG) is a formal grammar in which LHS and RHS of any production
rule may be surrounded by a context of terminals and non-terminals.
Mathematical representation of Type 1 grammar: Type 1 Grammar is defined as a 4-tuple
G={V,T,P,S} where
V - finite set of non-terminals
T - finite set of terminals
P – finite set of production rules of the form αAβ → αγβ
Where α – left context, β – right context,
α, β ∈ (V∪T)+ and γ ∈ (V∪T)+
i.e., no epsilon (ɛ) productions are allowed on LHS and RHS.
S – Start symbol
The language that can be generated by making use of type 1 grammar is “Context Sensitive
language (CSL)”. The machine format that is used to recognize strings of that particular
grammar is “Linear Bounded Automata (LBA)”.
E.g: S→abc | aSBC
CB→BC
bB→bb
Type 2 Grammar: It is also known as “Context Free Grammar (CFG)”. Context Free
Grammar (CFG) is a formal grammar in which LHS and RHS of any production rule is free
from a context of terminals and non-terminals.
Mathematical representation of Type 2 grammar: Type 2 Grammar is defined as a 4-tuple
G={V,T,P,S} where
V - finite set of non-terminals
T - finite set of terminals
P – finite set of production rules of the form A → γ
Where and γ ∈ (V∪T)*,
i.e., A (V∪T)*,
i.e., LHS of any production rule contains only a single non-
terminal and epsilon (ɛ) productions are allowed on RHS.
S – Start symbol
The language that can be generated by making use of type 2 grammar is “Context Free
language (CFL)”. The machine format that is used to recognize strings of that particular
grammar is “Push Down Automata (PDA)”.
E.g: S→aSb
S→ab
Type 3 Grammar: It is also known as “Regular Grammar (RG)”. Regular Grammar (RG) is
a formal grammar with restrictions on both LHS and RHS. i.e., LHS contains only a single
non-terminal and RHS may contain single terminal (or) single terminal followed by non
terminal (or) nothing.
Mathematical representation of Type 3 grammar: Type 3 Grammar is defined as a 4-tuple
G={V,T,P,S} where
V - finite set of non-terminals
T - finite set of terminals
P – finite set of production rules of the form A t | tB|ɛ,
Where t – terminals
A,B – non terminals
S – Start symbol
The language that can be generated by making use of type 3 grammar is “Regular language
(RL)”. The machine format that is used to recognize strings of that particular grammar is
“Finite Automata (FA)”.
E.g: S→aS
S→a
Types of formal languages: Based on properties of Chomsky hierarchy of grammars, formal
languages are classified into 4 types.
1. Regular Language
2. Context Free Language
3. Context Sensitive Language
4. Recursive Language and Recursive Enumerable Language
Regular Language: It is a formal language generated by regular grammar.
Properties of Regular Language:
1. It is accepted by DFA, NFA and read only turing machine.
2. It is described by regular expressions.
3. It is generated by regular grammar.
Closure Properties of Regular Language:
Two regular languages are closed under the following.
1. Union
2. Concatenation
3. Complementation
4. Intersection
5. Kleene closure
6. Transpose
7. Substitution
8. Homomorphism
9. Inverse homomorphism
Methods used to prove the given language is not regular – pumping lemma.
E.g: L={an | n≥1} is a regular language generated by regular grammar
S→aS
S→a
Context Free Language: It is a formal language generated by context free grammar.
Properties of Context Free Language:
1. It is accepted by pushdown automata.
2. It is generated by context free grammar.
3. It doesn’t satisfy operations such as intersection, complement, difference.
Closure Properties of Context Free Language:
Two context free languages are closed under the following.
1. Union
2. Concatenation
3. Kleene closure
4. Transpose
5. Substitution
6. Homomorphism
7. Inverse homomorphism
Methods used to prove the given language is not context free – pumping lemma.
E.g: L={an bn | n≥1} is a context free language generated by context free grammar
S→aSb
S→ab
Context Sensitive Language: It is a formal language generated by context sensitive
grammar.
Properties of Context Sensitive Language:
1. It is accepted by pushdown automata.
2. It is generated by context sensitive grammar.
3. It doesn’t satisfy operations such as complementation, difference.
Closure Properties of Context Sensitive Language:
Two context sensitive languages are closed under the following.
1. Union
2. Concatenation
3. Kleene closure
4. Intersection
5. Transpose
Methods used to prove the given language is not context sensitive – pumping lemma.
E.g: L={an bn cn | n≥1} is a regular language generated by context sensitive grammar
S→abc | aSBC
CB→BC
bB→bb
Recursive Language: Any language L over alphabet ∑ is said to be recursive, if there is a
turing machine that accepts every word present in the language L and rejects every word
which is not in L.
Properties of Recursive Language:
1. It is accepted by turing machine.
2. It is generated by unrestricted grammar.
Closure Properties of Recursive Language:
Two recursive languages are closed under the following.
1. Union
2. Concatenation
3. Complementation
4. Intersection
5. Kleene closure
Recursive Enumerable Language: A recursive enumerable language L over alphabet ∑
accepts every word in L of turing machine M (or) either rejects or loops every word which is
not in L.
Accept=L
Reject + Loop = L̄
i.e., a turing machine on same input can either accept a language or else, will run in loop
forever.
Properties of Recursive Enumerable Language:
1. It is accepted by turing machine.
2. It is generated by unrestricted grammar.
Closure Properties of Recursive Enumerable Language:
Two recursive enumerable languages are closed under the following.
1. Union
2. Concatenation
3. Intersection
4. Kleene closure
E.g: L={an bn an | n≥0} is a recursive enumerable language generated by unrestricted
grammar S→aSBA | abA
AB→BA
bB→bb
bA→ba
aA→aa
Context Free Grammar:
A grammar G=(V,T,P,S) is said to be context free grammar if it is recursive and free from
context. i.e., productions are in the form of A (V∪T)* in which LHS of any productionrule
contains only a single non-terminal and epsilon (ɛ) productions are allowed on RHS.
CFG is formally defined as G=(V,T,P,S) where
V - finite set of non-terminals
T - finite set of terminals
P – finite set of production rules of the form A (V∪T)*,
S – Start symbol
Backus naur Form:
It was proposed by John Backus and Peter Naur. If there are many productions with same
LHS variable, then they can be combined into single production LHS → RHS, where
LHS is common variable of LHS of productions
LHS is RHS productions separated by vertical bar ( | )
E.g: S→aS
S→ɛ
ɛ should be written at the end. Therefore, S→aS | ɛ

Derivation:
The process of generating a string from the production rule of a grammar is called derivation.

Types of Derivation:
1. Leftmost derivation
2. Rightmost derivation

Leftmost derivation:
A derivation is called leftmost derivation, if leftmost variable is replaced with some
production rule of a grammar.
E.g: S→aSbS | ab
Let the string to be derived is aabbab.
LMD: S→aSbS
→aabbS
→aabbab

Rightmost derivation:
A derivation is called rightmost derivation, if rightmost variable is replaced with some
production rule of a grammar.
E.g: S→aSbS | ab
Let the string to be derived is aabbab.
RMD: S→aSbS
→aSbab
→aabbab
Derivation tree:
The diagrammatical representation of the derivation process by a tree structure is known as
“derivation tree”.
It is also known as “parse tree” because it is created by parser (second phase of compiler) to
check syntax of the statement.
It is also defined as an ordered tree in which LHS of production rule represents parent node
and RHS of production rule represents children nodes.
Properties of Derivation tree:
In derivation tree of CFG,
1. Root node is represented starting variable.
2. Leaf nodes are represented by terminals or epsilon (ɛ).
3. Internal nodes are represented by non- terminals.
4. If A→α1 α2 α3... αn then A is parent of nodes α1 α2 α3... αn

Ambiguous Grammars:
If more than one parse tree (or) more than one LMD (or) more than one RMD is possible to
derive the same string, then that grammar is said to be “Ambiguous grammar”, Otherwise it
is known as unambiguous grammar.

Yield of a derivation tree:


The string obtained by reading only the leaves of derivation tree from left to right (ɛ-symbols
are ignored) is known as “yield” of a derivation tree.

Simplification of Context Free Grammars:


Purpose: CFG may contain different types of useless symbols, unit productions, null
productions. These may increase number of steps in generating a language from CFG.
Therefore simplification of CFG will reduce variables and productions, so that time
complexity for generating language become less.
Procedure to simplify CFG:
CFG can be simplified by performing the following steps:
1. Elimination of ɛ-Productions
2. Elimination of Unit Productions
3. Elimination of Useless Productions

Elimination of ɛ-Productions:
ɛ-Productions (null production): A production in the form of “A→ɛ” is called as null
production.

Procedure to eliminate ɛ-Productions:


1. Identify ɛ-Productions in given CFG.(in the form of A→ɛ)
2. To eliminate A→ɛ, then write all the productions from CFG whose right side contains
A.
3. Replace each occurrence of A with ‘ɛ’ to get production without ‘ɛ’and eliminate
A→ɛ.
4. The obtained productions in step3 must be added to the grammar.
5. A→ɛ is allowed only if A is not used on RHS of any production.
Elimination of unit-Productions:
Unit-Productions: A production in the form of “A→B” is called as unit production.

Procedure to eliminate Unit-Productions:


1. Identify unit-Productions in given CFG.(in the form of A→B)
2. To eliminate A→B, then write all the productions from CFG whose left side contains A.
3. Replace B with ‘A’and eliminate A→B.
4. The obtained productions in step3 must be added to the grammar.

Elimination of useless-Productions:
Useless-Productions: Useless productions are of 2 types.
1. Non-generating productions
2. Non-reachable productions
Non-generating productions: Non-generating productions are the productions which do not
produce terminal string. Non-generating symbols are the symbols because of which productions
cannot produce terminal string.
Non-reachable productions: Non-reachable productions are the productions which cannot be
reached from starting symbol.

Procedure to eliminate useless-Productions:


1. Identify non generating symbols in given CFG.
2. To eliminate non generating symbol A, then remove all the productions from CFG
containing A on LHS or RHS.
3. Identify non reachable productions and remove them.
4. If start symbol is non generating symbol, it cannot be removed.
5. If symbol is non reachable but generating symbol, it cannot be removed.

Normalization of CFG:
The productions in CFG are of the form A→ (V∪T)*. i.e., there are no restrictions on RHS of
CFG. There may be any number of terminals and non terminals in any combination.
If we made restrictions on RHS of CFG specifying that there should be fixed number of terminals
and non terminals, it results in normal form.
Normal form:
If every production of CFG is converted into some standard form, grammar is said to be in
normal form.

Use of normalization:
If some standard form is specified, then complexity of automata design will be less and easy to
implement the automata.

Types of normal forms:


Normal forms are of 2 types. They are
1. Chomsky Normal Form (CNF)
2. Greibach Normal Form (GNF)
Chomsky Normal Form (CNF):
Let G be the CFG. The grammar G=(V,T,P,S) is said to be in CNF, if all productions are of the
form A → BC (or) A → a, where A, B, C ∈ V(non terminals) and a ∈ T (terminals).
i.e., if grammar is in CNF, RHS should contain only one terminal or two non terminals.
Conversion from CFG to CNF:
1. Simplify the given CFG – Eliminate ɛ-Productions, Unit Productions, and Useless
Productions.
2. For the productions which are not in CNF, (having non terminals and terminals on RHS),
replace each terminal with new non terminal.
3. For the productions which are not in CNF, (having more than 2 non terminals on RHS),
replace 2 non terminals with new non terminal.
4. Repeat 2,3 steps until we get CNF grammar.

Greibach Normal Form (GNF):


1. Let G be the CFG. The grammar G=(V,T,P,S) is said to be in GNF, if all productions are
of the form A → aB*,
2. where A, B ∈ V(non terminals) and a ∈ T (terminals).
3. i.e., if grammar is in GNF, RHS should contain single terminal followed by any number
of non terminals.
Left Recursion:
A grammar is said to be left recursive, if it contains production of the form
A → Aα1| Aα2| Aα3|…| Aαn| β1| β2| β3|…| βn
where RHS contains αi starting with A and βi which does not starts with A
Elimination of Left Recursion:
Let A → Aα | β is the production having left recursion. To eliminate left recursion, introduce a
new variable A’ such that
A → βA’ A →
αA’|ɛ

Conversion from CFG to GNF:


1. Convert the given CFG into CNF.
2. Replace the names of variables with A1,A2, A3,… An (indexed form).
3. Convert it into GNF using the following rules
i. If production is of the form Ai→Aj Ak , i>j, then convert it into Ai→Aj Ak , i<j by
substituting Aj with its RHS.
ii. If production is of the form Ai→Aj Ak , i=j, then eliminate left recursion and
eliminate ɛ productions .
iii. If production is of the form Ai→Aj Ak , i<j, then convert substitute Aj with its
RHS to obtain GNF grammar.
Pumping Lemma for context free languages:
Every language is not context free. To show that L is not context free, pumping lemma is used.
Pumping means “generating”. Pumping lemma means “it is a method of generating many input
strings from a given string.” i.e., it is used to break a given long input string into several
substrings.”

Statement of pumping lemma:


1. Assume that the given language L is context free accepted by pushdown automata M.
2. Choose a string z c L such that |z| ≥ n for some n.
3. Break the string into 5 strings u,v,w,x,y i.e., z=uvwxy such that
i. |vx| ≥ 1
ii. |vwx| ≤ n
4. By using pumping lemma, pump v, x to the suitable integer ‘i’ such that uviwxiy ∉ L for
i≥0. This contradicts our assumption.
5. Therefore L is not context free.

Closure Properties of Context Free Languages:


Closure property: A set is closed under an operation if applying that operation to elements of the
set results in an element of the set.

For example, the set of natural numbers is closed under addition and multiplication but not under
subtraction or division because 2-3 is negative and 1/2 is not an integer. The set of integers is
closed under addition, subtraction and multiplication but not under division.

Context free languages are closed under the following:


1. Union
2. Concatenation
3. Kleene closure
4. Transpose
Context free languages are not closed under the following:
1. Complementation
2. Intersection

Problems : Convert the given CFG to CNF.

Example: Consider the given grammar G: S → a | aA | B


A → aBB | ε
B → Aa | b
Solution:
Step 1: We will create a new production S1 → S, as the start symbol S appears on the RHS.
The grammar will be:
S1 → S
S → a | aA | B
A → aBB | ε
B → Aa | b
Step 2: As grammar G contains A → ε null production, its removal from the grammar yields:
S1 → S
S → a | aA | B
A → aBB
B → Aa | b | a
Now, as grammar G contains Unit production S → B, its removal yield:
S1 → S
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
Also remove the unit production S1 → S, its removal from the grammar yields:
S0 → a | aA | Aa | b
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
Step 3: In the production rule S0 → aA | Aa, S → aA | Aa, A → aBB and B → Aa, terminal a exists
on RHS with non-terminals. So we will replace terminal a with X:
S0 → a | XA | AX | b
S → a | XA | AX | b
A → XBB
B → AX | b | a
X→a
Step 4: In the production rule A → XBB, RHS has more than two symbols, removing it from
grammar yield:
S0 → a | XA | AX | b
S → a | XA | AX | b
A → RB
B → AX | b | a
X→a
R → XB
Hence, for the given grammar, this is the required CNF

Problems: convert the given CFG to GNF

Example: Consider the given grammar G: S → XB | AA


A → a | SA
B→b
X→a
Solution:
As the given grammar G is already in CNF and there is no left recursion, so we can skip step 1 and
step 2 then directly go to step 3.
The production rule A → SA is not in GNF, so we substitute S → XB | AA in the production rule A
→ SA as: S → XB | AA
A → a | XBA | AAA
B→b
X→a
The production rule S → XB and B → XBA is not in GNF, so we substitute X → a in the
production rule S → XB and B → XBA as:
S → aB | AA
A → a | aBA | AAA
B→b
X→a
Now we will remove left recursion (A → AAA), we get:
S → aB | AA
A → aC | aBAC
C → AAC | ε
B→b
X→a
Now we will remove null production C → ε, we get:
S → aB | AA
A → aC | aBAC | a | aBA
C → AAC | AA
B→b
X→a
The production rule S → AA is not in GNF, so we substitute A → aC | aBAC | a | aBA in
production rule S → AA as:
S → aB | aCA | aBACA | aA | aBAA
A → aC | aBAC | a | aBA
C → AAC
C → aCA | aBACA | aA | aBAA
B→b
X→a
The production rule C → AAC is not in GNF, so we substitute A → aC | aBAC | a | aBA in
production rule C → AAC as:
S → aB | aCA | aBACA | aA | aBAA
A → aC | aBAC | a | aBA
C → aCAC | aBACAC | aAC | aBAAC
C → aCA | aBACA | aA | aBAA
B→b
X→a
Hence, this is the GNF form for the grammar G

Derivations:

Ex 1: Derive the string "abb" for leftmost derivation and rightmost derivation using a CFG
given by, S → AB | ε
A → aB
B → Sb
Solution:

Leftmost derivation: abb Rightmost derivation:


Ex 2: Derive the string "aabbabba" for leftmost derivation and rightmost derivation using a
CFG given by, S → aB | bA
A→ a | aS | bAA
B → b | aS | aBB
Solution:
Leftmost derivation:
S
aB S → aB
aaBB B → aBB
aabB B→b
aabbS B → bS
aabbaB S → aB
aabbabS B → bS
aabbabbA S → bA
aabbabba A → a

Rightmost derivation:

S
aB S → aB
aaBB B → aBB
aaBbS B → bS
aaBbbA S → bA
aaBbba A→a
aabSbba B → bS
aabbAbba S → bA
aabbabba A → a

Ex 3: Derive the string "00101" for leftmost derivation and rightmost derivation using a
CFG given by, S → A1B
A → 0A | ε
B → 0B | 1B | ε
Solution:
Leftmost derivation:
S
A1B
0A1B
00A1B
001B
0010B
00101B
00101
Rightmost derivation: S
A1B
A10B
A101B
A101
0A101
00A101
00101
Ex 4: Construct a derivation tree for the string aabbabba for the CFG given by,
S → aB | bA
A → a | aS | bAA
B → b | bS | aBB

Solution: To draw a tree, we will first try to obtain derivation for the string aabbabba

Now, the derivation tree is as follows:

Ex 5: Show the derivation tree for string "aabbbb" with the following grammar.
S → AB | ε
A → aB
B → Sb

Solution: To draw a tree we will first try to obtain derivation for the string aabbbb

Now, the derivation tree for the string "aabbbb" is above.


Ex 6: Let us consider a grammar G with the production rule E → I
E→E+E
E→E*E
E → (E)
I → ε | 0 | 1 | 2 | ... | 9
Solution: For the string "3 * 2 + 5", the above grammar can generate two parse trees by leftmost
derivation:

Since there are two parse trees for a single string "3 * 2 + 5", the grammar G is ambiguous

Ex 1 : Check whether the given grammar G is ambiguous or not. S → aSb | SS , S → ε

Solution: For the string "aabb" the above grammar can generate two parse trees

Since there are two parse trees for a single string "aabb", the grammar G is ambiguous.

Simplification of CFG :-
Removal of Useless Symbols :- A symbol can be useless if it does not appear on the right-
hand side of the production rule and does not take part in the derivation of any string. That symbol is
known as a useless symbol.

Similarly, a variable can be useless if it does not take part in the derivation of any string. That
variable is known as a useless variable.

For Example: T → aaB | abA | aaT


A → aA
B → ab | b
C → ad

In the above example, the variable 'C' will never occur in the derivation of any string, so the
production C → ad is useless. So we will eliminate it, and the other productions are written in such a
way that variable C can never reach from the starting variable 'T'.

Production A → aA is also useless because there is no way to terminate it. If it never terminates,
then it can never produce a string. Hence this production can never take part in any derivation.

To remove this useless production A → aA, we will first find all the variables which will never lead
to a terminal string such as variable 'A'. Then we will remove all the productions in which the
variable 'B' occurs.

Elimination of ε - Production:-
The productions of type S → ε are called ε productions.
These type of productions can only be removed from those grammars that do not generate ε.
Step 1: First find out all nullable non-terminal variable which derives ε.
Step 2: For each production A → a, construct all production A → x, where x is obtained from a by
removing one or more non-terminal from step 1.
Step 3: Now combine the result of step 2 with the original production and remove ε productions.

Ex: Remove the production from the following CFG by preserving the meaning of it.
S → XYX
X → 0X | ε
Y → 1Y | ε
Solution: Now, while removing ε production, we are deleting the rule X → ε and Y → ε. To
preserve the meaning of CFG we are actually placing ε at the right-hand side whenever X and Y
have appeared. Let us take
S → XYX If the first X at right-hand side is ε. Then
S → YX Similarly if the last X in R.H.S. = ε. Then
S → XY If Y = ε then
S → XX If Y and X are ε then,
S → X If both X are replaced by ε
S → Y Now,
S → XY | YX | XX | X | Y Now let us consider
X → 0X If we place ε at right-hand side for X then,
X→0
X → 0X | 0 Similarly Y → 1Y | 1
Collectively we can rewrite the CFG with removed ε production as
S → XY | YX | XX | X | Y
X → 0X | 0
Y → 1Y | 1
Removing Unit Productions :-
The unit productions are the productions in which one non-terminal gives another non-terminal. Use
the following steps to remove unit production:
Step 1: To remove X → Y, add production X → a to the grammar rule whenever Y → a occurs in
the grammar.
Step 2: Now delete X → Y from the grammar.
Step 3: Repeat step 1 and step 2 until all unit productions are removed.
For example: S → 0A | 1B | C
A → 0S | 00
B→1|A
C → 01
Solution: S → C is a unit production. But while removing S → C we have to consider what C gives.
So, we can add a rule to S.
S → 0A | 1B | 01
Similarly, B → A is also a unit production so we can modify it as
B → 1 | 0S | 00
Thus finally we can write CFG without unit production as
S → 0A | 1B | 01
A → 0S | 00
B → 1 | 0S | 00
C → 01

You might also like