Context Free Grammars: Unit - Iii
Context Free Grammars: Unit - Iii
Context Free Grammars: Unit - Iii
Classification of Grammars:
Chomsky Hierarchy: Noam Chomsky classified grammar into 4 types depending on
production rules (.
1. Type 0 – Unrestricted Grammar
2. Type 1 – Context Sensitive Grammar (CSG)
3. Type 2 – Context Free Grammar (CFG)
4. Type 3 – Regular Grammar
Type0 Grammar: It is also known as “unrestricted grammar” or “phase structured
grammar”. An unrestricted grammar is a formal grammar in which no restrictions are made
on LHS and RHS of production.
Mathematical representation of Type 0 grammar: Type 0 Grammar is defined as a 4-tuple
G={V,T,P,S} where
V - finite set of non-terminals
T - finite set of terminals
P – finite set of production rules of the form αAβ → αγβ
Where α – left context, β – right context,
α, β ∈ (V∪T)+ and γ ∈ (V∪T)*,
i.e., no epsilon (ɛ) productions are allowed on LHS but epsilon
(ɛ) productions are allowed on RHS.
S – Start symbol
The language that can be generated by making use of type 0 grammar is “recursive
enumerable language”. The machine format that is used to recognize strings of that particular
grammar is “turing machine”.
E.g: S→aSBA | abA
AB→BA
bB→bb
bA→ba
aA→aa
Type 1 Grammar: It is also known as “Context Sensitive Grammar (CSG)”. Context
Sensitive Grammar (CSG) is a formal grammar in which LHS and RHS of any production
rule may be surrounded by a context of terminals and non-terminals.
Mathematical representation of Type 1 grammar: Type 1 Grammar is defined as a 4-tuple
G={V,T,P,S} where
V - finite set of non-terminals
T - finite set of terminals
P – finite set of production rules of the form αAβ → αγβ
Where α – left context, β – right context,
α, β ∈ (V∪T)+ and γ ∈ (V∪T)+
i.e., no epsilon (ɛ) productions are allowed on LHS and RHS.
S – Start symbol
The language that can be generated by making use of type 1 grammar is “Context Sensitive
language (CSL)”. The machine format that is used to recognize strings of that particular
grammar is “Linear Bounded Automata (LBA)”.
E.g: S→abc | aSBC
CB→BC
bB→bb
Type 2 Grammar: It is also known as “Context Free Grammar (CFG)”. Context Free
Grammar (CFG) is a formal grammar in which LHS and RHS of any production rule is free
from a context of terminals and non-terminals.
Mathematical representation of Type 2 grammar: Type 2 Grammar is defined as a 4-tuple
G={V,T,P,S} where
V - finite set of non-terminals
T - finite set of terminals
P – finite set of production rules of the form A → γ
Where and γ ∈ (V∪T)*,
i.e., A (V∪T)*,
i.e., LHS of any production rule contains only a single non-
terminal and epsilon (ɛ) productions are allowed on RHS.
S – Start symbol
The language that can be generated by making use of type 2 grammar is “Context Free
language (CFL)”. The machine format that is used to recognize strings of that particular
grammar is “Push Down Automata (PDA)”.
E.g: S→aSb
S→ab
Type 3 Grammar: It is also known as “Regular Grammar (RG)”. Regular Grammar (RG) is
a formal grammar with restrictions on both LHS and RHS. i.e., LHS contains only a single
non-terminal and RHS may contain single terminal (or) single terminal followed by non
terminal (or) nothing.
Mathematical representation of Type 3 grammar: Type 3 Grammar is defined as a 4-tuple
G={V,T,P,S} where
V - finite set of non-terminals
T - finite set of terminals
P – finite set of production rules of the form A t | tB|ɛ,
Where t – terminals
A,B – non terminals
S – Start symbol
The language that can be generated by making use of type 3 grammar is “Regular language
(RL)”. The machine format that is used to recognize strings of that particular grammar is
“Finite Automata (FA)”.
E.g: S→aS
S→a
Types of formal languages: Based on properties of Chomsky hierarchy of grammars, formal
languages are classified into 4 types.
1. Regular Language
2. Context Free Language
3. Context Sensitive Language
4. Recursive Language and Recursive Enumerable Language
Regular Language: It is a formal language generated by regular grammar.
Properties of Regular Language:
1. It is accepted by DFA, NFA and read only turing machine.
2. It is described by regular expressions.
3. It is generated by regular grammar.
Closure Properties of Regular Language:
Two regular languages are closed under the following.
1. Union
2. Concatenation
3. Complementation
4. Intersection
5. Kleene closure
6. Transpose
7. Substitution
8. Homomorphism
9. Inverse homomorphism
Methods used to prove the given language is not regular – pumping lemma.
E.g: L={an | n≥1} is a regular language generated by regular grammar
S→aS
S→a
Context Free Language: It is a formal language generated by context free grammar.
Properties of Context Free Language:
1. It is accepted by pushdown automata.
2. It is generated by context free grammar.
3. It doesn’t satisfy operations such as intersection, complement, difference.
Closure Properties of Context Free Language:
Two context free languages are closed under the following.
1. Union
2. Concatenation
3. Kleene closure
4. Transpose
5. Substitution
6. Homomorphism
7. Inverse homomorphism
Methods used to prove the given language is not context free – pumping lemma.
E.g: L={an bn | n≥1} is a context free language generated by context free grammar
S→aSb
S→ab
Context Sensitive Language: It is a formal language generated by context sensitive
grammar.
Properties of Context Sensitive Language:
1. It is accepted by pushdown automata.
2. It is generated by context sensitive grammar.
3. It doesn’t satisfy operations such as complementation, difference.
Closure Properties of Context Sensitive Language:
Two context sensitive languages are closed under the following.
1. Union
2. Concatenation
3. Kleene closure
4. Intersection
5. Transpose
Methods used to prove the given language is not context sensitive – pumping lemma.
E.g: L={an bn cn | n≥1} is a regular language generated by context sensitive grammar
S→abc | aSBC
CB→BC
bB→bb
Recursive Language: Any language L over alphabet ∑ is said to be recursive, if there is a
turing machine that accepts every word present in the language L and rejects every word
which is not in L.
Properties of Recursive Language:
1. It is accepted by turing machine.
2. It is generated by unrestricted grammar.
Closure Properties of Recursive Language:
Two recursive languages are closed under the following.
1. Union
2. Concatenation
3. Complementation
4. Intersection
5. Kleene closure
Recursive Enumerable Language: A recursive enumerable language L over alphabet ∑
accepts every word in L of turing machine M (or) either rejects or loops every word which is
not in L.
Accept=L
Reject + Loop = L̄
i.e., a turing machine on same input can either accept a language or else, will run in loop
forever.
Properties of Recursive Enumerable Language:
1. It is accepted by turing machine.
2. It is generated by unrestricted grammar.
Closure Properties of Recursive Enumerable Language:
Two recursive enumerable languages are closed under the following.
1. Union
2. Concatenation
3. Intersection
4. Kleene closure
E.g: L={an bn an | n≥0} is a recursive enumerable language generated by unrestricted
grammar S→aSBA | abA
AB→BA
bB→bb
bA→ba
aA→aa
Context Free Grammar:
A grammar G=(V,T,P,S) is said to be context free grammar if it is recursive and free from
context. i.e., productions are in the form of A (V∪T)* in which LHS of any productionrule
contains only a single non-terminal and epsilon (ɛ) productions are allowed on RHS.
CFG is formally defined as G=(V,T,P,S) where
V - finite set of non-terminals
T - finite set of terminals
P – finite set of production rules of the form A (V∪T)*,
S – Start symbol
Backus naur Form:
It was proposed by John Backus and Peter Naur. If there are many productions with same
LHS variable, then they can be combined into single production LHS → RHS, where
LHS is common variable of LHS of productions
LHS is RHS productions separated by vertical bar ( | )
E.g: S→aS
S→ɛ
ɛ should be written at the end. Therefore, S→aS | ɛ
Derivation:
The process of generating a string from the production rule of a grammar is called derivation.
Types of Derivation:
1. Leftmost derivation
2. Rightmost derivation
Leftmost derivation:
A derivation is called leftmost derivation, if leftmost variable is replaced with some
production rule of a grammar.
E.g: S→aSbS | ab
Let the string to be derived is aabbab.
LMD: S→aSbS
→aabbS
→aabbab
Rightmost derivation:
A derivation is called rightmost derivation, if rightmost variable is replaced with some
production rule of a grammar.
E.g: S→aSbS | ab
Let the string to be derived is aabbab.
RMD: S→aSbS
→aSbab
→aabbab
Derivation tree:
The diagrammatical representation of the derivation process by a tree structure is known as
“derivation tree”.
It is also known as “parse tree” because it is created by parser (second phase of compiler) to
check syntax of the statement.
It is also defined as an ordered tree in which LHS of production rule represents parent node
and RHS of production rule represents children nodes.
Properties of Derivation tree:
In derivation tree of CFG,
1. Root node is represented starting variable.
2. Leaf nodes are represented by terminals or epsilon (ɛ).
3. Internal nodes are represented by non- terminals.
4. If A→α1 α2 α3... αn then A is parent of nodes α1 α2 α3... αn
Ambiguous Grammars:
If more than one parse tree (or) more than one LMD (or) more than one RMD is possible to
derive the same string, then that grammar is said to be “Ambiguous grammar”, Otherwise it
is known as unambiguous grammar.
Elimination of ɛ-Productions:
ɛ-Productions (null production): A production in the form of “A→ɛ” is called as null
production.
Elimination of useless-Productions:
Useless-Productions: Useless productions are of 2 types.
1. Non-generating productions
2. Non-reachable productions
Non-generating productions: Non-generating productions are the productions which do not
produce terminal string. Non-generating symbols are the symbols because of which productions
cannot produce terminal string.
Non-reachable productions: Non-reachable productions are the productions which cannot be
reached from starting symbol.
Normalization of CFG:
The productions in CFG are of the form A→ (V∪T)*. i.e., there are no restrictions on RHS of
CFG. There may be any number of terminals and non terminals in any combination.
If we made restrictions on RHS of CFG specifying that there should be fixed number of terminals
and non terminals, it results in normal form.
Normal form:
If every production of CFG is converted into some standard form, grammar is said to be in
normal form.
Use of normalization:
If some standard form is specified, then complexity of automata design will be less and easy to
implement the automata.
For example, the set of natural numbers is closed under addition and multiplication but not under
subtraction or division because 2-3 is negative and 1/2 is not an integer. The set of integers is
closed under addition, subtraction and multiplication but not under division.
Derivations:
Ex 1: Derive the string "abb" for leftmost derivation and rightmost derivation using a CFG
given by, S → AB | ε
A → aB
B → Sb
Solution:
Rightmost derivation:
S
aB S → aB
aaBB B → aBB
aaBbS B → bS
aaBbbA S → bA
aaBbba A→a
aabSbba B → bS
aabbAbba S → bA
aabbabba A → a
Ex 3: Derive the string "00101" for leftmost derivation and rightmost derivation using a
CFG given by, S → A1B
A → 0A | ε
B → 0B | 1B | ε
Solution:
Leftmost derivation:
S
A1B
0A1B
00A1B
001B
0010B
00101B
00101
Rightmost derivation: S
A1B
A10B
A101B
A101
0A101
00A101
00101
Ex 4: Construct a derivation tree for the string aabbabba for the CFG given by,
S → aB | bA
A → a | aS | bAA
B → b | bS | aBB
Solution: To draw a tree, we will first try to obtain derivation for the string aabbabba
Ex 5: Show the derivation tree for string "aabbbb" with the following grammar.
S → AB | ε
A → aB
B → Sb
Solution: To draw a tree we will first try to obtain derivation for the string aabbbb
Since there are two parse trees for a single string "3 * 2 + 5", the grammar G is ambiguous
Solution: For the string "aabb" the above grammar can generate two parse trees
Since there are two parse trees for a single string "aabb", the grammar G is ambiguous.
Simplification of CFG :-
Removal of Useless Symbols :- A symbol can be useless if it does not appear on the right-
hand side of the production rule and does not take part in the derivation of any string. That symbol is
known as a useless symbol.
Similarly, a variable can be useless if it does not take part in the derivation of any string. That
variable is known as a useless variable.
In the above example, the variable 'C' will never occur in the derivation of any string, so the
production C → ad is useless. So we will eliminate it, and the other productions are written in such a
way that variable C can never reach from the starting variable 'T'.
Production A → aA is also useless because there is no way to terminate it. If it never terminates,
then it can never produce a string. Hence this production can never take part in any derivation.
To remove this useless production A → aA, we will first find all the variables which will never lead
to a terminal string such as variable 'A'. Then we will remove all the productions in which the
variable 'B' occurs.
Elimination of ε - Production:-
The productions of type S → ε are called ε productions.
These type of productions can only be removed from those grammars that do not generate ε.
Step 1: First find out all nullable non-terminal variable which derives ε.
Step 2: For each production A → a, construct all production A → x, where x is obtained from a by
removing one or more non-terminal from step 1.
Step 3: Now combine the result of step 2 with the original production and remove ε productions.
Ex: Remove the production from the following CFG by preserving the meaning of it.
S → XYX
X → 0X | ε
Y → 1Y | ε
Solution: Now, while removing ε production, we are deleting the rule X → ε and Y → ε. To
preserve the meaning of CFG we are actually placing ε at the right-hand side whenever X and Y
have appeared. Let us take
S → XYX If the first X at right-hand side is ε. Then
S → YX Similarly if the last X in R.H.S. = ε. Then
S → XY If Y = ε then
S → XX If Y and X are ε then,
S → X If both X are replaced by ε
S → Y Now,
S → XY | YX | XX | X | Y Now let us consider
X → 0X If we place ε at right-hand side for X then,
X→0
X → 0X | 0 Similarly Y → 1Y | 1
Collectively we can rewrite the CFG with removed ε production as
S → XY | YX | XX | X | Y
X → 0X | 0
Y → 1Y | 1
Removing Unit Productions :-
The unit productions are the productions in which one non-terminal gives another non-terminal. Use
the following steps to remove unit production:
Step 1: To remove X → Y, add production X → a to the grammar rule whenever Y → a occurs in
the grammar.
Step 2: Now delete X → Y from the grammar.
Step 3: Repeat step 1 and step 2 until all unit productions are removed.
For example: S → 0A | 1B | C
A → 0S | 00
B→1|A
C → 01
Solution: S → C is a unit production. But while removing S → C we have to consider what C gives.
So, we can add a rule to S.
S → 0A | 1B | 01
Similarly, B → A is also a unit production so we can modify it as
B → 1 | 0S | 00
Thus finally we can write CFG without unit production as
S → 0A | 1B | 01
A → 0S | 00
B → 1 | 0S | 00
C → 01