Nothing Special   »   [go: up one dir, main page]

Grammar

Download as pdf or txt
Download as pdf or txt
You are on page 1of 44

1

AUTOMATA THEORY
AND
COMPILER DESIGN
(CC-3203 )

Grammar

By: Dr. L. P. Verma


Grammar
2

 A grammar G can be formally written as a 4-tuple (V, T, S, P) where −


 Vor VN is a set of variables or non-terminal symbols.
 T or ∑ is a set of Terminal symbols.

 S is a special variable called the Start symbol, S ∈ N

 P is Production rules for Terminals and Non-terminals. A production rule has the
form α → β, where α and β are strings on VN ∪ ∑ and least one symbol of α
belongs to VN.
Grammar
3

 Example
 Grammar G1 −
 ({S, A, B}, {a, b}, S, {S → AB, A → a, B → b})
 Here,

 S, A, and B are Non-terminal symbols;


 a and b are Terminal symbols

 S is the Start symbol, S ∈ N

 Productions, P : S → AB, A → a, B → b
Grammar
4

 Derivation
 Derivation is a sequence of production rules. It is used to get the input string
through these production rules. During parsing, we have to take two decisions.
These are as follows:
 We have to decide the non-terminal which is to be replaced.
 We have to decide the production rule by which the non-terminal will be replaced.

 We have two options to decide which non-terminal to be placed with production


rule.
 LeftmostDerivation:
 Rightmost Derivation.
Grammar
5

 Leftmost Derivation:
 In the leftmost derivation, the input is scanned and replaced with the production rule
from left to right. So in leftmost derivation, we read the input string from left to right.
 Example:
 Production rules:
E=E+E
E=E-E
E=a|b

 Input
 a-b+a
Grammar
6

 Production rules:
E=E+E
E=E-E
E=a|b
 Input
 a-b+a
 The leftmost derivation is:
E=E+E
E=E-E+E
E=a-E+E
E=a-b+E
E=a-b+a
Grammar
7

 Rightmost Derivation:
 In rightmost derivation, the input is scanned and replaced with the production rule
from right to left. So in rightmost derivation, we read the input string from right to
left.
 Example
 Production rules:
E =E+E
E = E - E

E = a | b
Grammar
8

 Production rules:
 E=E+E
 E=E-E
 E=a|b
 Input
 a-b+a
 The rightmost derivation is:
 E=E-E
 E=E-E+E
 E=E-E+a
 E=E-b+a
 E=a-b+a
Grammar
9

 Derivation Tree
 Derivation tree is a graphical representation for the derivation of the
given production rules for a given CFG.
 It is the simple way to show how the derivation can be done to obtain
some string from a given set of production rules.
 The derivation tree is also called a parse tree.
Grammar
10

 Derivation Tree Production rules:


 Parse tree follows the precedence E=E+E
E=E-E
of operators. A parse tree contains E=a|b
the following properties: Input
a-b+a
1. The root node is always a node
The rightmost derivation is:
indicating start symbols. E=E-E
2. The derivation is read from left to right. E=E-E+E
3. The leaf node is always terminal nodes. E=E-E+a
E=E-b+a
4. The interior nodes are always the non-
E=a-b+a
terminal nodes.
Grammar
11

Production rules:
E=E+E
E=E-E
E=a|b
Input
a-b+a
The rightmost derivation is:
E=E-E
E=E-E+E
E=E-E+a
E=E-b+a
E=a-b+a
Grammar
12

 Example-1
S → bSb | a | b
 Input string= bbabb
 Example-2:
S → xB / yA
S → xS / yAA / x
B → yS / xBB / y
 Consider a string W = xxxyyxyyyx
Grammar
13

 Ambiguity in Grammar
A grammar is said to be ambiguous if there exists more than one
leftmost derivation or more than one rightmost derivation or more than
one parse tree for the given input string.
 If the grammar is not ambiguous, then it is called unambiguous.
Grammar
14

 Ambiguity in Grammar
 Example 1:
 Let us consider a grammar G with the production rule
1. E →I
2. E → E + E
3. E → E * E
4. E → (E)
5. I→ ε | 0 | 1 | 2 | ... | 9

For the string "3 * 2 + 5", the above grammar can generate two
parse trees by leftmost derivation.
Grammar
15

 Ambiguity in Grammar
For the string "3 * 2 + 5", the above grammar can generate two
parse trees by leftmost derivation.
Grammar
16

 Classification of Grammar (Chomsky)


Grammar
17

 Classification of Grammar (Chomsky)


Grammar
18

 Type - 3 Grammar
 Type-3 grammars generate regular languages. Type-3 grammars must
have a single non-terminal on the left-hand side and a right-hand side
consisting of a single terminal or single terminal followed by a single
non-terminal.
 The productions must be in the form X → a or X → aY
 where X, Y ∈ N (Non terminal)
 and a ∈ T (Terminal)

 The rule S → ε is allowed if S does not appear on the right side of any
rule. X → ε
X → a | aY
Y → b
Grammar
19

 Type - 2 Grammar
 Type-2 grammars generate context-free languages.
 The productions must be in the form α → β

 Where A ∈ N (Non-terminal)
 and γ ∈ (T ∪ N)* (String of terminals and non-terminals).
 These grammars’ languages are recognized by a non-deterministic pushdown
automaton. Example
S→Xa
X→a
X → aX
X → abc
X→ε
Grammar
20

 Type - 1 Grammar
 Type-1 grammars generate context-sensitive languages. The
productions must be in the form
αAβ → α γ β
 where A ∈ N (Non-terminal)
 and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-terminals)
 The strings α and β may be empty, but γ must be non-empty.
 The rule S → ε is allowed if S does not appear on the right side of any rule. The
languages generated by these grammars are recognized by a linear bounded
automaton. Example
AB → AbBc
A → bcA
B → b
Grammar
21

 Type - 0 Grammar
 Type-0 grammars generate recursively enumerable languages. The
productions have no restrictions. They are any phase structure grammar
including all formal grammars.
 They generate the languages that are recognized by a Turing machine.

 The productions can be in the form of α → β where α is a string of


terminals and non-terminals with at least one non-terminal and α cannot
be null. β is a string of terminals and non-terminals.
Example
S → ACaB
BC → acB
CB → DB
aD → Db
Grammar
22

 Simplification of CFG
 Simplification of grammar means reduction of grammar by
removing useless symbols.
 As we have seen, various languages can efficiently be represented by a
context-free grammar.
 All the grammar are not always optimized that means the grammar may
consist of some extra symbols(non-terminal).
 Having extra symbols, unnecessary increase the length of grammar.
Grammar
23

 Simplification of CFG
 The properties of reduced grammar are given below:
1. Each variable (i.e. non-terminal) and each terminal of G appears in the
derivation of some word in L.

2. There should not be any production as X → Y where X and Y are non-terminal.

3. If ε is not in the language L then there need not to be the production X → ε.


Grammar
24

 Removal of Useless Symbols


A symbol can be useless if it does not appear on the right-hand side of
the production rule and does not take part in the derivation of any string.
That symbol is known as a useless symbol.
 A variable can be useless if it does not take part in the derivation of any
string. That variable is known as a useless variable.
 For Example:
1. T → aaB | abA | aaT
2. A → aA
3. B → ab | b
4. C → ad
Grammar
25

 Elimination of ε Production
 Theproductions of type S → ε are called ε productions. These type of
productions can only be removed from those grammars that do not
generate ε.
 Step 1: First find out all nullable non-terminal variable which derives ε.
 Step 2: For each production A → a, construct all production A → x, where x is
obtained from a by removing one or more non-terminal from step 1.
 Step 3: Now combine the result of step 2 with the original production and
remove ε productions.
Grammar
26

 Elimination of ε Production
 Example:
 Remove the production from the following CFG by preserving the meaning of
it.
1. S → XYX
2. X → 0X | ε
3. Y → 1Y | ε

 Solution:
 Now, while removing ε production, we are deleting the rule X → ε and Y → ε.
To preserve the meaning of CFG we are placing ε at the right-hand side
whenever X and Y have appeared.
 Let us take
1. S → XYX
Grammar
27

 Elimination of ε Production
 Let us take
1. S → XYX
2. S → YX

 Similarly if the last X in R.H.S. = ε. Then


1. S → XY
If Y and X are ε then
 If Y = ε then S→X
If both X are replaced by ε
1. S → XX 1.S → Y
Now,
1.S → XY | YX | XX | X | Y
Grammar
28

 Elimination of ε Production
 Now let us consider
1. X → 0X
 If we place ε at right-hand side for X then,
1. X →0
1. X → 0X | 0
 Similarly Y → 1Y | 1
Collectively we can rewrite the CFG with removed ε production
as:
S → XY | YX | XX | X | Y | ε
X → 0X | 0
Y → 1Y | 1
Grammar
29

 Removing Unit Productions


 The unit productions are the productions in which one non-terminal
gives another non-terminal. Use the following steps to remove unit
production:
 Step 1: To remove X → Y, add production X → a to the grammar rule whenever
Y → a occurs in the grammar.

 Step 2: Now delete X → Y from the grammar.

 Step 3: Repeat step 1 and step 2 until all unit productions are removed.
Grammar
30

 Removing Unit Productions


 For example:
1. S → 0A | 1B | C
2. A → 0S | 00
3. B→1|A
4. C → 01
 Solution:
 S → C is a unit production. But while removing S → C we must
consider what C gives. So, we can add a rule to S.
S → 0A | 1B | 01
 Similarly, B → A is also a unit production so we can modify it as
B → 1 | 0S | 00
Grammar
31

 Chomsky's Normal Form (CNF)


 CNF stands for Chomsky normal form. A CFG(context free grammar) is
in CNF(Chomsky normal form) if all production rules satisfy one of the
following conditions:
• Start symbol generating ε. For example, A → ε.
• A non-terminal generating two non-terminals. For example, S → AB.
• A non-terminal generating a terminal. For example, S → a.
 For example:
1. G1 = {S → AB, S → c, A → a, B → b}
2. G2 = {S → aA, A → a, B → c}
Grammar
32

 Chomsky's Normal Form (CNF)


 Example:
 Convert the given CFG to CNF.
 Consider the given grammar G1:
1. S → a | aA | B
2. A → aBB | ε
3. B → Aa | b
Grammar
33

 Chomsky's Normal Form (CNF)


 Solution:
 Step 1: We will create a new production S1 → S, as the start symbol S appears on the RHS. The grammar
will be:
1. S1 → S
2. S → a | aA | B
3. A → aBB | ε
4. B → Aa | b
 Step 2: As grammar G1 contains A → ε null production, its removal from the grammar yields:
1. S1 → S
2. S → a | aA | B
3. A → aBB
4. B → Aa | b | a
 Now, as grammar G1 contains Unit production S → B, its removal yield:
1. S1 → S
2. S → a | aA | Aa | b
3. A → aBB
4. B → Aa | b | a
Grammar
34

 Chomsky's Normal Form (CNF)


 Also remove the unit production S1 → S, its removal from the grammar yields:
1. S0 → a | aA | Aa | b
2. S → a | aA | Aa | b
3. A → aBB
4. B → Aa | b | a
 Step 3: In the production rule S0 → aA | Aa, S → aA | Aa, A → aBB and B → Aa, terminal a exists on RHS with non-
terminals. So we will replace terminal a with X:
1. S0 → a | XA | AX | b
2. S → a | XA | AX | b
3. A → XBB
4. B → AX | b | a
 X → a Step 4: In the production rule A → XBB, RHS has more than two symbols, removing it from grammar yield:
1. S0 → a | XA | AX | b
2. S → a | XA | AX | b
3. A → RB
4. B → AX | b | a
5. X→a
6. R → XB
Grammar
35

 Greibach Normal Form (GNF)


 GNF stands for Greibach normal form. A CFG(context free grammar) is
in GNF(Greibach normal form) if all the production rules satisfy one of
the following conditions:
• A start symbol generating ε. For example, S → ε.
• A non-terminal generating a terminal. For example, A → a.
• A non-terminal generating a terminal which is followed by any number of
non-terminals. For example, S → aASB.
Grammar
36

 Greibach Normal Form (GNF)


 Steps for converting CFG into GNF
 Step 1: Convert the grammar into CNF.
 If the given grammar is not in CNF, convert it into CNF. You can refer the following topic
to convert the CFG into CNF: Chomsky normal form
 Step 2: If the grammar exists left recursion, eliminate it.
 If the context free grammar contains left recursion, eliminate it. You can refer the
following topic to eliminate left recursion: Left Recursion
 Step 3: In the grammar, convert the given production rule into GNF form.
 If any production rule in the grammar is not in GNF form, convert it.
Grammar
37

 Left Recursive Grammars


 In a context-free grammar G, if there is a production in the form X
→ Xa where X is a non-terminal and ‘a’ is a terminal, it is called
a left recursive production. The grammar has a left recursive
production is called a left recursive grammar.
Grammar
38

 Elimination of Left Recursion


 If we have the left-recursive pair of productions-
 A → Aα / β
 where β does not begin with an A.
 Then, we can eliminate left recursion by replacing the pair of
productions with-
 A → βA’
 A’ → αA’ / ∈
Grammar
39

 Right Recursive Grammars


 If in a context-free grammar G, if there is a production is in the form X
→ aX where X is a non-terminal and ‘a’ is a string of terminals, it is
called a right recursive production. The grammar having a right
recursive production is called a right recursive grammar.
Grammar
40

 GNF
 Example:
1. S → XB | AA
2. A → a | SA
3. B→b
4. X→a
Grammar
41

 Solution:
 As the given grammar G is already in CNF and there is no left recursion, so we can skip
step 1 and step 2 and directly go to step 3.
 The production rule A → SA is not in GNF, so we substitute S → XB | AA in the production
rule A → SA as:
1. S → XB | AA
2. A → a | XBA | AAA
3. B→b
4. X→a
 The production rule S → XB and A → XBA is not in GNF, so we substitute X → a in the
production rule S → XB and A → XBA as:
1. S → aB | AA
2. A → a | aBA | AAA
3. B→b
4. X→a
Grammar
42

 Now we will remove left recursion (A → AAA), we get:


1. S → aB | AA
2. A → aC | aBAC
3. C → AAC | ε
4. B→b
5. X→a
 Now we will remove null production C → ε, we get:
1. S → aB | AA
2. A → aC | aBAC | a | aBA
3. C → AAC | AA
4. B→b
5. X→a
Grammar
43

 The production rule S → AA is not in GNF, so we substitute A → aC | aBAC | a | aBA in production rule
S → AA as:
1. S → aB | aCA | aBACA | aA | aBAA
2. A → aC | aBAC | a | aBA
3. C → AAC
4. C → aCA | aBACA | aA | aBAA
5. B→b
6. X→a
 The production rule C → AAC is not in GNF, so we substitute A → aC | aBAC | a | aBA in production
rule C → AAC as:
1. S → aB | aCA | aBACA | aA | aBAA
2. A → aC | aBAC | a | aBA
3. C → aCAC | aBACAC | aAC | aBAAC
4. C → aCA | aBACA | aA | aBAA
5. B→b
6. X→a
 Hence, this is the GNF form for the grammar G.
44

References
 Hopcroft, Ullman, “Introduction to Automata Theory, Languages and Computation”, Pearson
Education.
 KLP Mishra and N. Chandrasekaran, “Theory of Computer Science: Automata, Languages and
Computation”, PHI Learning Private Limited, Delhi India.
 Peter Linz, "An Introduction to Formal Language and Automata", Narosa Publishing house.

You might also like