Nothing Special   »   [go: up one dir, main page]

Predictive Parsing: Recall The Main Idea of Top-Down Parsing

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 19

Predictive parsing

 Recall the main idea of top-down parsing:


 Start at the root, grow towards leaves
 Pick a production and try to match input
 May need to backtrack

 Can we avoid the backtracking?


 Given A   |  the parser should be able to choose
between  and 
 How?
 What if we do some "preprocessing" to answer the question:
Given a non-terminal A and lookahead t, which (if any)
production of A is guaranteed to start with a t?

1
Predictive parsing
 If we have two productions: A , we want a
distinct way of choosing the correct one.
 Define:
 for G, x  FIRST() iff  * x
 If FIRST() and FIRST() contain no common
symbols, we will know whether we should choose
A or A by looking at the lookahead symbol.

2
Predictive parsing
 Compute FIRST(X) as follows:
 if X is a terminal, then FIRST(X)={X}
 if X is a production, then add  to FIRST(X)
 if X is a non-terminal and XY1Y2...Yn is a production, add
FIRST(Yi) to FIRST(X) if the preceding Yjs contain  in their
FIRSTs

3
Predictive parsing
 What if we have a "candidate" production A
where = or *?
 We could expand if we knew that there is some
sentential form where the current input symbol
appears after A.
 Define:
 for AN, xFOLLOW(A) iff  S*Ax

4
Predictive parsing
 Compute FOLLOW as follows:
 FOLLOW(S) contains EOF
 For productions AB, everything in FIRST() except 
goes into FOLLOW(B)
 For productions AB or AB where FIRST() contains
, FOLLOW(B) contains everything that is in FOLLOW(A)

5
Predictive parsing
 Armed with
 FIRST
 FOLLOW

 we can build a parser where no backtracking is


required!

6
Predictive parsing (w/table)
 For each production A do:
 For each terminal a  FIRST() add A to entry M[A,a]
 If FIRST(), add A to entry M[A,b] for each terminal
b  FOLLOW(A).
 If FIRST() and EOFFOLLOW(A), add A to
M[A,EOF]
 Use table and stack to simulate recursion.

7
Recursive Descent Parsing
 Basic idea:
 Write a routine to recognize each lhs
 This produces a parser with mutually recursive routines.
 Good for hand-coded parsers.

 Example:
 AaB | b will correspond to

A() {
if (lookahead == 'a')
match('a');
B();
else if (lookahead == 'b')
match ('b');
else error();
} 8
Building a parser
 Original grammar:
EE+E
EE*E
E(E)
Eid
 This grammar is left-recursive, ambiguous and requires left-
factoring. It needs to be modified before we build a predictive
parser for it:
Remove ambiguity: Remove left recursion:
EE+T ETE'
TT*F E'+TE'|
F(E) TFT'
Fid T'*FT'|
F(E)
Fid
9
Building a parser
ETE'
 The grammar:
E'+TE'|
TFT'
T'*FT'|
F(E)
Fid
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, }
FIRST(T') = {*, }
FOLLOW(E) = FOLLOW(E') = {$, )}
FOLLOW(T) = FOLLOW(T') = {+, $, )}
FOLLOW(F) = {*, +, $, )}

Now, we can either build a table or design a recursive descend parser.


10
Parsing table

+ * ( ) id $
E ETE' ETE'
E' E'+TE' E' E'
T TFT' TFT'
T' T' T'*FT' T' T'
F F(E) Fid
+ match
* match
( match
) match
id match
$ accept

11
Parsing table
Parse the input id*id using the parse table and a stack

Step Stack Input Next Action


1 $E id*id$ ETE'
2 $E'T id*id$ TFT'
3 $E'T'F id*id$ Fid
4 $E'T'id id*id$ match id
5 $E'T' *id$ T'*FT'
6 $T'F* *id$ match *
7 $T'F id$ Fid
8 $T'id id$ match id
9 $T' $ T'
10 $ $ accept

12
Recursive descend parser

parse() { Eprime() {
token = get_next_token(); if (token == '+')
if (E() and token == '$') then token=get_next_token()
then return true if (T())
else return false
then return Eprime()
}
else return false
E() { else if (token==')' or token=='$')
if (T()) then return true
then return Eprime() else return false
else return false }
}

The remaining procedures are similar.


13
LL(1) parsing
 Our parser scans the input Left-to-right, generates a
Leftmost derivation and uses 1 symbol of lookahead.
 It is called an LL(1) parser.
 If you can build a parsing table with no multiply-
defined entries, then the grammar is LL(1).
 Ambiguous grammars are never LL(1)
 Non-ambiguous grammars are not necessarily LL(1)

14
LL(1) parsing
 For example, the following grammar will have two
possible ways to expand S' when the lookahead is
else.
S  if E then S S' | other
S' else S | 
E  id

 It may expand S' else S or S' 


 We can resolve the ambiguity by instructing the parser to always
pick S' else S. This will match each else to the closest previous
then.

15
LL(1) parsing
 Here's an example of a grammar that is NOT LL(k) for
any k:

S  Ca | Cb
C  cC | c
 Why? Suppose the grammar was LL(k) for some k. Consider the
input string ck+1a. With only k lookaheads, the parser would not
be able to decide whether to expand using S  Ca or S  Cb
 Note that the grammar is actually regular: it generates strings
of the form c+(a|b)

16
Error detection in LL(1) parsing
 An error is detected whenever an empty table slot is
encountered.
 We would like our parser to be able to recover from
an error and continue parsing.
 Phase-level recovery
 We associate each empty slot with an error handling
procedure.
 Panic mode recovery
 Modify the stack and/or the input string to try and reach
state from which we can continue.

17
Error recovery in LL(1) parsing
 Panic mode recovery
 Idea:
 Decide on a set of synchronizing tokens.

 When an error is found and there's a nonterminal at the

top of the stack, discard input tokens until a


synchronizing token is found.
 Synchronizing tokens are chosen so that the parser can

recover quickly after one is found


 e.g. a semicolon when parsing statements.
 If there is a terminal at the top of the stack, we could try
popping it to see whether we can continue.
 Assume that the input string is actually missing that
terminal.
18
Error recovery in LL(1) parsing
 Panic mode recovery
 Possible synchronizing tokens for a nonterminal A
 the tokens in FOLLOW(A)

 When one is found, pop A of the stack and try to continue


 the tokens in FIRST(A)
 When one is found, match it and try to continue
 tokens such as semicolons that terminate statements

19

You might also like