CS 4300: Compiler Theory A Simple Syntax-Directed Translator
CS 4300: Compiler Theory A Simple Syntax-Directed Translator
CS 4300: Compiler Theory A Simple Syntax-Directed Translator
Chapter 2
A Simple Syntax- Directed
Translator
Xuejun Liang
2019 Fall
2
Outline
• This chapter is an introduction to the compiling techniques
in Chapters 3 to 6 of the Dragon book
• It illustrates the techniques by developing a working Java
program that translates representative programming
language statements into three-address code
• The major topics are
2. Syntax Definition
3. Syntax-Directed Translation
4. Parsing
5. A Translator for Simple Expressions
6. Lexical Analysis
7. Symbol Tables
8. Intermediate Code Generation
3
2. Syntax Definition
An if-else statement in Java can have the form
if ( expression ) statement else statement The rule called
production, left
This structuring rule can be expressed as side called head,
and right side
stmt if ( expr ) stmt else stmt
called body
Example Grammar
Context-free grammar for simple expressions:
with productions P =
digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
8
Derivation Example
list
list + digit
list - digit + digit
digit - digit + digit
9 - digit + digit
9 - 5 + digit
9-5+2
Parse Trees
• The root of the tree is labeled by the start symbol
• Each leaf of the tree is labeled by a terminal
(=token) or
• Each interior node is labeled by a nonterminal
• If A X1 X2 … Xn is a production, then node A has
immediate children X1, X2, …, Xn where Xi is a
(non)terminal or ( denotes the empty string)
11
list
list digit
list digit
digit
The sequence of
9 - 5 + 2 leafs is called the
yield of the parse tree
12
Ambiguity
A grammar can have more than one parse tree
generating a given string of terminals. Such a
grammar is said to be ambiguous
Associativity of Operators
Left-associative operators have left-recursive productions
Precedence of Operators
Operators with higher precedence “bind more tightly”
expr expr + term | term
term term * factor | factor
factor digit | ( expr )
String 2+3*5 has the same meaning as 2+(3*5)
expr
expr term
term term factor
factor factor digit
digit digit
2 + 3 * 5
17
Syntax (Grammar)
Expressions
Subset of Java
Statements
18
3. Syntax-Directed Translation
• Uses a CF grammar to specify the syntactic
structure of the language
• AND associates a set of attributes with the
terminals and nonterminals of the grammar
• AND associates with each production a set of
semantic rules to compute values of attributes
• A parse tree is traversed and semantic rules
applied: after the tree traversal(s) are completed,
the attribute values on the nonterminals contain
the translated form of the input
19
term.t = “9”
9 - 5 + 2
Depth-First Traversals
procedure visit(n : node);
begin
for each child c of n, from left to right do
visit(c);
evaluate semantic rules at node n
end
23
expr.t = “95-2+”
term.t = “9”
Translation Schemes
• A translation scheme is a CF grammar embedded
with semantic actions by attaching program
fragments to productions in the grammar
Embedded rest
semantic action
expr
term 5 { print(“5”) }
9 { print(“9”) }
4. Parsing
• Parsing = process of determining if a string of
tokens can be generated by a grammar
• For any CF grammar there is a parser that takes at
most O(n3) time to parse a string of n tokens
• Linear algorithms suffice for parsing
programming language source code
• Top-down parsing “constructs” a parse tree from
root to leaves
• Bottom-up parsing “constructs” a parse tree from
leaves to root
28
Top-Down Parsing
• The top-down construction of a parse tree is done
by starting with the root, labeled with the starting
nonterminal , and repeatedly performing the
following two steps.
1. At node N, labeled with nonterminal A, select one of
the productions for A and construct children at N for
the symbols in the production body.
2. Find the next node at which a subtree is to be
constructed, typically the leftmost unexpanded
nonterminal of the tree.
29
Grammar
Input string
for ( ; expr ; expr ) other
Predictive Parsing
• Recursive descent parsing is a top-down method of
syntax analysis in which a set of recursive procedures
is used to process the input.
– Each nonterminal has one (recursive) procedure that is
responsible for parsing the nonterminal’s syntactic
category of input tokens
– When a nonterminal has multiple productions, each
production is implemented in a branch of a selection
statement based on input look-ahead information
• Predictive parsing is a special form of recursive
descent parsing where we use one lookahead token to
unambiguously determine the parse operations
32
33
FIRST
FIRST() is the set of terminals that appear as the first
symbols of one or more strings generated from
Left Factoring
When more than one production for nonterminal A starts
with the same symbols, the FIRST sets are not disjoint
Left Recursion
When a production for nonterminal A starts with a
self reference then a predictive parser loops forever
AA
|
|
We can eliminate left recursive productions by systematically
rewriting the grammar using right recursive productions
AR
|R
RR
|
37
6. Lexical Analysis
• The expression only deals with single digit integer and no
white space is allowed. So, no lexical analysis is needed.
• Expend to multiple digit integer and to include identifiers
44
Lexical Analyzer
• To expend to multiple digit integer and to
include identifiers, a lexical analyzer is
needed.
• Typical tasks of the lexical analyzer:
– Remove white space and comments
– Encode constants as tokens
– Recognize keywords
– Recognize identifiers and store identifier names
in a global symbol table
45
Constants (Number)
31 + 28 + 59
token treminal integer-valued attribute
Lexical
<num, 31> <+> <num, 28> <+> <num, 59>
analyzer
(key, value)
(lexeme, token)
A Lexical Analyzer
Token scan () {
skip white space;
handle numbers;
pseudocode handle reserved words and identifiers;
/ * treat read-ahead character peek as a token * /
Token t = new Token (peek) ;
peek = blank /* initialization*/ ;
return t;
}
48
7. Symbol Tables
Given input :
{ int x ; char y ; { bool y ; x ; y ; } x ; y ; }
Stmt Expr
while statement
While If Do Eval
Block
sequence
62
Static Checking
• Static checks are consistency checks that are done during
compilation
– Syntactic Checking.
• There is more to syntax than grammars
– Type Checking
• Assure that an operator or function is applied to the right
number and type of operands
• L-values and R-values
– r-values are what we usually think of as "values," while l-
values are locations.
• Coercion
– A coercion occurs if the type of an operand is automatically
converted to the type expected by the operator
65
Three-Address Code
• Show how to write functions that process the
syntax tree and, as a side-effect, emit the
necessary three-address code
• Three-Address Instructions
• Translation of Statements
– Example: if expr then stmt1
66
.
Function gen in class If
generates three-address code
67
stmt if expr
{ after = newlabel();
print(“ifFalse goto after:”); }
then stmt1
{ print(“after: ”); }
Translation of Expressions
i - j +k 2 *a [i] a [2*k] a [ i ] = 2 * a[j – k]
tl = i - j tl = a [ i ] t = 2*k t3 = j - k
t2 = tl + k t2 = 2 * tl a [t] t2 = a [t3]
t1 = 2 * t2
a [i] = t1