School Work">
Análisis Sintáctico
Análisis Sintáctico
Análisis Sintáctico
yacc.py is used to parse language syntax. Before showing an example, there are a few important bits
of background that must be mentioned. First, syntax is usually specified in terms of a BNF grammar.
For example, if you wanted to parse simple arithmetic expressions, you might first write an
unambiguous grammar specification like this:
: term * factor
| term / factor
| factor
factor
: NUMBER
| ( expression )
In the grammar, symbols such as NUMBER, +, -, *, and / are known asterminals and correspond to
raw input tokens. Identifiers such as termand factor refer to more complex rules, typically
comprised of a collection of tokens. These identifiers are known as non-terminals.
The semantic behavior of a language is often specified using a technique known as syntax directed
translation. In syntax directed translation, attributes are attached to each symbol in a given grammar
rule along with an action. Whenever a particular grammar rule is recognized, the action describes
what to do. For example, given the expression grammar above, you might write the specification for a
simple calculator like this:
Grammar
-------------------------------expression0 : expression1 + term
| expression1 - term
| term
term0
: term1 * factor
| term1 / factor
| factor
factor
: NUMBER
| ( expression )
A good way to think about syntax directed translation is to simply think of each symbol in the grammar
as some kind of object. The semantics of the language are then expressed as a collection of
methods/operations on these objects.
Yacc uses a parsing technique known as LR-parsing or shift-reduce parsing. LR parsing is a bottom up
technique that tries to recognize the right-hand-side of various grammar rules. Whenever a valid righthand-side is found in the input, the appropriate action code is triggered and the grammar symbols are
replaced by the grammar symbol on the left-hand-side.
LR parsing is commonly implemented by shifting grammar symbols onto a stack and looking at the
stack and the next input token for patterns. The details of the algorithm can be found in a compiler
text, but the following example illustrates the steps that are performed if you wanted to parse the
expression 3 + 5 * (10 - 20) using the grammar defined above:
Step Symbol Stack
Input Tokens
---- ---------------------
---------------------
3 + 5 * ( 10 - 20 )$
$ 3
+ 5 * ( 10 - 20 )$
$ factor
+ 5 * ( 10 - 20 )$
$ term
+ 5 * ( 10 - 20 )$
$ expr
+ 5 * ( 10 - 20 )$
$ expr +
$ expr + 5
* ( 10 - 20 )$
$ expr + factor
* ( 10 - 20 )$
5 * ( 10 - 20 )$
$ expr + term
* ( 10 - 20 )$
10
$ expr + term *
11
$ expr + term * (
12
$ expr + term * ( 10
- 20 )$
13
- 20 )$
14
- 20 )$
15
- 20 )$
16
17
)$
18
)$
19
)$
20
)$
21
22
23
$ expr + term
24
$ expr
25
( 10 - 20 )$
10 - 20 )$
20 )$