  # CS606 - Compiler Construction - Lecture Handout 34

User Rating:     / 0
PoorBest

Let’s go through an example of using YACC to implement the ad-hoc scheme for an arithmetic calculator.

The YACC file for a calculator grammar is as follows:

%token NUMBER LPAREN RPAREN
%token PLUS MINUS TIMES DIVIDE
%%
expr : expr PLUS expr
| expr MINUS expr
| expr TIMES expr
| expr DIVIDE expr
| LPAREN expr RPAREN
| MINUS expr
| NUMBER
;
%%

We will add the code snippets

%{
#include <iostream>
%}
// type of value entries in the parse stack
%union {int val;}
%token NUMBER LPAREN RPAREN EQUAL
%token PLUS MINUS TIMES DIVIDE
/* associativity and precedence:in order of increasing
precedence */
%nonassoc EQUAL
%left PLUS MINUS
%left TIMES DIVIDE
%left UMINUS /* dummy token to use as
precedence marker */
%type <val> NUMBER expr
%%
prog : expr { cout << \$1 << endl;}
;
expr : expr PLUS expr {\$\$ = \$1 + \$3;}
| expr MINUS expr {\$\$ = \$1 - \$3;}

| expr TIMES expr {\$\$ = \$1 * \$3;}
| expr DIVIDE expr {if(\$3) \$\$ = \$1 / \$3;}
| LPAREN expr RPAREN {\$\$ = \$2;}
| MINUS expr {\$\$ = -\$2;}
| NUMBER {\$\$ = \$1;}
;

The ‘\$’ notation is used by YACC to refer to values of symbols on the right hand side of the grammar production. For example, for expr : expr PLUS expr, \$1 refers to first expr on the right, \$2 refers to PLUS and \$3 refers to the second non-terminal expr. The notation \$\$ refers to the symbol on the left hand side of the production.
Internally, the \$1 refers to the attribute value associated with the first grammar symbol, \$2 with the second, \$3 with the third and so on. These values are stored in the parse stack. The notation \$\$ instructs YACC to push a computed attribute value on the stack and associate it with the symbol on the left when the reduction takes place.

The following attributed tree shows the values as they are computed in a bottom-up parse (note: please see the file “lex_yacc.pdf” for additional information on using YACC.)

## Intermediate Representations

Compilers are organized as a series of passes. This creates the need for an intermediate representation (IR) for the code being compiled. Compilers use some internal form– an IR –to represent the code being analyzed and translated. Many compilers use more than one IR during the course of compilation.

The IR must be expressive enough to record all of the useful facts that might be passed between passes of the compiler. During translation, the compiler derives facts that have no representation in the source code. For example, the addresses of variables and procedures are not specified in the code being compiled. Typically, the compiler augments the IR with a set of tables that record additional information. Foremost among these is the symbol table. These tables are considered part of the IR

Selecting an appropriate IR for a compiler project requires an understanding of both the source language and the target machines and the properties of the programs to be compiled. Thus, a source-to-source translator e.g., C++ to Java, might keep its internal information in a form quite to close to the source. In contrast, a compiler that produces
assembly code might use a form close to the target machine’s instruction set.