Spread Knowledge

CS606 - Compiler Construction - Lecture Handout 35

User Rating:  / 0

Related Content: CS606 - VU Lectures, Handouts, PPT Slides, Assignments, Quizzes, Papers & Books of Compiler Construction

IR Taxonomy

IRs fall into three organizational categories:

  1. Graphical IRs encode the compiler’s knowledge in a graph.
  2. Linear IRs resemble pseudo-code for some abstract machine
  3. Hybrid IRs combine elements of both graphical (structural) and linear IRs

Graphical IRs

Parse trees are graphs that represent source-code form of the program. The structure of the tree corresponds to the syntax of the source code. Parse trees are used primarily in discussion of parsing and in attribute grammar systems where they are the primary IR In most other applications, compilers use one of the more concise alternatives. An abstract syntax tree (AST) retains the essential structure of the parse tree but eliminates extraneous nodes. Here, for example, is the AST for the expression a = b*-c + b*-c. Notice how all derivation related information has been removed.

AST for the expression

ASTs have been used in many practical compiler systems such as source-to-source systems, automatic parallelization tools, pretty-printing etc.

AST is more concise than a parse tree. It faithfully retains the structure of the original source code. Consider the AST for x*2+x*2*y

AST is more concise than a parse tree

The AST contains two distinct copies of x*2.A directed acyclic graph (DAG) is a contraction of the AST that avoids duplication.

AST that avoids duplication

If the value of x does not change between uses of x*2, the compiler can generate code that evaluates the subtree once and uses the result twice.

The task of building AST fits neatly into an ad hoc-syntax-directed translation scheme.
Assume that the compiler has routines mknode and mkleaf for creating tree nodes. The following rules can be attached to the expression grammar to create AST.

Production Semantic Rule
E → E1 + E2 E.nptr = mknode(‘+’, E1.nptr, E2.nptr)
E → E1 ∗ E2 E.nptr = mknode(‘∗’, E1.nptr, E2.nptr)
E → – E1 E.nptr = mknode(‘–’, E1.nptr)
E → ( E1 ) E.nptr = E1.nptr
E → num E.nptr = mkleaf(‘num’, num.val)

The following table shows the same rules using YACC syntax.

Production Semantic Rule (yacc)
E → E1 + E2 $$.nptr = mknode(‘+’, $1.nptr, $3.nptr)
E → E1 ∗ E2 $$.nptr = mknode(‘∗’, $1.nptr, $3.nptr)
E → – E1 $$.nptr = mknode(‘–’, $1.nptr)
E → ( E1 ) $$.nptr = $1.nptr
E → num $$.nptr = mkleaf(‘num’, $1.val)

We will use another IR, called three-address code, for actual code generation The semantic rules for generating three-address code for common programming languages constructs are similar to those for AST.

Linear IRs

The alternative to graphical IR is a linear IR. An assembly-language program is a form of linear code. It consists of a sequence of instructions that execute in order of appearance Two linear IRs used in modern compilers are stack-machine code and three-address code.

Stack-machine code is sometimes called one-address code. It assumes the presence of an operand stack. Most operations take their operands from the stack and push results back onto the stack. Here, for example, is the linear IR for x – 2 × y

stack-machine three-address
push 2 t1 ← 2
push y t2 ← y
multiply t3 ← t1 × t2
push x t4 ← x
subtract t5 ← t4 – t1

Stack-machine code is compact; it eliminates many names from IR. This shrinks the program in IR form. All results and arguments are transitory unless explicitly moved to memory. Stack-machine code is simple to generate and execute. Smalltalk-80 and Java use byte-codes which are abstract stack-machine code. The byte-code is either interpreted or translated into target machine code (JIT).

In three-address code most operations have the form

x ← y op z

with an operator (op), two operands (y and z) and one result (x). Some operators, such as an immediate load and a jump, will need fewer arguments.