Spread Knowledge

CS606 - Compiler Construction - Lecture Handout 42

User Rating:  / 0

Related Content: CS606 - VU Lectures, Handouts, PPT Slides, Assignments, Quizzes, Papers & Books of Compiler Construction

Code Generation

The code generation problem is the task of mapping intermediate code to machine code.
The generated code must be correct for obvious reasons. It should be efficient both in terms of memory space and execution time.

The input to the code generation module of compiler is intermediate code (optimized or not) and its task is typically to produce either machine code or assembly language code for a target machine.

The code generation module has to tackle a number of issues.

  • Memory management: mapping names to data objects in the run-time system.
  • Instruction selection: the assembly language instructions to choose to encode intermediate code statements
  • Instruction scheduling: instruction chosen must utilize the CPU resources effectively. Hardware stalls must be avoided.
  • Register allocation: operands are placed in registers before executing machine operation such as ADD, MULTIPLY etc. Most processors have a limited set of registers available. The code generator has to make efficient use of this limited resource

For our discussion, we will target a machine that has the following general characteristics. Most actual processors are similar to such architecture.

The machine is byte-addressable with 4-byte words. It has N general -purpose registers.
It uses two-address instructions of the form op source, destination. The target assembly language operations are:

  • MOV source, destination
  • ADD source, destination
  • SUB source, destination (dest = dest – source)
  • GOTO address
  • CJ conditional jump

More instruction will be added to the instruction set as needed.

The following table presents the addressing modes for source or destination operands.

absolute M M 1
register R R 0
indexed c(R) c + contents(R) 1
indirect register *R contents(R) 0
indirect indexed *c(R) contents(c+contents(R)) 1
literal #c c 1
stack SP SP 0
indexed stack c(SP) c + contents(SP) 1

We associate a cost with each instruction. This will allow us to compute the cost of generated code. The cost corresponds to length of instruction. For example the instruction

MOV R0,R1 ; R0 = c(R1)
has cost 1 while
MOV R5,M ; M = c(R5)

has cost 2: 1 for instruction, 1 additional for memory address. The column title “ADDED COST” indicates this additional cost.

Simple Code Generation

We start with a simple code generation strategy: define a target code sequence for each intermediate code (such as 3-address code) statement type. Thus,

Intermediate code becomes…
a = b MOV b,a
a = b[c] MOV addr(b),R0
ADD c, R0
MOV *R0,a
a = b + c MOV b,a
ADD c,a
a[b] = c MOV addr(a),R0
ADD b,R0
MOV c,*R0

Consider the C statement: a[i] = b[c[j]]; the simple code generator will emit

C statement

The cost of this code is 18 and we are forced to allocate space for two temporaries.
While the simple approach works, it does not produce good code. There a number of reasons for this. The generator considers each IR (3-address in this case) alone and makes local decision. It does not take temporary variables into account. One optimization possible is to get rid of the temporaries:

MOV addr(c), R0
ADD j, R0
MOV addr(b), R1
ADD *R0, R1
MOV addr(a), R2
ADD i, R2
MOV *R1, *R2

The cost of this code is 12. We can optimize further:

MOV addr(c), R0
ADD j, R0
MOV addr(a), R2
ADD i, R2
MOV *addr(b)(R0), *R2

The cost of this code is 10. What is needed is a way to generate machine code based on past and future use of the data.