Advanced Backend Code Optimization

by Sid Touati

An open question that remains in computer science is how to define a program of good quality. At the semantic level, a good program is one that computes what is formally specified (either exactly, or at least converging towards a correct decision even without an exact result). At the algorithmic level, a good program is one that has a reduced spatial and temporal complexity. This book does not tackle these two levels of program quality abstraction. We are interested in the aspects of code quality at the compilation level (after an algorithm has been coded and implemented). Once a program has been implemented, part of its quality can be quantified in terms of efficiency, for instance. By the term “efficiency”, we mean a program that exploits the underlying hardware at its best, delivers correct results as quickly as possible, and has a reasonable memory footprint and moderate energy consumption. There are also quality criteria that are not easy to define, for instance the clarity of the code and its suitability for being analyzed conveniently by automatic methods (worst-case execution time analysis, data-flow dependence analysis, etc.).


Automatic code optimization, in general, focuses on two objectives that are not necessarily antagonistic: the computation speed and the memory footprint of the code. These are the two principal quality criteria addressed in this book. Computation speed is the most popular objective, but it remains difficult to model precisely. In fact, the execution time of a program is influenced by a complex combination of multiple factors, a (probably incomplete) list of which is given below:

1) The underlying processor and machine architecture: instruction set architecture (ISA), explicit instruction-level parallelism (very long instruction word – VLIW), memory addressing modes, data size, input/output protocols, etc.

2) The underlying processor micro-architecture: implicit instruction-level parallelism (superscalar), branch prediction, memory hierarchy, speculative execution, pipelined execution, memory disambiguation mechanism, out-of-order execution, register renaming, etc.

3) The technology: clock frequency, processor fabrication process, silicon integration, transistor width, components (chipset, DRAM and bus), etc.

4) Software implementation: syntactic constructs of the code, the data structures used, the order of program instructions, programming style, etc.

5) The data input: the executed path of the code depends on the input data.

6) The experimental environment: operating system configuration and version, activated system services, the compiler and optimization flags used, workload of the test machine, degradation of the hardware, temperature of the room, etc.

7) The measurement of code performance: experimental methodology (code loading and launching), rigor of the statistical analysis, etc.
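
Regarding this last factor, a minimal sketch of a careful measurement harness in C is given below. It is purely illustrative and not taken from the book: the kernel, the number of runs and the use of the median are assumptions, chosen only to show why repeated runs and a robust statistic matter more than a single timing (it relies on the POSIX clock_gettime interface).

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Hypothetical kernel under measurement; any function could be plugged in. */
    static double kernel(size_t n)
    {
        volatile double acc = 0.0;   /* volatile: keep the loop from being optimized away */
        for (size_t i = 0; i < n; i++)
            acc += (double)i * 0.5;
        return acc;
    }

    static int cmp_double(const void *a, const void *b)
    {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    int main(void)
    {
        enum { RUNS = 31 };
        double samples[RUNS];

        for (int r = 0; r < RUNS; r++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);   /* POSIX monotonic clock */
            kernel(1u << 20);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            samples[r] = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        }

        /* Report the median rather than a single run: it is less sensitive to
           interference from the operating system and other processes. */
        qsort(samples, RUNS, sizeof samples[0], cmp_double);
        printf("median execution time: %.6f s over %d runs\n", samples[RUNS / 2], RUNS);
        return 0;
    }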


All the above factors are difficult to tackle within a single optimization process. The role of the compiler is to optimize only a fraction of them (the software implementation and its interaction with the underlying hardware). For a long time, compilation has been considered one of the most active research topics in computer science. Its importance lies not only in the field of programming, code generation and optimization, but also in circuit synthesis, language translation, interpreters, etc. We are all witnesses to the large number of new languages and processor architectures. It is not worthwhile to create a distinct compiler for each combination of language and processor: the core of a compiler is expected to be common to multiple combinations of programming languages and processor architectures. In the past, compiler backends were specialized per architecture. Nowadays, backends tend to be increasingly generic in order to amortize the investment cost of developing a compiler.


In universities and schools, courses on compilation theory draw clear frontiers between the frontend and the backend:

1) High-level code optimization: this is the set of code transformations applied to an intermediate representation close to the high-level language. Such an intermediate representation contains sophisticated syntactic constructs (loops and control structures) with rich semantics, as well as high-level data structures (arrays, containers, etc.). Analyzing and optimizing at this level of program abstraction tends to improve performance metrics that are not tied to a specific processor architecture. Examples include interprocedural and data dependence analysis, automatic parallelization, scalar and array privatization, loop nest transformations, alias analysis, etc. A first illustrative sketch is given after this list.


2) Low-level code optimization: this is the set of code transformations applied to an intermediate representation close to the final instruction set of the processor (assembly instructions, three-address code, register transfer level (RTL), etc.). The performance metrics optimized at this level of program abstraction are generally related to the processor architecture: number of generated instructions, code size, instruction scheduling, register need, register allocation, register assignment, cache optimization, instruction selection, addressing modes, etc. A second sketch, at this lower level, follows the first one below.
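
As a first sketch of a high-level, architecture-neutral transformation, the following C fragment shows a classic loop interchange on a loop nest. It is not taken from the book; the array shape and names are assumptions chosen only to illustrate that the transformation is expressed entirely on the source-level structure (the loops), before any backend concern appears.

    #include <stddef.h>

    #define N 1024

    /* Before: the inner loop walks a column of a[][], so consecutive accesses
       are N*sizeof(double) bytes apart and cache behavior is poor. */
    void column_sums_before(double a[N][N], double b[N])
    {
        for (size_t j = 0; j < N; j++)
            for (size_t i = 0; i < N; i++)
                b[j] += a[i][j];
    }

    /* After loop interchange: same result, but the inner loop now walks
       contiguous memory (rows), which matches C's row-major layout. */
    void column_sums_after(double a[N][N], double b[N])
    {
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++)
                b[j] += a[i][j];
    }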

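As a second sketch, on the low-level side, the comments below show how a backend might reorder three-address instructions for a single statement. The register names, latencies and schedules are illustrative assumptions, not the output of any particular compiler; the point is only that hiding a load latency and accepting higher register pressure are decisions taken at this level.

    /* Source statement: y = a[i] * a[i] + b[i];
     *
     * Naive instruction order (assuming loads take several cycles):
     *   r1 = load a[i]
     *   r2 = r1 * r1      <- stalls waiting for the load
     *   r3 = load b[i]
     *   r4 = r2 + r3      <- stalls waiting for the second load
     *
     * After instruction scheduling, the independent load of b[i] is hoisted so
     * that its latency overlaps with the multiplication, at the price of one
     * more simultaneously live register (higher register pressure):
     *   r1 = load a[i]
     *   r3 = load b[i]
     *   r2 = r1 * r1
     *   r4 = r2 + r3
     */
    long combine(const long *a, const long *b, int i)
    {
        return a[i] * a[i] + b[i];
    }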

In practice, this separation is not so strict. It is not rare for a code transformation implemented in the frontend to optimize for a backend objective: for instance, cache optimization of a loop nest can be done in the frontend because the high-level program structure (the loops) has not yet been destroyed. Conversely, it is possible to implement a high-level analysis on assembly or binary code, such as data dependence and interprocedural analysis. Compilers are very complex pieces of software that are maintained over a long period of time, and the frontier between high and low levels can sometimes be difficult to define formally. In any case, the notion of frontend and backend optimization is not fundamental: it is a technical decomposition of compilation intended mainly to ease the development of the compiler software.


We are interested in backend code optimization mainly because of a personal inclination towards the hardware/software frontier. Even this barrier is starting to blur with the development of reconfigurable and programmable architectures, where compilers are asked to generate part of the instruction set. In this book, we have tried to stay as abstract as possible in order to obtain general results applicable to wide processor families (superscalar, VLIW, Explicitly Parallel Instruction Computing (EPIC), etc.). When the micro-architectural features are too complex to model, we provide technical solutions for practical situations.

