=================== What is a compiler =================== + what is the difference compiler is a translation PL ==> Instruction Set PL ==> PL intrepreter is an evaluation PL ==> Result interpret the code to produce a result compile the code to produce a program that can be executed to produce the result compiled languages c java fortran interpreted languages scheme python ruby is java compiled or interpreted? BOTH!! can combine both compilation and interpretation is there such thing as a compiled or interpreted language? - no, a language is a specification with semantics - implementations are interpreted or compiled + choosing one over the other dynamic languages are usually interpreted, why? - more useful to have around the runtime information - changing the features that are more useful to interpretation - platform independence (Java's byte code, for example) - reflective usage of the evaluator (e.g. a first-order eval function) - dynamic typing - ease of debugging - object polymorphism - dynamic scoping features that are more useful for compilation - speed - complexity of implementation ======================== High Level Organization ======================== + Front end - lexing - parsing - generating the IR + Optimization - many passes can be made to improve the code - idea is to make the code better - using an IR lets us insert different front ends and middle ends + Back End - optimization - register allocation - code generation + Intermediate Representation - allows compiler to work with a simpler representation of the program ======================== Data Structures ======================== + front end - AST for parsing - this is where we turn our Context free grammar into a data + middle - cfg encodes explicit information about the flow of control in the representation. something that is not done by the ast which just encodes the structure of the program syntax + backend - set hold values which we have proved to be true at that point in the graph. - Union-Find used in variety of algorithms, dominance for example examples: constant propagation - variables that hold constant values reaching definitions - variables that are defined then used in that block without being redefined ======================== Intermediate Representations ======================== why? well think about a human translating a sentence from english to german. it usually does not work to translate word for word. we must first turn it into something we understand the meaning and then produce the result. same thing with the compiler. also, it allows us to use multiple front ends for the same compiler + Graphical - AST is one representation. often not good enough for a compiler because they are rigid and and not good for representing things such as control structure. for those a more general graph is useful - Linear Sequence of instructions that execute in order of appearance. Ususally psueudo code for some abstract machine - Hybrid Combines Graphical and Linear. Usually use graphs to represent control flow and Linear to represent straight line code called basic blocks ======================== CFG Cont. ======================== + how many extended basic blocks are here? + more important how would we compute it? + when might these be useful - allows us to do optimization over larger sections of the graph - the set of facts across a basic block is larger than in a single block - more information == better optimization ======================== SSA ======================== + invented in 1992 by researchers at IBM as a way to encode data flow and control flow into the namespace + changed the way compilers are made (from what I can tell) + used in latest version of GCC (> 3.4.0) + what is the time bound - bound depends on N, E, D, U (nodes in cfg, edges, defs, uses). Max value will dominate the time bound, call it R, then convert to SSA is O(R^3). - linear in practice ======================== Why Use SSA? ======================== + variety of optimizations can be improved or made easier - constant propagation - dead code elimination - defined but not used - register allocation - building the interference graph, live ranges - value numbering + why do we have all these different representations? - good for different things - AST closely related to context free grammars - CFG allows for nice clean graph algorithms - SSA makes optimizations easier - what about the cost to transform between them? ======================== Dominators ======================== + important concept used throughout optimization + used in SSA construction + time bound algorithm uses Union Find and is O(N log N + E) can be made nearly linear, but the complexity is not worth it. very simple algorithm for computing dominators can compute with iterative dataflow analysis. DOM(n_0) = \emptyset DOM(n) = \cap_{p \in pred(n)} DOM(p) ======================== Can we do better? ======================== + an expression is available at a node n if it has been previously defined on every path leading to n and none of its operands have been redefined