The concept of LLVM

Our development tools are all related to LLVM in one way or another, so what is LLVM, and what does it do? First we need to understand two concepts: interpreted languages and compiled languages. Interpreted language: the code is executed as soon as it is read, as in Python. Compiled language: the code must first be translated into a binary that the CPU can read before it can be executed. LLVM is a framework for building compilers, written in C++. Its main purpose is to optimize programs written in any language at compile time, link time, run time, and idle time, while staying open to developers and compatible with existing scripts.

Compiler design

The image above shows the traditional compiler design, in which the front end and the back end are separated.

Compiler Frontend: Parses the source code. It performs lexical analysis, syntax analysis, semantic analysis, and source code error checking, and builds the abstract syntax tree (AST). The LLVM front end also generates intermediate representation (IR) code.

Optimizer: Responsible for the various optimizations that improve the run time of the code, such as eliminating redundant calculations.

Backend/CodeGenerator: Maps the optimized code onto the target instruction set and converts it into binary machine code.

The iOS compiler architecture

Objective-C, C, and C++ use Clang as the compiler front end, Swift uses the Swift compiler's own front end, and in both cases the back end is LLVM.

The advantages of LLVM

LLVM is designed around a common code representation, the IR, which is the form used to represent code inside the compiler. Because of this, a front end can be written independently for any language, and a back end can be written independently for any hardware architecture.
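The benefit of a shared IR can be sketched with a toy model. The Python below is only an illustration, not how LLVM is implemented: the two "front ends", the two "back ends", and the tuple-based IR are all hypothetical names invented for this example. The point is that any front end can be paired with any back end because they only meet at the IR.

```python
# Toy model: every front end lowers to the same IR, every back end consumes it.
# The "IR" here is a list of (opcode, operand) tuples invented for this sketch.

def frontend_infix(src):
    """Hypothetical front end for source such as '1 + 2'."""
    left, op, right = src.split()
    if op != "+":
        raise ValueError("this sketch only supports '+'")
    return [("push", int(left)), ("push", int(right)), ("add", None)]

def frontend_prefix(src):
    """Hypothetical front end for source such as '(+ 1 2)'."""
    op, left, right = src.strip("()").split()
    if op != "+":
        raise ValueError("this sketch only supports '+'")
    return [("push", int(left)), ("push", int(right)), ("add", None)]

def backend_stack_asm(ir):
    """Hypothetical back end that renders the IR as stack-machine text."""
    return [op.upper() if arg is None else f"{op.upper()} {arg}" for op, arg in ir]

def backend_evaluate(ir):
    """Hypothetical back end that executes the IR directly."""
    stack = []
    for op, arg in ir:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
    return stack.pop()

FRONTENDS = {"infix": frontend_infix, "prefix": frontend_prefix}
BACKENDS = {"asm": backend_stack_asm, "eval": backend_evaluate}

def compile_with(frontend, backend, src):
    """Pair any front end with any back end; they only share the IR."""
    return BACKENDS[backend](FRONTENDS[frontend](src))

if __name__ == "__main__":
    print(compile_with("infix", "asm", "1 + 2"))
    print(compile_with("prefix", "eval", "(+ 1 2)"))
```

Adding a third language in this model means writing one more front end, and every existing back end works with it unchanged.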

Clang

Clang is a subproject of the LLVM project. It is a lightweight compiler for C, C++, and Objective-C, and it serves as the compiler front end in the LLVM architecture.

In the terminal, run open /usr/bin and you can see the clang compiler there.

Compilation process analysis

  • Start by creating a .m file (main.m here) and cd to the directory that contains it.

  • In the terminal, run clang -ccc-print-phases main.m. The terminal prints the following phases:

0: Input: reads the file and locates the source file.

1: Preprocessing: handles macro replacement and header file inclusion.

2: Compilation: lexical analysis, syntax analysis, and finally IR generation.

3: Back end: LLVM optimizes the IR pass by pass and finally generates assembly code.

4: Assembly: generates the object file.

5: Linking: links the required dynamic and static libraries and generates the executable file.

6: Architecture binding: generates the executable file for the specific architecture.

Preprocessing

In the terminal, run clang -E main.m. In the output we can see that the header files have been included and the macros have been replaced.
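To make "the macros have been replaced" concrete, here is a minimal Python sketch of object-like macro expansion. It is an assumption-laden toy, not Clang's preprocessor: it only understands `#define NAME value`, substitutes whole words textually (even inside string literals), and ignores `#include`, function-like macros, and conditionals. The function name `expand_object_macros` and the sample source are made up for illustration.

```python
import re

def expand_object_macros(source):
    """Expand `#define NAME value` macros by whole-word textual substitution."""
    macros = {}
    output = []
    for line in source.splitlines():
        match = re.match(r"\s*#define\s+(\w+)\s+(.+)", line)
        if match:
            macros[match.group(1)] = match.group(2).strip()
            continue  # the directive itself does not survive preprocessing
        for name, value in macros.items():
            # A function is used as the replacement so the value is inserted literally.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, v=value: v, line)
        output.append(line)
    return "\n".join(output)

# Sample input (hypothetical): one macro and one typedef.
SOURCE = """#define COUNT 10
typedef int MyInt;
MyInt total = COUNT + 1;"""

if __name__ == "__main__":
    print(expand_object_macros(SOURCE))
```

In this sketch `COUNT` disappears from the output while the `typedef` line passes through untouched, because a typedef is handled later by the compiler proper rather than by the preprocessor.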

Compilation

What is lexical analysis? Once preprocessing is finished, the code is split into tokens one by one, such as parentheses, equals signs, and strings. This process is called lexical analysis.

What is syntax analysis? Syntax analysis runs after lexical analysis and mainly verifies that the syntax is correct. Building on the lexical analysis, it combines the token sequence into grammatical phrases such as programs and expressions, and then assembles all the nodes into an abstract syntax tree (AST). Its main job is to check whether the program is structurally correct.
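The two steps can be sketched for a tiny expression language. The Python below is a simplified illustration and not Clang's lexer or parser: the token kinds, the nested-tuple AST shape, and the grammar (numbers, identifiers, `+`, `*`, parentheses) are all assumptions chosen for the example.

```python
import re

# One token per match: a number, an identifier, or any single other character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def tokenize(source):
    """Lexical analysis: split text into (kind, text) tokens."""
    tokens = []
    for number, name, punct in TOKEN_RE.findall(source):
        if number:
            tokens.append(("number", number))
        elif name:
            tokens.append(("identifier", name))
        elif punct.strip():
            tokens.append(("punct", punct))
    return tokens

def parse(tokens):
    """Syntax analysis: build a nested-tuple AST, or raise SyntaxError."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def primary():
        kind, text = advance()
        if kind == "number":
            return ("num", int(text))
        if kind == "identifier":
            return ("var", text)
        if text == "(":
            node = expression()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():
        node = primary()
        while peek() == ("punct", "*"):
            advance()
            node = ("*", node, primary())
        return node

    def expression():
        node = term()
        while peek() == ("punct", "+"):
            advance()
            node = ("+", node, term())
        return node

    tree = expression()
    if peek()[0] != "eof":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree

if __name__ == "__main__":
    tokens = tokenize("a + 2 * (b + 3)")
    print(tokens)
    print(parse(tokens))
```

A structurally wrong input such as `a +` passes lexical analysis without complaint and is only rejected by `parse`, which mirrors the division of labor described above.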

In the terminal, run clang -fmodules -fsyntax-only -Xclang -ast-dump main.m

The following figure shows the output

Generating IR (Intermediate Representation)

The code generator walks the syntax tree from top to bottom and translates it into IR code.
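As a rough picture of that top-down walk, the Python sketch below lowers a small expression tree into a flat list of instructions. Everything here is an assumption for illustration: the nested-tuple AST shape, the function name `lower_to_ir`, and the instruction text, which only loosely imitates LLVM IR and is not valid LLVM IR.

```python
import itertools

def lower_to_ir(ast):
    """Walk a nested-tuple AST top-down and emit three-address instructions."""
    counter = itertools.count(1)
    instructions = []

    def lower(node):
        kind = node[0]
        if kind == "num":
            return str(node[1])          # literals are used directly as operands
        if kind == "var":
            temp = f"%t{next(counter)}"  # each value gets a fresh local name
            instructions.append(f"{temp} = load {node[1]}")
            return temp
        left = lower(node[1])            # visit children before emitting the parent
        right = lower(node[2])
        temp = f"%t{next(counter)}"
        opcode = {"+": "add", "*": "mul"}[kind]
        instructions.append(f"{temp} = {opcode} {left}, {right}")
        return temp

    result = lower(ast)
    instructions.append(f"ret {result}")
    return instructions

if __name__ == "__main__":
    # Hypothetical AST for the expression a + 2 * 3
    tree = ("+", ("var", "a"), ("*", ("num", 2), ("num", 3)))
    for line in lower_to_ir(tree):
        print(line)
```

The tree structure disappears in the output: nesting is replaced by numbered temporaries, which is what makes the later optimization passes easier to write.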

In the terminal, run clang -S -fobjc-arc -emit-llvm main.m. A .ll file is generated in the current directory, and you can open it to look at the IR code. For Objective-C code, runtime bridging, property synthesis, and ARC handling are performed in this step.

  • Basic syntax of IR

@: global identifier

%: local identifier

alloca: allocates memory space

align: memory alignment

i32: 32 bits, 4 bytes

store: writes to memory

load: reads data

call: calls a function

ret: returns

The optimization of the IR

The optimization levels of LLVM are -O0, -O1, -O2, -O3, and -Os. In the terminal, run clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll
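What an optimization does can be illustrated with constant folding, one classic way of removing redundant calculations. The Python sketch below is a toy under stated assumptions: it works on a hypothetical nested-tuple expression tree rather than on real LLVM IR, and it does not model which passes any particular -O level enables.

```python
def fold_constants(node):
    """Replace sub-expressions whose operands are all literals with their value."""
    kind = node[0]
    if kind in ("num", "var"):
        return node
    left = fold_constants(node[1])
    right = fold_constants(node[2])
    if left[0] == "num" and right[0] == "num":
        # Both operands are known at compile time, so compute the result now.
        value = left[1] + right[1] if kind == "+" else left[1] * right[1]
        return ("num", value)
    return (kind, left, right)

if __name__ == "__main__":
    # Hypothetical tree for a + 2 * 3
    tree = ("+", ("var", "a"), ("*", ("num", 2), ("num", 3)))
    print(fold_constants(tree))
```

After folding, the multiplication no longer exists in the program, so it costs nothing at run time; the variable `a` is left alone because its value is not known until the program runs.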

Bitcode

When bitcode is enabled, Xcode further optimizes the code and generates .bc intermediate code. In the terminal, run clang -emit-llvm -c main.ll -o main.bc to generate the .bc code from the optimized IR code.

Generating assembly code

In the terminal, run clang -S -fobjc-arc main.bc -o main.s or clang -S -fobjc-arc main.ll -o main.s to generate assembly code from the .bc or .ll file. The generated assembly code can also be optimized further by passing an optimization level: clang -Os -S -fobjc-arc main.m -o main.s

Generate object file

To generate the object file, the assembler takes the assembly code as input, converts it into machine code, and outputs an object file.

In the terminal, run clang -fmodules -c main.s -o main.o to turn the assembly file into an object file. The symbols in main.o can be viewed with the nm command: xcrun nm -nm main.o

undefined: Indicates that the symbol (here _printf) cannot be found in the current file for now.

external: Indicates that the symbol can be accessed externally.

Generate an executable file

The linker combines the generated .o files with the .dylib and .a libraries they need and produces a Mach-O file. In the terminal, run clang main.o -o main
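The core bookkeeping of this step, matching undefined symbols against files that define them, can be sketched in a few lines of Python. This is a deliberately simplified model and not how the real linker works: the function name `link`, the dictionary layout, and the sample symbol tables are assumptions for illustration, and nothing here touches addresses, relocation, or the Mach-O format.

```python
def link(objects, libraries):
    """Resolve each undefined symbol against the definitions available.

    `objects` maps a file name to {"defined": set, "undefined": set}.
    `libraries` maps a file name to {"defined": set}.
    Returns {symbol: file that defines it}; raises if a symbol stays unresolved.
    """
    providers = {}
    for name, table in {**objects, **libraries}.items():
        for symbol in table["defined"]:
            providers.setdefault(symbol, name)

    resolved = {}
    for name, table in objects.items():
        for symbol in sorted(table["undefined"]):
            if symbol not in providers:
                raise LookupError(f"undefined symbol {symbol} referenced from {name}")
            resolved[symbol] = providers[symbol]
    return resolved

# Illustrative symbol tables (assumed, not read from real files).
OBJECTS = {"main.o": {"defined": {"_main"}, "undefined": {"_printf"}}}
LIBRARIES = {"libSystem.dylib": {"defined": {"_printf"}}}

if __name__ == "__main__":
    print(link(OBJECTS, LIBRARIES))
```

If no library provides `_printf`, the sketch raises an error, which corresponds to the "undefined symbols" failure a real link step reports.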

In the same way, you can use nm to view the symbols of the executable after linking.

In the terminal, run xcrun nm -nm main

Conclusion

  • Compilation process:

Input source -> preprocessing (macro expansion) -> lexical analysis (tokens) -> syntax analysis (AST) -> IR generation -> IR optimization -> assembly code generation -> object file generation -> linking of dynamic and static libraries to produce the executable file.
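The stage-by-stage commands from this article can be collected into one small Python driver. This is a sketch under assumptions: the helper names `stage_commands` and `run_pipeline` are made up, the explicit -o main.ll on the IR step is added for clarity, and whether every command succeeds depends on your toolchain and on what main.m contains.

```python
import shutil
import subprocess

def stage_commands(source="main.m"):
    """Return (label, argv) pairs for the stage-by-stage commands in this article."""
    stem = source.rsplit(".", 1)[0]
    return [
        ("preprocess", ["clang", "-E", source]),
        ("generate IR", ["clang", "-S", "-fobjc-arc", "-emit-llvm", source, "-o", f"{stem}.ll"]),
        ("generate bitcode", ["clang", "-emit-llvm", "-c", f"{stem}.ll", "-o", f"{stem}.bc"]),
        ("generate assembly", ["clang", "-S", "-fobjc-arc", f"{stem}.bc", "-o", f"{stem}.s"]),
        ("generate object file", ["clang", "-fmodules", "-c", f"{stem}.s", "-o", f"{stem}.o"]),
        ("link", ["clang", f"{stem}.o", "-o", stem]),
    ]

def run_pipeline(source="main.m"):
    """Run each stage in order, stopping at the first failing command."""
    if shutil.which("clang") is None:
        raise RuntimeError("clang not found on PATH")
    for label, argv in stage_commands(source):
        print(f"[{label}] {' '.join(argv)}")
        subprocess.run(argv, check=True)

if __name__ == "__main__":
    # Only print the commands; call run_pipeline() yourself to execute them.
    for label, argv in stage_commands():
        print(f"{label}: {' '.join(argv)}")
```

Each stage consumes the file produced by the previous one, so the list doubles as a map of the intermediate files: .ll, .bc, .s, .o, and finally the executable.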

  • typedef: Not handled by the preprocessor, because it is not a preprocessing directive.
  • A higher optimization level is not always better. At too high a level, code that is still needed may be optimized away.
  • A .o file cannot be executed on its own. It still has to be linked against external libraries, and linking only marks which library each external symbol comes from.
  • Advantages of LLVM: The front end and the back end are separated, which makes it highly extensible.
  • LLVM can affect compilation speed. It can be used both to optimize the executable and to improve compilation speed.
  • Optimization can be applied at different stages of the pipeline.