This is the 28th day of my participation in the August Text Challenge.

1. LLVM

LLVM is a framework for building compilers. It is written in C++ and provides compile-time, link-time, run-time and idle-time optimization for programs written in any programming language. It stays open to developers and compatible with existing scripts.

1.1 Traditional compiler design

The traditional compiler is a three-phase pipeline: Source Code → Frontend → Optimizer → Backend (Code Generator) → Machine Code, as shown in the following figure.

1.2 iOS compiler architecture

Objective-C, C and C++ use Clang as the compiler front end, Swift uses the Swift front end, and both use LLVM as the back end, as shown in the figure below.

Module Description:

  • Frontend: the compiler front end parses the source code (the compile phase). It performs lexical analysis, syntax analysis and semantic analysis, checks the source code for errors, and then builds an abstract syntax tree (AST). The LLVM front end also generates intermediate code (intermediate representation, IR). LLVM can be understood as a compiler plus an optimizer: it receives IR and still outputs IR, which the back end then translates into the target instruction set.
  • Optimizer: responsible for various optimizations that improve the code's run-time performance, such as eliminating redundant calculations.
  • Backend (Code Generator): maps the code to the target instruction set, generates machine code, and optimizes that machine code.

1.3 Design of LLVM

The most important aspect of LLVM's design is its common code representation (IR), which represents code inside the compiler. Because of it, LLVM can have an independently written front end for any programming language and an independently written back end for any hardware architecture, as shown below.

Put simply, LLVM separates the front end from the back end: changing either one does not affect the other.
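The benefit of this separation can be expressed as simple arithmetic. Without a shared IR, supporting M languages on N architectures takes M × N separate compilers. With a shared IR, it takes only M front ends plus N back ends. The sketch below does nothing more than that count, and the function names are illustrative rather than part of any LLVM API.

```c
/* Toy arithmetic only: how many pieces must be written to support
 * `languages` source languages on `targets` CPU architectures. */

/* Without a shared IR, every (language, target) pair is its own compiler. */
unsigned long compilers_without_shared_ir(unsigned long languages,
                                          unsigned long targets) {
    return languages * targets;
}

/* With a shared IR, each language needs one front end that emits IR and
 * each target needs one back end that consumes IR. */
unsigned long components_with_shared_ir(unsigned long languages,
                                        unsigned long targets) {
    return languages + targets;
}
```

For 5 languages and 4 architectures, that is 20 compilers versus 9 components. The gap widens as either number grows.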

1.4 Introduction to Clang

Clang is a subproject of the LLVM project: a lightweight compiler for C, C++ and Objective-C built on the LLVM architecture. It was originally created to replace GCC and to compile faster. Studying Clang has many benefits.

2. LLVM compilation process

  • Create a new file, main.m, and write the following code:
#include <stdio.h>

int test(int a, int b) {
    return a + b + 3;
}

int main(int argc, const char * argv[]) {
    int a = test(1, 2);
    printf("%d", a);
    return 0;
}
  • The phases of compiling the source code can be printed with the following command:
// ************ command ************
clang -ccc-print-phases main.m

// ************ compilation phases ************
// 0 - Input: locate the source file
+- 0: input, "main.m", objective-c
// 1 - Preprocessing: macro replacement, header file import
+- 1: preprocessor, {0}, objective-c-cpp-output
// 2 - Compilation: lexical analysis, syntax analysis, check that the syntax is correct, and finally generate IR
+- 2: compiler, {1}, ir
// 3 - Back end: LLVM optimizes one pass at a time, each pass doing some work, and generates assembly code
+- 3: backend, {2}, assembler
// 4 - Assembly: generate the object file from the assembly code
+- 4: assembler, {3}, object
// 5 - Link: link the required dynamic and static libraries to produce the executable (image file)
+- 5: linker, {4}, image
// 6 - Bind: generate the executable for each architecture
6: bind-arch, "x86_64", {5}, image

Each phase is explained below. Phase 0 is simply the input, i.e. locating the source file, so I won't say much about it here.

2.1 Pre-compile stage

This stage mainly handles macro replacement and header file import. Run the following command to see the imported headers and the replaced macros:

clang -E main.m > main2.m

Note that:

  • typedef, which gives a data type an alias, is not replaced during the preprocessing phase.
  • #define is replaced during the preprocessing phase, so it is often used for code obfuscation to improve app security. The approach is to give the app's core classes, core methods, etc. aliases with system-like names. Those names are then substituted during preprocessing, which obfuscates the code.
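A small file like the one below shows the difference. Running clang -E on it leaves the typedef line unchanged but rewrites every use of the macro name. The names checkPurchaseReceipt and sys_cfg_load_a1 are hypothetical and were chosen only to illustrate the obfuscation idea.

```c
/* typedef creates a type alias. The compiler resolves it after
 * preprocessing, so `clang -E` output still contains this line. */
typedef int Score;

/* #define is plain text substitution done by the preprocessor.
 * Hypothetical obfuscation mapping: the readable name is rewritten to a
 * bland, system-looking name before compilation starts. */
#define checkPurchaseReceipt sys_cfg_load_a1

/* Written with the readable name. After preprocessing, this function is
 * actually defined as sys_cfg_load_a1. */
Score checkPurchaseReceipt(Score receipt) {
    return receipt > 0 ? 1 : 0;
}
```

After preprocessing, the function is named sys_cfg_load_a1, and that is the name the compiler sees. The source code keeps the readable name.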

2.2 Compilation Phase

The compilation stage performs lexical analysis, syntax analysis and related checks, and then generates the intermediate code (IR).

2.2.1 Lexical analysis

After preprocessing comes lexical analysis, which slices the code into tokens such as brackets, equals signs and strings.

  • You can run the following command to view the information
clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m
  • If a header file cannot be found, specify the SDK path:
clang -isysroot <your SDK path> -fmodules -fsyntax-only -Xclang -dump-tokens main.m
clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.1.sdk/ -fmodules -fsyntax-only -Xclang -dump-tokens main.m
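Here is a toy lexer that makes the idea concrete. It is a simplified sketch, not how clang's lexer is implemented. It handles only identifiers, the keywords int and return, decimal integers and a few punctuation characters. The token kinds are named after some of the kinds that -dump-tokens prints (identifier, numeric_constant, l_paren, plus, semi, ...).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token kinds. The printed names copy a few of the kind names in
 * clang's -dump-tokens output; everything else here is a toy. */
typedef enum {
    TOK_INT, TOK_RETURN, TOK_IDENTIFIER, TOK_NUMERIC_CONSTANT,
    TOK_L_PAREN, TOK_R_PAREN, TOK_L_BRACE, TOK_R_BRACE,
    TOK_COMMA, TOK_PLUS, TOK_EQUAL, TOK_SEMI, TOK_UNKNOWN
} TokenKind;

static const char *const kTokenNames[] = {
    "int", "return", "identifier", "numeric_constant",
    "l_paren", "r_paren", "l_brace", "r_brace",
    "comma", "plus", "equal", "semi", "unknown"
};

const char *token_name(TokenKind kind) { return kTokenNames[kind]; }

/* Classifies a single punctuation character. */
static TokenKind punct_kind(char c) {
    switch (c) {
    case '(': return TOK_L_PAREN;
    case ')': return TOK_R_PAREN;
    case '{': return TOK_L_BRACE;
    case '}': return TOK_R_BRACE;
    case ',': return TOK_COMMA;
    case '+': return TOK_PLUS;
    case '=': return TOK_EQUAL;
    case ';': return TOK_SEMI;
    default:  return TOK_UNKNOWN;
    }
}

/* Slices src into tokens, skipping whitespace. Stores up to `max` kinds
 * in `out` and returns the total number of tokens seen. */
size_t lex_tokens(const char *src, TokenKind *out, size_t max) {
    size_t count = 0;
    const char *p = src;
    while (*p != '\0') {
        TokenKind kind;
        if (isspace((unsigned char)*p)) {
            p++;
            continue;
        }
        if (isalpha((unsigned char)*p) || *p == '_') {
            /* Identifier or keyword: letters, digits, underscores. */
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            size_t len = (size_t)(p - start);
            if (len == 3 && strncmp(start, "int", 3) == 0) kind = TOK_INT;
            else if (len == 6 && strncmp(start, "return", 6) == 0) kind = TOK_RETURN;
            else kind = TOK_IDENTIFIER;
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p)) p++;
            kind = TOK_NUMERIC_CONSTANT;
        } else {
            kind = punct_kind(*p);
            p++;
        }
        if (count < max) out[count] = kind;
        count++;
    }
    return count;
}
```

For "return a + b + 3;" from the test function, the lexer produces seven tokens: return, identifier, plus, identifier, plus, numeric_constant, semi.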

2.2.2 Syntax analysis

Lexical analysis is followed by syntax analysis, whose task is to verify that the syntax is correct. Working from the tokens produced by lexical analysis, it combines token sequences into grammatical phrases such as programs, statements and expressions. It then assembles all the nodes into an abstract syntax tree (AST). In other words, the parser determines whether the program is structurally correct.

  • You can run the following command to view the result of parsing
clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
  • If an imported header file cannot be found, specify the SDK path:
clang -isysroot <your SDK path> -fmodules -fsyntax-only -Xclang -ast-dump main.m
clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.1.sdk/ -fmodules -fsyntax-only -Xclang -ast-dump main.m

The main node types in the output mean:

  • FunctionDecl: a function declaration
  • ParmVarDecl: a function parameter
  • CallExpr: a function call
  • BinaryOperator: a binary operator (such as +)
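The tree for the body of test can be modeled with a few node types named after the ones above. This is a simplified sketch. Clang's real AST has many more node kinds, and its dump also wraps each use of a and b in ImplicitCastExpr nodes, which are omitted here. The sketch shows that + is left-associative, so a + b + 3 parses as (a + b) + 3.

```c
#include <stddef.h>

/* Node kinds named after clang AST nodes (DeclRefExpr, IntegerLiteral,
 * BinaryOperator). Real -ast-dump output also contains ImplicitCastExpr
 * wrappers, which this toy leaves out. */
typedef enum { DECL_REF_EXPR, INTEGER_LITERAL, BINARY_OPERATOR } NodeKind;

typedef struct Node {
    NodeKind kind;
    char op;                 /* operator for BINARY_OPERATOR, e.g. '+' */
    int value;               /* literal value, or parameter index for DECL_REF_EXPR */
    const struct Node *lhs;  /* children of a BINARY_OPERATOR */
    const struct Node *rhs;
} Node;

/* Fills pool with the AST of `a + b + 3` and returns the root. Because '+'
 * is left-associative, the outer BinaryOperator's left child is `a + b`. */
const Node *build_test_body(Node pool[5]) {
    pool[0] = (Node){DECL_REF_EXPR, 0, 0, NULL, NULL};             /* a */
    pool[1] = (Node){DECL_REF_EXPR, 0, 1, NULL, NULL};             /* b */
    pool[2] = (Node){BINARY_OPERATOR, '+', 0, &pool[0], &pool[1]}; /* a + b */
    pool[3] = (Node){INTEGER_LITERAL, 0, 3, NULL, NULL};           /* 3 */
    pool[4] = (Node){BINARY_OPERATOR, '+', 0, &pool[2], &pool[3]}; /* (a + b) + 3 */
    return &pool[4];
}

/* Post-order walk: both operands are evaluated before the operator that
 * uses them. `params` holds the values of the function's parameters. */
int eval_node(const Node *n, const int *params) {
    switch (n->kind) {
    case DECL_REF_EXPR:
        return params[n->value];
    case INTEGER_LITERAL:
        return n->value;
    case BINARY_OPERATOR: {
        int l = eval_node(n->lhs, params);
        int r = eval_node(n->rhs, params);
        return n->op == '+' ? l + r : 0; /* only '+' is needed for this example */
    }
    }
    return 0;
}
```

Evaluating the root with a = 1 and b = 2 returns 6, the same value the program prints for test(1, 2).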

2.2.3 Generate intermediate code IR

Once the steps above are complete, the intermediate code (IR) is generated. Code generation (CodeGen) walks the syntax tree from top to bottom and translates it into LLVM IR.

  • Run the following command to generate a .ll text file and view the IR code. For Objective-C code, this step also performs runtime bridging: property synthesis, ARC processing, and so on.
clang -S -fobjc-arc -emit-llvm main.m

// Basic IR syntax
// @       global identifier
// %       local identifier
// alloca  allocate stack space
// align   memory alignment
// i32     32-bit (4-byte) integer
// store   write to memory
// load    read from memory
// call    call a function
// ret     return

Below is the generated intermediate code (the .ll file).

The figure also annotates how the parameters of the test function are interpreted.
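Unoptimized (-O0) IR for test typically stores each parameter in its own stack slot and loads it back before the additions. The C function below transliterates that IR line by line, with the corresponding IR instruction in a comment on each line. The IR in the comments is what clang typically emits; value numbers differ between versions, and newer versions print ptr where older ones print i32*.

```c
/* C transliteration of typical -O0 IR for test(). Each comment is the
 * IR instruction the line stands for (illustrative, version-dependent). */
int test_as_o0_ir(int arg0, int arg1) { /* define i32 @test(i32 %0, i32 %1) */
    int slot_a;          /* %3 = alloca i32, align 4 : 4-byte stack slot for a */
    int slot_b;          /* %4 = alloca i32, align 4 : 4-byte stack slot for b */
    slot_a = arg0;       /* store i32 %0, i32* %3, align 4 */
    slot_b = arg1;       /* store i32 %1, i32* %4, align 4 */
    int v5 = slot_a;     /* %5 = load i32, i32* %3, align 4 */
    int v6 = slot_b;     /* %6 = load i32, i32* %4, align 4 */
    int v7 = v5 + v6;    /* %7 = add nsw i32 %5, %6  (nsw: no signed wrap) */
    int v8 = v7 + 3;     /* %8 = add nsw i32 %7, 3 */
    return v8;           /* ret i32 %8 */
}
```

At -O0, the stores and loads do no useful work. Removing that kind of redundancy is what the optimization levels are for.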

  • IR can of course be optimized as well. In Xcode this is usually configured under target → Build Settings → Optimization Level. LLVM's optimization levels are -O0, -O1, -O2, -O3 and -Os (the first character is a capital letter O). The following command generates optimized IR:
clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll

This is the optimized intermediate code
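One transformation that levels such as -Os commonly apply here is to inline test(1, 2) into main and then perform constant folding, which computes 1 + 2 + 3 at compile time. Whether that happens depends on the clang version and flags, so compare against the .ll file you actually get. The sketch below shows only the folding half and assumes the call has already been inlined. It works on a made-up three-address IR; the operand and instruction types are hypothetical, not LLVM's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR: every instruction is `result = lhs + rhs`. An
 * operand is a constant, a function argument (unknown at compile time),
 * or the result of an earlier instruction. */
typedef enum { OPERAND_CONST, OPERAND_ARG, OPERAND_INST } OperandKind;

typedef struct {
    OperandKind kind;
    int value; /* the constant, the argument index, or the instruction index */
} Operand;

typedef struct {
    Operand lhs, rhs;
    bool folded; /* set once the result is known at compile time */
    int result;  /* valid only when folded is true */
} AddInst;

/* If an operand refers to an already-folded instruction, replace it with
 * that instruction's constant result (constant propagation). */
static void propagate(Operand *op, const AddInst *insts) {
    if (op->kind == OPERAND_INST && insts[op->value].folded) {
        op->value = insts[op->value].result;
        op->kind = OPERAND_CONST;
    }
}

/* Constant folding plus propagation in one forward sweep. Instructions use
 * only results of earlier instructions (as in SSA form), so one sweep is
 * enough. Returns the number of instructions folded. */
size_t fold_constants(AddInst *insts, size_t n) {
    size_t folded = 0;
    for (size_t i = 0; i < n; i++) {
        propagate(&insts[i].lhs, insts);
        propagate(&insts[i].rhs, insts);
        if (insts[i].lhs.kind == OPERAND_CONST && insts[i].rhs.kind == OPERAND_CONST) {
            insts[i].result = insts[i].lhs.value + insts[i].rhs.value;
            insts[i].folded = true;
            folded++;
        }
    }
    return folded;
}
```

For the inlined body, t0 = 1 + 2 followed by t1 = t0 + 3, both instructions fold and t1 becomes the constant 6. For test itself, a and b are arguments, so nothing can be folded.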

  • Since Xcode 7, when bitcode is enabled, Apple optimizes further and generates .bc intermediate code. The following command generates .bc code from the optimized IR:
clang -emit-llvm -c main.ll -o main.bc
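Unlike the textual .ll file, .bc is a binary format. A raw bitcode file begins with the magic bytes 'B' 'C' 0xC0 0xDE, so checking the first four bytes is enough to tell the two apart. The helper names below are hypothetical. Bitcode wrapped in an extra header, which starts with a different magic number, is not handled.

```c
#include <stddef.h>
#include <stdio.h>

/* Raw LLVM bitcode starts with 'B' 'C' 0xC0 0xDE. Textual IR (.ll) does not. */
int is_llvm_bitcode(const unsigned char *buf, size_t len) {
    return len >= 4 && buf[0] == 'B' && buf[1] == 'C' &&
           buf[2] == 0xC0 && buf[3] == 0xDE;
}

/* Reads the first four bytes of a file such as main.bc.
 * Returns 1 for raw bitcode, 0 for anything else (including a .ll file),
 * and -1 if the file cannot be opened. */
int file_is_llvm_bitcode(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;
    unsigned char magic[4];
    size_t got = fread(magic, 1, sizeof magic, f);
    fclose(f);
    return is_llvm_bitcode(magic, got);
}
```

For example, file_is_llvm_bitcode("main.bc") should return 1 for the file produced above, and file_is_llvm_bitcode("main.ll") should return 0.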

2.3 Back end

The LLVM back end optimizes the code one pass at a time, with each pass doing one piece of work, and finally generates assembly code.
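The pass-by-pass idea can be sketched as a list of function pointers that run in order over a shared module. This is a hypothetical toy, not LLVM's PassManager API. The Module type and both passes are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy module: a flat list of opcodes. */
typedef enum { OP_NOP, OP_ADD, OP_MUL_BY_ONE, OP_RET } Opcode;

typedef struct {
    Opcode ops[32];
    size_t count;
} Module;

/* A pass rewrites the module in place and reports whether it changed anything. */
typedef bool (*Pass)(Module *m);

/* x * 1 == x, so the multiply does nothing: turn it into a no-op. */
static bool simplify_mul_by_one(Module *m) {
    bool changed = false;
    for (size_t i = 0; i < m->count; i++) {
        if (m->ops[i] == OP_MUL_BY_ONE) {
            m->ops[i] = OP_NOP;
            changed = true;
        }
    }
    return changed;
}

/* Compacts the list, dropping every no-op. */
static bool remove_nops(Module *m) {
    size_t kept = 0;
    for (size_t i = 0; i < m->count; i++) {
        if (m->ops[i] != OP_NOP) m->ops[kept++] = m->ops[i];
    }
    bool changed = kept != m->count;
    m->count = kept;
    return changed;
}

/* Order matters: the cleanup pass removes the no-ops the first pass creates. */
const Pass kPipeline[] = { simplify_mul_by_one, remove_nops };
const size_t kPipelineLength = sizeof kPipeline / sizeof kPipeline[0];

/* Runs each pass once, in order; every pass sees the previous pass's output.
 * Returns how many passes changed the module. */
size_t run_pipeline(Module *m, const Pass *passes, size_t n) {
    size_t changed = 0;
    for (size_t i = 0; i < n; i++) {
        if (passes[i](m)) changed++;
    }
    return changed;
}
```

Running the pipeline on add, mul-by-one, add, ret leaves add, add, ret. Running it a second time changes nothing, because each pass only does its one job.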

2.3.1 Generating assembly code

  • Generate assembly code from the final .bc or .ll file:
clang -S -fobjc-arc main.bc -o main.s
clang -S -fobjc-arc main.ll -o main.s
  • Assembly generation can also be optimized:
clang -Os -S -fobjc-arc main.m -o main.s

2.4 Generating an object file

To generate the object file, the assembler takes the assembly code as input, converts it into machine code, and outputs the object file.

clang -fmodules -c main.s -o main.o

You can run the nm command to view the symbols in main.o:

$ xcrun nm -nm main.o

The symbols in main.o are listed below, in object-file format:

  • _printf is an undefined, external symbol
  • undefined means that the symbol _printf cannot be found in the current file for now
  • external means that the symbol is accessible from outside the file
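The same classification can be read mechanically from each line of nm -m output. The helper below is a hypothetical sketch that looks only for the "(undefined)" and " external " markers. Addresses and section names in real output vary.

```c
#include <stdbool.h>
#include <string.h>

typedef struct {
    bool undefined; /* nm printed "(undefined)" instead of a section */
    bool external;  /* the symbol is visible outside this file */
} SymbolInfo;

/* Classifies one line of `nm -m` output. Defined symbols show their
 * section, e.g. "(__TEXT,__text)"; symbols the file only references show
 * "(undefined)". Local symbols are marked "non-external", so the search
 * for " external " includes the leading space to avoid matching them. */
SymbolInfo classify_nm_line(const char *line) {
    SymbolInfo info;
    info.undefined = strstr(line, "(undefined)") != NULL;
    info.external = strstr(line, " external ") != NULL;
    return info;
}
```

Some sample lines show how it behaves:
- "(undefined) external _printf" is classified as undefined and external.
- "(__TEXT,__text) external _test" is classified as defined and external.
- A local symbol marked "non-external" is classified as not external.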

2.5 Linking

Linking combines the object file with the required dynamic and static libraries to produce the executable file.

  • Static libraries are merged into the executable
  • Dynamic libraries remain independent of it

The linker links the compiled .o file with .dylib and .a files to create a Mach-O file.

clang main.o -o main

View the symbols after linking:

$ xcrun nm -nm main

The result is shown below. Symbols still marked undefined are bound dynamically at run time.

Checking the type of main with a command (for example, file main) shows that it is a Mach-O executable.

2.6 Binding

Binding generates the corresponding Mach-O executable for each target architecture.
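When several architectures are requested, for example with more than one -arch flag, the per-architecture executables are typically combined (via lipo) into a single universal ("fat") Mach-O file.

A fat file starts with a big-endian header containing:
- the magic number 0xCAFEBABE,
- a count of slices,
- one 20-byte fat_arch record per slice, whose first field is the CPU type.

A thin 64-bit Mach-O instead starts with 0xFEEDFACF, stored little-endian on x86_64 and arm64.

The sketch below reads just those fields from a byte buffer. The helper names are hypothetical, not a system API. On a real binary, lipo -info or file reports the same information.

```c
#include <stddef.h>
#include <stdint.h>

#define THIN_MAGIC_64    0xFEEDFACFu /* MH_MAGIC_64 */
#define FAT_MAGIC_BE     0xCAFEBABEu /* FAT_MAGIC (Java class files share it) */
#define CPUTYPE_X86_64_V 0x01000007u /* CPU_TYPE_X86 | CPU_ARCH_ABI64 */
#define CPUTYPE_ARM64_V  0x0100000Cu /* CPU_TYPE_ARM | CPU_ARCH_ABI64 */

enum { MACHO_UNKNOWN = 0, MACHO_THIN_64 = 1, MACHO_FAT = 2 };

/* Fat headers are always stored big-endian. */
static uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Thin x86_64 and arm64 Mach-O headers are stored little-endian. */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Classifies a file from its first bytes. 64-bit fat files (FAT_MAGIC_64)
 * and 32-bit thin files are not handled in this sketch. */
int macho_kind(const unsigned char *buf, size_t len) {
    if (len < 4) return MACHO_UNKNOWN;
    if (read_le32(buf) == THIN_MAGIC_64) return MACHO_THIN_64;
    if (read_be32(buf) == FAT_MAGIC_BE) return MACHO_FAT;
    return MACHO_UNKNOWN;
}

/* For a fat file, stores up to `max` CPU types (the first field of each
 * 20-byte fat_arch record following the 8-byte header) and returns the
 * number of slices the header declares. */
uint32_t fat_cputypes(const unsigned char *buf, size_t len,
                      uint32_t *out, uint32_t max) {
    if (len < 8 || macho_kind(buf, len) != MACHO_FAT) return 0;
    uint32_t nfat_arch = read_be32(buf + 4);
    for (uint32_t i = 0; i < nfat_arch && i < max; i++) {
        size_t offset = 8 + (size_t)i * 20;
        if (offset + 4 > len) break; /* truncated buffer */
        out[i] = read_be32(buf + offset);
    }
    return nfat_arch;
}
```

Applied to the first bytes of a universal binary built for x86_64 and arm64, fat_cputypes would report two slices with CPU types 0x01000007 and 0x0100000C.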

Conclusion

To sum up, the compilation process of LLVM is shown in the figure below