1. Traditional compiler design

1.1 Compiler front end

The task of the compiler front end is to parse the source code. It performs lexical analysis, syntax analysis, and semantic analysis, checks the code for errors, builds an abstract syntax tree (AST), and then generates the intermediate representation (IR).

1.2 The optimizer

The optimizer is responsible for various optimizations, such as reducing package size (stripping symbols) and improving run-time performance (eliminating redundant calculations, reducing pointer jumps, and so on).
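
To make "eliminating redundant calculations" concrete, here is a small hand-written sketch. It is not compiler output, and the function names are hypothetical. Both functions return the same value, but the second computes the shared subexpression only once, which is the kind of rewrite an optimizer may apply automatically.

```c
#include <assert.h>

/* Hypothetical example: (a + b) is computed twice. */
int scaled_product_naive(int a, int b) {
    int x = (a + b) * 2;
    int y = (a + b) * 3;
    return x * y;
}

/* Same result with the shared subexpression computed once. */
int scaled_product_optimized(int a, int b) {
    int sum = a + b;            /* redundant calculation removed */
    return (sum * 2) * (sum * 3);
}

/* Usage example: both versions agree. */
void check_scaled_product(void) {
    assert(scaled_product_naive(1, 2) == scaled_product_optimized(1, 2));
}
```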

1.3 Back end, code generator

The back end maps the code to the target instruction set, generates machine language, and performs machine-specific code optimizations.

Because traditional compilers (such as GCC) are designed as monolithic applications, their front end, optimizer, and back end are tightly coupled and hard to reuse separately, for example as libraries inside other tools. This limits how they can be used.

2. The design of LLVM

LLVM is a compiler infrastructure written in C++. It is designed to optimize programs written in any language at compile time, link time, run time, and idle time.

The most important part of LLVM's design is its common code representation (IR). To support a new language, you only need to write a front end that generates IR. To support a new hardware architecture, you only need to write a back end that consumes IR.
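
The benefit of a shared IR can be sketched with a toy model. Everything below is hypothetical and far simpler than LLVM: the `IrInst` type, the two "front ends", and the two "back ends" are made up for illustration and are not LLVM APIs. The point is only that both front ends produce the same IR, so every back end works with every front end.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy IR: one binary operation on two integer constants. */
typedef enum { IR_ADD, IR_MUL } IrOp;
typedef struct { IrOp op; int lhs, rhs; } IrInst;

/* Front end A (hypothetical infix language): "2 + 3". Returns 1 on success. */
int frontend_infix(const char *src, IrInst *out) {
    char op;
    assert(src != NULL && out != NULL);
    if (sscanf(src, "%d %c %d", &out->lhs, &op, &out->rhs) != 3) return 0;
    if (op == '+') out->op = IR_ADD;
    else if (op == '*') out->op = IR_MUL;
    else return 0;
    return 1;
}

/* Front end B (hypothetical prefix language): "add 2 3". Returns 1 on success. */
int frontend_prefix(const char *src, IrInst *out) {
    char name[8];
    assert(src != NULL && out != NULL);
    if (sscanf(src, "%7s %d %d", name, &out->lhs, &out->rhs) != 3) return 0;
    if (strcmp(name, "add") == 0) out->op = IR_ADD;
    else if (strcmp(name, "mul") == 0) out->op = IR_MUL;
    else return 0;
    return 1;
}

/* Back end X: an interpreter that executes the IR directly. */
int backend_eval(const IrInst *in) {
    return in->op == IR_ADD ? in->lhs + in->rhs : in->lhs * in->rhs;
}

/* Back end Y: emits made-up assembly-like text for the IR. */
void backend_text(const IrInst *in, char *buf, size_t size) {
    snprintf(buf, size, "%s %d, %d",
             in->op == IR_ADD ? "add" : "mul", in->lhs, in->rhs);
}
```

With two front ends and two back ends there are four language/target combinations, but only four components had to be written. Adding a third language means writing one more front end, not one per target.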

3. Clang

Clang is a subproject of the LLVM project that was initiated by Apple. It is a lightweight LLVM-based compiler, originally intended as a replacement for GCC that offers faster compilation. It is the compiler for C, C++, and Objective-C (OC).

3.1 Compilation Process

You can print the compilation phases for a source file with the following command:


clang -ccc-print-phases main.m


The print result is as follows:

The optimizer does not appear as a separate phase because optimization is distributed across the front end and the back end.

Next, let's look at each phase in detail:

0: input source file:

Locate the source file.

1: Preprocessing stage:

Execute the preprocessor directives, including macro replacement, header file import, and conditional compilation, and pass the resulting source code to the compiler. You can view the code after preprocessing with the command clang -E main.m.
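
As a small illustration of macro replacement and conditional compilation, consider the following sketch. The macro names and the function are hypothetical. If you save it to a file and run it through clang -E, the directives disappear and only the expanded branch remains.

```c
#include <assert.h>

#define SQUARE(x) ((x) * (x))   /* macro replacement */
#define USE_FAST_PATH 1         /* hypothetical configuration switch */

int compute(int n) {
#if USE_FAST_PATH               /* conditional compilation: this branch is kept */
    return SQUARE(n + 1);       /* expands to ((n + 1) * (n + 1)) */
#else
    return n;                   /* dropped by the preprocessor */
#endif
}

/* Usage example: (2 + 1) * (2 + 1) */
void check_compute(void) {
    assert(compute(2) == 9);
}
```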

2: Compile phase -> IR (.ll) or bitcode (.bc) files:

Perform lexical analysis, syntax analysis, and semantic analysis, check whether the syntax is correct, generate the AST, and then generate IR (or bitcode).

2.1 Lexical Analysis:

After preprocessing, lexical analysis splits the code into tokens (keywords, class names, method names, variable names, parentheses, operators, and so on) and records the line and column where each token is located.

With the following command, you can see the result of the lexical analysis:


clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m
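
The idea of splitting code into tokens with line and column information can be sketched in a few lines of C. This is a deliberately minimal tokenizer, not Clang's lexer: the `Token` type and `tokenize` function are hypothetical, keywords are simply reported as identifiers, and every punctuation character becomes its own token.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_IDENT, TOK_NUMBER, TOK_PUNCT } TokKind;

typedef struct {
    TokKind kind;
    const char *start;  /* points into the source text */
    size_t len;
    int line, col;      /* 1-based position of the first character */
} Token;

/* Splits src into at most max tokens and returns how many were written. */
size_t tokenize(const char *src, Token *out, size_t max) {
    size_t n = 0;
    int line = 1, col = 1;
    const char *p = src;
    assert(src != NULL && out != NULL);
    while (*p && n < max) {
        if (*p == '\n') { line++; col = 1; p++; continue; }
        if (isspace((unsigned char)*p)) { col++; p++; continue; }
        Token t = { TOK_PUNCT, p, 0, line, col };
        if (isalpha((unsigned char)*p) || *p == '_') {
            t.kind = TOK_IDENT;   /* keywords are not distinguished here */
            while (isalnum((unsigned char)*p) || *p == '_') p++;
        } else if (isdigit((unsigned char)*p)) {
            t.kind = TOK_NUMBER;
            while (isdigit((unsigned char)*p)) p++;
        } else {
            p++;                  /* single-character punctuation or operator */
        }
        t.len = (size_t)(p - t.start);
        col += (int)t.len;
        out[n++] = t;
    }
    return n;
}
```

For the input "int a = 1;" followed by a second line "return a;", this produces eight tokens. The number 1 is reported at line 1, column 9, and return at line 2, column 1.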


2.2 Syntax Analysis

Syntax analysis follows lexical analysis. Its task is to verify that the syntactic structure of the source program is correct and, building on the tokens, to combine them into syntactic phrases such as statements and expressions. All of the resulting nodes are then assembled into an abstract syntax tree (AST).

You can run the following command to view the result of syntax analysis:

clang -fmodules -fsyntax-only -Xclang -ast-dump main.m

// If an imported header file is not found, you can specify the SDK
clang -isysroot (SDK path) -fmodules -fsyntax-only -Xclang -ast-dump main.m
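
How tokens are combined into a tree can be sketched with a tiny recursive-descent parser. This is an illustration only and has nothing to do with Clang's actual parser: the `Node` type and the grammar (non-negative integers, + and *) are assumptions made for the example, and there is no error reporting.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* AST node: either a number literal or a binary operation. */
typedef struct Node {
    char op;                 /* '+', '*', or 0 for a number literal */
    int value;               /* used when op == 0 */
    struct Node *lhs, *rhs;  /* used when op != 0 */
} Node;

static Node *new_node(char op, int value, Node *lhs, Node *rhs) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->op = op; n->value = value; n->lhs = lhs; n->rhs = rhs;
    return n;
}

static void skip_spaces(const char **p) {
    while (isspace((unsigned char)**p)) (*p)++;
}

/* number := digit+   (a missing number silently parses as 0) */
static Node *parse_number(const char **p) {
    int v = 0;
    skip_spaces(p);
    while (isdigit((unsigned char)**p)) { v = v * 10 + (**p - '0'); (*p)++; }
    return new_node(0, v, NULL, NULL);
}

/* term := number ('*' number)* */
static Node *parse_term(const char **p) {
    Node *lhs = parse_number(p);
    for (skip_spaces(p); **p == '*'; skip_spaces(p)) {
        (*p)++;
        lhs = new_node('*', 0, lhs, parse_number(p));
    }
    return lhs;
}

/* expr := term ('+' term)* */
Node *parse_expr(const char **p) {
    Node *lhs = parse_term(p);
    for (skip_spaces(p); **p == '+'; skip_spaces(p)) {
        (*p)++;
        lhs = new_node('+', 0, lhs, parse_term(p));
    }
    return lhs;
}

/* Walks the tree to compute its value. */
int eval(const Node *n) {
    if (n->op == 0) return n->value;
    return n->op == '+' ? eval(n->lhs) + eval(n->rhs)
                        : eval(n->lhs) * eval(n->rhs);
}

void free_tree(Node *n) {
    if (n == NULL) return;
    free_tree(n->lhs);
    free_tree(n->rhs);
    free(n);
}
```

Parsing "1 + 2 * 3" yields a '+' node whose left child is the literal 1 and whose right child is a '*' node, so operator precedence is encoded in the shape of the tree rather than in the token order.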

2.3 Generating IR

The syntax tree is translated into LLVM IR step by step, from top to bottom. For OC code, runtime-related processing also happens at this step, such as property synthesis and ARC handling.

Generate a .ll text file to view the IR code:


clang -S -fobjc-arc -emit-llvm main.m


The IR code above means:

  1. The test function takes two parameters, %0 and %1
  2. Create two variables, %3 and %4
  3. Write the data in %0 to %3 and the data in %1 to %4
  4. Read the data from %3 and assign it to %5; read the data from %4 and assign it to %6
  5. Assign the result of %5 plus %6 to %7
  6. Assign the result of %7 plus 3 to %8
  7. Return %8
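
The steps above can be mirrored one to one in C. The source of the test function is not shown here, so the sketch assumes it is equivalent to int test(int a, int b) { return a + b + 3; } with int parameters. The variable names simply echo the IR register numbers.

```c
#include <assert.h>

/* Hand-written mirror of the unoptimized IR; the int types are an assumption. */
int test_unoptimized(int p0, int p1) {  /* parameters %0 and %1 */
    int v3, v4;                         /* two stack variables, %3 and %4 */
    v3 = p0;                            /* write %0 to %3 */
    v4 = p1;                            /* write %1 to %4 */
    int v5 = v3;                        /* read %3 into %5 */
    int v6 = v4;                        /* read %4 into %6 */
    int v7 = v5 + v6;                   /* %7 = %5 + %6 */
    int v8 = v7 + 3;                    /* %8 = %7 + 3 */
    return v8;                          /* return %8 */
}

/* Usage example: 1 + 2 + 3 */
void check_test_unoptimized(void) {
    assert(test_unoptimized(1, 2) == 6);
}
```

The stores followed immediately by loads of the same values are what make the unoptimized form look redundant.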

IR optimization

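In the IR code above, you can see that the IR generated by translating the syntax tree piece by piece looks a bit clumsy, and it can in fact be optimized.

The available optimization levels are: -O0, -O1, -O2, -O3, -Os, -Ofast, -Oz. The levels -O0 through -O3 apply progressively more optimization, -Ofast goes beyond -O3, and -Os and -Oz optimize for code size.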

This can be optimized using the command:


clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll


You can also set it in Xcode: target -> Build Settings -> Optimization Level

Let's take a look at the code optimized at the -Os level:

The IR code above means:

  1. Assign the result of %0 plus 3 to %3
  2. Assign the result of %3 plus %1 to %4
  3. Return %4

The optimized IR code is much more concise and clear.
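
The optimized steps can be written out in C in the same way. As before, the original source is not shown, so `test_source` is an assumption about what it looks like, and `versions_agree` is a hypothetical helper that checks that the optimized form computes the same result.

```c
#include <assert.h>

/* Assumed original source (not shown in this article). */
int test_source(int a, int b) {
    return a + b + 3;
}

/* Hand-written mirror of the optimized IR steps. */
int test_optimized(int p0, int p1) {
    int v3 = p0 + 3;    /* %3 = %0 + 3  */
    int v4 = v3 + p1;   /* %4 = %3 + %1 */
    return v4;          /* return %4    */
}

/* Returns 1 if both versions agree for every pair in [-limit, limit]. */
int versions_agree(int limit) {
    assert(limit >= 0 && limit <= 1000);
    for (int a = -limit; a <= limit; a++)
        for (int b = -limit; b <= limit; b++)
            if (test_source(a, b) != test_optimized(a, b)) return 0;
    return 1;
}
```

The temporary stack variables and the store/load pairs are gone: the parameters are used directly, and only the two additions remain.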

bitcode

Since Xcode 7, if bitcode is enabled, Apple further optimizes the IR file and generates intermediate code in a .bc file.

The command is as follows:


clang -emit-llvm -c main.ll -o main.bc


3: Back-end stage -> assembly file (.s):

The back end organizes its processing of the received IR into separate units, each implemented as a Pass. Running the passes carries out IR transformation, analysis, and optimization, after which the assembly code is generated.

The command is as follows:

// bitcode -> .s
clang -S -fobjc-arc main.bc -o main.s

// IR -> .s
clang -S -fobjc-arc main.ll -o main.s

// The assembly code can also be optimized
clang -Os -S -fobjc-arc main.ll -o main.s
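
The idea of structuring the work as a sequence of passes over the IR can be sketched with a toy model. None of this is LLVM's actual Pass API: the `Inst`, `Function`, and `Pass` types are hypothetical, and the sketch assumes each register is assigned only once and that the last instruction defines the function's result.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_REGS = 16, MAX_INSTS = 64 };

typedef enum { OP_CONST, OP_ADD } Op;

/* OP_CONST: r[dst] = value.  OP_ADD: r[dst] = r[a] + r[b]. */
typedef struct { Op op; int dst, a, b, value; } Inst;
typedef struct { Inst *insts; size_t count; } Function;

/* A pass takes a function and reports whether it changed anything. */
typedef int (*Pass)(Function *);

static void check_function(const Function *f) {
    assert(f->count <= MAX_INSTS);
    for (size_t i = 0; i < f->count; i++) {
        const Inst *in = &f->insts[i];
        assert(in->dst >= 0 && in->dst < MAX_REGS);
        if (in->op == OP_ADD)
            assert(in->a >= 0 && in->a < MAX_REGS &&
                   in->b >= 0 && in->b < MAX_REGS);
    }
}

/* Pass 1: constant folding. An ADD of two known constants becomes a CONST. */
int fold_constants(Function *f) {
    int known[MAX_REGS] = {0}, value[MAX_REGS] = {0}, changed = 0;
    check_function(f);
    for (size_t i = 0; i < f->count; i++) {
        Inst *in = &f->insts[i];
        if (in->op == OP_ADD && known[in->a] && known[in->b]) {
            in->op = OP_CONST;
            in->value = value[in->a] + value[in->b];
            changed = 1;
        }
        known[in->dst] = (in->op == OP_CONST);
        if (in->op == OP_CONST) value[in->dst] = in->value;
    }
    return changed;
}

/* Pass 2: dead-code elimination. Drops instructions whose result is unused. */
int remove_dead_code(Function *f) {
    int live[MAX_REGS] = {0};
    unsigned char keep[MAX_INSTS] = {0};
    size_t kept = 0;
    check_function(f);
    if (f->count == 0) return 0;
    live[f->insts[f->count - 1].dst] = 1;   /* the result is always live */
    for (size_t i = f->count; i-- > 0; ) {  /* walk backwards from the result */
        const Inst *in = &f->insts[i];
        if (!live[in->dst]) continue;
        keep[i] = 1;
        if (in->op == OP_ADD) { live[in->a] = 1; live[in->b] = 1; }
    }
    for (size_t i = 0; i < f->count; i++)
        if (keep[i]) f->insts[kept++] = f->insts[i];
    int changed = (kept != f->count);
    f->count = kept;
    return changed;
}

/* Runs the passes in order; returns how many of them changed the function. */
size_t run_passes(Function *f, const Pass *passes, size_t n) {
    size_t changed = 0;
    for (size_t i = 0; i < n; i++)
        if (passes[i](f)) changed++;
    return changed;
}
```

For a function consisting of r0 = 2, r1 = 3, r2 = r0 + r1, running fold_constants and then remove_dead_code leaves the single instruction r2 = 5. Keeping each transformation in its own pass means passes can be added, removed, or reordered without touching the others.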

4: Assembly stage -> object file (.o):

The assembler converts the assembly code into machine code and outputs an object file in Mach-O format.

The command is as follows:

clang -fmodules -c main.s -o main.o


Use the nm command to inspect the symbols in the main.o Mach-O file:


xcrun nm -nm main.o


The print result is as follows:

The output shows that the external symbol _printf is undefined. Because this function is imported from outside the file, the object file must be linked against the library that defines it before the symbol can be resolved.

5: Link stage -> Executable Mach-O files:

The linker produces the executable (Mach-O) file by linking the object files together with the required dynamic libraries (.dylib) and static libraries (.a).

The command is as follows:

clang main.o -o main


In the printed symbols, the external symbol _printf is still shown as undefined, but it is now followed by (from libSystem), which indicates that the library containing _printf is libSystem. The symbol stays undefined in the file because libSystem is a dynamic library and the symbol is bound dynamically at run time. The file is now a valid executable.

Run the following command:

./main


Execution Result:

6: Bind the hardware architecture:

Generate the executable (Mach-O) file for the x86_64 hardware architecture.

Summary of the compilation process

1. Commands used in each phase:

//// ====== Front end start ======
// 1. Lexical analysis
clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m

// 2. Syntax analysis
clang -fmodules -fsyntax-only -Xclang -ast-dump main.m

// 3. Generate an IR file
clang -S -fobjc-arc -emit-llvm main.m -o main.ll

// Specify an optimization level when generating the IR file
clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll

// 3.2 (Depending on the compiler settings) generate a bitcode file
clang -emit-llvm -c main.ll -o main.bc

//// ====== Back end start ======
// 1. Generate an assembly file
// bitcode -> .s
clang -S -fobjc-arc main.bc -o main.s
// IR -> .s
clang -S -fobjc-arc main.ll -o main.s
// The assembly code can also be optimized
clang -Os -S -fobjc-arc main.ll -o main.s

// 2. Generate an object Mach-O file
clang -fmodules -c main.s -o main.o

// 3. Generate an executable Mach-O file
clang main.o -o main

//// ====== Execution start ======
// 4. Run the executable Mach-O file
./main

2. File types generated at each stage:

3. Schematic diagram of compilation process:

3.2 Generating C++ files from OC

  • Purpose: Compile an OC file into a C++ file. For example, main.m can be compiled into main.cpp, which makes it easier to inspect the underlying structure and implementation logic of the code and to understand the underlying principles.
  • How to compile: In the terminal, go to the directory that contains the file to be compiled and run one of the following commands:
// 1. Compile main.m into main.cpp
clang -rewrite-objc main.m -o main.cpp

// 2. Compile ViewController.m into ViewController.cpp
clang -rewrite-objc -fobjc-arc -fobjc-runtime=ios-13.0.0 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator13.7.sdk ViewController.m

// The following two approaches specify the architecture on the command line, using the Xcode tool xcrun
// 3. Compile for the simulator
xcrun -sdk iphonesimulator clang -arch arm64 -rewrite-objc main.m -o main-arm64.cpp

// 4. Compile for a real device
xcrun -sdk iphoneos clang -arch arm64 -rewrite-objc main.m -o main-arm64.cpp

An example:


- (instancetype)init {
    self = [super init];
    if (self) {
        NSLog(@"%@-----%@", [self class], [super class]);
    }
    return self;
}


After compilation:

static instancetype _I_LGPerson_init(LGPerson * self, SEL _cmd) {
    self = ((LGPerson *(*)(__rw_objc_super *, SEL))(void *)objc_msgSendSuper)((__rw_objc_super){(id)self, (id)class_getSuperclass(objc_getClass("LGPerson"))}, sel_registerName("init"));
    if (self) {
        NSLog((NSString *)&__NSConstantStringImpl__var_folders_86_0y_j3bzj65z6vw6hy1chw_4m0000gp_T_LGPerson_1615a9_mi_0, ((Class (*)(id, SEL))(void *)objc_msgSend)((id)self, sel_registerName("class")), ((Class (*)(__rw_objc_super *, SEL))(void *)objc_msgSendSuper)((__rw_objc_super){(id)self, (id)class_getSuperclass(objc_getClass("LGPerson"))}, sel_registerName("class")));
    }
    return self;
}

