iOS underlying principles + reverse engineering: article summary

This article walks through the LLVM compilation process and the development of a Clang plug-in.


LLVM is a compiler infrastructure framework written in C++. It is designed to optimize programs written in any programming language at compile time, link time, run time, and idle time. It is open to developers and compatible with existing scripts.

Traditional compiler design

A traditional compiler is a pipeline: Source Code -> Frontend -> Optimizer -> Backend (Code Generator) -> Machine Code, as shown in the figure below

Compiler architecture for iOS

Objective-C, C, and C++ use Clang as the compiler front end, Swift uses swift, and both use LLVM as the back end, as shown in the figure below

Module overview

  • Frontend: The compiler front end parses the source code (the compile phase). It performs lexical analysis, syntax analysis, and semantic analysis, checks the source code for errors, and builds the abstract syntax tree (AST). LLVM's front end also generates the intermediate representation (IR). LLVM itself can be thought of as compiler + optimizer: it receives IR and outputs optimized IR to the back end, which translates it into the target instruction set

  • Optimizer: The optimizer performs various optimizations to improve the running time of the code, such as eliminating redundant calculations

  • Backend (Code Generator): Maps the code to the target instruction set, generates machine code, and performs machine-specific optimizations

The design of LLVM

The most important aspect of LLVM's design is its use of a common code representation (IR). With it, a front end can be written independently for any programming language and a back end independently for any hardware architecture, as shown below

LLVM is designed to separate the front and back ends, so that changes to either end do not affect the other
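The separation can be sketched in a few lines of C++. Everything in this sketch is hypothetical (the IR struct, the two toy front ends, and the two made-up targets are not LLVM APIs); it only illustrates that any front end can be paired with any back end once both agree on a common representation.

```cpp
#include <cassert>
#include <string>

// The shared intermediate representation: here just "add two constants".
struct IR { int lhs; int rhs; };

// Two hypothetical front ends for two toy syntaxes, both producing the same IR.
IR infixFrontend(const std::string &src) {   // expects "<digit>+<digit>", e.g. "1+2"
    return IR{src[0] - '0', src[2] - '0'};
}
IR prefixFrontend(const std::string &src) {  // expects "+<digit><digit>", e.g. "+12"
    return IR{src[1] - '0', src[2] - '0'};
}

// Two hypothetical back ends for two made-up targets, both consuming the same IR.
std::string stackBackend(const IR &ir) {
    return "push " + std::to_string(ir.lhs) + "; push " + std::to_string(ir.rhs) + "; add";
}
std::string registerBackend(const IR &ir) {
    return "mov r0, " + std::to_string(ir.lhs) + "; add r0, " + std::to_string(ir.rhs);
}
```

Adding a third toy syntax would need only one new front end function, and adding a third target only one new back end function; neither change touches the other side.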

Introduction to Clang

Clang is a subproject of the LLVM project. It is a lightweight compiler based on the LLVM architecture. It was originally created to replace GCC and provide faster compilation. There are many benefits to studying Clang

LLVM compilation process

  • Create a new file, main.m, and write the following code
#include <stdio.h>

int test(int a, int b) {
    return a + b + 3;
}

int main(int argc, const char * argv[]) {
    int a = test(1, 2);
    // printf is the external symbol inspected with nm later on
    printf("%d", a);
    return 0;
}
  • The compilation phases of the source code can be printed with the following command
// ************ command ************
clang -ccc-print-phases main.m

// ************ compilation phases ************
// 0 - Input file: locate the source file
+- 0: input, "main.m", objective-c

// 1 - Preprocessing: macro replacement and header import
+- 1: preprocessor, {0}, objective-c-cpp-output

// 2 - Compile phase: lexical and syntax analysis, ending with IR generation
+- 2: compiler, {1}, ir

// 3 - Back end: LLVM optimizes through passes, each doing one job, and produces assembly code
+- 3: backend, {2}, assembler

// 4 - Assembly: the assembler turns the assembly code into an object file
+- 4: assembler, {3}, object

// 5 - Linking: link the required dynamic and static libraries to produce an executable
+- 5: linker, {4}, image (image file)

// 6 - Binding: produce the executable for the given architecture
6: bind-arch, "x86_64", {5}, image

The phases above are explained one by one below. Phase 0 is simply the input file, that is, locating the source file, so it is not covered in detail here

1. Preprocessing phase

This phase mainly handles macro replacement and header import. Run the following command; once it finishes, the output file shows the imported headers and the replaced macros

// Generate a file to inspect the preprocessed source
clang -E main.m >> main2.m

Note that:

  • typedef only gives a data type an alias; it is not replaced during the preprocessing phase

  • #define is replaced during the preprocessing phase, so it is often used for code obfuscation to improve app security. The idea is to give the app's core classes and methods macro aliases that resemble system names; the aliases are substituted during preprocessing, which obfuscates the code
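The difference between the two bullets can be checked with a small C++ sketch. The names CJL_CHECK_LICENSE, sysUpdateCache, and CJLInteger are hypothetical and chosen only for illustration. The STR helper stringifies its argument after macro expansion, so it shows what the preprocessor leaves behind for each name.

```cpp
#include <cassert>
#include <cstring>

// Two-level stringification: STR expands its argument first, then STR2 quotes it.
#define STR2(x) #x
#define STR(x) STR2(x)

// Hypothetical obfuscation alias: the readable name on the left is replaced by a
// system-looking name during preprocessing, before the compiler proper runs.
#define CJL_CHECK_LICENSE sysUpdateCache

// A typedef is handled by the compiler, not by the preprocessor.
typedef int CJLInteger;

// Written as CJL_CHECK_LICENSE in the source, compiled under the name sysUpdateCache.
CJLInteger CJL_CHECK_LICENSE(CJLInteger a, CJLInteger b) { return a + b + 3; }

// Each function returns the text the preprocessor produced for the given name.
const char *macroAfterPreprocessing() { return STR(CJL_CHECK_LICENSE); }
const char *typedefAfterPreprocessing() { return STR(CJLInteger); }
```

The macro name disappears from the preprocessed source and only sysUpdateCache remains, while the typedef name passes through the preprocessor untouched.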

2. Compile phase

The compile phase performs lexical and syntax analysis and checking, then generates the intermediate code (IR)

1. Lexical analysis

Once preprocessing is done, lexical analysis is carried out. It splits the code into tokens, such as braces, equals signs, and strings.

  • You can view the tokens with the following command
clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m
  • If the header file is not found, specify the SDK
clang -isysroot (your SDK path) -fmodules -fsyntax-only -Xclang -dump-tokens main.m

clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.1.sdk/ -fmodules -fsyntax-only -Xclang -dump-tokens main.m

Here are the results of the lexical analysis of the code
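As a rough illustration of what "cutting the code into tokens" means, here is a toy tokenizer in C++. It is a simplified sketch, not how Clang's lexer is implemented: it only recognizes identifiers or keywords, integer literals, and single-character punctuation, and it ignores string literals, comments, and multi-character operators.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Splits a line of C-like source into tokens. Whitespace only separates tokens.
std::vector<std::string> tokenize(const std::string &source) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        unsigned char c = static_cast<unsigned char>(source[i]);
        if (std::isspace(c)) {
            ++i;  // skip whitespace
        } else if (std::isalpha(c) || c == '_') {
            // identifier or keyword: letters, digits, underscores
            std::size_t start = i;
            while (i < source.size() &&
                   (std::isalnum(static_cast<unsigned char>(source[i])) || source[i] == '_')) {
                ++i;
            }
            tokens.push_back(source.substr(start, i - start));
        } else if (std::isdigit(c)) {
            // integer literal: a run of digits
            std::size_t start = i;
            while (i < source.size() && std::isdigit(static_cast<unsigned char>(source[i]))) {
                ++i;
            }
            tokens.push_back(source.substr(start, i - start));
        } else {
            // anything else is treated as one-character punctuation
            tokens.push_back(std::string(1, source[i]));
            ++i;
        }
    }
    return tokens;
}
```

Calling tokenize("int a = test(1, 2);") on the line from main.m yields ten tokens: int, a, =, test, (, 1, the comma, 2, ), and the semicolon.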

2. Syntax analysis

Syntax analysis follows lexical analysis. Its job is to verify that the grammar is correct: building on the tokens, it combines the token sequence into grammatical units such as programs, statements, and expressions, and assembles all the nodes into an abstract syntax tree (AST). The parser determines whether the program is structurally correct

  • You can run the following command to view the result of the syntax analysis
clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
  • If an imported header file cannot be found, you can specify the SDK
clang -isysroot (your SDK path) -fmodules -fsyntax-only -Xclang -ast-dump main.m

clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.1.sdk/ -fmodules -fsyntax-only -Xclang -ast-dump main.m

Here are the results of the parsing. The main node kinds that appear are:

  • FunctionDecl: a function
  • ParmVarDecl: a parameter
  • CallExpr: a function call
  • BinaryOperator: an operator
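To make the tree shape concrete, the following C++ sketch builds a toy tree for the expression a + b + 3 from the test function and prints it with indentation. The Expr struct and the helper functions are hypothetical; the node kind names are borrowed from Clang, but real Clang nodes carry much more information (types, source locations, implicit casts) than this simplified version.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// A toy expression node.
struct Expr {
    std::string kind;           // node kind, named after the Clang class it imitates
    std::string text;           // operator, variable name, or literal value
    std::unique_ptr<Expr> lhs;  // left child (BinaryOperator only)
    std::unique_ptr<Expr> rhs;  // right child (BinaryOperator only)
};

std::unique_ptr<Expr> leaf(const std::string &kind, const std::string &text) {
    return std::unique_ptr<Expr>(new Expr{kind, text, nullptr, nullptr});
}

std::unique_ptr<Expr> binary(const std::string &op, std::unique_ptr<Expr> lhs,
                             std::unique_ptr<Expr> rhs) {
    return std::unique_ptr<Expr>(new Expr{"BinaryOperator", op, std::move(lhs), std::move(rhs)});
}

// Prints one node per line, children indented two spaces deeper than their parent.
std::string dump(const Expr &node, int depth = 0) {
    std::string out(depth * 2, ' ');
    out += node.kind + " '" + node.text + "'\n";
    if (node.lhs) out += dump(*node.lhs, depth + 1);
    if (node.rhs) out += dump(*node.rhs, depth + 1);
    return out;
}

// a + b + 3 parses as (a + b) + 3 because + is left-associative.
std::unique_ptr<Expr> buildReturnExpr() {
    return binary("+",
                  binary("+", leaf("DeclRefExpr", "a"), leaf("DeclRefExpr", "b")),
                  leaf("IntegerLiteral", "3"));
}
```

dump(*buildReturnExpr()) returns five lines: the outer BinaryOperator at the root, the inner BinaryOperator with its two DeclRefExpr children below it, and the IntegerLiteral as the root's right child.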

3. Generate intermediate code IR

After the steps above, intermediate code (IR) generation begins. The code generator walks the syntax tree from top to bottom and translates it step by step into LLVM IR.

  • The following command generates a .ll text file in which the IR can be viewed. For Objective-C code, this step also does the runtime bridging: property synthesis, ARC handling, and so on
clang -S -fobjc-arc -emit-llvm main.m

// Basic IR syntax
@        global identifier
%        local identifier
alloca   allocate space
align    memory alignment
i32      32 bits, 4 bytes
store    write to memory
load     read data
call     call a function
ret      return

Below is the generated .ll intermediate code, with the parameters of the test function annotated

  • The IR can also be optimized. In an Xcode project this is set under target -> Build Settings -> Optimization Level. LLVM's optimization levels are -O0, -O1, -O2, -O3, and -Os (the first character is a capital letter O). The following command generates optimized IR
clang -Os -S -fobjc-arc -emit-llvm main.m -o main.ll

This is the optimized intermediate code

  • Since Xcode 7, when bitcode is enabled, Apple performs further optimization and generates .bc intermediate code. Here we generate the .bc file from the optimized IR
clang -emit-llvm -c main.ll -o main.bc

3. Back end

The LLVM back end optimizes mainly through passes; each pass does one job, and the final result is assembly code
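To give a feel for what "each pass does one job" means, here is a toy constant-folding pass in C++. It is only a sketch under simplifying assumptions: the Node type and foldConstants function are hypothetical, the pass works on a small expression tree rather than on LLVM IR, and it is not LLVM's pass API.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// A toy expression: an integer constant, a variable, or an addition.
struct Node {
    enum Kind { Constant, Variable, Add } kind;
    int value;                       // meaningful for Constant only
    std::unique_ptr<Node> lhs, rhs;  // meaningful for Add only
};

std::unique_ptr<Node> constant(int v) {
    return std::unique_ptr<Node>(new Node{Node::Constant, v, nullptr, nullptr});
}
std::unique_ptr<Node> variable() {
    return std::unique_ptr<Node>(new Node{Node::Variable, 0, nullptr, nullptr});
}
std::unique_ptr<Node> add(std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    return std::unique_ptr<Node>(new Node{Node::Add, 0, std::move(l), std::move(r)});
}

// One "pass" with one job: replace every addition of two constants by its result.
std::unique_ptr<Node> foldConstants(std::unique_ptr<Node> n) {
    if (n->kind != Node::Add) return n;
    n->lhs = foldConstants(std::move(n->lhs));
    n->rhs = foldConstants(std::move(n->rhs));
    if (n->lhs->kind == Node::Constant && n->rhs->kind == Node::Constant) {
        return constant(n->lhs->value + n->rhs->value);
    }
    return n;
}
```

With the arguments of test(1, 2) substituted, the expression (1 + 2) + 3 folds to the single constant 6, while an expression that still contains a variable is left as an addition.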

Generating assembly code

  • Generate assembly code from the final .bc or .ll file
clang -S -fobjc-arc main.bc -o main.s
clang -S -fobjc-arc main.ll -o main.s
  • Optimization can also be applied when generating assembly code
clang -Os -S -fobjc-arc main.m -o main.s

Inspect the generated main.s file at this point; its format is assembly code

4. Generate object files

To generate an object file, the assembler takes the assembly code as input, converts it into machine code, and outputs an object file.

clang -fmodules -c main.s -o main.o

To see the symbols in main.o, run the nm command

$ xcrun nm -nm main.o

Here are the symbols in main.o; the file's format is object file

  • The _printf function is an undefined, external function

  • Undefined means the symbol _printf cannot be found in the current file

  • External means that the symbol is externally accessible

5. Linking

Linking mainly links the required dynamic and static libraries to produce an executable, where

  • Static libraries are merged into the executable

  • Dynamic libraries remain separate

The linker links the compiled .o file with .dylib and .a files to produce a Mach-O file

clang main.o -o main

View the symbols after linking

$ xcrun nm -nm main

The results are shown below, where undefined means the symbol is dynamically bound at run time

  • Use a command to check what format main is; at this point it is a Mach-O executable

6. Binding

Binding mainly generates the corresponding Mach-O executable for each target architecture


To sum up, the compilation process of LLVM is shown in the figure below

Clang plug-in development

1. Preparation

Because of network restrictions in mainland China, the LLVM source code needs to be downloaded from a mirror; here is the mirror link

  • Download the LLVM project
git clone
  • In LLVM's projects directory, download compiler-rt, libcxx, and libcxxabi
cd ../projects
git clone
git clone
git clone
  • Install the extra tools in Clang's tools directory
cd ../tools/clang/tools
git clone

2. Compile LLVM

Since the latest LLVM only supports cmake for compilation, you need to install cmake

Install cmake

  • Check whether cmake is already installed through brew; if it is, skip the following step
brew list
  • Install cmake through brew
brew install cmake

Compile LLVM

There are two ways to compile:

  • Compile LLVM in Xcode

  • Compile LLVM using Ninja

Compile LLVM in Xcode

  • Use cmake to generate an Xcode project
mkdir build_xcode
cd build_xcode
cmake -G Xcode ../llvm
  • Compile Clang with Xcode
    • Choose to create schemes automatically

    • Build (Cmd + B) with the ALL_BUILD scheme selected; expect it to take over an hour

Note: the build may stop with the error "The i386 architecture is deprecated. You should update your ARCHS build setting to remove the i386 architecture." I have been trying to solve it, but no good solution has been found yet (it will be added later).

Alternative: choose to create schemes manually, then build clang and clangTooling

Compile LLVM using Ninja

  • Compiling with ninja also requires ninja to be installed; install it with the following command
brew install ninja
  • Create a build_ninja directory in the root of the LLVM source tree; the Ninja build files will eventually be generated in it

  • Create an LLVM_release directory in the root of the LLVM source tree; the final compiled files will be placed under it

cmake -G Ninja ../llvm -DCMAKE_INSTALL_PREFIX=(installation path, here /Users/XXX/XXX/LLVM/LLVM_release)
  • Build and install

ninja install

3. Create a plug-in

  • Under /llvm/tools/clang/tools, create a new plug-in directory, CJLPlugin

  • In the CMakeLists.txt file in the /llvm/tools/clang/tools directory, add add_clang_subdirectory(CJLPlugin), where CJLPlugin is the name of the plug-in created in the previous step

  • In the CJLPlugin directory, create two files, CJLPlugin.cpp and CMakeLists.txt, and add the following code to CMakeLists.txt
// 1. Create the files in the CJLPlugin directory from the terminal
touch CJLPlugin.cpp
touch CMakeLists.txt

// 2. Add the following code to CMakeLists.txt
add_llvm_library( CJLPlugin MODULE BUILDTREE_ONLY
  CJLPlugin.cpp
)

  • Next, use cmake to regenerate the Xcode project: in the build_xcode directory, run the following command
cmake -G Xcode ../llvm
  • Finally, the custom CJLPlugin directory appears under Loadable modules in LLVM's Xcode project, and you can write the plug-in code in it

Writing plug-in code

In the CJLPlugin.cpp file in the CJLPlugin directory, add the following code

// create by CJL
// 2020/11/15

#include <iostream>
#include "clang/AST/AST.h"
#include "clang/AST/DeclObjC.h"
#include "clang/AST/ASTConsumer.h"
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/Frontend/FrontendPluginRegistry.h"

using namespace clang;
using namespace std;
using namespace llvm;
using namespace clang::ast_matchers;

// Namespace with the same name as the plug-in
namespace CJLPlugin {

// Step 3: the callback after scanning
// 4. Custom callback class, inherited from MatchCallback
class CJLMatchCallback: public MatchFinder::MatchCallback {

private:
    // How CI gets here: CreateASTConsumer parameter in CJLASTAction
    // -> CJLASTConsumer constructor -> this private member of CJLMatchCallback
    CompilerInstance &CI;

    // Determine whether the file is user source code
    bool isUserSourceCode(const string filename) {
        // The file name must not be empty
        if (filename.empty()) return false;
        // Anything that is not inside Xcode is treated as user code
        if (filename.find("/Applications/Xcode.app/") == 0) return false;
        return true;
    }

    // Determine whether the property should be declared with copy
    bool isShouldUseCopy(const string typeStr) {
        // Check whether the type is NSString | NSArray | NSDictionary
        if (typeStr.find("NSString") != string::npos ||
            typeStr.find("NSArray") != string::npos ||
            typeStr.find("NSDictionary") != string::npos/*...*/) {
            return true;
        }
        return false;
    }

public:
    CJLMatchCallback(CompilerInstance &CI) :CI(CI) {}

    // Override the run method
    void run(const MatchFinder::MatchResult &Result) {
        // Get the node from Result by its tag (the tag must match the one
        // used in the CJLASTConsumer constructor)
        const ObjCPropertyDecl *propertyDecl = Result.Nodes.getNodeAs<ObjCPropertyDecl>("objcPropertyDecl");
        // Check that the node has a value and that it is in a user file
        if (propertyDecl && isUserSourceCode(CI.getSourceManager().getFilename(propertyDecl->getSourceRange().getBegin()).str())) {
            // Get the attribute information of the node
            // (these enum names match the LLVM version used here; other releases may name them differently)
            ObjCPropertyDecl::PropertyAttributeKind attrKind = propertyDecl->getPropertyAttributes();
            // Get the type of the node and convert it to a string
            string typeStr = propertyDecl->getType().getAsString();
            // cout << "--------- received: " << typeStr << " ---------" << endl;

            // copy should be used, but it is not
            if (propertyDecl->getTypeSourceInfo() && isShouldUseCopy(typeStr) && !(attrKind & ObjCPropertyDecl::OBJC_PR_copy)) {
                // Get the diagnostics engine through CI
                DiagnosticsEngine &diag = CI.getDiagnostics();
                /*
                 Report through the diagnostics engine
                 - location: getBeginLoc, the start of the node
                 - message: getCustomDiagID(level, text)
                 */
                diag.Report(propertyDecl->getBeginLoc(), diag.getCustomDiagID(DiagnosticsEngine::Warning, "%0 - it is recommended to use copy here!!")) << typeStr;
            }
        }
    }
};

// Step 2: scan configuration
// 3. Custom CJLASTConsumer, inherited from ASTConsumer, listens for AST node information (the filter)
class CJLASTConsumer: public ASTConsumer {
private:
    // AST node lookup filter
    MatchFinder matcher;
    // Callback class object
    CJLMatchCallback callback;

public:
    // Configure the MatchFinder in the constructor
    CJLASTConsumer(CompilerInstance &CI) : callback(CI) {
        // Match every objcPropertyDecl node and bind it to the "objcPropertyDecl" tag;
        // a match ends up calling the run method overridden in CJLMatchCallback
        matcher.addMatcher(objcPropertyDecl().bind("objcPropertyDecl"), &callback);
    }

    // Called once each time a top-level declaration (for example a global
    // variable or a function declaration) has been parsed
    bool HandleTopLevelDecl(DeclGroupRef D) {
        // cout << "parsing..." << endl;
        return true;
    }

    // Called when the whole file has been parsed
    void HandleTranslationUnit(ASTContext &context) {
        // cout << "The file is parsed!" << endl;
        // Hand the context (the AST) to the matcher
        matcher.matchAST(context);
    }
};

// 2. Inherit from PluginASTAction to implement the custom Action
class CJLASTAction: public PluginASTAction {

public:
    /*
     Parses the given plug-in command-line arguments.
     - param ci: compiler instance, used to report diagnostics.
     - return: true if parsing succeeds; otherwise the plug-in is destroyed and no
       action is performed. The plug-in is responsible for reporting errors through
       the Diagnostic object of the CompilerInstance.
     */
    bool ParseArgs(const CompilerInstance &ci, const std::vector<std::string> &args) {
        return true;
    }

    // Returns an object of type ASTConsumer, which is an abstract base class
    unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI, StringRef iFile) {
        /*
         Return the custom CJLASTConsumer, a subclass of ASTConsumer. CI is used to:
         - determine whether the file belongs to the user
         - emit warnings
         */
        return unique_ptr<CJLASTConsumer>(new CJLASTConsumer(CI));
    }
};

}

// Step 1: register the plug-in and the custom AST Action class
// 1. Register the plug-in
static FrontendPluginRegistry::Add<CJLPlugin::CJLASTAction> CJL("CJLPlugin", "This is CJLPlugin");

The principle is divided into three main steps

  • [Step 1] Register the plug-in and define a custom AST Action class
    • Inherit from PluginASTAction to define a custom ASTAction. Two methods must be overridden, ParseArgs and CreateASTConsumer. The key method is CreateASTConsumer, whose parameter CI is the compiler instance, used for two main purposes
      • To determine whether the file belongs to the user

      • To emit a warning

    • Register the plug-in through FrontendPluginRegistry, which associates the plug-in name with the custom ASTAction class
  • [Step 2] Configure the scan
    • Inherit from ASTConsumer to implement a custom subclass, CJLASTConsumer, with two members: a MatchFinder object, matcher, and a CJLMatchCallback callback object, callback

    • Implement the constructor, which registers the matcher with the MatchFinder object and passes CI on to the callback object

    • Implement two callback methods

      • HandleTopLevelDecl: called once each time a top-level declaration has been parsed
      • HandleTranslationUnit: called when the entire file has been parsed; it hands the resulting context (the AST) to matcher
  • [Step 3] The callback after scanning
    • Inherit from MatchFinder::MatchCallback to define the custom callback class CJLMatchCallback

    • Define a private CompilerInstance member to receive the CI passed from the ASTConsumer class

    • Override the run method

      • 1. Get the node from Result by its tag; the tag must match the one used in the constructor of CJLASTConsumer

      • 2. Check that the node has a value and that it is in a user file (the private isUserSourceCode method)

      • 3. Get the attribute information of the node

      • 4. Get the type of the node and convert it to a string

      • 5. Check whether copy should be used but is not

      • 6. Get the diagnostics engine through CI

      • 7. Report the warning through the diagnostics engine
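The decision made in steps 2 to 5 of run can be tried out without building LLVM. The sketch below restates the two private helper methods as free functions using only the standard library; shouldWarn and its hasCopyAttribute parameter are hypothetical stand-ins for the checks the plug-in performs on the ObjCPropertyDecl node, and the Xcode path prefix is assumed to be /Applications/Xcode.app/.

```cpp
#include <cassert>
#include <string>

// Mirrors isUserSourceCode: anything outside Xcode's own directory counts as user code.
bool isUserSourceCode(const std::string &filename) {
    if (filename.empty()) return false;
    return filename.find("/Applications/Xcode.app/") != 0;
}

// Mirrors isShouldUseCopy: the type string mentions NSString, NSArray, or NSDictionary.
bool isShouldUseCopy(const std::string &typeStr) {
    return typeStr.find("NSString") != std::string::npos ||
           typeStr.find("NSArray") != std::string::npos ||
           typeStr.find("NSDictionary") != std::string::npos;
}

// Hypothetical helper combining the checks: hasCopyAttribute stands in for the
// test of the copy bit in the property attributes.
bool shouldWarn(const std::string &filename, const std::string &typeStr,
                bool hasCopyAttribute) {
    return isUserSourceCode(filename) && isShouldUseCopy(typeStr) && !hasCopyAttribute;
}
```

A strong NSString property in a user file triggers the warning, while the same property declared with copy, a property of type int, or any declaration inside Xcode's own headers does not.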

So, in summary, the flowchart for clang plug-in development is as follows

Then test the plug-in in the terminal

// Command format
(path to the clang you compiled) -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.1.sdk/ -Xclang -load -Xclang (path to the plug-in .dylib) -Xclang -add-plugin -Xclang (plug-in name) -c (source file path)

// Example
/Users/XXX/Desktop/build_xcode/Debug/bin/clang -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.1.sdk/ -Xclang -load -Xclang /Users/XXXX/Desktop/build_xcode/Debug/lib/CJLPlugin.dylib -Xclang -add-plugin -Xclang CJLPlugin -c /Users/XXXX/Desktop/XXX/XXXX/testdemo/testClang/testClang/ViewController.m

4. Integrating the plug-in into Xcode

Load the plug-in

  • Open the test project and, under target -> Build Settings -> Other C Flags, add the following
-Xclang -load -Xclang (path to the .dylib dynamic library) -Xclang -add-plugin -Xclang CJLPlugin

Setting the compiler

  • A clang plug-in must be loaded by the matching version of clang; if the versions differ, compilation fails, as shown below

  • In Build Settings, add two user-defined settings, CC and CXX
    • CC is the absolute path of the clang you compiled

    • CXX is the absolute path of the clang++ you compiled

  • Next, search for index in Build Settings and change Enable Index-While-Building Functionality from Default to NO

  • Finally, recompile the test project, and you get the following effect