These are study notes, and not all of the content is original; see the sources at the end of the article.

Compiler

Clang – LLVM

Apple (and NeXT before it) long used GCC as its official compiler. GCC served well as the standard compiler of the open source world, but Apple came to demand more from its compiler tools.

Clang is a software project initiated by Apple in 2005. It is the front end of the LLVM compiler toolchain. Clang's job is to produce the Abstract Syntax Tree (AST) for the source code and compile the code into LLVM bitcode. The LLVM back end then compiles that bitcode into platform-specific machine code.

Look at the results

main.m

#import <Foundation/Foundation.h>
#define DEFINEEight 8

int main(){
    @autoreleasepool {
        int eight = DEFINEEight;
        int six = 6;
        NSString* site = [[NSString alloc] initWithUTF8String:"starming"];
        int rank = eight + six;
        NSLog(@"%@ rank %d", site, rank);
    }
    return 0;
}


Compile directly to an executable

clang -fmodules main.m

Output: a.out (executable)

Clang (Frontend)

Clang is a compiler front end for the C, C++, Objective-C, and Objective-C++ programming languages.

Clang source code structure

Clang steps

clang -ccc-print-phases main.m

1. Input (Driver)

Specify the language, target architecture, and input file type

clang -x objective-c main.m

2. Preprocessor

Processes #import and #include of header files, and so on.

Run only the preprocessor and print the result:

clang -E main.m

The end of the preprocessed output:

# 1 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/System/Library/Frameworks/Foundation.framework/Headers/FoundationLegacySwiftCompatibility.h" 1 3
# 185 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/System/Library/Frameworks/Foundation.framework/Headers/Foundation.h" 2 3
# 2 "main.m" 2


int main(){
    @autoreleasepool {
        int eight = 8;
        int six = 6;
        NSString* site = [[NSString alloc] initWithUTF8String:"starming"];
        int rank = eight + six;
        NSLog(@"%@ rank %d", site, rank);
    }
    return 0;
}


The #import has been replaced by header contents taken from explicit absolute paths, and the constant DEFINEEight has been replaced by its value 8. For imports, Clang can also use modules, covered in 2.1.
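The macro substitution seen in the preprocessed output can be sketched in a few lines of Python. This is only a toy model, not how Clang implements it: the function name expand_object_macros is made up for illustration, it handles only object-like #define macros, and unlike a real preprocessor it would also rewrite identifiers inside string literals.

```python
import re

def expand_object_macros(source: str) -> str:
    """Expand simple object-like #define macros, line by line (toy model)."""
    macros = {}   # macro name -> replacement text
    output = []
    for line in source.splitlines():
        define = re.match(r"\s*#\s*define\s+(\w+)\s+(.*)", line)
        if define:
            # Record the macro; the directive itself is not emitted.
            macros[define.group(1)] = define.group(2).strip()
            continue
        # Replace whole identifiers only, so "DEFINEEightX" is left alone.
        expanded = re.sub(
            r"\b[A-Za-z_]\w*\b",
            lambda m: macros.get(m.group(0), m.group(0)),
            line,
        )
        output.append(expanded)
    return "\n".join(output)

source = "#define DEFINEEight 8\nint eight = DEFINEEight;"
print(expand_object_macros(source))  # prints: int eight = 8;
```

The #define line disappears from the output and every later use of the name is replaced by its value, which is the same effect visible in the clang -E result.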

2.1 Modules (-fmodules)

See the Modules documentation on the LLVM site.

According to that documentation: Modules provide an alternative, simpler way to use software libraries that provides better compile-time scalability and eliminates many of the problems inherent to using the C preprocessor to access the API of a library.

With modules, Clang replaces textual inclusion such as #include <stdio.h>, which pulls the whole library header into every file, with a simple import std.io concept, similar to a Java package. The two approaches compare as follows:

  1. With #include: compile-time scalability suffers, because a header is re-read and re-parsed for every file that includes it, even a common header the compiler has already seen. Inclusion is also fragile: the order of includes can change meaning or cause macro conflicts. The conventional workarounds are long-standing C habits (include guards, long prefixed names) that make code uglier, and tools get confused.

  2. With import: when the compiler encounters an import, it loads the module's binary representation and reads its API from it. A module does not depend on the headers or macros around the import; it is compiled once, and its API is parsed only once. Modules also have drawbacks: no namespaces (names can still collide), changes needed in library code, and binary module files that cannot be shared across architectures.
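The compile-time difference between the two approaches can be illustrated with a small counting model. This is a sketch under simplifying assumptions, not Clang's implementation: ToyCompiler and its methods are hypothetical names, and "parsing" a header is reduced to incrementing a counter.

```python
class ToyCompiler:
    """Counts how often a header's API has to be parsed (toy model)."""

    def __init__(self):
        self.parse_count = 0
        self.module_cache = {}   # header name -> parsed API, filled once

    def _parse_header(self, header: str) -> list:
        self.parse_count += 1
        return [f"{header}::api"]   # stand-in for the parsed declarations

    def compile_with_include(self, files: dict) -> None:
        # Textual inclusion: every file re-parses every header it includes.
        for headers in files.values():
            for header in headers:
                self._parse_header(header)

    def compile_with_modules(self, files: dict) -> None:
        # Module import: parse once, then reuse the cached result.
        for headers in files.values():
            for header in headers:
                if header not in self.module_cache:
                    self.module_cache[header] = self._parse_header(header)

# Three source files that all use the same library.
files = {"a.m": ["Foundation"], "b.m": ["Foundation"], "c.m": ["Foundation"]}

textual = ToyCompiler()
textual.compile_with_include(files)

modular = ToyCompiler()
modular.compile_with_modules(files)

print(textual.parse_count, modular.parse_count)  # prints: 3 1
```

With textual inclusion the work grows with the number of files that include the header; with the cache it is done once, which is the scalability argument made in the Modules documentation.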

3. Lexical Analysis -> .i (Tokens)

This is a basic compiler step: the Lexer reads the source character by character and recognizes tokens according to the lexical rules, without checking syntax.

Run lexical analysis and dump the tokens:

clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m

Each token carries the corresponding source text and its position in the source. Note that this is the position before macro expansion, so if something goes wrong during compilation, Clang can point to the exact place in the source code where it went wrong.

-fsyntax-only: Run the preprocessor, parser and type checking stages.
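A minimal tokenizer shows the idea of tokens that remember where they came from. This is a toy sketch, not Clang's Lexer: the Token type and the tokenize function are made up for illustration, keywords such as int are not distinguished from identifiers, and all punctuation shares one generic kind.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    kind: str
    text: str
    line: int     # 1-based line in the original source
    column: int   # 1-based column in the original source

TOKEN_SPEC = [
    ("numeric_constant", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("punctuator", r"[=+\-*/;(){}]"),
    ("whitespace", r"[ \t]+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_SPEC))

def tokenize(source: str) -> list:
    """Split source into tokens; no syntax checking happens here."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        pos = 0
        while pos < len(line):
            match = TOKEN_RE.match(line, pos)
            if match is None:
                raise SyntaxError(f"unexpected character at {line_no}:{pos + 1}")
            if match.lastgroup != "whitespace":
                tokens.append(Token(match.lastgroup, match.group(), line_no, pos + 1))
            pos = match.end()
    return tokens

for token in tokenize("int six = 6;\nint rank = eight + six;"):
    print(token.kind, repr(token.text), f"Loc=<{token.line}:{token.column}>")
```

The lexer accepts "int = = 6 ;" just as happily as valid code, because judging token order is the parser's job, not the lexer's.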

4. Semantic Analysis -> AST

Syntax analysis: in Clang the Parser and Sema modules work together to verify that the code is well-formed and to emit accurate diagnostics.

4.1 Parser

Iterates over the tokens, analyzes them against the grammar, and produces the information for each node.

4.2 Sema

Once lexical and syntax analysis have established that the code is in the correct form, Sema checks return values, size bounds, uninitialized variables, and so on. It creates nodes from the current information and combines all the nodes into the abstract syntax tree.

Run parsing and dump the AST:

clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
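The split between parsing and semantic checking can be sketched as follows. This is a toy, not Clang's Parser or Sema: the node names are borrowed from Clang's AST dump, but the Python classes and the parse and check functions are hypothetical stand-ins. Only statements of the form int name = a + b; are handled, and characters outside that tiny grammar are silently ignored.

```python
import re
from dataclasses import dataclass

@dataclass
class IntegerLiteral:
    value: int

@dataclass
class DeclRefExpr:
    name: str

@dataclass
class BinaryOperator:
    op: str
    lhs: object
    rhs: object

@dataclass
class VarDecl:
    name: str
    init: object

def parse(source):
    """Parser: turn `int name = a + b;` statements into VarDecl nodes."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[=+;]", source)
    pos = 0

    def take(expected=None):
        nonlocal pos
        if pos >= len(tokens) or (expected is not None and tokens[pos] != expected):
            raise SyntaxError(f"expected {expected or 'a token'}")
        pos += 1
        return tokens[pos - 1]

    def primary():
        token = take()
        if token.isdigit():
            return IntegerLiteral(int(token))
        if token[0].isalpha() or token[0] == "_":
            return DeclRefExpr(token)
        raise SyntaxError(f"unexpected token {token!r}")

    def expression():
        node = primary()
        while pos < len(tokens) and tokens[pos] == "+":
            take("+")
            node = BinaryOperator("+", node, primary())
        return node

    decls = []
    while pos < len(tokens):
        take("int")
        name = take()
        take("=")
        decls.append(VarDecl(name, expression()))
        take(";")
    return decls

def check(decls):
    """Sema: report references to variables that were never declared."""
    declared, errors = set(), []

    def visit(node):
        if isinstance(node, DeclRefExpr) and node.name not in declared:
            errors.append(f"use of undeclared identifier '{node.name}'")
        elif isinstance(node, BinaryOperator):
            visit(node.lhs)
            visit(node.rhs)

    for decl in decls:
        visit(decl.init)
        declared.add(decl.name)
    return errors

tree = parse("int eight = 8; int six = 6; int rank = eight + six;")
print(tree[2])
print(check(tree))                             # prints: []
print(check(parse("int rank = eight + 6;")))   # reports the undeclared 'eight'
```

The second input is grammatically fine, so the parser accepts it; only the semantic pass notices that eight was never declared.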

5. Abstract Syntax Tree

The AST is at the heart of Clang: most of its analysis and source-level rewriting is done on the AST (for example finding classes or replacing code).

This step also converts Clang attributes into an AttributeList attached to the AST, which a Clang plugin can retrieve via Decl::getAttr.

Clang attributes are source code annotations provided by Clang that let developers express requirements to the compiler. They take part in controlling the static analyzer, name mangling, code generation, and so on. In code they generally appear as __attribute__((xxx)), for example behind a macro such as NS_CLASS_AVAILABLE_IOS(9_0).

The AST has the same overall structure as in other compilers. The difference is that Clang's AST is built from C++ classes that closely mirror source constructs such as classes and variables, whereas other compilers are said to use representations closer to assembly language.

This means the AST comes with a corresponding API, which makes it easier to manipulate the AST, retrieve information from it, and even obtain addresses and source locations.

ASTContext: stores all AST-related information and supports traversal tools such as ASTMatcher.

Nodes fall into three core class hierarchies: Decl (declarations), Stmt (statements), and Type (types). Their subclasses are too numerous to list here.
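What "an AST with an API" means in practice can be tried out with Python's own ast module. This is an analogy only: it walks a Python syntax tree, not Clang's, and NameCollector is a name made up for this sketch. The ideas carry over, though: typed node classes, a visitor for traversal, and source locations stored on the nodes.

```python
import ast

source = "eight = 8\nsix = 6\nrank = eight + six\n"
tree = ast.parse(source)   # Module node: the root of the tree

class NameCollector(ast.NodeVisitor):
    """Records every variable read together with its source location."""

    def __init__(self):
        self.loads = []

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load):   # a read, not an assignment target
            self.loads.append((node.id, node.lineno, node.col_offset))
        self.generic_visit(node)

collector = NameCollector()
collector.visit(tree)
print(collector.loads)   # prints: [('eight', 3, 7), ('six', 3, 15)]

# Node classes form hierarchies; here every statement node derives from ast.stmt.
statements = [node for node in ast.walk(tree) if isinstance(node, ast.stmt)]
print(len(statements))   # prints: 3
```

Line numbers are 1-based and column offsets 0-based in this module, so ('eight', 3, 7) points at the eighth character of the third line.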

6. CodeGen -> IR intermediate code (.ll)

CodeGen traverses the syntax tree from the top down and translates it into LLVM IR. LLVM IR is the input to the LLVM back end and the bridge language between the front end and the back end.
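The tree walk can be sketched with a tiny generator for the expression eight + six. This is a toy under stated assumptions: ToyCodeGen and the tuple-based tree are hypothetical, and the emitted text only imitates the look of LLVM IR; it is not guaranteed to be accepted by LLVM tools.

```python
class ToyCodeGen:
    """Walks an expression tree and emits LLVM-IR-flavoured text (toy model)."""

    def __init__(self):
        self.lines = []
        self.temp_count = 0

    def fresh_temp(self):
        # Numbered temporaries, in the style of unnamed IR values.
        self.temp_count += 1
        return f"%{self.temp_count}"

    def emit(self, node):
        """Return the operand naming this node's value, emitting code as needed."""
        kind = node[0]
        if kind == "const":
            return str(node[1])
        if kind == "var":
            temp = self.fresh_temp()
            self.lines.append(f"{temp} = load i32, i32* %{node[1]}")
            return temp
        if kind == "add":
            # Start at the operator, descend into both operands first.
            lhs = self.emit(node[1])
            rhs = self.emit(node[2])
            temp = self.fresh_temp()
            self.lines.append(f"{temp} = add nsw i32 {lhs}, {rhs}")
            return temp
        raise ValueError(f"unknown node kind: {kind}")

tree = ("add", ("var", "eight"), ("var", "six"))
codegen = ToyCodeGen()
result = codegen.emit(tree)
print("\n".join(codegen.lines))
```

The walk starts at the root of the expression, but the instructions for the operands are emitted before the instruction that consumes them, so the add comes last.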

Output IR: clang -S -fobjc-arc -emit-llvm main.m -o main.ll

LLVM IR has three representations: the first is a storage format, bitcode, with a .bc suffix; the second is a human-readable format with a .ll suffix; the third is an in-memory format used to manipulate LLVM IR during development.

Output bitcode: clang -emit-llvm -c main.m -o main.bc

View the bitcode: llvm-dis < main.bc | less

6.1 Optimization of IR

The IR can be optimized at several levels: -O1, -O2, -O3, -Os, and so on. Depending on the level, different passes run, such as dead code elimination, inlining, expression reassociation, and loop-invariant code motion.
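As an illustration of what one such pass does, here is a minimal dead code elimination over a made-up instruction list. It is a sketch, not LLVM's pass: the (dest, op, operands) tuple format and the function name are assumptions for this example, each value is assumed to be defined exactly once, and no instruction is assumed to have side effects.

```python
def eliminate_dead_code(instructions, live_out):
    """Drop instructions whose results are never used.

    `instructions` is a list of (dest, op, operands) tuples in program order;
    `live_out` names the values that must survive, such as a return value.
    """
    live = set(live_out)
    kept = []
    # Walk backwards so every use is seen before the instruction defining it.
    for dest, op, operands in reversed(instructions):
        if dest in live:
            kept.append((dest, op, operands))
            # The operands of a needed instruction are needed as well.
            live.update(o for o in operands if isinstance(o, str))
    kept.reverse()
    return kept

program = [
    ("eight", "const", (8,)),
    ("six", "const", (6,)),
    ("unused", "mul", ("eight", "six")),   # result is never read
    ("rank", "add", ("eight", "six")),
]

for instruction in eliminate_dead_code(program, {"rank"}):
    print(instruction)
```

Only the mul is removed: rank is requested, which makes eight and six live, while nothing ever reads unused.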

Extra: Clang plug-in

libclang, Clang plugins, and LibTooling can be used to change the way Clang generates code, add stricter type checking, run your own custom checks on your code, and so on.

References:

ClangAST
clang.llvm.org/
In depth: iOS compilation with Clang
llvm.org/devmtg/2017…
From Swift bridging files to Clang-LLVM