Keywords: GCC, LLVM, Clang, lexical analysis, syntax analysis, intermediate code, object file
Author: WYW
Review: QiShare team


Preface

While running a Swift project in my group, I found that compilation took a relatively long time (part of the reason, of course, is that my computer's configuration is fairly low-end). So I looked into optimizing part of the project's compile time, and I plan to write two articles: the first focuses on the compilation process, and the second documents my attempts to optimize compile time for the Swift project.

This is the first article. It introduces the compilation process and is organized as follows.

  1. Terminology: 1.1 Compiler; 1.2 Compiler architecture; 1.3 GCC; 1.4 Clang; 1.5 LLVM

  2. A brief look at the compilation process: 2.1 Lexical analysis; 2.2 Syntax analysis; 2.3 Semantic analysis; 2.4 Intermediate code generation; 2.5 Intermediate code optimization; 2.6 Object code generation

  3. A walk through the compilation process using concrete commands

First, I will introduce compilers, GCC, and LLVM.

I. Terminology

1. Compiler

A compiler is not hardware but a computer program that translates a source program into a target program.

"A compiler is a computer program that converts source code written in one programming language (the source language) into another programming language (the target language)." Quoted from Wikipedia, Compiler.

2. Compiler architecture

A compiler is usually divided into three parts:

2.1 Frontend: lexical analysis, syntax analysis, semantic analysis, and generation of intermediate code (an intermediate representation, not yet assembly instructions)

2.2 Optimizer: optimization of the intermediate code

2.3 Backend: generation of machine code from the optimized intermediate code. The assembler then turns the generated assembly into an object file, and a linker links object files into an executable.

3. GCC

GCC (GNU Compiler Collection) is a compiler suite that can compile C, Objective-C, C++, and other source programs.

The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as libraries for these languages (such as libstdc++ and libgcj). GCC was originally written as the compiler for the GNU operating system. The GNU system is entirely free software; here "free" means that it respects the freedom of its users. Quoted from 360 Encyclopedia, GCC.

4. Clang

Clang is a compiler front end for C, Objective-C, C++, and related languages.

Clang (pronounced /klæŋ/, like the English word "clang") is a compiler front end for the C, C++, Objective-C, and Objective-C++ programming languages. It uses LLVM as its back end, and since LLVM 2.6 the two have been released together. It aims to provide an alternative to the GNU Compiler Collection (GCC) and supports most of GCC's compilation flags and unofficial language extensions. Quoted from Wikipedia, Clang.

5. LLVM

LLVM originally stood for Low Level Virtual Machine. In a narrow sense, LLVM is the back end used by the Clang compiler: it converts the code produced by the front end into object files. Broadly speaking, LLVM is a collection of components and toolchains for building both the front ends and the back ends of compilers.

LLVM is a framework for building compilers, written in C++. It is designed to optimize the compile time, link time, run time, and idle time of programs written in arbitrary programming languages, while staying open to developers and compatible with existing scripts. Quoted from 360 Encyclopedia, LLVM.

6. The whole process of preprocessing, compiling, and linking a project

The following figure shows how a project goes from source code to an executable through preprocessing, compilation, loading, and linking.

Next, let's take a first look at the GCC compilation process.

II. GCC compilation process

This part has two sections: a first look at the phases GCC goes through to compile source code into an executable, and the GCC commands that carry out each phase.

1. View the GCC compilation process

WYW:GCC wangyongwangyongwang$ gcc -ccc-print-phases main.c
0: input, "main.c", c
1: preprocessor, {0}, cpp-output
2: compiler, {1}, ir
3: backend, {2}, assembler
4: assembler, {3}, object
5: linker, {4}, image
6: bind-arch, "x86_64", {5}, image

2. Commands that may be used during compilation

// 1: preprocessor generates the preprocessed file
WYW:GCCProcess wangyongwangyongwang$ gcc -E main.c -o main.i
// 2: compiler generates the assembly file
WYW:GCCProcess wangyongwangyongwang$ gcc -S main.i -o main.s
// 3: backend, 4: assembler generates the object file
WYW:GCCProcess wangyongwangyongwang$ gcc -c main.s -o main.o
// 5: linker, 6: bind-arch generates the executable
WYW:GCCProcess wangyongwangyongwang$ gcc main.o -o main
// Run the executable
WYW:GCCProcess wangyongwangyongwang$ sudo ./main
Hello, World!
2.1. Front-end process
2.1.1 Lexical analysis

Lexical analysis is the process of converting a sequence of characters into a sequence of tokens.

2.1.2 Syntax analysis

Syntax analysis parses the token sequence according to a given formal grammar and determines its grammatical structure.

2.1.3 Semantic analysis

Semantic analysis detects semantic errors in the program (for example, checking the types of the arguments passed to a function).

2.1.4 Generate intermediate code

This step produces an abstract representation of the program that is easy to generate and easy to translate into the target program.

2.2 Intermediate Process

2.2.1 Optimization

2.3 Back-end process

2.3.1 Generating object code

3. Linking object files into an executable

./<executable name>: runs the executable with that name in the current directory.

Next, I will share some details about the Clang compilation process. As I currently understand it, Clang is the front end of LLVM, and LLVM in the broad sense includes Clang.

III. Clang compilation process

This part has two sections: viewing the phases Clang goes through to compile source code into an executable, and analyzing what each phase produces.

1. Clang compilation process

I first checked the Clang version on my computer.

Apple LLVM version 11.0.0 (clang-1100.0.33.17)

WYW:~ wangyongwangyongwang$ objdump --version
Apple LLVM version 11.0.0 (clang-1100.0.33.17)
  Optimized build.
  Default target: x86_64-apple-darwin19.2.0
  Host CPU: broadwell

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_be - AArch64 (big endian)
    arm        - ARM
    arm64      - ARM64 (little endian)
    armeb      - ARM (big endian)
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
See how Clang compiles source code to an executable
WYW:LLVM wangyongwangyongwang$ clang -ccc-print-phases clang_main.c
0: input, "clang_main.c", c
1: preprocessor, {0}, cpp-output
2: compiler, {1}, ir
3: backend, {2}, assembler
4: assembler, {3}, object
5: linker, {4}, image
6: bind-arch, "x86_64", {5}, image

2. Analyzing how Clang compiles main.m into an executable

The main.m file used by the author is as follows:

// Preprocessing
// Comments from imported header files do not appear in the preprocessed file
#import <Foundation/Foundation.h>
#import <Foundation/NSObjCRuntime.h>

// Conditional compilation
#if __has_include(<UIKit/UIKit.h>)
        #import <UIKit/UIKit.h>
    #ifndef Qi_HAS_IMPORT_UIKit
        #define QUC_HAS_IMPORT_UIKit 1
    #endif
#else
    #ifndef Qi_HAS_IMPORT_UIKit
        #define Qi_HAS_IMPORT_UIKit 0
    #endif
#endif

// Macro definition
#define QiShareAge 2
// Macro definition with parameters
#define QiShareMember(Member, Name) Member#Name

int main(int argc, char * argv[]) {
    
    // NSString * appDelegateClassName;
     @autoreleasepool {
         
         NSLog(@"Hello Clang Compile");
         
         const char *me = QiShareMember("QiShare","WYW");
         printf("member：%s \n", me);
         NSString *member = [NSString stringWithCString:me encoding:NSUTF8StringEncoding];
         NSLog(@"member：%@", member);
         // Setup code that might create autoreleased objects goes here.
         // appDelegateClassName = NSStringFromClass([AppDelegate class]);
         /**
          * Hello Clang Compile
          * member：QiShare"WYW"
          * member：QiShare"WYW"
          */
     }
    // return UIApplicationMain(argc, argv, nil, appDelegateClassName);
}

2.1 Preprocessing
// Preprocessing command
clang -E main.m -o main_precompile.i
2.1.1 What preprocessing does
1. Handles #include and #import directives by inserting the included file at the location of the directive.
2. Deletes all // line comments and /* */ block comments, but leaves a blank line where each comment was.
3. Adds line numbers and file identifiers.
4. Keeps all #pragma directives, because the compiler needs them.
2.1.2 Code generated after pre-compilation
# 1 "main.m"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 374 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "main.m" 2
# 11 "main.m"
# 1 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/Foundation.framework/Headers/Foundation.h" 1 3

typedef enum __attribute__((enum_extensibility(closed))) NSComparisonResult : NSInteger NSComparisonResult; enum NSComparisonResult : NSInteger {
    NSOrderedAscending = -1L,
    NSOrderedSame,
    NSOrderedDescending
};

typedef NSComparisonResult (^NSComparator)(id obj1, id obj2);

typedef enum __attribute__((flag_enum,enum_extensibility(open))) NSEnumerationOptions : NSUInteger NSEnumerationOptions; enum NSEnumerationOptions : NSUInteger {
    NSEnumerationConcurrent = (1UL << 0),
    NSEnumerationReverse = (1UL << 1),
};

# 1 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/Foundation.framework/Headers/FoundationLegacySwiftCompatibility.h" 1 3
# 193 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/Foundation.framework/Headers/Foundation.h" 2 3
# 12 "main.m" 2
# 31 "main.m"
int main(int argc, char * argv[]) {

     @autoreleasepool {

         NSLog(@"Hello Clang Compile");

         const char *me = "QiShare""\"WYW\"";
         printf("member：%s \n", me);
         NSString *member = [NSString stringWithCString:me encoding:NSUTF8StringEncoding];
         NSLog(@"member：%@", member);
     }
}

In the two pictures below, the left one shows the generated preprocessed file and the right one shows the contents of <Foundation/NSObjCRuntime.h>. The two correspond one to one.

2.1.3 Summary of preprocessing

The .i file generated by preprocessing the .m file contains the imported header files, and each import is recorded together with the header's actual path on the computer. Comments are removed and replaced with blank lines, and line markers are added. For example, the marker # 31 "main.m" shows that the main function starts at line 31 of main.m.
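The line const char *me = "QiShare""\"WYW\""; in the preprocessed output comes from the # operator in QiShareMember: it turns the macro argument into a string literal and escapes the quotes inside it, and the compiler later joins the two adjacent literals. A minimal C sketch of this behavior follows; the macro is the one from main.m, while the helper names qishare_member_demo and qishare_member_length are hypothetical and exist only for illustration.

```c
#include <string.h>

/* Same shape as the macro in main.m: #Name stringizes the second argument. */
#define QiShareMember(Member, Name) Member#Name

/* Hypothetical helper for illustration: returns the expanded literal.
   The macro call expands to "QiShare" "\"WYW\"", and the two adjacent
   string literals are then concatenated into one string. */
const char *qishare_member_demo(void) {
    return QiShareMember("QiShare", "WYW");
}

/* Length in bytes of the resulting string, excluding the terminating NUL. */
size_t qishare_member_length(void) {
    return strlen(qishare_member_demo());
}
```

The resulting string is 12 bytes long, which is consistent with the 'char [13]' type that the AST dump reports for this literal once the terminating NUL is counted.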

2.2 Lexical analysis

clang -fmodules -E -Xclang -dump-tokens main.m or clang -fmodules -E -Xclang -dump-tokens main.m > main_precompile

// Lexical analysis results
annot_module_include '#import <Foundation/Foundation.h>
#import <Foundation/NSObjCRuntime.h>

// Conditional compilation
#if __has_include(<UIKit/UIKit.h>)
        #import <UIKit/UIKit.h>
    #ifndef Qi_HAS_IMPORT_UIKit
        #define QUC_HAS_IMPORT_UIKit 1
    #endif
#else'		Loc=<main.m:11:1>
annot_module_include '#import <Foundation/NSObjCRuntime.h>

// Conditional compilation
#if __has_include(<UIKit/UIKit.h>)
        #import <UIKit/UIKit.h>
    #ifndef Qi_HAS_IMPORT_UIKit
        #define QUC_HAS_IMPORT_UIKit 1
    #endif
#else
    #ifndef Qi_HAS_IMPORT_UIKit
        #define Qi_HAS_IMPORT_UIKit 0
'		Loc=<main.m:12:1>
int 'int'	 [StartOfLine]	Loc=<main.m:31:1>
identifier 'main'	 [LeadingSpace]	Loc=<main.m:31:5>
l_paren '('		Loc=<main.m:31:9>
int 'int'		Loc=<main.m:31:10>
identifier 'argc'	 [LeadingSpace]	Loc=<main.m:31:14>
comma ','		Loc=<main.m:31:18>
char 'char'	 [LeadingSpace]	Loc=<main.m:31:20>
star '*'	 [LeadingSpace]	Loc=<main.m:31:25>
identifier 'argv'	 [LeadingSpace]	Loc=<main.m:31:27>
l_square '['		Loc=<main.m:31:31>
r_square ']'		Loc=<main.m:31:32>
r_paren ')'		Loc=<main.m:31:33>
l_brace '{'	 [LeadingSpace]	Loc=<main.m:31:35>
at '@'	 [StartOfLine] [LeadingSpace]	Loc=<main.m:34:6>
identifier 'autoreleasepool'		Loc=<main.m:34:7>
l_brace '{'	 [LeadingSpace]	Loc=<main.m:34:23>
identifier 'NSLog'	 [StartOfLine] [LeadingSpace]	Loc=<main.m:36:10>
l_paren '('		Loc=<main.m:36:15>
at '@'		Loc=<main.m:36:16>
string_literal '"Hello Clang Compile"'		Loc=<main.m:36:17>
r_paren ')'		Loc=<main.m:36:38>
semi ';'		Loc=<main.m:36:39>
const 'const'	 [StartOfLine] [LeadingSpace]	Loc=<main.m:38:10>
char 'char'	 [LeadingSpace]	Loc=<main.m:38:16>
star '*'	 [LeadingSpace]	Loc=<main.m:38:21>
identifier 'me'		Loc=<main.m:38:22>
equal '='	 [LeadingSpace]	Loc=<main.m:38:25>
string_literal '"QiShare"'	 [LeadingSpace]	Loc=<main.m:38:27 <Spelling=main.m:38:41>>
string_literal '"\"WYW\""'		Loc=<main.m:38:27 <Spelling=<scratch space>:3:1>>
semi ';'		Loc=<main.m:38:57>
identifier 'printf'	 [StartOfLine] [LeadingSpace]	Loc=<main.m:39:10>
l_paren '('		Loc=<main.m:39:16>
string_literal '"member：%s \n"'		Loc=<main.m:39:17>
comma ','		Loc=<main.m:39:33>
identifier 'me'	 [LeadingSpace]	Loc=<main.m:39:35>
r_paren ')'		Loc=<main.m:39:37>
semi ';'		Loc=<main.m:39:38>
identifier 'NSString'	 [StartOfLine] [LeadingSpace]	Loc=<main.m:40:10>
star '*'	 [LeadingSpace]	Loc=<main.m:40:19>
identifier 'member'		Loc=<main.m:40:20>
equal '='	 [LeadingSpace]	Loc=<main.m:40:27>
l_square '['	 [LeadingSpace]	Loc=<main.m:40:29>
identifier 'NSString'		Loc=<main.m:40:30>
identifier 'stringWithCString'	 [LeadingSpace]	Loc=<main.m:40:39>
colon ':'		Loc=<main.m:40:56>
identifier 'me'		Loc=<main.m:40:57>
identifier 'encoding'	 [LeadingSpace]	Loc=<main.m:40:60>
colon ':'		Loc=<main.m:40:68>
identifier 'NSUTF8StringEncoding'		Loc=<main.m:40:69>
r_square ']'		Loc=<main.m:40:89>
semi ';'		Loc=<main.m:40:90>
identifier 'NSLog'	 [StartOfLine] [LeadingSpace]	Loc=<main.m:41:10>
l_paren '('		Loc=<main.m:41:15>
at '@'		Loc=<main.m:41:16>
string_literal '"member：%@"'		Loc=<main.m:41:17>
comma ','		Loc=<main.m:41:30>
identifier 'member'	 [LeadingSpace]	Loc=<main.m:41:32>
r_paren ')'		Loc=<main.m:41:38>
semi ';'		Loc=<main.m:41:39>
r_brace '}'	 [StartOfLine] [LeadingSpace]	Loc=<main.m:49:6>
r_brace '}'	 [StartOfLine]	Loc=<main.m:51:1>
eof ''		Loc=<main.m:51:2>

2.2.1 Summary of lexical analysis

As the output above shows, lexical analysis records every token (keywords, identifiers such as method names, commas, semicolons, parentheses, and so on) together with its line and column number.

2.3 Syntax analysis

Syntax analysis command: clang -fmodules -fsyntax-only -Xclang -ast-dump main.m

// WYW:clang wangyongwangyongwang$ clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
// Syntax analysis output
TranslationUnitDecl 0x7fbeb702da08 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x7fbeb702e2a0 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x7fbeb702dfa0 '__int128'
|-TypedefDecl 0x7fbeb702e308 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x7fbeb702dfc0 'unsigned __int128'
|-TypedefDecl 0x7fbeb702e3a0 <<invalid sloc>> <invalid sloc> implicit SEL 'SEL *'
| `-PointerType 0x7fbeb702e360 'SEL *' imported
|   `-BuiltinType 0x7fbeb702e200 'SEL'
|-TypedefDecl 0x7fbeb702e478 <<invalid sloc>> <invalid sloc> implicit id 'id'
| `-ObjCObjectPointerType 0x7fbeb702e420 'id' imported
|   `-ObjCObjectType 0x7fbeb702e3f0 'id' imported
|-TypedefDecl 0x7fbeb702e558 <<invalid sloc>> <invalid sloc> implicit Class 'Class'
| `-ObjCObjectPointerType 0x7fbeb702e500 'Class' imported
|   `-ObjCObjectType 0x7fbeb702e4d0 'Class' imported
|-ObjCInterfaceDecl 0x7fbeb702e5a8 <<invalid sloc>> <invalid sloc> implicit Protocol
|-TypedefDecl 0x7fbeb702e8e8 <<invalid sloc>> <invalid sloc> implicit __NSConstantString 'struct __NSConstantString_tag'
| `-RecordType 0x7fbeb702e700 'struct __NSConstantString_tag'
|   `-Record 0x7fbeb702e670 '__NSConstantString_tag'
|-TypedefDecl 0x7fbeb702e980 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x7fbeb702e940 'char *'
|   `-BuiltinType 0x7fbeb702daa0 'char'
|-TypedefDecl 0x7fbeb7834868 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'struct __va_list_tag [1]'
| `-ConstantArrayType 0x7fbeb7834810 'struct __va_list_tag [1]' 1
|   `-RecordType 0x7fbeb7834690 'struct __va_list_tag'
|     `-Record 0x7fbeb7834600 '__va_list_tag'
|-ImportDecl 0x7fbeb79ef2a8 <main.m:11:1> col:1 implicit Foundation
|-ImportDecl 0x7fbeb79ef2e0 <line:12:1> col:1 implicit Foundation.NSObjCRuntime
|-FunctionDecl 0x7fbeb79ef580 <line:31:1, line:51:1> line:31:5 main 'int (int, char **)'
| |-ParmVarDecl 0x7fbeb79ef330 <col:10, col:14> col:14 argc 'int'
| |-ParmVarDecl 0x7fbeb79ef440 <col:20, col:32> col:27 argv 'char **':'char **'
| `-CompoundStmt 0x7fbeb88158a8 <col:35, line:51:1>
|   `-ObjCAutoreleasePoolStmt 0x7fbeb8815890 <line:34:6, line:49:6>
|     `-CompoundStmt 0x7fbeb8815858 <line:34:23, line:49:6>
|       |-CallExpr 0x7fbeb7a000d0 <line:36:10, col:38> 'void'
|       | |-ImplicitCastExpr 0x7fbeb7a000b8 <col:10> 'void (*)(id, ...)' <FunctionToPointerDecay>
|       | | `-DeclRefExpr 0x7fbeb79fffa8 <col:10> 'void (id, ...)' Function 0x7fbeb79ef688 'NSLog' 'void (id, ...)'
|       | `-ImplicitCastExpr 0x7fbeb7a000f8 <col:16, col:17> 'id':'id' <BitCast>
|       |   `-ObjCStringLiteral 0x7fbeb7a00038 <col:16, col:17> 'NSString *'
|       |     `-StringLiteral 0x7fbeb7a00008 <col:17> 'char [20]' lvalue "Hello Clang Compile"
|       |-DeclStmt 0x7fbeb7a005d0 <line:38:10, col:57>
|       | `-VarDecl 0x7fbeb7a00128 <col:10, <scratch space>:3:1> main.m:38:22 used me 'const char *' cinit
|       |   `-ImplicitCastExpr 0x7fbeb7a00208 <col:41, <scratch space>:3:1> 'const char *' <NoOp>
|       |     `-ImplicitCastExpr 0x7fbeb7a001f0 <main.m:38:41, <scratch space>:3:1> 'char *' <ArrayToPointerDecay>
|       |       `-StringLiteral 0x7fbeb7a001c8 <main.m:38:41, <scratch space>:3:1> 'char [13]' lvalue "QiShare\"WYW\""
|       |-CallExpr 0x7fbeb7a006f0 <main.m:39:10, col:37> 'int'
|       | |-ImplicitCastExpr 0x7fbeb7a006d8 <col:10> 'int (*)(const char *, ...)' <FunctionToPointerDecay>
|       | | `-DeclRefExpr 0x7fbeb7a005e8 <col:10> 'int (const char *, ...)' Function 0x7fbeb7a00228 'printf' 'int (const char *, ...)'
|       | |-ImplicitCastExpr 0x7fbeb7a00738 <col:17> 'const char *' <NoOp>
|       | | `-ImplicitCastExpr 0x7fbeb7a00720 <col:17> 'char *' <ArrayToPointerDecay>
|       | |   `-StringLiteral 0x7fbeb7a00648 <col:17> 'char [14]' lvalue "member\357\274\232%s \n"
|       | `-ImplicitCastExpr 0x7fbeb7a00750 <col:35> 'const char *' <LValueToRValue>
|       |   `-DeclRefExpr 0x7fbeb7a00670 <col:35> 'const char *' lvalue Var 0x7fbeb7a00128 'me' 'const char *'
|       |-DeclStmt 0x7fbeb88156c8 <line:40:10, col:90>
|       | `-VarDecl 0x7fbeb7a00780 <col:10, col:89> col:20 used member 'NSString *' cinit
|       |   `-ObjCMessageExpr 0x7fbeb8815688 <col:29, col:89> 'NSString * _Nullable':'NSString *' selector=stringWithCString:encoding: class='NSString'
|       |     |-ImplicitCastExpr 0x7fbeb8815670 <col:57> 'const char *' <LValueToRValue>
|       |     | `-DeclRefExpr 0x7fbeb8815000 <col:57> 'const char *' lvalue Var 0x7fbeb7a00128 'me' 'const char *'
|       |     `-DeclRefExpr 0x7fbeb8815330 <col:69> 'NSStringEncoding':'unsigned long' EnumConstant 0x7fbeb8815028 'NSUTF8StringEncoding' 'NSStringEncoding':'unsigned long'
|       `-CallExpr 0x7fbeb88157d0 <line:41:10, col:38> 'void'
|         |-ImplicitCastExpr 0x7fbeb88157b8 <col:10> 'void (*)(id, ...)' <FunctionToPointerDecay>
|         | `-DeclRefExpr 0x7fbeb88156e0 <col:10> 'void (id, ...)' Function 0x7fbeb79ef688 'NSLog' 'void (id, ...)'
|         |-ImplicitCastExpr 0x7fbeb8815800 <col:16, col:17> 'id':'id' <BitCast>
|         | `-ObjCStringLiteral 0x7fbeb8815760 <col:16, col:17> 'NSString *'
|         |   `-StringLiteral 0x7fbeb8815738 <col:17> 'char [12]' lvalue "member\357\274\232%@"
|         `-ImplicitCastExpr 0x7fbeb8815818 <col:32> 'NSString *' <LValueToRValue>
|           `-DeclRefExpr 0x7fbeb8815780 <col:32> 'NSString *' lvalue Var 0x7fbeb7a00780 'member' 'NSString *'
`-<undeserialized declarations>

As the output shows, syntax analysis describes the contents of the file being compiled as a syntax tree, including the start and end lines of the main function, the autorelease pool statement, parameters, string variables, and the line numbers of the function calls.

The author found that parsers are used for:

  • Decompilation
  • Code highlighting
  • Keyword matching
  • Scope determination
  • Code compression

We can deliberately introduce syntax errors to see more concretely what lexical analysis and syntax analysis each do.

2.3.1 Making Syntax Errors 1

With the following error, lexical analysis reports no problem, but syntax analysis reports an error.

int int main(int argc, char * argv[]) {}

Results of lexical analysis

'		Loc=<main.m:12:1>
int 'int'	 [StartOfLine]	Loc=<main.m:31:1>
int 'int'	 [LeadingSpace]	Loc=<main.m:31:5>
identifier 'main'	 [LeadingSpace]	Loc=<main.m:31:9>
l_paren '('		Loc=<main.m:31:13>

Results of syntax analysis

main.m:31:5: error: cannot combine with previous 'int' declaration specifier
int int main(int argc, char * argv[]) {
    ^
TranslationUnitDecl 0x7ff14302c808 <<invalid sloc>> <invalid sloc>

...

|           `-DeclRefExpr 0x7ff143922b80 <col:32> 'NSString *' lvalue Var 0x7ff14391f780 'member' 'NSString *'
`-<undeserialized declarations>
1 error generated.
2.3.2 Making syntax errors 2
void logSomething(NSString *str1, NSString *str2) {
    
    NSLog(@"str1: %@--str2: %@", str1, str2);
}

int main(int argc, char * argv[]) {
    
    // NSString * appDelegateClassName;
     @autoreleasepool {
         logSomething(1, 2);
     }
}
main.m:41:23: warning: incompatible integer to pointer conversion passing 'int'
      to parameter of type 'NSString *' [-Wint-conversion]
         logSomething(1, 2);
                      ^
main.m:31:29: note: passing argument to parameter 'str1' here
void logSomething(NSString *str1, NSString *str2) {
                            ^
main.m:41:26: warning: incompatible integer to pointer conversion passing 'int'
      to parameter of type 'NSString *' [-Wint-conversion]
         logSomething(1, 2);
                         ^
main.m:31:45: note: passing argument to parameter 'str2' here
void logSomething(NSString *str1, NSString *str2) {
                                            ^


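As shown above, a mistake such as two consecutive int keywords is not reported during lexical analysis; the corresponding error appears only during syntax analysis. In the second example, passing integers where NSString * parameters are expected is caught by type checking in the same run and is reported as a warning rather than an error.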
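The lexer accepts int int because each int is a valid token on its own; only a later stage that looks at how tokens combine can object. The following C sketch illustrates that idea with a deliberately tiny rule: at most one base type keyword among the leading specifiers of a declaration. It is an assumption-laden toy, not Clang's actual rule set (real C allows combinations such as unsigned int or long long, which are not modeled here), and the names is_base_type and find_specifier_conflict are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if tok is one of the base type keywords this sketch knows. */
static int is_base_type(const char *tok) {
    static const char *const kBaseTypes[] = {
        "void", "char", "int", "float", "double"
    };
    for (size_t i = 0; i < sizeof kBaseTypes / sizeof kBaseTypes[0]; i++) {
        if (strcmp(tok, kBaseTypes[i]) == 0) return 1;
    }
    return 0;
}

/* Walks the leading specifiers of a declaration (given as an array of token
   spellings) and returns the index of the first base type keyword that
   follows another base type keyword, or -1 if there is no such conflict. */
int find_specifier_conflict(const char *const *toks, size_t n) {
    int seen_base_type = 0;
    for (size_t i = 0; i < n; i++) {
        if (!is_base_type(toks[i])) break;   /* specifiers end at the declarator */
        if (seen_base_type) return (int)i;   /* second base type: conflict */
        seen_base_type = 1;
    }
    return -1;
}
```

For the token sequence int, int, main, ( the function returns index 1, the second int, which is also the token that Clang's diagnostic points at (main.m:31:5).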

2.4 Generate intermediate code

To generate intermediate code, run the following command: clang -S -fobjc-arc -emit-llvm main.m -o main.ll. You can then use the cat command to view the contents of the intermediate code.

2.4.1 Abridged intermediate code

The intermediate code below records the source file name and the target system. It contains constant strings as well as type definitions for caches, instance variable lists, method lists, property lists, and so on. I do not know much about intermediate code, so I will not analyze it further.

WYW:clang wangyongwangyongwang$ clang -S -fobjc-arc -emit-llvm main.m -o main.ll
WYW:clang wangyongwangyongwang$ cat main.ll
; ModuleID = 'main.m'
source_filename = "main.m"
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.15.0"

// Constant string
%0 = type opaque
%struct.__NSConstantString_tag = type { i32*, i32, i8*, i64 }
%struct._class_t = type { %struct._class_t*, %struct._class_t*, %struct._objc_cache*, i8* (i8*, i8*)**, %struct._class_ro_t* }
%struct._objc_cache = type opaque
%struct._class_ro_t = type { i32, i32, i32, i8*, i8*, %struct.__method_list_t*, %struct._objc_protocol_list*, %struct._ivar_list_t*, i8*, %struct._prop_list_t* }
%struct.__method_list_t = type { i32, i32, [0 x %struct._objc_method] }
%struct._objc_method = type { i8*, i8*, i8* }
%struct._objc_protocol_list = type { i64, [0 x %struct._protocol_t*] }
%struct._protocol_t = type { i8*, i8*, %struct._objc_protocol_list*, %struct.__method_list_t*, %struct.__method_list_t*, %struct.__method_list_t*, %struct.__method_list_t*, %struct._prop_list_t*, i32, i32, i8**, i8*, %struct._prop_list_t* }
%struct._ivar_list_t = type { i32, i32, [0 x %struct._ivar_t] }
%struct._ivar_t = type { i64*, i8*, i8*, i32, i32 }
%struct._prop_list_t = type { i32, i32, [0 x %struct._prop_t] }
%struct._prop_t = type { i8*, i8* }

@__CFConstantStringClassReference = external global [0 x i32]
@.str = private unnamed_addr constant [20 x i8] c"Hello Clang Compile\00", section "__TEXT,__cstring,cstring_literals", align 1
@_unnamed_cfstring_ = private global %struct.__NSConstantString_tag { i32* getelementptr inbounds ([0 x i32], [0 x i32]* @__CFConstantStringClassReference, i32 0, i32 0), i32 1992, i8* getelementptr inbounds ([20 x i8], [20 x i8]* @.str, i32 0, i32 0), i64 19 }, section "__DATA,__cfstring", align 8
@.str.1 = private unnamed_addr constant [13 x i8] c"QiShare\22WYW\22\00", align 1
@.str.2 = private unnamed_addr constant [14 x i8] c"member\EF\BC\9A%s \0A\00", align 1

; Function Attrs: nounwind
declare i8* @llvm.objc.autoreleasePoolPush() #1

declare void @NSLog(i8*, ...) #2

declare i32 @printf(i8*, ...) #2

; Function Attrs: nonlazybind
declare i8* @objc_msgSend(i8*, i8*, ...) #3

; Function Attrs: nounwind
declare i8* @llvm.objc.retainAutoreleasedReturnValue(i8*) #1

; Function Attrs: nounwind
declare void @llvm.objc.storeStrong(i8**, i8*) #1

; Function Attrs: nounwind
declare void @llvm.objc.autoreleasePoolPop(i8*) #1

!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 10, i32 15]}
!1 = !{i32 1, !"Objective-C Version", i32 2}
!2 = !{i32 1, !"Objective-C Image Info Version", i32 0}
!3 = !{i32 1, !"Objective-C Image Info Section", !"__DATA,__objc_imageinfo,regular,no_dead_strip"}
!4 = !{i32 4, !"Objective-C Garbage Collection", i32 0}
!5 = !{i32 1, !"Objective-C Class Properties", i32 64}
!6 = !{i32 1, !"wchar_size", i32 4}
!7 = !{i32 7, !"PIC Level", i32 2}
!8 = !{!"Apple clang version 11.0.0 (clang-1100.0.33.17)"}
!9 = !{}
Generating the assembly file

To generate the assembly file, run clang -S main.m -o main_compile.s, then view it with cat main_compile.s.

Abridged version of the assembly file

The assembly listing below was annotated by the author with reference to online materials.

// WYW:clang wangyongwangyongwang$ clang -S main.m -o main_compile.s
// WYW:clang wangyongwangyongwang$ cat main_compile.s

// Assembler directives
// The .section directive specifies which section the following code goes into.
	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 15	sdk_version 10, 15
// The .globl directive marks _main as an external symbol. This is our main() function. It has to be visible outside the binary because the system calls it to run the executable.
	.globl	_main                   ## -- Begin function main
// The .p2align directive sets the alignment of the code that follows. Here, the following code is aligned to 16 (2^4) bytes and padded with 0x90 if necessary.
	.p2align	4, 0x90
// Start of the main function:
_main:                                  ## @main
// The .cfi_startproc directive is usually used at the beginning of a function. CFI stands for Call Frame Information. A call frame corresponds loosely to a function. The directive is paired with the .cfi_endproc that follows later, which marks where main() ends.
	.cfi_startproc
## %bb.0:
// On OS X, this is x86_64 code. For this architecture there is an ABI (Application Binary Interface) that specifies how function calls work at the assembly level. The ABI requires the rbp register (the base pointer register) to be preserved across a function call. It is the responsibility of main to make sure that rbp holds the same value as before when the function returns. pushq %rbp pushes the value of rbp onto the stack so that it can be popped later.
	pushq	%rbp
// Next are two CFI directives. They output information used for call stack unwinding and debugging. We have changed the stack and the base pointer, and these two directives record where things are, or more precisely, they make sure the debugger can find them later.
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
// Next, movq %rsp, %rbp lets us place local variables on the stack. subq $64, %rsp moves the stack pointer by 64 bytes, which the function can then use. We first store the old stack pointer in rbp and use it as the base address for our local variables, then we move the stack pointer past the part we are going to use.
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	subq	$64, %rsp
	movl	%edi, -4(%rbp)
	movq	%rsi, -16(%rbp)
// Call objc_autoreleasePoolPush
	callq	_objc_autoreleasePoolPush
	leaq	L__unnamed_cfstring_(%rip), %rsi
	movq	%rsi, %rdi
	movq	%rax, -40(%rbp)         ## 8-byte Spill
	movb	$0, %al
	callq	_NSLog
	leaq	L_.str.1(%rip), %rsi
	movq	%rsi, -24(%rbp)
	movq	-24(%rbp), %rsi
	leaq	L_.str.2(%rip), %rdi
	movb	$0, %al
// Call printf
	callq	_printf
	movq	L_OBJC_CLASSLIST_REFERENCES_$_(%rip), %rsi
	movq	-24(%rbp), %rdx
	movq	L_OBJC_SELECTOR_REFERENCES_(%rip), %rdi
	movq	%rdi, -48(%rbp)         ## 8-byte Spill
	movq	%rsi, %rdi
	movq	-48(%rbp), %rsi         ## 8-byte Reload
	movl	$4, %ecx
	movl	%eax, -52(%rbp)         ## 4-byte Spill
	callq	*_objc_msgSend@GOTPCREL(%rip)
	leaq	L__unnamed_cfstring_.4(%rip), %rcx
	movq	%rax, -32(%rbp)
	movq	-32(%rbp), %rsi
	movq	%rcx, %rdi
	movb	$0, %al
	callq	_NSLog
	movq	-40(%rbp), %rdi         ## 8-byte Reload
	callq	_objc_autoreleasePoolPop
	xorl	%eax, %eax
	addq	$64, %rsp
	popq	%rbp
	retq
	.cfi_endproc
                                        ## -- End function
	.section	__TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
	.asciz	"Hello Clang Compile"
2.5 Generating an object file

The command to generate the object file is clang -fmodules -c main.m -o main_obj.o. You can inspect the object file with objdump: run objdump -t main_obj.o or objdump --syms main_obj.o to view its symbol table. For more objdump commands, see 14.objdump Binary File Analysis.

// clang -fmodules -c main.m -o main_obj.o
// WYW:clang wangyongwangyongwang$ objdump -t main_obj.o

main_obj.o:	file format Mach-O 64-bit x86-64

SYMBOL TABLE:
0000000000000140 l     O __TEXT,__ustring	l_.str.3
0000000000000000 g     F __TEXT,__text	_main
0000000000000000         *UND*	_NSLog
0000000000000000         *UND*	_OBJC_CLASS_$_NSString
0000000000000000         *UND*	___CFConstantStringClassReference
0000000000000000         *UND*	_objc_autoreleasePoolPop
0000000000000000         *UND*	_objc_autoreleasePoolPush
0000000000000000         *UND*	_objc_msgSend
0000000000000000         *UND*	_printf
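The flags in the symbol table can be related back to source code. The C sketch below is a hypothetical file (the names add_one, qi_answer, and qi_print_answer are made up for illustration) showing the three situations seen in the listing: a symbol defined with internal linkage, a symbol defined with external linkage, and a symbol that is only referenced. The exact symbols a given toolchain emits may differ, so treat the comments as the typical case rather than a guarantee.

```c
#include <stdio.h>

/* Defined here with internal linkage: typically a local ("l") symbol,
   or absent entirely if the compiler inlines the function. */
static int add_one(int x) {
    return x + 1;
}

/* Defined here with external linkage: a global ("g") function ("F")
   symbol, like _main in the listing above. */
int qi_answer(void) {
    return add_one(41);
}

/* printf is declared in <stdio.h> but not defined in this file, so the
   object file records it as an undefined (*UND*) symbol that the linker
   must resolve, just like _printf and _NSLog above. */
int qi_print_answer(void) {
    return printf("%d\n", qi_answer());
}
```

Compiling such a file with clang -c and running objdump -t on the result is one way to check how each definition shows up.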

Most of the following is taken from the article Mach-O Executables.

2.5.1 Viewing the sections of the object file

An executable contains multiple sections. Different parts of the executable go into different sections, and each section in turn belongs to a segment. This holds for every executable.

Let's look at the sections of main_obj.o. We can use the size tool to inspect them:

xcrun size -x -l -m main_obj.o

WYW:clang wangyongwangyongwang$ xcrun size -x -l -m main_obj.o
Segment : 0x1c0 (vmaddr 0x0 fileoff 1488)
	Section (__TEXT, __text): 0x9b (addr 0x0 offset 1488)
	Section (__TEXT, __cstring): 0x2f (addr 0x9b offset 1643)
	Section (__DATA, __cfstring): 0x40 (addr 0xd0 offset 1696)
	Section (__DATA, __objc_classrefs): 0x8 (addr 0x110 offset 1760)
	Section (__TEXT, __objc_methname): 0x1c (addr 0x118 offset 1768)
	Section (__DATA, __objc_selrefs): 0x8 (addr 0x138 offset 1800)
	Section (__TEXT, __ustring): 0x14 (addr 0x140 offset 1808)
	Section (__DATA, __objc_imageinfo): 0x8 (addr 0x154 offset 1828)
	Section (__LD, __compact_unwind): 0x20 (addr 0x160 offset 1840)
	Section (__TEXT, __eh_frame): 0x40 (addr 0x180 offset 1872)
	total 0x1b2
total 0x1c0

The __TEXT segment contains the code to be executed. It is mapped as read-only and executable: the process may execute the code but not modify it. Because the code cannot change itself, the mapped pages never become dirty.

The __DATA segment is mapped as read/write but not executable. It contains data that can change at run time.
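To see how the sections listed by size group into segments, the following Python sketch sums the section sizes per segment name. The text is copied from the size output above; SECTION_RE and bytes_per_segment are names I made up for this illustration.

```python
import re
from collections import defaultdict

# Section lines copied from the `xcrun size -x -l -m main_obj.o` output above.
SIZE_OUTPUT = """\
Section (__TEXT, __text): 0x9b (addr 0x0 offset 1488)
Section (__TEXT, __cstring): 0x2f (addr 0x9b offset 1643)
Section (__DATA, __cfstring): 0x40 (addr 0xd0 offset 1696)
Section (__DATA, __objc_classrefs): 0x8 (addr 0x110 offset 1760)
Section (__TEXT, __objc_methname): 0x1c (addr 0x118 offset 1768)
Section (__DATA, __objc_selrefs): 0x8 (addr 0x138 offset 1800)
Section (__TEXT, __ustring): 0x14 (addr 0x140 offset 1808)
Section (__DATA, __objc_imageinfo): 0x8 (addr 0x154 offset 1828)
Section (__LD, __compact_unwind): 0x20 (addr 0x160 offset 1840)
Section (__TEXT, __eh_frame): 0x40 (addr 0x180 offset 1872)
"""

# Captures: segment name, section name, section size in hex.
SECTION_RE = re.compile(r"Section \((\w+), (\w+)\): (0x[0-9a-f]+)")


def bytes_per_segment(text):
    """Sum the section sizes for each segment name."""
    totals = defaultdict(int)
    for segment, _section, size in SECTION_RE.findall(text):
        totals[segment] += int(size, 16)
    return dict(totals)


totals = bytes_per_segment(SIZE_OUTPUT)
for segment, size in sorted(totals.items()):
    print(f"{segment}: {size:#x}")
print(f"all sections: {sum(totals.values()):#x}")
```

The sum over all sections matches the inner "total 0x1b2" line of the size output; the outer total 0x1c0 is larger because it is the segment size, which also counts alignment padding between sections.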

We can use otool(1) to view the contents of a section:

xcrun otool -s __TEXT __text main_obj.o

WYW:clang wangyongwangyongwang$ xcrun otool -s __TEXT __text main_obj.o
main_obj.o:
Contents of (__TEXT,__text) section
0000000000000000	55 48 89 e5 48 83 ec 40 89 7d fc 48 89 75 f0 e8
0000000000000010	00 00 00 00 48 8d 35 b5 00 00 00 48 89 f7 48 89
0000000000000020	45 d8 b0 00 e8 00 00 00 00 48 8d 35 7f 00 00 00
0000000000000030	48 89 75 e8 48 8b 75 e8 48 8d 3d 7d 00 00 00 b0
0000000000000040	00 e8 00 00 00 00 48 8b 35 c3 00 00 00 48 8b 55
0000000000000050	e8 48 8b 3d e0 00 00 00 48 89 7d d0 48 89 f7 48
0000000000000060	8b 75 d0 b9 04 00 00 00 89 45 cc ff 15 00 00 00
0000000000000070	00 48 8d 0d 78 00 00 00 48 89 45 e0 48 8b 75 e0
0000000000000080	48 89 cf b0 00 e8 00 00 00 00 48 8b 7d d8 e8 00
0000000000000090	00 00 00 31 c0 48 83 c4 40 5d c3
2.5.2 Disassembly

xcrun otool -v -t main_obj.o

WYW:clang wangyongwangyongwang$ xcrun otool -v -t main_obj.o
main_obj.o:
(__TEXT,__text) section
_main:
0000000000000000	pushq	%rbp
0000000000000001	movq	%rsp, %rbp
0000000000000004	subq	$0x40, %rsp
0000000000000008	movl	%edi, -0x4(%rbp)
000000000000000b	movq	%rsi, -0x10(%rbp)
000000000000000f	callq	0x14
0000000000000014	leaq	0xb5(%rip), %rsi
000000000000001b	movq	%rsi, %rdi
000000000000001e	movq	%rax, -0x28(%rbp)
0000000000000022	movb	$0x0, %al
0000000000000024	callq	0x29
0000000000000029	leaq	0x7f(%rip), %rsi
0000000000000030	movq	%rsi, -0x18(%rbp)
0000000000000034	movq	-0x18(%rbp), %rsi
0000000000000038	leaq	0x7d(%rip), %rdi
000000000000003f	movb	$0x0, %al
0000000000000041	callq	0x46
0000000000000046	movq	0xc3(%rip), %rsi
000000000000004d	movq	-0x18(%rbp), %rdx
0000000000000051	movq	0xe0(%rip), %rdi
0000000000000058	movq	%rdi, -0x30(%rbp)
000000000000005c	movq	%rsi, %rdi
000000000000005f	movq	-0x30(%rbp), %rsi
0000000000000063	movl	$0x4, %ecx
0000000000000068	movl	%eax, -0x34(%rbp)
000000000000006b	callq	*(%rip)
0000000000000071	leaq	0x78(%rip), %rcx
0000000000000078	movq	%rax, -0x20(%rbp)
000000000000007c	movq	-0x20(%rbp), %rsi
0000000000000080	movq	%rcx, %rdi
0000000000000083	movb	$0x0, %al
0000000000000085	callq	0x8a
000000000000008a	movq	-0x28(%rbp), %rdi
000000000000008e	callq	0x93
0000000000000093	xorl	%eax, %eax
0000000000000095	addq	$0x40, %rsp
0000000000000099	popq	%rbp
000000000000009a	retq

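The displacements in instructions such as leaq 0xb5(%rip), %rsi are relative to the address of the next instruction. The Python sketch below resolves each one and looks up which section it lands in, using the addresses and sizes from the size output earlier. The instruction lengths are read off the gaps between addresses in the disassembly; resolve and section_of are hypothetical helpers. This only describes the object file's own address space, since the linker may rewrite these values later.

```python
# (instruction address, instruction length, displacement) for the
# RIP-relative leaq/movq instructions in the disassembly above.
RIP_RELATIVE = [
    (0x14, 7, 0xB5),
    (0x29, 7, 0x7F),
    (0x38, 7, 0x7D),
    (0x46, 7, 0xC3),
    (0x51, 7, 0xE0),
    (0x71, 7, 0x78),
]

# (section, start address, size) taken from the `size -x -l -m` output.
SECTIONS = [
    ("__text", 0x0, 0x9B),
    ("__cstring", 0x9B, 0x2F),
    ("__cfstring", 0xD0, 0x40),
    ("__objc_classrefs", 0x110, 0x8),
    ("__objc_methname", 0x118, 0x1C),
    ("__objc_selrefs", 0x138, 0x8),
    ("__ustring", 0x140, 0x14),
]


def resolve(address, length, displacement):
    """Target = address of the next instruction + displacement."""
    return address + length + displacement


def section_of(target):
    """Name of the section whose address range contains target, or None."""
    for name, start, size in SECTIONS:
        if start <= target < start + size:
            return name
    return None


for address, length, displacement in RIP_RELATIVE:
    target = resolve(address, length, displacement)
    print(f"{address:#04x}: target {target:#x} in {section_of(target)}")
```

Read this way, the first leaq points at the start of __cfstring (the constant string passed to NSLog), and the two movq loads point at __objc_classrefs and __objc_selrefs, which matches the symbolic names in the assembly file.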

otool(1) can also be used to view the contents of the __cstring section:

WYW:clang wangyongwangyongwang$ xcrun otool -s __TEXT __cstring main_obj.o
main_obj.o:
Contents of (__TEXT,__cstring) section
000000000000009b	48 65 6c 6c 6f 20 43 6c 61 6e 67 20 43 6f 6d 70 
00000000000000ab	69 6c 65 00 51 69 53 68 61 72 65 22 57 59 57 22 
00000000000000bb	00 6d 65 6d 62 65 72 ef bc 9a 25 73 20 0a 00 
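The bytes in __cstring are simply NUL-terminated C strings. As a small sketch, the Python below decodes the hex dump above back into text; it assumes the strings are UTF-8, and c_strings is a name of my own choosing.

```python
# Byte columns copied from the `otool -s __TEXT __cstring` output above.
HEX_DUMP = """\
48 65 6c 6c 6f 20 43 6c 61 6e 67 20 43 6f 6d 70
69 6c 65 00 51 69 53 68 61 72 65 22 57 59 57 22
00 6d 65 6d 62 65 72 ef bc 9a 25 73 20 0a 00
"""


def to_bytes(hex_dump):
    """Turn whitespace-separated hex pairs into a bytes object."""
    return bytes.fromhex("".join(hex_dump.split()))


def c_strings(raw):
    """Split raw section bytes on NUL and decode each piece as UTF-8."""
    return [chunk.decode("utf-8") for chunk in raw.split(b"\x00") if chunk]


raw = to_bytes(HEX_DUMP)
for text in c_strings(raw):
    print(repr(text))
```

The three bytes ef bc 9a decode to the full-width colon U+FF1A, which is why the printf format string contains a non-ASCII character.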
2.5.3 Disassemble the object file

Disassembling the object file lets us check whether the result matches the assembly file that was used to generate it. The figure shows that the disassembled output is basically the same as the assembly generated earlier; only some assembler directives are missing from the disassembly.
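The same comparison can be sketched in a few lines of Python. The excerpt below takes the first instructions of main from both listings, drops directives and comments, rewrites hex literals as decimal, and then compares what is left. The normalize helper is hypothetical, and this only covers the excerpt, not a general assembly diff.

```python
import re

# First instructions of main from the assembly file generated earlier.
ASSEMBLY = """\
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $64, %rsp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
"""

# The same instructions as printed by `otool -v -t main_obj.o`.
DISASSEMBLY = """\
0000000000000001    movq    %rsp, %rbp
0000000000000004    subq    $0x40, %rsp
0000000000000008    movl    %edi, -0x4(%rbp)
000000000000000b    movq    %rsi, -0x10(%rbp)
"""


def normalize(listing):
    """Reduce a listing to comparable 'mnemonic operands' strings."""
    result = []
    for line in listing.splitlines():
        line = line.split("##")[0].strip()  # drop trailing comments
        # Drop the 16-digit address column that otool prints.
        line = re.sub(r"^[0-9a-f]{16}\s+", "", line)
        if not line or line.startswith("."):
            continue  # blank line or assembler directive
        # Rewrite hex literals as decimal so $0x40 and $64 compare equal.
        line = re.sub(r"0x[0-9a-f]+", lambda m: str(int(m.group(), 16)), line)
        result.append(" ".join(line.split()))
    return result


print(normalize(ASSEMBLY) == normalize(DISASSEMBLY))
```

Once the .cfi directive is ignored, the two excerpts agree instruction for instruction.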

2.6 Generating and running the executable file
// Generate the executable file
WYW:clang wangyongwangyongwang$ clang main_obj.o -o mainExec

// Execute the executable file
WYW:clang wangyongwangyongwang$ ./mainExec
2020-04-01 22:12:15.766 mainExec[84398:2934623] Hello Clang Compile
member: QiShare"WYW" 
2020-04-01 22:12:15.767 mainExec[84398:2934623] member: QiShare"WYW"

4. References

  • The Dragon Book and other material on compiler principles
  • The Architecture of Open Source Applications: LLVM
  • Mach-O executable file
  • Compiler
  • Chapter 13: Syntax analysis with Bison
  • Chapter 6: Compilation and debugging tools (GCC, GDB)
  • Understand LLVM in simple terms
  • 6-1 The compiler
  • Learning common clang commands
  • Introduction to common clang syntax
  • Common Linux GCC commands
  • Object file format analysis tools: objdump, nm, ar
  • Viewing the symbol list of .o and .obj files: the powerful nm command
  • Mach-O, the executable file format of Apple operating systems
  • What happens after pressing ⌘ + R
  • Online syntax tree viewer
