Preface

Today I added an Objective-C file to a Swift project, and Xcode prompted me to create a bridging header. Why is a bridging header needed, and how does it work?

A Baidu search turns up plenty of tutorials on how to create a bridging header, but none of them seem to answer that question.

LLVM: Low Level Virtual Machine
Clang: C Language Family Frontend for LLVM

Compiler exploration

  • GCC

The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as libraries for these languages (e.g. libstdc++, libgcj, and so on).

Early Objective-C programmers all worked with GCC. So why did Apple abandon GCC and build its own compiler?

1. GCC's Objective-C front end was hard to work with: it was not maintained by Apple, so whenever Apple wanted to add syntax or diagnostics, it had to ask the GCC maintainers to do it.

2. GCC offered weak support for plug-ins, tools, and IDEs: features such as auto-completion, code hints, warnings, and static analysis require the IDE to call into the compiler and get the results back through a plug-in-style interface, and GCC did not do this well.

3. GCC's compile speed and output quality fell short: when Apple's Clang came out, it reportedly compiled about three times as fast as GCC and produced smaller binaries.

4. Apple wanted control of the toolchain back (LLDB, LLD…): Apple first moved from GCC to a GCC front end on an LLVM back end, then to Clang + LLVM, and then replaced GDB. Step by step it took back control of the compiler toolchain and laid the foundation for Swift.

  • Three-phase compiler architecture

The simplest compiler architecture has three phases.

The source code first goes into the compiler's front end. When the front end is done, the result passes to the optimizer, and after optimization to the back end, which produces machine code for the target architecture (x86, ARMv7, and so on).

But there’s a problem:

M (Language) * N (Target) = M * N (Compilers)

If you have M languages (C, C++, Objective-C…) and N architectures (armv7, armv7s, arm64, i386, x86_64…), you end up maintaining M × N compilers, which is obviously unreasonable.

  • Apple's Clang/Swift + LLVM compiler architecture

The optimizer is shared. Each language has its own front end, so a new language only needs a new front-end module, and a new device with a different architecture only needs a new back-end module. The changes stay small and no work is duplicated: M + N components instead of M × N compilers.
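The arithmetic behind that argument can be checked with a few lines of Python. This is only a counting sketch: the language and target lists are the examples named above plus Swift, and the function names are made up for illustration.

```python
from itertools import product

LANGUAGES = ["C", "C++", "Objective-C", "Swift"]
TARGETS = ["armv7", "armv7s", "arm64", "i386", "x86_64"]


def monolithic_compilers(languages, targets):
    """One dedicated compiler for every (language, target) pair."""
    return [f"{lang}->{target}" for lang, target in product(languages, targets)]


def shared_ir_components(languages, targets):
    """One front end per language and one back end per target, joined by a common IR."""
    frontends = [f"{lang}->IR" for lang in languages]
    backends = [f"IR->{target}" for target in targets]
    return frontends + backends


monolithic = monolithic_compilers(LANGUAGES, TARGETS)
shared = shared_ir_components(LANGUAGES, TARGETS)
print(len(monolithic))  # 4 * 5 = 20
print(len(shared))      # 4 + 5 = 9
```

Adding a fifth language grows the first list by five entries and the second by one, which is the whole point of the shared middle layer.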

In detail:

Blue: the front end for the C language family, which is Clang.

Green: the Swift front end, which also includes its own intermediate language, SIL, and the optimization passes that run on it.

Purple: the shared optimization phase and back-end modules, which are LLVM.

  • Code size

Clang + LLVM together come to about 4 million lines of code, of which the main body, about 2.35 million lines, is written in C++. If every target, library, and so on is built, the output is about 20 GB.

The Swift front end is much smaller, around 430,000 lines. Most of the code seems to sit on the back-end side, in things like optimization strategies and machine code generation.

  • Clang command

Clang is conceptually a compiler front end, but on the command line it also acts as a "black box" driver.

It wraps the compilation pipeline, the front-end commands, the LLVM commands, the toolchain commands, and so on.

Its command line is compatible with GCC's, which makes migration easy.

When we click Run in Xcode, it is the Clang driver that gets invoked for each source file.

  • Breaking down the compilation process

Take the following main.m as the example:

#import <Foundation/Foundation.h>

int main() {
	@autoreleasepool {
		id obj = [NSObject new];
		NSLog(@"Hello world: %@", obj);
	}
	return 0;
}

1. Preprocessing

This step pulls in the headers named by #import and #include, expands macros, and handles the other directives that begin with '#', such as #if and #elif.
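To make macro expansion concrete, here is a deliberately tiny model of it in Python. It is a sketch under heavy assumptions: it only understands object-like `#define NAME value` lines and whole-word substitution, and the function name `preprocess` is invented for this example. The real preprocessor also handles includes, conditionals, function-like macros, and much more.

```python
import re


def preprocess(source: str) -> str:
    """Expand simple object-like #define macros; every other line passes through."""
    macros = {}
    output = []
    for line in source.splitlines():
        match = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if match:
            # Record the macro and drop the directive from the output.
            macros[match.group(1)] = match.group(2).strip()
            continue

        def expand(word):
            # Replace a whole word only if it names a known macro.
            return macros.get(word.group(0), word.group(0))

        output.append(re.sub(r"\b\w+\b", expand, line))
    return "\n".join(output)


source = "#define MAX_COUNT 10\nint n = MAX_COUNT;"
result = preprocess(source)
print(result)  # int n = 10;
```

The output contains no trace of the directive itself, which matches what the preprocessed file looks like: plain source text with everything already substituted.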

Terminal input:

$ clang -E main.m

This runs only the preprocessing step and stops there.

The output shows that a single header import expands into a great many lines of code, which is the problem PCH (precompiled header) files were meant to solve. Apple used to provide a PCH file to speed up compilation by precompiling headers that practically never change, such as Foundation or UIKit. But developers filled it with all sorts of macros and header imports of their own, which made compilation slow instead. In the end Apple removed the file, leaving developers to create one themselves if they want it. What Apple offers in its place is modules, which can be enabled with the following flag:

$ clang -E -fmodules main.m

With modules, frameworks are packaged as precompiled modules instead of being textually included. The option is enabled by default in Xcode's Build Settings, so we can write @import Foundation; directly.

2. Lexical analysis

Lexical analysis, also called lexing or tokenization, converts the preprocessed source text into a stream of tokens. It does not validate semantics.

You can enter the following command on the terminal:

$ clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m

The output lists every token, one per line, with its kind, its spelling, and its source location.
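A toy tokenizer shows what "text in, token stream out, no semantic checks" means in practice. This is a sketch, not Clang's lexer: the token kinds `NUMBER`, `IDENT`, and `PUNCT` are simplified labels chosen for this example, and it knows nothing about keywords, strings, or comments.

```python
import re

# Each entry is (token kind, regular expression); order matters.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("PUNCT", r"[=;(){}+\-*/]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, spelling) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("int a = 123;")
print(tokens)

# Nonsense code still tokenizes: the lexer does not check meaning.
nonsense = tokenize("123 = int;")
print(nonsense)
```

The second call succeeds even though `123 = int;` is meaningless, because rejecting it is the job of the later parsing and semantic analysis stages.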

3. Syntax analysis

Syntax analysis: in Clang, two modules, Parser and Sema, work together to verify that the syntax is correct and to report useful diagnostics. This is where Clang outshines GCC: its diagnostics are much friendlier.

Semantic nodes are generated from the grammar, and all the nodes are combined into an abstract syntax tree (AST).

Enter the command:

$ clang -fmodules -fsyntax-only -Xclang -ast-dump main.m

The output is the AST, printed as an indented tree of nodes.

The syntax tree keeps enough information that the source code can be written back out from it.
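Both ideas, building a tree from tokens and writing source back out of the tree, fit in a small recursive-descent sketch. It handles only `+`, `*`, parentheses, integers, and names. The node labels `BinaryOperator`, `IntegerLiteral`, and `DeclRefExpr` are borrowed from Clang's AST dump for flavor, but the tuple layout and the `Parser` class are assumptions made for this example.

```python
import re


def tokenize(text):
    # Minimal lexer for this sketch; characters it does not know are skipped.
    return re.findall(r"\d+|[A-Za-z_]\w*|[+*()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        node = self.parse_sum()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    def parse_sum(self):
        node = self.parse_product()
        while self.peek() == "+":
            self.advance()
            node = ("BinaryOperator", "+", node, self.parse_product())
        return node

    def parse_product(self):
        node = self.parse_primary()
        while self.peek() == "*":
            self.advance()
            node = ("BinaryOperator", "*", node, self.parse_primary())
        return node

    def parse_primary(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return ("IntegerLiteral", int(token))
        if token == "(":
            node = self.parse_sum()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token[0].isalpha() or token[0] == "_":
            return ("DeclRefExpr", token)
        raise SyntaxError(f"unexpected token {token!r}")


def to_source(node):
    """Write the tree back out as fully parenthesized source text."""
    if node[0] == "IntegerLiteral":
        return str(node[1])
    if node[0] == "DeclRefExpr":
        return node[1]
    _, op, lhs, rhs = node
    return f"({to_source(lhs)} {op} {to_source(rhs)})"


ast = Parser(tokenize("a + 2 * 3")).parse()
print(ast)
print(to_source(ast))  # (a + (2 * 3))
```

Feeding the parser an incomplete expression such as `a +` raises `SyntaxError`, which is the toy equivalent of the diagnostics Parser and Sema produce.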

4. Static analysis (optional)

Static analysis works on the syntax tree to find errors that are not syntax errors. It simulates the code's execution paths, analyzes the control-flow graph (CFG), and runs a set of built-in checkers for common problems.

In Xcode, this is what Product > Analyze runs.
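As a rough illustration of path simulation over a CFG, the sketch below walks every path through a small hand-written graph and reports variables that are read before being assigned on some path. Everything here is hypothetical: the block names, the `("def", var)` / `("use", var)` statement encoding, and the checker itself. It also assumes the graph has no loops, which a real analyzer cannot assume.

```python
# A hand-written CFG for code shaped like:
#   x = ...; if (cond) { y = ...; } use(x); use(y);
CFG = {
    "entry": {"stmts": [("def", "x")], "succ": ["then", "else"]},
    "then": {"stmts": [("def", "y")], "succ": ["exit"]},
    "else": {"stmts": [], "succ": ["exit"]},
    "exit": {"stmts": [("use", "x"), ("use", "y")], "succ": []},
}


def find_uninitialized_uses(cfg, block="entry", defined=frozenset(), path=()):
    """Follow every path from `block`; report (variable, path) for reads before writes."""
    path = path + (block,)
    warnings = []
    for action, var in cfg[block]["stmts"]:
        if action == "def":
            defined = defined | {var}
        elif var not in defined:
            warnings.append((var, path))
    for successor in cfg[block]["succ"]:
        warnings += find_uninitialized_uses(cfg, successor, defined, path)
    return warnings


warnings = find_uninitialized_uses(CFG)
print(warnings)  # [('y', ('entry', 'else', 'exit'))]
```

The code is syntactically fine on both branches; the problem only shows up on the path that skips the assignment, which is exactly the kind of error a parser cannot see.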

5. CodeGen: IR code generation

CodeGen traverses the syntax tree from top to bottom and translates it into LLVM IR. LLVM IR is the output of the front end and the input of the LLVM back end, which makes it the bridge language between the two.
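The tree-to-IR translation can be sketched as a recursive walk that emits one instruction per operation. The instruction text below only imitates the look of LLVM IR (temporaries named `%t1`, opcodes `add` and `mul`); it is not valid LLVM IR, and the `lower` function and tuple-based tree are assumptions made for this example.

```python
def lower(node, code):
    """Emit instructions for `node` into `code`; return the name that holds its value."""
    kind = node[0]
    if kind == "IntegerLiteral":
        return str(node[1])  # constants are used directly as operands
    if kind == "DeclRefExpr":
        tmp = f"%t{len(code) + 1}"
        code.append(f"{tmp} = load {node[1]}")
        return tmp
    _, op, lhs, rhs = node
    left = lower(lhs, code)   # children are lowered before their parent
    right = lower(rhs, code)
    opcode = {"+": "add", "*": "mul"}[op]
    tmp = f"%t{len(code) + 1}"
    code.append(f"{tmp} = {opcode} {left}, {right}")
    return tmp


# Tree for the expression a + 2 * 3
tree = ("BinaryOperator", "+",
        ("DeclRefExpr", "a"),
        ("BinaryOperator", "*", ("IntegerLiteral", 2), ("IntegerLiteral", 3)))

code = []
result = lower(tree, code)
print("\n".join(code))
```

Because the back end only ever sees this flat instruction list, it does not need to know which source language the tree came from.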

Bridging with the Objective-C runtime:

① Generate the Class/Meta Class/Protocol/Category memory structures and place them in the designated sections (e.g. Class: __DATA, __objc_classrefs)

② Generate the Method/Ivar/Property memory structures

③ Build method_list/ivar_list/property_list and fill them into the Class

④ Non-Fragile ABI: synthesize an OBJC_IVAR_$_ offset constant for each Ivar

⑤ Convert Ivar access statements (ivar = 123; int a = ivar;) into accesses at base + OBJC_IVAR_$_

⑥ Translate ObjCMessageExpr into objc_msgSend, and calls to super into objc_msgSendSuper

⑦ Handle @synthesize

⑧ Generate the block_layout data structure

⑨ Handle variable capture (__block / __weak)

⑩ Generate the _block_invoke function

⑪ ARC: analyze object reference relationships and insert ARC calls such as objc_storeStrong and objc_storeWeak

⑫ Convert ObjCAutoreleasePoolStmt into objc_autoreleasePoolPush/Pop

⑬ Implement the automatic call to [super dealloc]

⑭ Synthesize a .cxx_destruct method for each Class that has ivars, to release the Class's member variables automatically in place of "self.xxx = nil" under MRC
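Items ④ and ⑤ are easier to picture with a model of "base + offset" access. In the sketch below an object is a raw byte buffer and each ivar's offset is looked up in a table at access time instead of being hard-coded. The class name `Person`, the ivar `_age`, the 32-byte object size, and the helper names are all hypothetical, and the table merely stands in for the offset symbols the compiler emits.

```python
import struct

# Stand-in for the per-ivar offset symbols; the entry is hypothetical.
ivar_offsets = {"OBJC_IVAR_$_Person._age": 8}


def store_ivar(obj, symbol, value):
    """Model of `ivar = value;`: write a 32-bit int at base + offset."""
    struct.pack_into("<i", obj, ivar_offsets[symbol], value)


def load_ivar(obj, symbol):
    """Model of `int a = ivar;`: read a 32-bit int from base + offset."""
    return struct.unpack_from("<i", obj, ivar_offsets[symbol])[0]


obj = bytearray(32)
store_ivar(obj, "OBJC_IVAR_$_Person._age", 123)
a = load_ivar(obj, "OBJC_IVAR_$_Person._age")
print(a)  # 123

# Suppose the superclass grows by 8 bytes in a newer framework version.
# Only the offset value changes; the accessor code above stays the same.
ivar_offsets["OBJC_IVAR_$_Person._age"] += 8
obj2 = bytearray(32)
store_ivar(obj2, "OBJC_IVAR_$_Person._age", 123)
print(load_ivar(obj2, "OBJC_IVAR_$_Person._age"))  # 123
```

That indirection is what "non-fragile" refers to: code compiled against the old layout keeps working because it never baked the number 8 into its instructions.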

For example:

Terminal input:

$ clang -S -fobjc-arc -emit-llvm main.m -o main.ll

The output is LLVM IR, an intermediate form that sits between C and assembly.

If optimization is added:

$ clang -O3 -S -fobjc-arc -emit-llvm main.m -o main.ll

The optimized IR is noticeably shorter.
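One reason optimized output shrinks is that work is moved from run time to compile time. Constant folding is probably the simplest example. The sketch below folds constants on a small expression tree; note that this is a simplification, since LLVM's passes operate on IR and not on a syntax tree, and the tuple encoding and function names are assumptions for this example.

```python
def fold(node):
    """Replace operations on two integer constants with their computed result."""
    if node[0] != "BinaryOperator":
        return node
    _, op, lhs, rhs = node
    lhs, rhs = fold(lhs), fold(rhs)
    if lhs[0] == "IntegerLiteral" and rhs[0] == "IntegerLiteral":
        value = lhs[1] + rhs[1] if op == "+" else lhs[1] * rhs[1]
        return ("IntegerLiteral", value)
    return ("BinaryOperator", op, lhs, rhs)


def count_nodes(node):
    if node[0] != "BinaryOperator":
        return 1
    return 1 + count_nodes(node[2]) + count_nodes(node[3])


# Tree for the expression a + 2 * 3 (only + and * are modeled)
tree = ("BinaryOperator", "+",
        ("DeclRefExpr", "a"),
        ("BinaryOperator", "*", ("IntegerLiteral", 2), ("IntegerLiteral", 3)))

folded = fold(tree)
print(count_nodes(tree), "->", count_nodes(folded))  # 5 -> 3
print(folded)
```

The multiplication disappears entirely because its result, 6, is known before the program ever runs.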

6. LLVM Bitcode: generating bitcode

Enter the command:

$ clang -emit-llvm -c main.m -o main.bc

You have probably heard of bitcode ever since iOS 9. It is simply the binary encoding of the IR.

7. Assembly: generating target-specific assembly

Terminal input:

$ clang -S -fobjc-arc main.m -o main.s

The output, main.s, is assembly code for the target architecture.

8. Assemble: generating a target-specific object file (Mach-O)

Terminal input:

$ clang -fmodules -c main.m -o main.o

The assembled result is the object file main.o.

9. Link: generating the executable

Terminal input:

$ clang main.m -o main
$ ./main

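To sum up, the pipeline is: preprocessing, lexical analysis, syntax analysis, static analysis (optional), IR code generation, bitcode, assembly, object file, and linking.

At this point, my guess is that the bridging header is handled in the Clang phase: Clang compiles the Objective-C file, builds the syntax tree, and hands the result back in a form that Swift can recognize as classes.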
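The stages and their flags can be collected into one small table. The flags below are the ones used in the commands in this walkthrough; the stage labels are made up for this sketch and are not Clang terminology, and `.ll` is used as the output suffix for textual IR. The helper only builds command lines and does not run Clang.

```python
# stage label -> (flags, output suffix; None means "print to stdout")
STAGES = {
    "preprocess": (["-E"], None),
    "tokens": (["-fmodules", "-fsyntax-only", "-Xclang", "-dump-tokens"], None),
    "ast": (["-fmodules", "-fsyntax-only", "-Xclang", "-ast-dump"], None),
    "ir": (["-S", "-fobjc-arc", "-emit-llvm"], ".ll"),
    "bitcode": (["-emit-llvm", "-c"], ".bc"),
    "assembly": (["-S", "-fobjc-arc"], ".s"),
    "object": (["-fmodules", "-c"], ".o"),
    "executable": ([], ""),
}


def clang_command(stage, source="main.m"):
    """Build the argument list for one stage of the walkthrough."""
    flags, suffix = STAGES[stage]
    command = ["clang", *flags, source]
    if suffix is not None:
        stem = source.rsplit(".", 1)[0]
        command += ["-o", stem + suffix]
    return command


for stage in STAGES:
    print(f"{stage:>10}: {' '.join(clang_command(stage))}")
```

Passing one of these lists to `subprocess.run` on a machine with Clang installed would execute that stage; that part is left out here so the sketch runs anywhere.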

  • What can we do with Clang?

Clang gives us three interfaces:

1. LibClang
① A C API for Clang's high-level capabilities, such as getting tokens, traversing the syntax tree, code completion, and retrieving diagnostics
② The API is stable and is not affected by updates to the Clang source code
③ Only the high-level syntax tree can be accessed, so not all information is available
④ Can be used through the raw C API
⑤ Can be used from scripting languages through the Python binding or the Node.js/Ruby bindings
⑥ Can be used from Objective-C through the ClangKit library

2. LibTooling
① Full control over the syntax tree
② Can be built as a standalone command, such as clang-format
③ Requires C++ and familiarity with the Clang source code

3. ClangPlugin
① Full control over the syntax tree
② Injected into the compilation as a plug-in, so it can affect the build process
③ Requires C++ and familiarity with the Clang source code
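As a taste of the first interface, here is a minimal sketch using the Python binding of LibClang. It assumes the `clang.cindex` module and a matching libclang shared library are installed (for example through the `libclang` package on PyPI), which is why the import sits inside the function. The helper name `dump_ast` is made up for this example.

```python
def dump_ast(path, args=()):
    """Return one 'KIND spelling' line for every AST node libclang exposes for `path`."""
    # Imported lazily so that merely defining this function needs no third-party package.
    from clang import cindex

    index = cindex.Index.create()
    translation_unit = index.parse(path, args=list(args))
    lines = []
    for cursor in translation_unit.cursor.walk_preorder():
        lines.append(f"{cursor.kind.name} {cursor.spelling}")
    return lines
```

A call such as `dump_ast("main.m", ["-fmodules"])` would be the scripted counterpart of the `-ast-dump` command used earlier. Parsing a file that imports Foundation presumably also needs the macOS SDK to be available, so results will differ from machine to machine.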

Conclusion

Finally, thanks to Sun Yuan ("My name is Sunny") for sharing this material. For anyone who is interested, I recommend the book "The Self-Cultivation of Programmers", and if you want to go deeper, the "Dragon Book" will be your best choice.

If there are any errors in this article, please contact [email protected], github: github.com/edsum!
