Swift compilation process

Objective-C compilation is described in detail in the LLVM & Clang article. This article explores the Swift compilation process. Swift is compiled with swiftc, which, like Clang, is a front end for the LLVM compiler infrastructure.

Common swiftc commands:

-dump-ast               Parse and type-check source files, then dump the AST
-dump-parse             Parse source files, then dump the AST
-emit-assembly          Generate assembly files
-emit-bc                Generate LLVM bitcode files
-emit-executable        Generate a linked executable
-emit-imported-modules  Generate a list of imported modules
-emit-ir                Generate LLVM IR files
-emit-library           Generate a linked library
-emit-object            Generate object files
-emit-silgen            Generate raw SIL files (stage 1)
-emit-sil               Generate canonical SIL files (stage 2)
-index-file             Generate index data for a source file
-print-ast              Parse and type-check source files, then pretty-print the AST in a more concise format
-typecheck              Parse and type-check source files

Swift compilation process:

Compared to Clang, the Swift front end adds an extra layer, SIL (Swift Intermediate Language), between the AST and LLVM IR. This layer compensates for shortcomings of the Clang pipeline: neither the AST nor LLVM IR is a suitable level for high-level analysis, reliable diagnostics, and language-specific optimization. SIL was created to fill that gap.

SIL code

Official SIL documentation

Because SIL is an intermediate product of Swift compilation, reading it helps us understand the details of Swift's low-level implementation.

The following commands generate SIL from source code:

# Print the SIL to the terminal
swiftc -emit-sil main.swift
# Save the SIL to main.sil
swiftc -emit-sil main.swift >> main.sil
# Demangle the symbol names and save the SIL to main.sil
swiftc -emit-sil main.swift | xcrun swift-demangle >> main.sil

SIL syntax is similar to IR syntax.

  • load: reads a value from memory
  • sil_global: marks a variable as a global variable
  • hidden: the declaration is visible only within the same Swift module
  • alloc_global: allocates memory for a global variable
  • global_addr: gets the address of a global variable
  • ref_element_addr: gets the address of an element (stored property) of a class instance
  • init_existential_addr: creates an existential container that wraps the instance and the corresponding protocol witness table (PWT)
  • destroy_addr: destroys the value stored at an address
  • bb0 / bb1 …: basic block labels. A basic block has no branches inside it; control enters only at the top and leaves only at the end
  • alloc_ref / dealloc_ref: allocates / deallocates memory for a class instance
  • function_ref: gets the address of a statically dispatched function
  • class_method: looks up a method in the class's function table (vtable)
  • witness_method: looks up the function address through the PWT
  • objc_method: looks up an Objective-C method
  • apply: calls a function
  • store A to B: stores the value A at address B
  • begin_access / end_access: begins / ends an access to memory
    • [modify] / [read] / [deinit]: modify access, read access, and deinitializing access
    • [dynamic]: the access is enforced dynamically (at run time)
    • [static]: the access is enforced statically (at compile time)
  • retain_value: reference count + 1
  • release_value: reference count - 1
  • metatype: gets the metatype
    • @thick: the metatype is represented as a reference to a type object, which may be the class itself or one of its subclasses
    • @thin: the exact type is known statically, so the metatype needs no storage
  • $: prefixes a type
  • %: a virtual register. Like a local constant, it cannot be modified after assignment. When a new register is needed, the register number is incremented, which helps compiler optimization. These numbered virtual registers are mapped to real registers of the target architecture during later lowering.
  • @: names of declarations in SIL begin with the @ symbol
    • @main: the function named main
    • @_hasStorage: the property is a stored property
    • @_hasInitialValue: the property has an initial value
    • @owned: the recipient of the returned value is responsible for destroying it
    • @convention: states explicitly how arguments and return values are handled when a function is called
      • @convention(c): called as a C function
      • @convention(swift): the default convention for pure Swift functions
      • @convention(method): called as a Swift instance method
      • @convention(witness_method): a protocol method call; the same as @convention(method) except in how generic type parameters are handled
      • @convention(objc_method): called as an Objective-C method

Common ARM64 assembly instructions

bl: jumps to an address and saves the return address in the link register

blr: jump with return; jumps to the address stored in the register that follows the instruction

mov: copies a value from one register to another

🌰 mov x0, x8 copies the value of x8 into x0

ldr: reads a value from memory into a register

🌰 ldr x0, [x0, x8] loads the value at memory address x0 + x8 into register x0

str: writes the value in a register to memory

🌰 str x0, [x0, x8] writes the value of register x0 to memory address x0 + x8

Common IR syntax:

☞ Official documentation

  • @: global identifier
  • %: local identifier
  • alloca: allocates space on the stack
  • align: memory alignment
  • i32: 32-bit integer (4 bytes)
  • store: writes to memory
  • load: reads from memory
  • call: calls a function
  • ret: returns

bitcast 

Reinterprets a value as a new data type without changing its bits (that is, a type conversion).

<result> = bitcast <ty> <value> to <ty2>   ; yields ty2
// eg: bitcast (%struct.str* @global to i8*)   reads @global as type i8*

getelementptr

  • getelementptr (GEP) instruction: performs an address calculation to obtain the address of a sub-element of an aggregate data structure. It does not access memory.
    • It takes at least two parameters
    • The first parameter is a type and an address: the base address from which the calculation starts
    • The second and subsequent parameters are indices, such as the index of a struct field or an array element

getelementptr <ty> <value>, <index>, <index>, ...

If P is an array, the second argument selects the element at index 1, so the result is the address of P[1]:

getelementptr %struct.munger_struct* %P, i32 1


If P is an array of structs, the second argument still selects P[1], and the third argument selects the element at index 0 of that struct, so the result is the address of P[1][0]:

getelementptr %struct.munger_struct* %P, i32 1, i32 0


If the struct contains nested data types, further indices can be appended to reach the nested values. The instruction therefore always has at least two parameters: the first is the address of the data, and the indices start from the second.

Example from the official website:

struct munger_struct {
  int f1;
  int f2;
};
void munge(struct munger_struct *P) {
  P[0].f1 = P[1].f1 + P[2].f2;
}
...
munger_struct Array[3];
...
munge(Array);

Convert to IR code:

void %munge(%struct.munger_struct* %P) {
entry:
  %tmp = getelementptr %struct.munger_struct* %P, i32 1, i32 0
  %tmp1 = load i32* %tmp
  %tmp6 = getelementptr %struct.munger_struct* %P, i32 2, i32 1
  %tmp7 = load i32* %tmp6
  %tmp8 = add i32 %tmp7, %tmp1
  %tmp9 = getelementptr %struct.munger_struct* %P, i32 0, i32 0
  store i32 %tmp8, i32* %tmp9
  ret void
}



getelementptr inbounds: 

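  • Similar to getelementptr, except that the base type and the pointer value are passed as two separate parameters. The instruction therefore has at least three arguments: the first two are the type and the pointer, and the indices start from the third. The inbounds keyword additionally states that the computed address stays within the bounds of the underlying object; if it does not, the result is a poison value.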

<result> = getelementptr inbounds <ty>, <ty>* <ptrval>{, [inrange] <ty> <idx>}*


extractvalue

<result> = extractvalue <aggregate type> <val>, <idx>{, <idx>}*

insertvalue

<result> = insertvalue <aggregate type> <val>, <ty> <elt>, <idx>{, <idx>}*
// eg: insertvalue %struct.tm %ret_val.fca.0.insert, i32 %b1, 1   inserts %b1 at index 1

Struct

%T = type {<type list>}
// eg: %swift.refcounted = type { %swift.type*, i64 }


Array

[<elementnumber> x <elementtype>]
// eg: alloca [24 x i8], align 8   an array of 24 i8 (8-bit integer) elements

Pointer

<type>*
// eg: i64* is a pointer to a 64-bit integer

The mountains stay green and the rivers keep flowing; until we meet again. Thank you all for your support!