This article explores the nature of classes and structures in Swift, with reference to the Swift source code.

Similarities and differences between classes and structures

In Swift, classes and structures have many similarities, but also differences.

Memory is broadly divided into the heap and the stack. Stack memory is contiguous, and allocation and deallocation are just push and pop operations, so the stack is faster than the heap. The heap holds more complex data types: on initialization the allocator has to search for a free block of memory, and on destruction that block has to be released, so data on the heap is not necessarily stored contiguously.

Classes and structs differ in how their memory is allocated. Basic data types and structs are allocated on the stack by default, while more complex types such as classes are stored on the heap. The heap is not thread-safe, so frequent reads and writes of heap data require locking.

Besides storing properties more safely and efficiently, structs also dispatch functions more efficiently. Because a struct cannot be inherited from (in effect, the type is final), its methods are statically dispatched: the address of the function is determined at compile time, calls can be inlined, and the memory is contiguous, which reduces function lookup and memory offset calculations. This is more efficient than dynamic dispatch.

In addition, reference counting takes a toll on the efficiency of classes, so prefer structs where possible.

Structs are value types: they are copied when assigned to a constant or variable, or when passed to a function. In fact, the basic types in Swift (integers, floating-point numbers, Booleans, strings, arrays, and dictionaries) are all value types implemented behind the scenes as structures, and enumerations are value types as well. This means that strings, arrays, and dictionaries are copied when they are assigned to a new constant or variable, or when they are passed to a function or method. This differs from NSString, NSArray, and NSDictionary in Objective-C (OC), which are classes: assigning and passing them copies a reference.
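The copy behavior described above can be sketched with standard-library types only. The variable names are illustrative and not taken from the article.

```swift
// Arrays and strings are structs, so assignment yields an independent copy.
let original = [1, 2, 3]
var copy = original
copy.append(4)          // only the copy changes

let greeting = "hello"
var shout = greeting
shout += "!"            // only the copy changes

print(original, copy)
print(greeting, shout)
```

The originals are left untouched because each assignment produces a value that is logically separate from its source.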

The Swift heap is implemented via a bidirectional linked list, which theoretically reduces the need for retain traversal and doubles the efficiency. However, it is still not as efficient as the stack, so Apple changed some of the types placed in the heap to value types.

Value types store the value itself; assignment copies the value, so the copies do not affect each other. Reference types store the memory address of an object; assignment copies the pointer, so both variables point to the same object (the same memory).

Similarities and differences between classes and structures:

Similarities: both can define properties, methods, and initializers; both can be extended with extension; both can conform to protocols.

Differences: classes are reference types stored on the heap, while structs are value types stored on the stack. Classes support inheritance; structs do not. A class instance can be referenced from multiple places and is reference counted. Classes have deinitializers (destructors) to release resources. Type casting lets you check and interpret the type of a class instance at run time.
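The list above can be illustrated with a small sketch. All type names here (Describable, PointValue, Shape, Circle) are hypothetical and exist only for this example.

```swift
protocol Describable {
    func describe() -> String
}

// A struct can have properties, methods, and protocol conformances.
struct PointValue: Describable {
    var x = 0
    func describe() -> String { "PointValue(\(x))" }
}

// A class can do the same, and additionally supports deinit and inheritance.
class Shape: Describable {
    var name: String
    init(name: String) { self.name = name }
    func describe() -> String { "Shape(\(name))" }
    deinit { print("\(name) released") }   // only classes have deinitializers
}

final class Circle: Shape {                // only classes support inheritance
    override func describe() -> String { "Circle(\(name))" }
}

// Both kinds of type can be extended.
extension PointValue { func doubled() -> PointValue { PointValue(x: x * 2) } }
extension Shape { func shout() -> String { name.uppercased() } }

let items: [Describable] = [PointValue(x: 1), Circle(name: "c")]
let descriptions = items.map { $0.describe() }

let shape: Shape = Circle(name: "round")
let isCircle = shape is Circle             // run-time type check
print(descriptions, isCircle)
```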

Struct example

struct Book {
    var name: String
    var high: Int
    func turnToPage(page: Int) {
        print("turn to page \(page)")
    }
}

var s = Book(name: "Book of changes", high: 8)
var s1 = s
s1.high = 10
print(s.high, s1.high) // 8 10

In this code, high is initialized to 8. Assigning s to s1 copies the entire structure, so s1 is a new structure. After s1's high is changed to 10, s's high is still 8: s and s1 do not affect each other.

LLDB debugging shows that s and s1 are different structures: one is at 0x0000000100008080 and the other at 0x0000000100008098.

(lldb) frame variable -L s
0x0000000100008080: (SwiftTest.Book) s = {
0x0000000100008080:   name = "Book of changes"
0x0000000100008090:   high = 8
}
(lldb) frame variable -L s1
0x0000000100008098: (SwiftTest.Book) s1 = {
0x0000000100008098:   name = "Book of changes"
0x00000001000080a8:   high = 10
}
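The 0x18 (24-byte) gap between the two addresses in the LLDB output matches the size of the struct. A minimal sketch with MemoryLayout, assuming a 64-bit platform where String takes 16 bytes and Int takes 8:

```swift
struct Book {
    var name: String
    var high: Int
}

// Offsets of the stored properties inside the struct, and its total size.
let nameOffset = MemoryLayout<Book>.offset(of: \Book.name)
let highOffset = MemoryLayout<Book>.offset(of: \Book.high)
let bookSize = MemoryLayout<Book>.size

print(nameOffset ?? -1, highOffset ?? -1, bookSize)
```

On such a platform, name sits at offset 0 and high at offset 16, which agrees with the addresses 0x...8080 and 0x...8090 in the debugger output.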

Class example

class Person {
    var age: Int = 22
    var name: String?
    init(_ age: Int, _ name: String) {
        self.age = age
        self.name = name
    }
    func eat(food: String) {
        print("eat \(food)")
    }
    func jump() {
        print("jump")
    }
}

var c = Person(22, "jack")
var c1 = c
c1.age = 30
print(c.age, c1.age) // 30 30

With c1 = c, the pointer is copied, which creates a new reference to the same object. Changing c1's age to 30 therefore changes c's age to 30 as well.

(lldb) frame variable -L c
scalar: (SwiftTest.Person) c = 0x0000000100679af0 {
0x0000000100679b00:   age = 30
0x0000000100679b08:   name = "jack"
}
(lldb) frame variable -L c1
scalar: (SwiftTest.Person) c1 = 0x0000000100679af0 {
0x0000000100679b00:   age = 30
0x0000000100679b08:   name = "jack"
}
(lldb) cat address 0x0000000100679af0
address:0x0000000100679af0, (String) $R1 = "0x100679af0 heap pointer, (0x30 bytes), zone: 0x7fff8076a000"

LLDB debugging shows that the class instances c and c1 are actually the same object, and a custom address command shows that the object is on the heap.

c and c1 are two different pointers, each holding the address 0x0000000100679af0.

(lldb) po withUnsafePointer(to: &c, {print($0)})
0x0000000100008298
0 elements
(lldb) po withUnsafePointer(to: &c1, {print($0)})
0x00000001000082a0
0 elements
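The same observation can be made without a debugger. This is a small sketch using the identity operator === and ObjectIdentifier; the simplified Person class is redefined here so that the snippet is self-contained.

```swift
final class Person {
    var age: Int
    init(age: Int) { self.age = age }
}

let c = Person(age: 22)
let c1 = c                      // copies the reference, not the object
let other = Person(age: 22)     // a separate heap object with equal contents

c1.age = 30                     // visible through c as well

let sameObject = c === c1
let sameAsOther = c === other
let sameIdentifier = ObjectIdentifier(c) == ObjectIdentifier(c1)
print(c.age, sameObject, sameAsOther, sameIdentifier)
```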

The build process

Clang compiler

Languages such as OC and C use Clang as the compiler front end. Clang compiles the source code to the intermediate representation (IR), which is handed to the LLVM back end to generate the executable.

The Clang compilation process has the following disadvantages:

  • There is a huge abstraction gap between source code and LLVM IR

  • IR is not suitable for source-level analysis

  • The CFG (Control Flow Graph) lacks fidelity

  • The CFG is off the hot path

  • Analysis work is duplicated between the CFG and IR lowering

The Swift compiler

To address these shortcomings, Swift has its own compiler front end, whose most important feature is the introduction of SIL.

SIL

Introducing SIL (Swift Intermediate Language) has the following advantages:

  • It fully represents the semantics of the program
  • It serves both code generation and code analysis
  • It sits on the hot path of the compiler pipeline
  • It bridges the abstraction gap between source code and LLVM IR

SIL performs high-level semantic analysis and optimization of Swift code. Like LLVM IR, it has structures such as Module, Function, and BasicBlock. Unlike LLVM IR, it has a richer type system, it retains information about loops and error handling, and it keeps virtual function tables and type information in structured form. It is designed to preserve the meaning of the Swift program, which enables powerful error detection, high-level optimizations for memory management, and more.

Swift Compilation steps

The Swift front-end compiler converts the Swift code to SIL and then IR.

Below are the commands for each step, with explanations:

// 1 Parse: syntax analysis, generates the AST from the Swift source
swiftc main.swift -dump-parse
// 2 Sema: semantic analysis, type-checks the AST
swiftc main.swift -dump-ast
// 3 SILGen: generates the intermediate language, unoptimized raw SIL
// A series of passes then runs on raw SIL for optimization and diagnostics, emitting language-specific diagnostics for code that fails.
swiftc main.swift -emit-silgen
// 4 Generate the intermediate language (SIL), optimized
// Running SIL optimizations on canonical SIL is generally optional; they improve the performance of the resulting executable.
swiftc main.swift -emit-sil
// 5 IRGen: lowers SIL to LLVM IR
swiftc main.swift -emit-ir
// 6 LLVM back-end optimization, generates LLVM bitcode
swiftc main.swift -emit-bc
// 7 Generate assembly
swiftc main.swift -emit-assembly
// 8 Generate binary machine code, compiled to an executable file
swiftc -o main.o main.swift

When analyzing SIL, we convert the Swift file directly to a SIL file with the following command:

swiftc -emit-sil main.swift > main.sil

Class life cycle

Let's examine how a class instance is created, using the following class:

class Human {
    var name: String
    init(_ name: String) {
        self.name = name
    }
    func eat(food: String) {
        print("eat \(food)")
    }
}

var h = Human("hali")

Convert it to SIL with swiftc -emit-sil main.swift > human.sil

Analyzing the SIL file, you can see the following code, which is the __allocating_init initializer:

// Human.__allocating_init(_:)
sil hidden [exact_self_class] @$s4main5HumanCyACSScfC : $@convention(method) (@owned String, @thick Human.Type) -> @owned Human {
// %0 "name"                                      // user: %4
// %1 "$metatype"
bb0(%0 : $String, %1 : $@thick Human.Type):
  %2 = alloc_ref $Human                           // user: %4
  // function_ref Human.init(_:)
  %3 = function_ref @$s4main5HumanCyACSScfc : $@convention(method) (@owned String, @owned Human) -> @owned Human // user: %4
  %4 = apply %3(%0, %2) : $@convention(method) (@owned String, @owned Human) -> @owned Human // user: %5
  return %4 : $Human                              // id: %5
} // end sil function '$s4main5HumanCyACSScfC'

Next, in Xcode, set a symbolic breakpoint on __allocating_init.

The breakpoint shows that swift_allocObject is called. If Human inherits from NSObject, ObjC's objc_allocWithZone is called instead, and the object is allocated the OC way.

In the Swift source code, search for swift_allocObject, which is located in HeapObject.cpp.

Internally it calls swift_slowAlloc.

So far, by analyzing the SIL, the assembly, and the source code, we can conclude that Swift object initialization proceeds as follows:

__allocating_init -> swift_allocObject -> _swift_allocObject_ -> swift_slowAlloc -> malloc

Class memory structure

Initialization returns a HeapObject with the following definition:

// The members of the HeapObject header that are not shared by a
// standard Objective-C instance
#define SWIFT_HEAPOBJECT_NON_OBJC_MEMBERS       \
  InlineRefCounts refCounts

/// The Swift heap-object header.
/// This must match RefCountedStructTy in IRGen.
struct HeapObject {
  /// This is always a valid pointer to a metadata object.
  HeapMetadata const *metadata; // 8 bytes

  SWIFT_HEAPOBJECT_NON_OBJC_MEMBERS; // 64-bit bit field, 8 bytes; metadata and refCounts together make up the default 16 bytes of every instance

#ifndef __swift__
  // ...

#endif // __swift__
};
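One consequence of this 16-byte header is that the first stored property of a pure Swift class instance starts at byte offset 16. The sketch below peeks at that location through a raw pointer. It assumes a 64-bit platform and relies on layout details that are not a supported API, so treat it as an experiment only; Box and firstStoredWord(of:) are hypothetical names.

```swift
final class Box {
    var value: Int = 42
}

/// Reads the word stored right after the assumed 16-byte object header
/// (8 bytes of metadata pointer plus 8 bytes of reference counts).
func firstStoredWord(of object: AnyObject) -> Int {
    let base = Unmanaged.passUnretained(object).toOpaque()
    return base.load(fromByteOffset: 16, as: Int.self)
}

let box = Box()
let peeked = firstStoredWord(of: box)
print(peeked)
```

If the assumption holds, the value read through the raw pointer tracks box.value.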

The metadata member of HeapObject is of type HeapMetadata, which is an alias of TargetHeapMetadata, as the source code shows:

using HeapMetadata = TargetHeapMetadata<InProcess>;

Moving on to TargetHeapMetadata:

template <typename Runtime>
struct TargetHeapMetadata : TargetMetadata<Runtime> { // Inherit from TargetMetadata
  using HeaderType = TargetHeapMetadataHeader<Runtime>;
// Next is initialization
  TargetHeapMetadata() = default;
  constexpr TargetHeapMetadata(MetadataKind kind) // pure Swift
    : TargetMetadata<Runtime>(kind) {}
#if SWIFT_OBJC_INTEROP // interoperating with ObjC
  constexpr TargetHeapMetadata(TargetAnyClassMetadata<Runtime> *isa) // isa
    : TargetMetadata<Runtime>(isa) {}
#endif
};

You can see that the initializer takes kind for pure Swift and isa when interoperating with OC.

TargetMetadata, the parent of TargetHeapMetadata:

/// The common structure of all type metadata.
template <typename Runtime>
struct TargetMetadata { // The ultimate base class of all metadata types
  using StoredPointer = typename Runtime::StoredPointer;

  /// The basic header type.
  typedef TargetTypeMetadataHeader<Runtime> HeaderType;

  constexpr TargetMetadata()
    : Kind(static_cast<StoredPointer>(MetadataKind::Class)) {}
  constexpr TargetMetadata(MetadataKind Kind)
    : Kind(static_cast<StoredPointer>(Kind)) {}

#if SWIFT_OBJC_INTEROP
protected:
  constexpr TargetMetadata(TargetAnyClassMetadata<Runtime> *isa)
    : Kind(reinterpret_cast<StoredPointer>(isa)) {}
#endif

private:
  /// The kind. Only valid for non-class metadata; getKind() must be used to get
  /// the kind value.
  StoredPointer Kind; // Kind member variable
public:
  // ...

  /// Get the nominal type descriptor if this metadata describes a nominal type,
  /// or return null if it does not.
  ConstTargetMetadataPointer<Runtime, TargetTypeContextDescriptor>
  getTypeContextDescriptor() const {
    switch (getKind()) { // branch on the kind
    case MetadataKind::Class: {
      const auto cls = static_cast<const TargetClassMetadata<Runtime> *>(this); // cast this to TargetClassMetadata
      if (!cls->isTypeMetadata())
        return nullptr;
      if (cls->isArtificialSubclass())
        return nullptr;
      return cls->getDescription();
    }
    case MetadataKind::Struct:
    case MetadataKind::Enum:
    case MetadataKind::Optional:
      return static_cast<const TargetValueMetadata<Runtime> *>(this)
          ->Description;
    case MetadataKind::ForeignClass:
      return static_cast<const TargetForeignClassMetadata<Runtime> *>(this)
          ->Description;
    default:
      return nullptr;
    }
  }
  // ...
};

TargetMetadata is the ultimate base class, with a single member variable, Kind. Values up to 0x7FF are enumerated MetadataKind values; a larger value is an isa pointer, which getKind() reports as a class.

TargetMetadata is cast to other types depending on kind, so it is the base class of all metadata types.
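The Kind word can be inspected from Swift itself, because a metatype value is a pointer to the type's metadata record. The following is a sketch under assumptions: the 0x7FF threshold and the idea that classes show up as 0 or as an isa pointer come from my reading of the runtime, and KindSketch and kind(of:) are hypothetical names. It is not a supported API.

```swift
// Assumed threshold: kinds above this value are treated as isa pointers.
let lastEnumeratedKind: UInt = 0x7FF

enum KindSketch: Equatable {
    case classLike            // 0, or an isa pointer larger than 0x7FF
    case enumerated(UInt)     // any other enumerated kind value
}

func kind(of type: Any.Type) -> KindSketch {
    // Reinterpret the metatype as a pointer to the first word of the metadata.
    let metadata = unsafeBitCast(type, to: UnsafePointer<UInt>.self)
    let raw = metadata.pointee
    return (raw == 0 || raw > lastEnumeratedKind) ? .classLike : .enumerated(raw)
}

struct SampleStruct {}
enum SampleEnum { case only }
final class SampleClass {}

let structKind = kind(of: SampleStruct.self)
let enumKind = kind(of: SampleEnum.self)
let optionalKind = kind(of: Optional<Int>.self)
let classKind = kind(of: SampleClass.self)
print(structKind, enumKind, optionalKind, classKind)
```

Structs, enums, and optionals report distinct small kind values, while the class falls into the class-like branch, which mirrors the switch in getTypeContextDescriptor().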

When the kind is a class, this is cast to TargetClassMetadata. Following that type in the source code gives the inheritance chain:

TargetClassMetadata : TargetAnyClassMetadata : TargetHeapMetadata : TargetMetadata

By analyzing the source code, you can get the diagram

Therefore, the memory structure of the class can be obtained by integrating the member variables on the inheritance chain:

struct Metadata {
    var kind: Int
    var superClass: Any.Type
    var cacheData: (Int, Int)
    var data: Int
    var classFlags: Int32
    var instanceAddressPoint: UInt32
    var instanceSize: UInt32
    var instanceAlignmentMask: UInt16
    var reserved: UInt16
    var classSize: UInt32
    var classAddressPoint: UInt32
    var typeDescriptor: UnsafeMutableRawPointer
    var iVarDestroyer: UnsafeRawPointer
}

PS: the possible values of kind are fixed constants defined by the runtime.

Analyzing mutating methods with SIL

Both classes and structs can define methods, but by default the properties of a value type cannot be modified from within its own instance methods. A struct method that tries to modify a property fails to compile with the error: Left side of mutating operator isn't mutable: 'self' is immutable

struct Point {
    var x = 0.0, y = 0.0
    func moveBy(x deltaX: Double, y deltaY: Double) {
        self.x += deltaX // Left side of mutating operator isn't mutable: 'self' is immutable
        self.y += deltaY // Left side of mutating operator isn't mutable: 'self' is immutable
    }
}

Adding the mutating keyword in front of the method fixes this.

struct Point {
    var x = 0.0, y = 0.0
    func test() {
        print("test")
    }
    mutating func moveBy(x deltaX: Double, y deltaY: Double) {
        self.x += deltaX
        self.y += deltaY
    }
}

What does mutating actually do? Convert the code to SIL to find out: swiftc -emit-sil main.swift > main.sil

// Point.test()
sil hidden @$s4main5PointV4testyyF : $@convention(method) (Point) -> () {
// %0 "self"                                      // user: %1
bb0(%0 : $Point):
  debug_value %0 : $Point, let, name "self", argno 1 // id: %1

Unlike OC, a Swift method has only one implicit argument, self, which is passed as the last argument and is placed in the x0 register by default. Here debug_value takes the value itself, declared with let, so self cannot be modified.

// Point.moveBy(x:y:)
sil hidden @$s4main5PointV6moveBy1x1yySd_SdtF : $@convention(method) (Double, Double, @inout Point) -> () {
// %0 "deltaX"                                    // users: %10, %3
// %1 "deltaY"                                    // users: %20, %4
// %2 "self"                                      // users: %16, %6, %5
bb0(%0 : $Double, %1 : $Double, %2 : $*Point):
  debug_value %0 : $Double, let, name "deltaX", argno 1 // id: %3
  debug_value %1 : $Double, let, name "deltaY", argno 2 // id: %4
  debug_value_addr %2 : $*Point, var, name "self", argno 3 // id: %5

In moveBy, self is marked @inout. What is passed in is an address: debug_value_addr takes the address of the Point ($*Point) and declares self with var, so self is modified through a pointer.

Normally, changes made to a parameter inside a function do not affect anything outside the function. Marking the parameter with the inout keyword turns it into an in-out parameter, and the changes persist after the function returns.
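The relationship between mutating and inout can be sketched in plain Swift. The free function move(_:x:y:) is a hypothetical helper written for this example; it spells out self as an explicit inout parameter and has the same effect as the mutating method.

```swift
struct Point {
    var x = 0.0, y = 0.0
    mutating func moveBy(x deltaX: Double, y deltaY: Double) {
        x += deltaX
        y += deltaY
    }
}

// Same effect as moveBy, with self written out as an inout parameter.
func move(_ point: inout Point, x deltaX: Double, y deltaY: Double) {
    point.x += deltaX
    point.y += deltaY
}

var p = Point()
p.moveBy(x: 1, y: 2)       // modifies p in place
move(&p, x: 1, y: 2)       // modifies p in place through the inout parameter
print(p.x, p.y)

let fixed = Point()
// fixed.moveBy(x: 1, y: 1)   // does not compile: 'fixed' is a 'let' constant
```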

Method dispatch

Three dispatch mechanisms of Swift functions

Swift has three function dispatch mechanisms:

  1. Static dispatch

    The function to call is determined at compile time. Static dispatch in Swift calls the function address directly.

  2. Dynamic dispatch / virtual dispatch

    With dynamic dispatch, the method to call cannot be determined at compile time. It is resolved at run time by looking up the function address in the virtual function table (v-table).

  3. Message dispatch

    ObjC dispatches messages at run time through objc_msgSend, so some of ObjC's dynamic features can be used in Swift in a limited way.

Static dispatch is faster than dynamic dispatch. It also enables optimizations such as inlining, which removes function lookup and memory offset calculations, so the function executes faster.

In general, functions are dispatched as follows:

Type               | Direct declaration  | Extension
Value type         | Static dispatch     | Static dispatch
Class              | V-table dispatch    | Static dispatch
NSObject subclass  | V-table dispatch    | Static dispatch
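The first two rows of the table can be sketched as follows. The types are hypothetical, and the comments describe the dispatch mechanism the article attributes to each declaration; an optimizing compiler may still devirtualize individual calls.

```swift
struct Meter {                          // value type: statically dispatched
    func unit() -> String { "m" }
}

class Animal {
    func sound() -> String { "..." }    // overridable: goes through the v-table
}

class Dog: Animal {
    override func sound() -> String { "woof" }
}

extension Animal {
    func category() -> String { "animal" }   // extension method: direct call
}

let pet: Animal = Dog()                 // static type Animal, dynamic type Dog
let heard = pet.sound()                 // resolved at run time to Dog.sound()
let category = pet.category()
let unit = Meter().unit()
print(heard, category, unit)
```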

Dynamic dispatch of class functions

Let's use an example to see how a program finds the function address under dynamic (v-table) dispatch.

class LGTeacher {
    func teach() {
        print("teach")
    }
    func teach1() {
        print("teach1")
    }
    func teach2() {
        print("teach2")
    }
}

var t = LGTeacher()
t.teach()

Set a breakpoint on the function call, step into the assembly code, and read the value in the register used for the call.

Here 0x10004bab4 is the address of the teach() function. Let's explore where this address comes from.

Interpretation of source code

In general, Swift stores a class's methods in the class's virtual table, which can be seen in the SIL file.

From the earlier analysis, the class structure TargetClassMetadata has a Description property, which points to the description of the Swift class, TargetClassDescriptor.

  // Description is by far the most likely field for a client to try
  // to access directly, so we force access to go through accessors.
private:
  /// An out-of-line Swift-specific description of the type, or null
  /// if this is an artificial subclass. We currently provide no
  /// supported mechanism for making a non-artificial subclass
  /// dynamically.
  ConstTargetMetadataPointer<Runtime, TargetClassDescriptor> Description;

The memory structure of TargetClassDescriptor is as follows:

struct TargetClassDescriptor{ 
  var flags: UInt32 
  var parent: UInt32 
  var name: Int32 
  var accessFunctionPointer: Int32 
  var fieldDescriptor: Int32 
  var superClassType: Int32 
  var metadataNegativeSizeInWords: UInt32 
  var metadataPositiveSizeInWords: UInt32 
  var numImmediateMembers: UInt32 
  var numFields: UInt32 
  var fieldOffsetVectorOffset: UInt32 
  var Offset: UInt32 
  var size: UInt32 
  //V-Table 
}

The fields between the start of the descriptor and the v-table take up 13 × 4 = 52 bytes. They are followed by the v-table, which stores method descriptions of type TargetMethodDescriptor.
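The 52-byte figure can be checked with a Swift mirror of those thirteen fields. ClassDescriptorPrefix is a hypothetical name, and the field names simply follow the listing above (with the last two lower-cased); this is a sketch, not the runtime's own definition.

```swift
// Thirteen 4-byte fields, mirroring the listing of TargetClassDescriptor.
struct ClassDescriptorPrefix {
    var flags: UInt32
    var parent: UInt32
    var name: Int32
    var accessFunctionPointer: Int32
    var fieldDescriptor: Int32
    var superClassType: Int32
    var metadataNegativeSizeInWords: UInt32
    var metadataPositiveSizeInWords: UInt32
    var numImmediateMembers: UInt32
    var numFields: UInt32
    var fieldOffsetVectorOffset: UInt32
    var offset: UInt32
    var size: UInt32
}

let prefixSize = MemoryLayout<ClassDescriptorPrefix>.size
let prefixAlignment = MemoryLayout<ClassDescriptorPrefix>.alignment
print(prefixSize, prefixAlignment)
```

Because every field is 4 bytes wide and 4-byte aligned, there is no padding, and the v-table entries begin immediately after the prefix.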

struct TargetMethodDescriptor {
  /// Flags describing the method.
  MethodDescriptorFlags Flags; // 4 bytes, identifying the type of method, initializing /getter/setter, etc

  /// The method implementation.
  TargetRelativeDirectPointer<Runtime, void> Impl; // Relative address, Offset

  // TODO: add method types or anything else needed for reflection.
};

So TargetMethodDescriptor describes a method. Flags identifies the kind of method and takes 4 bytes. Impl is not the actual method implementation pointer but a relative offset. To get the real address of the method, take the address of the TargetMethodDescriptor, add 4 bytes to reach Impl, and then add the relative offset.
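A relative pointer of this kind can be resolved with a few lines of Swift. The sketch below builds a fake 8-byte descriptor in a scratch buffer purely for illustration; resolveRelativePointer(at:) and demoResolvedDistance() are hypothetical helpers, not runtime APIs.

```swift
/// Resolves a relative direct pointer: the target is the address of the
/// 32-bit field itself plus the signed offset stored in that field.
func resolveRelativePointer(at field: UnsafeRawPointer) -> UnsafeRawPointer {
    let offset = field.load(as: Int32.self)
    return field + Int(offset)
}

/// Writes a fake method descriptor into a scratch buffer and returns how far
/// the resolved target lies from the start of the buffer.
func demoResolvedDistance() -> Int {
    let buffer = UnsafeMutableRawPointer.allocate(byteCount: 64, alignment: 8)
    defer { buffer.deallocate() }

    let descriptor = buffer + 32                                  // fake descriptor
    descriptor.storeBytes(of: UInt32(0x10), as: UInt32.self)      // Flags
    (descriptor + 4).storeBytes(of: Int32(-28), as: Int32.self)   // Impl offset

    let target = resolveRelativePointer(at: UnsafeRawPointer(descriptor + 4))
    return UnsafeRawPointer(buffer).distance(to: target)
}

let distance = demoResolvedDistance()
print(distance)
```

The Impl field sits at byte 36 of the buffer, so an offset of -28 resolves to byte 8.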

Interpretation of executable files

In the executable, the __swift5_types section of the __TEXT segment stores the address information of the descriptors of classes, structs, and enums.

iOS is little-endian, so the stored bytes are read as 0xFFFFFBF4. Adding the offset of the entry gives 0xFFFFFBF4 + 0xBC68 = 0x10000B85C, which is the address of LGTeacher's Description in the Mach-O file. The base address of virtual memory is 0x100000000, so 0xB85C is the offset of the Description.

Locate 0xB85C in the file.

According to the memory structure of TargetClassDescriptor, the v-table starts 52 bytes after 0xB85C, at offset 0xB890.

The v-table is an array, so its first element, 10 00 00 00 20 C2 FF FF, is a TargetMethodDescriptor. Following the memory structure of TargetMethodDescriptor, the first four bytes are Flags and the next four bytes are the Impl offset, 0xFFFFC220.

Back in the running program, image list prints the load addresses of the loaded images. The first entry is the base address of the program, 0x100044000. Adding the v-table offset gives the run-time address of the v-table, which is also the address of the TargetMethodDescriptor of the first function, teach(): 0x100044000 + 0xB890 = 0x10004F890

Skip the 4 bytes of Flags to reach Impl: 0x10004F890 + 0x4 = 0x10004F894

Add the offset and subtract the virtual memory base address: 0x10004F894 + 0xFFFFC220 - 0x100000000 = 0x10004BAB4

This is the function address 0x10004BAB4 obtained earlier.
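The whole calculation can be replayed as plain arithmetic. The sketch below treats each 32-bit value as a signed relative offset, which gives the same result as adding the unsigned value and subtracting 0x100000000. All constants are the ones quoted in this walkthrough; the helper apply(relative:to:) is hypothetical, and 0xBC68 is assumed to be the file offset of the __swift5_types entry.

```swift
/// Adds a signed 32-bit relative offset, given as its raw bit pattern,
/// to a base address.
func apply(relative raw: UInt32, to base: UInt64) -> UInt64 {
    let offset = Int64(Int32(bitPattern: raw))
    return UInt64(Int64(base) + offset)
}

let typesEntry: UInt64 = 0xBC68                  // assumed offset of the __swift5_types entry
let descriptor = apply(relative: 0xFFFF_FBF4, to: typesEntry)
let vtable = descriptor + 52                     // skip the 13 four-byte fields
let loadAddress: UInt64 = 0x1_0004_4000          // first entry of `image list`
let implField = loadAddress + vtable + 4         // skip the 4 bytes of Flags
let teachAddress = apply(relative: 0xFFFF_C220, to: implField)

print(String(descriptor, radix: 16),
      String(vtable, radix: 16),
      String(teachAddress, radix: 16))
```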

Struct functions are statically dispatched

struct LGTeacher {
    func teach() {
        print("teach")
    }
    func teach1() {
        print("teach1")
    }
    func teach2() {
        print("teach2")
    }
}

var t = LGTeacher()
t.teach()

When the example above is changed to a struct, the call goes directly to the function's address, which is static dispatch.

extension

For both classes and structs, functions in an extension are statically dispatched and cannot be replaced or changed at run time. Extension methods are fixed at compile time, are hard-coded as direct calls, and are not placed in the v-table.

extension LGTeacher {
    func teach3() {
        print("teach3")
    }
}

var t = LGTeacher()
t.teach3()

In both cases the function address is called directly.

Therefore, polymorphism cannot be supported through extension.

So why did Swift design extension to be static?

In OC, if a subclass inherits a method without overriding it, the implementation is looked up in the parent class. In Swift, by contrast, a subclass copies the parent's methods into its own v-table when it inherits, which also saves method lookup time. If extension methods were added to the v-table, they could not simply be appended to the end of the subclass's v-table. The compiler would have to record the index of the parent's methods in the subclass, insert the parent's extension methods right after them, and then shift the subclass's own methods back. That would be expensive.

Influence of keywords on dispatch

Different function modifier keywords also affect how a function is dispatched.

final

final: functions marked final cannot be overridden, are statically dispatched, do not appear in the v-table, and are not visible to the ObjC runtime.
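A minimal sketch of final, using hypothetical Teacher and MathTeacher classes. The commented-out override shows what the compiler rejects.

```swift
class Teacher {
    final func badge() -> String { "teacher" }   // cannot be overridden
    func teach() -> String { "teach" }           // overridable
}

class MathTeacher: Teacher {
    override func teach() -> String { "teach math" }
    // override func badge() -> String { "math" }   // does not compile: badge() is final
}

let someone: Teacher = MathTeacher()
let badge = someone.badge()     // always Teacher.badge()
let lesson = someone.teach()    // resolved to MathTeacher.teach()
print(badge, lesson)
```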

dynamic

dynamic: the dynamic keyword can be added to any function to make functions of non-ObjC classes and value types dynamic, but the dispatch mechanism is still v-table dispatch.

class LGTeacher {
    dynamic func teach() {
        print("teach")
    }
}

extension LGTeacher {
    @_dynamicReplacement(for: teach())
    func teach3() {
        print("teach3")
    }
}

var t = LGTeacher()
t.teach3() // teach3
t.teach()  // teach3

In the code above, teach() uses v-table dispatch and is marked dynamic. The @_dynamicReplacement(for: teach()) attribute on teach3() replaces the implementation of teach() with that of teach3(). This is equivalent to OC pointing the SEL of teach() at the IMP of teach3(), that is, method replacement.

The actual mechanism is handled by the compiler. In the IR, the teach() function has two branches, original and forward. If a replacement function exists, the forward branch is taken.

# Convert to the IR intermediate language (.ll file)
swiftc -emit-ir main.swift > dynamic.ll

@objc

@objc: this keyword exposes a Swift function to the ObjC runtime. The function still uses v-table dispatch.

@objc dynamic

@objc dynamic: uses message dispatch, the same as OC. In real-world development, Swift and OC interoperation mostly uses this approach.

For pure Swift classes, @objc dynamic lets methods be used with the runtime API in the same way as in OC.

To interoperate with OC, the class needs to inherit from NSObject.
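On platforms with the Objective-C runtime, the visibility difference can be observed with responds(to:) and class_getInstanceMethod. This is a sketch with a hypothetical Teacher class; it is guarded with canImport(ObjectiveC) because the runtime is not available on other platforms.

```swift
#if canImport(ObjectiveC)
import Foundation

class Teacher: NSObject {
    @objc dynamic func teach() -> String { "teach" }   // message dispatch
    func plain() -> String { "plain" }                 // not exposed to the ObjC runtime
}

let teachSelector = #selector(Teacher.teach)
let teacher = Teacher()

// The ObjC runtime can find teach(), but knows nothing about plain().
let respondsToTeach = teacher.responds(to: teachSelector)
let hasRuntimeMethod = class_getInstanceMethod(Teacher.self, teachSelector) != nil
let respondsToPlain = teacher.responds(to: NSSelectorFromString("plain"))
print(respondsToTeach, hasRuntimeMethod, respondsToPlain)
#else
print("Objective-C runtime not available on this platform")
#endif
```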
