This is the sixth day of my participation in the First Challenge 2022. For details: First Challenge 2022.

String source code analysis

How is a Swift String stored in memory?

Today we’re going to take a look at the String type, which is a struct rather than a class. Let’s first look at what happens when we create an empty String.

var empty = "" 
print(empty)

The first step is to find the String source code and locate the corresponding initializer. Searching the source files directly, we find the following code:

  /// Creates an empty string.
  ///
  /// Using this initializer is equivalent to initializing a string with an
  /// empty string literal.
  ///
  ///     let empty = ""
  ///     let alsoEmpty = String()
  @inlinable @inline(__always)
  @_semantics("string.init_empty")
  public init() { self.init(_StringGuts()) }

This public init() delegates to an internal initializer, which takes a _StringGuts value as its argument:

public struct String {
  public // @SPI(Foundation)
  var _guts: _StringGuts

  @inlinable @inline(__always)
  internal init(_ _guts: _StringGuts) {
    self._guts = _guts
    _invariantCheck()
  }

  // ... other members omitted
}

The code above also shows that the String struct stores a _StringGuts in its _guts property.

So let’s focus on _StringGuts and go straight to the StringGuts.swift file to see how it is initialized:

// Empty string
@inlinable @inline(__always) 
init() {
  self.init(_StringObject(empty: ())) 
}

Likewise, _StringGuts is a struct, and it holds a _StringObject as its member:

internal var _object: _StringObject

Following this lead, we open the StringObject.swift file and locate the corresponding initializer:

@inlinable @inline(__always)
  internal init(empty:()) {
    // Canonical empty pattern: small zero-length string
#if arch(i386) || arch(arm) || arch(arm64_32) || arch(wasm32)
    self.init(
      count: 0,
      variant: .immortal(0),
      discriminator: Nibbles.emptyString,
      flags: 0)
#else
    self._countAndFlagsBits = 0
    self._object = Builtin.valueToBridgeObject(Nibbles.emptyString._value)
#endif
    _internalInvariant(self.smallCount == 0)
    _invariantCheck()
  }

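In the 32-bit branch (i386, arm, arm64_32, wasm32), init(count:variant:discriminator:flags:) is called. Its parameters correspond to the stored properties that _StringObject has on those platforms. On 64-bit platforms, the initializer instead sets _countAndFlagsBits to 0 and stores Nibbles.emptyString in _object.

Now that we know the basic String data structure, let’s look at what is stored when we create a String. On 32-bit platforms, _StringObject declares these stored properties: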

  @usableFromInline
  internal var _count: Int

  @usableFromInline
  internal var _variant: Variant

  @usableFromInline
  internal var _discriminator: UInt8

  @usableFromInline
  internal var _flags: UInt16

In other words, the top-level String ultimately stores the contents of this bottom layer: String wraps _StringGuts, which in turn wraps _StringObject.

So what is Nibbles?

  // Namespace to hold magic numbers
  @usableFromInline @frozen
  enum Nibbles {}

As you can see, Nibbles is a case-less enum used purely as a namespace for magic numbers. With a little digging, we can find the members defined on it in extensions:

extension _StringObject.Nibbles {
  // The canonical empty string is an empty small string
  @inlinable @inline(__always)
  internal static var emptyString: UInt64 {
    return _StringObject.Nibbles.small(isASCII: true)
  }
}

extension _StringObject.Nibbles {
  // Discriminator for small strings
  @inlinable @inline(__always)
  internal static func small(isASCII: Bool) -> UInt64 {
    return isASCII ? 0xE000_0000_0000_0000 : 0xA000_0000_0000_0000
  }
}

As you can see, the small-string discriminator is 0xE000_0000_0000_0000 if the string is ASCII, and 0xA000_0000_0000_0000 if it is not.

Here’s an example. If we inspect the raw bits of an empty string and of a string containing Chinese characters, the top nibble of the second 8 bytes is E for the (ASCII) empty string and A for the Chinese string. So A and E indicate whether the string is ASCII. The nibble that follows stores the string’s length in UTF-8 code units.
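To reproduce this observation, here is a minimal sketch that reads the two 64-bit words of a String with unsafeBitCast. It assumes a 64-bit platform and the standard library’s current internal layout, in which _countAndFlagsBits comes first and _object second. This is an implementation detail, not public API, so treat the code as an experiment only. The "count" nibble it reads (b59:b56) is where small strings keep their length.

```swift
// Sketch only: relies on the internal, non-public String layout on 64-bit platforms.
func rawBits(of string: String) -> (countAndFlags: UInt64, object: UInt64) {
    precondition(MemoryLayout<String>.size == 16, "expects a 64-bit platform")
    // Reinterpret the 16-byte String value as two 64-bit words.
    let words = unsafeBitCast(string, to: (UInt64, UInt64).self)
    return (words.0, words.1)
}

func describeSmall(_ string: String) {
    let object = rawBits(of: string).object
    let discriminator = object >> 60          // b63:b60
    let smallCount = (object >> 56) & 0xF     // b59:b56, UTF-8 length of a small string
    print("\"\(string)\"",
          "object: 0x" + String(object, radix: 16, uppercase: true),
          "discriminator: 0x" + String(discriminator, radix: 16, uppercase: true),
          "count: \(smallCount)")
}

describeSmall("")        // empty, ASCII
describeSmall("abc")     // 3 ASCII bytes
describeSmall("中文")     // 6 UTF-8 bytes, not ASCII
```

On a typical 64-bit build, we would expect discriminator E with count 0 for the empty string, E with count 3 for "abc", and A with count 6 for "中文".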

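struct _StringObject {
  #if arch(i386) || arch(arm) || arch(arm64_32) || arch(wasm32)
    internal var _count: Int
    internal var _variant: Variant
    internal var _discriminator: UInt8
    internal var _flags: UInt16
  #else
    @usableFromInline
    internal var _countAndFlagsBits: UInt64

    @usableFromInline
    internal var _object: Builtin.BridgeObject
  #endif
}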

The discriminator occupies the top four bits (b63 to b60). Each combination means the following:

┌─────────────────────╥─────┬─────┬─────┬─────┐
│ Form                ║ b63 │ b62 │ b61 │ b60 │
╞═════════════════════╬═════╪═════╪═════╪═════╡
│ Immortal, Small     ║  1  │ASCII│  1  │  0  │
│ Immortal, Large     ║  1  │  0  │  0  │  0  │
╞═════════════════════╬═════╪═════╪═════╪═════╡
│ Native              ║  0  │  0  │  0  │  0  │
│ Shared              ║  x  │  0  │  0  │  0  │
│ Shared, Bridged     ║  0  │  1  │  0  │  0  │
╞═════════════════════╬═════╪═════╪═════╪═════╡
│ Foreign             ║  x  │  0  │  0  │  1  │
│ Foreign, Bridged    ║  0  │  1  │  0  │  1  │
└─────────────────────╨─────┴─────┴─────┴─────┘

The layout of the _object word, as documented alongside Nibbles, is as follows:

┌────────────┐
│ nativeBias │
├────────────┤
│     32     │
└────────────┘

┌───────────────┬────────────┐
│    b63:b60    │   b59:b0   │
├───────────────┼────────────┤
│ discriminator │ objectAddr │
└───────────────┴────────────┘

Native Swift strings use tail-allocated storage. Extra space is allocated in the instance beyond its last stored property, and that space can hold arbitrary data (here, the string’s bytes) directly in the instance without an additional heap allocation. Let’s check the layout against a concrete value:

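For a large string literal (one too long to be a small string), the second 8 bytes (_object) read 0x8000000100000F60. The top nibble, 0x8, is the discriminator. According to the table above, 1000 marks an immortal, large string.

Combining this with the layout above, the remaining bits (b59:b0) hold the address where the string’s bytes are stored. This address is biased by nativeBias, which is 32: the stored value is the real address minus 32. Adding 32 (0x20) to 0x100000F60 gives 0x100000F80, the actual location of the bytes. You can check this arithmetic with a calculator.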
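We can also write out the calculator step in Swift. This sketch hard-codes the example value from the text. It assumes the 64-bit layout above, with a 4-bit discriminator in b63:b60, a biased address in b59:b0, and nativeBias equal to 32.

```swift
// Decode the example _object word by hand.
// Assumptions: 4-bit discriminator in b63:b60, biased address in b59:b0,
// and nativeBias == 32, as in the layout above.
let objectBits: UInt64 = 0x8000_0001_0000_0F60   // value from the text
let nativeBias: UInt64 = 32

let discriminator = objectBits >> 60                       // 0x8
let isImmortal = discriminator & 0b1000 != 0               // b63 -> true
let isSmall = discriminator & 0b0010 != 0                  // b61 -> false
let biasedAddress = objectBits & 0x0FFF_FFFF_FFFF_FFFF     // b59:b0 -> 0x100000F60
let bytesAddress = biasedAddress + nativeBias              // undo the bias -> 0x100000F80

print("discriminator: 0x" + String(discriminator, radix: 16))
print("immortal: \(isImmortal), small: \(isSmall)")
print("string bytes at: 0x" + String(bytesAddress, radix: 16, uppercase: true))
```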

So what is stored in the first eight bytes? Let’s look at the initialization process.

Apart from the address and the discriminator bits, all that remains is _countAndFlagsBits. It is laid out as follows:

┌─────────┬───────┬──────────────────┬─────────────────┬────────┬───────┐
│   b63   │  b62  │       b61        │       b60       │ b59:48 │ b47:0 │
├─────────┼───────┼──────────────────┼─────────────────┼────────┼───────┤
│ isASCII │ isNFC │ isNativelyStored │ isTailAllocated │  TBD   │ count │
└─────────┴───────┴──────────────────┴─────────────────┴────────┴───────┘

The top flag bit is isASCII. If we change the string’s contents to Chinese characters, this bit changes from 1 to 0.
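We can check this bit with a sketch similar to the small-string one. It reads the first word of a large string and splits it into isASCII (b63) and count (b47:b0). Again, it assumes a 64-bit platform and the current, non-public layout. Both literals are hypothetical test strings, chosen to be longer than 15 UTF-8 bytes so that they are not stored as small strings.

```swift
// Sketch only: relies on the internal, non-public String layout on 64-bit platforms.
func countAndFlags(of string: String) -> UInt64 {
    precondition(MemoryLayout<String>.size == 16, "expects a 64-bit platform")
    // The first of the two words is _countAndFlagsBits.
    return unsafeBitCast(string, to: (UInt64, UInt64).self).0
}

let english = "This literal is longer than fifteen bytes"
let chinese = "这是一个超过十五个字节的中文字符串"

for text in [english, chinese] {
    let bits = countAndFlags(of: text)
    let isASCII = bits >> 63 == 1                  // b63
    let count = bits & 0x0000_FFFF_FFFF_FFFF       // b47:b0, in UTF-8 code units
    print("isASCII: \(isASCII), count: \(count), utf8.count: \(text.utf8.count)")
}
```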

Second, the Swift Index

Before we can answer questions about indexing, we first have to figure out what a Swift String represents. A sequence of characters can be encoded in various ways. The most familiar is ASCII, which defines 128 characters in total. That is enough for English, but far from enough for other languages.

As a result, different countries need their own encodings, and the same binary data could then be decoded into different characters. Is there an encoding that includes every symbol? That is Unicode, which we are all familiar with. However, Unicode only specifies the number (code point) assigned to each symbol. It does not specify how that number should be stored in binary.

Let’s say we have a string I’m Kody, and what are the Unicode equivalents

6212 is 662F K 004B O:006F D: 0064 Y: 0079Copy the code

As you can see, each of the above characters corresponds to a hexadecimal number, which can be recognized by the computer as binary, so if stored at this time, the following situation will occur

I 0110 0010 0001 0010 is 0110 0110 0010 1111 K 0000 0000 0100 1011 O 0000 0000 0110 1111 D 0000 0000 0110 0100 y 0000 0000 0111 1001Copy the code

One of the biggest features of UTF-8 is that it is a variable length encoding method. It can use 1 to 4 bytes to represent a character, varying the length of the byte depending on the symbol. Here are the utF-8 rules:

  1. A single-byte character. The first byte is set to0For English texts,UTF-8The code takes only one byte, andASCIICodes are exactly the same;
  2. nA character of one byte(n>1), before the first bytenBit is set to 1, number 1n+1Bit is set to0, the first two characters of the following bytes are set to10, thisnThe character is filled with the remaining space of four bytesunicodeCode, high position0Make up.
I 11100110 10001000 10010010 is 11100110 10011000 10101111 K 0100 1011 O 0110 1111 D 0110 0100 Y 0111 1001Copy the code

For Swift, a String is a collection of characters, which means that each element in the String is of unequal length. So that means that we have different steps when we move memory. What does that mean? For example, if we have an Array (Int), when we iterate over the elements in the Array, the offset is 8 bytes at a time because each element has the same memory size.

But for strings, for example, IF I want azimuth STR [1], do I have to walk through my field to determine the offset of is? Each iteration of the input-in sequence must be repeated to calculate the offset, which undoubtedly increases the memory consumption. That’s why we can’t access String via Int as a subscript, right

String.Index packs its information into a single 64-bit value. From the notes in the source, we can get a rough idea of what each field means:

  • position (aka encodedOffset): a 48-bit value (b63:b16) recording the index’s offset in the string’s code units
  • transcoded offset: a 2-bit value (b15:b14) recording a sub-scalar offset, used when an index must point inside a scalar after transcoding (for example, into the UTF-16 view of a UTF-8 string)
  • grapheme cache: a 6-bit value (b13:b8) remembering the distance to the next grapheme (Character) boundary
  • reserved: a 7-bit field (b7:b1) reserved for future use
  • scalar aligned: a 1-bit value (b0) recording whether the index is known to be aligned to a Unicode scalar

Moya source code analysis

To frame this, we can borrow a diagram from Moya’s official website. We deal with networking every day, whether through AFN (AFNetworking) or Alamofire. Both wrap URLSession so that we don’t have to use Apple’s tedious official API directly.

Over time, however, code tied to AFN or Alamofire ends up scattered all over the app. That makes it hard to manage centrally and leads to a lot of duplication. So we create an intermediate Network Layer to manage all use of AFN and Alamofire in one place.

At the same time, we want the app to deal only with our Network Layer, without caring which third-party networking library is used underneath. Even if we migrate to another library, the upper-level business logic should not have to change, because the business logic is coupled only to the Network Layer.

However, because the abstraction is not fine-grained enough, we often bypass the Network Layer and call the third-party library directly while writing code, which violates our design principles. Moya abstracts the networking business logic: we can make network requests simply by conforming to the relevant protocols, without worrying about the low-level details.

How was Moya built step-by-step?

Before looking at how Moya is built step by step, let’s see how it is used. First, we create a new file called test.swift to hold our network-layer logic. Then we create an enum TEST. We don’t have many request branches yet, so we leave it empty for now and make the enum conform to the TargetType protocol. Jumping to the definition of TargetType shows the basic network request data it defines.
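Moya itself is a third-party package, so the sketch below does not use it. Instead, it defines a hypothetical SimpleTargetType protocol with a few of the properties TargetType asks for (baseURL, path, method, headers). It then fills in the TEST enum with two made-up cases and a placeholder host, api.example.com, to show how one enum case ends up describing one request. The real TargetType also requires task and sampleData, and it uses Moya.Method rather than String for method.

```swift
import Foundation

// Hypothetical, simplified stand-in for Moya's TargetType. The real protocol
// also requires task and sampleData and uses Moya.Method instead of String.
protocol SimpleTargetType {
    var baseURL: URL { get }
    var path: String { get }
    var method: String { get }
    var headers: [String: String]? { get }
}

// The TEST enum from the text, with two hypothetical requests filled in.
enum TEST {
    case userProfile(id: Int)
    case search(query: String)
}

extension TEST: SimpleTargetType {
    var baseURL: URL { URL(string: "https://api.example.com")! }

    var path: String {
        switch self {
        case .userProfile(let id): return "users/\(id)"
        case .search: return "search"
        }
    }

    var method: String { "GET" }

    var headers: [String: String]? { ["Accept": "application/json"] }
}

// Each case now carries everything needed to describe one request.
let target = TEST.userProfile(id: 42)
let url = target.baseURL.appendingPathComponent(target.path)
print(target.method, url.absoluteString)
```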

Moya’s modules can be roughly divided into the following categories:

Next, Moya’s main data flow can be represented by the following diagram (Moya flow chart). Let’s analyze it step by step, starting with the first stage.

The first step is to create an enum that conforms to the TargetType protocol, which supplies the basic configuration of the network request. endpointClosure then turns it into an Endpoint. Looking at the Endpoint file, you can see that Endpoint is another layer of wrapping around TargetType. The endpointClosure code looks like this:

public typealias EndpointClosure = (Target) -> Endpoint

public let endpointClosure: EndpointClosure

// Initializer parameter with a default value:
// endpointClosure: @escaping EndpointClosure = MoyaProvider.defaultEndpointMapping

final class func defaultEndpointMapping(for target: Target) -> Endpoint {
    return Endpoint(
        url: URL(target: target).absoluteString,
        sampleResponseClosure: { .networkResponse(200, target.sampleData) },
        method: target.method,
        task: target.task,
        httpHeaderFields: target.headers
    )
}

// A custom endpointClosure for a GitHub target:
let endpointClosure = { (target: GitHub) -> Endpoint in
    Endpoint(
        url: URL(target: target).absoluteString,
        sampleResponseClosure: { .networkResponse(200, target.sampleData) },
        method: target.method,
        task: target.task,
        httpHeaderFields: target.headers
    )
}

That’s how a TargetType is converted into an Endpoint via endpointClosure.
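The key idea is that endpointClosure is a stored closure with a default value, so callers can replace the Target-to-Endpoint mapping without subclassing. Below is a self-contained sketch of that pattern. MiniEndpoint, MiniProvider, and the adding(headers:) helper are hypothetical simplifications, not Moya’s types.

```swift
// Hypothetical mini versions of Endpoint and MoyaProvider, used only to show
// the "closure with a default value" pattern behind endpointClosure.
struct MiniEndpoint {
    let url: String
    let method: String
    var headers: [String: String]

    // Returns a copy with extra headers; new values win on conflicts.
    func adding(headers extra: [String: String]) -> MiniEndpoint {
        var copy = self
        copy.headers.merge(extra) { _, new in new }
        return copy
    }
}

enum GitHub {
    case zen
    case userProfile(String)

    var path: String {
        switch self {
        case .zen: return "/zen"
        case .userProfile(let name): return "/users/\(name)"
        }
    }
}

final class MiniProvider {
    typealias EndpointClosure = (GitHub) -> MiniEndpoint
    let endpointClosure: EndpointClosure

    // Like MoyaProvider, fall back to a default mapping when no closure is given.
    init(endpointClosure: @escaping EndpointClosure = MiniProvider.defaultEndpointMapping) {
        self.endpointClosure = endpointClosure
    }

    static func defaultEndpointMapping(for target: GitHub) -> MiniEndpoint {
        MiniEndpoint(url: "https://api.github.com" + target.path, method: "GET", headers: [:])
    }
}

let defaultProvider = MiniProvider()
// Custom mapping: reuse the default, then add a (hypothetical) tracing header.
let customProvider = MiniProvider { target in
    MiniProvider.defaultEndpointMapping(for: target).adding(headers: ["X-Trace": "1"])
}
print(defaultProvider.endpointClosure(.zen).url)
print(customProvider.endpointClosure(.userProfile("kody")).headers)
```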

The next step is to pass the Endpoint to requestClosure to generate the URLRequest. This process is very similar to generating the Endpoint. Let’s take a look:

public typealias RequestResultClosure = (Result<URLRequest, MoyaError>) -> Void

public typealias RequestClosure = (Endpoint, @escaping RequestResultClosure) -> Void

public let requestClosure: RequestClosure

final class func defaultRequestMapping(for endpoint: Endpoint, closure: RequestResultClosure) {
        do {
            let urlRequest = try endpoint.urlRequest()
            closure(.success(urlRequest))
        } catch MoyaError.requestMapping(let url) {
            closure(.failure(MoyaError.requestMapping(url)))
        } catch MoyaError.parameterEncoding(let error) {
            closure(.failure(MoyaError.parameterEncoding(error)))
        } catch {
            closure(.failure(MoyaError.underlying(error, nil)))
        }
    }

The whole function uses a do-catch statement to build a URLRequest and passes a different result to the closure depending on the outcome. It first calls try endpoint.urlRequest(). If that throws, control jumps to the matching catch clause. endpoint.urlRequest() itself simply builds a URLRequest from the Endpoint properties described earlier.
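The same do-catch-to-Result pattern can be tried without Moya. In this sketch, SketchEndpoint and SketchError are hypothetical stand-ins for Endpoint and MoyaError. Its urlRequest() simply treats any string without a URL scheme as unmappable.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking   // URLRequest lives here on Linux
#endif

// Hypothetical stand-ins for Moya's MoyaError and Endpoint.
enum SketchError: Error {
    case requestMapping(String)
    case underlying(Error)
}

struct SketchEndpoint {
    let url: String
    let method: String

    // Like Endpoint.urlRequest(): build a URLRequest or throw.
    func urlRequest() throws -> URLRequest {
        // Treat any string without a URL scheme as unmappable.
        guard let parsed = URL(string: url), parsed.scheme != nil else {
            throw SketchError.requestMapping(url)
        }
        var request = URLRequest(url: parsed)
        request.httpMethod = method
        return request
    }
}

// Same shape as defaultRequestMapping: try, then report through the closure.
func requestMapping(for endpoint: SketchEndpoint,
                    closure: (Result<URLRequest, SketchError>) -> Void) {
    do {
        let urlRequest = try endpoint.urlRequest()
        closure(.success(urlRequest))
    } catch SketchError.requestMapping(let url) {
        closure(.failure(.requestMapping(url)))
    } catch {
        closure(.failure(.underlying(error)))
    }
}

var results: [Result<URLRequest, SketchError>] = []
for url in ["https://api.github.com/zen", "not a url"] {
    requestMapping(for: SketchEndpoint(url: url, method: "GET")) { results.append($0) }
}

for result in results {
    switch result {
    case .success(let request):
        print("success:", request.httpMethod ?? "-", request.url?.absoluteString ?? "-")
    case .failure(let error):
        print("failure:", error)
    }
}
```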

Once the request has been generated, it’s up to the provider to send it:

@discardableResult
    open func request(_ target: Target,
                      callbackQueue: DispatchQueue? = .none,
                      progress: ProgressBlock? = .none,
                      completion: @escaping Completion) -> Cancellable {

        let callbackQueue = callbackQueue ?? self.callbackQueue
        return requestNormal(target, callbackQueue: callbackQueue, progress: progress, completion: completion)
    }

request(_:) delegates to requestNormal, which begins like this:

let endpoint = self.endpoint(target)
let stubBehavior = self.stubClosure(target) 
let cancellableToken = CancellableWrapper()

stubBehavior relates to test stubs, which we’ll ignore for now. cancellableToken is the cancellation token:

internal class CancellableWrapper: Cancellable {
    internal var innerCancellable: Cancellable = SimpleCancellable()

    var isCancelled: Bool { innerCancellable.isCancelled }

    internal func cancel() {
        innerCancellable.cancel()
    }
}

internal class SimpleCancellable: Cancellable {
    var isCancelled = false
    func cancel() {
        isCancelled = true
    }
}

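CancellableWrapper is another layer of wrapping around SimpleCancellable, and both conform to the Cancellable protocol. innerCancellable is typed as Cancellable, so the wrapper can later hold any other Cancellable, such as the token of the request that actually gets sent. Both classes are internal, so this is an implementation detail of Moya. Next, let’s analyze the performNetworking closure step by step: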
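To see why the wrapper is useful, the sketch below reproduces Moya’s Cancellable protocol and the two classes above. It then adds a hypothetical FakeTaskToken in place of a real network task. The sketch covers two cases: cancelling before the request starts, and cancelling after the wrapper’s inner value has been swapped for the running task.

```swift
// Moya's Cancellable protocol, reproduced so the sketch compiles on its own.
protocol Cancellable {
    var isCancelled: Bool { get }
    func cancel()
}

// The two classes from the text, unchanged apart from comments.
internal class CancellableWrapper: Cancellable {
    // Starts as a placeholder; later replaced by the real request's token.
    internal var innerCancellable: Cancellable = SimpleCancellable()

    var isCancelled: Bool { innerCancellable.isCancelled }

    internal func cancel() {
        innerCancellable.cancel()
    }
}

internal class SimpleCancellable: Cancellable {
    var isCancelled = false
    func cancel() {
        isCancelled = true
    }
}

// Hypothetical stand-in for the token a real network task would return.
final class FakeTaskToken: Cancellable {
    private(set) var isCancelled = false
    func cancel() { isCancelled = true }
}

// Case 1: cancelled before the request is built. The placeholder records it,
// so performNetworking's isCancelled check can bail out early.
let early = CancellableWrapper()
early.cancel()
print(early.isCancelled)

// Case 2: the request is running; cancelling the wrapper forwards to the task.
let running = CancellableWrapper()
let task = FakeTaskToken()
running.innerCancellable = task   // mirrors cancellableToken.innerCancellable = self.performRequest(...)
running.cancel()
print(task.isCancelled)
```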

if cancellableToken.isCancelled { 
    self.cancelCompletion(pluginsWithCompletion, target: target) 
    return
}

If the request has been cancelled, the cancellation callback is invoked and the closure returns immediately, so the rest of the closure does not run. Otherwise, the request result is unpacked and the request is sent:

var request: URLRequest!

switch requestResult {
case .success(let urlRequest):
    request = urlRequest
case .failure(let error):
    pluginsWithCompletion(.failure(error))
    return
}

cancellableToken.innerCancellable = self.performRequest(target, request: request, callbackQueue: callbackQueue, progress: progress, completion: networkCompletion, endpoint: endpoint, stubBehavior: stubBehavior)

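Finally, requestClosure is executed, with performNetworking passed as its completion closure:

requestClosure(endpoint, performNetworking)

With the default mapping, requestClosure is equivalent to the following closure: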

{(endpoint:Endpoint, closure:RequestResultClosure) in 
    do {
        let urlRequest = try endpoint.urlRequest()
        closure(.success(urlRequest))
    } catch MoyaError.requestMapping(let url) {
        closure(.failure(MoyaError.requestMapping(url))) 
    } catch MoyaError.parameterEncoding(let error) {
        closure(.failure(MoyaError.parameterEncoding(error))) 
    } catch {
        closure(.failure(MoyaError.underlying(error, nil))) 
    }
}
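Putting the last few steps together: requestClosure builds the request and then calls performNetworking with a Result, and performNetworking first checks the cancellation flag. The sketch below models that continuation-passing flow, with strings standing in for Endpoint and URLRequest. All names are hypothetical and simplified from the code above.

```swift
// Hypothetical model of requestNormal's flow; strings stand in for
// Endpoint and URLRequest so it runs without Moya.
enum SketchError: Error {
    case requestMapping(String)
}

typealias RequestResultClosure = (Result<String, SketchError>) -> Void
typealias RequestClosure = (String, @escaping RequestResultClosure) -> Void

// Plays the role of the default requestClosure: build a "request" or fail.
let requestClosure: RequestClosure = { endpoint, closure in
    if endpoint.hasPrefix("https://") {
        closure(.success("GET " + endpoint))
    } else {
        closure(.failure(.requestMapping(endpoint)))
    }
}

var isCancelled = false
var events: [String] = []

// Plays the role of performNetworking: it runs once the request is ready.
let performNetworking: RequestResultClosure = { requestResult in
    if isCancelled {                       // cancelled in the meantime: stop here
        events.append("cancelled")
        return
    }
    switch requestResult {
    case .success(let request):
        events.append("send " + request)   // real code calls performRequest here
    case .failure(let error):
        events.append("failed: \(error)")  // real code reports to the completion
    }
}

requestClosure("https://api.example.com/zen", performNetworking)
requestClosure("not a url", performNetworking)
isCancelled = true
requestClosure("https://api.example.com/zen", performNetworking)
print(events)
```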

Higher-order functions

A higher-order function is a function with at least one of the following two characteristics:

  • Accept functions or closures as arguments
  • The return value is either a function or a closure
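Here is a small Swift example showing both characteristics. applyTwice and makeMultiplier are just illustrative names:

```swift
// Accepts a function as an argument.
func applyTwice(_ f: (Int) -> Int, to value: Int) -> Int {
    return f(f(value))
}

// Returns a closure that captures `factor`.
func makeMultiplier(_ factor: Int) -> (Int) -> Int {
    return { $0 * factor }
}

let triple = makeMultiplier(3)
print(applyTwice(triple, to: 2))   // 18
```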

The Map function

The map function applies a closure to each element of a Collection (or any Sequence) and returns a new array containing the results.
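For example, map can transform each element into a value of the same type or of a different type. On a Dictionary, the result is still an Array:

```swift
let numbers = [1, 2, 3, 4]
let doubled = numbers.map { $0 * 2 }          // [2, 4, 6, 8]
let labels = numbers.map { "No.\($0)" }       // ["No.1", "No.2", "No.3", "No.4"]

// map on a Dictionary also produces an Array (one entry here, so order is fixed).
let ages = ["Hank": 30]
let descriptions = ages.map { "\($0.key) is \($0.value)" }   // ["Hank is 30"]
print(doubled, labels, descriptions)
```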

FlatMap function

Let’s first look at the definition of flatMap:

public func flatMap<SegmentOfResult: Sequence>(
  _ transform: (Element) throws -> SegmentOfResult
) rethrows -> [SegmentOfResult.Element]

The closure passed to flatMap takes an element of the Sequence but returns a SegmentOfResult, and the generic constraint requires SegmentOfResult to be a Sequence itself. flatMap then returns an array of SegmentOfResult.Element. So the difference from map lies in the return value. flatMap "flattens" the sequences returned by the closure into one array of their elements, whereas map returns an array of whatever the closure returns.

Compared with map, flatMap was historically used for two things: flattening nested sequences and filtering out nil values. Since Swift 4.1, the nil-filtering role belongs to compactMap.

Here’s another example, this time on an Optional. If the closure passed to map itself returns an Optional, result becomes a nested optional, so we have more to handle whenever we use result.

With flatMap, we get a single-level optional instead of a nested one.

Let’s look at the source code

When flatMap is called on an Optional, the closure it receives already returns an Optional. flatMap returns that value as-is instead of wrapping it again, so the result stays a single-level optional. In essence, compared with map, flatMap removes one layer of Optional wrapping.
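The standard library’s Optional.flatMap essentially boils down to a switch on the optional. The hypothetical myMap and myFlatMap extension below mirrors that behavior side by side. This makes the only difference easy to spot: whether the closure’s result is wrapped again.

```swift
// Hypothetical re-implementations mirroring Optional.map and Optional.flatMap.
extension Optional {
    func myMap<U>(_ transform: (Wrapped) throws -> U) rethrows -> U? {
        switch self {
        case .some(let value): return .some(try transform(value))   // wraps the result again
        case .none: return .none
        }
    }

    func myFlatMap<U>(_ transform: (Wrapped) throws -> U?) rethrows -> U? {
        switch self {
        case .some(let value): return try transform(value)          // no extra wrapping
        case .none: return .none
        }
    }
}

let input: String? = "42"
let mapped = input.myMap { Int($0) }          // Int??
let flatMapped = input.myFlatMap { Int($0) }  // Int?
print(type(of: mapped), type(of: flatMapped)) // Optional<Optional<Int>> Optional<Int>
```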

What does it mean that flatMap needs no extra unwrapping in chained calls? Let’s start by looking at how we chain calls with map:

What we get here is an optional of an optional, which we still have to unwrap along the way.
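Here is a sketch of such a chain. It parses a string and then doubles the number if it is positive, an arbitrary rule chosen for illustration. With map the nesting grows, while with flatMap it stays at one level:

```swift
let raw: String? = "8"

// Chaining map: the second step receives an Int? and must unwrap it itself,
// and the overall result stays nested.
let viaMap = raw
    .map { Int($0) }                        // Int??
    .map { inner in inner.map { $0 * 2 } }  // still Int??

// Chaining flatMap: each step returns an Int?, and flatMap keeps one level.
let viaFlatMap = raw
    .flatMap { Int($0) }                                       // Int?
    .flatMap { value -> Int? in value > 0 ? value * 2 : nil }  // Int?

print(viaMap as Any, viaFlatMap as Any)   // Optional(Optional(16)) Optional(16)
```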

When to use compactMap

Use compactMap when the conversion closure returns an optional value and you expect the result to be a sequence of non-optional values.

let nested = [[1, 2, 3], [4, 5]]

let mapped = nested.map { $0 }
// [[1, 2, 3], [4, 5]]

let flattened = nested.flatMap { $0 }
// [1, 2, 3, 4, 5]

let withNils = [1, 2, 3, nil, nil, 4, 5]

let compacted = withNils.compactMap { $0 }
// [1, 2, 3, 4, 5]

When to use flatMap

Use flatMap when the transformation closure returns a sequence or collection for each element and you want a one-dimensional array as the result.

let scoresByName = ["Hank": [0, 5, 8], "kody": [2, 5, 8]]

let mapped = scoresByName.map { $0.value }
// [[0, 5, 8], [2, 5, 8]] - an array of arrays (order may vary: Dictionary is unordered)
print(mapped)

let flatMapped = scoresByName.flatMap { $0.value }
// [0, 5, 8, 2, 5, 8] - flattened into a single array (order may vary)

CompactMap function

compactMap was introduced in Swift 4.1 (SE-0187) to replace the overload of Sequence.flatMap whose closure returns an Optional, and that overload is now deprecated. To summarize: compactMap unwraps optional results and drops the nils, while flatMap flattens the sequences returned by the closure into a one-dimensional array.

The Reduce function

To better understand how reduce works, let’s try to implement map, flatMap, and filter with it. Here is map:

func customMap(collection: [Int], transform: (Int) -> Int) -> [Int] { 
    return collection.reduce([Int]()){
        var arr: [Int] = $0
        arr.append(transform($1)) 
        return arr
    } 
}

let result = customMap(collection: [1, 2, 3, 4, 5]) { 
    $0 * 2
}
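Following the same pattern as customMap, we can also sketch flatMap and filter with reduce. customFlatMap and customFilter are illustrative names. The second one uses reduce(into:), which mutates a single accumulator in place instead of building a new array at each step.

```swift
// flatMap via reduce: append the whole array returned for each element.
func customFlatMap(collection: [Int], transform: (Int) -> [Int]) -> [Int] {
    return collection.reduce([Int]()) { $0 + transform($1) }
}

// filter via reduce(into:): keep an element only if the predicate passes.
func customFilter(collection: [Int], isIncluded: (Int) -> Bool) -> [Int] {
    return collection.reduce(into: [Int]()) { result, element in
        if isIncluded(element) { result.append(element) }
    }
}

let repeated = customFlatMap(collection: [1, 2, 3]) { [$0, $0] }   // [1, 1, 2, 2, 3, 3]
let evens = customFilter(collection: [1, 2, 3, 4, 5]) { $0 % 2 == 0 }   // [2, 4]
print(repeated, evens)
```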

How do we find the maximum value in an array?

let result = [1, 2, 3, 4, 5].reduce(Int.min) {
    // Starting from Int.min (not 0) keeps this correct for all-negative arrays
    return $0 < $1 ? $1 : $0
}
print(result)

Or how can we reverse an array with reduce?

let result = [1, 2, 3, 4, 5].reduce([Int]()){ 
    return [$1] + $0
} 
print(result)