Preface

Learning is like rowing upstream; not to advance is to drop back. Today I’m going to share a preliminary article about concurrency in Swift.


Some basic concepts

Synchronous and asynchronous

When we talk about thread execution, synchronous and asynchronous are the most basic pair of concepts in this topic. A synchronous operation means that the thread running the operation is occupied until the function finally throws or returns. Before Swift 5.5, all functions were synchronous functions, and we simply declared such a synchronous function using the func keyword:

var results: [String] = []
func addAppending(_ value: String, to string: String) {
    results.append(value.appending(string))
}

addAppending is a synchronous function. Until it returns, the thread running it cannot perform any other operation: it cannot be used to run any other function, and it must wait for the current function to complete before it can do anything else.

In iOS development, the UI frameworks we use, namely UIKit and SwiftUI, are not thread-safe: processing of user input and drawing of the UI must be done in the main RunLoop tied to the main thread. Assuming we want the user interface to run at 60 frames per second, the maximum processing time allowed between two draws on the main thread is about 16 milliseconds (1/60 s). When the operations executed synchronously on the main thread take very little time (like our addAppending, which can take as little as a few tens of nanoseconds), this poses no problem. However, if such a synchronous operation takes too long, the main thread will be blocked. It can’t accept user input, nor can it submit requests to the GPU to draw a new UI, which causes the UI to drop frames or even freeze. Such “time-consuming” operations are common: fetching data from a network request, loading a large file from disk, or performing some very complex encryption or decryption operation.

The following loadSignature reads a string from a network URL. If this operation happens on the main thread and takes more than 16 ms (which is likely, because establishing a network connection through a handshake protocol and receiving data is a complex set of operations), the main thread cannot handle any other operations, and the UI will not refresh.

func loadSignature() throws -> String? {
    // someURL is a remote URL, such as https://example.com
    let data = try Data(contentsOf: someURL)
    return String(data: data, encoding: .utf8)
}

If loadSignature ends up taking more than 16 ms, the processing of UI refreshes and user interactions has to be delayed. To the user, this shows up as dropped frames or a frozen interface. This is one of the things you should absolutely avoid in client development.

Prior to Swift 5.5, the most common way to solve this problem was to convert time-consuming synchronous operations to asynchronous operations by putting the actual long-running task on another thread (or background thread) and then providing a callback running on the main thread for UI operations at the end of the operation:

func loadSignature(
    _ completion: @escaping (String?, Error?) -> Void
) {
    DispatchQueue.global().async {
        do {
            let d = try Data(contentsOf: someURL)
            DispatchQueue.main.async {
                completion(String(data: d, encoding: .utf8), nil)
            }
        } catch {
            DispatchQueue.main.async {
                completion(nil, error)
            }
        }
    }
}

DispatchQueue.global adds the task to a global background dispatch queue. Under the hood, the GCD library (Grand Central Dispatch) performs thread scheduling and allocates an appropriate thread for the actually time-consuming Data.init(contentsOf:). After the time-consuming task completes outside the main thread, DispatchQueue.main dispatches back to the main thread and calls the completion callback with the result. This way, the main thread is no longer responsible for the time-consuming task, and UI refreshes and user event handling can proceed smoothly.

Asynchronous operation can avoid lag, but there are many problems in use, including:

  • Error handling is hidden in the arguments of the callback function; we cannot use throw to explicitly tell the calling side an error occurred and force it to handle it.
  • Calls to the callback function are not guaranteed by the compiler; developers may forget to call completion, or call completion more than once.
  • Scheduling threads through DispatchQueue quickly complicates the code. Without reading the source carefully, it is almost impossible to determine which thread the code is currently running on, especially when thread-scheduling operations are hidden inside the called methods.
  • There is no good cancellation mechanism for executing tasks.

In addition, there are other problems that are not listed. All of these can be breeding grounds for potential bugs in our programs, and we’ll revisit this example and explore these issues in detail in a later section on asynchronous functions.

It should be noted that although we call the data loading that runs on a background thread an asynchronous operation, the loadSignature(_:) method, which takes a callback function as a parameter, is itself still a synchronous function. It still occupies the main thread until it returns, but its execution time is now very short, so UI-related operations are no longer affected.

Before Swift 5.5, there was no real concept of asynchronous functions in the Swift language, and we’ll see how asynchronous functions decorated with async simplify the above code later.

Serial and parallel

Another important pair of concepts is serial and parallel. Synchronous operations performed through synchronous methods must occur serially on the same thread. “Do one thing, then move on to the next” is the way we humans most commonly understand code execution:

if let signature = try loadSignature() {
  addAppending(signature, to: "some data")
}
print(results)

loadSignature, addAppending, and print are called one after another and occur in strict order on the same thread. This manner of execution is called serial.

Synchronous operations performed by synchronous methods are a sufficient but not necessary condition for serial execution: asynchronous operations may also be executed serially. Assume that in addition to loadSignature(_:), we have a function that reads a series of data from a database; it uses a similar approach, dispatching the actual work to another thread for asynchronous execution:

func loadFromDatabase(
    _ completion: @escaping ([String]?, Error?) -> Void
) {
    // ...
}

If we first read data from the database, then use loadSignature to fetch signatures from the network after completion, and finally append signatures to each string retrieved from the database, we could write:

loadFromDatabase { (strings, error) in
  if let strings = strings {
    loadSignature { signature, error in
      if let signature = signature {
        strings.forEach {
          addAppending(signature, to: $0)
        }
      } else {
        print("Error")
      }
    }
  } else {
    print("Error")
  }
}

Although these operations are asynchronous, they (reading [String] from the database, downloading the signature from the network, and finally adding the signature to each piece of data) are still serial, loading the signature must occur after reading the database. The final addAppending must also occur after loadSignature:

Although loadFromDatabase and loadSignature are drawn in the same thread in the figure, they may actually be executed on different threads. In the code above, however, their execution order remains strictly guaranteed.

In fact, there is no dependency between loadFromDatabase and loadSignature, although the final addAppending task requires both raw data and a signature to proceed. If they work together, our program has a good chance of getting faster. At this point, we need more threads to perform both operations simultaneously:

// Before: nested, executed serially
// loadFromDatabase { (strings, error) in
//   // ...
//   loadSignature { signature, error in
//     // ...
//   }
// }

// After: started side by side
loadFromDatabase { (strings, error) in
  // ...
}
loadSignature { signature, error in
  // ...
}

To make sure both the content loaded from the database and the signature downloaded from the network are ready when addAppending executes, we need some way to synchronize them. In GCD, this is usually done with DispatchGroup or DispatchSemaphore; but this is not a book about GCD, so we will skip the details.
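For reference, a minimal sketch (not from the original text) of how DispatchGroup could coordinate the two loads, assuming the callback-based loadFromDatabase and loadSignature signatures shown earlier:

```swift
import Foundation

// A sketch, assuming the callback-based loadFromDatabase and
// loadSignature shown above exist with those signatures.
func loadBoth(_ completion: @escaping ([String]?, String?) -> Void) {
    let group = DispatchGroup()
    var strings: [String]?
    var signature: String?

    group.enter()
    loadFromDatabase { result, _ in
        strings = result
        group.leave()    // every enter() must be balanced by hand
    }

    group.enter()
    loadSignature { result, _ in
        signature = result
        group.leave()
    }

    // Runs once both leave() calls have happened.
    group.notify(queue: .main) {
        completion(strings, signature)
    }
}
```

Notice how easy it would be to forget one of the enter/leave pairs; this is exactly the kind of manual bookkeeping that Swift concurrency is designed to remove.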

Both load methods start working at the same time; in theory, given sufficient resources (enough CPU cores, network bandwidth, etc.), the total time taken will now be less than the sum of the two serial executions:

At this point, the two asynchronous operations loadFromDatabase and loadSignature execute simultaneously on different threads. This approach, in which multiple sets of resources execute at the same time, is what we call parallel.

What is Swift concurrency

With those basic concepts in mind, let’s finally talk about the term concurrency. In computer science, concurrency refers to the property of multiple computations executing at overlapping times: what matters is mainly that the start and end times of several operations overlap. It doesn’t care how they are executed: we can call it concurrency when multiple operations run alternately on the same thread (which requires that those operations can be temporarily suspended), executing in a time-sharing fashion; we can also call tasks running on different processor cores concurrent, in which case they happen to also be parallel.

And when Apple defines what “Swift concurrency” is, it’s not that different from the classic computer science definition above. Swift’s official documentation explains this:

Swift provides built-in support for developers to write asynchronous and parallel code in a structured way… The term concurrency refers to the common combination of asynchrony and parallelism.

So when referring to Swift concurrency, it refers to a combination of asynchronous and parallel code. This is semantically a subset of traditional concurrency: it limits concurrency to asynchronous code, a limitation that makes it easier to understand concurrency. In this book, unless otherwise specified, we refer to Swift concurrency in the simplified sense of “combinations of asynchronous and parallel code,” or specifically to the syntax and framework for dealing with concurrency introduced in Swift 5.5.

Apart from the slightly different definition, Swift concurrency faces almost exactly the same challenges as other programming languages when dealing with the same problems. From Edsger W. Dijkstra’s conception of the semaphore, to Tony Hoare’s use of CSP to describe and attempt to solve the dining philosophers problem, to the actor model and the channel model, the biggest difficulties of concurrent programming, and the problems these tools need to solve, come down to roughly two:

  1. How do you ensure that interactions or communications between different operation steps happen in the correct order?
  2. How do you ensure that computing resources are safely shared, accessed, and passed between operations?

The first problem concerns the logical correctness of concurrency; the second concerns the memory safety of concurrency. In the past, developers often needed a great deal of experience to write concurrent code with GCD; otherwise it was difficult to get these issues right. Swift 5.5 first designs the notation for asynchronous functions; on that basis, it uses structured concurrency to ensure the correct interaction and communication between computation steps, and uses the actor model to ensure that shared computing resources can be accessed and operated on correctly in isolation. Together, they provide a set of tools that allow developers to easily write stable and efficient concurrent code. We’ll take a brief look at each of these parts here and explore each topic in depth in later chapters.

Dijkstra also published the famous “Go To Statement Considered Harmful” and, together with Hoare, promoted the development of structured programming. Hoare later also argued against null references, which eventually led to the Optional (also known as Maybe, or null-safety) designs common in modern languages. Without them, we would still be writing code full of endless goto statements and null checks today, and things would be a lot harder.

Asynchronous functions

To solve these two problems more easily and elegantly, Swift needs to introduce new tools at the language level: the first step is to add the concept of asynchronous functions. To declare a function as an asynchronous function, add the async keyword to the return arrow of the function declaration:

func loadSignature() async throws -> String {
  fatalError("Not implemented")
}

The async keyword for asynchronous functions helps the compiler ensure two things:

  1. It allows us to use the await keyword inside the function body;
  2. It requires callers of the function to also use the await keyword when calling it.

This is somewhat similar to the throws keyword, which appears in a similar position. When using throws, it allows us to throw an error inside the function with throw, and requires the caller to handle the possible error with try. async likewise requires marking in the corresponding situations, clearly signaling to the developer that the function has a special property: try/throw means the function can throw an error, while await means the function may give up the current thread at that point; it is a potential suspension point of the program.

Being able to give up the thread means that an asynchronous method can be “paused” while its thread is used to execute other code. If that thread is the main thread, the interface will not freeze. The awaited operation is scheduled by the underlying mechanism onto an appropriate thread; once it completes, the earlier “pause” ends, and the asynchronous method resumes execution from after the await statement.
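As a toy illustration (not from the original text; Task.sleep merely stands in for real work), a suspension point looks like this:

```swift
import Foundation

func waitAndGreet() async {
    print("before await")   // runs on whatever thread called us
    // Suspension point: the function "pauses" here, and its thread is
    // free to run other code while the sleep is in progress.
    try? await Task.sleep(nanoseconds: NSEC_PER_SEC)
    // Execution resumes here, possibly on a different thread.
    print("after await")
}
```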

We’ll cover asynchronous function design and more in depth in a later section. Here, let’s look at the use of a simple asynchronous function. The Foundation framework already provides asynchronous functions: for example, loading data from a URL with URLSession now has an asynchronous version as well. Inside an asynchronous function marked with async, we can call other asynchronous functions:

func loadSignature() async throws -> String? {
  let (data, _) = try await URLSession.shared.data(from: someURL)
  return String(data: data, encoding: .utf8)
}

Some of these asynchronous functions in Foundation, AppKit, or UIKit are rewritten or newly added, but more often they are converted from the corresponding Objective-C interfaces. Objective-C methods that meet certain conditions can be automatically bridged to Swift asynchronous functions, which is very convenient. We’ll talk more about that in a later chapter.

If we also write loadFromDatabase as an asynchronous function, then the nested asynchronous-operation code from the serial section above:

loadFromDatabase { (strings, error) in
  if let strings = strings {
    loadSignature { signature, error in
      if let signature = signature {
        strings.forEach {
          addAppending(signature, to: $0)
        }
      } else {
        print("Error")
      }
    }
  } else {
    print("Error")
  }
}

can be written very simply like this:

let strings = try await loadFromDatabase()
if let signature = try await loadSignature() {
  strings.forEach {
    addAppending(signature, to: $0)
  }
} else {
  throw NoSignatureError()
}

Needless to say, the line count alone shows the advantage. Asynchronous functions greatly simplify writing asynchronous operations: they avoid nested callbacks, and asynchronous operations can be written in sequential order as if they executed “synchronously”. In addition, this notation lets us handle errors with the familiar try/throw pattern, and the compiler guarantees every return path is handled, instead of us having to manually check, as with callbacks, whether every path was covered.
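For instance, a caller can wrap the sequential version in an ordinary do/catch; this is a sketch assuming the async loadFromDatabase, loadSignature, and the NoSignatureError type from the examples above:

```swift
func run() async {
    do {
        let strings = try await loadFromDatabase()
        if let signature = try await loadSignature() {
            strings.forEach { addAppending(signature, to: $0) }
        } else {
            throw NoSignatureError()
        }
    } catch {
        // The compiler guarantees every throwing path lands here;
        // there is no completion closure we might forget to call.
        print("Error: \(error)")
    }
}
```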

Structured concurrency

For synchronous functions, the thread determines the execution environment. For asynchronous functions, the execution environment is determined by the task (Task). Swift provides a set of Task-related APIs that let developers create, organize, inspect, and cancel tasks. These APIs build each set of concurrent tasks into a structured task tree around the core Task type:

  • A task has its own priority and cancellation flag; it can have several child tasks and execute asynchronous functions in them.
  • When a parent task is cancelled, its cancellation flag is set and propagated down to all of its child tasks.
  • Whether it completes normally or throws an error, a child task reports its result upward to its parent task; a parent task does not complete until all of its child tasks have completed (normally or by throwing).

These features look similar to the Operation class, but Task can be expressed much more succinctly by directly using the syntax of asynchronous functions, whereas Operation relies on subclassing or closures.

When calling an asynchronous function, you need to prefix it with the await keyword; on the other hand, only inside an asynchronous function can we use await. So the question is: where does the execution context of the very first asynchronous function, the root of the task tree, come from?

Simply using Task.init gives us an execution context for a task; it accepts a closure marked async:

	
struct Task<Success, Failure> where Failure : Error {
  init(
    priority: TaskPriority? = nil,
    operation: @escaping @Sendable () async throws -> Success
  )
}

It inherits features such as the priority of the current task context and creates a new task root node where we can use asynchronous functions:

var results: [String] = []

func someSyncMethod() {
  Task {
    try await processFromScratch()
    print("Done: \(results)")
  }
}

func processFromScratch() async throws {
  let strings = try await loadFromDatabase()
  if let signature = try await loadSignature() {
    strings.forEach {
      results.append($0.appending(signature))
    }
  } else {
    throw NoSignatureError()
  }
}

Note that processing in processFromScratch is still serial: await loadFromDatabase causes the asynchronous function to pause there until the actual operation completes; only then is loadSignature executed:

We would certainly prefer the two operations to run at the same time, and only after both are ready call appending to actually attach the signature to the data. This requires organizing the tasks in a structured way; one option is an async let binding:

	
func processFromScratch() async throws {
  async let loadStrings = loadFromDatabase()
  async let loadSignature = loadSignature()

  results = []

  let strings = try await loadStrings
  if let signature = try await loadSignature {
    strings.forEach {
      addAppending(signature, to: $0)
    }
  } else {
    throw NoSignatureError()
  }
}

async let, called an asynchronous binding, creates a new child task in the context of the current task and uses it as the execution environment of the bound asynchronous function (that is, the expression on the right-hand side of the async let). Unlike Task.init, which creates a root node of a task tree, async let creates child tasks, the leaf nodes of the tree. An asynchronously bound operation starts executing immediately; even if it finishes before the await, its result can still wait to be collected at the await statement. In the example above, loadFromDatabase and loadSignature execute concurrently.
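To make the “starts immediately” point concrete, here is a small sketch using the async loadFromDatabase and loadSignature from above; both child tasks are already running while any intermediate work executes:

```swift
func demo() async throws {
    // Both child tasks begin executing right away, in parallel.
    async let strings = loadFromDatabase()    // child task 1
    async let signature = loadSignature()     // child task 2

    // ... other work can happen here while both tasks run ...

    // We only suspend when we actually need the values.
    let (s, sig) = try await (strings, signature)
    print(s.count, sig ?? "no signature")
}
```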

Compared with GCD-dispatched concurrency, Task-based structured concurrency has unique advantages in controlling concurrent behavior. To demonstrate this, let’s make things a little more complicated. processFromScratch above loads data locally, fetches the signature from the network, and finally attaches the signature to each piece of data. Suppose we have probably done all this before and the finished results are already stored on a server; then, while performing the local computation, we also have the chance to try loading the results directly as an “optimized path”, avoiding repeating the local work. Similarly, an asynchronous function can represent this “load results directly from the network” operation:

func loadResultRemotely() async throws {
  // Simulate time spent on network loading
  await Task.sleep(2 * NSEC_PER_SEC)
  results = ["data1^sig", "data2^sig", "data3^sig"]
}

Besides async let, the other way to create structured concurrency is to use task groups. For example, if we want processFromScratch and loadResultRemotely to run simultaneously, we can put the two operations in the same task group with withThrowingTaskGroup:

	
func someSyncMethod() {
  Task {
    await withThrowingTaskGroup(of: Void.self) { group in
      group.addTask {
        try await self.loadResultRemotely()
      }
      group.addTask(priority: .low) {
        try await self.processFromScratch()
      }
    }
    print("Done: \(results)")
  }
}

For processFromScratch, we give it a special priority of.low, which causes the task to be scheduled in another low-priority thread. We’ll see the effect of this in a moment.

withThrowingTaskGroup and its non-throwing counterpart withTaskGroup provide the other way of organizing structured concurrency. When the number of subtasks is only known at run time, or when we need to set different priorities for different subtasks, we have to use a task group. In most other cases, async let and task groups can be mixed, or even used interchangeably:

The group in the closure conforms to the AsyncSequence protocol, which lets us access the results of the asynchronous operations in a synchronous-looking loop using for await notation. Also, by calling cancelAll on the group, we can mark the tasks as cancelled at an appropriate time: for example, when loadResultRemotely returns quickly, we could cancel the in-progress processFromScratch to save computing resources. Asynchronous sequences and task cancellation are topics we will explore later in dedicated sections.
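A sketch combining both ideas, assuming two hypothetical throwing asynchronous functions remoteCount and localCount that each return an Int:

```swift
func firstFinishedWins() async throws -> Int? {
    try await withThrowingTaskGroup(of: Int.self) { group in
        group.addTask { try await remoteCount() }   // hypothetical
        group.addTask { try await localCount() }    // hypothetical

        // The group is an AsyncSequence; next() yields the result of
        // whichever child task finishes first.
        let first = try await group.next()

        // We no longer need the slower task's result, so cancel it.
        group.cancelAll()
        return first
    }
}
```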

Actor models and data isolation

In processFromScratch, we set results to [], then process each piece of data and add the result to results:

	
func processFromScratch() async throws {
  // ...
  results = []
  strings.forEach {
    addAppending(signature, to: $0)
  }
  // ...
}

In loadResultRemotely, we instead assign to results directly:

func loadResultRemotely() async throws {
  await Task.sleep(2 * NSEC_PER_SEC)
  results = ["data1^sig", "data2^sig", "data3^sig"]
}

So in general, we would expect that regardless of the order in which processFromScratch and loadResultRemotely complete, we should always end up with the deterministic result ["data1^sig", "data2^sig", "data3^sig"]. In fact, though, if we adjust loadResultRemotely’s Task.sleep duration to roughly match the time processFromScratch takes, we may see unexpected output: besides correctly printing three elements, it sometimes prints six:

["data1^sig", "data2^sig", "data3^sig", "data1^sig", "data2^sig", "data3^sig"]

Because we gave the two tasks different priorities in addTask, their code runs on different dispatch threads. The two asynchronous operations access results from different threads at overlapping times, causing a data race. The result above can be read as: processFromScratch first set results to an empty array, loadResultRemotely then assigned the correct result, and finally the forEach in processFromScratch appended the three locally computed signatures.

That’s probably not what we want. Fortunately, the two operations here did not actually modify the memory of results at exactly the same moment; their writes were still ordered, so only the final data is a bit odd.

processFromScratch and loadResultRemotely operate on the variable results from different task contexts. Because the two operations execute concurrently, something even worse can happen: they may modify results at the same moment. If the underlying storage of results is mutated by multiple operations simultaneously, we get a runtime error. For example (contrived, but illustrative), simply increasing the number of times someSyncMethod runs easily crashes the program:

for _ in 0 ..< 10000 {
    someSyncMethod()
}

// Runtime crash:
// Thread 10: EXC_BAD_ACCESS (code=1, address=0x55a8fdbc060c)

To ensure that a resource (here, the memory results points to) is safely shared and accessed between operations, the usual past practice was to place the related code on a serial dispatch queue and access the resource through synchronous dispatch, preventing multiple threads from touching it at once. Following this idea, we can refactor a little, moving results into a new Holder type protected by a private DispatchQueue:

class Holder {
    private let queue = DispatchQueue(label: "resultholder.queue")
    private var results: [String] = []
    
    func getResults() -> [String] {
        queue.sync { results }
    }
    
    func setResults(_ results: [String]) {
        queue.sync { self.results = results }
    }
    
    func append(_ value: String) {
        queue.sync { self.results.append(value) }
    }
}

Next, the run-time crash can be resolved by replacing the original use of Results: [String] in the code with Holder and replacing the original direct operations on Results with exposed methods.

// var results: [String] = []
var holder = Holder()

// ...
// results = []
holder.setResults([])

// results.append(data.appending(signature))
holder.append(data.appending(signature))

// print("Done: \(results)")
print("Done: \(holder.getResults())")

This pattern is very common when using GCD for concurrent operations. But it has some problems that are hard to ignore:

  1. Verbose, error-prone boilerplate: every operation involving results must go through queue.sync, and the compiler offers no guarantee. If we forget the queue somewhere, the compiler stays silent and memory is still at risk. Code complexity explodes as the number of resources to protect grows.
  2. Beware of deadlocks: calling queue.sync inside another queue.sync on the same queue deadlocks the thread. This is easy to avoid when the code is simple, but as complexity grows it becomes hard to know which queue the code is currently dispatched on and which thread it is running on; careful design is needed to avoid repeated dispatching.

To some extent, using async dispatch instead of sync can alleviate the deadlock problem; or we can abandon queues altogether and use locks instead (such as NSLock or NSRecursiveLock). Either way, it requires a deep understanding of thread scheduling and the shared-memory data model; otherwise it is very easy to fall into pitfalls.
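For comparison, a lock-based variant of Holder (a sketch, not a recommendation): it avoids queue hops but still depends on the developer never forgetting to take the lock:

```swift
import Foundation

final class LockedHolder {
    private let lock = NSLock()
    private var results: [String] = []

    func setResults(_ results: [String]) {
        lock.lock()
        defer { lock.unlock() }   // defer releases even on early exit
        self.results = results
    }

    func append(_ value: String) {
        lock.lock()
        defer { lock.unlock() }
        results.append(value)
    }

    func getResults() -> [String] {
        lock.lock()
        defer { lock.unlock() }
        return results
    }
}
```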

Swift concurrency addresses these issues by introducing a new data-sharing model, the actor model, which has been proven in industry many times. The simplest, if somewhat imprecise, way to think of an actor is as a class that “encapsulates a private queue”. Changing Holder from class to actor and removing the queue-related parts gives us an actor type. This type is similar to a class: it has reference semantics, and properties and methods are defined the same way as in an ordinary class:

actor Holder {
  var results: [String] = []
  func setResults(_ results: [String]) {
    self.results = results
  }
    
  func append(_ value: String) {
    results.append(value)
  }
}

This “automatic transmission” actor implementation is much cleaner than the “manual transmission” class protected by a private queue. Actors provide an isolation domain internally: access to an actor’s own stored properties or other methods, such as using results inside the append(_:) function, is unrestricted and automatically isolated within the conceptually encapsulated “private queue”. But when an actor’s members are accessed from outside, the compiler requires switching into the actor’s isolation domain to guarantee data safety. When such a request occurs, the currently running code may be suspended. The compiler automatically converts cross-isolation-domain function calls into asynchronous calls and requires us to invoke them with await.

Actors don’t actually hold a private queue in the underlying implementation, but for now, you can simply understand it that way. We’ll explore this in more depth later in the book.

When we convert the Holder from class to actor, the original call to the Holder also needs to be updated. In simple terms, when accessing the related member, add await:

// holder.setResults([])
await holder.setResults([])

// holder.append(data.appending(signature))
await holder.append(data.appending(signature))

// print("Done: \(holder.getResults())")
print("Done: \(await holder.results)")

Accessing holder in a concurrent environment no longer crashes. However, even with a Holder, whether based on DispatchQueue or on an actor, the code above may still produce results with more than three elements. This is expected: data isolation only solves the memory-safety problems caused by simultaneous access (in Swift, this unsafe behavior most often manifests as a crash). The correctness of the data here is related to actor reentrancy. To understand reentrancy properly, we first need a deeper understanding of how asynchronous functions work, so we will cover this topic in a later chapter.

In addition, the actor type does not yet provide a way to specify how its isolated code should run. While we can use @MainActor to ensure UI-thread isolation, for a general actor we cannot yet specify that its isolated code must run on a particular thread. We’ll also look at global actors, the nonisolated keyword, and the actor data model later.
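As a small preview, a minimal @MainActor sketch (the names are illustrative): the attribute confines the type’s mutable state to the main actor, so crossing into it from elsewhere requires await:

```swift
@MainActor
final class ViewModel {
    var title: String = ""

    func updateTitle(_ new: String) {
        title = new   // always executes on the main actor (main thread)
    }
}

func backgroundWork(viewModel: ViewModel) async {
    // We are outside the main actor here; the call must be awaited
    // because it hops into the main actor's isolation domain.
    await viewModel.updateTitle("Loaded")
}
```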

Summary

I think that’s enough for this chapter. We start with basic concepts, show the difficulties that can arise when handling concurrent programs using GCD or some other “primitive” means, and then go on to describe how to handle and solve these problems in Swift concurrency.

Although there are many concepts involved in Swift concurrency, the boundaries of various modules are clear:

  • Asynchronous functions: Provides syntactic tools to express asynchronous behavior in a more concise and efficient way.
  • Structured concurrency: Provides a concurrent runtime environment responsible for the correct order of function scheduling, cancellation, and execution, as well as the life cycle of tasks.
  • Actor model: Provides well-wrapped data isolation to secure concurrent code.

Being familiar with these boundaries helps us clearly understand the design intent of the various parts of Swift concurrency so that the tools we have can be used in the right place. As an overview, in this chapter the reader should have seen how to write concurrent code using Swift concurrency tools. The rest of the book will explore each module in greater depth in order to expose more details hidden beneath the macro concept.
