Swift Concurrency: Behind the Scenes

Dig into the details of Swift concurrency and discover how Swift provides greater safety while improving performance, avoiding data races and thread explosion. We’ll look at how Swift tasks differ from Grand Central Dispatch, how the new cooperative threading model works, and how to ensure optimal performance for your application. To get the most out of this session, we recommend first watching “Meet async/await in Swift”, “Explore structured concurrency in Swift”, and “Protect mutable state with Swift actors”.

Preface

Today, we’re going to talk about some of the fundamental nuances of concurrency in Swift. This is an advanced talk that builds on earlier sessions about concurrency in Swift. If you are not familiar with the concepts of async/await, structured concurrency, and actors, we encourage you to take a look at those sessions first. In previous sessions on concurrency in Swift, you learned about the various concurrency language features introduced this year and how to use them. In this talk, we will take an in-depth look at why these primitives are designed the way they are, not only for safety, but also for performance and efficiency. As you adopt Swift concurrency in your own applications, we hope this talk gives you a better mental model for reasoning about Swift concurrency and how it interacts with existing threading libraries such as Grand Central Dispatch.

We’re going to discuss a couple of things today.

First, we’ll discuss the threading model behind Swift concurrency and compare it with Grand Central Dispatch. We’ll talk about how we leveraged the concurrency language features to build a new thread pool for Swift that provides better performance and efficiency.

After that, we’ll talk about considerations to keep in mind when porting code to use Swift concurrency.

Then, we’ll move on to synchronization through actors in Swift concurrency. We’ll talk about how actors work and how they compare to existing synchronization primitives you’re probably already familiar with, such as serial dispatch queues.

Finally, we’ll cover some things to be aware of when writing code with actors.

We have a lot to cover today, so let’s get right to it.

Threading model

Grand Central Dispatch

In our discussion of the threading model today, we’ll start by looking at a sample application written with today’s available technologies, such as Grand Central Dispatch. Then, we’ll look at how the same application behaves when rewritten with Swift concurrency.

Let’s say I want to write my own news feed reader application.

Let’s talk about what the high-level components of my application are.

My application will have a main thread that drives the user interface.

I’ll have a database to keep track of the news feeds to which users subscribe, and finally a subsystem to handle the network logic to get the latest content from the news feeds.

Let’s consider how to construct this application with Grand Central Dispatch queues.

Let’s say the user asks to see the latest news.

On the main thread, we will handle gestures for user events.

From here, we will dispatch requests asynchronously to a serial queue that handles database operations.

The reasons for this are twofold.

First, by assigning work to different queues, we ensure that the main thread remains responsive to user input even while waiting for potentially large amounts of work to occur.

Second, access to the database is protected because a serial queue guarantees mutual exclusion.

In the database queue, we will traverse the news feeds to which the user subscribed and schedule a network request for each news feed to our URLSession to download the content from that source.

When a network request completes, the URLSession callback will be invoked on our delegate queue, which is a concurrent queue. In the completion handler for each result, we synchronously update the database with the latest articles for that feed so they can be cached for future use. Finally, we wake up the main thread to refresh the user interface.

This seems like a perfectly reasonable way to construct such an application. We have ensured that the main thread is not blocked while processing the request. By processing network requests concurrently, we have taken advantage of the parallelism inherent in our programs. Let’s take a closer look at a code snippet that shows how we process the results of a network request.

First, we create a URLSession to perform downloads from the news feeds. As you can see here, we have set the delegate queue for this URLSession to be a concurrent queue.

We then iterate over all the news feeds that need updating and schedule a data task on the URLSession for each one. In the completion handler of the data task, which is invoked on the delegate queue, we deserialize the downloaded result and format it into articles.

We then dispatch synchronously onto our database queue to update the database with the results for that feed.
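A minimal sketch of this GCD-based flow, using hypothetical Feed, Article, and helper names that are not from the original sample:

```swift
import Foundation

// Hypothetical types and helpers, illustrative only.
struct Feed { let name: String; let url: URL }
struct Article { let title: String }

let databaseQueue = DispatchQueue(label: "com.example.database") // serial queue protecting the database
let concurrentQueue = OperationQueue() // concurrent by default; used as the URLSession delegate queue
let urlSession = URLSession(configuration: .default, delegate: nil, delegateQueue: concurrentQueue)

func deserializeArticles(from data: Data) throws -> [Article] { [] } // placeholder
func updateDatabase(with articles: [Article], for feed: Feed) { /* cache the results */ }

func fetchLatestContent(for feedsToUpdate: [Feed]) {
    for feed in feedsToUpdate {
        let dataTask = urlSession.dataTask(with: feed.url) { data, response, error in
            // The completion handler runs on the concurrent delegate queue.
            guard let data = data else { return }
            do {
                let articles = try deserializeArticles(from: data)
                // Synchronously hop to the serial database queue; this blocks the
                // current thread whenever the database queue is busy (contended).
                databaseQueue.sync {
                    updateDatabase(with: articles, for: feed)
                }
            } catch {
                // handle deserialization errors
            }
        }
        dataTask.resume()
    }
}
```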

So here you can see that we’ve written some linear code to do something fairly straightforward, but this code has some hidden performance pitfalls.

To further understand these performance issues, we need to first take a deeper look at how threads handle the work of GCD queues.

In Grand Central Dispatch, when work is enqueued, the system brings up a thread to service that work item.

Since a concurrent queue can process multiple work items at the same time, the system starts multiple threads until all of our CPU cores are saturated.

However, if a thread blocks, as seen here on the first CPU core, and there is more work to be done on the concurrent queue, GCD brings up more threads to drain the remaining work items.

The reasons for this are twofold.

First, by giving your process another thread, we can ensure that each core has a thread performing work at all times. This gives your application a good, consistent level of concurrency.

Second, the blocked thread may be waiting on a resource, such as a semaphore, before it can make further progress. The new thread that was brought up to continue working on the queue might be able to help unblock the resource the first thread is waiting on.

Now that we know more about how GCD brings up threads, let’s go back and look at how our news application’s code executes on the CPU.

On a dual-core device like the Apple Watch, GCD first brings up two threads to process the news updates. When those threads block on the database queue, more threads are created to continue processing the networking queue. The CPU must then context switch between the different threads handling the network results, as shown by the white vertical lines between the threads.

This means that in our news app, we can easily end up with a very large number of threads. If the user has a hundred feeds to update, each URL data task has a completion block on the concurrent queue when its network request completes. As each completion block blocks on the database queue, GCD brings up more threads, leaving the application with far too many threads.

Now, you might ask, what’s so bad about having a lot of threads in our application? Having a large number of threads means the system has overcommitted itself with more threads than there are CPU cores.

Consider an iPhone with six CPU cores. If our news app has a hundred feed updates to process, we have overcommitted the iPhone with roughly 16 times more threads than cores. This is what we call thread explosion. Some of our previous WWDC talks have gone into more detail on the risks associated with this, including the possibility of deadlocks in your application. Thread explosion also brings memory and scheduling costs that may not be immediately apparent, so let’s take a closer look.

Looking back at our news application, each blocked thread is holding on to precious memory and resources as it waits to run again.

Each blocked thread has a stack and an associated kernel data structure to track the thread. Some of these threads may hold locks that other running threads need. That is a lot of memory and resources to hold on to for threads that are making no progress. There is also greater scheduling overhead as a result of thread explosion. As new threads are brought up, the CPU needs to perform a full thread context switch to move from the old thread to the new one. When the blocked threads become runnable again, the scheduler must timeshare the threads on the CPU so they can all make progress.

Now, timesharing threads a handful of times is fine; that is the power of concurrency. However, with a thread explosion, hundreds of threads timesharing on a device with a limited number of cores leads to excessive context switching. The scheduling latency of these threads outweighs the useful work they do, and as a result the CPU runs less efficiently.

As we’ve seen so far, it’s easy to miss some of the nuances of thread hygiene when writing applications using GCD queues, leading to poor performance and greater overhead.

Concurrency in Swift

Based on this experience, Swift took a different approach when designing concurrency into the language. We built Swift concurrency with performance and efficiency in mind, so your applications enjoy controlled, structured, and safe concurrency. With Swift, we want to change the execution model of your application from the one we just saw, with all of its threads and context switches, to this one.

Here you can see that only two threads are executing on our dual-core system, and there are no thread context switches. All of the blocked threads are gone, replaced by a lightweight object known as a continuation that tracks the resumption of work. When threads execute work under Swift concurrency, they switch between continuations instead of performing full thread context switches, which means we now only pay the cost of a function call.

Therefore, the runtime behavior we want for Swift concurrency is to create only as many threads as there are CPU cores, and for threads to switch cheaply and efficiently between work items when they would otherwise block. We want you to write straight-line code that is easy to reason about, while also getting safe, controlled concurrency.

To achieve this behavior, the operating system needs a runtime contract that threads will not block, and that is only possible if the language can provide such a contract. Swift’s concurrency model and the semantics around it were therefore designed with this goal in mind. I want to delve into two features at the Swift language level that allow us to uphold this contract with the runtime.

The first is the semantics of await, and the second is the tracking of task dependencies in the Swift runtime.

Let’s consider these language features in the example of a news application.

This is the code we walked through earlier, which processes the results of our news feed update. Let’s take a look at what this logic looks like when written with Swift’s concurrency primitives.

We’ll start by making the helper function async. Instead of processing the results of network requests on a concurrent dispatch queue, we use a task group here to manage our concurrency. Within the task group, we create a child task for each feed that needs updating. Each child task performs a download from the feed’s URL using the shared URLSession, then deserializes the downloaded result and formats it into articles, and finally calls an asynchronous function to update our database. Every call to an asynchronous function is annotated with the await keyword.
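A sketch of what this might look like, assuming hypothetical Feed, Article, and Database types (the real sample differs in its details):

```swift
import Foundation

// Hypothetical types reused from the earlier sketch.
struct Feed { let name: String; let url: URL }
struct Article { let title: String }
func deserializeArticles(from data: Data) throws -> [Article] { [] } // placeholder

actor Database {
    func update(with articles: [Article], for feed: Feed) { /* cache the results */ }
}

func updateAllFeeds(_ feedsToUpdate: [Feed], database: Database) async throws {
    try await withThrowingTaskGroup(of: Void.self) { group in
        for feed in feedsToUpdate {
            // One child task per feed that needs updating.
            group.addTask {
                // Awaiting pauses the task without blocking the thread while the download is in flight.
                let (data, _) = try await URLSession.shared.data(from: feed.url)
                let articles = try deserializeArticles(from: data)
                // Another potential pause point: hopping to the database actor.
                await database.update(with: articles, for: feed)
            }
        }
        // The parent task waits here until every child task has finished.
        for try await _ in group { }
    }
}
```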

From the session “Meet async/await in Swift”, we know that await is an asynchronous wait: it does not block the current thread while waiting for the result of an asynchronous function. Instead, the function may be paused and the thread freed up to perform other work.

How is this possible? How can a thread be given up? My colleague Varun will now explain how this is achieved under the hood in the Swift runtime.

Await and non-blocking of threads

Before discussing how asynchronous functions are implemented, let’s quickly review how non-asynchronous functions work.

In a running program, each thread has a stack that stores the status of function calls.

Now let’s focus on one thread.

When a thread executes a function call, a new stack frame is pushed onto its stack.

This newly created stack frame can be used by functions to store local variables, return addresses, and any other information needed.

Once the function completes and returns, its stack frame is popped.

Now let’s consider asynchronous functions.

Suppose a thread calls the add(_:) method on the Feed type from the updateDatabase function. At this point, the most recent stack frame on the stack will be the frame for add(_:).

This stack frame stores local variables that do not need to live across the pause point. The body of add(_:) has one pause point, marked with await. The local variables id and article are used in the body of the for loop immediately after they are defined, with no pause points in between, so they will be stored in this stack frame.

In addition, the heap will hold two asynchronous call frames, one for updateDatabase and one for add(_:). Asynchronous call frames store the information that needs to be available across pause points.

Note that the newArticles parameter is defined before await, but needs to be available after await.

This means that add(_:)’s asynchronous call frame keeps track of newArticles.
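The walkthrough above refers to code that is not reproduced in this article. Here is a hedged sketch, with hypothetical types and signatures, of what updateDatabase and add(_:) might look like, so the frames being described are easier to follow:

```swift
import Foundation

// Hypothetical types matching the walkthrough; the real sample differs in detail.
struct Article { let title: String }

final class Database {
    // Saves articles and returns their new IDs; pauses while the write happens.
    func save(_ articles: [Article], for feed: Feed) async throws -> [Int] {
        Array(0..<articles.count) // placeholder
    }
}

let database = Database()

final class Feed {
    var articles: [Int: Article] = [:]

    func add(_ newArticles: [Article]) async throws {
        // `newArticles` is needed after the pause point, so it lives in
        // add(_:)'s asynchronous call frame on the heap rather than on the stack.
        let ids = try await database.save(newArticles, for: self) // potential pause point

        // `id` and `article` never cross a pause point, so they can live in the
        // ordinary stack frame; zip(_:_:) itself is a synchronous call.
        for (id, article) in zip(ids, newArticles) {
            articles[id] = article
        }
    }
}

func updateDatabase(with articles: [Article], for feed: Feed) async throws {
    try await feed.add(articles) // one async function calling another
}
```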

Assume that the thread continues.

When the save function starts executing, the stack frame for add(_:) is replaced by the stack frame for save.

Instead of pushing a new stack frame, the topmost stack frame is replaced, because any variables that will be needed in the future have already been stored in the list of asynchronous call frames.

The save function also gets an asynchronous call frame to use.

While the articles are being saved to the database, it would be nice if the thread could do useful work instead of being blocked.

Suppose the execution of the save function pauses. Instead of being blocked, the thread is reused to do other useful work.

Because all the information that needs to live across the pause point is stored on the heap, it can be used to resume execution at a later point.

This list of asynchronous call frames is a runtime representation of a Continuation.

Let’s say that after a while the database request completes and a thread is freed up.

This could be the same thread as before, or it could be a different thread.

Once save completes and returns some IDs, the stack frame for save is again replaced by the stack frame for add(_:).

After that, the thread can begin executing zip.

Zipping the two arrays is a synchronous operation, so it creates a new, ordinary stack frame.

Because Swift continues to use the operating system stack, both asynchronous and synchronous Swift code can efficiently call into C and Objective-C.

In addition, C and Objective-C code can continue to efficiently call synchronous Swift code.

Once the zip function completes, its stack frame is popped and execution continues.

So far I have described how await is designed to ensure efficient pausing and resuming, while freeing up the thread’s resources to do other work.

Tracking dependencies in the Swift task model

As mentioned earlier, a function can be broken up into continuations at an await, also known as a potential pause point.

In this case, the URLSession data task is an asynchronous function, and the work after it is the continuation. The continuation can only execute after the asynchronous function has completed.

This dependency is tracked by the Swift concurrency runtime.

Similarly, in a task group, a parent task may create several child tasks, and each of those child tasks needs to complete before the parent task can proceed. This dependency is expressed in your code by the scope of the task group and is therefore explicitly known to the Swift compiler and runtime. In Swift, a task can only await other tasks that are known to the Swift runtime, whether they are continuations or child tasks.

Thus, when code is built with Swift’s concurrency primitives, the runtime has a clear understanding of the chain of dependencies between tasks.

So far, you’ve seen how Swift’s language features allow tasks to be paused while waiting.

As a result, the thread of execution can reason about the dependencies of a task and pick up a different task instead.

This means that code written with Swift concurrency can maintain a runtime contract that the thread can always progress.

We have leveraged this runtime contract to build integrated operating system support for Swift concurrency.

This support takes the form of a new cooperative thread pool, which is the default executor for Swift concurrency.

The new thread pool spawns only as many threads as there are CPU cores, ensuring that the system is not overcommitted.

Unlike GCD’s concurrent queues, which spawn more threads when work items block, threads in Swift can always make forward progress. This allows the default runtime to carefully control the number of threads that are spawned.

This allows us to give your application the concurrency you need, while ensuring that we avoid the known pitfalls of excessive concurrency.

In an earlier WWDC talk about concurrency with Grand Central Dispatch, we suggested that you structure your application into distinct subsystems and maintain one serial dispatch queue per subsystem to control your application’s concurrency. With that approach, it is hard to get more than one level of concurrency per subsystem without risking thread explosion.

In Swift, the language provides strong invariants that the runtime takes advantage of, transparently giving you better control over concurrency in the default runtime.

Now that you know more about Swift’s concurrent threading model, let’s take a look at some of the issues to be aware of when adopting these exciting new features in your code.

Adoption of Swift concurrency

The first consideration to keep in mind relates to performance when converting synchronous code to asynchronous code. Earlier, we talked about some of the costs associated with concurrency, such as additional memory allocations and logic in the Swift runtime. Therefore, you should only introduce concurrency into your code where the benefit of that concurrency outweighs the cost of managing it.

The code here may not actually benefit from the extra concurrency of spawning a child task just to read a value from UserDefaults, because the useful work done by the child task is dwarfed by the cost of creating and managing the task. Therefore, when adopting Swift concurrency, we recommend profiling your code with the Instruments system trace to understand its performance characteristics.
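For illustration, here is a hypothetical version of that kind of snippet; the key name and helper are assumptions, not the code from the original slide:

```swift
import Foundation

// A child task is spawned just to read a cheap value from UserDefaults, so the
// useful work done by the child task is dwarfed by the cost of creating and
// managing the task.
func configureNotifications() async {
    async let isEnabled = UserDefaults.standard.bool(forKey: "isNotificationsEnabled")
    // ... other setup work could happen here ...
    if await isEnabled {
        // enable notifications
    }
}
```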

The second thing to note is the notion of atomicity around await.

Swift does not guarantee that the thread that executed the code before the await is the same thread that will pick up the continuation.

In fact, await explicitly marks a point in your code where atomicity is broken, because the task may be voluntarily descheduled.

Therefore, you should be careful not to hold locks across an await.

Similarly, thread-specific data is not preserved across an await.

Any assumptions in your code that rely on thread locality should be reexamined to account for the pause behavior of await.

The last consideration relates to the runtime contract that underpins the efficient threading model in Swift.

Recall that in Swift, the language allows us to uphold a runtime contract that threads are always able to make forward progress.

It is based on this contract that we set up a cooperative thread pool as the default executor for Swift.

When you adopt Swift concurrency, make sure you also continue to maintain this contract in your code so that the cooperative thread pool works optimally.

By using safe primitives that make the dependencies in your code explicit and known, you can maintain this contract in the cooperative thread pool.

With Swift concurrency primitives such as await, Actors, and task groups, these dependencies are already known at compile time. Therefore, the Swift compiler enforces this and helps you preserve the runtime contract.

Primitives like os_unfair_lock and NSLock are also safe, but they require care when you use them. Using a lock in synchronous code is safe when it protects a tight, well-known critical section, because the thread holding the lock is always able to make progress toward releasing it. So while a lock may block a thread for a short period of time under contention, it does not violate the runtime contract of forward progress. Unlike the Swift concurrency primitives, however, there is no compiler support to help you use locks correctly, so it is your responsibility to use them properly.
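As a hedged illustration, a lock guarding a tight, synchronous critical section might look like this (hypothetical type and names):

```swift
import Foundation

// The critical section is short, fully synchronous, and never spans an await,
// so the thread holding the lock can always make progress toward releasing it.
final class Statistics {
    private let lock = NSLock()
    private var viewCounts: [String: Int] = [:]

    // Called from synchronous code. A contended caller is only blocked briefly.
    // Holding this lock across an await would not be safe.
    func recordView(of articleID: String) {
        lock.lock()
        viewCounts[articleID, default: 0] += 1
        lock.unlock()
    }
}
```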

On the other hand, primitives such as semaphores and condition variables are not safe to use with Swift concurrency. That is because they hide dependency information from the Swift runtime while introducing a dependency into your code’s execution. Since the runtime is unaware of this dependency, it cannot make the right scheduling decisions and resolve it for you. In particular, do not spawn unstructured tasks and then retroactively introduce dependencies across task boundaries by using semaphores or other unsafe primitives. Such a code pattern means that a thread can block on the semaphore indefinitely until another thread is able to unblock it, which violates the runtime contract of forward progress for threads.
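A sketch of the unsafe pattern being warned about, with a hypothetical helper, shown only so it is easier to recognize and avoid:

```swift
import Foundation

func doSomeAsyncWork() async { /* hypothetical async helper */ }

// An unstructured task plus a semaphore retroactively introduces a dependency
// that the Swift runtime cannot see.
func waitForWorkUnsafely() {
    let semaphore = DispatchSemaphore(value: 0)

    Task {
        await doSomeAsyncWork()
        semaphore.signal()
    }

    // Blocks the current thread until the task signals. If the unblocking task
    // never gets to run, this thread is stuck forever, violating the
    // forward-progress contract that the cooperative thread pool relies on.
    semaphore.wait()
}
```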

To help you identify uses of such unsafe primitives in your code base, we recommend testing your application with the following environment variable. Running your application with it uses a modified debug runtime that enforces the invariant of forward progress. This environment variable can be set in Xcode in the Run Arguments pane of your project’s scheme, as shown.

When running your application with this environment variable, if you see a thread from the cooperative thread pool that appears to be hung, it indicates that an unsafe blocking primitive is being used.

Synchronization

Mutual exclusion

Now that you’ve seen how the threading model was designed for Swift concurrency, let’s look at the primitives that can be used to synchronize state in this new world.

In the session on Swift actors, you saw how actors can be used to protect mutable state from concurrent access.

In other words, Actors provide a powerful new synchronization primitive that you can use.

To recap, actors guarantee mutual exclusion: an actor can execute at most one method call at a time. Mutual exclusion means that the actor’s state is never accessed concurrently, preventing data races.

Let’s see how actors compare to other forms of mutual exclusion.

Consider the earlier example of updating some articles in the database by dispatching synchronously to a serial queue. If the queue is not already running, we say it is uncontended. In that case, the calling thread is reused to execute the work item on the queue without any context switch. Conversely, if the serial queue is already running, it is contended, and the calling thread blocks. This blocking behavior is what leads to the thread explosion described earlier in the talk. Locks behave the same way.

Because of the problems with blocking, we have generally recommended using dispatch async instead. The main benefit of dispatch async is that it is non-blocking, so even under contention it does not lead to thread explosion. The downside of dispatch async on a serial queue is that, in the uncontended case, dispatch has to request a new thread to do the asynchronous work while the calling thread goes on to do something else. As a result, frequent use of dispatch async leads to excess thread wakeups and context switches.

Which brings us to actors.

Swift’s actors combine the best of both approaches by taking advantage of the cooperative thread pool for efficient scheduling. When you call a method on an actor that is not running, the calling thread can be reused to perform the method call. If the called actor is already running, the calling thread can pause the function it is executing and pick up other work.

Let’s see how these two properties work in the example of the news application. Let’s focus on the database and network subsystems.

When the application is updated to use Swift concurrency, the serial queue for the database is replaced by a database actor. The networking subsystem’s concurrent queue can be replaced with one actor per news feed. For simplicity, I’ve only shown three here, a sports actor, a weather actor, and a health actor, but in practice there would be more. These actors run on the cooperative thread pool. The feed actors interact with the database actor to save articles and perform other operations, and this interaction involves switching execution from one actor to another.
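A minimal sketch of what such actors might look like; the type and method names are illustrative assumptions, not the original sample:

```swift
struct Article { let title: String }

actor NewsDatabase {
    private var cachedArticles: [String: [Article]] = [:]

    // Mutual exclusion: at most one work item touches this state at a time.
    func save(_ articles: [Article], forFeed name: String) {
        cachedArticles[name, default: []].append(contentsOf: articles)
    }
}

actor FeedActor {
    let name: String
    let database: NewsDatabase

    init(name: String, database: NewsDatabase) {
        self.name = name
        self.database = database
    }

    func publish(_ articles: [Article]) async {
        // Calling into another actor is a potential pause point: the thread can
        // hop from this feed actor to the database actor without blocking.
        await database.save(articles, forFeed: name)
    }
}
```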

We call this switching from one actor to another actor hopping. Let’s talk about how actor hopping works.

Suppose the sports feed actor is running on a thread from the cooperative thread pool and decides to save some articles to the database.

Now, let’s consider the case where the database actor is not in use; this is the uncontended case.

The thread can hop directly from the sports feed actor to the database actor.

There are two things to note here. First, the thread did not block in order to hop to the database actor. Second, hopping did not require a different thread; the runtime can directly pause the work item for the sports feed actor and create a new work item for the database actor.

Suppose the database actor has been running for a while but hasn’t finished its first work item, and at this point the weather feed actor tries to save some articles to the database.

This creates a new work item for the database actor. Actors ensure safety by guaranteeing mutual exclusion: at most one work item is active at any given time. Since there is already an active work item, D1, the new work item, D2, is kept pending.

Actors are also non-blocking. In this case, the weather feed actor is paused, and the thread it was executing on is freed up to do other work.

After a while, the original database request completes, so the active work item for the database Actor is removed.

At this point, the runtime can choose to start executing the pending work item on the database actor, or it can choose to resume one of the paused actors, or the freed-up thread can be used for other work.

Reentrancy and prioritization

When there is a lot of asynchronous work, and especially a lot of contention, the system has to make trade-offs about which work is more important. Ideally, high-priority work, such as work that involves user interaction, takes precedence over background work, such as saving backups. Actors are designed to allow the system to prioritize work well thanks to the notion of reentrancy. But to understand why reentrancy matters here, let’s first look at how GCD queues handle priorities.

Consider the original news application with its serial database queue. Suppose the database receives some high-priority work, such as fetching the latest data to update the user interface, as well as some low-priority work, such as backing up the database to iCloud, which needs to happen at some point but not necessarily right away. As the code runs, work items are created and added to the database queue in some interleaved order. A dispatch queue executes the items it receives in strict first-in, first-out order. Unfortunately, that means that after item A executes, five low-priority items have to execute before the next high-priority item, item B. This is a priority inversion.

Serial queues work around priority inversion by boosting the priority of all the work in the queue ahead of the high-priority work. In practice, this means the work in the queue will be done sooner. However, it does not solve the main problem, which is that items 1 through 5 still need to complete before item B can start. Solving this requires changing the semantic model and moving away from strict first-in, first-out order.

This brings us to reentrancy in actors. Let’s explore how reentrancy relates to ordering through an example.

Consider the database actor executing on a thread. Suppose it pauses to wait for some work, and the sports feed actor starts executing on that thread. After a while, the sports feed actor calls into the database actor to save some articles. Because the database actor is not under contention, the thread can hop to it even though it already has a paused work item. To perform the save, a new work item is created for the database actor.

This is what actor reentrancy means: new work items can make progress on an actor while one or more older work items on that actor are paused. The actor still maintains mutual exclusion, since at most one work item is executing at any given time.

After some time, item D2 completes execution. Note that D2 completes before D1, even though it was created after D1. So reentrancy support means that actors can execute items in an order that is not strictly first-in, first-out.

Let’s look at the earlier example again, but with a database actor instead of a serial queue. First, work item A executes because it has high priority. Once it has executed, we have the same priority inversion as before.

Because actors are designed for reentrancy, the runtime can choose to move the higher-priority item to the front of the queue, ahead of the lower-priority items. This way, higher-priority work can be executed first and lower-priority work later, which directly addresses the priority-inversion problem and allows more effective scheduling and resource utilization. So far, we have seen how actors running on the cooperative thread pool are designed to maintain mutual exclusion while supporting effective prioritization of work.

Main actor

Another kind of actor, the main actor, is a little different because it abstracts an existing concept in the system: the main thread.

Consider the example of a news application that uses actors.

When updating the user interface, you need to hop to and from the main actor. Because the main thread is disjoint from the threads in the cooperative thread pool, this hop requires a context switch. Let’s look at the performance impact with a code example. Consider the following code, where we have an updateArticles function on the main actor that loads articles from the database and updates the UI for each one.
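Since the original snippet is not reproduced here, the following is a hedged reconstruction with hypothetical types and method names:

```swift
import Foundation

struct Feed { let name: String }
struct Article { let title: String }

actor NewsDatabase {
    func loadArticles(for feed: Feed) throws -> [Article] { [] } // placeholder
}

@MainActor
func updateUI(for articles: [Article]) { /* refresh the views */ }

@MainActor
func updateArticles(for feeds: [Feed], database: NewsDatabase) async throws {
    for feed in feeds {
        // Hop from the main actor to the database actor to load articles...
        let articles = try await database.loadArticles(for: feed)
        // ...then hop back to the main actor to update the UI.
        updateUI(for: articles)
    }
}
```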

Each iteration of the loop requires at least two context switches: one to hop from the main actor to the database actor, and one to hop back. Let’s look at what the CPU usage for such a loop looks like.

Because each iteration requires two context switches, we see a repeating pattern where two threads run in quick succession. This can be acceptable if the loop has only a few iterations and each iteration does a substantial amount of work.

However, if hops on and off the main actor happen frequently, the overhead of switching threads starts to add up. If your application spends a large fraction of its time context switching, you should restructure your code so that work for the main actor is batched up.

You can batch the work by pushing the loops into the loadArticles and updateUI methods and making sure they process arrays rather than one value at a time, as shown in the sketch below. Batching work reduces the number of context switches. While hopping between actors on the cooperative thread pool is fast, you still need to be mindful of hops on and off the main actor when writing your application.
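A sketch of the batched version, reusing the same hypothetical types with signatures changed to take arrays:

```swift
import Foundation

struct Feed { let name: String }
struct Article { let title: String }

actor NewsDatabase {
    // The hypothetical method now takes the whole batch of feeds at once.
    func loadArticles(for feeds: [Feed]) throws -> [Article] { [] } // placeholder
}

@MainActor
func updateUI(for articles: [Article]) { /* refresh the views in one pass */ }

@MainActor
func updateArticles(for feeds: [Feed], database: NewsDatabase) async throws {
    // A single hop to the database actor for the whole batch...
    let articles = try await database.loadArticles(for: feeds)
    // ...and a single hop back to the main actor to update the UI.
    updateUI(for: articles)
}
```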

Conclusion

Looking back, in this talk you have seen how we tried to make the system as efficient as possible, from the design of the cooperative thread pool and the non-blocking waiting behind await, to how actors are implemented. At each step, we use some aspect of the runtime contract to improve the performance of your applications. We’re excited to see how you use these new language features to write clear, efficient, and delightful Swift code. Thanks for watching, and have a great WWDC.