Introduction

The purpose of the volatile keyword is to prevent the compiler from optimizing accesses to variables whose values may be modified in ways the compiler cannot detect.

Variables declared volatile are exempt from these optimizations because their values can change at any time, outside the view of the surrounding code. The compiler always emits a read of the variable's current value from memory rather than reusing a value cached in a register, even if the previous instruction just manipulated that value. (Volatile affects more than just whether register values are reused, as we will see.)

Application scenarios

A variable should be declared volatile when its value can change unexpectedly. In practice, there are three cases:

  1. Memory mapped peripheral registers
  2. Global variables modified by interrupt handlers
  3. A shared variable that is accessed by multiple threads

In the first case, the value of a peripheral register can be changed by the hardware at any time, obviously beyond the view of the code. In the second case, an interrupt handler executes differently from a normal program: when the interrupt arrives, the current thread is suspended, the interrupt handler runs, and then execution of the code resumes. The interrupt handler can be considered parallel to the current program, independent of the normal execution sequence. The third case, which is the most common, is ordinary concurrent programming.

How it works

The compiler assumes that the only way a variable’s value can change is if it is modified by code.

int a = 24;

The compiler now assumes that the value of a is always 24 unless it encounters a statement that changes it. If this code follows:

int b = a + 3;

The compiler reasons that since it already knows a has the value 24, b must have the value 27, so there is no need to generate an instruction that evaluates a + 3.

If the value of a actually changes between the two statements, the optimized program produces a wrong result. But why would the value of a suddenly change? It won't.

If a is a stack (local) variable, its value will not change unless a pointer to it is passed somewhere, such as:

doSomething(&a);

The function doSomething has a pointer to a, which means the value of a can be changed, and a may no longer be 24 after this line of code. If the code is written like this:

int a = 24;
doSomething(&a);
int b = a + 3;

The compiler will not optimize away the calculation of a + 3. Who knows what a is after doSomething? The compiler obviously doesn't know.

For global variables or the instance variables of objects, the problem is more complicated. These variables live not on the stack but in static storage or on the heap, which means that different threads can access them.

// Global Scope
int a = 0;
int b = 0;

void function(void) {
  a = 24;
  b = a + 3;
}

Could b be 27? Probably, but it is also possible, though less likely, that another thread changed the value of a between the two statements. Will the compiler be aware of this? It won't. That's because C itself knows nothing about threads, at least it didn't in the past (C11 finally added native threads, but before that all threading functionality was provided by operating-system APIs, not by C itself). So the C compiler still assumes b is 27 and optimizes the calculation away, which produces a wrong result.

This is where volatile comes in. If the variable is declared volatile:

volatile int a = 0;

We tell the compiler that the value of a can change suddenly at any time. For the compiler, this means it cannot make assumptions about the value of a, even if it read the same value a picosecond ago and no code appears to change it. Every time a is accessed, its current value is actually read.

Overusing volatile prevents many compiler optimizations, can significantly slow down computational code, and is very often unnecessary. For example, the compiler never makes value assumptions across a memory barrier. What exactly a memory barrier is lies beyond the scope of this article; it is enough to know that typical synchronization constructs such as locks, mutexes, and semaphores act as memory barriers. Consider the following code:

// Global Scope
int a = 0;
int b = 0;
pthread_mutex_t *m;  // assume initialized elsewhere

void function(void) {
  a = 24;
  pthread_mutex_lock(m);
  b = a + 3;
  pthread_mutex_unlock(m);
}

pthread_mutex_lock is a memory barrier (as is pthread_mutex_unlock), so there is no need to declare a volatile here. The compiler never carries assumptions about the value of a across a memory barrier.

Objective-C behaves a lot like C in all of this; after all, it is just an extension of C with a runtime. It is important to note that atomic properties act as memory barriers, so there is no need to declare volatile for such properties. If you need to access a property from multiple threads, declare it atomic (properties are atomic by default unless nonatomic is specified). If you don't need multithreaded access, marking it nonatomic makes access faster, but that only matters if the property is accessed very frequently (not ten times a minute, but thousands of times a second).

When does Objective-C code need to use volatile?

@implementation SomeObject {
    volatile bool done;
}

- (void)someMethod {
    done = false;
    // Start some background task that performs an action
    // and, when it is done with that action, sets `done` to true.
    // ...
    // Wait till the background task is done
    while (!done) {
        // Run the runloop for 10 ms, then check again
        [[NSRunLoop currentRunLoop] runUntilDate:
            [NSDate dateWithTimeIntervalSinceNow:0.01]];
    }
}
@end

Without volatile, the compiler might foolishly assume that done never changes, turning while (!done) into an endless while (true) loop.

When Apple was still using GCC 2.x, the code above really did produce an endless loop if done was not volatile (but only in optimized release builds, not in debug mode). This has not been verified on modern compilers, and perhaps the current version of Clang is smarter; but we obviously shouldn't count on the compiler being smart enough to handle this correctly. It also depends on how you start the background task. If you dispatch a block, the compiler can easily see whether done will be changed. If you pass a pointer to done somewhere, the compiler knows that the value of done may change, so it makes no assumptions about its value.

Memory barriers

If you have looked at the atomic operations Apple provides in OSAtomic.h, or at the atomic man page, you may have noticed that there are two versions of each operation: one plain and one with a Barrier suffix (for example, OSAtomicAdd32 and OSAtomicAdd32Barrier). Now you know that the one with Barrier in its name is a memory barrier and the other is not.

Memory barriers apply not only to compilers but also to CPUs (certain CPU instructions are memory barriers). The CPU needs to be aware of these barriers because it reorders instructions to keep its pipelines busy, executing them out of order. For example:

a = x + 3; // (1)
b = y * 5; // (2)
c = a + b; // (3)

If the adder's pipeline is busy and the multiplier's pipeline is idle, the CPU may execute (2) before (1), since the order of execution does not affect the final result; this keeps the pipeline from stalling. Of course, the CPU is also smart enough to know that (3) cannot execute before (1) and (2), because the result of (3) depends on both of theirs.

A pipeline, simply put, is a CPU core with multiple sets of functional units; each instruction is divided into several stages, and multiple instructions execute in parallel, each in a different stage:

Cycle   Instruction 1      Instruction 2      Instruction 3
  1     load instruction
  2     decode             load instruction
  3     load data          decode             load instruction
  4     operation          load data          decode
  5     save data          operation          load data
  6                        save data          operation
  7                                           save data

However, certain types of order changes can break the code or the programmer’s intent. Consider the following code:

x = y + z; // (1)
a = 1; // (2)

The adder pipeline is busy, so why not execute (2) before (1)? They have no dependency, so the order shouldn't matter, right? It depends. Suppose another thread is watching a, and when a becomes 1 it reads the value of x, which, if the statements executed in order, would be y + z. But if the CPU reorders them, x still holds its old value when a becomes 1, and the other thread reads something the programmer never intended.

In this case, order is important, which is why the CPU also needs barriers: the CPU does not reorder instructions across a barrier. Therefore, instruction (2) needs to be a barrier instruction (or a separate barrier instruction must sit between (1) and (2), depending on the CPU).

Instruction reordering is a feature of modern CPUs, but an older problem is deferred memory writes. If a CPU defers writes to memory (common on some CPUs, because memory access is slow relative to the CPU), it ensures that all deferred writes are performed and completed before a memory barrier is crossed, so that when other threads access memory, all of it is in the correct state (this is where the term "memory barrier" comes from).

Memory barriers appear in far more places than we may realize (GCD, Grand Central Dispatch, is full of memory barriers, as are NSOperation/NSOperationQueue, which are built on GCD). This is why volatile is needed only in very rare, exceptional cases. You might write 100 apps and never need it once. However, if you write a lot of low-level, multithreaded code and want maximum performance, sooner or later you will have to use volatile to keep things correct. Leaving it out in such cases can lead to endless loops or inexplicably wrong variable values. If you hit such problems, especially ones that occur only in release builds, they are most likely due to a missing volatile or a missing memory barrier.

Conclusion

To optimize performance, the compiler by default infers a variable's value from the surrounding code in order to eliminate unnecessary calculations. In single-threaded, ordinary execution this causes no problems. But with interrupt handlers, multithreaded concurrency, and memory-mapped I/O, a variable's value can suddenly be changed outside the view of the current code, beyond the compiler's awareness. In those cases we must explicitly tell the compiler not to infer the values of these variables, because they can be modified at any time by code or hardware outside the current code's scope.

In addition, the compiler does not infer a variable's value across a memory barrier. In real programs many memory barriers are hidden, because common synchronization tools such as locks, mutexes, and semaphores already contain them. GCD, the most commonly used tool in iOS concurrent programming, is full of memory barriers, and atomic properties are memory barriers as well.

Barriers apply not only to compilers but also to CPUs. Most modern CPUs use pipelining and execute multiple instructions in parallel, out of order. When we need instructions to execute in order, barrier instructions are used: the CPU does not reorder instructions across a barrier.

It is important to note that C itself originally had no concept of threads; threading was an API provided by the operating system, so the compiler does not assume that global variables can be modified by other threads at any time. Compilers are getting smarter, and the C language itself is evolving, but let's not rely on the compiler's cleverness.

References

  • Understanding “volatile” qualifier in C
  • Which scenes keyword “volatile” is needed to declare in objective-c?
  • How to Use C’s volatile Keyword