This article is part of the Java Concurrent Programming series. If you want to learn more, check out the Java Concurrent Programming Overview.

Preface

Now that we’ve looked at Java’s memory model, visibility problems, instruction reordering, and the happens-before principle, let’s take a look at the volatile keyword. volatile is the lightweight synchronization mechanism provided by Java, but because synchronized is what we usually reach for in development, volatile is rarely understood properly and completely. So let me walk through volatile with you.

The effect of volatile

Visibility between threads

When a variable is declared volatile, it is “visible” to all threads. “Visible” means that when one thread changes the value of the variable, other threads learn the new value immediately. This may not be easy to grasp at first. If you read the earlier article on the Java memory model in this series, it should click quickly; either way, the following diagram should give you a quick feel for it.

We already know that the Java memory model divides memory into per-thread working memory and main memory. In the figure above, thread A and thread B copy the variable a (declared volatile) from main memory into their own working memory, so both working copies hold a = 12. When thread A changes a to 8, it synchronizes the new value (a = 8) back to main memory. This invalidates the cached copy of a in thread B’s working memory (a = 12) and forces thread B to fetch the new value (a = 8) from main memory.
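A minimal sketch of this visibility guarantee (the class name, field name, and timing are our own for illustration): a reader thread spins on a flag until a writer thread updates it. Without volatile, the reader may cache the flag and spin forever; with volatile, the write is guaranteed to become visible.

```java
public class VisibilityDemo {
    // volatile guarantees the reader sees the writer's update;
    // with a plain boolean the JIT may hoist the read and loop forever
    static volatile boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!stop) {
                // busy-wait until the main thread's write becomes visible
            }
            System.out.println("reader observed stop = true");
        });
        reader.start();
        Thread.sleep(100);        // let the reader start spinning
        stop = true;              // volatile write, published immediately
        reader.join();            // terminates only if the reader saw the write
    }
}
```

If you remove the volatile modifier, this program may hang on some JVMs, which is exactly the visibility problem described above.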

The rationale for volatile visibility

As we discussed in the previous article on the Java memory model, physical machines face the same problem of cache inconsistency, and cache coherence protocols were proposed to solve it. The core idea of cache coherence is this: when a CPU writes data and finds that the variable is shared, that is, copies of the variable exist in other CPUs’ caches, it sends a signal telling the other CPUs to mark their cache lines for that variable as invalid. When another CPU later needs to read the variable and finds its cache line invalid, it re-reads the value from main memory.

Since volatile provides this “visibility”, its implementation must reach down to this level and rely on cache coherence. We don’t need to know assembly language in any depth here; it is enough to know that the instructions generated for a volatile access contain one extra Lock-prefixed instruction compared with a normal variable access. On a multi-core processor, this Lock instruction does two things:

  • It writes the data in the current processor’s cache line straight back to system memory (in Java memory model terms, it flushes the value from the thread’s working memory to main memory).
  • That write-back invalidates the copies of the data cached by other CPUs (in Java memory model terms, once thread A synchronizes the new value from its working memory to main memory, the old value that thread B previously read from main memory becomes invalid).

Reordering prevention

Similarly, the previous article on the Java memory model mentioned that, to speed up processing, the CPU reorders instructions that have no data dependence on each other. Under multiple threads, however, this reordering can cause problems. The following pseudocode illustrates one:

public class Demo {
    private int a = 0;
    private boolean isInit = false;
    private Config config;

    public void init() {
        config = readConfig(); // 1
        isInit = true;         // 2
    }

    public void doSomething() {
        if (isInit) {                // 3
            doSomethingWithConfig(); // 4
        }
    }
}

isInit indicates whether initialization has finished. Operations 1 and 2 have no data dependence on each other, and neither do operations 3 and 4, so the CPU may reorder 1 with 2, and 3 with 4. Now suppose thread A runs init() and thread B runs doSomething(), and let’s look at what reordering does under multiple threads.

In the figure above, operation 2 executes before operation 1. When the CPU time slice switches to thread B, thread B sees if (isInit) as true and executes doSomethingWithConfig(), even though config has not been initialized yet. So under multiple threads, reordering can break the program. To prevent this, the Java memory model specifies that marking a variable volatile forbids the CPU from reordering the instructions around it. The fix is shown below:

public class Demo {
    private int a = 0;
    private volatile boolean isInit = false;
    private Config config;

    public void init() {
        config = readConfig(); // 1
        isInit = true;         // 2
    }

    public void doSomething() {
        if (isInit) {                // 3
            doSomethingWithConfig(); // 4
        }
    }
}

Volatile reordering-prevention rules

To deal with CPU reordering, Java defines the following rules that forbid it.

From the table above we can see:

  • When the second operation is a volatile write, reordering is forbidden no matter what the first operation is. This ensures that operations before a volatile write are not reordered by the compiler to after it.
  • When the first operation is a volatile read, reordering is forbidden no matter what the second operation is. This ensures that operations after a volatile read are not reordered by the compiler to before it.
  • When the first operation is a volatile write and the second operation is a volatile read or write, reordering is forbidden.
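These rules are exactly what makes the classic double-checked-locking singleton safe. The sketch below (a standard pattern, not code from this article) relies on the first rule: the volatile write of the reference cannot be reordered before the constructor’s writes, so no reader can observe a half-built object.

```java
class Singleton {
    // volatile forbids reordering the publication of the reference
    // with the writes performed inside the constructor
    private static volatile Singleton instance;

    private Singleton() {}

    static Singleton getInstance() {
        if (instance == null) {                 // first check, lock-free
            synchronized (Singleton.class) {
                if (instance == null) {
                    instance = new Singleton(); // safe publication: volatile write
                }
            }
        }
        return instance;
    }
}
```

Without volatile, the store of the reference could be reordered before the object’s initialization, and a second thread could return a partially constructed Singleton.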

How volatile prevents reordering

In Java, when generating bytecode for volatile variables, the compiler inserts memory barriers into the instruction sequence to prevent particular kinds of processor reordering. Before looking at memory barriers, let’s review the eight atomic operations through which main memory and working memory interact, because memory barriers chiefly constrain a few of these operations. The eight atomic operations are shown in the figure below:

Java memory model for Java concurrent programming

Two of those operations are involved in memory barriers:

  • load: applied to working memory; it puts the variable value that the read operation fetched from main memory into the working-memory copy of the variable.
  • store: applied to working memory; it transfers the value of a variable from working memory to main memory, ready for the subsequent write operation.

Memory barrier insertion policy

For volatile variables, the compiler conservatively inserts the following memory barriers:

  • Insert a StoreStore barrier before each volatile write.
  • Insert a StoreLoad barrier after each volatile write.
  • Insert a LoadLoad barrier after each volatile read.
  • Insert a LoadStore barrier after each volatile read.

Volatile write memory barriers

  • StoreStore barrier: for a sequence store1; StoreStore; store2, it ensures that store1’s write is visible to other processors before store2 and any subsequent writes execute. (With the StoreStore barrier in place, store1 must complete before store2; the CPU will not reorder store1 with store2.)
  • StoreLoad barrier: for a sequence store1; StoreLoad; load2, it ensures that store1’s write is visible to all processors before load2 and any subsequent reads execute. (With the StoreLoad barrier in place, store1 must complete before load2; the CPU will not reorder store1 with load2.)

Volatile reads memory barriers

  • LoadLoad barrier: for a sequence load1; LoadLoad; load2, it ensures that load1’s read completes before load2 and any subsequent reads access their data. (With the LoadLoad barrier in place, load1 must execute before load2; the CPU will not reorder load1 with load2.)
  • LoadStore barrier: for a sequence load1; LoadStore; store2, it ensures that load1’s read completes before store2 and any subsequent writes are flushed. (With the LoadStore barrier in place, load1 must execute before store2; the CPU will not reorder load1 with store2.)
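The conservative insertion policy above can be visualized directly in code. This is a conceptual sketch (the class and field names are ours; the barriers exist only at the machine-instruction level, shown here as comments), not real Java syntax for barriers:

```java
class BarrierSketch {
    volatile int v;
    int plain;

    void write() {
        plain = 1;
        // StoreStore barrier: the plain store above cannot sink below
        v = 2;          // volatile write
        // StoreLoad barrier: the write is flushed before any later read
    }

    void read() {
        int r = v;      // volatile read
        // LoadLoad barrier: later reads cannot float above this read
        // LoadStore barrier: later writes cannot float above this read
        plain = r;
    }
}
```

The comments mark exactly where the conservative policy would place each barrier: StoreStore before and StoreLoad after a volatile write; LoadLoad and LoadStore after a volatile read.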

Compiler memory barrier optimization

We have seen that, under the conservative policy, the compiler inserts different barriers for volatile writes and volatile reads. Now let’s look at how the compiler optimizes these barriers away in real code.

public class VolatileBarrierDemo {
    int a;
    volatile int v1 = 1;
    volatile int v2 = 2;

    public void readAndWrite() {
        int i = v1;  // first volatile read
        int j = v2;  // second volatile read
        a = i + j;   // normal write
        v1 = i + 1;  // first volatile write
        v2 = j * 2;  // second volatile write
    }
}
The figure below shows the barriers generated for this code (the image may be hard to read on a mobile screen; viewing on a PC is recommended).

Looking at the figure, we see that the compiler omits the LoadStore barrier after the first volatile read, the LoadLoad barrier after the second volatile read, and the StoreLoad barrier after the first volatile write. Combining the semantics of the LoadStore, LoadLoad, and StoreLoad barriers, the reasons for these omissions are:

  • The LoadStore barrier after the first volatile read is omitted because the next operation is the second volatile read; no write (store) is involved, so the barrier is unnecessary.
  • The LoadLoad barrier after the second volatile read is omitted because the next operation is a normal write; no read (load) is involved, so the barrier is unnecessary.
  • The StoreLoad barrier after the first volatile write is omitted because the next operation is the second volatile write; no read (load) is involved, so the barrier is unnecessary.

After the second volatile write, the compiler cannot know what reads or writes will follow (the method may return into arbitrary code), so it conservatively inserts a StoreLoad barrier there.

Processor memory barrier optimization

We’ve seen that, when placing barriers, the compiler omits the ones the program’s logic makes unnecessary. In addition, because different processors have memory models of different degrees of “looseness”, barrier optimization also varies by processor. Take the x86 processor as an example: for the compiler barrier diagram described above, x86 omits all barriers except the final StoreLoad barrier, since x86 only reorders stores with later loads.

We will stop here with barrier optimizations on x86 and other processors; interested readers can consult further material to dig deeper.

Precautions for using volatile

When using volatile, it is important to note that volatile only guarantees visibility, not atomicity.

Atomicity definition

In Java, reads and writes of variables of primitive types are atomic operations: such an operation cannot be interrupted partway by CPU scheduling; it either completes entirely or does not execute at all. (The historical exception is non-volatile long and double, which the JLS permits to be split into two 32-bit accesses on some JVMs.)

It is difficult to understand directly by definition, so let’s take a look at the following example.

x = 10;     // statement 1
x++;        // statement 2
x = x + 1;  // statement 3

Can you guess which of these three statements are atomic? The answer: only statement 1. If that surprises you, here is why:

  • Statement 1: the value 10 is assigned directly to x, i.e. written directly into working memory; this is a single operation.
  • Statement 2: read the value of x, compute x plus 1, then assign the result back to x; this is three operations.
  • Statement 3: same as statement 2.

Because statements 2 and 3 involve multiple operations, and under multiple threads the CPU can switch time slices between them (that is, a thread can be paused after any one operation), thread-safety problems can arise.

Why is volatile not atomic

Having described atomicity, the next question is: why does it matter that volatile is not atomic? The reason is simple: volatile guarantees visibility (when one thread changes the variable’s value, other threads see the new value immediately), but a statement such as the x++ above still consists of multiple operations on the variable. Between those operations the CPU can still pause the thread and schedule another, so volatile cannot make a compound operation atomic.
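A minimal sketch of losing updates despite volatile (the class name and thread/iteration counts are our own): several threads increment a volatile counter, and the final total usually falls short because each ++ is a read-modify-write of three steps that can interleave.

```java
public class VolatileCounterDemo {
    // volatile makes each read/write visible, but count++ is still
    // read -> add -> write, and those three steps can interleave
    static volatile int count = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    count++; // not atomic: increments can be lost
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        // Typically prints less than the "expected" 40000
        System.out.println("count = " + count);
    }
}
```

To make the increment itself atomic you would use java.util.concurrent.atomic.AtomicInteger or a synchronized block instead.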

Usage scenarios for volatile

When using volatile, remember that it only guarantees visibility, not atomicity. volatile is therefore suited to lightweight thread synchronization, but only when two conditions hold:

  • First condition: the result of the operation does not depend on the current value of the variable, or you can ensure that only a single thread ever modifies the variable’s value.
  • Second condition: the variable does not participate in invariants together with other state variables.

These two conditions may be hard to grasp in the abstract, so let’s explain each one separately.

For the first condition

volatile int a = 0;

// Wrong under multiple threads (fine in a single thread):
// the new value of a depends on its current value
public void doSomeThingA() {
    a++;
}

// Correct usage: no thread-safety problem,
// single-threaded or multi-threaded
public void doSomeThingB() {
    if (a == 0) {
        a = 1;
    }
}

From the pseudocode above we can see clearly that a volatile variable is safe, in both single-threaded and multi-threaded code, as long as the result does not depend on the variable’s current value. Here a is declared as an int, but the same applies to the other primitive types.

For the second condition

The second condition is easier to understand in reverse: a volatile variable must not be part of an invariant involving other variables. The following pseudocode is a counterexample:

private volatile int lower;
private volatile int upper;

public void setLower(int value) {
    if (value > upper)
        throw new IllegalArgumentException(...);
    lower = value;
}

public void setUpper(int value) {
    if (value < lower)
        throw new IllegalArgumentException(...);
    upper = value;
}

In the code above there is an obvious invariant: the lower bound must never exceed the upper bound (lower <= upper). Under multiple threads, if two threads call setLower and setUpper simultaneously with inconsistent values, the range can end up violating the invariant. For example, with initial state (0, 5), if thread A calls setLower(4) at the same time as thread B calls setUpper(3), both threads can pass the check that is supposed to protect the invariant, leaving the final range as (4, 3), which is clearly wrong.
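One way to repair this counterexample (a sketch; the class name and exception messages are ours) is to make the check-then-set atomic with synchronized, so both fields are always read and written under the same lock:

```java
public class SafeRange {
    // plain fields: visibility and atomicity both come from the lock
    private int lower;
    private int upper;

    public synchronized void setLower(int value) {
        if (value > upper)
            throw new IllegalArgumentException("lower would exceed upper");
        lower = value;
    }

    public synchronized void setUpper(int value) {
        if (value < lower)
            throw new IllegalArgumentException("upper would fall below lower");
        upper = value;
    }

    public synchronized int getLower() { return lower; }
    public synchronized int getUpper() { return upper; }
}
```

Because every access holds the same monitor, the check and the assignment happen as one atomic step, and the invariant lower <= upper can never be observed broken.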

conclusion

  • volatile guarantees visibility and prevents instruction reordering, but it does not guarantee atomicity.
  • volatile’s visibility comes from the underlying Lock-prefixed instruction, which writes the current processor’s cache line straight back to system memory; that write-back invalidates the data other CPUs have cached for the same address.
  • volatile prevents instruction reordering because the Java compiler inserts memory barriers around accesses to volatile variables, and those barriers stop the CPU from reordering the instructions.