The volatile keyword is one that many of you have probably heard of and probably used. Before Java 5 it was a controversial keyword, because its use in programs often led to unexpected results. After Java 5, the volatile keyword got a new lease on life.

The volatile keyword, while relatively simple to understand on the face of it, is not easy to use. Since the volatile keyword is related to Java’s memory model, we will look at the concepts and knowledge related to the memory model before covering the volatile keyword, then examine how it is implemented, and finally show some scenarios where it can be used.

1. Concepts related to the memory model

As we all know, every instruction a program executes runs on the CPU, and executing instructions inevitably involves reading and writing data. The temporary data a running program uses is stored in main memory (physical memory). This creates a problem: the CPU executes instructions very quickly, while reading data from and writing data to main memory is much slower by comparison. If every operation on data had to interact with main memory, instruction execution would be greatly slowed down. That is why the CPU has a cache.

In other words, during the execution of a program, the data needed for an operation is copied from main memory to the CPU’s cache, so that the CPU can read and write data directly from its cache while performing a calculation, and when the operation is complete, the data from the cache is flushed to main memory. Take a simple example, such as this code:

i = i + 1;

When a thread executes this statement, it reads the value of i from main memory, copies it into the cache, increments i by one, writes the result to the cache, and finally flushes the latest value of i from the cache back to main memory.

This code runs without any problem in a single thread, but problems can arise when multiple threads run it. On a multi-core CPU, each thread may run on a different core, so each thread has its own cache at run time. (The same problem also occurs on a single-core CPU, through thread scheduling.) In this article we take multi-core CPUs as the example.

For example, suppose two threads execute this code at the same time, and the initial value of i is 0. We would like the value of i to be 2 after both threads finish. But will that actually be the case?

The following scenario may occur: initially, both threads read the value of i and store it in their respective CPU caches. Thread 1 adds 1 and writes the latest value of i, 1, back to memory. Thread 2's cache still holds i as 0, so after adding 1, its i is also 1, and thread 2 then writes its value of i to memory.

The final value of i is then 1, not 2. This is known as the cache coherence problem. Variables that are accessed by multiple threads like this are commonly called shared variables.

That is, if a variable is cached on multiple CPUs (which is typically the case in multithreaded programming), cache inconsistency can arise.

In order to solve the cache inconsistency problem, there are generally two solutions:

1) Adding a LOCK# signal to the bus

2) Using a cache coherence protocol

Both methods are provided at the hardware level.

In the early days of CPUs, cache inconsistency was solved by asserting a LOCK# lock on the bus. Since the CPU communicates with other components through the bus, locking the bus blocks other CPUs from accessing components such as memory, so only one CPU can use the variable's memory at a time. For example, while a thread is executing i = i + 1 and a LOCK# signal is asserted on the bus, no other CPU can read the variable i from memory until this code has finished executing. This solves the cache inconsistency problem.

There is a problem with this approach, however: while the bus is locked, other CPUs cannot access memory at all, which is inefficient.

Hence cache coherence protocols. The best known is Intel's MESI protocol, which ensures that the copies of a shared variable held in each cache stay consistent. Its core idea: when a CPU writes data and finds that the variable is shared, that is, copies of the variable exist in other CPUs' caches, it sends a signal telling the other CPUs to mark their cache line for that variable invalid. When those CPUs later need to read the variable, they find its cache line invalid and re-read the value from main memory.



2. Three concepts in concurrent programming

In concurrent programming, we usually encounter three problems: atomicity, visibility, and ordering. Let's look at each of these three concepts:

1. Atomicity

Atomicity: an operation, or a set of operations, either executes in full without interruption by any factor, or does not execute at all.

A classic example is the bank account transfer problem:

For example, transferring 1000 yuan from account A to account B must involve two operations: subtract 1000 yuan from account A and add 1000 yuan to account B.

Imagine what would happen if these two operations were not atomic. Suppose the operation suddenly stops after 1000 yuan is subtracted from account A. Then 1000 yuan has been deducted from account A, but account B never receives the 1000 yuan transferred from it.

So these two operations must be atomic so that there are no unexpected problems.
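As a rough Java sketch of the idea (the Account and Bank classes, field names, and amounts here are hypothetical illustrations, not from the original text), a synchronized block makes the two balance updates execute as one indivisible step:

```java
// Hypothetical sketch: making a two-step transfer atomic with synchronized.
class Account {
    private long balance;

    Account(long initialBalance) { balance = initialBalance; }

    long getBalance() { return balance; }

    void deposit(long amount)  { balance += amount; }
    void withdraw(long amount) { balance -= amount; }
}

class Bank {
    // One shared lock keeps the two updates indivisible; a real system
    // would lock per-account (and order lock acquisition to avoid deadlock).
    private final Object lock = new Object();

    void transfer(Account from, Account to, long amount) {
        synchronized (lock) {
            from.withdraw(amount); // step 1
            to.deposit(amount);    // step 2: no thread can observe the gap
        }
    }
}
```

With the synchronized block, no other thread can ever observe the intermediate state where the money has left account A but not yet arrived in account B.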

What does the corresponding issue look like in concurrent programming?

As a simple example, think about what would happen if assignment to a 32-bit variable were not atomic.

i = 9;

Suppose a thread executes this statement, and assume for the moment that assigning to a 32-bit variable involves two steps: writing the lower 16 bits and writing the upper 16 bits.

Then the thread might write the lower 16 bits and suddenly stop, at which point another thread reading i would read corrupted data.

2. Visibility

Visibility means that when multiple threads access the same variable and one thread changes the value of the variable, other threads can immediately see the changed value.

For a simple example, look at this code:

// code executed by thread 1
int i = 0;
i = 10;

// code executed by thread 2
j = i;

Suppose thread 1 runs on CPU 1 and thread 2 on CPU 2. When thread 1 executes i = 10, it first loads the initial value of i into CPU 1's cache and then assigns 10, so the value of i in CPU 1's cache becomes 10, but it is not immediately written back to main memory.

If thread 2 then executes j = i, it fetches the value of i from main memory into CPU 2's cache. But the value of i in main memory is still 0, so j becomes 0 instead of 10.

This is the visibility problem: after thread 1 modifies variable i, thread 2 does not immediately see the new value.

3. Ordering

Ordering: the program executes in the order the code is written. For a simple example, look at this code:

int i = 0;
boolean flag = false;
i = 1;          // statement 1
flag = true;    // statement 2

This code defines an int variable and a boolean variable, then assigns to each. Statement 1 precedes statement 2 in code order, but does the JVM guarantee that statement 1 actually executes before statement 2? Not necessarily. Why? Because instruction reordering may occur here.

Let me explain instruction reordering: to improve execution efficiency, the processor may optimize the input code. It does not guarantee that every statement executes in exactly the order it appears in the code, but it does guarantee that the final result of execution is the same as the result of executing the code in order.

For example, in the code above, it makes no difference whether statement 1 or statement 2 executes first, so during execution statement 2 may well execute before statement 1.

However, note that although the processor reorders instructions, it guarantees that the final result of the program is the same as the result of executing the code in order. So how is that guaranteed? Here's another example:

int a = 10;     // statement 1
int r = 2;      // statement 2
a = a + 3;      // statement 3
r = a * a;      // statement 4

This code has four statements; one possible execution order is:

statement 2 → statement 1 → statement 3 → statement 4

But could the order be statement 2 → statement 1 → statement 4 → statement 3? No, because the processor considers data dependencies between instructions when reordering. If instruction B must use the result of instruction A, the processor guarantees that instruction A executes before instruction B. Here statement 4 depends on the result of statement 3, so statement 3 must execute before statement 4.

Although reordering does not affect the results of program execution within a single thread, what about multithreading? Here’s an example:

// thread 1:
context = loadContext();    // statement 1
inited = true;              // statement 2

// thread 2:
while(!inited){
    sleep();
}
doSomethingWithConfig(context);

In the code above, statements 1 and 2 have no data dependency, so they may be reordered. If that happens, thread 1 executes statement 2 first; thread 2 then believes initialization has finished, breaks out of the while loop, and executes doSomethingWithConfig(context) while context has not yet been initialized, causing the program to fail.

As the example shows, instruction reordering does not affect execution within a single thread, but it can break the correctness of concurrent execution across threads.

That is, in order for concurrent programs to execute correctly, they must be atomic, visible, and ordered. As long as one of them is not guaranteed, it may cause the program to run incorrectly.

3. The Java memory model

We talked earlier about the memory model and some of the problems that can arise in concurrent programming. Now let's look at the Java memory model: what guarantees it specifies, and what methods and mechanisms Java provides to ensure correct execution when doing multithreaded programming.

The Java Virtual Machine specification defines a Java Memory Model (JMM) to shield the differences in memory access across hardware platforms and operating systems, so that Java programs achieve consistent memory-access behavior on every platform. So what does the Java memory model specify? It defines the rules for accessing variables in a program and, more broadly, the order in which the program executes. Note that for better execution performance, the Java memory model does not restrict the execution engine from using processor registers or caches to speed up instruction execution, nor does it restrict the compiler from reordering instructions. In other words, the Java memory model also has cache coherence issues and instruction-reordering issues.

The Java memory model specifies that all variables are stored in main memory (analogous to physical memory above) and that each thread has its own working memory (analogous to a cache). All of a thread's operations on variables must be performed in its working memory; it cannot operate on main memory directly, and no thread can access another thread's working memory.

To take a simple example: In Java, execute the following statement:

i = 10;

The executing thread must first assign to variable i in the cache line of its own working memory, and only then write the value to main memory; it does not write the number 10 directly into main memory.

So what guarantees of atomicity, visibility, and orderliness do the Java language itself provide?

1. Atomicity

In Java, reads and assignments to variables of primitive data types are atomic operations, that is, they are uninterruptible and either performed or not performed.

Although the sentence above seems simple, it is not so easy to understand. Consider the following example:

Please analyze which of the following operations are atomic operations:

x = 10;        // statement 1
y = x;         // statement 2
x++;           // statement 3
x = x + 1;     // statement 4

At first glance, some might say all four operations are atomic. In fact, only statement 1 is atomic; the other three are not.

Statement 1 assigns the value 10 directly to x, meaning that the thread executing this statement writes the value 10 directly to the working memory.

Statement 2 actually contains two operations: it first reads the value of x, then writes that value into working memory. Reading x and writing the value into working memory are each atomic, but combined they are not.

Similarly, x++ and x=x+1 involve three operations: reading the value of x, incrementing by 1, and writing the new value.

Therefore, only the operation of statement 1 has atomicity in the above four statements.

That is, only simple reads and assignments (where the assignment must assign a literal number to a variable; assignment from one variable to another is not atomic) are atomic operations.

However, note one caveat: on 32-bit platforms, reading and assigning 64-bit data (long and double) takes two operations, and atomicity is not guaranteed. In recent JDKs, though, the JVM in practice treats reads and assignments of 64-bit data as atomic as well.
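A small sketch of that caveat (the class name is made up for illustration): declaring a long field volatile is the portable way to guarantee that its reads and writes are atomic, even on 32-bit JVMs.

```java
// Sketch: reads and writes of volatile long and double fields are
// guaranteed atomic by the Java Language Specification, even on 32-bit
// JVMs; a plain long write may legally be split into two 32-bit halves
// on such platforms.
class Counter64 {
    volatile long value;   // atomic read/write guaranteed

    void set(long v) { value = v; }
    long get()       { return value; }
}
```

Without volatile, a reader on a 32-bit JVM could in principle observe a "torn" value consisting of half of one write and half of another.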

As you can see, the Java memory model only guarantees that basic reads and assignments are atomic; atomicity for larger units of code can be achieved with synchronized or Lock. Since synchronized and Lock guarantee that only one thread executes the code block at any moment, there is no atomicity problem within the block, and atomicity is thereby guaranteed.

2. Visibility

For visibility, Java provides the volatile keyword to ensure visibility.

When a shared variable is declared volatile, it is guaranteed that a modified value is immediately flushed to main memory, and that when another thread needs to read the variable, it reads the new value from main memory.

Common shared variables do not guarantee visibility, because it is uncertain when a common shared variable will be written to main memory after modification. When another thread reads a common shared variable, it may still have the old value in memory, so visibility cannot be guaranteed.

Visibility is also guaranteed by synchronized and Lock, which ensure that only one thread at a time acquires the lock and executes the synchronized code, and that modifications to variables are flushed to main memory before the lock is released.

3. Ordering

In the Java memory model, the compiler and processor are allowed to reorder instructions. Reordering does not affect the result of single-threaded execution, but it can affect the correctness of multithreaded concurrent execution.

In Java, the volatile keyword can be used to ensure a certain degree of "ordering". Ordering can also be guaranteed with synchronized and Lock. Obviously, synchronized and Lock ensure that only one thread executes the synchronized code at each moment, which amounts to letting threads execute the synchronized code sequentially, naturally ensuring ordering.

In addition, the Java memory model has some innate “orderliness”, that is, orderliness that can be guaranteed without any means, which is often referred to as the happens-before principle. If the order of two operations cannot be deduced from the happens-before principle, they are not guaranteed to be ordered, and the virtual machine can reorder them at will.

Here’s a look at the happens-before principle:

Program order rule: within a thread, according to the order of the code, operations written earlier happen-before operations written later

Lock rule: an unlock operation happens-before a subsequent lock operation on the same lock

Volatile variable rule: a write to a volatile variable happens-before a subsequent read of that variable

Transitivity rule: if operation A happens-before operation B, and operation B happens-before operation C, then operation A happens-before operation C

Thread start rule: the start() method of a Thread object happens-before every action of that thread

Thread interruption rule: a call to Thread.interrupt() happens-before the point where code in the interrupted thread detects the interrupt

Thread termination rule: all operations in a thread happen-before the detection of that thread's termination; termination can be detected via Thread.join() returning or Thread.isAlive() returning false

Object finalization rule: the completion of an object's initialization happens-before the start of its finalize() method

These eight principles are excerpted from Understanding the Java Virtual Machine.

Of these eight rules, the first four are the most important and the last four are obvious.

Let’s explain the first four rules:

My understanding of the program order rule is that the execution of a piece of code appears ordered within a single thread. Note that although the rule says "operations written earlier happen-before operations written later", this means the program appears to execute in code order; the virtual machine may still reorder the code. Although it reorders, the final result is the same as the result of sequential execution, and only instructions with no data dependencies may be reordered. So within a single thread, program execution appears ordered. Keep in mind that this rule only guarantees the correctness of execution within a single thread; it guarantees nothing across multiple threads.

The second rule is also easy to understand: whether in a single thread or across threads, a lock that is currently held must be released before it can be acquired again.

The third rule is a more important one and will be the focus of this article. Intuitively, if a thread writes a variable first and then reads it, the write must precede the read.

The fourth rule actually reflects the transitivity of the happens-before principle.
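The third rule can be sketched in code (the Publisher class below is a hypothetical illustration, not from the original text): because the write to the volatile flag happens-before the later read of it, the earlier plain write to data is also guaranteed to be visible to the reader.

```java
// Sketch of the volatile rule: the write to 'ready' happens-before a
// later read of 'ready', so the earlier plain write to 'data' is also
// visible to the thread that observes ready == true.
class Publisher {
    int data;                    // plain field, published via the volatile flag
    volatile boolean ready;      // volatile: write happens-before later read

    void publish() {
        data = 42;               // 1: ordinary write
        ready = true;            // 2: volatile write; 1 cannot move after 2
    }

    int awaitData() throws InterruptedException {
        while (!ready) {         // volatile read; loops until the flag is set
            Thread.sleep(1);
        }
        return data;             // guaranteed to see 42 once ready is true
    }
}
```

This "publish via a volatile flag" idiom is exactly what the status-flag scenario later in the article relies on.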

4. An in-depth look at the volatile keyword

We’ve covered a lot of ground here, but it’s really all about the volatile keyword, so let’s get to the subject.

1. Two semantics of the volatile keyword

Once a shared variable (a class's member variable, or a class's static member variable) is declared volatile, it carries two levels of semantics:

1) It ensures visibility when different threads operate on the variable, i.e. one thread changes the value of a variable and the new value is immediately visible to other threads.

2) Forbid instruction reordering.

Let’s take a look at some code. If thread 1 executes first and thread 2 executes later:

// thread 1
boolean stop = false;
while(!stop){
    doSomething();
}

// thread 2
stop = true;

This is a typical pattern many people use to stop a thread. But will this code definitely work correctly, that is, definitely stop the thread? Not necessarily. Most of the time it will stop the thread, but there is a chance it will not (the chance is small, but if it happens the result is an infinite loop).

Here's how this code could fail to stop the thread. As explained earlier, each thread has its own working memory during execution, so when thread 1 runs, it copies the value of the stop variable into its own working memory.

If thread 2 then changes the value of stop but has not yet written it back to main memory before moving on to other things, thread 1 never learns of thread 2's change to stop and keeps looping.

But using volatile is different:

First, using the volatile keyword forces the modified value to be written to main memory immediately.

Second: using volatile guarantees that when thread 2 modifies stop, the cache line holding stop in thread 1's working memory (that is, in the CPU's L1 or L2 cache) is invalidated.

Third: thread 1 reads stop again from main memory because the cache line of stop is invalid in thread 1’s working memory.

So when thread 2 modifies the value of stop (this involves two operations: modifying the value in thread 2's working memory, then flushing the modified value to main memory), the cache line for stop in thread 1's working memory is invalidated. When thread 1 next reads stop, it finds its cache line invalid, waits for the main-memory address that the cache line corresponds to to be updated, and then reads the latest value from main memory.

Thread 1 reads the latest correct value.
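A minimal runnable version of the fixed pattern (the loop body is just a placeholder for doSomething()):

```java
// Sketch: marking the flag volatile guarantees the worker thread sees
// the update; without volatile, the JIT may keep reading a cached copy
// and the loop might never exit.
class Worker implements Runnable {
    volatile boolean stop = false;

    public void run() {
        while (!stop) {
            // doSomething() placeholder: real work would go here
        }
    }
}
```

Another thread stops the worker simply by setting the flag: the volatile write is flushed to main memory, the worker's cached copy is invalidated, and the loop exits.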

2. Does volatile guarantee atomicity?

The volatile keyword guarantees visibility, but does volatile guarantee atomicity on variables?

Here’s an example:

public class Test {
    public volatile int inc = 0;

    public void increase() {
        inc++;
    }

    public static void main(String[] args) {
        final Test test = new Test();
        for(int i = 0; i < 10; i++){
            new Thread(){
                public void run() {
                    for(int j = 0; j < 1000; j++)
                        test.increase();
                }
            }.start();
        }

        // wait until all worker threads have finished
        while(Thread.activeCount() > 1)
            Thread.yield();
        System.out.println(test.inc);
    }
}

What does this program output? Some might expect 10000. In fact, running it repeatedly gives inconsistent results, generally a number less than 10000.

Some may reason like this: inc is declared volatile, so every change to inc is visible to all other threads; ten threads each increment inc 1000 times; therefore the final value of inc should be 1000 * 10 = 10000.

The flaw in this reasoning is assuming that visibility implies atomicity. The volatile keyword does guarantee visibility, but the program above fails on atomicity. Visibility only ensures that the latest value is read each time; volatile does not make operations on the variable atomic.

As mentioned earlier, increment is not atomic; it involves reading the variable's original value, adding 1, and writing the result back to working memory. These three sub-operations can be interleaved with other threads, which may produce the following situation:

If at some point the value of inc is 10,

Thread 1 begins an increment: it reads the original value of inc, and is then blocked.

Then thread 2 performs the increment. It also reads the original value of inc. Because thread 1 only read inc and did not modify it, the cache line for inc in thread 2's working memory is not invalidated, so thread 2 goes directly to main memory, reads 10, adds 1, writes 11 to its working memory, and finally flushes it to main memory.

Since thread 1 already read the value of inc, and note that inc in thread 1's working memory is still 10, thread 1 adds 1, getting 11, writes 11 to its working memory, and finally flushes it to main memory.

So after the two threads each perform one increment, inc has increased by only 1.

Some may object: doesn't volatile guarantee that when a variable is modified, other caches' lines are invalidated, so that other threads read the new value? Yes, that is correct; it is the volatile-variable rule in the happens-before list above. But note that thread 1 blocked after reading the variable and had not yet modified inc. So although volatile ensures that thread 2 reads the value of inc fresh from memory, thread 1 had not modified anything, and thread 2 could not possibly see a changed value.

The root cause is that increment is not an atomic operation, and volatile does not make arbitrary operations on a variable atomic.

Changing the code in any of the following ways fixes the problem:

Using synchronized:

public class Test {
    public int inc = 0;

    public synchronized void increase() {
        inc++;
    }

    public static void main(String[] args) {
        final Test test = new Test();
        for(int i = 0; i < 10; i++){
            new Thread(){
                public void run() {
                    for(int j = 0; j < 1000; j++)
                        test.increase();
                }
            }.start();
        }

        while(Thread.activeCount() > 1)
            Thread.yield();
        System.out.println(test.inc);
    }
}

Using Lock:

public class Test {
    public int inc = 0;
    Lock lock = new ReentrantLock();

    public void increase() {
        lock.lock();
        try {
            inc++;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        final Test test = new Test();
        for(int i = 0; i < 10; i++){
            new Thread(){
                public void run() {
                    for(int j = 0; j < 1000; j++)
                        test.increase();
                }
            }.start();
        }

        while(Thread.activeCount() > 1)
            Thread.yield();
        System.out.println(test.inc);
    }
}

Using AtomicInteger:

public class Test {
    public AtomicInteger inc = new AtomicInteger();

    public void increase() {
        inc.getAndIncrement();
    }

    public static void main(String[] args) {
        final Test test = new Test();
        for(int i = 0; i < 10; i++){
            new Thread(){
                public void run() {
                    for(int j = 0; j < 1000; j++)
                        test.increase();
                }
            }.start();
        }

        while(Thread.activeCount() > 1)
            Thread.yield();
        System.out.println(test.inc);
    }
}

Since Java 1.5, the java.util.concurrent.atomic package has provided classes that wrap atomic operations on primitive types: increment (add 1), decrement (subtract 1), addition (add a number), and subtraction (subtract a number), guaranteeing that these operations are atomic. Internally they rely on CAS (Compare And Swap) to achieve atomicity; on x86, CAS is implemented with the processor's CMPXCHG instruction, which is itself atomic.
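As a sketch of the CAS idea (this illustration is built on AtomicInteger.compareAndSet and is not the actual JDK source of getAndIncrement): read the current value, try to swap in current + 1, and retry if another thread got there first.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of how getAndIncrement can be built from compareAndSet (CAS):
// read, compute, attempt an atomic swap, and retry on contention.
class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    int getAndIncrement() {
        while (true) {
            int current = value.get();        // read the current value
            int next = current + 1;           // compute the new value
            if (value.compareAndSet(current, next)) {
                return current;               // CAS succeeded: no lost update
            }
            // CAS failed: another thread updated 'value' first; retry
        }
    }

    int get() { return value.get(); }
}
```

Because the compare-and-set either installs the new value atomically or fails and retries, two concurrent increments can never both start from the same snapshot and both succeed, which is exactly what went wrong in the volatile inc++ example.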

3. Does volatile guarantee order?

As mentioned earlier, the volatile keyword forbids instruction reordering, so volatile can ensure a degree of ordering.

The volatile keyword forbids instruction reordering in two senses:

1) When the program performs a read or write of a volatile variable, all preceding operations must already have completed and their results must be visible to subsequent operations, while none of the following operations have yet happened;

2) During instruction optimization, statements before an access to a volatile variable must not be moved after that access, and statements after it must not be moved before it.

Here’s a simple example:

x = 2;          // statement 1
y = 0;          // statement 2
flag = true;    // statement 3 (flag is a volatile variable)
x = 4;          // statement 4
y = -1;         // statement 5

Because the flag variable is volatile, instruction reordering will not move statement 3 before statement 1 or 2, nor after statement 4 or 5. Note, however, that the relative order of statements 1 and 2, or of statements 4 and 5, is not guaranteed.

And the volatile keyword guarantees that statements 1 and 2 must have completed by the time statement 3 is executed, and that the results of statements 1 and 2 are visible to statements 3, 4, and 5.

So let’s go back to the example we gave earlier:

// thread 1:
context = loadContext();    // statement 1
inited = true;              // statement 2

// thread 2:
while(!inited){
    sleep();
}
doSomethingWithConfig(context);

In this example, statement 2 may execute before statement 1, so that context is still uninitialized when thread 2 uses it, causing the program to fail.

This problem does not occur if the inited variable is modified with the volatile keyword, because the context must be initialized by the time statement 2 is executed.

4. Principle and implementation mechanism of volatile

After looking at some of the uses stemming from the volatile keyword, let’s look at how volatile actually ensures visibility and disallows instruction reordering.

The following excerpt is from Understanding the Java Virtual Machine:

"Comparing the assembly code generated with and without the volatile keyword, the volatile version contains an extra lock-prefixed instruction."

The lock-prefixed instruction effectively acts as a memory barrier (also called a memory fence), which provides three functions:

1) It ensures that instruction reordering does not move later instructions before the memory barrier, nor earlier instructions after it; that is, by the time the memory-barrier instruction executes, all operations before it have completed;

2) It forces changes to the cache to be written to main memory immediately;

3) If it is a write operation, it invalidates the corresponding cache line in the other CPU.

5. Scenarios where the volatile keyword is used

Synchronized prevents multiple threads from executing a piece of code at the same time, which can hurt execution efficiency; volatile performs better than synchronized in some cases. Note, however, that volatile is not a substitute for synchronized, because volatile does not guarantee atomicity. In general, two conditions must both hold for volatile to be usable:

1) Write operations to variables do not depend on the current value

2) This variable is not included in invariants with other variables

In effect, these conditions indicate that the valid values that can be written to volatile variables are independent of the state of any program, including the current state of the variables.

In fact, my understanding is that the two conditions above require that the operation be atomic in order for a program that uses the volatile keyword to execute correctly on concurrency.

Here are a few scenarios where volatile is used in Java.

1. Status flags

volatile boolean flag = false;

while(!flag){
    doSomething();
}

public void setFlag() {
    flag = true;
}

Another instance is the initialization pattern from earlier:

volatile boolean inited = false;

// thread 1:
context = loadContext();
inited = true;

// thread 2:
while(!inited){
    sleep();
}
doSomethingWithConfig(context);

2. Double-checked locking

class Singleton {
    private volatile static Singleton instance = null;

    private Singleton() {
    }

    public static Singleton getInstance() {
        if(instance == null) {
            synchronized (Singleton.class) {
                if(instance == null)
                    instance = new Singleton();
            }
        }
        return instance;
    }
}
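Why is the volatile on instance essential here? A sketch with a hypothetical Config class (the field and value are made up for illustration): the assignment instance = new Singleton() is not a single step, and without volatile its sub-steps may be reordered.

```java
// Sketch of why the volatile in double-checked locking matters.
// 'instance = new Config()' roughly decomposes into three steps:
//   1. allocate memory for the object
//   2. run the constructor (initialize fields)
//   3. point 'instance' at the memory
// Without volatile, steps 2 and 3 may be reordered, so another thread
// could see a non-null but not-yet-initialized instance. volatile
// forbids that reordering.
class Config {
    private static volatile Config instance;
    private final int port;

    private Config() { port = 8080; }   // hypothetical field and value

    static Config getInstance() {
        if (instance == null) {                  // first check, no lock
            synchronized (Config.class) {
                if (instance == null) {          // second check, under lock
                    instance = new Config();
                }
            }
        }
        return instance;
    }

    int getPort() { return port; }
}
```

With volatile, any thread that observes a non-null instance is guaranteed to see a fully constructed object.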

This article is from "Reflections on Java Architecture", a partner of the cloud community.