“Five years on the job, and you don’t know the volatile keyword!”

Hearing the architect, who had just finished the interview, say this, several other colleagues joined in the laughter.

There is a saying about interviews in China: “the interview has you building aircraft carriers, the job has you turning screws.” Sometimes a single question is enough to get you passed over.

How long have you been working? Do you know the volatile keyword?

Today let’s learn the volatile keyword and become screw-turners who can build aircraft carriers in an interview!

volatile

The third edition of the Java language specification defines volatile as follows:

The Java programming language allows threads to access shared variables. To ensure that shared variables are updated consistently and reliably, a thread should normally acquire an exclusive lock before operating on them.

The Java language provides volatile, which in some cases is more convenient than locking.

If a field is declared volatile, the Java thread memory model ensures that all threads see a consistent value for the variable.

Semantics

Once a shared variable (an instance field or a static field of a class) is declared volatile, it carries two levels of semantics:

  1. It guarantees visibility when different threads operate on the variable: when one thread changes the variable’s value, the new value is immediately visible to other threads.

  2. Instruction reordering is forbidden.

  • Note

If a final variable is also declared volatile, that is a compile-time error.

Ps: one says changes must be visible, the other says the value never changes. By nature they are as incompatible as fire and water.
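A minimal sketch of the illegal combination (the class name is illustrative):

public class FinalVolatileDemo {
    // Does not compile: javac reports an illegal combination of modifiers
    // private final volatile int x = 0;

    private volatile int y = 0; // volatile alone is fine
}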

Introducing the problem

  • Error.java
// thread 1
boolean stop = false;
while (!stop) {
    doSomething();
}

// thread 2
stop = true;

This is a typical piece of code that many people might use to stop a thread.

Problem analysis

But will this code actually work correctly? Will it definitely stop the thread?

Not necessarily. Most of the time this code will stop the thread, but there is a chance it will not (the odds are small, but once it happens the result is an endless loop).

Here is why this code might fail to stop the thread.

As explained earlier, each thread has its own working memory during execution, so when thread 1 runs, it copies the value of the stop variable into its working memory.

If thread 2 changes the stop variable but goes off to do something else before the new value is written back to main memory, thread 1 remains unaware of thread 2’s change and keeps looping.

Using volatile

First, using the volatile keyword forces the modified value to be written to main memory immediately.

Second, using volatile means that when thread 2 modifies stop, the cache line holding stop in thread 1’s working memory (the L1 or L2 cache line on the CPU) is invalidated.

Third, because the cache line for stop in thread 1’s working memory is invalid, thread 1 reads stop again from main memory.

So when thread 2 changes the value of stop (really two operations: modifying the value in thread 2’s working memory, then writing it back to main memory), the cache line for stop in thread 1’s working memory is invalidated. When thread 1 next reads stop, it finds its cache line invalid, waits for the corresponding main memory address to be updated, and then reads the latest value from main memory.

Thread 1 reads the latest correct value.
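As a minimal sketch of the fix (class and method names are illustrative), the only change is declaring stop volatile:

public class StopDemo {
    // volatile: thread 2's write is flushed to main memory and
    // thread 1's cached copy is invalidated, so the change is seen
    private volatile boolean stop = false;

    // thread 1
    public void work() {
        while (!stop) {
            doSomething();
        }
    }

    // thread 2
    public void cancel() {
        stop = true;
    }

    private void doSomething() { /* ... */ }
}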

Does volatile guarantee atomicity

The volatile keyword guarantees visibility, but does volatile guarantee atomicity on variables?

Introducing the problem

public class VolatileAtomicTest {

    public volatile int inc = 0;

    public void increase() {
        inc++;
    }

    public static void main(String[] args) {
        final VolatileAtomicTest test = new VolatileAtomicTest();
        for (int i = 0; i < 10; i++) {
            new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    test.increase();
                }
            }).start();
        }

        // Ensure that all previous threads are finished
        while (Thread.activeCount() > 1) {
            Thread.yield();
        }
        System.out.println(test.inc);
    }
}
  • What is the result?

You might think the answer is 10000, but in practice it is usually less than that.

Why

The reasoning goes: inc is volatile, so every change to inc is visible to all other threads; ten threads each increment inc 1000 times; therefore the final value of inc should be 1000 * 10 = 10000.

The flaw in this reasoning is that the volatile keyword guarantees visibility, but it does not make inc++ atomic.

Visibility only ensures that the latest value is read each time, but volatile does not guarantee atomicity on variables.
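A sketch of what inc++ actually expands to; it is a read-modify-write in three separate steps, not one atomic action:

int tmp = inc;   // 1. read the current value of inc (a volatile read)
tmp = tmp + 1;   // 2. add one in the thread's working memory
inc = tmp;       // 3. write the result back (a volatile write)
// Two threads can both read the same value in step 1,
// and one of their two increments is then lost.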

  • The solution

Use Lock, synchronized, or AtomicInteger.
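For example, a minimal sketch of the AtomicInteger variant (only the changed part of the test is shown; the class name is illustrative):

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicIncreaseTest {

    // incrementAndGet() performs the read-modify-write as a single
    // atomic CAS operation, so no increment can be lost
    private final AtomicInteger inc = new AtomicInteger(0);

    public void increase() {
        inc.incrementAndGet();
    }

    public int get() {
        return inc.get();
    }
}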

Does volatile guarantee ordering

The volatile keyword forbids instruction reordering, which has two aspects:

  1. When the program reads or writes a volatile variable, all preceding operations must already have completed and their results must be visible to subsequent operations; the operations after it have certainly not yet taken place;

  2. During instruction optimization, statements before an access to a volatile variable may not be moved after the access, and statements after it may not be moved before it.

Examples

  • Example 1
// x and y are non-volatile variables
// flag is volatile

x = 2;        // statement 1
y = 0;        // statement 2
flag = true;  // statement 3
x = 4;        // statement 4
y = -1;       // statement 5

Since flag is volatile, instruction reordering does not place statement 3 before statement 1 or 2, nor does it place statement 3 after statement 4 or 5.

Note, however, that the order of statements 1 and 2 or 4 and 5 is not guaranteed.

And the volatile keyword guarantees that statements 1 and 2 must have completed by the time statement 3 is executed, and that the results of statements 1 and 2 are visible to statements 3, 4, and 5.

  • Example 2
// thread 1:
context = loadContext();   // statement 1
inited = true;             // statement 2

// thread 2:
while (!inited) {
    sleep();
}
doSomethingWithConfig(context);

In this example, statement 2 might be reordered to run before statement 1. Then, when thread 2 exits its loop, the context would not yet be initialized; thread 2 would use an uninitialized context and the program would fail.

This problem does not occur if the inited variable is declared with the volatile keyword, because then the context is guaranteed to be initialized by the time statement 2 executes.
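The fix, as a one-line sketch:

// Declaring the flag volatile forbids reordering statement 2 before
// statement 1 and makes statement 1's result visible to thread 2:
volatile boolean inited = false;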

Common Usage Scenarios

On the other hand, volatile performs better than synchronized in some cases.

Note, however, that volatile is no substitute for synchronized because volatile does not guarantee atomicity.

In general, two conditions must be met to use volatile:

  1. Writes to the variable do not depend on its current value

  2. The variable is not included in an invariant together with other variables

In effect, these conditions state that the valid values that can be written to a volatile variable are independent of any other program state, including the variable’s own current state.

In fact, my understanding is that both conditions boil down to requiring that the operations on the variable be atomic; only then can a program using the volatile keyword execute correctly under concurrency.

Common scenarios

  • Status flag
volatile boolean flag = false;

while (!flag) {
    doSomething();
}

public void setFlag() {
    flag = true;
}
  • Double-checked locking singleton
public class Singleton {
    private volatile static Singleton instance = null;

    private Singleton() {}

    public static Singleton getInstance() {
        if (instance == null) {
            synchronized (Singleton.class) {
                if (instance == null) {
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}
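volatile matters here because instance = new Singleton() is not a single step; as a sketch in comments (the helper names are illustrative pseudocode):

// instance = new Singleton() is roughly three steps:
//   memory = allocate();     // 1. allocate memory for the object
//   ctorInstance(memory);    // 2. run the constructor to initialize it
//   instance = memory;       // 3. point instance at the memory
// Without volatile, steps 2 and 3 may be reordered, so another thread
// could observe a non-null instance whose fields are not yet initialized.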

JSR-133

The old Java memory model, prior to JSR-133, allowed reordering between operations on volatile variables and operations on normal variables, although reordering between volatile operations themselves was not allowed.

In the old memory model, the VolatileExample program below could be reordered into the sequence shown in the timeline that follows:

class VolatileExample {
    int a = 0;
    volatile boolean flag = false;

    public void writer() {
        a = 1;                 // 1
        flag = true;           // 2
    }

    public void reader() {
        if (flag) {            // 3
            int i = a;         // 4
        }
    }
}
  • Timeline
Timeline: ----------------------------------------------------------------->
Thread A: (2) write volatile variable; (1) modify shared variable
Thread B: (3) read volatile variable;  (4) read shared variable

In the old memory model, when there was no data dependency between 1 and 2, it was possible to reorder between 1 and 2 (3 and 4 were similar).

As a result, when reader thread B executes 4, it may not see the change that writer thread A made to the shared variable in 1.

Thus, in the old memory model, a volatile write-read pair did not have the memory semantics of a monitor release-acquire.

To provide a more lightweight mechanism for communication between threads than monitor locks,

The JSR-133 Expert Group decided to enhance the memory semantics of volatile:

Strictly restrict compiler and processor reordering of volatile and normal variables, to ensure that a volatile write-read pair has the same memory semantics as a monitor release-acquire.

The compiler reordering rules and the processor memory-barrier insertion policies prohibit reordering between volatile and normal variables wherever it might break the memory semantics of volatile.
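As a sketch, the conservative barrier-insertion strategy described in the JSR-133 cookbook can be summarized like this (barrier names shown as comments, not runnable code):

// StoreStore barrier
// volatile write
// StoreLoad barrier    <- prevents reordering with a later volatile read
//
// volatile read
// LoadLoad barrier     <- prevents reordering with later normal reads
// LoadStore barrier    <- prevents reordering with later normal writes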

How volatile works

Term definitions

  • Shared variable (shared variables): a variable that can be shared between multiple threads. Shared variables include all instance variables, static variables, and array elements; they are stored in heap memory. volatile applies only to shared variables.
  • Memory barrier (memory barriers): a set of processor instructions that enforce ordering restrictions on memory operations.
  • Cache line (cache line): the smallest unit of storage that can be allocated in the cache. When the processor fills a cache line, it loads the entire line, which may require multiple main memory read cycles.
  • Atomic operation (atomic operations): an operation, or series of operations, that cannot be interrupted.
  • Cache line fill (cache line fill): when the processor recognizes that an operand read from memory is cacheable, it reads the entire cache line into the appropriate cache (L1, L2, L3, or all of them).
  • Cache hit (cache hit): if the memory location of a cache line fill is still the address of the processor’s next access, the processor reads the operand from the cache rather than from memory.
  • Write hit (write hit): when the processor wants to write an operand back to a cached memory region, it first checks whether the cache holds that memory address; if a valid cache line exists, the processor writes the operand back to the cache rather than to memory. This operation is called a write hit.
  • Write miss (write misses the cache): a write to a memory region for which there is no valid cache line.

The principle

So how does volatile guarantee visibility?

On an x86 processor, we can use a tool to dump the assembly instructions generated by the JIT compiler and see what the CPU does when a volatile variable is written.

  • Java
instance = new Singleton(); // instance is volatile

The corresponding assembly:

0x01a3de1d: movb $0x0,0x1104800(%esi);
0x01a3de24: lock addl $0x0,(%esp);

Writing a volatile shared variable produces the second line of assembly, the one with the lock prefix. According to the IA-32 architecture software developer’s manual, the lock prefix causes two things to happen on a multicore processor:

  • The data in the current processor’s cache line is written back to system memory.

  • The write back to memory invalidates the data cached in other CPUs.

To improve processing speed, the processor does not communicate with memory directly. Instead, it first reads data from system memory into its internal cache (L1, L2, or other) and operates on it there, but it is not known when the result will be written back to memory.

When a volatile variable is written, the JVM sends the processor a lock-prefixed instruction that writes the cache line containing the variable back to system memory.

But even after the write-back to memory, if the values cached by other processors are still stale, performing further calculations on them would be a problem.

So, to keep each processor’s cache consistent on a multiprocessor system, a cache coherence protocol is implemented. Each processor sniffs the data propagated on the bus to check whether its cached values have gone stale: when a processor finds that the memory address corresponding to one of its cache lines has been modified, it marks that cache line invalid, and when it next wants to operate on that data, it re-reads it from system memory into its cache.

Both of these points are detailed in chapter 8, the multiprocessor management section, of Volume 3 of the IA-32 Software Developer Architecture Manual.

The Lock prefix instruction causes the processor cache to be written back to memory

A Lock-prefixed instruction causes the processor’s LOCK# signal to be asserted while the instruction executes.

In a multiprocessor environment, the LOCK# signal ensures that the processor can use any shared memory exclusively while the signal is asserted (it locks the bus, so other CPUs cannot access the bus, which means they cannot access system memory). In recent processors, however, the LOCK# signal usually locks the cache instead of the bus, because locking the bus is expensive.

The effect of a lock operation on the processor cache is explained in detail in section 8.1.4. For Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a lock operation.

But on P6 and more recent processors, the LOCK# signal is not asserted if the accessed memory region is already cached inside the processor.

Instead, the processor locks the cache line for that memory region and writes it back to memory, relying on the cache coherence mechanism to ensure atomicity; this operation is called “cache locking.” The cache coherence mechanism prevents two or more processors from simultaneously modifying data in the same cached memory region.

Writing one processor’s cache back to memory invalidates other processors’ caches

The IA-32 and Intel 64 processors use the MESI (Modified, Exclusive, Shared, Invalid) control protocol to maintain consistency between their internal caches and those of other processors.

When operating in a multi-core processor system, the IA-32 and Intel 64 processors can sniff other processors’ accesses to system memory and to their internal caches.

They use sniffing techniques to keep their internal caches, system memory, and other processors’ caches consistent across the bus.

For example, in the Pentium and P6 family processors, if one processor sniffs that another processor intends to write to a memory address that it currently holds in the shared state, the sniffing processor invalidates its cache line and forces a cache line fill the next time it accesses the same memory address.

Optimizing the use of volatile

Doug Lea added a new queue collection class, LinkedTransferQueue, to the JDK 7 concurrency package. It uses byte padding on volatile variables to improve the performance of enqueuing and dequeuing.

Can appending bytes really optimize performance? It may seem like a magic trick, but a deeper understanding of processor architecture explains the mystery.

Let’s start with the LinkedTransferQueue class. It uses an inner class type to define the head and tail of the queue. That inner class, PaddedAtomicReference, does only one thing relative to its parent AtomicReference: it pads the shared variable out to 64 bytes.

We can do the math: an object reference takes 4 bytes, and the class appends 15 variables for a total of 60 bytes; together with the parent’s value variable, that makes 64 bytes.

  • LinkedTransferQueue.java
/** head of the queue */
private transient final PaddedAtomicReference<QNode> head;

/** tail of the queue */
private transient final PaddedAtomicReference<QNode> tail;


static final class PaddedAtomicReference<T> extends AtomicReference<T> {

    // enough padding for 64 bytes with 4-byte refs
    Object p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pa, pb, pc, pd, pe;

    PaddedAtomicReference(T r) {
        super(r);
    }
}

public class AtomicReference<V> implements java.io.Serializable {

    private volatile V value;

    // other code omitted
}

Why does padding to 64 bytes make concurrent programming more efficient?

The L1, L2, and L3 caches of the Intel Core i7, Core, Atom, NetBurst, Core Solo, and Pentium M processors use cache lines that are 64 bytes wide, and these processors do not support partially filled cache lines. If the head and tail nodes of the queue together occupy less than 64 bytes, the processor reads them both into the same cache line. With multiple processors each caching the same head and tail nodes, when one processor tries to modify the head node it locks the entire cache line, so, under the cache coherence mechanism, other processors cannot access the tail node in their own caches. Since enqueue and dequeue operations constantly modify the head and tail nodes, this severely hurts enqueue and dequeue efficiency on multiprocessor systems.

Doug Lea pads the head and tail nodes out to 64 bytes to fill the whole cache line, preventing them from being loaded into the same cache line, so that modifying one node does not lock the other.

  • Should you pad to 64 bytes whenever a volatile variable is used?

No.

There are two scenarios in which this technique should not be used.

First: processors whose cache lines are not 64 bytes wide. For example, the P6 family and Pentium processors have L1 and L2 cache lines that are 32 bytes wide.

Second: shared variables that are not written frequently.

Since appending bytes forces the processor to read more bytes into the cache, padding itself incurs a performance cost. If a shared variable is not written frequently, the chance of cache line locking is very small, so there is no need to append bytes to avoid it.
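As an aside, since JDK 8 the JVM can apply such padding automatically via the @Contended annotation instead of hand-written filler fields; a minimal sketch (note this is a JDK-internal annotation, and application classes need the -XX:-RestrictContended JVM flag for it to take effect):

import sun.misc.Contended; // JDK 8 location; JDK 9+ moved it to jdk.internal.vm.annotation

public class PaddedCounter {
    // The JVM pads this field so it sits on its own cache line,
    // playing the same role as the p0..pe filler fields above.
    @Contended
    volatile long value;
}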

Ps: It suddenly strikes me that every craft has its own specialists, and that learning and wisdom are both indispensable.

long/double operations are not thread-safe

One of the many rules defined by the Java Virtual Machine specification is that all operations on primitive types, with the exception of some operations on long and double types, are atomic.

Current JVMs (Java virtual machines) may use 32-bit units as the atomic operation, not 64-bit ones.

When a thread copies a long/double value from main memory into its working memory, the copy may be performed as two separate 32-bit operations. Obviously, if several threads operate at the same time, a thread may end up reading two 32-bit halves that come from different writes.

To share long and double fields between threads, you must either operate on them inside synchronized blocks or declare them volatile.
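A minimal sketch (the class name is illustrative):

public class TornReadDemo {
    // volatile makes reads and writes of this 64-bit field atomic,
    // so no thread can observe a half-written ("torn") value
    private volatile long count = 0L;

    public void set(long v) {
        count = v;
    }

    public long get() {
        return count;
    }
}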

Summary

Volatile is a very important keyword in the JMM, and is basically a must-ask in high-concurrency interviews.

I hope this article has helped you in your job interview, and if you have any other ideas, please share them in the comments section.

Every geek’s likes, favorites, and shares are Lao Ma’s biggest motivation to keep writing!

More highlights can be found on my WeChat account.