Both synchronized and volatile play important roles in multithreaded concurrent programming; volatile can be thought of as a lightweight synchronized. Volatile guarantees the visibility of shared variables on multiprocessor systems, and it also prevents certain instruction reordering in concurrent code.

What is visibility?

For speed, processors do not operate on main memory directly. Instead, data is first read into an internal cache, and only after the operation completes (and certain conditions are met) is the cached data written back to memory. As a result, when multiple threads share a variable, a "dirty read" can occur: one processor has changed the data, but the internal caches of the other processors have not been updated. Volatile exists to solve this problem. When a volatile variable is written, the data in its cache line is flushed to memory, and each processor checks whether its own cached copy is stale by sniffing the bus. This guarantees the visibility of the data across threads and processors.

So how does volatile guarantee visibility?

Let’s start with a piece of code:

public class TestVolatile {
    public static volatile int value;

    public static void main(String[] args) {
        int a = 10;
        value = 9;
        value += a;
    }
}

As we know, Java source code is compiled into bytecode, and the bytecode is loaded into the JVM by the class loader. When the JVM executes the bytecode, it is ultimately translated into machine instructions that run on the CPU. Let’s compile this code down to assembly and see what the write to the volatile variable turns into, and how visibility is guaranteed.
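If you want to reproduce this yourself, HotSpot can print the JIT-compiled assembly when the hsdis disassembler plugin is installed. A typical invocation looks like this (the flags are real HotSpot diagnostic flags; the class name matches the example above, and without hsdis the JVM will only print a warning):

```shell
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly TestVolatile
```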

0x00007f96b93132ed: lock addl $0x0,(%rsp)  ; *putstatic value ; - TestVolatile::main@5 (line 6)

Focus on lock addl $0x0,(%rsp). According to the IA-32 Architecture Software Developer’s Manual, an instruction with the lock prefix causes two things to happen on a multi-core processor:

  1. It writes the data of the current processor’s cache line back to system memory.
  2. The write-back invalidates that data in every other CPU’s cache line.

So after the data has been modified, other processors will read it from system memory into their cache lines again.

The Java programming language allows multiple threads to access a shared variable. To ensure that a shared variable is updated accurately and consistently, each thread should acquire an exclusive lock before operating on it.

A lock-prefixed instruction causes the processor to assert its LOCK# signal while the instruction executes. In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory for as long as the signal is asserted.

Why can processors monopolize any shared memory?

Because it locks the bus: other CPUs cannot use the bus, which means they cannot access system memory.

Bus locking blocks all communication between the CPUs and memory, so while the lock is held it is very expensive for other processors to operate on data at any memory location. Later CPUs (Intel processors from the 486/Pentium era onward) therefore provide cache coherence instead.

Cache coherence: the overall mechanism is that when one CPU modifies data in its cache, it notifies the other CPUs to discard their cached copies or re-read the data from main memory. The MESI protocol describes how this works:

MESI protocol: MESI is named after the four states a cache line can be in (Modified, Exclusive, Shared, Invalid). A cache line is the basic unit of cached data, typically 64 bytes on Intel CPUs. The protocol maintains two state bits per cache line, so each line is in exactly one of the four states M, E, S, and I, with the following meanings:

  • M (Modified): the data is cached only in the local CPU and in no other CPU. It has been modified relative to the value in memory, and memory has not yet been updated.

  • E (Exclusive): the data is cached only in the local CPU and has not been modified, i.e. it is consistent with memory.

  • S (Shared): the data is cached in multiple CPUs and is consistent with memory.

  • I (Invalid): the copy in this CPU’s cache is invalid.

A cache line in the M state must always snoop for any attempt to read the corresponding main-memory address; if it sees one, it must write its data back to memory before that read is allowed to proceed. A cache line in the S state must always snoop for requests to invalidate or take exclusive ownership of the line; if it sees one, it must set its own state to I. A cache line in the E state must always snoop for other processors reading the corresponding main-memory address; if it sees one, it must set its own state to S.

When a CPU needs to read data: if its cache line is in state I, it must fetch the data from memory and change the state to S; otherwise it can read the cached value directly. Before reading, however, it must wait for the snoop results of the other CPUs; if another CPU holds the line in state M, it must wait for that CPU to write the line back to memory first.

When a CPU needs to write data, it may do so directly only if its cache line is in state M or E. Otherwise it must issue a special RFO (Request For Ownership, a bus transaction) to tell the other CPUs to invalidate their copies (set them to I); this carries a relatively high performance cost. After the write completes, the line’s state is set to M.
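To make the state transitions above concrete, here is a toy, single-cache-line model of MESI in Java. It is purely illustrative (real hardware snoops in parallel and actually performs the write-backs, which this sketch only notes in comments), and all class and method names are invented for this example:

```java
import java.util.ArrayList;
import java.util.List;

enum MesiState { M, E, S, I }

class CacheLine {
    MesiState state = MesiState.I;
}

class Cpu {
    final CacheLine line = new CacheLine();
}

class Bus {
    final List<Cpu> cpus = new ArrayList<>();

    // Read: on a miss (state I), every other copy drops to S (an M-state
    // owner would first write back to memory); we load the line as S if
    // anyone else holds it, otherwise as E.
    void read(Cpu self) {
        if (self.line.state != MesiState.I) {
            return; // cache hit: read the local copy directly
        }
        boolean othersHold = false;
        for (Cpu c : cpus) {
            if (c != self && c.line.state != MesiState.I) {
                othersHold = true;
                c.line.state = MesiState.S; // M writes back; E/S become shared
            }
        }
        self.line.state = othersHold ? MesiState.S : MesiState.E;
    }

    // Write: unless we already own the line (M or E), broadcast an RFO that
    // invalidates every other copy; the local line then becomes M.
    void write(Cpu self) {
        if (self.line.state != MesiState.M && self.line.state != MesiState.E) {
            for (Cpu c : cpus) {
                if (c != self) {
                    c.line.state = MesiState.I; // RFO: invalidate other copies
                }
            }
        }
        self.line.state = MesiState.M;
    }
}
```

Tracing two CPUs through this model reproduces the protocol: the first read loads as E, a second CPU’s read demotes both to S, and a write by either one moves it to M and invalidates the other.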

So if a variable is modified by only one thread (and thus one CPU) at a time, that CPU can work entirely in its own cache without any bus transactions. But if the cache line keeps bouncing between CPUs, with one holding it exclusively and another then writing to it, RFO transactions are generated constantly, hurting concurrent performance.

In JDK 7, the well-known concurrency author Doug Lea added LinkedTransferQueue, which uses a special trick to optimize the performance of volatile: appending (padding) bytes. We may give a detailed explanation later; to really explore it you have to look at the processor’s hardware configuration. We’ll talk about it when we have time!
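The idea behind appending bytes is to pad a hot volatile field so that it occupies a cache line by itself, preventing false sharing: two unrelated variables ping-ponging a shared cache line between CPUs through exactly the RFO traffic described above. A minimal sketch of the technique, assuming a 64-byte cache line (the class and field names are invented for this example):

```java
// Pads a volatile long out to (at least) a full 64-byte cache line, so that
// writes to `value` do not invalidate neighboring data on other CPUs.
class PaddedVolatileLong {
    long p1, p2, p3, p4, p5, p6, p7;   // 56 bytes of padding before
    volatile long value;               // the hot field (8 bytes)
    long q1, q2, q3, q4, q5, q6, q7;   // 56 bytes of padding after

    long sumPadding() {
        // Referencing the padding fields helps keep some JVMs
        // from optimizing them away.
        return p1 + p2 + p3 + p4 + p5 + p6 + p7
             + q1 + q2 + q3 + q4 + q5 + q6 + q7;
    }
}
```

Since JDK 8, a @Contended annotation (sun.misc.Contended, later jdk.internal.vm.annotation.Contended) can ask the JVM to do this padding for you, though for application code it is restricted behind a JVM flag.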

A small visibility example

public class NoVisibility {
    private static boolean ready;

    private static class ReaderThread extends Thread {
        public void run() {
            while (!ready) {
                System.out.println(3);
            }
            System.out.println("------------- how did I get executed?? -----------------");
        }
    }

    public static void main(String[] args) throws Exception {
        new ReaderThread().start();
        ready = true;
    }
}

For the code above, if the reader thread suffers a dirty read (its cached copy of ready is never refreshed), it will keep printing 3 forever. Only when the main thread’s write to ready becomes visible to the reader does this line execute:

System.out.println("------------- how did I get executed?? -----------------");
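The fix is to declare ready as volatile, which forces the write to be flushed and the reader’s stale cached copy to be invalidated. A minimal sketch (the demo method and its timings are invented for illustration):

```java
class VisibleFlag {
    private static volatile boolean ready;

    // Returns true if the reader thread observed the volatile write and exited.
    static boolean demo() {
        ready = false;
        Thread reader = new Thread(() -> {
            while (!ready) {
                // spin until the volatile write becomes visible
            }
        });
        reader.start();
        try {
            Thread.sleep(50);   // let the reader start spinning
            ready = true;       // volatile write: guaranteed visible to the reader
            reader.join(2000);  // with volatile, this returns almost immediately
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !reader.isAlive();
    }
}
```

Without the volatile modifier, the same loop may (depending on the JIT and hardware) spin forever, because nothing forces the reader to re-read ready from memory.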

What is instruction reordering?

When executing a program, the compiler and processor often reorder instructions to improve performance.

  1. Compiler reordering. The compiler can rearrange the execution order of statements as long as it does not change the semantics of the single-threaded program.

  2. Instruction-level parallel reordering. Modern processors use instruction-level parallelism (ILP) to overlap the execution of multiple instructions. If there is no data dependency between them, the processor can change the order in which the machine instructions for different statements execute.

  3. Memory-system reordering. Because processors use caches and read/write buffers, load and store operations can appear to execute out of order.
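As a quick illustration of the data-dependency rule in item 2 (the class and variable names are invented for this example):

```java
class ReorderExample {
    static int[] demo() {
        int x = 1;      // no dependency between these two writes:
        int y = 2;      // the compiler/CPU may execute them in either order
        int z = x + y;  // depends on both x and y, so it can never move above them
        return new int[] { x, y, z };
    }
}
```

Whatever order the first two writes actually execute in, the single-threaded result is unchanged; that is exactly why such reordering is legal.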

How does volatile prevent instruction reordering?

lock addl $0x0,(%rsp): the lock-prefixed instruction acts as a memory barrier, and instructions before and after it will not be reordered across it.

Memory barrier: the CPU-level term for a class of processor instructions that enforce ordering restrictions on memory operations.

In HotSpot’s source code, the memory barrier is also implemented with this lock addl instruction rather than the mfence instruction, which the HotSpot comments describe as very expensive.

So when the JVM (interpreter or JIT compiler) encounters a write to a volatile variable, it emits a lock addl $0x0,(%rsp) instruction to prevent memory reordering.

This may not be easy to understand at first, but let’s give an example:

package com.zanzan.test;

public class TestVolatile {
    int a = 0;
    boolean flag = false;

    public void testA() {
        // statement 1
        a = 1;
        // statement 2
        flag = true;
    }

    public void testB() {
        if (flag) {
            a = a + 5;
            System.out.println(a);
        }
    }

    public static void main(String[] args) {
        TestVolatile testVolatile = new TestVolatile();
        new Thread(new Runnable() {
            @Override
            public void run() {
                testVolatile.testA();
            }
        }, "testVolatileA").start();

        new Thread(new Runnable() {
            @Override
            public void run() {
                testVolatile.testB();
            }
        }, "testVolatileB").start();
    }
}

The normal result is 6 (testA runs first: a becomes 1, then flag becomes true, so testB prints 1 + 5).

However, if instruction reordering occurs, statement 2 (flag = true) may execute before statement 1 (a = 1). If the thread’s time slice ends right after statement 2, and the testVolatileB thread then runs testB(), it sees flag == true while a is still 0, so the result printed is 5.

This is instruction reorder!

This can be solved with volatile. How? Declare the fields volatile:

volatile int a = 0;
volatile boolean flag = false;

In fact, declaring flag alone as volatile is enough here: the volatile write to flag cannot be reordered with the earlier write to a, and a thread that reads flag as true is guaranteed to also see a == 1.
