1. Hardware memory model

The CPU cache is divided into level-1 (L1), level-2 (L2), and level-3 (L3) caches according to the order in which they are searched and how closely they are coupled to the CPU core. The data held in each cache level is typically a subset of the data held in the next level.

1.1. Data loading

1.1.1. Processing process

  • Search the L1 cache, then the L2 cache, then the L3 cache; if the data is in none of them, fetch it from main memory.
  • The data that is found is loaded into each cache level in turn, so the next time it is needed it can be read directly from the cache.

1.1.2. The cache line

Contiguous data is loaded from memory one line at a time, typically 64 bytes. So if you access an array of type long (8 bytes per element), loading one element into the cache also loads the 7 neighbouring elements that share its line. This is the concept of a “cache line”.
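As a rough illustration of the arithmetic above, the sketch below assumes a 64-byte cache line (the real size depends on the CPU) and counts how many long values share one line. It also contrasts two traversal orders that produce the same sum but reuse loaded lines differently. The class and method names are made up for this example, and no timing is claimed.

```java
class CacheLineSketch {
    // Assumed cache line size in bytes; common on x86, but hardware-dependent.
    static final int CACHE_LINE_BYTES = 64;

    // Number of long values (8 bytes each) that fit in one assumed cache line.
    static int longsPerLine() {
        return CACHE_LINE_BYTES / Long.BYTES;
    }

    // Row-by-row traversal: consecutive accesses touch neighbouring elements,
    // so one loaded line can serve several iterations.
    static long sumRowMajor(long[][] grid) {
        long sum = 0;
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                sum += grid[r][c];
            }
        }
        return sum;
    }

    // Column-by-column traversal of a rectangular grid: same result, but each
    // access lands in a different row array, so loaded lines are reused less.
    static long sumColumnMajor(long[][] grid) {
        long sum = 0;
        for (int c = 0; c < grid[0].length; c++) {
            for (int r = 0; r < grid.length; r++) {
                sum += grid[r][c];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[][] grid = new long[512][512];
        for (long[] row : grid) {
            java.util.Arrays.fill(row, 1L);
        }
        System.out.println(longsPerLine());
        System.out.println(sumRowMajor(grid) == sumColumnMajor(grid));
    }
}
```

Timing the two traversals on a large grid is one way to observe the effect of cache lines, although the difference varies with hardware and JIT behavior.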

1.2. Execution process

To make full use of its execution units, the CPU may execute the incoming instructions out of order and then reassemble the results. It guarantees that the final result is the same as that of in-order execution, but it does not guarantee that the individual statements are computed in the order in which they appear in the code. Therefore, if one computing task depends on the intermediate results of another, their relative order cannot be guaranteed by the order of the code alone.



2. Java memory model

The Java Memory Model (JMM) is a higher-level abstraction built on top of the hardware memory model. It hides the differences in memory access between hardware platforms and operating systems, so that Java programs behave consistently under concurrency on every platform. Its purpose is to keep shared variables consistent in multi-threaded environments.

2.1. Memory division

2.1.1. Memory model

The Java memory model specifies that all variables are stored in main memory, and that each thread has its own working memory (analogous to a CPU’s cache).

  • Working memory holds copies of the main-memory variables used by the thread.
  • The thread must operate on variables in working memory, including reading and assigning values. It cannot read or write variables directly from main memory.
  • Different threads cannot directly access variables in each other’s working memory. The transfer of variable values between threads must be done through main memory.

2.1.2. Model analogy

  • Main memory: corresponds mainly to the object instance data in the Java heap; at the hardware level, to physical memory.
  • Working memory: corresponds mainly to part of the virtual machine stack; at the hardware level, to the CPU's caches and registers.

2.2. Interaction between main memory and working memory

2.2.1. Eight interaction operations

  • Lock: acts on a variable in main memory; marks the variable as exclusively owned by one thread.
  • Unlock: acts on a variable in main memory; releases a locked variable so that it can be locked by another thread.
  • Read: acts on a variable in main memory; transfers the variable's value from main memory to the thread's working memory for the subsequent load operation.
  • Load: acts on a variable in working memory; places the value that the read operation fetched from main memory into the working-memory copy of the variable.
  • Use: acts on a variable in working memory; passes the variable's value to the execution engine. This operation is performed whenever the virtual machine reaches a bytecode instruction that needs the value of the variable.
  • Assign: acts on a variable in working memory; assigns a value received from the execution engine to the variable in working memory. This operation is performed whenever the virtual machine reaches a bytecode instruction that assigns a value to the variable.
  • Store: acts on a variable in working memory; transfers the variable's value from working memory to main memory for the subsequent write operation.
  • Write: acts on a variable in main memory; puts the value delivered by the store operation into the variable in main memory.

These operations must be performed in order, but not necessarily consecutively.

To copy a variable from main memory to working memory, read and load must be performed in that order.

To synchronize a variable from working memory back to main memory, store and write must be performed in that order.

Other operations may be interleaved between them. For example, accesses to variables a and b in main memory can be performed in the following order:

read a -> read b -> load b -> load a
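To make the data flow of these operations concrete, here is a toy model. It is only a sketch for illustration, not how a JVM is implemented: the class WorkingMemorySketch, its maps, and the inTransit buffer are all invented for this example. Each method corresponds to one of the operations, and demo() performs the reads and loads of two variables in an interleaved order.

```java
import java.util.HashMap;
import java.util.Map;

class WorkingMemorySketch {
    private final Map<String, Integer> mainMemory;                     // shared by all threads
    private final Map<String, Integer> workingCopy = new HashMap<>();  // one thread's private copies
    private final Map<String, Integer> inTransit = new HashMap<>();    // values between read/load or store/write

    WorkingMemorySketch(Map<String, Integer> mainMemory) {
        this.mainMemory = mainMemory;
    }

    void read(String name)  { inTransit.put(name, mainMemory.get(name)); }    // main memory -> transit
    void load(String name)  { workingCopy.put(name, inTransit.remove(name)); } // transit -> working copy
    int use(String name)    { return workingCopy.get(name); }                  // working copy -> execution engine
    void assign(String name, int value) { workingCopy.put(name, value); }      // execution engine -> working copy
    void store(String name) { inTransit.put(name, workingCopy.get(name)); }    // working copy -> transit
    void write(String name) { mainMemory.put(name, inTransit.remove(name)); }  // transit -> main memory

    // Computes a = a + b using the interleaved order read a, read b, load b, load a.
    static int demo() {
        Map<String, Integer> shared = new HashMap<>();
        shared.put("a", 1);
        shared.put("b", 2);
        WorkingMemorySketch thread = new WorkingMemorySketch(shared);
        thread.read("a");
        thread.read("b");
        thread.load("b");
        thread.load("a");
        thread.assign("a", thread.use("a") + thread.use("b"));
        thread.store("a");
        thread.write("a");
        return shared.get("a");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 3
    }
}
```

Until store and write run, the new value of a exists only in the working copy, which is the gap that the visibility discussion later in this article is about.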

2.2.2. Eight basic rules

  • read and load, and likewise store and write, must occur in pairs. A variable may not be read from main memory without being accepted by working memory, nor stored from working memory without being accepted by main memory.
  • A thread may not discard its most recent assign operation: if a variable has changed in working memory, the change must be synchronized back to main memory.
  • A thread may not synchronize a variable from working memory back to main memory for no reason (i.e., without any assign operation having occurred).
  • A new variable can only be created in main memory. Working memory may not use a variable that has not been initialized by load or assign. In other words, load or assign must be performed on a variable before use or store.
  • A variable can be locked by only one thread at a time. However, the same thread may perform the lock operation several times; the variable is then unlocked only after the same number of unlock operations.
  • Executing the lock operation on a variable clears its value from working memory. Before the execution engine uses the variable, load or assign must be performed again to initialize it.
  • It is not allowed to unlock a variable that has not been locked by a lock operation, nor to unlock a variable that has been locked by another thread.
  • Before unlock is performed on a variable, the variable must be synchronized back to main memory (by performing store and write).
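The counting rule for repeated lock operations, and the rule against unlocking something you do not hold, can be observed with java.util.concurrent.locks.ReentrantLock. This is an analogy at the library level rather than the memory model's own lock and unlock operations; the class and method names below are made up for this sketch.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

class ReentrantLockCountSketch {
    // Records the hold count after each step: lock, lock, unlock, unlock.
    static int[] holdCounts() {
        ReentrantLock lock = new ReentrantLock();
        int[] counts = new int[4];
        lock.lock();
        counts[0] = lock.getHoldCount(); // locked once
        lock.lock();
        counts[1] = lock.getHoldCount(); // locked twice by the same thread
        lock.unlock();
        counts[2] = lock.getHoldCount(); // still held after one unlock
        lock.unlock();
        counts[3] = lock.getHoldCount(); // released after the matching second unlock
        return counts;
    }

    // Unlocking a lock that the current thread does not hold is rejected.
    static boolean unlockWithoutHoldingFails() {
        ReentrantLock lock = new ReentrantLock();
        try {
            lock.unlock();
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(holdCounts())); // prints [1, 2, 1, 0]
        System.out.println(unlockWithoutHoldingFails());   // prints true
    }
}
```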

2.3. Consistency

Consistency covers three properties: atomicity, visibility, and ordering.

2.3.1. Atomicity

Atomicity means that once an operation is started, it will continue to the end without being interrupted by other threads. This operation can be a single operation or multiple operations.

⭐️ Ensures that instructions are not affected by context switching.

For example, suppose a static global variable int i is assigned by two threads: thread A assigns it 1 and thread B assigns it -1. Then the value of i can only be 1 or -1; the two assignments do not interfere with each other. That is the essence of atomicity: an atomic operation cannot be interrupted.

  • A simple assignment is an atomic operation: a = 1
  • A compound assignment is not atomic: a++

public class ThreadTest {

    private static int count = 0;

    public static void main(String[] args) throws InterruptedException {
        // Thread 1 increments count 5000 times
        Thread thread1 = new Thread(() -> {
            for (int i = 0; i < 5000; i++) count++;
        });
        // Thread 2 decrements count 5000 times
        Thread thread2 = new Thread(() -> {
            for (int i = 0; i < 5000; i++) count--;
        });
        thread1.start();
        thread2.start();
        // Wait for both threads to finish before reading the result
        thread1.join();
        thread2.join();
        System.out.println(count);
    }
}

  • Ideal situation: after both threads finish running, count == 0
  • Actual situation: after both threads finish running, count may not be 0

i++ and i-- are not atomic operations in Java. For i++ (where i is a static variable), the compiler actually produces the following JVM bytecode instructions:

getstatic   i   // Get the value of static variable i
iconst_1        // Push the constant 1
iadd            // Increment
putstatic   i   // Store the modified value into static variable i

The bytecode for i-- is similar:

getstatic   i   // Get the value of static variable i
iconst_1        // Push the constant 1
isub            // Decrement
putstatic   i   // Store the modified value into static variable i

If a context switch occurs between these instructions, the two threads' updates can interleave and overwrite each other, so after the same number of increments and decrements the result may not be 0.
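One way to rule out that interleaving is to make each update a critical section, so that the get, add, and put steps of one thread cannot be split by the other. The following is a minimal sketch, assuming a shared lock object; the names SynchronizedCounterSketch and LOCK are made up for this example.

```java
class SynchronizedCounterSketch {
    private static int count = 0;
    private static final Object LOCK = new Object(); // hypothetical shared lock object

    // Runs 5000 increments and 5000 decrements on two threads and returns the final count.
    static int run() {
        count = 0;
        Thread inc = new Thread(() -> {
            for (int i = 0; i < 5000; i++) {
                synchronized (LOCK) {
                    count++; // read, add, and write happen as one unit under the lock
                }
            }
        });
        Thread dec = new Thread(() -> {
            for (int i = 0; i < 5000; i++) {
                synchronized (LOCK) {
                    count--;
                }
            }
        });
        inc.start();
        dec.start();
        try {
            inc.join();
            dec.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 0
    }
}
```

Both threads must synchronize on the same object; locking on two different objects would not exclude one thread from the other's critical section.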

2.3.2. Visibility

Visibility refers to whether when one thread changes the value of a shared variable, other threads are immediately aware of the change.

⭐️ Ensure that instructions are not affected by CPU cache.

The Java memory model achieves visibility by synchronizing a new value back to main memory after a variable is modified and by refreshing the value from main memory before the variable is read. This reliance on main memory as the transfer medium applies to both ordinary and volatile variables.

public class ThreadTest {

    static boolean run = true;

    public static void main(String[] args) throws InterruptedException {
        new Thread(() -> {
            // When run becomes false, the loop exits and the thread finishes
            while (run) {}
        }).start();

        Thread.sleep(1000);
        // Set run to false to end the thread
        run = false;
    }
}

  • Ideally, the child thread terminates.
  • In practice, the child thread may keep running.

The main thread performs run = false. However, this change is not visible to the child thread (it does not know that the main thread modified run), so the child thread keeps reading the stale value run == true from its cache. This is where the data inconsistency comes from.
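One way to make the write to run visible is to declare the flag volatile, which section 3 discusses in detail. A minimal sketch follows; the class name, the stopWorker helper, and the timeout values are chosen only for illustration.

```java
class VolatileStopFlagSketch {
    // volatile: a write by one thread becomes visible to later reads by other threads
    static volatile boolean run = true;

    // Starts a spinning worker, clears the flag, and reports whether the worker stopped.
    static boolean stopWorker() {
        run = true;
        Thread worker = new Thread(() -> {
            while (run) {
                // busy-wait until the flag is cleared
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive if the worker never stops
        worker.start();
        try {
            Thread.sleep(100);
            run = false;        // volatile write
            worker.join(2000);  // wait up to 2 seconds for the worker to notice
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(stopWorker()); // prints true
    }
}
```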

[CPU active refresh]

public class ThreadTest {

    static boolean run = true;
    static int count = 0;

    public static void main(String[] args) throws InterruptedException {
        new Thread(() -> {
            // The effects described below are commonly observed behavior,
            // not guarantees of the Java memory model.
            while (run) {
                // ① println() is synchronized internally, so the loop usually picks up the new value of run
                System.out.println();
                // ② sleep() blocks the thread and frees the CPU, so in practice the thread's working memory is usually refreshed
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    return;
                }
                // ③ count++ on its own keeps the CPU busy, so the thread's working memory may never be refreshed
                count++;
            }
        }).start();

        Thread.sleep(1000);
        run = false;
    }
}

2.3.3. Order

Within a single thread, operations appear ordered (as-if-serial semantics); observed from another thread, they appear unordered (because of instruction reordering and the delay in synchronizing working memory with main memory). Therefore, under concurrency the program may appear to execute out of order: intuitively, code written earlier may be executed later.

⭐️ guarantees that instructions are not affected by parallel optimization of CPU instructions.

[Instruction reordering]

To improve performance, compilers and processors often reorder instructions while preserving as-if-serial semantics.

Reordering falls into three types:

  • Compiler reordering. The compiler can rearrange the execution order of statements without changing the semantics of a single-threaded program.
  • Instruction-level parallel reordering. Modern processors use instruction-level parallelism to overlap the execution of multiple instructions. If there is no data dependency, the processor can change the execution order of the machine instructions corresponding to the statements.
  • Memory system reordering. Because the processor uses caches and read/write buffers, load and store operations may appear to execute out of order.
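
int a = 0;
boolean flag = false;

// Thread A: the two writes are independent, so they may be reordered
a = 1;           // ① after reordering, this slot may effectively execute: flag = true;
flag = true;     // ② after reordering, this slot may effectively execute: a = 1;

// Thread B: if instruction reordering occurred in thread A
if (flag) {      // ③ may execute between ① and ② and see flag == true
    int i = a;   // ④ i may then be 0 instead of 1
}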

Why reorder?

Modern CPUs support multi-stage instruction pipelines. For example, a processor that overlaps instruction fetch, instruction decode, execute, memory access, and data write-back is said to have a five-stage pipeline.

Such a CPU can work on five instructions, each in a different stage, within one clock cycle, which amounts to completing one instruction per cycle (IPC = 1). Pipelining does not shorten the execution time of a single instruction, but it indirectly increases instruction throughput. Reordering independent instructions helps keep the pipeline full.

2.4. Principle of order

2.4.1. Happens-before principle

Happens-before means that if operation A happens-before operation B, the effects of operation A (such as changing the value of a shared variable, sending a message, or invoking a method) are visible to operation B.

  • Program order rule: within a single thread, each operation happens-before every operation that follows it in control-flow order. This is control-flow order rather than code order, because branches, loops, and so on must be taken into account.
  • Monitor lock rule: an unlock operation happens-before every subsequent lock operation on the same lock.
  • Volatile rule: a write to a volatile variable happens-before every subsequent read of that variable.
  • Thread start rule: a call to a thread's start() method happens-before every operation in the started thread.
  • Thread termination rule: every operation in a thread happens-before the detection of that thread's termination. Termination can be detected by Thread.join() returning or by the return value of Thread.isAlive().
  • Thread interruption rule: a call to a thread's interrupt() method happens-before the interrupted thread's code detects the interruption, which it can do with Thread.interrupted().
  • Object finalization rule: the completion of an object's initialization (the end of its constructor) happens-before the start of its finalize() method.
  • Transitivity: if A happens-before B and B happens-before C, then A happens-before C.
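Several of these rules combine in the common pattern of publishing ordinary data through a volatile flag. In the sketch below (class and field names are invented for illustration), the program order rule, the volatile rule, and transitivity together ensure that the reader sees data == 42 once it has seen ready == true; the start and termination rules cover the setup before start() and the result read after join().

```java
class HappensBeforeSketch {
    static int data = 0;                    // ordinary field
    static volatile boolean ready = false;  // volatile flag used to publish data

    static int publishAndRead() {
        data = 0;
        ready = false; // both writes happen-before start() and are visible to the new threads
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                // spin until the volatile read observes the writer's volatile write
            }
            seen[0] = data; // ordered after the volatile read, so it sees 42
        });
        Thread writer = new Thread(() -> {
            data = 42;    // (1) ordinary write, before (2) in program order
            ready = true; // (2) volatile write
        });
        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join(); // termination rule: seen[0] is visible after join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead()); // prints 42
    }
}
```

If ready were not volatile, neither the termination of the reader's loop nor the value it reads from data would be guaranteed.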

2.4.2. As-if-serial semantics

No matter how instructions are reordered, the result of single-threaded execution must not change. Operations with data dependencies are never reordered, because that would change the result; operations without data dependencies may be reordered.

double p = 3.14;         // 1
double r = 1.0;          // 2
double area = p * r * r; // 3

Steps 1 and 2 may be reordered with each other, but neither can be reordered with step 3. That is, step 3 cannot execute before steps 1 and 2, otherwise the result of the program would change.



3. Volatile keyword

zhuanlan.zhihu.com/p/138819184

The volatile keyword guarantees visibility and order, not atomicity.

There are two main functions:

  • Ensuring memory visibility of variables: bus sniffing mechanism.
  • Disallow instruction reordering: memory barriers.

public class VolatileExample {

    public static void main(String[] args) {
        MyThread myThread = new MyThread();
        myThread.start();

        // Main thread execution
        while (true) {
            // Access the child thread's variable
            if (myThread.flag) {
                // Without synchronization this line may never be printed:
                // a thread's change to a shared variable is not immediately written back to main memory,
                // or another thread does not immediately refresh its working copy from main memory,
                // so a thread may keep using a stale value of the shared variable.
                System.out.println("Main thread accesses flag variable");
            }
            // Solution 1: lock with synchronized
            // When a thread enters a synchronized block, it acquires the lock and clears its local memory,
            // copies the latest value of the shared variable from main memory into local memory, executes the code,
            // flushes the modified copy back to main memory, and then releases the lock.
            // synchronized (myThread) {
            //     if (myThread.flag) {
            //         System.out.println("Main thread accesses flag variable");
            //     }
            // }
        }
    }

    private static class MyThread extends Thread {

        public boolean flag = false;
        // Solution 2: use the volatile keyword
        // Each thread works on a copy of the variable taken from main memory.
        // When a volatile variable is written, the other threads' copies are invalidated and must be re-read from main memory.
        // public volatile boolean flag = false;

        @Override
        public void run() {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            // Modify the variable's value
            flag = true;
            System.out.println("Change variable value flag = true");
        }
    }
}

3.1. Bus sniffing mechanism

3.1.1. Cache coherence protocol: MESI

In early CPUs, cache inconsistency was solved by locking the bus directly, but this was inefficient because other CPUs could not access memory while the bus was locked.

The best known cache coherence protocol is MESI, variants of which are used in Intel processors. It ensures that the copies of a shared variable held in each cache stay consistent.

  1. Modified: when the data in a cache line is modified, the cache line is set to the M state.
  2. Exclusive: when only one cache holds the data, its cache line is in the E state.
  3. Shared: when other CPUs also read the data into their caches, all cache lines holding the data are set to the S state.
  4. Invalid: when one cache line is modified, the other cache lines holding the data are set to the I state.

3.1.2. Bus sniffing mechanism

To keep every processor's cache consistent, a cache coherence protocol is implemented, and bus sniffing (also called snooping) is a common mechanism for achieving it.

volatile ensures memory visibility of variables: when one thread modifies a volatile variable, the new value is written back to main memory immediately, and other threads then see the latest value.

  • [Principle] Each processor monitors the data propagated on the bus to check whether its cached values are stale. If a processor finds that the memory address corresponding to one of its cache lines has been modified, it marks that cache line invalid. The next time the processor operates on that data, it re-reads it from main memory into its cache.
  • Note that the JVM implements volatile visibility on top of the CPU's cache coherence protocol. Because bus sniffing monitors the bus constantly, heavy use of volatile can cause bus storms. Therefore, volatile should be used only where the situation calls for it.

The sniffing mechanism works like a listener:

  1. CPU1 reads the data a=1. CPU1's cache now holds a, and the cache line is set to the (E) state.
  2. CPU2 also reads a, so CPU2's cache also holds a=1. The bus detects that CPU1 holds the data as well, and the cache lines of both CPU1 and CPU2 are set to the (S) state.
  3. CPU1 modifies the data to a=2. CPU1's cache line is set to the (M) state, the bus broadcasts a notification, and CPU2's cache line is set to the (I) state. The new value is written back, so CPU1's cache and main memory both hold a=2.
  4. CPU2 reads a again. Although CPU2 finds a=1 in its cache, the line's state is (I), so it discards that data and fetches the latest value from main memory.
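The state changes in this walkthrough can be traced with a toy state machine. This is a deliberately simplified sketch of standard MESI transitions for a single cache line shared by two CPUs, in which the writer's line becomes Modified; it ignores write-back timing, bus messages, and more than two caches, and every name in it is made up for illustration.

```java
class MesiSketch {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    // State of the same cache line in CPU 0's and CPU 1's caches.
    private final State[] lines = { State.INVALID, State.INVALID };

    // A CPU reads the line.
    void read(int cpu) {
        int other = 1 - cpu;
        if (lines[cpu] != State.INVALID) {
            return; // cache hit: the state does not change
        }
        if (lines[other] == State.INVALID) {
            lines[cpu] = State.EXCLUSIVE; // nobody else holds the line
        } else {
            lines[other] = State.SHARED;  // the other copy is downgraded (a Modified copy is written back first)
            lines[cpu] = State.SHARED;
        }
    }

    // A CPU writes the line: its copy becomes Modified, the other copy is invalidated.
    void write(int cpu) {
        lines[cpu] = State.MODIFIED;
        lines[1 - cpu] = State.INVALID;
    }

    // First letters of both states, CPU 0 then CPU 1.
    private String snapshot() {
        return "" + lines[0].name().charAt(0) + lines[1].name().charAt(0);
    }

    // Replays the four steps: CPU 0 reads, CPU 1 reads, CPU 0 writes, CPU 1 reads again.
    static String walkthrough() {
        MesiSketch line = new MesiSketch();
        line.read(0);
        String s1 = line.snapshot();
        line.read(1);
        String s2 = line.snapshot();
        line.write(0);
        String s3 = line.snapshot();
        line.read(1);
        String s4 = line.snapshot();
        return String.join(" ", s1, s2, s3, s4);
    }

    public static void main(String[] args) {
        System.out.println(walkthrough()); // prints EI SS MI SS
    }
}
```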

3.2. Memory barriers

volatile guarantees memory ordering for the variable within a thread (by prohibiting instruction reordering):

  • Read barrier: the compiler does not reorder a volatile read with any subsequent memory operation.
  • Write barrier: the compiler does not reorder a volatile write with any preceding memory operation.

Can the two operations be reordered? (⭕ = reordering allowed, blank = reordering prohibited)

First operation \ Second operation | Ordinary read/write | Volatile read | Volatile write
Ordinary read/write                | ⭕                  | ⭕            |
Volatile read                      |                     |               |
Volatile write                     | ⭕                  |               |
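The table can also be read as three simple conditions. The sketch below encodes it as a lookup function; the enum and method names are invented for this example, and the function only restates the table rather than describing any JVM API.

```java
class VolatileReorderTableSketch {
    enum Op { ORDINARY, VOLATILE_READ, VOLATILE_WRITE }

    // Returns true if the table allows the first operation to be reordered with the second.
    static boolean mayReorder(Op first, Op second) {
        if (first == Op.VOLATILE_READ) {
            return false; // nothing after a volatile read may move in front of it
        }
        if (second == Op.VOLATILE_WRITE) {
            return false; // nothing before a volatile write may move behind it
        }
        if (first == Op.VOLATILE_WRITE && second == Op.VOLATILE_READ) {
            return false; // a volatile write followed by a volatile read keeps its order
        }
        return true;
    }

    public static void main(String[] args) {
        for (Op first : Op.values()) {
            for (Op second : Op.values()) {
                System.out.println(first + " then " + second + ": " + mayReorder(first, second));
            }
        }
    }
}
```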

3.3. Singleton mode – double-checked lock

public class Singleton {
    // volatile ensures visibility and disallows instruction reordering
    private static volatile Singleton singleton;

    // Private constructor prevents instantiation from outside the class
    private Singleton() {}

    public static Singleton getInstance() {
        // First check: if the singleton already exists, skip the lock and its cost
        if (singleton == null) {
            // Synchronized block prevents multiple instances during the first initialization
            // (a static method has no this, so lock on the class object)
            synchronized (Singleton.class) {
                // Second check: a thread that was blocked at the synchronized block while another thread
                // created the instance must not create a second object after it acquires the lock
                if (singleton == null) {
                    // Object instantiation is non-atomic: allocate memory -> initialize instance -> assign the memory address to the reference
                    singleton = new Singleton();
                }
            }
        }
        return singleton;
    }
}

The Importance of Volatile

singleton = new Singleton(); is not an atomic operation. The corresponding bytecode is:

new            // ① Create the object and push its reference onto the stack
dup            // ② Duplicate the object reference
invokespecial  // ③ Use one reference to call the constructor
putstatic      // ④ Use the other reference to assign to the static singleton field

If the static singleton field is not volatile, the JVM may reorder the instructions and swap ③ and ④, so that the static field points to an object that has not been initialized yet. Another thread then finds that singleton == null is false in the first check and returns an uninitialized singleton.
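To see the double-checked pattern hold up under contention, the following sketch releases many threads at once and counts how often the constructor runs. The class name, the CONSTRUCTED counter, and the constructionsUnderContention helper exist only for this illustration. The test shows that only one instance is created; it cannot demonstrate the reordering problem itself, which is timing-dependent and rarely reproducible.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class LazySingletonSketch {
    // Hypothetical counter, present only to observe how often the constructor runs.
    static final AtomicInteger CONSTRUCTED = new AtomicInteger();

    private static volatile LazySingletonSketch instance;

    private LazySingletonSketch() {
        CONSTRUCTED.incrementAndGet();
    }

    static LazySingletonSketch getInstance() {
        if (instance == null) {                          // first check, no lock
            synchronized (LazySingletonSketch.class) {
                if (instance == null) {                  // second check, under the lock
                    instance = new LazySingletonSketch();
                }
            }
        }
        return instance;
    }

    // Releases the given number of threads at once and returns the constructor call count.
    static int constructionsUnderContention(int threads) {
        CountDownLatch startGate = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    startGate.await(); // wait so that all threads race together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                getInstance();
            });
            workers[i].start();
        }
        startGate.countDown();
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return CONSTRUCTED.get();
    }

    public static void main(String[] args) {
        System.out.println(constructionsUnderContention(16)); // prints 1
    }
}
```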

3.4. Volatile summary

  • Use the volatile modifier when an attribute is shared by multiple threads and one thread modifies it while the others need to see the new value immediately, or when it serves as a state variable, such as flag = true, to achieve lightweight synchronization.
  • Volatile reads and writes are lock-free. volatile is not a substitute for synchronized, because it provides neither atomicity nor mutual exclusion. It is low-cost because no time is spent acquiring and releasing a lock.
  • volatile can only be applied to attributes (fields); the compiler then does not reorder the instructions that access that attribute.
  • volatile provides visibility, so that a change made by any thread is immediately visible to other threads. Volatile attributes are not cached by threads and are always read from main memory.
  • volatile provides a happens-before guarantee: a write to a volatile variable happens-before every subsequent read of it by any thread.
  • volatile makes simple assignments atomic, as in boolean flag = true; flag = false. This matters for long and double, whose non-volatile writes may be split into two 32-bit writes.
  • In the double-checked singleton, volatile ensures safety by providing visibility and disallowing instruction reordering.


4. Atomicity

www.cnblogs.com/chengxiao/p…

volatile does not guarantee the atomicity of operations on data, so compound operations on a volatile variable are still not thread-safe. Use a lock or an atomic class (such as AtomicInteger) instead.

4.1. Pessimistic Scenario (blocking synchronization)

Every time data is manipulated, it is assumed that other threads will compete to modify it, so an exclusive lock is placed on the data. Only one thread can hold the lock at a time, and the other threads block. Suspending and resuming threads carries significant performance overhead, so using synchronized or another heavyweight lock for an operation as small as an increment is excessive.

synchronized(this){
    num++; // Non-atomic operations: read -> add 1-> write
}

4.2. Optimistic Scenario (non-blocking synchronization)

Every time data is manipulated, it is assumed that no other thread will compete to modify it, so no lock is placed. It is best if the operation succeeds, but if it fails, it will not block, and some compensation mechanism can be used (retry repeatedly).

// The same num++, implemented with incrementAndGet of the AtomicInteger class

// AtomicInteger.java
public final int incrementAndGet() {
    return U.getAndAddInt(this, VALUE, 1) + 1;
}

// Unsafe.java
@HotSpotIntrinsicCandidate
public final int getAndAddInt(Object o, long offset, int delta) {
    int v;
    // The loop keeps retrying until weakCompareAndSetInt returns true
    do {
        // Get the current value
        v = getIntVolatile(o, offset);
    } while (
        // Perform the atomic update:
        // check whether the value in memory still equals v.
        // If it does, no other thread has modified it; update it to the target value and return true.
        // If it does not, return false.
        !weakCompareAndSetInt(o, offset, v, v + delta)
    );
    return v;
}

@HotSpotIntrinsicCandidate
public final boolean weakCompareAndSetInt(Object o, long offset, int expected, int x) {
    return compareAndSetInt(o, offset, expected, x);
}

// Native method: uses the CAS machine instruction to guarantee atomicity directly
@HotSpotIntrinsicCandidate
public final native boolean compareAndSetInt(Object o, long offset, int expected, int x);
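The same retry pattern can be written at the application level with the public AtomicInteger.compareAndSet method. The sketch below is a hand-written equivalent for illustration, not the JDK's implementation; the class and method names are made up. In practice incrementAndGet() already does this.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasRetrySketch {
    // Optimistic increment: read the value, try to swap in value + 1, retry on conflict.
    static int casIncrement(AtomicInteger value) {
        int current;
        do {
            current = value.get();
        } while (!value.compareAndSet(current, current + 1)); // fails if another thread changed the value
        return current + 1;
    }

    // Runs the given number of threads, each incrementing perThread times, and returns the total.
    static int run(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    casIncrement(counter);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 5000)); // prints 20000
    }
}
```

No thread ever blocks here: a thread that loses the race simply re-reads the value and tries again, which is the compensation mechanism that the optimistic approach relies on.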