Introduction: JMM Fundamentals – Computer principles

During interviews, we often confuse the memory model of the JVM with the runtime data areas of the JVM.

The Java Memory Model, or JMM, defines how the Java Virtual Machine (JVM) works with computer memory (RAM). The JVM models an entire computer, so the JMM is part of the JVM. The JMM was reworked in Java 1.5, and that revision is the one Java still uses today. The problems the JMM has to solve are similar to those encountered in modern computers.

There are many similarities between concurrency problems in physical computers and those in virtual machines, and the ways physical machines handle concurrency are an important reference for how virtual machines are implemented.

According to Jeff Dean’s presentation at Google’s All-engineering conference

The time a computer needs to perform various basic operations differs enormously. (The following figures are only illustrative and do not represent real measurements.)

How long would it take to read 1M ints from memory and have the CPU add them up?

Do a simple calculation. For 1 MB of data, a Java int is 32 bits (4 bytes), so there are 1024*1024/4 = 262,144 integers. The CPU computation time is roughly 262,144 * 0.6 ns ≈ 157,286 ns, and we know that reading 1 MB sequentially from memory takes about 250,000 ns. The gap (roughly 100,000 ns, enough time for the CPU to execute nearly 200,000 instructions) is not small, but the two figures are still within the same order of magnitude. Without any caching mechanism, however, every single number must be fetched from memory, and each such read costs about 100 ns. The total time for reading the integers from memory into the CPU plus the computation then becomes 262,144 * 100 + 250,000 = 26,464,400 ns, which is orders of magnitude slower.

In reality, most computing tasks cannot be completed by the processor's "calculation" alone. The processor must at least interact with memory, for example to read operands and to store results, and this I/O is essentially impossible to eliminate (it is impossible to complete all computing tasks using registers alone). In early computers the CPU and memory ran at roughly the same speed, but in modern computers the CPU's instruction throughput is far higher than the memory access speed, because storage devices are several orders of magnitude slower than the processor. Modern computer systems therefore add one or more layers of cache between memory and the processor, with read/write speeds as close as possible to the processor's: the data needed for a computation is copied into the cache so the computation can run quickly, and when the computation finishes the results are synchronized back from the cache, so the processor does not have to wait for slow memory reads and writes.

In computer systems, registers are L0 level cache, followed by L1, L2, and L3 (followed by memory, local disk, and remote storage). The higher the cache, the smaller the storage space, the faster the speed, and the higher the cost; The lower you go, the bigger the storage space, the slower the speed, and the lower the cost. From top to bottom, each layer can be regarded as the cache of the next layer, that is, register L0 is the cache of L1, L1 is the cache of L2, and so on. The data at each layer comes from the layer below it, so the data at each layer is a subset of the data at the next layer.

On modern CPUs, L0 through L3 are generally integrated into the CPU chip, and L1 is split into a Data Cache (D-Cache, L1d) and an Instruction Cache (I-Cache, L1i), which hold data and decoded instructions respectively. Each core has its own execution units, controller, registers, and L1 and L2 caches, while the multiple cores of a CPU share the last-level cache, L3.

1. Problems with the physical memory model

Cache-based storage interaction nicely resolves the speed conflict between processor and memory, but it also adds complexity to computer systems because it introduces a new problem: cache coherence. In a multiprocessor system, each processor has its own cache while they all share the same main memory. When the computing tasks of multiple processors involve the same main memory area, their cached data may become inconsistent.

Modern processors use write buffers to temporarily hold data written to memory. The write buffer keeps the instruction pipeline running and avoids the delays caused by the processor stopping to wait for data to be written to memory. At the same time, the memory bus usage is reduced by flushing the write buffer in batch mode and merging multiple writes to the same memory address in the write buffer. Despite all the benefits of a write buffer, a write buffer on each processor is only visible to the processor on which it resides. This feature has a significant impact on the order in which memory operations are performed: the processor’s read/write operations to memory may not be executed in the same order as the actual read/write operations in memory.

Processor A and processor B perform memory accesses in parallel, each in program order, and may end up with x = y = 0, as in the sketch below.
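The original figure is not reproduced here; a sketch consistent with the step labels used below (A1/A2/A3 and B1/B2/B3, with a and b initially 0) might look like this:

public class WriteBufferExample {
    static int a = 0, b = 0;   // shared variables, initially 0
    static int x, y;

    static void runOnProcessorA() {
        a = 1;    // A1: the write goes into processor A's write buffer
        x = b;    // A2: the read goes to memory and may still see b == 0
        // A3: A's write buffer is later flushed to memory
    }

    static void runOnProcessorB() {
        b = 1;    // B1: the write goes into processor B's write buffer
        y = a;    // B2: the read goes to memory and may still see a == 0
        // B3: B's write buffer is later flushed to memory
    }
}
// If A1 and B1 are still sitting in the write buffers while A2 and B2 read
// from memory, both reads return 0 and the program ends with x = y = 0.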

Processors A and B can each write a shared variable into their own write buffer (steps A1, B1), then read another shared variable from memory (steps A2, B2), and finally flush the dirty data in their write buffers to memory (steps A3, B3). When executed in this sequence, the program can end up with x = y = 0.

In terms of the order in which the memory operations actually take effect, write A1 is not really performed until processor A executes A3 to flush its write buffer. So although processor A performs its memory operations in the order A1 → A2, the order in which they actually take effect in memory is A2 → A1.

If this happens, whose cache data will be used when synchronizing back to main memory? In order to solve the consistency problem, each processor needs to follow some protocols when accessing the cache and operate according to the protocols when reading and writing. Such protocols include MSI, Illinois Protocol (MESI), MOSI, Synapse, Firefly, and Dragon Protocol.

2. False sharing

As we already know, the CPU has several levels of cache. Within the CPU cache system, however, data is stored in units of cache lines. The cache line size of current mainstream CPUs is 64 bytes; a cache line can be understood simply as the smallest unit of storage in a CPU cache. CPUs no longer access memory byte by byte but in 64-byte chunks called cache lines. When a particular memory address is read, the entire cache line containing it is loaded from main memory into the cache.

A cache line can store multiple variables (as many as fit into it). With multiple threads, if different threads modify variables that happen to share the same cache line, they will inadvertently hurt each other's performance. This is called false sharing.

To avoid false sharing, we can use padding so that a single variable fills an entire cache line. This is essentially trading space for time. However, this approach may no longer work after Java 7, because unused padding fields can be optimized away.

Java 8 provides an official solution: a new annotation, @sun.misc.Contended, was added.

It is used, for example, inside the JDK's own ConcurrentHashMap.

Classes carrying this annotation will automatically be padded to cache-line boundaries. Note that the annotation is ignored for user code by default and only takes effect when -XX:-RestrictContended is set at JVM startup.

The test code

public class FalseSharing implements Runnable {
    public final static int NUM_THREADS = Runtime.getRuntime().availableProcessors();
    public final static long ITERATIONS = 500L * 1000L * 1000L;
    private final int arrayIndex;

    /* The array is as large as the number of CPUs */
    // private static VolatileLong[] longs = new VolatileLong[NUM_THREADS];
    // private static VolatileLongPadding[] longs = new VolatileLongPadding[NUM_THREADS];
    private static VolatileLongAnno[] longs = new VolatileLongAnno[NUM_THREADS];

    static { /* initialize the array */
        for (int i = 0; i < longs.length; i++) {
            longs[i] = new VolatileLongAnno();
        }
    }

    public FalseSharing(final int arrayIndex) {
        this.arrayIndex = arrayIndex;
    }

    public static void main(final String[] args) throws Exception {
        final long start = System.nanoTime();
        runTest();
        System.out.println("duration = " + (System.nanoTime() - start));
    }

    private static void runTest() throws InterruptedException {
        /* Create as many threads as there are CPUs */
        Thread[] threads = new Thread[NUM_THREADS];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new FalseSharing(i));
        }
        for (Thread t : threads) {
            t.start();
        }
        /* Wait for all threads to complete */
        for (Thread t : threads) {
            t.join();
        }
    }

    public void run() {
        long i = ITERATIONS + 1;
        while (0 != --i) {
            longs[arrayIndex].value = i;
        }
    }

    public final static class VolatileLong {
        public volatile long value = 0L;
    }

    // Long padding - avoiding false sharing
    public final static class VolatileLongPadding {
        public long p1, p2, p3, p4, p5, p6, p7;
        public volatile long value = 0L;
        volatile long q0, q1, q2, q3, q4, q5, q6;
    }

    /**
     * JDK 8 feature: the @Contended annotation avoids false sharing.
     * Restricted on the user classpath.
     * Unlock with: -XX:-RestrictContended
     */
    @sun.misc.Contended
    public final static class VolatileLongAnno {
        public volatile long value = 0L;
    }
}

In a class, there is only one variable of type long:

When you define an array of type VolatileLong and have multiple threads access it concurrently, you can imagine that several VolatileLong objects in the array may end up in the same cache line while the threads are working on them.

After running, you can get the running time

It took more than 39 seconds.

Next we switch to the variables padded to a full cache line (VolatileLongPadding) instead.

It took 8.1 seconds. If any single padding line (above or below the value field) is commented out, the time becomes inconsistent, ranging from 8 to 20 seconds, but it is still faster than with no padding at all. The cause is unknown.

Using the annotated variables (VolatileLongAnno) again, and adding the -XX:-RestrictContended parameter:

It took 7.7 seconds.

The above experimental results show that false sharing does indeed affect application performance.

3. Java Memory Model (JMM)

From an abstract point of view, the JMM defines an abstract relationship between threads and Main Memory: Shared variables between threads are stored in Main Memory, and each thread has a private Local Memory that stores copies of shared variables that the thread reads/writes to. Local memory is an abstract concept of the JMM and does not really exist. It covers caching, write buffers, registers, and other hardware and compiler optimizations.

4. Problems with the Java memory model

4.1 Visibility issues

The thread running in the left CPU copies the shared object obj from main memory to its CPU cache, changing the count variable of the object obj to 2. But this change is not visible to the thread running in the CPU on the right because the change has not been flushed into main memory.

In a multi-threaded environment, if a thread reads a shared variable for the first time, it first obtains the variable from main memory, then stores it in working memory, and then only needs to read the variable from working memory. Similarly, if the variable is modified, the new value is first written to working memory and then flushed to main memory. It is not certain when the latest value will be refreshed into main memory. Generally it will be soon, but the exact time is unknown.

To solve the problem of shared object visibility, we can use the volatile keyword or lock it.

4.2 Race conditions

Thread A and thread B share an object obj. Suppose thread A reads the obj.count variable from main memory into its CPU cache, thread B does the same, and both threads then increment obj.count by 1. At this point obj.count has been incremented twice, but in different CPU caches.

If the two increments had been performed sequentially, obj.count would have been incremented twice from its original value of 1, and the final obj.count in main memory would be 3. But the two increments happen in parallel: whichever of thread A or thread B flushes its result back to main memory first, the obj.count in main memory ends up incremented only once, to 2, even though it was incremented twice. To solve this problem we can use a Java synchronized block, as sketched below.
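A minimal sketch of the fix (class and field names are illustrative, not from the original):

public class Counter {
    private int count = 1;

    public void increment() {
        synchronized (this) {   // only one thread at a time performs the
            count++;            // read-modify-write, so no increment is lost
        }
    }
}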

4.3 Reordering

4.3.1 Types of reordering

Beyond the problems caused by shared main memory and working memory, there is also the issue of reordering: to improve performance, compilers and processors often reorder instructions when executing a program. There are three types of reordering.

1) Compiler-optimization reordering. The compiler can rearrange the execution order of statements without changing the semantics of a single-threaded program.

2) Instruction-level parallel reordering. Modern processors use instruction-level parallelism (ILP) to overlap the execution of multiple instructions. If there is no data dependency, the processor can change the order in which the machine instructions corresponding to statements are executed.

3) Memory-system reordering. Because the processor uses caches and read/write buffers, loads and stores can appear to be executed out of order.

4.3.2 Data dependency

Data dependency: if two operations access the same variable and at least one of them is a write, there is a data dependency between them. There are three cases (read after write, write after read, and write after write), and in all three, reordering the two operations changes the result of the program.

Such as:
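The original listing is not reproduced; a minimal sketch matching the discussion below (three statements labelled A, B, and C) might be:

double pi = 3.14;          // A: write pi
double r  = 1.0;           // B: write r
double area = pi * r * r;  // C: read pi and r, write area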

Obviously, A and C have data dependence, B and C also have data dependence, but there is no data dependence between A and B. If the execution order of A and C or B and C is reordered, the execution result of the program will be changed.

Clearly, no matter how things are reordered, the code must still run correctly in a single thread; if it cannot even be correct in a single thread, there is no point discussing multi-threaded concurrency. This is where the concept of as-if-serial comes in.

4.3.3 as-if-serial

The as-if-serial semantics mean that no matter how much reordering is done (by the compiler and processor, to improve parallelism), the execution result of a single-threaded program must not change. The compiler, the runtime, and the processor must all obey the as-if-serial semantics.

To comply with as-if-serial semantics, compilers and processors do not reorder operations that have data dependencies, because such reordering changes the execution results. (For the record, data dependencies are only for sequences of instructions executed in a single processor and operations performed in a single thread. Data dependencies between different processors and between different threads are not considered by the compiler or processor.) However, operations can still be reordered by the compiler and processor if there are no data dependencies between them.

There is a data dependency between A and C, and also between B and C. Therefore, in the final instruction sequence, C cannot be reordered before A or B (the result of the program would change if C ran before A and B). However, there is no data dependency between A and B, so the compiler and processor may reorder A and B relative to each other.


The as-if-serial semantics protect single-threaded programs. A compiler, runtime, and processor that follow the as-if-serial semantics collectively create the illusion that a single-threaded program executes in program order. The as-if-serial semantics free single-threaded programmers from having to worry about reordering or about memory visibility.

4.3.4 Control dependencies
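The original listing referred to in this subsection is not reproduced; a sketch consistent with the operations numbered 1 through 4 in the discussion below is:

class ControlDependencyExample {
    int a = 0;
    boolean flag = false;

    public void init() {        // executed by thread A
        a = 1;                  // operation 1: write a
        flag = true;            // operation 2: set the flag
    }

    public void use() {         // executed by thread B
        if (flag) {             // operation 3: control-dependent test
            int i = a * a;      // operation 4: depends on the outcome of 3
        }
    }
}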

In the code above, the flag variable indicates whether a has been written. In the use method, the assignment to the variable i depends on the outcome of if (flag); this is called a control dependency. If reordering happens here, the result can be incorrect.

Looking at the code, we can see,

Operations 1 and 2 have no data dependency and can be reordered by the compiler and processor. Similarly, operations 3 and 4 have no data dependency and can be reordered by the compiler and processor. What operations 3 and 4 do have is a control dependency.

In a program, control dependencies limit the degree of parallelism in instruction execution. To compensate, compilers and processors use speculation to overcome the effect of control dependencies on parallelism. With processor speculation, the processor executing thread B can read a and compute a*a in advance, temporarily saving the result in a hardware cache called a reorder buffer (ROB). When the condition of operation 3 later turns out to be true, the result is written to the variable i. Speculation essentially reorders operations 3 and 4, and the problem is that at that point a may not yet have been written by thread A.

In a single-threaded program, reordering operations that have control dependencies does not change the result of execution (which is why as-if-serial semantics allow reordering operations that have control dependencies).

But it is a different story with multiple threads. Suppose there are two threads, A and B: A first executes the init() method, and B then executes the use() method. While thread B is performing operation 4, can it see thread A's write to the shared variable a in operation 1? The answer is: not necessarily.

Let's look at what may happen when operations 1 and 2 are reordered and operations 3 and 4 are reordered. If operations 1 and 2 are reordered, then when the program runs, thread A first writes the flag variable, thread B then reads it, and since the condition is true, thread B goes on to read the variable a. At that point a has not yet been written by thread A, and the result is wrong!

Therefore, in a multithreaded program, reordering operations that have control dependencies may change the execution result of the program.

4.3.5 Memory barriers

The Java compiler inserts memory barrier instructions at appropriate positions in the generated instruction sequence to prevent particular types of processor reordering, so that the program executes as expected. A memory barrier does two things:

1. Ensure the order in which certain operations are executed.

2. Affects the memory visibility of certain data (or the execution result of an instruction).

The compiler and CPU may reorder instructions, as long as the (single-threaded) result stays the same, in order to optimize performance. Inserting a memory barrier tells the compiler and the CPU that no instruction may be reordered across that memory barrier.

Another thing memory barriers do is force the various CPU caches to be flushed. For example, a write barrier flushes all data written to the cache before the barrier, so that any thread on any CPU can read the latest version of that data.

The JMM divides memory barrier instructions into four categories: LoadLoad, StoreStore, LoadStore, and StoreLoad barriers.

StoreLoad Barriers is an “all-purpose” barrier that has the effect of the other three Barriers. Most modern multiprocessors support this barrier (other types of barriers are not necessarily supported by all processors).

4.3.6 Critical sections

The JMM does some special processing at the two critical points of entering and exiting a critical section, so that threads execute in a defined order at those two points.

Code inside a critical section can be reordered (but the JMM does not allow code inside the critical section to "escape" outside it, which would break the semantics of the monitor). Although thread A reorders within the critical section, thread B simply cannot "observe" thread A's reordering there, because of the monitor's mutual exclusion. This kind of reordering improves execution efficiency without changing the result of the program.

Think back: why doesn't the usual double-checked locking used in "thread-safe" singletons guarantee true thread safety? A sketch of the pattern is shown below.
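As a reminder, the classic pattern (broken when the instance field is not declared volatile) looks roughly like this:

public class Singleton {
    private static Singleton instance;   // not volatile: this is the flaw

    private Singleton() { }

    public static Singleton getInstance() {
        if (instance == null) {                       // first check, no lock
            synchronized (Singleton.class) {
                if (instance == null) {               // second check, locked
                    // new Singleton() is not atomic: allocate memory, run the
                    // constructor, publish the reference. The publication can
                    // be reordered before the constructor's writes, so another
                    // thread may see a non-null but half-initialized object.
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}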

4.3.7 happens-before

To help programmers reason about memory visibility, the Java specification introduces the happens-before concept to describe memory visibility between operations. For Java programmers, understanding happens-before is the key to understanding the JMM.

The reason why the JMM does this is that the programmer does not care whether the two operations are actually reordered. The programmer cares that the semantics of the program execution cannot be changed (that is, the result of execution cannot be changed). Thus, happens-before relationships are essentially the same thing as as-if-serial semantics. The as-if-serial semantics guarantee that the execution result of a single-threaded program will not be changed, and the happens-before relationship guarantees that the execution result of a properly synchronized multithreaded program will not be changed.

Definition

The concept of happens-before is used to illustrate memory visibility between operations. In the JMM, if the result of an operation’s execution needs to be visible to another operation, there must be a happens-before relationship between the two operations.

A happens-before relationship between two operations does not mean that the earlier operation must actually be executed before the later one! happens-before only requires that the earlier operation (its result) be visible to the later one, and that the earlier operation be ordered before the later one in the happens-before order.

Deepening the understanding

The two statements in the definition above may seem contradictory, but they are made from different standpoints.

1) From a Java programmer’s point of view: the JMM guarantees that if one action happens before another, the result of the first action will be visible to the second, and the first action will be executed before the second.

2) From the perspective of the compiler and processor: the JMM allows a happens-before relationship between two operations. It does not require implementations of the Java platform to execute in the order specified by the happens-before relationship. This reordering is allowed if the result of the reordering is the same as the result of the happens-before relationship.

To review our previous code with data dependencies:

From our perspective as Java programmers, the program-order rule gives three happens-before relationships here: 1) A happens-before B; 2) B happens-before C; 3) A happens-before C (by transitivity).

However, on closer inspection, relationships 2 and 3 are genuinely required, while 1 is not. The JMM therefore divides reorderings into two categories:

1. Reordering that changes the result of program execution

2. Reordering that does not change the result of program execution

The JMM applies different strategies to these two kinds of reordering, as follows:

1. The JMM requires compilers and processors to prohibit reordering that changes the results of program execution.

2. The JMM places no requirements on the compiler or processor for reordering that does not change the result of program execution.

Therefore, from the programmer's perspective these three operations appear to satisfy the happens-before relationships, while from the compiler's and processor's perspective some of them may have been reordered; the reordered execution still satisfies the happens-before guarantees.

Happens-before rules

The JMM provides us with the following happens-before rules:

1) Rule of program sequence: Every action in a thread happens before any subsequent action in that thread.

2) Monitor lock rule: the unlocking of a lock happens before the subsequent locking of that lock.

3) The volatile variable rule: a write to a volatile field happens before any subsequent reads to that field.

4) Transitivity: If A happens-before B, and B happens-before C, then A happens-before C.

5) start() rule: if thread A performs the operation ThreadB.start() (which starts thread B), then thread A's ThreadB.start() happens-before every action in thread B.

6) join() rule: if thread A performs ThreadB.join() and it returns successfully, then every action in thread B happens-before thread A's successful return from ThreadB.join().

7) Thread interruption rule: a call to a thread's interrupt() method happens-before the interrupted thread's code detects that the interrupt occurred.
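A small sketch (a hypothetical example, not from the original) of rules 5 and 6: writes made before start() are visible inside the started thread, and writes made inside a thread are visible after join() returns.

public class StartJoinRules {
    static int shared = 0;

    public static void main(String[] args) throws InterruptedException {
        shared = 1;                     // happens-before t.start()
        Thread t = new Thread(() -> {
            int r = shared;             // guaranteed to see 1
            shared = r + 1;             // happens-before join() returning
        });
        t.start();
        t.join();
        System.out.println(shared);     // guaranteed to print 2
    }
}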


5. volatile

5.1 volatile features

This chapter focuses on the implementation and underlying principles of volatile, along with some common questions.

A single read or write of a volatile variable can be viewed as a read or write of an ordinary variable synchronized on the same lock; the sketch below shows the correspondence.
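The original listings are not reproduced; a sketch of the correspondence (class and method names are illustrative) is:

class VolatileFeatures {
    volatile long vl = 0L;             // a volatile variable

    public void set(long l) {
        vl = l;                        // a single volatile write
    }
    public long get() {
        return vl;                     // a single volatile read
    }
}

// ...can be seen as equivalent to:

class VolatileFeaturesEquivalent {
    long vl = 0L;                      // an ordinary variable

    public synchronized void set(long l) {   // write guarded by the same lock
        vl = l;
    }
    public synchronized long get() {         // read guarded by the same lock
        return vl;
    }
}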

Therefore, volatile variables themselves have the following properties:

Visibility. A read of a volatile variable will always see the last write (by any thread) to that volatile variable.

Atomicity: Reads and writes on any single volatile variable have atomicity, but compound operations such as volatile++ do not have atomicity.

5.2 Memory semantics of volatile

Memory semantics: this can be understood simply as how volatile, synchronized, atomic classes, and locks are realized in terms of the JVM's memory model.

The memory semantics of volatile writes are as follows: When a volatile variable is written, the JMM refreshes the shared variable value in the thread’s local memory to main memory.

The memory semantics of volatile reads are as follows: When a volatile variable is read, the JMM invalidates the thread’s local memory. The thread next reads the shared variable from main memory.

So for the code
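The original listing is not shown here; a minimal sketch consistent with the discussion below (two shared variables, with flag declared volatile) might look like this:

class VolatileExample {
    int a = 0;
    volatile boolean flag = false;

    public void writer() {       // executed by thread A
        a = 1;                   // write an ordinary shared variable
        flag = true;             // volatile write
    }

    public void reader() {       // executed by thread B
        if (flag) {              // volatile read
            int i = a;           // guaranteed to see a == 1
        }
    }
}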

If we modify the flag variable with the volatile keyword, then in effect: after thread A writes the flag variable, the values of the two shared variables in local memory A that were updated by thread A are flushed into main memory.

After reading the flag variable, the value contained in local memory B has been set to invalid. At this point, thread B must read the shared variable from main memory. A read from thread B causes the value of the shared variable in local memory B to be consistent with that in main memory.

If we combine the volatile write and volatile read steps, after reading A volatile variable by thread B, all values of the shared variable that were visible to thread A before writing the volatile variable will immediately become visible to thread B.

5.3 Why is Volatile Not Thread-safe
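The original illustration for this subsection is not reproduced; a minimal (hypothetical) demonstration that volatile alone is not enough for thread safety, because compound operations such as ++ are not atomic:

public class VolatileNotAtomic {
    private static volatile int count = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10000; i++) {
                    count++;   // read-modify-write: three steps, not atomic
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // Usually prints less than 40000 because some increments are lost.
        System.out.println("count = " + count);
    }
}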

5.4 Implementation of volatile memory semantics

The volatile reordering table ("NO" means the JMM forbids the reordering; the other combinations are left to the compiler and processor):

First operation        Second: ordinary read/write   Second: volatile read   Second: volatile write
Ordinary read/write    allowed                       allowed                 NO
Volatile read          NO                            NO                      NO
Volatile write         allowed                       NO                      NO

To sum up:

When the second operation is a volatile write, no reordering is allowed regardless of what the first operation is. This rule ensures that operations before a volatile write are not reordered by the compiler to after the volatile write.

When the first operation is a volatile read, no reorder is allowed regardless of the second operation. This rule ensures that operations after volatile reads will not be reordered by the compiler to before volatile reads.

When the first operation is a volatile write and the second is a volatile read, reorder is not allowed.

Memory barriers for volatile

In Java, when the compiler generates the instruction sequence for volatile accesses, it inserts memory barriers to prevent particular types of processor reordering.

Volatile write: a StoreStore barrier is inserted before each volatile write, and a StoreLoad barrier is inserted after it.

StoreStore barrier: for a sequence store1; storestore; store2, it ensures that before store2 and subsequent writes execute, the write of store1 is visible to other processors. (That is, if a StoreStore barrier is present, store1 must take effect before store2; the CPU will not reorder store1 with store2.)

StoreLoad barrier: for a sequence store1; storeload; load2, it ensures that before load2 and all subsequent reads execute, the write of store1 is visible to all processors. (That is, if a StoreLoad barrier is present, store1 must take effect before load2; the CPU will not reorder store1 with load2.)

Volatile read

A LoadLoad barrier and a LoadStore barrier are inserted after each volatile read.

LoadLoad barrier: for a sequence load1; loadload; load2, it ensures that the data read by load1 is loaded before load2 and subsequent reads access their data. (That is, if a LoadLoad barrier is present, load1 must take effect before load2; the CPU will not reorder load1 with load2.)

LoadStore barrier: for a sequence load1; loadstore; store2, it ensures that the data read by load1 is loaded before store2 and subsequent writes are flushed out. (That is, if a LoadStore barrier is present, load1 must take effect before store2; the CPU will not reorder load1 with store2.)
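Putting the four barriers together, a conceptual sketch of the JMM's conservative placement around volatile accesses (comments only; this is not literal generated code):

class VolatileBarrierSketch {
    int a;
    volatile int v;

    void write() {
        a = 1;
        // StoreStore barrier: ordinary writes above cannot sink below
        v = 2;                 // volatile write
        // StoreLoad barrier: make the write visible before any later read
    }

    void read() {
        int i = v;             // volatile read
        // LoadLoad barrier: later reads cannot float above the volatile read
        // LoadStore barrier: later writes cannot float above the volatile read
        int j = a;
        a = i;
    }
}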

5.5 Implementation Principles of Volatile

Looking at the unsafe.cpp source code in OpenJDK, you will see that accesses to variables modified by the volatile keyword carry a "lock" prefix instruction.

The lock prefix: lock is not itself a memory barrier, but it can act like one. It locks the CPU bus and cache, and can be understood as a lock at the level of CPU instructions.

This instruction also writes the current processor's cache line straight back to system memory, and that write-back invalidates the data cached at the same address in other CPUs.

1. Concretely, it first locks the bus and cache, then executes the subsequent instruction, and finally releases the lock and flushes all dirty data in the cache back to main memory. While the lock holds the bus, read and write requests from other CPUs are blocked until the lock is released.

6. Final memory semantics

When we write classes shared between threads, one way to make all member variables in the class immutable is the final keyword. Why can final give this guarantee, given that reordering also applies inside constructors? The JMM must be doing something special for final member fields.

6.1 Two reordering rules for final

For final fields, the compiler and processor must follow two reordering rules. We use the class cn.enjoyedu.ch9.semantics.FinalMemory to illustrate:

public class FinalMemory {
    int i;                  // ordinary field
    final int j;            // final field
    static FinalMemory obj;

    public FinalMemory() {  // constructor
        i = 1;              // write the ordinary field
        j = 2;              // write the final field
    }

    public static void writer() {   // executed by writer thread A
        obj = new FinalMemory();
    }

    public static void reader() {   // executed by reader thread B
        FinalMemory object = obj;   // read the object reference
        int a = object.i;           // read the ordinary field
        int b = object.j;           // read the final field
    }
}

Let's assume that one thread, A, executes the writer method, and then another thread, B, executes the reader method.

1. There is no reorder between the write to a final field in the constructor and the subsequent assignment of a reference to the constructed object to a reference variable.

Assume thread A executes the writer method. The single line obj = new FinalMemory(); consists of two steps:

Construct an object of type FinalMemory.

Assign a reference to this object to the reference variable obj.

Suppose there is no reordering between thread B's read of the object reference (FinalMemory object = obj) and its reads of the object's fields (int a = object.i; int b = object.j). The following is one possible execution sequence:

As this possible sequence shows, the write to the ordinary field may be reordered out of the constructor by the compiler, so reader thread B may mistakenly read the value of the ordinary variable i before it is initialized. The write to the final field, however, is "confined" inside the constructor by the write-final-field reordering rule, so reader thread B correctly reads the initialized value of the final variable.

Conclusion: The reordering of write final fields ensures that the final field of an object has been properly initialized before the object reference is visible to any thread, whereas normal fields do not have this guarantee.

2. There is no reorder between the first read of a reference to an object containing the final field and the subsequent first read of the final field

Within a thread, the JMM prohibits the processor from reordering the first read of an object reference with the first read of the final field that object contains. The compiler inserts a LoadLoad barrier before the operation that reads the final field.

The reader() method consists of three steps:

Read the reference variable obj for the first time.

Read for the first time the ordinary field i of the object that obj points to.

Read for the first time the final field j of the object that obj points to.

Assuming that no reordering occurs in writer thread A, the following is A possible sequence:

In this sequence, the read of the object's ordinary field is reordered by the processor to before the read of the object reference. When the ordinary field is read, it has not yet been written by thread A, so that read is incorrect. The reordering rule for reading final fields, however, "confines" the read of the object's final field to after the read of the object reference; by then the final field has already been initialized by thread A, so that read is correct.

Conclusion: The reordering rules for reading final fields ensure that the reference to the object containing the final field is read before the final field of an object is read.

6.2 When the final field is a reference type

public class FinalRefMemory {
    final int[] intArray;           // final field of a reference type
    static FinalRefMemory obj;

    public FinalRefMemory() {       // constructor
        intArray = new int[1];      // 1
        intArray[0] = 1;            // 2
    }

    public static void writerOne() {     // executed by writer thread A
        obj = new FinalRefMemory();       // 3
    }

    public static void writeTwo() {       // executed by writer thread B
        obj.intArray[0] = 2;               // 4
    }

    public static void reader() {          // executed by reader thread C
        if (obj != null) {                  // 5
            int temp1 = obj.intArray[0];    // 6
        }
    }
}

In the above code, the final field is a reference type, which refers to an array of type int. For reference types, the reordering rules for writing final fields impose the following constraints on the compiler and processor: There is no reorder between writing to the member field of an object with a final reference inside the constructor and assigning the reference to the constructed object to a reference variable outside the constructor.

Let's assume that thread A executes writerOne, thread B executes writeTwo, and thread C executes reader. Here is one possible execution sequence:

1 is a write to the final field, 2 is a write to a member field of the object referenced by the final field, and 3 assigns a reference to the constructed object to a reference variable. Besides the rule that 1 and 3 cannot be reordered, 2 and 3 cannot be reordered either.

The JMM guarantees that reader thread C sees at least writer thread A's writes, made inside the constructor, to the member field of the object referenced by the final field; that is, C will at least see that element 0 of the array has the value 1. Whether reader thread C sees writer thread B's write to the array element is not guaranteed: there is a data race between writer thread B and reader thread C, so the result is unpredictable.

If you want to ensure that the reader thread C sees the writes to the array elements by the writer thread B, you need to use synchronization (lock or volatile) between the writer thread B and the reader thread C to ensure memory visibility.

6.3 Final references cannot escape from within constructors

The write-final-field reordering rule guarantees that, before a reference variable becomes visible to any other thread, the final fields of the object it points to have been correctly initialized in the constructor. For this to hold, there is an additional requirement: inside the constructor, the reference to the object under construction must not become visible to other threads; in other words, the object reference must not escape from the constructor.

public class FinalEscape {
    final int i;
    static FinalEscape obj;

    public FinalEscape() {
        i = 10;        // 1: write the final field
        obj = this;    // 2: the "this" reference escapes here
    }

    public static void writer() {
        new FinalEscape();
    }

    public static void reader() {
        if (obj != null) {        // 3
            int temp = obj.i;     // 4
        }
    }
}

Suppose one thread A executes writer() and another thread B executes reader(). Operation 2 here makes the object visible to thread B before construction is complete. Even though operation 2 is the last step of the constructor and comes after operation 1 in program order, the thread executing reader() may still fail to see the initialized value of the final field, because operations 1 and 2 may be reordered.

Therefore, a reference to the constructed object cannot be seen by another thread until the constructor returns, because the final field may not have been initialized.

6.4 Implementation of final semantics

Writing a final field requires the compiler to insert a StoreStore barrier after the write to the final field and before the constructor returns.

Reading a final field requires the compiler to insert a LoadLoad barrier before the read of the final field.

7. Memory semantics of locking

7.1 Memory semantics of lock release and acquisition

When a thread releases a lock, the JMM refreshes the shared variables in local memory corresponding to that thread into main memory.

When a thread acquires a lock, the JMM invalidates the local memory for that thread. Thus the critical section code protected by the monitor must read shared variables from main memory.

Let’s start with a program

public class SynMemory {
    private static boolean ready;
    private static int number;

    private static class PrintThread extends Thread {
        @Override
        public void run() {
            while (!ready) {
                // spin until ready becomes true
            }
            System.out.println("number = " + number);
        }
    }

    public static void main(String[] args) {
        new PrintThread().start();
        SleepTools.second(1);   // helper from the author's project: sleep for n seconds
        number = 51;
        ready = true;
        SleepTools.second(5);
        System.out.println("main is ended!");
    }
}

It turns out that the child thread keeps looping even after the main method changes the value of ready, and the program never ends.

But what happens if we add a print statement inside the loop?

while (!ready) {
    System.out.println("number = " + number);
}

With the print statement added, the thread sees the modified value, the loop terminates, and the value of number is printed. Why? If we click into println, we can see:

/**
 * Prints a String and then terminate the line.  This method behaves as
 * though it invokes <code>{@link #print(String)}</code> and then
 * <code>{@link #println()}</code>.
 *
 * @param x  The <code>String</code> to be printed.
 */
public void println(String x) {
    synchronized (this) {
        print(x);
        newLine();
    }
}

Combined with the lock memory semantics described earlier: when entering the synchronized block inside println, the child thread is forced to read shared variables from main memory again, including the ready variable, so the child thread's loop also terminates.

7.2 Implementation principles of synchronized

synchronized is implemented in the JVM by entering and exiting a monitor object, for both method synchronization and code-block synchronization. Although the implementation details differ, both can be realized with paired monitorenter and monitorexit instructions.

For a synchronized block, the monitorenter instruction is inserted at the start of the block; when execution reaches it, the thread attempts to acquire ownership of the object's monitor, that is, to acquire the object's lock. The monitorexit instruction is inserted at the end of the block and at exception exit points. The JVM guarantees that every monitorenter has a matching monitorexit.

For synchronized methods, decompilation shows that method synchronization is not implemented with monitorenter and monitorexit; instead, compared with an ordinary method, the method carries an extra ACC_SYNCHRONIZED access flag in the class file.

The JVM synchronizes the method based on this flag: when the method is invoked, the invocation instruction checks whether the ACC_SYNCHRONIZED access flag is set. If it is, the executing thread first acquires the monitor, executes the method body only after acquiring it successfully, and releases the monitor when the method completes. No other thread can acquire the same monitor object while the method is executing.
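A minimal sketch (the class name is illustrative) of the two forms discussed above: a synchronized block compiles to monitorenter/monitorexit instructions, while a synchronized method is marked with ACC_SYNCHRONIZED:

public class SyncForms {
    private int counter;

    public void blockForm() {
        synchronized (this) {    // javap -c shows monitorenter ... monitorexit
            counter++;
        }
    }

    public synchronized void methodForm() {   // flagged ACC_SYNCHRONIZED
        counter++;
    }
}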

The locks used by synchronized are stored in the Java object header. By default, the Mark Word in the header stores the object's hash code; however, its contents change as the object runs, and different lock states correspond to different layouts of the Mark Word.

8. Understanding locks

8.1 Spin locks

Principle

The principle of a spin lock is simple: if the thread holding the lock will release it within a very short time, the threads waiting to compete for it do not need to switch between kernel mode and user mode and enter a blocked, suspended state. They only need to wait (spin) for a moment and can acquire the lock immediately after the holder releases it, thereby avoiding the cost of switching between user and kernel threads.

However, spinning consumes CPU: while a thread spins, the CPU is doing useless work. A thread cannot be allowed to spin forever, so a maximum spin wait time must be set.

If the thread holding the lock runs longer than the maximum spin wait time without releasing the lock, the other contending threads still cannot acquire it within the maximum wait time; they then stop spinning and block. The sketch after this paragraph illustrates the basic spinning idea.
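A minimal, simplified spin-lock sketch (illustrative only, not the JVM's internal implementation), using CAS to spin until the lock is acquired:

import java.util.concurrent.atomic.AtomicReference;

public class SpinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    public void lock() {
        Thread current = Thread.currentThread();
        // Busy-wait (spin) until the CAS succeeds; the thread never blocks,
        // so there is no user/kernel mode switch.
        while (!owner.compareAndSet(null, current)) {
            // spin
        }
    }

    public void unlock() {
        Thread current = Thread.currentThread();
        owner.compareAndSet(current, null);   // only the owner can release
    }
}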

Advantages and disadvantages of spin-locks

Spin locking reduces thread blocking as much as possible. For code blocks where lock contention is mild and the lock is held for a very short time, this gives a significant performance boost, because the cost of spinning is lower than the cost of blocking, suspending, and later resuming the thread.

But if lock contention is intense, or the thread holding the lock occupies it for a long time while executing the synchronized block, a spin lock is not appropriate: the spinning thread keeps the CPU busy doing nothing useful, its cost exceeds the cost of blocking and suspending the thread, and other threads that need the CPU cannot get it, so CPU time is wasted.

Spin-lock time threshold

The purpose of a spin lock is to hold onto CPU resources until the lock is acquired. But how long should a thread spin? If the spin time is too long, large numbers of threads will spin and occupy CPU, hurting overall system performance. So the spin count matters.

The JVM chooses the spin count: in JDK 1.5 the default is 10 spins; JDK 1.6 introduced adaptive spin locks, in which the spin time is not fixed but is determined by the previous spin time on the same lock and the state of the lock owner. The basic idea is that the duration of one thread context switch is the optimal spin time.

In JDK 1.6, -XX:+UseSpinning enables the spin lock; from JDK 1.7 onward, the parameter was removed and spinning is controlled by the JVM itself.

8.2 Lock status

There are four lock states: unlocked, biased, lightweight, and heavyweight. They escalate as contention increases. Locks can be upgraded but not downgraded, in order to improve the efficiency of acquiring and releasing locks.

8.3 Biased locking

Background: in most cases a lock is not only free of multi-threaded contention but is always acquired multiple times by the same thread. Biased locking was introduced to make lock acquisition cheaper for that thread by reducing unnecessary CAS operations.

Biased locking, as the name suggests, biases the lock toward the first thread that acquires it. If, as the program runs, only one thread ever enters the synchronized block and there is no multi-threaded contention, that thread does not need to trigger synchronization again, which saves some lock/unlock CAS operations (for example CAS operations on the wait queue); in this case a biased lock is granted to the thread. If another thread preempts the lock at runtime, the thread holding the biased lock is suspended, the JVM revokes its bias, and the lock reverts to a standard lightweight lock. By eliminating synchronization primitives in the uncontended case, biased locking further improves program performance.

Bias lock acquisition process:

Step 1. Check whether the biased lock flag in Mark Word is set to 1, whether the lock flag bit is 01, and confirm that the state is biased.

Step 2. If the state is biased, test whether the thread ID points to the current thread. If yes, go to Step 5; otherwise, go to Step 3.

Step 3. If the thread ID does not point to the current thread, compete for the lock with a CAS operation. If the CAS succeeds, set the thread ID in the Mark Word to the current thread ID and go to Step 5; if it fails, go to Step 4.

Step 4. If the CAS fails to obtain the biased lock, there is contention. When the global safepoint is reached, the thread that owns the biased lock is suspended, the biased lock is upgraded to a lightweight lock, and the thread that was blocked at the safepoint then continues executing the synchronized code. (Revoking a biased lock causes a stop-the-world pause.)

Step 5. Execute the synchronization code.

Release of biased locks:

The revocation of biased locks was mentioned in Step 4 above. A biased lock is released only when some other thread tries to compete for it; the thread holding the bias never releases it on its own. Revocation waits for the global safepoint (a point at which no bytecode is executing), then suspends the thread that owns the biased lock, checks whether the lock object is still locked, and finally reverts the object either to the unlocked state (flag bits "01") or to the lightweight-locked state (flag bits "00").

Application scenarios of biased locking

Biased locking suits the case where, over the lifetime of a synchronized block, only one thread ever executes it and releases the lock before any other thread needs it, so there is no contention. Once contention appears, the lock is upgraded to a lightweight lock; upgrading requires revoking the bias, and revoking a biased lock triggers a stop-the-world operation.

When locks are contended, biased locking adds a lot of extra overhead, especially because revoking the bias requires reaching a safepoint, which causes an STW pause and degrades performance; in such scenarios biased locking should be disabled.

The JVM turns bias locking on/off

Enable biased locking: -XX:+UseBiasedLocking -XX:BiasedLockingStartupDelay=0

Disable biased locking: -XX:-UseBiasedLocking

8.4 Lightweight Lock

A lightweight lock is upgraded from a biased lock: the biased lock works while a single thread enters the synchronized block, and it is upgraded to a lightweight lock when a second thread joins the lock contention.

Lightweight locking process:

When the code enters the synchronized block, if the synchronization object is in the unlocked, non-biased state (lock flag bits "01", biased-lock bit "0"), the virtual machine first creates a space called a Lock Record in the current thread's stack frame, which is used to store a copy of the lock object's current Mark Word, officially called the Displaced Mark Word.

After the copy succeeds, the VM uses a CAS operation to try to update the object's Mark Word to a pointer to the Lock Record, and sets the owner pointer in the Lock Record to the object's Mark Word. If the update succeeds, proceed as described in the next paragraph; otherwise, see the paragraph after it.

If the update succeeds, the thread owns the object's lock, and the lock flag bits in the object's Mark Word are set to "00", indicating that the object is in the lightweight-locked state.

If the update fails, the VM first checks whether the object's Mark Word already points into the current thread's stack frame. If it does, the current thread already holds the lock on this object and can enter the synchronized block directly. Otherwise, multiple threads are competing for the lock. When a competing thread has tried and failed to take the lightweight lock several times, the lightweight lock inflates into a heavyweight lock: the lock flag changes to "10", the Mark Word stores a pointer to the heavyweight (mutex) lock, and threads waiting for the lock block until the thread holding it releases the lock and wakes them up.

8.5 Comparison of different locks

More lock optimizations in the 8.6 JDK

Escape analysis

If it can be shown that an object never escapes a method or a thread, the following optimizations can be applied to it:

Synchronization elimination: if an object never escapes its thread, synchronization on that object can be removed.

Lock removal and coarsening

Lock elimination: The virtual machine’s runtime compiler removes locks at run time if it detects that some code requiring synchronization is unlikely to have a shared data race.

Lock coarsening: adjacent code blocks protected by the same lock are merged, as sketched below.
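A common illustration of coarsening (illustrative, not from the original): repeated StringBuffer.append calls each lock and unlock the same object, and the JIT may merge them into one wider lock.

public class CoarseningExample {
    public String concat(String s1, String s2, String s3) {
        StringBuffer sb = new StringBuffer();  // StringBuffer methods are synchronized
        sb.append(s1);   // lock / unlock
        sb.append(s2);   // lock / unlock
        sb.append(s3);   // lock / unlock
        // The JIT may coarsen these into a single lock/unlock spanning all appends.
        return sb.toString();
    }
}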

Eliminating meaningless lock acquisition and release can improve the performance of the program.

This chapter concludes the series on Java concurrent programming: 12 articles in total and about 10 demos, written over more than 3 months. I hope I can keep it up and continue to update, and I hope you have gained something from it.