Background: where the memory model comes from

Before introducing the Java memory model, let’s take a look at the concurrency problems in physical computers to understand the context in which the memory model arises.

Concurrency problems on physical machines are similar to those in virtual machines, and the solutions adopted by physical machines are a valuable reference for virtual machine implementations.

Physical machine concurrency problems

Hardware efficiency

Most tasks a processor runs cannot be completed by processor “calculation” alone. The processor must at least interact with memory, for example to read operands and store results. This memory I/O is very difficult to eliminate (registers alone cannot hold all working data).

Because there is a gap of several orders of magnitude between the speed of a computer’s storage devices and that of its processor, modern computer systems insert a cache, whose read/write speed is as close as possible to the processor’s, as a buffer so that the processor does not have to wait for slow memory reads and writes.

The cache serves as a buffer between memory and the processor: the data needed for an operation is copied into the cache so the operation can run quickly, and the result is synchronized from the cache back to memory when the operation finishes.


Cache consistency issues

Cache-based storage interaction solves the processor/memory speed mismatch well, but it also adds complexity to computer systems by introducing a new problem: cache consistency.

In a multi-processor system (or a single-processor, multi-core system), each processor (each core) has its own cache, and they share the same Main Memory.

When the computation tasks of multiple processors all involve the same main memory area, the cache data of each processor may be inconsistent.

Therefore, each processor must follow some protocols when accessing the cache, and operate according to the protocols when reading and writing the cache to maintain the consistency of the cache.


Out-of-order code execution optimization problem

In order to make full use of the computing unit inside the processor and improve the computing efficiency, the processor may execute the input code out of order.

After computation, the processor reassembles the results of out-of-order execution. Out-of-order optimization guarantees that, on a single thread, the execution result matches the result of sequential execution, but it does not guarantee that each statement executes in the same order as it appears in the input code.


Out-of-order execution is a process in which the processor reorders code to speed up computation. In the single-core era, the processor guaranteed that its optimizations would not cause the result to deviate from the intended one, but this is no longer the case in a multi-core environment.

In a multi-core environment, if one core’s computation depends on an intermediate result from another core’s computation, and the related reads and writes are not protected in any way, then ordering cannot be guaranteed by the order of the code alone, and the processor’s final result may differ greatly from what our logic expects.


CPU core 2’s logic B relies on core 1’s logic A executing first:

Normally, logic A is executed before logic B is executed.

With out-of-order execution optimization, flag may be set to true early, so logic B may execute before logic A.
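The dependency described above can be sketched in Java. This is a minimal illustration; the field and method names (data, flag, logicA, logicB) are hypothetical stand-ins for the two cores’ logic, and running it sequentially is deterministic, while the comments describe what could go wrong across cores without protection.

```java
// Hypothetical sketch of the core1/core2 dependency described above.
// Field names (data, flag) are illustrative, not from the original text.
public class ReorderSketch {
    static int data = 0;          // written by logic A
    static boolean flag = false;  // signals that logic A has finished

    static void logicA() {        // intended to run on core 1
        data = 42;                // step 1: produce the intermediate result
        flag = true;              // step 2: publish it
    }

    static int logicB() {         // intended to run on core 2
        // Without any protection, core 2 may observe flag == true while
        // still reading the stale data == 0, because the two plain
        // writes in logicA may be reordered.
        return flag ? data : -1;
    }

    public static void main(String[] args) {
        logicA();                 // run sequentially here, so the result is deterministic
        System.out.println(logicB());  // prints 42
    }
}
```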

Composition analysis of Java memory model

Memory model concept

In order to better solve the problems mentioned above, the memory model is summarized and proposed. We can think of the memory model as a process abstraction of reading and writing access to a specific memory or cache under a specific operation protocol.

Physical computers with different architectures can have different memory models, and Java virtual machines have their own memory models.

The Java Virtual Machine specification defines a Java Memory Model (JMM) to shield the memory-access differences of various hardware and operating systems, so that Java programs achieve consistent memory-access behavior on every platform. There is then no need to tailor programs to the memory model of each platform’s physical machine.

More specifically, the Java memory model aims to define the access rules for variables in a program, the low-level details of storing variables in and out of memory in the virtual machine.

The term “variable” here differs from its use in Java programming: it covers instance fields, static fields, and the elements that make up array objects, but not local variables and method parameters, which are thread-private.

Note: if a local variable is of a reference type, the object it refers to lives in the Java heap and can be shared by threads, but the reference itself lives in the local variable table of the Java stack and is thread-private.

Components of the Java memory model

Main memory

The Java Memory model specifies that all variables are stored in Main Memory (which is the same name as the Main Memory used to describe physical hardware, but which is only a portion of virtual machine Memory).

The working memory

Each thread has its own Working Memory (also known as local Memory, analogous to the processor cache introduced earlier), which holds a copy of the shared variables in the main Memory used by the thread.

Working memory is an abstraction of the JMM and does not really exist. It covers caches, write buffers, registers, and other hardware and compiler optimizations.

The Java memory model abstraction is shown below:


Concurrency problems with JVM memory operations

The JVM’s memory-operation problems can be summarized by analogy with the physical-machine processor problems described above, and the Java memory model’s handling, described below, focuses on solving these two problems.

Working memory data consistency

Each thread will save a copy of the shared variable in the main memory when it manipulates data. If multiple threads’ computing tasks involve the same shared variable, their copies of the shared variable will be inconsistent. If this happens, whose copy of the data will be used to synchronize the data back to the main memory?

The Java memory model ensures data consistency through a series of data synchronization protocols and rules, which will be described later.

Instruction reorder optimization

In Java, reordering is usually a way for the compiler or runtime environment to reorder the execution of instructions to optimize program performance.

There are two types of reordering: compile-time reordering and run-time reordering, corresponding to compile-time and run-time environments respectively.

Similarly, instruction reordering is not arbitrary, it needs to satisfy the following two conditions:

It must not change the result of the program in a single-threaded environment. The just-in-time compiler (and the processor) must ensure the program obeys as-if-serial semantics.

In layman’s terms, in a single-threaded case, you want to give the program the illusion of sequential execution. That is, the result of the reordered execution must be the same as the result of the sequential execution.

Operations with data dependencies between them cannot be reordered.

In a multi-threaded environment, if there is a dependency between thread processing logic, the result may be different from the expected result due to instruction reordering. How to solve this situation will be discussed in the Java memory model.
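The two conditions above can be shown on concrete statements. This is an illustrative sketch: the variable names are hypothetical, and the comments describe what a compiler or processor is permitted to do, not what any particular JVM will do.

```java
// Illustrative only: which pairs of statements may legally be reordered.
public class DependencySketch {
    public static void main(String[] args) {
        int a = 1;  // write a
        int b = a;  // read a -- data-dependent on the line above,
                    // so this pair can never be reordered

        int x = 1;  // write x
        int y = 2;  // write y -- no dependency on x, so the JIT or CPU
                    // may reorder this pair; a single thread cannot
                    // tell the difference (as-if-serial)

        System.out.println(a + b + x + y);  // 1 + 1 + 1 + 2 = 5
    }
}
```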

Interaction between Java memory operations

Before we look at the protocols and special rules of the Java memory model, let’s first understand the interoperations between memory in Java.

Interactive operation flow

To better understand memory interactions, let’s take thread communication as an example and see how values are synchronized between threads:


Thread 1 and thread 2 each have a copy of the shared variable x in main memory, which initially has a value of 0.

Updating x to 1 in thread 1 and then synchronizing to thread 2 involves two main steps:

Thread 1 flushes the updated value of x from thread working memory to main memory.

Thread 2 then reads from main memory the x variable that thread 1 updated.

As a whole, these two steps are thread 1 sending a message to thread 2, which must pass through main memory.
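The two-step synchronization above can be sketched in code. This is a minimal sketch, not the JMM itself: volatile is used so the write in thread 1 is flushed to main memory and the later read is forced to go to main memory, and join() makes the example deterministic.

```java
// A minimal sketch of the two-step synchronization described above.
public class MainMemorySketch {
    static volatile int x = 0;   // shared variable, initially 0

    public static void main(String[] args) throws InterruptedException {
        Thread thread1 = new Thread(() -> x = 1);  // step 1: flush x = 1 to main memory
        thread1.start();
        thread1.join();                            // wait until the write is done

        // step 2: this read sees the value thread 1 published via main memory
        System.out.println(x);
    }
}
```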

The JMM provides visibility of shared variables for individual threads by controlling the interaction between main memory and each thread’s local memory.

The basic operations of memory interaction

The Java memory model defines the following eight operations to accomplish the specific protocol of interaction between main memory and working memory, namely the implementation details of how a variable is copied from main memory to working memory and synchronized from working memory back to main memory.

Virtual machine implementations must ensure that each operation described below is atomic and indivisible (for variables of type double and long, the load, store, read, and write operations are allowed exceptions on some platforms).


8 basic operations, as shown below:

Lock: acts on a main-memory variable; marks the variable as exclusively owned by one thread.

Unlock: acts on a main-memory variable; releases a locked variable so that it can be locked by other threads.

Read: acts on a main-memory variable; transfers the variable’s value from main memory to the thread’s working memory for the subsequent load.

Load: acts on a working-memory variable; puts the value obtained by read from main memory into the working-memory copy of the variable.

Use: acts on a working-memory variable; passes the value of the working-memory variable to the execution engine. This operation is performed whenever the virtual machine reaches a bytecode instruction that needs the variable’s value.

Assign: acts on a working-memory variable; assigns a value received from the execution engine to the working-memory variable. This operation is performed whenever the virtual machine reaches a bytecode instruction that assigns to the variable.

Store: acts on a working-memory variable; transfers the variable’s value from working memory to main memory for the subsequent write.

Write: acts on a main-memory variable; puts the value obtained by store from working memory into the main-memory variable.
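The eight operations can be traced on a simple statement. This is a conceptual mapping onto the JMM abstraction, not actual JVM instructions; the field name shared is hypothetical.

```java
// Conceptual walk-through: which of the eight operations a simple
// read-then-write of a shared field involves (the comments describe
// the JMM abstraction, not real JVM instructions).
public class OperationsSketch {
    static int shared = 10;  // lives in main memory

    public static void main(String[] args) {
        int local = shared;  // read  : main memory transfers the value
                             // load  : value goes into the working-memory copy
                             // use   : the copy's value is handed to the execution engine
        shared = local + 1;  // assign: execution engine writes the working-memory copy
                             // store : working memory transfers the value
                             // write : value goes into the main-memory variable
        System.out.println(shared);  // prints 11
    }
}
```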

Java memory model running rules

Three features of the basic operation of memory interaction

Before introducing the rules that govern the eight basic memory-interaction operations, we need to introduce three properties of those operations.

The Java memory model is built around how these three features are handled in concurrent processes, and the definitions and basic implementation are briefly introduced here, followed by a step-by-step analysis.

Atomicity

Atomicity, that is, one or more operations are either all performed without interruption by any factor or none at all.

Even when multiple threads are working together, once an operation has started, it cannot be disturbed by other threads.

Visibility

Visibility means that when multiple threads access the same variable and one thread changes the value of the variable, other threads can immediately see the changed value.

As illustrated in “Interactive Flow” above, the JMM achieves visibility by relying on main memory as the delivery medium by synchronizing the new value back to main memory after the working memory of the variable is modified in thread 1, and flushing the value from main memory before the variable is read in thread 2.

Ordering

Ordering shows up in the following two scenarios:

Within a thread, from the point of view of the executing method, instructions are executed as-if-serial, a property already assumed in sequential programming.

Between threads, when one thread “observes” another thread executing asynchronously, any code may appear to interleave, because of instruction reordering optimizations.

The only constraint that matters is that operations on synchronized methods and blocks (the synchronized keyword) and on volatile fields remain relatively ordered.

The Java memory model’s set of running rules may seem cumbersome, but in summary, they are built around atomicity, visibility, and orderliness.

Ultimately, the program can run as expected in an environment optimized for achieving data consistency in multiple threads of working memory with shared variables, multi-threaded concurrency, and instruction reordering.

Happens-before relations

Before introducing the rule set, consider the happens-before relationship, which describes the memory visibility between two operations: if A happens-before B, then the result of A is visible to B.

The analysis of happens-before relationship needs to be divided into single-threaded and multi-threaded cases:

Happens-before within a single thread: the bytecode order naturally implies a happens-before relationship, because a single thread uses one working memory and there is no data-consistency problem.

Along the program control-flow path, an earlier bytecode happens-before a later one; that is, after the earlier bytecode has executed, its result is visible to the later bytecode.

However, this does not mean the former is necessarily executed before the latter. In fact, they might be reordered if the latter does not depend on the former’s result.

Happens-before across threads: since each thread has its own copy of a shared variable, if thread 1’s operation A updates the shared variable and thread 2 then starts operation B, the result of A may not be visible to B unless the shared variable has been synchronized.

To facilitate program development, the Java memory model implements the following operations that support the happens-before relationship:

Program order rule: within a thread, an operation earlier in code order happens-before any later operation.

Lock rule: an unlock operation happens-before a subsequent lock operation on the same lock.

Volatile variable rule: a write to a volatile variable happens-before subsequent reads of that variable.

Transitivity rule: if A happens-before B and B happens-before C, then A happens-before C.

Thread start rule: a Thread object’s start() method happens-before every action of that thread.

Thread interrupt rule: a call to Thread.interrupt() happens-before the interrupted thread’s code detects the interrupt.

Thread termination rule: all operations in a thread happen-before the thread’s termination; we can detect termination by Thread.join() returning or Thread.isAlive() returning false.

Object finalization rule: the completion of an object’s initialization happens-before the start of its finalize() method.
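The thread start and termination rules can be demonstrated directly. In this sketch the fields are hypothetical; the point is that no volatile or lock is needed, because start() and join() establish the happens-before edges.

```java
// Sketch of the thread start and termination rules: the write before
// start() is visible inside the thread, and the thread's writes are
// visible after join() returns.
public class HappensBeforeSketch {
    static int before = 0;  // written before start()
    static int inside = 0;  // written inside the thread

    public static void main(String[] args) throws InterruptedException {
        before = 1;                          // happens-before t.start()
        Thread t = new Thread(() -> inside = before + 1);
        t.start();
        t.join();                            // t's actions happen-before join() returns
        System.out.println(inside);          // guaranteed to print 2
    }
}
```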

The memory barrier

How do we ensure ordering and visibility of the underlying operations in Java? Through memory barriers.

A memory barrier is an instruction inserted between two CPU instructions to prevent processor instructions from being reordered (like a barrier).

In addition, to achieve the barrier effect, it also forces the processor to flush its write buffer to main memory before writes cross the barrier and to drain the invalidation queue before reads, thereby ensuring visibility.

Here’s an example:

Store1; Store2; Load1; StoreLoad; Store3; Load2; Load3;  // StoreLoad is the memory barrier

For the above sequence of CPU instructions (Store denotes a write, Load a read, and StoreLoad a write-read memory barrier), the Store instructions before the StoreLoad barrier cannot be reordered with the Load instructions after it.

However, instructions on the same side of the StoreLoad barrier are interchangeable: Store1 can be swapped with Store2, and Load2 with Load3.

There are four common barriers:

LoadLoad barrier: for sequences like Load1; LoadLoad; Load2 — ensures that Load1’s read completes before Load2 and subsequent read operations access their data.

StoreStore barrier: for sequences like Store1; StoreStore; Store2 — ensures that Store1’s write is visible to other processors before Store2 and subsequent writes execute.

LoadStore barrier: for sequences like Load1; LoadStore; Store2 — ensures that Load1’s read completes before Store2 and subsequent writes execute.

StoreLoad barrier: for sequences like Store1; StoreLoad; Load2 — ensures that Store1’s write is visible to all processors before Load2 and all subsequent reads execute. It has the largest overhead of the four barriers (it flushes the write buffer and drains the invalidation queue).

In most processor implementations, this barrier is a universal one that subsumes the effects of the other three.

Memory barriers are rarely used directly in ordinary Java code; instead they are inserted for you by constructs such as the volatile and synchronized keywords (more on that later). Barriers can also be used directly through the Unsafe class.

Rules for operation synchronization

To ensure data consistency between working memory and main memory, the JMM requires the following rules to be obeyed when performing the eight operations above:

Rule 1: To copy a variable from main memory to working memory, read and load must be performed in that order; to synchronize a variable from working memory back to main memory, store and write must be performed in that order. The Java memory model only requires these operations to be performed in order, not consecutively.

Rule 2: Read and load, and store and write, must appear in pairs; neither operation of a pair is allowed to occur alone.

Rule 3: A thread is not allowed to discard its most recent assign operation, i.e. variables that have changed in working memory must be synchronized to main memory.

Rule 4: A thread is not allowed to synchronize data from working memory back to main memory for no reason (that is, without any assign operation having occurred).

Rule 5: A new variable can only be born in main memory; working memory is not allowed to use a variable that has not been initialized (by load or assign) directly.

That is, a load or assign must be performed on a variable before use or store may be performed on it.

Rule 6: A variable can be locked by only one thread at a time, but the same thread may lock it multiple times; after multiple lock operations, the variable is unlocked only when the same number of unlock operations has been performed. Lock and unlock must therefore come in pairs.

Rule 7: Locking a variable clears its value from working memory; before the execution engine can use the variable, a load or assign must re-initialize its value.

Rule 8: Do not unlock a variable that has not been previously locked by a lock operation. It is also not allowed to unlock a variable that has been locked by another thread.

Rule 9: Before you can unlock a variable, you must synchronize it to main memory (store and write).

These rules may seem cumbersome, but they are not hard to understand:

For rules 1 and 2: a shared variable in working memory is a copy of the one in main memory. Synchronizing a main-memory value into working memory requires read and load to be used together, and synchronizing a working-memory value back to main memory requires store and write to be used together. Each pair is a fixed, ordered pairing, and neither operation may occur alone.

Rule 3, Rule 4, since shared variables in working memory are copies of main memory, to ensure data consistency, when variables in working memory are reassigned by the bytecode engine, they must be synchronized back to main memory. If variables in working memory have not been updated, unprovoked synchronization back to main memory is not allowed.

Rule 5, since shared variables in working memory are copies of main memory, they must be born from main memory.

Rules 6, 7, 8, 9, for safe use of variables in concurrent situations, a thread can monopolize a variable in main memory based on a lock operation, and other threads are not allowed to use or unlock the variable until it is unlocked by the thread.

Special rules for volatile variables

In English, volatile means unstable or changeable. In Java, volatile is used to make updates to a variable visible across threads.

Semantics of volatile

Volatile has two main semantics:

Guaranteed visibility

Disables instruction reordering

Guaranteed visibility ensures memory visibility of operations on this variable by different threads. Ensuring visibility is not the same as ensuring the safety of concurrent operations on volatile variables.

As soon as a thread makes changes to a variable, it writes back to main memory.

When a thread reads a variable, it reads it from main memory, not from the thread’s working memory.

However, if multiple threads flush the updated variable value back to main memory at the same time, the value may not be the expected result.

For example, define volatile int count = 0 and have two threads each perform count++ 500 times concurrently; the result can be less than 1000.

The reason is that count++ requires the following three steps per thread:

The thread reads the latest count value from main memory.

The execution engine increments the count value by 1 and assigns it to the thread working memory.

Thread working memory stores the count value to main memory.

It is possible that at some moment both threads read 100 in step 1, both get 101 after step 2, and both flush 101 back to main memory.
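The three steps above can be reproduced, along with the standard fix. This is a sketch under the assumption that an atomic read-modify-write is the desired behavior: AtomicInteger is shown alongside the racy volatile counter, and only the atomic counter’s result is deterministic.

```java
import java.util.concurrent.atomic.AtomicInteger;

// The volatile counter described above is racy, because count++ is
// three separate steps; AtomicInteger makes it a single atomic step.
public class CounterSketch {
    static volatile int racyCount = 0;               // count++ is NOT atomic
    static AtomicInteger safeCount = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 500; i++) {
                racyCount++;                         // read, +1, write: three steps
                safeCount.incrementAndGet();         // one atomic step
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();

        // racyCount may be less than 1000; safeCount is always 1000
        System.out.println(safeCount.get());
    }
}
```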

The second semantic is that instruction reordering is prohibited. Specifically, the reordering rules are as follows:

When a program reads or writes a volatile variable, all preceding operations must already have occurred, with their results visible to subsequent operations, and the operations after it must not yet have taken place.

During instruction optimization, statements before an access to a volatile variable cannot be moved after it, and statements after it cannot be moved before it.

Ordinary variables only guarantee that the method will get the correct result wherever it depends on the assignment, not that the assignment will be executed in the same order as in the program code.

Here’s an example:

volatile boolean initialized = false;

// Thread A: read the configuration and initialize
doSomethingReadConfg();
initialized = true;  // initialized == true indicates thread A has finished initializing the configuration

// Thread B: wait for thread A's initialization, then use the configuration
while (!initialized) {
    sleep();
}
doSomethingWithConfig();

If the initialized variable were defined without the volatile modifier, instruction reordering might cause thread A’s last line, initialized = true, to execute before doSomethingReadConfg().

The semantics of the volatile keyword prohibiting reordering can prevent errors in thread B code that uses configuration information.

Implementation principle of volatile variables


Volatile’s semantics are implemented with memory barriers: when bytecode is generated at compile time, memory barriers are inserted into the instruction sequence. The following is the JMM’s conservative barrier-insertion strategy:

Insert a StoreStore barrier before each volatile write. Besides preventing the volatile write from being reordered with preceding writes, this barrier guarantees that all ordinary reads and writes before the volatile write are committed before it.

Insert a StoreLoad barrier after each volatile write. Besides preventing the volatile write from being reordered with subsequent reads, this barrier flushes the processor’s write buffer, making the volatile write visible to other threads.

Insert a LoadLoad barrier after each volatile read. Besides preventing the volatile read from being reordered with subsequent reads, this barrier drains the invalidation queue so that the volatile read sees the latest value.

Insert a LoadStore barrier after each volatile read. Besides preventing the volatile read from being reordered with any subsequent writes, this barrier ensures that write updates to volatile variables from other threads are visible to the reading thread.

Usage scenarios of volatile variables

To sum up, it is “write once, read everywhere”: one thread is responsible for updating the variable, while other threads only read it (never update it) and run logic based on its new value. Examples include status flag bits and publishing variable values in the observer pattern.

Special rules for final variables

As we know, final member variables must be initialized at declaration time or in the constructor, otherwise a compilation error will be reported.

Visibility of the final keyword means that once initialization of a final field (at declaration or in the constructor) is complete, other threads can correctly see the final field’s value without synchronization. This is because the value of the final variable is written back to main memory as soon as initialization completes.
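A minimal sketch of this rule, with a hypothetical Config class and timeout field: the final field is fully assigned before the constructor returns, so any thread that later obtains the object sees its value without extra synchronization.

```java
// Sketch of final-field visibility: once the constructor finishes,
// any thread that sees the object sees the final field's value.
public class Config {
    private final int timeout;   // must be set by the end of construction

    public Config(int timeout) {
        this.timeout = timeout;  // written before the constructor returns,
                                 // so safely visible to other threads
    }

    public int getTimeout() {
        return timeout;
    }

    public static void main(String[] args) {
        Config c = new Config(30);
        System.out.println(c.getTimeout());  // prints 30
    }
}
```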

Special rules for synchronized

Reads and writes of data are controlled by the code region covered by the synchronized keyword:

Reading data: when a thread enters the region and reads a variable, it cannot read from working memory; it must read from main memory, which guarantees that it reads the latest value.

Writing data: writes to variables inside the synchronized region are flushed from the current thread’s working memory to main memory when the thread leaves the region, which guarantees that the updated data is visible to other threads.
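The two rules above can be sketched as a guarded field. The class and method names are hypothetical; entering the synchronized block forces a fresh read from main memory, and leaving it flushes the write back.

```java
// Sketch of the synchronized rules above: enter -> fresh read from
// main memory; exit -> writes flushed back to main memory.
public class SyncSketch {
    private int value = 0;
    private final Object lock = new Object();

    void set(int v) {
        synchronized (lock) {   // on exit: value is flushed to main memory
            value = v;
        }
    }

    int get() {
        synchronized (lock) {   // on entry: the stale working-memory copy is
            return value;       // discarded, so this reads the latest value
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SyncSketch s = new SyncSketch();
        Thread writer = new Thread(() -> s.set(7));
        writer.start();
        writer.join();
        System.out.println(s.get());  // prints 7
    }
}
```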

Special rules for long and double variables

The Java memory model requires that the lock, unlock, read, load, assign, use, store, and write operations be atomic.

In the case of 64-bit data types (long and double), however, the model is particularly lax: the virtual machine is allowed to divide reads and writes to 64-bit data that are not volatile into two 32-bit operations.

This means that the virtual machine may choose not to guarantee the atomicity of load, store, read, and write operations on 64-bit data types.

Because of this non-atomicity, it is possible for other threads to read the value of the “32-bit half-variable” that has not been synchronized.

In practice, however, the Java memory model strongly recommends that the virtual machine implement atomic reading and writing of 64-bit data.

Today’s commercial virtual machines on all platforms choose to treat 64-bit data reads and writes as atomic operations, so we generally don’t need to declare the long and double variables volatile.

Conclusion

Because the Java memory model involves a set of rules, most of the articles on the web are devoted to parsing these rules, but many fail to explain why they are needed or what they do.

In fact, that approach is not helpful for beginners, who can easily get lost in these tedious rules without knowing why they exist. Here is my personal experience of learning:

The process of learning knowledge is not equal to just understanding and memorizing knowledge, but to make connections between the inputs and outputs of the problem that knowledge solves.

The essence of knowledge is problem solving, so understand the problem before learning, understand the input and output of the problem, and knowledge is a relational mapping of the input to the output.

Knowledge learning should combine a large number of examples to understand the mapping relationship, and then compress the knowledge. Hua Luogeng once said: “Read a book thick, then thin”, explaining this principle, first combine a large number of examples to understand the knowledge, and then compress the knowledge.
