Introduction

This article, based on my own understanding of the Java memory model and the related books, analyzes the JMM in depth. It describes the JVM memory model, the memory architecture of hardware and the operating system (OS), the principles behind Java multithreading, and how all of these relate to the Java memory model (JMM), before analyzing the JMM itself in detail. Because most people conflate the Java memory model with the JVM memory model when describing the JMM, the purpose of this article is to help you understand the JMM thoroughly. (My articles are written on the basis of personal understanding plus related books; if anything is wrong or doubtful, you are welcome to correct it in the comments section. Thank you!)

I. A thorough understanding of the difference between the JVM memory model and the Java memory model (JMM)

1.1. JVM memory model (JVM memory region partition)

As everyone knows, a Java program can only run on top of a JVM. Unlike C, which talks to the operating system directly, Java uses the JVM to shield programs from direct contact with the OS: all Java language operations are built on top of the JVM, which is what makes Java platform-independent, i.e. "compile once, run anywhere".



While running a Java program, the JVM divides the memory it manages into the areas shown above (the runtime data area). Each region has its own purpose and plays its own role while the Java program runs. The runtime data area is further divided into thread-private areas and shared areas (GC does not happen in thread-private areas). The specific roles of the major regions are as follows:

Method Area:

The method area (replaced by Metaspace since Java 8) is a region of memory shared by all threads, also known as non-heap memory. According to the Java Virtual Machine specification, it stores data such as class information loaded by the virtual machine, constants, static variables, and code compiled by the just-in-time compiler. An OutOfMemoryError is thrown when the method area cannot satisfy a memory allocation request. It is worth noting that the method area contains a section called the Runtime Constant Pool, which stores the various literals and symbolic references generated by the compiler; these are placed in the Runtime Constant Pool after the class is loaded, for later use.

JVM Heap (Java Heap): The Java heap is also a region of memory shared by all threads. It is created when the virtual machine starts up, is the largest chunk of memory managed by the Java virtual machine, and is used to store object instances. Note that the Java heap is the primary area managed by the garbage collector, so it is often referred to as the GC heap. An OutOfMemoryError is thrown if the heap has no memory left to complete an instance allocation and can no longer be expanded.

Program Counter Register: a thread-private data area. It is a small memory space that holds the line number of the bytecode instruction the current thread is executing. As the bytecode interpreter works, it changes the value of this counter to select the next bytecode instruction to execute. Basic control flow such as branching, looping, jumping, exception handling, and thread resumption all depend on this counter. Because the CPU's time slicing will "interrupt" one thread to let another thread work, how does the "interrupted" thread know, when it is rescheduled by the CPU, which instruction it executed last? The program counter is responsible for exactly this.

Java Virtual Machine Stacks:

A thread-private data region, created together with the thread. It is bound to the thread for its whole lifetime and represents the memory model for executing Java methods: each time a method is invoked, a stack frame is created to store the method's local variables, operand stack, dynamic linking information, return value, return address, and so on. Each method invocation, from call to completion, corresponds to one stack frame being pushed onto and then popped off the virtual machine stack.
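As a small illustration (my own sketch, not from the original books): unbounded recursion keeps pushing new frames onto the current thread's virtual machine stack until it overflows.

public class StackFrameDemo {
    static int depth = 0;

    // Each call pushes a new frame (locals, operand stack, return address)
    // onto this thread's virtual machine stack; unbounded recursion
    // eventually exhausts the stack and throws StackOverflowError.
    static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("stack overflowed at depth " + depth);
        }
    }
}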

Native Method Stack: the native method stack is a thread-private data area, mainly related to the Native methods (typically written in C) used by the virtual machine. When a program needs to call a Native method, the JVM maintains a registration of native methods in the native method stack; it merely records which thread called which native method interface. The call does not actually happen in the native method stack, since it is only a call registry: the actual call goes through the native method interface to a function written in C in the native method library. In general we do not need to care about this area.

At this point, everyone should clearly understand that the JVM memory model and the JMM are two completely different concepts. The JVM memory model exists at the level of the Java virtual machine; as far as the operating system is concerned, the whole JVM still lives in main memory. The JMM, by contrast, sits at the level of the Java language, the OS, and the hardware architecture: its main role is to define a memory model bridging the hardware architecture and the Java language. Strictly speaking, there is no concrete thing called "the JMM": it is just a specification, not a technical implementation.

1.2. Overview of the Java Memory Model JMM

The Java Memory Model (JMM) is itself an abstract concept that does not physically exist. It describes a set of rules or specifications that define how variables in a program are accessed, including instance fields, static fields, and the elements that make up array objects. Because the entity that actually runs a program in the JVM is a thread, the JVM creates a working memory (called stack space in some places) for each thread when it is created, used to store data private to the thread. The Java memory model stipulates that all variables are stored in main memory; main memory is a shared memory region accessible to all threads. But if a thread wants to read or assign to a variable, it must do so in its working memory: to operate on a variable, a thread first copies it from main memory into its own working memory space, operates on the copy there, and flushes the variable back to main memory after the operation completes. It cannot operate on the variable in main memory directly; the working memory stores copies of variables from main memory. (PS: some readers may wonder: when a thread executes a method in Java, isn't any object it references or creates stored on the heap, with the stack storing only the object's reference address? Briefly: when the thread actually runs that line, it finds the real object in main memory based on the object reference address in the local variable table, and then copies the object into its own working memory to operate on. When the object being operated on is large (1 MB+), it is not copied completely; only the parts (members) the thread needs to operate on are copied.) As said earlier, working memory is the private data area of each thread, so different threads cannot access each other's working memory; communication between threads (passing values) must go through main memory. The brief access process is shown below:



Pay attention!! The JMM and the JVM memory region partition are concepts on different levels; it is not appropriate to understand the JMM in terms of the JVM's memory layout. It is more accurate to say that the JMM describes a set of rules that control how the various variables in a Java program are accessed in the shared data region and the private data region, centered on atomicity, ordering, and visibility. The JMM and the JVM memory regions are only similar in that both have a shared data area and a private data area: in the JMM, main memory belongs to the shared data area, which to some extent should include the heap and the method area; working memory belongs to the thread-private data area, which to some extent should include the program counter, the virtual machine stack, and the native method stack. In some places we may see main memory described as heap memory and working memory as the thread stack, but they all mean the same thing. Main memory and working memory in the JMM are described as follows:

Main memory: main memory chiefly stores Java object instances; the instances created by all threads are stored in main memory (excluding instances allocated on the stack or in a TLAB when escape analysis and scalar replacement are enabled), regardless of whether the instance object is referenced by a member variable or by a local variable of a method. It also includes shared class information, constants, and static variables. Because it is a shared data region, multiple threads performing non-atomic operations on the same variable may run into thread-safety issues.

Working memory: each thread can only access its own working memory. In other words, local variables in a thread are not visible to other threads, even if two threads are executing the same piece of code; each creates local variables belonging to the current thread in its own working memory, along with the bytecode line number indicator and information about the relevant Native methods. Note that since working memory is each thread's private data, threads cannot access each other's working memory, and communication between threads still depends on main memory, so the data stored in working memory itself does not have thread-safety problems.

Once main memory and working memory are clear, let's look at what kinds of data are stored in each, and how they operate. According to the virtual machine specification, for a member method of an instance: if the method contains local variables of a primitive data type (boolean, byte, short, char, int, long, float, double), they are stored directly in the local variable table of the stack frame in working memory. But if the local variable is a reference type, the in-memory reference address of the object is stored in the local variable table of the stack frame in working memory, while the object instance itself is stored in main memory (the shared data area, i.e. the heap). An instance object's member variables, however, whether they are primitive data types, wrapper types (Integer, Double, etc.), or reference types, are all stored on the heap (except for on-stack allocation and TLAB allocation). Static variables and information about the class itself are stored in main memory. It is important to note that instance objects in main memory can be shared by multiple threads: if two threads simultaneously invoke the same method of the same class, each of the two threads copies the data it operates on into its own working memory and flushes it back to main memory after the operation completes.
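As a hedged sketch (my own illustration, ignoring the escape analysis / scalar replacement exceptions noted above) of where each kind of data lives:

public class MemoryLayoutSketch {
    static int counter;         // static field: method area (main memory)
    private int member = 1;     // instance field: on the heap, inside the object

    public void compute() {
        int local = 42;                     // primitive local: stack frame (working memory)
        Object ref = new Object();          // 'ref' sits in the frame's local variable
                                            // table; the Object instance lives on the heap
        Integer boxed = Integer.valueOf(7); // wrapper instance is also on the heap
        counter = local + member + boxed;   // operations go through working memory,
                                            // then are flushed back to main memory
    }
}

A simple diagram of this interaction is shown below: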







II. Computer hardware memory architecture, the OS, the implementation of Java multithreading, and the Java memory model

2.1 Memory architecture of computer hardware



The figure above is a simplified diagram of CPU and memory interaction; the reality is not this simple (for the sake of understanding we omit the north and south bridges here). A modern machine usually has multiple CPUs, and each CPU may have multiple cores: a multi-core processor integrates two or more complete computation engines (cores) into a single processor (CPU), which supports parallel execution of multiple tasks. From the point of view of thread scheduling, each thread is mapped onto a CPU core to run in parallel.

Inside the CPU there is a set of CPU registers. Registers hold the data the CPU accesses and processes directly; they are a temporary space for data. The CPU generally moves data from memory into registers and then processes it, but because main memory is far slower than the CPU, the CPU would spend too much time waiting for memory to be ready. So a cache was added between the registers and main memory. The CPU cache is small, but its access speed is much faster than main memory's. The cache temporarily stores data fetched from memory: if the CPU wants data from the same memory location again, it can be fetched directly from the cache, without going back to main memory. Note that the CPU does not hit the cache every time: if the data for a given memory address is not in the cache, the CPU must bypass the cache and fetch it from memory. Whether the data comes from the cache each time has a technical name, the cache hit ratio (a hit means it was served from the cache, a miss means it had to come from memory), and the hit ratio affects CPU performance.

This is the brief interaction among CPU, cache, and main memory. In short, when a CPU needs to access main memory, it first reads part of the main memory data into the CPU cache (or reads it directly from the cache if it is already there), then reads from the cache into registers; when the CPU needs to write data back to main memory, it first flushes the register value to the CPU cache, and the data is then flushed from the cache to main memory.

In fact this is similar to the Application (Java) -> Cache (Redis) -> DB (MySQL) relationship: the performance of a Java program is dragged down by the DB's need to hit disk, so the Java program must wait for the DB's result while handling a request. The thread handling the request blocks and only resumes when the DB result returns. The problem with the whole model is that the DB cannot keep up with the Java program, so the entire request becomes slow, and the blocked Java thread is doing nothing useful in the meantime, which ultimately lowers the throughput of the whole system. By adding a Cache (Redis), we improve the response time of the program and thus the overall throughput and performance of the system.
(In fact, the goal of performance tuning is to make every aspect of the system faster, and architecture is really about designing a system that can handle a larger number of requests.)

2.2. Relationship between OS and JVM threads and implementation principle of Java threads

From the discussion above we have a general understanding of the hardware memory architecture, the JVM memory model, and the Java memory model. Next let's understand how Java threads are implemented; understanding the implementation of threads helps us understand the relationship between the Java memory model and the hardware memory architecture. On both Windows and Linux, Java threads are implemented with the one-to-one thread model. The so-called one-to-one model means that a language-level program indirectly calls the system's kernel threads: when we use a Java thread, e.g. new Thread(runnable), the JVM internally calls a kernel thread of the current operating system to execute the Runnable task. A term to understand here is kernel-level thread (KLT): a thread supported by the operating system kernel. Such threads are switched by the OS kernel, which manipulates the scheduler to schedule them and maps their tasks onto individual processors. Each kernel thread can be thought of as a clone of the kernel, which is how an operating system can handle multiple tasks simultaneously. Since the multithreaded programs we write are at the language level, they do not call kernel threads directly; instead they use a light-weight process (LWP), which is a thread in the usual sense. Because each light-weight process maps to exactly one kernel thread, we can call the kernel thread through the light-weight process, and the OS kernel then maps the task onto a processor. This 1:1 relationship between the light-weight process and the kernel thread is what is called the one-to-one threading model in Java programs. The diagram below:



Each thread in a Java program is mapped through the OS onto a CPU for processing; of course, if the CPU has multiple cores, one CPU can also schedule multiple threads to execute in parallel.
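A tiny sketch (assuming a HotSpot JVM on Windows or Linux, where java.lang.Thread is backed 1:1 by a kernel thread):

public class CoreMappingDemo {
    public static void main(String[] args) {
        // Number of logical processors the OS exposes to the JVM
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("logical cores: " + cores);

        // Each Java thread below is backed by one kernel thread (1:1 model);
        // the OS scheduler may run them on different cores in parallel.
        for (int n = 0; n < cores; n++) {
            final int id = n;
            new Thread(() -> System.out.println(
                    "task " + id + " running on " + Thread.currentThread().getName()))
                .start();
        }
    }
}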

2.3 Relationship between JMM and hardware memory architecture

From the preceding discussion of the JVM memory model, the Java memory model (JMM), the hardware memory architecture, and the implementation of Java multithreading, we can see that the execution of multiple threads is ultimately mapped onto hardware processors. But the Java memory model and the hardware memory architecture do not correspond exactly: the hardware knows only registers, cache, and main memory, not the JMM concepts of working memory (thread-private data area) and main memory (heap memory). That means the Java memory model's division of memory has no effect on hardware memory, because the JMM is an abstract concept, a set of rules that does not physically exist. Whether data belongs to working memory or to main memory in the JMM, for the computer hardware it will all be stored in the computer's main memory (and of course may also be cached in CPU caches or registers). So, in general, the Java memory model and the computer hardware memory architecture overlap: it is a crossing of an abstract conceptual division with real physical hardware. (Note that the same is true for the JVM's memory region partition.)

2.4. Why is the existence of JMM necessary?

Let's move on to why the Java memory model is necessary: if we're going to learn something, we need to know both what it is and why it is. Since a thread is the smallest scheduling unit in the OS, every program ultimately runs as threads, including Java programs running on the OS. When a thread is created, the JVM creates a working memory (called stack space in some places) for it, to store data private to the thread. If a thread wants to manipulate a variable in main memory, it must do so indirectly through its working memory: it copies the variable from main memory into its own working memory, operates on the copy there, and flushes the variable back to main memory when the operation completes. Thread-safety issues can arise if two threads operate on the variables of instance objects in the same main memory. In the figure below, there is a shared variable int i = 0 in main memory. In the first case (left figure):

Now two threads A and B each operate on the variable i. Each thread copies i from main memory into its own working memory as a copy of the shared variable i, and then increments it. Suppose A and B both copy i = 0 from main memory into their own working memory at the same time. A's increment is performed on the copy in its own working memory, which is not visible to B. A then flushes its result, 1, back to main memory. Meanwhile B has also performed i++, but B's result is based on the value i = 0 it previously copied from main memory, so the value B flushes back to main memory is also 1. Both threads incremented i in main memory, so the ideal result would be i = 2, but in this case it ends up as i = 1.
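Scenario one can be reproduced with a small runnable sketch (my own demo, not from the original; exact results vary by run and machine):

public class LostUpdateDemo {
    static int i = 0;  // shared variable in main memory

    public static void main(String[] args) throws InterruptedException {
        Runnable add10k = () -> {
            for (int n = 0; n < 10_000; n++) {
                i++;  // read i, add 1 on the working copy, write back
            }
        };
        Thread a = new Thread(add10k);
        Thread b = new Thread(add10k);
        a.start(); b.start();
        a.join(); b.join();
        // Ideally 20000, but lost updates typically make the result smaller
        System.out.println("i = " + i);
    }
}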

The second scenario (right) :

Suppose thread A wants to change the value of i to 2 while thread B wants to read i. Does thread B read A's updated value 2, or the old value 1? The answer is: we can't be sure. Thread B may read the value before A's update, or it may read A's updated value 2. That is because working memory is each thread's private data area: when thread A modifies the variable i, it first copies the variable from main memory into its working memory, operates on it there, and only writes i back to main memory after the operation completes; the same goes for thread B. So data consistency problems can arise between main memory and working memory. If thread A flushes its update back to main memory only after thread B has already read from main memory, then B copies i = 1 into its working memory; if thread A writes i = 2 back to main memory before B reads, then thread B reads i = 2. But which happens first? It isn't certain.

Neither of the above situations is acceptable in a program. Suppose the variable i is the inventory of an item in Taobao's Singles' Day sale, and the A/B threads are users participating in the sale: for the Taobao business team, this could lead to overselling, double-selling, and similar problems, causing real financial loss through a technical defect. Especially in big promotions like Taobao's Double 11, if such problems are not properly controlled, the risk multiplies. This is what we call the thread-safety problem.



To solve problems like those described above, the JVM defines a set of rules that determine when a write to a shared variable by one thread becomes visible to another thread. This set of rules is known as the Java memory model (JMM). The JMM revolves around the atomicity, ordering, and visibility of program execution; let's look at these three properties.

2.5. The three properties the Java memory model (JMM) revolves around

2.5.1 Atomicity

Atomicity means that an operation is uninterruptible: even in a multi-threaded environment, once an operation starts, it is not affected by other threads. For example, if a static variable int i = 0 is assigned by two threads at the same time, thread A assigning i = 1 and thread B assigning i = 2, then no matter how the threads run, the final value of i is either 1 or 2; the assignments of threads A and B cannot interfere with each other. This uninterruptible characteristic is what makes an operation atomic.

A small note: on 32-bit systems, reads and writes of long and double values are not atomic (for the other primitive types, byte, short, int, float, boolean, char, reads and writes are atomic operations). That is, if two threads simultaneously read and write long or double values, they may interfere with each other: on a 32-bit virtual machine each atomic read/write covers 32 bits, while long and double are 64-bit storage units. This can cause the following: after one thread writes the first 32 bits, thread B happens to read only the last 32 bits, so it may read a value that is neither the original value nor the value written by the modifying thread; it may be the value of a "half variable", i.e. the 64-bit datum was accessed in two halves by two threads. But don't worry too much: reading a "half variable" is quite rare, and in current commercial virtual machines almost all 64-bit reads and writes are performed as atomic operations, so it is enough to know this can happen in theory.

Essentially, then, an atomic operation is a group of operations that either all succeed or all fail. For example, placing an order: {create the order, decrement the inventory}. For users, placing an order is an operation the system must guarantee to be atomic: either the order creation and the inventory decrement both succeed, or neither does; there must be no case where the order is created but the inventory decrement fails. Viewed at a macro level, this example is an atomic operation. Conversely, the root cause of thread-safety problems is non-atomic operations on a shared resource in a multi-threaded situation.
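A hedged sketch of making the order example atomic with synchronized (the class and method names are my own invention):

public class OrderService {
    private int orders = 0;
    private int stock = 100;

    // synchronized makes {orders++, stock--} one indivisible unit from the
    // point of view of other threads: either both happen or neither does.
    public synchronized boolean placeOrder() {
        if (stock <= 0) {
            return false;  // sold out: refuse the order instead of overselling
        }
        orders++;
        stock--;
        return true;
    }
}

But before we dive deeper into visibility and into concurrent programming in Java, one more thing needs to be noted: an optimization the computer performs as it executes a program, namely instruction reordering. When a computer executes a program, to improve performance, the compiler and the processor often rearrange instructions, generally in the following three categories: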

  • Compiler optimization reordering: the compiler can rearrange the execution order of statements without changing the semantics of the single-threaded program.
  • Instruction-level parallel reordering: modern processors use instruction-level parallelism to overlap the execution of multiple instructions. If there is no data dependency (that is, a later statement does not depend on the result of an earlier one), the processor can change the execution order of the machine instructions corresponding to the statements.
  • Memory system reordering: because the processor uses caches and read/write buffers, load and store operations may appear to be performed out of order, since the multi-level cache introduces a lag between memory and the cached data.

Among these, compiler optimization reordering is compile-time reordering, while instruction-level parallel reordering and memory system reordering are processor reordering. In a multi-threaded environment, these reordering optimizations may cause memory visibility problems in programs. The following illustrates the problems these two kinds of reordering optimization can cause.
2.5.1.1. Compiler optimization instruction reordering
int a = 0;
int b = 0;

// Thread A              Thread B
// code 1: int x = a;    code 3: int y = b;
// code 2: b = 1;        code 4: a = 2;

Here there are 4 lines of code, numbered 1, 2, 3 and 4, of which 1 and 2 belong to thread A and 3 and 4 to thread B, and the two threads execute at the same time. From the perspective of program order, the parallel execution can produce results such as x = 0 and y = 0, but it should essentially never produce x = 2 and y = 1. In fact, however, that outcome can occur, because the compiler generally applies reordering optimizations to lines of code that have no influence on, and zero coupling with, the code around them. Suppose that after the compiler reorders and optimizes this code, the following situation arises:

// Thread A              Thread B
// code 2: b = 1;        code 4: a = 2;
// code 1: int x = a;    code 3: int y = b;

In this case, combined with the earlier thread-safety discussion, the result x = 2 and y = 1 becomes possible. This also means that in a multi-threaded environment, because the compiler performs reordering optimizations on code instructions (instruction reordering is an optimization aimed at single-threaded execution), whether multiple threads see consistent values for shared variables becomes uncertain. (PS: compiler reordering is only applied to code without dependencies; dependencies come in two kinds: data dependencies (int a = 1; int b = a;) and control dependencies (boolean f = true; if (f) { System.out.println("123"); }).)
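A hedged, runnable harness for hunting this reordering (my own sketch; it may run for a very long time or never trigger on a given JVM/CPU combination, since the outcome depends on the JIT and the hardware):

public class ReorderDemo {
    static int a, b, x, y;

    public static void main(String[] args) throws InterruptedException {
        for (long run = 1; ; run++) {
            a = 0; b = 0; x = 0; y = 0;
            Thread t1 = new Thread(() -> { x = a; b = 1; });  // codes 1 and 2
            Thread t2 = new Thread(() -> { y = b; a = 2; });  // codes 3 and 4
            t1.start(); t2.start();
            t1.join(); t2.join();
            // Without any reordering, x == 2 && y == 1 is impossible: it would
            // require code 4 to run before code 1 AND code 2 before code 3.
            if (x == 2 && y == 1) {
                System.out.println("observed reordering on run " + run);
                break;
            }
        }
    }
}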

2.5.1.2 Processor instruction reordering

Let's first understand the concept of instruction reordering. Processor instruction reordering exists to optimize CPU performance. From the perspective of instruction execution, an instruction is completed in several steps, as follows:

  • Instruction fetch: IF
  • Instruction decode and register operand fetch: ID
  • Execute or effective address calculation: EX
  • Memory access: MEM
  • Write back: WB

While the CPU works, each instruction is divided into the steps above and executed in sequence (note that different hardware may differ), and each step uses different hardware: instruction fetch uses the PC register and memory, decoding uses the instruction register set, execution uses the ALU (arithmetic logic unit), and write-back uses the register set again. To improve hardware utilization, CPU instructions are executed in a pipelined manner, as follows:

(Pipeline technology is like a production line in a factory: each worker does their own step, passes the product on, and immediately starts on a new one. Instruction execution is the same. If the next instruction could only start after the previous one fully completed, it would be like a production line where the next product can only be started after the previous one is entirely finished: efficiency would be very low and labor wasted, since only one worker on the line would be working at a time while the others watched, and the first worker could only start on the second product once the last worker finished the first one.)

As can be seen from the diagram, while instruction 1 has not yet finished executing, instruction 2 starts executing on the idle hardware, and this is a good thing: if each step takes 1 ms, then if instruction 2 had to wait for instruction 1 to finish completely, it would wait 5 ms, but with pipelining instruction 2 only needs to wait 1 ms to begin, which greatly improves the CPU's execution performance. Although pipelining can greatly improve CPU performance, unfortunately once a pipeline stall occurs, all the hardware units enter a pause, and filling the gap again can take several cycles, so the performance loss is large. It is like a mobile phone factory assembly line: once the supply of one part is cut off, the downstream workers may go through one or more rounds of waiting for that part before assembly resumes. Therefore we need to do our best to prevent pipeline stalls, and instruction reordering is one of the means of optimizing them away. Let's illustrate with an example how instruction reordering can prevent pipeline stalls, as follows:

i = a + b;
y = c - d;     

Instruction       Description
LW R1,a           LW means LOAD: load the value of a into register R1
LW R2,b           Load the value of b into register R2
ADD R3,R1,R2      ADD means addition: add the values of R1 and R2 and store the result in register R3
SW i,R3           SW means STORE: store the value of register R3 into the variable i
LW R4,c           Load the value of c into register R4
LW R5,d           Load the value of d into register R5
SUB R6,R4,R5      SUB means subtraction: subtract R5 from R4 and store the result in register R6
SW y,R6           Store the value of register R6 into the variable y

The above is the assembly-level execution of these instructions. In the diagram, some steps are marked with an X, which denotes a stall: wherever there is an X, the instruction pipeline halts, which also affects the execution of subsequent instructions, and it may take one or several instruction cycles to recover. Why the stall? Partly because data is not yet ready. For example, the ADD instruction needs the data in R1 and R2 produced by the preceding instructions, but R2's MEM step has not completed at that point, i.e. the value has not been loaded yet, so the addition cannot proceed and must wait for the MEM step to finish; hence the stall. Other instructions stall for similar reasons. As mentioned above, stalls degrade CPU performance, so we should find a way to eliminate them. This is where instruction reordering comes in, as shown in the figure below: since the ADD instruction has to wait anyway, we can use the waiting time to do something else, such as moving LW R4,c and LW R5,d forward to execute first. After all, LW R4,c and LW R5,d have no data dependency on anything before them, and the SUB R6,R4,R5 instruction that does depend on them still executes after R4 and R5 are loaded, so nothing is affected. The process is as follows:





As shown in the figure above, all the stalls are eliminated and the instruction pipeline no longer has to be interrupted, which significantly improves CPU performance. This is what processor instruction reordering does. Now that the content about compiler reordering and processor reordering (from here on we call both simply instruction reordering) is clear, we must realize that for a single thread, instruction reordering has no visible impact: the premise of almost all reordering is that it preserves serial semantic consistency. But in a multi-threaded environment, instruction reordering can cause serious out-of-order execution problems in a program, as follows:

int a = 0;
boolean f = false;

public void methodA() {
    a = 1;
    f = true;
}

public void methodB() {
    if (f) {
        int i = a + 1;
    }
}

For example, if thread A and thread B operate on the same instance object at the same time, where thread A calls methodA and thread B calls methodB, then due to instruction reordering or other reasons the execution order of the program may change to the following:

// Thread A (methodA, reordered)    Thread B (methodB)
// code 1: f = true;                reads f == true
// code 2: a = 1;                   reads a == 0
//                                  int i = a + 1;   // i == 1

Because of instruction reordering, thread A's write f = true is executed early, while thread A has not yet executed a = 1. At this moment, since f == true, thread B reads f as true and goes straight on to read the value of a; but thread A is still operating on its working-memory copy of a, and the result has not yet been written to main memory, so the value of a that thread B reads is still 0. Thread B then copies a = 0 into its own working memory and performs i = a + 1 there. So thread B reads a as 0 due to the processor's instruction reordering, and the final result of i is 1 instead of the expected 2. This is out-of-order program execution caused by instruction reordering in a multi-threaded environment. Keep in mind: instruction reordering only guarantees consistent serial semantics within a single thread; it can be used to optimize programs and eliminate CPU stalls in a single-threaded environment, but it does not care about semantic consistency across multiple threads.

2.5.2 Visibility

Visibility refers to whether, when one thread changes the value of a shared variable, other threads can immediately see the changed value. For a serial program, visibility is a non-issue: if we change a variable's value in one operation, subsequent operations will read the variable's new value. In a multi-threaded environment, however, this is not necessarily the case. As we analyzed above, since all of a thread's operations on shared variables are performed on copies in its own working memory before being written back to main memory, there may be a moment when thread A has changed the value of the shared variable i but has not yet written it back to main memory, while another thread B operates on the same shared variable i in main memory; at that moment the value of i in thread A's working memory is not visible to thread B. This kind of visibility problem is caused by the synchronization delay between working memory and main memory. In addition, instruction reordering and compiler optimizations may also cause visibility problems: from the earlier analysis we know that both compiler and processor reordering, in a multi-threaded environment, can lead to out-of-order execution and therefore to visibility problems.
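A classic demonstration of the synchronization delay (my own hedged sketch; whether the loop actually hangs depends on the JIT, but on HotSpot it usually does):

public class VisibilityDemo {
    static boolean stop = false;  // try adding volatile to fix the hang

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy loop: without volatile, the JIT may hoist the read of
                // stop out of the loop, so the write below is never observed
            }
            System.out.println("worker saw stop = true");
        });
        worker.start();
        Thread.sleep(1000);
        stop = true;  // this write may stay invisible to the worker thread
        System.out.println("main set stop = true");
    }
}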

2.5.3 Ordering

Ordering means that for single-threaded code we can always assume the code executes in order. That assumption holds in a single-threaded environment: code executes from top to bottom in the order it was written, and even if reordering occurs, since the premise of all hardware optimizations is to obey as-if-serial semantics, no matter how instructions are rearranged, the execution result of a single-threaded program will not and cannot be affected. We call this ordered execution. By contrast, in a multi-threaded environment out-of-order behavior may appear, because after the program is compiled into machine code the instructions may be reordered, and the reordered instructions may not be in the same order as the original ones. What you should understand is this: in a Java program, if you observe within a single thread, all operations appear ordered; but if you observe one thread from another thread, all operations appear unordered. The first half of that sentence refers to the guarantee of serial semantic consistency within a single thread; the second half refers to instruction reordering and the synchronization delay between main memory and working memory.

2.6. How does the JMM solve these problems in Java?

Having really understood all of this, let's look at the solutions Java offers us. For atomicity: in addition to the atomicity of reads and writes of primitive data types that the JVM itself provides, atomicity at the method level or code-block level can be guaranteed with the synchronized keyword or an implementation class of the Lock interface (a detailed explanation of synchronized, which guarantees all three properties but does not prohibit instruction reordering inside the critical section, will come later). Visibility problems caused by the synchronization delay between working memory and main memory can be solved with locking or with the volatile keyword; both make a variable modified by one thread immediately visible to other threads. The visibility and ordering problems caused by instruction reordering can be solved with the volatile keyword, since another function of volatile is to forbid reordering optimizations, which we will look at later. Besides the guarantees given by the synchronized and volatile keywords (volatile does not guarantee atomicity; it only forbids instruction reordering and solves visibility), the JMM also internally defines a set of happens-before principles to ensure atomicity, visibility, and ordering between two operations in a multi-threaded environment.

2.7 The happens-before principle in the Java Memory Model JMM

2.7.1 Thread interaction with memory during execution

Before understanding the happens-before principle in the JMM, it is necessary to have a simple understanding of how a thread interacts with memory during execution. Java programs are in fact scheduled by the OS as JVM "threads", and the interaction with memory happens during execution. There are eight kinds of memory interaction operations (virtual machine implementations must ensure that each operation is atomic and indivisible; on some platforms there are exceptions for variables of type double and long in the load, store, read, and write operations):

  • Lock: acts on a main-memory variable; marks the variable as exclusively owned by one thread.
  • Unlock: acts on a main-memory variable; releases a locked variable so that it can be locked by another thread.
  • Read: acts on a main-memory variable; transfers the value of the variable from main memory into the thread's working memory, for the subsequent load action.
  • Load: acts on a working-memory variable; puts the variable value obtained by the read operation from main memory into a copy of the variable in working memory.
  • Use: acts on a working-memory variable; transfers the value of the variable in working memory to the execution engine. This operation is performed whenever the virtual machine encounters a bytecode instruction that needs the variable's value.
  • Assign: acts on a working-memory variable; puts a value received from the execution engine into the copy of the variable in working memory.
  • Store: acts on a working-memory variable; transfers the value of the variable from working memory to main memory, for the subsequent write operation.
  • Write: acts on a main-memory variable; puts the value of the variable obtained by the store operation from working memory into the variable in main memory.

The JMM has the following rules for the use of these eight operations:

  • 1) The read/load pair and the store/write pair are not allowed to occur separately: if you read, you must load, and if you store, you must write.
  • 2) A thread is not allowed to discard its most recent assign operation; in other words, any change a thread makes to a variable in its working memory must be synchronized back to main memory.
  • 3) A thread is not allowed to synchronize data from working memory back to main memory for no reason (without any assign having occurred).
  • 4) A new variable can only be born in main memory; working memory is not allowed to use an uninitialized variable directly. That is, load and assign must have been performed on a variable before use and store can be applied to it.
  • 5) Only one thread may lock a variable at a time. If the same thread locks a variable several times, it must execute unlock the same number of times to release it.
  • 6) Locking a variable clears the value of this variable in working memory; before the execution engine can use the variable, a load or assign must be performed again to initialize its value.
  • 7) A variable that is not locked may not be unlocked, and you may not unlock a variable that is locked by another thread.
  • 8) Before unlocking a variable, the variable must first be synchronized back into main memory.

Using these eight rules, together with some special rules for volatile, the JMM can determine which operations are thread-safe and which are not. But these rules are so complicated that it is difficult to apply them directly in practice, so we usually don't analyze with them; more often, we analyze with the happens-before rules in the JMM.

2.7.2 The happens-before principle in JMM

If we had to rely solely on volatile and locking to solve every such problem in multithreaded development, programs would become very cumbersome, and locking essentially turns the parallel execution of multiple threads into serial execution, which greatly affects performance. Is all of that really necessary? No, because the JMM also provides the happens-before principle to help guarantee the atomicity, visibility, and ordering of program execution; it is the basis for judging whether there is a data race and whether a thread is safe. The happens-before rules are as follows:

  • 1. Program order rule: within a single thread, semantic serialization must be guaranteed, i.e. the code executes in program order.
  • 2. Lock rule: an unlock operation happens-before every subsequent lock operation on the same lock; a new lock can only be acquired after the previous unlock.
  • 3. Volatile rule: a write to a volatile variable happens-before subsequent reads of it, which guarantees the visibility of volatile variables. The simple intuition: every time a thread accesses a volatile variable it is forced to read its value from main memory, and whenever the variable changes, the latest value is forced to be flushed to main memory; thus different threads always see its latest value.
  • 4. Thread start rule: a thread's start() method happens-before every action of the started thread. If thread A modifies a shared variable before starting thread B, the modification is visible to thread B once B starts.
  • 5. Transitivity rule: if A happens-before B and B happens-before C, then A happens-before C.
  • 6. Thread termination rule: all operations of a thread happen-before the detection of that thread's termination. The Thread.join() method waits for the joined thread to terminate; if thread B modifies a shared variable before it terminates, the modification is visible to thread A after A returns successfully from B's join() method.
  • 7. Thread interruption rule: a call to a thread's interrupt() method happens-before the interrupted thread's code detects the interruption (for example via Thread.interrupted()).
  • 8. Object finalization rule: the completion of an object's constructor happens-before the start of its finalize() method.

The happens-before rules require no extra means to take effect; they are stipulated by the JMM, and Java programs obey them by default. Now let's use these eight rules to re-examine, through the earlier example, whether a thread has safety problems:

int a = 0;
boolean f = false;

public void methodA() {
    a = 1;
    f = true;
}

public void methodB() {
    if (f) {
        int i = a + 1;
    }
}

As before, thread A calls methodA() on the instance object while thread B calls methodB() on the same instance object: what value of i does thread B read? Checking against the eight rules: the program order rule does not apply, because two threads are calling concurrently; neither methodA() nor methodB() uses synchronization, so the lock rule does not apply; the volatile keyword is not used, so the volatile rule does not apply; the thread start rule, thread termination rule, thread interruption rule, object finalization rule, and transitivity rule also do not fit this test case. Thread A and thread B start at different times, yet the execution result of thread B is uncertain. In other words, the code above matches none of the 8 rules and uses no synchronization means, so it is not thread-safe, and the value thread B reads is indeterminate. The fix is simple: either add synchronization (a lock) to the methodA() and methodB() methods, or add the volatile keyword to the shared variable to ensure that modifications made by one thread are always visible to the others, as sketched below.
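A hedged sketch of the volatile fix (the class wrapper is my own):

public class HappensBeforeFix {
    private int a = 0;
    private volatile boolean f = false;  // volatile rule: write to f happens-before reads of f

    public void methodA() {
        a = 1;       // program order rule: happens-before the write to f...
        f = true;    // ...which the volatile write then publishes
    }

    public void methodB() {
        if (f) {               // once this volatile read sees true, transitivity
            int i = a + 1;     // guarantees a == 1, so i == 2
        }
    }
}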

III. The volatile keyword

3.1 Visibility guaranteed by the Volatile keyword

Volatile is a lightweight synchronization mechanism provided by Java. It guarantees visibility and forbids instruction reordering, but it does not guarantee atomicity. If your program needs atomicity, consider the atomic classes in the java.util.concurrent.atomic package of JUC (more on those in a later chapter) or locking. For now, note that if volatile is used to modify a shared variable, it ensures that one thread's modification of that variable is always visible to other threads, as follows:

volatile int i = 0;

public void add() {
    i++;
}

For the code above, any thread that calls the add() method and performs i++ on i makes the change visible to other threads. But is this code thread-safe? No. Why? Because i++ is not an atomic operation: it is actually a combination of three operations (read the value from main memory, perform +1 in working memory, write the result back to main memory), and execution can be interrupted between any of the three steps. So a thread-safety issue remains (recall case 1 above). Remember: if multiple threads call add(), there is still a thread-safety problem, and you need synchronized, a Lock, or an atomic class to fix it; the volatile keyword only forbids instruction reordering and guarantees visibility. Let's look at another scenario where modifying a variable with volatile does achieve thread safety, as follows:

volatile boolean flag;

public void toTrue() {
    flag = true;
}

public void methodA() {
    while (!flag) {
        System.out.println("i am false.... false..... false.......");
    }
}

Since assigning to the boolean variable flag is an atomic operation, modifying flag with volatile makes the change immediately visible to other threads, and the code is thread-safe. So how does the JMM make a volatile variable immediately visible to other threads? In fact, when a volatile variable is written, the JMM flushes the value of the shared variable from the thread's working memory to main memory; when a volatile variable is read, the JMM invalidates the copy in the thread's working memory, and the thread must read the shared variable again from main memory. Volatile variables become visible to other threads through exactly this write-read pattern (though their memory semantics are implemented through memory barriers, explained later).
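Coming back to the earlier i++ example: a hedged sketch of fixing it without a lock, using AtomicInteger from java.util.concurrent.atomic (the class name is my own):

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {
    private final AtomicInteger i = new AtomicInteger(0);

    // incrementAndGet performs the read-modify-write as one atomic CAS
    // operation, fixing the i++ race without taking a lock.
    public void add() {
        i.incrementAndGet();
    }

    public int get() {
        return i.get();
    }
}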

3.2 How does Volatile disable instruction reordering?

Another use of the volatile keyword is to prevent the compiler or processor from performing instruction reordering optimizations on it, thereby avoiding out-of-order execution in a multi-threaded environment. How does volatile forbid reordering optimizations? Let's start with a concept: the memory barrier. A memory barrier, also known as a memory fence, is a CPU instruction with two roles: ensuring the execution order of particular operations, and ensuring the memory visibility of certain variables (volatile uses this property to implement visibility). Because both the compiler and the processor can perform instruction reordering optimizations, inserting a memory barrier between two instructions tells the compiler and the CPU that no instruction may be reordered across the memory barrier; that is, inserting a memory barrier prevents the instructions before and after it from being reordered past it. Another effect of a memory barrier is to force the flushing of the various CPU caches, so that any thread on any CPU reads the latest version of the data.

Barrier type          Instruction sequence          Description
LoadLoad Barriers     Load1; LoadLoad; Load2        Ensures that Load1 loads its data before Load2 and all subsequent load instructions load theirs.
StoreStore Barriers   Store1; StoreStore; Store2    Ensures that Store1's data is visible to other processors (flushed to memory) before Store2 and all subsequent store instructions write theirs.
LoadStore Barriers    Load1; LoadStore; Store2      Ensures that Load1 loads its data before Store2 and all subsequent store instructions flush theirs to memory.
StoreLoad Barriers    Store1; StoreLoad; Load2      Ensures that Store1's data is visible to other processors (flushed to memory) before Load2 and all subsequent load instructions load theirs. StoreLoad makes all memory access instructions (store and load) before the barrier complete before any memory access instruction after it executes.

The Java compiler inserts memory barrier instructions at appropriate points in the generated instruction sequence to forbid particular types of processor reordering, so the program executes as we expect. The JMM classifies memory barrier instructions into the four categories above. StoreLoad Barriers is a "universal" barrier that has the effect of the other three at once; most modern multiprocessors support it (the other types of barriers are not necessarily supported by every processor).
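As a hedged illustration of where barriers conceptually land around volatile accesses (following the JSR-133 cookbook; the actual emitted code is platform-dependent, and on x86 only the StoreLoad barrier costs anything):

public class VolatileBarrierSketch {
    private volatile int v;  // volatile field: accesses get the barriers below
    private int plain;       // ordinary field: no barriers of its own

    public void write(int x) {
        plain = x;   // ordinary store
        // StoreStore barrier: earlier stores flush before the volatile store
        v = x;       // volatile store
        // StoreLoad barrier: the volatile store becomes visible before later loads
    }

    public int read() {
        int r = v;   // volatile load
        // LoadLoad barrier: later loads cannot float above the volatile load
        // LoadStore barrier: later stores cannot float above it either
        return r + plain;
    }
}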

In short, volatile variables achieve their memory semantics, namely visibility and the prohibition of reordering optimizations, through memory barriers. An example follows:

public class Singleton{
  private static Singleton singleton;
  private Singleton(){}
  public static Singleton getInstance(){
     if(singleton == null){
          synchronized(Singleton.class){
                if(singleton == null){
                      singleton = new Singleton();
               }
          }
      }
      return singleton;
  }
}

The code above is the classic double-checked locking singleton. It is fine in a single-threaded environment, but in a multi-threaded environment thread-safety issues can appear. The reason: a thread may, at the first (unlocked) check, read a singleton reference that is not null but whose object has not yet been initialized. That is because singleton = new Singleton(); can be broken down into the following three steps (pseudocode):

memory = allocate();    // 1. allocate the object's memory
ctorInstance(memory);   // 2. initialize the object
singleton = memory;     // 3. point singleton at the allocated address (now singleton != null)

Because steps 2 and 3 may be reordered, as follows:

memory = allocate();    // 1. allocate the object's memory
singleton = memory;     // 3. point singleton at the allocated address (now singleton != null)
ctorInstance(memory);   // 2. initialize the object

Since there is no data dependency between steps 2 and 3, and the program's execution result does not change in a single thread whether or not they are swapped, this reordering optimization is allowed. However, instruction reordering only guarantees consistent serial semantics within a single thread; it does not care about semantic consistency across threads. So when another thread sees a non-null singleton and uses it, a thread-safety issue arises, because the Singleton instance may not have been initialized yet. To solve this problem, we simply declare the singleton field volatile to forbid the instruction reordering optimization:

private volatile static Singleton singleton;
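Putting it together, a minimal corrected version:

public class Singleton {
    // volatile forbids reordering steps 2 and 3, so no thread can ever
    // observe a non-null reference to a half-constructed instance
    private static volatile Singleton singleton;

    private Singleton() {}

    public static Singleton getInstance() {
        if (singleton == null) {                 // first check, without locking
            synchronized (Singleton.class) {
                if (singleton == null) {         // second check, under the lock
                    singleton = new Singleton();
                }
            }
        }
        return singleton;
    }
}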

IV. Conclusion

If you have read this article carefully, I believe you now have a clear understanding of the Java memory model (JMM). In fact, this article is only the first threshold on the way to exploring concurrent programming in Java; I will continue to publish articles on concurrency topics in the future. If you have any comments or questions about any point in the article, please feel free to share them in the comments section. Thank you!

V. Reference materials and books

  • In-Depth Understanding of the Java Virtual Machine
  • The Beauty of Concurrent Programming in Java
  • High Concurrency Programming in Java
  • The Core Technology of Hundred-Million-Traffic Website Architecture
  • Java Concurrency in Practice