1. What is a memory model

First, the concept of a memory model: we can think of it as an abstraction of how processes read and write a particular memory or cache under a particular operating protocol. Physical computers with different architectures can have different memory models, and the JVM defines its own memory model as well.

As we know, unlike languages such as C that compile to native code, Java compiles to bytecode that runs on the JVM. "Write once, run anywhere" is at the heart of the Java language, and the JVM is what makes it possible: it takes over much of the management and scheduling, most crucially by masking the details of the various operating systems and defining its own model, so that the same bytecode program can execute on any machine with a virtual machine.

2. Main memory and working memory

Those of you who have taken a Principles of Computer Organization course should know that computers have several characteristics:

  1. The CPU cannot read values directly from the hard disk; it can only access a value after it has been loaded into memory.
  2. The CPU cannot compute on values while they sit in memory. It must read them from memory, load them into registers, and compute through components such as the ALU inside the CPU.
  3. To alleviate the speed mismatch between CPU registers and memory, a cache was introduced, usually integrated into the CPU. When we read or write memory, a copy is kept in the cache according to the locality principle; the next time memory is accessed, if a copy with the same address is still present, the faster cache is accessed directly.

Taken together, these points outline the computer's entire storage hierarchy: CPU registers -> CPU cache -> memory -> flash/hard disk. The logic behind it is very complex, but the operating system and the hardware handle it for us, so we never notice it when writing code or using a computer. However, keeping multiple copies of data also introduces read/write consistency problems.
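
A minimal sketch of the locality principle in Java (the array size is an illustrative assumption): traversing a 2D array row by row follows the memory layout and mostly hits the cache, while traversing it column by column jumps across memory and misses far more often.

```java
public class LocalityDemo {
    public static void main(String[] args) {
        int n = 4096;
        int[][] data = new int[n][n];

        long t0 = System.nanoTime();
        long sumRowMajor = 0;
        // Row-major traversal: consecutive accesses touch adjacent
        // addresses, so most of them hit the cache.
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                sumRowMajor += data[i][j];
        long t1 = System.nanoTime();

        long sumColMajor = 0;
        // Column-major traversal: each access jumps a whole row ahead,
        // defeating spatial locality and causing many cache misses.
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++)
                sumColMajor += data[i][j];
        long t2 = System.nanoTime();

        System.out.printf("row-major: %d ms, column-major: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```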

In short, the JVM's main memory corresponds to the memory in our computers, while working memory corresponds to the registers and caches in the CPU. Because of multi-core CPUs, there are also multiple distinct working memories, one corresponding to each core. Which raises a question:

【Question 1】 Suppose core A modifies the value of X while operating on data X in main memory. If core B also expects to modify X, and A finishes its modification first, can B know that A has modified it? This is called the visibility problem. Visibility problems also exist at the hardware level: for example, when the CPU changes a value in the cache, should it update main memory immediately (write-through) or only when the cache line is evicted (write-back)? Different implementations handle this differently; write-back, for instance, needs a marker bit (a dirty bit) to record whether that part of memory has been modified.

Java's memory model specifies that all variables are stored in main memory, and that each thread's working memory holds a copy of the main-memory variables that thread uses. Working memory does not physically exist; it is an abstract concept covering caches, write buffers, registers, and other hardware and compiler optimizations.

The Java memory model dictates that all of a thread's operations on variables must be done in working memory; variables cannot be read or written directly in main memory, and threads cannot directly access variables in each other's working memory. The transfer of variable values between threads must therefore go through main memory. Because of [Question 1], the Java memory model also adopts a series of data synchronization protocols and rules to ensure data consistency.
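
A minimal sketch of [Question 1] in Java (the class and field names are made up for illustration): without any synchronization, the reader thread works on its own copy of `stop` and may never observe the writer's update, so the loop can spin forever.

```java
public class VisibilityDemo {
    // Not volatile: the reader may keep using a stale copy
    // from its own working memory.
    static boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!stop) {
                // Busy-wait; with no synchronization, the JIT may even
                // hoist the read of `stop` out of the loop entirely.
            }
            System.out.println("reader saw stop = true");
        });
        reader.start();

        Thread.sleep(1000);
        stop = true; // may never become visible to the reader
    }
}
```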

3. Instruction reordering

Leaving the memory model aside for a moment, let's look at instruction reordering.

Instruction reordering means that, for performance reasons, the compiler and the CPU may rearrange the order of instructions while a program executes. Also from Principles of Computer Organization, we know that the execution of an instruction is divided into several cycles. The common ones are fetch, decode, and execute; some instructions have additional cycles, such as interrupt and indirect addressing. Another important concept is the instruction pipeline. If all instructions executed strictly in sequence, and every instruction took 3 cycles (fetch, decode, execute), then executing 1000 instructions would need 3 * 1000 = 3000 cycle times. However, fetch mostly accesses memory, decode is handled by the CPU's internal decoding circuitry, and execute is mainly the ALU's work. The three stages use different components, so there is no need for serial execution: while instruction 1 is being decoded, instruction 2 can enter the fetch stage, forming an instruction pipeline.

As shown in the figure above, compare what each group has done within 8 cycles: the pipelined group has already finished executing instruction 6, while the non-pipelined group has only just finished decoding instruction 3. To execute 1000 instructions, the pipeline needs only 2 + 1000 = 1002 cycle times, roughly two-thirds less than without an instruction pipeline.

However, there can be data dependencies between instructions. For example, instruction 1 computes a result X, and instruction 2 depends on X; instruction 2 then has to wait, after its own fetch and decode, for instruction 1 to execute and write its result back before it can proceed. See reference source 3 for details.

This waiting costs time. Without violating data dependencies or the overall logic, we can reasonably move later, independent instructions earlier, which reduces the total number of clock cycles. This is instruction reordering.
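
A minimal sketch of what reordering can produce across threads (a classic litmus test; the class name is made up): within each thread the two statements are independent, so the compiler or CPU is free to swap them, and the outcome r1 == 0 && r2 == 0, impossible under any sequential interleaving, can actually be observed.

```java
public class ReorderingLitmus {
    static int x = 0, y = 0;
    static int r1, r2;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            x = 1;  // independent of the next line...
            r1 = y; // ...so the two may be reordered
        });
        Thread t2 = new Thread(() -> {
            y = 1;
            r2 = x;
        });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Under a strict interleaving at least one of r1, r2 must be 1;
        // with reordering, (r1 == 0 && r2 == 0) is also possible.
        System.out.println("r1 = " + r1 + ", r2 = " + r2);
    }
}
```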

Instruction reordering is fine in a single-threaded environment. In a multi-threaded environment, however, where one core's operations are not immediately visible to another, logically dependent code can produce unpredictable results. See [4.3].

4. Memory interaction

Java memory interaction has three major properties: atomicity, visibility, and ordering. Ultimately, all three serve the consistency of multithreaded data, so that a program runs as intended under multithreaded concurrency and instruction-reordering optimizations.

4.1 Atomicity

Atomicity: one or more operations either all execute completely (without interruption by any other factor) or do not execute at all.

Even when multiple threads run concurrently, once such an operation has started, it cannot be disturbed by other threads. In Java bytecode, two instructions, monitorenter and monitorexit, are provided to ensure atomicity; their Java-language equivalent is synchronized.

Therefore, synchronized can be used in Java to ensure that operations within methods and code blocks are atomic.
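
A minimal sketch (the counter class is made up for illustration): marking the methods synchronized turns the read-modify-write of count into an indivisible unit, so concurrent callers cannot interleave inside it.

```java
public class SynchronizedCounter {
    private int count = 0;

    // monitorenter/monitorexit wrap this method's body, so the
    // read-increment-write sequence executes as one atomic unit.
    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }
}
```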

4.2 Visibility

Visibility: When multiple threads access the same variable, if one thread changes the value of the variable, other threads can immediately see the changed value.

The Java memory model relies on main memory as the transfer medium: a thread must synchronize a changed variable's new value back to main memory, and refresh a variable's value from main memory before reading it.

Java implements visibility across threads in the following ways (a brief sketch of each follows the list):

  • volatile
  • synchronized
  • final
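
A sketch with made-up names, where each field relies on a different mechanism for its visibility guarantee:

```java
public class VisibilityExamples {
    // volatile: every write is flushed to main memory and every
    // read refreshes from it.
    private volatile boolean ready = false;

    // synchronized: releasing the lock publishes changes to main
    // memory; acquiring it re-reads them.
    private int counter = 0;

    // final: once the constructor finishes, the field's value is
    // visible to any thread that sees the object safely published.
    private final String name;

    public VisibilityExamples(String name) { this.name = name; }

    public void markReady()              { ready = true; }
    public boolean isReady()             { return ready; }
    public synchronized void increment() { counter++; }
    public synchronized int count()      { return counter; }
    public String name()                 { return name; }
}
```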

4.3 Ordering

Ordering is considered in two scenarios, intra-thread and inter-thread:

  • Intra-thread: within a single thread, instructions execute with as-if-serial semantics, the same guarantee sequential programming languages have always provided.
  • Inter-thread: when one thread "watches" another thread concurrently execute unsynchronized code, any operations may appear out of order because of instruction-reordering optimizations.

The only constraints that remain are that operations on synchronized methods and blocks (code guarded by the synchronized keyword) and on volatile fields stay ordered.

How does instruction reordering affect order between multiple threads?

When code contains control dependencies, they limit how much of the instruction sequence can execute in parallel. To cope, compilers and processors employ speculative execution to overcome the effect of control dependencies on parallelism. For example, suppose thread A initializes isAccept to false, performs a computation, and then sets isAccept accordingly, while thread B reads isAccept and acts on it. Speculating that it can save time, thread B may take the value early, before thread A has actually finished, and observe a stale result. That is why we must ensure the visibility of shared variables.
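
A minimal sketch of that scenario (field and method names other than isAccept are made up): thread A fills in data and then raises the flag; thread B checks the flag and uses the data. Because neither field is volatile, reordering or speculation can let B see isAccept == true while data is still stale.

```java
public class SpeculationDemo {
    static int data = 0;
    static boolean isAccept = false; // should be volatile

    static void writer() {            // thread A
        data = 42;                    // (1)
        isAccept = true;              // (2) may be reordered before (1)
    }

    static void reader() {            // thread B
        if (isAccept) {
            // Without volatile, B may pass the check yet still
            // read the old value of data (e.g. 0).
            System.out.println("data = " + data);
        }
    }

    public static void main(String[] args) {
        new Thread(SpeculationDemo::writer).start();
        new Thread(SpeculationDemo::reader).start();
    }
}
```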

5. Memory barriers

How does Java ensure ordering and visibility at the bottom layer? Through memory barriers. A memory barrier is an instruction inserted between two CPU instructions that prevents the processor from reordering them (like a fence). In addition, to achieve its effect, a barrier also forces the processor to flush its write buffer to main memory before a write and to drain its invalidate queue before a read, thereby ensuring visibility. See source 1 for details.
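
Java does not normally expose barriers directly; the JVM emits them for volatile and synchronized. Since Java 9, however, VarHandle offers explicit fence methods, which give a rough feel for them (a sketch only, not something typical application code needs):

```java
import java.lang.invoke.VarHandle;

public class FenceDemo {
    static int data = 0;
    static boolean ready = false;

    static void writer() {
        data = 42;
        // releaseFence: loads and stores before it cannot be
        // reordered with stores after it.
        VarHandle.releaseFence();
        ready = true;
    }

    static void reader() {
        if (ready) {
            // acquireFence: loads before it cannot be reordered
            // with loads and stores after it.
            VarHandle.acquireFence();
            System.out.println("data = " + data);
        }
    }

    public static void main(String[] args) {
        new Thread(FenceDemo::writer).start();
        new Thread(FenceDemo::reader).start();
    }
}
```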

6. Volatile

volatile is a lightweight synchronization mechanism provided by the JVM that guarantees the visibility of a variable across multiple threads.

volatile variables have two properties:

  1. They ensure the variable is visible to all threads: when a thread changes the value of a volatile variable, the new value becomes known to other threads immediately. Ordinary variables cannot do this; their values are passed between threads through main memory with no immediacy guarantee.
  2. They forbid instruction reordering.

If a field is volatile, the Java memory model inserts a write-barrier instruction after each write to it and a read-barrier instruction before each read. This means that when you write to a volatile field, you know:

  1. Once the write is done, any thread that accesses the field will get the latest value.
  2. Before your write, everything that happened earlier is guaranteed to have happened, and any updated values are visible, because the memory barrier flushes all previously written values back to main memory.

The JVM flushes the modified value back to main memory so that all threads can get the new value, but if another CPU core changed the value in the meantime, that modification is lost and overwritten. An update goes through several steps, from load through compute to store and finally the memory barrier, and only that last step guarantees the variable is visible to all threads. This is why the volatile keyword does not guarantee atomicity.

volatile forbids instruction reordering and thus naturally guarantees ordering, but volatile variables do not make concurrent operations safe. If you think the volatile keyword has the same effect as synchronized, only scoped to a variable, you are wrong. Suppose multiple threads concurrently increment a volatile variable X: each of six threads runs a for loop executing X++ 5000 times. The result should be 5000 * 6 = 30000, but the actual result may be less than 30000.

The reason is simple: two threads read X as 1000 at the same time, both increment it to 1001 in their own working memory, and both write 1001 back to main memory. Two increments have taken effect as one, so the final result is naturally smaller.
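
A runnable sketch of the experiment described above (thread and loop counts as in the text; the exact shortfall varies from run to run):

```java
public class VolatileCounterDemo {
    static volatile int x = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[6];
        for (int t = 0; t < 6; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 5000; i++) {
                    x++; // read-modify-write: not atomic, even for volatile
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // Expected 30000, but lost updates usually make it smaller.
        System.out.println("x = " + x);
    }
}
```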

Thus, the volatile keyword guarantees visibility and ordering, but not the atomicity of operations on a variable.

7. A third thread-safe singleton pattern

Thread-safe singleton patterns built on synchronized method locking or on static inner classes are common, but based on the properties of volatile described above, we can implement another thread-safe singleton using volatile together with a synchronized block.

The synchronized-method version has an obvious disadvantage: the lock granularity is too large. It locks the entire method, so no other thread can acquire the lock and enter the method body.

```java
class Singleton {
    private static Singleton instance = null;

    // Because the whole method is synchronized, every call to
    // getInstance() contends for the lock at method entry.
    public synchronized static Singleton getInstance() {
        if (instance == null) {
            instance = new Singleton();
        }
        return instance;
    }
}
```

In fact, the sole purpose of the lock is to prevent the instance from being built more than once: only the first creation inside getInstance needs mutual exclusion. The second, third, and later accesses create nothing, so making them contend for the lock is completely unnecessary. Let's change the method a little:

```java
class Singleton2 {
    private static Singleton2 instance = null;

    public static Singleton2 getInstance() {
        if (instance == null) {
            synchronized (Singleton2.class) {
                if (instance == null) {
                    instance = new Singleton2();
                }
            }
        }
        return instance;
    }
}
```

Once the instance has been created, later calls only hit the outer if check: instance is no longer null, so they return it directly without ever getting stuck at a synchronized method entry, which speeds up access.

However, as we learned above, there is not just one copy of instance in memory. There is one in main memory, but each thread's working memory also holds its own copy, and the Java memory model only keeps them consistent through its data synchronization protocols and rules. Suppose thread A creates and initializes instance; thread B may not know this immediately, still see null, and start creating the instance as well, so the singleton breaks because of the lag in propagating the update. (In addition, instruction reordering inside instance = new Singleton2() can publish a non-null reference before the constructor finishes, which volatile also forbids; see the sketch after the code below.) We need a way for other threads to know immediately when thread A updates the value. Combining this with the volatile property described above, "once the write is done, any thread that accesses the field will get the latest value," we solve this visibility problem by declaring instance volatile:

```java
class SingletonDCL {
    private volatile static SingletonDCL instance = null;

    public static SingletonDCL getInstance() {
        if (instance == null) {
            synchronized (SingletonDCL.class) {
                if (instance == null) {
                    instance = new SingletonDCL();
                }
            }
        }
        return instance;
    }
}
```
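
A sketch of the construction hazard mentioned above (pseudo-steps for illustration, not actual bytecode):

```java
// Roughly what `instance = new SingletonDCL()` decomposes into:
//   1. memory = allocate();          // reserve space for the object
//   2. invokeConstructor(memory);    // initialize the object's fields
//   3. instance = memory;            // point the reference at it
// Without volatile, steps 2 and 3 may be reordered; another thread can
// then observe a non-null `instance` whose fields are not yet set.
```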

This is the third thread-safe lazy-initialization singleton pattern: double-checked locking (DCL).

Reference source
  1. Java memory model
  2. What about main memory and working memory?
  3. Java memory model and instruction reordering
  4. Why does volatile guarantee visibility? (Memory barrier)