Introduction to the cache

In a modern computer, the CPU executes instructions far faster than main memory can be accessed. Because the two speeds differ by several orders of magnitude, modern computer systems add a layer of cache, whose read/write speed is as close as possible to the CPU's, as a buffer between memory and the processor: the data the CPU needs is copied into the cache, the CPU operates on that copy, and when the operation completes the copy is synchronized from the cache back to memory, so the processor does not have to wait on slow memory reads and writes.

Read cache

When the CPU wants to read a piece of data, it first looks for it in the level-1 cache, then in the level-2 cache if it doesn’t find it, and then in the level-3 cache or memory if it still doesn’t find it.

SMP structure

During operation, the computer fetches the first instruction from memory, decodes it in the controller, fetches the required operands from memory as the instruction demands, performs the logical or arithmetic operation, writes the result back to the specified memory address, then fetches the second instruction, and continues in this fashion.

SMP logical structure (CPU and cache consistency)

  • Single thread: the CPU core's cache is accessed by only one thread. The cache is exclusive to that thread, so there are no access conflicts.
  • Single-core CPU, multiple threads: the threads of a process can access shared data in the process concurrently. The CPU loads a block of memory into its cache, and different threads that access the same physical address map to the same cache location, so the cache stays valid across thread switches. Since only one thread executes at a time, there is still no access conflict.
  • Multi-core CPU, multiple threads: each core has at least one L1 cache of its own. When several threads access the same shared memory and execute on different cores, each core keeps its own copy of that shared memory in its cache. Because the cores run in parallel, multiple threads may write to their own copies at the same time, and the copies can become inconsistent. This cache inconsistency is a root cause of concurrency problems.

CPU multi-core cache architecture (MESI implementation)

To solve this problem, each processor must follow a protocol when accessing its cache and operate according to that protocol when reading and writing. One such protocol is MESI (a cache coherence protocol).

  1. Memory-access behavior differs between hardware vendors and operating systems, so code that runs correctly and thread-safely on one system can exhibit all sorts of problems on another.
  2. Different processors also differ in how they optimize and reorder instructions, so the same code can produce different results after being optimized and reordered by different processors, which is unacceptable.
  3. The Java Memory Model (JMM) was created to address this: in a concurrent environment it keeps data safe by guaranteeing visibility, atomicity, and ordering.
| State | Description | Bus listening task |
| --- | --- | --- |
| M (Modified) | The cache line is valid. Its data has been modified and is inconsistent with main memory; the data exists only in this cache. | Must listen for all attempts by other caches to read this line from main memory; such reads are deferred until this cache writes the line back to main memory and changes its state to S (Shared). |
| E (Exclusive) | The cache line is valid. Its data is consistent with main memory and exists only in this cache. | Must listen for other caches reading this line from main memory; when that happens, the line becomes S (Shared). |
| S (Shared) | The cache line is valid. Its data is consistent with main memory and exists in multiple caches. | Must listen for requests from other caches to invalidate or take exclusive ownership of this line, and then set it to I (Invalid). |
| I (Invalid) | The cache line is invalid. | None. |
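
To make the state transitions concrete, here is a minimal, purely illustrative Java sketch of how a single cache line might move between the four MESI states. The class and method names are invented for this example; real coherence logic lives in hardware, not in software:

```java
// Toy model of MESI state transitions for one cache line (illustrative only).
enum MesiState { MODIFIED, EXCLUSIVE, SHARED, INVALID }

class CacheLine {
    MesiState state = MesiState.INVALID;

    // This core reads the line; othersHaveCopy would come from a bus snoop.
    void localRead(boolean othersHaveCopy) {
        if (state == MesiState.INVALID) {
            state = othersHaveCopy ? MesiState.SHARED : MesiState.EXCLUSIVE;
        }
        // In M, E, or S the read hits and the state is unchanged.
    }

    // This core writes the line; copies in other caches are assumed invalidated.
    void localWrite() {
        state = MesiState.MODIFIED;
    }

    // Another core reads this line (observed by snooping the bus).
    void remoteRead() {
        if (state == MesiState.MODIFIED) {
            writeBackToMemory(); // flush dirty data before sharing
        }
        if (state != MesiState.INVALID) {
            state = MesiState.SHARED;
        }
    }

    // Another core writes this line: our copy becomes stale.
    void remoteWrite() {
        if (state == MesiState.MODIFIED) {
            writeBackToMemory();
        }
        state = MesiState.INVALID;
    }

    private void writeBackToMemory() { /* placeholder for the hardware write-back */ }
}
```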

CPU thread switching causes atomicity issues (atomicity)

Atomicity: the property that an operation, or a set of operations, is treated as a single whole that cannot be interrupted during execution.

  • To maximize CPU utilization, the CPU uses time slicing and switches between threads. A thread can be switched out in the middle of a multi-step operation, which breaks atomicity, as the sketch below shows.
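
A minimal demonstration of the problem (the class name and iteration counts are arbitrary): two threads increment a plain shared counter. Because count++ is really three steps (read, add, write back), a thread switch between the steps loses updates:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LostUpdate {
    static int count = 0; // plain shared field; count++ is not atomic

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                count++; // read count, add 1, write back: interruptible in the middle
            }
        };
        pool.submit(task);
        pool.submit(task);
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // Usually prints less than 200000 because increments were lost.
        System.out.println(count);
    }
}
```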

Instruction reordering problem (ordering)

The essence of processes and threads is to increase CPU execution efficiency by increasing the number of parallel/concurrent tasks. The essence of caching is to increase CPU utilization by reducing I/O time.


  1. CPU instruction optimization aims to improve execution efficiency by adjusting the order in which instructions execute or by performing operations asynchronously.

  2. To keep the processor's execution units as busy as possible, the processor may execute the input code out of order and then reassemble the out-of-order results after computation. The guiding principle is that reordering must not change the result of single-threaded execution (as-if-serial semantics).

  3. The general logic of reordering is to issue the CPU's most time-consuming instructions first and use the otherwise idle time to execute other instructions. The Java virtual machine's just-in-time compiler performs a similar instruction-reordering optimization.

(Reordering happens at three levels: compile-time (semantic) instruction reordering -> processor-level instruction reordering -> memory-system reordering (memory access optimization).)
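
The classic Java illustration of this is one thread writing two plain (non-volatile) fields while another thread reads them. This is a sketch rather than a reliable reproduction; whether the anomaly actually shows up depends on the JIT compiler and the hardware:

```java
public class Reordering {
    static int a = 0;
    static boolean flag = false; // plain field: writes may be reordered

    static void writer() {
        a = 1;        // (1)
        flag = true;  // (2) may become visible before (1) on some platforms
    }

    static void reader() {
        if (flag) {                // observed (2)
            System.out.println(a); // may print 0 if (1) is not yet visible
        }
    }

    public static void main(String[] args) {
        new Thread(Reordering::writer).start();
        new Thread(Reordering::reader).start();
        // Each thread alone obeys as-if-serial semantics; only the
        // cross-thread view can observe the reordering.
    }
}
```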


The memory model

To guarantee the correct use of shared memory (visibility, atomicity, ordering), a memory model defines the rules for multithreaded reads and writes of shared memory. These rules govern how main memory is read and written and thereby ensure that instructions execute correctly.

It addresses the memory-access problems caused by multi-level CPU caches, processor optimization, and instruction reordering, and guarantees visibility, atomicity, and ordering in concurrent scenarios.

Java Memory Model (JMM)

The JMM is a specification designed to solve the problems that arise when multiple threads communicate through shared memory: inconsistent working-memory data, compiler reordering of instructions, out-of-order execution by the processor, and CPU thread switching.

  • The JMM matured with JSR-133, which was released with Java 5;

The JMM is such a memory-model specification. It shields Java programs from the memory-access differences between hardware platforms and operating systems, so that Java programs access memory consistently on every platform. (It is a specification in the same sense as the JVM specification.)

The Java thread memory model specifies:

  1. All variables are stored in main memory, and each thread has its own working memory, which holds copies of the main-memory variables the thread uses. All of a thread's operations on variables must be performed in its working memory rather than directly against main memory.

  2. Threads cannot directly access variables in each other's working memory. Passing variable values between threads requires synchronizing data between each thread's working memory and main memory, and the model also specifies how and when that synchronization happens, as the sketch below illustrates.
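
A common consequence of these working-memory copies is a stale read. In this sketch (field and class names are arbitrary), the worker thread may spin forever on some JVMs, because the JIT is free to keep reusing the copy of stop in the thread's working memory instead of rereading main memory:

```java
public class StaleRead {
    static boolean stop = false; // plain field: each thread may cache its own copy

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stop) {
                // Busy loop. The read of stop can be hoisted out of the loop,
                // so the thread never sees the main thread's update.
            }
            System.out.println("worker stopped");
        });
        worker.start();
        Thread.sleep(1000);
        stop = true;   // written toward main memory, but the worker may never reload it
        worker.join(); // may hang forever without volatile (see the visibility section)
    }
}
```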

Memory interaction operations

The JMM defines eight memory interaction operations, and a virtual machine implementation must ensure that each of them is atomic and indivisible (a mapping sketch follows the list):

  • lock: acts on a main-memory variable; marks the variable as exclusively owned by one thread
  • unlock: acts on a main-memory variable; releases a locked variable so that other threads can lock it
  • read: acts on a main-memory variable; transfers the variable's value from main memory to the thread's working memory for the subsequent load
  • load: acts on a working-memory variable; puts the value obtained by read into the working-memory copy of the variable
  • use: acts on a working-memory variable; passes the value of the working-memory copy to the execution engine. The virtual machine performs this whenever it encounters an instruction that needs the variable's value
  • assign: acts on a working-memory variable; puts a value received from the execution engine into the working-memory copy of the variable
  • store: acts on a working-memory variable; transfers the value of the working-memory copy to main memory for the subsequent write
  • write: acts on a main-memory variable; puts the value obtained by store into the main-memory variable
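
As a rough conceptual mapping (the JVM actually works at the bytecode level, and later revisions of the JMM no longer describe these eight operations literally), here is how a simple increment of a shared field touches them:

```java
// Conceptual sketch: how "a = a + 1" maps onto the eight operations.
public class InteractionSketch {
    static int a = 10;

    static void increment() {
        // read   : transfer the value of a from main memory to working memory
        // load   : put that value into the working-memory copy of a
        // use    : hand the copy to the execution engine to compute a + 1
        // assign : put the result from the execution engine back into the copy
        // store  : transfer the new value from working memory toward main memory
        // write  : put that value into the main-memory variable a
        a = a + 1;
        // With synchronized, lock/unlock would bracket this sequence and force
        // the store/write to complete before the lock is released.
    }
}
```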

The JMM lays down the following rules for using these eight operations:

  1. read and load, and store and write, must occur in pairs; neither operation of a pair is allowed to appear alone. A value that is read must be loaded, and a value that is stored must be written.

  2. A thread may not discard its most recent assign operation: once a variable has changed in working memory, the change must be synchronized back to main memory. Conversely, a thread may not synchronize a variable from working memory back to main memory without a prior assign.

  3. A new variable can only be created in main memory; working memory may not use an uninitialized variable directly. Before use or store is performed on a variable, load or assign must have been performed on it first.

  4. Only one thread may lock a variable at a time, but the same thread may lock it repeatedly; the variable is released only after the same number of unlock operations.

  5. Performing lock on a variable clears its value from working memory; before the execution engine can use the variable, a load or assign must be performed to reinitialize its value.

  6. A thread may not unlock a variable that has not been locked, nor unlock a variable locked by another thread.

  7. Before unlock is performed on a variable, the variable must first be synchronized back to main memory (via store and write).

The three characteristics

Visibility

When multiple threads access the same variable, if one thread changes the value of the variable, other threads can immediately see the changed value.

In Java, visibility is provided by the volatile keyword: a write to a volatile variable is immediately synchronized to main memory, and every read fetches the latest value from main memory. This is implemented mainly with memory barriers;

The Java keywords synchronized and final also provide visibility;
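
A minimal sketch of volatile visibility, mirroring the stale-read example earlier: marking the flag volatile forces the writer's update out to main memory and forces the reader to reload it, so the loop terminates promptly:

```java
public class VolatileVisibility {
    static volatile boolean stop = false; // volatile: writes flushed, reads reloaded

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stop) { /* spin until the flag becomes visible */ }
            System.out.println("worker sees stop == true and exits");
        });
        worker.start();
        Thread.sleep(1000);
        stop = true;   // immediately visible to the worker thread
        worker.join(); // terminates promptly, unlike the non-volatile version
    }
}
```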

Atomicity

  • An operation is treated as a single unit: either all of it is performed or none of it is;

  • To guarantee atomicity, Java provides the bytecode instructions monitorenter and monitorexit, which underlie the synchronized keyword (see the sketch below);
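
A minimal sketch of atomicity via synchronized (the class and field names are arbitrary): the synchronized block compiles to monitorenter/monitorexit around the critical section, so the read-add-write of the counter runs as one indivisible unit:

```java
public class AtomicCounter {
    private final Object lock = new Object();
    private int count = 0;

    public void increment() {
        synchronized (lock) { // monitorenter ... monitorexit around the block
            count++;          // the read-add-write can no longer be interleaved
        }
    }

    public int get() {
        synchronized (lock) {
            return count;
        }
    }
}
```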

Ordering

  • Within a single thread, the program executes in the order the code is written;

  • Both synchronized and volatile can ensure ordering between threads, but they work differently: volatile forbids instruction reordering around the variable, while synchronized ensures that only one thread at a time executes the protected code.

At this point synchronized appears to satisfy all three properties, which is why it is used so often. However, it carries a performance cost, and although the compiler provides many lock-optimization techniques, overusing synchronized is not recommended.

Happens-Before rules

Most of the time, analyzing whether a concurrent program is safe comes down to the happens-before principle.

That is, if operation A happens-before operation B, then when B executes, the effects of A are observable by B. "Effects" include modified values of shared variables in memory, messages sent, methods called, and so on.

  • Program order rule: within a single thread, operations happen-before later operations in program order; the code appears to execute from top to bottom.

  • Monitor lock rule: an unlock of a lock happens-before every subsequent lock of that same lock. Equivalently, for a given synchronized lock, everything an earlier holder did inside the lock is fully visible to the next thread that acquires it.

  • Volatile variable rule: a write to a volatile variable happens-before every subsequent read of that variable

  • Thread start rule: a call to Thread.start() happens-before every action of the started thread

  • Thread termination rule: all operations in a thread happen-before the detection that the thread has terminated (for example via Thread.join() returning or Thread.isAlive() returning false)

  • Thread interruption rule: a call to Thread.interrupt() happens-before the interrupted thread's code detects the interruption (for example via Thread.interrupted() or isInterrupted())

  • Object finalizer rule: the completion of an object's constructor happens-before the start of its finalize() method

  • Transitivity: if operation A happens-before operation B and B happens-before C, then A happens-before C, as the sketch below illustrates
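
A short sketch combining three of these rules (program order, volatile variable, transitivity); the field names are arbitrary. The write to data happens-before the volatile write to ready (program order), which happens-before the reader's volatile read (volatile rule), so by transitivity the reader is guaranteed to print 42:

```java
public class HappensBefore {
    static int data = 0;
    static volatile boolean ready = false;

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            data = 42;    // (1) program order: happens-before (2)
            ready = true; // (2) volatile write
        });
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin */ } // (3) volatile read: (2) happens-before (3)
            System.out.println(data);     // transitivity: guaranteed to print 42
        });
        writer.start();
        reader.start();
    }
}
```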