The memory model of modern computers

In modern computers, the CPU executes instructions far faster than memory can be accessed. Because there is a gap of several orders of magnitude between CPU speed and memory speed, modern computer systems add a layer of cache, whose read and write speed is as close as possible to the CPU's operating speed, to serve as a buffer between memory and the processor. The CPU operates on a copy of the data held in the cache and synchronizes the copy back to memory when the operation is complete, so that the processor does not have to wait for slow memory reads and writes.

When the CPU wants to read a piece of data, it first looks in the level 1 cache. If it does not find it there, it looks in the level 2 cache, then in the level 3 cache, and finally in main memory.
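The effect of this hierarchy can be seen from ordinary code. The sketch below (the class name and array size are illustrative, and this is not a rigorous benchmark) traverses the same array in two orders; the row-major traversal usually runs noticeably faster because it reads memory sequentially, one cache line at a time, while the column-major traversal keeps jumping across cache lines:

```java
// A minimal sketch hinting at the cache hierarchy: row-by-row traversal
// touches memory sequentially and stays cache-friendly, while
// column-by-column traversal strides across cache lines.
public class CacheLocalityDemo {
    static final int N = 4096;
    static final int[][] matrix = new int[N][N];

    public static void main(String[] args) {
        long sum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < N; i++)          // row-major: sequential access
            for (int j = 0; j < N; j++)
                sum += matrix[i][j];
        System.out.println("row-major:    " + (System.nanoTime() - start) / 1_000_000 + " ms");

        start = System.nanoTime();
        for (int j = 0; j < N; j++)          // column-major: strided access
            for (int i = 0; i < N; i++)
                sum += matrix[i][j];
        System.out.println("column-major: " + (System.nanoTime() - start) / 1_000_000 + " ms (sum=" + sum + ")");
    }
}
```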

The creation of processes and threads

In the early days, a CPU ran only one program at a time, and much of its computing power was wasted: after receiving a task, the program spent most of its time waiting for I/O. The idea, then, was to run several programs at once, switching to another program to continue executing instructions whenever one program had to wait, thereby improving CPU utilization. Processes and threads grew out of this idea.

With processes, memory is divided into separate regions, each managed by its own process; when a process blocks on I/O, the CPU can switch to another process and continue executing instructions. To allocate the CPU to each process reasonably and fairly, CPU time is divided into small segments: each process executes for one segment and then yields to another process. This is the concept of the time slice. With processes we can run multiple programs at the same time.

Because processes have independent memory spaces, switching between processes requires switching the memory mapping, whereas all the threads created by a process share that process's memory space. A thread switch therefore costs much less than a process switch, which is why operating systems schedule at the more lightweight level of threads.
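As a small illustration of threads sharing one address space (the class and field names below are illustrative), two threads in the same process can read and write the same heap variable directly, with no inter-process communication needed:

```java
// A minimal sketch: threads created inside the same process share the
// same heap, so both can see the same object without any copying.
public class SharedHeapDemo {
    static int shared = 0;   // lives on the shared heap, visible to all threads

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> shared = 42);
        writer.start();
        writer.join();                  // wait for the writer to finish
        System.out.println(shared);     // prints 42: same memory space
    }
}
```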

Principles of modern computer hardware structure

When a computer runs, the first instruction is fetched from memory and decoded by the controller; as the instruction requires, data is fetched from memory and the specified logical operation is carried out; the result is then written back to memory at the given address. The second instruction is then fetched, and execution proceeds through the sequence in the same way.
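The cycle can be sketched as a toy interpreter (the instruction format below is invented purely for illustration):

```java
// A toy sketch of the fetch-decode-execute cycle described above.
// Each "instruction" is an int array {opcode, srcAddr1, srcAddr2, dstAddr},
// with opcode 0 = ADD. This is not a real instruction set.
public class TinyVonNeumann {
    public static void main(String[] args) {
        int[] memory = {1, 10, 2, 0};                     // a tiny "main memory"
        int[][] program = { {0, 0, 2, 3}, {0, 3, 1, 3} }; // mem[3]=mem[0]+mem[2]; mem[3]+=mem[1]

        for (int pc = 0; pc < program.length; pc++) {
            int[] inst = program[pc];                     // fetch the next instruction
            if (inst[0] == 0) {                           // decode: opcode 0 is ADD
                // execute the operation, then write the result back to memory
                memory[inst[3]] = memory[inst[1]] + memory[inst[2]];
            }
        }
        System.out.println(memory[3]);                    // prints 13 (1 + 2 + 10)
    }
}
```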

Root causes of concurrency problems

CPU and cache consistency issues (visibility)

  • Single-core, single-threaded: the core's cache is accessed by only one thread. The thread has exclusive use of it, so no access conflicts arise.
  • Single-core, multi-threaded: multiple threads access the same block of memory data; the CPU copies the data from memory into the cache, and since every thread runs on the same core, they all map to the same cached copy. Even as execution switches between threads, the cached data does not become stale, and because only one thread executes at any moment, no access conflicts arise.
  • Multi-core, multi-threaded: multiple threads access the same shared memory while running on different CPU cores. Each core keeps its own copy of the shared data in its cache, so when the threads operate on the data, they are actually operating on separate copies.

Because a cache sits between each CPU core and memory, in a multi-threaded program the cached data may become inconsistent: the copies held by different cores may no longer contain the same values.
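The classic symptom of this visibility problem looks like the sketch below (the class name is illustrative; whether the loop actually hangs depends on the JIT and the hardware, so treat it as an illustration rather than a guaranteed reproduction):

```java
// A minimal sketch of the visibility problem: without synchronization,
// the reader thread may keep using a stale cached copy of `running`
// and never observe the main thread's update.
public class VisibilityDemo {
    static boolean running = true;   // no volatile: visibility is not guaranteed

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (running) { }      // may spin forever on a stale copy
            System.out.println("reader saw the update");
        });
        reader.start();

        Thread.sleep(100);
        running = false;             // this write may stay invisible to the reader
    }
}
```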

CPU multi-core cache architecture

To solve this problem, each processor must follow a protocol when accessing the cache and operate according to it when reading and writing. A well-known such cache coherence protocol is MESI, named after its four cache-line states: Modified, Exclusive, Shared, and Invalid.

| State | Description | Listening task |
| --- | --- | --- |
| M (Modified) | The cache line is valid; the data has been modified and is inconsistent with main memory; the data exists only in this cache. | Must listen for any attempt by other caches to read this line's location in main memory; such reads must be delayed until this cache writes the line back to main memory and changes its state to S (Shared). |
| E (Exclusive) | The cache line is valid; the data is consistent with main memory and exists only in this cache. | Must listen for other caches reading this line from main memory; when that happens, the line changes to the S (Shared) state. |
| S (Shared) | The cache line is valid; the data is consistent with main memory and may exist in multiple caches. | Must listen for requests from other caches to invalidate or take exclusive ownership of this line, upon which the line becomes Invalid. |
| I (Invalid) | The cache line is invalid. | None. |

CPU thread switching causes atomicity issues (atomicity)

Atomicity: The property of treating one or more operations as a whole and not being interrupted during execution is called atomicity.

In order to maximize the utilization of the CPU, the CPU uses time slices to switch between threads. This can lead to atomicity problems during thread switching.
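The textbook example of this is a shared counter (the class name below is illustrative). `count++` is really a read-modify-write sequence, so a thread switch in the middle can lose updates:

```java
// A minimal sketch of the atomicity problem: two threads each increment
// a shared counter 10,000 times, but count++ is not atomic, so the final
// value is usually less than 20,000.
public class AtomicityDemo {
    static int count = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) count++;  // read, add, write: interruptible
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println(count);   // often < 20000 due to lost updates
    }
}
```

Replacing the `int` with `java.util.concurrent.atomic.AtomicInteger`, or guarding the increment with `synchronized`, makes the operation atomic again.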

Instruction reordering problems (ordering)

The essence of processes and threads is to increase the number of tasks running in parallel and so improve CPU efficiency; the essence of caching is to increase CPU utilization by reducing time spent waiting on I/O. CPU instruction optimization, likewise, aims to improve instruction execution efficiency by adjusting the order in which instructions execute or by performing operations asynchronously.

To keep the processor's execution units as fully utilized as possible, the processor may apply out-of-order execution optimization to the input code and reorganize the out-of-order results after computation. The guiding principle is that reordering must not change the result of single-threaded execution. The usual logic is to issue time-consuming instructions first and execute other instructions in the gaps while those instructions complete. A similar instruction reordering optimization exists in the just-in-time compiler of the Java Virtual Machine.
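The hazard this creates across threads follows a well-known pattern, sketched below (the class name is illustrative; the anomaly is rare and hardware-dependent, so this illustrates the pattern rather than reliably reproducing it):

```java
// A minimal sketch of the reordering hazard: if the two writes in writer()
// are reordered by the compiler or CPU, reader() can observe flag == true
// while a is still 0 — an outcome impossible in any single-threaded order.
public class ReorderingDemo {
    static int a = 0;
    static boolean flag = false;

    static void writer() {
        a = 1;          // (1)
        flag = true;    // (2) may be reordered before (1)
    }

    static void reader() {
        if (flag) {                     // sees (2)...
            System.out.println(a);      // ...but may still print 0
        }
    }

    public static void main(String[] args) {
        new Thread(ReorderingDemo::writer).start();
        new Thread(ReorderingDemo::reader).start();
    }
}
```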

The memory model

To ensure that shared memory is used correctly (atomicity, visibility, ordering), the memory model defines a specification for multi-threaded read and write operations on shared memory. These rules regulate reads and writes of main memory so that instructions execute correctly. The memory model addresses the access problems caused by multi-level CPU caches, processor optimization, and instruction reordering, and guarantees consistency, atomicity, and ordering in concurrent scenarios.

Java Memory Model (JMM)

The JMM is a specification designed to address the problems caused by inconsistent local memory data, compiler reordering of code instructions, processor execution of code out of order, and CPU thread switching when multiple threads communicate through shared memory.

The JMM is a mechanism that conforms to the memory model specification, shields the access differences of various hardware and operating systems, and ensures the consistent effect of Java programs accessing memory on various platforms.

The Java thread memory model specifies that all variables are stored in main memory and that each thread has its own working memory. A thread's working memory holds copies of the main-memory variables that the thread uses; all of a thread's operations on variables must be performed in working memory rather than by reading and writing main memory directly. Different threads cannot directly access variables in each other's working memory; passing variable values between threads requires synchronization between each thread's working memory and main memory.
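One way this synchronization surfaces in the language is the `volatile` keyword, sketched below (the class name is illustrative; contrast this with the earlier visibility demo):

```java
// A minimal sketch of synchronizing a variable between working memory and
// main memory: declaring the flag volatile forces writes to be flushed to
// main memory and reads to be reloaded from it, so the reader thread is
// guaranteed to see the update.
public class MainMemorySyncDemo {
    static volatile boolean running = true;   // volatile: read/write via main memory

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (running) { }               // guaranteed to see the update
            System.out.println("stopped");
        });
        reader.start();

        Thread.sleep(100);
        running = false;                      // flushed to main memory
    }
}
```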