The author's knowledge and understanding are limited; readers are welcome to point out any mistakes or deficiencies.

Recommended reading

  • An in-depth introduction to the concepts of concurrent programming models
  • An in-depth tutorial on sequential consistency in concurrent programming models
  • An in-depth tutorial on the three concurrent features of the concurrent programming model

The concepts of the concurrent programming model were analyzed in detail in the previous articles. This article takes a closer look at reordering.

Reordering

Reordering is divided into:

  • Compiler-optimized reordering
  • Instruction-level parallel reordering
  • Memory system reordering

The compiler can reorder the execution order of program statements when compiling source code, as long as doing so does not affect the semantics of single-threaded execution (in effect, the result of single-threaded execution). This is compiler-optimized reordering.
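To make this concrete, here is a minimal sketch (the class and variable names are made up for illustration) of statements a compiler is free to reorder, because swapping the two independent assignments cannot change the single-threaded result:

```java
public class CompilerReorderExample {
    public static void main(String[] args) {
        int a = 1;       // statement 1: does not depend on statement 2
        int b = 2;       // statement 2: does not depend on statement 1
        int sum = a + b; // statement 3: depends on both, so it must stay after them

        // The compiler may emit statement 2 before statement 1; either order
        // produces sum == 3, so single-threaded semantics are preserved.
        System.out.println(sum);
    }
}
```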

The processor can also optimize the order in which program instructions are executed, usually by running them in parallel, as long as doing so has no effect on the program's result. So how do we know that changing the execution order of instructions will not affect the result? It depends on whether there are data dependencies between the instructions: instructions with data dependencies cannot have their execution order changed, while instructions without them can be reordered without affecting the result. This is instruction-level parallel reordering. What data dependencies are, we will see below.
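As a rough illustration (the method and variable names are hypothetical), the two multiplications below have no data dependence on each other, so the processor may execute them in parallel or in either order, while the final addition depends on both results and cannot be moved ahead of them:

```java
public class IlpExample {
    static int compute(int a, int b, int c, int d) {
        int x = a * b; // independent of y: may execute in parallel with the next line
        int y = c * d; // independent of x
        return x + y;  // data-dependent on both x and y, so it must come last
    }

    public static void main(String[] args) {
        System.out.println(compute(2, 3, 4, 5)); // prints 26 regardless of the order of x and y
    }
}
```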

Next, what is memory system reordering? Let’s start by learning about the processor cache.

As we know, the processor uses a three-level cache to speed up data access. Next, let's look at what the CPU's three-level cache is.

  • Definition of CPU cache

The CPU cache is a small, fast memory used for temporary data exchange between the CPU and main memory. It was introduced to bridge the mismatch between the CPU's processing speed and the read/write speed of main memory: the cache is much faster than main memory.

First, let's look at where the CPU's three-level cache sits in the computer's storage system:

Computer storage system:

The CPU's three-level cache:

Terminology

  • L1-d cache: Level 1 data cache
  • L1-i cache: Level 1 instruction cache
  • L2 cache: Level 2 cache
  • L3 cache: Level 3 cache

Access speed: L1 > L2 > L3

As we can see, the CPU cache exchanges data directly with main memory. Because the CPU cache is much faster than main memory, caching commonly used data in the CPU cache greatly improves the speed at which instructions execute. The details of how CPU caching works are not covered here; interested readers can refer to the relevant books.

Now let's go back to reordering and analyze memory system reordering.

In the three-level cache system described above, there are two kinds of cache coherence mechanisms.

  • Bus locking mechanism

When a CPU reads a piece of data, it locks it across the bus, and no other processor can read or write that data until the lock is released. In practice this implementation is very problematic: in a multi-threaded environment, if multiple threads are reading and writing the same shared variable, this kind of locking is very inefficient.

  • MESI protocol (a cache coherence protocol)

How MESI works can be read in the article MESI – CPU Cache Consistency Protocol.

In the memory system, some of the reads and writes performed by the processor are served by the CPU cache and some go to main memory. Because the processor uses caches and read/write buffers, load and store operations may appear to execute out of order. (In other words, reads and writes do not necessarily complete in the order the program handed them to the processor, because an access served from the cache is much faster than one that goes to main memory.) This is what the author understands as memory system reordering.
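This effect can be observed with the classic store-buffer litmus test. The sketch below (class and field names are made up for illustration) starts two unsynchronized threads; on real hardware it is possible for both r1 and r2 to end up 0, an outcome that no interleaving of the program order could produce, which is exactly the apparent load/store reordering described above:

```java
public class StoreBufferLitmus {
    static int x = 0, y = 0;
    static int r1, r2;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            x = 1;   // store to x
            r1 = y;  // load of y: may appear to move before the store to x
        });
        Thread t2 = new Thread(() -> {
            y = 1;   // store to y
            r2 = x;  // load of x: may appear to move before the store to y
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // With memory system reordering, r1 == 0 && r2 == 0 is a legal outcome,
        // even though no sequential interleaving of the two threads allows it.
        System.out.println("r1=" + r1 + ", r2=" + r2);
    }
}
```

A single run will rarely show the anomalous result; tools such as jcstress run tests like this many times to make it visible.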

Data dependency

If two operations access the same variable and one of them is a write, there is a data dependency between the two operations. There are three types of data dependency:

  • Read after write

Write a variable, then read that variable. For example: a = 1, b = a

  • Write after write

Write a variable, then write that variable again. For example: a = 1, a = 2

  • Write after read

Read a variable, then write that variable. For example: a = b, b = 1

In each of the three cases above, reordering the execution order of the two operations would change the result of the program. As mentioned earlier, the compiler and processor may reorder operations, but they respect data dependencies: they do not change the execution order of two operations that have a data dependency between them. Note that data dependency here refers only to sequences of instructions executed on a single processor and operations performed within a single thread; data dependencies between different processors and between different threads are not considered by the compiler or the processor. Data dependency by itself does not guarantee thread safety.
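Written out as code (a trivial sketch with placeholder variables), the three dependency types within a single thread look like this; in each pair, neither the compiler nor the processor will swap the two statements:

```java
public class DataDependencyKinds {
    public static void main(String[] args) {
        int a, b;

        // Read after write: a is written, then read.
        a = 1;
        b = a;

        // Write after write: a is written twice; swapping them changes a's final value.
        a = 1;
        a = 2;

        // Write after read: b is read, then written.
        a = b;
        b = 1;

        System.out.println(a + " " + b); // prints "1 1"
    }
}
```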

The as-if-serial semantics

As-if-serial

No matter how much reordering is done (by the compiler and processor to improve parallelism), the execution result of a (single-threaded) program must not be changed. The compiler, runtime, and processor must all comply with the as-if-serial semantics.

To comply with the as-if-serial semantics, the compiler and processor do not reorder operations that have data dependencies, because such reordering would change the execution result. However, if there are no data dependencies between the operations, they may be reordered by the compiler and processor.
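A commonly used illustration (a sketch, not the output of any particular compiler) is computing the area of a circle. A and B may be reordered with respect to each other, but C can never be moved before either of them, so the single-threaded result is always the same:

```java
public class AsIfSerialExample {
    public static void main(String[] args) {
        double pi = 3.14;         // A: no dependence on B
        double r = 1.0;           // B: no dependence on A
        double area = pi * r * r; // C: depends on A and B, so it must run last

        // Whether the order is A, B, C or B, A, C, the printed result is identical,
        // which is exactly what the as-if-serial semantics require.
        System.out.println(area);
    }
}
```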

Conclusion

In this article, we analyzed the three kinds of reordering and their basic principles, as well as data dependency and the as-if-serial semantics. For most purposes this is all you need to know; there is no need to dig much deeper into reordering.

In short, reordering exists to speed up the execution of program instructions without affecting the execution result. However, reordering is only guaranteed not to affect the result within a single thread. In a multi-threaded program, even instructions with no data dependency between them may, after reordering, change the result of the program.
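The classic example of this last point (a sketch with hypothetical names; the fields are deliberately not volatile and there is no other synchronization) is a writer thread publishing data through a flag. Statements 1 and 2 in the writer have no data dependence on each other, so they may be reordered, and the reader can then observe the flag set while the data is still unwritten:

```java
public class ReorderVisibilityExample {
    static int data = 0;
    static boolean ready = false; // not volatile, so reordering is allowed

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;    // 1: no data dependence on statement 2
            ready = true; // 2: may be reordered before statement 1
        });
        Thread reader = new Thread(() -> {
            if (ready) {
                // If 1 and 2 were reordered, this can print 0 instead of 42.
                System.out.println(data);
            }
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
    }
}
```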