0 Foreword

In many cases, accesses to a program variable (an object instance field, a class static field, or an array element) may be executed in a different order than the one specified by the program's semantics. Specific cases are as follows:

  1. The compiler is free to change the order of instructions in the name of optimization;
  2. Under certain circumstances, the processor may execute instructions out of order;
  3. Data may move between registers, processor buffers, and main memory in an order different from the one the program specifies.

For example, if a thread writes a value to field A and then to field B, and the value of B does not depend on the value of A, the processor is free to adjust their execution order, and the store buffer may flush B's value to main memory before A's. There are many potential sources of reordering, such as compilers, JITs, and store buffers.

Going from Java source code to a program that a machine (or virtual machine) can execute requires at least a compile phase and a run phase. Across these two phases, reordering falls into two categories, compiler reordering and processor reordering (out-of-order execution), corresponding to compile time and run time respectively. Because of reordering, the order in which instructions actually execute is not the order seen in the source code.

1 Compiler reordering

Without changing the semantics of a single-threaded program, the compiler can rearrange the execution order of statements, reducing the number of register reads and stores as much as possible and fully reusing the values already held in registers.

Suppose the first instruction computes a value, assigns it to variable A, and places it in a register; the second instruction has nothing to do with A but needs that same register (assume it is allocated the register A occupies); and the third instruction uses the value of A and is independent of the second instruction. Under the sequential consistency model, A is placed in the register after the first instruction executes, A is evicted when the second instruction executes, and A is read back into the register when the third instruction executes. Throughout this process, the value of A never changes. So the compiler typically swaps the second and third instructions: A is still in the register when the first instruction finishes and can then be read directly from it, avoiding the overhead of a repeated read.
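To make this concrete, here is a rough Java-level sketch of the transformation (the names RegisterReuse, load, and unrelated are introduced here for illustration; the actual swap happens in the generated machine code, not the source):

class RegisterReuse {
    static int load() { return 42; }      // stands in for instruction 1's computation
    static int unrelated() { return 7; }  // stands in for the unrelated instruction 2

    // Source order: instruction 2 evicts A from its register,
    // forcing instruction 3 to re-read it.
    static int sourceOrder() {
        int a = load();       // 1: a is computed and held in a register
        int t = unrelated();  // 2: unrelated, but competes for a's register
        return a + a + t;     // 3: uses a, which must be re-read
    }

    // What the compiler may effectively emit: 2 and 3 swapped,
    // so a is read while it is still in the register.
    static int reordered() {
        int a = load();       // 1
        int r = a + a;        // 3
        int t = unrelated();  // 2
        return r + t;
    }
}

Both methods return the same value; only the instruction order differs, which is why the compiler is allowed to make the swap.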

Another compiler optimization: when a variable is read inside a loop, the compiler may first load it into a register for faster access; subsequent reads then come straight from the register rather than from memory, eliminating unnecessary memory accesses. But while this improves efficiency, it also introduces a new problem: if another thread changes the variable's value in memory, the loop may never end, because the value in the register never changes. Such optimizations can improve a program's performance but may also lead to wrong results, so programmers need to prevent the compiler from applying them where they are incorrect, as in the sketch below.
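A minimal sketch of that loop problem (Worker and requestStop are names introduced here for illustration):

class Worker implements Runnable {
    // Without volatile, the JIT may hoist the read of stop into a register
    // outside the loop, so a write from another thread may never be observed.
    private boolean stop = false;
    // private volatile boolean stop = false; // the usual fix: volatile forces
    //                                        // a fresh read on every iteration

    @Override
    public void run() {
        while (!stop) {
            // busy-wait; may spin forever if the read of stop was hoisted
        }
    }

    public void requestStop() { stop = true; }
}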

2 Processor reordering

2.1 Instruction-level parallel reordering

The compiler and processor may reorder operations, but they must respect data dependencies: neither will change the execution order of two operations that have a data dependency between them. If two operations access the same variable and at least one of them is a write, there is a data dependency between them. There are three types of data dependency:

Name              | Code sample   | Description
Read after write  | a = 1; b = a; | Write a variable, then read it.
Write after write | a = 1; a = 2; | Write a variable, then write it again.
Write after read  | a = b; b = 1; | Read a variable, then write it.

In all three cases, reordering the two operations would change the result of the program's execution, so operations with such direct data dependencies are not reordered. Note that these dependencies hold only within a single thread: the compiler and processor do not consider dependencies between operations on different threads.

For example:

class Demo {
    int a = 0;
    boolean flag = false;

    public void write() {
        a = 1;          // 1
        flag = true;    // 2
    }

    public void read() {
        if (flag) {         // 3
            int i = a * a;  // 4
        }
    }
}

Since operations 1 and 2 have no data dependency, the compiler and processor can reorder them; likewise, operations 3 and 4 have no data dependency, so the compiler and processor can reorder them as well.

  1. What might happen when operations 1 and 2 are reordered?

    Suppose operations 1 and 2 are reordered. When the program executes, thread A first writes the flag variable, which thread B then reads. Since the condition evaluates to true, thread B goes on to read variable a. At this point, variable a has not been written by thread A at all: the reordering has broken the semantics of the multithreaded program! (A harness that can expose this is sketched after this list.)

  2. What might happen when operations 3 and 4 are reordered? (This reordering also gives us an opportunity to explain control dependencies.)

    In the program, operations 3 and 4 have a control dependency. When code contains a control dependency, it limits how much instruction execution can be parallelized. To overcome this, compilers and processors employ speculative execution. Take the processor's speculative execution as an example:

    The processor executing thread B can read a and compute a * a ahead of time, temporarily saving the result in a hardware cache called the reorder buffer (ROB). When the condition of operation 3 is later judged to be true, the result is written to variable i.

    In essence, speculative execution reorders operations 3 and 4, and here, too, the reordering breaks the semantics of the multithreaded program! (See the sketch after this list for what this reordering looks like.)

    In a single-threaded program, reordering operations that have a control dependency does not change the execution result (which is why the as-if-serial semantics allow such reordering).

    In a multithreaded program, reordering operations that have a control dependency may alter the program's execution result.
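To see why speculation amounts to reordering, here is roughly what the processor's speculative execution of read() does, expressed as equivalent Java inside Demo (temp is an illustrative stand-in for the reorder-buffer entry):

public void readSpeculated() {
    int temp = a * a;   // 4 executed speculatively; result parked in the ROB
    if (flag) {         // 3 evaluated afterwards
        int i = temp;   // commit the speculated result if the branch is taken
    }
}

And a minimal harness that could, on rare runs and depending on the JIT and hardware, expose the broken semantics (DemoDriver is a name introduced here; it assumes the Demo class above is available):

public class DemoDriver {
    public static void main(String[] args) throws InterruptedException {
        Demo d = new Demo();
        Thread threadA = new Thread(d::write); // runs operations 1 and 2
        Thread threadB = new Thread(d::read);  // runs operations 3 and 4
        threadA.start();
        threadB.start();
        threadA.join();
        threadB.join();
        // If 1 and 2 are reordered, threadB may see flag == true while a is
        // still 0, so operation 4 computes 0 instead of 1.
    }
}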

2.2 Out-of-order instruction execution

CPUs today typically use pipelining to execute instructions. The execution of an instruction is divided into several stages (instruction fetch, decode, memory access, execute, write-back, and so on), so multiple instructions can be in the pipeline, executing at the same time. The pipeline is not strictly serial: a long-running instruction sitting in the “execute” stage does not force every subsequent instruction to queue behind it. Instead, execution is parallel, and multiple instructions can be in the same stage at the same time, as long as the corresponding functional units of the CPU are free. For example, if the CPU has one adder and one divider, an addition instruction and a division instruction may be in the “execute” stage at the same time, while two addition instructions can only pass through the “execute” stage serially.

This, however, is where disorder can arise. For example, an addition instruction that appears after a division instruction may finish first, because the division takes many more cycles to execute. Likewise, the second of two memory-load instructions may complete before the first because it hits the cache while the first misses. In general, out-of-order execution is not the CPU deliberately rearranging instructions before executing them: the CPU always fetches instructions from memory in order and feeds them into the pipeline in order. Rather, the conditions encountered during execution and the interactions between instructions cause instructions that entered the pipeline in order to finish out of order. This is called “in-order issue, out-of-order completion”, illustrated by the sketch below.
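A hedged illustration of in-order issue, out-of-order completion (the timing comments describe one plausible execution on the hardware, not a guarantee):

static long[] independentOps(long x, long y, long a, long b) {
    long q = x / y;  // issued first; the division occupies the divider for many cycles
    long s = a + b;  // issued second; the adder is free, so it may complete first
    return new long[] { q, s };  // the result is the same either way, which is
                                 // why the CPU may complete them out of order
}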

Besides stalling when resources are insufficient (as in the one-adder, two-additions case above), the pipeline can also stall because of dependencies between instructions. The CPU's out-of-order execution is not arbitrary: it must preserve the causal relationships in the program. Under that premise, the correctness of CPU execution is guaranteed.

For example:

a++;
b = f(a);
c--;

Since the instruction b = f(a) depends on the result of the preceding instruction a++, b = f(a) will block until the result of a++ is available, while c-- has no such dependency and may therefore complete before b = f(a). Note that f(a) here does not denote a function call taking a as an argument, but simply an instruction that uses a as an operand; an actual function call, which requires several instructions, is more complicated.

If two such dependent instructions are close together, the latter will be blocked in the pipeline for a long time, occupying pipeline resources while it waits for the former's result. Compiler reordering, as an optimization technique, tries to move two such instructions apart, so that by the time the latter instruction enters the CPU, the former's result is already available and no stall is needed. For example, the sequence can be reordered to:

a++;
c--;
b = f(a);

Unlike the CPU's out-of-order execution, the compiler's reordering genuinely changes the order of the instructions it emits. But compiler reordering, too, must preserve the program's causal relationships.

Because reordering and out-of-order execution exist, all kinds of seemingly bizarre problems can easily arise in concurrent programming if shared data is not properly synchronized.
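For the Demo example above, one common fix (a sketch; volatile is one option, alongside synchronized or the java.util.concurrent utilities) is to declare flag volatile. Under the Java Memory Model, the write to a cannot be reordered past the volatile write of flag, and a read that sees flag == true is guaranteed to also see a == 1:

class SafeDemo {
    int a = 0;
    volatile boolean flag = false; // the volatile write/read pair is the synchronization point

    public void write() {
        a = 1;          // 1: cannot be reordered past the volatile write below
        flag = true;    // 2: volatile write publishes a's value as well
    }

    public void read() {
        if (flag) {            // 3: volatile read
            int i = a * a;     // 4: guaranteed to see a == 1 when flag is true
        }
    }
}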