This article is shared from the Huawei Cloud community post "Possible Problems with Long Jumps in GaussDB(DWS)", by thunder and Showers.

Problem description: During coding work on GaussDB(DWS), we found that a debug build compiled without optimization behaves correctly, but in the release build some variables appear not to take effect after assignment and still hold their old values, which causes bugs. This article briefly analyzes the issue from two perspectives.

What is a long jump?

In C, the goto statement implements a local jump within a function, while the setjmp() and longjmp() functions implement a nonlocal jump (also called a far jump) across functions.

The two functions have the following signatures:

int setjmp(jmp_buf env); 
void longjmp(jmp_buf env, int value); 

The setjmp function saves the execution context at the point of the call into a jmp_buf, mainly the current stack position and the register state. The longjmp function jumps back to the context (snapshot) saved in the env buffer. Some people point out that exactly what is saved depends on the implementation.

I find the following description the most credible:

The setjmp() function saves the contents of most of the general purpose registers, in the same way as they would be saved on any function entry. It also saves the stack pointer and the return address. All these are placed in the buffer. It then arranges for the function to return zero.

Compiler optimization issues

The problem shows up as a difference between the debug and release builds, and that difference comes mainly from the optimizations the compiler applies during the build. A common optimization is to cache variables from memory in registers.

Since accessing a register is much faster than accessing memory, the compiler sometimes reads a variable into a register first and then takes its value directly from the register on later accesses. In some cases, however, this means stale (dirty) data is read, which seriously affects the correctness of the program.

Solution: the C/C++ volatile keyword

The dictionary defines volatile as changeable or unstable. My understanding is that every assignment to a volatile variable must be written to memory rather than kept only in a register. This avoids the case where a jump happens before the value has been written back to memory, so that the assignment appears to fail (the variable still holds the old value). It also stops the compiler from keeping the value in a register simply because the value is used several times and the compiler wants to avoid repeated reads from memory.

Reproducing the problem

Example without optimization (debug build)

#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>

static jmp_buf env;

static void doJump(int nvar, int rvar, int vvar)
{
    printf("Inside doJump(): nvar=%d, rvar=%d, vvar=%d\n", nvar, rvar, vvar);
    // dead block
    int nvar0 = nvar;
    int rvar0 = rvar;
    int vvar0 = vvar;
    longjmp(env, 1);
}

int main(int argc, char** argv)
{
    int nvar;
    register int rvar;
    volatile int vvar;
    nvar = 111;
    rvar = 222;
    vvar = 333;
    if (setjmp(env) == 0) {
        nvar = 777;
        rvar = 888;
        vvar = 999;
        doJump(nvar, rvar, vvar);
    } else {
        int nvar1 = nvar;
        int rvar1 = rvar;
        int vvar1 = vvar;
        printf("After longjmp(): nvar =%d, rvar=%d, vvar=%d\n", nvar, rvar, vvar);
    }
    exit(EXIT_SUCCESS);
}

Program running result

Build the program with GCC without any optimization, run the resulting binary, and you get the following result:

As you can see, the register variable rvar is still 222, contrary to the expected value, while the plain int and the volatile int hold the correct values. This shows that an assignment made to a register variable between setjmp and longjmp is likely to be lost after the long jump.

Assembly view

As shown in the following figure, rvar is assigned directly in the ESI register. The 888 is written only to the register and does not overwrite the 222 saved in memory earlier, so the copy in memory is still 222, while 777 and 999 are written to memory. When the custom function is then called, all three values are loaded into registers and passed by value.

When the jump returns, the new value of rvar (888) has been lost. The register contents are overwritten from the jump buffer, and the old value is read back from memory.

Memory view

The 888 is assigned to a register (as the assembly shows), so the 222 in memory is not overwritten.

Finally the jump returns and the values are read from memory. The values read are 777, 222, and 999, so the program behaves unexpectedly. The figure below shows the memory contents, with 222 at address -0x28 + 0x7fffffffe160.

Example with -O2 optimization (release build)

Program running result

Compile with the -O2 optimization option and run the program. This time neither nvar nor rvar holds the expected value: instead of 777 and 888, both still hold their old values.

Because of compiler optimization, the new values of nvar and rvar exist only in registers at the time of the jump, and the registers are overwritten after the jump, which causes the problem. The value of vvar is stored in memory and can still be reached through its stack address after the jump.

The following sections step through the execution of the program and analyze the result.

Assembly view

You can run objdump -d volatile_og to view the disassembly of the compiled file. We mainly look at the main function, which starts at 10c0. In the figure above, main is divided into three sections according to whether the setjmp return value equals 0. There is no call to doJump in the assembly (doJump never appears after a callq instruction), presumably because the compiler inlined the function. The initializations of nvar0, rvar0, and vvar0 in that function became dead code and were also removed during optimization.

As you can see from the following figure, only vvar, which is declared with the volatile keyword, can be found in stack memory. The remaining variables have no memory location of their own.

Memory view

To see what happens during a jump, check the memory values before and after the jump:

Figure 1 shows the state before the jump. Only 333 has been written to memory; rvar and nvar cannot be reached through a memory address.

After the jump, the value at memory address E15C has changed to 999, and the stack memory looks as follows:

In the following figure, only the address of vvar can be taken.

Appendix

References

  • What is a memory barrier? Why Memory Barriers?
  • why-do-we-use-volatile-keyword
  • intro.races-13
  • Linux Assembly Language Development Guide: Intel format and AT&T format
  • Detailed analysis of setjmp() and longjmp()
  • Using setjmp and longjmp in C to implement exception handling and coroutines
  • Exactly what "program state" does setjmp save?

Optimization options that may be involved

-fforce-mem: Forces memory operands to be copied into registers before arithmetic is performed on them. This makes all memory references potential common subexpressions, which results in more efficient code; when a value is not a common subexpression, instruction combination should eliminate the separate register load. This optimization may have little effect on variables that are involved in only a single instruction.

For variables that are involved in many instructions (such as arithmetic operations), however, it can be a significant optimization, because the processor can access a value in a register much faster than a value in memory.

-fregmove: The compiler tries to reassign register numbers in move instructions and as operands of other simple instructions in order to maximize the amount of register tying. This optimization is especially helpful on machines with two-operand instructions.

-fschedule-insns: The compiler tries to reorder instructions to eliminate stalls caused by waiting for data that is not yet available. This helps machines with slow floating-point or memory load instructions, because other instructions are allowed to execute until the result of the load or floating-point instruction is actually needed.

Summary: -fforce-mem may cause inconsistencies (dirty data) between memory and registers. Logic that depends on the order of memory operations needs careful handling before it can be optimized safely, for example by using volatile to restrict how variables are accessed, or by using a barrier to force the CPU to execute memory operations strictly in order.

Memory Barriers

The root cause of the cache coherence problem is not the presence of multiple processors as such, but the fact that each processor has its own private cache. The problem arises only when three conditions hold together: multiple cores, private caches, and the cache write policy.

If any one of these conditions is not met, there is no cache coherence problem.

Multilevel caches and read/write consistency on CPUs: two buffers are added to the CPU to speed up instruction execution, the store buffer and the invalidate queue.

Store Buffer:

Benefit: when CPU0 and CPU1 read and write shared data, a store can be placed in the store buffer without waiting for data to be fetched from the other CPU's cache, which speeds things up.

Disadvantage (problem description): CPU0 modifies the value, but its "read invalidate" message takes effect only after CPU1 has already read the value, so CPU1 sees incorrect data. Conflict resolution:

  • Hardware: Store Forwarding. If there is data in the local Store Buffer, read the local Store Buffer first.
  • Software: Hardware designers provide memory-barrier instructions for software to tell the CPU about these relationships.

Invalidate queue:

Store buffers are usually small, so they fill up after a few store operations. The CPU must then wait for invalidate acknowledge messages (after which entries can be removed from the store buffer) to free up store buffer space.

Benefit: CPU1 may be under heavy load and slow to process a large burst of invalidate messages. Queuing them instead of processing each one immediately improves speed.

Disadvantage (problem description): a value may already be invalid while the corresponding entry in the queue has not yet been processed, so the invalidation takes effect too late again.

Solution: adding a barrier solves this case as well.
