For a piece of code to be executed by a computer, it first needs to be translated into instructions that the machine can recognize and execute. The compilation process usually consists of several steps:

Code -> Lexical analysis -> Semantic analysis -> Intermediate code generation -> Object code generation

In this process:

Steps (1) and (2) depend on the design of the high-level programming language.

(3) The analysis results are compiled into intermediate code. At this stage, the compiler tries to optimize the intermediate code to reduce the number of instructions eventually generated, or to use more efficient instructions, for example by removing dead or redundant code and by strength reduction (a short sketch of strength reduction follows these steps).

(4) Machine-executable object code is generated from the intermediate code. This step depends on the operating system, the instruction set, memory, and so on; different instruction sets also bring different efficiency.
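To make strength reduction concrete, here is a conceptual sketch in C. The rewrite actually happens on the compiler's intermediate representation, not in the source, and the function names here are made up purely for illustration:

/* Before: one multiplication per loop iteration */
long scale(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
                total += i * 8L;
        }
        return total;
}

/* After strength reduction: the multiplication is replaced by a
   cheaper running addition (an induction variable) */
long scale_reduced(int n) {
        long total = 0;
        long step = 0;          /* tracks i * 8 incrementally */
        for (int i = 0; i < n; i++) {
                total += step;
                step += 8;
        }
        return total;
}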

There are two important players in this process: the compiler and the instruction set.

To improve the final execution efficiency of the code, the compiler carries out a very important step: compiler optimization.

Compiler optimization

Using GCC as an example, let’s take a quick look at compile optimization.

The GCC compiler tries to optimize in three areas: compilation speed, generated code size, and generated code execution speed.

By default, the GCC compiler does not turn on compiler optimization, because without optimization its goals are to reduce compilation time and to make debugging behave as expected. The GCC documentation puts it this way:

Without any optimization option, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results. Statements are independent: if you stop the program with a breakpoint between statements, you can then assign a new value to any variable or change the program counter to any other statement in the function and get exactly the results you expect from the source code.

This description is rather abstract, so let's look at a more concrete example.

#include <stdio.h>

void main() {
        int loop = 1000000000;  /* declared but never used */
        long sum = 0;           /* declared but never used */
        int index = 1;

        printf("%d", index + 6);
}

This is a short piece of C code that declares a few variables and prints a value.

Let’s take a look at the assembly output without compiler optimization enabled:

(With gcc -S -fverbose-asm, we can get the assembly output with variable annotations added.)

.LC0:
        .string "%d"
main:
        pushq   %rbp    #
        movq    %rsp, %rbp      #,
        subq    $16, %rsp       #,
        movl    $1000000000, -16(%rbp)  #, loop
        movq    $0, -8(%rbp)    #, sum
        movl    $1, -12(%rbp)   #, index
        movl    -12(%rbp), %eax # index, tmp60
        addl    $6, %eax        #, D.2418
        movl    %eax, %esi      # D.2418,
        movl    $.LC0, %edi     #,
        movl    $0, %eax        #,
        call    printf  #
        ret

From this output, you can see the initialization of the various variables, the index + 6 operation performed by addl, and finally the call to printf.

If we turn on compiler optimization, it makes a big difference.

.LC0:
        .string "%d"
main:
        movl    $7, %esi        #,
        movl    $.LC0, %edi     #,
        xorl    %eax, %eax      #
        jmp     printf  #

The most obvious difference is that there are far fewer instructions, but if you look closely, you’ll notice several specific changes:

  • Redundant declarations are eliminated

Variables that were declared but never used, such as sum and loop, were removed entirely (index disappears too, for the reason described next).

  • Calculations that can be done at compile time are done ahead of time

In the code, the index variable takes part in a calculation, yet it is also gone from the final instructions, because the result was computed in advance: index is initialized to 1, so index + 6 can be determined at compile time to be 7, and the constant 7 is substituted directly during compilation.

In the instructions, the original sequence:

 movl    $1, -12(%rbp)   #, index
 movl    -12(%rbp), %eax # index, tmp60
 addl    $6, %eax        #, D.2418
 movl    %eax, %esi      # D.2418,

becomes:

 movl    $7, %esi

This greatly simplifies the compiled instructions. In effect, the compiler rewrote printf("%d", index + 6) as printf("%d", 7).

  • Lower-cost instructions are selected

At the end of the function execution, you can see the following instruction:

 movl    $0, %eax        

Its purpose is to set the eax register (the accumulator) to 0. When compiler optimization is enabled, it becomes:

 xorl    %eax, %eax      

Why the change? We can briefly compare the two instructions.

(Disassembling each with objdump -d, we can compare their machine code:)

 0:	b8 00 00 00 00       	mov    $0x0,%eax
     
 0:	31 c0                	xor    %eax,%eax

As you can see, the mov encoding takes 5 bytes, 2.5 times the size of the 2-byte xor, so xor is superior in terms of code size.

This shows that compiler optimization also involves instruction selection, choosing instructions with lower cost; the size of the generated code is itself one of the optimizer's targets.

Compiler optimization options

As you can see from the example above, the compiler optimizes instructions from different angles.

Let’s take a look at a few of GCC’s compiler optimization options:

-fmove-loop-invariants    Move loop invariant computations out of loops
-funroll-loops            Perform loop unrolling when iteration count is known
-finline-functions        Integrate functions not declared "inline" into their callers when profitable
-fshort-double            Use the same size for double as for float
-fjump-tables             Use jump tables for sufficiently large switch statements
-fno-threadsafe-statics   Do not generate thread-safe code for initializing local statics
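To get a feel for one of these transformations, here is a conceptual C sketch of the kind of rewrite -funroll-loops performs. The compiler does this on its intermediate representation; the code below is only an illustration, and it assumes the element count is a multiple of 4 to keep the sketch short:

#include <stddef.h>

/* Before: one compare-and-branch per element */
long sum_array(const long *data, size_t n) {
        long sum = 0;
        for (size_t i = 0; i < n; i++) {
                sum += data[i];
        }
        return sum;
}

/* After 4x unrolling: one compare-and-branch per four elements */
long sum_array_unrolled(const long *data, size_t n) {
        long sum = 0;
        for (size_t i = 0; i < n; i += 4) {
                sum += data[i];
                sum += data[i + 1];
                sum += data[i + 2];
                sum += data[i + 3];
        }
        return sum;
}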

Using -finline-functions as an example: this option tries to inline functions that are not declared "inline" into their callers when profitable. Consider the following code:

#include <stdio.h>
#include <stdlib.h>

int add(int a, int b) {
        return a + b;
}

void main(int argc, char* argv[]) {
        printf("%d \n", add(atoi(argv[1]), atoi(argv[2])));
}

This code reads the two command-line arguments of main, converts them to integers, adds them, and prints the result. Compiled without optimization, the assembly looks like this:

add:
        pushq   %rbp    #
        movq    %rsp, %rbp      #
        movl    %edi, -4(%rbp)  # a, a
        movl    %esi, -8(%rbp)  # b, b
        movl    -8(%rbp), %eax  # b, tmp61
        movl    -4(%rbp), %edx  # a, tmp62
        addl    %edx, %eax      # tmp62, D.2430
        popq    %rbp    #
        ret
main:
		  ......
        movl    %ebx, %esi      # b -> esi
        movl    %eax, %edi      # a -> edi
        call    add     #
        movl    %eax, %esi      # D.2433,
        movl    $.LC0, %edi     #,
        movl    $0, %eax        #,
        call    printf  #
        ......

As you can see in main, add is first invoked with a call instruction for the addition, and then printf is invoked with another call for the output.

When we turn on compiler optimization together with function inlining, the compilation result looks like this:

add:
        leal    (%rdi,%rsi), %eax
        ret
main:
        ......
        leal    0(%rbp,%rax), %esi      # the a, b values have been placed in rbp, rax
        movl    $.LC0, %edi
        call    printf
        ......

(Compiled with gcc -O -finline-functions.) Two things have changed:

(1) The instructions of the add function are simplified and optimized, because -O actually enables -O1 level optimization; -O1 does not include -finline-functions, which is why function inlining is enabled manually.

(2) The call to add has been removed from main; instead, add has been inlined and its leal instruction placed directly inside main.
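(A related aside, not shown in the example above: if you want a single function inlined without enabling -finline-functions globally, GCC also honors a per-function attribute. A minimal sketch, reusing the same add function:)

/* GCC extension: always inline this function, regardless of the
   optimization level on the command line */
static inline __attribute__((always_inline))
int add(int a, int b) {
        return a + b;
}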

Compiler optimization levels

In the example above, we touched on the concept of compiler optimization levels. GCC groups its optimizations into several levels for different scenarios. (You can list exactly which options a given level enables with gcc -Q --help=optimizers.)

Optimization level    Description
-O0       The default level. Compiler optimization is not enabled; the compiler only tries to reduce compilation time
-O/-O1    Tries to reduce code size and shorten execution time without performing optimizations that consume a lot of compilation time (optimizing large functions takes more time and memory)
-O2       Builds on -O1, trading more compilation time for better code performance; enables almost all supported optimizations except those that trade space for speed
-O3       Builds on -O2 and enables even more optimizations. The highest level, pursuing performance at the cost of compilation time, code size, and memory; performance may even degrade in some cases
-Os       Optimizes for code size; enables those -O2 optimizations that do not increase code size
-Ofast    Enables all -O3 optimizations, plus optimizations that do not strictly comply with language standards

Impact on code

Here’s an example of how compiler optimization affects the final code performance:

#include <stdio.h>
#include <time.h>

void main() {
        int loop = 1000000000;
        long sum = 0;
        int start_time = clock();
        int index = 0;
        for (index = 0; index < loop; index++) {
                sum += index;
        }
        int end_time = clock();
        printf("Sum: %ld, Time Cost: %lf \n", sum, (end_time - start_time) * 1.0 / CLOCKS_PER_SEC);
}

This is a loop-accumulation example that measures the time the additions take. Running it first without compiler optimization and then with -O3 enabled, the optimized execution time is only about 10% of the original; the effect is obvious.

(This example is a special case, though: the computation can be parallelized with superscalar and vector instructions, which is why the effect is so dramatic; other scenarios won't necessarily see such gains. I will take the chance to briefly introduce instruction sets later.)

Optimization is risky

Compiler optimizations can have side effects, as shown in the following example:

#include <stdlib.h>
#include <stdio.h>

int f() {
        int i;
        int j = 0;
        /* i doubles every iteration; the expectation is that it
           overflows to a negative value, which ends the loop */
        for (i = 1; i > 0; i += i) {
                ++j;
        }
        return j;
}

void main() {
        int ret = f();
        printf("%d\n", ret);
}

The for loop relies on i to decide when to stop. Under the usual assumption, i keeps doubling until it overflows and becomes negative, which terminates the loop.

But when we turn on -O2 optimization, the program goes into an endless loop.

Let’s look at the assembly instructions before and after optimization:

main (before optimization):
        ......
        movl    $1, -8(%rbp)    #, i
        jmp     .L2     #
.L3:
        addl    $1, -4(%rbp)    #, j
        movl    -8(%rbp), %eax  # i, tmp64
        addl    %eax, %eax      # tmp64, tmp63
        movl    %eax, -8(%rbp)  # tmp63, i
.L2:
        cmpl    $0, -8(%rbp)    #, i
        jg      .L3     #,
        ......

main (after optimization):
        ......
.L2:
        jmp     .L2
        ......

After compilation optimization, it becomes an infinite loop.

This happens because signed integer overflow is undefined behavior in the C99 standard. An implementation is free to handle it however it likes (wrapping, trapping, and so on). GCC's optimizer assumes that overflow simply never happens, concludes that i > 0 always holds, and compiles the loop into an infinite one.

(There are other similar cases; for reference, see CERT – Dangerous Optimizations and the Loss of Causality.)
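If wraparound is the behavior you actually want, one fix is to use unsigned arithmetic, whose overflow is well-defined in C (it wraps modulo 2^N). Here is a sketch of f rewritten that way; alternatively, GCC's -fwrapv flag makes signed overflow wrap as well:

/* Unsigned overflow is defined behavior: i doubles until it wraps
   to 0, so with a 32-bit unsigned int the loop provably terminates
   after 32 iterations and the optimizer cannot assume otherwise */
int f(void) {
        unsigned int i;
        int j = 0;
        for (i = 1; i > 0; i += i) {
                ++j;
        }
        return j;
}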

Selective compilation optimization

Since the risks are real, should we give up on optimization? Of course not: we can enable compiler optimization for just part of the code:

#pragma GCC optimize ("O3")

This directive enables the given optimization level for the code that follows it, without affecting the code that precedes it. For example:

#include <stdio.h>

int f();

void main() {
        int loop = 1000000000;
        long sum = 0;
        int index = 1;
        int ret = f();

        printf("%d \n", index + 6);
        printf("%d \n", ret);
}

#pragma GCC optimize ("O3")
int f() {
        int i;
        int j = 0;
        for (i = 1; i > 0; i += i) {
                ++j;
        }
        return j;
}

Compiling with GCC without enabling compiler optimization produces the following assembly:

main:
        ......
        movl    $1000000000, -20(%rbp)  #, loop
        movq    $0, -8(%rbp)    #, sum
        movl    $1, -16(%rbp)   #, index
        movl    $0, %eax        #,
        call    f       #
        movl    %eax, -12(%rbp) # tmp60, ret
        movl    -16(%rbp), %eax # index, tmp61
        addl    $6, %eax        #, D.2431
        movl    %eax, %esi      # D.2431,
        movl    $.LC0, %edi     #,
        movl    $0, %eax        #,
        call    printf  #
        movl    -12(%rbp), %eax # ret, tmp62
        movl    %eax, %esi      # tmp62,
        movl    $.LC0, %edi     #,
        movl    $0, %eax        #,
        call    printf  #
        ret

f:
        pushq   %rbp    #
        movq    %rsp, %rbp      #
.L3:
        jmp     .L3     #

The f function that follows the #pragma GCC optimize directive has optimization turned on and is compiled into an infinite loop, while main remains unoptimized.
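(As an alternative to the pragma, GCC also accepts a per-function attribute, which scopes the optimization to exactly one function instead of everything that follows in the file. A minimal sketch:)

/* Optimize only this function at O3; the rest of the file keeps
   the command-line optimization level */
__attribute__((optimize("O3")))
int f() {
        int i;
        int j = 0;
        for (i = 1; i > 0; i += i) {
                ++j;
        }
        return j;
}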

Conclusion

Overall, compiler optimization can be very beneficial in many scenarios, but it also carries risks. GCC does not turn on compiler optimization by default, performing only basic compilation, while other compilers make different choices: the compiler shipped with the official Java runtime, for example, performs optimization by default. If you master the methods of compiler optimization and the scenarios in which it applies, it can bring unexpected benefits.