This is the sixteenth article in the Learn More About the JVM series.

In our previous article on how Java executes code, we mentioned the just-in-time compiler, a technique for making applications run more efficiently. Normally, code is interpreted and executed by the Java virtual machine, while hot code that is executed repeatedly is compiled into machine code on the fly and run directly on the underlying hardware.

The question is, why do you need an interpreter when just-in-time compilation in HotSpot makes your application run more efficiently?

Interpreters and compilers

The first thing to understand is that not every Java virtual machine runs with both an interpreter and a compiler, but mainstream implementations such as HotSpot contain both. Note that "compiler" here refers to the back-end (just-in-time) compiler, whereas javac is a front-end compiler (covered in an earlier article); unless otherwise specified, "compiler" in this article means the back-end compiler.

Interpreters and compilers coexist because each has its advantages.

1. When a program needs to start and execute quickly, the interpreter can take over first, saving compilation time and running immediately.

2. After the program has started, the compiler gradually comes into play, compiling more and more code into native code. This removes the interpreter's intermediate overhead and achieves higher execution efficiency.

HotSpot uses just-in-time compilation to improve application efficiency, while the interpreter acts as an "escape hatch" for aggressive compiler optimizations. Aggressive optimization means letting the compiler apply optimizations chosen on probability: they are not guaranteed to be correct in all cases, but they improve performance most of the time. When the assumption behind an aggressive optimization no longer holds, for example when loading a new class changes the type hierarchy, or when an "uncommon trap" is hit, execution can fall back to the interpreted state through deoptimization.
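
To make this concrete, below is a minimal, hypothetical sketch (all class names are illustrative): as long as only one implementation of an interface has ever been loaded, the JIT may devirtualize and inline the interface call; bringing in a second implementation invalidates that assumption and can force deoptimization back to the interpreter.

interface Shape { double area(); }

class Circle implements Shape {
  public double area() { return 3.14159; }
}

class Square implements Shape {
  public double area() { return 1.0; }
}

public class DeoptDemo {
  static double total;

  public static void main(String[] args) {
    Shape s = new Circle();
    // While Circle is the only Shape implementation ever used, the
    // compiler may assume s.area() always dispatches to Circle.area()
    // and inline it aggressively.
    for (int i = 0; i < 1_000_000; i++) {
      total += s.area();
    }
    // Introducing a second implementation changes the type hierarchy
    // the optimization relied on; the compiled code may be deoptimized
    // and execution falls back to the interpreter until recompilation.
    Shape t = new Square();
    for (int i = 0; i < 1_000_000; i++) {
      total += t.area();
    }
    System.out.println(total);
  }
}

Running it with -XX:+PrintCompilation may show entries marked "made not entrant" once the assumption is invalidated.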

Therefore, the interpreter and compiler often work in tandem throughout the Java Virtual Machine execution architecture, as shown in the following figure:

The HotSpot virtual machine has two built-in JIT (just-in-time) compilers implemented in C++, called the "client compiler" and the "server compiler", or C1 and C2 for short (C2 is also called the Opto compiler in some sources and in the JDK source code). Users can pass the "-client" or "-server" parameter to force the VM to use C1 or C2 compilation.

C1 compiles faster, while C2 produces better code, with performance typically 30% higher than C1's.

The Graal compiler was introduced in JDK 10 as an eventual replacement for C2 and is currently experimental. Graal is a compiler for Java bytecode that is itself written mainly in Java. It is far more modular and easier to maintain than C1 and C2, which are implemented in C++. Graal can act as a dynamic compiler, compiling hot methods at run time, or as a static compiler performing AOT (ahead-of-time) compilation.

The Graal compiler can be enabled with the JVM parameters -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler. When enabled, it replaces the C2 compiler in HotSpot and serves the compilation requests that C2 would otherwise handle.
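
For example, assuming a JDK 10 or later HotSpot build that ships with JVMCI, a hypothetical application class MyApp could be run on Graal like this:

java -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler MyApp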

Three compilation modes

The compiler works in conjunction with the interpreter, an approach known in HotSpot as "mixed mode". You can use the -Xint parameter to force the VM into "interpreted mode", where all code is interpreted and executed. Conversely, the -Xcomp parameter forces the VM into "compiled mode", where compilation takes precedence, although the interpreter still has to step in when compilation is not possible.
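
As a quick reference, the three modes map onto command-line switches as follows (HotCodeTest is the test class defined below):

java -Xint   HotCodeTest   # interpreted mode: all code is interpreted
java -Xcomp  HotCodeTest   # compiled mode: compile first, interpret only as a fallback
java -Xmixed HotCodeTest   # mixed mode: the HotSpot default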

Let’s test the following code:

public class HotCodeTest {

  public static void main(String[] args) {
    nlp();
  }

  public static void nlp() {
    int sum = 0;
    long start = System.currentTimeMillis();
    for (int i = 0; i < 20000000; i++) {
      sum += i;
    }
    long end = System.currentTimeMillis();
    System.out.println("cost:" + (end - start) + " ms");
  }
}

Compile it with javac to get the class file, then proceed with the following steps.

1. Interpret only: -Xint

// JVM parameters: -XX:+PrintCompilation -Xint
// Result: cost:160 ms

The -Xint parameter forces the JVM to interpret all bytecode; no compilation information is printed, and running is of course slower, often by a factor of 10 or more.

2. Turn off the interpreter: -Xcomp

// JVM parameters: -XX:+PrintCompilation -Xcomp
// Result: cost:1 ms

Execution takes very little time, and a large amount of compilation information is printed, including that of the nlp method. The JVM compiles all the bytecode into native code the first time it is used, yielding maximum optimization. This looks good, but as stated at the beginning of this article, the interpreter wins when a program needs to start and finish quickly: with a much smaller workload, say a loop of only 100 iterations, interpreted execution takes less time than compiled execution.

3. Mixed mode: -Xmixed

// JVM parameter: -XX:+PrintCompilation
// Result: cost:9 ms

Since JDK 8, HotSpot has defaulted to a mixed mode of interpreted execution plus JIT. Accordingly, a large amount of compilation information is printed, including that of the nlp method.

In summary, these results are a positive confirmation of the need for the interpreter and the compiler to coexist.

Tiered compilation

Just-in-time compilation of native code takes time, and the more heavily optimized the output, the longer compilation takes. Moreover, to compile highly optimized code the compiler may need profiling information gathered by the interpreter, and gathering it slows down the interpreted phase. To strike the best balance between startup responsiveness and peak execution efficiency, the HotSpot virtual machine added tiered compilation to its compilation subsystem.

The concept of tiered compilation has been around for a long time, but it was not implemented until JDK 6, and it became the default compilation strategy in JDK 7.

Tiered compilation divides execution into different tiers according to how much compiling and optimizing the compiler performs, including:

Tier 0: interpreted execution; the interpreter does not enable profiling.

Tier 1: C1 compilation without profiling.

Tier 2: C1 compilation with limited profiling, such as method invocation counts and back-edge counts.

Tier 3: C1 compilation with full profiling: in addition to the tier 2 statistics, it collects all the statistics, such as branch frequencies and the receiver types at virtual call sites.

Tier 4: C2 compilation.

To clarify, profiling means that while a program runs, the interpreter collects data reflecting the program's execution state; the data collected is called the program's profile.

The tiers above are not fixed; their number can vary with different runtime parameters and VM versions. The interactions and transitions between the tiers are shown in the figure below:

C2 code is typically around 30% more efficient than C1 code, and among the tiers executed by C1, execution efficiency is tier 1 > tier 2 > tier 3. Of the five execution states, tiers 1 and 4 are terminal states: once a method has been compiled at a terminal state, the Java virtual machine will not issue another compilation request for it unless the compiled code is invalidated.

The four compilation paths introduced in the figure above are described as follows:

  1. The first path represents the common case: a hot method is compiled at tier 3 by C1, and then at tier 4 by C2.
  2. If a method's bytecode is small (such as a getter/setter), tier 3 profiling would gather no useful data, so the Java virtual machine judges that C1 code and C2 code would be equally efficient for it. In that case it compiles the method with C1 at tier 1 after tier 3. Since tier 1 is a terminal state, the Java virtual machine does not go on to tier 4 C2 compilation.
  3. When C1 is too busy, the Java virtual machine profiles the program during interpreted execution and then compiles it directly with tier 4 C2.
  4. When C2 is too busy, a method is first compiled by C1 at tier 2 and later by C1 at tier 3; given the tier 1 > tier 2 > tier 3 execution efficiency noted above, this reduces the time the method spends executing at tier 3.

Java 8 enables tiered compilation by default. The -client and -server parameters for choosing a just-in-time compiler are ignored regardless of whether tiered compilation is on. When tiered compilation is turned off (with -XX:-TieredCompilation), the Java virtual machine uses C2 directly.

If you want to use only C1, keep tiered compilation enabled and pass -XX:TieredStopAtLevel=1; the Java virtual machine then compiles directly with tier 1 C1 after interpreted execution. Similarly, with -XX:TieredStopAtLevel=4, compilation proceeds up to tier 4 C2.
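
For example, a sketch that caps compilation at C1 for the HotCodeTest class above; the tier column of the PrintCompilation output (described in the extension section below) should then never exceed 1:

java -XX:+PrintCompilation -XX:TieredStopAtLevel=1 HotCodeTest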

Just-in-time compilation

Just-in-time compilation rests on the assumption that programs follow the 80/20 rule: 20 percent of the code consumes 80 percent of the computing resources.

For the majority of code, which runs infrequently, we need not spend time compiling it into machine code; interpreting it is enough. For the small fraction of hot code, on the other hand, we can compile it into machine code so that subsequent executions run at the desired speed.

Just-in-time compilation triggers

As mentioned above, the just-in-time compiler compiles "hot code", of which there are two main kinds:

  • A method that is called multiple times.
  • The body of a loop that is executed multiple times.

Given this description of "hot code", how many executions make code hot? And how does the virtual machine count how many times a method or piece of code has been executed?

Determining whether a piece of code is hot, and whether it should trigger just-in-time compilation, is called "hot spot detection". Hot spot detection does not necessarily require knowing exactly how many times a method has been called. There are currently two main approaches:

  • Sample-based hot spot detection
  • Counter-based hot spot detection

HotSpot uses the second, counter-based approach. It maintains two kinds of counters for each method: a method call counter and a back-edge counter. Both have thresholds determined by the virtual machine's runtime parameters, and when a counter overflows its threshold, JIT compilation is triggered. (Who does the counting? Recall from the tiered compilation section that the interpreter performs profiling; these counters are part of the data it collects.)

Method call counter

As the name implies, this counter records the number of times a method is called. Its default threshold is 1500 in Client mode and 10000 in Server mode, and it can be set manually with -XX:CompileThreshold. When a method is called, the VM first checks whether a JIT-compiled version of the method exists; if so, the compiled native code is used preferentially. If no compiled version exists, the method's call counter is incremented by one, and the VM then checks whether the sum of the method call counter and the back-edge counter exceeds the method call counter's threshold. If it does, a compilation request for the method is submitted to the just-in-time compiler.

With the default settings, the method call counter records not the absolute number of calls but a relative execution frequency: the number of calls within a period of time. When that period elapses and the method still has not accumulated enough calls to be submitted to the just-in-time compiler, its call counter is halved. This process is called counter decay, and the period is the counter's half-life. Counter decay is performed while the virtual machine is doing garbage collection. The parameter -XX:-UseCounterDecay turns decay off, making the counter record the absolute number of calls, so that if the program runs long enough most methods will eventually be compiled into native code. The -XX:CounterHalfLifeTime parameter sets the half-life in seconds.
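
As a hedged sketch of combining these switches (MyApp is a placeholder class name; flag availability varies across JDK releases):

java -XX:-TieredCompilation -XX:-UseCounterDecay -XX:CompileThreshold=5000 MyApp

With decay off and a lowered threshold, a long-running program should see more of its methods reach the absolute call count needed to trigger compilation.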

By default, the execution engine does not wait synchronously for a compilation request to complete. It simply continues in the interpreter, executing bytecode, until the compiler finishes the submitted request. Once compilation completes, the method's call entry address is automatically rewritten to the new one, and the compiled version is used the next time the method is called.
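
If you instead want calls to wait for compilation to finish, HotSpot offers a batch mode; the two spellings below name the same switch (MyApp is again a placeholder):

java -Xbatch MyApp
java -XX:-BackgroundCompilation MyApp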

Consider the following code:

// JVM parameters: -XX:+PrintCompilation -XX:-TieredCompilation (disable tiered compilation)
import java.util.Random;

public class MethodCounterTest {

  private static Random random = new Random();

  public static void main(String[] args) {
    long sum = 0;
    long start = System.nanoTime();
    for (int i = 0; i < 12000; i++) {
      sum += getRandomNum();
    }
    long end = System.nanoTime();
    System.out.println(end - start);
  }

  public static int getRandomNum() {
    return random.nextInt(10);
  }
}

The compilation output includes the getRandomNum method. With tiered compilation disabled, the VM falls back to the C2 compiler as described above, so the method call counter threshold is 10000. Although the counted value is not strictly equal to the actual number of invocations, once it reaches the threshold the method is treated as hot code and just-in-time compilation is triggered.

Back-edge counter

Its job is to count how often the loop bodies in a method execute; a bytecode instruction that jumps backwards in the control flow is called a "back edge". (What is counted is back edges, not loop iterations: not every loop produces a back edge. An empty loop, for example, effectively jumps to itself, which is not a backwards jump in the control flow and therefore is not counted by the back-edge counter.) The statistical purpose of back-edge counting is, of course, to trigger OSR compilation. HotSpot provides a parameter -XX:BackEdgeThreshold analogous to the method call counter's -XX:CompileThreshold, but the VM does not actually use it; instead, the back-edge counter threshold must be adjusted indirectly through -XX:OnStackReplacePercentage, computed as follows:

(1) Client mode

Method call counter threshold x OSR ratio (-XX:OnStackReplacePercentage) / 100, where the OSR ratio defaults to 933. With both defaults, the back-edge counter threshold in Client mode is 1500 x 933 / 100 = 13995.

(2) Server mode

Method call counter threshold x (OSR ratio - interpreter monitoring ratio (-XX:InterpreterProfilePercentage)) / 100, where the OSR ratio defaults to 140 and the interpreter monitoring ratio defaults to 33. With the defaults, the back-edge counter threshold in Server mode is 10000 x (140 - 33) / 100 = 10700.
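
A quick sanity check of the two formulas with their default values is plain arithmetic:

public class OsrThresholds {
  public static void main(String[] args) {
    // Client mode: CompileThreshold = 1500, OnStackReplacePercentage = 933
    System.out.println(1500 * 933 / 100);          // prints 13995
    // Server mode: CompileThreshold = 10000, OnStackReplacePercentage = 140,
    // InterpreterProfilePercentage = 33
    System.out.println(10000 * (140 - 33) / 100);  // prints 10700
  }
}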

When the interpreter encounters a back-edge instruction, it first checks whether a compiled version of the code fragment about to be executed already exists. If it does, the compiled code is executed preferentially; otherwise, the back-edge counter is incremented by one, and the VM then checks whether the sum of the method call counter and the back-edge counter exceeds the back-edge counter threshold. When the threshold is exceeded, an OSR compilation request is submitted, and the back-edge counter value is lowered so that the loop can continue in the interpreter while waiting for the compiler to produce the compiled code.

Unlike the method call counter, the back-edge counter has no counter decay, so it records the absolute number of times the method's loops execute. When it overflows, it also sets the method call counter to an overflowed state, so the standard compilation process is triggered the next time the method is entered.

Take this code for example:

// JVM parameters: -XX:+PrintCompilation -XX:-TieredCompilation (disable tiered compilation)
import java.util.Random;

public class BackEdgeTest {

  private static Random random = new Random();

  public static void main(String[] args) {
    nlp();
  }

  public static void nlp() {
    long sum = 0;
    for (int i = 0; i < 18000; i++) {
      sum += random.nextInt(10);
    }
  }
}

The compilation output for the code above again includes the nlp method, which triggers just-in-time compilation. Let's look at the bytecode of the nlp method.

 public static void nlp();
    descriptor: ()V
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=4, locals=3, args_size=0
         0: lconst_0
         1: lstore_0
         2: iconst_0
         3: istore_2
         4: iload_2
         5: sipush        18000
         8: if_icmpge     29
        11: lload_0
        12: getstatic     #3                  // Field random:Ljava/util/Random;
        15: bipush        10
        17: invokevirtual #4                  // Method java/util/Random.nextInt:(I)I
        20: i2l
        21: ladd
        22: lstore_0
        23: iinc          2, 1
        26: goto          4
        29: return

As you can see, the bytecode at offset 26 jumps back to the bytecode at offset 4, so the instruction at offset 26 is a back-edge instruction. When the interpreter encounters it, it looks for a compiled version of the code about to be executed; if none exists, the back-edge counter is incremented by one, until the back-edge counter threshold is reached.
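
For reference, a listing like the one above can be produced with the javap disassembler that ships with the JDK (BackEdgeTest is the hypothetical class name from the sketch above):

javac BackEdgeTest.java
javap -v BackEdgeTest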

Summary

For the above two kinds of counters, we draw the following conclusions:

  • Each method has two counters, a method call counter and a back-edge counter, each with its own threshold.
  • In Client mode (C1), the method call counter threshold is 1500 and the back-edge counter threshold is 13995; exceeding them triggers C1 compilation.
  • In Server mode (C2), the method call counter threshold is 10000 and the back-edge counter threshold is 10700; exceeding them triggers C2 compilation.

It is also important to note that what is compared against the threshold is always the sum of the method call counter and the back-edge counter, because a method that is called many times may also contain a loop body. Take this code for example:

// JVM parameter: -XX:+PrintCompilation
import java.util.Random;

public class CounterSumTest {

  private static Random random = new Random();

  public static void main(String[] args) {
    long sum = 0;
    long start = System.nanoTime();
    for (int i = 0; i < 6000; i++) {
      sum += getRandomNum();
    }
    long end = System.nanoTime();
    System.out.println(end - start);
  }

  public static int getRandomNum() {
    return random.nextInt(10);
  }
}

The output shows that both C1 and C2 compilation are triggered even though the loop count is only 6000: both kinds of counters are counting, and their sum eventually exceeds the threshold and triggers just-in-time compilation.

Clearly the method call counter threshold is the smaller one, so in general compilation is triggered as soon as that threshold is met. The back-edge counter needs its own threshold because of OSR, which has a higher trigger bar.

OSR

As we have seen, two factors determine whether a method is hot code: the number of times it is called and the number of times its back edges execute, and just-in-time compilation is triggered by the sum of the two counters. Why does the Java virtual machine maintain two separate counters?

In both cases, the compilation target is the whole method body, not an individual loop body. In the first case, since compilation is triggered by a method call, the compiler naturally takes the whole method as the compilation unit; this is the standard form of just-in-time compilation in the virtual machine.

In the second case, although compilation is triggered by a loop body, the hot spot is only part of a method; the compiler must still compile the whole method as the unit. What differs is only the execution entry point: execution starts at a particular bytecode instruction inside the method, identified by its bytecode index (Byte Code Index, BCI), as illustrated by the bytecode listing in the back-edge counter section above. Because this compilation happens while the method is executing, that is, while the method's stack frame is still on the stack, it is vividly called "On-Stack Replacement" (OSR): the method is replaced while its stack frame is still on the stack.

When tiered compilation is not enabled, OSR compilation is triggered at a multiple of the threshold specified by -XX:CompileThreshold. By default, the OSR compilation threshold is 13995 for C1 and 10700 for C2.

When tiered compilation is enabled, the OSR trigger threshold is instead determined by the -XX:TierXBackEdgeThreshold parameters (one per tier X), multiplied by a scaling coefficient.

CodeCache

We also know from the preceding discussion that the JVM compiles hot code into native machine code at runtime to improve execution efficiency. Where does that machine code live? This is the CodeCache, a region of off-heap memory used to store native machine code. Normally we never deal with this region directly, but you may occasionally see an online server fail with a java.lang.OutOfMemoryError related to the code cache in its logs; that is this region running out of memory.

Configuration

CodeCache memory size configuration:

InitialCodeCacheSize   = 2555904    // initial CodeCache size (default shown, in bytes)
ReservedCodeCacheSize  = 251658240  // maximum CodeCache size
CodeCacheExpansionSize = 65536      // amount the CodeCache grows by on each expansion
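
A hedged usage sketch for adjusting these sizes at startup (MyApp is a placeholder class name):

java -XX:InitialCodeCacheSize=4m -XX:ReservedCodeCacheSize=128m MyApp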

Parameters for printing CodeCache information:

-XX:+PrintCodeCache              # print CodeCache usage when the JVM exits; max_used is the peak usage over the whole run
-XX:+PrintCodeCacheOnCompilation # print CodeCache usage each time a method is compiled

To check CodeCache usage, run with the following parameter:

-XX:+PrintCodeCache
// Output:
CodeCache: size=245760Kb used=1123Kb max_used=1132Kb free=244636Kb
 bounds [0x0000000118979000, 0x0000000118be9000, 0x0000000127979000]
 total_blobs=285 nmethods=33 adapters=166
compilation: enabled

To change the maximum CodeCache size, set it like this:

-XX:ReservedCodeCacheSize=2496K -XX:+PrintCodeCache
// Output:
CodeCache: size=2496Kb used=1109Kb max_used=1120Kb free=1386Kb
 bounds [0x000000010ae06000, 0x000000010b076000, 0x000000010b076000]
 total_blobs=274 nmethods=28 adapters=160
compilation: enabled

CodeCache flushing option

  • -XX:+UseCodeCacheFlushing, which defaults to true: flush the CodeCache before the JIT compiler would otherwise have to be shut down.

CodeCache compiler restrictions

-XX:MaxInlineLevel              # maximum nesting depth of inlined calls; default 9
-XX:MaxInlineSize               # maximum bytecode size of a method eligible for inlining; default 35
-XX:MinInliningThreshold        # minimum invocation count before a method may be inlined; default 250
-XX:+InlineSynchronizedMethods  # whether synchronized methods may be inlined; default true
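
To see the inlining decisions these limits produce, the diagnostic PrintInlining flag can be combined with them; a sketch using the HotCodeTest class from earlier:

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining HotCodeTest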

What happens when the CodeCache is full

The CodeCache occupies a fixed maximum amount of memory. Once it is full, the JVM compiles no additional code, because the JIT compiler is turned off; you will also see the warning "CodeCache is full. Compiler has been disabled." The JIT compiler stays stopped and is not restarted, and execution reverts to interpretation: code that was already compiled continues to run as machine code, but code that was not compiled is interpreted.

Increasing ReservedCodeCacheSize may help, but that is usually only a temporary fix.

Reverting to interpreted execution hurts program efficiency, so the JVM provides the UseCodeCacheFlushing option to control flushing of the code cache region. This option defaults to true, and occupied space is released when:

  • The code cache is full; if the size of the region exceeds a certain threshold, it is flushed.
  • A certain amount of time has passed since the last cleanup.
  • Precompiled code is not hot enough. For each JIT-compiled method, the JVM has a heat tracking counter. If the counter value is less than the dynamic threshold, the JVM releases the precompiled code.

Now that we know when the CodeCache is flushed, what does the JVM do next?

The -XX:+PrintCompilation option outputs the program's compilation information, including the markers "made not entrant" and "made zombie".

"Made not entrant" means the compiled method may no longer be entered, that is, its machine code in the CodeCache cannot be used for new calls. The JVM switches from executing machine code generated by the just-in-time compiler back to interpreted execution through a mechanism called deoptimization, which will be covered in later chapters.

When the CodeCache flushing conditions are met and the Java virtual machine detects that all threads have exited a "made not entrant" method, it marks the method "made zombie", and the space occupied by that code can be reclaimed.

Extension

The PrintCompilation parameter

-XX:+PrintCompilation

PrintCompilation outputs statistics about compiled methods, which makes it very convenient for seeing which code is hot; hot code means there is potential for optimization. This option is disabled by default, in which case no diagnostic output is printed.

// JVM parameter: -XX:+PrintCompilation
public class HotCodeTest {

  public static void main(String[] args) {
    long start = System.nanoTime();
    for (int i = 0; i < 200; i++) {
      nlp();
    }
    long end = System.nanoTime();
    System.out.println(end - start);
  }

  public static void nlp() {
    int sum = 0;
    for (int i = 0; i < 2000; i++) {
      sum += i;
    }
  }
}

Part of the log output is shown below:

    185   24       4       sun.nio.cs.UTF_8$Encoder::encode (359 bytes)
    185   26       3       java.lang.StringBuilder::append (8 bytes)
    185   27       1       java.net.URL::getAuthority (5 bytes)
    186   28       3       java.lang.String::isEmpty (14 bytes)
    187   29 %     3       HotCodeTest::nlp @ 4 (22 bytes)
    187   30       3       HotCodeTest::nlp (22 bytes)
    187   31 %     4       HotCodeTest::nlp @ 4 (22 bytes)
    187   33       3       java.lang.System::getSecurityManager (4 bytes)
    187   32       3       java.lang.StringBuilder::toString (17 bytes)
    187   29 %     3       HotCodeTest::nlp @ -2 (22 bytes)   made not entrant
    187   34       4       HotCodeTest::nlp (22 bytes)
    188   35       3       java.util.Arrays::copyOfRange (63 bytes)
    188   30       3       HotCodeTest::nlp (22 bytes)   made not entrant
    188   36       4       java.lang.String::hashCode (55 bytes)
Copy the code

The log columns have the following meanings:

  • Column 1: time when the method starts compiling (in milliseconds)

  • Column 2: the compilation ID maintained by the Java virtual machine

  • Column 3: flags, with the following meanings:

    b                   blocking compiler (always set for the client compiler); the application thread blocks during compilation
    *                   generating a native wrapper
    %                   on-stack replacement; compilation was triggered by the back-edge counter, and the compiled code runs in place of the interpreted frame
    !                   the method has exception handlers
    s                   the method is declared synchronized
    n                   the method is declared native
    made not entrant    the compilation was wrong or incomplete; no future callers will use this version
    made zombie         the code is no longer in use and is ready for GC
  • Column 4: compilation tier, 0-4, corresponding to the figure in the tiered compilation section

  • Column 5: the name of the compiled method

The CITime parameter

-XX:+CITime

Prints the time spent on JIT compilation.
