This is the 18th article in the Learn More About the JVM series.

Method inlining, mentioned several times in previous articles, is the most important optimization technique in compilers: it not only eliminates the performance overhead of the call itself, but also triggers further optimizations. This article walks you through the technique.

Method inlining

Method inlining is the optimization in which, when a method call is encountered during compilation, the body of the target method is substituted for the original method call.

Take getters/setters as an example. Without method inlining, a call to a getter/setter requires the program to save the execution position of the current method, create and push a stack frame for the getter/setter, access the field, pop the stack frame, and finally resume execution of the current method. Once calls to getters/setters are inlined, only the field access remains. In C2, method inlining is done while parsing bytecode: whenever a method-call bytecode is encountered, C2 decides whether the call should be inlined, and if so, parses the bytecode of the target method in place.
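As a rough source-level sketch of that transformation (hypothetical Point class; the real rewrite happens in the JIT's compiled code, where the private-access check has already been performed):

```java
// Illustrative only: what getter inlining achieves, shown at source level.
class Point {
  int x; // package-private here so the "after" version compiles at source level

  Point(int x) { this.x = x; }

  int getX() { return x; }
}

class GetterInlineSketch {
  // Before inlining: a real call - push a frame for getX, read x, pop the frame.
  static int before(Point p) { return p.getX(); }

  // After inlining: the call disappears and only the field access remains.
  static int after(Point p) { return p.x; }
}
```

Both versions compute the same value; inlining changes only how much work the call costs.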

We use the following example to demonstrate method inlining:

  public static boolean flag = true;
  public static int value0 = 0;
  public static int value1 = 1;

  public static int foo(int value) {
    int result = bar(flag);
    if (result != 0) {
      return result;
    } else {
      return value;
    }
  }

  public static int bar(boolean flag) {
    return flag ? value0 : value1;
  }

The foo method in the code above takes an int parameter, and the bar method takes a boolean parameter. The foo method reads the static field flag and passes it to the bar method as an argument. Looking at the code, we can see that the bar method will always return value0, but the Java interpreter performs no such analysis, so we cannot expect the interpreter to optimize this code.

Given the definition of method inlining, we know the compiler may be able to optimize this code, so we call the foo method repeatedly to make it hot enough to trigger just-in-time compilation. The test method is as follows:

  public static void main(String[] args) {
    for (int i = 0; i < 20000; i++) {
      foo(i);
    }
  }

Add the -XX:+PrintCompilation parameter before executing to output the compilation results. For details on what each inlining message means, refer to the official website.

    235   28   3   com.msdn.java.javac.jit.MethodInlineTest::foo (15 bytes)
    235   29   3   com.msdn.java.javac.jit.MethodInlineTest::bar (14 bytes)
    236   30   4   com.msdn.java.javac.jit.MethodInlineTest::foo (15 bytes)
    236   28   3   com.msdn.java.javac.jit.MethodInlineTest::foo (15 bytes)   made not entrant

As you can see, the foo method triggers just-in-time compilation. Let's see whether the bar method is inlined into the foo method using the following command.

-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining

The output is:

An inline tag in the output indicates that the bar method was successfully inlined.

The usual way to reason about method inlining is with IR graphs, but that is a bit of a struggle: viewing IR graphs requires the Ideal Graph Visualizer tool, which has JDK version limitations (it can only be configured for JDK 6) and supports only Windows and Linux, so we won't cover IR graphs here.

We demonstrate the optimization process step by step by modifying the original code.

The first step is to inline the method

public static int foo(int value) {
  int result = flag ? value0 : value1;
  if (result != 0) {
    return result;
  } else {
    return value;
  }
}

The second step is dead code elimination

public static int foo(int value) {
  int result = 0;
  if (result != 0) {
    return result;
  } else {
    return value;
  }
}
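If the compiler then also eliminates the now-constant branch, the whole method collapses. A sketch of the final equivalent, valid only while the speculated values (flag == true, value0 == 0) hold:

```java
// Hypothetical final form after constant folding and branch elimination.
// result is the constant 0, so "result != 0" is always false: the
// true-branch is dead code and is removed, leaving only the else-branch.
class FooOptimized {
  static int foo(int value) {
    return value;
  }
}
```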

In the code above, the bar method was successfully inlined. So when can a method be inlined, and what are the restrictions? Let's keep going.

Conditions for method inlining

Method inlining can trigger more optimizations. In general, the more inlining, the more efficient the generated code will be. However, for just-in-time compilers, the more inlining, the longer the compilation time, and the later the peak performance of the program will be.

The code cache has a fixed size, controlled by the Java virtual machine parameter -XX:ReservedCodeCacheSize. If the code cache fills up, it can cause performance problems.
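As an aside, code-cache occupancy can be observed from inside a running program via JMX. A minimal sketch (pool names vary across JDK versions, e.g. "Code Cache" on JDK 8 versus segmented "CodeHeap ..." pools on JDK 9+, hence the substring match):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Sum the "used" bytes across all code-cache memory pools.
class CodeCachePeek {
  static long usedCodeCacheBytes() {
    long used = 0;
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      if (pool.getName().contains("Code")) { // code-cache pools only
        used += pool.getUsage().getUsed();
      }
    }
    return used;
  }

  public static void main(String[] args) {
    System.out.println("code cache used: " + usedCodeCacheBytes() + " bytes");
  }
}
```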

Therefore, just-in-time compilers do not inline methods indefinitely. Here are some of the inlining rules used by just-in-time compilers (for other special rules, refer directly to the JDK source code).

First, methods specified by the inline directive of -XX:CompileCommand and methods annotated with @ForceInline (JDK internal methods only) are forcibly inlined. Methods specified by the dontinline or exclude (do not compile) directives of -XX:CompileCommand and methods annotated with @DontInline (JDK internal methods only) are never inlined.

Based on the code above, we add the following JVM parameters:

-XX:+PrintCompilation -XX:CompileCommand=exclude,com/msdn/java/javac/jit/MethodInlineTest::bar

The compiled output is as follows:

### Excluding compile: static com.msdn.java.javac.jit.MethodInlineTest::bar
made not compilable on all levels  com.msdn.java.javac.jit.MethodInlineTest::bar (14 bytes)   excluded by CompilerOracle
    213   28       3       com.msdn.java.javac.jit.MethodInlineTest::foo (15 bytes)
    214   29       4       com.msdn.java.javac.jit.MethodInlineTest::foo (15 bytes)
    214   30       4       java.lang.String::hashCode (55 bytes)
    214   28       3       com.msdn.java.javac.jit.MethodInlineTest::foo (15 bytes)   made not entrant
    214   31       3       java.util.Arrays::copyOfRange (63 bytes)

Even with the -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining parameters, no inline tag is output.

Second, method calls cannot be inlined if the symbolic reference corresponding to the calling bytecode is not resolved, the class of the target method is not initialized, or the target method is a native method.

Third, C2 does not inline call chains deeper than nine levels (MaxInlineLevel defaults to 9 and can be adjusted), nor direct recursive calls deeper than one level (adjustable via the virtual machine parameter -XX:MaxRecursiveInlineLevel).

For a direct recursive call at level 1, the following example is used:

  public static int foo(int value) {
    if (1 == value) {
      return 1;
    } else {
      return value + foo(value - 1);
    }
  }

  public static void main(String[] args) {
    for (int i = 0; i < 10000; i++) {
      foo(3);
    }
  }

Using the VM parameter -XX:+PrintInlining to print inlining during compilation shows that the recursive call is not inlined. Try raising the MaxRecursiveInlineLevel parameter and run the code again.

-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:MaxRecursiveInlineLevel=5

An inline identifier is found in the results.
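Conceptually, raising MaxRecursiveInlineLevel lets the compiler paste copies of foo into itself. A hypothetical source-level picture of one level of recursive inlining:

```java
class RecursiveInlineSketch {
  static int foo(int value) {
    if (1 == value) return 1;
    return value + foo(value - 1);
  }

  // What foo behaves like after one level of recursive inlining: the body
  // of the first recursive call is copied in; deeper calls remain real calls.
  static int fooInlinedOnce(int value) {
    if (1 == value) return 1;
    int v = value - 1;                        // argument of the inlined call
    int inner = (1 == v) ? 1 : v + foo(v - 1); // inlined body of foo(v)
    return value + inner;
  }
}
```

Both versions compute the same result; the expanded form simply trades code size for fewer call frames.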

Fourth, the just-in-time compiler determines whether a method call can be inlined based on the heat of the program path where the method call instruction resides, the number and size of the target method call, and the size of the current IR graph.

Inlining virtual methods

Before we talk about inlining virtual methods, let's recall the earlier section on how methods are called. Virtual and non-virtual methods are defined as follows:

Private methods and instance constructors invoked with the invokespecial instruction, superclass methods, static methods invoked with the invokestatic instruction, and final methods (even though they are invoked with invokevirtual) have their symbolic references resolved into direct references at class load time. These five kinds of calls are collectively referred to as "non-virtual methods"; all other methods are "virtual methods". Non-virtual methods can be inlined directly, while virtual methods require additional measures before they can be inlined.
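A quick sketch of these call kinds (hypothetical class; which instruction javac emits for each call can be checked with javap -c, and the invokespecial behavior for private methods described here is as of JDK 8):

```java
class Calls {
  private int secret()  { return 1; } // invokespecial (JDK 8): non-virtual
  static  int util()    { return 2; } // invokestatic: non-virtual
  final   int sealed()  { return 3; } // invokevirtual, but the target is unique
  int overridable()     { return 4; } // invokevirtual: truly virtual

  int callAll() { return secret() + util() + sealed() + overridable(); }
}
```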

There are two optimizations for virtual method calls: inline caching and method inlining.

As a quick reminder, inline caching is an optimization technique that speeds up dynamic binding. It caches the dynamic type of the caller at a virtual method call site, along with the target method corresponding to that type. During subsequent execution, if a cached type is encountered, the inline cache directly calls the target method for that type; otherwise, it degrades to dynamic binding based on the method table.
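The behavior of a monomorphic inline cache can be sketched in plain Java (purely conceptual, with hypothetical Shape classes — the real cache lives in JIT-emitted machine code at the call site):

```java
class Shape    { int sides() { return 0; } }
class Triangle extends Shape { int sides() { return 3; } }
class Square   extends Shape { int sides() { return 4; } }

// Conceptual monomorphic inline cache for the call site "s.sides()".
class InlineCacheSketch {
  static Class<?> cachedType; // dynamic type seen on the previous call
  static int hits, misses;

  static int sidesAt(Shape s) {
    if (s.getClass() == cachedType) {
      hits++;                      // fast path: cached type matched
    } else {
      misses++;                    // miss: fall back to dynamic binding...
      cachedType = s.getClass();   // ...and re-cache the new type
    }
    return s.sides(); // (the real cache jumps straight to the cached target)
  }
}
```

After the first (miss) call with a Triangle, repeated Triangle calls take the fast path; a Square then causes another miss and re-caches.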

The examples above are all static method calls (non-virtual method calls), and the just-in-time compiler can easily determine the unique target method.

For virtual method calls that need to be dynamically bound, the just-in-time compiler devirtualizes the virtual method call, converting it to one or more direct calls before method inlining.

Before proceeding, let's clarify the meanings of several tags:

  • inline (hot): when the -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining parameters are used to output inlining information, a method that has been inlined is tagged inline (hot). There are other tags as well; for their meanings, see the official website.

  • made not entrant: printed as part of the compilation output when the -XX:+PrintCompilation parameter is used. This tag indicates that deoptimization has occurred, which you can read about in my previous article.

  • Virtual_call, encountered when viewing assembly code using JITWatch or HSDIS, represents a virtual method call.

  • optimized virtual_call, encountered when viewing assembly code using JITWatch or HSDIS, represents an optimized virtual method call.

  • Runtime_call, encountered when viewing assembly code using JITWatch or HSDIS, represents a runtime call.

What is the relationship between optimized virtual_call and inline (hot)? Early on, I believed that an inlined virtual method must first be an optimized virtual_call — in other words, that inline (hot) implies optimized virtual_call. Articles online describe it this way, for example:

The JIT has turned the call to the methodCall method into an optimized virtual_call; a call optimized this way can then be inlined.

I searched online for a long time, but there are few descriptions of optimized virtual_call, so it cannot be treated as a criterion for method inlining. After several tests, I found virtual methods that were tagged inline (hot) yet whose assembly code showed runtime_call rather than optimized virtual_call. So I will not use it as a condition for method inlining later in this article. (If there is a connection between the two, please let me know — thank you.)

Let’s get back to how the Java virtual machine solves the inlining problem of virtual methods.

The just-in-time compiler's devirtualization can be classified into complete devirtualization and guarded (conditional) devirtualization.

Complete devirtualization identifies the unique target method of a virtual method call through class hierarchy analysis and converts the call into a direct call. The key is to prove that the target method of the virtual method call is unique.

Conditional devirtualization converts a virtual method call into several type tests plus direct calls. The key is to determine which types to compare against.

When it comes to dynamic binding, inheritance comes naturally to mind, so we’ll construct code like this:

public abstract class BinaryOp {
  public abstract int apply(int a, int b);
}

public class Add extends BinaryOp {
  public int apply(int a, int b) {
    return a + b;
  }
}

Complete devirtualization based on class hierarchy analysis

To address the inlining of virtual methods, the Java virtual machine first introduced a technique called Class Hierarchy Analysis (CHA), an application-wide type analysis used to determine, for the currently loaded classes, information such as whether an interface has more than one implementation, whether a class has subclasses, and whether a subclass overrides a virtual method of its parent class.

Complete devirtualization based on type inference deduces the dynamic type of the caller through data-flow analysis to determine the concrete target method.

  public static int foo() {
    BinaryOp op = new Add();
    return op.apply(2, 1);
  }

  public static int bar(BinaryOp op) {
    op = (Add) op;
    return op.apply(2, 1);
  }

The foo and bar methods in the above code both call apply and the caller’s declaration type is BinaryOp.

Since we are not using IR graphs, let's look at the bytecode instead.

public static int foo();
  descriptor: ()I
  flags: ACC_PUBLIC, ACC_STATIC
  Code:
    stack=3, locals=1, args_size=0
       0: new           #2  // class com/msdn/java/javac/jit/Add
       3: dup
       4: invokespecial #3  // Method com/msdn/java/javac/jit/Add."<init>":()V
       7: astore_0
       8: aload_0
       9: iconst_2
      10: iconst_1
      11: invokevirtual #5  // Method com/msdn/java/javac/jit/BinaryOp.apply:(II)I
      14: ireturn
    LineNumberTable:
      line 19: 0
      line 20: 8

public static int bar(com.msdn.java.javac.jit.BinaryOp);
  descriptor: (Lcom/msdn/java/javac/jit/BinaryOp;)I
  flags: ACC_PUBLIC, ACC_STATIC
  Code:
    stack=3, locals=1, args_size=1
       0: aload_0
       1: checkcast     #2  // class com/msdn/java/javac/jit/Add
       4: astore_0
       5: aload_0
       6: iconst_2
       7: iconst_1
       8: invokevirtual #5  // Method com/msdn/java/javac/jit/BinaryOp.apply:(II)I
      11: ireturn

In the bytecode of the foo method there is a key new instruction, while the bar method contains a checkcast instruction, which can simply be viewed as an exact type cast. Since in both cases the receiver is known to be precisely of type Add, the invokevirtual instruction, although it nominally points to BinaryOp.apply, is recognized during parsing as a call to Add.apply.

To sum up, because the just-in-time compiler can determine the object's exact type, it can devirtualize directly: even though the call uses the invokevirtual instruction, the unique target method can be determined and method inlining achieved.

In the bar method, we force-cast the op object to type Add, which guarantees uniqueness. So if there is no explicit cast, can the call still be inlined?

  public static int test(BinaryOp op) {
    return op.apply(2, 1);
  }

  // main test method
  public static void main(String[] args) {
    Add add = new Add();
    for (int i = 0; i < 20000; i++) {
      test(add);
    }
  }

The output compilation results are as follows, with both the test and apply methods triggering just-in-time compilation.

    632   48       3       com.msdn.java.javac.jit.VirtualTest::test (7 bytes)
    632   49       3       com.msdn.java.javac.jit.Add::apply (4 bytes)
    632   50       1       com.msdn.java.javac.jit.Add::apply (4 bytes)
    632   51       4       com.msdn.java.javac.jit.VirtualTest::test (7 bytes)
    632   49       3       com.msdn.java.javac.jit.Add::apply (4 bytes)   made not entrant
    633   48       3       com.msdn.java.javac.jit.VirtualTest::test (7 bytes)   made not entrant

Unsurprisingly, the apply method is also inlined.

How would things change if BinaryOp had another subclass? Add the Sub class first, and then test.

public class Sub extends BinaryOp {
  @Override
  public int apply(int a, int b) {
    return a - b;
  }
}

With the main method unchanged, we continue to call the test method, this time also printing class loading. You can see that the Java virtual machine loads only Add, so BinaryOp.apply has only one concrete implementation, Add.apply. Therefore, when the just-in-time compiler encounters a call to BinaryOp.apply, it directly inlines the body of Add.apply.

If you initialize both Add and Sub classes, what does the compiler do?

Conditional devirtualization

By the definition of class hierarchy analysis, that technique can no longer handle the case where both Add and Sub exist and BinaryOp.apply has two concrete implementations. We must then rely on conditional devirtualization, which converts virtual method calls into direct calls by adding several type comparisons to the code.

The idea is simple: the dynamic type of the caller is compared in turn against the types recorded in the type profile collected by the Java virtual machine. If one matches, the target method corresponding to the recorded type is called directly.

  public static int test(BinaryOp op) {
    return op.apply(2, 1);
  }

Let's continue with the previous example. Assuming the type profile records the caller's two types, Sub and Add, the just-in-time compiler can use this for conditional devirtualization: compare the caller's dynamic type against Sub and Add in turn, and inline the corresponding methods. The pseudocode is as follows:

  public static int test(BinaryOp op) {
    if (op.getClass() == Sub.class) {
      return 2 - 1; // inlined Sub.apply
    } else if (op.getClass() == Add.class) {
      return 2 + 1; // inlined Add.apply
    } else {
      ... // What if the dynamic type is not in the type profile?
    }
  }

If the record in the type Profile matches the corresponding object type, the just-in-time compiler can do method inlining. Let’s see what happens:

  public static void main(String[] args) {
    Add add = new Add();
    Sub sub = new Sub();
    for (int i = 0; i < 20000; i++) {
      test(add);
      test(sub);
    }
  }

The compiled result is:

Look directly at the inline result as follows:

If the caller’s dynamic type does not match after iterating through all the records in the type Profile, the just-in-time compiler has two options.

First, if the type profile is complete — that is, every dynamic type that has ever occurred is recorded in it — then the just-in-time compiler can deoptimize the program and collect the type profile again.

Second, if the type profile is incomplete — that is, some dynamic types that have occurred are not recorded — then re-collection is of little use. In that case, the just-in-time compiler can have the program perform the original virtual call, either through the inline cache or via dynamic binding through the method table.

As for the two cases above, I found only one way to test: disable compilation of the apply method in the Sub class, so that the type profile does not contain Sub's information. Also, since compilation is disabled manually, the compiler does not collect profile information for it.

Again based on the above test code, but with the following parameters:

-XX:CompileCommand=exclude,com/msdn/java/javac/jit/Sub::apply -XX:+PrintCompilation

The compilation result is as follows:

We also use the PrintInlining argument to output the inline information.

Extension

De-optimizations due to class loading

Add and Sub are initialized at the same time. If you change the location of the Sub initialization, something unexpected can happen.

public static void main(String[] args) throws InterruptedException {
  Add add = new Add();
  for (int i = 0; i < 20000; i++) {
    test(add);
  }
  Thread.sleep(2000); 
  System.out.println("Loading Sub");
  Sub sub = new Sub();
}

Output compilation results found:

The Sub class is loaded after the test method has been called. The compilation output shows that the test method is deoptimized because of class loading. As to why: the call to the apply method inside test had been devirtualized into a call to Add.apply. When the Java virtual machine later loads the Sub class, that assumption becomes invalid, so the Java virtual machine must deoptimize the compiled result of the test method.
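Conceptually, the compiled test method rests on the CHA assumption and is discarded when the assumption breaks. A hypothetical source-level picture (re-declaring BinaryOp and Add here for self-containment):

```java
abstract class BinaryOp { abstract int apply(int a, int b); }
class Add extends BinaryOp { int apply(int a, int b) { return a + b; } }

class ChaSketch {
  // What the compiled test() effectively does while CHA says "Add is the
  // only implementation of BinaryOp": no type check, the call fully inlined.
  static int testCompiled(BinaryOp op) {
    return 2 + 1; // inlined Add.apply(2, 1)
  }

  // When Sub is loaded later, the assumption is invalid: the JVM marks the
  // compiled code "made not entrant" and execution falls back to something
  // like the version below until the method is recompiled.
  static int testInterpreted(BinaryOp op) {
    return op.apply(2, 1);
  }
}
```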

A little bit confused

We already created the Add and Sub classes, so let’s create a Mul class that also inherits BinaryOp.

public class Mul extends BinaryOp {

  @Override
  public int apply(int a, int b) {
    return a * b;
  }
}

// Test code
public static void main(String[] args) {
  Add add = new Add();
  Sub sub = new Sub();
  Mul mul = new Mul();
  for (int i = 0; i < 20000; i++) {
    test(add);
    test(sub);
    test(mul);
  }
}

Also output the compilation result, as shown below:

Continue to look at inline messages

As a result, there is no inlining information, so calls to the apply method are either made through the inline cache or dynamically bound through the method table.

After further testing, when the number of loop iterations was increased to 400,000, the apply method was inlined again, and the compilation output showed that the main method also triggered just-in-time compilation. I don't know the details of why this happens; if you do, please share.

Conclusion

Method inlining refers to the optimization method that, when a method call is encountered during compilation, the body of the target method is included in the compilation scope and replaced by the original method call.

Method inlining has many rules. Besides rules that force inlining and rules that forbid it, the just-in-time compiler decides whether a method call can be inlined based on the depth of the call chain, the heat of the program path containing the call instruction, and the number and size of the target method's invocations.

Complete devirtualization transforms virtual method calls into direct calls through class hierarchy analysis. The key is to prove that the target method of the virtual method call is unique.

Conditional devirtualization transforms virtual method calls into individual type tests and direct calls by adding type comparisons to the code. It makes use of the type profile collected by the Java virtual machine.

Two questions remain. First, what is the relationship between optimized virtual_call and inline (hot), and what scenario triggers an optimized virtual_call? Second, regarding the second point in the virtual method inlining extension above, I still have doubts about the results of my test cases.

Other skills

JITWatch

JITWatch is a visual tool for viewing JIT behavior.

download

Clone the project from Github and run it.

mvn clean compile test exec:java
# or with Gradle:
./gradlew clean build run

When I tested it, the JITWatch window that opened after running showed garbled characters, so I had to work around it. (Perhaps only I ran into this problem; the method below is the easiest, and I recommend trying it.)

2. Download the jitwatch.sh file from pan.baidu.com/s/1i3HxFDF.

Decompress the downloaded package to obtain the following information:

Modify the jitwatch.sh file:

JITWATCH_HOME="/Users/xxx/Downloads/JITWatch/jitwatch-master/lib"
JITWATCH_JAR="/Users/xxx/Downloads/JITWatch/jitwatch-1.0.0-SNAPSHOT.jar"

java -cp $JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/jfxrt.jar:$JITWATCH_JAR:$JITWATCH_HOME/hamcrest-core-1.3.jar:$JITWATCH_HOME/logback-classic-1.1.2.jar:$JITWATCH_HOME/logback-core-1.1.2.jar:$JITWATCH_HOME/slf4j-api-1.7.7.jar org.adoptopenjdk.jitwatch.launch.LaunchUI

Then open the cli and run the following command:

JITWatch % ./jitwatch.sh

For usage records, please refer to the following articles:

Test the jitWatch for the first time

Java assembly instruction viewer tool jitWatch

The JIT Profile artifact JITWatch

References

Geek time Zheng Yudi “In-depth disassembly of Java virtual machine” method inline

Basic skills | Java compiler principle analysis and practice

In-depth Understanding of the Java Virtual Machine Third Edition