This is the nineteenth article in the Learn More About the JVM series.

Besides the method inlining described in the previous article, the just-in-time (JIT) compiler has another advanced optimization technique: escape analysis. Let’s cut to the chase.

Escape analysis

First of all, escape analysis is not itself an optimization. It is an analysis technique that provides the basis for other optimizations by determining the dynamic scope of objects. More specifically:

Escape analysis is “a static analysis that determines the dynamic range of a pointer by analyzing where a pointer can be accessed in a program.” The just-in-time compiler of the Java virtual machine performs escape analysis on newly created objects to determine whether they escape from a thread or a method. It decides that an object has escaped based on two criteria:

  1. The object is stored in the heap (in a static field, or in an instance field of an object that is already in the heap). Once an object is stored in the heap, other threads can obtain a reference to it, and the compiler can no longer track every code location that uses it.

    Simply put, an object assigned to a class variable or instance variable can be accessed by other threads. This is called thread escape, and it raises thread-safety concerns.

  2. The object is passed into unknown code. The just-in-time compiler treats every method call that is not inlined as unknown code, because it cannot be sure that the call will not store the receiver (the object the method is called on) or the arguments in the heap. In this case, the receiver and the arguments of the call are considered to have escaped.

    For example, an object defined in a method may be returned to the caller or passed as an argument to another method. This is called method escape.
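The first criterion can be made concrete with a small sketch. The class name, the static field `last`, and the list `REGISTRY` are hypothetical and exist only for illustration. Whether the JIT really treats the third method as non-escaping also depends on `length()` being inlined, which this sketch simply assumes.

```java
import java.util.ArrayList;
import java.util.List;

class ThreadEscapeSketch {
    // Hypothetical shared state, introduced only for this illustration.
    static final List<StringBuilder> REGISTRY = new ArrayList<>();
    static StringBuilder last;

    // The builder is written to a static field, so any thread may read it later.
    static void publishToStaticField(String s) {
        StringBuilder sb = new StringBuilder(s);
        last = sb;
    }

    // The builder is stored inside an object that already lives on the heap.
    static void publishToHeapObject(String s) {
        StringBuilder sb = new StringBuilder(s);
        REGISTRY.add(sb);
    }

    // The builder stays local; only an int leaves the method.
    static int staysLocal(String s) {
        StringBuilder sb = new StringBuilder(s);
        return sb.length();
    }

    public static void main(String[] args) {
        publishToStaticField("a");
        publishToHeapObject("b");
        System.out.println(last + " " + REGISTRY.size() + " " + staysLocal("abc")); // prints: a 1 3
    }
}
```

In the first two methods the builder becomes reachable from other threads, so the compiler has to treat it as escaped. In the third, nothing but a primitive value leaves the method.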

Method escape can be demonstrated with an example:

// The StringBuffer object escapes the method: it is returned to the caller
public static StringBuffer createStringBuffer(String s1, String s2) {
  StringBuffer sb = new StringBuffer();
  sb.append(s1);
  sb.append(s2);
  return sb;
}

// The StringBuffer object does not escape: only the resulting String is returned
public static String createString(String s1, String s2) {
  StringBuffer sb = new StringBuffer();
  sb.append(s1);
  sb.append(s2);
  return sb.toString();
}

I had hoped to use the JVM itself to show whether an object has escaped. In theory, the StringBuffer in createStringBuffer above escapes, but we cannot see what the compiler actually decides. The JVM does have a PrintEscapeAnalysis flag that prints the analysis results, but it is available only in debug builds of the JDK, and after several attempts I was unable to build one. Looking at the actual results of escape analysis will have to wait until we learn more about JVM tuning.

Optimization based on escape analysis

Based on the results of escape analysis, the just-in-time compiler can perform optimizations such as synchronization elimination, stack allocation, and scalar replacement.

Sync elimination (lock elimination)

Thread synchronization is expensive. If escape analysis shows that an object does not escape its thread and therefore cannot be accessed by other threads, there is no contention on reads and writes of that object, and the JIT compiler can remove the locks taken on it. This is controlled by -XX:+EliminateLocks (enabled by default). Removing synchronization in this way is called synchronization elimination, also known as lock elimination.

Let’s use an example to illustrate this situation and see when thread synchronization is required.

Start by defining a Worker class:

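import lombok.Getter;
import lombok.Setter;

@Getter
@Setter // setMoney() is used by the ScalarTest example later in this article
public class Worker {

  private String name;
  private double money;

  public Worker() {}

  public Worker(String name) {
    this.name = name;
  }

  // Not synchronized: concurrent calls can lose updates
  public void makeMoney() {
    money++;
  }
}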

The test code is as follows:

public class SynchronizedTest {

  public static void work(Worker worker) {
    worker.makeMoney();
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();

    Worker worker = new Worker("hresh");

    // Both threads update the same Worker without synchronization
    new Thread(() -> {
      for (int i = 0; i < 20000; i++) {
        work(worker);
      }
    }, "A").start();

    new Thread(() -> {
      for (int i = 0; i < 20000; i++) {
        work(worker);
      }
    }, "B").start();

    long end = System.currentTimeMillis();
    System.out.println(end - start);
    Thread.sleep(100);

    System.out.println(worker.getName() + " earned a total of " + worker.getMoney());
  }
}

The result is as follows:

52
hresh earned a total of 28224.0

As the output shows, the two threads modify the money field of the same Worker object concurrently. The unsynchronized reads and writes of money race with each other, so the final total falls short of the expected 40000. In a case like this, where escape analysis determines that the object has escaped, synchronization elimination cannot be applied.
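For contrast, here is a minimal sketch of the same experiment with the shared object properly synchronized. `SafeWorker` is a hypothetical stand-in for the Worker class above, and `join()` is used instead of a fixed sleep so that both threads are known to have finished before the total is read.

```java
class SharedCounterSketch {
    // Hypothetical stand-in for the article's Worker class, with synchronized access.
    static final class SafeWorker {
        private double money;

        synchronized void makeMoney() {
            money++;
        }

        synchronized double getMoney() {
            return money;
        }
    }

    // Runs two threads that each increment the shared worker perThread times.
    static double run(int perThread) {
        SafeWorker worker = new SafeWorker();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                worker.makeMoney();
            }
        };
        Thread a = new Thread(task, "A");
        Thread b = new Thread(task, "B");
        a.start();
        b.start();
        try {
            // Wait for both threads instead of sleeping for a fixed time.
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return worker.getMoney();
    }

    public static void main(String[] args) {
        System.out.println(run(20000)); // prints 40000.0
    }
}
```

Here the worker is shared between threads, so it escapes and the locks are genuinely needed. The compiler must keep them.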

Now let’s try an object that does not escape.

// JVM parameters: -Xms60M -Xmx60M -XX:+PrintGCDetails -XX:+PrintGCDateStamps
public class SynchronizedTest {

  public static void lockTest() {
    // The Worker never leaves this method, so no other thread can lock it
    Worker worker = new Worker();
    synchronized (worker) {
      worker.makeMoney();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();

    new Thread(() -> {
      for (int i = 0; i < 500000; i++) {
        lockTest();
      }
    }, "A").start();

    new Thread(() -> {
      for (int i = 0; i < 500000; i++) {
        lockTest();
      }
    }, "B").start();

    long end = System.currentTimeMillis();
    System.out.println(end - start);
  }
}

The following output is displayed:

56
Heap
 PSYoungGen      total 17920K, used 9554K [0x00000007bec00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 15360K, 62% used [0x00000007bec00000,0x00000007bf5548a8,0x00000007bfb00000)
  from space 2560K, 0% used [0x00000007bfd80000,0x00000007bfd80000,0x00000007c0000000)
  to   space 2560K, 0% used [0x00000007bfb00000,0x00000007bfb00000,0x00000007bfd80000)
 ParOldGen       total 40960K, used 0K [0x00000007bc400000, 0x00000007bec00000, 0x00000007bec00000)
  object space 40960K, 0% used [0x00000007bc400000,0x00000007bc400000,0x00000007bec00000)
 Metaspace       used 4157K, capacity 4720K, committed 4992K, reserved 1056768K
  class space    used 467K, capacity 534K, committed 640K, reserved 1048576K

Locking the newly created Worker object in lockTest serves no practical purpose. Because escape analysis determines that the object does not escape, the lock is eliminated. Escape analysis is enabled by default in JDK 8; let’s turn it off and compare the output.

-Xms60M -Xmx60M  -XX:-DoEscapeAnalysis -XX:+PrintGCDetails -XX:+PrintGCDateStamps

The output becomes:

73
2022-03-01T14:51:08.825-0800: [GC (Allocation Failure) [PSYoungGen: ...] ...] [Times: user=0.01 sys=0.00, real=0.00 secs]
Heap
 PSYoungGen      total 17920K, used 16340K [0x00000007bec00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 15360K, 97% used [0x00000007bec00000,0x00000007bfa8d210,0x00000007bfb00000)
  from space 2560K, 56% used [0x00000007bfb00000,0x00000007bfc67f00,0x00000007bfd80000)
  to   space 2560K, 0% used [0x00000007bfd80000,0x00000007bfd80000,0x00000007c0000000)
 ParOldGen       total 40960K, used 8K [0x00000007bc400000, 0x00000007bec00000, 0x00000007bec00000)
  object space 40960K, 0% used [0x00000007bc400000,0x00000007bc402000,0x00000007bec00000)
 Metaspace       used 4153K, capacity 4688K, committed 4864K, reserved 1056768K
  class space    used 466K, capacity 502K, committed 512K, reserved 1048576K

Comparing the two runs, with escape analysis disabled the reported time is longer, heap usage is higher, and a garbage collection occurs.

However, lock elimination based on escape analysis rarely applies in practice. Developers do not normally lock an object they have just constructed inside a method; as the lockTest example shows, such a lock is pointless.
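One place where a lock on a freshly constructed object does appear without the developer writing `synchronized` is library code whose methods are themselves synchronized, such as `StringBuffer.append` and `StringBuffer.toString`. The sketch below mirrors the createString method from earlier. The class name is hypothetical, and whether the locks are actually removed depends on the JIT inlining these calls, which is assumed here rather than verified.

```java
class ImplicitLockSketch {
    // StringBuffer's append and toString are synchronized, so each call
    // acquires the buffer's monitor even though the buffer is purely local.
    static String join(String a, String b) {
        StringBuffer sb = new StringBuffer();
        sb.append(a);
        sb.append(b);
        return sb.toString();
    }

    public static void main(String[] args) {
        String last = "";
        // Call the method often enough for the JIT compiler to consider it hot.
        for (int i = 0; i < 1_000_000; i++) {
            last = join("k", Integer.toString(i));
        }
        System.out.println(last); // prints k999999
    }
}
```

Running this with and without -XX:-EliminateLocks is one way to look for a difference, although any timing result will vary by machine and JDK version.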

In practice, the results of escape analysis are more often used to turn object allocations into stack allocation or scalar replacement.

Scalar replacement

When explaining the memory layout of Java objects, I mentioned that objects in the Java virtual machine are allocated on the heap, and that the contents of the heap are mostly visible to any thread (TLABs being the exception). The Java virtual machine also has to manage the allocated heap memory and reclaim the space occupied by objects once they are no longer referenced.

If escape analysis proves that a newly created object does not escape, the Java virtual machine could instead allocate it on the stack and reclaim the memory automatically by popping the current method’s stack frame when the method containing the new expression returns. The garbage collector would then never have to deal with such objects.

However, HotSpot does not implement true stack allocation; it uses scalar replacement instead.

A scalar is a piece of data that holds a single value and cannot be decomposed into smaller parts. The primitive types of the Java virtual machine (numeric types such as int and long, as well as reference types) are scalars, and so are local variables of those types. Data that can be decomposed further is called an aggregate; Java objects are the classic example, since one object may hold several field values.

Scalar replacement is an optimization that can be thought of as replacing accesses to an object’s fields with accesses to local variables.

Consider the following example:

public class ScalarTest {

  public static double getMoney() {
    Worker worker = new Worker();
    worker.setMoney(100.0);
    return worker.getMoney() + 20;
  }

  public static void main(String[] args) {
    getMoney();
  }
}

Escape analysis shows that the Worker object does not escape from getMoney(), so the aggregate Worker can be decomposed into the local variable money. After scalar replacement, the code is equivalent to the following pseudocode:

public class ScalarTest {

  public static double getMoney() {
    double money = 100.0;
    return money + 20;
  }

  public static void main(String[] args) {
    getMoney();
  }
}

Once the object has been decomposed, its fields become local variables of the method; they can live on the stack or directly in registers. Because no object needs to be created, scalar replacement also reduces the load on the garbage collector.

In addition, scalar replacement can be switched on explicitly with -XX:+EliminateAllocations (it is enabled by default), and -XX:+PrintEliminateAllocations (which also requires a debug build of the JDK) prints the allocations that were replaced.
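Without a debug build, one indirect way to look for scalar replacement is to measure how many bytes a thread allocates while running a hot method. The sketch below assumes a HotSpot-based JDK where `com.sun.management.ThreadMXBean` is available; the `Point` class and all method names are hypothetical. The number it prints depends on the JDK, the flags, and the warm-up, so no expected value is given.

```java
import java.lang.management.ManagementFactory;

class AllocationProbe {
    // Hypothetical two-field aggregate used only for this illustration.
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    // Each iteration creates a Point that never leaves the loop body.
    static double sum(int n) {
        double total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            total += p.x + p.y;
        }
        return total;
    }

    // Bytes allocated by the current thread while running the task, or -1 if unavailable.
    static long allocatedBytes(Runnable task) {
        java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!(bean instanceof com.sun.management.ThreadMXBean)) {
            return -1;
        }
        com.sun.management.ThreadMXBean hotspotBean = (com.sun.management.ThreadMXBean) bean;
        long id = Thread.currentThread().getId();
        long before = hotspotBean.getThreadAllocatedBytes(id);
        task.run();
        long after = hotspotBean.getThreadAllocatedBytes(id);
        return (before < 0 || after < 0) ? -1 : after - before;
    }

    public static void main(String[] args) {
        // Warm up so that the JIT compiler has a chance to compile sum().
        for (int i = 0; i < 20_000; i++) {
            sum(1_000);
        }
        long bytes = allocatedBytes(() -> sum(1_000_000));
        System.out.println("allocated bytes: " + bytes);
    }
}
```

Comparing the printed value for a default run against a run with -XX:-EliminateAllocations or -XX:-DoEscapeAnalysis gives a rough indication of whether the Point allocations were removed.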

Stack allocation

The name suggests that HotSpot allocates objects on the stack, but HotSpot does not currently do so; what it actually performs is scalar replacement.

In general, memory for objects and array elements is allocated on the heap. As JIT compilers have matured, however, optimizations have made this rule less absolute. Based on the results of escape analysis, the JIT compiler can decide at compile time whether an object needs to be created at all and whether a heap allocation can be replaced by a stack allocation.

Partial escape analysis

C2’s escape analysis does not take control flow into account, which keeps it relatively simple. Graal introduces a control-flow-sensitive variant called partial escape analysis, which handles the case where a newly created instance escapes only on some paths through the program.

Consider the following code:

public static void bar(boolean cond) {
  Object foo = new Object();
  if (cond) {
    foo.hashCode();
  }
}

// Can be manually optimized to:
public static void bar(boolean cond) {
  if (cond) {
    Object foo = new Object();
    foo.hashCode();
  }
}

Assuming that the condition of the if statement is true only 1% of the time, there is no need for the program to create new objects 99% of the time. A hand-optimized version of this is exactly what partial escape analysis is intended to achieve automatically.

Using control-flow information, partial escape analysis determines that the new object escapes only on some branches and moves the allocation into the branches where it escapes. The allocation, which cannot be eliminated entirely because the object does escape there, then no longer appears on the paths that skip those branches.
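The same idea can be sketched for an object that carries fields. In this hand-written illustration, the `Event` class and the static field `lastRare` are hypothetical, and the second method is only a conceptual picture of the transformation, not actual Graal output. The fields are kept as plain values on the common path, and the object is created only on the branch where it escapes.

```java
class PartialEscapeSketch {
    // Hypothetical aggregate used only for this illustration.
    static final class Event {
        final int id;
        final long timestamp;

        Event(int id, long timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }
    }

    // Hypothetical sink: storing an Event here makes it escape.
    static Event lastRare;

    // As written by the developer: the Event is always allocated.
    static long original(int id, long timestamp, boolean rare) {
        Event e = new Event(id, timestamp);
        if (rare) {
            lastRare = e; // the object escapes only on this branch
        }
        return e.id + e.timestamp;
    }

    // Conceptual result: the Event is created only where it escapes,
    // and the common path works directly on the field values.
    static long afterPartialEscape(int id, long timestamp, boolean rare) {
        if (rare) {
            lastRare = new Event(id, timestamp);
        }
        return id + timestamp;
    }

    public static void main(String[] args) {
        long a = original(1, 41L, false);
        long b = afterPartialEscape(1, 41L, false);
        System.out.println(a + " " + b); // prints: 42 42
    }
}
```

Both methods return the same value and leave the same object in `lastRare` when the rare branch is taken; only the common path differs in whether it allocates.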

We verify this optimization indirectly with a complete test case.

public class PartialEscapeTest {
  long placeHolder0;
  long placeHolder1;
  long placeHolder2;
  long placeHolder3;
  long placeHolder4;
  long placeHolder5;
  long placeHolder6;
  long placeHolder7;
  long placeHolder8;
  long placeHolder9;
  long placeHoldera;
  long placeHolderb;
  long placeHolderc;
  long placeHolderd;
  long placeHoldere;
  long placeHolderf;

  public static void foo(boolean flag) {
    PartialEscapeTest o = new PartialEscapeTest();
    if (flag) {
      o.hashCode();
    }
  }

  public static void main(String[] args) {
    for (int i = 0; i < 1000000; i++) {
      foo(false);
    }
  }
}

This test uses JDK 11. The Graal compiler is enabled with the following options:

-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler

Run the test first with the C2 compiler and then with the Graal compiler, printing GC logs in both cases:

java -Xlog:gc* PartialEscapeTest
java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -Xlog:gc* PartialEscapeTest

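Comparing the GC logs, the heap usage reported at exit differs between the two runs: 21504K under C2 and 17234K in the Graal run. Note, however, that the second log also records a young collection and the message “Cannot use JVMCI compiler: No JVMCI compiler found”, so on this JDK the difference may not be attributable to Graal.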

C2

[0.012s][info][gc,heap] Heap region size: 1M
[0.017s][info][gc] Using G1
[0.017s][info][gc,heap,coops] Heap address: 0x0000000700000000, size: 4096 MB, Compressed Oops mode: Zero based, Oop shift amount: 3
[0.345s][info][gc,heap,exit] Heap
[0.345s][info][gc,heap,exit]  garbage-first heap   total 262144K, used 21504K [0x0000000700000000, 0x0000000800000000)
[0.345s][info][gc,heap,exit]   region size 1024K, 18 young (18432K), 0 survivors (0K)
[0.345s][info][gc,heap,exit]  Metaspace       used 6391K, capacity 6449K, committed 6784K, reserved 1056768K
[0.345s][info][gc,heap,exit]   class space    used 552K, capacity 571K, committed 640K, reserved 1048576K

Graal

[0.019s][info][gc,heap] Heap region size: 1M
[0.025s][info][gc] Using G1
[0.025s][info][gc,heap,coops] Heap address: 0x0000000700000000, size: 4096 MB, Compressed Oops mode: Zero based, Oop shift amount: 3
[0.611s][info][gc,start] GC(0) Pause Young (Normal) (G1 Evacuation Pause)
[0.612s][info][gc,task] GC(0) Using 6 workers of 10 for evacuation
[0.615s][info][gc,phases] GC(0)   Pre Evacuate Collection Set: ...
[0.615s][info][gc,phases] GC(0)   Evacuate Collection Set: ...
[0.615s][info][gc,phases] GC(0)   Post Evacuate Collection Set: 0.1ms
[0.615s][info][gc,phases] GC(0)   Other: 0.6ms
[0.615s][info][gc,heap] GC(0) Eden regions: 24->0(150)
[0.615s][info][gc,heap] GC(0) Survivor regions: 0->3(3)
[0.615s][info][gc,heap] GC(0) Old regions: 0->4
[0.615s][info][gc,heap] GC(0) Humongous regions: ...
[info][gc,metaspace] GC(0) Metaspace: 8327K->8327K(1056768K)
[0.615s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 29M->11M(256M) 3.941ms
[info][gc,cpu] GC(0) User=0.01s Sys=0.01s Real=0.00s
Cannot use JVMCI compiler: No JVMCI compiler found
[0.616s][info][gc,heap,exit] Heap
[0.616s][info][gc,heap,exit]  garbage-first heap   total 262144K, used 17234K [0x0000000700000000, 0x0000000800000000)
[0.616s][info][gc,heap,exit]   region size 1024K, 9 young (9216K), 3 survivors (3072K)
[0.616s][info][gc,heap,exit]  Metaspace       used 8336K, capacity 8498K, committed 8832K, reserved 1056768K
[0.616s][info][gc,heap,exit]   class space    used 768K, capacity 802K, committed 896K, reserved 1048576K

To see which methods are compiled when Graal is enabled on JDK 11, run the following command:

java -XX:+PrintCompilation -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -cp /Users/xxx/IdeaProjects/java_deep_learning/src/main/java/com/msdn/java/javac/escape ScalarTest > out-jvmci.txt

Conclusion

This article introduced escape analysis in the just-in-time compilers of the Java virtual machine and the optimizations built on it: synchronization elimination, scalar replacement, and stack allocation. It also took a brief look at partial escape analysis in the Graal compiler.

Reference

Zheng Yudi, “In-Depth Dissection of the Java Virtual Machine” (Geek Time), chapter on escape analysis