This is the twenty-second article in the In-depth Learning ABOUT the JVM series

There are three main attributes that affect garbage collection performance, and there are three basic principles of garbage collection tuning, and the information that needs to be collected during garbage collection tuning. If you want to tune garbage collection, you need to make trade-offs between different attributes, understand the tuning principles, and what information to collect.

Performance attributes

throughput

Throughput is one of the important indicators to evaluate the capability of the garbage collector. It refers to the amount of work that can be performed per unit of time, regardless of the pause time or memory consumption caused by garbage collection. Typically, increased throughput comes at the expense of increased latency and/or memory footprint.

Generic requirements for Throughput look like this:

  • Must handle a million orders a day;
  • 2,500 transactions should be executed per second

As you can see, the throughput requirement is not for a single operation, but for how many operations the system must complete in a given amount of time.

delay

Latency is a measure of the time elapsed between an application receiving an incentive to perform some work and the completion of that work. Latency requirements ignore throughput. Often, increased responsiveness or reduced latency comes at the expense of reduced throughput and/or increased memory footprint.

Delay metrics are usually described as follows:

  • The application should execute the trade request within 60 milliseconds to complete
  • 90% of order payment operations must be processed within 3 seconds

This type of performance metric ensures that GC pauses cannot be used temporarily for too long while the program is working, or the metric will not be met.

Memory footprint

Memory footprint is a measure of the amount of memory required to run an application at some throughput, some latency, and/or some level of availability and manageability. Memory footprint is typically expressed as the number of Java heaps required to run the application and/or the total amount of memory required to run the application. In general, increasing memory footprint by increasing the Java heap size can improve throughput or reduce latency, or both. Throughput or latency is often sacrificed as the memory available to an application decreases.

For example, if the system is deployed on a server with less than 1 GB memory, computing resources cannot be squandered at will. Therefore, the memory usage must meet the latency and throughput requirements.

The principle of

  • Minor GC collection principles. Each Minor GC collects as many garbage objects as possible, reducing the frequency of Full GC. Full GC duration is always the longest, affecting throughput and latency performance.
  • GC memory maximization principle. Under budget, the larger the heap, the better the throughput and latency performance. This also means that the new generation can be resized to better cope with the creation rate of short-lived objects and reduce the number of allocations promoted to the old age.
  • Choose two from three. Choose any two of the above three performance attributes (throughput, latency, memory footprint) for JVM garbage collection tuning.

Regarding what information needs to be collected for GC tuning, and how to view it, we will first look at GC logs from the command line, followed by visual tools to capture key information.

GC Log Analysis

We need to pre-set the GC log with the JVM parameter (default: JDK8), which usually has the following JVM parameter Settings:

-xx :+PrintGC prints GC logs -xx :+PrintGCDetails Prints GC logs -xx :+PrintGCTimeStamps Prints GC timestamps (in the form of baseline time) -xx :+PrintGCDateStamps Outputs the timestamp of the GC (in the form of a date, for example2013-05-04T21:53:59.234+0800) -xx :+PrintHeapAtGC prints heap information before and after GC. -xx :+PrintReferenceGC prints soft references, weak references, virtual references, and Finallize queues. -xloggc :.. /logs/gc.log Output path of the log fileCopy the code

The following parameters are used to print the log, based on JDK8, using the Parallel Scavenge collector and Parallel Old collector by default.

@Data
@AllArgsConstructor
@NoArgsConstructor
public class Goods {

  private String name;
  private Double price;
  private String[] types;
}

//-Xms60M -Xmx60M -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
public class OutOfMemoryTest {

  public static void main(String[] args) throws InterruptedException {
    Map<String, Goods> map = new HashMap<>();
    int counter = 1;
    while (true) {
      Thread.sleep(10);
      Goods goods = new Goods();
      String[] types = new String[counter];
      for (int i = 0; i < types.length; i++) {
        types[i] = "type" + i;
      }
      goods.setName("hresh" + counter);
      goods.setPrice(Double.valueOf(counter));
      goods.setTypes(types);
      map.put(goods.getName(), goods);
// if (counter % 100 == 0) {
// System.out.println("put" + counter);
/ /}counter++; }}}Copy the code

Part of the Full GC log is as follows:

16.619: [Full GC (Ergonomics) [PSYoungGen: 15359K->15359K(17920K)] 40924K->40924K(40960K)] 56284K->56284K(58880K), [Metaspace: 3768K->3768K(1056768K)], 0.0320617 secs] [Times: User =0.31 sys=0.00, real= 0.03secs] 16.651: [Full GC (Ergonomics) [PSYoungGen: 15359K->15359K(17920K)] [ParOldGen: 40926K->40926K(40960K)] 56286K->56286K(58880K), [Metaspace: 3768K->3768K(1056768K)], 0.0337364 secs] [Times: User =0.31 sys=0.00, real= 0.03secs] 16.685: [Full GC (Ergonomics) [PSYoungGen: 15359K->15359K(17920K)] [ParOldGen: 40927K->40927K(40960K)] 56287K->56287K(58880K), [Metaspace: 3768K->3768K(1056768K)], 0.0322237 secs] [Times: User =0.30 sys=0.00, real= 0.03secs] 16.717: [Full GC (Ergonomics) [PSYoungGen: 15359K->15359K(17920K)] [ParOldGen: 40929K->40929K(40960K)] 56289K->56289K(58880K), [Metaspace: 3768K->3768K(1056768K)], 0.0314023 secs] [Times: User =0.30 sys=0.01, real= 0.04secs] 16.749: [Full GC (Ergonomics) [PSYoungGen: 15359K->15359K(17920K)] [ParOldGen: 40931K->40931K(40960K)] 56291K->56291K(58880K), [Metaspace: 3768K->3768K(1056768K)], 0.0327116 secs] [Times: User sys = = 0.31 0.00, real = 0.03 secs]Copy the code

Refer to this article for the meaning of THE Minor and Full GC for GC logs.

By analyzing the above output results, it can be known that:

  • From 16.619 to 16.749, 4 Full GC occurred, which took about 130 ms, and there were 4 GC pauses.
  • For example, GC at 16.749 takes 310 ms and pauses 0.0327116 s, but since it is a multi-core CPU machine and uses parallel GC, the pause time is about 30ms;
  • The total duration of the four GC pauses is about 130ms. There is some doubt here. Normally, the pause duration should be less than the running duration.
  • After GC is completed, almost all of the old time space (40960K) is still occupied (40931K).

If the old space is almost full and not freed after GC, an OOM exception will be raised after a while.

The memory footprint of the older generation has not improved after GC and is still close to full, causing a performance bottleneck, either because the memory is too small or there is a memory leak. In addition to looking at the memory usage of older generations, you can also look at the pause times for each GC (the STW we studied earlier, referred to in the log as real). If the total pause times are too high, it can damage the throughput of the system. If individual GC pauses are too long, the system’s latency metrics can be affected.

In addition to configuring parameters at program startup, you can use the JDK’s own commands to view GC information while the program is running.

Jstat command

The jstat command (help document) can be used to print performance data for the target Java process. It contains multiple sliver commands, as follows:

% jstat -options-class // print data related to class loading -compiler // print data related to real-time compilation // subcommand starting with GC, -gccapacity // Displays the capacity and usage of each generation. -gccause // displays information about garbage collection (pass-gcutil), as well as the cause of the last or currently occurring garbage collection -gcMetacapacity // Displays metaspace size. -gcNew // displays new generation information. -gcnewCapacity // Displays new generation size and usage. -gcold // Displays old and permanent generation information // Displays the size of the old age -gcutil // displays garbage collection information -printcompilation // prints the data related to real-time compilationCopy the code

By default, jstat prints only one-time data.

// JDK8
% jstat -gc 67537
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT   
2560.0 2560.0  0.0   1223.1 15360.0   3956.4   40960.0      8.0     4864.0 3055.9 512.0  322.2       1    0.001   0      0.000    0.001
Copy the code

In practice, if we definitely want to observe performance data for a process over time, we can configure it to print every once in a while until the target Java process terminates or the maximum number of times we have configured it to print. Specific examples are as follows:

// jdk8% jstat -gc 67558 1s 4 S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT 2560.0 2560.0 0.0 1255.1 15360.0 3956.4 40960.0 8.0 4864.0 3055.9 512.0 322.2 1 0.004 0 0.000 0.004 2560.0 2560.0 0.0 1255.1 15360.0 3956.4 40960.0 8.0 4864.0 3055.9 512.0 322.2 1 0.004 0 0.000 0.004 2560.0 2560.0 0.0 1255.1 15360.0 3956.4 40960.0 8.0 4864.0 3055.9 512.0 322.2 1 0.004 0 0.000 0.004 2560.0 2560.0 0.0 1255.1 15360.0 3956.4 40960.0 8.0 4864.0 3055.9 512.0 322.2 1 0.004 0 0.000 0.004 2560.0 2560.0 1255.1 15360.0 3956.4 40960.0 8.0 4864.0 3055.9 512.0 322.2 1 0.004 0.000 0.004 0Copy the code

The preceding command format is as follows:

jstat -outputOptions [-t] [-hlines] VMID [interval [count]]
Copy the code
  • OutputOptions: refers to the subcommands mentioned above, one or more outputOptions;

  • T: Displays the timestamp column as the first column of the output. The timestamp is the time since the target JVM was started.

  • Lines: Each sample (output row) displays a column header, where n is a positive integer. The default value is 0, which is the column header that displays the first row of data.

  • VMID: Virtual machine identifier, which is a string indicating the target JVM. See virtual machine identifiers. When monitoring Java processes in the local environment, VMID can be simply understood as PID.

  • Interval: sampling interval in specified units, seconds (s), or milliseconds (ms). The default unit is milliseconds. This has to be a positive integer.

  • Count: Specifies how many times to print a record. By default, the record is always printed

In the output of the -gc subcommand, the first four columns are Capacity and Utility, respectively, for the two Survivor zones. As you can see, both Survivor zones have the same capacity, and there is always one Survivor zone that has zero memory usage, as explained in the JVM garbage collection section.

If you want to know what each column means and how to calculate the output of the jstat command, there is a file called jstat_options in tools.jar in the JDK lib package that defines the output of each of the above types. For example, gc.

As shown in the figure above, we can open the file in IDEA. Here’s a snippet:

If the garbage collector uses the G1 collector, the output would be:

//JDK9
% jstat -gc 31678 1s 4
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT   
 0.0   3072.0  0.0   3072.0 161792.0  2048.0   97280.0     9290.0   5376.0 5084.5 512.0  433.9       1    0.004   0      0.000    0.004
 0.0   3072.0  0.0   3072.0 161792.0  2048.0   97280.0     9290.0   5376.0 5084.5 512.0  433.9       1    0.004   0      0.000    0.004
 0.0   3072.0  0.0   3072.0 161792.0  2048.0   97280.0     9290.0   5376.0 5084.5 512.0  433.9       1    0.004   0      0.000    0.004
 0.0   3072.0  0.0   3072.0 161792.0  2048.0   97280.0     9290.0   5376.0 5084.5 512.0  433.9       1    0.004   0      0.000    0.004
Copy the code

As you can see, S0C and S0U are always 0, that is, one of the Survivor capacities and usage is always 0.

This is because, when G1 GC is used, the Java virtual machine no longer sets memory boundaries for Eden, Survivor, and old age regions, but instead divides the heap into several equal-length memory regions (regions).

Each Region can act as the Eden space of the new generation, Survivor space, or old chronospace as needed, and can switch back and forth between different Region types.

In other words, logically we only have one Survivor zone. When migrating data in a Survivor region (i.e., migrating GC), we simply apply for one or more additional memory regions as new Survivor regions.

Therefore, the Java virtual machine decides to store the total capacity and usage of all Survivor memory regions in S1C and S1U when using G1 GC, while S0C and S0U are set to 0.

How do we use the jstat command to analyze the performance of Java processes?

Let’s take the following performance output:

% jstat -gc 33678 1s 9
Timestamp  S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT   
200.0       8448.0 8448.0 8448.0  0.0   67712.0  67712.0   169344.0   169344.0  21248.0 20534.3 3072.0 2807.7     34    0.720  658   133.684  134.404
201.0      8448.0 8448.0 8448.0  0.0   67712.0  67712.0   169344.0   169343.2  21248.0 20534.3 3072.0 2807.7     34    0.720  662   134.712  135.432
202.0      8448.0 8448.0 8102.5  0.0   67712.0  67598.5   169344.0   169343.6  21248.0 20534.3 3072.0 2807.7     34    0.720  667   135.840  136.559
203.0      8448.0 8448.0 8126.3  0.0   67712.0  67702.2   169344.0   169343.6  21248.0 20547.2 3072.0 2807.7     34    0.720  669   136.178  136.898
204.0      8448.0 8448.0 8126.3  0.0   67712.0  67702.2   169344.0   169343.6  21248.0 20547.2 3072.0 2807.7     34    0.720  669   136.178  136.898
205.0      8448.0 8448.0 8134.6  0.0   67712.0  67712.0   169344.0   169343.5  21248.0 20547.2 3072.0 2807.7     34    0.720  671   136.234  136.954
206.0      8448.0 8448.0 8134.6  0.0   67712.0  67712.0   169344.0   169343.5  21248.0 20547.2 3072.0 2807.7     34    0.720  671   136.234  136.954
207.0      8448.0 8448.0 8154.8  0.0   67712.0  67712.0   169344.0   169343.5  21248.0 20547.2 3072.0 2807.7     34    0.720  673   136.289  137.009
208.0      8448.0 8448.0 8154.8  0.0   67712.0  67712.0   169344.0   169343.5  21248.0 20547.2 3072.0 2807.7     34    0.720  673   136.289  137.009
Copy the code

Let’s look at the data in the first row:

  • Jstat connects to the JVM in 200 seconds after the JVM starts. This information is known from the “Timestamp” column in the first row.
  • The “YGC” column in the first row tells you that the young generation has performed 34 GCS, and the “FGC” column tells you that the entire heap has performed 658 full GCS.
  • The total GC time of the young generation is0.720 seconds, displayed in”YGCT“Column.
  • The total time of Full GC is133.684 secondsBy”FGCT“Column learned. This immediately caught our eye, the total JVM runtime was only 200 seconds,But 66% of that is consumed by the Full GC.

Look at the second row:

  • A total of four Full GC are performed in the next second. See the “FGC” column.
  • The four Full GC pauses took almost a second (according toFGCTThe difference between columns is known). Compared to the first line, the Full GC costs928 milliseconds, the total duration is 1s, then the Full GC is occupied92.8%The time.
  • According to the”OCAnd”OU“The column learned,The whole old spacefor169344.0 versus KB(” OC “), still occupied after 4 Full GC169344 KB(” OU “). With the928ms“, but only 800 bytes of memory was freed, which felt abnormal in every way.

The first two lines of data show that the program has a serious problem, and subsequent data confirms that the problem still exists.

Using the output of jstat, you can quickly discover the following problems:

  • You can compare the Java process startup time with the total GC time (GCT column), or the time between the two measurements and the increment of the total GC time to get a percentage of the elapsed time. This exposes the problem of excessive GC overhead. Different systems have different tolerances for GC overhead, depending on performance requirements and, in general, over10%All of the GC overhead is problematic.
  • Rapid changes in the “YGC” and “FGC” columns are also often a sign of trouble. Frequent GC pauses can accumulate and lead to more stop-the-world pauses, affecting throughput.
  • If you see in the “OU” column that the usage of the old age is approximately equal to the maximum capacity (OC) of the old age and does not decrease, then the old age GC has been performed but is essentially invalid.

In the future, we will continue to learn log analysis tools, so that we can observe GC state more intuitively. In the next article, we will be there.