In-depth Understanding of the JVM – Practical JVM Tools (Part 1)

Preface

This article covers the common JVM tools. Simply listing the commands is of little value, because you would forget them soon after reading, so the focus here is on using the tools to actually monitor and tune your code.

Previous review:

The previous article showed how to read GC logs. You can find some tuning sample code online and try it yourself; you will notice that even the same JDK version can behave differently on different machines (for example, when run from IDEA versus Eclipse), and that objects created by the JVM itself also show up in the log results.

Reading logs is essential to mastering the JVM. Now that the basic reading skills are covered, this article moves on to the JVM's practical tools.

Overview

This article focuses on how to analyze and tune the JVM, starting from the jstat command. Of course, not every problem can be solved by tuning parameters; in some of the cases the real fix is a bug fix in the code.

The topic is split into two parts to keep the length manageable. I personally dislike ten-thousand-word articles, but analyzing and explaining real scenarios genuinely takes a lot of text, so **these articles will be tiring to read!** Please read them when your energy level is high.

Common tools:

The tools themselves are only briefly introduced; there is little point in writing them all out, since few people memorize command options by rote. How the tools are used in real cases is the focus.

The jstat command:

Syntax of the command:
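In its general form (for a local JVM, which is the common case here), jstat takes an option, the target process id, and optionally a sampling interval in milliseconds plus a sample count:

jstat -<option> <PID> [<interval_ms> [<count>]]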

Here are some common ones:

  • jstat -gc PID: view the current GC status and memory usage of the JVM process
  • jstat -gccapacity PID: analyze heap memory capacity
  • jstat -gcnew PID: young-generation GC analysis; the TT and MTT columns show the current and maximum tenuring age of objects surviving in the young generation
  • jstat -gcnewcapacity PID: young-generation capacity analysis
  • jstat -gcold PID: old-generation GC analysis
  • jstat -gcoldcapacity PID: old-generation capacity analysis
  • jstat -gcmetacapacity PID: metaspace (metadata area) analysis

Parameter descriptions

 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT
10240.0 10240.0  0.0    0.0   81920.0  13107.5   102400.0     0.0     4480.0 776.9  384.0   76.6       0    0.000   0      0.000    0.000

The columns above have the following meanings (sizes are in KB, times in seconds):

  • S0C / S1C: capacity of survivor space 0 / 1
  • S0U / S1U: used space in survivor space 0 / 1
  • EC / EU: Eden capacity / used space
  • OC / OU: old-generation capacity / used space
  • MC / MU: metaspace capacity / used space
  • CCSC / CCSU: compressed class space capacity / used space
  • YGC / YGCT: number of young GCs / total young GC time
  • FGC / FGCT: number of Full GCs / total Full GC time
  • GCT: total GC time

The jmap command

Key role: finding out which objects occupy the most memory.

Use case:

  • jmap -heap PID: prints heap configuration and usage, e.g. the total, used, and remaining capacity of the Eden region
  • jmap -histo PID: prints a histogram of object counts and sizes per class to show how objects are distributed; String-related entries usually occupy the most
  • jmap -dump:live,format=b,file=dump.hprof PID: dumps a snapshot of the current heap to the file dump.hprof

The jhat command

Use case:

  • jhat -port 7000 dump.hprof: analyze the heap snapshot generated above in the browser; once jhat finishes parsing, open http://localhost:7000

Common monitoring methods for online systems:

  1. Before (and during) the system's traffic peaks and troughs, run the jstat / jmap / jhat tools by hand to check whether the JVM is running properly; a minimal sketch of this approach follows the list.
  2. Use monitoring platforms such as Zabbix, Open-Falcon, or Ganglia.
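As a minimal sketch of the first approach, assuming a Linux shell and using 38820 as an example PID, you can simply sample jstat into a file ahead of the peak and compare the numbers afterwards:

# sample GC statistics every 5 seconds, 720 times (about an hour), into a dated log file
jstat -gc 38820 5000 720 >> gc-baseline-$(date +%F).log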

Actual case scenarios

Note that the code in this article runs on JDK 8.

Before the actual tuning, the tuning scenario and the simulated business scenario need to be explained, so each case starts with a description of the basic service it models.

High concurrency APP system:

Unlike later tuning cases, we do not start with complex business logic but with a simple social APP. As we all know, some social apps invite celebrities to live-stream in order to attract traffic. So where is the high-concurrency scenario in such an app?

Suppose a celebrity you like suddenly announces a live stream on a certain social APP. You will no doubt install the APP and search your way to his personal home page, which is exactly what you would imagine a personal home page to be: some personal posts, pictures, text, and so on.

Now analyze the load. Normally only one or two visitors look at the page, and when traffic is that low the content can even be loaded directly from the database. But once this celebrity is being promoted and becomes very popular, a huge crowd of users will pour in, say hundreds of QPS, and the backend must hold up at that moment. We clearly need to introduce Redis, and offload very resource-hungry content such as pictures to an image server to share the pressure. A simple architecture diagram looks like this:

From the diagram we can see that a large number of smallish objects will be created. Suppose each request produces about 5 MB of text or image objects; a 2 GB young generation could then fill up in moments (even with Redis sharing part of the load). The problem is that a young GC will find many of these objects still in use, so they are quickly promoted into the old generation. This case is therefore about making sure the young generation has enough room to hold its objects.
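To make the scenario concrete, here is a minimal sketch (the class and field names are invented purely for illustration): each simulated request allocates about 5 MB of short-lived feed data, which is exactly the kind of load that fills Eden within seconds.

// A rough sketch of the profile-page scenario above; names are made up for illustration.
// Each simulated request allocates ~5 MB of feed data (posts, comments, thumbnails)
// that becomes garbage as soon as the "request" finishes, so at a few hundred
// requests per second a young generation fills up within seconds.
public class ProfilePageSimulation {

    private static final int _1MB = 1024 * 1024;

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            handleRequest();
            Thread.sleep(3); // roughly 300 "requests" per second
        }
    }

    private static void handleRequest() {
        byte[] feedData = new byte[5 * _1MB]; // per-request page content
        feedData = null;                      // unreachable again once the request is done
    }
}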

So that is the simplest case to simulate. Why do it? Because of two parameters that often appear in articles on the Internet.
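Judging from the discussion that follows (compaction on every fifth Full GC), the two parameters in question are presumably the classic CMS compaction settings:

-XX:+UseCMSCompactAtFullCollection
-XX:CMSFullGCsBeforeCompaction=5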

These two parameters enable memory compaction (defragmentation) after a CMS old-generation collection. Let's recall what they actually do:

CMS runs in four steps: initial mark, concurrent mark, remark, and concurrent sweep. By default it starts when the old generation is about 92% full, leaving 8% headroom. If, during the final concurrent sweep, objects promoted from the young generation fill that remaining 8% so that new promotions cannot be allocated, a Full GC is triggered. This is called a concurrent mode failure: the Serial Old collector steps in, shouts “Stop The World”, and performs a single-threaded collection.

Here comes the key part: once that single-threaded collection finishes and Serial Old fades back into the background, CMS uses the parameters above to decide whether to compact the old generation. With the value 5, compaction happens on every fifth Full GC. As for what memory fragmentation is, look up the drawbacks of the mark-sweep algorithm, or read my earlier article:

An In-depth Understanding of the JVM – Garbage Collection Algorithms

The value 5 is a setting people often recommend online, because in most cases you do not need to defragment memory on every Full GC. In the scenario above, however, it clearly cannot be set that high: the burst of traffic creates too many objects at once, so we would rather sacrifice some old-generation compaction time to keep the old generation defragmented and reduce the number of Full GCs as far as possible, so that users opening the home page are not left staring at a loading spinner.

E-commerce system:

This example was used earlier in this column, and is copied here:

Background:

Suppose each user of an e-commerce site makes about 20 requests per day. To reach a hundred million requests a day you need roughly 5 million daily users. If 10% of those 5 million users place an order, that is 500,000 orders per day, and by the 80/20 rule most of them are paid within a roughly 4-hour window. That works out to about 500,000 / 14,400 seconds ≈ 34 orders per second. At that rate the impact on the system is not significant: an old-generation collection happens only every few hours, which is completely acceptable.
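Just to double-check the arithmetic, here is a throwaway sketch; the numbers are simply the assumptions stated above:

public class OrderRateEstimate {
    public static void main(String[] args) {
        long dailyUsers = 5_000_000L;      // assumed daily users
        long requestsPerUser = 20;         // ~20 requests per user per day
        long orderRatio = 10;              // ~10% of users place an order
        long peakWindowSeconds = 4 * 3600; // 80/20 rule: most orders paid within ~4 hours

        System.out.println("requests/day: " + dailyUsers * requestsPerUser);                // 100,000,000
        System.out.println("orders/day:   " + dailyUsers / orderRatio);                     // 500,000
        System.out.println("orders/sec:   " + dailyUsers / orderRatio / peakWindowSeconds); // ~34
    }
}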

High concurrency scenarios

However, in a seckill (flash sale) scenario things are different. What if 1,000 orders arrive in a single second? Assume we have three machines; each machine then needs to process at least 300 requests per second.

Calculate JVM consumption

Based on the scenario above, at 300 requests per second and roughly 1 KB per order object, each machine creates about 300 KB of order objects per second. If the objects an order system actually touches are about 10 times that, it becomes roughly 3,000 KB (about 3 MB); including the other business operations around order processing, magnify by roughly 10 again and each machine creates about 30 MB of objects per second.

If each thread's virtual machine stack is 1 MB, then several hundred threads need several hundred MB of space. For a 4-core 8 GB machine, give 4 GB to the JVM: roughly 1 GB of that goes to thread stacks (about 500 MB), the method area (metaspace, 256 MB) and off-heap (direct) memory (256 MB), while the remaining 3 GB of heap is split into 1.5 GB for the young generation and 1.5 GB for the old generation. The memory allocation guarantee mechanism is enabled by default (no parameter needs to be specified after JDK 6).
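Expressed as launch flags, that sizing would look roughly like the following sketch (it just mirrors the numbers above, not any particular production configuration):

-Xms3g -Xmx3g
-Xmn1536m
-Xss1m
-XX:MetaspaceSize=256m
-XX:MaxMetaspaceSize=256m
-XX:MaxDirectMemorySize=256m

Here -Xmn1536m gives the young generation 1.5 GB and leaves 1.5 GB for the old generation, -Xss1m keeps each thread stack at 1 MB (so a few hundred threads cost a few hundred MB), and the metaspace and direct-memory caps correspond to the 256 MB method area and 256 MB of off-heap memory mentioned above.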

With this layout, if about 30 MB of objects arrive every second, the roughly 1,200 MB Eden region (young generation split 8:1:1) fills up in about 40 seconds. Suppose each young GC leaves around 200 MB of objects still alive; a survivor space of only about 150 MB cannot hold them, so by the memory allocation guarantee mechanism those objects go straight into the old generation. In other words, objects that are about to become garbage end up in the old generation far too early and cannot be reclaimed properly!

At that rate the young generation fills up in well under a minute, and with roughly 200 MB promoted each time, at most 8 or 9 minor GCs are enough to fill the old generation and trigger a Full GC, which means an old-generation GC roughly every 8 or 9 minutes. That is a very high frequency, and each one can stall the system and stop the user threads.

If, on the other hand, the survivor space were large enough, the surviving objects would be copied into a survivor region instead, and by the next minor GC most of them would already be garbage and be collected there.

Problem structure diagram:

According to the above introduction, the following problem structure diagram can be obtained:

How to optimize

Here is the optimization approach, stated directly:

  1. First, enlarge the survivor spaces, or the young generation as a whole, say to 2 GB.
  2. Memory can also be expanded by adding machines and spreading the load.
  3. Note that this is mainly computational business, so ultra-low latency is not a hard requirement; the common generational combination ParNew (young) + CMS (old) is used.

Code simulation:

If we simulate this with code we do not need such a large heap. The code below roughly restores the situation described above (of course the parameters are not 100% identical, but they reproduce the business problem and its solution):

/**
 * VM options:
 * -XX:NewSize=104857600 -XX:MaxNewSize=104857600
 * -XX:InitialHeapSize=209715200 -XX:MaxHeapSize=209715200
 * -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=15
 * -XX:PretenureSizeThreshold=20971520
 * -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
 * -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc-%t.log
 *
 * @author zxd
 * @version V1.0.0
 * @Package com.zxd.interview.minorGc
 * @Description Test case: observe the garbage collection frequency
 * @Create on 2021/7/25 11:47
 */
public class Demo1 {

    private static final int _1M = 1024 * 1024;

    public static void main(String[] args) throws Exception {
        Thread.sleep(30000);
        while (true) {
            loadData();
        }
    }

    private static void loadData() throws Exception {
        byte[] data = null;
        // Allocate 40M of short-lived objects in a loop
        for (int i = 0; i < 4; i++) {
            data = new byte[10 * _1M];
        }
        data = null;

        // 20M of objects that stay alive across the collection
        byte[] data1 = new byte[10 * _1M];
        byte[] data2 = new byte[10 * _1M];

        // Allocate another temporary 20M to trigger a Young GC
        byte[] data3 = new byte[10 * _1M];
        data3 = new byte[10 * _1M];

        Thread.sleep(1000);
    }
}

From the code you can see that an infinite loop keeps allocating objects in the young generation and triggering young GCs. Since the survivor region cannot hold the objects that survive each collection, those surviving young-generation objects go directly into the old generation.

-XX:NewSize=104857600 
-XX:MaxNewSize=104857600
-XX:InitialHeapSize=209715200
-XX:MaxHeapSize=209715200
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=15
-XX:PretenureSizeThreshold=20971520
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Xloggc:gc-%t.log

Here -XX:PretenureSizeThreshold=20971520 sends objects larger than 20 MB directly into the old generation; the allocations in the simulation code stay below that size, so this setting normally does not come into play.
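For completeness, assuming Demo1 is compiled without a package declaration (the @Package comment suggests it really lives in com.zxd.interview.minorGc, in which case the fully qualified class name is used instead), a run with these flags looks like:

javac Demo1.java
java -XX:NewSize=104857600 -XX:MaxNewSize=104857600 \
     -XX:InitialHeapSize=209715200 -XX:MaxHeapSize=209715200 \
     -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=15 \
     -XX:PretenureSizeThreshold=20971520 \
     -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
     -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc-%t.log \
     Demo1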

Running effect before tuning:

When starting the code, we need to immediately use the following command:

The first step is to find the target JVM process by running jps:

34096 Jps
38820 Demo1

Then we immediately run the jstat command introduced above, sampling every 1000 ms for 10 samples:

jstat -gc 38820 1000 10

Then we can see the following results:

Let’s analyze the above results:

First look at the S1 region: after one second it has grown by 1634 KB, about 1.5 MB. These can be regarded as unidentified objects here, and you can see they are collected again in the following second. You can also clearly see that a young GC took 0.016 s, which is of course too short for users to notice. The young-generation numbers also make it clear that our surviving objects go straight into the old generation rather than the survivor region, so the survivor region ends up holding only small objects produced by the JVM itself. As a result, the roughly 20 MB that survives each young GC goes into the old generation.

Now look at the middle columns: the EU value suddenly drops to 0.0, so all the garbage objects in Eden have clearly been collected. But you can also see that the young-generation collection time is even longer than the old-generation collection time. (Note that the last column, GCT, is the total GC time; the old-generation collection time is the second-to-last column, FGCT.)

Why does collecting the old generation take less time than collecting the young generation?

This can also be deduced from the business scenario. The old-generation collection here is driven by the young-generation collection, but because so many objects are promoted into the old generation, the allocation guarantee rules force the JVM to keep comparing the old generation's free space against the average size promoted by previous young GCs. In effect the young GC has to wait for the old-generation check and collection to complete before it can finish, so the young generation ends up taking longer to collect than the old.

Running effect after tuning:

First, let’s look at the parameters after tuning:

-XX:NewSize=209715200 
-XX:MaxNewSize=209715200 
-XX:InitialHeapSize=314572800 
-XX:MaxHeapSize=314572800
-XX:SurvivorRatio=2  
-XX:MaxTenuringThreshold=15
-XX:PretenureSizeThreshold=20971520
-XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps 
-Xloggc:gc-%t.log

As mentioned before, we can either enlarge the young generation or adjust the survivor spaces. Here we mainly increase the heap to 300 MB and set -XX:SurvivorRatio=2, which splits the young generation into Eden : S0 : S1 = 2 : 1 : 1 (100 MB for Eden and 50 MB for each survivor space out of the 200 MB young generation). The other parameters are basically unchanged.

There are other parameters that can be set as well; the following two are part of the optimized configuration:

One is -XX:+CMSParallelInitialMarkEnabled, which lets the CMS collector use multiple threads during its “initial mark” phase.

The other is -XX:+CMSScavengeBeforeRemark, which tries to run a young GC once before the CMS remark phase.
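Added to the flag list above, they are simply two more lines:

-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSScavengeBeforeRemark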

Next, run the following command: jstat -gc 18032 1000 100 to check the current heap usage:

Here again, the meaning of the parameter header is emphasized:

Here is the result of the run

The Full GC time has dropped to 0 and all of the time is now spent on young GCs. You can also see that the survivor region can hold all of the objects that survive in Eden at once, which is consistent with the tuning result described in the earlier case.

Conclusion

The two cases above should be enough to illustrate how to optimize the young generation: essentially, use jstat to watch how the young and old generations change over time. Of course, real situations are rarely this simple.

In closing

My writing skills are fairly average; if anything is unclear, thanks for the feedback, and I will keep revising in my spare time so that as many people as possible can understand.

Consider:

When doing JVM tuning, consider the following questions:

  • Does your company have a JVM parameter template like this?
  • If you were an architect in your company, how would you customize a JVM parameter template for most of your business systems?
  • Does your company have a variety of machines?
  • How can you customize JVM parameter templates for machines with different configurations?
  • Does your company have a special type of system, for example one with very high concurrency or a very high data volume?
  • How do you optimize a special case system?