After more than a month of effort, FullGC went from roughly 40 times per day to once every 10 days, and YoungGC time was cut by more than half. An improvement this large is worth writing down, so this post records the tuning process along the way.

My knowledge of JVM garbage collection had always been theoretical: understanding how objects are promoted from the young generation to the old generation was enough to get through job interviews. Not long ago, FullGC on our online servers became very frequent, averaging more than 40 times a day, and the servers were restarting themselves every few days, a clear sign that they were in a very unhealthy state. With such a good opportunity in front of me, I naturally volunteered to do the tuning. The server GC statistics before tuning (screenshot omitted) showed just how frequent FullGC was.

First, some context. The servers are fairly modest (2 cores, 4 GB RAM), and there are 4 of them in the cluster; FullGC counts and times were roughly the same on each. The key JVM startup parameters were:

-Xms1000M -Xmx1800M -Xmn350M -Xss300K -XX:+DisableExplicitGC -XX:SurvivorRatio=4 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:LargePageSizeInBytes=128M -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC

-Xmx1800M: sets the maximum heap size to 1800M. -Xms1000M: sets the initial heap size to 1000M. This can be set equal to -Xmx so the JVM does not have to grow the heap again after each garbage collection. -Xmn350M: sets the young generation size to 350M. Total heap size = young generation size + old generation size + permanent generation size; the permanent generation is usually a fixed 64M by default, so enlarging the young generation shrinks the old generation.

The young generation size has a significant impact on performance; Sun's official recommendation is 3/8 of the whole heap. -Xss300K: sets the stack size of each thread. Since JDK 5.0 the default is 1M per thread (it was 256K before). Adjust it to match how much memory the application's threads actually need; lowering it lets the same amount of physical memory support more threads, although the operating system still limits the number of threads per process, so they cannot be created without bound. A practical rule of thumb is roughly 3000 to 5000 threads.
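As a quick sanity check that these flags are actually in effect, the running JVM can report its own heap limits and startup arguments through the standard management API. A minimal sketch using only standard JDK classes (not part of the original tuning, just a convenient verification):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    public class HeapSettingsCheck {
        public static void main(String[] args) {
            // Heap as seen by the memory MXBean: init corresponds to -Xms, max to -Xmx.
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            System.out.printf("heap init=%dM max=%dM used=%dM%n",
                    heap.getInit() >> 20, heap.getMax() >> 20, heap.getUsed() >> 20);

            // The raw startup arguments, to confirm which -X / -XX flags were passed.
            System.out.println(ManagementFactory.getRuntimeMXBean().getInputArguments());
        }
    }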

First optimization

Looking at these parameters, the first question is why the young generation is so small. A young generation this small hurts throughput and causes YoungGC to fire constantly, which is why young generation collection in the screenshot above adds up to about 830s. Also, the initial heap size does not match the maximum heap size; it is generally recommended to set the two equal so the heap is not resized after each GC. Based on this, the first round of optimization was: enlarge the young generation and set the initial heap equal to the maximum heap.

-Xmn350M -> -Xmn800M
-XX:SurvivorRatio=4 -> -XX:SurvivorRatio=8
-Xms1000M -> -Xms1800M

The idea behind raising SurvivorRatio to 8 is to let as much garbage as possible be collected while objects are still in the young generation (with -Xmn800M and SurvivorRatio=8, Eden is 640M and each survivor space is 80M). The configuration was deployed to two of the online servers (prod1 and prod2; the other two were left unchanged for comparison). After 5 days of running, the GC statistics showed that the YoungGC count dropped by more than half and YoungGC time fell by about 400s, but FullGC increased by an average of 41 occurrences. YoungGC behaved pretty much as expected; FullGC did not at all.

Thus the first optimization failed.

Second optimization

During this round of optimization, my supervisor found that there were more than 10,000 instances of a certain object T in memory, occupying nearly 20M. Tracing how this bean was used, we found the cause in the project: a reference held by an anonymous inner class. The pseudocode looks like this:

    public void doSmthing(T t) {
        redis.addListener(new Listener() {
            public void onTimeout() {
                if (t.success()) {
                    // perform an action
                }
            }
        });
    }

Because the listener is not released after the callback, and the callback only fires on timeout, each object T cannot be collected until the configured timeout (1 minute) has elapsed, which is why so many instances piled up in memory. After spotting this leak, the first step was to go through the program's error log and fix all the error cases feeding it. After redeploying, however, GC behavior was essentially unchanged: the leak was reduced a little, but clearly the root cause had not been found, and the servers kept restarting inexplicably.
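One common way to shrink this kind of retention, when the value being checked is already known at the moment the listener is registered, is to capture only that small value rather than the whole object. This is a sketch of the general technique, not the project's actual fix; Listener, redis and the method names are the placeholders from the pseudocode above:

    public void doSmthing(T t) {
        // Capture only the flag the callback needs. The large T instance is then
        // not referenced by the listener and can be collected right away.
        final boolean success = t.success();
        redis.addListener(new Listener() {
            public void onTimeout() {
                if (success) {
                    // perform an action
                }
            }
        });
    }

If the check really must happen at timeout time, the alternative is to make sure the listener itself is deregistered as soon as it fires, so the reference chain from the listener registry to T is broken promptly.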

Memory leak investigation

Since the first round of tuning had surfaced a memory leak, everyone turned to tracking it down. The first step was reviewing the code, but that was slow going and turned up essentially nothing, so we kept taking heap dumps during off-peak hours and eventually caught a large object.

There were more than 40,000 of these objects, all identical ByteArrowRow instances, which confirmed that the data was being produced by database queries or inserts. So we started another round of code analysis. While it was underway, our operations colleagues noticed that at a certain time each day the inbound traffic jumped several-fold, peaking as high as 83MB/s. We confirmed there is currently no business activity of that volume and no file upload feature. Alibaba Cloud support also confirmed the traffic was completely normal, which ruled out an attack.

While I was still investigating the inbound traffic, another colleague found the root cause: under a certain condition, the code would query all of the unprocessed data in a particular table, but because one condition on the module was missing from the WHERE clause, the query returned more than 400,000 rows. From the request times and data in the logs, we could tell this logic had already been executed. The heap dump contained only about 40,000 of the objects simply because that is how many happened to be in memory at the moment the dump was taken; the rest were still in flight. This also explains nicely why the servers kept restarting on their own.
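The fix itself is mundane; the point is that one missing filter turned a bounded query into an unbounded one. A sketch of the shape of the problem and fix (the table, column and variable names are invented for illustration, since the post does not give the real ones):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class UnprocessedDataQuery {
        // Before: "SELECT * FROM task WHERE status = 'UNPROCESSED'"
        // With the module filter missing, the bad code path loaded 400,000+ rows at once.
        public void loadUnprocessed(Connection connection, String module) throws SQLException {
            // After: filter by module, and defensively cap the batch size.
            String sql = "SELECT * FROM task WHERE status = 'UNPROCESSED' AND module = ? LIMIT 1000";
            try (PreparedStatement ps = connection.prepareStatement(sql)) {
                ps.setString(1, module);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // process one row at a time instead of materializing everything
                    }
                }
            }
        }
    }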

After this problem was fixed, the online servers ran completely normally. Even with the pre-tuning parameters, FullGC ran only about 5 times in roughly 3 days (screenshot omitted).

Third optimization

With the memory leak resolved, the rest of the tuning could continue. Looking at the GC log, the last three FullGCs all occurred while the old generation was using less than 30% of its memory, yet FullGC was still triggered. After digging through various material (see https://blog.csdn.net/zjwstz/…), the cause turned out to be metaspace: its default initial high-water mark on the server is 21M, while the GC log showed metaspace usage of around 200M, and crossing that threshold induces a collection, so the threshold was raised. Below are the modified parameters for prod1 and prod2 respectively; prod3 and prod4 were left unchanged.

-Xmn350M -> -Xmn800M
-Xms1000M -> -Xms1800M
-XX:MetaspaceSize=200M
-XX:CMSInitiatingOccupancyFraction=75

and

-Xmn350M -> -Xmn600M
-Xms1000M -> -Xms1800M
-XX:MetaspaceSize=200M
-XX:CMSInitiatingOccupancyFraction=75
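To confirm that metaspace really is the pool growing toward 200M, its live usage can be read through the standard memory pool API. A minimal sketch (it only reports usage; the 200M figure above comes from the GC log, not from this code):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public class MetaspaceCheck {
        public static void main(String[] args) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if ("Metaspace".equals(pool.getName())) {
                    MemoryUsage usage = pool.getUsage();
                    System.out.printf("Metaspace used=%dM committed=%dM%n",
                            usage.getUsed() >> 20, usage.getCommitted() >> 20);
                }
            }
        }
    }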

prod1 and prod2 differ only in young generation size; everything else is identical. After about 10 days online, the comparison looked like this:

prod2: (GC statistics screenshot omitted)

prod3: (GC statistics screenshot omitted)

prod4: (GC statistics screenshot omitted)

In this comparison, FullGC on prod1 and prod2 is far lower than on prod3 and prod4, and their YoungGC counts are roughly half. The effect on prod1 is even more pronounced: besides the lower YoungGC count, its throughput (judged by the number of threads started) is higher than that of prod3, even though prod3 had been running about a day longer. Going by GC counts and times, the optimization was a success, and prod1's configuration was the better of the two, substantially improving server throughput and cutting GC time by more than half.

The one wrinkle is that prod1 did experience a single FullGC during this period, and it remains unexplained.

The cause could not be found in the GC log. At the CMS remark the old generation held only about 660M, which is below the configured 75% occupancy threshold of its roughly 1G old generation and should not have been enough to trigger a FullGC. Checking the preceding YoungGCs also ruled out a large object being promoted, so none of the usual triggering conditions were met. This still needs further investigation; if anyone knows the answer, please point it out, thanks in advance.
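For chasing down unexplained collections like this one, the JVM can report the cause of every GC as it happens through the GC notification API. A minimal diagnostic sketch (standard JDK API, not part of the original tuning):

    import com.sun.management.GarbageCollectionNotificationInfo;

    import javax.management.NotificationEmitter;
    import javax.management.openmbean.CompositeData;
    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcCauseLogger {
        public static void install() {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                NotificationEmitter emitter = (NotificationEmitter) gc;
                emitter.addNotificationListener((notification, handback) -> {
                    if (GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                            .equals(notification.getType())) {
                        GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                                .from((CompositeData) notification.getUserData());
                        // Prints the collector name, the reported GC cause, and the pause duration.
                        System.out.printf("%s cause=%s %dms%n",
                                info.getGcName(), info.getGcCause(), info.getGcInfo().getDuration());
                    }
                }, null, null);
            }
        }
    }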

Conclusion

After more than a month of tuning, the main takeaways are:

  • FullGC more than once a day is definitely abnormal
  • When FullGC is frequent, investigate memory leaks first
  • Once memory leaks are resolved, there is not much room left for JVM tuning; it is worthwhile as a learning exercise, but otherwise don't sink too much time into it
  • If CPU usage stays persistently high, rule out problems in the code first and then consult Alibaba Cloud support; during this investigation we also hit a case where 100% CPU turned out to be a problem with the host itself, and it went back to normal after the server was migrated
  • Data returned by database queries also counts as inbound traffic to the server; if business volume cannot explain the traffic and an attack has been ruled out, look into the database queries
  • Keep a constant eye on the servers' GC so problems are caught early (a minimal monitoring sketch follows this list)
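One lightweight way to keep that eye on GC without extra tooling is to periodically log collection counts and times from the GC MXBeans. A minimal sketch (in practice the numbers would go to a metrics system rather than stdout):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class GcWatcher {
        public static void main(String[] args) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            // Every minute, print the cumulative collection count and time for each collector
            // (with the flags used here: "ParNew" for young GC, "ConcurrentMarkSweep" for old GC).
            scheduler.scheduleAtFixedRate(() -> {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    System.out.printf("%s count=%d time=%dms%n",
                            gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                }
            }, 0, 1, TimeUnit.MINUTES);
        }
    }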

That wraps up the JVM tuning process of the last month or so. If there are any mistakes, corrections are welcome.

Original link: https://blog.csdn.net/cml_blo…

Copyright notice: this is an original post by CSDN blogger CMLBeliever, licensed under CC 4.0 BY-SA. Please include a link to the original source and this notice when republishing.
