33 - An SQL statement that froze the system (part 2): the solution

We continue analyzing and optimizing the case from the previous article. First, a review of what we observed there:

Every 20 seconds, the Eden space of just over 300 MB fills up and triggers a Young GC, which takes about 50 milliseconds.

Every 30 minutes, occupancy of the old generation passes 600 MB and triggers a CMS GC; each Full GC takes about 300 milliseconds.

To be honest, visual monitoring and inference alone cannot take the analysis any further at this point, because we still do not know why so many objects end up in the old generation. That is what we need to dig into next.

Old generation object exploration

After three days of continuous monitoring with the jstat tool, we found that only a few objects entered the old generation after each Young GC. The objects surviving each Young GC amounted to only tens of MB, which would fit into a Survivor space. However, because each Survivor space was only 70 MB, the dynamic age judgment rule occasionally promoted a few tens of MB of objects into the old generation. That alone would not make Full GC happen so frequently, as shown below:

So what fills up the old generation within 30 minutes and triggers a Full GC?

We then went back through the monitoring data from the start and found that, every time the system had been running for a while, 500 to 600 MB of data suddenly entered the old generation!

Add the tens of MB of objects that are promoted every so often, and occupancy reaches the 68% old generation threshold that triggers a Full GC!

After each collection, another 500 to 600 MB of objects enters the old generation at intervals, which results in a Full GC roughly every half hour. This is the root cause of the problem!
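The arithmetic behind that threshold can be sketched as follows. This is a rough illustration, assuming the heap was already 1536 MB before the optimization (as in the -Xms/-Xmx flags listed later), that the young generation was 512 MB, and that a batch of large objects is 600 MB. The class and method names are made up for this example.

```java
final class OldGenThreshold {
    // Old generation capacity = total heap minus young generation.
    static long oldGenMb(long heapMb, long youngMb) {
        return heapMb - youngMb;
    }

    // Occupancy in MB at which CMS starts collecting, for a given percentage.
    static long triggerMb(long oldGenMb, int occupancyPercent) {
        return oldGenMb * occupancyPercent / 100;
    }

    public static void main(String[] args) {
        long oldGen = oldGenMb(1536, 512);     // assumed sizes, see lead-in
        long trigger = triggerMb(oldGen, 68);  // 68% threshold from the article
        long headroom = trigger - 600;         // left after a 600 MB batch of large objects
        System.out.println("old gen = " + oldGen + " MB, trigger = " + trigger
                + " MB, headroom = " + headroom + " MB");
    }
}
```

Under these assumptions the old generation is 1024 MB and collection starts at about 696 MB, so one batch of large objects plus a few rounds of ordinary promotion is enough to reach it.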

From the above analysis, there is only one reason for 500 to 600 MB of objects to suddenly enter the old generation: large objects!

Such large objects are not allocated in Eden but go straight into the old generation, so the current situation is as shown in the picture below:

Locating the large objects

At this point, we only need to find out which objects are suddenly entering the old generation, and from there locate the problem in our code.

If you find that hundreds of MB of objects have suddenly entered the old generation, you can use jmap to export a heap dump, and then analyze it with jhat or a visual tool such as VisualVM.
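jmap is the tool used in this case. As an alternative sketch, a HotSpot JVM can also write the same kind of .hprof snapshot from inside the process through `HotSpotDiagnosticMXBean` in `com.sun.management`. The names `HeapDumper` and `dumpToTempDir` are made up for this example, and the temporary directory is only a placeholder location.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

final class HeapDumper {
    // Writes a heap snapshot of the current JVM into a fresh temporary directory.
    // liveOnly = true dumps only objects that are still reachable.
    static Path dumpToTempDir(boolean liveOnly) {
        try {
            Path dir = Files.createTempDirectory("heapdump");
            Path file = dir.resolve("snapshot.hprof"); // target file must not exist yet
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(file.toString(), liveOnly);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = dumpToTempDir(true);
        System.out.println("heap dump written to " + file);
    }
}
```

The resulting file can be opened in the same analysis tools as a dump exported with jmap.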

The use of jhat has been covered before; the use of VisualVM is covered briefly at the end of this article.

Analysis of the memory snapshot showed that the hundreds of MB of objects were several Map data structures. Tracing them in the code, we found that they were built from data queried from the database.

We then checked the corresponding SQL statements one by one and finally found one that was indeed a problem: it runs only under specific conditions, and when it does, it causes the system to freeze.

This SQL is very simple:

select * from table;

Yes, a query without any WHERE condition, which pulls all of the hundreds of thousands of rows in the table out of the database in one go. As a result, every so often the system tries to allocate several hundred MB of large objects in memory, and they end up in the old generation!
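One common way to avoid loading a whole table at once is to read it in bounded pages. The sketch below is not the fix used in this case, only an illustration of the idea. It assumes a hypothetical `PageFetcher` that would wrap a query of the form `select ... where id > ? order by id limit ?`, with a positive, increasing `id` column. The `Row` type and the in-memory stand-in for the database are also made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchedQuery {
    // Hypothetical row type; a real one would mirror the table's columns.
    static final class Row {
        final long id;
        final String payload;
        Row(long id, String payload) { this.id = id; this.payload = payload; }
    }

    // Hypothetical data-access hook: up to `limit` rows with id > afterId, ordered by id.
    interface PageFetcher {
        List<Row> fetch(long afterId, int limit);
    }

    // Passes every row to `handler` one page at a time, so at most `pageSize`
    // rows are held in memory at once. Returns the number of rows processed.
    static long forEachRow(PageFetcher fetcher, int pageSize, Consumer<Row> handler) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        long lastId = 0; // assumes ids start above 0
        long count = 0;
        while (true) {
            List<Row> page = fetcher.fetch(lastId, pageSize);
            for (Row row : page) {
                handler.accept(row);
                lastId = row.id;
                count++;
            }
            if (page.size() < pageSize) return count; // short page: table exhausted
        }
    }

    // In-memory stand-in for the database (rows must be sorted by id).
    static PageFetcher inMemory(List<Row> table) {
        return (afterId, limit) -> {
            List<Row> page = new ArrayList<>();
            for (Row row : table) {
                if (row.id > afterId && page.size() < limit) page.add(row);
            }
            return page;
        };
    }
}
```

With a page size of a few thousand rows, each page is a small, short-lived allocation that can be collected by Young GC instead of one allocation of several hundred MB.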

Case optimization

Have the developers responsible for the SQL statement fix the bug. Be careful with unconditional queries: do not allow a query to fetch all rows of a table directly, so that the problem cannot be triggered under specific conditions.
The young generation is clearly too small, and so are the Survivor spaces: at 70 MB each, they easily trigger the dynamic age judgment rule and send objects into the old generation.
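The effect of Survivor size on the dynamic age judgment rule can be illustrated with a simplified model. It assumes the rule as HotSpot applies it: sizes are summed from age 1 upward, and the first age at which the running total exceeds the target share of one Survivor space becomes the promotion threshold. The 50% target is HotSpot's default TargetSurvivorRatio. The cap of 15 and the per-age sizes are assumptions chosen for the example.

```java
final class DynamicAge {
    // mbByAge[a] = MB of surviving objects with age a (index 0 is unused).
    // Returns the age from which objects are promoted at the next Young GC.
    static int tenuringThreshold(long[] mbByAge, long survivorMb,
                                 int targetSurvivorPercent, int maxThreshold) {
        long desired = survivorMb * targetSurvivorPercent / 100;
        long total = 0;
        for (int age = 1; age < mbByAge.length; age++) {
            total += mbByAge[age];
            if (total > desired) {
                return Math.min(age, maxThreshold); // running total overflowed the target
            }
        }
        return maxThreshold; // everything fits: only the age cap applies
    }

    public static void main(String[] args) {
        long[] survivors = {0, 20, 20, 10}; // hypothetical: 20 MB age 1, 20 MB age 2, 10 MB age 3
        System.out.println("70 MB survivor:  " + tenuringThreshold(survivors, 70, 50, 15));
        System.out.println("150 MB survivor: " + tenuringThreshold(survivors, 150, 50, 15));
    }
}
```

In this model, a 70 MB Survivor space leaves a target of only 35 MB, so objects are promoted after surviving two collections, while a 150 MB Survivor space keeps the same objects in the young generation until the age cap.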

So the final optimized JVM parameters are as follows:

-Xms1536M -Xmx1536M -Xmn1024M -Xss256K 
-XX:SurvivorRatio=5 -XX:PermSize=256M
-XX:MaxPermSize=256M  -XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC 
-XX:CMSInitiatingOccupancyFraction=92
-XX:+CMSParallelRemarkEnabled 
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC

Without changing the machine configuration, we increased the young generation from 512 MB to 1024 MB (-Xmn1024M). With -XX:SurvivorRatio=5 this gives an Eden space of about 700 MB and Survivor spaces of about 150 MB each, so the objects that survive each Young GC generally no longer enter the old generation.
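The sizes follow from the flags: -XX:SurvivorRatio=5 splits the young generation into seven parts, five for Eden and one for each Survivor space. A minimal sketch of that arithmetic is below. The class and method names are made up, and the real JVM aligns region sizes, so actual values differ slightly.

```java
final class YoungGenSizing {
    // With -XX:SurvivorRatio=R the young generation has R + 2 parts:
    // each of the two Survivor spaces gets one part.
    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    // Eden gets whatever the two Survivor spaces do not use.
    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        // Before: 512 MB young generation; after: -Xmn1024M. Ratio 5 in both cases.
        System.out.println("before: eden " + edenMb(512, 5) + " MB, survivor " + survivorMb(512, 5) + " MB");
        System.out.println("after:  eden " + edenMb(1024, 5) + " MB, survivor " + survivorMb(1024, 5) + " MB");
    }
}
```

The "before" values match the roughly 300 MB Eden and 70 MB Survivor spaces observed earlier, and the "after" values match the roughly 700 MB and 150 MB quoted for the new settings.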

In addition, -XX:CMSInitiatingOccupancyFraction was raised to 92, so that the old generation does not trigger GC too early.

After these optimizations, the system in production runs roughly one Young GC per minute, each taking tens of milliseconds. Full GC has become rare: about once every 10 days, taking a few hundred milliseconds each time.

The following is an introduction to the VisualVM tool and how to use it.

Introduction to VisualVM

VisualVM (All-in-One Java Troubleshooting Tool) is one of the most powerful runtime monitoring and troubleshooting programs, and for a long time it has been the main VM troubleshooting tool officially promoted by Oracle.

Oracle describes VisualVM as "All-in-One" in its software description, signalling that besides regular runtime monitoring and troubleshooting it offers further capabilities such as performance analysis (profiling). VisualVM's profiling capabilities bear comparison with professional, paid profiling tools such as JProfiler and YourKit.

VisualVM is a free visual tool that integrates several JDK command-line tools and provides powerful capabilities for performance analysis and tuning of Java applications. These include generating and analyzing heap data, tracking down memory leaks, monitoring the garbage collector, performing memory and CPU profiling, and browsing and operating on MBeans.

What VisualVM does

VisualVM is built on the NetBeans platform, so from the start it has been extensible through plug-ins. With plug-in support, VisualVM can:

Display VM processes and their configuration and environment information (jps, jinfo).
Monitor the application's CPU, garbage collection, heap, method area, and thread information (jstat, jstack).
Dump and analyze heap snapshots (jmap, jhat).
Profile program performance at method level, finding the methods that are called most often and run longest.
Take offline program snapshots: collect the program's runtime configuration, thread dump, memory dump, and other information into a snapshot that can be sent to developers as bug feedback.
Do whatever else other plug-ins make possible.

IDEA Plug-in Installation

After installation, these two buttons appear at the top of IDEA, which means the installation succeeded:

When the code is launched through VisualVM, the corresponding window pops up:

Using VisualVM

The monitoring page shows the running status of CPU, memory, classes, and threads, as shown in the following figure:

Dump file analysis and viewing
