This is the second article in my series on JVM tuning in action. The previous article was about JVM tuning tools; if you haven't read it, see: JVM Tuning in Action.

Before that, I wrote an article about how I worked through the third edition of the JVM book: what it was like to read through it in three days.

This article is case-driven and walks through troubleshooting: insufficient disk space, CPU spikes, OOM exceptions, stack overflow, and deadlock.

Strictly speaking, the above is troubleshooting, which is only one side of problem solving. Building on the earlier GC article, this one also explains which GC to choose in each usage scenario, the JVM parameters involved, and common JVM parameter tuning.

Let's start with a mind map to get an overall picture of this article; I reckon covering it properly takes at least 10,000 words:

The right side of the mind map is what the previous article covered; the left side is what this article covers.

Insufficient Disk Space

Strictly speaking, troubleshooting insufficient disk space is a system- or program-level issue rather than a JVM one, but a full disk can also make the JVM misbehave, so it is included here.

Troubleshooting a full disk is relatively simple: a few commands, applied layer by layer. The first command is df -h, which shows disk usage:

In the output above, one of the first lines shows the heaviest usage, 2.8G, mounted on the / directory, so let's go straight there with cd /.

Then execute:

du -sh *

Look at the size of each file, find the largest one (or several similarly sized, very large files), and delete the large files that are no longer needed.

Then cd into the corresponding directory and run du -sh * again. Work down layer by layer like this until you find the files that are both useless and large, and delete them directly.

Troubleshooting High CPU Usage

Next is troubleshooting the cause of high CPU usage: find the process with the highest CPU share, then find the busiest thread inside it.

To summarize, high CPU usage may be caused by frequent GC, frequent lock contention, too many threads, and similar reasons:

  • Frequent GC
  • Frequent lock contention (spinning)

Frequent GC may be caused by large objects (or too many objects), memory leaks, and other sources of memory pressure: GC keeps running, but each collection reclaims very little.
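As a rough illustration of that pattern, here is a minimal sketch of a very common leak (the class and method names are made up for this example): a static cache that is never evicted, so every collection still finds the entries reachable and reclaims almost nothing.

import java.util.HashMap;
import java.util.Map;

public class LeakyCache {
    // Entries in this static map stay strongly reachable for the lifetime of the class,
    // so GC runs again and again under memory pressure but can reclaim almost nothing.
    private static final Map<String, byte[]> CACHE = new HashMap<>();

    public static void handleRequest(String requestId) {
        // Each "request" caches a 1 MB buffer and nothing ever removes it.
        CACHE.put(requestId, new byte[1024 * 1024]);
    }

    public static void main(String[] args) {
        long i = 0;
        while (true) {
            handleRequest("request-" + i++);
        }
    }
}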

High CPU usage is generally investigated on the live system, and big companies usually have their own monitoring platform. Ours monitors the health of each server (a health score) and the applications running on it (MySQL, Redis, MQ, Kafka, our services) and raises alerts, so problems can usually be fixed before they blow up.

As mentioned, online you can troubleshoot CPU spikes with the top and jstack commands. Here is the case code:

public class CPUSoaring {
    public static void main(String[] args) {
        Thread thread1 = new Thread(new Runnable() {
            @Override
            public void run() {
                for (;;) {
                    System.out.println("I am children-thread1");
                }
            }
        }, "children-thread1");

        Thread thread2 = new Thread(new Runnable() {
            @Override
            public void run() {
                for (;;) {
                    System.out.println("I am children-thread2");
                }
            }
        }, "children-thread2");

        thread1.start();
        thread2.start();
        System.err.println("I am the main thread!!!");
    }
}

(1) Run the top command and find that the process with PID 3806 has the highest CPU usage.

(2) Then run top -Hp <pid> to find the thread with the highest CPU usage inside that process:

(3) Convert that thread ID to hexadecimal with printf '%x\n' <tid>, because jstack prints thread IDs (nid) in hex:

(4) Finally, run jstack <pid> | grep <hex tid> -A 30 to print the thread's stack and locate what it is doing:

(5) You can also run jstack -l <pid> > filename.txt to dump all thread stacks to a file for offline analysis.

That is the process for a CPU spike: find the thread consuming the most CPU, then review the corresponding code to locate the problem.

Using Arthas is much the same: first use top to find the Java process with the highest CPU usage, then attach Arthas to that process, use the dashboard command to identify the busiest threads, and finally print a thread's stack with the thread command.

OOM Troubleshooting

For OOM troubleshooting, we first need to set the following two parameters:

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${directory}

With these set, the JVM automatically exports a heap dump file when an OOM exception occurs, which can then be analyzed with VisualVM.

First, let's look at what can cause an OOM exception, from the perspective of the JVM's runtime data areas:

  • The Java heap
  • The method area
  • The virtual machine stack
  • The native method stack
  • The program counter
  • Direct memory

Of these, only the program counter never throws OOM. Note that since Java 8 the method area is implemented as the metaspace in native memory, and it can throw OOM as well.
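Direct memory in particular is easy to overlook because it lives outside the heap. A minimal sketch that exhausts it is shown below; the -XX:MaxDirectMemorySize=64m value is only an example to make it fail quickly.

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Run with e.g. -XX:MaxDirectMemorySize=64m (example value) to hit the limit quickly.
public class DirectMemoryOOM {
    public static void main(String[] args) {
        List<ByteBuffer> buffers = new ArrayList<>();
        while (true) {
            // Each allocation takes 1 MB of off-heap (direct) memory and stays referenced,
            // so it is never freed; eventually "OutOfMemoryError: Direct buffer memory" is thrown.
            buffers.add(ByteBuffer.allocateDirect(1024 * 1024));
        }
    }
}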

From the point of view of the program code, the common causes of OOM exceptions are:

  • Memory leaks
  • Too many objects
  • Methods that are too long
  • Overuse of proxy frameworks, generating a lot of class metadata

The demo code below triggers an OOM exception; run it with the two parameters above, then import the generated heap dump file for analysis. The code is as follows:

import java.util.ArrayList;
import java.util.List;

class OOM {

    static class User {
        private String name;
        private int age;

        public User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<User> list = new ArrayList<>();
        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            Thread.sleep(1000);
            User user = new User("zhangsan" + i, i);
            list.add(user);
        }
    }
}

The code is simple: it keeps adding objects to a collection. When you import the heap dump file, the classes and instances view shows which classes have the most instances:

That is how you find the class responsible for the OOM exception. You can also view the stack of the thread that triggered the OOM, as shown below, and locate the corresponding section of code.

But that is the last line of defense. How do we prevent OOM exceptions before it comes to that?

Generally, big companies have their own monitoring platform that tracks the health (CPU, memory, and so on) of services in the test, staging, and production environments. Frequent GC, or GC that frees very little memory, should be treated as a warning sign.

Normally, with reasonably sized methods, more than 95% of objects are ephemeral and only a small number survive a Minor GC, so at the code level you should avoid long methods that keep large objects alive.
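A hedged sketch of what that means in code (all names here are made up): keep large temporary objects inside the smallest possible scope so they can die young, instead of staying referenced across a long method.

public class ReportService {

    // Long method: the 50 MB buffer is still referenced by a local variable during the
    // slow tail of the method, so it typically cannot be reclaimed until the method returns.
    void exportReportLong() throws InterruptedException {
        byte[] buffer = new byte[50 * 1024 * 1024];
        fill(buffer);
        write(buffer);
        Thread.sleep(10_000); // long unrelated work, buffer still referenced
    }

    // Refactored: the buffer only lives inside buildAndWrite(), so it becomes unreachable
    // (and collectable) before the long-running part even starts.
    void exportReportShort() throws InterruptedException {
        buildAndWrite();
        Thread.sleep(10_000); // long unrelated work, buffer already collectable
    }

    private void buildAndWrite() {
        byte[] buffer = new byte[50 * 1024 * 1024];
        fill(buffer);
        write(buffer);
    }

    private void fill(byte[] buffer) { /* build the report data */ }

    private void write(byte[] buffer) { /* write it out */ }

    public static void main(String[] args) throws InterruptedException {
        new ReportService().exportReportShort();
    }
}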

Every time I finish writing code I review it myself, and then submit it to a senior engineer for code review, so problems are found early. This catches well over 90% of issues, and it is also why big companies put so much emphasis on code quality and constant code review.

Stack Overflow Troubleshooting

Troubleshooting a stack overflow (in the virtual machine stack or the native method stack) is basically the same as troubleshooting OOM: export the stack information from the exception, then analyze it offline with MAT or VisualVM to find the offending code or method.

A StackOverflowError occurs when a thread requests a stack depth greater than the virtual machine stack allows. From a code point of view, the usual causes are frames that are too large or too numerous: too many local variables (an oversized local variable table) or too many nested method calls, deep recursion in particular, so that the stack exceeds the size set by -Xss.
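The other cases in this article come with demo code, so here is a minimal sketch of the most common trigger, unbounded recursion; the depth counter is only there to show roughly how deep the stack got before it blew up.

public class StackOverflowDemo {

    private static int depth = 0;

    // Each call adds a frame to the thread's stack; with no exit condition the stack
    // eventually exceeds the limit set by -Xss and a StackOverflowError is thrown.
    private static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.err.println("StackOverflowError at depth: " + depth);
        }
    }
}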

Deadlock Troubleshooting

The deadlock demo code is as follows:

public class DeadLock {
    public static Object lock1 = new Object();
    public static Object lock2 = new Object();

    public static void main(String[] args) {
        Thread a = new Thread(new Lock1(), "DeadLock1");
        Thread b = new Thread(new Lock2(), "DeadLock2");
        a.start();
        b.start();
    }
}

class Lock1 implements Runnable {
    @Override
    public void run() {
        try {
            while (true) {
                synchronized (DeadLock.lock1) {
                    System.out.println("Waiting for lock2");
                    Thread.sleep(3000);
                    synchronized (DeadLock.lock2) {
                        System.out.println("Lock1 acquired lock1 and lock2");
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

class Lock2 implements Runnable {
    @Override
    public void run() {
        try {
            while (true) {
                synchronized (DeadLock.lock2) {
                    System.out.println("Waiting for lock1");
                    Thread.sleep(3000);
                    synchronized (DeadLock.lock1) {
                        System.out.println("Lock2 acquired lock1 and lock2");
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The code above is simple: two object instances serve as lock resources, and two threads acquire them in opposite order. After grabbing the first lock, each thread sleeps for three seconds so that the other thread has enough time to grab the other lock.

Running the code produces a deadlock:

For deadlock detection in a test environment or locally, you can simply connect VisualVM to the process; the interface below automatically detects any deadlock.

You can then view the thread stacks and see exactly which threads are deadlocked:

Online, you can use Arthas or the plain JDK commands: first use jps to find the Java process ID, then jstack <pid> to dump the process's thread stacks; jstack also reports the deadlock automatically:

The Arthas thread command can also detect deadlocks; pay particular attention to threads in the BLOCKED state, as shown in the figure below:

For detailed parameters of thread, see the figure below:

How to Avoid Deadlocks

We've covered how to troubleshoot deadlocks; now let's talk about how to avoid them. From the example above, a deadlock occurs when two threads each hold a lock the other needs and neither releases it.

So, at the code level, avoiding deadlock mainly comes down to the following four points:

  1. Make sure threads always acquire lock resources in a consistent order.
  2. Avoid having the same thread contend for many resources at once.
  3. Put a timeout on lock acquisition so that an exception, or a lock that is never released, cannot block a thread forever; for example, use tryLock() (or tryAcquire() for semaphores) with a timeout parameter, as in the sketch after this list.
  4. Use third-party tools to detect deadlocks ahead of time and prevent them from occurring online.
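Here is a minimal sketch of point 3, reworking the DeadLock example with ReentrantLock.tryLock() and a timeout (the class name, lock names and the 1-second timeout are just illustrative): a thread that cannot get the second lock within the timeout backs off and releases what it already holds instead of waiting forever.

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class TryLockDemo {

    private static final ReentrantLock LOCK1 = new ReentrantLock();
    private static final ReentrantLock LOCK2 = new ReentrantLock();

    // Try to take both locks; if the second one is not available within 1 second,
    // give up and release the first one instead of waiting forever.
    static boolean doWork(ReentrantLock first, ReentrantLock second) throws InterruptedException {
        if (!first.tryLock(1, TimeUnit.SECONDS)) {
            return false;
        }
        try {
            if (!second.tryLock(1, TimeUnit.SECONDS)) {
                return false;
            }
            try {
                System.out.println(Thread.currentThread().getName() + " acquired both locks");
                return true;
            } finally {
                second.unlock();
            }
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        new Thread(() -> {
            try { doWork(LOCK1, LOCK2); } catch (InterruptedException ignored) { }
        }, "worker-1").start();
        new Thread(() -> {
            try { doWork(LOCK2, LOCK1); } catch (InterruptedException ignored) { }
        }, "worker-2").start();
    }
}

Combined with a consistent acquisition order (point 1), this keeps two threads from blocking each other indefinitely.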

With deadlock covered, that wraps up the troubleshooting part. Troubleshooting can be considered part of tuning, but for JVM tuning proper, tuning the Java heap is the most important piece.

Let’s talk about Java heap tuning.

The Purpose of Tuning

Before we get into GC and heap tuning, let's talk about the purpose of tuning. Opinions on this differ; here is mine.

I think tuning has two main purposes, and one precondition: the system must actually need tuning. What does that mean? If your service runs slowly, responds slowly, has low throughput, or even throws OOM exceptions, then it needs tuning.

If your service already performs well, responds quickly, and has high throughput, there is no need to tune it; you risk achieving the opposite.

The first purpose of tuning, then, is to make the service run well: respond quickly (within an acceptable response time) or deliver high throughput. For that you need reasonable GC pause times (STW), a reasonable GC frequency, a reasonable Minor GC interval (every few hours), and a reasonable Full GC interval (once a day or every few days).

The second purpose is to fix a concrete problem, such as OOM; in that case the goal of tuning is to prevent the OOM exception.

Tuning Metrics

What are the tuning metrics? In other words, by what measures do we judge whether the service runs better after tuning than before?

Average response time and throughput. Throughput = CPU time spent running the application / (CPU time spent running the application + CPU time spent on garbage collection); as a rule of thumb, GC throughput should not fall below 95%.
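A quick worked example with made-up numbers: if, over a 100-second window, the application threads get 98 seconds of CPU time and garbage collection takes the other 2 seconds, then throughput = 98 / (98 + 2) = 98%, comfortably above the 95% rule of thumb. If GC took 10 of those seconds, throughput would drop to 90% and would be worth investigating.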

Mapped onto the JVM, these two measures correspond to the GC pause time (STW: the longer the pause, the longer user threads wait, which directly affects the user experience) and the garbage collection frequency (generally, the lower the better, since collection consumes a lot of CPU).

Of course, you can't just chase fewer GCs, because fewer GCs may make each individual GC longer and increase its pause time, so you need to strike a balance.

Tuning in Practice

Having covered the purpose and the metrics, let's tune in practice. First, my case code:

import java.util.ArrayList;
import java.util.List;

class OOM {

    static class User {
        private String name;
        private int age;

        public User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<User> list = new ArrayList<>();
        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            Thread.sleep(1000);
            System.err.println(Thread.currentThread().getName());
            User user = new User("zhangsan" + i, i);
            list.add(user);
        }
    }
}

The example code is very simple: it keeps adding objects to a collection. First we start it with:

java   -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:+PrintHeapAtGC -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=50M -Xloggc:./logs/emps-gc-%t.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs/emps-heap.dump OOM

I simply turned on GC log printing and then watched the GC behavior in VisualVM. You can see that after a while, four Minor GCs had occurred, taking 29.648 ms, and one Full GC had occurred, taking 41.944 ms.

Minor GC is very frequent, and so is Full GC: several occurrences within a short time. Judging from the log output and the VisualVM display, this is because the heap size was not set and is too small, which makes Minor GC frequent.

So for the second run we increase the Java heap appropriately, with the following settings:

java -Xmx2048m -Xms2048m -Xmn1024m -Xss256k  -XX:+UseConcMarkSweepGC  -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:+PrintHeapAtGC -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=50M -Xloggc:./logs/emps-gc-%t.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs/emps-heap.dump OOM

After a period of observation, the results are as follows:

The number of Minor GCs dropped significantly, but Full GCs still occurred; according to the log output, the cause is that the metaspace is short of memory.

So for the third run we set the metaspace to a larger size, keeping the CMS collector, with the following parameters:

java -Xmx2048m -Xms2048m -Xmn1024m -Xss256k -XX:MetaspaceSize=100m -XX:MaxMetaspaceSize=100m -XX:+UseConcMarkSweepGC  -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:+PrintHeapAtGC -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=50M -Xloggc:./logs/emps-gc-%t.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs/emps-heap.dump OOM

After observing for the same length of time, VisualVM shows the following:

In that period, neither a Minor GC nor a Full GC occurred, so I consider this tuned.

But tuning is not just about adding memory; it is about balancing the different areas. You can increase memory appropriately and change the GC type, for example after removing the Thread.sleep(1000) from the case code above.

Then look at the VisualVM chart, as follows:

As you can see, Minor GC is again very frequent, because the code itself keeps growing memory until the OOM exception. Real systems are usually not like this; memory typically grows to a certain level and then stabilizes within a range.

Taking that situation and increasing the memory appropriately, the JVM parameters are as follows:

java -Xmx4048m -Xms4048m -Xmn2024m -XX:SurvivorRatio=7  -Xss256k -XX:MetaspaceSize=300m -XX:MaxMetaspaceSize=300m -XX:+UseConcMarkSweepGC  -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:+PrintHeapAtGC -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=50M -Xloggc:./logs/emps-gc-%t.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs/emps-heap.dump OOM

Over the same period, Minor GC is indeed reduced, but each one takes longer: the young generation uses a copying algorithm, and when most objects are still alive, copying them costs a lot of CPU and time:

So tuning involves trade-offs: the aim is a balance point where performance is good enough, not some single optimal state. That is the basic law of tuning. It is also painstaking work; it takes a lot of time, small adjustments, and constant comparison.

Tuning parameters

Okay, that covers JVM tuning. Below is a reference list of commonly used JVM tuning parameters, ready to copy and paste.

The heap

  • -Xms1024m sets the initial heap size
  • -Xmx1024m sets the maximum heap size
  • -XX:NewSize=1024m sets the initial size of the young generation
  • -XX:MaxNewSize=1024m sets the maximum size of the young generation
  • -XX:SurvivorRatio=8 sets the ratio of Eden to a Survivor space
  • -XX:NewRatio=4 sets the ratio of the old generation to the young generation
  • -XX:MaxTenuringThreshold=10 sets the age threshold for promotion to the old generation

The stack

  • -Xss128k sets the stack size of each thread

Metaspace

  • -XX:MetaspaceSize=200m sets the initial metaspace size
  • -XX:MaxMetaspaceSize=200m sets the maximum metaspace size; by default it is unlimited

Direct memory

  • -XX:MaxDirectMemorySize sets the direct memory capacity; by default it is the same as the maximum heap size

The log

  • -Xloggc:/opt/app/ard-user/ard-user-gc-%t.log sets the GC log directory and file name
  • -XX:+UseGCLogFileRotation enables rolling GC log files
  • -XX:NumberOfGCLogFiles=5 sets the number of rolled log files; the default is 0
  • -XX:GCLogFileSize=20M sets the size at which a GC log file rolls over; requires UseGCLogFileRotation
  • -XX:+PrintGCDetails records detailed GC logs (including GC type and the time taken by each step) and prints the JVM's memory usage when the program exits
  • -XX:+PrintGCDateStamps records the wall-clock time of each GC
  • -XX:+PrintGCCause prints the cause of each GC (enabled by default)

GC

Serial collector (young generation)

  • Enable: -XX:+UseSerialGC
  • Disable: -XX:-UseSerialGC // young generation uses Serial, old generation uses Serial Old

Parallel Scavenge collector (young generation)

  • Enable: -XX:+UseParallelGC
  • Disable: -XX:-UseParallelGC // young generation uses Parallel Scavenge, old generation uses Parallel Old

Parallel Old collector (old generation)

  • Enable: -XX:+UseParallelOldGC
  • Disable: -XX:-UseParallelOldGC // young generation uses Parallel Scavenge, old generation uses Parallel Old

ParNew collector (young generation)

  • Enable: -XX:+UseParNewGC
  • Disable: -XX:-UseParNewGC // ParNew in the young generation, typically paired with CMS in the old generation

CMS collector (old generation)

  • Enable: -XX:+UseConcMarkSweepGC
  • Disable: -XX:-UseConcMarkSweepGC
  • -XX:MaxGCPauseMillis sets the target GC pause time, which the collector tries to meet by various means, such as shrinking the young generation
  • -XX:+UseCMSCompactAtFullCollection makes CMS compact (defragment) the heap when it has to fall back to a Full GC; compaction has to move live objects, which (before Shenandoah and ZGC) cannot be done concurrently
  • -XX:CMSFullGCsBeforeCompaction sets how many Full GCs happen before a compaction is performed; the default is 0, i.e. compact on every Full GC
  • -XX:CMSInitiatingOccupancyFraction triggers a CMS collection when old-generation occupancy reaches this percentage; the default is 92
  • -XX:+UseCMSInitiatingOccupancyOnly is used together with the parameter above and means the configured percentage is always used as the trigger; without it, the JVM only uses CMSInitiatingOccupancyFraction for the first collection and adjusts the threshold automatically afterwards
  • -XX:+CMSScavengeBeforeRemark runs a Minor GC before the CMS remark phase, reducing the number of young-generation objects that have to be scanned and therefore the cost of remark; a large share of CMS GC time (often cited as around 80%) is spent in the remark phase
  • -XX:+CMSParallelInitialMarkEnabled makes the initial mark, which is single-threaded by default, run with multiple threads, reducing the STW pause
  • -XX:+CMSParallelRemarkEnabled uses multiple threads for the remark phase, reducing the STW pause

G1 collector

  • Enable: -XX:+UseG1GC
  • Disable: -XX:-UseG1GC
  • -XX:G1HeapRegionSize sets the size of each Region, from 1MB to 32MB
  • -XX:MaxGCPauseMillis sets the collector's target pause time; the default is 200 milliseconds. A target of one to three hundred milliseconds is usually reasonable

Well, that basically wraps up JVM tuning. I will keep sharing what I learn as I dig deeper into the JVM. If you need the e-book of the third edition of the JVM book, you can add me on WeChat: Abc730500468. I have marked the key parts of the book, and we can discuss reading approaches and techniques with each other. All right, I'm Lidu, and I'll see you in the next issue.