Preface

Troubleshooting production problems is unavoidable for backend programmers, especially Java programmers. High CPU usage, memory overflow, frequent GC and the like are all headaches. The author has run into these problems too, so how do we solve them? When a problem appears, first locate it, then analyze its cause, then fix it, and finally write up what happened to prevent a recurrence.

1. High CPU usage

Most of us have run into high CPU usage in production. So how do we locate the problem?

First, find the Java process with high CPU usage, since a server may run several JVM processes. Then locate the “problem thread” in that process, and finally use the thread’s stack information to locate and check the problem code.

How do you do that?

Use the top command to find the process with the highest CPU consumption and note the process ID. Then run top -Hp [process ID] to find the thread with the highest CPU consumption in that process and note the thread ID.

Next, dump the thread stacks to a file with the jstack tool that ships with the JDK: jstack -l [process ID] > jstack.log. The thread ID shown by top is decimal, while the thread ID in the stack dump (the nid field) is hexadecimal, so convert the decimal value to hexadecimal first: printf "%x\n" [decimal number]. Then search the dump for that hexadecimal number to find the thread’s stack, which shows what the thread is doing.

In the author’s experience, the thread is usually stuck in a business-code loop that never exits; fix this according to the business logic. The thread may also turn out to be a C2 compiler thread or a GC thread.

What is the C2 compiler? When a piece of Java code has been executed more than 10,000 times (the default threshold), it switches from interpreted execution to compiled execution, that is, it is compiled to machine code to run faster. That is the C2 compiler’s job. How to handle it? After the project goes live, warm it up with a load-testing tool so that C2 compilation does not interfere with the application when real users arrive.

If it is a GC thread, the cause is most likely Full GC, and GC tuning is needed.
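The conversion and lookup steps above can be sketched in Java. This is only a minimal illustration, assuming the dump prints the native thread ID as nid=0x... on each thread's header line and separates threads with blank lines; the class name ThreadNidLookup and its methods are hypothetical helpers, not part of the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Sketch: map a decimal thread ID from top to its stack in a jstack dump. */
final class ThreadNidLookup {

    /** Builds the "nid=0x..." token for a decimal thread ID (same digits as printf "%x"). */
    static String toNid(long decimalThreadId) {
        return "nid=0x" + Long.toHexString(decimalThreadId);
    }

    /** Returns the header line with the matching nid plus the lines up to the next blank line. */
    static List<String> findStack(List<String> jstackLines, long decimalThreadId) {
        String token = toNid(decimalThreadId) + " ";
        List<String> block = new ArrayList<>();
        for (String line : jstackLines) {
            if (block.isEmpty()) {
                // Append a space so that nid=0x3039 does not match nid=0x30391.
                if ((line + " ").contains(token)) {
                    block.add(line);
                }
            } else if (line.trim().isEmpty()) {
                break; // a blank line ends this thread's block
            } else {
                block.add(line);
            }
        }
        return block;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ThreadNidLookup <jstack.log> <decimal thread ID>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        findStack(lines, Long.parseLong(args[1])).forEach(System.out::println);
    }
}
```

For example, thread ID 12345 from top becomes the token nid=0x3039, and findStack returns that thread's header line and stack frames from jstack.log.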

2. Troubleshooting memory problems

After CPU troubleshooting, let’s talk about memory troubleshooting. Memory problems are usually GC problems, because Java memory is managed by the GC. There are two cases: one is a memory overflow; the other is no memory overflow, but unhealthy GC.

For a memory overflow, add the -XX:+HeapDumpOnOutOfMemoryError parameter, which makes the JVM write a heap dump file when the overflow occurs.

Dump files can be analyzed with tools such as MAT, JProfiler and JVisualVM. These tools show exactly where the overflow happened, where large numbers of objects were created, and so on.

The second case, unhealthy GC, is more complicated.

What does a healthy GC look like? In the author’s experience: a YGC about once every 5 seconds, each taking no more than 50 ms; preferably no FGC at all; and a CMS GC about once a day.
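These rules of thumb can be checked roughly from inside a running JVM. The sketch below reads the standard GarbageCollectorMXBean counters and compares the averages with the 5-second and 50 ms figures from the text. The class GcHealthCheck is a hypothetical helper; note that averages since JVM start hide individual long pauses, and which bean is the young collector depends on the GC in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: compare average GC interval and pause with the article's rules of thumb. */
final class GcHealthCheck {
    static final double MIN_INTERVAL_SECONDS = 5.0; // "about once every 5 seconds"
    static final double MAX_PAUSE_MILLIS = 50.0;    // "no more than 50 ms"

    /** Average time between collections, in seconds, over the JVM's uptime. */
    static double averageIntervalSeconds(long collections, long uptimeMillis) {
        return collections == 0 ? Double.POSITIVE_INFINITY : uptimeMillis / 1000.0 / collections;
    }

    /** Average time spent per collection, in milliseconds. */
    static double averagePauseMillis(long collections, long totalGcMillis) {
        return collections == 0 ? 0.0 : (double) totalGcMillis / collections;
    }

    /** True when collections are not too frequent and not too long on average. */
    static boolean looksHealthy(long collections, long totalGcMillis, long uptimeMillis) {
        return averageIntervalSeconds(collections, uptimeMillis) >= MIN_INTERVAL_SECONDS
                && averagePauseMillis(collections, totalGcMillis) <= MAX_PAUSE_MILLIS;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            long time = gc.getCollectionTime();   // -1 when the collector does not report it
            if (count < 0 || time < 0) {
                continue;
            }
            System.out.printf("%s: every %.1f s, %.1f ms on average, %s%n",
                    gc.getName(),
                    averageIntervalSeconds(count, uptime),
                    averagePauseMillis(count, time),
                    looksHealthy(count, time, uptime) ? "ok" : "check");
        }
    }
}
```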

GC optimization has two dimensions: frequency and duration.

If the interval between YGCs is longer than 5 seconds, or much longer, the system has more memory than it needs and the capacity can be reduced. If YGC is too frequent, the Eden area is too small and can be enlarged. The whole young generation, however, should take up 30% to 40% of the heap, and the ratio of Eden to “from” to “to” should be around 8:1:1, adjusted according to how many objects get promoted.
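As a worked example of these proportions, the sketch below computes the young generation, Eden and Survivor sizes for a given heap and prints matching JVM options. The class YoungGenSizing is hypothetical; -Xmx, -Xmn and -XX:SurvivorRatio are standard HotSpot options, and SurvivorRatio=8 means Eden is eight times the size of one Survivor space, which gives the 8:1:1 split.

```java
/** Sketch: derive young generation sizes from the 30-40% and 8:1:1 guidelines. */
final class YoungGenSizing {

    /** Returns {youngMb, edenMb, survivorMb} for a heap size and a young generation percentage. */
    static long[] plan(long heapMb, int youngPercent) {
        long young = heapMb * youngPercent / 100;
        long survivor = young / 10;       // "from" and "to" each get 1 part in 10
        long eden = young - 2 * survivor; // Eden gets the remaining 8 parts
        return new long[] {young, eden, survivor};
    }

    /** Formats the plan as JVM options. */
    static String flags(long heapMb, int youngPercent) {
        long[] p = plan(heapMb, youngPercent);
        return "-Xmx" + heapMb + "m -Xmn" + p[0] + "m -XX:SurvivorRatio=8";
    }

    public static void main(String[] args) {
        long[] p = plan(4000, 35);
        System.out.println("young=" + p[0] + "m eden=" + p[1] + "m survivor=" + p[2] + "m");
        System.out.println(flags(4000, 35));
    }
}
```

With a 4000 MB heap and 35% for the young generation, this gives 1400 MB in total: 1120 MB of Eden and 140 MB for each Survivor space.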

What if YGC takes too long? YGC has two phases: scan and copy. Scanning is usually fast, but copying is slower, so if a large number of objects are copied each time, the STW (stop-the-world) pause gets longer. YGC also scans the HashTable data structure every time; if that structure is large and has not been cleaned by an FGC, the STW pause gets longer as well. Another case is the operating system’s virtual memory: if the operating system is swapping memory when GC happens, the STW pause also gets longer.

As for FGC, we can really only optimize its frequency, not its duration, because the duration is out of our control. So how do we optimize the frequency?

First, FGC has several causes: (1) the Old area is out of memory; (2) the metadata area is out of memory; (3) a System.gc() call; (4) jmap or jcmd; (5) CMS promotion failed or concurrent mode failure; (6) under the pessimistic strategy, the JVM judges that the Old area cannot hold the objects a YGC would promote, so it cancels the YGC and runs an FGC instead.
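To see which of these causes is actually behind each collection, a JVM can report the cause of every GC through JMX notifications. The sketch below registers a listener using the com.sun.management.GarbageCollectionNotificationInfo API and maps a few cause strings to the categories above. The class GcCauseWatcher and its classify method are hypothetical, and the cause strings are the ones I understand HotSpot to report; they may differ between JVM versions, so treat the mapping as an assumption.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

/** Sketch: print the cause of each GC and map it to the causes listed in the text. */
final class GcCauseWatcher {

    /** Maps a GC cause string to one of the article's categories (assumed HotSpot strings). */
    static String classify(String gcCause) {
        switch (gcCause) {
            case "System.gc()":
                return "explicit System.gc() call";
            case "Metadata GC Threshold":
                return "metadata area out of memory";
            case "Heap Dump Initiated GC":
            case "Heap Inspection Initiated GC":
                return "jmap or jcmd";
            default:
                return "other: " + gcCause;
        }
    }

    /** Registers one listener on every collector bean that emits notifications. */
    static void watch() {
        NotificationListener listener = (notification, handback) -> {
            if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                    .equals(notification.getType())) {
                return;
            }
            GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                    .from((CompositeData) notification.getUserData());
            System.out.printf("%s | %s | %s | %d ms%n",
                    info.getGcName(), info.getGcAction(),
                    classify(info.getGcCause()), info.getGcInfo().getDuration());
        };
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc instanceof NotificationEmitter) {
                ((NotificationEmitter) gc).addNotificationListener(listener, null, null);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        watch();
        System.gc();        // cause 3 in the list above
        Thread.sleep(1000); // notifications arrive asynchronously
    }
}
```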

The usual optimization target is FGC caused by insufficient Old memory. If a large number of objects remain after FGC, the Old area is too small and should be expanded. If FGC frees most of the Old area, the Old area held many short-lived objects, and the goal should be to let YGC reclaim them in the young generation. Use the parameter that sets the object size threshold to keep such objects out of the Old area, and check whether the promotion age is set too low. If, after YGC, many objects are promoted early because they do not fit in the Survivor area, enlarge the Survivor area, but not too much.

These are all optimization ideas, and we also need some tools to know what’s going on with GC.

The JDK provides many tools, such as jmap and jcmd. Oracle officially recommends jcmd over jmap, because jcmd covers many of jmap’s functions. jmap can print object distribution information and dump the heap to a file. Note that dumping with jmap or jcmd triggers FGC.
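The text describes the command-line tools. As an alternative when the application can dump itself, the same kind of heap dump can be requested from inside the JVM through HotSpotDiagnosticMXBean.dumpHeap. This is a minimal sketch; the class HeapDumper and the file naming are hypothetical. With live set to true only reachable objects are written, which, as far as I know, forces a full collection first, in line with the note above about dumps triggering FGC.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: write a heap dump of the current JVM, similar to what jmap or jcmd produce. */
final class HeapDumper {

    /** Dumps the heap into a new .hprof file in the given directory and returns its path. */
    static Path dump(Path directory, boolean liveOnly) {
        // dumpHeap refuses to overwrite an existing file, so use a fresh name each time.
        Path target = directory.resolve("heap-" + System.nanoTime() + ".hprof");
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(target.toString(), liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    /** Convenience variant that dumps into a newly created temporary directory. */
    static Path dumpToTempDir(boolean liveOnly) {
        try {
            return dump(Files.createTempDirectory("heapdump"), liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = args.length > 0 ? dump(Paths.get(args[0]), true) : dumpToTempDir(true);
        System.out.println("heap dump written to " + file);
    }
}
```

The resulting .hprof file can be opened in the analysis tools mentioned earlier, such as MAT.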

Another common tool is jstat, which shows GC details such as memory usage in the Eden, from, to and Old areas.
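A rough in-process counterpart to this view is the standard MemoryPoolMXBean API, which exposes one bean per memory area. The sketch below lists the heap pools with their current usage; the class HeapPoolReport is a hypothetical helper, and the pool names depend on the collector in use, so they will not always read Eden, Survivor and Old.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Sketch: report used and committed memory for each heap pool of the current JVM. */
final class HeapPoolReport {

    /** Returns one formatted line per heap memory pool. */
    static List<String> heapPoolLines() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null when the pool is no longer valid
            if (pool.getType() != MemoryType.HEAP || usage == null) {
                continue;
            }
            lines.add(String.format("%-28s used=%dK committed=%dK",
                    pool.getName(), usage.getUsed() / 1024, usage.getCommitted() / 1024));
        }
        return lines;
    }

    public static void main(String[] args) {
        heapPoolLines().forEach(System.out::println);
    }
}
```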

Another tool is jinfo, which shows which parameters the JVM is currently using and can also modify some of them without downtime.
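The same read-and-modify idea is available inside the JVM through HotSpotDiagnosticMXBean. The sketch below reads the HeapDumpOnOutOfMemoryError flag mentioned earlier and switches it on at runtime. The class RuntimeFlags is a hypothetical wrapper, and only flags that the JVM marks as writeable can be changed this way.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: read and change a writeable JVM flag at runtime, similar in spirit to jinfo. */
final class RuntimeFlags {

    private static HotSpotDiagnosticMXBean bean() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    /** Returns the current value of a VM option as a string. */
    static String get(String name) {
        return bean().getVMOption(name).getValue();
    }

    /** Tells whether the option can be changed while the JVM is running. */
    static boolean isWriteable(String name) {
        return bean().getVMOption(name).isWriteable();
    }

    /** Sets a writeable VM option; the JVM rejects options that are not writeable. */
    static void set(String name, String value) {
        bean().setVMOption(name, value);
    }

    public static void main(String[] args) {
        String flag = "HeapDumpOnOutOfMemoryError";
        System.out.println(flag + " before: " + get(flag));
        if (isWriteable(flag)) {
            set(flag, "true");
        }
        System.out.println(flag + " after:  " + get(flag));
    }
}
```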

Then there are JProfiler, JVisualVM and similar tools. They can analyze the files dumped by jmap to show which objects use the most memory, which usually reveals the problem.

It is also important to keep GC logs in your production environment!

Summary

As the title of the article suggests, these are only the basic operations. Troubleshooting is an endless topic and every fault involves a lot of knowledge, so after learning the basics we still need to learn more troubleshooting techniques, for example for IO, network and TCP connection problems.
