tags: java, troubleshooting, monitor


In short: this article describes in detail how to troubleshoot online problems in Java applications, such as high CPU usage, memory overflow, and excessive I/O.

1 Introduction

Once a Java application is in production, problems are inevitable. Generally speaking, they fall into four categories:

  • (1) CPU-related problems
  • (2) Memory-related problems
  • (3) Disk and I/O-related problems
  • (4) Business code problems

Monitoring and troubleshooting these problems in production is an essential skill for Java developers. The following sections describe a troubleshooting routine for each category, using the Java command-line tools introduced in the previous articles of this series.

2 CPU Troubleshooting Routine

If the system becomes slow or unresponsive, the first thing to check is CPU usage, because the usual cause is a process consuming too much CPU. In a Java application, CPU usage is mainly determined by the threads that are running, and the corresponding command-line tool is jstack. Troubleshooting high CPU usage can therefore be summarized as follows:

#(1) Find the ID (PID) of the process with high CPU usage
top -c

#(2) Check the startup parameters of the process
ps -ef | grep PID
# or
jinfo -flags PID

#(3) Print the thread stacks (including lock information) to a file
jstack -l PID > PID.dump

#(4) Find the threads (TID) using the most CPU in the process
top -H -p PID

#(5) Convert the TID to hexadecimal (jstack prints it as nid=0x...)
printf "%x\n" TID

#(6) Combine the hexadecimal TID with the thread dump to find the problem, e.g. check the thread's state and stack
grep -A20 HEX_TID PID.dump   # HEX_TID is the output of step (5)
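The lookup in steps (5) and (6) can also be scripted, for example to post-process dump files saved from a server. The Java sketch below is illustrative only: the class `NidLookup` and its methods are hypothetical, and it assumes the HotSpot jstack format, in which each thread's header line contains `nid=0x<hex>`, the native thread ID that `top -H` shows in decimal.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: finds one thread's section in jstack output by its native thread ID. */
final class NidLookup {

    /** Formats a decimal TID from `top -H` the way jstack prints it (printf "%x" plus the nid=0x prefix). */
    static String toNid(long tid) {
        return "nid=0x" + Long.toHexString(tid);
    }

    /** Returns the header line containing the TID plus up to `after` following lines, like `grep -A after`. */
    static List<String> findThread(String dump, long tid, int after) {
        // The negative lookahead keeps nid=0x1f from also matching nid=0x1fa.
        Pattern header = Pattern.compile(Pattern.quote(toNid(tid)) + "(?![0-9a-fA-F])");
        String[] lines = dump.split("\\R");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            if (header.matcher(lines[i]).find()) {
                int end = Math.min(lines.length - 1, i + after);
                for (int j = i; j <= end; j++) {
                    result.add(lines[j]);
                }
                break; // a native thread ID appears in at most one thread header
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java NidLookup PID.dump <decimal TID from top -H>
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        findThread(dump, Long.parseLong(args[1]), 20).forEach(System.out::println);
    }
}
```

For example, `java NidLookup PID.dump 31` would print the section whose header contains `nid=0x1f` (31 in hexadecimal).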

For more information about jstack and thread states, see the article “Java Application Monitoring (3): Have you mastered these command-line tools?”
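When running `top` and `jstack` on the server is inconvenient, a similar hot-thread view can be taken from inside the JVM with the standard `java.lang.management.ThreadMXBean` API. This is a minimal sketch under stated assumptions: the class `HotThreads` is hypothetical, it samples CPU time twice and ranks threads by the difference, and it reports Java thread IDs (the `#N` in jstack output) rather than the native TIDs from `top -H`. Per-thread CPU time measurement may be unsupported on some JVMs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: ranks live threads by CPU time used during a short sampling interval. */
final class HotThreads {

    /** One report row: Java thread ID, thread name, CPU time used in the interval. */
    static final class Row {
        final long id;
        final String name;
        final long cpuMillis;

        Row(long id, String name, long cpuMillis) {
            this.id = id;
            this.name = name;
            this.cpuMillis = cpuMillis;
        }

        @Override
        public String toString() {
            return String.format("#%d %s: %d ms CPU", id, name, cpuMillis);
        }
    }

    /** Current CPU time in nanoseconds of every live thread that reports one. */
    private static Map<Long, Long> snapshot(ThreadMXBean mx) {
        Map<Long, Long> cpu = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            long nanos = mx.getThreadCpuTime(id); // -1 if the thread has terminated
            if (nanos >= 0) {
                cpu.put(id, nanos);
            }
        }
        return cpu;
    }

    /** Samples twice, intervalMillis apart, and returns the `limit` threads that used the most CPU in between. */
    static List<Row> top(int limit, long intervalMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time is not supported by this JVM");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        Map<Long, Long> before = snapshot(mx);
        try {
            Thread.sleep(intervalMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt; report what was measured so far
        }
        Map<Long, Long> after = snapshot(mx);

        List<Row> rows = new ArrayList<>();
        for (Map.Entry<Long, Long> e : after.entrySet()) {
            Long start = before.get(e.getKey());
            ThreadInfo info = mx.getThreadInfo(e.getKey());
            if (start == null || info == null) {
                continue; // thread started or ended during the interval: skipped for simplicity
            }
            rows.add(new Row(e.getKey(), info.getThreadName(), (e.getValue() - start) / 1_000_000));
        }
        rows.sort(Comparator.comparingLong((Row r) -> r.cpuMillis).reversed());
        return new ArrayList<>(rows.subList(0, Math.min(limit, rows.size())));
    }

    public static void main(String[] args) {
        top(5, 1000).forEach(System.out::println);
    }
}
```

Sampling over an interval is the key design choice: ranking by cumulative CPU time would let a thread that was busy at startup but is idle now dominate the list.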

3 Memory Troubleshooting Routines

Memory problems mainly show up as OutOfMemoryError (OOM) while a Java application is running. It is therefore recommended to add the following parameters when starting a Java application: -Xloggc:file -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=logs/heapdumps.hprof -XX:ErrorFile=logs/java_error_%p.log (where %p is replaced with the process ID). With these settings, when an OOM occurs, the heap dump file can be used to analyze its cause. (-Xloggc is the JDK 8 form; on JDK 9 and later, GC logging is configured with -Xlog:gc:file.) The Java command-line tools related to memory problems include jmap and jstat. Troubleshooting OOM problems therefore follows these steps:

#(1) Find the Java application process (PID)
jps -lvm
# or
top -c

#(2) Check the process startup parameters (especially -Xms, -Xmx, etc.)
ps -ef | grep PID
# or
jinfo -flags PID

#(3) Check heap memory usage (on JDK 9 and later, use: jhsdb jmap --heap --pid PID)
jmap -heap PID

#(4) Find the classes whose instances occupy the most memory (the :live option triggers a full GC first)
jmap -histo:live PID

#(5) Dump the heap to a file for analysis with tools
jmap -dump:file=./heap.hprof PID

#(6) Monitor GC activity, printing once per second (1000 ms)
jstat -gc PID 1000

#(7) Analyze the OOM and GC behavior using the error information in the log files and the heap dump, for example:
#    - objects are created frequently but never released: optimize the code
#    - Young GC frequency is too high
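For a rough in-process counterpart to steps (3) and (6), the standard `MemoryMXBean` and `GarbageCollectorMXBean` APIs expose heap usage and per-collector GC counts and times. The sketch below (the class `GcSampler` is hypothetical) prints one line per interval, similar in spirit to `jstat -gc PID 1000` but with far fewer columns and only for the JVM it runs in. Collector names depend on the garbage collector in use, for example `G1 Young Generation` and `G1 Old Generation` with G1.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: a very small jstat-like GC sampler for the current JVM. */
final class GcSampler {

    /** Cumulative {collection count, collection time in ms} per collector, keyed by collector name. */
    static Map<String, long[]> gcTotals() {
        Map<String, long[]> totals = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 if the value is undefined for this collector.
            totals.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return totals;
    }

    /** Current heap usage in MB; max prints as -1 when the maximum is undefined. */
    static String heapLine() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return String.format("heap used=%dMB committed=%dMB max=%dMB",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    }

    /** Prints one line per interval with heap usage and the GCs that ran during that interval. */
    static void sample(int samples, long intervalMillis) {
        Map<String, long[]> prev = gcTotals();
        for (int i = 0; i < samples; i++) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            Map<String, long[]> now = gcTotals();
            StringBuilder line = new StringBuilder(heapLine());
            for (Map.Entry<String, long[]> e : now.entrySet()) {
                long[] old = prev.getOrDefault(e.getKey(), new long[] {0, 0});
                line.append(String.format(" | %s: +%d GCs, +%d ms",
                        e.getKey(), e.getValue()[0] - old[0], e.getValue()[1] - old[1]));
            }
            System.out.println(line);
            prev = now;
        }
    }

    public static void main(String[] args) {
        sample(10, 1000); // loosely comparable to `jstat -gc PID 1000 10`, for this JVM only
    }
}
```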

The official documentation on OOM (https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks002.html) divides it mainly into the following categories:

  • java.lang.OutOfMemoryError: Java heap space: the heap has reached the maximum size set by -Xmx, so new objects cannot be allocated. This can be a simple configuration problem, solved by increasing -Xmx; in a long-running application, however, it can also indicate a memory leak (objects that are unintentionally kept referenced).
  • java.lang.OutOfMemoryError: GC Overhead limit exceeded: the garbage collector is running almost all the time and the Java program is making very little progress (more than about 98% of the time is spent in GC while less than 2% of the heap is recovered). It is usually thrown because the live data barely fits into the heap, leaving little free space for new allocations. Consider increasing the heap size; the check itself can be turned off with -XX:-UseGCOverheadLimit.
  • java.lang.OutOfMemoryError: Requested array size exceeds VM limit: the application tried to allocate an array larger than the heap, for example a 512 MB array when the heap size is 256 MB. Consider increasing the heap size or changing the code.
  • java.lang.OutOfMemoryError: Metaspace: the native memory required for class metadata exceeds -XX:MaxMetaspaceSize. Consider increasing MaxMetaspaceSize.
  • java.lang.OutOfMemoryError: request size bytes for reason. Out of swap space?: an allocation from the native heap failed and the native heap may be close to exhaustion. The message reports the size of the failed request and its reason; check the fatal error log to handle it.
  • java.lang.OutOfMemoryError: Compressed class space: the compressed class space, a non-heap area for class metadata referenced through compressed class pointers, is full. Consider increasing it with -XX:CompressedClassSpaceSize.
  • java.lang.OutOfMemoryError: reason stack_trace_with_native_method: a native method encountered an allocation failure. Unlike the previous error, the failure was detected in the Java Native Interface (JNI) or a native method rather than in JVM code, so examine the printed stack trace and use the operating system's native tools to diagnose it.
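Several of these errors correspond to a specific memory pool (the heap generations, `Metaspace`, `Compressed Class Space`). As an illustrative sketch, the standard `MemoryPoolMXBean` API can report each pool's usage against its maximum, which helps tell which area is filling up before an OOM occurs. The class `PoolReport` and the 90% threshold are assumptions for illustration; pool names vary with the collector and JDK version, and pools without a configured maximum (such as Metaspace when MaxMetaspaceSize is not set) are reported as unbounded.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports memory pool usage relative to each pool's maximum. */
final class PoolReport {

    /** Percentage of the pool's maximum that is in use, or -1 if the pool has no defined maximum. */
    static double percentUsed(MemoryUsage usage) {
        long max = usage.getMax();
        return max > 0 ? 100.0 * usage.getUsed() / max : -1;
    }

    /** Names of pools at or above thresholdPercent (expected to be positive) of their maximum. */
    static List<String> nearLimit(double thresholdPercent) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null && percentUsed(usage) >= thresholdPercent) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            if (u == null) {
                continue;
            }
            String max = u.getMax() < 0 ? "unbounded" : (u.getMax() >> 10) + " KB";
            System.out.printf("%-30s %-16s used=%d KB max=%s%n",
                    pool.getName(), pool.getType(), u.getUsed() >> 10, max);
        }
        // 90% is an arbitrary example threshold, not a documented recommendation.
        System.out.println("pools at or above 90% of max: " + nearLimit(90));
    }
}
```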

4 Troubleshooting Routines for Disk and I/O Problems

While a Java application runs, it writes logs and performs other disk reads and writes, which can lead to problems such as insufficient disk space (too many logs), slow disk I/O, or excessively frequent I/O. Generally speaking, these can be investigated with the following routine:

#(1) Check the disk capacity
df -h

#(2) Check file sizes and directory sizes
ls -l
# or
du -h --max-depth=1

#(3) Check I/O status and find the PID of the process with frequent reads/writes (refreshing every second)
iotop -d 1
# or
iostat -d -x -k 1

#(4) Use jstack to print the thread stacks and check the I/O-related code

#(5) To test the read/write speed of a disk (especially on a virtual machine), use dd
# Example: test the raw write speed of the directory where the data volume is mounted (writes about 8 GB)
dd if=/dev/zero of=/<data volume directory>/test.iso bs=8k count=1000000
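If a service should check its own disk, steps (1) and (2) have rough counterparts in `java.nio.file`. The sketch below uses a hypothetical class `DiskCheck`: `FileStore` gives the total and usable space of the file system holding a path (comparable to one row of `df -h`, although `df` computes its Use% slightly differently because of reserved blocks), and walking a directory sums the sizes of regular files (comparable to `du`, except that `du` counts allocated disk blocks, so the numbers can differ).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Hypothetical helper with rough Java counterparts of `df` and `du`. */
final class DiskCheck {

    /** Approximate percentage of the file system holding `path` that is not available to this process. */
    static double usedPercent(Path path) throws IOException {
        FileStore store = Files.getFileStore(path);
        long total = store.getTotalSpace();
        return total == 0 ? 0.0 : 100.0 * (total - store.getUsableSpace()) / total;
    }

    /** Sum of the sizes of all regular files under `dir`: apparent size, not allocated blocks. */
    static long directorySize(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e); // lambdas cannot throw checked exceptions
                        }
                    })
                    .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.printf("file system in use: %.1f%%%n", usedPercent(dir));
        System.out.printf("size of %s: %d MB%n", dir, directorySize(dir) >> 20);
    }
}
```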

5 Business Code Troubleshooting Routine

Business code problems are related to the code logic. Troubleshooting them mainly means examining the log output and checking whether methods execute according to the intended logic. The routine is as follows:

#(1) Follow the log output in real time, starting from the last 100 lines
tail -fn 100 log_file

#(2) Locate the fault by searching the log output for a keyword
grep keyWord log_file
# show n lines of context before and after each match
grep -C n keyWord log_file

#(3) Analyze log files with a GUI text editor (Notepad++ or Sublime; for large files, an editor such as EmEditor)

#(4) Use online diagnostic tools such as BTrace or Arthas to inspect method parameters, return values, exceptions, etc. directly

BTrace and Arthas, diagnostic tools for online Java problems, will be covered in subsequent articles in this series.
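Steps (1) and (2) of this routine can also be reproduced in Java, for example in a small diagnostic job that runs where the shell tools are unavailable. This is a minimal sketch with a hypothetical class `LogGrep`, assuming UTF-8 log files: `count` streams the file line by line like `grep -c`, while `withContext` loads the whole file into memory to emulate `grep -C n`, so it only suits moderately sized logs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper emulating `grep -c` and `grep -C n` on a UTF-8 log file. */
final class LogGrep {

    /** Number of lines containing keyword, like `grep -c keyWord log_file`. Streams the file. */
    static long count(Path log, String keyword) throws IOException {
        try (Stream<String> lines = Files.lines(log, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains(keyword)).count();
        }
    }

    /** Matching lines with up to n lines of context on each side, like `grep -C n`; groups separated by "--". */
    static List<String> withContext(Path log, String keyword, int n) throws IOException {
        List<String> lines = Files.readAllLines(log, StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        int lastAdded = -1; // index of the last line already added, so overlapping contexts are merged
        for (int i = 0; i < lines.size(); i++) {
            if (!lines.get(i).contains(keyword)) {
                continue;
            }
            int from = Math.max(lastAdded + 1, i - n);
            int to = Math.min(lines.size() - 1, i + n);
            if (lastAdded >= 0 && from > lastAdded + 1) {
                out.add("--"); // gap between two groups, as grep prints it
            }
            for (int j = from; j <= to; j++) {
                out.add(lines.get(j));
            }
            lastAdded = to;
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LogGrep log_file keyWord [n]
        Path log = Paths.get(args[0]);
        int n = args.length > 2 ? Integer.parseInt(args[2]) : 0;
        System.out.println(count(log, args[1]) + " matching lines");
        withContext(log, args[1], n).forEach(System.out::println);
    }
}
```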

6 Summary

This article divides the problems encountered by Java applications in production into four categories: (1) CPU-related problems, (2) memory-related problems, (3) disk and I/O-related problems, and (4) business code problems. For each category, following a routine and combining the Java command-line tools with online diagnostic tools makes troubleshooting Java applications much easier. I hope this article is helpful to Java developers.

References

  • Understand the OutOfMemoryError Exception: https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks002.html

  • The technology of sunflower treasure dian really hardcore: https://mp.weixin.qq.com/s/NJPXFMgbwXWkzVLDK12Gfg

  • The most comprehensive Java service troubleshooting routines: https://mp.weixin.qq.com/s/SuFPeWxtjHdXAcu6hkmBlA

  • JDK tools reference documentation: https://docs.oracle.com/javase/8/docs/technotes/tools/unix

  • Sample code: https://github.com/mianshenglee/my-example/tree/master/java-monitor-example

Further reading

  • Java Application Monitoring (1): Application monitoring techniques that Java programmers should know: https://juejin.cn/post/6844903923111690248
  • Java Application Monitoring (2): Secrets of Java commands: https://juejin.cn/post/6844903923363348488
  • Java Application Monitoring (3): Have you mastered these command-line tools? https://juejin.cn/post/6844903923845693447