Troubleshooting Java online service faults

1. Service log related

If the application behaves abnormally, start with the business log. Count the number of ERROR entries for the day:

    egrep --color ERROR logname | wc -l

If the ERROR count is unusually high, something is probably wrong. Then look at the 10 lines after each ERROR to see the specific error:

    egrep -A 10 ERROR logname | less

or use -C 10 to show the 10 lines both before and after each ERROR.
  • In Java, everything that can be thrown inherits from the Throwable class, which has two direct subclasses: Error and Exception. Exceptions are further divided into unchecked exceptions (subclasses of RuntimeException) and checked exceptions (subclasses of Exception that do not extend RuntimeException). The common keywords to search for in logs are Error and Exception (a small code sketch contrasting the two kinds follows this list).
    Error: AssertionError, OutOfMemoryError, StackOverflowError
    UncheckedException: AlreadyBoundException, ClassCastException, ConcurrentModificationException, IllegalArgumentException, IllegalStateException, IndexOutOfBoundsException, JSONException, NullPointerException, SecurityException, UnsupportedOperationException
    CheckedException: ClassNotFoundException, CloneNotSupportedException, FileAlreadyExistsException, FileNotFoundException, InterruptedException, IOException, SQLException, TimeoutException, UnknownHostException
    # Reference: http://www.importnew.com/27348.html
  • Use sed to extract log lines within a specific time range and then filter for the abnormal keywords, as follows:
    # General form: print the lines between a start time and an end time
    sed -n '/start time/,/end time/p' logname
    # Example: extract a three-minute window, then pipe to grep to filter the relevant keyword
    sed -n '/2018-12-06 00:00:00/,/2018-12-06 00:03:00/p' logname
    # Extract from a given time to the end of the current log
    sed -n '/2018-12-06 08:38:00/,$p' logname | less
    # PS: never open a large log file directly with vim
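For reference, here is a minimal Java sketch of the distinction described above (the class name ExceptionDemo and the file path are illustrative, not from the original article): an unchecked exception such as NullPointerException surfaces at runtime without the compiler forcing you to handle it, while a checked exception such as FileNotFoundException must be caught or declared.

    import java.io.FileNotFoundException;
    import java.io.FileReader;

    public class ExceptionDemo {
        public static void main(String[] args) {
            // Unchecked: extends RuntimeException; the compiler does not force a try/catch.
            // Usually indicates a programming bug (NullPointerException, IllegalArgumentException, ...).
            String s = null;
            try {
                s.length();                       // throws NullPointerException at runtime
            } catch (NullPointerException e) {
                System.err.println("unchecked: " + e);
            }

            // Checked: extends Exception but not RuntimeException; the compiler forces the
            // caller to catch it or declare it with `throws`.
            try {
                new FileReader("/no/such/file");  // throws FileNotFoundException
            } catch (FileNotFoundException e) {
                System.err.println("checked: " + e);
            }
        }
    }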

2. Database related

For Java applications the database is a frequent bottleneck: a single badly written SQL statement can cause slow queries and, in the worst case, hang the entire application.

  • Watch for Could not get JDBC Connection and JDBCException appearing in the log
  • Reference: docs.jboss.org/hibernate/o…

In this case, check the database connections: whether the connection count is too high and whether a deadlock has occurred, and then check the database slow-query log to locate the specific SQL.
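One frequent root cause of "Could not get JDBC Connection" is connections that are borrowed from the pool but never returned, so the pool eventually runs dry. A minimal sketch, assuming a javax.sql.DataSource backed by some connection pool is injected (the DAO class, table, and query are illustrative), showing try-with-resources so the connection is always released:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    public class OrderDao {
        private final DataSource dataSource;   // injected connection pool (e.g. HikariCP, Druid)

        public OrderDao(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        public int countOrders(long userId) throws SQLException {
            String sql = "SELECT COUNT(*) FROM orders WHERE user_id = ?";   // illustrative query
            // try-with-resources closes Connection/Statement/ResultSet even on exceptions,
            // so the pool is not slowly exhausted.
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, userId);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getInt(1) : 0;
                }
            }
        }
    }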

3. JVM related

Java Virtual Machine (JVM) related problems generally fall into the following categories: long GC pauses, OOM, deadlocks, blocked threads, exploding thread counts, and so on. The following tools can be used to locate them.

  • JDK monitoring and troubleshooting tools: jps, jstack, jmap, jstat, jconsole, jinfo, jhat, javap; third-party: btrace, TProfiler
    jps (JVM Process Status Tool): lists Java processes; it can be understood as a subset of ps that only shows Java processes.
    jstat (JVM Statistics Monitoring Tool): a command-line tool for monitoring virtual machine health; it displays class loading, memory, garbage collection, JIT compilation and other runtime data for local or remote JVM processes.
    jinfo (Configuration Info for Java): views and adjusts VM parameters in real time.
    jmap (Memory Map for Java): generates heap dump snapshots and prints heap statistics.
    jhat (JVM Heap Dump Browser): analyzes heap dump files; it starts an HTTP/HTML server so the results can be browsed.
    jstack (Stack Trace for Java): prints a snapshot of the JVM's threads.
    Use --help on each tool to see its usage. Commonly used commands:

    jps -v
    jstat -gc 118694 500 5
    jmap -dump:live,format=b,file=dump.hprof 29170
    jmap -heap 29170
    jmap -histo:live 29170 | more
    jmap -permstat 29170
    jstack -l 29170 | more
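Besides the command-line tools, much of the same information is available programmatically through the standard java.lang.management API, which is handy for exposing JVM health on an internal endpoint. A minimal sketch (class name and output format are illustrative):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    public class JvmStats {
        public static void main(String[] args) {
            // Heap usage, roughly what jmap -heap / jstat -gc report
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);

            // Live thread count, a quick proxy for what jstack shows
            System.out.println("live threads: "
                    + ManagementFactory.getThreadMXBean().getThreadCount());

            // GC count and accumulated collection time per collector
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("gc %s: count=%d time=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
        }
    }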

Reference links:

  • JVM tuning and performance monitoring tools jps, jstack, jmap, jhat, jstat usage (part 1): blog.csdn.net/wisgood/art…
  • Commonly used JVM performance monitoring tools jps, jstat, jinfo, jmap, jhat, jstack: blog.csdn.net/u010316188/…
  • JVM series five: JVM monitoring & tools: www.cnblogs.com/redcreen/ar…
  • JVM series 5: monitoring commands (jvisualvm, jps, jstat, jmap, jhat, jstack, jinfo) and heap dump snapshot analysis: blog.csdn.net/xybelieve19…
  • jstat usage for studying the JVM: www.cnblogs.com/parryyang/p…
  • Using jstat to check the GC status of the JVM (Linux example): www.cnblogs.com/yjd_hycf_sp…
  • Troubleshooting high CPU usage in a Java process: www.cnblogs.com/Dhouse/p/78…
  • stackify.com/java-perfor…
  • stackoverflow.com/questions/9…

3.1 OOM related

When an OOM fault occurs, the service generally crashes and the service log contains OutOfMemoryError entries.

  • An OOM is usually caused by a memory leak, so you need to examine the JVM heap snapshot taken at the time of the OOM. If -XX:+HeapDumpOnOutOfMemoryError is configured, the JVM writes a heap dump file to the path given by -XX:HeapDumpPath when an OOM occurs. The dump file can then be analyzed with MAT (Eclipse Memory Analyzer) to find the cause of the OOM (see the sketch after this list for a quick way to verify these flags).
  • MAT usage is not covered in detail here; there are plenty of tutorials online (e.g. inter12.iteye.com/blog/140749…).
    1. Make sure the server has more free disk space than the size of the memory to be dumped.
    2. To dump a heap snapshot manually, run jmap -dump:format=b,file=file_name pid (note that kill -3 pid prints a thread dump to the console, not a heap dump).
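To check that the dump flags are wired up correctly before a real incident, here is a throwaway sketch (class name and heap size are illustrative, not from the original article) that forces an OOM on purpose:

    import java.util.ArrayList;
    import java.util.List;

    // Run with something like:
    //   java -Xmx64m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp OomDemo
    // then confirm a heap dump (java_pid<pid>.hprof on HotSpot) appears under /tmp for MAT to open.
    public class OomDemo {
        public static void main(String[] args) {
            List<byte[]> leak = new ArrayList<>();
            while (true) {
                leak.add(new byte[1024 * 1024]);   // keep 1MB chunks reachable forever -> OOM
            }
        }
    }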

3.2 Deadlock

A deadlock occurs when two or more threads each hold a resource the other is waiting for. The typical symptom is threads that appear hung; in more serious cases the thread count spikes and the system raises an API alive (health-check) alarm. The best way to find a deadlock is to analyze the thread stacks at that moment.

    jps -v
    jstack -l pid
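To practice reading that jstack output (it ends with a "Found one Java-level deadlock" section when one exists), here is a minimal sketch, with illustrative lock and thread names, that reliably deadlocks two threads by taking the same two locks in opposite order; it also shows the ThreadMXBean API that can detect the same condition from inside the process:

    import java.lang.management.ManagementFactory;
    import java.util.concurrent.TimeUnit;

    public class DeadlockDemo {
        private static final Object LOCK_A = new Object();
        private static final Object LOCK_B = new Object();

        public static void main(String[] args) throws InterruptedException {
            new Thread(() -> lockInOrder(LOCK_A, LOCK_B), "worker-1").start();
            new Thread(() -> lockInOrder(LOCK_B, LOCK_A), "worker-2").start();

            // The two workers deadlock, keeping the JVM alive so jstack -l <pid> can be run.
            // Alternatively, detect the deadlock programmatically:
            TimeUnit.SECONDS.sleep(2);
            long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
            System.out.println("deadlocked threads: " + (ids == null ? 0 : ids.length));
        }

        private static void lockInOrder(Object first, Object second) {
            synchronized (first) {
                sleepQuietly(500);              // give the other thread time to grab its first lock
                synchronized (second) {
                    System.out.println(Thread.currentThread().getName() + " got both locks");
                }
            }
        }

        private static void sleepQuietly(long millis) {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }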

3.3 Thread count skyrocketing

    jstack -l pid | wc -l
    jstack -l pid | grep "BLOCKED" | wc -l
    jstack -l pid | grep "Waiting on condition" | wc -l

Thread block problems are usually caused by waiting on I/O, the network, or a monitor lock, which can lead to request timeouts or a spike in the thread count and ultimately 502s from the system. When this happens, focus on the thread states reported by jstack, such as BLOCKED, "waiting on condition", and "waiting for monitor entry".

If a large number of threads are "waiting for monitor entry": a global lock may be blocking many threads. If thread dumps taken over a short period show the number of threads waiting for monitor entry growing with no sign of decreasing, some threads are probably staying in the critical section so long that more and more new threads can never get in.

If a large number of threads are in "waiting on condition": they may be waiting for a response from a third party, which leaves many threads in the waiting state. Many threads in "waiting on condition" can also be a sign of a network bottleneck, with threads stalled because of network congestion.
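To see these states side by side in a dump, here is a small sketch (class, lock, and thread names are illustrative): one thread holds a monitor indefinitely so the others pile up as BLOCKED / "waiting for monitor entry", while a separate group parks on an empty queue and shows up as WAITING / "waiting on condition". Start it, then run jps -v and jstack -l <pid>:

    import java.util.concurrent.LinkedBlockingQueue;

    public class ThreadStateDemo {
        private static final Object MONITOR = new Object();
        private static final LinkedBlockingQueue<Object> QUEUE = new LinkedBlockingQueue<>();

        public static void main(String[] args) {
            // One thread grabs the monitor and never lets go...
            new Thread(ThreadStateDemo::holdMonitor, "holder").start();
            // ...so these appear in jstack as BLOCKED / "waiting for monitor entry"
            for (int i = 0; i < 3; i++) {
                new Thread(ThreadStateDemo::holdMonitor, "blocked-" + i).start();
            }
            // These block on an empty queue and appear as WAITING / "waiting on condition"
            for (int i = 0; i < 3; i++) {
                new Thread(ThreadStateDemo::waitForWork, "waiting-" + i).start();
            }
        }

        private static void holdMonitor() {
            synchronized (MONITOR) {
                sleepForever();
            }
        }

        private static void waitForWork() {
            try {
                QUEUE.take();                  // parks until an element arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        private static void sleepForever() {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }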

3.4 GC takes too long

  • Reference: www.oracle.com/technetwork…

4. Faults on the server itself

  • Check whether CPU, Memory, I/O, or Network is faulty
  • Common commands are top/htop, free, iostat/iotop, and netstat/ss
Pay particular attention to network connections. Count connections by TCP state:

    netstat -an | awk '/^tcp/ {++S[$NF]} END {for (a in S) print a, S[a]}'

Find the IPs with the most connections to a given service port:

    netstat -na | grep 172.16.70.60:1111 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head -10
  • Reference: www.cnblogs.com/mfmdaoyou/p…