Preface

As programmers, besides being able to deliver features, we also need to be able to troubleshoot. The problems meant here are not bugs in business logic but faults that cause a service to be degraded or circuit-broken. Even if a restart makes the problem go away, we still need to locate the root cause and must not leave the problem in the service.

Service faults fall into four categories: CPU, memory, disk, and network.

What we can do about problems is divided into three stages: before, during and after.

Before: since we know problems will occur, we can set up monitoring, warnings and alerts before going live, estimate the load the current service can withstand, and apply rate limiting accordingly.

Prometheus can be used for monitoring, warnings and alerts.

During: monitoring detects anomalies. Once a metric reaches the configured threshold, the service can be degraded to avoid further deterioration, and the alert is used to locate the problem.

After: once the outage has already happened, all that is left is to circuit-break and locate the problem.
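As a rough illustration of limiting load and degrading once a threshold is reached, the following minimal sketch caps the number of concurrent calls and returns a fallback value when the cap is hit. The class name ConcurrencyLimiter and its call method are hypothetical and not part of any framework mentioned in this article.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical helper: caps concurrent calls and degrades to a fallback when saturated. */
final class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the task if a permit is free; otherwise returns the fallback immediately. */
    <T> T call(Supplier<T> task, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback; // degraded path: shed load instead of queueing
        }
        try {
            return task.get();
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        ConcurrencyLimiter limiter = new ConcurrencyLimiter(1);
        // The outer call holds the only permit, so the nested call is degraded.
        String result = limiter.call(() -> limiter.call(() -> "ok", "degraded"), "outer-degraded");
        System.out.println(result); // prints "degraded"
    }
}
```

In a real service the cap would come from the load estimate made before going live, and the fallback would be whatever degraded response the business can tolerate.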

Let's talk about how to locate and solve the four types of problems.

CPU problems

When CPU usage is too high, the system can freeze or even restart. Leaving aside an under-provisioned CPU, the causes of high CPU usage, all of which can be prevented and resolved, are as follows:

  • The program is poorly designed: for example, it processes too many tasks at the same time so the CPU keeps switching between them, the thread pool is misconfigured, or threads deadlock.
  • Rate limiting is not done properly, so more requests come in than the interface can process.
  • Viruses, trojans, etc.

CPU usage can be checked with the top command. If the process using the CPU is Java (we mainly analyze Java here), we analyze further. Causes of high CPU usage include: too many incoming requests keeping the CPU busy (many tasks, frequent context switches), too many asynchronous processing threads (many tasks, frequent context switches), deadlocks (threads stay occupied), and tasks that occupy the CPU for a long time.
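To make the thread pool point concrete, here is a minimal sketch of a bounded pool. It assumes CPU-bound tasks, for which a thread count close to the number of cores is a common rule of thumb; the queue capacity of 100 and the class name BoundedPool are arbitrary choices for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical example of a pool with a fixed thread count and a bounded queue. */
final class BoundedPool {

    static ThreadPoolExecutor create(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),      // bounded: tasks cannot pile up without limit
                new ThreadPoolExecutor.CallerRunsPolicy());   // when full, the submitter runs the task itself
    }

    /** Submits taskCount trivial tasks and returns how many completed. */
    static int runAll(int taskCount) {
        ThreadPoolExecutor pool = create(Runtime.getRuntime().availableProcessors(), 100);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(completed::incrementAndGet);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runAll(1000)); // prints 1000
    }
}
```

CallerRunsPolicy slows the submitting thread down when the queue is full, which acts as simple back-pressure instead of creating more threads for the CPU to switch between.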

Deadlock

Deadlocks can be analyzed with jstack, which is covered in the article "JVM Monitoring Tools - Command Line Tools". You can also inspect them with VisualVM.

VisualVM shows the same information as jstack, but it also provides a visual page that shows how long each thread has been running.

Let’s click on thread dump
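The same check can also be made from inside the JVM. The sketch below uses the standard ThreadMXBean API; the class name DeadlockCheck and the two demo threads are made up for illustration and deliberately deadlock on two monitors.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.LockSupport;

final class DeadlockCheck {

    /** Returns one line per deadlocked thread, or an empty list when there is no deadlock. */
    static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when nothing is deadlocked
        List<String> result = new ArrayList<>();
        if (ids == null) {
            return result;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) { // null if the thread has ended in the meantime
                result.add(info.getThreadName() + " is waiting for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return result;
    }

    /** Demo only: starts two daemon threads that take the same two locks in opposite order. */
    static void startDeadlockedPair() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startLocker("demo-1", a, b, bothHoldFirstLock);
        startLocker("demo-2", b, a, bothHoldFirstLock);
    }

    private static void startLocker(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // wait until the other thread holds its first lock too
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) { // blocks forever: the other thread owns this lock
                    System.out.println(name + " acquired both locks");
                }
            }
        }, name);
        t.setDaemon(true); // let the JVM exit despite the deadlock
        t.start();
    }

    public static void main(String[] args) {
        startDeadlockedPair();
        List<String> found = describeDeadlocks();
        for (int i = 0; i < 50 && found.size() < 2; i++) {
            LockSupport.parkNanos(100_000_000L); // poll every 100 ms until both threads are blocked
            found = describeDeadlocks();
        }
        found.forEach(System.out::println);
    }
}
```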

CPU switches

With the PID obtained from top, you can use pidstat -w to view the context switches of that process.

pidstat -w -p 2831

  • PID: process ID
  • cswch/s: number of voluntary context switches the task makes per second
  • nvcswch/s: number of non-voluntary context switches the task makes per second
  • Command: command name of the task

Look at the nvcswch/s column to get a rough sense of the number of switches per second. If it is large, there are most likely too many tasks competing for the CPU. If rate limiting is already in place, analyze whether interface response times have grown and caused tasks to pile up. If they have not, check whether the asynchronous thread pool is misconfigured and runs too many concurrent threads.
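On Linux, the kernel also exposes cumulative context-switch counters in /proc/<pid>/status, in the fields voluntary_ctxt_switches and nonvoluntary_ctxt_switches. As a minimal sketch, assuming a Linux host with procfs, they can be read from Java like this; the class name ContextSwitches is hypothetical. These are totals, not per-second rates, so two samples taken some time apart are needed to estimate a rate.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

final class ContextSwitches {

    /** Extracts a numeric field such as "voluntary_ctxt_switches" from status lines; -1 if absent. */
    static long field(List<String> statusLines, String name) {
        for (String line : statusLines) {
            if (line.startsWith(name + ":")) {
                return Long.parseLong(line.substring(name.length() + 1).trim());
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self"; // "self" means the current process
        Path status = Paths.get("/proc", pid, "status");
        if (!Files.isReadable(status)) {
            System.out.println("procfs not available for pid " + pid);
            return;
        }
        List<String> lines = Files.readAllLines(status);
        System.out.println("voluntary: " + field(lines, "voluntary_ctxt_switches"));
        System.out.println("non-voluntary: " + field(lines, "nonvoluntary_ctxt_switches"));
    }
}
```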

Some tasks take up a long time

First, display the thread list:

ps -mp <pid> -o THREAD,tid,time

In this example, thread 28802 has consumed almost 2 hours of CPU time.

Second, convert that thread ID to hexadecimal, because jstack prints thread IDs in hexadecimal:

printf "%x\n" <tid>

Finally, print the thread stack, searching for the hexadecimal thread ID obtained in the previous step:

jstack <pid> | grep -A 30 <hex-tid>
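The conversion and the grep step can be sketched in Java as well. This assumes the thread dump format in which each thread header contains nid=0x followed by the hexadecimal native thread ID, as described above; newer JDKs may print the ID differently. The class name JstackFilter and the sample dump lines are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class JstackFilter {

    /** Formats a decimal native thread ID as a hexadecimal nid token, e.g. 28802 -> "nid=0x7082". */
    static String nid(long tid) {
        return "nid=0x" + Long.toHexString(tid);
    }

    /** Returns the first line containing the nid plus up to `after` following lines (like grep -A). */
    static List<String> grepAfter(List<String> dump, long tid, int after) {
        String needle = nid(tid) + " "; // trailing space avoids matching a longer ID
        for (int i = 0; i < dump.size(); i++) {
            if ((dump.get(i) + " ").contains(needle)) {
                int end = Math.min(dump.size(), i + 1 + after);
                return new ArrayList<>(dump.subList(i, end));
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        // Made-up sample standing in for real jstack output.
        List<String> dump = Arrays.asList(
                "\"worker-1\" #12 prio=5 tid=0x01 nid=0x7082 runnable",
                "   java.lang.Thread.State: RUNNABLE",
                "\"worker-2\" #13 prio=5 tid=0x02 nid=0x70821 waiting on condition");
        System.out.println(nid(28802)); // prints nid=0x7082
        grepAfter(dump, 28802, 1).forEach(System.out::println);
    }
}
```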

Memory

Memory problems come in two types: memory leaks and memory overflows.

Memory leak

A memory leak occurs when, because of a programming error, an object remains reachable from a GC root for a long time and therefore cannot be released.

Heap leaks

Example:

VM arguments:

-Xms10m
-Xmx10m
-XX:-DoEscapeAnalysis
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/Users/xxx

Code:

package com.study.jvm.memory;

import lombok.SneakyThrows;

import java.util.ArrayList;
import java.util.List;

public class OOMTest {

    // Each instance carries eight reference fields to take up heap space
    static class OOMObject {
        private Long l1;
        private Long l2;
        private Long l3;
        private Long l4;
        private Long l5;
        private Long l6;
        private Long l7;
        private Long l8;
    }

    @SneakyThrows
    public static void main(String[] args) {
        // The list keeps every object reachable, so the GC can never reclaim them
        List<OOMObject> oomObjectList = new ArrayList<>();
        for (int i = 0; i < 163840; ++i) {
            oomObjectList.add(new OOMObject());
            if (i % 1000 == 0) {
                System.out.println(i);
                Thread.sleep(300);
            }
        }
    }
}

Output:

...
119000
120000
121000
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /Users/naigaipaopao/java_pid11879.hprof ...
Heap dump file created [15989379 bytes in 0.112 secs]
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at com.study.jvm.memory.OOMTest.main(OOMTest.java:25)

As you can see, java.lang.OutOfMemoryError: Java heap space means the heap space is insufficient. Let's look at VisualVM:

Both Eden and the old generation are full, hence the OOM.
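The per-generation usage that VisualVM displays can also be read programmatically through the standard MemoryPoolMXBean API. The following is a minimal sketch; the class name HeapPools is hypothetical, and the pool names it prints (for example an Eden or old generation pool) depend on the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

final class HeapPools {

    /** Maps each heap memory pool name to its currently used bytes. */
    static Map<String, Long> usedBytesByPool() {
        Map<String, Long> used = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
                if (usage != null) {
                    used.put(pool.getName(), usage.getUsed());
                }
            }
        }
        return used;
    }

    public static void main(String[] args) {
        usedBytesByPool().forEach((name, bytes) ->
                System.out.println(name + ": " + bytes + " bytes used"));
    }
}
```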

Let’s have a look at the generated dump file:

Conclusion

Actually, it is hard to say whether this example is a leak or an overflow; it depends on the scenario. The point to focus on is java.lang.OutOfMemoryError: Java heap space.

Memory overflow

A memory overflow means there is not enough memory left to allocate. Here are some common scenarios.

Metaspace overflow

Metaspace holds class metadata: class structures, constant pools, method bytecode, and classes generated at run time (by CGLIB, for example). Since JDK 8, static variables and interned strings live on the heap, and JIT-compiled code lives in the code cache rather than in the metaspace. (The metaspace is covered in "JVM Memory Architecture".)

Class unloading

To simulate filling up the metaspace, we need to know when a class is unloaded from memory. Java's default class loaders, BootstrapClassLoader, ExtClassLoader and AppClassLoader, are always referenced by the Java virtual machine. These class loaders, in turn, always reference the Class objects of the classes they load. So classes loaded by the default class loaders are never unloaded.
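Whether classes are actually being unloaded can be observed with the standard ClassLoadingMXBean. The sketch below only reads its counters; the class name ClassLoadingStats is hypothetical.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

final class ClassLoadingStats {

    /** Summarizes how many classes are loaded now, were loaded in total, and were unloaded. */
    static String summary() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return "loaded=" + bean.getLoadedClassCount()
                + " totalLoaded=" + bean.getTotalLoadedClassCount()
                + " unloaded=" + bean.getUnloadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

A loaded count that keeps growing while the unloaded count stays flat suggests that generated classes are accumulating in the metaspace.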

Example

We can use the CGLIB example from "Static Proxy, Dynamic Proxy, CGLIB Proxy" to fill the metaspace. We need to change main so that it keeps generating bytecode dynamically, and limit the size of the metaspace, as shown below:

VM arguments:

-XX:MetaspaceSize=10M
-XX:MaxMetaspaceSize=10M
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/Users/xxx

Example:

public class Student implements Person {
    private String name;

    public Student(String name) {
        this.name = name;
    }

    public void giveMoney() {
        // System.out.println(name + "$50 ");
    }
}

import org.springframework.cglib.proxy.Enhancer;
import org.springframework.cglib.proxy.MethodInterceptor;
import org.springframework.cglib.proxy.MethodProxy;

import java.lang.reflect.Method;

public class StudentCglibProxy implements MethodInterceptor {
    private Class targetClass;

    public StudentCglibProxy(Class targetClass) {
        this.targetClass = targetClass;
    }

    public Object getProxyInstance(String name) {
        Enhancer en = new Enhancer();
        // Set the parent class
        en.setSuperclass(targetClass);
        // Always generate a new class instead of reusing one from the cache
        en.setUseCache(false);
        // Set the callback
        en.setCallback(this);
        // Create the proxy instance through the one-argument constructor
        return en.create(new Class[]{String.class}, new Object[]{name});
    }

    @Override
    public Object intercept(Object obj, Method method, Object[] args, MethodProxy methodProxy) throws Throwable {
        // System.out.println("proxy execution " + method.getName() + " method");
        Object returnValue = methodProxy.invokeSuper(obj, args);
        return returnValue;
    }
}

import org.springframework.cglib.core.DebuggingClassWriter;

public class CglibProxyTest {
    public static void main(String[] args) {
        System.setProperty(DebuggingClassWriter.DEBUG_LOCATION_PROPERTY, "./");
        // Every iteration generates a new proxy class, which fills the metaspace
        while (true) {
            // "student" is a placeholder name; any string works here
            Student proxyInstance = (Student) new StudentCglibProxy(Student.class).getProxyInstance("student");
        }
    }
}

Output:

CGLIB debugging enabled, writing to './'
java.lang.OutOfMemoryError: Metaspace
Dumping heap to /Users/naigaipaopao/java_pid10631.hprof ...
Heap dump file created [19588747 bytes in 0.102 secs]
Exception in thread "main" java.lang.OutOfMemoryError: Metaspace
    at org.springframework.cglib.core.ReflectUtils.defineClass(ReflectUtils.java:530)
    at org.springframework.cglib.core.AbstractClassGenerator.generate(AbstractClassGenerator.java:363)
    at org.springframework.cglib.proxy.Enhancer.generate(Enhancer.java:582)
    at org.springframework.cglib.core.AbstractClassGenerator$ClassLoaderData.get(AbstractClassGenerator.java:131)
    at org.springframework.cglib.core.AbstractClassGenerator.create(AbstractClassGenerator.java:319)
    at org.springframework.cglib.proxy.Enhancer.createHelper(Enhancer.java:569)
    at org.springframework.cglib.proxy.Enhancer.create(Enhancer.java:403)
    at com.study.proxy.StudentCglibProxy.getProxyInstance(StudentCglibProxy.java:27)
    at com.study.proxy.CglibProxyTest.main(CglibProxyTest.java:31)

You can see that the OOM was caused by the metaspace.

VisualVM:

You can see that the metaspace is already full.

At a glance, you can see that the OOM occurred in the main thread.

CGLIB is the cause of the OOM.

Conclusion

Even when CGLIB or JDK proxies are used, java.lang.OutOfMemoryError: Metaspace generally does not occur, because only some classes are proxied. It is enough to observe the metaspace metric while automated regression tests cover all the functions.

Stack overflow

Stack overflows typically occur with recursion. Let's try:

import lombok.SneakyThrows;

public class StackOverflowErrorTest {

    @SneakyThrows
    private static void test() {
        long l = 1L;
        // Unbounded recursion: each call adds a frame until the stack is exhausted
        test();
    }

    public static void main(String[] args) {
        test();
    }
}
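To see roughly how deep the recursion gets before the stack is exhausted, the error can be caught and the depth counted. This is a minimal sketch with a hypothetical class name, StackDepthProbe; the depth reached varies with the -Xss setting, the frame size, and the JVM.

```java
final class StackDepthProbe {
    private static int depth;

    private static void recurse() {
        depth++;
        recurse(); // no base case on purpose
    }

    /** Recurses until the stack is exhausted and returns the depth reached. */
    static int measure() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack has unwound back to this frame, so it is safe to continue.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Stack overflowed at depth " + measure());
    }
}
```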

Disk

View the disk space usage

You can run the df -h command to view disk usage statistics (size and usage). You can run the du -h --max-depth=1 command to view the space used by each first-level entry in the current directory and determine which one occupies too much space.
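A similar check can be done from Java with java.io.File, for example to raise an alert before a partition fills up. This is a minimal sketch with a hypothetical class name, DiskUsage; the fraction it computes is only roughly comparable to the Use% column of df, which accounts for reserved blocks differently.

```java
import java.io.File;

final class DiskUsage {

    /** Returns the used fraction (0.0 to 1.0) of the partition that contains the given path. */
    static double usedFraction(File path) {
        long total = path.getTotalSpace();
        if (total == 0) {
            return 0.0; // the path does not name a partition
        }
        return (double) (total - path.getUsableSpace()) / total;
    }

    public static void main(String[] args) {
        File path = new File(args.length > 0 ? args[0] : ".");
        System.out.printf("%s is %.1f%% full%n", path.getAbsolutePath(), usedFraction(path) * 100);
    }
}
```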

View process I/O usage

pidstat -d

  • PID: process ID
  • kB_rd/s: kilobytes the task reads from disk per second
  • kB_wr/s: kilobytes the task writes to disk per second
  • kB_ccwr/s: kilobytes whose write to disk was cancelled by the task. This can occur when the task truncates dirty pagecache.
  • Command: command name of the task

Network

netstat

The netstat command is used to view the network status of the entire Linux system.

  • Ports used by a process: netstat -nap | grep <pid>

  • Network statistics: netstat -s

# netstat -s
Ip:
    184695 total packets received
    0 forwarded
    0 incoming packets discarded
    184687 incoming packets delivered
    143917 requests sent out
    32 outgoing packets dropped
    30 dropped because of missing route
Icmp:
    676 ICMP messages received
    5 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 44
        echo requests: 287
        echo replies: 345
    304 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 17
        echo replies: 287
Tcp:
    473 active connections openings
    28 passive connection openings
    4 failed connection attempts
    11 connection resets received
    1 connections established
    178253 segments received
    137936 segments send out
    29 segments retransmited
    0 bad segments received.
    336 resets sent
Udp:
    5714 packets received
    8 packets to unknown port received.
    0 packet receive errors
    5419 packets sent
TcpExt:
    1 resets received for embryonic SYN_RECV sockets
    ArpFilter: 0
    12 TCP sockets finished time wait in fast timer
    572 delayed acks sent
    3 delayed acks further delayed because of locked socket
    13766 packets directly queued to recvmsg prequeue.
    1101482 packets directly received from backlog
    19599861 packets directly received from prequeue
    46860 packets header predicted
    14541 packets header predicted and directly queued to user
    TCPPureAcks: 12259
    TCPHPAcks: 9119
    TCPRenoRecovery: 0
    TCPSackRecovery: 0
    TCPSACKReneging: 0
    TCPFACKReorder: 0
    TCPSACKReorder: 0
    TCPRenoReorder: 0
    TCPTSReorder: 0
    TCPFullUndo: 0
    TCPPartialUndo: 0
    TCPDSACKUndo: 0
    TCPLossUndo: 0
    TCPLoss: 0
    TCPLostRetransmit: 0
    TCPRenoFailures: 0
    TCPSackFailures: 0
    TCPLossFailures: 0
    TCPFastRetrans: 0
    TCPForwardRetrans: 0
    TCPSlowStartRetrans: 0
    TCPTimeouts: 29
    TCPRenoRecoveryFail: 0
    TCPSackRecoveryFail: 0
    TCPSchedulerFailed: 0
    TCPRcvCollapsed: 0
    TCPDSACKOldSent: 0
    TCPDSACKOfoSent: 0
    TCPDSACKRecv: 0
    TCPDSACKOfoRecv: 0
    TCPAbortOnSyn: 0
    TCPAbortOnData: 1
    TCPAbortOnClose: 0
    TCPAbortOnMemory: 0
    TCPAbortOnTimeout: 3
    TCPAbortOnLinger: 0
    TCPAbortFailed: 3
    TCPMemoryPressures: 0

ping

The Linux ping command checks whether a host is reachable over the network.

curl

The Linux curl command is a file transfer tool that works from the command line using URL syntax.

I generally use it to test interfaces. For example, a GET request:

curl https://www.coonote.com/linux/linux-cmd-curl.html

And a POST request:

curl -H "Content-Type: application/json" -X POST -d '{"user": "admin", "passwd": "12345678"}' http://127.0.0.1:8000/login
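When the same interface test needs to run from Java code, the POST request can be reproduced with the JDK's built-in HTTP client (Java 11 or later). This is a minimal sketch with a hypothetical class name, LoginRequest; it assumes a service is listening on 127.0.0.1:8000, otherwise main fails with a connection error.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class LoginRequest {

    /** Builds the POST request; nothing is sent until it is passed to a client. */
    static HttpRequest build() {
        return HttpRequest.newBuilder(URI.create("http://127.0.0.1:8000/login"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"user\": \"admin\", \"passwd\": \"12345678\"}"))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(build(), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```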
