Out of memory (OOM) means that an application holds memory that cannot be reclaimed, or simply uses too much memory, so that the program needs more memory than the maximum available. The program can no longer run and the system reports that it is out of memory. Sometimes the software shuts down on its own, and it only runs normally again after the computer or the software is restarted and the memory is released. System configuration, data flow, user code, and other factors can all lead to out-of-memory errors, and simply rerunning the task does not necessarily avoid them.

A JVM OOM error can occur in any of the following situations: Java heap overflow, JVM stack and native method stack overflow, method area and runtime constant pool overflow, and native direct memory overflow. Each of these situations has different causes.
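Of the situations listed above, stack overflow is the easiest to reproduce safely. The following is a minimal sketch, not taken from any of the articles below: the class name `StackDepthDemo` and its methods are made up for illustration. It recurses without a base case until the JVM throws `StackOverflowError`, then reports how many frames fit on the thread's stack.

```java
public class StackDepthDemo {
    // Counts how many nested calls were made before the stack ran out.
    private static int depth = 0;

    // Unbounded recursion: every call adds a frame to the thread's JVM stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns the recursion depth reached when StackOverflowError was thrown.
    public static int measure() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Expected: the stack has unwound back to this frame.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Stack overflowed after " + measure() + " frames");
    }
}
```

The depth reached depends on the thread stack size (the `-Xss` option) and on the size of each frame, so the number varies between runs and machines.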

In real business scenarios, the environment is often more complex. Today, heapui walks you through several practical OOM troubleshooting cases. These are real cases recorded by their authors. They are a reminder of pitfalls to avoid and a chance to review the relevant knowledge along the way.

1. Troubleshooting and resolving 100% CPU usage and an application OOM in production

Author: teddy boy cloud https://heapdump.cn/article/1…

Summary: After receiving an application exception alarm, the author logged in to the faulty server to investigate. The service log showed that the service had hit an OOM. The author then used the top command to check the resource usage of each process on the system and found one process with CPU usage of 300%. Next, the author queried the CPU usage of all threads in that process and saved the stack data. With the GC information, thread stacks, heap snapshot, and other data of the faulty service collected in these steps, the author ran an analysis with XElephant, a tool provided by the HeapDump community, and found that InMemoryReporterMetrics had caused the OOM. Further investigation showed that the service depended on an old version of Zipkin; upgrading it fixed the problem.
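When per-thread CPU usage is read from `top -H -p <pid>`, thread IDs are shown in decimal, while thread dumps from older JDKs such as JDK 8 print the native thread ID in hexadecimal as `nid=0x...`. The summary does not say how the author matched the two, so the helper below is only a sketch of a common approach. The class and method names are hypothetical, and newer JDKs may format this field differently.

```java
public class ThreadIdHex {
    // Formats a decimal OS thread ID in the hexadecimal nid style
    // used by JDK 8 era thread dumps (assumption: your JDK uses this style).
    public static String toNid(long osThreadId) {
        return "nid=0x" + Long.toHexString(osThreadId);
    }

    public static void main(String[] args) {
        // 12345 is a made-up thread ID, standing in for one read from top.
        System.out.println(toNid(12345));
    }
}
```

The resulting string can then be searched for in the saved stack data to find the stack of the busy thread.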

Highlights: Although this article does not tackle an obscure or especially difficult problem, the troubleshooting process is clear and complete, and it recommends useful tools, which makes it a good read for beginners.

2. Investigating an OOM in a containerized Spring Boot application

Author: man dream https://heapdump.cn/article/1…

Overview: The author was told that a containerized Java program ran into an OOM every time it ran. The author first checked GC activity with jstat: GC looked normal, but ByteBuffer objects accounted for the most memory (anomaly 1). Next, the thread snapshot from jstack showed that far too many Kafka producers had been created (anomaly 2). Finally, the author wrote a demo program to verify the hypothesis and confirmed that the problem was caused by business code creating Producer objects in a loop.
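The anti-pattern described above can be sketched without a Kafka dependency. In the example below, `FakeProducer` is a hypothetical stand-in for a real producer, not the Kafka API: it only owns a buffer, which is enough to show why one instance per message multiplies memory use while a single shared instance does not. The list in `leaky` imitates producers that are never closed and therefore stay reachable.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ProducerReuseSketch {
    // Hypothetical stand-in for a producer: each instance owns its own buffer.
    static final class FakeProducer implements AutoCloseable {
        static int created = 0;
        private ByteBuffer buffer = ByteBuffer.allocate(1024);

        FakeProducer() {
            created++;
        }

        void send(String message) {
            buffer.clear();
            buffer.put(message.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public void close() {
            buffer = null; // release the buffer
        }
    }

    // Anti-pattern: a new producer per message, none of them closed.
    static int leaky(int messages) {
        FakeProducer.created = 0;
        List<FakeProducer> neverClosed = new ArrayList<>();
        for (int i = 0; i < messages; i++) {
            FakeProducer producer = new FakeProducer();
            producer.send("m" + i);
            neverClosed.add(producer); // still reachable, so its buffer is retained
        }
        return FakeProducer.created;
    }

    // Fix: create one producer, reuse it for every message, close it once.
    static int reused(int messages) {
        FakeProducer.created = 0;
        try (FakeProducer producer = new FakeProducer()) {
            for (int i = 0; i < messages; i++) {
                producer.send("m" + i);
            }
        }
        return FakeProducer.created;
    }

    public static void main(String[] args) {
        System.out.println("leaky created:  " + leaky(100));
        System.out.println("reused created: " + reused(100));
    }
}
```

With a real producer the cost per instance is much higher than in this sketch, since each one typically brings its own buffers and background thread.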

Highlights: A clear troubleshooting process, skilled use of tools, and fast, accurate verification.


3. Troubleshooting and analyzing an Nginx OOM during a load test with one million long-lived connections

Author: dig a hole master zhang https://heapdump.cn/article/4…

Overview: During a load test with one million long-lived connections, the author found that four Nginx servers (32 cores and 128 GB of memory each) frequently ran into OOM. The author first checked the state of the network connections between Nginx and the clients, suspecting that the JMeter clients had limited processing capacity and that many messages were piling up in Nginx. A memory dump of Nginx confirmed that the memory growth came from a large number of buffered messages. The author then checked the Nginx configuration and found that proxy_buffers was set to a very large value. Next, the author simulated how a mismatch between upstream and downstream transmission speeds affects Nginx memory usage. Finally, after setting proxy_buffering to off and reducing proxy_buffer_size, Nginx's memory usage became stable.
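To see why large proxy buffers matter at this scale, a back-of-the-envelope estimate helps. The sketch below assumes that each buffered connection can hold up to one `proxy_buffer_size` buffer plus the `number` × `size` buffers configured by `proxy_buffers`, and treats that as a rough upper bound. The buffer values in `main` are hypothetical, since the overview does not give the actual configuration.

```java
public class ProxyBufferEstimate {
    // Rough upper bound, in bytes, of memory held in proxy buffers
    // when every connection fills all of its buffers.
    public static long worstCaseBytes(long connections,
                                      long proxyBufferSize,
                                      long buffersNumber,
                                      long buffersSize) {
        long perConnection = proxyBufferSize + buffersNumber * buffersSize;
        return connections * perConnection;
    }

    public static void main(String[] args) {
        long kib = 1024L;
        // Hypothetical settings: proxy_buffer_size 64k; proxy_buffers 4 64k;
        long bytes = worstCaseBytes(1_000_000L, 64 * kib, 4, 64 * kib);
        System.out.println("Worst case: " + bytes / (kib * kib * kib) + " GiB");
    }
}
```

Even with these modest per-connection values, the bound is far above 128 GB once a large share of one million connections have slow readers, which is consistent with the decision to turn proxy_buffering off and reduce proxy_buffer_size.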

Highlights: The author follows a clear line of reasoning, is highly skilled with tools and parameter tuning, and has a deep understanding of the underlying principles and source code. Both the experience and the attitude are worth learning from.