Crime scene:

One morning in the year 20-something AD, I was leisurely eating breakfast at the office when my phone suddenly blared an application alarm!! I opened it to find the words "old-generation memory utilization above 95%" staring back at me. I dropped the half-eaten pork bun and rushed to investigate!

The figure below is the evidence pulled from application monitoring after the problem was resolved. It clearly shows that, over a stretch of time, the application's old generation was running frequent full GCs while memory usage hovered near 100%.

The process:

Well, as a seasoned Java engineer, I naturally kept a straight face in front of this little OOM thief.

My first reaction was to ask my ops colleagues, only to find that the monitoring platform had no built-in memory analysis tool, so I had to dump and analyze the heap myself: jmap -dump:format=b,file=<filename> <pid>. After a long wait of ~10 minutes, the dump file was finally in hand.
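
(An aside, not part of the original hunt: if you would rather capture the same kind of dump from inside the JVM, say from an admin endpoint, instead of shelling out to jmap, the JDK exposes the same capability through HotSpotDiagnosticMXBean. A minimal sketch; the output path is just an example.)

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    public static void main(String[] args) throws IOException {
        // Roughly equivalent to `jmap -dump:live,format=b,file=/tmp/app.hprof <pid>`,
        // but executed from inside the running JVM.
        HotSpotDiagnosticMXBean diagnostic =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

        // live = true dumps only reachable objects, which keeps the file smaller;
        // the path below is an example, not the one used in this incident.
        diagnostic.dumpHeap("/tmp/app.hprof", true);
    }
}
```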

I couldn't wait to pull out my two scalpels (Java's own VisualVM and MAT, the well-known Eclipse Memory Analyzer) and perform an autopsy on the dump file. Since it was my first time wielding these blades, I fumbled with the handle for a while, not quite sure where to make the first cut; how embarrassing! Eventually I got MAT open and loaded the dump. (On macOS, MAT sometimes fails to start; I won't go into that here, please search/Google for the fix yourself.)

The first incision revealed the layout of the bloody innards, as shown below:

ch.qos.logback.classic.AsyncAppender

Drilling down into the accumulated objects in the Dominator Tree lists the memory breakdown of this AsyncAppender instance. As expected, the Logback framework uses a queue to buffer LoggingEvent objects: every log call gets wrapped into a LoggingEvent and queued. Points 1 and 2 in the screenshot above show that this log buffer queue is eating a huge amount of memory, with its LoggingEvent objects alone accounting for about 23% of the heap. Oh my God. The inspector window at point 3 shows an object's contents, so I pulled out the msg of the two largest LoggingEvent objects to see what exactly was being logged.
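
To make the mechanism concrete: Logback's AsyncAppender parks each ILoggingEvent in an in-memory blocking queue, and a background worker thread drains it to the real appender, so an oversized message stays on the heap until it is flushed. Below is a minimal, self-contained sketch of that behavior; the queue size, pattern, and payload size are made up for illustration, not taken from the incident.

```java
import ch.qos.logback.classic.AsyncAppender;
import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.LoggerContext;
import ch.qos.logback.classic.encoder.PatternLayoutEncoder;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.ConsoleAppender;
import org.slf4j.LoggerFactory;

public class AsyncLogOomDemo {
    public static void main(String[] args) {
        LoggerContext ctx = (LoggerContext) LoggerFactory.getILoggerFactory();

        PatternLayoutEncoder encoder = new PatternLayoutEncoder();
        encoder.setContext(ctx);
        encoder.setPattern("%d %-5level %logger - %msg%n");
        encoder.start();

        ConsoleAppender<ILoggingEvent> console = new ConsoleAppender<>();
        console.setContext(ctx);
        console.setEncoder(encoder);
        console.start();

        // AsyncAppender buffers every ILoggingEvent in an in-memory blocking queue;
        // a background worker thread drains the queue into the wrapped appender.
        AsyncAppender async = new AsyncAppender();
        async.setContext(ctx);
        async.setQueueSize(512); // Logback's default queue size is 256
        async.addAppender(console);
        async.start();

        Logger root = ctx.getLogger(Logger.ROOT_LOGGER_NAME);
        root.addAppender(async);

        // A single log call with a huge message keeps the whole string reachable
        // from the queued LoggingEvent until the worker flushes it.
        String hugePayload = "x".repeat(10_000_000); // ~10 MB string, purely illustrative
        root.error("dubbo result too large: {}", hugePayload);

        ctx.stop(); // flush the async queue and shut the worker down
    }
}
```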

The Dubbo provider was reporting that the payload length limit had been exceeded when returning results.

The case:

In the end, the OOM could not escape my discerning eye. The verdict: a flawed SQL design caused a batch query to return a huge volume of data, which exceeded Dubbo's transmission limit; the resulting exception was logged, and the log message carried the entire oversized query result into the asynchronous log queue. Memory was exhausted before the events could be flushed to disk, triggering the OOM alarm. As a follow-up fix, the SQL was rewritten to constrain the batch query conditions, and a maximum log length was added to the Logback configuration to keep log printing from causing an OOM again.
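
The write-up doesn't show the exact Logback change, but a common way to cap logged message length is the maximum-width format modifier on the %msg conversion word (e.g. %.-2048msg keeps only the first 2048 characters). A hedged sketch of that idea, built programmatically; the helper name, pattern, and the 2048 limit are assumptions, not the author's actual configuration.

```java
import ch.qos.logback.classic.LoggerContext;
import ch.qos.logback.classic.encoder.PatternLayoutEncoder;
import org.slf4j.LoggerFactory;

public class TruncatingPattern {
    /**
     * Builds an encoder whose %msg output is capped at maxChars characters.
     * The same pattern string can go into the <pattern> element of logback.xml.
     */
    public static PatternLayoutEncoder truncatingEncoder(int maxChars) {
        LoggerContext ctx = (LoggerContext) LoggerFactory.getILoggerFactory();
        PatternLayoutEncoder encoder = new PatternLayoutEncoder();
        encoder.setContext(ctx);
        // ".-N" is Logback's format modifier: keep at most N characters of the
        // message, truncating from the end rather than the beginning.
        encoder.setPattern("%d{HH:mm:ss.SSS} %-5level %logger{36} - %.-" + maxChars + "msg%n");
        encoder.start();
        return encoder;
    }
}
```

For example, truncatingEncoder(2048) keeps only the first 2048 characters of each message in the output. One caveat: the modifier truncates what gets written, not what gets queued; a LoggingEvent sitting in the async buffer still references the original string, so fixing the query (as was done here) or truncating at the call site is what actually protects the heap.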