The cause

One afternoon last week, just as I was getting ready to leave work early, I received an alert: the XX service on machine XX was unavailable. Following the operations runbook, the first step was to restore the service. Because the service is highly available, the business was not affected, but we still needed to find the cause so that it would not happen again. After restarting the process, I took some time to work out why it had exited.

The log

The first place to look is the log, where the following message appears:

java.lang.OutOfMemoryError: GC overhead limit exceeded

Generally speaking, two conditions lead to an OOM:

  • The amount of data grows until the heap is too small to hold it.
  • A memory leak: objects stay referenced, so the GC can never collect them.

"GC overhead limit exceeded" usually means the JVM is nearly out of memory and is spending too long in GC while reclaiming very little. By default, HotSpot throws this error when more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered.
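To get a feel for the "time spent in GC" side of that rule, here is a minimal sketch that reads the standard GarbageCollectorMXBean counters and reports GC time as a share of JVM uptime. The class name GcOverheadProbe and its helper methods are made up for this illustration. Note that it measures the cumulative share since startup, which is only a rough proxy and not the exact window the JVM uses for its overhead check.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: rough estimate of how much uptime went to GC. */
final class GcOverheadProbe {

    /** Sums the accumulated collection time of all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** GC time as a fraction of uptime, clamped to the range 0.0 to 1.0. */
    static double gcFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) gcMillis / uptimeMillis);
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        double fraction = gcFraction(totalGcMillis(), uptime);
        System.out.printf("GC time: %.1f%% of uptime%n", fraction * 100);
    }
}
```

A value that keeps climbing toward 100% while the heap stays full is the kind of situation the error describes.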

This tells us that something in our program is holding memory that cannot be reclaimed. So what is it?

Heap dump file

Our JVM is started with the following parameters, so it writes a heap dump when an OOM occurs:

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$path/jvm.hprof

By analyzing the hprof file, you can see clearly which objects are occupying memory.
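A dump file only exists if the flags really took effect, and a typo in a -XX option is easy to miss. As a hedged sketch, assuming a HotSpot JVM (the com.sun.management API is HotSpot-specific), the following reads the two options back from the running process. The class name HeapDumpFlagCheck is invented for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reads heap-dump related VM options from the running JVM. */
final class HeapDumpFlagCheck {

    /** Returns the current value of a HotSpot VM option as a string. */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException if the option name is unknown to this JVM.
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = "
                + vmOption("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
    }
}
```

Running this inside the service (or a test JVM started with the same arguments) should show whether the dump is enabled and where it will be written.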

jhat

I started by analyzing the file with jhat. From the error above, my first guess was that the problem might be related to log printing.

From the Heap Histogram, you can see that several of the classes using the most memory belong to Log4j. Following the references to Logger in Log4j leads to a Hashtable that holds them.

Log4j

We use Log4j to manage loggers. The only difference from ordinary usage is that, since we are a distributed scheduler, each task instance requests its logger with a different logName.

logger = Logger.getLogger(logName);

This is easy to use, and it looks harmless until you read the implementation of Logger.getLogger. Once you do, the limitation of this approach becomes clear.

Logger.getLogger calls LogManager.getLogger, which is implemented as follows:

public static Logger getLogger(String name) {
    return getLoggerRepository().getLogger(name);
}


LoggerRepository has two implementations:

  • Hierarchy
  • NOPLoggerRepository

The default implementation is Hierarchy.

Hierarchy keeps every logName-to-Logger mapping in a Hashtable (ht = new Hashtable();) and never removes entries. Our program creates a new Logger every time a task is called, which adds up to hundreds of thousands of objects over time.
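The growth pattern can be reproduced without Log4j. The sketch below is a simplified model, not Log4j's actual code: FakeLogger, FakeRepository, and loggersAfter are hypothetical names, and the repository only imitates the "one cached entry per name, never evicted" behaviour described above. It compares a unique logName per task with a single shared name.

```java
import java.util.Hashtable;

/** Simplified model of a name-keyed logger cache; not the Log4j implementation. */
final class LoggerLeakDemo {

    /** Hypothetical stand-in for a logger object. */
    static final class FakeLogger {
        final String name;

        FakeLogger(String name) {
            this.name = name;
        }
    }

    /** Caches one logger per name and never removes it, like the Hashtable above. */
    static final class FakeRepository {
        private final Hashtable<String, FakeLogger> ht = new Hashtable<>();

        FakeLogger getLogger(String name) {
            synchronized (ht) {
                FakeLogger logger = ht.get(name);
                if (logger == null) {
                    logger = new FakeLogger(name);
                    ht.put(name, logger); // entry stays for the life of the repository
                }
                return logger;
            }
        }

        int size() {
            return ht.size();
        }
    }

    /** Runs the given number of tasks and returns how many loggers are retained. */
    static int loggersAfter(int tasks, boolean uniqueNamePerTask) {
        FakeRepository repo = new FakeRepository();
        for (int i = 0; i < tasks; i++) {
            String logName = uniqueNamePerTask ? "task-" + i : "task";
            repo.getLogger(logName);
        }
        return repo.size();
    }

    public static void main(String[] args) {
        System.out.println("unique names: " + loggersAfter(100_000, true));
        System.out.println("shared name:  " + loggersAfter(100_000, false));
    }
}
```

With a unique name per task the cache holds one entry per task, while a shared name keeps it at a single entry. That suggests one possible direction for a fix: use a bounded set of logger names and carry the task identity in the log message instead.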

Resources: Describes the eight types of OOM in the JVM