Editor's note

While operating and maintaining an ES cluster in production, I happened to run into a memory leak. While troubleshooting it I came across this article, which is clear and well organized, so I am sharing it here in the hope that it offers some ideas for your own troubleshooting.

This article is a reprint; if there is any copyright issue, please contact the editor.

PS: Rich first-line technology in a variety of formats, all on "HULK First-Line Technology Talk". Follow us!

Background

The day before yesterday, an online Elasticsearch cluster belonging to the company's vacation business unit triggered an alarm: heap usage on some data nodes kept exceeding the 80% alarm threshold. After receiving the alarm email, I immediately logged in to the monitoring system to check the cluster status. Fortunately, all the nodes were serving normally; only two of them had very high heap utilization, and on those nodes old-generation GC kept firing without being able to reclaim memory.

Troubleshooting

The problem nodes have a 30GB heap, so 80% utilization is roughly 24GB. However, the total amount of data in the cluster is not large: the index files on all five nodes together occupy less than 10GB of disk space.

The segment memory and cache usage of each node were also very small, on the order of megabytes.


Cluster QPS was only around 30, CPU consumption was below 10%, and the number of active threads in all thread pools was very low.
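The article reads these numbers from its monitoring system; purely for reference (and assuming a node reachable at localhost:9200, which is not from the original article), the same statistics can be pulled directly from the standard cluster APIs:

    curl -s 'http://localhost:9200/_nodes/stats/jvm,indices?pretty'   # heap usage, segment memory, caches per node
    curl -s 'http://localhost:9200/_cat/thread_pool?v'                # active and queued threads per thread pool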

It was hard to understand: what was holding on to 20+ GB of memory?

The ES version of the cluster in question is 5.3.2. This version has gone through long stability testing inside the company and has been deployed online at scale as a stable release, and other clusters with much higher read and write loads have never shown similar problems, so it looked like we had run into a new issue.

Looking at the ES logs of the problem node, apart from some bulk exceptions there were no other particularly obvious resource-related errors:

    [2017-11-06T16:…,668][DEBUG][o.e.a.b.TransportShardBulkAction] [] [suggest-3][0] failed to execute bulk item (update) BulkShardRequest [[suggest-3][0]] containing [44204] requests
    org.elasticsearch.index.engine.DocumentMissingException: [type][…_1198]: document missing
        at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:92) ~[elasticsearch-5.3.2.jar:5.3.2]
        at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:81) ~[elasticsearch-5.3.2.jar:5.3.2]

We confirmed the cause of these exceptions with the user: the writing side fetches data from its data source and then updates documents in ES by doc_id, and some doc_ids do not exist in ES. This does not affect the business logic, so the document missing exceptions recorded by ES can be ignored.

At this point, the only option left was to dump and analyze the JVM heap.

Heap Dump analysis

We used the Eclipse MAT (Memory Analyzer) tool, which can be downloaded from its Downloads page. Using it takes two steps:

  1. Obtain the binary heap dump file: jmap -dump:format=b,file=/tmp/es_heap.bin <pid>, where <pid> is the process ID of the ES Java process.

  2. Download the generated dump file to your local development machine, start MAT, and open the file from its GUI.

Note that MAT is itself a Java application and requires a JDK to run.

When MAT opens a dump file for the first time, it needs to parse it and generate multiple indexes. This process is CPU- and memory-intensive, but once it is done, opening the dump again is quick and cheap. For a file of more than 20GB, the first parsing pass is very slow and would most likely run out of memory on the development machine, so I found a server with plenty of memory to do the first parse:

1. Copy the Linux version of MAT to the server, decompress it, and edit the configuration file MemoryAnalyzer.ini to raise MAT's own heap to about 20GB. This ensures that memory does not run out during parsing.
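For reference, the change amounts to raising the -Xmx value in the -vmargs section at the end of MemoryAnalyzer.ini; the surrounding lines vary by MAT version, but the tail of the file ends up looking roughly like this:

    -vmargs
    -Xmx20g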

2. Copy the dump file to the server and run the following commands to generate the indexes and three analysis reports:

mat/ParseHeapDump.sh es_heap.bin org.eclipse.mat.api:suspects

mat/ParseHeapDump.sh es_heap.bin org.eclipse.mat.api:overview

mat/ParseHeapDump.sh es_heap.bin org.eclipse.mat.api:top_components

After the parsing succeeds, a pile of index files (.index) and analysis reports (.zip) is generated.

Package these files, download them to the local machine, and open them in the MAT GUI for analysis.

When you open the dump file in MAT, you can choose to open the generated report, like Leak Suspects:

A glance at the Leak Suspects shows that the 20-plus GB of memory is mostly occupied by a bunch of BULK thread instances, each of which takes up nearly 1.5GB of memory.

As you can see from the “dominator_tree” panel, the memory footprint of multiple BULK threads is very high.

Expanding the reference chain of one of the threads shows how it retains so much memory; pay particular attention to the items circled in red:

This reference relationship is interpreted as follows:

  1. The bulk thread's thread-local map holds a log4j MutableLogEvent object.

  2. The MutableLogEvent object references log4j's ParameterizedMessage object.

  3. The ParameterizedMessage references the BulkShardRequest object.

  4. The BulkShardRequest references more than 40,000 BulkItemRequest objects.

It appears that log4j's LogEvent holds a strong reference to a large bulk request object that cannot be garbage collected, causing the memory leak.

The ES logs of the problem node contained some bulk exceptions with document missing, so I suspected that the leak happened when those exceptions were logged.

Reproducing the problem

To verify my guess, I started a single-node 5.3.2 test cluster on my local development machine, issued bulk updates through the bulk API, and intentionally set a non-existent doc_id for one of the update requests.

For testing convenience, I added the setting processors: 1 to the ES configuration file elasticsearch.yml. This setting affects the thread pool sizing and shrinks the bulk thread pool to a single thread, which makes the verification quick and convenient.
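For illustration, here is a minimal sketch of such a reproduction setup; the index, type, and field names are made up, since the article does not give its exact test request:

    # elasticsearch.yml on the single test node: shrink the bulk thread pool to one thread
    processors: 1

    # A bulk request in which one update targets a doc_id that does not exist in the index
    curl -XPOST 'http://localhost:9200/test/doc/_bulk?pretty' -H 'Content-Type: application/x-ndjson' -d '
    { "update": { "_id": "id-that-exists" } }
    { "doc": { "field": "some value" } }
    { "update": { "_id": "id-that-does-not-exist" } }
    { "doc": { "field": "some value" } }
    '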

After the cluster was started and the bulk request was sent, I took a heap dump and repeated the previous analysis, which confirmed the same leak.

Do other bulk exceptions cause the same problem, for example writing data that does not match the mapping? I tested that as well, and the problem indeed occurs; the amount of unreclaimable memory depends on the size of the last bulk request that threw an exception. At this point it was fairly clear that the memory leak is related to log4j's logic for recording exception messages.

To find out whether this issue is unique to 5.3.2 or has already been fixed in later releases, I ran the same test against the latest 5.6.3. The problem remained, so this appeared to be a deep bug that had not yet been discovered.

Reading the source code to find the root cause

The direction of the investigation was clear, but the root cause had not been found, and we did not know how to fix or avoid it. The only option left was to dig into the source code.

At line 209 of TransportShardBulkAction we find the code that produced the exception message seen in the ES log:

    if (isConflictException(failure)) {
        logger.trace((Supplier<?>) () ->
            new ParameterizedMessage("{} failed to execute bulk item ({}) {}",
                request.shardId(), docWriteRequest.opType().getLowercase(), request), failure);
    } else {
        logger.debug((Supplier<?>) () ->
            new ParameterizedMessage("{} failed to execute bulk item ({}) {}",
                request.shardId(), docWriteRequest.opType().getLowercase(), request), failure);
    }

Here we can see that request is passed in as a parameter when the ParameterizedMessage is instantiated. This request is the BulkShardRequest object, which holds all the bulk item requests to be written to one shard, so the more requests in a batch, the more memory the object retains. The question is: why is this reference not released after logger.debug() has been called?

The ParameterizedMessage cannot be released because it is referenced by a MutableLogEvent, and the MutableLogEvent is stored as a thread-local. The bulk thread pool of ES has a fixed size: its threads are created in advance and are never destroyed or recreated. Since the MutableLogEvent objects are thread-local, they live for as long as the threads do and always reference the last ParameterizedMessage. Therefore, whenever ES records a bulk exception, the entire request object ends up strongly referenced by a thread-local variable and cannot be released, resulting in a large memory leak.
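To make this retention pattern concrete, here is a minimal, self-contained Java sketch. It is not ES or log4j code, and all the class and method names in it are invented for illustration; it only mimics the shape of the problem: a pooled thread holds a reusable, thread-local event whose last message still references a large request object.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ThreadLocalRetentionDemo {

        // Stand-in for log4j's MutableLogEvent: a reusable, per-thread event object.
        static class ReusableEvent {
            Object lastMessage; // stand-in for the ParameterizedMessage and its parameters
        }

        // Stand-in for the thread-local event kept by ReusableLogEventFactory.
        static final ThreadLocal<ReusableEvent> EVENT =
                ThreadLocal.withInitial(ReusableEvent::new);

        static void logFailure(Object bigRequest) {
            // The reusable event keeps a strong reference to the last message it carried.
            EVENT.get().lastMessage = bigRequest;
        }

        public static void main(String[] args) throws Exception {
            // A fixed-size pool whose threads never die, like the ES bulk thread pool.
            ExecutorService bulkPool = Executors.newFixedThreadPool(1);

            // "Log" a failure that references a large request (here, roughly 100MB of bytes).
            bulkPool.submit(() -> logFailure(new byte[100 * 1024 * 1024])).get();

            // The byte array is no longer reachable from application code, but it is still
            // strongly reachable via Thread -> threadLocals -> ReusableEvent -> lastMessage,
            // so a heap dump taken here shows the pool thread retaining ~100MB.
            System.gc();
            bulkPool.shutdown();
        }
    }

Taking a heap dump of this toy program after the submit() call completes shows the pool thread retaining the large array through its thread-local, which is exactly the shape of the dominator tree seen in MAT above.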

Continuing to dig into the log4j source, we find that MutableLogEvent is created as a thread-local in org.apache.logging.log4j.core.impl.ReusableLogEventFactory:

    public class ReusableLogEventFactory implements LogEventFactory {
        private static final ThreadNameCachingStrategy THREAD_NAME_CACHING_STRATEGY = ThreadNameCachingStrategy.create();
        private static final Clock CLOCK = ClockFactory.getClock();

        private static ThreadLocal<MutableLogEvent> mutableLogEventThreadLocal = new ThreadLocal<>();

And org.apache.logging.log4j.core.config.LoggerConfig decides which LogEventFactory to use based on the constant ENABLE_THREADLOCALS:

    if (LOG_EVENT_FACTORY == null) {
        LOG_EVENT_FACTORY = Constants.ENABLE_THREADLOCALS
                ? new ReusableLogEventFactory()
                : new DefaultLogEventFactory();
    }

Digging further into org.apache.logging.log4j.util.Constants, we see that log4j checks the runtime environment to decide whether it is running inside a web application; if it is not, it reads the value from the system property log4j2.enable.threadlocals, which defaults to true when unset:

    public static final boolean ENABLE_THREADLOCALS = !IS_WEB_APP
            && PropertiesUtil.getProperties().getBooleanProperty("log4j2.enable.threadlocals", true);

Because ES is not a web application, log4j chooses ReusableLogEventFactory and therefore creates the MutableLogEvent objects as thread-locals. The result is a very significant memory leak in the particular scenario where ES logs bulk exceptions.

So why does log4j create LogEvents as thread-locals in the first place? The Garbage-free Steady State Logging page in log4j's official documentation provides a reasonable explanation: in many places log4j reuses variables through thread-locals in order to reduce the number of objects created while logging, reducing GC pressure and improving performance. However, thread-local fields holding non-JDK classes can cause memory leaks, especially in web applications, so log4j determines the runtime environment at startup and disables these thread-local variables for web applications.

ThreadLocal fields holding non-JDK classes can cause memory leaks in web applications when the application server’s thread pool continues to reference these fields after the web application is undeployed. To avoid causing memory leaks, Log4j will not use these ThreadLocals when it detects that it is used in a web application (when the javax.servlet.Servlet class is in the classpath, or when system property log4j2.is.webapp is set to “true”).

The documentation above also points to a way to avoid the problem: add the system property -Dlog4j2.enable.threadlocals=false to the ES JVM configuration file jvm.options to disable these thread-locals. This option has been tested and does avoid the memory leak.
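Concretely, the workaround is a single line added to the jvm.options file in the ES config directory:

    # config/jvm.options: disable log4j2's thread-local (garbage-free) event reuse
    -Dlog4j2.enable.threadlocals=false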

An issue has also been filed on GitHub: Memory Leak upon Partial TransportShardBulkAction Failure

https://github.com/elastic/elasticsearch/issues/27300

Conclusion

ES is indeed a very complex system, with many modules and third-party components, and it supports many unexpected use cases, but some edge scenarios can cause problems that are difficult to troubleshoot. A complete monitoring system and an experienced support team are very important for keeping business developers productive with ES and for keeping the business stable.

HULK First-Line Technology Talk

A technology-sharing public account created by the 360 cloud platform team, covering cloud computing, databases, big data, monitoring, front-end, automated testing, and many other technical areas. With solid technical accumulation and rich front-line practical experience, it brings you the most substantial technology sharing.