Huawei Cloud Database GaussDB(for Cassandra) Revealed, Part 2: Troubleshooting Abnormal Memory Growth

Abstract: Huawei Cloud's GaussDB(for Cassandra) is a cloud-native NoSQL database built on a compute-storage separation architecture and compatible with the Cassandra ecosystem. It relies on a shared storage pool to achieve strong consistency and keep data safe and reliable.

This article is shared from the Huawei Cloud Community post “Huawei Cloud Database GaussDB(for Cassandra) Revealed, Part 2: Troubleshooting Abnormal Memory Growth”; original author: Cassandra Official.

Background

Huawei Cloud's GaussDB(for Cassandra) is a cloud-native NoSQL database built on a compute-storage separation architecture and compatible with the Cassandra ecosystem. It relies on a shared storage pool to achieve strong consistency and keep data safe and reliable. Its core features are separation of storage and compute, low cost, and high performance.

Problem description

Under its self-developed architecture, GaussDB(for Cassandra) has run into a number of hard problems, such as high CPU usage, memory leaks, abnormal memory growth, and high latency, all of which are typical of this kind of development. Abnormal memory growth is especially hard to analyze, and it is dangerous for the program: it can trigger OOM, crash the process, and interrupt the business. Careful planning and control of memory is therefore essential. Tuning the sizes of the cache, Bloom filters, Memtables, and so on improves performance and read/write latency.

During offline testing, we found that after running for a long time the kernel's memory only ever increased, that is, it grew abnormally, and we suspected a memory leak.

Analysis & Verification

We first split the process memory into on-heap and off-heap parts according to how it is used, and analyzed each part separately. Once the problem was pinned to off-heap memory, we analyzed that part further and introduced tcmalloc, a more efficient memory allocator, to address the abnormal growth. The verification process is described in detail below.

Locating the abnormal memory region

Using the JDK's jmap command and Cassandra's monitoring (by configuring the JVM.memory.* monitoring items), we collected the JVM's on-heap memory and the memory of the whole process once every minute.
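
The process-level half of this sampling can be sketched in a few lines. This is a minimal, Linux-only sketch, assuming the resident set size (VmRSS in /proc/<pid>/status) is an acceptable proxy for "whole process memory". The heap reading is passed in as a hypothetical callable, `read_heap_kib`, because the article took it from jmap and the JVM.memory.* items and we do not reproduce either; both function names are ours.

```python
import os
import time
from typing import Callable, List, Optional, Tuple

def read_rss_kib(pid: Optional[int] = None) -> int:
    """Resident set size of `pid` (default: this process) in KiB, read from /proc."""
    pid = os.getpid() if pid is None else pid
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:     123456 kB"
                return int(line.split()[1])
    raise RuntimeError(f"no VmRSS entry for pid {pid}")

def sample_memory(pid: int, read_heap_kib: Callable[[int], int],
                  samples: int, interval_s: float = 60.0) -> List[Tuple[float, int, int]]:
    """Collect (unix_time, heap_kib, rss_kib) once per interval.

    `read_heap_kib` is a hypothetical hook for whatever heap source is available;
    it is not a real library call.
    """
    rows = []
    for i in range(samples):
        rows.append((time.time(), read_heap_kib(pid), read_rss_kib(pid)))
        if i + 1 < samples:
            time.sleep(interval_s)  # no sleep after the last sample
    return rows
```

For example, `sample_memory(pid, read_heap_kib, samples=60)` would cover one hour at the one-minute interval used in the test.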

We ran the test case until the kernel reached its overall memory limit. Comparing the collected heap-memory and process-memory curves showed that heap memory stayed relatively stable and did not keep rising, while the kernel's overall memory kept rising throughout, so the two growth curves did not match. The problem therefore had to lie in off-heap memory.
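
The curve comparison can be made mechanical. The sketch below is ours, not the team's tooling: it treats RSS minus heap as a rough off-heap estimate and compares least-squares slopes per sample. The 1 MiB-per-sample tolerance is an arbitrary, hypothetical threshold.

```python
from typing import Sequence

def slope(ys: Sequence[float]) -> float:
    """Least-squares slope of ys against the sample index 0, 1, ..., n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def locate_growth(heap_kib: Sequence[int], rss_kib: Sequence[int],
                  tol_kib_per_sample: float = 1024.0) -> str:
    """Classify growth as 'heap', 'off-heap', 'both' or 'stable'.

    Off-heap is estimated as RSS minus heap for each sample; this also counts
    committed-but-unused heap and JVM overhead, so it is only a proxy.
    """
    off_heap = [r - h for h, r in zip(heap_kib, rss_kib)]
    heap_up = slope(heap_kib) > tol_kib_per_sample
    off_up = slope(off_heap) > tol_kib_per_sample
    if heap_up and off_up:
        return "both"
    if off_up:
        return "off-heap"
    if heap_up:
        return "heap"
    return "stable"
```

A flat heap series paired with a steadily climbing RSS series classifies as "off-heap", which is the pattern described in the test.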

Off-heap memory analysis and verification

Glibc memory management

Printing the process's address-space layout with the pmap command revealed a large number of 64MB memory blocks and many memory fragments, which is related to how glibc allocates memory. Off-heap memory usage followed a trend similar to the process's overall memory growth, which again pointed at off-heap memory. In addition, glibc's conditions for returning memory to the operating system are strict, so memory is not released promptly and fragments accumulate. We therefore suspected the problem was related to glibc: when fragmentation is severe and free memory is wasted, the process's peak memory usage can exceed the expected maximum and even trigger OOM.
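
To count those 64MB blocks instead of eyeballing pmap output, one can scan /proc/<pid>/maps, which lists the same address ranges. This sketch assumes 64-bit glibc, where each non-main arena reserves a 64 MiB region aligned to 64 MiB and maps the not-yet-used tail as PROT_NONE (`---p`). The matching rules are a heuristic of ours, not a glibc interface.

```python
from typing import Iterable, List, Tuple

ARENA_BYTES = 64 * 1024 * 1024  # per-arena heap reservation on 64-bit glibc
Region = Tuple[int, int, str, str]  # (start, end, perms, path)

def parse_maps(lines: Iterable[str]) -> List[Region]:
    """Parse /proc/<pid>/maps lines into (start, end, perms, path) tuples."""
    regions = []
    for line in lines:
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) == 6 else ""  # "" = anonymous
        regions.append((start, end, fields[1], path))
    return regions

def count_arena_like_regions(regions: List[Region]) -> int:
    """Count anonymous, 64 MiB-aligned reservations that total exactly 64 MiB."""
    count, i = 0, 0
    while i < len(regions):
        start, end, perms, path = regions[i]
        candidate = not path and perms.startswith("rw") and start % ARENA_BYTES == 0
        if candidate and end - start == ARENA_BYTES:
            count += 1  # fully committed arena heap
            i += 1
            continue
        if candidate and i + 1 < len(regions):
            nstart, nend, nperms, npath = regions[i + 1]
            if (not npath and nperms == "---p" and nstart == end
                    and nend - start == ARENA_BYTES):
                count += 1  # rw part followed by its PROT_NONE remainder
                i += 2
                continue
        i += 1
    return count
```

Running `count_arena_like_regions(parse_maps(f))` on a process's maps file at intervals during the test would show whether the number of such blocks keeps climbing.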

TCMalloc memory management

We replaced glibc's ptmalloc with the tcmalloc memory allocator to reduce fragmentation and use memory more efficiently; for this verification, tcmalloc was compiled from the gperftools 2.7 source code. Running the same test case, memory still kept increasing, but more slowly than before. The pmap output showed that the small blocks and fragments seen earlier were significantly reduced, so tcmalloc had a real optimizing effect, confirming the earlier suspicion that there was too much fragmentation.
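
The article does not say how tcmalloc was hooked into the process; preloading the library with LD_PRELOAD is a common approach and is only an assumption here. Either way, a quick check that the swap took effect is to look for the allocator's shared object in the process's memory map. The function name below is hypothetical.

```python
import os
from typing import Iterable, Set

ALLOCATOR_PREFIXES = {"tcmalloc": "libtcmalloc", "jemalloc": "libjemalloc"}

def loaded_allocators(maps_lines: Iterable[str]) -> Set[str]:
    """Return which well-known allocator libraries appear in /proc/<pid>/maps."""
    found = set()
    for line in maps_lines:
        fields = line.split(maxsplit=5)
        if len(fields) != 6:
            continue  # anonymous mapping, no backing file
        base = os.path.basename(fields[5].strip())
        for name, prefix in ALLOCATOR_PREFIXES.items():
            if base.startswith(prefix):  # also matches libtcmalloc_minimal
                found.add(name)
    return found
```

An empty result for the database process would mean it is still running on glibc malloc.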

However, the abnormal memory growth persisted, which we suspected was partly because tcmalloc reclaims memory late or not at all. In fact, tcmalloc is rather “reluctant” to return memory: it keeps freed memory so that later requests can reuse it directly, which reduces system calls and improves performance. We therefore called its ReleaseFreeMemory interface manually to release memory. The effect was not obvious, and at first the reason was unknown (there might indeed be free memory that was not being released).
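
gperftools exposes this call to C as `MallocExtension_ReleaseFreeMemory` (declared in `gperftools/malloc_extension_c.h`), alongside `MallocExtension_GetNumericProperty`. The sketch below calls both through ctypes, looking them up among symbols already loaded into the current process, so it does nothing when tcmalloc is not the allocator. How the team invoked the interface inside the database process is not described; a JVM would need a native bridge such as JNI or JNA, so this is illustrative only.

```python
import ctypes
from typing import Optional

def tcmalloc_release_free_memory() -> bool:
    """Call tcmalloc's ReleaseFreeMemory if tcmalloc is loaded in this process."""
    process = ctypes.CDLL(None)  # handle to symbols already loaded
    fn = getattr(process, "MallocExtension_ReleaseFreeMemory", None)
    if fn is None:
        return False  # e.g. the process is using glibc malloc
    fn.argtypes = []
    fn.restype = None
    fn()
    return True

def tcmalloc_numeric_property(name: str) -> Optional[int]:
    """Read a property such as 'tcmalloc.pageheap_free_bytes', or None if unavailable."""
    process = ctypes.CDLL(None)
    fn = getattr(process, "MallocExtension_GetNumericProperty", None)
    if fn is None:
        return None
    fn.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_size_t)]
    fn.restype = ctypes.c_int
    value = ctypes.c_size_t()
    ok = fn(name.encode(), ctypes.byref(value))
    return value.value if ok else None
```

Reading `tcmalloc.pageheap_free_bytes` before and after the call is one way to see whether the release actually returned anything.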

Manually triggering tcmalloc's ReleaseFreeMemory interface

To verify the problem, we used the cache size as a lever:

Set the cache size to 6GB, then drive read requests until the cache is completely filled to 6GB.
Change the cache capacity to 2GB and manually call tcmalloc's ReleaseFreeMemory interface. This had no effect: even with tcmalloc, memory did not go down, and the cause appeared to be related to this interface.
Add logging at various points inside ReleaseFreeMemory, then restart the process and test again. The logs showed an error: the madvise system call was failing.


Analyzing the code around this failing call showed that if releasing one span fails, the release of all remaining spans is abandoned and the ReleaseFreeMemory logic ends. This is why calling ReleaseFreeMemory had no effect: every time it ran, the release logic was cut short by the failed call.
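
The failure mode can be reproduced with a toy model. `Span`, `madvise_stub`, and both release functions below are hypothetical stand-ins, not tcmalloc code: the first function mirrors the behavior described, where one failed release ends the whole pass, and the second, shown only for contrast, skips the failing span and continues. The team's actual fix was a configuration change, not this code.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Span:
    """Hypothetical stand-in for a free page span (not tcmalloc's struct)."""
    pages: int
    locked: bool = False    # models a range the kernel refuses to madvise
    released: bool = False

def madvise_stub(span: Span) -> bool:
    """Pretend system call: fails on locked memory, succeeds otherwise."""
    return not span.locked

def release_stop_on_failure(spans: List[Span],
                            release: Callable[[Span], bool] = madvise_stub) -> int:
    """Behavior described in the article: the first failure ends the pass."""
    released = 0
    for s in spans:
        if not release(s):
            return released  # every later span stays unreleased
        s.released = True
        released += s.pages
    return released

def release_skip_failures(spans: List[Span],
                          release: Callable[[Span], bool] = madvise_stub) -> int:
    """Contrast only: skip the failing span and keep going."""
    released = 0
    for s in spans:
        if release(s):
            s.released = True
            released += s.pages
    return released
```

With spans of 4, 8 (locked), 16 and 32 pages, the first function releases only 4 pages, while the second releases 52.
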
We then analyzed why the madvise system call failed. By patching the kernel function, we found that it failed because the memory passed in was in the LOCKED state, so the system call returned an invalid-argument error (EINVAL).
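
The kernel behavior is easy to observe directly: madvise(2) documents EINVAL for MADV_DONTNEED on a range that includes locked pages. This Python 3.8+, Linux-only sketch maps one page, optionally mlocks it through libc via ctypes, and reports what madvise returns. It is a standalone demonstration, not the patched-kernel tracing the team used; where the locked-memory limit is too low, mlock is refused and the function reports that instead.

```python
import ctypes
import errno
import mmap

_libc = ctypes.CDLL(None, use_errno=True)
_libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
_libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

def madvise_dontneed_result(lock_first: bool) -> str:
    """Return "ok", an errno name such as "EINVAL", or "mlock-failed:<errno name>"."""
    size = mmap.PAGESIZE
    m = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)  # one private anonymous page
    view = ctypes.c_char.from_buffer(m)  # exports a pointer into the mapping
    addr = ctypes.addressof(view)
    try:
        if lock_first and _libc.mlock(addr, size) != 0:
            return "mlock-failed:" + errno.errorcode.get(ctypes.get_errno(), "unknown")
        try:
            m.madvise(mmap.MADV_DONTNEED)
            return "ok"
        except OSError as e:
            return errno.errorcode.get(e.errno, str(e.errno))
        finally:
            if lock_first:
                _libc.munlock(addr, size)
    finally:
        del view  # drop the buffer export so the mapping can be closed
        m.close()
```

On an unlocked page the call succeeds; on a locked page it is expected to fail with EINVAL.
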
The LOCKED state is associated with calls to the mlock system calls and with the system's ulimit configuration. Reviewing the related code found nothing abnormal, but Max Locked Memory was set to unlimited. After changing it to 16MB, restarting the Cassandra process, and testing again, we found that memory was released significantly.
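
The article does not say how the system-wide value was changed. For experimenting on a single process, the same limit can be applied at launch, much like running `ulimit -l 16384` first. The helper names below are hypothetical; the sketch lowers only the soft RLIMIT_MEMLOCK, capped by the existing hard limit, and only in the child.

```python
import resource
import subprocess
from typing import List

MIB = 1024 * 1024

def memlock_soft_limit(target_bytes: int, hard: int) -> int:
    """Soft RLIMIT_MEMLOCK to request: the target, capped by the hard limit."""
    if hard == resource.RLIM_INFINITY:
        return target_bytes
    return min(target_bytes, hard)

def run_with_memlock_limit(cmd: List[str],
                           target_bytes: int = 16 * MIB) -> subprocess.CompletedProcess:
    """Start `cmd` with a lowered 'max locked memory' soft limit."""
    def lower_limit() -> None:
        # Runs in the child between fork and exec; the parent is unaffected.
        _, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
        resource.setrlimit(resource.RLIMIT_MEMLOCK,
                           (memlock_soft_limit(target_bytes, hard), hard))
    return subprocess.run(cmd, preexec_fn=lower_limit,
                          capture_output=True, text=True, check=True)
```

Running `["sh", "-c", "ulimit -l"]` through this helper prints 16384 (KiB) when the hard limit allows it.
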
As testing continued, the abnormal memory increase disappeared. Under sustained load, memory rose to its maximum, stopped rising, and stayed stable, consistent with the planned memory usage. When the load was reduced or stopped, memory slowly trended downward.

Solution & Summary

Introduce tcmalloc to optimize memory management. Among the best-known memory allocators are Google's tcmalloc and Facebook's jemalloc.
Modify the system's Max Locked Memory setting.

Plan the maximum amount of memory the process needs and reserve some headroom; any memory growth that exceeds expectations needs further analysis. Memory problems depend heavily on the program, and critical system configurations must be evaluated carefully for their impact. We also checked all similar configurations.
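
The planning step is simple arithmetic, sketched below with made-up sizes: the planned on-heap and off-heap consumers plus a reserve must fit under the process limit. The component names and the 10% reserve are illustrative assumptions, not GaussDB(for Cassandra) settings.

```python
from typing import Dict

def plan_memory(limit_mib: int, components_mib: Dict[str, int],
                reserve_ratio: float = 0.10) -> Dict[str, int]:
    """Check a memory plan: components plus a reserve must fit under the limit."""
    reserve = int(limit_mib * reserve_ratio)
    planned = sum(components_mib.values())
    headroom = limit_mib - reserve - planned
    if headroom < 0:
        raise ValueError(f"plan exceeds the limit by {-headroom} MiB")
    return {"planned": planned, "reserve": reserve, "headroom": headroom}
```

For a hypothetical 32 GiB limit with a 12 GiB heap, 6 GiB of cache, 2 GiB of Bloom filters and 4 GiB of Memtables, `plan_memory` reports 24576 MiB planned, a 3276 MiB reserve and 4916 MiB of headroom.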

A ReleaseFreeMemory command was added to address tcmalloc holding on to freed memory. However, ReleaseFreeMemory locks the entire PageHeap while it runs and may stall memory allocation requests, so it must be used with care.
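
One way to apply that caution is to gate the call. The class below is a hypothetical policy, not part of tcmalloc or the article: it triggers a full release only when enough free memory is held and a cooldown has passed, so the PageHeap lock is taken rarely. In a real deployment the `release` callable would wrap ReleaseFreeMemory.

```python
import time
from typing import Callable, Optional

class GuardedRelease:
    """Hypothetical rate limiter around a full free-memory release."""

    def __init__(self, release: Callable[[], None], min_free_bytes: int,
                 cooldown_s: float, clock: Callable[[], float] = time.monotonic):
        self._release = release
        self._min_free = min_free_bytes
        self._cooldown = cooldown_s
        self._clock = clock
        self._last: Optional[float] = None  # time of the last release

    def maybe_release(self, free_bytes: int) -> bool:
        """Release only if enough memory is free and the cooldown has elapsed."""
        now = self._clock()
        if free_bytes < self._min_free:
            return False  # not worth taking the lock
        if self._last is not None and now - self._last < self._cooldown:
            return False  # released too recently
        self._release()
        self._last = now
        return True
```

Feeding `maybe_release` a free-bytes figure such as tcmalloc's `tcmalloc.pageheap_free_bytes` property keeps the decision tied to how much memory is actually being held.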

On the backend, we added a dynamically configurable parameter, tcmalloc_release_rate, that adjusts how often tcmalloc returns memory to the operating system. Its reasonable range is [0, 10]: 0 means memory is never returned, higher values return memory more often, and the default is 1.
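
To make the rate concrete, the sketch below models how stock gperftools (where the same knob is read from the TCMALLOC_RELEASE_RATE environment variable) uses it in its incremental scavenger, as we read `PageHeap::IncrementalScavenge`: after releasing some pages it waits about 1000 / rate freed pages per released page before releasing again, and a rate of 0 disables release. The constants mirror our reading of the gperftools source and should be checked against the version in use; `parse_release_rate` is a hypothetical validator for the backend parameter.

```python
DEFAULT_RELEASE_RATE = 1.0
MAX_RELEASE_DELAY = 1 << 20      # pages; upper bound on the wait
DEFAULT_RELEASE_DELAY = 1 << 18  # pages; wait used when nothing was released

def parse_release_rate(value: str) -> float:
    """Validate a tcmalloc_release_rate setting against the [0, 10] range."""
    rate = float(value)
    if not 0.0 <= rate <= 10.0:  # also rejects NaN
        raise ValueError(f"tcmalloc_release_rate must be in [0, 10], got {rate}")
    return rate

def pages_until_next_release(rate: float, released_pages: int) -> int:
    """Freed pages to wait before the next release (gperftools-style model)."""
    if rate <= 1e-6 or released_pages == 0:
        return DEFAULT_RELEASE_DELAY  # release disabled, or nothing to release
    wait = (1000.0 / rate) * released_pages
    return int(min(wait, MAX_RELEASE_DELAY))
```

Under this model, rate 1 waits 1000 freed pages per released page and rate 10 waits 100, which is why higher values return memory more often.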

Conclusion

This article analyzed the abnormal memory growth we encountered during development. By adopting a better memory allocator and finer-grained memory monitoring, we can observe the database's memory state more directly at runtime and keep the database running smoothly and with high performance.
