Abstract: Abnormal memory growth is a fatal problem for a program, because it can trigger OOM, abnormal process exits, business interruption, and other consequences, so reasonable planning, use, and control of memory is particularly important.

This article is shared from the Huawei Cloud community post "Huawei Cloud Database GaussDB(for Cassandra) Revealed, Part 2: Troubleshooting Experience of Abnormal Memory Growth", originally written by the Gauss Cassandra official account.

Huawei Cloud database GaussDB(for Cassandra) is a cloud-native NoSQL database compatible with the Cassandra ecosystem, built on a compute-storage separation architecture. It relies on a shared storage pool to achieve strong consistency and to ensure data security and reliability. Its core features are compute-storage separation, low cost, and high performance.

Problem description

GaussDB(for Cassandra) encountered some challenges in its self-developed architecture, such as high CPU usage, memory leaks, abnormal memory growth, and high latency. These are typical problems encountered during development, and analyzing abnormal memory growth is among the hardest. Abnormal memory growth is fatal for a program because it may trigger OOM, abnormal process exits, service interruption, and other consequences, so reasonable planning and control of memory is particularly important. Cache capacity, Bloom filter size, and memtable size all consume memory and must be tuned to balance performance and read/write latency.

During an offline test, we found that the database kernel's memory only increased and never decreased after running for a long time. This abnormal growth led us to suspect a memory leak.

Analysis & Validation

First, based on memory usage, we divided the memory into two parts, on-heap and off-heap, and analyzed each separately. We determined that the problematic memory was off-heap and analyzed it further, then introduced a more efficient memory allocator, tcmalloc, to solve the abnormal memory growth. The detailed analysis and verification process follows.

Determining the abnormal memory region

Using the JDK jmap command, Cassandra monitoring (with the jvm.memory.* metrics configured), and other methods, we collected the JVM heap memory and the process memory every minute.

We started the test case and ran it until the kernel's overall memory reached its cap. Comparing the collected curves of heap memory and process memory, we found that heap memory remained relatively stable without continuous growth, while the kernel's overall memory kept increasing during the same period; the two growth curves were inconsistent. The problem therefore had to be in off-heap memory.

Off-heap memory analysis and verification

Glibc memory management

When we printed the distribution of the process's memory address space with the pmap command, we found a large number of 64MB memory blocks and many memory fragments. This phenomenon is related to glibc's memory allocation model: its ptmalloc allocator creates per-thread arenas of up to 64MB on 64-bit systems. The off-heap memory usage tracked the overall process memory growth, which reinforced the suspicion that off-heap memory was the culprit. In addition, glibc has strict conditions for returning memory to the OS, so memory is rarely released in time and fragments accumulate. When fragmentation is heavy and free memory is badly wasted, the process's peak memory usage can exceed the planned maximum, and OOM may even occur.
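To make this behavior concrete, here is a minimal, self-contained C++ sketch (our illustration, not code from the original investigation) that creates fragmentation, prints glibc's allocator statistics, and then forces a trim. malloc_stats(), malloc_trim(), and mallopt(M_ARENA_MAX, ...) are glibc-specific interfaces; M_ARENA_MAX caps the number of the 64MB per-thread arenas seen in pmap.

    // fragmentation_demo.cpp -- illustrative only; build with: g++ fragmentation_demo.cpp
    #include <malloc.h>   // glibc-specific: malloc_stats, malloc_trim, mallopt
    #include <cstdlib>
    #include <vector>

    int main() {
        // Cap the number of 64MB per-thread arenas glibc may create.
        mallopt(M_ARENA_MAX, 2);

        // Allocate many small blocks, then free every other one. The freed
        // holes are fragments: glibc cannot return them to the OS because
        // live blocks still sit above them in the heap.
        std::vector<void*> blocks;
        for (int i = 0; i < 50000; ++i)
            blocks.push_back(std::malloc(4096));
        for (std::size_t i = 0; i < blocks.size(); i += 2)
            std::free(blocks[i]);

        malloc_stats();   // note how little memory has gone back to the OS
        malloc_trim(0);   // explicitly ask glibc to release what it can
        malloc_stats();   // trimming helps, but fragments largely remain
        return 0;
    }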

Tcmalloc memory management

We introduced the tcmalloc allocator to replace glibc's ptmalloc, in order to reduce excessive memory fragmentation and improve memory usage efficiency. For this analysis, tcmalloc was built from the gperftools-2.7 source. Running the same test case, memory still grew continuously, but more slowly than before. Printing the memory address distribution with pmap showed that the earlier small memory blocks and fragments were significantly reduced, indicating that the allocator had a real optimization effect and confirming the earlier speculation about excessive fragmentation.
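The improvement above was observed externally with pmap. As a hypothetical in-process complement, gperftools also exposes a MallocExtension interface that reports where tcmalloc's bytes are sitting, including the free bytes it has not yet returned to the OS:

    // tcmalloc_stats.cpp -- illustrative sketch; build against gperftools:
    //   g++ tcmalloc_stats.cpp -ltcmalloc
    #include <gperftools/malloc_extension.h>
    #include <cstdio>

    int main() {
        // Human-readable report: bytes in use by the application, in the
        // page heap free lists, in central/thread caches, and so on.
        char buf[4096];
        MallocExtension::instance()->GetStats(buf, sizeof(buf));
        std::printf("%s\n", buf);

        // Individual numeric properties can also be queried, e.g. the bytes
        // tcmalloc holds in its page heap without returning them to the OS.
        std::size_t free_bytes = 0;
        if (MallocExtension::instance()->GetNumericProperty(
                "tcmalloc.pageheap_free_bytes", &free_bytes)) {
            std::printf("pageheap_free_bytes: %zu\n", free_bytes);
        }
        return 0;
    }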

However, the abnormal memory growth persisted, as if tcmalloc were reclaiming memory late or not at all. In fact, tcmalloc is deliberately reluctant to return memory, mainly to reduce the number of system calls and improve performance when memory is requested again. For this reason, we manually called its release interface, ReleaseFreeMemory, but the effect was not obvious, for reasons unknown at that point (there was clearly free memory that should have been releasable).

Manually triggering tcmalloc's ReleaseFreeMemory interface

To verify the problem, we varied the cache capacity:

1. Set the cache capacity to 6GB and apply read pressure until the 6GB cache is filled.

2. Change the cache capacity to 2GB. To release the freed memory quickly, manually call tcmalloc's ReleaseFreeMemory interface (a sketch of such a trigger follows this list); no effect is observed.

3. Add logging at several points along the ReleaseFreeMemory path and test again. The logs reveal the error: the madvise system call is failing.
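The manual trigger in step 2 amounts to a single gperftools call. A minimal sketch of how such a backend command could be wired (the handler name is ours, hypothetical):

    // release_free_memory.cpp -- illustrative sketch, built against gperftools
    #include <gperftools/malloc_extension.h>

    // Hypothetical handler that a backend maintenance command could invoke.
    // ReleaseFreeMemory() asks tcmalloc to madvise() the free spans in its
    // page heap back to the operating system in one pass.
    void HandleReleaseFreeMemoryCommand() {
        MallocExtension::instance()->ReleaseFreeMemory();
    }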

Code location: (figure omitted)

Error log message: (figure omitted)

1. Analyzing the code along the failing call path, we found that tcmalloc's release logic iterates over the spans to be released: if releasing one span fails, the release of all subsequent spans is abandoned and the ReleaseFreeMemory logic terminates. This explains why the ReleaseFreeMemory interface appeared to have no effect: each call released only tens of MB before the failing madvise call cut the release loop short.

2. Analyzing why the madvise system call failed, we instrumented the corresponding kernel function and found that it fails because the memory block at the passed-in address is in the LOCKED state, so the system call fails with an "invalid argument" (EINVAL) error. (A standalone demonstration of this behavior follows this list.)

3. The LOCKED state is associated with code that calls the mlock family of system calls and with the ulimit configuration. Analysis of the related code found no abnormality. Querying the ulimit configuration showed that max locked memory was unlimited. We changed it to 16MB, restarted the Cassandra process, and tested again: memory was now released effectively.

4. Running the full test again, the abnormal memory growth was gone. Under sustained load, memory rises to its peak, then stays flat, in line with the planned memory usage. When the load drops or stops, memory slowly decreases.
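The failure mode found in steps 2 and 3 can be reproduced in isolation: on Linux, madvise(MADV_DONTNEED) is rejected with EINVAL when the address range contains locked pages. A standalone demonstration (our illustration, not code from the investigation):

    // madvise_locked_demo.cpp -- Linux-only; run with a "max locked memory"
    // ulimit large enough that mlock() of 1MB succeeds.
    #include <sys/mman.h>
    #include <cstdio>
    #include <cerrno>
    #include <cstring>

    int main() {
        const size_t len = 1 << 20;  // 1MB anonymous mapping
        void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { std::perror("mmap"); return 1; }

        // Before locking, releasing the pages works fine.
        std::printf("madvise (unlocked): %s\n",
                    madvise(p, len, MADV_DONTNEED) == 0
                        ? "ok" : std::strerror(errno));

        // Lock the region, then try to release it again: EINVAL.
        if (mlock(p, len) != 0) { std::perror("mlock"); return 1; }
        std::printf("madvise (locked):   %s\n",
                    madvise(p, len, MADV_DONTNEED) == 0
                        ? "ok" : std::strerror(errno));

        munlock(p, len);
        munmap(p, len);
        return 0;
    }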

Solution & Summary

1. Introduce the tcmalloc allocator to optimize memory management. Among the better-known memory allocators are Google's tcmalloc and Facebook's jemalloc.

2. Modify the max locked memory (ulimit) configuration.

Plan the maximum memory each process requires and reserve a certain amount of headroom, and analyze any memory growth that does not match expectations. Memory problems are strongly tied to the program and its environment, so be careful with critical system configurations and assess their impact; we also reviewed all similar configurations.

We added a ReleaseFreeMemory command that can be invoked from the backend, to mitigate the problem of tcmalloc holding memory without releasing it. However, executing the ReleaseFreeMemory command locks the entire PageHeap and can cause memory allocation requests to hang, so it must be used with caution.

We also added a dynamically configurable tcmalloc_release_rate parameter to the backend to adjust how aggressively tcmalloc returns memory to the operating system. Reasonable values range from 0 to 10: 0 means memory is never returned, and a larger value means it is returned more aggressively.
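gperftools already exposes this knob programmatically, so a dynamically reloadable parameter can simply forward to it. A hypothetical sketch of that forwarding (the function name is ours):

    // set_release_rate.cpp -- illustrative sketch, built against gperftools
    #include <gperftools/malloc_extension.h>

    // Hypothetical setter behind a dynamically reloadable config parameter.
    // Rate 0 means tcmalloc never returns memory to the OS; larger values
    // (up to about 10) make it return free pages more aggressively.
    void SetTcmallocReleaseRate(double rate) {
        if (rate < 0) rate = 0;
        if (rate > 10) rate = 10;
        MallocExtension::instance()->SetMemoryReleaseRate(rate);
    }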

Conclusion

This article analyzed an abnormal memory growth problem encountered during development. By adopting a better memory allocator and adding finer-grained memory monitoring, we can observe the memory state of the running database more intuitively and ensure that the database runs stably with high performance.
