I. Background

During routine inspection or troubleshooting of a MySQL database server, we usually start with the operating-system-level performance indicators for CPU, memory, storage, and network. If any of these indicators is abnormal, we drill down into the more specific indicators behind it. For example, if the top command shows that the CPU load of the database server is high, look at indicators such as us (user-mode CPU usage), sy (kernel-mode CPU usage), and wa (CPU time spent waiting for I/O). If sy is high, check whether high database concurrency is causing frequent context switches.

To ensure the stability of database services, Qunar deploys MySQL instances on dedicated servers in single-machine multi-instance mode. The memory usage of each instance is strictly controlled by the MySQL parameter innodb_buffer_pool_size and the PXC parameter gcache.size, and sufficient memory is reserved for the operating system and basic services. As a result, memory usage alarms normally make up a very small share of Qunar's database alarms. During one period, however, the number and proportion of such alarms rose sharply, which seriously affected the stability of the database service. Our custom database alarm dashboard makes it easy to check the proportion of each alarm type over the last N days.

II. Problem analysis

For operating system memory, we monitor four indicators: used memory, cached memory, buffer memory, and free memory. Alarms are generated based on memory usage (used physical memory / total physical memory). When the memory usage alarm arrived, out of “inertial thinking” we cleared the cache, following the procedure we had previously used for “similar faults”.

After “drop caches” was executed, the server's “free physical memory” and “memory usage” returned to normal, but a few days later the server triggered the memory usage alarm again. Monitoring showed that the server's “used physical memory” was growing at about 3GB per day.

Meanwhile, the server's “free physical memory” was dropping at about 3GB per day.

To get to the bottom of the problem and see what drop caches actually cleans up, we picked a database server that stores historical data for analysis. Since commands such as top, free, and vmstat read memory usage from /proc/meminfo and /proc/<pid>/smaps, we first looked at these files and picked out the key information.
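
As a rough illustration of this step, the Python sketch below parses /proc/meminfo-style text and keeps only the fields discussed in the following paragraphs. The function names and the sample values are hypothetical; the sample is made-up text in the /proc/meminfo layout, not the output from the server analyzed in this article.

```python
# Fields this article focuses on; the selection is an assumption of this sketch.
KEY_FIELDS = ("MemTotal", "MemFree", "Buffers", "Cached",
              "Slab", "SReclaimable", "SUnreclaim")

# Made-up sample in /proc/meminfo format, used when the real file is unavailable.
SAMPLE = """\
MemTotal:       65808000 kB
MemFree:         1024000 kB
Buffers:          204800 kB
Cached:          8192000 kB
Slab:            4096000 kB
SReclaimable:    3900000 kB
SUnreclaim:       196000 kB
HugePages_Total:       0
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}.

    Most values are in kB; a few counters (e.g. HugePages_Total) are unitless.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def key_memory_fields(info):
    """Keep only the fields of interest, in a fixed order."""
    return {name: info[name] for name in KEY_FIELDS if name in info}


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the made-up sample
    for name, kb in key_memory_fields(parse_meminfo(text)).items():
        print(f"{name:<14}{kb / 1024 / 1024:8.2f} GB")
```

Parsing from a string rather than directly from the file keeps the sketch usable on saved snapshots of /proc/meminfo taken at alarm time.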

Buffers are temporary storage for raw disk blocks. By caching the data stored on disks, the kernel can aggregate scattered read and write operations, for example merging multiple small writes into a single large write, to improve disk access performance. Typically, the buffer cache holds little data and does not occupy much physical memory.

Cached is the page cache, which caches the data pages of files in a file system. For example, the first time data is read from a file it is cached in memory, and subsequent reads are served directly from memory instead of accessing the disk again. Because a database service performs I/O frequently, the page cache grows large when physical memory is sufficient.

Slab is a memory allocation mechanism of the Linux kernel. It is designed for objects that are allocated and freed frequently, such as process descriptors, and these objects are generally small. The slab allocator manages objects grouped by type: whenever such an object is requested, the allocator hands out a unit of that size from a slab list, and when the object is freed, it is put back on the list rather than returned directly to the buddy system, which avoids internal fragmentation. SReclaimable is the reclaimable memory managed by the slab allocator, and SUnreclaim is the unreclaimable memory managed by it. MySQL implements its data cache internally with the buffer pool and does not rely on the slab allocator, so slab memory usage on a database server is normally low. In the preceding example, 16.6GB of slab memory is abnormal and needs attention.

You can get details about slab memory allocation from the /proc/slabinfo file. However, the file is not very readable, and you have to calculate the memory used by each object type yourself, so the slabtop command is recommended for viewing the details.
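
To show what "calculating it yourself" involves, here is a minimal Python sketch that estimates per-cache memory from /proc/slabinfo-style text as num_objs × objsize. This is only an approximation that ignores per-slab padding, so it will not match slabtop exactly. The function name and the sample numbers are hypothetical, and reading the real file normally requires root.

```python
# Made-up sample in the "slabinfo - version: 2.1" layout.
SAMPLE = """\
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-64          900   1000     64   64    1 : tunables 0 0 0 : slabdata 16 16 0
dentry             1800   2000    192   21    1 : tunables 0 0 0 : slabdata 96 96 0
inode_cache         450    500    600   13    2 : tunables 0 0 0 : slabdata 39 39 0
"""


def slab_usage(text):
    """Return [(cache_name, approx_bytes)], largest first.

    approx_bytes = num_objs * objsize (columns 3 and 4 of each data line).
    """
    usage = []
    for line in text.splitlines():
        # Skip the version line, the column-header comment, and blank lines.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        usage.append((name, num_objs * objsize))
    usage.sort(key=lambda item: item[1], reverse=True)
    return usage


if __name__ == "__main__":
    try:
        with open("/proc/slabinfo") as f:  # usually readable by root only
            text = f.read()
    except OSError:
        text = SAMPLE  # no permission or not on Linux: use the made-up sample
    for name, size in slab_usage(text)[:5]:
        print(f"{name:<24}{size / 1024 / 1024:10.2f} MB")
```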

Warning: Do not run the slabtop command on a server with frequent memory allocation or heavy load, such as the Redis server. This command may cause the server to be suspended for a long time and fail to provide normal services.

The dentry object occupies the most memory, about 14.6GB. In Linux, all files, whether regular files or network sockets, are managed through a unified virtual file system (VFS), which assigns two data structures to each file:

  • An index node (inode) records file metadata such as the inode number, file size, and access time. Inodes correspond one-to-one with files and are stored persistently on disk.
  • A directory entry (dentry) records the file name, a pointer to the inode, and the relationships with other directory entries. Multiple associated directory entries form the directory structure of a file system. Directory entries are an in-memory data structure maintained by the kernel, which is why they are also called the directory entry cache.
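
The split between the two structures can be observed from user space. The following Python sketch (the function and file names are hypothetical) creates a file, adds a hard link to it, and uses os.stat to show that the two names, that is, two directory entries, resolve to the same inode and share its metadata.

```python
import os
import tempfile


def describe(path):
    """Return a few inode-level attributes of the file behind a path."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,    # inode number
        "size": st.st_size,    # file size in bytes
        "atime": st.st_atime,  # last access time
        "links": st.st_nlink,  # number of names pointing at this inode
    }


def hard_link_demo():
    """Create one file with two names and describe it through each name."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "data.txt")
        second = os.path.join(tmp, "alias.txt")
        with open(first, "w") as f:
            f.write("hello")
        # A hard link adds a second directory entry for the same inode.
        os.link(first, second)
        return describe(first), describe(second)


if __name__ == "__main__":
    a, b = hard_link_demo()
    print("same inode:", a["inode"] == b["inode"])
    print("link count:", a["links"])
    print("size:", a["size"])
```

The metadata lives in the inode, while each name is only a directory entry, so a workload that touches many distinct names grows the directory entry cache even when the files themselves are small.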

The relationship between the buffer cache, page cache, inodes, and directory entries in Linux is as follows:

III. Page cache analysis

On Linux kernel 4.1 and later, page cache usage can be analyzed with cachestat and cachetop from the BCC toolkit. On systems with an older kernel, the open-source GitHub tools hcache or vmtouch can be used instead. The analysis found that:

The page cache on the database server mainly holds ib_logfile files and binlog files. Besides viewing cache contents, vmtouch can also evict a file's pages from the page cache or load a file's contents into it.

IV. Checking directory entries

The growth of “used physical memory” was mainly caused by dentry objects, so we tracked the change in dentry usage with a simple script.
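
The original script is not reproduced here. As a stand-in, the sketch below shows one way such a check could be written in Python, under the assumption that sampling /proc/sys/fs/dentry-state (whose first two fields are the total and unused dentry counts) is an acceptable proxy for dentry growth. The function names, interval, and sample count are hypothetical.

```python
import time

DENTRY_STATE = "/proc/sys/fs/dentry-state"


def parse_dentry_state(text):
    """Return (nr_dentry, nr_unused), the first two fields of dentry-state."""
    fields = text.split()
    return int(fields[0]), int(fields[1])


def deltas(samples):
    """Differences between consecutive samples."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def watch(interval=10, count=6, path=DENTRY_STATE):
    """Sample the dentry count `count` times, `interval` seconds apart."""
    samples = []
    for i in range(count):
        with open(path) as f:
            nr_dentry, nr_unused = parse_dentry_state(f.read())
        samples.append(nr_dentry)
        print(time.strftime("%H:%M:%S"), nr_dentry, nr_unused)
        if i < count - 1:
            time.sleep(interval)
    return deltas(samples)


if __name__ == "__main__":
    try:
        print("growth between samples:", watch(interval=1, count=3))
    except OSError as exc:
        print("dentry-state is not available on this system:", exc)
```

With a longer run and an interval well under one minute, a periodic jump in the deltas points at a job that runs on a fixed schedule.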

We found that dentry memory usage grew once per minute, by about 2MB each time. By going through the local crontab jobs and the remote scheduling jobs, we identified the services scheduled once per minute for further analysis.

To locate the cause more precisely, we created the dentry_chek.stp file to monitor dentry memory allocation (d_alloc) and release (d_free).

We then ran the script and analyzed its output. The results showed that d_alloc was executed far more often than d_free. Combined with the scheduled jobs, this located the cause of the problem:

* */1 * * * root yum makecache --enablerepo=xxxxxx_repo; yum -q -y update xxxxxx --enablerepo=xxxxxx_repo;

The yum makecache command caches the package information of the remote repository locally to speed up package searches. An earlier O&M project required the latest version of a software package to be installed on the servers promptly, and crontab scheduling plus yum makecache was used to achieve this. During a recent change to the crontab configuration, a mistake changed the job from “once an hour” to “once a minute”. The correct configuration of the crontab job is as follows:

0 */1 * * * root yum makecache --enablerepo=xxxxxx_repo; yum -q -y update xxxxxx --enablerepo=xxxxxx_repo;
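
The difference between the two schedules is easy to miss when reading them. The Python sketch below counts how many times per day each one fires. It is a deliberately minimal expander that handles only the `*`, `*/n`, and single-number forms of the minute and hour fields and assumes the remaining fields are `*`; the function names are hypothetical.

```python
def expand(field, low, high):
    """Expand one cron field ('*', '*/n', or a single number) into its values."""
    if field == "*":
        return list(range(low, high + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(low, high + 1, step))
    return [int(field)]


def runs_per_day(schedule):
    """Count daily executions from the minute and hour fields only."""
    minute, hour = schedule.split()[:2]
    return len(expand(minute, 0, 59)) * len(expand(hour, 0, 23))


if __name__ == "__main__":
    wrong = runs_per_day("* */1 * * *")  # every minute of every hour
    right = runs_per_day("0 */1 * * *")  # minute 0 of every hour
    print(wrong, right, wrong // right)
```

A `*` in the minute field matches all 60 minutes, so `* */1 * * *` runs every minute, while `0 */1 * * *` runs only at minute 0 of each hour.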

Because of this change to the scheduling configuration, the dentry problem caused by yum makecache was exposed 60 times faster.

V. Solutions

The software package does not strictly depend on yum makecache, and the package is updated very rarely. Therefore, the scheduled job should first be restored to its normal frequency; later, an active push scheme will remove the need to run yum makecache periodically at all. Because scheduling configurations are modified so rarely, this type of operation was not covered by the O&M operation specifications. It should therefore follow the process of “review before operation + check during operation + verify after operation” to reduce the probability of O&M misoperation.

VI. Reference materials

  • Linux performance optimization
  • The SystemTap script analyzes the high usage of dentry slabs
  • How to do some basic analysis of kernel memory leaks