One, foreword

This article focuses on Redis persistence. Persistence is a common source of Redis performance problems and a frequent interview topic.

It covers the specific advantages and disadvantages of RDB and of AOF. Because RDB cannot persist data in real time, AOF is more commonly used today, and AOF is also preferred during recovery.

RDB is the older mode and AOF is now the usual choice, but this article covers both.

Two, RDB

RDB flow chart:

RDB features:

  1. RDB is a snapshot mode: it saves a point-in-time copy of all key-value data to disk.
  2. There are two ways to create an RDB snapshot: synchronous SAVE and asynchronous BGSAVE. SAVE runs in the main thread and blocks the server until the snapshot is written. BGSAVE forks a child process that writes the snapshot while the main process keeps serving requests. Both produce a snapshot of the data as it was at one moment.
  3. SAVE can be triggered explicitly by a client or automatically at shutdown. BGSAVE can be triggered explicitly by a client, automatically through the save <seconds> <changes> configuration, or when a replica requests a full synchronization (the master then runs BGSAVE).
  4. SAVE blocks Redis and is almost obsolete. BGSAVE avoids long blocking, but forking has a cost. The fork itself briefly blocks the main process, and copy-on-write during the snapshot increases memory usage. If memory runs short, the system starts swapping, which slows Redis down. Therefore, make sure there is enough free memory.
  5. By default, SHUTDOWN automatically saves an RDB snapshot before exiting if save points are configured. This save is synchronous.
  6. Each snapshot is written to a temporary file that then replaces the previous RDB file, so only the latest snapshot is kept.
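
The save-point trigger and the replace-each-time behavior can be sketched in Python. This is a toy model under stated assumptions, not Redis's code. The save points below are the classic example values from older redis.conf files, JSON stands in for the binary RDB format, and should_bgsave and write_snapshot are hypothetical helper names.

```python
import json
import os
import tempfile

# Example save points (seconds, changes): the classic defaults of older redis.conf files.
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]


def should_bgsave(dirty, seconds_since_last_save, save_points=SAVE_POINTS):
    """True if any "save <seconds> <changes>" rule is met: enough changes AND enough time."""
    return any(dirty >= changes and seconds_since_last_save > seconds
               for seconds, changes in save_points)


def write_snapshot(data, path):
    """Dump `data` to a temp file in the same directory, then atomically replace `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(prefix="temp-", suffix=".rdb", dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)          # JSON stands in for the binary RDB format
            f.flush()
            os.fsync(f.fileno())        # make sure the new snapshot is on disk
        os.replace(tmp_path, path)      # rename over the previous snapshot
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)         # a failed save leaves the old file untouched
        raise
```

Because the rename is atomic on POSIX file systems, a crash in the middle of writing a snapshot leaves the previous file intact.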

About optimization:

Redis compresses the RDB file with the LZF algorithm (enabled by default through rdbcompression), so the final RDB file is much smaller than the in-memory data. The cost is extra CPU time.

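RDB disadvantages:

  1. No real-time or per-second persistence: writes made since the last snapshot are lost if Redis crashes.
  2. The RDB format has changed across versions, and older Redis versions may not be able to load RDB files written by newer ones.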

RDB advantages:

  1. The compact file is well suited to backups and full replication. For example, run BGSAVE every 6 hours and copy the file to another file system for backup.
  2. Redis recovers data much faster by loading an RDB file than by replaying an AOF.

Three, AOF

Because RDB cannot persist data in real time, AOF (Append Only File) is currently the mainstream way to persist Redis data.

AOF features:

  1. The default file name is appendonly.aof. Like the RDB file, it is saved in the directory set by the dir configuration.
  2. AOF records every write command, so its data is much closer to real time than RDB's.
  3. Because every write command is recorded, the file keeps growing. It must therefore be compacted periodically by the “rewrite mechanism” (described below).
  4. Each write command is first appended to an in-memory AOF buffer. The buffer is then synchronized to disk according to the configured policy (detailed below).
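
AOF stores each write command in the Redis protocol (RESP) text format. The sketch below shows how a command might be encoded and appended to an in-memory buffer before synchronization. encode_command and AofBuffer are hypothetical names used only for illustration.

```python
def encode_command(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings (the format AOF stores)."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n" % len(data))   # bulk length is counted in bytes
        parts.append(data + b"\r\n")
    return b"".join(parts)


class AofBuffer:
    """In-memory buffer that write commands are appended to before being flushed."""

    def __init__(self) -> None:
        self.buf = bytearray()

    def feed(self, *args: str) -> None:
        self.buf += encode_command(*args)

    def drain(self) -> bytes:
        """Return everything buffered so far and empty the buffer."""
        data = bytes(self.buf)
        self.buf.clear()
        return data


print(encode_command("SET", "key", "value"))
```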

“Rewrite mechanism” details:

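  1. Fork a child process (similar to BGSAVE).
  2. The main process keeps serving writes and appends each new command to two buffers. The original “AOF buffer” keeps the existing AOF file correct. The “AOF rewrite buffer” holds the commands that arrive while the child is rewriting.
  3. The child process writes the data from its memory snapshot into a new AOF file and flushes it to disk in batches of 32MB by default (controlled by aof-rewrite-incremental-fsync). It notifies the main process when it has finished.
  4. The main process appends the contents of the AOF rewrite buffer to the new AOF file.
  5. The new AOF file replaces the old one.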
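
The five steps can be modeled with a small single-process Python simulation. It makes three simplifying assumptions: the keyspace contains only string keys written with SET, the "child" runs inline after the writes instead of concurrently, and the function names are hypothetical. The model shows why the rewrite buffer is needed. Commands that arrive after the fork are missing from the child's snapshot, so the parent must append them to the new file.

```python
def rewrite_commands(snapshot):
    """Child side: emit one SET per key from the fork-time snapshot."""
    return [("SET", key, value) for key, value in sorted(snapshot.items())]


def aof_rewrite(dataset, writes_during_rewrite):
    """Simulate the rewrite steps for a SET-only keyspace; returns the new AOF log."""
    snapshot = dict(dataset)               # step 1: fork, the child sees a frozen copy
    rewrite_buf = []                       # step 2: the AOF rewrite buffer
    for cmd in writes_during_rewrite:      # the parent keeps serving writes
        _, key, value = cmd
        dataset[key] = value
        rewrite_buf.append(cmd)            # (also appended to the old AOF buffer)
    new_aof = rewrite_commands(snapshot)   # step 3: the child writes the new file
    new_aof.extend(rewrite_buf)            # step 4: the parent appends the rewrite buffer
    return new_aof                         # step 5: the caller swaps it in


def replay(log):
    """Rebuild a dataset by replaying SET commands in order."""
    state = {}
    for _, key, value in log:
        state[key] = value
    return state


data = {"a": "1", "b": "2"}
new_aof = aof_rewrite(data, [("SET", "a", "3"), ("SET", "c", "4")])
print(replay(new_aof) == data)  # True
```

Replaying the new log reproduces the current dataset. Because the child writes from the current state rather than from the old log, repeated writes to the same key made before the fork collapse into a single SET.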

Rewrite flowchart:

The buffer synchronization policy is controlled by the appendfsync parameter, which has three values:

  1. always: fsync is called after every write, and Redis waits until the data reaches the disk. This is the safest option, but it severely hurts Redis performance.
  2. everysec: write() is called to move the buffer into the OS page cache, and a background thread calls fsync once per second. This is the recommended setting.
  3. no: only write() is called, and the OS decides when data is flushed to disk. This is not recommended because data is less safe and more of it can be lost in a crash.
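
A toy writer makes the difference between the three policies concrete. It is a simplified sketch. Real Redis performs the everysec fsync on a background thread, while this class calls os.fsync inline. AofWriter is a hypothetical name.

```python
import os
import time


class AofWriter:
    """Toy AOF writer showing where each appendfsync policy calls fsync."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown appendfsync policy: {policy}")
        self.policy = policy
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.last_fsync = time.monotonic()
        self.fsync_calls = 0

    def flush(self, data: bytes) -> None:
        view = memoryview(data)
        while view:                          # write(): copy into the OS page cache
            written = os.write(self.fd, view)
            view = view[written:]
        if self.policy == "always":
            self._fsync()                    # force to disk on every flush
        elif self.policy == "everysec":
            if time.monotonic() - self.last_fsync >= 1.0:
                self._fsync()                # at most about once per second
        # "no": never fsync here; the OS decides when dirty pages reach disk

    def _fsync(self) -> None:
        os.fsync(self.fd)
        self.fsync_calls += 1
        self.last_fsync = time.monotonic()

    def close(self) -> None:
        os.close(self.fd)
```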

Four, persistent recovery

Both AOF and RDB files can be used to recover data when the server restarts. The process is shown below:

As the figure shows, if AOF is enabled and an AOF file exists, the AOF file is loaded first. The RDB file is loaded only when AOF is disabled or no AOF file exists. If the file being loaded is damaged, loading fails.
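
The startup decision reduces to a few branches. This hypothetical helper encodes the load order as this section describes it. Exactly how a missing AOF file is handled differs between Redis versions, so treat the sketch as an illustration of the priority order only.

```python
def choose_load_source(aof_enabled, aof_exists, rdb_exists):
    """Pick the file to load at startup, following the order described above."""
    if aof_enabled and aof_exists:
        return "aof"   # AOF has priority because it is usually more up to date
    if rdb_exists:
        return "rdb"   # AOF disabled or missing: fall back to the RDB snapshot
    return None        # nothing to load: start with an empty dataset


print(choose_load_source(aof_enabled=True, aof_exists=True, rdb_exists=True))  # aof
```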

Five, troubleshooting and performance optimization

Redis persistence is a high risk factor for Redis performance and is a common interview question.

1. Fork operation

When Redis takes an RDB snapshot or performs an AOF rewrite, it must fork, which is a heavyweight operation for the OS. fork does not copy the parent's physical memory. Instead, it copies the parent's memory page tables. For a 10GB Redis process, roughly 20MB of page tables must be copied, so fork time grows with the total memory of the process. It takes even longer under virtualization technologies such as Xen.
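
The 20MB figure follows from simple arithmetic. The calculation assumes 4KB pages and 8-byte page-table entries, which is typical of 64-bit x86 Linux, and it counts only one entry per page.

```python
PAGE_SIZE = 4 * 1024  # assumed 4 KB pages
PTE_SIZE = 8          # assumed 8-byte page-table entries (64-bit)
MiB = 1024 ** 2
GiB = 1024 ** 3


def page_table_bytes(memory_bytes):
    """Approximate size of the page tables fork() copies: one entry per page."""
    pages = memory_bytes // PAGE_SIZE
    return pages * PTE_SIZE


print(page_table_bytes(10 * GiB) / MiB)  # 20.0
```

In other words, the page tables are about 1/512 of the mapped memory under these assumptions, which is why fork cost scales with instance size.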

Normally, a fork should take about 20 milliseconds per GB of memory. Why does this matter? Suppose a Redis instance handles more than 50,000 operations per second and a fork takes seconds. Then tens of thousands of commands are delayed, which has a significant impact in production.

The latest_fork_usec field in INFO stats gives the duration of the most recent fork in microseconds.
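
To check the last fork against a budget, the INFO text can be parsed, since it consists of # Section headers and key:value lines. This sketch assumes a rule of thumb of about 20 ms of fork time per GB and uses used_memory as a rough proxy for process size. fork_is_slow is a hypothetical helper, and the sample values are made up for illustration, not measurements.

```python
def parse_info(text):
    """Parse INFO output (# Section headers and key:value lines) into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def fork_is_slow(info, ms_per_gb=20.0):
    """Compare latest_fork_usec with an assumed budget of ms_per_gb per GB of used_memory."""
    fork_ms = int(info["latest_fork_usec"]) / 1000
    used_gb = int(info["used_memory"]) / 1024 ** 3
    budget_ms = ms_per_gb * max(used_gb, 1.0)  # give small instances at least a 1 GB budget
    return fork_ms > budget_ms


# Hypothetical INFO excerpt, not a real measurement.
sample = "# Memory\r\nused_memory:2147483648\r\n# Stats\r\nlatest_fork_usec:95000\r\n"
print(fork_is_slow(parse_info(sample)))  # True: 95 ms vs. a 40 ms budget for 2 GB
```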

How to optimize:

  1. Prefer physical machines or virtualization technologies that fork efficiently, and avoid Xen.
  2. Limit the maximum memory of each Redis instance, ideally keeping it under 10GB.
  3. Configure the Linux memory allocation policy (for example vm.overcommit_memory=1) so that fork does not fail for lack of memory.
  4. Reduce the frequency of forks. For example, relax the AOF automatic rewrite triggers moderately and avoid unnecessary full resynchronizations, which also fork.

2. Child process overhead

After the fork, the child process performs the RDB snapshot or AOF rewrite. Its overhead involves CPU, memory, and disk, and each can be optimized.

  1. CPU: the child writes in-memory data to a file in bulk. This is CPU-intensive, and the child's single-core utilization is typically close to 90%. How to optimize: do not bind Redis to a single CPU core, because the child would then compete with the parent for that core. Do not deploy Redis on the same machine as other CPU-intensive services. If multiple Redis instances are deployed, make sure only one child process is rewriting at any time.
  2. Memory: Linux uses copy-on-write, so the parent and child initially share the same physical memory pages. When the parent handles a write, it creates a copy of the page being modified, while the child keeps reading the memory snapshot taken at fork time. Every page modified during the rewrite is therefore duplicated, and this is where the extra memory goes. How to optimize: make sure only one child process is working at a time, and avoid rewrites during periods of heavy writes.
  3. Disk: the child writes the RDB or AOF file to disk, which can put pressure on the disk. Tools such as iostat and iotop help analyze disk load.
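
The snapshot semantics behind point 2 can be demonstrated with a real fork. This is a Unix-only Python sketch that shows behavior, not memory cost. The child serializes a dict while the parent modifies it after the fork, and the child still sees the fork-time values because the kernel gives the parent its own copy of each page it writes. snapshot_in_child is a hypothetical helper name.

```python
import json
import os


def snapshot_in_child(data, parent_write):
    """Fork, let the child serialize `data`, and apply `parent_write(data)` in the parent.

    Unix-only (os.fork). Returns the dict the child saw.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: plays the role of the RDB/AOF-rewrite process
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "wb") as w:
                w.write(json.dumps(data).encode())
        finally:
            os._exit(0)  # exit without flushing the parent's inherited stdio buffers
    os.close(write_fd)   # parent: keeps serving writes after the fork
    parent_write(data)   # the kernel copies touched pages; the child is unaffected
    with os.fdopen(read_fd, "rb") as r:
        seen = json.loads(r.read())
    os.waitpid(pid, 0)
    return seen


state = {"counter": 1}
seen = snapshot_in_child(state, lambda d: d.update(counter=d["counter"] + 1000))
print(seen, state)  # {'counter': 1} {'counter': 1001}
```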

How to optimize:

  1. Do not deploy Redis on the same machine as other disk-heavy services, such as message queues.
  2. AOF rewriting consumes a lot of disk I/O. Consider enabling no-appendfsync-on-rewrite (disabled by default), which skips fsync while a background save or AOF rewrite is in progress. The trade-off is that data written during that time may be lost if the server crashes.
  3. Under high concurrency with ordinary mechanical disks, whose write throughput is around 100MB per second, the disk becomes Redis's bottleneck. SSDs are recommended.
  4. If multiple Redis instances run on one machine, configure them to store their AOF files on different disks to spread the load.

3. AOF append blocking

When AOF persistence is enabled, the most common synchronization policy is everysec (sync once per second), which balances performance and data safety. Under this policy, Redis uses a background thread to call fsync once per second. When the disk is busy and fsync is slow, the Redis main thread can block.

The flow chart is as follows:


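As the figure shows, after writing to the AOF buffer the main thread checks when the last fsync succeeded. If that was within 2 seconds, the main thread returns immediately. Otherwise it blocks until the background fsync completes. Therefore, everysec can lose up to 2 seconds of data, not 1 second, and a slow fsync blocks the main thread and reduces throughput.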
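
Assume the rule from the flow chart: after each AOF write, the main thread returns if the last successful fsync was within 2 seconds and blocks otherwise. Under that rule, the decision can be written as a small state model. The model follows this simplified description rather than Redis's source code. EverysecModel and its methods are hypothetical, and only the counter name aof_delayed_fsync is taken from Redis's INFO persistence output.

```python
class EverysecModel:
    """Toy model of the everysec check made by the main thread after each AOF write."""

    def __init__(self, limit_seconds: float = 2.0) -> None:
        self.limit = limit_seconds
        self.last_fsync_ok = 0.0      # time of the last successful background fsync
        self.aof_delayed_fsync = 0    # same name as the INFO persistence counter

    def on_fsync_done(self, now: float) -> None:
        """Called when the background fsync finishes."""
        self.last_fsync_ok = now

    def on_write(self, now: float) -> str:
        """Return what the main thread does after appending to the AOF buffer."""
        if now - self.last_fsync_ok <= self.limit:
            return "return"           # keep serving clients; fsync will catch up
        self.aof_delayed_fsync += 1   # record an append-blocking event
        return "block"                # wait until the slow fsync completes


m = EverysecModel()
m.on_fsync_done(10.0)
print(m.on_write(11.9), m.on_write(12.5), m.aof_delayed_fsync)  # return block 1
```

At time 11.9 the main thread still returns even though 1.9 seconds of writes have not been confirmed on disk. This is why the worst-case loss under everysec approaches 2 seconds rather than 1.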

How to locate the problem:

  1. Redis writes a log entry when AOF fsync blocking occurs, recording that it is slowing down the Redis service.
  2. Each AOF append blocking event increments the aof_delayed_fsync counter in INFO persistence, which helps locate AOF blocking problems.
  3. Because the everysec policy tolerates at most 2 seconds of fsync delay, append blocking means fsync is lagging by more than 2 seconds. This points to a disk performance problem. The iotop monitoring tool can locate the processes consuming I/O.

4. Single-node multi-instance deployment

Redis's single-threaded architecture cannot take full advantage of multi-core CPUs, so a common practice is to deploy multiple instances on one machine. When AOF is enabled on several of these instances, their rewrites compete with each other for CPU and I/O.

How to solve this problem?

Make the AOF rewrites of all instances run serially.

We write a shell script that reads the AOF fields from INFO persistence and uses them to serialize the AOF rewrites of the instances.

The whole process is shown as follows:

By repeatedly checking the AOF state and triggering AOF rewrites manually, the script ensures that no two rewrites compete. The shell script and the INFO fields it checks are shown in the following figure:
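
The same polling logic can be sketched in Python instead of shell. The sketch makes three assumptions. First, each instance object exposes hypothetical info_persistence() and bgrewriteaof() methods that wrap INFO persistence (returning integer fields) and the BGREWRITEAOF command. Second, automatic rewrites are disabled (auto-aof-rewrite-percentage 0), so the script is the only thing that starts rewrites. Third, a rewrite is due once the AOF has grown 100% over aof_base_size. FakeInstance is an in-memory stand-in used only for the demonstration.

```python
import time


def needs_rewrite(info, growth_percent=100):
    """Rewrite once the AOF has grown `growth_percent` % over its post-rewrite size."""
    base = max(info["aof_base_size"], 1)
    return (info["aof_current_size"] - base) * 100 // base >= growth_percent


def busy(inst):
    """True while a rewrite is running or scheduled on `inst`."""
    info = inst.info_persistence()
    return bool(info["aof_rewrite_in_progress"] or info["aof_rewrite_scheduled"])


def rewrite_serially(instances, poll_interval=1.0, sleep=time.sleep):
    """Trigger BGREWRITEAOF on one instance at a time, waiting for each to finish."""
    rewritten = []
    for inst in instances:
        if not needs_rewrite(inst.info_persistence()):
            continue
        while any(busy(other) for other in instances):  # never overlap rewrites
            sleep(poll_interval)
        inst.bgrewriteaof()
        while busy(inst):                                # poll until it completes
            sleep(poll_interval)
        rewritten.append(inst.name)
    return rewritten


class FakeInstance:
    """In-memory stand-in for a Redis instance, used only to exercise the scheduler."""

    def __init__(self, name, current_size, base_size, events):
        self.name = name
        self.current_size = current_size
        self.base_size = base_size
        self.events = events
        self.polls_left = 0  # polls remaining before the fake rewrite finishes

    def info_persistence(self):
        running = self.polls_left > 0
        if running:
            self.polls_left -= 1
            if self.polls_left == 0:
                self.base_size = self.current_size  # rewrite done: new base size
                self.events.append(("done", self.name))
        return {
            "aof_rewrite_in_progress": int(running),
            "aof_rewrite_scheduled": 0,
            "aof_current_size": self.current_size,
            "aof_base_size": self.base_size,
        }

    def bgrewriteaof(self):
        self.events.append(("start", self.name))
        self.polls_left = 2


events = []
fleet = [FakeInstance("a", 300, 100, events),
         FakeInstance("b", 150, 100, events),
         FakeInstance("c", 500, 200, events)]
print(rewrite_serially(fleet, sleep=lambda _: None))  # ['a', 'c']
```

Checking every instance before starting a new rewrite is what keeps the rewrites serial. Instance b is skipped because its AOF has grown only 50% since the last rewrite.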

Six, summary

This article focuses on Redis persistence. Persistence is a common source of Redis performance problems and a frequent interview topic. The article covers the specific advantages and disadvantages of RDB and of AOF. Because RDB cannot persist data in real time, AOF is now the more widely used mode, and AOF also takes precedence during recovery.

The main areas to watch are fork time, child process CPU, memory, and disk overhead, AOF append blocking, and single-node multi-instance deployment.

The optimizations for each of these are covered in the analysis above.
