Redis persistence: RDB and AOF

Preface

Redis owes much of its performance to keeping all data in memory. However, when Redis restarts, everything stored in memory is lost. In some cases we want data to survive a restart, so it has to be synchronized from memory to disk in some form, allowing it to be recovered from the on-disk records after the restart. This process is called persistence.

Redis supports two persistence methods, RDB and AOF. RDB saves the data in memory to disk according to user-specified rules (such as time windows), while AOF records each write command after it is executed. Either method can be used alone, but they are more often used in combination.

RDB

When certain conditions are met, Redis automatically generates a copy of all data in memory and stores it on disk. This process is called a “snapshot”. Redis takes a snapshot in the following situations:

Create an automatic snapshot based on the configuration rules
Executing the SAVE or BGSAVE command
Executing the FLUSHALL command
When performing replication

The actual work of creating the RDB file is done by the rdb.c/rdbSave function, which both SAVE and BGSAVE call, while loading an RDB file (for example at server startup) is done by the rdb.c/rdbLoad function.

The four cases that trigger a snapshot

1. Create an automatic snapshot based on the configuration rules

Redis lets users define the snapshot conditions in the configuration file, and whenever a condition is met, Redis takes a snapshot automatically. Each condition is written as save M N and consists of two parameters: a time window M (in seconds) and a number of changed keys N. Whenever the number of keys changed within M seconds is at least N, the condition is met and an automatic snapshot is taken. For example:

save 900 1
save 300 10
save 60 10000

Multiple conditions can coexist and are combined with OR: a snapshot is taken as soon as any one of them is met. In this example, save 900 1 means that a snapshot is taken if one or more keys are changed within 900 s.
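To make the OR semantics concrete, here is a minimal Python sketch that checks a list of save rules. It is a simplified model, assuming (as Redis does in its periodic serverCron check) a counter of key changes since the last successful snapshot plus the number of seconds elapsed since that snapshot, with the elapsed time required to exceed M. The names `SaveRule`, `DEFAULT_RULES` and `should_snapshot` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaveRule:
    seconds: int  # time window M, as in "save <seconds> <changes>"
    changes: int  # minimum number of changed keys N


# The three rules from the configuration example above.
DEFAULT_RULES = [SaveRule(900, 1), SaveRule(300, 10), SaveRule(60, 10000)]


def should_snapshot(rules, dirty, seconds_since_last_save):
    """Return True if any rule is satisfied; rules are combined with OR.

    dirty: number of key changes since the last successful snapshot.
    seconds_since_last_save: time elapsed since that snapshot.
    """
    return any(
        dirty >= rule.changes and seconds_since_last_save > rule.seconds
        for rule in rules
    )


if __name__ == "__main__":
    print(should_snapshot(DEFAULT_RULES, dirty=1, seconds_since_last_save=901))
    print(should_snapshot(DEFAULT_RULES, dirty=5, seconds_since_last_save=120))
```

An empty rule list never triggers a snapshot, which corresponds to having no save lines configured.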

2. The SAVE / BGSAVE commands

Besides letting Redis take snapshots automatically, we also need to take snapshots manually, for example before a service restart, during a manual migration, or for a backup. Redis provides two commands for this.

The SAVE command

When the SAVE command is executed, Redis performs the snapshot synchronously and blocks all client requests until it finishes. With a large amount of data, Redis may not respond for a long time, so avoid using this command in production.

The BGSAVE command

You are advised to run the BGSAVE command to manually execute a snapshot. The BGSAVE command can take snapshots asynchronously in the background while the server continues to respond to requests from the client. After BGSAVE is executed, Redis immediately returns OK to indicate that the snapshot operation has started. If you want to know whether the snapshot is complete, you can run the LASTSAVE command to obtain the last successful snapshot execution time. The result is a timestamp, such as:

redis> LASTSAVE
(integer) 1423537869
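Because BGSAVE returns immediately, a client that needs to know when the snapshot has finished can record the LASTSAVE value before issuing BGSAVE and poll until it changes. The sketch below assumes a hypothetical `lastsave` callable that sends LASTSAVE and returns the integer reply; it is not tied to any particular client library, and `wait_for_bgsave` is likewise a hypothetical name.

```python
import time
from datetime import datetime, timezone


def wait_for_bgsave(lastsave, before, timeout=60.0, interval=0.5, sleep=time.sleep):
    """Poll LASTSAVE until it reports a snapshot newer than `before`.

    lastsave: hypothetical callable returning the LASTSAVE reply (UNIX seconds).
    before:   the LASTSAVE value read just before BGSAVE was issued.
    Returns the new timestamp as an aware UTC datetime, or None on timeout.
    """
    waited = 0.0
    while waited <= timeout:
        ts = lastsave()
        if ts > before:
            return datetime.fromtimestamp(ts, tz=timezone.utc)
        sleep(interval)
        waited += interval
    return None
```

Since LASTSAVE has one-second resolution, this comparison only detects snapshots that complete in a later second than the one recorded in `before`.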

Because BGSAVE performs the save in a child process, the Redis server can keep processing client command requests while the child creates the RDB file. However, while BGSAVE is running, the server handles the SAVE, BGSAVE and BGREWRITEAOF commands differently than usual. (BGREWRITEAOF is the AOF rewrite command, described later in this article.)

While BGSAVE is running, the server rejects any SAVE or BGSAVE command sent by a client. Forbidding SAVE or a second BGSAVE from running alongside BGSAVE prevents the parent (server process) and the child from calling rdbSave at the same time, which would create a race condition.

BGSAVE and BGREWRITEAOF are never executed at the same time:

If BGSAVE is running, a BGREWRITEAOF command sent by a client is deferred until BGSAVE has finished.
If BGREWRITEAOF is running, a BGSAVE command sent by a client is rejected by the server.

Since the actual work of both BGSAVE and BGREWRITEAOF is done by child processes, the two commands do not conflict in terms of correctness; they are kept from running at the same time purely for performance reasons. Both child processes perform a large amount of disk writes, and running them concurrently would degrade performance.
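The coordination rules above can be summarised as a small state machine. This is a simplified Python model, not Redis source code: the class `ChildScheduler` and its method names are hypothetical, and the returned strings merely stand in for the server's replies.

```python
class ChildScheduler:
    """Hypothetical model of how the server arbitrates SAVE, BGSAVE and BGREWRITEAOF."""

    def __init__(self):
        self.bgsave_running = False
        self.rewrite_running = False
        self.rewrite_scheduled = False  # BGREWRITEAOF deferred until BGSAVE ends

    def save(self):
        if self.bgsave_running:
            return "rejected"  # avoid two concurrent rdbSave calls
        return "ok"  # SAVE runs synchronously in the server process

    def bgsave(self):
        if self.bgsave_running or self.rewrite_running:
            return "rejected"
        self.bgsave_running = True
        return "started"

    def bgrewriteaof(self):
        if self.rewrite_running:
            return "rejected"
        if self.bgsave_running:
            self.rewrite_scheduled = True  # deferred, not rejected
            return "scheduled"
        self.rewrite_running = True
        return "started"

    def bgsave_finished(self):
        self.bgsave_running = False
        if self.rewrite_scheduled:  # start the deferred rewrite now
            self.rewrite_scheduled = False
            self.rewrite_running = True

    def rewrite_finished(self):
        self.rewrite_running = False
```

The asymmetry is the point of the model: a rewrite requested during BGSAVE is remembered and started later, while a BGSAVE requested during a rewrite is simply refused.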

3. Executing the FLUSHALL command

When the FLUSHALL command is executed, Redis clears all data in the database. Note that as long as automatic snapshot conditions are configured, Redis takes a snapshot regardless of whether emptying the database actually satisfies any of them. For example, if the only condition is to take a snapshot when 10,000 keys are changed within one second, and the database contains a single key, FLUSHALL still triggers a snapshot even though only one key was actually changed.

When automatic snapshot conditions are not defined, the FLUSHALL command does not perform snapshots.

4. Performing replication

When master/replica replication is set up, Redis takes a snapshot automatically when replication is initialized. The RDB snapshot file is generated even if no automatic snapshot condition is defined and no manual snapshot has been taken.

RDB file structure

1. REDIS: the RDB file starts with this 5-byte part, which holds the five characters “REDIS”. By checking these five characters when loading a file, the program can quickly tell whether it is an RDB file.

2. db_version: 4 bytes long; its value is an integer represented as a string, recording the RDB file version number. For example, “0006” indicates RDB file version 6.

3. databases: contains zero or more databases, together with the key-value pairs in each database:

If the server’s database state is empty, this part is also empty and has a length of 0 bytes
If the server database state is non-empty, this section is also non-empty, and the length of this section varies depending on the number, type, and content of key/value pairs held by the database

4. EOF: a 1-byte constant that marks the end of the RDB file's body content.

5. check_sum: an 8-byte unsigned integer holding a checksum computed over the REDIS, db_version, databases and EOF parts. When loading the RDB file, the server compares the checksum calculated from the loaded data with the check_sum recorded in the file to check whether the RDB file contains errors or is corrupted.
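As an illustration of this layout, here is a Python sketch that checks the REDIS magic string and the 4-byte db_version of a file, and splits out the check_sum of a file whose databases part is empty. It assumes the classic layout described above, where EOF is the single byte 0xFF and check_sum is stored as 8 little-endian bytes; newer RDB versions also insert auxiliary fields after the header, so this is not a general RDB parser. The helper names are hypothetical, and verifying the checksum value itself is omitted.

```python
import struct

MAGIC = b"REDIS"
EOF_OPCODE = 0xFF  # the 1-byte EOF constant


def read_rdb_version(data: bytes) -> int:
    """Validate the 5-byte REDIS magic and return db_version as an int."""
    if len(data) < 9 or data[:5] != MAGIC:
        raise ValueError("not an RDB file")
    version = data[5:9]
    if not version.isdigit():
        raise ValueError("malformed db_version")
    return int(version)  # e.g. b"0006" -> 6


def split_empty_rdb(data: bytes):
    """For an RDB whose databases part is empty, return (version, check_sum)."""
    version = read_rdb_version(data)
    # REDIS (5) + db_version (4) + EOF (1) + check_sum (8) = 18 bytes
    if len(data) != 18 or data[9] != EOF_OPCODE:
        raise ValueError("databases part is not empty or file is truncated")
    (check_sum,) = struct.unpack("<Q", data[10:18])
    return version, check_sum
```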

AOF

In addition to RDB persistence, Redis also supports AOF (Append Only File) persistence. Unlike RDB persistence, which records database state by saving key/value pairs in the database, AOF persistence records database state by saving write commands executed by the Redis server.

When the server is started, it can restore the database state before the server was shut down by loading and executing the commands saved in the AOF file.

Implementing AOF persistence

The implementation of AOF persistence can be divided into three steps: command append, file write, and file sync.

Command append

When AOF persistence is enabled, after executing a write command the server appends it, in the Redis protocol format, to the end of the aof_buf buffer in the server state.
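The protocol format here is the Redis serialization protocol (RESP): each command is stored as an array of bulk strings. A minimal Python sketch of the append step, with `aof_buf` modelled as a bytearray; `encode_command` and `feed_append_only_file` are hypothetical helpers (the latter loosely named after Redis's feedAppendOnlyFile).

```python
def encode_command(*args: bytes) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        out.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(out)


def feed_append_only_file(aof_buf: bytearray, *args: bytes) -> None:
    """Append an already-executed write command to the end of aof_buf."""
    aof_buf.extend(encode_command(*args))
```

For example, after SET msg hello the buffer holds `*3\r\n$3\r\nSET\r\n$3\r\nmsg\r\n$5\r\nhello\r\n`.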

File writing and synchronization

The Redis server process is an event loop in which file events are responsible for receiving client command requests and sending command replies to the client, and time events are responsible for executing functions that need to be run periodically, such as the serverCron function.

To improve the efficiency of file writes, modern operating systems usually do not write data to disk as soon as a user calls write. Instead, the data is temporarily kept in a memory buffer and is only actually written to disk once the buffer fills up or a time limit is exceeded.

While this improves efficiency, it also makes writes less safe: if the computer goes down, written data that is still in the memory buffer is lost.

For this reason, the operating system provides two synchronization functions, fsync and fdatasync, which force it to write the data in the buffer to the hard disk immediately, ensuring that written data is safe.
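Redis lets the user choose how often the contents of aof_buf are synced to disk through the appendfsync option (always, everysec, no), applied when flushAppendOnlyFile runs near the end of each event-loop iteration. The Python sketch below models that choice; it is a simplification, assuming a hypothetical `AofFile` class and an injectable clock, and unlike Redis it performs the everysec fsync synchronously rather than in a background thread.

```python
import os
import time

POLICIES = ("always", "everysec", "no")


class AofFile:
    """Hypothetical model of writing aof_buf under an appendfsync-style policy."""

    def __init__(self, path, policy="everysec", clock=time.monotonic):
        if policy not in POLICIES:
            raise ValueError(f"unknown policy {policy!r}")
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.policy = policy
        self.clock = clock
        self.last_fsync = clock()
        self.fsync_count = 0

    def flush(self, aof_buf: bytearray) -> None:
        """Called once per event-loop iteration."""
        data = bytes(aof_buf)
        aof_buf.clear()
        written = 0
        while written < len(data):  # write() may be partial, so loop
            written += os.write(self.fd, data[written:])
        # At this point the data is in the OS buffer, not necessarily on disk.
        now = self.clock()
        if self.policy == "always" and data:
            self._fsync(now)
        elif self.policy == "everysec" and now - self.last_fsync >= 1.0:
            self._fsync(now)
        # policy "no": leave flushing to the operating system

    def _fsync(self, now):
        os.fsync(self.fd)  # force buffered data onto the disk
        self.last_fsync = now
        self.fsync_count += 1

    def close(self):
        os.close(self.fd)
```

The trade-off is visible in the branches: always syncs after every write (safest, slowest), everysec bounds the loss window to roughly one second, and no relies entirely on the operating system.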

AOF file loading and data restoration

Because the AOF file contains all the write commands needed to restore the database state, the server simply reads and re-executes the write commands saved in the AOF file to restore the database state before the server was shut down.

Redis reads the AOF file and restores the database state as follows:

1. Create a pseudo client with no network connection: Redis commands can only be executed in a client context, and the commands being loaded come from the AOF file rather than from a network connection, so the server uses a pseudo client with no network connection to execute the write commands saved in the AOF file. Executing a command through the pseudo client has exactly the same effect as executing it through a networked client.
2. Parse and read one write command from the AOF file.
3. Use the pseudo client to execute the write command that was read.
4. Repeat steps 2 and 3 until all write commands in the AOF file have been processed.

When these steps are complete, the database state saved in the AOF file has been fully restored.
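The loading loop can be sketched in Python as follows. The `PseudoClient` here is a hypothetical stand-in that applies only SET, DEL and RPUSH to a dict; the real server dispatches every command through its normal command table. `read_commands` parses the RESP format used in the AOF file.

```python
def read_commands(data: bytes):
    """Step 2: parse RESP-encoded write commands one at a time."""
    pos = 0
    while pos < len(data):
        if data[pos:pos + 1] != b"*":
            raise ValueError(f"expected '*' at offset {pos}")
        end = data.index(b"\r\n", pos)
        argc = int(data[pos + 1:end])
        pos = end + 2
        args = []
        for _ in range(argc):
            if data[pos:pos + 1] != b"$":
                raise ValueError(f"expected '$' at offset {pos}")
            end = data.index(b"\r\n", pos)
            length = int(data[pos + 1:end])
            start = end + 2
            if start + length + 2 > len(data):
                raise ValueError("truncated command")
            args.append(data[start:start + length])
            pos = start + length + 2  # skip the payload and its trailing \r\n
        yield args


class PseudoClient:
    """Step 1: a client with no network connection that executes commands."""

    def __init__(self, db: dict):
        self.db = db

    def execute(self, args):
        name = args[0].upper()
        if name == b"SET":
            self.db[args[1]] = args[2]
        elif name == b"DEL":
            for key in args[1:]:
                self.db.pop(key, None)
        elif name == b"RPUSH":
            self.db.setdefault(args[1], []).extend(args[2:])
        else:
            raise ValueError(f"unsupported command {name!r}")


def load_aof(data: bytes) -> dict:
    db = {}
    client = PseudoClient(db)
    for args in read_commands(data):  # steps 2 to 4: read, execute, repeat
        client.execute(args)
    return db
```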

AOF rewrite

Because AOF persistence records the database state by saving write commands, the AOF file keeps growing as the server runs. If left unchecked, an oversized AOF file can affect the Redis server and even the entire host, and the larger the AOF file, the longer it takes to restore data from it.

If a key is set multiple times, the AOF file will store the set command multiple times. To solve the problem of bloated AOF files, Redis provides the AOF file rewrite function. With this function, the Redis server can create a new AOF file to replace the existing AOF file. The old and new AOF files store the same database state, but the new AOF file does not contain any redundant commands that waste space. So new AOF files are usually much smaller than old AOF files.

In fact, an AOF rewrite does not need to read, parse, or write the existing AOF file at all. It works by reading the server's current database state.

For example, the AOF file may hold many commands that modified a list key over time (such as a series of RPUSH and LPOP calls). To record the list's current state with as few commands as possible, the simplest and most efficient approach is not to read and analyze the existing AOF file, but to read the list's value directly from the database and replace all of those commands with a single RPUSH command.
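The idea of rebuilding the file from the current database state, rather than from the old file, can be sketched in Python. The keyspace is modelled as a dict of byte strings and lists, and `rewrite_commands` is a hypothetical helper; the `items_per_cmd` limit mirrors Redis's AOF_REWRITE_ITEMS_PER_CMD constant (64), which splits very large collections across several commands.

```python
def rewrite_commands(db: dict, items_per_cmd: int = 64):
    """Emit the minimal commands that recreate `db`, one key at a time."""
    commands = []
    for key, value in db.items():
        if isinstance(value, bytes):
            commands.append([b"SET", key, value])
        elif isinstance(value, list):
            # One RPUSH per batch of items replaces the list's whole history.
            for i in range(0, len(value), items_per_cmd):
                commands.append([b"RPUSH", key, *value[i:i + items_per_cmd]])
        else:
            raise TypeError(f"unsupported value type for key {key!r}")
    return commands
```

However many RPUSH and LPOP calls produced a list holding C, D, E, F and G, this emits the single command RPUSH list C D E F G.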

The AOF rewrite function, aof_rewrite, creates a new AOF file correctly, but because it performs a large number of writes, the thread that calls it is blocked for a long time. Since the Redis server uses a single thread to process command requests, if the server called aof_rewrite directly, it could not handle any client requests while the AOF file was being rewritten.

Therefore, Redis performs the AOF rewrite in a child process, which achieves two goals at once:

The server process can keep processing command requests while the child process rewrites the AOF file
The child process has its own copy of the server process's data; using a child process instead of a thread keeps the data safe without needing locks

Using a child process, however, introduces a problem: while the child rewrites the AOF file, the server process keeps processing command requests, and new commands may modify the database, leaving the server's current database state inconsistent with the state saved in the rewritten AOF file.

To solve this inconsistency, the Redis server sets up an AOF rewrite buffer, which comes into use once the server has created the child process. From then on, whenever the Redis server executes a write command, it sends the command to both the AOF buffer and the AOF rewrite buffer.

This means that the server process needs to do the following three things during AOF rewrite by the child process:

(1) Run the command sent by the client

(2) Append the executed write command to the AOF buffer

(3) Append the executed write command to the AOF rewrite buffer.

This ensures that:

The contents of the AOF buffer are still periodically written and synced to the existing AOF file, so processing of the existing AOF file continues as usual
All write commands executed by the server since the child process was created are recorded in the AOF rewrite buffer
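The three duties of the server process during a rewrite can be modelled in a few lines of Python. In this hypothetical sketch, `copy.deepcopy` stands in for the copy of memory the forked child sees, and `replay` shows why the rewrite buffer is enough to bring the child's snapshot up to date; `Server`, `apply_command` and `replay` are illustrative names only.

```python
import copy


class Server:
    """Hypothetical model of the parent process while a child rewrites the AOF."""

    def __init__(self):
        self.db = {}
        self.aof_buf = []      # still flushed to the existing AOF file as usual
        self.rewrite_buf = []  # every write since the child was created
        self.rewriting = False

    def start_rewrite(self) -> dict:
        """Stand-in for fork(): the child works on a point-in-time copy."""
        self.rewriting = True
        self.rewrite_buf.clear()
        return copy.deepcopy(self.db)

    def execute(self, cmd, key, value=None):
        command = (cmd, key, value)
        apply_command(self.db, command)  # (1) execute the client's command
        self.aof_buf.append(command)     # (2) append to the AOF buffer
        if self.rewriting:
            self.rewrite_buf.append(command)  # (3) append to the rewrite buffer


def apply_command(db, command):
    cmd, key, value = command
    if cmd == "SET":
        db[key] = value
    elif cmd == "DEL":
        db.pop(key, None)
    else:
        raise ValueError(f"unsupported command {cmd!r}")


def replay(snapshot, commands):
    """State of the new AOF: the child's snapshot plus the rewrite buffer."""
    state = copy.deepcopy(snapshot)
    for command in commands:
        apply_command(state, command)
    return state
```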

When the child process completes the AOF rewrite, it sends a signal to the parent process. Upon receiving the signal, the parent process calls a signal handler and performs the following tasks:

Write everything in the AOF rewrite buffer to the new AOF file, so that the database state saved in the new AOF file matches the server's current database state
Atomically rename the new AOF file so that it overwrites the existing AOF file, completing the replacement of the old AOF file with the new one
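The two steps of the signal handler map naturally onto ordinary file operations. The Python sketch below assumes both files live on the same filesystem, where `os.replace` (like the rename system call Redis relies on) swaps the new file in atomically; `finish_rewrite` is a hypothetical name.

```python
import os


def finish_rewrite(aof_path: str, new_aof_path: str, rewrite_buf: bytes) -> None:
    # 1. Append everything in the AOF rewrite buffer to the new AOF file,
    #    so it reflects the server's current database state.
    with open(new_aof_path, "ab") as f:
        f.write(rewrite_buf)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before the swap
    # 2. Atomically rename the new file over the existing AOF file.
    os.replace(new_aof_path, aof_path)
```

Because the rename is atomic, a crash at any point leaves either the complete old AOF file or the complete new one in place, never a mix of the two.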

This background AOF rewrite process is how the BGREWRITEAOF command works.

RDB versus AOF

Advantages of RDB:

An RDB file is a compact, full snapshot of the data, well suited for backups and disaster recovery
When generating the RDB file, Redis fork()s a child process to do all the saving work, so the main process does not need to perform any disk I/O
RDB can recover large data sets faster than AOF

Disadvantages of RDB:

Snapshot persistence is handled by a child process that holds a copy of the parent process's memory as it was at the moment of the fork, and the child does not see modifications the parent makes afterwards. Data modified while a snapshot is being written is therefore not included in that snapshot, so if Redis goes down, everything written since the last successful snapshot is lost.

Advantages of AOF:

It protects data better against loss. AOF is typically configured to run fsync once per second in a background thread, so at most about 1 second of data is lost.
Because the AOF file is written in append-only mode, there is no disk seek overhead, write performance is high, and the file is not easily corrupted.
Even when the AOF file grows large, it can be rewritten in the background without blocking client reads and writes.
Commands in the AOF file are recorded in a human-readable format, which makes it well suited for emergency recovery after a catastrophic accidental deletion, such as an accidental FLUSHALL.

Disadvantages of AOF:

For the same data, an AOF file is usually larger than an RDB snapshot file.
With AOF enabled, the write QPS Redis can sustain is lower than with RDB, because of the extra fsync work (typically configured to happen once per second).

Summary

Aspect	RDB	AOF
Startup efficiency	Efficient for large data sets	Efficient for small data sets
File size	Small	Large
Recovery speed	Fast	Slow
Data safety	May lose data	Depends on the sync policy
Weight	Heavyweight	Lightweight

There are pros and cons to both methods of persistence, and a combination of them is usually used to implement persistence.