RDB and AOF

Persistence process

Persisting a write to disk involves five steps:

  1. The client sends write operations (data in the client’s memory) to the server.
  2. The database server receives the data for the write request (the data is in the server’s memory).
  3. The server invokes the write system call to write data to disk (the data is in a buffer in system memory).
  4. The operating system transfers the data in the buffer to the disk controller (the data is in the disk cache).
  5. The disk controller writes data to the physical medium of the disk (the data actually falls onto the disk).

These five steps describe a normal save under ideal conditions. In practice, machines fail in various ways, and two cases are worth separating:

  • If only the Redis process fails, the data is safe once step 3 has completed, because the operating system carries out the remaining two steps for us.
  • If the operating system fails (or the machine loses power), the data is safe only once all five steps have completed.
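Steps 3 to 5 can be sketched in a few lines of Python. This is only an illustration, not Redis code, and the helper name `durable_append` is made up. `os.write` only hands the data to the operating system's buffer, while `os.fsync` asks the OS to push it through to the device.

```python
import os
import tempfile

def durable_append(path: str, payload: bytes) -> int:
    """Append payload to path and return only after fsync has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        written = os.write(fd, payload)  # step 3: data is now in the kernel buffer
        os.fsync(fd)                     # steps 4-5: ask the OS to flush it to the device
    finally:
        os.close(fd)
    return written

def demo() -> bytes:
    """Append two records to a scratch file and read them back."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.bin")
        durable_append(path, b"SET k v\n")
        durable_append(path, b"INCR n\n")
        with open(path, "rb") as f:
            return f.read()
```

If the process crashes after `os.write` but before `os.fsync`, the data usually still reaches the disk, which is the first failure case above. Only the `fsync` call protects against the second one.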

This article only considers failures during the save process. Saved data can also be damaged afterwards and would then need a recovery mechanism, but that is not covered here. The main question is how Redis implements the five steps above. It provides two mechanisms: RDB and AOF.

RDB

In Redis DataBase (RDB) mode, a snapshot of the in-memory data set is persisted to disk as a binary dump file at specified intervals.

Enable RDB persistence (default)

User-defined snapshot rules:

save: configures the conditions that trigger RDB persistence, that is, when the data in memory is saved to disk. The form is "save m n": bgsave is triggered automatically when the data set has been modified at least n times within m seconds.

The default settings are as follows:

  • save 3600 1: take a snapshot if at least 1 key changed within 3600 seconds.
  • save 300 100: take a snapshot if at least 100 keys changed within 300 seconds.
  • save 60 10000: take a snapshot if at least 10000 keys changed within 60 seconds.

If persistence is not required, you can comment out all save lines to disable snapshots, or set save "".
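The "save m n" rules can be modelled with a small Python function. This is a simplified sketch of the idea, not the Redis implementation, and the names `should_snapshot` and `DEFAULT_SAVE_POINTS` are invented here. It assumes the server tracks the seconds elapsed since the last save and the number of changes since then.

```python
from typing import Sequence, Tuple

# (seconds, changes) pairs mirroring the default rules listed above.
DEFAULT_SAVE_POINTS: Tuple[Tuple[int, int], ...] = ((3600, 1), (300, 100), (60, 10000))

def should_snapshot(elapsed_seconds: int, changes: int,
                    save_points: Sequence[Tuple[int, int]] = DEFAULT_SAVE_POINTS) -> bool:
    """Return True if at least one `save m n` rule is satisfied."""
    return any(elapsed_seconds >= m and changes >= n for m, n in save_points)
```

Passing an empty list of save points corresponds to save "": no rule can ever fire, so no automatic snapshot is taken.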

RDB file saving process

  • Redis calls fork and now has a child process and a parent process.
  • The parent process continues to handle client requests, while the child process writes the memory contents to a temporary file. Thanks to the operating system's copy-on-write mechanism, parent and child initially share the same physical pages. When the parent handles a write request, the OS creates a copy of the page being modified instead of writing to the shared page. The data in the child's address space is therefore a snapshot of the entire database at the moment of the fork.
  • Once the child process has written the snapshot to the temporary file, it replaces the original snapshot file with the temporary file and then exits.
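The three steps above can be imitated in Python on a Unix-like system. This is a toy sketch, not how Redis is written: JSON stands in for the binary RDB format, and `snapshot_in_child` and `dump.json` are names chosen for the example.

```python
import json
import os
import tempfile

def snapshot_in_child(data: dict, target_path: str) -> int:
    """Fork; the child dumps `data` as it was at fork time. Returns the child's pid."""
    pid = os.fork()
    if pid != 0:
        return pid  # parent: go straight back to serving requests
    # Child process from here on.
    status = 1
    try:
        tmp_path = f"{target_path}.tmp"
        with open(tmp_path, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, target_path)  # atomically swap in the new snapshot
        status = 0
    finally:
        os._exit(status)  # never fall back into the parent's code path

def demo():
    data = {"counter": 1}
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "dump.json")
        pid = snapshot_in_child(data, target)
        data["counter"] = 2  # the parent handles a write after the fork
        _, status = os.waitpid(pid, 0)
        with open(target) as f:
            saved = json.load(f)
    return saved, data, status
```

The file ends up holding the value from the moment of the fork, even though the parent changed it afterwards. That is the snapshot property copy-on-write provides.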

Save, bgsave, and automatic triggering

Save trigger mode

This command blocks the Redis server: Redis cannot process any other command until the RDB save has completed.

When the save finishes, the new RDB file replaces the old one if an old one exists. With tens or hundreds of thousands of clients connected, blocking all of them like this is clearly undesirable.

The save command performs a synchronous save operation to save all the snapshots of the current Redis instance to the hard disk as RDB files.

127.0.0.1:6379> save
OK

Bgsave trigger mode

When this command is executed, Redis takes the snapshot asynchronously in the background and keeps responding to client requests at the same time. The specific process is as follows:

The Redis process forks a child process. The child process is responsible for the RDB persistence and exits automatically when it is done. Blocking occurs only during the fork phase, which is usually very short. Almost all RDB operations inside Redis use bgsave.

The lastsave command returns the time of the last successful save to disk, as a UNIX timestamp.

127.0.0.1:6379> bgsave
Background saving started
127.0.0.1:6379> lastsave
(integer) 1632295819
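The integer returned by lastsave is easier to read once converted to a date. A small Python sketch (the helper name `lastsave_to_utc` is made up for illustration):

```python
from datetime import datetime, timezone

def lastsave_to_utc(timestamp: int) -> str:
    """Convert a lastsave reply (UNIX seconds) into an ISO 8601 UTC string."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()

print(lastsave_to_utc(1632295819))  # 2021-09-22T07:30:19+00:00
```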

Automatic trigger mode

Automatic triggering is controlled by the configuration file. The redis.conf file contains the following settings:

  1. save: configures the conditions that trigger RDB persistence, that is, when the data in memory is saved to disk. The form is "save m n": bgsave is triggered automatically when the data set has been modified at least n times within m seconds.
  2. stop-writes-on-bgsave-error: the default value is yes. When RDB is enabled and the last background save failed, Redis stops accepting writes. This makes the user aware that data is not being persisted correctly; otherwise nobody might notice that a disaster has occurred. Once a background save succeeds again, Redis automatically accepts writes again.
  3. rdbcompression: the default value is yes. Controls whether snapshots stored on disk are compressed.
  4. rdbchecksum: the default value is yes. After storing the snapshot, Redis can verify the data with the CRC64 algorithm. This costs about 10% in performance and can be turned off when maximum performance matters.
  5. dbfilename: the name of the snapshot file. The default is dump.rdb.
  6. dir: the directory where the snapshot file is stored. The value must be a directory, not a file name.
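To see how these settings fit together, here is a rough Python sketch that reads the RDB-related directives out of redis.conf-style text. It is not a full redis.conf parser. The function names are invented, and the directory `/var/lib/redis` in the sample is an assumption for illustration only.

```python
import posixpath
import shlex

SAMPLE_CONF = '''
# snapshot rules
save 3600 1
save 300 100
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /var/lib/redis
'''

def parse_rdb_settings(conf_text: str) -> dict:
    """Collect directives from redis.conf-style text into a dict."""
    settings = {"save": []}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, *args = shlex.split(line)
        name = name.lower()
        if name == "save":
            if args == [""]:
                settings["save"] = []  # save "" clears all rules
            else:
                settings["save"].append((int(args[0]), int(args[1])))
        elif args:
            settings[name] = args[0]
    return settings

def snapshot_path(settings: dict) -> str:
    """dir and dbfilename together give the location of the snapshot file."""
    return posixpath.join(settings["dir"], settings["dbfilename"])
```

With the sample text, `snapshot_path(parse_rdb_settings(SAMPLE_CONF))` yields `/var/lib/redis/dump.rdb`, which shows why dir must be a directory and dbfilename a bare file name.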

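Differences between save and bgsave

The third method is driven by configuration, so let's compare the first two. Save runs synchronously and blocks every client until the RDB file has been written; bgsave forks a child process to do the work and blocks only for the duration of the fork.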

Advantages of RDB

  • RDB files are compact full backups. The entire Redis database is contained in a single file, which is ideal for backup and disaster recovery.
  • When the RDB file is generated, the main Redis process calls fork() to create a child process that handles all the save work. The main process does not need to do any disk I/O.
  • RDB can restore large data sets faster than AOF.

RDB shortcomings

  • If you need to avoid losing data when the server fails, RDB is not for you. Redis lets you set different save points to control how often RDB files are written, but writing one is not a cheap operation, because the file has to hold the state of the entire data set. So you will probably save the RDB file only once every five minutes or so. In that case, an unexpected outage can cost you several minutes of data.
  • Each time the RDB file is saved, Redis calls fork() to create a child process that does the actual persistence. With large data sets, fork() can be time-consuming and can cause the server to stop serving clients for some milliseconds. If the data set is very large and CPU time is tight, the pause can even last a full second. AOF rewrites also require fork(), but however long the interval between AOF rewrites, the durability of the data is not affected.

If the data is relatively important and you want to minimize the loss, you can use AOF for persistence.

AOF

AOF (Append Only File): Redis appends each write command it receives to the file using the write function. Put simply, it is a log.

Whenever a write command comes in, it’s stored directly in our AOF file.

Enable AOF persistence

Step 1: Modify the redis.conf file. The defaults are:

appendonly no
# Default persistence file name
appendfilename "appendonly.aof"
# appendfsync everysec
# appendfsync no
# (no: do not actively synchronize, default 30 seconds)

Change appendonly to:

appendonly yes

Step 2: Start Redis with the modified redis.conf file

[root@172 redis-6.2.5]# ./src/redis-server /app/redis/redis-6.2.5/redis.conf
[root@172 redis-6.2.5]#

AOF file saving process

Redis appends each write command it receives to the file using the write function (the default file name is appendonly.aof).

When Redis restarts, it rebuilds the contents of the entire database in memory by re-executing the write commands saved in the file. However, because the OS caches the changes made by write in the kernel, they may not reach the disk immediately, so it is still possible to lose some changes with AOF persistence.
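The append-and-replay idea can be shown with a toy Python model. This is a sketch only: it supports just SET and INCR, and it stores one command per text line, which is a simplification chosen for readability rather than the format Redis uses. All function names are invented.

```python
import os
import tempfile

def apply(state: dict, command: list) -> None:
    """Apply one toy write command (SET or INCR) to the in-memory state."""
    op, key, *rest = command
    if op == "SET":
        state[key] = rest[0]
    elif op == "INCR":
        state[key] = str(int(state.get(key, "0")) + 1)
    else:
        raise ValueError(f"unsupported command: {op}")

def execute(state: dict, log_path: str, command: list) -> None:
    """Apply the command, then append it to the log file."""
    apply(state, command)
    with open(log_path, "a") as log:
        log.write(" ".join(command) + "\n")

def replay(log_path: str) -> dict:
    """Rebuild the state from scratch by re-executing every logged command."""
    state = {}
    with open(log_path) as log:
        for line in log:
            apply(state, line.split())
    return state

def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "appendonly.aof")
        state = {}
        execute(state, path, ["SET", "name", "redis"])
        for _ in range(3):
            execute(state, path, ["INCR", "test"])
        return state, replay(path)
```

`demo()` returns the live state and the state rebuilt from the log; the two are equal, which is what a restart relies on.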

Sync strategy

The configuration file tells Redis when to force the OS to write to disk via fsync. There are three options (the default is to fsync once per second):

  • appendfsync always: forces a write to disk every time a write command is received. Slowest, but gives the strongest persistence guarantee. Not recommended.
  • appendfsync everysec: forces a write to disk once per second. A good compromise between performance and persistence. Recommended.
  • appendfsync no: depends entirely on the OS. Best performance, but persistence is not guaranteed.
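The difference between the three options can be made concrete by counting how many fsync calls each one would issue for the same stream of writes. The class below is a simplified model with invented names. It decides at write time whether a sync is due, whereas the real server is described later in this article as syncing from a background thread.

```python
class AofSyncPolicy:
    """Decides, after each appended command, whether fsync should be called."""

    def __init__(self, mode: str):
        if mode not in ("always", "everysec", "no"):
            raise ValueError(f"unknown appendfsync mode: {mode}")
        self.mode = mode
        self.last_sync = None  # time of the most recent fsync, if any

    def should_fsync(self, now: float) -> bool:
        if self.mode == "always":
            due = True
        elif self.mode == "everysec":
            due = self.last_sync is None or now - self.last_sync >= 1.0
        else:  # "no": leave flushing entirely to the operating system
            due = False
        if due:
            self.last_sync = now
        return due

def count_fsyncs(mode: str, write_times) -> int:
    """Count the fsync calls a policy issues for writes at the given times (seconds)."""
    policy = AofSyncPolicy(mode)
    return sum(policy.should_fsync(t) for t in write_times)
```

For six writes at 0.0, 0.2, 0.4, 1.1, 1.5 and 2.3 seconds, this model issues 6 syncs under always, 3 under everysec and 0 under no.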

The AOF approach also has a drawback: the persistence file keeps growing. For example, if we call incr test 100 times, all 100 commands are saved in the file, even though 99 of them are redundant. To restore the state of the database, a single set test 100 command is enough.
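The redundancy is easy to demonstrate in Python. The `compact` function below is a made-up helper that replays a list of toy SET and INCR commands and emits one SET per key. It only illustrates that the two logs are equivalent, not how Redis produces the shorter one.

```python
def compact(commands):
    """Replay SET/INCR commands and return one SET per key giving the same state."""
    state = {}
    for op, key, *rest in commands:
        if op == "SET":
            state[key] = rest[0]
        elif op == "INCR":
            state[key] = str(int(state.get(key, "0")) + 1)
        else:
            raise ValueError(f"unsupported command: {op}")
    return [("SET", key, value) for key, value in state.items()]

log = [("INCR", "test")] * 100
print(compact(log))  # [('SET', 'test', '100')]
```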

Principle of file rewriting

To compact the AOF file, Redis provides the bgrewriteaof command. Redis forks a new process, which writes the in-memory data to a temporary file in the form of commands.

  • Redis calls fork and now has a parent process and a child process.
  • Based on the database snapshot in memory, the child process writes to a temporary file (the new AOF file) the commands needed to rebuild the state of the database.
  • The parent process continues to handle client requests. Besides appending write commands to the original AOF file, it also caches the write commands it receives. This ensures that nothing is lost if the child's rewrite fails.
  • When the child process has finished writing the snapshot contents to the temporary file as commands, it sends a signal to notify the parent process. The parent process then writes the cached write commands to the temporary file as well.
  • The parent process then renames the temporary file to replace the old AOF file, and subsequent write commands are appended to the new AOF file.

Note that the rewrite does not read the old AOF file. Instead, the entire in-memory database is written out as commands into a new AOF file, similar to a snapshot.
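The rewrite steps can be simulated in a single process. In this Python sketch, a plain dict copy stands in for the child's view of memory after the fork, and lists stand in for the files. All names (`dump_as_commands`, `simulate_rewrite`, the sample keys) are invented for the example.

```python
def apply(state: dict, command: tuple) -> None:
    """Apply one toy write command (SET or DEL) to the state."""
    op, key, *rest = command
    if op == "SET":
        state[key] = rest[0]
    elif op == "DEL":
        state.pop(key, None)
    else:
        raise ValueError(f"unsupported command: {op}")

def replay(commands) -> dict:
    state = {}
    for command in commands:
        apply(state, command)
    return state

def dump_as_commands(snapshot: dict) -> list:
    """Child side: turn a point-in-time snapshot into one SET per key."""
    return [("SET", key, value) for key, value in sorted(snapshot.items())]

def simulate_rewrite():
    old_aof = [("SET", "a", "1"), ("SET", "a", "2"), ("SET", "b", "x"), ("DEL", "b")]
    state = replay(old_aof)

    snapshot = dict(state)  # stands in for the child's memory at fork time
    rewrite_buffer = []

    # Writes that arrive while the child is still rewriting.
    for command in [("SET", "c", "3"), ("SET", "a", "9")]:
        apply(state, command)
        old_aof.append(command)         # still appended to the old file
        rewrite_buffer.append(command)  # and cached for the new file

    # Child output first, then the parent's cached commands.
    new_aof = dump_as_commands(snapshot) + rewrite_buffer
    return state, old_aof, new_aof
```

Replaying `new_aof` gives the same state as replaying `old_aof`, with three commands instead of six. The old log is never read during the rewrite; only the in-memory snapshot and the buffer are used.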

AOF advantages

  • AOF protects better against data loss. Typically, AOF runs fsync once per second in a background thread, so at most one second of data is lost.

  • The AOF file is an append-only log, so writing it needs no disk seeks, write performance is high, and the file is hard to corrupt. Even if the log ends with an incomplete command for some reason (for example, the disk filled up or the write was interrupted), the redis-check-aof tool can easily fix the problem.

  • Redis can automatically rewrite the AOF in the background when the file becomes too large, without affecting client reads and writes. The new AOF file contains the minimum set of commands needed to restore the current data set. The rewrite is safe because Redis keeps appending commands to the existing AOF file while it creates the new one, so the existing file is not lost even if an outage occurs during the rewrite. Once the new AOF file is complete, Redis switches from the old file to the new one and starts appending to it.

  • AOF files record all writes to the database in order, which is ideal for emergency recovery after a catastrophic deletion. The writes are stored in the Redis protocol format, so the contents of an AOF file are easy to read and parse. Exporting an AOF file is also simple. For example, if you accidentally execute the FLUSHALL command, then as long as the AOF file has not been rewritten in the meantime, you can stop the server, remove the FLUSHALL command at the end of the AOF file, and restart Redis to restore the data set to its state before the FLUSHALL.
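The Redis protocol represents a command as an array of bulk strings, for example `*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n` for SET k v. The Python sketch below encodes and decodes that shape and removes a trailing FLUSHALL. The function names are invented, and it assumes the file contains nothing but such command arrays, which may not hold for every AOF file, so treat it as an illustration rather than a repair tool.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as an array of bulk strings in the Redis protocol."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        raw = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(out)

def decode_commands(data: bytes) -> list:
    """Parse concatenated command arrays back into lists of strings."""
    commands, pos = [], 0
    while pos < len(data):
        if data[pos:pos + 1] != b"*":
            raise ValueError(f"expected '*' at offset {pos}")
        end = data.index(b"\r\n", pos)
        count = int(data[pos + 1:end])
        pos = end + 2
        args = []
        for _ in range(count):
            end = data.index(b"\r\n", pos)
            length = int(data[pos + 1:end])  # skip the leading "$"
            start = end + 2
            args.append(data[start:start + length].decode())
            pos = start + length + 2         # skip the trailing CRLF
        commands.append(args)
    return commands

def drop_trailing_flushall(data: bytes) -> bytes:
    """Return the log without its last command if that command is FLUSHALL."""
    commands = decode_commands(data)
    if commands and commands[-1][0].upper() == "FLUSHALL":
        commands.pop()
    return b"".join(encode_command(*c) for c in commands)
```

For a log holding SET k v followed by FLUSHALL, `drop_trailing_flushall` returns just the encoded SET, which mirrors the manual recovery procedure described above.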

AOF shortcomings

  • AOF files are usually larger than RDB files for the same data set.
  • With AOF enabled, the supported write QPS is lower than with RDB, because AOF is usually configured to fsync the log file once per second. Performance with fsync every second is still very high under normal circumstances, and with fsync disabled AOF can be as fast as RDB, even under high load. However, RDB can give better guarantees on maximum latency under a heavy write load.
  • AOF has had bugs in the past where, because of certain commands, reloading an AOF file did not restore the data set exactly as it was saved. (The blocking command BRPOPLPUSH, for example, has caused such a bug.) Tests for this situation have been added to the test suite: they automatically generate random, complex data sets and reload them to make sure everything is fine. Although this kind of bug is uncommon in AOF files, it is almost impossible with RDB by comparison.

RDB and AOF summary

  • RDB persistence
    • Uses snapshots (data is persisted periodically)
    • Data written after the last snapshot may be lost if the server goes down
    • Persisting data is less efficient; recovery from the binary file is fast
  • AOF persistence
    • Uses an incremental log (the operations on the data are persisted)
    • Data may still be lost for a short window of time
    • Persisting data is more efficient; recovery by replaying commands is slow
  • How to choose
    • In general, if you want strong data safety, use both persistence features together.
    • If you care deeply about your data but can still tolerate losing a few minutes of it, you can use RDB persistence alone.
    • With AOF, at most about one second of data is lost.
