🚄

You can become what you dream to be. You are where you are today based on everything you believe in and dream about.

🚄 Premise summary

Why does Redis need persistence

Redis is an in-memory database, so reads and writes run at memory speed. However, memory is volatile: if the process exits or the machine loses power, everything held in memory is gone. We therefore need to persist the data, otherwise it will be lost. Fortunately, Redis provides two persistence mechanisms for us: RDB (Redis DataBase) and AOF (Append Only File).

🚄 Redis persistence

🚄 Snapshot

  • Snapshots are not unique to Redis; the concept is everywhere, for example a saved game. RDB is the Redis snapshot: it records the complete state of the data at a point in time. Because snapshots are taken at long intervals, data written since the last one is easily lost. The advantage is fast recovery.

🚄 Log

  • AOF is the Redis log. It is linear and append-only, and it records each command that was executed. As Redis keeps executing commands, the log grows larger and larger, but because every write operation is recorded, recovery is complete. The drawback is that the log stores the text of the commands, so recovery means executing them all again, which is slow.

Redis enables RDB by default, because speed is what Redis is after and most Redis deployments are caches. Persistence is still necessary even for a cache: if the cached data is gone after a restart, requests pass straight through to the backing database, and the resulting load can crush MySQL.

🚄 Log implementation:

  1. (appendfsync always) Redis accepts the modification, applies it in memory, then calls the IO layer to write it to disk before returning success. This chain of calls ensures that no command is lost.

  2. (appendfsync no) Redis only hands the changes to the kernel. Data that a program writes to the kernel is not written to disk immediately; the kernel writes it out in batches from the page cache once enough has accumulated.

  3. (appendfsync everysec) The compromise: flush to disk once per second, which bounds the data that can be lost to roughly one second of writes.

The log is append-only and keeps growing, so it cannot be left to grow forever. Since Redis 4.x, when AOF is enabled, a rewrite no longer just cancels out commands that offset each other: Redis generates an RDB snapshot and places it in the log, and later commands are appended after it, forming a mixed log scheme.
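The three fsync policies above can be sketched with a toy append-only log. This is only an illustration under stated assumptions: the `AppendLog` class, its `fsync_calls` counter, and the `count_fsyncs` helper are hypothetical names, and the "everysec" check here runs on each append, whereas Redis performs it in the background.

```python
import os
import tempfile
import time


class AppendLog:
    """Toy append-only log with a Redis-like fsync policy (illustrative only)."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.last_sync = time.monotonic()
        self.fsync_calls = 0  # hypothetical counter, only for demonstration

    def append(self, command):
        # write() only hands the bytes to the kernel page cache.
        os.write(self.fd, command.encode() + b"\n")
        if self.policy == "always":
            self._sync()
        elif self.policy == "everysec":
            if time.monotonic() - self.last_sync >= 1.0:
                self._sync()
        # policy "no": never fsync here; the kernel flushes when it chooses.

    def _sync(self):
        os.fsync(self.fd)
        self.fsync_calls += 1
        self.last_sync = time.monotonic()

    def close(self):
        os.close(self.fd)


def count_fsyncs(policy, n_commands=100):
    """Append n_commands to a temporary log and report how many fsyncs ran."""
    with tempfile.TemporaryDirectory() as d:
        log = AppendLog(os.path.join(d, "toy.aof"), policy)
        for i in range(n_commands):
            log.append(f"SET key{i} {i}")
        log.close()
        return log.fsync_calls
```

With "always" every command pays for one fsync, with "no" none does, and "everysec" falls in between, which is the trade-off between safety and throughput described above.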

The two mechanisms are introduced below.

🚄 Persistence process

Redis data is saved to disk through the following five steps:

  1. The client sends write operations (data in the client’s memory) to the server.

  2. The database server receives the data for the write request (the data is in the server’s memory).

  3. The server calls the write system call, and the data is handed to the operating system (the data is in a buffer in system memory).

  4. The operating system transfers the data in the buffer to the disk controller (the data is in the disk cache).

  5. The disk controller writes the data to the physical medium of the disk (the data is now actually on the disk).

These five steps describe a normal save under ideal conditions. In practice, machines fail in various ways, and two cases need to be distinguished:

  1. If the Redis process fails, the data is safe as long as step 3 has completed; the operating system completes the remaining two steps for us.

  2. If the operating system fails, the data is safe only once all five steps have completed.

    • Here we only consider failures during the save process. In fact, data that has already been saved may also be damaged, which requires a recovery mechanism, but we will not go into that here.

    • The main question now is how Redis implements the five steps above. It provides two policy mechanisms, namely RDB and AOF.
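The buffering layers in the five steps above can be made visible with a few lines of Python. This is a minimal sketch, not Redis code: `durable_write` is a hypothetical helper, and how far `fsync` reaches in steps 4 and 5 depends on the operating system and the disk hardware.

```python
import os
import tempfile


def durable_write(path, payload):
    """Write payload and push it through each buffering layer in turn."""
    with open(path, "wb") as f:
        # The data sits in this process's own buffer (step 2).
        f.write(payload)
        # flush() issues the write system call: the data is now in the
        # kernel's buffer and survives a crash of this process (step 3).
        f.flush()
        # fsync() asks the OS to push the data to the storage device,
        # covering steps 4 and 5 as far as the hardware honors it.
        os.fsync(f.fileno())


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "data.bin")
    durable_write(target, b"SET greeting hello\n")
    with open(target, "rb") as f:
        stored = f.read()
```

Stopping after `flush()` corresponds to the first failure case (process crash only), while calling `fsync()` is what the second case (operating system failure) requires.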

🚄 RDB mechanism

  • An RDB is simply a snapshot of data stored on disk. What is a snapshot? You can think of it as taking a picture of the current moment and saving it.

  • RDB persistence refers to writing a snapshot of an in-memory data set to disk at a specified interval. This is also the default persistence mode, which is to write the in-memory data as a snapshot to a binary file named dump.rdb by default.

    • After Redis is installed, all of its configuration lives in the redis.conf file, which holds the settings for both the RDB and AOF persistence mechanisms.

    • Since RDB works by taking a snapshot of all the data at one point in time, something has to trigger it. RDB provides three trigger mechanisms: SAVE, BGSAVE, and automatic triggering.

Save trigger mode

This command blocks the current Redis server and Redis cannot process other commands during the execution of the save command until the RDB process is complete.

The specific process is as follows:

If an old RDB file exists when execution finishes, the new RDB file replaces the old one.

BGSAVE trigger mode

When this command is executed, Redis asynchronously takes snapshots in the background and responds to client requests at the same time. The specific process is as follows:

The Redis process calls fork to create a child process. The child process is responsible for the RDB persistence and exits automatically once it is done. Blocking occurs only during the fork phase, which is usually very short. Essentially all RDB operations inside Redis use the BGSAVE command.
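The fork-based approach described above can be sketched in Python on a POSIX system. This is an illustration only: `background_save` is a hypothetical helper, JSON stands in for the binary RDB format, and the example relies on `os.fork`, which is not available on Windows.

```python
import json
import os
import tempfile


def background_save(data, path):
    """Fork a child that writes `data` to `path`; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: it sees the parent's memory as it was at fork time.
        status = 1
        try:
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(data, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, path)  # atomically replace any old snapshot
            status = 0
        finally:
            os._exit(status)  # never return into the parent's code path
    return pid


store = {"counter": 1}
with tempfile.TemporaryDirectory() as d:
    snapshot_path = os.path.join(d, "dump.json")
    child = background_save(store, snapshot_path)
    store["counter"] = 2      # the parent keeps serving writes
    os.waitpid(child, 0)      # wait for the snapshot to finish
    with open(snapshot_path) as f:
        snapshot = json.load(f)
```

The snapshot holds the value from the moment of the fork, while the parent's later write exists only in the parent's memory. Writing to a temporary file and renaming it mirrors the idea that a new snapshot replaces the old one only once it is complete.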

🚄 Automatic trigger

Automatic triggering is done by configuration files. In the redis.conf configuration file, there is the following configuration that we can set:

  • save: configures the conditions that trigger RDB persistence, that is, when the in-memory data is saved to disk. The form is "save m n": BGSAVE is triggered automatically when the data set has been modified at least n times within m seconds.

The default configuration is as follows:

    # Snapshot if at least 1 key changed within 900 seconds
    save 900 1
    # Snapshot if at least 10 keys changed within 300 seconds
    save 300 10
    # Snapshot if at least 10000 keys changed within 60 seconds
    save 60 10000

If persistence is not required, you can comment out all save lines to disable automatic saving.

  • stop-writes-on-bgsave-error: The default value is yes.

    • Controls whether Redis stops accepting writes when RDB is enabled and the last background save failed. This makes the user aware that data is not being persisted to disk correctly; otherwise no one would notice that a disaster has occurred.
    • If Redis is restarted, it starts accepting writes again.
  • rdbcompression: The default value is yes. Sets whether snapshots stored on disk are compressed.

  • rdbchecksum: The default value is yes. After storing the snapshot, Redis can use the CRC64 algorithm to validate the data. This adds about 10% to the performance cost and can be turned off for maximum performance.

  • dbfilename: sets the snapshot file name. The default is dump.rdb.

  • dir: sets the directory where the snapshot file is stored. The value must be a directory, not a file name.
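The "save m n" rule from the list above can be expressed as a small predicate. This is a sketch under assumptions: `should_snapshot` and `DEFAULT_SAVE_POINTS` are hypothetical names, the save points are the classic defaults (save 900 1, save 300 10, save 60 10000), and the comparison is simplified relative to what the server actually does.

```python
# (seconds, changes) pairs mirroring "save 900 1", "save 300 10", "save 60 10000".
DEFAULT_SAVE_POINTS = ((900, 1), (300, 10), (60, 10000))


def should_snapshot(changes_since_save, seconds_since_save,
                    save_points=DEFAULT_SAVE_POINTS):
    """Return True if any save point is satisfied.

    A save point (m, n) is satisfied when at least n changes have
    accumulated and at least m seconds have passed since the last save.
    """
    return any(
        changes_since_save >= changes and seconds_since_save >= seconds
        for seconds, changes in save_points
    )
```

A single change is enough once 900 seconds have passed, whereas within the first 60 seconds nothing short of 10000 changes triggers a snapshot. Passing an empty tuple of save points models the case where every save line is commented out.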

🚄 Advantages and disadvantages of RDB

✔️ Advantages
  • More complete: RDB files are compact full backups, well suited to backup and disaster recovery.

  • Can be written asynchronously: when generating an RDB file, the main Redis process calls fork() to create a child process that handles all the save work, so the main process does no disk IO.

  • Fast recovery: RDB restores large data sets faster than AOF.

  • It takes up less space.

❎ Disadvantages
  • An RDB snapshot is a full backup that stores the in-memory data in a compact, binary serialized form, but it only captures one moment. When a snapshot starts, Redis forks a child process to write it. The child sees the parent's memory as it was at the fork; later modifications by the parent are not visible to the child, so changes made while the snapshot is being written are not saved in it and may be lost. (Command propagation addresses this: a replication backlog buffer or write buffer temporarily holds the data changed during the process.)

  • Because snapshots are taken at relatively long intervals, data written after the last snapshot is lost on failure, so data integrity and consistency are somewhat worse.

  • Because a child process is forked to write the data, a snapshot consumes extra resources and adds CPU load.

🚄 AOF mechanism

Full backups are always time-consuming, so Redis also offers a more efficient alternative, AOF. Its working principle is simple: Redis appends every write command it receives to a file via the write function. Put plainly, it is logging.

🚄 Persistence principle

Here’s how it works:

Whenever a write command comes in, it is appended directly to the AOF file.
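The append-and-replay idea can be shown with a toy log. This is a minimal sketch, not the real file format: `append_command` and `replay` are hypothetical helpers, commands are stored as space-separated lines rather than in the Redis protocol encoding, and only SET and DEL with space-free arguments are handled.

```python
import os
import tempfile


def append_command(path, *args):
    """Append one write command to the log as a single line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(" ".join(args) + "\n")


def replay(path):
    """Rebuild the data set by re-executing every logged command in order."""
    data = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            name = parts[0].upper()
            if name == "SET":
                data[parts[1]] = parts[2]
            elif name == "DEL":
                data.pop(parts[1], None)
    return data


with tempfile.TemporaryDirectory() as d:
    aof = os.path.join(d, "toy.aof")
    append_command(aof, "SET", "a", "1")
    append_command(aof, "SET", "b", "2")
    append_command(aof, "SET", "a", "3")
    append_command(aof, "DEL", "b")
    recovered = replay(aof)
```

Recovery has to walk through all four commands to arrive at a single live key, which is why replaying a long log is slower than loading a snapshot.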

🚄 Principle of file rewriting

A problem with the AOF approach is that the persistence file keeps getting bigger. To compact the AOF file, Redis rewrites it.

  • Redis provides the BGREWRITEAOF command. It forks a new process that writes the in-memory data, expressed as commands, to a temporary file, which then becomes the new AOF file.

    • The BGREWRITEAOF command asynchronously rewrites the Append Only File (AOF). The rewrite creates a size-optimized version of the current AOF file. Even if BGREWRITEAOF fails, no data is lost, because the old AOF file is not modified until the rewrite succeeds. Note: as of Redis 2.4, AOF rewrites are triggered by Redis itself, and BGREWRITEAOF is only needed to trigger a rewrite manually.

The rewrite does not read the old AOF file. Instead, it writes the entire in-memory contents of the database as commands into a new AOF file, much like taking a snapshot.
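A rewrite that works from the in-memory state rather than from the old log can be sketched as follows. The `replay` and `rewrite` functions and the tuple-based command format are hypothetical and only model SET and DEL on string keys.

```python
def replay(commands):
    """Execute toy SET/DEL commands and return the resulting data set."""
    data = {}
    for name, key, *rest in commands:
        if name == "SET":
            data[key] = rest[0]
        elif name == "DEL":
            data.pop(key, None)
    return data


def rewrite(data):
    """Build a fresh log from the current data set: one SET per live key."""
    return [("SET", key, value) for key, value in data.items()]


old_log = [
    ("SET", "a", "1"),
    ("SET", "a", "2"),
    ("SET", "b", "x"),
    ("DEL", "b"),
    ("SET", "a", "3"),
]
state = replay(old_log)
new_log = rewrite(state)
```

The five-entry log shrinks to one entry, yet replaying either log yields the same data set. Note that `rewrite` never looks at `old_log`, only at `state`.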

🚄 AOF also has three trigger mechanisms

  1. always: synchronous persistence. Every data change is written to disk immediately. Performance is poor, but data integrity is good.

  2. everysec: asynchronous operation. The log is synced every second; if the machine goes down, up to one second of data is lost.

  3. no: Redis never syncs on its own and leaves the flush to the operating system.

🚄 Advantages

  1. AOF protects better against data loss. Typically, AOF runs fsync once per second in a background thread, so at most about one second of data is lost.
  2. The AOF log file is written append-only, so there is no disk seek overhead, write performance is very high, and the file is not easily corrupted.
  3. Even when the AOF log file grows too large, the background rewrite does not affect client reads and writes.
  4. Commands in the AOF log file are recorded in a very readable form, which is ideal for emergency recovery after a catastrophic deletion. For example, if someone accidentally runs FLUSHALL and wipes all the data, then as long as the background rewrite has not yet happened, you can delete the final FLUSHALL command from the AOF file and load the file back to restore the data.
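The FLUSHALL recovery described in the last point can be sketched as a small parser over the log. Several assumptions apply: the log is taken to consist purely of commands in the Redis protocol encoding (arrays of bulk strings) with no RDB preamble, only a bare FLUSHALL without arguments is handled, and `encode`, `split_commands`, and `drop_trailing_flushall` are hypothetical helpers. In practice one would work on a copy of the file with the server stopped.

```python
def encode(*args):
    """Encode one command as a protocol array of bulk strings."""
    out = b"*%d\r\n" % len(args)
    for arg in args:
        out += b"$%d\r\n%s\r\n" % (len(arg), arg)
    return out


def split_commands(data):
    """Parse the log into (start_offset, [args]) pairs."""
    pos, commands = 0, []
    while pos < len(data):
        start = pos
        line_end = data.index(b"\r\n", pos)
        if data[pos:pos + 1] != b"*":
            raise ValueError(f"expected '*' at offset {pos}")
        argc = int(data[pos + 1:line_end])
        pos = line_end + 2
        args = []
        for _ in range(argc):
            line_end = data.index(b"\r\n", pos)
            if data[pos:pos + 1] != b"$":
                raise ValueError(f"expected '$' at offset {pos}")
            size = int(data[pos + 1:line_end])
            pos = line_end + 2
            args.append(data[pos:pos + size])
            pos += size + 2  # skip the argument and its trailing CRLF
        commands.append((start, args))
    return commands


def drop_trailing_flushall(data):
    """Return the log without a FLUSHALL that is its very last command."""
    commands = split_commands(data)
    if commands and [a.upper() for a in commands[-1][1]] == [b"FLUSHALL"]:
        return data[:commands[-1][0]]
    return data


log = encode(b"SET", b"a", b"1") + encode(b"SET", b"b", b"x*1") + encode(b"flushall")
repaired = drop_trailing_flushall(log)
```

Parsing command boundaries, rather than matching raw bytes at the end of the file, avoids being fooled by a value that merely looks like the tail of a FLUSHALL command.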

🚄 Disadvantages

  1. For the same data, the AOF log file is usually larger than the RDB snapshot file.

  2. With AOF enabled, the supported write QPS is lower than with RDB, because AOF is typically configured to fsync the log file once per second, although performance is still very high.

  3. In the past, AOF had a bug where replaying the recorded log did not restore exactly the same data.

🚄 How to choose between RDB and AOF

If you ask which to choose, the answer is that the two work better together. Once you understand both persistence mechanisms, the rest depends on your own requirements; you do not have to pick one, and in practice they are usually used in combination. Here is a picture to summarize: