This is an RDB snapshot, which can record the actual data

Abstract: A snapshot records the state of something at a particular instant. For example, when we photograph a landscape, the scene at that instant is captured in the picture. An RDB snapshot likewise records the in-memory data at a particular instant, and what it records is the actual data.

This article is shared from the Huawei Cloud community post “Graphic Redis | Not much to say, this is the RDB snapshot”; original author: Xiao Lin Coding.

Although Redis is an in-memory database, it provides two techniques for persisting data: AOF logs and RDB snapshots.

Both techniques record information in a file, but the contents differ:

The contents of the AOF file are operation commands;
The contents of an RDB file are binary data.

Today I’ll focus on RDB snapshots.

A snapshot records the state of something at a particular instant. For example, when we photograph a landscape, the scene at that instant is captured in the picture.

An RDB snapshot therefore records the in-memory data at a particular instant, and what it records is the actual data. An AOF file, by contrast, records the log of operation commands rather than the actual data.

As a result, recovering data from RDB is more efficient than recovering it from AOF: an RDB file can be read straight into memory, whereas AOF recovery requires the extra step of executing every logged operation command.
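To make the difference concrete, here is a minimal sketch in Python. It is a toy model, not how Redis stores or parses its files: the dict-based store, the tuple commands, and the names apply_command, recover_from_aof, and recover_from_snapshot are all made up for illustration.

```python
def apply_command(store, command):
    """Apply one write command (a tuple) to a dict-based toy store."""
    op, key, *args = command
    if op == "SET":
        store[key] = args[0]
    elif op == "DEL":
        store.pop(key, None)
    else:
        raise ValueError(f"unsupported command: {op}")


def recover_from_aof(log):
    """AOF-style recovery: replay every logged command in order."""
    store = {}
    for command in log:
        apply_command(store, command)
    return store


def recover_from_snapshot(snapshot):
    """RDB-style recovery: load the saved data directly, no replay."""
    return dict(snapshot)


# The same final state, recorded in the two different ways.
aof_log = [("SET", "a", "1"), ("SET", "b", "2"), ("SET", "a", "3"), ("DEL", "b")]
rdb_snapshot = {"a": "3"}

assert recover_from_aof(aof_log) == recover_from_snapshot(rdb_snapshot)
```

The log needs four steps to reach a state that the snapshot holds directly, and that gap grows as the same keys are overwritten again and again.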

Next, let’s talk about RDB snapshots in detail.

How do you use snapshots?

The best way to get familiar with something is to see how it is used.

Redis provides two commands to generate RDB files, save and bgsave. The difference between them is whether they are executed in the “main thread”:

When the save command is executed, the RDB file is generated on the main thread. Because that is the same thread that executes operation commands, the main thread is blocked if writing the RDB file takes too long.
When the bgsave command is executed, a child process is created to generate the RDB file, so the main thread is not blocked.

The loading of RDB files is done automatically when the server starts, and Redis does not provide specific commands for loading RDB files.

Redis can also run bgsave automatically at intervals through an option in the configuration file. By default, the following configuration is provided:

save 900 1
save 300 10
save 60 10000

Although the option is named save, what actually runs is the bgsave command, which creates a child process to generate the RDB snapshot file.

bgsave is executed as soon as any one of the conditions above is met. They mean:

At least 1 change was made to the database within 900 seconds.
At least 10 changes were made to the database within 300 seconds.
At least 10,000 changes were made to the database within 60 seconds.

Note that Redis snapshots are full snapshots: every time a snapshot is taken, “all of the data” in memory is written to disk.
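The save-point rule can be sketched as a small Python function. This is a simplified model of the rule as described here, not Redis source code: the function name, the parameter names, and the exact boundary handling (>=) are assumptions made for illustration.

```python
# (seconds, changes) pairs mirroring the default configuration shown above.
DEFAULT_SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]


def should_snapshot(seconds_since_last_save, changes_since_last_save,
                    save_points=DEFAULT_SAVE_POINTS):
    """Return True if any save point is satisfied.

    A save point (seconds, changes) is satisfied when at least `changes`
    writes have accumulated and at least `seconds` have passed since the
    last snapshot.
    """
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in save_points
    )


assert should_snapshot(900, 1)        # 1 change in 900 seconds
assert should_snapshot(300, 10)       # 10 changes in 300 seconds
assert should_snapshot(60, 10000)     # 10,000 changes in 60 seconds
assert not should_snapshot(100, 5)    # too few changes for the time elapsed
assert not should_snapshot(1000, 0)   # no changes, so nothing to save
```

The conditions are combined with "any", so a busy server hits the 60-second rule while a quiet one falls back to the 900-second rule.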

So it can be argued that taking snapshots is a heavy operation, and if it happens too often, it can have an impact on Redis performance. If the frequency is too low, more data will be lost when the server fails.

Snapshots are typically configured to run no more often than once every 5 minutes, which means that if Redis goes down, up to 5 minutes of data can be lost.

This is the disadvantage of RDB snapshots: when the server fails, more data is lost than with AOF. Because an RDB snapshot is a full snapshot, it cannot be taken too frequently without affecting Redis performance, whereas the AOF log can record operation commands at second-level granularity, so less data is lost.

Can the data be modified when a snapshot is executed?

While bgsave is running, the main thread can keep working, because building the RDB file has been handed off to the child process. But can the main thread modify data during that time?

If it could not, performance would drop sharply. And if it can, how is that achieved?

The short answer is that Redis continues to process operation commands during bgsave, which means the data can be modified.

So how do you do that? The key technology is copy-on-write (COW).

When bgsave is executed, fork() creates a child process that shares the same memory as the parent process: the parent’s page table is copied when the child is created, but both page tables point to the same physical memory.

Physical memory is copied only when the data in it is modified.

The goal is to reduce the cost of creating the child process and so speed up fork(), since the main thread is blocked while the child process is being created.

Therefore, after creating the bgsave child process, since all the memory data of the parent process is shared, it can directly read the memory data in the main thread and write the data to the RDB file.

As long as the main thread only reads the shared memory data, the main thread and the bgsave child process do not affect each other.

However, if the main thread modifies a piece of shared data (say, key-value pair A), copy-on-write kicks in: the physical memory holding that data is copied (producing key-value pair A’), and the main thread makes its modification on the copy (A’). Meanwhile, the bgsave child process can continue writing the original data (key-value pair A) to the RDB file.

This is done in the background by the bgsave subprocess without blocking the main thread, which allows the main thread to modify the data at the same time.

If the main thread modifies shared data while a bgsave snapshot is in progress, the RDB snapshot still saves the original memory data. The data just modified by the main thread cannot be written to the RDB file at this point and has to wait for the next bgsave snapshot.

Therefore, during a bgsave snapshot, if the main thread modifies memory data, whether or not that data is shared, the RDB snapshot cannot include the newly modified data, because the memory of the main thread and the memory of the child process have already diverged. The child process can only write the original memory data to the RDB file.
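The behavior described above can be observed with a small Python experiment. This is only a sketch: it assumes a Unix-like system where os.fork() is available, it uses a JSON file as a stand-in for the RDB file, and the function name snapshot_while_modifying is made up. It shows the isolation that fork() provides, not the Redis implementation.

```python
import json
import os
import tempfile


def snapshot_while_modifying(data):
    """Fork a child that dumps `data` while the parent modifies it."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    read_end, write_end = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child process: plays the role of the bgsave child.
        status = 1
        try:
            os.close(write_end)
            os.read(read_end, 1)  # wait until the parent has modified its data
            with open(path, "w") as f:
                json.dump(data, f)  # writes the child's view of memory
            status = 0
        finally:
            os._exit(status)

    # Parent process: plays the role of the main thread.
    try:
        os.close(read_end)
        data["A"] = "modified after fork"  # triggers a private copy for the parent
        os.write(write_end, b"x")          # tell the child to take the snapshot now
        os.close(write_end)
        _, wait_status = os.waitpid(pid, 0)
        if wait_status != 0:
            raise RuntimeError("snapshot child failed")
        with open(path) as f:
            snapshot = json.load(f)
    finally:
        os.remove(path)
    return snapshot, data


snapshot, live = snapshot_while_modifying({"A": "original"})
assert snapshot == {"A": "original"}           # the snapshot holds the old value
assert live == {"A": "modified after fork"}    # the parent sees its own change
```

Even though the child writes its file after the parent has changed the value, the file still contains the original data, which is why the change has to wait for the next snapshot.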

If the system crashes right after the RDB snapshot file has been created, Redis will lose any data that the main thread modified during the snapshot.

In addition, copy-on-write has an extreme case to watch out for.

When Redis performs RDB persistence, the parent process and the child process share the same physical memory right after fork. As the parent process handles write operations and modifies shared memory, the physical memory holding the modified data is copied.

In the extreme case, if all of the shared memory is modified, memory usage doubles.

Therefore, in write-heavy scenarios, keep an eye on memory usage during the snapshot to avoid running out of memory.
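As a rough back-of-the-envelope check, the worst case can be estimated like this. It is a deliberately simplified model: the function name and the "modified fraction" parameter are made up for illustration, and it ignores page granularity and any other memory overhead.

```python
def peak_memory_during_snapshot(dataset_bytes, modified_fraction):
    """Estimate peak memory while a snapshot child is running.

    Assumes the shared data stays in place and every modified part is
    copied exactly once.
    """
    if not 0.0 <= modified_fraction <= 1.0:
        raise ValueError("modified_fraction must be between 0 and 1")
    return dataset_bytes + int(dataset_bytes * modified_fraction)


GIB = 2 ** 30
assert peak_memory_during_snapshot(4 * GIB, 0.0) == 4 * GIB   # read-only workload
assert peak_memory_during_snapshot(4 * GIB, 0.25) == 5 * GIB  # a quarter rewritten
assert peak_memory_during_snapshot(4 * GIB, 1.0) == 8 * GIB   # extreme case: doubled
```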

RDB combined with AOF

Although RDB can recover data faster than AOF, the frequency of snapshots is tricky:

If the frequency is too low, once the server goes down between two snapshots, more data may be lost.
If the frequency is too high, frequent writing to disk and creating child processes incurs additional performance overhead.

Is there a way to get both the fast recovery of RDB and the low data loss of AOF?

Of course there is: combine RDB and AOF. This approach, introduced in Redis 4.0, mixes AOF logging with memory snapshots and is also known as mixed persistence.

To enable mixed persistence, set the following option to yes in the Redis configuration file:

aof-use-rdb-preamble yes

Mixed persistence works during the AOF log rewrite process.

When mixed persistence is enabled, the forked rewrite child process first writes the memory data it shares with the main thread to the AOF file in RDB format. The operation commands that the main thread processes in the meantime are recorded in the rewrite buffer, and these incremental commands are then written to the AOF file in AOF format. When writing is complete, the main process is notified to replace the old AOF file with the new one, which contains both RDB and AOF formats.

In other words, with mixed persistence, the first half of the AOF file is the full data in RDB format and the second half is the incremental data in AOF format.

The advantage of this is that when Redis restarts to load the data, the first half of the data is RDB content, so it will load faster.

After the RDB content is loaded, the AOF content in the second half is loaded. This part holds the operation commands that the main thread processed while the background child process was rewriting the AOF, so less data is lost.
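The two-part layout and its loading order can be sketched in Python. This is a toy model only: the dict with "rdb_preamble" and "aof_tail" keys, the JSON encoding, and the function names are all made up for illustration and do not reflect the real on-disk format.

```python
import json


def rewrite_hybrid(store_at_fork, rewrite_buffer):
    """Build a toy hybrid file: a full snapshot followed by incremental commands."""
    return {
        "rdb_preamble": json.dumps(store_at_fork),  # full data as of the fork
        "aof_tail": list(rewrite_buffer),           # commands received during the rewrite
    }


def load_hybrid(hybrid_file):
    """Load the snapshot part first, then replay the command part on top of it."""
    store = json.loads(hybrid_file["rdb_preamble"])
    for op, key, *args in hybrid_file["aof_tail"]:
        if op == "SET":
            store[key] = args[0]
        elif op == "DEL":
            store.pop(key, None)
        else:
            raise ValueError(f"unsupported command: {op}")
    return store


at_fork = {"a": "1", "b": "2"}
during_rewrite = [("SET", "a", "9"), ("DEL", "b"), ("SET", "c", "3")]

restored = load_hybrid(rewrite_hybrid(at_fork, during_rewrite))
assert restored == {"a": "9", "c": "3"}
```

Only the short tail of commands has to be replayed; everything that existed at fork time is loaded in one step.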
