Hello, I’m Xiao Lin.

Redis is an in-memory database, but it provides two techniques for data persistence: “AOF logging” and “RDB snapshots”.

Each technique records information in a file, but the content of the two files differs.

  • The contents of the AOF file are operation commands;
  • The contents of an RDB file are binary data.

I introduced AOF persistence in the last article, but today I’ll focus on RDB snapshots.

A snapshot records the state of something at a particular moment. For example, when we photograph a landscape, the scene and its details at that moment are captured in the photo.

So, an RDB snapshot is a snapshot of memory data at a moment in time, recording actual data, while an AOF file is a log of command operations, not actual data.

Therefore, when Redis restores data, RDB restores data more efficiently than AOF, because RDB files are read directly into memory, and there is no need to execute additional command steps to restore data as AOF does.
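The difference in recovery cost can be illustrated with a toy model. The sketch below does not use Redis’s real file formats: it assumes a hypothetical store modeled as a Python dict, an “AOF” modeled as a list of command tuples, and an “RDB” modeled as a pickled dict. The function names are made up for illustration.

```python
import pickle

def restore_from_aof(commands):
    """Rebuild the dataset by re-executing every logged command in order."""
    data = {}
    for op, key, *args in commands:
        if op == "SET":
            data[key] = args[0]
        elif op == "DEL":
            data.pop(key, None)
    return data

def restore_from_rdb(blob):
    """Rebuild the dataset by deserializing the stored data directly."""
    return pickle.loads(blob)

# Hypothetical history: the same key is overwritten many times.
aof_log = [("SET", "counter", i) for i in range(1000)]
aof_log += [("SET", "name", "redis"), ("DEL", "tmp")]

final_state = restore_from_aof(aof_log)   # replays all 1002 commands
rdb_blob = pickle.dumps(final_state)      # the snapshot holds only the final data
```

In this model the command log grows with every write, while the snapshot only ever contains the data that exists at the moment it is taken, which is why loading it involves no command execution.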

Next, let’s talk more about RDB snapshots.

How do I use snapshots?

A good way to get familiar with something is to see how it is used first.

Redis provides two commands for generating RDB files, save and bgsave. The difference is whether they run in the “main thread”:

  • The save command generates the RDB file in the main thread. Because this is the same thread that executes operation commands, the main thread is blocked for as long as writing the RDB file takes.
  • The bgsave command creates a child process to generate the RDB file, so the main thread is not blocked while the file is written.

The loading of RDB files is performed automatically when the server is started, and Redis does not provide a specific command for loading RDB files.

Redis can also be configured, through options in the configuration file, to execute the bgsave command automatically at intervals. The following configuration is provided by default:

save 900 1
save 300 10
save 60 10000

Although the option is named save, what actually runs is the bgsave command, which creates a child process to generate the RDB snapshot file.

bgsave is executed whenever any one of the conditions above is met. The three lines mean, respectively:

  • at least 1 change was made to the database within 900 seconds;
  • at least 10 changes were made to the database within 300 seconds;
  • at least 10,000 changes were made to the database within 60 seconds.
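The check behind these save points can be sketched as a small function. This is a minimal illustration, not Redis’s source code: should_snapshot, dirty, and last_save are hypothetical names, and the rule it assumes is that a snapshot is due once enough changes have accumulated and more than the configured number of seconds has passed since the last snapshot.

```python
# (seconds, changes) pairs mirroring the three "save" lines shown above.
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]

def should_snapshot(dirty, last_save, now, save_points=SAVE_POINTS):
    """Return True if any save point is satisfied.

    dirty     -- number of changes since the last snapshot
    last_save -- timestamp (in seconds) of the last snapshot
    now       -- current timestamp (in seconds)
    """
    elapsed = now - last_save
    return any(dirty >= changes and elapsed > seconds
               for seconds, changes in save_points)
```

For example, 10 changes after 301 seconds satisfy the second save point, while 9 changes after the same time satisfy none of them, because the first save point still needs more than 900 seconds to pass.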

Note that Redis snapshots are full snapshots, which means that “all data” in memory is recorded to disk each time a snapshot is taken.

Taking a snapshot is therefore a heavy operation. If snapshots are taken too often, Redis performance may suffer; if they are taken too rarely, more data is lost when the server fails.

A snapshot is typically configured to run about once every 5 minutes at most. With a 5-minute interval, if Redis goes down, up to 5 minutes of data may be lost.

This is the disadvantage of RDB snapshots: when the server fails, more data is lost than with AOF persistence. Because every RDB snapshot is a full snapshot, it cannot be taken too frequently without affecting Redis performance, whereas AOF records operation commands continuously, so it loses less data.

Can data be modified while a bgsave snapshot is in progress?

While bgsave runs, the main thread can continue to work because the child process is the one building the RDB file. But can the main thread modify data during that time?

If it could not, performance would drop significantly. If it can, how is that achieved?

The conclusion first: Redis can still process commands while bgsave is executing, which means data can be modified.

So how is this done? The key technique is copy-on-write (COW).

When the bgsave command is executed, fork() creates a child process. The child gets a copy of the parent process’s page table, but both page tables point to the same physical memory, so the child shares memory with the parent.

Physical memory will be copied only if memory data is modified.

The purpose is to reduce the cost of creating the child process and so speed up its creation; after all, the main thread is blocked while the child process is being created.

So once the bgsave child process is created, it shares all of the parent process’s memory data, and it can read that data directly and write it to the RDB file.

As long as the main thread only reads the shared memory data, the main thread and the bgsave child process do not affect each other.

However, if the main thread modifies a piece of shared data (for example, key-value pair A), copy-on-write occurs: the physical memory holding that data is copied (producing key-value pair A’), and the main thread then modifies the copy (key-value pair A’). Meanwhile, the bgsave child process can continue writing the original data (key-value pair A) to the RDB file.

That is the whole idea. Redis uses bgsave to take a snapshot of all the data currently in memory. The work is done in the background by the bgsave child process without blocking the main thread, so the main thread can modify data at the same time.
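The behavior described above can be observed with a small fork() experiment. This is a sketch under stated assumptions, not how Redis is implemented: it needs a POSIX system (os.fork is not available on Windows), it writes JSON instead of the RDB format, and background_save and store are hypothetical names. The copy-on-write itself is done by the operating system; the script only shows its visible effect.

```python
import json
import os
import tempfile

def background_save(data, path):
    """Fork a child that writes `data` as it looked at fork time."""
    pid = os.fork()
    if pid == 0:
        # Child process: it sees memory as of the fork, so writes made
        # by the parent afterwards are invisible here.
        try:
            with open(path, "w") as f:
                json.dump(data, f)
        finally:
            os._exit(0)  # exit at once, without running the parent's code
    return pid

store = {"A": "original"}
fd, snapshot_path = tempfile.mkstemp(suffix=".json")
os.close(fd)

child_pid = background_save(store, snapshot_path)
store["A"] = "modified"    # the parent keeps handling writes
os.waitpid(child_pid, 0)   # wait until the child has finished writing

with open(snapshot_path) as f:
    snapshot = json.load(f)
os.remove(snapshot_path)
```

After the run, snapshot still holds the value from the moment of the fork, while store in the parent holds the modified value. The modification would only reach disk in a later snapshot.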

Careful readers will have noticed that, during a bgsave snapshot, if the main thread modifies shared data and copy-on-write occurs, the RDB snapshot still saves the original memory data. The data the main thread has just modified cannot be written to the current RDB file; it can only be captured by the next bgsave snapshot.

So while a bgsave snapshot is in progress, if the main thread modifies memory data, whether shared or not, the RDB snapshot cannot include the modification, because by then the main thread’s memory data is separate from the child process’s. The child process can only write the original memory data to the RDB file.

If the system crashes just after the RDB snapshot file is created, Redis will lose the data that the main thread modified during the snapshot.

In addition, copy-on-write has an extreme case.

When Redis forks a child process for RDB persistence, the main process and the child process share the same physical memory. If the main process then handles a write operation that modifies shared memory, the physical memory holding the modified data is copied.

In the extreme case, if all shared memory is modified, the memory footprint is twice as large.

Therefore, in write-heavy scenarios, keep an eye on memory usage during the snapshot to avoid running out of memory.
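A rough estimate of the memory needed during a snapshot follows directly from this reasoning. The function below is a back-of-the-envelope sketch with hypothetical names; it assumes the extra memory equals the share of the dataset that gets copied, and it ignores page granularity and the child process’s own buffers.

```python
def peak_memory_during_snapshot(dataset_bytes, modified_fraction):
    """Estimate dataset memory plus copied memory while the child is alive.

    modified_fraction -- share of the shared memory (0.0 to 1.0) that the
                         parent writes to before the child exits
    """
    if not 0.0 <= modified_fraction <= 1.0:
        raise ValueError("modified_fraction must be between 0 and 1")
    copied_bytes = dataset_bytes * modified_fraction
    return dataset_bytes + copied_bytes

GIB = 1024 ** 3
read_only = peak_memory_during_snapshot(4 * GIB, 0.0)    # nothing is copied
worst_case = peak_memory_during_snapshot(4 * GIB, 1.0)   # everything is copied
```

With a 4 GiB dataset, the read-only case stays at 4 GiB, and the worst case reaches 8 GiB, which is the doubling described above.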

RDB and AOF

Although RDB is faster than AOF in data recovery, the frequency of snapshots is difficult to determine:

  • If the frequency is too low, a large amount of data may be lost once the server breaks down between snapshots.
  • If the frequency is too high, frequent writing to disk and creating child processes can incur additional performance overhead.

Is there any method that not only has the advantage of RDB recovery speed, but also has the advantage of AOF losing less data?

Of course there is: use RDB and AOF together. This approach, introduced in Redis 4.0, combines the AOF log with the memory snapshot and is known as mixed persistence.

To enable mixed persistence, set the following configuration item to yes in the Redis configuration file:

aof-use-rdb-preamble yes

Mixed persistence works in the AOF log rewrite process.

With mixed persistence enabled, when the AOF log is rewritten, the forked rewrite child process first writes the memory data it shares with the main thread to the AOF file in RDB format. Operation commands that the main thread processes in the meantime are recorded in the rewrite buffer, and the incremental commands in the rewrite buffer are then written to the AOF file in AOF format. Once writing is complete, the main process is notified to replace the old AOF file with the new one, which contains both RDB and AOF formats.

That is, with mixed persistence, the first half of an AOF file is full data in RDB format and the second half is incremental data in AOF format.

The advantage is that when Redis restarts and loads data, the first half of the file is RDB content, so it loads very quickly.

After the RDB content is loaded, the AOF content in the second half is loaded. This part holds the operation commands that the main thread processed while the background child process was rewriting the AOF, so less data is lost.
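The two-part layout and its loading order can be sketched with a toy file format. Everything here is an assumption made for illustration: the “SNAPSHOT” line, the JSON encoding, and the write_hybrid and load_hybrid functions are hypothetical and unrelated to Redis’s actual RDB and AOF encodings.

```python
import json

def write_hybrid(snapshot, rewrite_buffer):
    """Return file content: the full data first, then incremental commands."""
    lines = ["SNAPSHOT " + json.dumps(snapshot)]
    lines += [json.dumps(cmd) for cmd in rewrite_buffer]
    return "\n".join(lines) + "\n"

def load_hybrid(content):
    """Load the snapshot part directly, then replay the trailing commands."""
    data = {}
    lines = content.splitlines()
    if lines and lines[0].startswith("SNAPSHOT "):
        data = json.loads(lines[0][len("SNAPSHOT "):])
        lines = lines[1:]
    for line in lines:
        op, key, *args = json.loads(line)
        if op == "SET":
            data[key] = args[0]
        elif op == "DEL":
            data.pop(key, None)
    return data

# Data as the rewrite child process saw it at fork time.
snapshot = {"a": 1, "b": 2}
# Commands the main thread handled while the rewrite was running.
rewrite_buffer = [["SET", "a", 10], ["DEL", "b"], ["SET", "c", 3]]

hybrid_file = write_hybrid(snapshot, rewrite_buffer)
restored = load_hybrid(hybrid_file)
```

The loader only has to execute the three commands from the rewrite buffer, however long the history that produced the snapshot was. A file without the snapshot line is still handled, by replaying every command.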


Recommended reading

Illustrated Redis | AOF persistence

Illustrated Redis | Cache avalanche, breakdown, and penetration


Looking back, I see that this article has very few pictures, which hardly lives up to the “illustrated” name, ha ha.

I wrote it in a bit of a rush and did not draw detailed technical diagrams, but the text is still smooth and easy to follow, hee hee.