Memory Snapshots: How Does Redis Recover Quickly from Downtime?

Last time, we learned about Redis’s AOF method for avoiding data loss. Its advantage is that each execution only records the operation command, so the amount of data to persist is small. In general, as long as you don’t use the always persistence policy, AOF has no significant impact on performance.

However, because operation commands are recorded, rather than actual data, operation logs need to be executed one by one when AOF is used for fault recovery. If there are too many operation logs, Redis will recover slowly, affecting normal use. This is certainly not an ideal outcome. So, are there other ways to ensure reliability while still recovering quickly from downtime?

Of course there are, and one of them is the other persistence method we’re going to look at today: memory snapshots. A memory snapshot is a record of the state of the data in memory at a certain moment. It’s similar to a photo: when you take a picture of a friend, a single photo captures exactly what that friend looked like at that instant.

Redis achieves a photo-like effect by writing its state at a given moment to disk as a file; this is the snapshot. Even if Redis goes down, the snapshot file is not lost, so data reliability is preserved. This snapshot file is called an RDB file, where RDB stands for Redis DataBase.

Compared with AOF, RDB records the data at a certain point in time, not the operations, so during recovery we can read the RDB file directly into memory and finish quickly. That sounds great, but a memory snapshot is not automatically the best choice. Why not?

There are two key issues to consider:

What data does the snapshot cover? This determines how efficiently the snapshot executes.
Can data be added, deleted, or changed while the snapshot is being taken? This determines whether Redis is blocked during the snapshot or can keep handling requests normally.

This may sound abstract, so let me use photography as an example again. When we take photos, we usually pay attention to two things:

How do we frame the scene? That is, who and what do we intend to photograph?
Will the subject move? Before pressing the shutter, we remind our friend to stay still; otherwise the picture will be blurred.

As you can see, both questions matter. So let’s go through them in detail, starting with “framing”: which data a snapshot covers.

Which in-memory data does a snapshot cover?

All Redis data lives in memory, and to make all of it reliable Redis performs a full snapshot: it records all of the in-memory data to disk. This is like taking a group photo of 100 people and fitting everyone into the frame. The benefit is that every piece of data is recorded in one go, with nothing left out.

When you photograph one person, you only need to coordinate one person. When you photograph a group of 100, you need to coordinate the position and pose of all 100, which takes more time and effort. Similarly, taking a snapshot of all the data in memory and writing it to disk can take a long time. The more data there is, the larger the RDB file and the longer the write to disk takes.

Because of Redis’s single-threaded model, we should avoid any operation that blocks the main thread, so for every operation we ask the fundamental question: “Does it block the main thread?” Whether generating the RDB file blocks the main thread determines whether Redis performance suffers.

Redis provides two commands to generate RDB files, SAVE and BGSAVE.

SAVE: executed in the main thread, so it blocks;
BGSAVE: creates a child process dedicated to writing the RDB file, which avoids blocking the main thread. This is also the default way Redis generates RDB files.

At this point, we can perform the full snapshot with the BGSAVE command, which provides data reliability while avoiding a performance impact on Redis.

The next question we need to ask is, when taking a snapshot of the in-memory data, can it still “move”? That is, can the data be modified? This is important because, if the data can be modified, it means that Redis can still handle writes. Otherwise, all write operations will have to wait until the snapshot is finished, and performance will suffer.

Can data be modified during snapshot?

When we take a picture of someone, if they move, the picture is ruined and we have to take it again, so of course we want them to stay still. For memory snapshots, we don’t want the data to “move” either.

Let me give you an example. Suppose we take a snapshot of memory at time t, the in-memory data is 4GB, and the disk write bandwidth is 0.2GB/s. Then the snapshot takes at least 20 seconds (4 / 0.2 = 20) to complete. If a piece of in-memory data A that has not yet been written to disk is changed to A’ at time t+5s, the integrity of the snapshot is damaged, because A’ is not the state at time t. So, just as with a photo, we don’t want the data to “move” (that is, to be modified) while the snapshot is being taken.
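The arithmetic in this example can be written out as a small sketch. The function names are hypothetical, and the model assumes the disk is written at a constant rate, which real disks only approximate.

```python
def snapshot_write_seconds(data_gb, disk_write_gb_per_s):
    """Lower bound on the time needed to write a full snapshot to disk."""
    if disk_write_gb_per_s <= 0:
        raise ValueError("disk write bandwidth must be positive")
    return data_gb / disk_write_gb_per_s


def unwritten_gb(data_gb, disk_write_gb_per_s, elapsed_s):
    """Data still waiting to be written after elapsed_s seconds.

    Anything in this remainder is exposed to modification if writes
    are allowed to touch the data while the snapshot is in progress.
    """
    written = min(data_gb, disk_write_gb_per_s * elapsed_s)
    return data_gb - written


total = snapshot_write_seconds(4, 0.2)   # the 4GB / 0.2GB/s case from the text
pending = unwritten_gb(4, 0.2, 5)        # what is still unwritten at t+5s
print(f"snapshot takes at least {total:.0f}s; {pending:.0f}GB unwritten at t+5s")
```

Under these assumptions, three quarters of the data has not reached the disk when A changes at t+5s, so a change at that moment is far more likely than not to hit data the snapshot has not captured yet.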

However, there are potential problems if the data cannot be modified during snapshot execution. In the example above, if the 4GB of data cannot be modified within 20 seconds of making the snapshot, Redis cannot handle the writes to the data, which will have a significant impact on the business services.

You might think that BGSAVE avoids blocking, so there is no problem. This is a common misconception: not blocking the main thread is not the same as being able to handle writes normally. With BGSAVE the main thread is indeed not blocked and can receive requests, but if snapshot integrity were guaranteed simply by forbidding changes, the main thread could only serve reads, because it could not modify the data being snapshotted.

Pausing writes for the sake of a snapshot is certainly not acceptable. Instead, Redis relies on the copy-on-write (COW) mechanism provided by the operating system so that it can keep processing write operations while a snapshot is being taken.

In simple terms, the BGSAVE child process is created by the main thread through fork and shares all of the main thread’s memory data. Once the child process is running, it reads the main thread’s memory data and writes it to the RDB file.

At this point, if the main thread only reads the data (for example, key-value pair A in the figure), the main thread and the BGSAVE child do not affect each other. However, if the main thread modifies a piece of data (such as key-value pair C in the figure), the operating system first copies the memory page holding that data. The main thread’s change is applied to one copy, while the BGSAVE child keeps reading the other copy, which still holds the value as it was when the snapshot started, and writes that value to the RDB file.

This ensures snapshot integrity and allows the main thread to modify data at the same time, avoiding impact on normal services.
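The effect described above can be observed with a minimal sketch that uses the operating system’s fork directly. This is not Redis code: `snapshot_with_fork` is a hypothetical helper, it assumes a Unix-like system, and it stands in for BGSAVE by having the child serialize a small dictionary to JSON instead of an RDB file. The child deliberately waits until the parent has modified the data, to show that it still sees the values from the moment of the fork.

```python
import json
import os


def snapshot_with_fork(data, mutate):
    """Fork a child that serializes `data` while the parent mutates it.

    Returns (snapshot_seen_by_child, data_in_parent_after_mutation).
    Hypothetical helper for illustration; requires a Unix-like OS.
    """
    snap_r, snap_w = os.pipe()    # child -> parent: serialized snapshot
    go_r, go_w = os.pipe()        # parent -> child: "I have modified the data"
    pid = os.fork()
    if pid == 0:
        # Child: plays the role of the BGSAVE child process.
        status = 1
        try:
            os.close(snap_r)
            os.close(go_w)
            os.read(go_r, 1)      # wait until the parent has changed its data
            os.write(snap_w, json.dumps(data).encode())
            status = 0
        finally:
            os._exit(status)      # never fall through into the parent's code
    # Parent: plays the role of the main thread.
    os.close(snap_w)
    os.close(go_r)
    mutate(data)                  # a write arrives while the snapshot is running
    os.write(go_w, b"x")
    os.close(go_w)
    chunks = []
    while True:
        chunk = os.read(snap_r, 4096)
        if not chunk:             # the child closed its end: snapshot complete
            break
        chunks.append(chunk)
    os.close(snap_r)
    os.waitpid(pid, 0)
    return json.loads(b"".join(chunks)), data


snapshot, live = snapshot_with_fork({"A": 1, "C": 3}, lambda d: d.update(C=99))
print("snapshot:", snapshot)      # the child's view, taken at fork time
print("live:", live)              # the parent's view after the write
```

The parent never waits for the snapshot before applying its write, and the snapshot still reflects the state at the time of the fork. The operating system only copies the pages that are actually written, which is why the approach is cheap as long as writes during the snapshot touch a small part of memory.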

To sum up, Redis uses BGSAVE to take a snapshot of all the data currently in memory. The work is done in the background by the child process, which leaves the main thread free to modify the data at the same time.

Now, let’s look at another question: how often should we take snapshots? In photography there is a technique called “continuous shooting” (burst mode), which records the state of a person or an object at several consecutive moments. So, are snapshots also suitable for “continuous shooting”?

Can I take a snapshot every second?

For snapshots, “continuous shooting” means taking snapshots one after another. The interval between snapshots then becomes very short, so even if an outage occurs, little data is lost because a snapshot was taken just a moment earlier. The length of that interval is what matters.

As shown in the figure below, we took a snapshot at T0 and then another at T0+t, and data blocks 5 and 9 were modified in between. If the machine goes down during the interval t, Redis can only be restored from the snapshot taken at T0. The modified values of data blocks 5 and 9 cannot be recovered, because no snapshot recorded them.
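The loss window can be made concrete with a small sketch. It assumes, purely for illustration, that snapshots are instantaneous and taken at T0, T0+t, T0+2t, and so on; the function name and the sample write times are hypothetical.

```python
def lost_writes(writes, snapshot_interval, crash_time):
    """Return the writes that cannot be recovered after a crash.

    writes: list of (time, label) pairs, with times measured from T0.
    Assumes instantaneous snapshots at 0, interval, 2 * interval, ...
    A write is lost if it happened after the last snapshot before the crash.
    """
    if snapshot_interval <= 0:
        raise ValueError("snapshot interval must be positive")
    last_snapshot = (crash_time // snapshot_interval) * snapshot_interval
    return [label for time, label in writes if last_snapshot < time <= crash_time]


# Blocks 5 and 9 are modified between the snapshots at T0 and T0+t (t = 10).
writes = [(3, "block 5"), (7, "block 9")]

print(lost_writes(writes, snapshot_interval=10, crash_time=8))   # crash before T0+t
print(lost_writes(writes, snapshot_interval=10, crash_time=12))  # crash after T0+t
print(lost_writes(writes, snapshot_interval=5, crash_time=8))    # shorter interval
```

In this model, shrinking the interval from 10 to 5 means the crash at time 8 loses only the change to block 9 instead of both changes.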

Therefore, to recover as much data as possible, t should be as small as possible; the smaller t is, the closer we get to “continuous shooting”. So how small can t be? Could we, say, take one snapshot per second? After all, each snapshot is executed in the background by the BGSAVE child and does not block the main thread.

That reasoning is wrong. Although running BGSAVE does not block the main thread, performing full snapshots frequently has two costs.

On the one hand, frequently writing the full data set to disk puts a lot of pressure on the disk. Multiple snapshots compete for limited disk bandwidth: before one snapshot has finished, the next one starts, and a vicious circle forms.

On the other hand, the BGSAVE child process has to be created by the main thread through fork. Although the child does not block the main thread once it exists, the fork call itself blocks the main thread, and the more memory the main thread’s process uses, the longer the block lasts. If BGSAVE child processes are forked frequently, the main thread is blocked frequently.
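One reason the fork pause grows with memory size is that, on Linux, fork does not copy the data pages but does copy the process’s page tables. The rough estimate below assumes 4 KiB pages and 8 bytes per page-table entry (typical for x86-64 Linux without huge pages), counts only the lowest-level entries, and uses hypothetical names; treat the result as an order-of-magnitude figure, not a measurement.

```python
PAGE_SIZE_BYTES = 4 * 1024   # assumption: 4 KiB pages, no huge pages
PTE_SIZE_BYTES = 8           # assumption: 8 bytes per page-table entry
GIB = 1024 ** 3
MIB = 1024 ** 2


def page_table_bytes(resident_bytes):
    """Approximate size of the page-table entries that fork has to copy."""
    pages = -(-resident_bytes // PAGE_SIZE_BYTES)   # ceiling division
    return pages * PTE_SIZE_BYTES


for size_gib in (2, 4, 24):
    table_mib = page_table_bytes(size_gib * GIB) / MIB
    print(f"{size_gib} GiB of data -> about {table_mib:.0f} MiB of page tables to copy")
```

The estimate scales linearly with the amount of memory in use, which matches the observation that a larger instance is blocked for longer on every fork.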

To avoid these costs, one option is an incremental snapshot: after a full snapshot has been created, subsequent snapshots record only the data that has been modified, which avoids the overhead of a full snapshot every time.

After the first full snapshot, we only need to write the modified data to the snapshot file at time T1 and time T2. However, to do this, we need to remember which data has been modified. Don’t underestimate this “remember” feature, which requires us to use additional metadata information to keep track of what data has been changed, which can cause additional space overhead issues. As shown below:

If we keep a record for every modified key-value pair, then 10,000 modified key-value pairs need 10,000 additional records. Moreover, a key-value pair is sometimes very small, say only 32 bytes, while the metadata recording that it was modified may take 8 bytes. In that case, “remembering” the changes adds a large amount of extra space relative to the data itself. For Redis, where memory is a precious resource, this is not worth the cost.
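The space cost in this example works out as follows. This is a sketch of the arithmetic only; the function name is hypothetical, and the 8-byte record size is the illustrative figure from the text, not a measured Redis value.

```python
def tracking_overhead(num_modified, value_bytes, record_bytes):
    """Extra memory needed to remember which key-value pairs were modified.

    Returns (extra_bytes, overhead_ratio), where overhead_ratio is the
    tracking metadata expressed as a fraction of the data it describes.
    """
    if value_bytes <= 0:
        raise ValueError("value size must be positive")
    extra_bytes = num_modified * record_bytes
    overhead_ratio = record_bytes / value_bytes
    return extra_bytes, overhead_ratio


extra, ratio = tracking_overhead(num_modified=10_000, value_bytes=32, record_bytes=8)
print(f"{extra} extra bytes, {ratio:.0%} on top of the tracked data")
```

With 32-byte key-value pairs and 8-byte records, the bookkeeping adds a quarter of the size of the data it tracks.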

Compared with AOF, snapshot recovery is faster, but the right snapshot frequency is hard to find. If the frequency is too low, more data may be lost when a crash happens between two snapshots. If the frequency is too high, there is extra overhead. So is there a way to keep RDB’s fast recovery while losing as little data as possible at a low cost?

Redis 4.0 proposes a hybrid approach using AOF logging and memory snapshots. In simple terms, memory snapshots are taken at a certain frequency, and AOF logs are used to record all command actions between snapshots.

This way, snapshots don’t have to be executed very frequently, which avoids the impact of frequent forks on the main thread. In addition, the AOF log only records the operations between two snapshots rather than all operations, so the file does not grow too large and the overhead of rewriting the AOF is avoided.

As shown in the following figure, changes made at time T1 and time T2 are recorded in the AOF log. After the second full snapshot is taken, the AOF log can be cleared, because those changes are now captured in the snapshot and the log entries will no longer be needed for recovery.
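Recovery under this hybrid scheme can be sketched as “load the snapshot, then replay the short log on top of it”. The toy model below is not Redis’s file format or replay code: the `recover` function, the dictionary standing in for the RDB file, and the two-command log are all assumptions made for illustration.

```python
def recover(snapshot, aof_tail):
    """Rebuild the dataset from a snapshot plus the commands logged after it.

    snapshot: dict holding the data as of the last full snapshot.
    aof_tail: list of ("SET", key, value) or ("DEL", key) tuples recorded
              after that snapshot, in execution order.
    """
    data = dict(snapshot)                 # fast path: load the snapshot as is
    for op, key, *args in aof_tail:       # slow path: replay only the tail
        if op == "SET":
            data[key] = args[0]
        elif op == "DEL":
            data.pop(key, None)
        else:
            raise ValueError(f"unsupported command in log: {op}")
    return data


snapshot = {"a": "1", "b": "2"}                   # state at the last snapshot
aof_tail = [("SET", "a", "10"), ("DEL", "b"),     # changes at T1
            ("SET", "c", "3")]                    # change at T2
print(recover(snapshot, aof_tail))
```

Only the commands logged since the last snapshot are replayed, so recovery time depends mostly on the snapshot load, while the changes made at T1 and T2 are still not lost.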

This method keeps the fast recovery of RDB files as well as the simplicity of AOF, which only records operation commands. It really is a case of having it both ways, and I recommend using it in practice.

Summary

In this lesson, we learned about the memory snapshot method Redis uses to avoid data loss. Its advantage is that the database can be restored quickly by reading the RDB file directly into memory, which avoids the slow recovery of AOF, where operation commands have to be re-executed one by one.

However, memory snapshots have their limitations. A snapshot is a “group photo” of all of memory, which inevitably takes time and effort. Although Redis uses BGSAVE and copy-on-write to minimize the impact of snapshots on normal reads and writes, frequent snapshots are still not acceptable. Using RDB and AOF together combines the advantages of both and avoids their disadvantages, ensuring data reliability with a small performance overhead.

Finally, I would like to give you three suggestions on the selection of AOF and RDB:

When data must not be lost, a combination of memory snapshots and AOF is a good choice;
If data loss on the order of minutes is acceptable, you can use RDB alone;
If you use AOF alone, prefer the everysec configuration option, because it strikes a balance between reliability and performance.

One question per lesson

I once came across a scenario where we were running Redis on a cloud host with a 2-core CPU, 4GB of RAM, and a 500GB disk. The Redis database was about 2GB in size and we used RDB for persistence. At the time, the load on Redis was mainly modification operations, with a write/read ratio of about 8:2; that is, out of 100 requests, 80 were modifications. Do you see any risk in using RDB for persistence in this scenario? Can you analyze it?

This concludes our discussion of persistence, which is fundamental to mastering Redis. I recommend studying these two lessons carefully. If you found them useful, please share them with others so that more people can solve their persistence problems.
