Quickly understand Redis persistence


Why persistence is needed

The reason is simple: Redis keeps its data in memory. If the data is not persisted, it cannot be restored after the server restarts or goes down. To keep the data safe, the data in memory has to be persisted to disk.

Persistence of Redis

Redis provides two methods of persistence, namely RDB and AOF.

RDB: Redis's default persistence mode, based on snapshots. When certain conditions are met, Redis automatically takes a snapshot of the data in memory and persists it to disk.
AOF: Redis does not enable AOF persistence by default; you turn it on by setting `appendonly yes` in the configuration file. The AOF stores the sequence of write commands that Redis has executed, in order.

RDB persistence

Save blocking mode

There is a command in Redis that can trigger RDB persistence, but this operation blocks Redis.

This command is `save`. Redis processes commands on a single thread, so while `save` runs no other command can be served and Redis is briefly unavailable. If the RDB file is large, writing it out can take tens of seconds, which seriously hurts availability. For this reason we generally don't use the `save` command.

Bgsave Background mode

As the name suggests, `bgsave` saves in the background. Redis handles this command in a special way.

What special treatment?

First, the main Redis process calls glibc's `fork` to create a child process. The child writes the RDB file, while the parent continues to handle client requests (`bgsave` returns immediately). Note that the `fork` call itself may briefly block client commands in the main process.
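To make the fork-based flow concrete, here is a minimal Python sketch, assuming a POSIX system where `os.fork` is available. The helper name `bgsave_sketch` and the JSON output are hypothetical stand-ins: real Redis writes a binary RDB file. The child writes to a temporary file and renames it into place so a half-written dump is never visible, while the parent returns at once.

```python
import json
import os
import tempfile


def bgsave_sketch(data: dict, path: str) -> int:
    """Fork a child that dumps `data` to `path`; the parent returns the child's pid at once.

    Hypothetical helper: JSON stands in for Redis's binary RDB format.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: its memory is a copy of the parent's at fork time.
        status = 0
        try:
            tmp_path = path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(data, f)
            os.replace(tmp_path, path)  # atomic rename: readers never see a partial file
        except Exception:
            status = 1
        os._exit(status)  # exit immediately without running the parent's cleanup handlers
    # Parent process: does not wait, so it can keep serving requests.
    return pid


if __name__ == "__main__":
    db = {"a": "1"}
    target = os.path.join(tempfile.mkdtemp(), "dump.json")
    child = bgsave_sketch(db, target)
    db["a"] = "2"  # the parent keeps modifying data while the child saves
    os.waitpid(child, 0)
    with open(target) as f:
        print(json.load(f))  # {'a': '1'}: the child saved the fork-time value
```

The parent's change after the fork never reaches the file, which is exactly the snapshot property the next section relies on.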

Everything has trade-offs. `bgsave` looks great, but does it have a downside? Yes. To see where it lies, let's look at COW first.

COW

Copy on Write (COW) means that data is copied only when it is written. RDB persistence has to traverse the data in memory, as shown in the figure below.

The child process needs the data exactly as it was at the moment the child was created (a snapshot). If the main process modified that memory while serving requests and the child saw those changes, what the child traversed would no longer be a snapshot.

This is why COW is used to produce the snapshot. The data segment above is actually made up of many operating-system pages. With COW, when the main process needs to modify data in memory, the page holding that data is copied first and the change is made on the copy, so the child keeps seeing the original page. Copying pages consumes extra memory, which is why `bgsave` uses more memory than `save`. This is usually not a big concern: during the save, write requests typically touch only a small share of the pages, so the extra memory is modest.
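The page-level behaviour can be simulated in a few lines of Python. This is a toy model, not how the kernel implements it: the class name `CowMemory`, the tiny 4-value pages and the `copies` counter are all hypothetical. `fork()` copies only the page table, so both sides share every page; the parent's first write to a shared page duplicates that page before changing it, so the snapshot never sees the change.

```python
class CowMemory:
    """Toy copy-on-write memory: a list of pages, each page a list of values."""

    def __init__(self, values, page_size=4):
        self.page_size = page_size
        self.pages = [list(values[i:i + page_size]) for i in range(0, len(values), page_size)]
        self.shared = set()  # indexes of pages still shared with a snapshot
        self.copies = 0      # how many pages had to be duplicated

    def fork(self):
        """Return a read-only snapshot that shares every page with this memory."""
        snapshot = CowMemory([], self.page_size)
        snapshot.pages = list(self.pages)  # copies the page table, not the pages
        self.shared = set(range(len(self.pages)))
        return snapshot

    def write(self, index, value):
        page_no, offset = divmod(index, self.page_size)
        if page_no in self.shared:
            # Copy on write: duplicate the shared page before modifying it.
            self.pages[page_no] = list(self.pages[page_no])
            self.shared.discard(page_no)
            self.copies += 1
        self.pages[page_no][offset] = value

    def read_all(self):
        return [v for page in self.pages for v in page]


mem = CowMemory(list(range(12)))  # 3 pages of 4 values
snap = mem.fork()
mem.write(1, 100)
mem.write(2, 200)                 # same page as index 1: no second copy
mem.write(9, 900)                 # a different page: one more copy
print(snap.read_all())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(mem.copies)       # 2
```

Only the two pages that were written got copied; the untouched page stays shared, which is why the extra memory grows with the write rate rather than with the dataset size.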

When RDB is triggered automatically

Redis performs RDB persistence automatically in several situations, so an RDB file may be produced even if you have turned RDB off in your configuration file.

When the save conditions are met

There are three configurations in the Redis configuration file.
```
save 900 1
save 300 10
save 60 10000
```
The format is `save <seconds> <changes>`: the first number after `save` is a time window in seconds and the second is a number of changes. These three lines mean that Redis automatically performs RDB persistence if at least 1 change happens within 900 seconds, at least 10 changes within 300 seconds, or at least 10,000 changes within 60 seconds.
When Redis is shut down normally, it performs RDB persistence.
When `flushall` is executed, it produces an empty RDB file.
When master/replica replication performs a full copy (I'll cover this in a later article on Redis clusters).
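The three `save` lines above can be modelled as a simple check that runs periodically. This is a simplified sketch with hypothetical names (`SAVE_RULES`, `should_bgsave`). As far as I know, Redis fires a rule once at least `changes` writes have been made and more than `seconds` seconds have passed since the last save; Redis also applies retry rules after a failed `bgsave`, which are omitted here.

```python
# Each rule is (seconds, changes), mirroring "save 900 1", "save 300 10", "save 60 10000".
SAVE_RULES = [(900, 1), (300, 10), (60, 10000)]


def should_bgsave(seconds_since_last_save, changes_since_last_save, rules=SAVE_RULES):
    """Return True if any save rule is satisfied (simplified model of the save check)."""
    return any(
        changes_since_last_save >= changes and seconds_since_last_save > seconds
        for seconds, changes in rules
    )


print(should_bgsave(901, 1))     # True: rule "save 900 1"
print(should_bgsave(120, 5))     # False: too few changes for the 60 s and 300 s rules
print(should_bgsave(61, 10000))  # True: rule "save 60 10000"
```

Because the rules are OR-ed together, a busy instance snapshots often (the 60-second rule) while an almost idle one still snapshots eventually (the 900-second rule).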

The advantages of RDB

The RDB file is compressed and stores the data compactly, so disaster recovery from it is fast.
It is well suited for cold backups.

The disadvantage of RDB

Data can be lost easily. Because RDB has to traverse all the data in memory, each RDB run is expensive. To keep Redis fast you need to run RDB as rarely as possible, so any data written since the last snapshot may be lost.

AOF persistence

Redis does not have AOF persistence enabled by default. You need to configure this in the configuration file.

```
appendonly yes
# AOF file directory
dir /
# AOF file name
appendfilename "appendonly-6379.aof"
```

Once AOF persistence is enabled, Redis writes the AOF to disk according to the policy set by `appendfsync`.

```
# fsync to disk after every write command
# appendfsync always
# fsync to disk once per second
appendfsync everysec
# never fsync explicitly; let the operating system decide when to flush
# appendfsync no
```

Redis provides these three options in total. `everysec`, which writes to disk once per second, is usually the recommended choice because it balances performance against data loss.
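A minimal sketch of how the three `appendfsync` policies differ, assuming one plain-text command per line (real AOF files use the RESP protocol format). The class `AofWriter` and its `fsync_count` counter are hypothetical. Real Redis performs the `everysec` fsync in a background thread rather than inside the write path as done here; the sketch only shows when data is forced to disk under each policy.

```python
import os
import tempfile
import time


class AofWriter:
    """Append commands to a file, forcing them to disk according to an appendfsync-like policy."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.file = open(path, "ab")
        self.last_fsync = time.monotonic()
        self.fsync_count = 0

    def append(self, command):
        self.file.write((command + "\n").encode())
        self.file.flush()  # hand the bytes to the OS page cache (not yet durable)
        now = time.monotonic()
        if self.policy == "always" or (
            self.policy == "everysec" and now - self.last_fsync >= 1.0
        ):
            os.fsync(self.file.fileno())  # force the data onto disk
            self.last_fsync = now
            self.fsync_count += 1
        # policy "no": never fsync here; the operating system decides when to flush

    def close(self):
        self.file.close()


aof_path = os.path.join(tempfile.mkdtemp(), "appendonly.aof")
writer = AofWriter(aof_path, policy="always")
for cmd in ["set a 1", "set b 2"]:
    writer.append(cmd)
writer.close()
print(writer.fsync_count)  # 2: one fsync per command under "always"
```

With `always` every command pays for an fsync; with `no` nothing is forced; `everysec` caps the unsynced window at roughly one second, which is why it is the usual compromise.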

AOF principle

The AOF log stores the write commands Redis has executed, so we can replay the AOF log to rebuild Redis's data.

For example, if the AOF log records the commands `set Hello World` and `sadd userset FancisQ`, replaying the AOF file against an empty Redis instance leaves that instance holding both records.
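Replay can be sketched as reading the logged commands in order and applying them to an empty in-memory store. This assumes one whitespace-separated command per line and supports only `set` and `sadd`; the function name `replay` is hypothetical, and real AOF files are in the RESP format.

```python
def replay(aof_lines):
    """Rebuild a dataset by applying logged commands, in order, to an empty store."""
    db = {}
    for line in aof_lines:
        command, *args = line.split()
        command = command.lower()  # Redis command names are case-insensitive
        if command == "set":
            key, value = args
            db[key] = value
        elif command == "sadd":
            key, *members = args
            db.setdefault(key, set()).update(members)
        else:
            raise ValueError(f"unsupported command in this sketch: {command}")
    return db


print(replay(["set Hello World", "sadd userset FancisQ"]))
# {'Hello': 'World', 'userset': {'FancisQ'}}
```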

You may notice that Redis's two persistence methods resemble MySQL's binlog and redo log. Note, however, that Redis AOF executes a command first and writes the log afterwards, which is the opposite of MySQL's write-ahead logging (WAL).

Why is that? I think there are two reasons.

Redis has weak transactional guarantees, so it does not need to guarantee strong data consistency. MySQL uses two-phase commit of the redo log to be crash-safe, which Redis clearly does not need. If the server goes down after executing a command but before logging it, that write is simply lost; because transactions are weak, Redis does not have to guarantee that the data survives.
To avoid logging invalid commands. If the log were written first, no logic processing or argument validation would have happened yet, so many invalid commands would be recorded. Since the AOF file needs to be slimmed down periodically, these invalid commands would make that slimming much more troublesome.
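The execute-first, log-afterwards order can be sketched as follows. `MiniRedis` and its error strings are hypothetical simplifications (real Redis error messages differ in detail); the point is that a command reaches the AOF list only after it has been validated and executed successfully, so malformed or failing commands never pollute the log.

```python
class MiniRedis:
    """Toy server that executes a command first and logs it only if it succeeded."""

    def __init__(self):
        self.db = {}
        self.aof = []  # stands in for the AOF file

    def execute(self, line):
        parts = line.split()
        if not parts:
            return "ERR empty command"
        command, args = parts[0].lower(), parts[1:]
        # 1. Validate and execute.
        if command == "set":
            if len(args) != 2:
                return "ERR wrong number of arguments for 'set'"
            self.db[args[0]] = args[1]
            result = "OK"
        elif command == "incr":
            if len(args) != 1:
                return "ERR wrong number of arguments for 'incr'"
            try:
                value = int(self.db.get(args[0], "0")) + 1
            except ValueError:
                return "ERR value is not an integer"
            self.db[args[0]] = str(value)
            result = value
        else:
            return f"ERR unknown command '{parts[0]}'"
        # 2. Only a command that succeeded is appended to the log.
        self.aof.append(line)
        return result


server = MiniRedis()
for cmd in ["set a 1", "set a", "set s x", "incr s", "incr a", "foo"]:
    server.execute(cmd)
print(server.aof)  # ['set a 1', 'set s x', 'incr a']
```

Note that `incr s` only fails at execution time (its value is not a number), which a log-first design could not have detected before writing the entry.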

AOF rewrite

The slimming mentioned above is the AOF rewrite. The AOF file stores commands in the order they were executed, so when Redis runs for a long time it accumulates a huge number of commands.

For example: `set a b`, `set a c`, `set a d`, ...

All three of these commands operate on key a. An RDB file would only need to store a = d, but because AOF records commands, it has to keep all three even though the earlier ones are meaningless. This wastes a lot of space and makes AOF replay slower.
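The idea behind slimming can be sketched as: rebuild the final state, then emit one command per key. As far as I know, Redis's rewrite works from the in-memory dataset rather than by parsing the old file, so the replay step here only stands in for "the current memory state". The function name `rewrite_aof` is hypothetical and only `set` is supported.

```python
def rewrite_aof(aof_lines):
    """Return a shorter command list that rebuilds the same final state (set-only sketch)."""
    state = {}
    for line in aof_lines:
        command, key, value = line.split()
        if command.lower() != "set":
            raise ValueError(f"unsupported command in this sketch: {command}")
        state[key] = value  # later commands overwrite earlier ones
    # One command per key is enough to recreate the final state.
    return [f"set {key} {value}" for key, value in state.items()]


print(rewrite_aof(["set a b", "set a c", "set a d"]))  # ['set a d']
```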

So Redis automatically rewrites the AOF when the file grows too large (that is, when certain conditions are met). Two options in the configuration file control this.

```
# Rewrite the AOF automatically once it has grown by this percentage
# compared with its size after the last rewrite (or its size at startup
# if no rewrite has happened yet).
auto-aof-rewrite-percentage 100
# Do not rewrite automatically until the AOF is larger than this size.
auto-aof-rewrite-min-size 64mb
```
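The two options combine roughly as in the check below. This is a simplified sketch with a hypothetical function name; sizes are in bytes, and `base_size` stands for the AOF size after the last rewrite (or at startup). Setting the percentage to 0 disables automatic rewriting.

```python
MB = 1024 * 1024


def should_rewrite(current_size, base_size, percentage=100, min_size=64 * MB):
    """Decide whether an automatic AOF rewrite is due (simplified sketch)."""
    if percentage == 0:           # auto-aof-rewrite-percentage 0 disables the feature
        return False
    if current_size <= min_size:  # too small to be worth rewriting yet
        return False
    base = base_size or 1         # avoid dividing by zero when no base size is known
    growth = current_size * 100 // base - 100  # growth in percent over the base size
    return growth >= percentage


print(should_rewrite(128 * MB, 64 * MB))  # True: grew by 100%
print(should_rewrite(100 * MB, 64 * MB))  # False: grew by about 56%
print(should_rewrite(32 * MB, 1 * MB))    # False: still below 64mb
```

The minimum size keeps small instances from rewriting over and over while their file is still tiny.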

So how is AOF rewritten?

bgrewriteaof

`bgrewriteaof` is the AOF rewrite command. Like `bgsave`, it makes Redis fork a child process, which rewrites the AOF file. The general process is as follows:

Advantages and disadvantages of AOF

Disadvantage: for the same amount of data, the AOF file is much larger than the RDB file, so restoring memory state from it takes a long time.
Advantage: data is persisted promptly, which reduces data loss. With the `everysec` setting, at most about one second of writes is lost.

Redis hybrid persistence

Before Redis 4.0, we usually enabled AOF and replayed the AOF log whenever memory state had to be restored (restoring from RDB alone would lose a lot of data). However, if the Redis instance holds a lot of data, the AOF file is large and restarting Redis becomes very slow.

To solve this, since Redis 4.0 the RDB snapshot and the incremental AOF log can be stored together. To restore memory state, Redis first loads the RDB part and then replays only the incremental AOF log written after that snapshot, which greatly reduces recovery time.
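Recovery from a hybrid file can be sketched in two steps: load the snapshot, then replay only the commands logged after it. In Redis this layout is controlled by the `aof-use-rdb-preamble` option and the first part is in binary RDB format; here a plain dict stands in for the RDB part, and the function name `load_hybrid` and the set/del-only command support are hypothetical.

```python
def load_hybrid(rdb_snapshot, aof_tail):
    """Restore state from an RDB-style snapshot plus the AOF commands written after it."""
    db = dict(rdb_snapshot)  # step 1: load the compact snapshot in one go
    for line in aof_tail:    # step 2: replay only the incremental commands
        command, *args = line.split()
        command = command.lower()
        if command == "set":
            key, value = args
            db[key] = value
        elif command == "del":
            for key in args:
                db.pop(key, None)
        else:
            raise ValueError(f"unsupported command in this sketch: {command}")
    return db


print(load_hybrid({"a": "1", "b": "2"}, ["set a 10", "del b", "set c 3"]))
# {'a': '10', 'c': '3'}
```

Only the short tail is replayed command by command; the bulk of the data comes from the snapshot, which is where the faster restart comes from.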

How to choose RDB and AOF

In general, if you have real data-safety requirements, use both persistence methods.
If you can tolerate losing a few minutes of data, RDB alone is enough.
For AOF, prefer the `everysec` setting, which balances data safety and performance.
For RDB, either disable the automatic `save <seconds> <changes>` rules or tune their parameters carefully.

Mind map

Thanks for reading. Here is a mind map of this article to share with you (#^.^#)