Redis persistence (AOF and RDB)

【AOF Logs】

AOF is a write-after log. That is, Redis first executes the command and updates memory, and only then appends the command to the log. Logging after execution means invalid commands are never recorded, and the log write does not block the current write operation.

However, AOF carries two risks. First, if Redis goes down right after executing a command but before logging it, that command is lost; when Redis is used directly as a database, the data cannot be recovered. Second, AOF logs are written to disk in the main thread, so if a large AOF write slows the disk, subsequent operations are blocked. The AOF mechanism therefore provides three write-back strategies, which are the three optional values of the configuration item appendfsync.

Write-back strategies:

1. Always: synchronous write-back. The log is written back to disk immediately after each command executes. Highly reliable, but it hurts main-thread performance.

2. Everysec: write back every second. After a command executes, its log is written to an in-memory buffer, and the buffer is flushed to disk once per second. Lower performance overhead, but up to a second of data can be lost on a crash.

3. No: the operating system decides when to write the logs buffered in memory back to disk. Cheapest for the main thread, but buffered commands are easily lost on a crash.
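
For concreteness, here is a minimal sketch, assuming a local Redis instance reachable through the Jedis client, of enabling AOF and picking one of these strategies at runtime with CONFIG SET (the choice of "everysec" is illustrative):

```java
import redis.clients.jedis.Jedis;

public class AofConfigDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Enable AOF persistence.
            jedis.configSet("appendonly", "yes");
            // Choose a write-back strategy: "always", "everysec", or "no".
            // "everysec" trades at most one second of data loss for throughput.
            jedis.configSet("appendfsync", "everysec");
            System.out.println(jedis.configGet("appendfsync"));
        }
    }
}
```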

But even if we choose the right write-back strategy for the system's performance needs, AOF can still be a problem. Because AOF appends to a file, the file keeps growing as write commands arrive. An oversized AOF file slows down appends, and if there is an outage, recovery replays the AOF commands one by one, so a huge file delays Redis becoming usable again. Hence AOF has a rewrite mechanism.

The AOF rewrite mechanism kicks in when the AOF file is too large: Redis builds a new AOF file from the current state of the database, writing a single command for each key-value pair. It essentially collapses the multiple commands in the old file that touched the same key into one command, reducing space. The new AOF file is written to disk by a background child process (forked when bgrewriteaof is triggered), so the main thread is not blocked by the rewrite. While the rewrite is in progress, the main thread appends new commands to both the old AOF buffer and the new AOF rewrite buffer, ensuring no logged operation is lost.
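
To show how the rewrite is driven in practice, here is a hedged sketch, again via Jedis, of configuring automatic rewrites and triggering one by hand; the thresholds are illustrative, not recommendations:

```java
import redis.clients.jedis.Jedis;

public class AofRewriteDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Auto-rewrite once the AOF has doubled since the last rewrite
            // and is at least 64 MB (illustrative thresholds).
            jedis.configSet("auto-aof-rewrite-percentage", "100");
            jedis.configSet("auto-aof-rewrite-min-size", "64mb");
            // Or trigger a rewrite manually; Redis forks a child to do the work.
            System.out.println(jedis.bgrewriteaof());
        }
    }
}
```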

However, AOF only records the stream of write commands. For fault recovery, AOF can only replay all of those commands one by one, which is slow and ties up the Redis main thread. So we also use RDB memory snapshots for faster data recovery.

【RDB snapshot】

A memory snapshot writes the state of the dataset at a given moment to disk as a file. An RDB snapshot is typically a full snapshot: it covers a large amount of data, and taking it in the main thread blocks it. So Redis provides the bgsave command: Redis forks a child process dedicated to writing the RDB file, without blocking the main thread.

However, while a snapshot is being generated, newly written data would make the snapshot inconsistent, and blocking writes instead would hurt Redis performance. Redis therefore adopts copy-on-write to keep RDB snapshots reliable.

Copy-on-write: reads by the main thread and the child process are unaffected. The forked child only copies the main thread's page table, not the data itself. When the main thread wants to modify a piece of data, the operating system copies the affected page; the main thread writes to the copy and updates its own page-table mapping, so the data the child process is snapshotting is untouched by the main thread.
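
A sketch of triggering snapshots (Jedis against a local instance; the save points are illustrative):

```java
import redis.clients.jedis.Jedis;

public class RdbSnapshotDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Automatic save points: snapshot after 900s with >=1 change,
            // or after 300s with >=10 changes (illustrative values).
            jedis.configSet("save", "900 1 300 10");
            // Manual background snapshot: Redis forks a child to write the RDB,
            // relying on copy-on-write so the main thread keeps serving requests.
            System.out.println(jedis.bgsave());
        }
    }
}
```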

[Choice of persistence mechanism]

If RDB snapshots are taken too infrequently, data written between snapshots is lost on a crash. Taking them too often puts pressure on the disk, and the fork itself blocks the main thread; the more memory the instance uses, the longer the block. So we use a mix of AOF logs and RDB snapshots: snapshots don't have to run very often, and the AOF only needs to record the increments between two snapshots, avoiding the rewrite overhead of one huge file.
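
Redis 4.0 and later support exactly this mix natively: the rewritten AOF starts with an RDB preamble and only the increments after it are appended as commands. A minimal sketch, assuming Redis >= 4.0:

```java
import redis.clients.jedis.Jedis;

public class HybridPersistenceDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.configSet("appendonly", "yes");
            // Rewrite the AOF as an RDB snapshot followed by incremental commands.
            jedis.configSet("aof-use-rdb-preamble", "yes");
        }
    }
}
```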

Redis eviction mechanism

Redis's eviction mechanism is triggered when the memory Redis uses exceeds maxmemory, or when a key-value pair reaches its expiration time.

Eviction policies (sampling-based deletion) for when the memory limit is exceeded:

No eviction: noeviction. If Redis exceeds its memory limit, any further write request simply gets an error.

[Evict only among key-value pairs with an expiration time set] (memory full, or keys expired)

① volatile-ttl: evicts according to expiration time; the earlier a key expires, the earlier it is evicted.

② volatile-random: evicts randomly among keys that have an expiration set. (Redis's periodic expiration works similarly: it randomly samples 20 keys with a TTL, deletes the expired ones, and if more than a quarter of the sample was expired, it immediately samples again.)

③ volatile-lru: least recently used. Every piece of data in Redis carries an lru field recording the timestamp of its last access. When eviction triggers, Redis randomly samples N keys as a candidate set and evicts the one with the smallest lru value. When eviction triggers again, only keys whose lru is smaller than the smallest lru already in the set may enter the candidate set; once the set reaches a threshold, the entry with the smallest lru is evicted. However, for one-off scan queries, where a large number of key-value pairs are each accessed in a single query, their lru fields are all refreshed, so LRU evicts only one of them per trigger while the other low-frequency keys linger in memory. LRU emphasizes recency of access and suits general scenarios.

④ volatile-lfu:

LFU attaches a counter to each piece of data, counting how many times it is accessed. When Redis triggers LFU eviction, the data with the lowest access count is evicted first. If two items have the same count, LFU compares their access recency and evicts the one that has gone longer without being accessed.

LFU implementation: the original lru field is split into ldt (last-access timestamp) and counter (access count); candidates are compared by count first, then by timestamp. The counter grows probabilistically rather than linearly: on each access, Redis draws a random number r in (0, 1) and increments the counter only if r < 1 / (counter * lfu_log_factor + 1), so the larger the counter and the larger lfu_log_factor, the harder it is to increment.

In addition, LFU has an lfu_decay_time decay factor that continually decays access counts, preventing data that has not been accessed for a while from keeping a high count and escaping eviction. Redis computes the time elapsed since the data was last accessed, then divides it by lfu_decay_time to get the amount to subtract from the counter.
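
A sketch of these two counter rules in plain Java; the LFU_INIT_VAL of 5 and the probabilistic formula mirror how Redis documents its LFU counter, but this is an illustration, not Redis source code:

```java
import java.util.concurrent.ThreadLocalRandom;

public class LfuCounterSketch {
    static final int LFU_INIT_VAL = 5;   // counter value given to newly cached keys

    // Probabilistic increment: the higher the counter (and lfu_log_factor),
    // the less likely another access bumps it. Saturates at 255 (8 bits).
    static int logIncr(int counter, int lfuLogFactor) {
        if (counter == 255) return counter;
        double r = ThreadLocalRandom.current().nextDouble();
        double baseval = Math.max(0, counter - LFU_INIT_VAL);
        double p = 1.0 / (baseval * lfuLogFactor + 1);
        return r < p ? counter + 1 : counter;
    }

    // Decay: subtract one from the counter for every lfu_decay_time minutes
    // elapsed since the data was last accessed.
    static int decay(int counter, long minutesIdle, int lfuDecayTime) {
        long periods = lfuDecayTime > 0 ? minutesIdle / lfuDecayTime : 0;
        return (int) Math.max(0, counter - periods);
    }
}
```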

LFU emphasizes the frequency of data access and suits scan-query scenarios.

[Evict among all data] (memory full)

① allkeys-random: evicts keys at random from the whole keyspace.

② allkeys-lru: same as volatile-lru, but over all keys.

③ allkeys-lfu: same as volatile-lfu, but over all keys.
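
Choosing among these policies is a runtime configuration decision; a minimal sketch (Jedis against a local instance; the limits are illustrative):

```java
import redis.clients.jedis.Jedis;

public class EvictionConfigDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.configSet("maxmemory", "100mb");              // eviction threshold
            jedis.configSet("maxmemory-policy", "allkeys-lfu"); // or volatile-lru, etc.
            jedis.configSet("maxmemory-samples", "5");          // sampling size N
        }
    }
}
```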

However, Redis does not automatically write modified data back to the database, and the changes are lost if the Redis copy is later evicted. So when modifying data we need to keep MySQL and Redis consistent.

Redis cache consistency with the database

Read and write cache

A read/write cache suits scenarios where reads and writes of the cache are roughly equal (Redis acting as the database, MySQL as the backup). Generally data is written to the cache and written back to the database asynchronously; when the cache is full, the eviction mechanism flushes entries down into the database. However, the asynchronous write-back to the database can fail. In the short term the business is unaffected, but once the cached copy is used up or evicted, the business is affected.

Solutions: 1) Run a full reconciliation in the small hours every night, taking Redis as the source of truth. 2) Use a message queue: put the database-update operations on the queue asynchronously; if an update succeeds, delete its message, otherwise retry it.

Read/write cache concurrency problems:

In concurrent reads and writes, because the cache is written first, the cache and the database may be temporarily inconsistent. But since concurrent reads all get the new data from the cache, the business logic is unaffected.

In concurrent writes, thread A and thread B may perform the update-cache and update-database operations in inconsistent orders. As a result, thread A's database update is overwritten by thread B's, while thread B's Redis update is overwritten by thread A's, leaving the two stores inconsistent. The only fix is to serialize each thread's pair of writes, for example with a ReentrantLock or a read/write lock.

Read-only cache

A read-only cache suits scenarios where most cache operations are reads and modifications are rare. New data is written directly to the database, not to the cache; a read that misses the cache loads the value from the database. When data is deleted or modified, the cached entry is invalidated, and the next query fetches the new value from the database. A sketch of this pattern follows.
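
This is the classic cache-aside pattern; here is a minimal read-path sketch (Jedis; loadFromDb is a hypothetical stand-in for a real DAO call, and the 60-second TTL is illustrative):

```java
import redis.clients.jedis.Jedis;

public class CacheAsideRead {
    // Hypothetical database loader standing in for a real DAO.
    static String loadFromDb(String key) { return "value-from-mysql"; }

    static String get(Jedis jedis, String key) {
        String value = jedis.get(key);
        if (value == null) {               // cache miss
            value = loadFromDb(key);       // read from the database
            jedis.setex(key, 60, value);   // repopulate the cache with a TTL
        }
        return value;
    }
}
```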

A read-only cache has consistency problems caused by deletions and modifications:

① If deleting the cached entry fails, later queries keep reading the old value from the cache; if the database delete or update fails, queries that miss the cache reload the old value from the database. Solution: temporarily store the value to be deleted in a message queue; after a successful delete, send back an ACK and remove it from the queue. If a value is still in the queue after several retries, the delete failed, and an error is reported to the service layer.

② If the cache is deleted first and the database update is delayed by network latency, other threads that read in the meantime get the old value from the database (and may repopulate the cache with it). Solution: delayed double delete. After deleting the cache and updating the database value, the thread sleeps for a short time and deletes the cache again, ensuring eventual consistency; a sketch follows. The same idea applies with the order reversed.
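
A sketch of delayed double delete (Jedis; updateDb is hypothetical, and the sleep duration must be tuned to exceed a stale read-plus-write-back round trip):

```java
import redis.clients.jedis.Jedis;

public class DelayedDoubleDelete {
    static void updateDb(String key, String value) { /* hypothetical DB update */ }

    static void update(Jedis jedis, String key, String value) throws InterruptedException {
        jedis.del(key);              // first delete
        updateDb(key, value);        // update the database
        Thread.sleep(500);           // wait out in-flight stale reads (tune this)
        jedis.del(key);              // second delete removes any stale write-back
    }
}
```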

Cache anomalies

Hot key problem

Suddenly a huge number of requests access one particular key, and the concentrated traffic can overwhelm the Redis instance holding it.

Solutions

Split the hot key across different servers in advance.

Load hot keys into local application memory in advance, so that even if Redis goes down, the hot data can still be served from local memory.

Cache avalanche

A large amount of hot data in Redis expires at the same time, so requests for that data go straight to the database; under high concurrency the database may become overwhelmed and crash. An avalanche can also be caused by a Redis instance going down.

Solutions

Refresh the cache periodically in advance, and set each expiration time to the base value plus a random offset so keys don't all expire at once.

Use service degradation: during an avalanche, suspend non-core data queries to the cache and let them return predefined information or null values directly.

Build the Redis cache as a master-slave cluster for high availability.

Cache breakdown (data not in Redis, but in the DB)

A large number of requests hit a key (or key set) that has just expired, so they pass straight through Redis to the database; Redis stops shielding the database, as if a hole had been punched in a bucket.

Solutions

If the cache is refreshed periodically, primary/secondary cache polling can prevent breakdown: when adding an entry, populate both a primary and a secondary cache; when refreshing, update the secondary cache first. A query reads the primary cache first and falls back to the secondary cache on a miss.

Lock the update: for example, if a key does not exist in Redis, acquire a lock, query the database, write the value to the cache, return it to the user, and release the lock. Subsequent requests can then retrieve the data from the cache; a sketch follows.
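
A sketch of the lock-update approach (Jedis; loadFromDb, the TTLs, and the retry delay are illustrative, and the plain DEL release is adequate only for a sketch; the Lua-based release described in the distributed-lock section below is safer):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class BreakdownLockUpdate {
    static String loadFromDb(String key) { return "value-from-mysql"; } // hypothetical

    static String get(Jedis jedis, String key) throws InterruptedException {
        String value = jedis.get(key);
        while (value == null) {
            String lockKey = "lock:" + key;
            // Only one caller rebuilds the cache entry.
            if ("OK".equals(jedis.set(lockKey, "1", SetParams.setParams().nx().px(3000)))) {
                try {
                    value = loadFromDb(key);
                    jedis.setex(key, 60, value);
                } finally {
                    jedis.del(lockKey);
                }
            } else {
                Thread.sleep(50);          // another caller is rebuilding; retry shortly
                value = jedis.get(key);
            }
        }
        return value;
    }
}
```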

Cache penetration (not in Redis or DB)

In a malicious external attack, a large number of requests access data that exists neither in Redis nor in the database, so they all fall through to the database.

Solutions

Use a Bloom filter to solve cache penetration: it can quickly judge whether an element might be in the database. When data is stored in the database, add its key to the Bloom filter; if the filter reports a requested key as definitely absent, return immediately instead of querying the database. [Redis → Bloom filter → MySQL]
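
A sketch using Guava's BloomFilter as the filter in front of the database (the expected insertions and false-positive rate are illustrative):

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class PenetrationFilter {
    // Sized for ~1M keys with a 1% false-positive rate (illustrative).
    static final BloomFilter<String> SEEN = BloomFilter.create(
            Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

    static void onDbInsert(String key) { SEEN.put(key); }  // keep filter in sync with MySQL

    static boolean mightExist(String key) {
        // "false" is definitive: the key is not in the DB, reject without querying.
        return SEEN.mightContain(key);
    }
}
```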

Redis distributed locks

Distributed lock based on a single Redis node:

Use a String key-value pair as the distributed lock, with the value set to 1 when the lock is acquired and 0 when it is released. Because a single Redis node executes commands single-threaded, requests from multiple clients are processed serially.

Acquiring a lock would otherwise take three separate steps (read, check, set), so we use the SETNX semantics to acquire the lock and the DEL command to release it. In practice this is SET key value NX PX 10000 (create only if absent, with a 10-second expiration), or in Spring Data Redis, redisTemplate.opsForValue().setIfAbsent(k, v, time, timeUnit): the key-value pair is created and set only if it does not already exist. Atomicity is guaranteed here, i.e. the SETNX and EXPIRE happen as one atomic operation.

However, if a client crashes before executing DEL, the lock is never released and we get a deadlock, so we give the lock variable an expiration time (EXPIRE key, or the PX option above). Also, to distinguish lock variables from different clients and prevent one client from releasing a lock it does not own, each client should set the value to its own unique identifier when acquiring the lock; a sketch follows.
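
A sketch of acquiring such a lock with a unique value (Jedis; a UUID serves as the client identity, and the 10-second expiry matches the example above):

```java
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class SimpleRedisLock {
    static String tryLock(Jedis jedis, String lockKey) {
        String token = UUID.randomUUID().toString();   // unique per acquisition
        // SET key value NX PX 10000: create only if absent, expire in 10s.
        String result = jedis.set(lockKey, token, SetParams.setParams().nx().px(10_000));
        return "OK".equals(result) ? token : null;     // token proves ownership
    }
}
```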

Using Lua scripts with distributed locks

But a problem remains. Just as we are about to DEL our lock, it happens to expire, and another request acquires a lock under the same key; our DEL then mistakenly deletes someone else's lock. In other words, the ownership check and the DEL are not atomic. So we use a Lua script for atomicity, as sketched below.
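
The standard check-and-delete script, executed atomically by Redis; a sketch via Jedis eval:

```java
import java.util.Collections;
import redis.clients.jedis.Jedis;

public class LuaUnlock {
    static final String UNLOCK_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "  return redis.call('del', KEYS[1]) " +
            "else " +
            "  return 0 " +
            "end";

    static boolean unlock(Jedis jedis, String lockKey, String token) {
        // The script runs atomically: no one can grab the lock between GET and DEL.
        Object result = jedis.eval(UNLOCK_SCRIPT,
                Collections.singletonList(lockKey),
                Collections.singletonList(token));
        return Long.valueOf(1L).equals(result);
    }
}
```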

Highly reliable distributed lock over multiple independent Redis instances (the Redlock algorithm):

Basic idea: the client requests the lock from N independent Redis instances in turn; if it manages to lock more than half of them, it has successfully acquired the distributed lock.

Steps:

① The client obtains the current time.

② The client executes the lock operation on the N Redis instances in sequence (using the SET NX plus Lua approach above).

③ When the client has finished the lock attempts on all N instances, it computes the total time the locking took. The lock counts as acquired only if the client locked more than half of the Redis instances AND the total elapsed time is less than the lock's validity time. If both hold, the lock's effective validity becomes the initial validity minus the total locking time; we can then compare that remaining validity with the time our operation needs, and decline the lock if it is not enough. If either condition fails, the client releases the lock on every instance in turn.

The remaining problem is that we do not know in advance how long the client needs to hold the distributed lock.

If the expiration is set too short, Redis may release the lock before the client finishes its work; if too long, a client that fails to release in time keeps other clients blocked and waiting. So we can set a short expiration on each lock, and then automatically extend it before it runs out while the client is still using it.

So how do we implement automatic renewal of the lock?

An event loop runs in a coroutine (or background thread): each time 0.x * lock_timeout elapses, it extends the lock's expiration and re-registers itself in the loop. When the client releases the lock, it posts an event that cancels the renewal loop. A Java sketch follows.
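
In Java, the same idea is usually a watchdog on a scheduler thread rather than a coroutine; a sketch where the renewal interval of lock_timeout/3 and the ownership check are illustrative (note a Jedis connection is not thread-safe, so the watchdog opens its own):

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import redis.clients.jedis.Jedis;

public class LockWatchdog {
    static final long LOCK_TIMEOUT_MS = 10_000;

    static ScheduledFuture<?> startRenewal(ScheduledExecutorService loop,
                                           String lockKey, String token) {
        long interval = LOCK_TIMEOUT_MS / 3;   // renew well before expiry
        return loop.scheduleAtFixedRate(() -> {
            try (Jedis jedis = new Jedis("localhost", 6379)) { // own connection
                // Extend only while we still own the lock.
                if (token.equals(jedis.get(lockKey))) {
                    jedis.pexpire(lockKey, LOCK_TIMEOUT_MS);
                }
            }
        }, interval, interval, TimeUnit.MILLISECONDS);
    }
    // On release: renewalFuture.cancel(false), then run the Lua unlock above.
}
```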

Redis instance failures

We can use Redis master-slave sharding to separate reads and writes: the master handles writes and synchronizes them to the slaves, and both master and slaves serve reads. Redis uses the master-slave synchronization mechanism and the sentinel mechanism to keep master and slaves consistent and to survive the master going down. A consistent hashing algorithm distributes requests uniformly across nodes.

Master-slave synchronization

In essence, consistency between master and slave data is guaranteed through the persistence files.

① When a slave instance starts, it establishes a socket connection to the master and sends the psync command to request data synchronization. After receiving it, the master forks a child process to perform a full copy and generate an RDB snapshot, while recording the incremental writes that arrive during synchronization in the replication buffer. Once generated, the RDB snapshot is sent to the slave.

② The slave receives the RDB snapshot, clears its current data, loads the RDB into memory, and then notifies the master to send it the incremental data in the replication buffer.

③ Subsequent increments are synchronized through the AOF log.

The sentinel mechanism

The sentinel mechanism implements automatic switchover between master and slave nodes. A sentinel is essentially a Redis instance running in a special mode that monitors the availability of other Redis services through the info command.

① Master offline: the sentinel pings each Redis service instance every second; if no response arrives within the valid time, the sentinel marks the instance subjectively offline. If that instance is the master, the sentinel asks the other sentinels; when a majority of sentinels agree the master is down, it is marked objectively offline, and a new master is elected.

② Re-electing the master: the sentinels pick the new master by filtering out slaves that have been disconnected too long, then preferring higher priority, more replicated data, and a smaller ID. The chosen node is sent the slaveof no one command to make it an independent master, and the other nodes are then sent commands to become its slaves.

Split brain

In a Redis master-slave cluster, split brain occurs when the original master cannot respond to the sentinels' heartbeats (for example because its CPU is saturated) and a new master is elected, so that two masters exist for a while and the client does not know which one to write to. Moreover, when the sentinels switch masters, the old master performs a full resync with the new one: it must clear all its local data before loading the new master's RDB file. Any new writes it accepted during the master-slave switchover are therefore lost.
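
Redis offers two configuration knobs that limit this kind of data loss: a master that cannot reach enough healthy replicas stops accepting writes. A sketch (Jedis; the thresholds are illustrative, and on older Redis versions the options are named min-slaves-*):

```java
import redis.clients.jedis.Jedis;

public class SplitBrainGuard {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Refuse writes unless at least 1 replica is connected...
            jedis.configSet("min-replicas-to-write", "1");
            // ...and its replication lag is at most 10 seconds.
            jedis.configSet("min-replicas-max-lag", "10");
        }
    }
}
```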

How to configure a Redis Cluster

① Create a folder for each node and modify the port in each node's configuration file. ② redis-cli --cluster create ip1:port1 ip2:port2 ... --cluster-replicas 1 (each master gets one replica).