Redis Distributed Lock


Distributed locks are necessary in environments where many different processes need mutually exclusive access to shared resources.

The author of Redis proposed the Redlock algorithm for this purpose.

Introduction

Safety and Liveness

A reasonable distributed lock design must meet three basic guarantees:

1. Safety: mutual exclusion. Under any conditions, only one client can hold the lock.
2. Liveness A: deadlock freedom. Even if the client that acquired the lock crashes or becomes unavailable, the lock can eventually be acquired again.
3. Liveness B: fault tolerance. As long as the majority of Redis nodes are alive, clients can acquire and release locks.

Why a failover-based implementation isn't sufficient

1. Client A acquires the lock on the master.
2. The master crashes before the write is replicated to the slave.
3. The slave is promoted to master. Client B attempts to acquire the lock and succeeds.

In this case, the safety of the Redis distributed lock is violated: in extreme cases, when the cluster fails over, multiple clients may hold the lock simultaneously.

Correct implementation with a single instance

Before solving the problems above, we need to design the basic single-instance Redis distributed lock.

To acquire the lock, execute the following command:

```
SET resource_name my_random_value NX PX 30000
```

NX means the value is set only if the key does not already exist; PX sets the expiration time in milliseconds. my_random_value must be unique across all clients and all lock requests.
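As a minimal sketch, this is what the acquire step could look like in Python with the redis-py client (the host, port, and helper name are illustrative assumptions, not from the original post):

```python
import redis

# Illustrative connection; host/port are assumptions.
r = redis.Redis(host="localhost", port=6379)

def acquire_lock(resource_name: str, my_random_value: str, ttl_ms: int = 30000) -> bool:
    # Equivalent of: SET resource_name my_random_value NX PX 30000
    # nx=True -> set only if the key does not exist; px -> expiry in milliseconds.
    # redis-py returns True on success and None if the key already exists.
    return r.set(resource_name, my_random_value, nx=True, px=ttl_ms) is True
```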

To release the lock, the client must check that the key still holds its my_random_value before calling DEL, and the check-and-delete must be atomic, so a Lua script is recommended:

```lua
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
```
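Correspondingly, a sketch of the release step in Python, passing the script above to redis-py's eval (which takes the script, the number of keys, then the keys and arguments); the helper name is made up for illustration:

```python
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

def release_lock(r, resource_name: str, my_random_value: str) -> bool:
    # True only if the key existed and still held our random value.
    return r.eval(RELEASE_SCRIPT, 1, resource_name, my_random_value) == 1
```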

Notes

1. The random value ensures a client can delete only the lock it set itself, so it cannot remove a lock that has meanwhile been acquired by another client. my_random_value can be built, for example, from a client ID, or generated with a random source such as RC4, as long as it is unique per request (see the sketch below).
2. The key's expiry is the lock validity time: it serves both as the automatic release time and as the window the client has to finish its work before another client can acquire the lock. To preserve the mutual-exclusion guarantee, this window is measured from the moment the lock is acquired.
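As an illustration of point 1, a unique value could combine a client identifier with a random token (one possible scheme, not the only one):

```python
import secrets

def make_lock_value(client_id: str) -> str:
    # client_id distinguishes clients; the token distinguishes lock requests.
    return f"{client_id}:{secrets.token_hex(16)}"
```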

With the basic single-instance operations covered, we can now extend them into the distributed version.

Redlock algorithm

In the distributed version, assume we have N independent Redis masters (not in the same cluster), which can be deployed on different servers or virtual machines; assume N = 5. To acquire the lock, client A does the following (see the sketch after this list):

1. Gets the current timestamp in milliseconds.
2. Tries to acquire the lock on all N Redis instances in sequence, using the same key and random value everywhere. Because this step iterates over multiple Redis services, each SET request needs its own timeout, much smaller than the lock's auto-release time: if the auto-release time is 10 s, the per-request timeout can be in the range of 5-50 milliseconds. This prevents the client from staying blocked on a crashed or unreachable Redis node while acquiring the lock.
3. Takes the current timestamp and subtracts the timestamp from step 1 to get the time spent acquiring locks. The lock is officially acquired only if client A obtained the lock on a majority of instances (at least 3 of the 5) and the time spent is still within the lock validity period.
4. When the lock is acquired, its remaining validity is the initial validity time minus the elapsed time from step 3 (the time already spent acquiring locks does not count toward the work window).
5. If client A fails in step 3, e.g. it could not lock a majority of instances or the validity time has already run out, it must release the locks it did acquire on the minority of Redis instances.
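A minimal sketch of these five steps in Python, reusing acquire/release helpers in the style above; N, the quorum, and the drift allowance are illustrative choices, not values mandated by the algorithm:

```python
import time

import redis

N = 5
QUORUM = N // 2 + 1          # majority: at least 3 of 5
CLOCK_DRIFT_FACTOR = 0.01    # small allowance for clock drift

def redlock_acquire(masters, resource_name, my_random_value, ttl_ms=10000):
    start = time.monotonic()                           # step 1: current time
    acquired = 0
    for master in masters:                             # step 2: each master in turn
        try:
            # Each client should be created with a short socket timeout
            # (e.g. 5-50 ms) so a dead node cannot block the loop.
            if master.set(resource_name, my_random_value, nx=True, px=ttl_ms):
                acquired += 1
        except redis.RedisError:
            pass                                       # unreachable node = a miss
    elapsed_ms = (time.monotonic() - start) * 1000     # step 3: time spent locking
    validity_ms = ttl_ms - elapsed_ms - (ttl_ms * CLOCK_DRIFT_FACTOR)
    if acquired >= QUORUM and validity_ms > 0:         # majority + still valid
        return validity_ms                             # step 4: remaining validity
    for master in masters:                             # step 5: roll back the minority
        try:
            release_lock(master, resource_name, my_random_value)
        except redis.RedisError:
            pass
    return None
```

The returned validity tells the caller how long it can safely work before the first key may expire.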

Is this algorithm asynchronous?

The algorithm relies on the assumption that, even without synchronized clocks, the local clocks of the different processes advance at roughly the same rate, with an error that is small compared with the lock's auto-release time. This resembles real life: every computer uses its own local clock, and clock drift between machines is usually small. With this in mind, the mutual-exclusion rule must be stated more precisely: mutual exclusion is guaranteed only if the client holding the lock finishes its work within the lock validity time minus a small margin (a few milliseconds) to compensate for clock drift between processes.

Retrying on failure (here, failure to acquire the lock)

When a client fails to acquire the lock, it should retry after a random delay. This desynchronizes multiple clients that are trying to acquire the lock on the same resource at the same time (a split-brain-like situation in which contention is high and nobody wins). Also, the faster a client acquires the majority of locks, the smaller the split-brain window (and the fewer retries needed), so ideally the client should send the SET commands to all masters at the same time, using multiplexing.
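A sketch of such a retry loop with a random delay, building on the redlock_acquire sketch above (the bounds and retry count are arbitrary):

```python
import random
import time

def acquire_with_retry(masters, resource_name, my_random_value, retries=3):
    for _ in range(retries):
        validity_ms = redlock_acquire(masters, resource_name, my_random_value)
        if validity_ms is not None:
            return validity_ms
        # A random sleep desynchronizes clients competing for the same resource.
        time.sleep(random.uniform(0.05, 0.4))
    return None
```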

Release the lock

The release step is simple: the lock is released on all master instances, regardless of whether the client believes it acquired the lock on a given instance.
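In code, this is just the single-instance release run against every master (a sketch reusing release_lock from above):

```python
def redlock_release(masters, resource_name, my_random_value):
    # Safe even on instances where the lock was never acquired: the Lua
    # script only deletes the key if it still holds our random value.
    for master in masters:
        try:
            release_lock(master, resource_name, my_random_value)
        except redis.RedisError:
            pass  # best effort; an unreachable node expires the key on its own
```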

Safety discussion

Assume a client acquires the lock on the majority of instances. All instances then hold a key with the same TTL, but the keys were set (via SET) at different points in time, so they will expire at different times. If the first key was set no earlier than time T1 and the last key was set no later than time T2, the first key to expire will still exist for at least MIN_VALIDITY = TTL - (T2 - T1) - CLOCK_DRIFT, and every other key will expire later. During the whole window in which the majority of keys are set, other clients cannot SET them (because of NX), so no other client can acquire the lock. In essence, mutual exclusion comes down to timing plus this concurrency control.
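As an illustrative calculation (the numbers are made up): with TTL = 10,000 ms, 300 ms spent acquiring the keys (T2 - T1), and 20 ms allowed for clock drift, MIN_VALIDITY = 10000 - 300 - 20 = 9680 ms, so all keys are guaranteed to coexist for roughly 9.7 seconds, during which no other client can reach a majority of successful SET NX calls.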

Liveness discussion

Liveness rests on three properties:

1. Locks are released automatically when keys expire, so the lock can eventually be acquired again.
2. Clients usually cooperate by deleting the locks they failed to fully acquire (step 5 above), and by releasing the lock once the work is finished, so normally there is no need to wait for the timeout before the lock can be re-acquired.
3. When a client needs to retry, it waits longer than the time needed to acquire locks from the majority of instances, which makes split-brain conditions during resource contention less likely.