Distributed locking with Redis is a well-worn topic. This article summarizes common ways to implement distributed locks with Redis and ZooKeeper and offers some practical guidance (based on Java). Any shortcomings are welcome for discussion.

Redis distributed lock

Implementing a distributed lock with single-node Redis

Scheme 1: use the SET command. If the current client needs to hold a lock named user_lock, it first generates a token (a random string, such as a UUID) and uses that token to acquire the lock.

Lock command:

redis> SET user_lock <token> EX 15 NX
OK

EX: sets the key to expire after the specified number of seconds. Similar options include PX, EXAT, and PXAT. NX: sets the key only if it does not already exist.

So if the user_lock key does not exist, the command above creates it and sets it to expire after 15 seconds. Other clients use the same command to acquire the lock, so within those 15 seconds their attempts fail: because of the NX option, the command fails when the key already exists. As a result, the current client holds user_lock for 15 seconds.
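To make this concrete, here is a minimal sketch of the acquisition step in Java, assuming the Jedis client (3.x or later); the key name, token generation, and 15-second TTL simply mirror the command above.

import java.util.UUID;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class SimpleRedisLock {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String token = UUID.randomUUID().toString();
            // SET user_lock <token> EX 15 NX
            String result = jedis.set("user_lock", token,
                    SetParams.setParams().nx().ex(15));
            if ("OK".equals(result)) {
                // lock acquired; keep the token so the lock can be released safely later
                System.out.println("locked user_lock with token " + token);
            } else {
                // another client currently holds the lock
                System.out.println("user_lock is held by another client");
            }
        }
    }
}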

Why use a token and an expiration time? There are two reasons:

(1) The expiration time prevents deadlock. Normally the client holding the lock releases it (deletes the Redis key) after its work is done. With an expiration time, even if the client holding the lock goes offline, the lock automatically becomes invalid 15 seconds later and other clients can acquire it, so no deadlock occurs.

(2) The token prevents a client from releasing a lock it does not own. Without a token, every client would set user_lock to the same value. Suppose client A holds user_lock, but the key expires and is deleted by the Redis server, after which client B acquires the lock. When client A finishes its operation and runs DEL, it deletes a lock that it no longer owns. If each client is given its own token, then before releasing the lock the client checks whether it still owns it, that is, whether the value of user_lock is its own token, and only then deletes the key, which avoids this problem. The GET and DEL involved must be executed atomically, which a Lua script can guarantee. For example:

> EVAL "if redis.call('GET',KEYS[1]) == ARGV[1] then return redis.call('DEL',KEYS[1]) else return 0 end" 1 user_lock <token>
(integer) 1
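For reference, a minimal sketch of this release step in Java, again assuming the Jedis client and the same user_lock key and token as above:

import java.util.Collections;

import redis.clients.jedis.Jedis;

public class SimpleRedisUnlock {
    private static final String UNLOCK_SCRIPT =
            "if redis.call('GET', KEYS[1]) == ARGV[1] then "
            + "return redis.call('DEL', KEYS[1]) else return 0 end";

    // returns true only if the key was deleted, i.e. the caller really owned the lock
    public static boolean unlock(Jedis jedis, String token) {
        Object result = jedis.eval(UNLOCK_SCRIPT,
                Collections.singletonList("user_lock"),
                Collections.singletonList(token));
        return Long.valueOf(1L).equals(result);
    }
}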

Reference: redis.io/commands/se…

As the above shows, the distributed lock in this scheme is not completely safe. The client holding the lock loses it automatically once the lock expires, after which another client can acquire it. Two clients then hold the same lock at the same time, and the distributed lock fails.

Therefore, the lock's validity time is critical in this scheme. If it is set too short, the distributed lock may fail; if it is set too long, other clients still have to wait pointlessly for a long time to acquire the lock after the holder goes offline, which hurts performance. Is there a better approach? Let's look at scheme 2.

Scheme 2: automatically extend the lock's validity time. We can acquire the lock with a short validity time and start a background thread that extends it before the lock expires. For example, acquire the lock with a validity time of 10 seconds, then have a background thread reset the key's expiration to 10 seconds from now every 9 seconds. The schematic code is as follows:

// lockIsExist should be a volatile flag, set to false when the lock is released
new Thread(new Runnable() {
    public void run() {
        while (lockIsExist) {
            // extend the lock so it expires 10 seconds from now (schematic Redis call)
            redis.call("EXPIRE user_lock 10");
            try {
                Thread.sleep(1000 * 9);
            } catch (InterruptedException e) {
                return; // stop renewing if the thread is interrupted
            }
        }
    }
}).start();

In this way, the lock held by the current client does not expire while the client is still working, which avoids the lock-failure problem above. In addition, if the current client crashes or goes offline, there is no background thread left to extend the validity time, so the lock automatically becomes invalid shortly afterwards. Tip: when the current client releases the lock, it must stop the background thread, or set lockIsExist to false.
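A slightly more careful variant is sketched below, assuming the Jedis client and a ScheduledExecutorService (the class and method names here are my own): the renewal script extends the expiration only if the key still holds our token, so we never prolong a lock that has already passed to another client.

import java.util.Arrays;
import java.util.Collections;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

import redis.clients.jedis.Jedis;

public class LockRenewer {
    // extend the expiration only if the key still holds our token
    private static final String RENEW_SCRIPT =
            "if redis.call('GET', KEYS[1]) == ARGV[1] then "
            + "return redis.call('EXPIRE', KEYS[1], ARGV[2]) else return 0 end";

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // every 9 seconds, push the lock's expiration out to 10 seconds from now;
    // the Jedis instance should be used only by this scheduler thread
    public ScheduledFuture<?> startRenewing(Jedis jedis, String key, String token) {
        return scheduler.scheduleAtFixedRate(
                () -> jedis.eval(RENEW_SCRIPT,
                        Collections.singletonList(key),
                        Arrays.asList(token, "10")),
                9, 9, TimeUnit.SECONDS);
    }

    // call this when the lock is released so no further renewals happen
    public void stop(ScheduledFuture<?> renewal) {
        renewal.cancel(false);
        scheduler.shutdown();
    }
}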

The Java client Redisson provides this mechanism out of the box and is very easy to use. Here is how to implement a Redis distributed lock with Redisson. (1) Add the Redisson dependency.

<dependency>
    <groupId>org.redisson</groupId>
    <artifactId>redisson</artifactId>
    <version>3.16.4</version>
</dependency>

(2) The following is an example:

Config config = new Config();
config.useSingleServer().setAddress("redis://localhost:6379");
RedissonClient redissonClient = Redisson.create(config);

RLock lock = redissonClient.getLock("user_lock");
lock.lock();
try {
    // process...
} finally {
    lock.unlock();
}

If there is no special reason, it is recommended to use the distributed lock provided by Redisson.
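In practice you often do not want to block forever waiting for a lock. Redisson's RLock also provides tryLock with a wait time and a lease time; a brief sketch (the 5-second wait and 30-second lease are arbitrary values chosen for illustration):

import java.util.concurrent.TimeUnit;

import org.redisson.api.RLock;

public class TryLockExample {
    public static void process(RLock lock) throws InterruptedException {
        // wait up to 5 seconds to acquire; hold the lock for at most 30 seconds
        // (when an explicit lease time is given, Redisson does not auto-extend the lock)
        if (lock.tryLock(5, 30, TimeUnit.SECONDS)) {
            try {
                // process...
            } finally {
                lock.unlock();
            }
        } else {
            // failed to acquire the lock within 5 seconds
        }
    }
}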

But is it safe?

Consider the following scenario: if the client holding the lock cannot get CPU time to execute on schedule, for example because of high CPU load, garbage collection pauses, or other reasons, the lock can expire before the work is done or renewed, and the lock-failure scenario occurs again.



There is no perfect solution for this scenario, so it is not discussed further here.

Implementing distributed locks in Sentinel and Cluster modes

In real production environments, single-node Redis is rarely used; Redis is usually deployed in Sentinel or Cluster mode. In these modes, failover of the primary node makes distributed locking troublesome. To ensure high performance, Redis master/replica synchronization is asynchronous, which means the SET command may not yet have been replicated when the primary node responds with success. If the primary node then fails and goes offline, the following can happen: (1) Sentinel or Cluster mode elects a replica as the new primary node, but that node never executed the SET command, so from its point of view no client holds the lock. (2) The client has already received the successful SET response (returned by the old primary) and believes it holds the lock. Another client can now also acquire the lock on the new primary, and the distributed lock fails.

The root cause is that Redis uses asynchronous replication between primary and replica nodes, so it does not strictly guarantee data consistency between them. To address this, the author of Redis proposed the RedLock algorithm. Roughly, the idea is to deploy multiple independent Redis master nodes and send the SET command to all of them at the same time; the lock is considered acquired only when more than half of the master nodes return success. This is essentially the same quorum mechanism used in distributed consensus algorithms such as Raft. Whether the scheme truly guarantees distributed lock safety was the subject of a heated debate between the Redis author and Martin Kleppmann. This article focuses on practice, so the details of the RedLock algorithm are not covered here.
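For completeness, Redisson also ships a RedLock implementation. Below is a minimal sketch under the assumption of three independent Redis masters; the host addresses are placeholders.

import org.redisson.Redisson;
import org.redisson.RedissonRedLock;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedLockExample {
    private static RedissonClient client(String address) {
        Config config = new Config();
        config.useSingleServer().setAddress(address);
        return Redisson.create(config);
    }

    public static void main(String[] args) {
        // the same lock name on three independent Redis masters
        RLock lock1 = client("redis://host1:6379").getLock("user_lock");
        RLock lock2 = client("redis://host2:6379").getLock("user_lock");
        RLock lock3 = client("redis://host3:6379").getLock("user_lock");

        RedissonRedLock redLock = new RedissonRedLock(lock1, lock2, lock3);
        redLock.lock();   // considered acquired only if a majority of the locks succeed
        try {
            // process...
        } finally {
            redLock.unlock();
        }
    }
}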

Even if this algorithm can truly secure a distributed lock, it is cumbersome to use: multiple additional Redis master nodes must be deployed, and a reliable client library is needed to support the algorithm. Considering this, if strict distributed lock safety is required, components that strictly guarantee data consistency, such as ZooKeeper or Etcd, are more appropriate.

Zookeeper distributed lock

ZooKeeper keeps cluster data consistent and provides mechanisms such as Watch and client session expiration detection, which makes it well suited to distributed locking. The way ZooKeeper implements a distributed lock is simple: the client acquires the lock by creating an ephemeral node. If the creation succeeds, the lock is acquired; otherwise the lock is held by another client, and the current client watches the ephemeral node for changes. When the node is deleted, the client can try to acquire the lock again. Different ZooKeeper lock schemes differ in their details, but they are all extensions of this basic approach.
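To make the idea concrete, here is a bare-bones sketch using the plain ZooKeeper client (no Curator). The path /user_lock, the blocking wait, and the class name are simplifications of the scheme just described, not a production implementation.

import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkLockSketch {
    // blocks until this client successfully creates the ephemeral node /user_lock
    public static void lock(ZooKeeper zk) throws Exception {
        while (true) {
            try {
                zk.create("/user_lock", new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                return; // node created: we hold the lock
            } catch (KeeperException.NodeExistsException e) {
                // lock is held by another client: wait until the node changes, then retry
                CountDownLatch changed = new CountDownLatch(1);
                if (zk.exists("/user_lock", event -> changed.countDown()) == null) {
                    continue; // node disappeared between create and exists: retry at once
                }
                changed.await();
            }
        }
    }

    public static void unlock(ZooKeeper zk) throws Exception {
        zk.delete("/user_lock", -1); // -1 matches any node version
    }
}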

For convenience, it is recommended to implement the distributed lock with the Curator framework (a ZooKeeper client framework originally developed by Netflix). Here is how to use Curator. (1) Add the Curator dependencies.

<dependency>
    <groupId>org.apache.curator</groupId>
    <artifactId>curator-framework</artifactId>
    <version>3.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.curator</groupId>
    <artifactId>curator-recipes</artifactId>
    <version>3.3.0</version>
</dependency>

Note: the Curator version must match the ZooKeeper version you are using.

(2) Use the InterProcessMutex class to implement a distributed lock.

CuratorFramework client = CuratorFrameworkFactory.newClient("127.0.0.1:2181", 60000, 15000,
        new ExponentialBackoffRetry(1000, 3));
client.start();

InterProcessMutex lock = new InterProcessMutex(client, "/user_lock");
lock.acquire();
try {
    // process...
} finally {
    lock.release();
}
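InterProcessMutex can also acquire the lock with a timeout instead of blocking indefinitely; a brief sketch (the 3-second wait is an arbitrary value for illustration):

import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.recipes.locks.InterProcessMutex;

public class TimedAcquireExample {
    public static void process(InterProcessMutex lock) throws Exception {
        // returns false if the lock could not be acquired within 3 seconds
        if (lock.acquire(3, TimeUnit.SECONDS)) {
            try {
                // process...
            } finally {
                lock.release();
            }
        }
    }
}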

Curator provides a fairly comprehensive set of distributed locks:

  • InterProcessMutex: reentrant exclusive lock, as shown in the example above.
  • InterProcessSemaphoreMutex: non-reentrant exclusive lock.
  • InterProcessReadWriteLock: distributed read-write lock.

They are all simple to use, so I will not expand on them here.

Is a ZooKeeper distributed lock completely safe? Suppose Client1 acquires the lock successfully, that is, it creates the ephemeral ZooKeeper node. If Client1 then fails to respond to ZooKeeper's heartbeat for too long, ZooKeeper considers Client1 dead and deletes the ephemeral node. If Client2 now requests the lock, it will succeed, and Client1 and Client2 both believe they hold the same distributed lock, so the lock has failed. This scenario is similar to the Redis case above where the renewal thread does not run on time, and there is likewise no perfect solution. Although lock failure is theoretically possible with ZooKeeper, the probability is relatively low, and it can be reduced further by increasing the ZooKeeper session timeout. Therefore, ZooKeeper distributed locks can satisfy the majority of distributed lock scenarios.

(1) If strict distributed lock safety is not required, Redis in Sentinel or Cluster mode can be used for distributed locking. This applies, for example, when multiple clients holding the lock at the same time does not cause serious business problems, or when the lock is only a performance optimization to avoid multiple clients doing the same work at once; in these cases the distributed lock provided by Redisson works well. (2) If strict distributed lock safety is required, use components such as ZooKeeper or Etcd to implement the distributed lock. In either case, it is recommended to use mature frameworks such as Redisson and Curator rather than writing your own locking code, to avoid duplicated effort and reduce the risk of errors.

For a systematic study of Redis, please refer to the author's new book, Core Principles and Practices of Redis. Through an in-depth analysis of the Redis 6.0 source code, the book summarizes the design and implementation of Redis's core features. By reading it, readers can gain a deep understanding of Redis internals and its latest features, and learn about related data structures and algorithms, Unix programming, storage system design, and distributed system architecture. With the consent of the book's editor, I will continue to publish selected chapters on binecy as a preview of the book. You are welcome to check it out; thank you.

Preview: Redis Core Principles and Practices