preface

Distributed lock is to achieve mutual exclusion between multiple processes, common solutions include: unique index based on DB, temporary ordered node of Zookeeper, Redis SETNX to achieve; Redis is widely used because of its high performance. This article uses a question and answer method to understand how Redis implements distributed locking.

1. How does Redis implement distributed locking

Use the SETNX command provided by Redis to ensure that only one write succeeds

SETNX key value

Set the value of key to value if and only if it does not exist. If the given key already exists, nothing is done.

127.0.0.1:6379> setnx lock 001
(integer) 1
127.0.0.1:6379> setnx lock 002
(integer) 0

You can also use the SET command and use the NX keyword

set <key> <value> NX

2. What if the node that acquired the lock fails

If the SETNX command is used only, when a node preempts a lock, if the current node hangs, the lock cannot be released, resulting in a deadlock. In this case, you want to set an expiration time for the key, so that the node will be automatically deleted if it fails.

127.0.0.1:6379> expire lock 5
(integer) 1

The expire time is set using the expire command.

3. If Set is executed and Expire is not executed, the node hangs up

The reason for this is that SETNX and Expire are not atomic, so it is possible that the node will be suspended after SETNX and Expire will not be executed. In this case, the lock will not be released, resulting in a deadlock.

127.0.0.1:6379> set lock 001 ex 5 nx
OK

The previous command combines the SETNX and Expire commands into one atomic operation, ensuring that they both succeed and fail.

4. How to block a node that has not acquired a lock

The node that has not acquired the lock must be in the blocked state and retry periodically to ensure that the lock can be acquired in the first time.

while(true){ set lock uuid ex 5 nx; ## preempt lock if(acquire lock){break; }... sleep(1); ## prevent CPU consumption

If you want to be more powerful, you can specify the blocking time. If the specified blocking time is exceeded, the lock is directly acquired.

5. If the lock reentrant problem is solved

Reentrant means that if a thread acquired the lock, the current thread should be able to enter the lock again. The number of reentrants is increased by one and the number of reentrants is decreased by one. This can be implemented locally using threadId or directly using ThreadLocal; While it’s better to save the relevant information directly to Redis, Redisson uses lua scripts to record the threadId information:

If (redis.call(' hincrby', KEYS[1], ARGV[2], 1) == 0) then ThreadId redis.call('pexpire', KEYS[1], ARGV[1]); ## set expiration time return nil; end; If (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then ARGV[2], 1); Call ('pexpire', KEYS[1], ARGV[1]); ## set expiration time return nil; end; " return redis.call('pttl', KEYS[1]);

6. What happens if the expiration date is up and the task just finished

Normally, we expect the expiration time to be longer than the execution time of the task, so when the task is finished, we delete it

127.0.0.1:6379> del lock
(integer) 1

Is it possible that the lock acquired by node A expires and is deleted? At this time, node B obtains the lock and runs the set ex nx command again? However, the task of node A is completed and the lock deletion command is executed to delete the lock of node B. The lock is mistakenly deleted.

In this case, we need to check whether the lock that is being deleted is the same as the lock we acquired before. We can use a unique value when we set it, such as using uUID directly. In this way, when deleting the lock, we need to obtain the corresponding value of the lock, and then compared with the value of the current node object, can be deleted;

string uuid = gen(); Set lock (uUID) ex 5 nx; ## Preempt lock...... String value = get lock; If (value == uUID) {else {return; if(value == uUID) { ## Do not delete otherwise}

7. What if the deadline is up and the task is not finished

The expiration time is an estimated time. If a task is executed for a long time, the lock will be deleted and other nodes can acquire the lock again. In this case, multiple nodes acquire the lock at the same time.

This situation is usually resolved as follows:

  • The expiration time is long enough to ensure that the task can be completed.
  • Start a daemon thread that adds time to a lock that is about to expire but has not been released.

The common toolkit, Redisson, provides an internal lock watchdog that continuously extends the lock validity period until the Redisson instance is closed. Internal use of HashedWheelTimer as timer for periodic checks;

8. What should I do if the primary Redis node breaks down and the secondary node has not been synchronized

We know that Redis primary_secondary synchronization is asynchronous. If a node acquires a lock, the lock information has not been synchronized to the secondary node. The primary node breaks down and the secondary node is upgraded to the primary node, resulting in lock loss. In this case, the Redis author proposed the Redlock algorithm, with the general meaning as follows:

In the distributed Redis environment, suppose we have N Redis hosts; These nodes are completely independent, so we don’t use replication or any other implicit coordination system;

A lock is obtained successfully if and only if the lock is obtained from most (N/2+1, here is 3 nodes) Redis nodes, and the time used is less than the lock failure time.

Redisson provides support for RedLock, which is easy to use:

RLock lock1 = redissonClient1.getLock(resourceName); RLock lock2 = redissonClient2.getLock(resourceName); RLock lock3 = redissonClient3.getLock(resourceName); RedissonRedLock redLock = new RedissonRedLock(lock1, lock2, lock3);

More: redlock

9. What happens when Redis has a cluster split brain

Cluster brain split refers to the network problems that cause the primary node, secondary node and Sentinel to be in different network zones. Because sentinel will promote the secondary node to the primary node because some primary nodes do not exist. At this time, there are different primary nodes, and different clients may connect to different primary nodes. Two clients can have the same lock at the same time;

Redis provides two configuration items to restrict request processing for the master repository, min-abolitionists to write and min-abolitionists to max-lag:

  • Min-slaves to write: Sets the minimum number of slaves for data synchronization for the master database
  • Min-abdels-max-lag: When data replication between the master and slave repositories is set, the slave repository sends data to the master repositoryACKThe maximum delay of the message in seconds

After the item combination is configured, the primary database must have at least N secondary libraries connected to it. The DELAY of ACK messages when the primary database replicates data cannot exceed N seconds. Otherwise, the primary database will not receive requests from clients.

10. How to implement a fair lock

We know that ReentrantLock through AQS to fair lock, AQS internal through two-way queue to achieve, Redis itself provides a variety of data structures including list, ordered set and so on; Redisson implements fair locking by means of Redis’ built-in data structures:

  • Using the list as the wait queue for the thread, the new wait queue is added to the end of the list;
  • An ordered collection is used to store the order of waiting threads. Score is the timeout timestamp of waiting threads.

conclusion

No matter which method is used to implement distributed locking, we need to ensure that the lock features include: mutual exclusion, reentrant, obstructive; At the same time, because of the distributed existence, we need to ensure that the system is high availability, high performance, and prevent all deadlock and lock acquisition situation.