An In-Depth Analysis of How Redis Distributed Locks Work

I. Implementation principle

1.1 Basic Principles

The JDK's native locks give threads mutually exclusive access to a shared resource, but they cannot provide mutual exclusion between processes. For that, Redis can be used to implement a distributed lock.

The core Redis command for implementing a distributed lock is:

SETNX key value

The SETNX command creates the specified key and sets its value only if the key does not exist, and returns 1. If the key already exists, it returns 0. A return value of 1 means the lock has been acquired. When another process then tries to create the lock, it gets 0 because the key already exists, which means the lock is held by someone else.

When the process that holds the lock finishes its work, it runs the DEL command to delete the key. Other processes can then compete to create the key and acquire the lock.

To avoid deadlocks, we usually set a timeout on the lock. In Redis this is done with the EXPIRE command:
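The acquire and release cycle described above can be sketched without a Redis server. The class below is only an in-memory stand-in (the name SetnxModel and its methods are hypothetical, not part of Jedis or Redis); it mimics the return values of SETNX and DEL so the competition between two processes is easy to follow.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory stand-in for a Redis keyspace; models only SETNX and DEL. */
final class SetnxModel {
    private final ConcurrentMap<String, String> keys = new ConcurrentHashMap<>();

    /** Like SETNX: returns 1 if the key was created, 0 if it already existed. */
    long setnx(String key, String value) {
        return keys.putIfAbsent(key, value) == null ? 1L : 0L;
    }

    /** Like DEL on a single key: returns the number of keys removed (0 or 1). */
    long del(String key) {
        return keys.remove(key) == null ? 0L : 1L;
    }

    public static void main(String[] args) {
        SetnxModel redis = new SetnxModel();
        System.out.println(redis.setnx("lockKey", "A")); // process A creates the key
        System.out.println(redis.setnx("lockKey", "B")); // process B finds the key taken
        System.out.println(redis.del("lockKey"));        // A releases the lock
        System.out.println(redis.setnx("lockKey", "B")); // B tries again
    }
}
```

The model has no timeout, so a holder that never calls del blocks everyone else forever.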

EXPIRE key seconds

Here we combine the two and use the Jedis client to implement it as follows:

Long result = jedis.setnx("lockKey", "lockValue");
if (result == 1) {
    // If the program terminates abnormally here (for example, kill -9), the timeout is never set and the lock is never released
    jedis.expire("lockKey", 3);
}

The code above has an atomicity problem: SETNX followed by EXPIRE is not atomic, so if the program terminates abnormally before the timeout is set, the key never expires and the lock is never released. One fix is to put the SETNX and EXPIRE commands in a single Lua script and run it with Jedis's eval() method, since Redis guarantees that a Lua script executes atomically. This approach is cumbersome, however, so the official documentation recommends a more elegant implementation, described next.
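As a rough sketch of the Lua approach mentioned above, the script and its arguments could be assembled as follows. The ScriptRunner interface and the class name are hypothetical; they exist only so the example runs without a Redis server. The script itself uses only the SETNX and EXPIRE commands discussed in this section.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class AtomicSetnxExpire {
    /** Lua script: create the key and set its timeout in one atomic step. */
    static final String SCRIPT =
          "if redis.call('setnx', KEYS[1], ARGV[1]) == 1 then "
        + "redis.call('expire', KEYS[1], ARGV[2]) return 1 "
        + "else return 0 end";

    /** Hypothetical seam standing in for a client call that evaluates a script. */
    interface ScriptRunner {
        Object eval(String script, List<String> keys, List<String> args);
    }

    /** Returns true if the script reports that the lock key was created. */
    static boolean tryLock(ScriptRunner runner, String key, String value, int ttlSeconds) {
        Object result = runner.eval(
                SCRIPT,
                Collections.singletonList(key),                    // KEYS[1]
                Arrays.asList(value, String.valueOf(ttlSeconds))); // ARGV[1], ARGV[2]
        return Long.valueOf(1L).equals(result);
    }
}
```

With Jedis, a method reference such as jedis::eval should fit the ScriptRunner shape, since Jedis exposes eval(String, List<String>, List<String>).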

1.2 Official Recommendation

The official document "Distributed locks with Redis" recommends using the SET command:

SET key value [EX seconds|PX milliseconds] [NX|XX] [KEEPTTL]

Here we focus on the following four parameters:

EX: Sets the timeout in seconds.
PX: Sets the timeout in milliseconds.
NX: Sets the key only if it does not already exist.
XX: Sets the key only if it already exists.

These four parameters are supported as of Redis 2.6.12. Since most Redis deployments today run a version newer than 2.6.12, this command is the recommended way to implement a distributed lock. The corresponding Jedis code is as follows:

jedis.set("lockKey", "lockValue", SetParams.setParams().nx().ex(3));

Now a single command sets both the value and the timeout, and because it is a single command, atomicity is guaranteed. However, introducing a timeout to avoid deadlocks creates two new problems:

Problem 1: When the business processing time exceeds the expiration time (process A in the figure), the lock has already expired and another process can acquire it (process B in the figure). Two processes (A and B) are then inside the critical section at the same time, and the distributed lock has failed.
Problem 2: As shown in the figure above, when process A finishes its work, it deletes the lock that now belongs to process B. As a result, the distributed lock fails again, and process B and process C enter the critical section at the same time.

For problem 2, we can use a unique identifier as the value of the key when creating the lock. Here we assume that UUID + thread ID is used as the unique identifier:
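The timeline behind these two problems can be replayed with a small in-memory model. This is only a sketch: ExpiringLockModel and its manually advanced clock are hypothetical and merely imitate SET key value NX EX and DEL; the 3-second timeout and 4-second processing time are illustrative values.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of SET key value NX EX with a manually advanced clock. */
final class ExpiringLockModel {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();
    private long nowSeconds = 0;

    void advance(long seconds) { nowSeconds += seconds; }

    private void evictIfExpired(String key) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && nowSeconds >= deadline) {
            values.remove(key);
            expiresAt.remove(key);
        }
    }

    /** Like SET key value NX EX ttl: succeeds only if the key is absent or expired. */
    boolean setNxEx(String key, String value, long ttlSeconds) {
        evictIfExpired(key);
        if (values.containsKey(key)) return false;
        values.put(key, value);
        expiresAt.put(key, nowSeconds + ttlSeconds);
        return true;
    }

    /** Like DEL: removes the key no matter who created it. */
    void del(String key) { values.remove(key); expiresAt.remove(key); }

    String get(String key) { evictIfExpired(key); return values.get(key); }

    public static void main(String[] args) {
        ExpiringLockModel redis = new ExpiringLockModel();
        System.out.println("A acquires: " + redis.setNxEx("lockKey", "A", 3));
        redis.advance(4); // A's work takes 4 s, longer than the 3 s timeout
        System.out.println("B acquires: " + redis.setNxEx("lockKey", "B", 3)); // problem 1
        redis.del("lockKey"); // A finishes and blindly deletes B's lock (problem 2)
        System.out.println("C acquires: " + redis.setNxEx("lockKey", "C", 3));
    }
}
```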

String identifier = UUID.randomUUID() + ":" + Thread.currentThread().getId();
jedis.set("LockKey", identifier, SetParams.setParams().nx().ex(3));

Then, before deleting the lock, compare the unique identifier with the value stored in the lock. If they differ, the lock does not belong to the current holder and must not be deleted. To make the compare and delete operations atomic as a whole, a Lua script is used:

if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end

The script deletes the key if its stored value equals the given value; otherwise it returns 0. The corresponding Jedis code is as follows:

String script = "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end";
jedis.eval(script, 
           Collections.singletonList("LockKey"),  // set of keys
           Collections.singletonList(identifier)  // Set of args
          );
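To see why the identifier check protects process B, the compare-and-delete logic can be mirrored in plain Java. This is an in-memory sketch only: OwnerCheckedRelease is a hypothetical class, and a synchronized method stands in for the atomicity that Redis gives a Lua script.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory mirror of the compare-and-delete script. */
final class OwnerCheckedRelease {
    private final Map<String, String> keys = new HashMap<>();

    synchronized void put(String key, String identifier) { keys.put(key, identifier); }

    synchronized boolean exists(String key) { return keys.containsKey(key); }

    /** Deletes the key only if its value equals the caller's identifier; returns 1 or 0. */
    synchronized long releaseIfOwner(String key, String identifier) {
        if (identifier.equals(keys.get(key))) {
            keys.remove(key);
            return 1L;
        }
        return 0L;
    }

    public static void main(String[] args) {
        OwnerCheckedRelease redis = new OwnerCheckedRelease();
        redis.put("LockKey", "id-of-B"); // A's lock expired earlier; B now holds the lock
        System.out.println(redis.releaseIfOwner("LockKey", "id-of-A")); // A's late release
        System.out.println(redis.exists("LockKey"));                    // is B's lock intact?
        System.out.println(redis.releaseIfOwner("LockKey", "id-of-B")); // B's own release
    }
}
```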

Returning to problem 1: the simplest fix is to estimate the maximum processing time of the business logic and make the expiration time longer than that. In practice, however, processing cannot always be guaranteed to finish within the chosen expiration time. In that case, you can use a lock duration extension strategy.

1.3 Extension of lock duration

The solution to prolong the lock duration is as follows: Assume that the lock timeout period is 30 seconds, and the program needs to scan the lock periodically to see if it still exists. The scanning time should be less than the timeout period, which can usually be set to 1/3 of the timeout period, i.e. once every 10 seconds in this case. If the lock still exists, reset its timeout to 30 seconds. In this scheme, the lock will remain in effect as long as the transaction is not completed. As soon as the transaction is completed, the application deletes the lock.

The distributed locks provided by Redisson, a Java client for Redis, support a similar lock duration extension strategy called the WatchDog mechanism.

The discussion so far covers a Redis distributed lock on a single Redis instance. For the lock to be highly available, Redis itself must be highly available. Redis offers two high-availability modes: sentinel mode and cluster mode. Each is discussed below.
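The periodic scan described above can be sketched with explicit timestamps instead of a real timer. RenewableLock is a hypothetical single-lock model, not Redisson's implementation; in a real program the renew call would be driven by a scheduler rather than a loop.

```java
/** Single-lock model with a timeout, driven by explicit timestamps instead of a real clock. */
final class RenewableLock {
    private String owner;          // null while the lock is free
    private long expiresAtMillis;

    boolean tryLock(String candidate, long nowMillis, long ttlMillis) {
        if (isHeld(nowMillis)) return false;
        owner = candidate;
        expiresAtMillis = nowMillis + ttlMillis;
        return true;
    }

    boolean isHeld(long nowMillis) {
        return owner != null && nowMillis < expiresAtMillis;
    }

    /** One periodic scan: if the caller's lock still exists, reset its timeout. */
    boolean renew(String candidate, long nowMillis, long ttlMillis) {
        if (!isHeld(nowMillis) || !owner.equals(candidate)) return false;
        expiresAtMillis = nowMillis + ttlMillis;
        return true;
    }

    void unlock(String candidate) {
        if (candidate.equals(owner)) owner = null;
    }

    /** Scan interval: one third of the timeout. */
    static long scanIntervalMillis(long ttlMillis) { return ttlMillis / 3; }

    public static void main(String[] args) {
        long ttl = 30_000;
        long step = scanIntervalMillis(ttl);
        RenewableLock lock = new RenewableLock();
        lock.tryLock("A", 0, ttl);
        // The business logic runs for 60 s; the scan fires once per interval.
        for (long now = step; now <= 60_000; now += step) {
            lock.renew("A", now, ttl);
        }
        System.out.println("held at 60 s: " + lock.isHeld(60_000));
        lock.unlock("A");
        System.out.println("held after unlock: " + lock.isHeld(60_000));
    }
}
```

Without the renewal loop, the same lock would already be expired at the 30-second mark, well before the 60 seconds of work finish.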

II. Sentinel mode and distributed locks

Sentinel mode is an upgraded version of the master-slave mode, which can automatically failover when a failure occurs and elect a new master node. However, due to the asynchronous replication mechanism of Redis, distributed locks implemented in sentinel mode are unreliable for the following reasons:

Replication between the master and slave nodes is asynchronous. After a lock is created on the master node, it may not yet exist on the slave node. If the master goes down at that moment, the slave never receives the lock.
After the slave node is promoted to master, other processes (or threads) can create the same lock on the new master. Multiple processes (or threads) are then inside the critical section at the same time, and the distributed lock has failed.

Therefore, in sentinel mode, lock failure cannot be avoided. To build a highly available distributed lock, you can turn to Redis's other high-availability solution: cluster mode.
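The failover race can be reproduced with two maps standing in for the master and its slave. FailoverModel is a hypothetical sketch; replicate() is called explicitly to model the delay that asynchronous replication introduces.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one master and one slave with explicit, delayed replication. */
final class FailoverModel {
    private Map<String, String> master = new HashMap<>();
    private Map<String, String> slave = new HashMap<>();

    /** Lock attempt against the current master (SET ... NX semantics, timeout omitted). */
    boolean setNx(String key, String value) {
        return master.putIfAbsent(key, value) == null;
    }

    /** Asynchronous replication catching up: the slave receives the master's data. */
    void replicate() { slave = new HashMap<>(master); }

    /** The master crashes; the slave is promoted with whatever data it had received. */
    void failover() {
        master = slave;
        slave = new HashMap<>();
    }

    public static void main(String[] args) {
        FailoverModel redis = new FailoverModel();
        System.out.println("A acquires: " + redis.setNx("lockKey", "A"));
        redis.failover(); // the master dies before the lock was replicated
        System.out.println("B acquires: " + redis.setNx("lockKey", "B"));
    }
}
```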

III. Cluster mode and distributed locks

3.1 RedLock scheme

To implement a distributed lock in cluster mode, Redis provides a scheme called RedLock. Suppose we have N Redis instances; the client proceeds as follows:

Record the current time in milliseconds as the start time.
Try to create the lock on each instance in turn, in the same way as on a single instance. To avoid blocking for a long time on a faulty Redis node, the client polls quickly: assuming the lock timeout is 10 seconds, the timeout for contacting each Redis instance might be between 5 and 50 milliseconds. If an instance does not respond within that time, the client moves on to the next one.
If the lock is created on at least N/2+1 instances, and current time - start time < lock timeout, the lock is considered acquired. Its validity time equals lock timeout - time spent (if you account for clock drift between the Redis servers, subtract the clock drift as well).
If the lock was created on fewer than N/2+1 instances, acquiring the distributed lock fails. In this case, delete the locks already created on those instances so that other clients can create the lock.
If the client fails to acquire the lock, it can wait a random period of time and try again.

That is how RedLock works. As you can see, it is implemented mainly by the client and does not actually rely on any Redis Cluster feature. The N Redis instances are therefore not required to form a real Redis cluster and can be completely independent of each other. The scheme is still fault tolerant and highly available, because the lock only has to be created on a majority (N/2+1) of the nodes to be acquired. This is verified again later when we demonstrate RedLock with Redisson.
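The client-side procedure above can be sketched as follows. This is a simplified illustration under stated assumptions, not Redisson's or any official implementation: the Instance and Clock interfaces and the InMemoryInstance class are hypothetical, the in-memory instances ignore the timeout, and an unreachable instance is modeled as one that throws. On failure the sketch asks every instance to release the lock, which is harmless on instances where it was never created.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RedLockSketch {
    /** Hypothetical view of one independent Redis instance. */
    interface Instance {
        boolean tryLock(String key, String value, long ttlMillis); // SET key value NX PX ttl
        void unlock(String key, String value);                     // compare-and-delete
    }

    /** Hypothetical clock seam so the sketch can be exercised deterministically. */
    interface Clock { long nowMillis(); }

    /** Returns the remaining validity time in ms if the lock was acquired, or -1 on failure. */
    static long acquire(List<Instance> instances, Clock clock, String key, String value,
                        long ttlMillis, long clockDriftMillis) {
        long start = clock.nowMillis();               // record the start time
        int locked = 0;
        for (Instance instance : instances) {         // try every instance in turn
            try {
                if (instance.tryLock(key, value, ttlMillis)) locked++;
            } catch (RuntimeException unreachable) {
                // A timeout or connection error counts as "not locked"; move on.
            }
        }
        long elapsed = clock.nowMillis() - start;
        long validity = ttlMillis - elapsed - clockDriftMillis;
        int quorum = instances.size() / 2 + 1;
        if (locked >= quorum && validity > 0) {
            return validity;                          // acquired on a majority, still valid
        }
        for (Instance instance : instances) {         // failed: clean up partial locks
            try {
                instance.unlock(key, value);
            } catch (RuntimeException ignored) {
                // Nothing to clean up on an unreachable instance.
            }
        }
        return -1;
    }

    /** In-memory instance for demonstration only; a "down" instance throws on every call. */
    static final class InMemoryInstance implements Instance {
        private final Map<String, String> keys = new HashMap<>();
        private final boolean up;

        InMemoryInstance(boolean up) { this.up = up; }

        public boolean tryLock(String key, String value, long ttlMillis) {
            if (!up) throw new IllegalStateException("instance unreachable");
            return keys.putIfAbsent(key, value) == null; // the timeout is ignored in this model
        }

        public void unlock(String key, String value) {
            if (!up) throw new IllegalStateException("instance unreachable");
            keys.remove(key, value); // removes the key only if it holds this value
        }

        boolean holds(String key) { return keys.containsKey(key); }
    }
}
```

With three instances of which one is down, acquire still succeeds because two of three is a majority; with two down, it fails and removes the single lock it managed to create.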

3.2 Low latency communication

In addition, a client implementing the RedLock scheme must communicate with all Redis instances with low latency. Ideally it uses multiplexing so that the SET command is sent to all Redis nodes at once and the results are collected together. If network latency is high, suppose clients A and B both try to create the lock:

SET key random_number EX 3 NX   # client A
SET key random_number EX 3 NX   # client B

If client A creates the lock on one half of the nodes and client B creates it on the other half, neither client obtains the lock. Under high concurrency, several clients may each hold the lock on some of the nodes while none of them reaches N/2+1. This is why the last step of the process above stresses that a client that fails should wait a random amount of time before retrying. With a fixed delay, all failed clients would retry at the same moment and end up in the same situation.

Therefore, the best implementation is that the client’s SET command can reach all nodes almost at the same time and receive all execution results almost at the same time. To ensure this, low-latency network communication is critical, and Redisson uses the Netty framework to make this possible.
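The split-vote situation and the randomized retry can be illustrated with a few lines of arithmetic. The class name and the 50 to 200 millisecond range below are arbitrary assumptions chosen for the example.

```java
import java.util.concurrent.ThreadLocalRandom;

final class RetryBackoff {
    /** Number of instances a client must lock out of n: N/2 + 1. */
    static int quorum(int n) { return n / 2 + 1; }

    /** Random delay in [minMillis, maxMillis) so failed clients do not retry in lockstep. */
    static long randomDelayMillis(long minMillis, long maxMillis) {
        return ThreadLocalRandom.current().nextLong(minMillis, maxMillis);
    }

    public static void main(String[] args) {
        int n = 4;
        int lockedByA = 2; // client A won half of the nodes
        int lockedByB = 2; // client B won the other half
        System.out.println("A has quorum: " + (lockedByA >= quorum(n)));
        System.out.println("B has quorum: " + (lockedByB >= quorum(n)));
        // Both release their partial locks and retry after different random delays.
        System.out.println("A retries after " + randomDelayMillis(50, 200) + " ms");
        System.out.println("B retries after " + randomDelayMillis(50, 200) + " ms");
    }
}
```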

3.3 Persistence and High Availability

To ensure high availability, all Redis nodes also need persistence enabled. Assume that persistence is not enabled, and process A is processing business logic after acquiring the lock. At this time, the node breaks down and restarts. Because the lock data is lost, other processes can create the lock again.

The default AOF synchronization policy is everysec, that is, data is flushed to disk once per second. This strikes a balance between performance and data safety: in an unexpected outage, at most one second of data is lost. But if process A happens to create the lock during that second, the lock data is lost in the outage. Other processes can then create the lock as well, and the lock's mutual exclusion no longer holds. There are two ways to address this problem:

Method 1: In redis.conf, set appendfsync to always, so that every command is persisted. This lowers Redis performance, and with it the performance of the distributed lock, but the lock's mutual exclusion is preserved across such an outage;
Method 2: If a node goes down, wait until the lock timeout has elapsed before restarting it. By then the original lock has expired on its own (but you must ensure that the business logic can finish within the preset timeout). This method is also called delayed restart.
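The delayed restart rule in method 2 comes down to one comparison. The sketch below assumes the longest lock timeout in use is known; the class and method names are hypothetical.

```java
final class DelayedRestart {
    /** Earliest moment a crashed node may rejoin: any lock it might have lost has expired by then. */
    static long earliestRestartMillis(long crashedAtMillis, long maxLockTtlMillis) {
        return crashedAtMillis + maxLockTtlMillis;
    }

    /** True once the longest possible lock timeout has elapsed since the crash. */
    static boolean safeToRestart(long nowMillis, long crashedAtMillis, long maxLockTtlMillis) {
        return nowMillis >= earliestRestartMillis(crashedAtMillis, maxLockTtlMillis);
    }

    public static void main(String[] args) {
        long crashedAt = 1_000; // hypothetical crash time in ms
        long maxTtl = 30_000;   // longest lock timeout used by any client
        System.out.println(safeToRestart(20_000, crashedAt, maxTtl));
        System.out.println(safeToRestart(31_000, crashedAt, maxTtl));
    }
}
```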

IV. Redisson

Redisson is a Java client for Redis. It provides several Redis distributed lock implementations, such as reentrant locks, fair locks, RedLock, and read-write locks. Its implementations are comparatively complete, which makes it suitable for production use.

4.1 Distributed Lock

Using Redisson to create a standalone distributed lock is very simple, as shown in the following example:

import java.util.concurrent.TimeUnit;
import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

// 1. Create a RedissonClient. If integrated with Spring, RedissonClient can be declared as a Bean and injected where needed
Config config = new Config();
config.useSingleServer().setAddress("redis://192.168.0.100:6379");
RedissonClient redissonClient = Redisson.create(config);

// 2. Create a lock instance
RLock lock = redissonClient.getLock("myLock");
boolean isLock = false;
try {
    // 3. Try to obtain the distributed lock. The first parameter is the wait time and the second is the lock expiration time
    isLock = lock.tryLock(10, 30, TimeUnit.SECONDS);
    if (isLock) {
        // 4. Simulate business processing
        System.out.println("Processing business logic");
        Thread.sleep(20 * 1000);
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    // 5. Release the lock only if it was acquired; unlocking a lock this thread does not hold throws IllegalMonitorStateException
    if (isLock) {
        lock.unlock();
    }
}
redissonClient.shutdown();

The corresponding data structure in Redis is as follows:

The key is the lock name set in the code, and the value is a hash. The hash field, 9280e909-c86b-43EC-b11D-6e5a7745e2e9:13 in this example, has the format UUID + thread ID, and its value of 1 is the number of times the lock has been acquired. A hash is used mainly because the locks Redisson creates are reentrant, meaning the same holder can lock multiple times:

boolean isLock1 = lock.tryLock(0, 30, TimeUnit.SECONDS);
boolean isLock2 = lock.tryLock(0, 30, TimeUnit.SECONDS);

In this case, the corresponding value will be 2, indicating that the lock was added twice:

Of course, like other reentrant locks, it needs to be unlocked as many times as it was locked.

lock.unlock();
lock.unlock();
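The hash layout and the matching lock and unlock counts can be modeled in a few lines. This is an in-memory sketch of the idea only, not Redisson's implementation; ReentrantLockModel is hypothetical and leaves out the timeout.

```java
import java.util.HashMap;
import java.util.Map;

/** Models the hash stored under the lock key: field = "UUID:threadId", value = hold count. */
final class ReentrantLockModel {
    private final Map<String, Integer> holds = new HashMap<>();

    /** Succeeds if the lock is free or already held by the same holder. */
    synchronized boolean tryLock(String holderId) {
        if (holds.isEmpty() || holds.containsKey(holderId)) {
            holds.merge(holderId, 1, Integer::sum); // first lock stores 1, re-entry adds 1
            return true;
        }
        return false; // held by a different holder
    }

    /** Decrements the hold count and frees the lock when it reaches zero. */
    synchronized void unlock(String holderId) {
        Integer count = holds.get(holderId);
        if (count == null) {
            throw new IllegalMonitorStateException("not held by " + holderId);
        }
        if (count == 1) {
            holds.remove(holderId);
        } else {
            holds.put(holderId, count - 1);
        }
    }

    synchronized int holdCount(String holderId) {
        return holds.getOrDefault(holderId, 0);
    }
}
```

After two tryLock calls by the same holder the count is 2, and another holder can only acquire the lock once unlock has been called twice.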

4.2 RedLock

Redisson also implements RedLock, which is officially recommended by Redis. Here we start three instances of Redis, which can be completely independent of each other and do not require clustering:

$ ./redis-server ../redis.conf
$ ./redis-server ../redis.conf --port 6380
$ ./redis-server ../redis.conf --port 6381

The corresponding code example is as follows:

// 1. Create RedissonClient
Config config01 = new Config();
config01.useSingleServer().setAddress("redis://192.168.0.100:6379");
RedissonClient redissonClient01 = Redisson.create(config01);
Config config02 = new Config();
config02.useSingleServer().setAddress("redis://192.168.0.100:6380");
RedissonClient redissonClient02 = Redisson.create(config02);
Config config03 = new Config();
config03.useSingleServer().setAddress("redis://192.168.0.100:6381");
RedissonClient redissonClient03 = Redisson.create(config03);

// 2. Create a lock instance
String lockName = "myLock";
RLock lock01 = redissonClient01.getLock(lockName);
RLock lock02 = redissonClient02.getLock(lockName);
RLock lock03 = redissonClient03.getLock(lockName);

// 3. Create RedissonRedLock
RedissonRedLock redLock = new RedissonRedLock(lock01, lock02, lock03);

boolean isLock = false;
try {
    isLock = redLock.tryLock(10, 300, TimeUnit.SECONDS);
    if (isLock) {
        // 4. Simulate business processing
        System.out.println("Processing business logic");
        Thread.sleep(200 * 1000);
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    // 5. Release the lock only if it was acquired
    if (isLock) {
        redLock.unlock();
    }
}

redissonClient01.shutdown();
redissonClient02.shutdown();
redissonClient03.shutdown();

Each Redis instance is locked as follows:

You can see that locks are acquired on each instance.

4.3 Extend lock duration

Finally, take a look at Redisson’s WatchDog mechanism, which can be used to prolong the lock duration, as shown in the following example:

Config config = new Config();
// 1. Set lockWatchdogTimeout
config.setLockWatchdogTimeout(30 * 1000);
config.useSingleServer().setAddress("redis://192.168.0.100:6379");
RedissonClient redissonClient = Redisson.create(config);

// 2. Create a lock instance
RLock lock = redissonClient.getLock("myLock");
boolean isLock = false;
try {
    // 3. Try to obtain the distributed lock. The first parameter is the wait time
    isLock = lock.tryLock(0, TimeUnit.SECONDS);
    if (isLock) {
        // 4. Simulate business processing
        System.out.println("Processing business logic");
        Thread.sleep(60 * 1000);
        System.out.println("Remaining lifetime of lock: " + lock.remainTimeToLive());
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    // 5. Release the lock only if it was acquired
    if (isLock) {
        lock.unlock();
    }
}
redissonClient.shutdown();

Redisson's WatchDog mechanism only applies to locks that have no lock timeout set, so the example calls the tryLock() method that takes two arguments:

boolean tryLock(long time, TimeUnit unit) throws InterruptedException;

rather than the three-argument tryLock() method, which takes a lock timeout (leaseTime):

boolean tryLock(long waitTime, long leaseTime, TimeUnit unit) throws InterruptedException;

Next, config.setLockWatchdogTimeout(30 * 1000) sets lockWatchdogTimeout to 30000 milliseconds (which is also the default value). Redisson's WatchDog checks every lock that has no timeout set at intervals of 1/3 of lockWatchdogTimeout (10 seconds in this case). If the lock has not been released, Redisson resets its timeout to the value of lockWatchdogTimeout (30 seconds in this case), and keeps doing so until the program releases the lock. As the example above shows, no matter how long the simulated business logic sleeps, the lock still has some time to live when processing finishes.

On the other hand, if the lock timeout period leaseTime is specified, leaseTime is used, because the WatchDog mechanism does not take effect on locks that have a specified timeout period.

References

Distributed locks with Redis
Redisson Distributed locks and synchronizers

For more articles, see the Full-Stack Engineer Manual on GitHub: github.com/heibaiying/…