Author’s brief introduction

Chen Hanli, a true programmer. I have worked successively in the logistics finance group, the logistics terminal business group, and the pressure balance group. My technical stack moved from Python to Java, yet I still haven't learned to write business code well, and I still dream of saving the business with abstract models.

Preface

Redis-based distributed locks are familiar to everyone, but have you ever had one fail on you? When something went wrong, did you ever doubt whether the distributed lock you were using was reliable? Here are some lessons from my own experience.

Do you really need distributed locks?

Distributed locks are used to keep multiple processes from operating on the same resource at the same time, in two kinds of scenarios:

  1. To improve efficiency. For example, multiple nodes compute the same batch of tasks; if a task is already being computed by one node, the other nodes do not need to repeat the calculation, which avoids wasting compute. Duplicate computation is acceptable here as long as it causes no larger losses; that is, occasional lock failure is tolerable.
  2. To ensure correctness. In this case the requirement on the lock is much stricter: repeating the computation would affect correctness, and there is no room for failure.

Introducing a distributed lock inevitably introduces a third-party dependency such as MySQL, Redis, or ZooKeeper, and if that infrastructure fails, the service is affected too. So before using a distributed lock, consider whether the problem can be solved without locking at all. That is beyond the scope of this article, though, which assumes the need for locking is reasonable and leans toward the second scenario above. Why the bias? Because there is no such thing as a 100% reliable distributed lock, as you will see below.

Start with a simple distributed lock implementation

Redis-based distributed locks are very common; rolling your own or using a third-party library both look simple, at least on the surface. Here is the simplest implementation that can still be considered reliable.

The simplest implementation

It's a classic implementation, and there are really only two key points, right?

  1. The lock must carry a unique identifier: a unique ID generated when locking, and checked when unlocking.
  2. A resource must not be locked forever. The common solution is to give the lock an expiration time; there are other options, of course, which we'll get to later.

A copy-paste implementation is as follows:

lock

public static boolean tryLock(String key, String uniqueId, int seconds) {
    // SET key uniqueId EX seconds NX: set only if the key does not exist, with a TTL
    return "OK".equals(jedis.set(key, uniqueId, "NX", "EX", seconds));
}

This calls SET key value EX seconds NX. If you are not familiar with the command, see the Redis documentation for SET key value [EX seconds | PX milliseconds] [NX | XX] [KEEPTTL].

unlock

public static boolean releaseLock(String key, String uniqueId) {
    String luaScript = "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "return redis.call('del', KEYS[1]) else return 0 end";
    return jedis.eval(
        luaScript, 
        Collections.singletonList(key), 
        Collections.singletonList(uniqueId)
    ).equals(1L);
}

The essence of this implementation is the small Lua script, which checks that the unique IDs match before deleting the key.
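For completeness, here is a minimal usage sketch that ties the two helpers together, assuming the same shared jedis instance; the key name is made up for illustration:

String key = "lock:order:12345";                       // hypothetical resource key
String uniqueId = java.util.UUID.randomUUID().toString();

if (tryLock(key, uniqueId, 30)) {                      // 30s TTL as a safety net
    try {
        // ... the work that must not run concurrently ...
    } finally {
        // only the holder of this uniqueId can release the lock
        releaseLock(key, uniqueId);
    }
} else {
    // lock not acquired: skip, retry later, or fail fast
}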

Is it reliable?

What’s wrong with this implementation?

  1. Single point of failure. The implementation above only needs one master node to work, and the single point here means a single master. Even with a cluster, if the lock has not yet been replicated from the master to a slave at the wrong moment, the same resource can end up locked by multiple clients.
  2. The execution time exceeds the lock's expiration time. To avoid holding the lock forever, we add an expiration time as a safety net, and the lock is released automatically when it expires. But what if the task is not finished by then? GC pauses or network latency can stretch the task well beyond what we planned, so it is hard to guarantee the task completes within the lock's TTL.

How to solve these two problems? Let’s try a more complicated implementation.

Redlock algorithm

For the first problem, the single point of failure, if we follow Redis's own line of thinking the next step is Redlock. To remove the dependence on a single machine, Redlock requires multiple independent Redis master nodes (more than two). These masters are completely independent of each other and do not replicate data between them.

Redlock is implemented as follows:

  1. Get the current time.
  2. Acquire the lock on each of the N nodes in turn, using the same method as in the simple implementation above. One detail: the expiration time passed to each node is different, because the time already spent acquiring the previous locks is subtracted.
  • For example, say the lock's expiration time passed in is 500ms.
  • If acquiring the lock on the first node took 1ms, the lock on the first node expires in 499ms.
  • If acquiring the lock on the second node took 2ms, the lock on the second node expires in 497ms.
  • If the remaining expiration time ever drops to 0 or below, the whole lock operation has timed out and fails.
  3. Check whether the lock was acquired. If the client obtained the lock on at least (N/2 + 1) nodes in the previous step, and the remaining expiration time of each lock is greater than 0, the client has acquired the lock; otherwise it has not. On failure, release the lock.
  4. Release the lock. The release command is sent to all nodes, and the per-node logic is the same as in the simple implementation above. Why all nodes? Because in a distributed setting, failing to get a response from a node does not necessarily mean the lock failed on that node: it may actually have been set, with the reply lost to network jitter.

That is the usual description of Redlock. At first glance it just looks like a multi-master version of the simple implementation; if it were really that simple, this article could end here.
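To make the steps above concrete, here is a minimal sketch of the acquire phase, written against the Jedis client used earlier. It is illustrative only, not a production Redlock (the masters list, the TTL in milliseconds, and the per-node release helper are my own assumptions):

// Illustrative Redlock acquire sketch, not production code.
// Assumes: List<Jedis> masters pointing at independent Redis masters, and a
// per-node unlock helper with the same Lua logic as releaseLock above.
public static boolean tryRedlock(List<Jedis> masters, String key, String uniqueId, long ttlMillis) {
    long start = System.currentTimeMillis();           // step 1: record the current time
    int locked = 0;
    for (Jedis node : masters) {                        // step 2: try each node in turn
        try {
            if ("OK".equals(node.set(key, uniqueId, "NX", "PX", ttlMillis))) {
                locked++;
            }
        } catch (Exception e) {
            // a node error or timeout counts as a failed lock on that node
        }
    }
    long remaining = ttlMillis - (System.currentTimeMillis() - start);  // subtract acquisition time
    // step 3: success needs a majority of nodes AND a positive remaining TTL
    if (locked >= masters.size() / 2 + 1 && remaining > 0) {
        return true;
    }
    // step 4 (failure): release on ALL nodes, because a node that "failed"
    // may actually have taken the lock and only the reply was lost
    for (Jedis node : masters) {
        try {
            releaseLockOnNode(node, key, uniqueId);     // hypothetical per-node variant of releaseLock
        } catch (Exception e) { /* best effort */ }
    }
    return false;
}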

Pits for distributed locks

Problems in high concurrency scenarios

The problems below can also occur in low-concurrency scenarios; they are just much more likely to show up under high concurrency. First, performance issues, which come from two places:

  1. Acquiring the lock is slow. If Redlock is used in a high-concurrency scenario with N master nodes, requesting them one by one takes a long time and hurts performance. This is easy to fix: from the description above, acquiring the lock on multiple nodes does not have to be a sequential operation, it can be done asynchronously so that all nodes are contacted at the same time. Even with parallel requests you still need to estimate the lock acquisition time and make sure that the lock's TTL > lock acquisition time + task processing time.
  2. The locked resource is too large. A locking scheme inherently trades concurrency for correctness, and the cost grows with the size of the resource. Consider splitting the resource; there are two ways to do it:
  • Split the locked resource into segments and lock each segment separately. For example, if I need to perform several operations on a merchant and lock the merchant beforehand, I can split those operations into independent steps and lock each one separately, which improves concurrency.
  • Use the idea of buckets: split the resource into multiple buckets, and if locking one bucket fails, immediately try the next. Take a batch-processing scenario with 2 million merchants to handle: to speed things up we use multiple threads, each grabbing 100 merchants and locking them before processing; handled carelessly, it is hard to guarantee that two threads never lock overlapping sets of merchants. Instead, divide the merchants into buckets along some dimension, such as a tag, and let each task process one bucket at a time, moving on to the next bucket when it finishes; this reduces contention. A sketch of the bucket approach follows this list.
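A rough sketch of the bucket approach mentioned above; the bucket count, key names, and the choice of hashing by merchant ID are all assumptions for illustration, reusing the tryLock/releaseLock helpers from earlier:

// Bucket-splitting sketch: each worker locks one whole bucket at a time
// and simply moves on if a bucket is already taken by someone else.
private static final int BUCKET_COUNT = 64;              // hypothetical

public static int bucketOf(long merchantId) {
    return (int) (merchantId % BUCKET_COUNT);             // bucket by ID; a tag would work as well
}

public static void processAllBuckets() {
    String workerId = java.util.UUID.randomUUID().toString();
    for (int bucket = 0; bucket < BUCKET_COUNT; bucket++) {
        String lockKey = "lock:merchant-bucket:" + bucket;
        if (!tryLock(lockKey, workerId, 600)) {            // bucket taken: try the next one
            continue;
        }
        try {
            // process every merchant whose bucketOf(id) == bucket
        } finally {
            releaseLock(lockKey, workerId);
        }
    }
}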

The retry problem. Both the simple implementation and Redlock have retry logic. If the algorithm above is implemented naively, multiple clients may try to acquire the same lock at almost the same time, each locking some of the nodes but none of them reaching a majority. The common remedy is to stagger the clients on retry by adding a random delay to the retry interval. It does not cure the problem, but it does alleviate it.
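A minimal sketch of retrying with random jitter; the delay numbers are arbitrary:

// Retry with random jitter so that competing clients are staggered in time.
public static boolean tryLockWithRetry(String key, String uniqueId, int ttlSeconds, int maxRetries)
        throws InterruptedException {
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
        if (tryLock(key, uniqueId, ttlSeconds)) {
            return true;
        }
        // fixed base delay plus a random component, so clients retry at different moments
        long sleepMillis = 50 + java.util.concurrent.ThreadLocalRandom.current().nextLong(100);
        Thread.sleep(sleepMillis);
    }
    return false;
}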

The node is down

For a single master node with no persistence, if the node crashes the lock data is lost, so the operations protected by the lock must tolerate being executed more than once, i.e. they must be idempotent.

For multi-master scenarios, such as Redlock, let’s look at a scenario like this:

  1. Suppose there are five redis nodes: A, B, C, D, and E, with no persistence.
  2. Client1 acquires the lock on A, B, and C, which is a majority, so Client1 holds the lock.
  3. Node C goes down.
  4. C comes back with its lock data lost, and Client2 acquires the lock on C, D, and E, also a majority, so Client2 holds the lock too. Now Client1 and Client2 hold the lock at the same time, and Redlock is broken.

How do we solve it? The most straightforward fix is to turn on persistence, persisting every Redis command, but that has a large performance impact and is generally not adopted. Without it, a small amount of data is lost when a node goes down, and our lock may be part of it. Another option is delayed restart: after a node goes down and is repaired, it does not rejoin immediately but waits for a while first, and the wait must be longer than the largest TTL of any lock held at the moment the node went down. Even this does not fully solve the problem: if both B and C go down in step 3 above, only the three nodes A, D, and E remain, and acquiring the lock on D and E is already a majority of them, so the problem reappears. Increasing the total number of master nodes alleviates the issue: more masters means more stability, but also more cost, so it is a trade-off.

The task execution time exceeds the lock's TTL

I have run into this: because of network latency, a task ran far longer than expected, the lock expired, and the task ended up being executed by multiple threads. This problem is common to all distributed locks, including ones based on ZooKeeper or a database: it is the tension between the lock expiring and the client not knowing that it has expired. When locking, we usually give the lock a TTL to prevent a crashed client from holding the lock forever. But every use of this pattern faces the same problem: there is no guarantee that the client's execution time is shorter than the lock's TTL. Most programmers optimistically assume this will never happen; I used to as well, until reality slapped me in the face again and again.

Martin Kleppmann raised the same concern; borrowing his diagram, the sequence is:

  1. Client1 has obtained the lock
  2. Client1 starts the task, and then a stop-the-world GC pause occurs that lasts longer than the lock's expiration time
  3. Client2 acquires the lock and begins the task
  4. Client1’s GC finishes and the task continues, at which point both Client1 and Client2 think they have acquired the lock and both process the task, resulting in an error.

Martin Kleppmann used GC as the example; what I ran into was network latency. Either way, there is no denying that this situation cannot be completely avoided, and when it happens it is easy to get burned.

How to solve it? One solution is not to rely on a TTL you set yourself: after the lock is acquired successfully, attach a watchdog to it. The watchdog starts a scheduled task that renews the lock before it expires, as long as the lock has not been released. This is a bit abstract, so let's look at the Redisson source code:

public class RedissonLock extends RedissonExpirable implements RLock {
    ...

    @Override
    public void lock() {
        try {
            lockInterruptibly();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void lock(long leaseTime, TimeUnit unit) {
        try {
            lockInterruptibly(leaseTime, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    ...
}

Redisson's two commonly used locking APIs are the ones above. With the first, no TTL is passed in: Redisson maintains the TTL itself and actively renews the lock. With the second, you pass the TTL yourself and Redisson will not renew it automatically (you can also pass -1 as leaseTime, but that is not recommended: there is already an API for it, so why use such an odd notation?). The locking logic of the no-argument method is as follows:

public class RedissonLock extends RedissonExpirable implements RLock {
    ...
    public static final long LOCK_EXPIRATION_INTERVAL_SECONDS = 30;
    protected long internalLockLeaseTime = TimeUnit.SECONDS.toMillis(LOCK_EXPIRATION_INTERVAL_SECONDS);

    @Override
    public void lock() {
        try {
            lockInterruptibly();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void lockInterruptibly() throws InterruptedException {
        lockInterruptibly(-1, null);
    }

    @Override
    public void lockInterruptibly(long leaseTime, TimeUnit unit) throws InterruptedException {
        long threadId = Thread.currentThread().getId();
        Long ttl = tryAcquire(leaseTime, unit, threadId);
        // lock acquired
        if (ttl == null) {
            return;
        }

        RFuture<RedissonLockEntry> future = subscribe(threadId);
        commandExecutor.syncSubscription(future);

        try {
            while (true) {
                ttl = tryAcquire(leaseTime, unit, threadId);
                // lock acquired
                if (ttl == null) {
                    break;
                }

                // waiting for message
                if (ttl >= 0) {
                    getEntry(threadId).getLatch().tryAcquire(ttl, TimeUnit.MILLISECONDS);
                } else {
                    getEntry(threadId).getLatch().acquire();
                }
            }
        } finally {
            unsubscribe(future, threadId);
        }
//        get(lockAsync(leaseTime, unit));
    }

    private Long tryAcquire(long leaseTime, TimeUnit unit, long threadId) {
        return get(tryAcquireAsync(leaseTime, unit, threadId));
    }

    private <T> RFuture<Long> tryAcquireAsync(long leaseTime, TimeUnit unit, final long threadId) {
        if (leaseTime != -1) {
            return tryLockInnerAsync(leaseTime, unit, threadId, RedisCommands.EVAL_LONG);
        }
        RFuture<Long> ttlRemainingFuture = tryLockInnerAsync(LOCK_EXPIRATION_INTERVAL_SECONDS, TimeUnit.SECONDS, threadId, RedisCommands.EVAL_LONG);
        ttlRemainingFuture.addListener(new FutureListener<Long>() {
            @Override
            public void operationComplete(Future<Long> future) throws Exception {
                if (!future.isSuccess()) {
                    return;
                }

                Long ttlRemaining = future.getNow();
                // lock acquired
                if (ttlRemaining == null) {
                    scheduleExpirationRenewal(threadId);
                }
            }
        });
        return ttlRemainingFuture;
    }

    private void scheduleExpirationRenewal(final long threadId) {
        if (expirationRenewalMap.containsKey(getEntryName())) {
            return;
        }

        Timeout task = commandExecutor.getConnectionManager().newTimeout(new TimerTask() {
            @Override
            public void run(Timeout timeout) throws Exception {

                RFuture<Boolean> future = commandExecutor.evalWriteAsync(getName(), LongCodec.INSTANCE, RedisCommands.EVAL_BOOLEAN,
                        "if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then " +
                            "redis.call('pexpire', KEYS[1], ARGV[1]); " +
                            "return 1; " +
                        "end; " +
                        "return 0;",
                        Collections.<Object>singletonList(getName()), internalLockLeaseTime, getLockName(threadId));

                future.addListener(new FutureListener<Boolean>() {
                    @Override
                    public void operationComplete(Future<Boolean> future) throws Exception {
                        expirationRenewalMap.remove(getEntryName());
                        if (!future.isSuccess()) {
                            log.error("Can't update lock " + getName() + " expiration", future.cause());
                            return;
                        }

                        if (future.getNow()) {
                            // reschedule itself
                            scheduleExpirationRenewal(threadId);
                        }
                    }
                });
            }
        }, internalLockLeaseTime / 3, TimeUnit.MILLISECONDS);

        if (expirationRenewalMap.putIfAbsent(getEntryName(), task) != null) {
            task.cancel();
        }
    }
    ...
}

As you can see, the locking logic eventually reaches org.redisson.RedissonLock#tryAcquireAsync. After the lock is acquired successfully, it enters scheduleExpirationRenewal, which sets up a timer whose delay is internalLockLeaseTime / 3. In Redisson, internalLockLeaseTime is 30s, which means the lock is renewed every 10 seconds, each time for another 30 seconds. With a ZooKeeper-based distributed lock, ZooKeeper itself can detect whether the client node is still alive and achieve the same effect as renewal. I have not used ZooKeeper distributed locks, so I won't go into detail.
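For reference, this is roughly how those two APIs are used in practice; a sketch assuming a single-server Redisson configuration, with the address made up:

// Sketch of using Redisson's lock APIs (single-server config, hypothetical address; imports omitted).
Config config = new Config();
config.useSingleServer().setAddress("redis://127.0.0.1:6379");
RedissonClient redisson = Redisson.create(config);

RLock lock = redisson.getLock("myLock");

// No TTL passed in: Redisson applies its 30s default and the watchdog keeps renewing it.
lock.lock();
try {
    // the lock stays held as long as this client is alive and renewal keeps succeeding
} finally {
    lock.unlock();
}

// TTL passed in: no automatic renewal, the lock simply expires after 10 seconds.
lock.lock(10, TimeUnit.SECONDS);
try {
    // work that must finish within 10 seconds
} finally {
    lock.unlock();
}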

However, this approach still cannot guarantee that only one client holds the lock at any given moment. If renewal fails, for example during the stop-the-world GC that Martin Kleppmann described, or because the client loses its connection to the Redis cluster, multiple clients can end up holding the lock at the same time. In my own scenario, I reduced the granularity of the lock and Redisson's renewal mechanism was sufficient. To be stricter, you can add logic that aborts the task when renewal fails; I have done that in Python code before, but in Java I have not hit a case that demanded it.

Here I also want to mention Martin Kleppmann's own solution. I don't think it is reliable, for the reasons given below. His solution is to have the protected resource itself maintain a check (a fencing token) that guarantees multiple clients cannot stomp on the same resource even when the lock fails.
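To make the discussion concrete, the storage-side check would look roughly like this; a hypothetical sketch, not code from Kleppmann's post:

// Hypothetical fencing-token check on the storage side (illustrative only).
public class FencedStorage {
    private long largestSeenToken = 0;

    // Reject writes whose token is older than one the storage has already accepted.
    public synchronized boolean write(long fencingToken, String data) {
        if (fencingToken < largestSeenToken) {
            return false;                  // stale client, e.g. one that woke up after a long GC pause
        }
        largestSeenToken = fencingToken;
        // ... actually persist `data` here ...
        return true;
    }
}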

  1. Transactions cannot be guaranteed. In the diagram, the task only touches the storage once (the write with token 34), but in real scenarios a task may access the storage several times, and those accesses must be atomic. Suppose Client1 writes once with token 33 before its GC pause, the pause happens, and Client2 then acquires the lock and writes with token 34; can the data written by the two clients still be correct? If not, the scheme is flawed, unless the storage has some other guarantee such as a transaction mechanism; and if it does have such a mechanism, the token adds nothing and the fencing scheme is superfluous.
  2. It is impractical in high-concurrency scenarios. Since only the write carrying the largest token is accepted at any time, access to the storage is serialized. In high-concurrency scenarios this severely limits throughput, yet high concurrency is exactly where distributed locks are used most, which makes the design rather self-contradictory.
  3. This is a problem shared by all distributed locks. The fencing solution is a generic one that could be paired with Redlock or any other lock, so as I see it, it is really a separate solution that has nothing to do with Redlock specifically.

System clock drift

This is a problem I have only thought about; I have not hit it in a real project, but it is theoretically possible, so I mention it here. Redis expiration times depend on the system clock; if the clock drifts too far, expiration times are affected.

Why does the system clock drift? A brief introduction to system time: Linux provides two clocks, CLOCK_REALTIME and CLOCK_MONOTONIC. CLOCK_REALTIME, also known as xtime or wall time, can be changed by the user and adjusted by NTP. CLOCK_MONOTONIC, literally "monotonic time", cannot be changed by the user, but it can still be adjusted by NTP.

Ideally, every machine's clock would be continuously synchronized with an NTP server, but obviously that cannot be guaranteed. System clock drift comes from two causes:

  1. The system clock is out of sync with the NTP server. There is no particularly good solution for this at the moment; we can only trust the ops team.
  2. CLOCK_REALTIME was modified by hand. When implementing a distributed lock you should avoid relying on CLOCK_REALTIME, but unfortunately Redis uses it: I had a look at the Redis 5.0 source code and it reads CLOCK_REALTIME. Antirez has talked about switching to CLOCK_MONOTONIC, but it has not been changed yet. In other words, manually changing the time on the Redis server can break Redis. A short Java illustration of the two kinds of clocks follows this list.
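For what it's worth, the same distinction is visible from Java: System.currentTimeMillis() follows the wall clock and can jump when the system time is changed, while System.nanoTime() is monotonic and is only meaningful for measuring elapsed time:

// Wall clock vs monotonic clock in Java.
long wall = System.currentTimeMillis();  // wall-clock time; jumps if someone changes the system time
long mono = System.nanoTime();           // monotonic; only useful for measuring elapsed time

// ... do some work ...

long elapsedMillis = (System.nanoTime() - mono) / 1_000_000;  // unaffected by wall-clock changes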

Conclusion

This article went from a simple Redis-based distributed lock to the more complex Redlock implementation, and walked through the pits I have stepped into while using distributed locks, along with their solutions.

References

Distributed locks with Redis

clock_gettime(2) – Linux man page
