The scheduled tasks we used before were only deployed on a single machine. In order to solve the single point problem and ensure that a task is executed by only one machine, we need to consider the problem of locking, so we spent time to study this problem. How do you implement a distributed lock?

The essence of lock is mutual exclusion, to ensure that any time a client can hold the same lock, if considering the use of Redis to achieve a distributed lock, the simplest solution is to create a key value in the instance, when releasing the lock, the key value will be deleted. But a reliable and complete distributed lock needs to consider more details, let’s look at how to write a correct distributed lock.

Standalone distributed lock SETNX

So we implement a simple lock directly based on redis setNX (SET if Not eXists) command. Go straight to the pseudo-code

Lock acquisition:

    SET resource_name my_random_value NX PX 30000Copy the code

Lock release:

    if redis.call("get",KEYS[1]) == ARGV[1] then
        return redis.call("del",KEYS[1])
    else
        return 0
    endCopy the code

A few details to note:

  • First of all, we need to set the timeout time when acquiring the lock. The timeout period is set in case the client crashes or the lock is held after a network problem occurs. The whole system is deadlocked.
  • Use the setNX command to ensure that the query and write steps are atomic
  • KEYS[1]) == ARGV[1]; KEYS[1] = ARGV[1]; The reason for the above judgment is to ensure that the lock is released by the holder of the lock. We assume that this step is not checked:

    1. Client A acquires the lock, and the backend thread hangs. The time is greater than the lock expiration time.
    2. After the lock expires, client B obtains the lock.
    3. After client A recovers, client A processes related events and sends the del command to Redis. The lock is released
    4. Client C obtains the lock. At this point, two clients in a system hold the lock simultaneously.

      The key problem is that the lock held by client B is released by client A.

  • Lock release must use Lua scripts to ensure atomicity of operations. Lock release consists of three steps: GET, judge and DEL. Distributed locks have concurrency problems if the atomicity of the three steps is not guaranteed.

Note the above details, a single Redis node distributed lock is achieved.

There is still a single point of Redis in this distributed lock. You might say that Redis is a master-slave architecture, and when a fault occurs, you can switch to slave, but the replication of Redis is asynchronous.

  • If client A gets the lock on the master.
  • The master breaks down before synchronizing data to the slave.
  • Client B then gets the lock from the slave again.

This caused multiple locks to be held at the same time due to Master downtime. If your system is available to accept a short period of time, there are multiple people holding the lock. This simple solution will solve the problem.

But if you solve this problem. Redis official provides a Redlock solution.

The second implementation is RedLock

To solve the problem of Redis single point. Redis authors propose RedLock’s solution. The scheme is very clever and simple. The core idea of RedLock is to use multiple Redis masters simultaneously for redundancy, and these nodes are completely independent and do not need to synchronize data between them.

Assuming we have N Redis nodes, N should be an odd number greater than 2. RedLock implementation steps:

  1. Get current time
  2. Obtain the Redis locks of N nodes in turn using the method mentioned above.
  3. If the number of acquired locks is greater than (N/2+1) and the acquisition time is less than the lock validity time, it is considered that a valid lock has been obtained. Automatic lock release time is the initial lock release time minus the time used to acquire the lock.
  4. If the number of locks obtained is less than (N/2+1), or not enough statements are obtained within the lock validity time, the lock acquisition is considered to have failed. A lock release message needs to be sent to all nodes.

The implementation of releasing locks is simple. Want all Redis nodes to be released, regardless of whether the lock was previously acquired.

Several details need to be noted:

  • The interval between retries to acquire locks should be a random range rather than a fixed time. This prevents multiple clients from simultaneously sending lock operations to the Redis cluster, avoiding simultaneous contention. The condition where the same number of locks are acquired simultaneously. (Although the probability is very low)
  • If a master node fails, the recovery interval should be longer than the lock validity period.

    1. Suppose there are three Redis nodes A, B and C.
    2. Client Foo has acquired locks A and B.
    3. At this time, B breaks down and all memory data is lost.
    4. B node recovers.
    5. At this time, client BAR acquires the lock again and obtains nodes B and C.
    6. At this point two more clients have acquired the lock.

      This can be avoided if the recovery time is longer than the lock duration. If the performance requirements are not high, you can even enable Redis persistence.

conclusion

After understanding the distributed implementation of Redis, I actually feel that the principle of most distributed systems is very simple, but in order to ensure the reliability of distributed systems need to pay attention to a lot of details, trivial exceptions. RedLock algorithm to achieve distributed lock is simple and efficient, the idea is quite clever. But is RedLock necessarily secure? I’ll write another article about this. Please look forward to the article address.