Is RedLock the silver bullet for Redis distributed locks?

First, an overview

As technology keeps iterating, distributed systems carry more and more weight in the enterprise. And whenever distributed systems come up, distributed locks inevitably follow. At present there are three main ways to implement a distributed lock: ZooKeeper, a database, and Redis. This article uses Redis as its example.

In our view, the following three properties are the minimum guarantees needed to use distributed locks effectively:

  1. Safety property: mutual exclusion. At any given moment, only one client can hold the lock.
  2. Liveness property: no deadlocks. Eventually the lock can always be acquired, even if the client that locked the resource crashes or is partitioned from the network.
  3. Liveness property: fault tolerance. As long as a majority of Redis nodes are running, clients can acquire and release locks.

Second, the challenges of implementing distributed locks on multiple Redis nodes

The simplest way to lock a resource with Redis is:

  1. Create a lock key in the instance.
  2. Give the lock a limited lifetime using the Redis expiration feature, so that it is eventually released (deleted) after a given period.
  3. When the client is done with the resource, it deletes the lock key.
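In a real deployment, steps 1 and 2 are usually a single `SET key value NX PX ttl` command. Step 3 deletes the key only if it still holds the caller's random value, so that a client whose lock has already expired cannot delete a lock that another client has since acquired. This check-and-delete is usually done in a small Lua script so that it is atomic. The sketch below simulates these semantics with a hypothetical in-memory `FakeRedis` class so that it runs without a server. `FakeRedis`, `set_nx_px`, `delete_if_equals`, `acquire` and `release` are illustrative names, not a client library's API.

```python
import time
import uuid


class FakeRedis:
    """Hypothetical in-memory stand-in for one Redis instance.

    Mimics only SET key value NX PX ttl and an atomic compare-and-delete
    (which a real client would run as a Lua script).
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock            # returns seconds
        self._data = {}                # key -> (value, expires_at)

    def _live_entry(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] <= self._clock():
            del self._data[key]        # expired: drop it lazily
            entry = None
        return entry

    def set_nx_px(self, key, value, ttl_ms):
        if self._live_entry(key) is not None:
            return False               # NX: never overwrite a live lock
        self._data[key] = (value, self._clock() + ttl_ms / 1000.0)
        return True

    def delete_if_equals(self, key, value):
        entry = self._live_entry(key)
        if entry is not None and entry[0] == value:
            del self._data[key]
            return True
        return False


def acquire(r, key, ttl_ms):
    """Steps 1-2: create the lock with an expiry; return our token or None."""
    token = uuid.uuid4().hex           # random value identifying this owner
    return token if r.set_nx_px(key, token, ttl_ms) else None


def release(r, key, token):
    """Step 3: remove the lock, but only if we still own it."""
    return r.delete_if_equals(key, token)
```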

At first glance there seems to be no problem, and in a standalone Redis deployment this implementation does work. But what if the node goes down? Fine, we add a replica (slave) node and switch to it if the master fails. But let's see whether this setup can really keep the lock safe.

Before discussing this fatal flaw, we need to understand that Redis replication is asynchronous.

  1. Client A acquires the lock on the master.
  2. The master crashes before the lock key has been replicated to the replica.
  3. The replica is promoted to master.
  4. Client B acquires the lock on the new master. Because the replica never received the lock key, the acquisition succeeds!
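The four steps can be reproduced in a toy model of asynchronous replication. Everything here is hypothetical and deliberately simplified: `Node`, `MasterReplicaPair`, the explicit `replicate()` call and the `backlog` list stand in for Redis's real replication stream, and expiry is ignored. The point is only that a write acknowledged by the master can vanish on failover.

```python
class Node:
    """Hypothetical Redis node holding lock keys (TTLs omitted for brevity)."""

    def __init__(self, name):
        self.name = name
        self.keys = {}

    def set_nx(self, key, value):
        if key in self.keys:
            return False
        self.keys[key] = value
        return True


class MasterReplicaPair:
    """Toy model: writes reach the replica only when replicate() runs."""

    def __init__(self):
        self.master = Node("master")
        self.replica = Node("replica")
        self.backlog = []              # acknowledged writes not yet shipped

    def set_nx(self, key, value):
        ok = self.master.set_nx(key, value)
        if ok:
            self.backlog.append((key, value))  # replicated "later"
        return ok

    def replicate(self):
        for key, value in self.backlog:
            self.replica.keys[key] = value
        self.backlog.clear()

    def fail_over(self):
        # The master dies; anything still in the backlog is lost.
        self.backlog.clear()
        self.master = self.replica     # the replica is promoted


pair = MasterReplicaPair()
a_ok = pair.set_nx("lock", "client-A")  # step 1
pair.fail_over()                        # steps 2-3: crash before replicate()
b_ok = pair.set_nx("lock", "client-B")  # step 4
print(a_ok, b_ok)  # True True: both clients believe they hold the lock
```

If `pair.replicate()` runs before `fail_over()`, client B's `set_nx` returns `False`. The bug exists only in the window between acknowledgement and replication.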

Clearly, this is wrong. The master went down before it had time to replicate the data, so the replica never had the lock, and the distributed lock failed. So how does antirez (Salvatore Sanfilippo, the author of Redis) solve this problem?

Third, the Redlock algorithm

The author's answer is to use multiple Redis masters. These nodes are completely independent: there is no replication and no other system coordinating data between them. Acquiring a lock across these nodes then takes the following steps:

  1. Get the current time in milliseconds.
  2. Try to acquire the lock on every node in turn, using the same key and the same random value on each. Each per-node attempt should have a timeout that is small compared with the lock's validity time. For example, if the lock expires after 10 s, the timeout for acquiring it on a single node should be roughly 5 to 50 milliseconds. This keeps the client from wasting time on a node that has failed. If a node does not respond within that limit, the client gives up on it and moves on to the next node, until every node has been tried.
  3. After trying all nodes, the client subtracts the time recorded in step 1 from the current time to get the time spent acquiring the lock. The lock is considered acquired if and only if the client obtained it on more than half of the nodes and the time spent is less than the lock's validity time.
  4. If the lock is acquired, its remaining validity time equals the validity time originally set minus the time spent acquiring it.
  5. If the client locked no more than half of the nodes, or the computed validity time is negative, it tries to unlock all instances, including those on which it failed to obtain the lock.
  6. To release the lock, the client simply releases it on all instances, whether or not it believes it successfully locked a given instance.
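The six steps can be sketched as a single-threaded simulation. This is a minimal illustration, not a production client. `FakeNode` is a hypothetical stand-in for an independent Redis master, and an unreachable node is modelled as raising `ConnectionError`. The per-node network timeout from step 2 appears only as a comment. The reference Redlock description also subtracts a small clock-drift allowance in step 4; it is left out here to mirror the steps above.

```python
import time
import uuid


class FakeNode:
    """Hypothetical independent Redis master (no replication)."""

    def __init__(self, up=True, clock=time.monotonic):
        self.up = up
        self.clock = clock
        self.data = {}                 # key -> (value, expires_at)

    def set_nx_px(self, key, value, ttl_ms):
        if not self.up:
            raise ConnectionError("node unreachable")
        entry = self.data.get(key)
        if entry is not None and entry[1] > self.clock():
            return False               # someone else holds a live lock
        self.data[key] = (value, self.clock() + ttl_ms / 1000.0)
        return True

    def delete_if_equals(self, key, value):
        if not self.up:
            raise ConnectionError("node unreachable")
        entry = self.data.get(key)
        if entry is not None and entry[0] == value:
            del self.data[key]


def redlock_release(nodes, key, token):
    for node in nodes:                 # step 6: try every node
        try:
            node.delete_if_equals(key, token)
        except ConnectionError:
            pass


def redlock_acquire(nodes, key, ttl_ms, clock=time.monotonic):
    """Return (token, validity_ms) if the lock was acquired, else None."""
    token = uuid.uuid4().hex           # same random value on every node
    start = clock()                    # step 1
    acquired = 0
    for node in nodes:                 # step 2
        try:
            # A real client would bound this call with a short timeout.
            if node.set_nx_px(key, token, ttl_ms):
                acquired += 1
        except ConnectionError:
            pass                       # give up on this node, move on
    elapsed_ms = (clock() - start) * 1000.0          # step 3
    validity_ms = ttl_ms - elapsed_ms                # step 4
    if acquired >= len(nodes) // 2 + 1 and validity_ms > 0:
        return token, validity_ms
    redlock_release(nodes, key, token)               # step 5
    return None
```

With five nodes of which two are down, `redlock_acquire` still succeeds (3 of 5), while a second client asking for the same key gets `None` until the first calls `redlock_release`.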

But does Redlock really solve the problem?

Martin Kleppmann published an article arguing that Redlock cannot guarantee the safety of the lock!

He argues that locks are used for one of two purposes:

  1. Efficiency: a lock saves you from doing the same work twice (for example, a very expensive computation).
  2. Correctness: a lock ensures that tasks execute properly, preventing the file conflicts and data loss caused by two nodes operating on the same data at the same time.

For the first purpose, we can tolerate an occasional lock failure: if two nodes do the same work at the same time, the only cost is some extra computation, with no other impact on the system. In this case a single Redis instance solves the problem well. There is no need to use RedLock and maintain so many Redis instances, which only increases the system's maintenance cost.

1. The weakness of distributed lock timeouts

The second scenario, however, calls for much more care, because money is likely to be involved. If the lock fails and two nodes process the same data, the result may be file corruption, data loss, permanent inconsistency, or financial loss!

Consider a scenario with two clients, each of which must acquire the lock before saving data to a database. What goes wrong if we use the RedLock algorithm? In RedLock, locks have an expiration time to prevent deadlocks, but Martin argues this is not safe! The flow is roughly as shown in the following flowchart:

After client 1 acquires the lock, it starts executing. Midway through, a Full GC occurs and the process is suspended. While it is paused, the lock times out.

Client 2 waits for client 1's lock to expire and then acquires the lock successfully. Afterwards, client 1 finishes its Full GC and goes on to write to the database as well! This is not safe. How can it be fixed?
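The interleaving can be replayed with a logical clock. This sketch assumes a hypothetical `LeaseLock` that expires `ttl` ticks after acquisition and a `Storage` that accepts every write. The tick values are made up for illustration. Even if client 1 had checked its lease just before the pause, that check would be stale by the time it writes.

```python
class LeaseLock:
    """Hypothetical lock whose lease expires `ttl` ticks after acquisition."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.owner = None
        self.expires_at = 0

    def try_acquire(self, client, now):
        if self.owner is None or now >= self.expires_at:
            self.owner, self.expires_at = client, now + self.ttl
            return True
        return False


class Storage:
    """Hypothetical database that accepts every write (no fencing)."""

    def __init__(self):
        self.writes = []

    def write(self, client, value):
        self.writes.append((client, value))


lock, db = LeaseLock(ttl=10), Storage()
got1 = lock.try_acquire("client1", now=0)   # client 1 acquires at t=0
# Client 1 is frozen by a Full GC from t=1 to t=15; its lease ends at t=10.
got2 = lock.try_acquire("client2", now=12)  # client 2 takes the expired lock
db.write("client2", "B")                    # t=13: client 2 writes
db.write("client1", "A")                    # t=15: client 1 resumes and writes anyway
print(got1, got2, db.writes)  # True True [('client2', 'B'), ('client1', 'A')]
```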

Martin proposed a mechanism similar to optimistic locking (a fencing token), as shown in the following figure:

While client 1 is paused for a long time, client 2 acquires the lock and writes to the storage, carrying token 34. After that write completes, client 1 wakes up and tries to write. Because its token 33 is smaller than the latest token, the commit is rejected!
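A minimal sketch of the fencing check, assuming a hypothetical lock service that hands out a monotonically increasing number on every successful acquisition. The starting value 33 is chosen only to match the example. Leases are omitted. The storage remembers the largest token it has seen and rejects anything older.

```python
import itertools


class FencedLockService:
    """Hypothetical lock service issuing increasing fencing tokens."""

    def __init__(self, start=33):
        self._counter = itertools.count(start)

    def acquire(self):
        return next(self._counter)


class FencedStorage:
    """Rejects writes whose token is older than the newest one seen."""

    def __init__(self):
        self.max_token = -1
        self.data = {}

    def write(self, key, value, token):
        if token < self.max_token:
            return False               # stale lock holder: reject
        self.max_token = token
        self.data[key] = value
        return True


locks, db = FencedLockService(start=33), FencedStorage()
t1 = locks.acquire()                   # client 1 gets token 33, then pauses
t2 = locks.acquire()                   # its lease expires; client 2 gets 34
print(db.write("file", "from client 2", t2))  # True
print(db.write("file", "from client 1", t1))  # False: 33 < 34
```

Note that `write` must compare and update `max_token` atomically, which is itself a linearizable operation inside the store.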

This sounds like a complete solution: even if a client is suspended for some reason, the data is still handled correctly. But think about it:

  • If the data store can reliably accept a write only when its token is larger than every past token, then it is a linearizable store. That is comparable to implementing the distributed lock with the database itself, and RedLock becomes of little use. You would not even need Redis to guarantee the distributed lock!

2. RedLock depends heavily on the system clock

If you recall the steps Redlock uses to acquire a lock, you will see that lock validity depends heavily on the current system clock. Suppose:

We have five Redis nodes: A, B, C, D, and E.

  1. Client 1 acquires the lock on nodes A, B, and C. D and E are unreachable because of network problems.
  2. The clock on node C jumps forward, causing the lock to expire.
  3. Client 2 acquires the lock on nodes C, D, and E. A and B are unreachable because of network problems.
  4. Now clients 1 and 2 both believe they hold the lock.
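The scenario can be replayed with a hypothetical model in which each node judges expiry by its own clock. Here `clock_offset` models how far a node's clock has jumped ahead of true time. Reachability is passed in explicitly, and the TTL of 10 and the 60-second jump are made-up values. Client 1's lease is still valid in true time when client 2 gets its majority.

```python
class Node:
    """Hypothetical Redis node that judges expiry by its own clock."""

    def __init__(self, name):
        self.name = name
        self.clock_offset = 0.0        # seconds this node's clock runs ahead
        self.locks = {}                # key -> (owner, expires_at, local time)

    def try_lock(self, key, owner, ttl, true_now):
        local_now = true_now + self.clock_offset
        entry = self.locks.get(key)
        if entry is not None and entry[1] > local_now:
            return False
        self.locks[key] = (owner, local_now + ttl)
        return True


def acquire_majority(nodes, reachable, key, owner, ttl, true_now):
    got = [n.name for n in nodes
           if n.name in reachable and n.try_lock(key, owner, ttl, true_now)]
    return len(got) >= len(nodes) // 2 + 1, got


nodes = [Node(name) for name in "ABCDE"]
ok1, got1 = acquire_majority(nodes, {"A", "B", "C"}, "lock", "client1",
                             ttl=10, true_now=0)   # step 1
nodes[2].clock_offset = 60                         # step 2: C jumps forward
ok2, got2 = acquire_majority(nodes, {"C", "D", "E"}, "lock", "client2",
                             ttl=10, true_now=1)   # step 3
print(ok1, got1)  # True ['A', 'B', 'C']
print(ok2, got2)  # True ['C', 'D', 'E']  -> step 4: two owners
```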

A similar problem occurs if C crashes before it has persisted the lock to disk and then restarts immediately.
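The crash-and-restart variant can be modelled the same way. This sketch assumes a hypothetical `Node` whose `persisted` dict stands for what has been flushed to disk. A restart reloads only that, so an unflushed lock disappears and client 2 can again assemble a majority.

```python
class Node:
    """Hypothetical Redis node; `persisted` is what survives a restart."""

    def __init__(self, name):
        self.name = name
        self.locks = {}                # in-memory state
        self.persisted = {}            # state flushed to disk

    def try_lock(self, key, owner):
        if key in self.locks:
            return False
        self.locks[key] = owner
        return True

    def flush(self):
        self.persisted = dict(self.locks)

    def crash_and_restart(self):
        self.locks = dict(self.persisted)  # unflushed locks are lost


def majority(nodes, reachable, key, owner):
    got = [n.name for n in nodes
           if n.name in reachable and n.try_lock(key, owner)]
    return len(got) >= len(nodes) // 2 + 1


nodes = [Node(name) for name in "ABCDE"]
ok1 = majority(nodes, {"A", "B", "C"}, "lock", "client1")
nodes[2].crash_and_restart()           # C restarts before flush(): lock forgotten
ok2 = majority(nodes, {"C", "D", "E"}, "lock", "client2")
print(ok1, ok2)  # True True
```

Calling `nodes[2].flush()` before the crash makes client 2's attempt fail, because C then remembers client 1's lock.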

Martin points out that clock jumps come mainly from two sources (each is followed below by the author's proposed remedy):

  1. Manual changes to the clock.
    • Not much can be said about manual changes: if someone insists on breaking things, there is no way to prevent it.
  2. A stepped (jumping) clock update received from the NTP service.
    • Stepped NTP updates have to be handled by operations staff. When a server's time needs to be updated, it should be changed gradually: many small adjustments, each as small as possible.

3. Using application code to make up for the shortcomings of distributed lock timeouts

Summary: looking back at point 1, the root cause of the defect is this. To stop a crashed client from holding a lock forever, we give the lock an expiration time. In abnormal cases, the program's (business logic's) execution time then exceeds that expiration time, which causes a series of problems. So can we attack the problem from this angle and use the program itself to get out of this dead end?

Since the trouble is that the lock expires before the business logic finishes, we try to ensure that the business logic always finishes before the lock times out.

In Java, Redisson implements a mechanism that keeps the lock's expiration time always longer than the business logic's execution time. The main principle is that once the application acquires the lock, it starts a background thread that keeps renewing the lock until the lock is released. Its principle is roughly as shown in the following schematic:

Redisson renews the lock from a daemon thread. (A daemon thread does not keep the process alive: when the main program exits, it is terminated along with it.) This prevents the case where the program has gone down but the renewal thread keeps extending the lock, which would cause a deadlock!
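A minimal sketch of the renewal idea in Python. It assumes a hypothetical in-process `LeaseLock` in place of a Redis key. A real version would extend the key's TTL on the server, and only if the stored value still matches the owner. `start_watchdog` and the ttl/3 renewal interval are illustrative assumptions, not Redisson's API. As with the thread described above, the renewal thread is a daemon, so it dies with the interpreter instead of keeping the lock alive forever.

```python
import threading
import time


class LeaseLock:
    """Hypothetical lock with an expiring lease (stands in for a Redis key)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, owner, ttl):
        with self._mutex:
            now = time.monotonic()
            if self.owner is None or now >= self.expires_at:
                self.owner, self.expires_at = owner, now + ttl
                return True
            return False

    def renew(self, owner, ttl):
        with self._mutex:
            now = time.monotonic()
            if self.owner == owner and now < self.expires_at:
                self.expires_at = now + ttl
                return True
            return False               # lease lost or owned by someone else

    def release(self, owner):
        with self._mutex:
            if self.owner == owner:
                self.owner = None


def start_watchdog(lock, owner, ttl):
    """Renew the lease every ttl/3 seconds until the returned event is set."""
    stop = threading.Event()

    def loop():
        # Event.wait returns True as soon as stop is set, ending the loop.
        while not stop.wait(ttl / 3):
            if not lock.renew(owner, ttl):
                break                  # lease already lost: stop renewing

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

For example, after `lock.acquire("client1", ttl=0.6)` and `stop = start_watchdog(lock, "client1", 0.6)`, work that runs longer than 0.6 s keeps the lock. Another owner's `acquire` fails until `stop.set()` and `lock.release("client1")` are called. Renewal addresses slow business logic only: a process frozen by a stop-the-world pause cannot run its watchdog either.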

In addition, Redisson also implements and optimizes the RedLock algorithm, fair locks, reentrant locks, multi-locks (interlocking) and other operations, making Redis distributed locks simpler and more efficient to implement!


If I have misunderstood anything in this article, corrections via private message are very welcome! Feel free to follow the author's official account, so we can learn and progress together!