
Preface

The RedLock algorithm is a distributed lock implementation based on Redis. After RedLock was proposed, Martin, a leading researcher in the distributed systems field, criticized it on GitHub. This article walks through how distributed locks are implemented and why RedLock was called out, to explore what a distributed lock really is.

Before introducing RedLock, let’s take a look at the comparison between traditional stand-alone locks and distributed locks, as well as common distributed lock implementations.

Stand-alone locks vs. distributed locks

As the business grows, the system architecture evolves from a single centralized service into a distributed one. Under high concurrency on a single machine, Java's built-in locks such as synchronized or ReentrantLock are enough to meet the business requirements.

These are single-machine locks, and they are perfectly adequate for a single-machine architecture, but not for a distributed one. Once user requests are spread across services by a load balancer, a single-machine lock can only restrict requests on the current machine, not across the entire distributed cluster.

In a distributed environment, distributed locks are needed if we want to control resources concurrently and strictly.

Common distributed lock implementations

Common distributed lock implementations are based on Redis, MySQL, or Zookeeper. In essence, these middleware act as a shared third party that every node can see, much like classmates in school asking the teacher to mediate a fight.

MySQL

Uniqueness constraint

MySQL enforces uniqueness on insert through primary keys and unique indexes: inserting a duplicate fails.

Either the primary key or a unique index can therefore serve as the basis of a distributed lock; the principle is the same.

  • If a client manages to insert the row, it has successfully acquired the lock.
  • When the business logic finishes, the client deletes the row, which releases the lock.

If the row has already been inserted, every other insert attempt fails with a duplicate-key error, which means the lock is already held by someone else.

Note: everyone must insert the same data. For example, to lock commodity ID 10086, every client tries to insert the row for 10086.
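To make this concrete, here is a minimal JDBC sketch of the idea (the `dist_lock` table, connection string, and credentials are all assumptions for illustration):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLIntegrityConstraintViolationException;

public class MysqlLockDemo {
    // Hypothetical table: CREATE TABLE dist_lock (lock_key VARCHAR(64) PRIMARY KEY);
    private static final String URL = "jdbc:mysql://localhost:3306/demo"; // assumed

    // Try to acquire the lock: the INSERT succeeds for exactly one client.
    static boolean tryLock(Connection conn, String key) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO dist_lock (lock_key) VALUES (?)")) {
            ps.setString(1, key);
            ps.executeUpdate();
            return true;                      // insert succeeded -> lock acquired
        } catch (SQLIntegrityConstraintViolationException e) {
            return false;                     // duplicate key -> lock already held
        }
    }

    // Release the lock by deleting the row.
    static void unlock(Connection conn, String key) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "DELETE FROM dist_lock WHERE lock_key = ?")) {
            ps.setString(1, key);
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(URL, "root", "password")) {
            String key = "10086";             // everyone contends on the same key
            if (tryLock(conn, key)) {
                try {
                    // ... business logic under the lock ...
                } finally {
                    unlock(conn, key);
                }
            }
        }
    }
}
```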

Zookeeper

Distributed locks based on Zookeeper should be familiar to most of us; they are built on Zookeeper's ephemeral sequential nodes.

Ephemeral sequential nodes + ZAB protocol

Principle: ZNodes in Zookeeper come in two flavors, persistent and ephemeral. An ephemeral node's life cycle is bound to the client's session: when the client loses its connection, all ephemeral nodes it created are deleted.

Acquiring the lock: all clients try to create the same node in Zookeeper; whoever creates it successfully holds the lock.

Releasing the lock: the client deletes the node itself, or the lock is forcibly released when the client's session with the server times out.

Zookeeper is a CP middleware built on the ZAB (Zookeeper Atomic Broadcast) distributed consistency protocol, so it can guarantee that only one client creates a given node even under concurrency.
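As a sketch, Apache Curator ships a ready-made lock recipe built on exactly these ephemeral sequential nodes (assuming a Curator dependency; the connection string and lock path below are placeholders):

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;
import java.util.concurrent.TimeUnit;

public class ZkLockDemo {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // The recipe creates ephemeral sequential nodes under /locks/order;
        // if our session dies, the node (and thus the lock) disappears with it.
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/order");
        if (lock.acquire(3, TimeUnit.SECONDS)) {   // wait up to 3s for the lock
            try {
                // ... business logic under the lock ...
            } finally {
                lock.release();                    // deletes our node, releasing the lock
            }
        }
        client.close();
    }
}
```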

Zookeeper is a very reliable distributed lock solution, but the disadvantage is that introducing another third-party middleware makes the system architecture heavier. Whether to introduce Zookeeper should be decided by the scale of your business.

Redis

Implementations based on Redis are also very common. A system may not have Zookeeper, and it may not have messaging middleware, but it almost certainly has a Redis cache (or other caching middleware such as Memcached).

Uniqueness of Key

One implementation relies on the uniqueness of a key, namely the SETNX command.

SETNX stands for SET if Not eXists: the key is set only if it does not already exist.

In practice we attach an expiration time, so that if releasing the lock fails, the key does not sit in Redis forever and block everyone from ever acquiring the lock again.
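A minimal sketch with the Jedis client, assuming a local Redis instance: the key, value, NX flag, and expiration are set in one atomic command, and the release is guarded by a small Lua script so we only delete our own lock:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;
import java.util.Collections;
import java.util.UUID;

public class RedisLockDemo {
    // Release only if the value still matches ours, atomically, via Lua.
    private static final String UNLOCK_LUA =
            "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "  return redis.call('del', KEYS[1]) " +
            "else return 0 end";

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String key = "lock:order:10086";
            String token = UUID.randomUUID().toString(); // identifies our lock

            // SET key token NX PX 30000 -- acquire with a 30s expiry in one atomic step
            String ok = jedis.set(key, token, SetParams.setParams().nx().px(30_000));
            if ("OK".equals(ok)) {
                try {
                    // ... business logic under the lock ...
                } finally {
                    // never DEL blindly: the key may already belong to someone else
                    jedis.eval(UNLOCK_LUA, Collections.singletonList(key),
                               Collections.singletonList(token));
                }
            }
        }
    }
}
```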

Disadvantages: this only works with a single Redis instance; a Redis cluster is not supported. In addition, if the Redis instance holding the lock goes down, other clients can take the opportunity to grab the lock, while the client that originally held it cannot sense that anything happened.

All clients contending for the lock must go to the same single Redis instance. It makes no sense for client A to acquire the lock on Redis_1 while client B sets the same lock on Redis_2 and both claim success.

Note: there are many more details to consider, such as lock renewal (the watchdog) and making sure a client releases only its own lock. See "Redis & Redisson distributed lock watchdog mechanism source-code analysis" on Juejin (juejin.cn); I won't repeat them here.

Is there a lock that supports a Redis cluster? After all, Redis is usually deployed as a cluster these days to withstand concurrency pressure. The answer is RedLock.

RedLock

Analyzing the problem: simple key setting fails in a Redis cluster mainly because the lock lives on a single instance. Now let's introduce today's protagonist, RedLock.

Principle of RedLock algorithm

  1. Get the current timestamp.
  2. Request the lock from every Redis instance with the same key and expiration, and set a client-side timeout so the client does not block waiting on a hung instance (the client timeout should be much shorter than the key's expiration).
  3. If more than half of the instances grant the lock without the client timeout being exceeded (measured from the timestamp in step 1), the client is considered to hold the lock.
  4. If acquisition fails, for example fewer than half of the instances were locked or the client timed out, the client must send an unlock request (DEL) to all Redis instances.
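Here is a hand-rolled sketch of these four steps using Jedis (deliberately simplified: real implementations such as Redisson also account for clock drift, and release with a token check rather than the blind DEL shown here):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;
import java.util.List;

public class RedLockSketch {
    // Try to lock a majority of independent Redis instances.
    static boolean tryRedLock(List<Jedis> instances, String key, String token,
                              long ttlMillis, long clientTimeoutMillis) {
        long start = System.currentTimeMillis();          // step 1: current timestamp
        int locked = 0;
        for (Jedis jedis : instances) {                   // step 2: same key everywhere
            try {
                if ("OK".equals(jedis.set(key, token,
                        SetParams.setParams().nx().px(ttlMillis)))) {
                    locked++;
                }
            } catch (Exception ignored) {
                // a down or unreachable instance just counts as a failed vote
            }
        }
        long elapsed = System.currentTimeMillis() - start;
        // step 3: majority reached and the client timeout was not exceeded
        if (locked > instances.size() / 2 && elapsed < clientTimeoutMillis) {
            return true;
        }
        // step 4: failed -> release on all instances (simplified to a plain DEL)
        for (Jedis jedis : instances) {
            try { jedis.del(key); } catch (Exception ignored) { }
        }
        return false;
    }
}
```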

The algorithm is fairly simple and embodies the idea of a majority (quorum). Majorities appear very frequently in distributed consistency, for example in Paxos and ZAB.

Does RedLock really solve the problem of distributed locking?

In normal cases, it does solve the distributed locking problem.

Once one client holds the lock on a majority of instances, no other client can also reach a majority. In extreme cases, however, RedLock's mutual exclusion can break down!

Extreme scenario: a locked node crashes

ClientA successfully locks the Redis_1, Redis_2, and Redis_3 instances via RedLock. Shortly afterwards, the Redis_3 node crashes and restarts, rejoining the cluster with its lock data lost. ClientB takes advantage of the situation: it locks Redis_3, Redis_4, and Redis_5, also a majority. Now ClientA and ClientB hold the lock at the same time, and mutual exclusion is broken.

Solutions:

  • Persist the data: use AOF so that as much lock data as possible survives, ensuring the locks are still in Redis after a crashed node restarts. AOF can fsync once per second (everysec) or on every write (always); only the per-write policy guarantees no lock is lost, but fsyncing on every write operation is inefficient. See the config excerpt after this list.
  • Delayed restart: timely persistence alone is not enough, because the node may crash before the data reaches disk. So do not restart a crashed Redis immediately; instead, restart it only after the longest TTL among the distributed lock keys has expired, which guarantees that every lock that instance held has been released. This does require tracking the TTL of each distributed lock somewhere.
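For reference, these are the relevant AOF directives in redis.conf; only `always` guarantees that a just-acquired lock survives a crash, at a real throughput cost:

```
appendonly yes          # enable AOF persistence
appendfsync always      # fsync on every write: safest for locks, slowest
# appendfsync everysec  # the default trade-off: up to one second of writes can be lost
```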

Extreme scenario: the client cannot sense that its lock expired

Martin, a well-known researcher in the distributed systems field, has criticized RedLock's shortcomings on GitHub. He makes the following points:

  • Because Key in Redis has the automatic release mechanism of timeout, the lock on the client cannot perceive its lock failure.
  • RedLock relies too much on time.

The following is one of the situations Martin pointed out.

  1. The client locks successfully via RedLock and starts executing its business logic.
  2. The client happens to hit a GC STW (Stop-The-World) pause, which blocks it for a while.
  3. By the time the client wakes up, its lock has already expired in Redis and been taken over by Client2, who locks successfully. Client1 and Client2 now hold the lock at the same time, and the shared resource is no longer safe.

In fact, an STW (Stop-The-World) pause is not the only culprit; there are many other ways a client can be blocked, network delays for example.

Client1 has no idea its lock is gone, and you can hardly blame it for that. So can we solve this problem? Martin proposed a corresponding solution: the fencing mechanism.

The fencing mechanism

When I looked "fencing" up on Baidu, it turned out to mean 🤺 fencing, the sport. Quite interesting!

Martin proposed attaching a monotonically increasing token to each lock: the later a lock is acquired, the larger its token.

Returning to the earlier scenario: Client1 holds the lock with token 369. Client2 acquired its lock later, so its token is 370 (larger than 369). When they write, the storage accepts only the request carrying the largest token it has seen, so Client1's stale write is rejected.
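A sketch of what that check might look like on the resource side (a hypothetical store class; the crux is that comparing the token and accepting the write must happen atomically):

```java
import java.util.concurrent.atomic.AtomicLong;

public class FencedStore {
    // The largest fencing token this storage has accepted so far.
    private final AtomicLong maxToken = new AtomicLong(-1);

    // Accept a write only if its token is newer than anything seen before.
    public boolean write(long token, String data) {
        while (true) {
            long current = maxToken.get();
            if (token <= current) {
                return false;                 // stale lock holder: reject the write
            }
            if (maxToken.compareAndSet(current, token)) {
                doWrite(data);                // token advanced; a real store must make
                return true;                  // this write atomic with the CAS as well
            }
            // another writer raced us; re-check against the new max token
        }
    }

    private void doWrite(String data) { /* ... persist the data ... */ }
}
```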

This is just like the ZXID, Zookeeper's global transaction ID. Ultimately, fencing is an implementation of optimistic locking.

But if you can already guarantee synchronized access to the shared variable, you don't need to introduce a lock at all.

Specifically: if you can make every client know its token and whether that token is still valid (the largest) across the whole cluster, then you could just as well let the locked resource itself know who owns it!

Personally, I also find Martin's "check whether you hold the largest token when writing data" very difficult to implement.

Take Client1 and Client2 again, with Client2 not yet holding the lock. Client1 performs the fencing check as its last step, finds its token is the largest, and is just about to write data, when Client2 grabs the lock at that very moment. Boom! Two clients hold the lock again!

RedLock relies too much on the clock

One of the characteristics of distributed architectures is the lack of a global clock.

Even in the real world, each place has a different time depending on its longitude, which is why we divide the world into time zones (such as UTC+8). Distributed machines are spread out geographically, so it is inevitable that their clocks differ.

Even NTP (clock synchronization servers) cannot guarantee clock reliability; problems such as clock rollback and clock drift remain. Clock rollback is a classic problem in the Snowflake algorithm; if you are interested, look into how Snowflake handles it. Incidentally, Snowflake IDs are not strictly increasing across machines, precisely because the clocks of distributed machines are not strictly synchronized.

Let’s review the RedLock algorithm:

  1. Get the current timestamp.
  2. Request the lock from every Redis instance with the same key and expiration, and set a client-side timeout so the client does not block waiting on a hung instance (the client timeout should be much shorter than the key's expiration).
  3. If more than half of the instances grant the lock without the client timeout being exceeded (measured from the timestamp in step 1), the client is considered to hold the lock.
  4. If acquisition fails, for example fewer than half of the instances were locked or the client timed out, the client must send an unlock request (DEL) to all Redis instances.

Let's look at where RedLock depends on time:

  • In step 3, we take the current timestamp, subtract the timestamp obtained in step 1 (curTime - startTime), and compare the difference with the client timeout to decide whether we have timed out.

  • The keys stored in Redis also expire based on wall-clock time.

If a clock problem occurs, say the clock on Redis_3 jumps forward, the key there expires and the lock is released early on Redis_3, which ClientB can then exploit to reach a majority. We are done for.

Martin's criticism that RedLock is too time-dependent roughly means this: a good distributed algorithm may not produce the right answer immediately, but it should eventually produce the right answer rather than a wrong one, no matter what goes wrong in the time dimension or in network communication.

That has the flavor of eventual consistency!

Conclusion

  • The RedLock algorithm has two major problems: 1. the client cannot sense that its lock has been lost; 2. RedLock relies too heavily on clocks.
  • If your data consistency requirements are strict, you are advised to implement distributed locks with Zookeeper.

We do see that the RedLock algorithm has its limitations in a distributed environment. However, I personally believe that Redis is an AP middleware in terms of the CAP theorem, and it is neither suitable for nor required to guarantee C for locking. If you truly need strong consistency, use a CP middleware such as Zookeeper.

Take cache/database consistency: we all know the Cache-Aside pattern can produce inconsistencies, and even delayed double deletion is not a guarantee. Yet it is widely used, because few businesses actually need strictly consistent middleware such as Canal.

As always, talking about technology divorced from the business is just hooliganism.