1. Introduction

Before we go any further, let's take a look at what the Redlock algorithm is.

Redlock is a distributed lock algorithm proposed by antirez, the author of Redis. The idea is as follows: in a distributed Redis environment, assume there are N Redis masters. These nodes are completely independent of each other, with no master-slave replication or other cluster coordination mechanism between them. Locks are acquired and released on each of the N instances using the same method as on a standalone Redis instance. Now assume we have 5 Redis master nodes, running on 5 separate servers so that they are unlikely to all go down at the same time. To acquire the lock, a Redis client does the following:

  1. Get the current Unix time in milliseconds.
  2. Try to acquire the lock on each of the five instances in turn, using the same key and a unique random value. When requesting a lock from a node, the client should use a connection and response timeout that is much shorter than the lock's expiration time. For example, if the lock expires automatically after 10 seconds, the timeout might be in the range of 5 to 50 milliseconds. This prevents the client from blocking on a node that has already failed; if a node does not respond within the timeout, the client should move on to the next Redis instance as soon as possible.
  3. Compute the time spent acquiring the lock by subtracting the timestamp recorded in step 1 from the current time. The lock is considered acquired if and only if the client obtained it on a majority of the nodes (N/2 + 1, here 3 nodes) and the total time spent is less than the lock's expiration time.
  4. If the lock was acquired, its actual validity time is the expiration time minus the time spent acquiring it (as computed in step 3).
  5. If the client failed to acquire the lock for some reason (either it locked fewer than N/2 + 1 instances, or the acquisition took too long), it should release the lock on all instances, even those it did not manage to lock. This prevents the situation where a node granted the lock but the client never received the response, leaving the key held and unacquirable until it expires.
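Put together, the acquisition steps above can be sketched as follows. This is an illustrative sketch only: `tryLockInstance` is a hypothetical helper standing in for a short-timeout `SET key value NX PX ttl` request against a single node, not part of any real client library.

```javascript
// Sketch of the Redlock acquisition steps, assuming
// tryLockInstance(node, key, value, ttlMs) attempts a short-timeout
// "SET key value NX PX ttlMs" on one Redis node and returns true/false.
function redlockAcquire(nodes, key, value, ttlMs, tryLockInstance) {
  const start = Date.now();                        // step 1: current time in ms
  let acquired = 0;
  for (const node of nodes) {                      // step 2: try each node in turn
    if (tryLockInstance(node, key, value, ttlMs)) acquired++;
  }
  const elapsed = Date.now() - start;              // step 3: time spent acquiring
  const quorum = Math.floor(nodes.length / 2) + 1; // majority: N/2 + 1
  const validity = ttlMs - elapsed;                // step 4: remaining validity
  if (acquired >= quorum && validity > 0) {
    return { ok: true, validity };
  }
  return { ok: false };                            // step 5: caller must unlock all nodes
}
```

On failure (`ok: false`), the caller is responsible for sending the release to every instance, per step 5.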

Martin Kleppmann [1], the author of DDIA, published a great article on Redlock, the distributed lock algorithm. What follows is a translation of that article, with my own understanding woven in. Now let's get started.

An algorithm named Redlock [2] is published on the Redis website. As a distributed-systems researcher, I was skeptical of its claim to implement fault-tolerant distributed locks (or, more precisely, leases [3]) on top of Redis, so I wrote down my thoughts.

Since Redlock already has more than 10 independent implementations, and I don't know who is already relying on the algorithm, I thought it was time to share my notes publicly. I won't get into other aspects of Redis here; some of those have already been criticized elsewhere [4].

I love Redis and have used it successfully in production in the past. I think it's great for situations where servers share some transient, approximate, rapidly changing data, and where it's not a big deal if you occasionally lose that data for whatever reason. For example, a good use case is maintaining a request counter per IP address (for rate limiting) and the set of distinct IP addresses used by each user ID (for abuse detection).

However, Redis is gradually moving into areas of data management where stronger consistency and durability are expected. This worries me, because that is not what Redis was designed for. Distributed locking is one of those areas. Let's look at it in more detail.

2. Why use a distributed lock?

The purpose of a lock is to ensure that, when multiple nodes attempt to perform the same piece of work, only one of them actually does it (at least only one at a time). That work might be writing some data to a shared storage system, performing some computation, calling some external APIs, and so on. In summary, you might want a lock in a distributed application for one of two reasons: efficiency or correctness. How do you distinguish between the two? Ask yourself what would happen if the lock failed:

  • **Efficiency:** taking the lock saves you from unnecessarily doing the same work twice (e.g., some expensive computation). If the lock fails and two nodes end up doing the same work, the result is a minor increase in cost (say you end up paying AWS 5 cents more than you otherwise would have) or a minor inconvenience (e.g., a user ends up getting the same email notification twice).
  • **Correctness:** taking the lock prevents concurrent processes from stepping on each other and corrupting system state. If the lock fails and two nodes concurrently work on the same piece of data, the result may be a corrupted file, data loss, a permanent inconsistency, the wrong dose of a drug administered to a patient, or some other serious problem.

Both of these are reasons to use locks, but you need to be very clear about which of the two you’re dealing with.

If you use locks merely for efficiency, there is no need to incur the cost and complexity of Redlock, running 5 Redis servers and checking for a majority to acquire your lock. You are better off using a single Redis instance, perhaps with asynchronous replication to a replica in case the primary crashes.

If you use a single Redis instance, your lock will of course break if, say, the power suddenly fails on your Redis node, or something else goes wrong. But if you are using the lock only for efficiency, and Redis node crashes don't happen very often, that's no big deal. This "no big deal" scenario is where Redis shines. At the very least, if you rely on a single Redis instance, it is clear to everyone working on the system that the lock is approximate and is only to be used for non-critical paths and purposes.

On the other hand, the Redlock algorithm with five copies and majority voting appears at first glance to be suitable for cases where there are strict requirements on the correctness of the lock. For the rest of this article, we’ll assume that you’re using distributed locks for correctness, and it would be a serious mistake for two different nodes to think they hold the same lock at the same time.

3. Use locks to protect resources

Let's leave the details of Redlock aside for a moment and discuss how distributed locks are used in general (independent of the particular locking algorithm). It is important to remember that a lock in a distributed system is not like a mutex in a multi-threaded application. It is a more complicated design, because different nodes and the network can all fail independently in a variety of ways.

For example, suppose you have an application in which a client needs to update a file in shared storage (such as HDFS or S3). The client first acquires the lock, then reads the file and makes some changes, then writes back the modified file, and finally releases the lock. The lock prevents two clients from performing this read-modify-write cycle on the same file at the same time, which would result in lost updates. The code might look like this:

```javascript
// THIS CODE IS BROKEN
function writeData(filename, data) {
  var lock = lockService.acquireLock(filename);
  if (!lock) {
    throw 'Failed to acquire lock';
  }

  try {
    var file = storage.readFile(filename);
    var updated = updateContents(file, data);
    storage.writeFile(filename, updated);
  } finally {
    lock.release();
  }
}
```

Unfortunately, even with a perfect lock service, the code above is broken. The following diagram shows how the data can be corrupted:

In this example, the client that acquired the lock is paused for an extended period while holding it, for example because the garbage collector (GC) kicked in. The lock has a timeout (i.e., it is a lease), which is good design (otherwise a crashed client could end up holding the lock forever and never releasing it). However, if the GC pause lasts longer than the lease expiry, and the client doesn't realize that the lock it holds has expired, it may go ahead and make an unsafe change.

This bug is not theoretical: HBase used to have exactly this problem [5]. Normally GC pauses are quite short, but "stop-the-world" GC pauses can sometimes last several minutes, certainly long enough for a lease to expire. Even so-called "concurrent" garbage collectors, such as the HotSpot JVM's CMS, cannot run entirely in parallel with the application code; even they need to stop the world from time to time.

You cannot fix this problem by inserting a check on the lock's expiry just before writing back to storage. Keep in mind that GC can pause a running thread at any point, including the point that is maximally inconvenient for you: between the final check and the write operation.
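To make the race concrete, here is a sketch of that "check before write" variant; `lock`, `storage`, and `update` are hypothetical stand-ins for a lease-based lock, a shared store, and the application's merge logic, and the comment marks the window in which a pause breaks the scheme.

```javascript
// STILL BROKEN: checking the lease just before writing does not help.
// lock.isStillValid(), storage and update are illustrative stand-ins,
// not any real lock service or storage API.
function writeDataWithCheck(lock, storage, filename, update) {
  const file = storage.readFile(filename);
  const updated = update(file);
  if (!lock.isStillValid()) {
    throw new Error('lease expired, aborting write');
  }
  // A GC pause, page fault or SIGSTOP can suspend the process right
  // here, after the check succeeded but before the write lands. If the
  // lease expires during that pause, this write still goes through.
  storage.writeFile(filename, updated);
}
```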

Even if your programming language's runtime doesn't have long GC pauses, your process can be suspended for many other reasons. Maybe your process tries to read an address that has not yet been loaded into memory, gets a page fault, and is paused until the page is loaded from disk. Maybe your disk is actually EBS, so reading a variable unwittingly turns into a synchronous network request over Amazon's congested network. Maybe there are many other processes contending for CPU, and you hit a black node in your scheduler tree [6]. Maybe someone accidentally sends SIGSTOP or the like to the process. Any of these can pause the process.

If you still don't believe in process pauses, consider instead that the file-write request may be delayed in the network before it reaches the storage server. Packet networks such as Ethernet and IP may delay packets arbitrarily, and they do: in one famous incident at GitHub, packets were delayed in the network for around 90 seconds. This means that an application process may send a write request, and it may reach the storage server a minute later, when the lease has already expired.

Even in a well-managed network, this kind of thing can happen. You simply cannot make any assumptions about timing, which is why the code above is fundamentally unsafe, no matter what lock service you use.

The value of the Redlock algorithm is that it introduces majority voting (in the spirit of the Paxos family of distributed consensus algorithms) to mitigate the impact of a single point of failure on data safety and service availability. But its heavy reliance on the system clock and on well-chosen lease times makes it unsuitable for scenarios that require strict correctness.

4. Use fencing tokens to secure the lock

The fix for the problem above is actually quite simple: every write request to the storage service must carry a fencing token. Here, the fencing token simply needs to be a number that increases every time a client acquires the lock (incremented, for example, by the lock service). As shown below:

Client 1 acquires the lease along with a token of value 33, but then goes into a long pause and the lease expires. Client 2 acquires the lease, gets a token of value 34 (the number always increases), and sends its write, including the token of 34, to the storage service. Later, when client 1 comes back to life, it sends its write, with token 33, to the storage service. However, the storage server remembers that it has already processed a write with a higher token (34), so it rejects the request carrying token 33.

Note that this requires the storage server to take an active role in checking tokens, rejecting any write whose token is smaller than one it has already seen. But this is not particularly hard once you know the trick. Provided that the lock service generates strictly monotonically increasing tokens, this makes the lock safe. For example, if you are using ZooKeeper as the lock service, you can use the zxid or the znode version number as the fencing token, and you are in good shape.
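A minimal sketch of the storage-side check, assuming an in-memory store; `FencedStorage` is an illustration of the idea, not any particular storage system's API.

```javascript
// Storage server that enforces fencing tokens: a write carrying a token
// no greater than the highest token already seen is rejected.
class FencedStorage {
  constructor() {
    this.files = new Map();    // filename -> contents
    this.maxToken = new Map(); // filename -> highest token seen so far
  }
  write(filename, contents, token) {
    const highest = this.maxToken.has(filename) ? this.maxToken.get(filename) : -1;
    if (token <= highest) {
      return false; // stale client (e.g. paused past its lease expiry)
    }
    this.maxToken.set(filename, token);
    this.files.set(filename, contents);
    return true;
  }
}
```

Replaying the scenario above: client 2 writes with token 34, then the recovered client 1 tries token 33 and is turned away, so its stale write cannot clobber the file.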

However, this leads us to the first big problem with Redlock: it does not have any facility for generating fencing tokens. The algorithm does not produce a number that is guaranteed to increase every time a client acquires a lock. This means that even if the algorithm were otherwise perfect, it would not be safe to use, because in the case of a paused client process or a delayed packet, you cannot prevent other clients from racing for the lock at that moment.

It is not obvious to me how one would change the Redlock algorithm to start generating fencing tokens. The unique random value it uses does not provide the required monotonicity. Simply keeping a counter on one Redis node would not be sufficient, because that node may fail. Keeping counters on several nodes would mean they could get out of sync. It is likely that you would need a consensus algorithm just to generate the fencing tokens. (If only incrementing a counter was simple.)

5. Using time to solve consensus

Since Redlock cannot generate fencing tokens, you should not use it when you depend on the lock for correctness. But there are some further problems worth discussing.

In the academic literature, the most realistic system model for this kind of algorithm is the asynchronous model with unreliable failure detectors [7]. In plain English, this means that the algorithm makes no assumptions about timing: processes may pause for arbitrary lengths of time, packets may be arbitrarily delayed in the network, clocks may be arbitrarily wrong, and the algorithm is nevertheless expected to do the right thing. Given what we discussed above, these are very reasonable assumptions.

The only purpose for which such an algorithm may use the clock is to generate timeouts, so as to avoid waiting forever while a node is down. But timeouts are not always accurate: just because a request times out, that doesn't mean the other node is down; it could equally be that there is a large delay in the network, or that your local clock is wrong. Used as a failure detector, a timeout is just a guess that something is wrong. (If it could, a distributed algorithm would do without clocks entirely, but then consensus becomes impossible [8], and acquiring a lock is like a compare-and-set operation, which requires consensus [9].)

Note that Redis uses gettimeofday [10], rather than a monotonic clock [11], to determine the expiry of keys [12]. The man page for gettimeofday explicitly says [13] that the time it returns is subject to discontinuous jumps in system time: it might suddenly jump forwards by a few minutes, or even jump backwards (for example, if the clock is stepped by NTP [14] because it differs too much from the NTP server, or if the clock is manually adjusted by an administrator). So if the system clock is doing something strange, it could easily happen that a key in Redis expires much faster or slower than expected.

For algorithms in the asynchronous model this is not a big problem: these algorithms generally ensure that their safety properties always hold, without making any timing assumptions [15]. Only liveness properties depend on timeouts or some other failure detector. In plain English, this means that even if timing goes haywire throughout the system (processes pause, the network delays packets, clocks jump forwards and backwards), the algorithm's performance may degrade, but it will never make an incorrect decision.

Not so with Redlock, however. Its safety depends on a number of timing assumptions:

  1. All Redis nodes hold keys for approximately the right length of time before expiring them.
  2. Network delays are much shorter than the expiry time.
  3. Process pauses are much shorter than the expiry time.

6. Redlock's timing assumptions make it unreliable

Let’s look at some examples to demonstrate Redlock’s reliance on time assumptions. Suppose the system has five Redis nodes (A, B, C, D, E) and two clients (1 and 2). What happens if the clock on one of the Redis nodes jumps forward?

  1. Client 1 obtains the locks on nodes A, B, and C. D and E cannot be accessed due to network problems.
  2. The clock on node C jumps forward, causing the lock to expire.
  3. Client 2 obtains locks on nodes C, D, and E. A and B cannot be accessed due to network problems.
  4. Both clients 1 and 2 now believe they hold the lock.
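The scenario above can be simulated in a few lines. This toy model gives each node its own clock skew and a local expiry check; all names and numbers are illustrative, not a real protocol implementation.

```javascript
// Toy simulation of the clock-jump scenario: each node tracks the lock
// owner and an expiry timestamp by its OWN local clock. A forward jump
// on C makes C's local time exceed the expiry, so C grants the lock again.
function acquire(nodes, client, reachable, now, ttl) {
  let got = 0;
  for (const name of reachable) {
    const n = nodes[name];
    const localNow = now + n.clockSkew;     // this node's view of the time
    if (!n.owner || localNow >= n.expiry) { // free, or expired locally
      n.owner = client;
      n.expiry = localNow + ttl;
      got++;
    }
  }
  return got >= 3; // majority of 5 nodes
}

const nodes = {};
for (const name of ['A', 'B', 'C', 'D', 'E']) {
  nodes[name] = { owner: null, expiry: 0, clockSkew: 0 };
}
const ttl = 10000;
// t=0: client 1 reaches only A, B, C (D and E unreachable) and succeeds.
const c1 = acquire(nodes, 1, ['A', 'B', 'C'], 0, ttl);
// C's clock jumps forward past the TTL, so its key expires early.
nodes.C.clockSkew = 20000;
// t=1: client 2 reaches only C, D, E (A and B unreachable) and also succeeds.
const c2 = acquire(nodes, 2, ['C', 'D', 'E'], 1, ttl);
// c1 and c2 are both true: both clients believe they hold the lock.
```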

A similar problem can occur if C crashes and restarts before it has persisted the lock to disk. For this reason, the Redlock documentation recommends delaying the restart of a crashed node for at least the maximum lock lifetime [16]. But this restart delay again relies on a reasonably accurate measurement of time, and it too fails when a clock jump occurs.

Perhaps you don't think clock jumps are realistic, because you are confident that NTP is configured correctly to only gently adjust the clock. In that case, let's look at an example of how a process pause can break the algorithm:

  1. Client 1 requests locks on nodes A, B, C, D, and E.
  2. While the responses to client 1 are in flight, client 1 goes into a stop-the-world GC pause.
  3. The locks on all Redis nodes expire.
  4. Client 2 acquires the locks on nodes A, B, C, D, and E.
  5. Client 1 finishes GC and receives the responses from the Redis nodes indicating that it successfully acquired the locks (the responses were sitting in client 1's kernel network buffers while the process was paused).
  6. Both clients 1 and 2 now believe they hold the lock.

Note that even though Redis is written in C and therefore has no GC itself, that doesn't help us here: any system in which a client may experience a GC pause has this problem. You can only ensure safety by preventing client 1 from performing any operation under the lock after client 2 has acquired it, for example by using the fencing approach described above.

A long network delay can produce the same effect as a process pause. It perhaps depends on your TCP user timeout: if you make the timeout significantly shorter than the Redis TTL, delayed network packets might be ignored, but we would have to look at the TCP implementation in detail to be sure. Also, with timeouts, we are back to the accuracy of time measurement again!

7. Redlock's synchrony assumptions

These examples show that Redlock only works correctly if you assume a synchronous system model — that is, a system with the following properties:

  1. Bounded network delay (packets are guaranteed to arrive within some maximum delay)
  2. Bounded process pauses (you must guarantee a maximum pause time that is less than the configured timeout, i.e., hard real-time constraints [17], which you normally only find in things like car airbag systems)
  3. Bounded clock error (and hope you don't get your time from a bad NTP server [18])

Note that a synchronous model does not mean exactly synchronized clocks: it means you are assuming that network delay, pauses, and clock drift all have a known, fixed upper bound [19]. Redlock assumes that delays, pauses, and drift are all small relative to the lock's time-to-live; if the timing issues become as large as the time-to-live, the algorithm fails.

In a well-functioning data center environment, the timing assumptions will be satisfied most of the time; this is known as a partially synchronous system. But is that good enough? The moment those timing assumptions are broken, Redlock may violate its safety properties, for example by granting a lease to one client before another client's lease has expired. If you are relying on the lock for correctness, "most of the time" is not enough: you need it to be correct all of the time.

There is considerable evidence that it is dangerous to assume that most practical system environments conform to the synchronous model. Keep reminding yourself of the GitHub incident with the 90-second packet delay [20]. It is unlikely that Redlock would survive a Jepsen [21] test.

If you want to ensure correctness and consistency, use a consensus algorithm such as Raft, Viewstamped Replication, Zab, or Paxos, which are designed for the partially synchronous system model (or the asynchronous model with failure detectors). Such algorithms do not rely on timing assumptions for their safety properties. It is tempting to assume that networks, processes, and clocks are more reliable than they really are, but in a messy and complex distributed system you have to be very careful with your assumptions.

8. Conclusion

I think the Redlock algorithm is a poor choice: it is too complex and expensive for use cases where the lock is merely an efficiency optimization, and it is not safe enough for use cases that depend on correctness.

In particular, the algorithm makes dangerous assumptions about timing and system clocks (essentially assuming a synchronous system with bounded network delay and bounded execution time of operations), and it violates its safety properties if those assumptions are not met. Moreover, it lacks a facility for generating fencing tokens (which protect a system against long network delays or paused processes).

If you use locks merely for efficiency, and can tolerate a degree of incorrectness, I recommend sticking with the straightforward single-node Redis locking algorithm [22] (conditional set-if-not-exists to acquire a lock, atomic delete-if-value-matches to release it), and documenting very clearly in your code that the locks are only approximate and may occasionally fail. Don't bother setting up a cluster of five Redis nodes.
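For illustration, here is an in-memory simulation of that single-node pattern's semantics. On real Redis, acquisition would be `SET key value NX PX ttl` and release an atomic compare-and-delete done in a Lua script; `SingleNodeLock` here is a mock of those semantics, not a client.

```javascript
// In-memory simulation of the single-node Redis locking pattern:
// set-if-not-exists with a TTL to acquire, and delete-if-value-matches
// to release, so a client can only delete its own lock.
class SingleNodeLock {
  constructor() {
    this.store = new Map(); // key -> { value, expiry }
  }
  // Like SET key value NX PX ttlMs; `now` stands in for the server clock.
  acquire(key, value, ttlMs, now) {
    const e = this.store.get(key);
    if (e && now < e.expiry) return false; // NX: key still held
    this.store.set(key, { value, expiry: now + ttlMs });
    return true;
  }
  // Like the atomic Lua release: delete only if the value matches ours.
  release(key, value, now) {
    const e = this.store.get(key);
    if (!e || now >= e.expiry || e.value !== value) return false;
    this.store.delete(key);
    return true;
  }
}
```

The value-match on release is what stops a slow client from deleting a lock that has since expired and been re-acquired by someone else, which is exactly why a plain `DEL` would be wrong.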

On the other hand, if you need locks for correctness, don't use Redlock. Instead, use a proper consensus system such as ZooKeeper, probably via one of the Curator recipes [23] that implements a lock. (At the very least, use a database with reasonable transactional guarantees [24].) And enforce the use of fencing tokens on all resource accesses made under the lock.

As I said at the beginning, Redis is an excellent tool if you use it correctly. None of the above diminishes the usefulness of Redis for its intended purposes. Salvatore [25] has been very dedicated to the project for many years, and its success is well deserved. But every tool has limitations, and it is important to understand them and to plan accordingly.

9. References

[1] Original article by Martin Kleppmann: martin.kleppmann.com/2016/02/08/…
[2] The Redlock algorithm and implementations: redis.io/topics/dist…
[3] The leases paper: www.semanticscholar.org/paper/Lease…
[4] aphyr.com/tags/Redis
[5] www.slideshare.net/enissoz/hba…
[6] http://43.128.13.41/?p=188
[7] courses.csail.mit.edu/6.852/08/pa…
[8] www.cs.princeton.edu/courses/arc…
[9] cs.brown.edu/~mph/Herlih…
[10] github.com/redis/redis…
[11] linux.die.net/man/2/clock…
[12] github.com/redis/redis…
[13] linux.die.net/man/2/getti…
[14] www.eecis.udel.edu/~mills/ntp/…
[15] www.net.t-labs.tu-berlin.de/~petr/ADC-0…
[16] redis.io/topics/dist…
[17] stackoverflow.com/questions/1…
[18] xenia.media.mit.edu/~nelson/res…
[19] www.net.t-labs.tu-berlin.de/~petr/ADC-0…
[20] github.com/blog/1364-d…
[21] aphyr.com/tags/jepsen
[22] redis.io/commands/se…
[23] curator.apache.org/curator-rec…
[24] www.postgresql.org/
[25] antirez.com/latest/0


About the author

Author: Hi everyone, I'm Xiaoliang. I won a medal at an ICPC Asia regional contest and now work at Tencent. Knowing the importance of algorithms and computer fundamentals, I write about this foundational knowledge on my WeChat public account "Xiaoliang Programming". Reprint notice: reprinting without authorization is prohibited.