What is distributed locking

When Redis comes up, caching is usually the first thing that comes to mind. Redis is also widely used for distributed locking, thanks to its single-threaded command execution and high performance.

A lock, as we all know, is a synchronization tool that ensures a shared resource is accessed by only one thread at a time. Java developers are familiar with synchronized and Lock, which we use often. However, these Java locks only work inside a single process on a single machine; they are powerless in a distributed cluster. This is where distributed locks come in.

A distributed lock, as its name implies, is a lock used in distributed systems. It controls synchronized access to resources shared between those systems. Generally speaking, a distributed lock needs the following characteristics:

1. Mutual exclusion: at any moment, only one client can hold the distributed lock for a given piece of data.

2. High availability: in a distributed setting, the failure of a small number of servers must not affect normal use, so the service that provides the lock needs to be deployed as a cluster.

3. Lock timeout: if a client does not actively release the lock, the server releases it automatically after a period of time, so that a client crash or an unreachable network cannot cause a deadlock.

4. Exclusivity: a lock must be released by the client that acquired it. Only the holder may release the lock; one client must never be able to unlock a lock that another client holds.

Many tools in the industry can provide distributed locking, but the operations always come down to the same few: lock, unlock, and expire the lock on timeout.

Since this article is about Redis distributed locks, we will naturally build on what Redis itself offers.

Commands for implementing the lock

Let’s start with a few Redis commands.

1. SETNX: SETNX key value

SETNX is short for “SET if Not eXists”. It sets the key only if the key does not already exist, returning 1 if the key was set and 0 otherwise.

[Figure: SETNX usage]

As you can see, once the key lock has been set to “Java”, any attempt to set it to another value fails. This looks simple and seems to give us an exclusive lock, but there is a fatal problem: the key has no expiration time. Unless the key is deleted manually or given an expiration time, no other thread can ever acquire the lock once it has been taken.
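The SETNX behaviour described above can be modelled without a Redis server. The following is a minimal in-memory sketch, not Redis client code: the class name `SetnxModel` and its methods are hypothetical, and a `ConcurrentHashMap` stands in for the Redis keyspace.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a Redis keyspace; not a Redis client.
final class SetnxModel {
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    // Mirrors SETNX: returns 1 if the key was set, 0 if it already existed.
    int setnx(String key, String value) {
        return store.putIfAbsent(key, value) == null ? 1 : 0;
    }

    // Mirrors DEL for a single key: returns the number of keys removed.
    int del(String key) {
        return store.remove(key) == null ? 0 : 1;
    }

    // Mirrors GET: returns the stored value, or null if the key is absent.
    String get(String key) {
        return store.get(key);
    }
}
```

In this model, `setnx("lock", "Java")` succeeds, a second `setnx("lock", "Redis")` is rejected and leaves the value untouched, and nothing but an explicit `del("lock")` ever frees the key, which is exactly the missing-expiry problem.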

The fix is to add an expiration time to the key, so that acquiring the lock takes two steps:

    SETNX key 1
    EXPIRE key seconds

The problem with this scheme is that acquiring the lock and setting the expiration time are two separate steps, so the operation is not atomic. A client can acquire the lock successfully and then fail to set the expiration time, for example if it crashes between the two commands, leaving a lock that never expires.
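To make the failure window concrete, here is a small single-threaded model with a manually advanced clock. It is a sketch under assumptions, not Redis code: `TwoStepLockModel`, its fields, and its methods are hypothetical names, and expiry is evaluated lazily when a key is touched.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded model of a keyspace with optional expiry; not a Redis client.
final class TwoStepLockModel {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>(); // absent = no TTL
    long nowMillis = 0; // manual clock so the example is deterministic

    // Drops the key if its deadline has passed.
    private void evictIfExpired(String key) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && nowMillis >= deadline) {
            values.remove(key);
            expiresAt.remove(key);
        }
    }

    // Step 1: set the key only if it is absent.
    boolean setnx(String key, String value) {
        evictIfExpired(key);
        return values.putIfAbsent(key, value) == null;
    }

    // Step 2: give an existing key a time to live.
    boolean expire(String key, long seconds) {
        evictIfExpired(key);
        if (!values.containsKey(key)) {
            return false;
        }
        expiresAt.put(key, nowMillis + seconds * 1000);
        return true;
    }
}
```

If a client calls `setnx` and then dies before calling `expire`, the key stays forever and every later `setnx` fails, no matter how far `nowMillis` advances. When both steps do run, the key disappears once the TTL has elapsed and the next client can take the lock.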

But don’t worry, Redis has already thought of this for us, which brings us to the next command.

2. SETEX: SETEX key seconds value

Associates value with key and sets the key’s time to live to seconds. If the key already exists, SETEX overwrites the old value.

This command has the same effect as the following two commands:

    SET key value
    EXPIRE key seconds

The difference is that SETEX is atomic: the value and the expiration time are set in a single operation.

[Figure: SETEX usage]

3. PSETEX: PSETEX key milliseconds value

This command is similar to the SETEX command, but it sets the lifetime of the key in milliseconds, rather than seconds, as the SETEX command does.

However, starting with Redis 2.6.12, the SET command can use arguments to achieve the same effect as SETNX, SETEX, and PSETEX.

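For example:

    SET key value NX EX seconds

With the NX and EX options, the command behaves like SETNX and EXPIRE combined into one atomic operation: the key is set only if it does not exist, and it receives its expiration time in the same step. This is the most common way to acquire a lock with Redis.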

How to release the lock

Releasing the lock is simple in principle: just delete the key. But as we said earlier, a distributed lock must be released by its holder, so we first have to check that the current thread is the holder and only then delete the key. That makes releasing a two-step operation, which again seems to break atomicity. What can we do?

Don’t panic. We can use a Lua script to combine the two steps, like this:

    if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
    else
        return 0
    end

KEYS[1] is the name of the lock key, and ARGV[1] is the ID of the current thread (or any other value that uniquely identifies the owner). Because Redis runs a Lua script atomically, the check and the delete cannot be interleaved with other commands. This prevents a thread whose lock has already expired, or any other thread, from deleting a lock that now belongs to someone else.
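The contract of the release script can be illustrated with an in-memory model. This is a sketch, not the script itself: `OwnerCheckedRelease` and its methods are hypothetical, and `ConcurrentHashMap.remove(key, value)` plays the role of the atomic compare-and-delete.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of the release script; not a Redis client.
final class OwnerCheckedRelease {
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    // Takes the lock only if nobody holds it.
    boolean acquire(String key, String ownerId) {
        return store.putIfAbsent(key, ownerId) == null;
    }

    // Same contract as the Lua script: delete only if the stored value
    // equals ownerId. Returns 1 if the key was deleted, 0 otherwise.
    // remove(key, value) performs the comparison and the removal atomically.
    long release(String key, String ownerId) {
        return store.remove(key, ownerId) ? 1L : 0L;
    }

    // Simulates the key expiring on the server.
    void expire(String key) {
        store.remove(key);
    }
}
```

Suppose client-1 acquires the lock, the key expires, and client-2 acquires it. A late `release("redis_lock", "client-1")` returns 0 and leaves client-2's lock in place, whereas a plain delete would have removed it.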

Code implementation

Knowing the principle, we can write code by hand to implement a Redis distributed lock. Because the purpose of this article is mainly to explain the principle rather than to teach you how to build a production lock, the implementation below is deliberately simple.

The first is the Redis lock utility class, which contains basic methods for locking and unlocking:

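    import java.util.Collections;

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPool;
    import redis.clients.jedis.params.SetParams;

    public class RedisLockUtil {

        private static final String LOCK_KEY = "redis_lock";

        // Lifetime of the lock key, in milliseconds (it is passed to PX below)
        private static final long EXPIRE_TIME = 5;

        // Maximum time to wait for the lock: 1 s
        private static final long TIME_OUT = 1000;

        // SET options: NX (only if absent) combined with PX (expiry in milliseconds)
        private final SetParams params = SetParams.setParams().nx().px(EXPIRE_TIME);

        // Connection pool pointing at a local Redis server
        private final JedisPool jedisPool = new JedisPool("127.0.0.1", 6379);

        /**
         * Acquires the lock.
         *
         * @param id thread ID, or any other unique value that identifies the caller
         * @return true if the lock was acquired within TIME_OUT
         */
        public boolean lock(String id) {
            long start = System.currentTimeMillis();
            Jedis jedis = jedisPool.getResource();
            try {
                for (;;) {
                    // SET returns "OK" when the lock has been acquired
                    String lock = jedis.set(LOCK_KEY, id, params);
                    if ("OK".equals(lock)) {
                        return true;
                    }
                    // Otherwise keep retrying until TIME_OUT has elapsed
                    long l = System.currentTimeMillis() - start;
                    if (l >= TIME_OUT) {
                        return false;
                    }
                    try {
                        // Sleep briefly instead of spinning on the server
                        Thread.sleep(100);
                    } catch (InterruptedException e) {
                        // Restore the interrupt flag and give up
                        Thread.currentThread().interrupt();
                        return false;
                    }
                }
            } finally {
                jedis.close();
            }
        }

        /**
         * Releases the lock.
         *
         * @param id thread ID, or any other unique value that identifies the caller
         * @return true if this caller held the lock and it was deleted
         */
        public boolean unlock(String id) {
            Jedis jedis = jedisPool.getResource();
            // Lua script that deletes the key only if its value matches id
            String script = "if redis.call('get', KEYS[1]) == ARGV[1] then "
                    + "return redis.call('del', KEYS[1]) "
                    + "else "
                    + "return 0 "
                    + "end";
            try {
                String result = jedis.eval(script, Collections.singletonList(LOCK_KEY),
                        Collections.singletonList(id)).toString();
                return "1".equals(result);
            } finally {
                jedis.close();
            }
        }
    }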

The comments explain the code, so we can go straight to a demo class that tests it:

    public class RedisLockTest {

        private static final RedisLockUtil demo = new RedisLockUtil();
        private static Integer NUM = 101;

        public static void main(String[] args) {
            for (int i = 0; i < 100; i++) {
                new Thread(() -> {
                    String id = Thread.currentThread().getId() + "";
                    boolean isLock = demo.lock(id);
                    try {
                        // If we got the lock, decrement the shared counter
                        if (isLock) {
                            NUM--;
                            System.out.println(NUM);
                        }
                    } finally {
                        // Unlock must be placed in finally
                        demo.unlock(id);
                    }
                }).start();
            }
        }
    }

We create 100 threads to simulate concurrency, and the result looks like this:

[Figure: code execution results]

As you can see, the lock works and thread safety is preserved.

Of course, the code above is only a simple demonstration and is far from complete. A robust distributed lock has many more aspects to consider, and designing one in practice is not that easy.

Our goal here is only to learn and understand the principle. Hand-writing an industrial-grade distributed lock is neither realistic nor necessary: open-source tools such as Redisson work on similar principles and have already been proven in production across the industry, so you can simply use them.

Although the lock works functionally, a distributed lock designed this way has serious defects, and those defects are the main subject of this article.

Defects of distributed locks

1. Lock failure caused by a long client pause

Client 1 acquires the lock and is then blocked for a long time, for example by a network problem or a GC pause, and the lock expires before its business logic finishes. Client 2 can now acquire the lock normally, so both clients operate on the shared resource at once, which can cause thread-safety problems.

[Figure: the client is blocked for a long time]

So how do you prevent such exceptions? We won’t talk about solutions until we cover the other defects.

2. Redis server clock drift

If the Redis server’s clock jumps forward, the key expires prematurely. For example, client 1 acquires the lock and the key is due to expire at 12:02. If the server’s clock then jumps forward by 2 minutes, the key expires when the real time is only 12:00. If client 1 has not released the lock by then, several clients can end up holding the same lock at the same time.
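The effect of a clock jump can be shown with a model in which expiry is judged purely by the server's clock. This is an illustrative sketch: `ServerClockLock` and its members are hypothetical names, and the clock is a plain field advanced by hand.

```java
// Hypothetical model of one lock key whose expiry is judged by the server's clock.
final class ServerClockLock {
    long serverNowMillis = 0;     // the server's wall clock, advanced manually
    private String owner;         // null = key absent
    private long expiresAtMillis; // absolute deadline on the server's clock

    // Mirrors SET key owner NX PX ttlMillis.
    boolean tryLock(String clientId, long ttlMillis) {
        if (owner != null && serverNowMillis >= expiresAtMillis) {
            owner = null; // key has expired according to the server
        }
        if (owner != null) {
            return false;
        }
        owner = clientId;
        expiresAtMillis = serverNowMillis + ttlMillis;
        return true;
    }
}
```

Client 1 locks with a 2-minute TTL and client 2 is rejected. If `serverNowMillis` then jumps forward by 120 000 ms, client 2's next `tryLock` succeeds even though, from client 1's point of view, no time has passed. The same model also covers the previous defect: a client pause longer than the TTL has the same outcome as a clock jump.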

3. Single-instance safety

If Redis runs as a single master and that machine goes down, no client can acquire the lock. To improve availability we might add a slave to the master. However, Redis master/slave replication is asynchronous, so the master may go down right after client 1 sets the lock and before the key reaches the slave. When the slave is promoted to master, the lock set by client 1 is lost. Client 2 can then set the lock successfully as well, and both client 1 and client 2 hold the lock.

To solve this single-point problem, the author of Redis proposed the RedLock algorithm.

RedLock algorithm

The algorithm assumes that Redis is deployed as multiple nodes, which guards against a single point of failure. The steps are as follows:

1. Get the current timestamp in milliseconds.

2. Choose a TTL for the key; the lock is released automatically once the TTL has passed. The client then tries to set the same key and value on every Redis instance in turn, using a per-instance timeout that is much shorter than the TTL. This keeps the client from waiting too long on an instance that is down, so that it can move on to the next one.

For example, if the TTL is 5 s, the timeout for acquiring the lock on one instance could be 50 ms. If the lock cannot be acquired on an instance within 50 ms, the client gives up on that instance and tries the next.

3. After trying all instances, the client subtracts the timestamp from step 1 from the current time to get the time spent acquiring the locks, and adds an allowance for Redis server clock drift. If this total is less than the TTL and the lock was set on at least N/2 + 1 instances (N is the number of Redis instances), the lock is acquired.

For example, if the TTL is 5 s, it takes 2 s to contact all the Redis instances, and the clock drift allowance is about 1 s, then the lock is really only valid for 5 - 2 - 1 = 2 s.

4. If the client fails to acquire the lock for any reason, it releases the lock on all Redis instances.

With this algorithm and 5 Redis instances, the client needs to acquire the lock on at least 3 of them (N/2 + 1) to succeed. The flow looks something like this:

[Figure: key validity period]
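The arithmetic in steps 1 to 3 and the five-instance example can be written down in a few lines. This is a sketch of the calculation only, not a RedLock client: `RedlockMath` and its method names are hypothetical, and the drift allowance is simply passed in as a number.

```java
// Hypothetical helper for the RedLock arithmetic; names are illustrative.
final class RedlockMath {
    // Minimum number of instances that must grant the lock: N/2 + 1.
    static int quorum(int instances) {
        return instances / 2 + 1;
    }

    // Remaining validity = TTL - time spent acquiring - clock drift allowance.
    static long validityMillis(long ttlMillis, long elapsedMillis, long driftMillis) {
        return ttlMillis - elapsedMillis - driftMillis;
    }

    // The lock counts as acquired only with a quorum and positive validity.
    static boolean acquired(int instances, int granted, long validityMillis) {
        return granted >= quorum(instances) && validityMillis > 0;
    }
}
```

With the numbers from the text, `quorum(5)` is 3 and `validityMillis(5000, 2000, 1000)` is 2000, so a client that was granted the lock on 3 of 5 instances holds it for 2 s, while a client granted only 2 does not hold it at all.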

That covers the algorithm. From a design point of view, RedLock’s main idea is clearly to guard against a Redis single point of failure, and its TTL calculation also allows for server clock drift, so the safety of the distributed lock improves considerably.

But is that really the case? Personally, I don’t think it holds up.

First, in RedLock the effective lifetime of the lock is reduced by the time spent contacting the Redis instances. If network problems make that step slow, little effective lifetime is left, and the client is likely to have the lock expire while it is still working on the shared resource. The lifetime is also reduced by a server clock drift allowance, but how large should that allowance be? If the value is chosen badly, problems follow.

Second, although the algorithm uses multiple nodes to avoid a single point of failure, if a node crashes and restarts it is still possible for several clients to acquire the lock at the same time.

Suppose there are five Redis nodes, A, B, C, D and E, and two clients, 1 and 2, both trying to lock:

  1. Client 1 locks A, B and C and acquires the lock (D and E are not locked).

  2. The master of node C goes down before the lock is replicated to its slave. After the slave is promoted to master, the lock set by client 1 on C is lost.

  3. Client 2 then locks C, D and E and also acquires the lock.

Client 1 and client 2 now hold the lock at the same time, so the safety hole is still there. In addition, if any of these nodes suffers clock drift, the safety of the lock can be compromised as well.
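The three-step scenario can be replayed with a small model of five lock nodes. It is a sketch under assumptions: `FailoverScenario` and its methods are hypothetical, each node is reduced to a single map entry, and TTLs are ignored because the whole sequence is assumed to happen within the lock's lifetime.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of five independent lock nodes; not a Redis client.
final class FailoverScenario {
    private final Map<String, String> holderByNode = new HashMap<>(); // node -> client

    // Tries to set the lock on each listed node; returns how many nodes granted it.
    int lockOn(String client, List<String> nodes) {
        int granted = 0;
        for (String node : nodes) {
            if (holderByNode.putIfAbsent(node, client) == null) {
                granted++;
            }
        }
        return granted;
    }

    // Simulates a node losing its lock key, e.g. a failover to a slave
    // that never received the write.
    void loseLock(String node) {
        holderByNode.remove(node);
    }
}
```

Client 1 is granted 3 nodes (A, B, C), which meets the quorum of 3. After `loseLock("C")`, client 2 is also granted 3 nodes (C, D, E). Without the lost key, client 2 would have been granted only D and E and would have failed.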

So although deploying multiple instances improves availability and reliability, RedLock does not completely remove the risk of a Redis single point of failure, nor does it solve clock drift or the lock expiring while a client is blocked for a long time. The safety risk remains.

Conclusion

One might go on to ask: what can be done to make the lock absolutely safe?

All I can say is that you cannot have your cake and eat it. We use Redis for distributed locking largely because of its efficiency and single-threaded execution model, which keep performance high even under heavy concurrency. But performance and safety often cannot both be maximized. If the safety of the lock must be guaranteed, you can use other middleware such as a database or ZooKeeper, which give much stronger safety guarantees but noticeably worse performance; otherwise we would have used them in the first place.

In general, if you use Redis to guard a shared resource and also need strong data safety, the ultimate safeguard is to make the business operations idempotent, so that data consistency is unaffected even if several clients acquire the lock. Of course, this does not suit every scenario, and the choice is yours to make. After all, there is no perfect technology; the best one is the one that fits.
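As one possible shape for such idempotent control, the sketch below applies a stock deduction at most once per request ID. Everything in it is an assumption made for illustration: the class `IdempotentDeduction`, the request ID scheme, and the in-memory set of processed IDs, which in a real system would more likely be something durable such as a unique constraint in a database.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical business operation guarded by an idempotency key.
final class IdempotentDeduction {
    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    private final AtomicInteger stock;

    IdempotentDeduction(int initialStock) {
        stock = new AtomicInteger(initialStock);
    }

    // Applies the deduction at most once per requestId, even if two
    // clients both believe they hold the lock and submit the same request.
    boolean deductOnce(String requestId) {
        if (!processed.add(requestId)) {
            return false; // duplicate request: ignore it
        }
        stock.decrementAndGet();
        return true;
    }

    int stock() {
        return stock.get();
    }
}
```

If the same request, say `"order-42"`, is submitted twice because two clients each thought they held the lock, the first call changes the stock and the second is ignored, so the data stays consistent regardless of the lock.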