preface

Hello, everyone. I am a little boy picking up field snails. Today we discuss the design and implementation of distributed lock. Hope to help you, if there are incorrect places, welcome to point out, learn together, progress together.

  • Overview of Distributed Locks
  • Database distributed lock
  • Redis distributed lock
  • Zookeeper distributed lock
  • Comparison of three distributed locks
  • Public number: a boy picking up snails

1. Overview of distributed locks

Our systems are all distributed deployment. In daily development, in order to prevent oversold inventory, distributed locks are needed for business scenarios such as placing orders in seconds and snapping up commodities.

Distributed lock is the realization of a lock that controls the access to shared resources by different processes in a distributed system. If a critical resource is shared between different systems or hosts on the same system, mutual exclusion is often required to prevent interference and ensure consistency.

There are three ways to realize distributed lock in the industry:

  • Distributed lock based on database implementation
  • Distributed lock based on Redis implementation
  • Distributed lock based on Zookeeper implementation

2. Database based distributed lock

2.1 Database pessimistic lock implementation of distributed lock

You can use select… For UPDATE to implement distributed locking. We used a similar implementation for our own project, distributed timed tasks, and I’ll show you a simple version

The table structure is as follows:

CREATE TABLE 't_resource_lock' (' method_name 'varchar(45) COLLATE UTf8_bin NOT NULL DEFAULT' method name ', `status` char(1) COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'S,F,P', 'lock_flag' int(10) unsigned NOT NULL DEFAULT '0' COMMENT '1 ' 'begin_time' datetime DEFAULT NULL COMMENT 'start time ',' end_time 'datetime DEFAULT NULL COMMENT' end time ', Client_ip 'vARCHar (45) COLLATE UTf8_bin NOT NULL DEFAULT ', 'method_time' int(10) unsigned NOT NULL DEFAULT '60' COMMENT 'Only one lock is allowed in the lifetime of a method, in unit: ', PRIMARY KEY (' method_name ') USING BTREE) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE= UTf8_binCopy the code

The pseudo-code for the lock method is as follows:

@transcational public Boolean lock(String methodName, int time){ resourceLock = 'select * from t_resource_lock where method_name ='#{methodName}' for update'; Try {if(resourceLock==null){// Insert lock data resourceLock= new resourceLock (); resourceLock.setMethodTime(time); resourceLock.setLockFlag(1); / / lock resourceLock setStatus (P); / / processing resourceLock. SetBeginTime (new Date ()); int count = "insert into resourceLock"; If (count==1){return true; } return false; } }catch(Exception x){ return false; } // The lock is not locked and the lock has timed out, If (resourcelock.getLockFlag =='0'&&'S'. Equals (resourcelock.getStatus) && new Date()>=resourceLock.addDateTime(resourceLock.getBeginTime(,time)){ resourceLock.setLockFlag(1); / / lock resourceLock setStatus (P); / / processing resourceLock. SetBeginTime (new Date ()); //update resourceLock; return true; } else if (new Date () > = resourceLock. AddDateTime (resourceLock. GetBeginTime (time)) {/ / overtime not normal execution end, failed to get the lock return false. }else{ return false; }}Copy the code

The pseudocode for the unlock method is as follows:

Public void unlock(String methodName, status){resourcelock.setlockFlag (0); / / lock resourceLock setStatus (status); S: success, F: failure //update resourceLock; return ; }Copy the code

Overall process:

Try {if(lock(methodName,time)){// Lock status = process(); // Handle your business logic. } } finally{ unlock(); // release the lock}Copy the code

In fact, this pessimistic lock to achieve distributed lock, the overall process is relatively clear. Select… For UPDATE locks the primary key method_name. If null, insert a new record and determine if the current record has timed out. Note that you have to add a transaction.

2.2 Distributed lock implemented by optimistic database lock

In addition to pessimistic locks, optimistic locks can also be used to implement distributed locks. Optimistic locking, as the name implies, is optimistic, and every update operation is performed with the expectation that there will be no concurrency conflicts and only retry after the update fails. It is implemented based on the CAS idea. At my former company, we used this method to deduct balances.

The version field will increase by 1 each time it is updated. Then when updating the balance, update the version number found with conditions. If it is the version number of the last time, update it.

The general process is as follows:

  1. Query the version number and balance
select version,balance from account where user_id ='666';
Copy the code

Suppose the version number is oldVersion=1.

  1. Logical processing, judging the balance
If (balance< deduction amount){return; } left_balance = balance - deduction amount;Copy the code
  1. Deduct the balance
update account set balance = #{left_balance} ,version = version+1 where version 
= #{oldVersion} and balance>= #{left_balance};
Copy the code

You can take a look at this flow chart:

3. Distributed lock based on Redis

Redis distributed locks are generally implemented in the following ways:

  • setnx + expire
  • Setnx + value Indicates the expiration time
  • Set extension command (set ex px nx)
  • Set ex px nx + check unique random value, then delete
  • Redisson
  • Redisson + RedLock

3.1 setnx + expire

Setnx + expire Redis distributed locks (expire)

If (jedis.setnx(key,lock_value) == 1) {//setnx adds lock expire (key, 100); Try {do something // service processing}catch(){} finally {jedis.del(key); // Release the lock}}Copy the code

This code can lock successfully, but you have not found the problem, lock operation and set timeout is separate. If the process crashes or restarts after executing a setnx lock, the lock will never be acquired by another thread, so it cannot be implemented in this way.

3.2 setnx + value Indicates the expiration time

long expires = System.currentTimeMillis() + expireTime; // System time + Set expiration time String expiresStr = string.valueof (Expires); If (jedis.setnx(key, expiresStr) == 1) {return true; } String currentValueStr = jedis.get(key); If (currentValueStr! = null && long.parselong (currentValueStr) < system.currentTimemillis ()) { String oldValueStr = jedis. GetSet (key, expiresStr); oldValueStr = jedis. if (oldValueStr ! = null && oldValueStr. Equals (currentValueStr)) {return true; }} return false; }Copy the code

In daily development, some partners implement distributed locks in this way, but there are some disadvantages:

  • The expiration time is generated by the client. In a distributed environment, the expiration time on each client must be synchronized.
  • There is no unique identifier of the save holder and it may be released/unlocked by other clients.
  • When the lock expires, multiple concurrent client requests come over at the same time, all executedjedis.getSet()In the end, only one client can successfully add the lock, but the expiration time of the client lock may be overwritten by other clients.

3.3 Extended commands for set (set ex px nx)

What do the parameters of this command mean? To review:

SET key value [EX seconds] [PX milliseconds] [NX|XX]
Copy the code
  • EX second: Sets the expiration time of the key tosecondSeconds.
  • PX millisecond: Specifies the expiry time of the keymillisecondMilliseconds.
  • NX: Sets the key only when the key does not exist.
  • XX: Configures the key only when the key already exists.
If (jedis. Set (key, lock_value, "NX", "EX", 100s) == 1){try {do something // catch(){} finally {jedis.del(key); // Release the lock}}Copy the code

There may be problems with this scheme:

  • The lock expired and was released. The service was not finished.
  • The lock was deleted by another thread.

Some of you may wonder why locks are deleted by other threads by mistake. Suppose that in A concurrent multi-threaded scenario, thread A acquired the lock, but it did not release the lock, thread B is unable to acquire the lock, so it is logically unable to execute the code below the lock, how can cause the lock to be deleted by other threads by mistake?

Assume that thread A and thread B both want to lock with key, and thread A succeeds in locking and locking. However, thread A takes A long time to execute the service logic, exceeding the preset timeout period of 100s. At this point, Redis automatically releases the key lock. At this point thread B can successfully lock and then perform business logic processing. Suppose that by chance, A finishes executing its business logic and releases the lock, but it releases B’s lock.

3.4 Set ex px nx + check unique random value, then delete

To solve the problem of the lock being deleted by another thread. We can add a unique random check value to set ex px nx as follows:

If (jedis.set(key, uni_request_id, "NX", "EX", 100s) == 1){// try {do something // catch(){} finally {// Check whether the current thread is holding the lock (uni_request_id.equals(jedis.get(key))) { jedis.del(key); // Release the lock}}}Copy the code

In this case, determine whether the lock being added and released by the current thread is an atomic operation. If jedis.del() is called to release the lock, the lock may no longer belong to the current client.

Lua scripts are usually used instead. The lua script is as follows:

if redis.call('get',KEYS[1]) == ARGV[1] then 
   return redis.call('del',KEYS[1]) 
else
   return 0
end;
Copy the code

This is a good way to do it. In general, you can already use this implementation. However, there is still a problem: the lock has expired and the service has not been completed.

3.5 Redisson

Locks may expire and services may not be completed. We can set the lock expiration time to be slightly longer than the normal business processing time. In fact, we can also give the thread that obtains the lock, open a regular daemon thread, every once in a while to check whether the lock still exists, exists to extend the expiration time of the lock, prevent the lock from being released in advance.

The current open source framework, Redisson, addresses this problem. Take a look at Redisson’s underlying schematic:

As soon as the thread successfully adds the lock, a watch Dog will be started. It is a background thread that will check every 10 seconds. If thread 1 still holds the lock, it will continuously extend the lifetime of the lock key. Therefore, Redisson used watch Dog to solve the problem that the lock was released after expiration and the service was not finished.

3.6 Redisson + RedLock

The previous six solutions are only distributed lock discussions based on Redis standalone and are not perfect. Because Redis is typically clustered:

If thread 1 acquires the lock on the master node of Redis, but the lock key has not been synchronized to the slave node. When the master node fails, a slave node becomes the master node. Thread two can automatically acquire the lock on the same key, but thread one has already acquired the lock, so the lock security is lost.

To solve this problem, Antirez, the author of Redis, proposed an advanced distributed lock algorithm: Redlock. Its core idea is this:

Deploy multiple Redis masters to ensure that they do not go down at the same time. These master nodes are completely independent of each other and there is no data synchronization between them. At the same time, you need to make sure that locks are acquired and released on multiple Master instances in the same way that they are acquired and released on single instances of Redis.

We assume that there are currently five Redis master nodes running these Redis instances on five servers.

RedLock implementation steps:

  1. Gets the current time in milliseconds.
  2. Lock requests are made to the five master nodes in sequence. The client sets network connection and response timeouts that are less than the lock expiration time. (Assuming automatic lock failure time is 10 seconds, the timeout time is usually between 5-50 ms, let’s assume the timeout time is 50ms). If you time out, skip the master node and try the next master node as soon as possible.
  3. The client obtains the time used to obtain the lock by subtracting the current time from the time when the lock was acquired (that is, the time recorded in Step 1). The lock is successful if and only if more than half of the Redis master nodes (N/2+1, in this case 5/2+1=3 nodes) are locked and used for less than the lock expiration time. (10s> 30ms+40ms+50ms+4m0s+50ms)
  4. If a lock is obtained, the true duration of the key is changed by subtracting the time taken to obtain the lock.
  5. If the lock fails to be acquired (the lock has not been acquired for at least N/2+1 master instances, or the lock has been acquired for longer than the valid time), the client must unlock all master nodes (even if some master nodes have not been locked at all, to prevent some from escaping).

The simplified steps are:

  • Lock requests are made to the five master nodes in sequence
  • Determine whether to skip the master node based on the timeout period.
  • If more than three nodes are locked successfully and the lock duration is shorter than the lock validity period, the lock is successfully locked.
  • If you fail to acquire the lock, unlock it!

Redisson implements the redLock version of the lock, you can go to see it

4 Zookeeper Distributed lock

Before learning about Zookeeper distributed locks, let’s review the nodes of Zookeeper.

The ZNodes of Zookeeper are classified into four types:

  • Persistent node: The default node type. After the client that creates the node disconnects from ZooKeeper, the node still exists.
  • Persistent nodes Sequential nodes: When creating a node, Zookeeper numbers the node name based on the time sequence. Sequential persistent nodes are persistent nodes in sequence.
  • Temporary nodes: In contrast to persistent nodes, temporary nodes are deleted after the client that created the node disconnects from ZooKeeper.
  • Temporary sequential node: A temporary node with an order.

The Zookeeper distributed lock implementation applies temporary sequential nodes.

4.1 LOCK acquisition process of ZK

When the first client request comes in, the Zookeeper client creates a persistent node /locks. If it (Client1) wants to acquire locks, a sequential node lock1 needs to be created under the LOCKS node. As shown in figure

Client1 then looks for all the temporary ordering children under locks to determine if its node lock1 is the least ordered, and if it is, it has successfully acquired the lock.

If another client tries to acquire the lock, it creates a temporary node lock2 under LOCKS

The client also looks for all the temporary sequential children under locks to determine if lock2 is the smallest, and lock1 is the smallest, so it fails to acquire the lock. Client2 registers a Watcher event with lock1, the node with the highest order, to monitor whether lock1 exists or not.

At this point, if another client Client3 attempts to acquire the lock, it creates a temporary node lock3 under LOCKS

Client3 also looks for all temporary sequential children under locks to determine if its node lock3 is the smallest and fails to acquire the lock if it is not. It will register a Watcher event with the node ahead of it, Lock2, to listen for the presence of the lock2 node.

4.2 release the lock

When the ZooKeeper client service is complete or faulty, the temporary node is deleted to release the lock. If the task is complete, Client1 explicitly calls the delete lock1 instruction

If the client fails, lock1 is automatically removed according to the nature of the temporary node

Client2 is happy when the lock1 node is deleted because it is always listening for lock1. Client2 is notified immediately that a lock1 node is deleted, and also looks for all temporary sequential children under locks, and locks are acquired if lock2 is the smallest.

In the same way, after Client2 has acquired the lock, Client3 is also eyeing it

  • Zookeeper is designed for distributed coordination and is easy to use. If you can’t get the lock, you just need to add a listener, which is very suitable for distributed locking.
  • Zookeeper also has disadvantages as a distributed lock: If many clients frequently apply for locks or release locks, the Zookeeper cluster will be under heavy pressure.

5. Comparison of three distributed locks

2.3 Database implementation of distributed lock summary

Database implementation of distributed lock, use and understanding, are relatively simple, do not need to introduce Redis, ZooKeeper and other middleware. However, the performance is low compared to Redis, which is not suitable for high concurrency scenarios. The timing task of our project can be used for distributed control because the application is deployed on multiple machines and the concurrency is not high.

5.1 Database Distributed lock implementation

Advantages:

  • Simple, easy to use, no need to introduceRedis, zookeeperMiddleware, etc.

Disadvantages:

  • Not suitable for high concurrency scenarios
  • Db operation performance is poor, there is the risk of locking table;

5.2 Redis distributed lock implementation

Advantages:

  • High performance, suitable for high concurrency scenarios

Disadvantages:

  • Lock deletion failure Expiration time is not controllable.

5.3 Distributed Lock Implementation in Zookeeper

Disadvantages:

  • Performance is not as good as distributed locks implemented by Redis
  • Heavy distributed locks.

Advantages:

  • It has good performance and reliability

5.4 Comparison Summary

  • From a performance perspective (from highest to lowest) Redis > Zookeeper >= database;
  • Database > Redis > Zookeeper in terms of ease of understanding (from lowest to highest);
  • From the perspective of implementation complexity (from low to high) Zookeeper > Redis > database;
  • From the perspective of reliability (from highest to lowest) Zookeeper > Redis > database.