“This is the second day of my participation in the First Challenge 2022, for more details: First Challenge 2022”.

What is distributed locking? What can it do?

In a single system, when accessing a shared resource in a high concurrency scenario, we need to lock the shared resource to ensure the concurrency security and ensure that only one thread can operate the shared resource at the same time. You are familiar with Java’s synchronized keyword and Lock Lock, and have used it in real projects. As shown in the following figure, in the same JVM process, once Thread1 has acquired the lock, it can operate on the shared resource. Other threads that have not acquired the lock can only wait for Thread1 to release the lock.

However, with the continuous development of business, the original single application is split into multiple micro-services, and each micro-service will deploy multiple instances, thus forming the current micro-service architecture. Requests to process shared resources come from different service instances, that is, in different JVM processes. The locking method in the original single service cannot meet the requirements of concurrent access to shared resources in the distributed scenario. Therefore, we need a mechanism to deal with shared resource security in distributed scenarios. At this time, the distributed lock to deal with this problem arises at the right moment.

Since JVM processes cannot manage threads of other service instances, they can use external component capabilities to achieve unified management of shared resources by different service instances, which we can call distributed locking. Therefore, the essence of distributed lock is to establish a lock acquisition mechanism outside of different service instances and form a concurrent mutual exclusion capability to ensure the concurrent safety of different threads for shared resources, so as to realize that only one thread can operate on shared resources at the same time in microservice architecture. For distributed locks, an external state storage system is required to implement atomized exclusive operations.

Through the demand analysis of distributed lock, the following four characteristics of distributed lock are summarized, which are multi-node, lock speed, exclusivity and lock expiration implementation mechanism.

Distributed lock implementation scheme

Implementation scheme of distributed lock based on database

Realize the principle of

The effect of distributed lock is realized by database, in fact, the uniqueness constraint of database or for UPDATE. For example, the inventory service in the e-commerce field is responsible for deducting the inventory of goods. First, it creates a special lock table to store the lock information. Then the inventory service inserts a lock resource data into the lock table in the database before carrying out the inventory operation.


create table'distributed_lock' (' id 'BIGINT NOT NULLAUTO_INCREMENT, 'resource_lock_key'varchar(64) NOT NULL
PRIMARYThe KEY (" id "),UNIQUEThe KEY 'uk_resource_lock_key' (' resource_lock_key ')USING BTREE
)
Copy the code

The general interaction process is as follows:

1. When the inventory service deducts mobile phone inventory, first insert a resource lock information into the lock table in the database;

2. If the insertion is successful, it indicates that inventory service 1 can carry out inventory deduction operation for mobile phone inventory;

3. The inventory service 2 also operates on the inventory, so it also inserts data into the lock table.

4. However, due to the unique constraint set in the lock table, the lock information fails to be inserted and the inventory service waits;

5. Inventory service 1. After inventory deduction, delete the information of lock table;

6. Inventory service 2 Insert the resource lock information. If the resource lock information can be inserted successfully, perform subsequent operations.

Project analysis

The database-based implementation seems relatively straightforward. But actually there are some problems. Let’s analyze them.

1, performance problems: because it is the insertion of data data need to drop disk storage, if ordinary reading and writing will affect the database performance, in addition, because the use of a unique key to judge will also affect the database performance to a certain extent, so the database scheme is suitable for the simple scenario of less concurrency;

2. A single point of failure may occur if the database is deployed in a single point of failure. If the database fails, services on the platform may be abnormal.

3. Deadlock problem: In the introduction above, it includes the steps of obtaining locks by inserting the database, and the process of releasing locks by deleting the lock information. However, if the inventory service 1 hangs after locking, the lock cannot be released, and other services cannot obtain the lock, it will cause the deadlock problem. Of course, we can check the lock table for outdated lock resources through a scheduled task. But this undoubtedly increases the complexity of the distributed lock implementation.

4. Reentrant is not supported: If you want to achieve reentrant lock, you need to add fields such as host and thread name to mark, and judge whether the information is consistent with the current information by these fields. If so, the lock is considered to have been obtained.

Given these problems, are there other distributed implementations that can avoid these problems? Let’s go further down.

Distributed lock implementation scheme based on Redis

Implementation principle based on sentnx command

As a high-performance database middleware, Redis is often used as a cache in projects. Therefore, Redis is a common solution to realize distributed lock. Similarly, implementing distributed locks through Redis also requires the ability to implement mutually exclusive locks through Redis. The sentnx (set if not exists) command is used. In addition, whether the command can be successfully set determines whether the service can obtain the corresponding distributed lock.

127.0.0.1:6379> setnx stockLock 10.12.35.12_stockService 
(integer) 1
Copy the code

As shown in the figure above, the process of locking and releasing locks roughly resembles the distributed locking scheme of a database. Instead of inserting data into the database, you get a lock from Redis. Because Redis operates in memory, it performs better than a distributed database based locking scheme.

Project analysis

The above redis-based solution has a performance advantage, so let’s analyze the way this command is used. In fact, similar to the previous database solution, Redis also has a deadlock problem. If inventory service 1 dies after acquiring the lock, inventory service 2 will not be able to acquire the lock. So we need to optimize it. So the essence of the problem is how to release the lock, so we need to set the lock with the expiration time, so that even if the inventory service 1 hangs and cannot release the lock actively, the lock will be invalid after the expiration time, and inventory service 2 can still obtain the lock, and will not cause the deadlock problem.

It should also be noted that when we set the lock, we also need to have the business properties of our own service, otherwise it can be confusing. Why do you say that? For chestnuts, inventory service in after add the lock start deducting the inventory tasks, after deducting the inventory finish, service, need to delete the original lock resources, wait to be Redis deleted after expiration, the inventory service can continue to apply for 2 lock, if the inventory service 1 restored, it does not know lock resources have been released, If the lock on inventory service 2 is removed immediately after waking up, two problems will occur:

1. After the inventory service performs the inventory deduction, when it comes back to release the lock resource, it finds that the lock is actually gone;

2. When inventory service 1 resumes, the lock is still there, and the lock is deleted immediately to complete the work that was not completed before it was suspended. However, this lock is actually added by inventory service 2. If inventory service 3 also tries to lock and finds that the lock can be successfully added, it will operate on the inventory as inventory service 2 does, then there will be thread safety problems.

According to the above analysis, the root cause of this problem is that there is no specific distinction between which service adds the lock. Therefore, when executing the command, we need to set the attribute associated with the service instance to value, so that the owner of the current lock can be checked during lock acquisition. If it is not the service instance itself, the deletion operation cannot be performed.

Is this the perfect solution to the problem? In fact, there are some problems, some students will say, why so many problems? In fact, the realization of this scheme is to gradually find a relatively perfect scheme in a variety of imperfect schemes.

The process of obtaining the lock mentioned above to determine if the lock was added by your own service instance and then deleting the lock is not actually atomic. So there is still a concurrency security issue, which can be addressed by lua scripts that implement the logic in Lua scripts rather than in the client.

But in fact, there are still some problems that have not been solved, for example, we will set the expiration time when locking, but how long should the expiration time be set? If the setting is too short, the lock will become invalid if the network times out or the service is not finished. If the setting is long, other service nodes wait longer to acquire locks, which degrades service performance.

Implementation based on Redisson

Redisson is actually a client that encapsulates Redis operations, which encapsulates common Redis operations. For example, the steps of setting locks and deleting locks of Redis are encapsulated. In the operation of setting the lock, the automatic lock renewal mechanism is introduced. When the SDK detects that services are not completed, the lock is about to expire, and the lock is renewed. In this way, the expiration time can be dynamically adjusted to avoid the problem that locks are released when services are not completed.

It also encapsulates the logic of deleting locks after making business decisions, so that we can use Redis with Redisson just like we did with JDK.

RedLock

In order to solve the single point problem of Redis as a distributed lock, the author of Redis also proposed a solution of Redlock, which relies on multiple Master nodes of Redis. The official recommendation is to use five Master nodes, which are independent of each other. The general interaction steps are as follows:

1. Obtain the system time of the current node;

2. The client tries to sequentially send lock requests to all Redis instances (at least 5 instances in the Redis cluster are officially recommended), using the same key and random value in the lock setting process, and the timeout period of the request should be much shorter than the effective time of the lock. The purpose of this is to prevent the lock request from being blocked if the node is not available. If the instance does not respond, it can be quickly skipped and the lock request continues on the next node.

3, suppose Redis cluster of size 5, so if the client in most instances (more than three instances) won the lock, and at the same time, minus the current time is calculated in step 1 time, this event is sent if less than the effective time of the lock, lock is successful, so at this time can think can perform subsequent operation of the business;

4. If the condition of Step 3 is not met, it indicates that the lock fails to be added, and the client needs to initiate lock release requests to all Redis nodes.

Project analysis

Why does Redlock lock multiple instances in a cluster? The actual purpose is to achieve high fault tolerance of distributed locks through the redundancy of locks. Imagine if there was only one Instance of Redis, once it died, the client could not lock or the lock information would be lost, affecting business functions. The availability of distributed locks is improved by redundancy of lock information in multiple instances of the cluster so that lock information remains in other nodes even if Redis fails.

So why count several times? Because we lock, each node set the timeout, if the lock time is too long, the entire process of accumulative time over the effective time lock, the lock is completed will oh lock failure case, so we need to make sure that the lock event as short as possible, this is also why the lock request is timeout, Skip to the next node immediately if a timeout occurs, avoiding a single node taking too long.

Although Redlock seems to be a perfect distributed solution, in fact it is a heavy solution, requiring the maintenance of a Redis cluster. In addition, the process depends on system time, but if the time jump occurs, it will have a very large impact on the entire distributed lock.