First of all, the principle of a distributed lock is basically the same as that of the locks we usually talk about: the goal is to ensure that, under concurrency, only one thread at a time can operate on a given piece of business logic, method, or variable.

Within a single process, that is, a single JVM or application, this is easy to control: the JDK gives us the synchronized keyword, and java.util.concurrent.locks provides Lock.

However, if we only deploy one server, its concurrency capacity is very limited; with tens of thousands of requests arriving at the same time, the server is very likely to be overloaded and crash.

Consider business scenarios such as Alipay red envelopes at 10:00 PM on Double 11 (November 11) or on Chinese New Year's Eve. Naturally, multiple servers are needed to process these requests, so there may be hundreds of servers handling the same business at the same time.

But think about it: suppose 100 servers handle the red envelope business, there is a total pool of 100 million, and 10 million people split it, each receiving a random amount. The business must guarantee that the sum of what those 10 million people receive is exactly 100 million.

If this is handled badly and everyone ends up with 1 million, Jack Ma would probably have to declare bankruptcy on New Year's Day.

1. What goes wrong with regular locks?

First of all, let's talk about why we build a cluster at all. A simple way to see it: demand (the number of concurrent requests) grows, and one worker's processing capacity is limited, so we hire more workers to share the load.

Assume the 10 million requests are spread evenly across 100 servers, so each server receives about 100,000 requests. (Those 100,000 requests don't all arrive in the same second; they may be spread over an hour or two. Think of grabbing red envelopes on New Year's Eve: some people open theirs the moment the event starts, some get to it around 10:20, and some don't think of it until midnight.)

That’s less than 1,000 requests per second on average, which is bearable for a mediocre server.

When the first request comes in, the server gives the first person a random share of the 100 million. Suppose the first person gets 100 yuan; we subtract 100 from the 100 million, leaving 99,999,900.

The second user draws again, also a random amount, say 200 yuan this time, so we subtract 200 from the remaining 99,999,900, leaving 99,999,700.

By the time the 100,000th user on this server arrives, suppose there are 10,000,000 yuan left; then that entire 10,000,000 is theirs.

The point is that each server hands out its own 100 million, so with 100 servers you end up handing out 10 billion in total.
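Just to make the failure mode concrete, here is a hedged sketch of that naive per-server bookkeeping (all names are made up for illustration): each JVM keeps its own copy of the pool, and a local synchronized lock only protects that copy, not the other 99 servers.

// Naive sketch: every server holds ITS OWN copy of the pool.
// A local lock (synchronized) only serializes threads inside this one JVM;
// the other 99 servers run the same code against their own 100 million.
public class LocalRedPacketPool {
    private long remaining = 100_000_000L; // each of the 100 JVMs starts with this

    public synchronized long grab(long amount) {
        long granted = Math.min(amount, remaining);
        remaining -= granted;
        return granted;
    }
}
// 100 servers x 100,000,000 each = up to 10,000,000,000 paid out in total.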

If that happens, Jack Ma may not go bankrupt (he was worth 230 billion yuan at last count), but the red envelope development team and product manager would be well and truly finished (GG).

The simplified structure diagram is as follows:

2. How does a distributed lock handle this?

So, to solve this problem and make the 10 million users share 100 million instead of 10 billion, we need distributed locks.

A distributed lock lets us treat the entire cluster as if it were a single application, so the lock has to live outside each service rather than inside it.

Say server 1 receives a request from user 1. Instead of just looking at its own application to figure out how much money is left, it has to go to someone outside the servers who is responsible for managing the whole 100 million red envelope pool and say: hey, I want 100 yuan, give me 100.

The red envelope girl (the service) takes a look: there are 100 million left, fine, here is your 100, and now 99,999,900 remains.

The second request arrives and is picked up by server 2, which also asks the red envelope girl: I want 10 yuan. She checks that 99,999,900 is left and says: OK, here are your 10. That leaves 99,999,890.

By the time the 10 millionth request arrives and server 100 picks it up and asks her for 100 yuan, the girl rolls her eyes and says: there is only 1 yuan left, take it or leave it. So this time you can only get 1 yuan (1 yuan is still money; it will buy a spicy snack stick).

Of course, numbering the requests 1, 2, and so on doesn't mean they actually execute in that order. In the real scenario, each of the 100 servers holds a request and goes to the red envelope girl (the service) at the same time, so she receives 100 requests simultaneously. This is exactly where the lock needs to be added, on her side: whichever of the 100 servers grabs the lock (like catching the embroidered ball she throws) gets to come in and talk to her and receive its share, while everyone else waits its turn.

With the distributed lock handling things this way, Jack Ma could finally relax, and he decided to reward everyone on the red envelope team with an extra chicken leg.

The simplified structure diagram is as follows:

3. What are the implementations of distributed locks?

When it comes to implementing distributed locks, there are many options: database-based locks, Redis distributed locks, ZooKeeper distributed locks, and so on.

If we use Redis for the distributed lock, then the girl (the service) in the picture above is simply replaced by Redis.

3.1. Why can Redis achieve distributed locking?

First, Redis is single-threaded. By that we mean its network request module uses a single thread, so there is no concurrency safety issue there: one thread handles all network requests, while other modules still use multiple threads.

In practice, the process looks something like this:

Server 1 sets a key in Redis with the "setnx key value" command. What matters is the key, which acts as a tag; the key can be named whatever you like, as long as all servers use the same key.

Suppose we set one, as shown below
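Roughly, in the redis-cli (the key name lock:hongbao and the value server-1 are just placeholders for illustration):

SETNX lock:hongbao server-1
(integer) 1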

So we can see that it returns a 1, and that represents success.

If another request is made to set the same key, it looks like this:
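Again as a redis-cli sketch, with the same placeholder key:

SETNX lock:hongbao server-2
(integer) 0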

This will return 0, which means failure.

So we can use this operation to determine whether we hold the lock, that is, whether we get to visit the "red envelope girl": if it returns 1, we go ahead and execute our logic; if it returns 0, the lock is already taken and we have to keep waiting.

After server 1 gets the lock, it performs its business processing and then releases the lock, roughly like this:
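With the same placeholder key as above:

DEL lock:hongbao
(integer) 1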

A successful delete returns 1, and the other servers can then repeat the steps above to set the key, which achieves the purpose of the lock.

Of course, the operations above are typed directly into the Redis client. When calling from a program you obviously can't write it like that; in Java, for example, you would go through a client such as Jedis, but the overall processing logic is basically the same.
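As a minimal, deliberately simplified Jedis sketch of the flow so far (the key name, value, and host are assumptions, and it still has the expiration problem discussed next):

import redis.clients.jedis.Jedis;

public class SimpleRedisLock {
    private static final String LOCK_KEY = "lock:hongbao"; // any name, same on all servers

    // returns true if we grabbed the lock
    public static boolean tryLock(Jedis jedis, String serverId) {
        return jedis.setnx(LOCK_KEY, serverId) == 1;
    }

    // release the lock once the business logic is done
    public static void unlock(Jedis jedis) {
        jedis.del(LOCK_KEY);
    }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            if (tryLock(jedis, "server-1")) {
                try {
                    // ... split the red envelope here ...
                } finally {
                    unlock(jedis);
                }
            }
        }
    }
}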

With this, it seems we have solved the distributed locking problem. But are there still issues?

Yes, there are. One of them is deadlock: suppose server 1 sets the key, acquires the lock, and then suddenly crashes.

The key then stays in Redis forever. Every time another server checks, it gets back 0, assumes someone is still using the lock, and keeps waiting.

To solve this deadlock problem, we need to set an expiration time on the key.

There are two ways to do this:

1. The first option is to run "expire key timeout" right after setting the key, giving the key a timeout in seconds.

This way, Redis controls how long the lock can be held: if time is up and you still haven't deleted the key, Redis deletes it for you, and the other servers can go back to setnx to acquire the lock.

2. The second way is to hand the right to delete the key over to the other servers, which requires making use of the value.

For example, server 1 stores the current time + 1 second as the value (an expiration timestamp). Later, server 2 does a GET and finds that this time is already earlier than the current system time, which means server 1 has not released the lock and may have run into a problem.

Server 2 then deletes the key and performs its own setnx operation.

But there is a problem with this: server 2 is not the only one that can notice server 1's timeout; server 3 may notice it too. If that happens, server 2 may have just finished its setnx when server 3 deletes the key and sets it again.

That would mean both server 2 and server 3 believe they hold the lock, which is a big problem. What do we do then?

This is where the "GETSET key value" command comes in: it returns the key's current value and sets a new value in a single step.

When server 2 finds that the key has expired, it calls GETSET (writing in a new expiration time) and uses the old value it gets back to decide: if that old value is still the expired one it saw, server 2 has acquired the lock.

If not, it means another server, say server 3, also noticed the expired lock and got its GETSET in before server 2, resetting the expiration time first.

Server 2 then has to abandon its subsequent operations and go back to waiting for server 3 to release the lock, or watching for the key to expire again.

There is still one small flaw here: server 3 acquired the lock and set its own expiration time, but server 2's GETSET also overwrote that expiration time even though it failed to get the lock, so the lock's validity ends up a little later than server 3 intended. The effect is tiny, though, and practically negligible.
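A hedged Jedis sketch of this second approach (setnx with an expiration timestamp as the value, plus GETSET to take over an expired lock); the key name and the 1-second hold time are illustrative only:

import redis.clients.jedis.Jedis;

public class ExpiringRedisLock {
    private static final String LOCK_KEY = "lock:hongbao";
    private static final long HOLD_MILLIS = 1000; // lock validity: 1 second, as in the example

    public static boolean tryLock(Jedis jedis) {
        String expiresAt = String.valueOf(System.currentTimeMillis() + HOLD_MILLIS);
        // Normal case: nobody holds the lock, setnx succeeds.
        if (jedis.setnx(LOCK_KEY, expiresAt) == 1) {
            return true;
        }
        // Someone holds it; check whether their timestamp has already passed.
        String currentValue = jedis.get(LOCK_KEY);
        if (currentValue != null && Long.parseLong(currentValue) < System.currentTimeMillis()) {
            // Looks expired: write our own timestamp and read back the old value in one step.
            String oldValue = jedis.getSet(LOCK_KEY,
                    String.valueOf(System.currentTimeMillis() + HOLD_MILLIS));
            // Only the server that still sees the expired value it read wins the lock;
            // if another server (say server 3) got its GETSET in first, back off and wait.
            return oldValue != null && oldValue.equals(currentValue);
        }
        return false;
    }

    public static void unlock(Jedis jedis) {
        jedis.del(LOCK_KEY);
    }
}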

3.2. Why can ZooKeeper implement distributed locking?

ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase.

ZooKeeper is organized much like a computer's file system: we can create folder A on drive D, and then create folders A1 and A2 inside folder A.

What is the key property of a file system? File names within the same directory cannot repeat, and the same rule applies in ZooKeeper.

In ZooKeeper, every node (every "folder") is called a ZNode, and a ZNode can store data.

We can create a node with "create /ZKJJJ nice". This command creates a node named ZKJJJ under the root directory, with the value nice. Again, just like the value in Redis, the value here doesn't really matter; you can set it to anything you like.
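In the zkCli shell, this would look roughly like:

[zk: localhost:2181(CONNECTED) 0] create /ZKJJJ nice
Created /ZKJJJ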

ZooKeeper can create four types of nodes:

1. Persistent nodes

2. Persistent sequential nodes

3. Temporary nodes

4. Temporary sequential nodes

The difference between persistent and temporary nodes is this: once you create a persistent node, the ZooKeeper server keeps it regardless of whether your ZooKeeper client stays connected.

Temporary nodes are the opposite: as soon as your ZooKeeper client disconnects, the ZooKeeper server no longer keeps the node.

Sequential means that when such a node is created, ZooKeeper automatically appends a number to its name, like 0000001, 0000002, and so on.
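With the official ZooKeeper Java client, the node type is chosen via CreateMode when creating a node. A hedged sketch (the connection string, paths, and data are placeholders):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class CreateModeSketch {
    public static void main(String[] args) throws Exception {
        // 3000 ms session timeout, no watcher; connection string is a placeholder
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 3000, null);

        // Persistent node: survives client disconnects
        zk.create("/ZKJJJ", "nice".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Temporary (ephemeral) sequential node: removed when this session ends,
        // and ZooKeeper appends a sequence number to the name
        String path = zk.create("/ZKJJJ/lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        System.out.println(path); // e.g. /ZKJJJ/lock-0000000001

        zk.close();
    }
}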

Finally, ZooKeeper has a watch (listening) mechanism: a client registers to listen on the nodes it cares about, and when a node changes (its data changes, it is deleted, or child nodes are added or removed), ZooKeeper notifies the client.

Let's go back to our red envelope scenario and see how locking works in ZooKeeper.

If server 1 creates a node /ZKJJJ and succeeds, then server 1 holds the lock. When server 2 tries to create the same node, it fails, and all it can do is listen on that node.

After server 1 finishes its processing and deletes the node, server 2 is notified; it then creates the same node, acquires the lock, handles its business, and deletes the node in turn. The remaining servers behave the same way.

Note that the 100 servers do not create nodes one by one but concurrently. When server 1's create succeeds, the remaining 99 register watches on that node and wait for the notification, and so on.
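A rough sketch of this simple scheme with the ZooKeeper Java API (node path is illustrative; retry and error handling are omitted):

import org.apache.zookeeper.*;

public class SimpleZkLock {
    private final ZooKeeper zk;

    public SimpleZkLock(ZooKeeper zk) {
        this.zk = zk;
    }

    // Whoever manages to create /ZKJJJ holds the lock.
    public boolean tryLock() throws Exception {
        try {
            zk.create("/ZKJJJ", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            return true;
        } catch (KeeperException.NodeExistsException e) {
            // Someone else has the lock: watch the node and wait for its deletion.
            zk.exists("/ZKJJJ", event -> {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                    // lock released, try again (retry logic omitted)
                }
            });
            return false;
        }
    }

    public void unlock() throws Exception {
        zk.delete("/ZKJJJ", -1); // -1 means any version
    }
}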

But did you notice there is still a problem? Deadlock again, right?

If server 1 creates the node and then hangs without deleting it, the other 99 servers will be waiting forever for a notification that never comes…

This is where temporary nodes come in. As we mentioned earlier, a temporary node disappears as soon as the client that created it disconnects.

So if server 1 hangs after creating the node, the node is removed automatically, and the remaining servers can go on to create the node and acquire the lock.

But there is one more thing to watch out for: the herd effect (thundering herd). A simple analogy is throwing a piece of food into the middle of a flock of pigeons: every pigeon rushes for it, but only one gets it.

That is what happens here: when server 1's node changes, all 99 other servers are notified and all rush to create the node, yet only one of them succeeds. Temporary sequential nodes are used to handle this.

Instead of all 99 servers watching a single node, each server now watches only the node immediately in front of its own.

Say the 100 servers send their requests at the same time: 100 temporary sequential nodes, /ZKJJJ/000000001, /ZKJJJ/000000002, up to /ZKJJJ/000000100, are created under /ZKJJJ.

When node 001's owner finishes and deletes the node, 002 receives the notification, acquires the lock, and starts executing. When it finishes, it deletes its node, 003 is notified, and so on.
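Putting it all together, here is a hedged sketch of the temporary-sequential-node lock described above, using the raw ZooKeeper API (in practice a library such as Apache Curator wraps this up; names and the parent path are illustrative):

import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.*;

public class SequentialZkLock {
    private final ZooKeeper zk;
    private String myNode; // e.g. /ZKJJJ/lock-0000000007

    public SequentialZkLock(ZooKeeper zk) {
        this.zk = zk;
    }

    public void lock() throws Exception {
        // 1. Create our own temporary sequential node under /ZKJJJ (which must already exist).
        myNode = zk.create("/ZKJJJ/lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        while (true) {
            // 2. List all children and sort them by sequence number.
            List<String> children = zk.getChildren("/ZKJJJ", false);
            Collections.sort(children);

            String myName = myNode.substring("/ZKJJJ/".length());
            int myIndex = children.indexOf(myName);
            if (myIndex == 0) {
                return; // our node is the smallest: we hold the lock
            }

            // 3. Otherwise watch only the node immediately in front of ours.
            String previous = "/ZKJJJ/" + children.get(myIndex - 1);
            CountDownLatch latch = new CountDownLatch(1);
            if (zk.exists(previous, event -> latch.countDown()) != null) {
                latch.await(); // wait until the predecessor changes (e.g. is deleted), then re-check
            }
        }
    }

    public void unlock() throws Exception {
        zk.delete(myNode, -1); // deleting our node notifies the next waiter
    }
}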