Eureka’s clever read-write lock trick: have you learned it?

Preface

Sorry for not posting in so long. My most recent original article dates from last October, and this account has been neglected ever since.

Readers often ask me in private messages: “Did you sell the account? Why are there no new articles?”

The main reason is that I was lazy and did not know what to write. Over the past year I learned things piecemeal; every time I started an article I gave up halfway, and eventually I stopped writing altogether.

Enough chatter, on to the article. What I share here is my own thinking and there is no standard answer. I hope you form your own opinions as you read, and if you have questions, contact me right away to discuss them.

Follow me, learn it or lose it!

What will we learn?

This article covers a few small tricks in how EurekaServer uses its read/write lock.

Conventional thinking says a read lock is taken when reading and a write lock is taken when writing, so where is the trick?

Does it feel like there is nothing to learn here? Surely read locks only go around read operations, and write locks only around write operations.

Read on to see how Netflix’s programmers play the game.

Read/write Lock Review

The read/write lock in question is the JDK’s ReentrantReadWriteLock, which most of us already use regularly at work. Like the other locks in the JDK’s concurrency package, it implements its logic on top of AbstractQueuedSynchronizer (AQS).

ReentrantReadWriteLock splits the AQS state variable bitwise into two parts. The high 16 bits hold the read lock state (the number of read lock holds), and the low 16 bits hold the write lock state (the number of write lock holds).
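To make the bit split concrete, here is a minimal sketch. The class name RwState is made up for illustration; the constant and method names follow the ones used inside ReentrantReadWriteLock’s internal Sync class, but this is a standalone toy, not the JDK code itself.

```java
// Sketch: one 32-bit int carrying both the read count and the write count.
final class RwState {
    static final int SHARED_SHIFT = 16;
    static final int SHARED_UNIT = 1 << SHARED_SHIFT;           // adding this bumps the read count by 1
    static final int EXCLUSIVE_MASK = (1 << SHARED_SHIFT) - 1;  // selects the low 16 bits

    /** Number of read (shared) holds: the high 16 bits. */
    static int sharedCount(int state) {
        return state >>> SHARED_SHIFT;
    }

    /** Number of write (exclusive, reentrant) holds: the low 16 bits. */
    static int exclusiveCount(int state) {
        return state & EXCLUSIVE_MASK;
    }

    public static void main(String[] args) {
        int state = 0;
        state += SHARED_UNIT; // first reader
        state += SHARED_UNIT; // second reader
        System.out.println(sharedCount(state) + " read holds, " + exclusiveCount(state) + " write holds");
    }
}
```

Because each count only has 16 bits, neither the read holds nor the reentrant write holds can exceed 65535.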

For details, see an earlier article of mine that explains AQS in depth: “I drew 35 diagrams just to get you into AQS”.

I will not explain the underlying implementation of the read/write lock any further here, since the article above covers it. For now, think of a read/write lock as a lock like ReentrantLock, except that it distinguishes read operations from write operations.

To improve performance, read operations share the lock with one another, while read-write and write-write combinations are mutually exclusive. For example, if our workload reads far more often than it writes, most accesses take only the read lock and can proceed concurrently, so locking on every access does not hurt system performance.
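The sharing rules above can be checked with a small experiment. This is only a sketch: ReadWriteLockDemo, otherThreadCanLock, and probe are names I made up, and the check uses the non-blocking tryLock() from a second thread so that nothing can deadlock.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class ReadWriteLockDemo {
    /** Asks a second thread to try the lock without blocking and reports whether it got it. */
    static boolean otherThreadCanLock(Lock lock) {
        boolean[] acquired = new boolean[1];
        Thread t = new Thread(() -> {
            acquired[0] = lock.tryLock();
            if (acquired[0]) {
                lock.unlock();
            }
        });
        t.start();
        try {
            t.join(); // join() also makes the other thread's write to acquired[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acquired[0];
    }

    /** Returns {read while read is held, write while read is held, read while write is held}. */
    static boolean[] probe() {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        boolean[] result = new boolean[3];
        rw.readLock().lock();
        try {
            result[0] = otherThreadCanLock(rw.readLock());  // read + read: shared
            result[1] = otherThreadCanLock(rw.writeLock()); // read + write: exclusive
        } finally {
            rw.readLock().unlock();
        }
        rw.writeLock().lock();
        try {
            result[2] = otherThreadCanLock(rw.readLock());  // write + read: exclusive
        } finally {
            rw.writeLock().unlock();
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] r = probe();
        System.out.println("read+read=" + r[0] + " read+write=" + r[1] + " write+read=" + r[2]);
    }
}
```

Only the first probe succeeds: a second reader gets in while a reader holds the lock, but a writer is kept out by a reader, and a reader is kept out by a writer.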

How does EurekaServer play read/write locks?

That was a lot of preamble, all in the hope that you come away understanding this read/write lock trick. Read/write locks are easy to use, and the JDK already provides the APIs. Many excellent frameworks are built on these low-level JDK APIs, so let’s see how EurekaServer uses them.

PS: For those interested in the underlying source code of SpringCloud, I wrote a series of source-code walkthrough posts: www.cnblogs.com/wang-meng/p… (password: 222, don’t tell anyone)

Why does EurekaServer need to be locked?

We know that EurekaServer, as the registry, stores the registration information of every EurekaClient. To be aware of other registered instances, each EurekaClient periodically pulls incremental registry information from the registry. This incremental pull is subtle: the write lock must be taken while the increment is fetched, to guarantee that the data obtained is accurate. This is explained in detail later.

Let’s start with a few common scenarios:

When service A starts, it sends a register request to the registry, which writes service A into its own roster
Service B sends an offline request telling the registry “I am going offline, please remove me from the registry”, and the registry erases service B from the roster
Service C periodically pulls the latest data from the registry and synchronizes it to its local copy, so that it can discover other services by service name

With those scenarios in mind, here is how EurekaServer stores the registry and declares its read/write lock:

public abstract class AbstractInstanceRegistry implements InstanceRegistry {
    private static final Logger logger = LoggerFactory.getLogger(AbstractInstanceRegistry.class);

    // The registry itself: application name -> (instance id -> lease)
    private final ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>> registry
            = new ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>>();
    
    // Store the most recently modified instance information
    private ConcurrentLinkedQueue<RecentlyChangedItem> recentlyChangedQueue = new ConcurrentLinkedQueue<RecentlyChangedItem>();
    
    // Read/write lock
    private final ReentrantReadWriteLock readWriteLock = new ReentrantReadWriteLock();
    private final Lock read = readWriteLock.readLock();
    private final Lock write = readWriteLock.writeLock();
}

There are three key points to note:

Registry: ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>> registry
Recently changed instance queue: ConcurrentLinkedQueue<RecentlyChangedItem> recentlyChangedQueue
Read/write lock: ReentrantReadWriteLock readWriteLock

Where does EurekaServer apply the read/write lock?

With that background, let’s look at the scenarios in which the read/write lock is used:

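Note: the evict operation also goes through the cancel logic to take a service offline, so evict does not appear separately in the call chains.

readLock: taken by the operations that modify the registry, such as register and cancel.

writeLock: taken when the incremental registry is built for a client pull.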

So do not assume that read locks can only guard read operations and write locks can only guard write operations.

Eureka does exactly the opposite: it takes the read lock around writes and the write lock around reads. A bold reverse play.
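Here is a minimal sketch of that reversed usage. MiniRegistry, Snapshot, and their methods are hypothetical and far simpler than Eureka’s AbstractInstanceRegistry; the only point is which lock guards which kind of operation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical, simplified registry; not Eureka's actual class. */
final class MiniRegistry {
    private final Map<String, String> registry = new ConcurrentHashMap<>(); // instanceId -> status
    private final Queue<String> recentlyChanged = new ConcurrentLinkedQueue<>();
    private final ReentrantReadWriteLock readWriteLock = new ReentrantReadWriteLock();
    private final Lock read = readWriteLock.readLock();
    private final Lock write = readWriteLock.writeLock();

    /** Modifies the registry, yet takes the READ lock: many registers may run at once. */
    void register(String instanceId) {
        read.lock();
        try {
            registry.put(instanceId, "UP");
            recentlyChanged.add(instanceId);
        } finally {
            read.unlock();
        }
    }

    /** Also a modification under the READ lock. */
    void cancel(String instanceId) {
        read.lock();
        try {
            if (registry.remove(instanceId) != null) {
                recentlyChanged.add(instanceId);
            }
        } finally {
            read.unlock();
        }
    }

    /** Only reads, yet takes the WRITE lock: no register/cancel can run while the snapshot is built. */
    Snapshot deltaSnapshot() {
        write.lock();
        try {
            return new Snapshot(new ArrayList<>(recentlyChanged), registry.size());
        } finally {
            write.unlock();
        }
    }

    static final class Snapshot {
        final List<String> changedIds;
        final int totalInstances;

        Snapshot(List<String> changedIds, int totalInstances) {
            this.changedIds = changedIds;
            this.totalInstances = totalInstances;
        }
    }
}
```

The concurrent containers keep individual modifications safe, so the modifying paths only need the shared lock. The snapshot is the one place that needs everything to hold still, so it takes the exclusive lock.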

Figure: summary of the read/write lock usage scenarios.

Deep thinking

We need to understand how ‘EurekaClient’ gets registry information:

(You can also refer to my previous post on how to obtain the registry: www.cnblogs.com/wang-meng/p…)

EurekaClient obtains full registry information

This is how EurekaClient performs its first, full fetch of the registry. After pulling the registry from the server, EurekaClient saves the registry information in a local list.

It is also worth mentioning the two-level caching mechanism in EurekaServer. Every pull of the registry reads directly from the cache, which is built on Google’s Guava Cache.
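As a rough picture of a two-level cache, here is a sketch in plain Java. Everything in it is an assumption for illustration: TwoLevelCache and its method names are made up, both levels are plain ConcurrentHashMaps (no Guava), and the loader stands in for “build the response from the registry”.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical two-level cache; the loader must not return null. */
final class TwoLevelCache {
    private final Map<String, String> readOnlyCache = new ConcurrentHashMap<>();
    private final Map<String, String> readWriteCache = new ConcurrentHashMap<>();
    private final Function<String, String> loader; // stands in for the expensive, lock-taking build

    TwoLevelCache(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Clients read level one first and only fall through to level two on a miss. */
    String get(String key) {
        String value = readOnlyCache.get(key);
        if (value == null) {
            value = readWriteCache.computeIfAbsent(key, loader);
            readOnlyCache.put(key, value);
        }
        return value;
    }

    /** Called after a registry change: drop the level-two entry so the next load is fresh. */
    void invalidate(String key) {
        readWriteCache.remove(key);
    }

    /** Stand-in for a periodic task that copies level two into level one. */
    void syncReadOnlyCache() {
        for (String key : readOnlyCache.keySet()) {
            readOnlyCache.put(key, readWriteCache.computeIfAbsent(key, loader));
        }
    }
}
```

In this sketch a change is not visible to clients until the next sync, which is the price paid for keeping most reads away from the loader.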

EurekaClient obtains delta registry

EurekaClient goes to the registry every 30 seconds to pull the incremental registry information, merges it into the registry held in its local cache by applying the additions, deletions, and modifications, and then overwrites the cached registry. The server-side code that builds the incremental registry fetches the changed instances from recentlyChangedQueue and finally sets an appHashCode value.

Delta registry data is also served from the registry cache. When an EurekaClient performs register/cancel…, the server updates the data in the registry, then stores the changed instance in a queue, recentlyChangedQueue, which only keeps instances that changed in the last three minutes, and finally invalidates the readWriteCacheMap cache in EurekaServer
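A queue that only keeps the last few minutes of changes can be sketched like this. RecentChanges and Item are hypothetical names, and the clock is passed in as a parameter purely to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch of a "recently changed" queue with time-based retention. */
final class RecentChanges {
    static final class Item {
        final String instanceId;
        final long changedAtMs;

        Item(String instanceId, long changedAtMs) {
            this.instanceId = instanceId;
            this.changedAtMs = changedAtMs;
        }
    }

    private final ConcurrentLinkedQueue<Item> queue = new ConcurrentLinkedQueue<>();
    private final long retentionMs;

    RecentChanges(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    /** Called by register/cancel style operations after they change the registry. */
    void record(String instanceId, long nowMs) {
        queue.add(new Item(instanceId, nowMs));
    }

    /** Drops entries older than the retention window; items are in insertion order, so stop at the first fresh one. */
    void expireOld(long nowMs) {
        Iterator<Item> it = queue.iterator();
        while (it.hasNext()) {
            if (it.next().changedAtMs < nowMs - retentionMs) {
                it.remove();
            } else {
                break;
            }
        }
    }

    /** The instance ids a delta response would report. */
    List<String> changedIds() {
        List<String> ids = new ArrayList<>();
        for (Item item : queue) {
            ids.add(item.instanceId);
        }
        return ids;
    }
}
```

With a retention of three minutes (180,000 ms), a client that pulls every 30 seconds sees each change in several consecutive deltas, so a single missed pull does not lose it.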

One important point: every delta EurekaClient pulls carries a hash value for the full registry, appHashCode, which can be viewed as a count of all registered instances grouped by status: hash = count + status

Why is this hash check needed? It is a validation EurekaClient performs to make sure that its data, after the incremental update, is consistent with the registry held by the server. After obtaining the delta, EurekaClient applies the additions, deletions, and modifications locally. The result should match the server’s registry; if for some reason it does not, applying further deltas on top of it is meaningless.

Therefore, if the hashes differ, the client immediately fetches the full registry from the server and overwrites its dirty local data. And while the server computes the hash value, the registry must not accept “write” operations such as register/cancel, which change the number and status of instances in the registry. This is what creates the need for mutual exclusion.

This is why a read/write lock is still needed even though both the registry and the recently changed queue are already thread-safe: there has to be a mutually exclusive operation.
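The hash check on the client side can be sketched as follows. DeltaReconciler, the map-based data model, and the "DELETED" marker are all assumptions made for this example; the hash format simply concatenates each status with its count.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical client-side reconcile step. */
final class DeltaReconciler {
    /** Builds "status_count_" for every status, sorted by status so both sides agree on the order. */
    static String reconcileHash(Map<String, String> statusByInstanceId) {
        Map<String, Integer> countByStatus = new TreeMap<>();
        for (String status : statusByInstanceId.values()) {
            countByStatus.merge(status, 1, Integer::sum);
        }
        StringBuilder sb = new StringBuilder();
        countByStatus.forEach((status, count) -> sb.append(status).append('_').append(count).append('_'));
        return sb.toString();
    }

    /**
     * Applies the delta to the local copy, then compares hashes.
     * Returns false when the local copy is dirty and the caller should do a full fetch.
     */
    static boolean applyDelta(Map<String, String> local, Map<String, String> delta, String serverHash) {
        delta.forEach((instanceId, status) -> {
            if ("DELETED".equals(status)) {   // made-up marker for a removed instance
                local.remove(instanceId);
            } else {
                local.put(instanceId, status);
            }
        });
        return reconcileHash(local).equals(serverHash);
    }
}
```

If the server computed its hash while a register was half applied, the delta and the hash could describe two different moments, and a perfectly healthy client would be pushed into a needless full fetch. Holding the write lock while building the delta and the hash rules that out.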

Think about it again

Now that we have explained the reversed use of read/write locks in EurekaServer, let’s consider the following questions:

From the author’s point of view: why is EurekaServer’s use of read/write locks designed this way?
From the reader’s point of view: how does EurekaServer’s incremental registry retrieval perform?
The registry itself is an in-memory Map structure, so why add a cache on top of it?
Why does the renew operation take no read/write lock, even though it clearly updates the renewal time in the registry?

1. Thoughts on the design of read/write lock in EurekaServer

After reading about the usage above, you may be as confused as I was. Why did the author design it this way?

First, let’s sort out the business scenario. This is a typical read-heavy, write-light scenario (by default, each EurekaClient pulls the registry increment every 30 seconds):

Registry “read” operations:

Because the hash value of the registry must be computed during the read, a global mutex is needed to keep new data from being written in the meantime, so the write lock is taken

Registry “write” operations:

register/cancel/evict… operations on the registry can run concurrently. They rely on the concurrent containers ConcurrentLinkedQueue and ConcurrentHashMap, so updating the recently changed queue and modifying the registry are both thread-safe

Conversely, if the read and write locks were assigned the conventional way, every write to the two concurrent containers would be wrapped in the write lock. That adds another layer of mutual exclusion on top of containers that are already thread-safe, and performance would be worse

2. How does EurekaServer’s incremental registry retrieval perform?

Consider the flow EurekaClient follows to obtain the registry:

Although every incremental pull of the registry takes the write lock, caching reduces how often the write lock is actually acquired

Other write updates to recentlyChangedQueue or registry are thread-safe on their own and do not need to exclude one another

3. The registry itself is a Map, so why add another layer of caching?

The answer is already given above. Without the cache, every incremental fetch would operate on registry and recentlyChangedQueue directly and take the write lock each time, which performs far worse than reading from the cache. The cache solves the problem of locking on every fetch.

Think also of how Spring, another framework we use all the time, solves circular dependencies. The answer is again multi-level caching. A moment of sudden enlightenment, isn’t it?

Let’s take a closer look at the ResponseCacheImpl code implementation:

In this scenario the cache is configured with expireAfterWrite. When a cache entry expires, 10,000 clients would run into the write lock logic, so the throughput of the registry would drop considerably.

Wouldn’t refreshAfterWrite be a better fit here? With refreshAfterWrite, a single thread refreshes the entry (asynchronously, if the loader is written that way) while other threads keep reading the old value, so there is no need for multiple threads to refresh the cache for the same key
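The refresh idea can be illustrated without Guava. The sketch below is not Guava’s implementation; RefreshingValue is a made-up single-entry holder in which only the thread that wins a compare-and-set reloads, and every other caller gets the old value without waiting. The clock is again passed in to keep the example deterministic.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical single-entry holder showing "one thread refreshes, others read the old value". */
final class RefreshingValue<T> {
    private final Supplier<T> loader;
    private final long refreshAfterMs;
    private final AtomicBoolean refreshing = new AtomicBoolean(false);
    private volatile T value;
    private volatile long loadedAtMs;

    RefreshingValue(Supplier<T> loader, long refreshAfterMs, long nowMs) {
        this.loader = loader;
        this.refreshAfterMs = refreshAfterMs;
        this.value = loader.get();
        this.loadedAtMs = nowMs;
    }

    T get(long nowMs) {
        boolean stale = nowMs - loadedAtMs >= refreshAfterMs;
        if (stale && refreshing.compareAndSet(false, true)) {
            try {
                value = loader.get(); // only the thread that won the CAS pays for the reload
                loadedAtMs = nowMs;
            } finally {
                refreshing.set(false);
            }
        }
        return value; // losers of the CAS return the old value immediately
    }
}
```

With an expire-style policy the old value is gone once it times out, so callers have nothing to return until the reload finishes. Here the old value stays available, at the cost of keeping it in memory and serving slightly stale data.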

Of course, I may be overthinking this, and I have not actually tested the scenario. My guess is that even under a large number of requests, the incremental registry fetch inside the write lock is fast, because it is a purely in-memory operation. Using expireAfterWrite also saves a lot of memory. Perhaps the author weighed these trade-offs… (⊙_⊙;)

4. Why does renew not need a lock?

Renew does not add elements to the recently changed queue, so it does not affect the pull of incremental data

By default, each client sends a renew (heartbeat) request to the registry every 30 seconds. On receiving the heartbeat, the registry looks up the instance in the registry and updates its last heartbeat time, which the registry uses for fault eviction. If an instance sends no heartbeat within the specified period, it is considered faulty and removed from the registry

However, the renew operation has a bug in how it updates the instance’s lastUpdateTime, which I mentioned in a previous article.

The registry’s fault detection code is where this shows up. The author even says in a comment: “The renew() operation has a problem, it adds an extra duration, but we will not fix it. It only affects how quickly faults are detected, and the system is eventually consistent anyway, so it stays as it is.” (PS: every time I see this I cannot help complaining. What he doesn’t know is that we have done a lot of work to speed up failure detection, which is probably why people on the Internet say Eureka’s code is badly written.)
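The double counting described in that comment can be sketched like this. SimpleLease is a simplified stand-in written for this article, not Eureka’s Lease class, and the clock is passed in as a parameter to keep the example deterministic.

```java
/** Simplified lease that reproduces the "duration added twice" behaviour described above. */
final class SimpleLease {
    private final long durationMs;
    private volatile long lastUpdateTimestamp;

    SimpleLease(long durationMs, long nowMs) {
        this.durationMs = durationMs;
        this.lastUpdateTimestamp = nowMs;
    }

    /** Heartbeat: a single volatile write, no read/write lock. The duration is already added here... */
    void renew(long nowMs) {
        lastUpdateTimestamp = nowMs + durationMs;
    }

    /** ...and added again here, so a renewed lease expires 2 * durationMs after its last heartbeat. */
    boolean isExpired(long nowMs) {
        return nowMs > lastUpdateTimestamp + durationMs;
    }
}
```

With a 90-second duration, an instance whose last heartbeat arrived at time 0 is still considered alive at 100 seconds and only counts as expired after 180 seconds. It also shows why renew needs no lock: it touches one field of one lease and never changes which instances the registry contains.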

Final words

Recently I have been helping my company interview candidates, and I have asked quite a few SpringCloud questions. Candidates often answer:

“Those frameworks are outdated, we use the latest XXX framework”, “I only need to know how to use these things, I don’t need to know how they work”…

I hear many answers like these. I am the kind of person who likes to get to the bottom of things. I firmly believe that no problem keeps its secrets in front of the source code, and that when learning something you should know not only what it does but also why. Great oaks grow from little acorns: frameworks are just tools that help us do our jobs, and their implementations still rest on the lowest-level techniques.

To borrow a line from my teacher: technology is not divided into old and new. Technology is just a carrier; what analyzing its source code teaches you is architecture design, underlying ideas and principles, solution mechanisms, kernel mechanisms, and the methods, techniques, and ability to analyze source code.

PS: Special thanks and references

The above is some of my thinking while reading the source code. What I wrote may contain mistakes; if you find any, please tell me. I hope to improve and grow together with you. You are welcome to add me on WeChat: W510782645

The following blog posts were used as references. Thanks to the original authors for sharing:

Eureka source code analysis: application instance registration and discovery (9), time is a cute read/write lock
What is a read/write lock? How are read/write locks optimized in a microservice registry?