Still producing original content: this is the 16th original article from the "why Technology" account.

This article is a detailed analysis of the minimum active number (least active) algorithm, one of Dubbo's load balancing strategies. Unless otherwise noted, all source code shown is from version 2.6.0.

Why not use the latest version, 2.7.4.1? Because 2.6.0 contains two bugs, and learning from bugs leaves a deeper impression.

At the end, versions 2.6.0, 2.6.5, and 2.7.4.1 are compared side by side to reinforce what we have learned.

Contents of this article

Section 1: Prepare the Demo

This section sets up a Demo service for demonstration purposes. The Demo starts three providers, all configured with the minimum active number load balancing policy but with different weights.

Section 2: Where to set breakpoints?

This section shares how I read source code: how to set breakpoints and how to avoid "wandering around" aimlessly in the code.

Section 3: Simulate a realistic environment

This section modifies the Demo to simulate a real environment. A problem surfaces along the way, which leads into the next section.

Section 4: Why is active 0?

This section describes the role the active field of the RpcStatus class plays in the minimum active number algorithm and when it changes, so that you understand why the ActiveLimitFilter must be configured on the consumer side.

Section 5: Analyze the source code

This section walks through the implementation class of the minimum active number algorithm line by line, with comments on almost every line. It is the core of the article.

Section 6: Where are the bugs?

After the line-by-line reading, this section introduces two bugs in the 2.6.0 minimum active number algorithm and reinforces them by cross-comparing versions 2.6.0, 2.6.5, and 2.7.4.1.

Section 7: Serendipity

While reading the official documentation, I found a small clerical error, which I fixed and got merged. This section describes the process of contributing to an open source project.

PS: The first three sections mainly share my approach and techniques for reading source code; if you are not interested, you can start directly from Section 4. The focus of this article is Sections 4 to 6.

P.S. Reading this article requires some understanding of Dubbo.

One. Prepare the Demo

My habit when reading source code is to first build a Demo as a debugging environment, then debug with specific questions in mind, not letting go of any question that flashes through my mind along the way.

This article covers LeastActiveLoadBalance, one of Dubbo's load balancing strategies. So I set up a Dubbo project and started three providers for the consumer to call.

All three providers set loadbalance to leastactive, and their weights are the default weight, 200, and 300 respectively.

What is the default weight? The source code will tell us once we read it.

Each provider returns its own weight to the caller.

Start the three instances. (Note: there is only one provider.xml and one DemoServiceImpl; change the port and weight manually before each start.)

Check on ZooKeeper that the providers registered correctly:

You can see the three providers on ports 20880, 20881, and 20882. (The last five digits in each red box are the port number.)

Finally, let's look at the consumer, which is simple. Configure consumer.xml:

Then simply call the interface and print the return value.

Two. Where to set breakpoints?

I believe many readers want to read source code but don't know where to start. They end up "wandering around" in the code, and after a long loop they have gained little.

In this section I want to share how I read source code: I go in with a question and look for the answer in the code, that is, I read the source in a targeted way.

For a framework like this, as mentioned above, I build a simple Demo project and follow it with the debugger. So where should the breakpoints go?

The first breakpoint, of course, goes where the method is called. In this article, the first breakpoint is here:

What's next?

You could of course step through from the first breakpoint. But have you noticed that, most of the time, you end up being led around by the source code? You go in with one question; after ten minutes of debugging you still haven't found the key code, or your one question has turned into countless ones.

So how do we avoid being dragged around by the source code? We need to find an entry point. Remember the "reverse lookup" method from my earlier article about a pitfall I stepped into while using MyBatis? I'll demonstrate it again here.

Before diving in, we should analyze calmly. The goal is clear: find the concrete implementation class of Dubbo's minimum active number algorithm and understand its logic. Look at our provider.xml:

Clearly, loadbalance is the keyword. A global search for loadbalance turns up LoadBalance under the Dubbo package.

It is an SPI interface, com.alibaba.dubbo.rpc.cluster.LoadBalance:

Its implementation class is:

com.alibaba.dubbo.rpc.cluster.loadbalance.AbstractLoadBalance

AbstractLoadBalance is an abstract class with one abstract method, doSelect. One of the implementations of this method is the least active load balancing code we are going to analyze.

A quick aside here. Since LoadBalance is an SPI interface, we can extend it with our own load balancing strategy. The abstract method doSelect has four implementations, which are the load balancing policies Dubbo ships officially:

ConsistentHashLoadBalance: consistent hashing algorithm

LeastActiveLoadBalance: minimum active number algorithm

RandomLoadBalance: weighted random algorithm

RoundRobinLoadBalance: weighted round-robin algorithm
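To make the "SPI interface plus abstract class plus doSelect" structure concrete, here is a minimal sketch. SimpleLoadBalance, SimpleAbstractLoadBalance, and UniformRandomLoadBalance are hypothetical names, and the real Dubbo signatures also take a URL and an Invocation. Short-circuiting empty and single-element lists in the abstract class is an assumption of this sketch; the point is only that each concrete strategy plugs in by implementing doSelect.

```java
import java.util.List;
import java.util.Random;

// Hypothetical, simplified stand-in for Dubbo's LoadBalance SPI interface.
interface SimpleLoadBalance {
    <T> T select(List<T> invokers);
}

// Hypothetical stand-in for AbstractLoadBalance: template method that delegates to doSelect.
abstract class SimpleAbstractLoadBalance implements SimpleLoadBalance {
    @Override
    public <T> T select(List<T> invokers) {
        // Nothing to choose from.
        if (invokers == null || invokers.isEmpty()) {
            return null;
        }
        // Only one candidate: no strategy needed.
        if (invokers.size() == 1) {
            return invokers.get(0);
        }
        // Otherwise let the concrete strategy decide.
        return doSelect(invokers);
    }

    // Each strategy (random, round robin, least active, consistent hash) implements this.
    protected abstract <T> T doSelect(List<T> invokers);
}

// A toy strategy showing the extension point: plain uniform random choice.
final class UniformRandomLoadBalance extends SimpleAbstractLoadBalance {
    private final Random random = new Random();

    @Override
    protected <T> T doSelect(List<T> invokers) {
        return invokers.get(random.nextInt(invokers.size()));
    }
}
```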

Now that we’ve found the LeastActiveLoadBalance class, it’s clear where our second breakpoint should be.

So far, two breakpoints are enough to support our analysis.

Some of you may be wondering: how does Dubbo know we want the least active algorithm and not another one? How do the other algorithms work? What is the call chain from the first breakpoint to the second?

Write these questions down and set them aside until the least active number algorithm is thoroughly understood. Keep a clear goal: go in with one question and solve it first. Then tackle the other questions you met along the way. In this chain of solving one problem after another, you will gradually grasp the essence of the source code. This is my personal approach to reading source code, for your reference.

Three. Simulate a realistic environment

Since this is the minimum active number strategy, the three providers need to be busy handling some calls. So we have to modify both the providers and the consumer.

The provider side is modified as follows:

PS: The server with weight 300 is used as the example; the other two servers get the same changes.

(PS: In the for loop, the condition should be i < 20.)

Twenty-one requests are sent: the first 20 go to the servers and are held there (because the server sleeps), and the last one is the request we trace in Debug.

Run it, and when the program stops at the breakpoint, look at the console output:

The server with weight 300 received nine requests in total

The server with weight 200 received six requests in total

The server with the default weight received five requests in total

One request is still paused in Debug. Go straight to the second breakpoint and step to the line shown in the screenshot below:

As the screenshot above shows, weight=100 answers one question, while active=0 raises another.

What question does weight=100 answer?

What is the default weight? It is 100.

We would expect the servers' active counts to look roughly like this:

For the server with weight 300, active=9

For the server with weight 200, active=6

For the server with the default weight (100), active=5

But why is active equal to 0 here? That's a problem.

If you keep debugging, you'll find that every server's active is 0. Since they are all equal, no single invoker has the minimum active count, so the program falls through to the logic that selects an invoker by weight.

Four. Why is active 0?

An active value of 0 means that active never changed during the Dubbo calls. So the question "why is active 0?" is really the question "when does active change?"

To answer that, we need to find where active is defined, because wherever it is defined, there must be methods that modify it.

The figure below shows that active is a member variable of type AtomicInteger defined in the RpcStatus class.

In the RpcStatus class, three methods touch the active value: one increments it, one decrements it, and one reads it:
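A minimal sketch of such a counter, assuming a simplified model: ActiveCounter is a hypothetical class, and a plain String key stands in for the URL and method key that Dubbo actually uses. It shows the three operations: increment when a call begins, decrement when it ends, and a read for the load balancer.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified analogue of RpcStatus: one AtomicInteger "active" per key.
final class ActiveCounter {
    private static final ConcurrentMap<String, ActiveCounter> STATUS = new ConcurrentHashMap<>();

    private final AtomicInteger active = new AtomicInteger();

    // Look up (or lazily create) the status object for a key.
    static ActiveCounter of(String key) {
        return STATUS.computeIfAbsent(key, k -> new ActiveCounter());
    }

    // A call starts: active + 1.
    static void beginCount(String key) {
        of(key).active.incrementAndGet();
    }

    // A call finishes (success or failure): active - 1.
    static void endCount(String key) {
        of(key).active.decrementAndGet();
    }

    // Read by the load balancer when comparing providers.
    int getActive() {
        return active.get();
    }
}
```

AtomicInteger keeps the counter correct when many consumer threads begin and end calls concurrently.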

Obviously, the one we need is the first: where it is incremented.

That leads us to the beginCount(URL, String) method, which is called from only two Filters. Judging by its name, ActiveLimitFilter is the one we are looking for.

com.alibaba.dubbo.rpc.filter.ActiveLimitFilter looks like this:

Now we can answer the question: why is active 0? Because ActiveLimitFilter is not configured on the consumer. The filter therefore never takes effect, and active never changes.
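The essence of the filter is to wrap each call between an increment and a decrement of active. The sketch below shows only that pattern. CountingFilter is a hypothetical name, and the real ActiveLimitFilter also limits concurrent calls and records elapsed time, which are left out here. Using a finally block is a choice of this sketch: it guarantees the decrement even when the call throws.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of the begin/end counting pattern around a consumer-side call.
final class CountingFilter {
    private final AtomicInteger active = new AtomicInteger();

    <R> R invoke(Supplier<R> call) {
        active.incrementAndGet();      // like beginCount: the call is now in flight
        try {
            return call.get();         // the actual remote invocation
        } finally {
            active.decrementAndGet();  // like endCount: runs on success and on failure
        }
    }

    int active() {
        return active.get();
    }
}
```

Without such a filter in the call chain, nothing ever increments the counter, which is exactly why every invoker reported active=0.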

How do we make it take effect? The answer is obvious: configure the filter on the consumer side.

Ok, let’s try it again:

After adding the Filter, Debug shows that the active counts now change as we expected:

The server with weight 300 has an active count of 6

The server with weight 200 has an active count of 11

The server with the default weight (100) has an active count of 3

Based on these active counts, when the request we are debugging is finally dispatched, the invoker with the default weight is selected, because it has the smallest active count. As follows:

We haven't started the source code analysis yet; we have only clarified the process. But with the Demo fully built, we already know that the minimum active load balancing algorithm only works together with the active field of the RpcStatus class; otherwise it degenerates into a weight-based algorithm.

Reaching this conclusion through our own experiment leaves a much deeper impression than simply being told to configure ActiveLimitFilter.

Let's take a closer look at each service's active count after adding ActiveLimitFilter:

The server with weight 300 has an active count of 6

The server with weight 200 has an active count of 11

The server with the default weight (100) has an active count of 3

Don't you find it odd that the server with weight 200 has the highest count?

Think about what the weights mean in business terms: we have three servers with different performance. Server A performs best, so its weight is 300; server B is medium, so its weight is 200; server C performs worst, so its weight is 100.

When we choose the minimum active load balancing algorithm, we expect the best-performing server A to handle more requests, yet in reality the medium server B handles the most. That goes against expectations.

You might object that 20 requests is too small a sample; it could be a coincidence and proves nothing. Fair enough: we cannot program based on coincidence.

So, to verify that something is wrong here, I increased the number of requests to 10,000.

Also remember to enlarge the Dubbo thread pool on the provider side:

Since every server runs the same code, we expect the server with the highest weight to take on more requests. But the final result looks like this:

The requests are split evenly across the servers. This is one of the bugs in the Dubbo 2.6.0 minimum active load balancing algorithm mentioned at the beginning of this article.

Next, let's take this problem into the source code.

Five. Analyze the source code

Here is the source code of com.alibaba.dubbo.rpc.cluster.loadbalance.LeastActiveLoadBalance, which I have annotated line by line. Read it carefully; it is well worth the effort:

The part in the red box below is the logic for selecting an invoker based on weights:

Let me draw a picture for you:

Please study the diagram carefully. It is drawn to scale, so you can see intuitively that for any single request, the server with the larger interval (weight) is more likely to receive it. Therefore, given enough requests, the number of requests each server handles should be proportional to its interval, that is, to its weight.
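The number-line picture translates into a few lines of code. This is a sketch with a hypothetical class, WeightedPick: draw an offset in [0, total), then subtract weights one by one until the offset goes negative. It uses the offset < 0 comparison (the form found in 2.7.4.1, discussed in Section 6) so that each index owns exactly weights[i] offsets.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of weighted random selection by interval subtraction.
// Weights {100, 200, 300} occupy the intervals [0,100), [100,300), [300,600).
final class WeightedPick {
    // Returns the index whose interval contains offset (0 <= offset < sum of weights).
    static int indexFor(int[] weights, int offset) {
        for (int i = 0; i < weights.length; i++) {
            offset -= weights[i];
            if (offset < 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("offset out of range");
    }

    // Draws a random offset over the total weight and maps it to an index.
    static int pick(int[] weights) {
        int total = 0;
        for (int w : weights) {
            total += w;
        }
        return indexFor(weights, ThreadLocalRandom.current().nextInt(total));
    }
}
```

Enumerating all 600 offsets for weights {100, 200, 300} gives exactly 100, 200, and 300 hits per index, which is the weight ratio described above.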

Line 81 calls the getWeight method, which is defined in the abstract class AbstractLoadBalance and also deserves a careful read.

Here is the source code of com.alibaba.dubbo.rpc.cluster.loadbalance.AbstractLoadBalance, which I have also annotated heavily:

AbstractLoadBalance introduces the concept of warm-up. The official website describes the feature as follows:

This weight calculation ensures that, while a service's uptime is still shorter than its warm-up time, its weight is reduced, so that the service is not put under high load right after startup. Service warm-up is an optimization, much like JVM warm-up: the service runs at "low power" for a while after startup and gradually ramps up to its optimal state.

The calculated weight = (uptime / warmup) * weight, so the calculated weight approaches the configured weight as the service's uptime grows. In practice, the traffic a service receives increases gradually with its uptime instead of rising steeply right after startup, which is why this counts as an optimization. Dubbo interfaces also support delayed exposure.
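The formula can be checked with a small sketch. WarmupWeight is a hypothetical class: it follows the formula above, clamps the result to [1, weight] (an assumption of this sketch), and uses the 10-minute default warm-up time mentioned in Section 6.

```java
// Hypothetical sketch of the warm-up weight formula: (uptime / warmup) * weight.
final class WarmupWeight {
    // 10 minutes in milliseconds, the default warm-up time mentioned in the article.
    static final int DEFAULT_WARMUP_MS = 10 * 60 * 1000;

    // Scales the weight by the fraction of the warm-up period already elapsed,
    // keeping the result within [1, weight].
    static int calculate(int uptimeMs, int warmupMs, int weight) {
        int ww = (int) ((float) uptimeMs / ((float) warmupMs / (float) weight));
        return ww < 1 ? 1 : Math.min(ww, weight);
    }

    // Only services still inside the warm-up window get a reduced weight.
    static int effective(int uptimeMs, int warmupMs, int weight) {
        if (weight > 0 && uptimeMs > 0 && uptimeMs < warmupMs) {
            return calculate(uptimeMs, warmupMs, weight);
        }
        return weight;
    }
}
```

For example, a service with weight 300 that has been up for 5 of its 10 warm-up minutes gets an effective weight of 150; once the warm-up period is over, it gets the full 300.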

After carefully going through the annotated source above, together with the official summary and my hand-drawn diagram, I believe you now have a deeper understanding of the minimum active load balancing algorithm:

1. Traverse the invokers list to find the Invoker with the smallest active count

2. If several Invokers share the minimum active count, record their indexes in the invokers list and accumulate their weights, checking along the way whether all their weights are equal

3. If only one Invoker has the minimum active count, return it directly

4. If several Invokers have the minimum active count but their weights differ, choose among them the same way RandomLoadBalance does (weighted random)

5. If several Invokers have the minimum active count and their weights are all equal, return one of them at random
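The five steps can be condensed into a self-contained sketch. Provider is a hypothetical stand-in for Dubbo's Invoker that carries only an active count and an already warm-up-adjusted weight. The weighted branch uses the corrected bounds discussed in Section 6, so this is an illustration of the steps, not a copy of the 2.6.0 code.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of the least active selection steps listed above.
final class LeastActiveSketch {
    static final class Provider {
        final String name;
        final int active;   // in-flight calls, as tracked by the active counter
        final int weight;   // assumed to be already warm-up adjusted

        Provider(String name, int active, int weight) {
            this.name = name;
            this.active = active;
            this.weight = weight;
        }
    }

    static Provider select(List<Provider> providers) {
        int n = providers.size();
        int leastActive = -1;             // smallest active count seen so far
        int leastCount = 0;               // how many providers share it
        int[] leastIndexes = new int[n];  // their positions in the list
        int totalWeight = 0;              // sum of the tied providers' weights
        int firstWeight = 0;
        boolean sameWeight = true;

        // Steps 1-2: find the minimum active count, remember ties and their weights.
        for (int i = 0; i < n; i++) {
            Provider p = providers.get(i);
            if (leastActive == -1 || p.active < leastActive) {
                leastActive = p.active;
                leastCount = 1;
                leastIndexes[0] = i;
                totalWeight = p.weight;
                firstWeight = p.weight;
                sameWeight = true;
            } else if (p.active == leastActive) {
                leastIndexes[leastCount++] = i;
                totalWeight += p.weight;
                if (p.weight != firstWeight) {
                    sameWeight = false;
                }
            }
        }
        // Step 3: a unique least active provider wins outright.
        if (leastCount == 1) {
            return providers.get(leastIndexes[0]);
        }
        // Step 4: ties with different weights -> weighted random among the ties.
        if (!sameWeight && totalWeight > 0) {
            int offset = ThreadLocalRandom.current().nextInt(totalWeight);
            for (int i = 0; i < leastCount; i++) {
                offset -= providers.get(leastIndexes[i]).weight;
                if (offset < 0) {
                    return providers.get(leastIndexes[i]);
                }
            }
        }
        // Step 5: equal weights (or zero total weight) -> uniform random among the ties.
        return providers.get(leastIndexes[ThreadLocalRandom.current().nextInt(leastCount)]);
    }
}
```

With the active counts from our experiment (6, 11, and 3), the provider with the default weight is always chosen, just as we saw in Debug. If every active count is 0, all providers tie and the choice falls back to weights, which matches what we saw before ActiveLimitFilter was configured.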

So I think the full name of minimum active number load balancing should be: "pick the invoker with the minimum active number if it is unique; if several tie, choose among them by weight; if their weights are also equal, return one at random" load balancing.

Six. Where are the bugs?

Bug 1 in the Dubbo 2.6.0 minimum active number algorithm

The problem lies in the two lines of code labeled ① and ②:

The code labeled ① reads the raw weight from the URL, which has not been reduced by the warm-up logic in getWeight. This value is added to totalWeight.

The code labeled ② uses the weight returned by getWeight, that is, after warm-up processing.

This mismatch causes the problem. On the left side of line ②, offsetWeight is a random number in the range [0, totalWeight), where totalWeight was built from the raw weights; on the right side, what gets subtracted is the weight reduced by getWeight. Therefore, while the service uptime is still within the warm-up period (10 minutes by default), offsetWeight is very likely still greater than 0 after the weights of all leastCount invokers have been subtracted. The code then never enters ③ and falls straight through to ④, so the policy effectively becomes random. That does not match the design, so it is a bug.
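A deterministic sketch makes the effect measurable. Instead of drawing random numbers, Bug1Demo (a hypothetical helper) enumerates every possible offset in [0, totalWeight) and counts how many never satisfy the 2.6.0 comparison, i.e. how many end up in the uniform-random fallback. The weights are illustrative, not measured: raw weights {100, 200, 300}, and warm-up-reduced weights {10, 20, 30}, as if 10% of the warm-up period had elapsed.

```java
// Hypothetical sketch of Bug 1: totalWeight comes from one set of weights,
// but the loop subtracts another (warm-up-reduced) set.
final class Bug1Demo {
    static int fallThroughCount(int[] totalWeights, int[] subtractedWeights) {
        int total = 0;
        for (int w : totalWeights) {
            total += w;               // like line ①: raw weights build totalWeight
        }
        int fallThrough = 0;
        for (int offset = 0; offset < total; offset++) {
            int remaining = offset;
            boolean picked = false;
            for (int w : subtractedWeights) {
                remaining -= w;       // like line ②: warm-up weights are subtracted
                if (remaining <= 0) { // the 2.6.0 comparison
                    picked = true;
                    break;
                }
            }
            if (!picked) {
                fallThrough++;        // would reach the uniform-random fallback
            }
        }
        return fallThrough;
    }
}
```

In this illustrative case, 539 of the 600 possible offsets (roughly 90%) skip the weighted branch and land in the random fallback; when both sides use the same weights, none do.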

This bug is what caused the even split observed in the previous section.

The issue and pull request for this bug are:

Github.com/apache/dubb…

Github.com/apache/dubb…

So how was it fixed? Let's compare directly with the code of Dubbo 2.7.4.1 (the latest version at the time of writing):

You can see that the way the weight is obtained has changed: instead of reading it directly from the URL, it is now obtained through the getWeight method. The variable has also been renamed from weight to the more specific afterWarmup.

Another change is that the random number source changed from Random to ThreadLocalRandom, which improves performance. I won't go into it here; interested readers can look it up.

Bug 2 in the Dubbo 2.6.0 minimum active number algorithm

I haven't run into this bug myself, but I read its description in the official documentation (which is based on version 2.6.4). It reads:

The official website says this issue is fixed in version 2.6.5. Comparing 2.6.0, 2.6.5, and 2.7.4.1, I found that each version differs slightly:

The three lines marked ① in the figure:

Version 2.6.0 is the buggy code, for the reason described above.

Version 2.6.5 fixes the problem by adding one to the random number, which changes its range from [0, totalWeight) to [1, totalWeight].

Version 2.7.4.1 keeps the range [0, totalWeight) and puts the fix in the line labeled ② instead: where 2.6.0 and 2.6.5 both have "if (offsetWeight <= 0)", 2.7.4.1 changes it to "if (offsetWeight < 0)".
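Enumerating every possible random value shows what each variant actually does. OffByOneDemo is a hypothetical helper: start and endExclusive model the random range, and allowZero models the comparison (true for "offsetWeight <= 0", false for "offsetWeight < 0"). The weights {1, 2, 3} are chosen only to keep the counts small.

```java
// Hypothetical helper: for each possible random offset, records which index
// the selection loop returns, under a given range and comparison.
final class OffByOneDemo {
    static int[] distribution(int[] weights, int start, int endExclusive, boolean allowZero) {
        int[] counts = new int[weights.length];
        for (int offset = start; offset < endExclusive; offset++) {
            int remaining = offset;
            for (int i = 0; i < weights.length; i++) {
                remaining -= weights[i];
                if (allowZero ? remaining <= 0 : remaining < 0) {
                    counts[i]++;
                    break;
                }
            }
        }
        return counts;
    }
}
```

With weights {1, 2, 3} (total 6), the 2.6.0 setup (range [0, 6), "<= 0") yields counts [2, 2, 2] instead of [1, 2, 3]: the first invoker gains an extra slot and the last one loses one. The 2.6.5 setup (range [1, 6], "<= 0") and the 2.7.4.1 setup (range [0, 6), "< 0") both yield [1, 2, 3].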

Think it over: the effect is the same, but the 2.7.4.1 fix is more elegant.

My friends, the devil is in the details!

Seven. Serendipity

While reading the load balancing documentation on the official website, I found a clerical error, so I fixed it and the change was merged.

As you can see, it is a very small change, but I still did my part for the Dubbo community. I am now a committer to the Dubbo documentation, or "Dubbo Committer" for short.

This section briefly introduces the process of submitting a PR to an open source project.

First, fork the project into your own GitHub account. Then run the following commands to clone the project and set up the upstream remote:

git clone github.com/thisiswangh…

cd dubbo-website

git remote add upstream github.com/apache/dubb…

git remote set-url --push upstream no_push

Create a local branch:

git checkout -b xxxx

After development, sync with upstream, rebase, and push the code:

git fetch upstream

git checkout master

git merge upstream/master

git checkout xxxx

git rebase master

git push origin xxxx:xxxx

Then create a pull request on GitHub and wait for feedback.

One last word

I have also written an article analyzing the details of the asynchronous changes in Dubbo 2.7, based on the 2.6.0/2.7.0/2.7.3 source code. Take a look if you're interested.

If you find anything wrong, please leave a message to point it out so that I can fix it.

Thank you for reading. My subscription account publishes only original articles; your follow is very welcome and appreciated.

That's all.