1. Background

In the course of adopting microservices, several business teams in Vivo's Internet division chose the TARS microservice framework after weighing a range of factors.

The official description is: TARS is a multilingual, embedded service governance framework that works well with DevOps. We have done a lot of work on top of the open source project to adapt it to our internal systems, such as integrating with our CI/CD build-and-release system and our single sign-on system, but that is not the focus of this article. Here I would like to focus on the dynamic load balancing algorithm we implemented on top of the existing load balancing algorithms.

2. What is load balancing?

Wikipedia defines it as follows: Load balancing is a computer technique used to distribute load across multiple computers (a computer cluster), network connections, CPUs, disk drives, or other resources, in order to optimize resource usage, maximize throughput, minimize response time, and avoid overload. Using multiple server components with load balancing instead of a single component improves reliability through redundancy. Load balancing services are usually provided by dedicated software or hardware. Their main function is to distribute a large number of requests sensibly across multiple processing units, addressing the high-concurrency and high-availability problems of Internet architectures.

This is easy to understand: it is essentially a way of distributing traffic when a distributed service handles a large number of concurrent requests.

3. Which load balancing algorithms does TARS support?

TARS supports three load balancing algorithms: round-robin, weighted round-robin, and consistent hashing. The entry point to this functionality is SelectAdapterProxy, and the code lives in TarsCpp. If you are interested, you can start reading from this function.

3.1 Round-robin load balancing

The round-robin algorithm is simple to implement. Its principle is to build a call list of all available IPs providing the service. As requests arrive, they are assigned to the machines in the list one by one, in order of arrival; once a request has been assigned to the last node in the list, the cycle starts again from the first node. This spreads traffic out, balancing the load across machines as far as possible and improving machine utilization. The algorithm satisfies most distributed scenarios, and it is the default load balancing algorithm of TARS.
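
As a minimal sketch of the idea (the RoundRobinBalancer class is invented for illustration and is not the TARS implementation), round-robin selection can be written as:

#include <atomic>
#include <string>
#include <vector>

// A minimal round-robin selector; RoundRobinBalancer is an illustrative
// name, not a TARS class. Each call returns the next node in the list,
// wrapping back to the first node after the last one.
class RoundRobinBalancer
{
public:
    explicit RoundRobinBalancer(std::vector<std::string> nodes)
        : _nodes(std::move(nodes)), _index(0) {}

    const std::string &select()
    {
        // fetch_add gives each concurrent caller a distinct slot
        size_t i = _index.fetch_add(1) % _nodes.size();
        return _nodes[i];
    }

private:
    std::vector<std::string> _nodes;
    std::atomic<size_t> _index;
};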

But what if the nodes have different processing capacities? Even though traffic is evenly distributed, the weaker nodes may still become overloaded. Hence the following algorithm.

3.2 Weighted round-robin load balancing

As the name implies, weight assignment gives each node a fixed weight, which represents the probability of that node receiving traffic. For example, with 5 nodes weighted 4, 1, 1, 1, 3, out of 100 incoming requests the nodes receive 40, 10, 10, 10, and 30 respectively. Client requests are thus distributed according to the configured weights. One detail to note is that the implementation of weighted round-robin must be smooth: with 10 requests, the first four must not all land on the first node.

The industry already has many smooth weighted round-robin algorithms; interested readers can look them up.
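
One well-known example is the smooth weighted round-robin popularized by Nginx. The sketch below (illustrative names, not TARS code) spreads weights such as 4, 1, 1, 1, 3 evenly over time instead of sending the first four requests to one node:

#include <string>
#include <vector>

// Smooth weighted round-robin in the style popularized by Nginx.
// Each node carries a configured weight and a running currentWeight.
struct WeightedNode
{
    std::string name;
    int weight;        // configured weight
    int currentWeight; // running value, starts at 0
};

// On every request: add each node's weight to its currentWeight, pick
// the node with the largest currentWeight, then subtract the total
// weight from the winner. Over a full cycle each node is chosen exactly
// weight times, interleaved smoothly rather than in a burst.
WeightedNode *selectSmooth(std::vector<WeightedNode> &nodes)
{
    int total = 0;
    WeightedNode *best = nullptr;
    for (auto &n : nodes)
    {
        n.currentWeight += n.weight;
        total += n.weight;
        if (best == nullptr || n.currentWeight > best->currentWeight)
            best = &n;
    }
    best->currentWeight -= total;
    return best;
}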

3.3 Consistent hashing

In many business scenarios involving caches, besides wanting traffic distributed evenly, we also want requests from the same client to land on the same node as far as possible.

Consider a scenario: a business has 10 million users, each with an identity ID and a set of user information. User IDs map one-to-one to user information, the mapping is stored in a DB, and all other modules need to query this mapping to obtain certain user fields. Under heavy concurrency, hitting the DB directly certainly cannot hold up, so we naturally turn to a caching solution. Does every node need to store the full set of user information? It can, but it is not the best option: what happens when the user base grows from 10 million to 100 million? Clearly, as the user base grows, this solution becomes unwieldy and soon hits a bottleneck, or simply cannot meet the need. A consistent hash algorithm solves this problem: it guarantees that requests with the same input land on the same node as far as possible.

Why only "as far as possible"? Nodes may go offline due to failure or be added during capacity expansion; consistent hash algorithms minimize cache rebuilding when such changes happen. TARS uses two hash algorithms: one computes the MD5 of the key and then XORs address offsets of the digest; the other is Ketama hashing.
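
For intuition, here is a simplified consistent hash ring with virtual nodes; std::hash stands in for the MD5-based and Ketama hashes mentioned above, and the HashRing class is invented for illustration:

#include <cstdint>
#include <functional>
#include <map>
#include <string>

// A simplified consistent hash ring. std::hash is a stand-in for the
// MD5-based and Ketama hashes TARS actually uses; the ring structure,
// not the hash function, is the point here.
class HashRing
{
public:
    void addNode(const std::string &node, int virtualNodes = 100)
    {
        // Virtual nodes spread each physical node around the ring so
        // that removing one node only remaps the keys it owned.
        for (int i = 0; i < virtualNodes; ++i)
            _ring[hash(node + "#" + std::to_string(i))] = node;
    }

    void removeNode(const std::string &node, int virtualNodes = 100)
    {
        for (int i = 0; i < virtualNodes; ++i)
            _ring.erase(hash(node + "#" + std::to_string(i)));
    }

    // Requests with the same key land on the same node for as long as
    // that node stays on the ring.
    const std::string &select(const std::string &key) const
    {
        auto it = _ring.lower_bound(hash(key));
        if (it == _ring.end())
            it = _ring.begin(); // wrap around the ring
        return it->second;
    }

private:
    static uint64_t hash(const std::string &s) { return std::hash<std::string>{}(s); }
    std::map<uint64_t, std::string> _ring;
};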

4. Why do we need dynamic load balancing?

Most of our services still run on virtual machines, so hybrid deployment (multiple services deployed on a single node) is a common phenomenon. Under hybrid deployment, if one service has a bug that consumes a large amount of CPU or memory, all the services deployed alongside it will inevitably be affected.

If we keep using the three load balancing algorithms above, there is a problem: the affected machine still receives traffic according to the fixed rules. One might wonder: couldn't weighted round-robin simply be configured to give the problematic node a low weight, so it receives less traffic? It could, but such adjustments are rarely made quickly. What if the problem occurs in the middle of the night? And once the fault is cleared, the weight must be reconfigured manually, which adds operational cost. We therefore need a load balancing algorithm that adjusts dynamically, automatically redistributing traffic so that service quality is preserved as far as possible in such abnormal situations.

It is not hard to see that the core of dynamic load balancing is simply adjusting node weights dynamically according to each service's load. This is in fact common industry practice: periodically obtain each server's status and dynamically compute its current weight.

5. Dynamic load balancing strategy

Here we likewise compute the weight of each available node dynamically from a set of load factors, and once the weights are returned we reuse TARS's static-weight node selection algorithm. The load factors we chose are: 5-minute average interface latency, 5-minute interface timeout rate, 5-minute interface exception rate, CPU load, memory utilization, and NIC load. The set of load factors can be extended dynamically.

The overall functional diagram is as follows:

5.1 Overall interaction sequence diagram

During an RPC call, the EndpointManager periodically fetches the set of available nodes, each carrying weight information. When the business initiates a call, a node is selected according to the load balancing algorithm specified by the caller.

The RegistryServer periodically pulls timeout rates and average latency from the DB/monitoring system, and pulls machine-level load information such as CPU and memory from other platforms (such as the CMDB). All computation runs asynchronously in threads and is cached locally;

The EndpointManager executes its selection policy based on the weights obtained. The figure below shows how a change in node weight affects the distribution of request traffic:

5.2 Node update and load balancing strategy

All node performance data is refreshed every 60 seconds by a timer thread;

The weight value and weight range of every node are computed and stored in an in-memory cache;

After obtaining the node weight information, the caller runs the existing static-weight load balancing algorithm to select a node;

Fallback strategy: if all nodes have the same weight, or the weights are abnormal, nodes are selected by plain round-robin by default, as in the sketch below.
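
A hedged sketch of this fallback (illustrative names, not the TARS source): when the weights carry no information, selection degrades to plain polling:

#include <atomic>
#include <cstddef>
#include <vector>

// Illustrative sketch of the fallback rule, not the TARS source code.
// If every node carries the same weight (or the weights are unusable),
// weighted selection adds no information, so degrade to round-robin.
struct NodeInfo
{
    int weight; // dynamically computed weight; endpoint details elided
};

size_t selectNode(const std::vector<NodeInfo> &nodes, std::atomic<size_t> &pollIndex)
{
    long long total = 0;
    bool allEqual = true;
    for (const auto &n : nodes)
    {
        total += n.weight;
        if (n.weight != nodes.front().weight)
            allEqual = false;
    }

    // Fallback: same weights or abnormal weights -> plain round-robin.
    if (allEqual || total <= 0)
        return pollIndex.fetch_add(1) % nodes.size();

    // Otherwise walk the cumulative weights so each node is chosen in
    // proportion to its weight (a simple static-weight selection).
    long long point = static_cast<long long>(pollIndex.fetch_add(1) % total);
    for (size_t i = 0; i < nodes.size(); ++i)
    {
        point -= nodes[i].weight;
        if (point < 0)
            return i;
    }
    return 0; // unreachable while total > 0
}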

5.3 Load calculation method

Load calculation: each load factor produces a weight value, and each factor is assigned an importance (expressed as a percentage) that is tuned according to how much it should matter. The final value is the sum of each factor's weight multiplied by its percentage. For example, if the latency factor yields a weight of 10 and the timeout-rate factor yields 20, with importances of 40% and 60% respectively, the total is 10 × 0.4 + 20 × 0.6 = 16. The formulas for the individual load factors are as follows (currently we use only average latency and timeout rate, which are also the data most readily available in TARS):

1. Weight allocated in inverse proportion to each machine's share of total latency: weight = initial weight × (total latency − the machine's average latency) / total latency (the drawback is that traffic is not allocated in exact proportion to latency);

2. Timeout-rate weight: timeout-rate weight = initial weight − timeout rate × initial weight × 90%. The factor is 90% rather than 100% because a 100% timeout rate may simply be caused by excessive traffic, so a small share of trial requests is retained;

The corresponding code is implemented as follows:

void LoadBalanceThread::calculateWeight(LoadCache &loadCache)
{
    for (auto &loadPair : loadCache)
    {
        ostringstream log;
        const auto ITEM_SIZE(static_cast<int>(loadPair.second.vtBalanceItem.size()));
        int aveTime(loadPair.second.aveTimeSum / ITEM_SIZE);
        log << "aveTime: " << aveTime << "|"
            << "vtBalanceItem size: " << ITEM_SIZE << "|";

        for (auto &loadInfo : loadPair.second.vtBalanceItem)
        {
            // Allocate weight in inverse proportion to each machine's share of
            // total latency: weight = initial weight * (total latency - this
            // machine's average latency) / total latency
            TLOGDEBUG("loadPair.second.aveTimeSum: " << loadPair.second.aveTimeSum << endl);
            int aveTimeWeight(loadPair.second.aveTimeSum
                                  ? (DEFAULT_WEIGHT * ITEM_SIZE * (loadPair.second.aveTimeSum - loadInfo.aveTime) / loadPair.second.aveTimeSum)
                                  : 0);
            aveTimeWeight = aveTimeWeight <= 0 ? MIN_WEIGHT : aveTimeWeight;

            // Timeout-rate weight = initial weight - timeout rate * initial weight * 90%.
            // The factor is 90% rather than 100% because a 100% timeout rate may be
            // caused by excessive traffic, so a small share of trial requests is retained.
            int timeoutRateWeight(loadInfo.succCount
                                      ? (DEFAULT_WEIGHT - static_cast<int>(loadInfo.timeoutCount * TIMEOUT_WEIGHT_FACTOR / (loadInfo.succCount + loadInfo.timeoutCount)))
                                      : (loadInfo.timeoutCount ? MIN_WEIGHT : DEFAULT_WEIGHT));

            // Combine both factors according to their configured proportions.
            loadInfo.weight = aveTimeWeight * getProportion(TIME_CONSUMING_WEIGHT_PROPORTION) / WEIGHT_PERCENT_UNIT
                              + timeoutRateWeight * getProportion(TIMEOUT_WEIGHT_PROPORTION) / WEIGHT_PERCENT_UNIT;

            log << "aveTimeWeight: " << aveTimeWeight << ", "
                << "timeoutRateWeight: " << timeoutRateWeight << ", "
                << "loadInfo.weight: " << loadInfo.weight << "; ";
        }

        TLOGDEBUG(log.str() << "|" << endl);
    }
}

The relevant code resides in the RegistryServer; the code files are shown below:

The core implementation is the LoadBalanceThread class; corrections are welcome.

5.4 Usage

  1. On the servant management page, dynamic load balancing is enabled by configuring the -w and -v parameters; if they are absent, it is not enabled.

As shown in the figure below:

  2. Note: the feature takes effect only when all nodes have it enabled; otherwise, when the RPC framework finds different nodes using different load balancing algorithms, it forces all nodes back to round-robin mode.
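
For illustration only, an endpoint string carrying the two flags might look like the line below; the host, port, and flag values here are assumptions rather than a verified TARS example, so consult the TARS platform documentation for the exact semantics your version supports:

tcp -h 10.0.0.1 -p 10025 -t 60000 -w 100 -v 2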

6. Scenarios suited to dynamic load balancing

If your service runs in Docker containers, you probably do not need dynamic load balancing: either use Docker's scheduling capabilities to scale the service automatically, or split deployments down to the granularity Docker allocates so that each service has a container to itself and services cannot affect one another. Enabling this feature is recommended when services are hybrid-deployed and there is a high probability of one service being affected by others, for example by a neighbor that hogs the CPU.

7. Next steps

The current implementation considers only average latency and timeout rate. These reflect service capacity to some extent, but not completely. We therefore plan to add metrics such as CPU usage to reflect node load more fully, as well as strategies that adjust weights based on the return codes observed on the calling side.

Finally, we welcome you to discuss and exchange ideas with us, and to contribute to the TARS open source project.

Author: Yang Minshan, Vivo Internet Server Team