1. Background

In the course of adopting microservices, several business lines of vivo's Internet business chose the TARS microservice framework after weighing a number of factors.

TARS is a multi-language framework with built-in service governance that works well with DevOps. On top of the open source version we have done a lot of work to adapt it to our internal systems, such as building a release system connected to CI/CD and integrating with our single sign-on system, but that is not the focus of this article. Here we would like to focus on the dynamic load balancing algorithm we implemented on top of the existing load balancing algorithms.

2. What is load balancing?

Wikipedia defines it as follows: load balancing is a computing technique used to distribute load across multiple computers (a computer cluster), network links, CPUs, disk drives, or other resources, in order to optimize resource usage, maximize throughput, minimize response time, and avoid overload. Using multiple server components with load balancing instead of a single component also improves reliability through redundancy. Load balancing is usually provided by dedicated software or hardware. Its main job is to spread a large number of requests reasonably across multiple units of work, which is how Internet architectures address high concurrency and high availability.

Essentially, it is a way to solve the traffic distribution problem for distributed services handling a large number of concurrent requests.

3. What load balancing algorithms does TARS support?

TARS supports three load balancing algorithms: round-robin, weighted round-robin, and consistent hashing. The entry point is the selectAdapterProxy function in the TarsCpp source; you can start with this function if you are interested.

3.1 Round-robin load balancing

The round-robin load balancing algorithm is simple to implement. The principle is to build a call list of all available IPs providing the service. When requests arrive, they are assigned to the machines in the list one by one in order, and once the last node in the list has been assigned, allocation loops back to the first node. In this way traffic is spread out and the load on each machine stays as balanced as possible, improving machine utilization. This algorithm covers most distributed scenarios and is the default load balancing algorithm in TARS.
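As a rough illustration only (this is a minimal sketch, not the actual TarsCpp code, and the Node type and class names here are made up), a round-robin selector could look like this:

```cpp
#include <atomic>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node descriptor; the real TarsCpp structures are different.
struct Node {
    std::string ip;
    int port;
};

class RoundRobinSelector {
public:
    explicit RoundRobinSelector(std::vector<Node> nodes) : nodes_(std::move(nodes)) {}

    // Hand out nodes in list order; after the last node, wrap back to the first.
    // Assumes the node list is non-empty.
    const Node& select() {
        const size_t idx = next_.fetch_add(1, std::memory_order_relaxed) % nodes_.size();
        return nodes_[idx];
    }

private:
    std::vector<Node> nodes_;
    std::atomic<size_t> next_{0};
};
```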

But what if the nodes have different processing capacities? Even though traffic is split evenly, weaker nodes in the list may still end up overloaded. That is why we have the following load balancing algorithm.

3.2 Weighted round-robin load balancing

Weight assignment, as the name implies, gives each node a fixed weight that represents the probability of that node receiving traffic. For example, with 5 nodes weighted 4, 1, 1, 1, 3, then out of 100 requests the nodes receive 40, 10, 10, 10, and 30 requests respectively. Client requests are thus distributed according to the configured weights. One detail worth noting is that a weighted round-robin implementation must be smooth: with 10 requests, the first 4 should not all land on the first node in a row.

The industry already has several smooth weighted round-robin algorithms; interested readers can look them up.
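For reference, here is a minimal sketch of the smooth weighted round-robin idea popularized by Nginx; it is not the TARS implementation, and the types and names are illustrative only. With weights 4, 1, 1, 1, 3, the heaviest node is picked most often but never four times in a row at the start:

```cpp
#include <string>
#include <vector>

// Hypothetical weighted node; the field names are made up for this sketch.
struct WeightedNode {
    std::string name;
    int weight;         // configured (static) weight
    int currentWeight;  // running value used for smoothing, starts at 0
};

// Each pick: raise every node's currentWeight by its configured weight,
// choose the node with the highest currentWeight, then lower the winner
// by the sum of all weights. This interleaves picks instead of sending
// consecutive requests to the heaviest node.
WeightedNode* selectSmooth(std::vector<WeightedNode>& nodes) {
    int totalWeight = 0;
    WeightedNode* best = nullptr;
    for (auto& node : nodes) {
        node.currentWeight += node.weight;
        totalWeight += node.weight;
        if (best == nullptr || node.currentWeight > best->currentWeight) {
            best = &node;
        }
    }
    if (best != nullptr) {
        best->currentWeight -= totalWeight;
    }
    return best;
}
```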

3.3 Consistent hashing

In some business scenarios that involve caching, besides spreading traffic evenly, we also need requests from the same client to land on the same node as far as possible.

Consider a scenario: a service has 10 million users, each with an ID and a set of user profile data, and there is a one-to-one mapping between user ID and user profile. This mapping lives in the DB, and all other modules need to query it to fetch some required user fields. Under heavy concurrency, hitting the DB directly certainly cannot hold up, so we naturally think of caching. Should every node cache the full set of user data? It could, but that is not the best solution: what happens when we go from 10 million users to 100 million? Clearly this approach becomes strained as the user base grows and will soon hit a bottleneck or even fail to meet demand. A consistent hashing algorithm is needed to solve this problem: it ensures that requests with the same input land on the same node as far as possible.

Why only "as far as possible"? Because the consistent hashing algorithm also minimizes cache rebuilding when nodes go offline due to failures or are added during capacity expansion. TARS provides two hash options: one hashes the MD5 of the key, and the other uses ketama hashing.
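To make the idea concrete, below is a simplified consistent-hash ring with virtual nodes. It is only a sketch of the general technique: std::hash stands in for the MD5/ketama hashing that TARS actually uses, and the virtual node count is an arbitrary assumption.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Simplified consistent-hash ring; std::hash is a stand-in for MD5/ketama.
class HashRing {
public:
    // Map each physical node to several virtual points on the ring so that
    // keys spread more evenly and fewer keys move when nodes join or leave.
    void addNode(const std::string& node, int virtualNodes = 100) {
        for (int i = 0; i < virtualNodes; ++i) {
            ring_[hash(node + "#" + std::to_string(i))] = node;
        }
    }

    void removeNode(const std::string& node, int virtualNodes = 100) {
        for (int i = 0; i < virtualNodes; ++i) {
            ring_.erase(hash(node + "#" + std::to_string(i)));
        }
    }

    // A key maps to the first virtual point clockwise from its hash, so most
    // keys keep their node when a single node is added or removed.
    // Assumes at least one node has been added.
    const std::string& lookup(const std::string& key) const {
        auto it = ring_.lower_bound(hash(key));
        if (it == ring_.end()) it = ring_.begin();  // wrap around the ring
        return it->second;
    }

private:
    static uint64_t hash(const std::string& s) { return std::hash<std::string>{}(s); }
    std::map<uint64_t, std::string> ring_;  // position on ring -> physical node
};
```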

4. Why is dynamic load balancing needed?

Most of our services still run on virtual machines, so mixed deployment (multiple services on one node) is common. Under mixed deployment, if one service has a bug that consumes a lot of CPU or memory, every service deployed alongside it is affected.

If we stick with the three load balancing algorithms above, the problem is that the affected machines keep receiving traffic according to the configured rules. One might ask: couldn't weighted round-robin simply be configured to give the problem nodes a low weight so they receive very little traffic? It could, but such manual intervention is often not timely. What if the problem happens in the middle of the night? And after the fault is fixed, the weights have to be reconfigured by hand, which adds O&M cost. Therefore, we need a load balancing algorithm that adjusts itself dynamically, automatically redistributing traffic to preserve service quality as much as possible in such abnormal situations.

From here it is easy to see that the core of dynamic load balancing is simply to adjust the weights of the nodes dynamically according to their current load. This is also a common practice in the industry: periodically collect server status information and dynamically compute the current weight of each server.

5. Dynamic load balancing policy

Here we take the same approach: the weight of each available node is computed dynamically from a set of load factors, and the resulting weights are then fed into TARS's existing static-weight node selection algorithm. The load factors are: average interface latency over 5 minutes, interface timeout rate over 5 minutes, interface exception rate over 5 minutes, CPU load, memory usage, and NIC load. The set of load factors can be extended dynamically.

The overall functional diagram is as follows:

5.1 Overall interaction sequence diagram

During RPC calls, the EndpointManager periodically fetches the set of available nodes, and each node carries its weight information. When invoking a service, the caller selects a node according to the load balancing algorithm specified by the business side.

The RegistryServer periodically pulls information such as timeout rates and average latency from the DB/monitor service, and machine load information such as CPU and memory usage from other platforms (such as the CMDB). All calculations are executed asynchronously in threads and the results are cached locally.

The EndpointManager applies the selection policy based on the returned weights. The following figure shows how a change in node weight affects the allocation of request traffic:

5.2 Node Update and Load Balancing Policies

All node performance data is refreshed by a dedicated thread every 60 seconds.

The weight values and weight ranges of all nodes are computed and stored in an in-memory cache.

After obtaining the node weight information, the caller selects nodes with the existing static-weight load balancing algorithm.

Fallback policy: if all node weights are identical or abnormal, node selection falls back to plain round-robin.
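Purely as an illustration of the behavior described above (weighted selection with a round-robin fallback), and not the actual TarsCpp selection code, a caller-side sketch might look like this:

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Hypothetical view of a node's dynamically computed weight.
struct NodeWeight {
    int weight;  // dynamically computed weight; non-positive means no usable data
};

// Pick a node index: weighted random selection when the weights are usable and
// differ, plain round-robin otherwise. Assumes a non-empty node list.
size_t selectIndex(const std::vector<NodeWeight>& nodes, size_t& rrCursor) {
    int total = 0;
    bool allEqual = true;
    for (const auto& n : nodes) {
        total += std::max(n.weight, 0);
        if (n.weight != nodes.front().weight) allEqual = false;
    }
    // Fallback policy: identical or abnormal weights degrade to round-robin.
    if (allEqual || total <= 0) {
        return rrCursor++ % nodes.size();
    }
    // Weighted random pick, proportional to each node's weight.
    int pick = std::rand() % total;
    for (size_t i = 0; i < nodes.size(); ++i) {
        pick -= std::max(nodes[i].weight, 0);
        if (pick < 0) return i;
    }
    return nodes.size() - 1;  // defensive; should not be reached
}
```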

5.3 Load Calculation

Load calculation: each load factor is given a weight value and a corresponding importance (expressed as a percentage), which can be tuned according to how important each factor is. The final value is the sum of each factor's computed weight multiplied by its percentage. For example, if the latency weight is 10 and the timeout-rate weight is 20, with importances of 40% and 60% respectively, the combined weight is 10 * 0.4 + 20 * 0.6 = 16. The per-factor calculations are as follows (currently we only use average latency and timeout rate, as they are the data most readily available in the current TARS system):

1. Latency weight, distributed in inverse proportion to each machine's share of the total latency: weight = initial weight * (total latency - single machine's average latency) / total latency (the drawback is that traffic is not distributed exactly in proportion to latency);

2. Timeout-rate weight: weight = initial weight - timeout rate * initial weight * 90%. The 90% factor is used because a 100% timeout rate may simply be caused by excessive traffic, so a small amount of trial traffic is still reserved;

The corresponding code is implemented as follows:

void LoadBalanceThread::calculateWeight(LoadCache &loadCache)
{
    for (auto &loadPair : loadCache)
    {
        ostringstream log;
        const auto ITEM_SIZE(static_cast<int>(loadPair.second.vtBalanceItem.size()));
        int aveTime(loadPair.second.aveTimeSum / ITEM_SIZE);
        log << "aveTime: " << aveTime << "|"
            << "vtBalanceItem size: " << ITEM_SIZE << "|";
        for (auto &loadInfo : loadPair.second.vtBalanceItem)
        {
            // Assign weights in inverse proportion to the total time of each machine: Weight = initial weight * (total time - average time of each machine)/total time
            TLOGDEBUG("loadPair.second.aveTimeSum: " << loadPair.second.aveTimeSum << endl);
            int aveTimeWeight(loadPair.second.aveTimeSum ? (DEFAULT_WEIGHT * ITEM_SIZE * (loadPair.second.aveTimeSum - loadInfo.aveTime) / loadPair.second.aveTimeSum) : 0);
            aveTimeWeight = aveTimeWeight <= 0 ? MIN_WEIGHT : aveTimeWeight;
            // Timeout rate weight: timeout rate weight = initial weight - timeout rate * initial weight * 90%. 90% is calculated because 100% timeout may be caused by excessive traffic
            int timeoutRateWeight(loadInfo.succCount ? (DEFAULT_WEIGHT - static_cast<int>(loadInfo.timeoutCount * TIMEOUT_WEIGHT_FACTOR / (loadInfo.succCount + loadInfo.timeoutCount))) : (loadInfo.timeoutCount ? MIN_WEIGHT : DEFAULT_WEIGHT));
            // Add the weights by their respective proportions
            loadInfo.weight = aveTimeWeight * getProportion(TIME_CONSUMING_WEIGHT_PROPORTION) / WEIGHT_PERCENT_UNIT
                              + timeoutRateWeight * getProportion(TIMEOUT_WEIGHT_PROPORTION) / WEIGHT_PERCENT_UNIT ;
 
            log << "aveTimeWeight: " << aveTimeWeight << ","
                << "timeoutRateWeight: " << timeoutRateWeight << ","
                << "loadInfo.weight: " << loadInfo.weight << "; ";
        }
 
        TLOGDEBUG(log.str() << "|" << endl);
    }
}

The relevant code is implemented in the RegistryServer, and the code file is shown below:

The core implementation is the LoadBalanceThread class.

5.4 Usage Mode

  1. Dynamic load balancing is enabled by configuring the -w -v parameters in the Servant management section; if these parameters are not configured, the dynamic load balancing feature is disabled.

As shown in the figure below:

  2. Note: it takes effect only when it is enabled on all nodes. Otherwise, if the RPC framework finds that different nodes use different load balancing algorithms, all nodes are forcibly switched back to round-robin mode.

6. Scenarios where dynamic load balancing applies

If your services run in Docker containers, you may not need dynamic load balancing. You can either rely on the container platform's scheduling for automatic scaling, or break deployments down to the container granularity so that each service has a container to itself and services cannot affect one another. But if services are deployed mixed together and can be affected by other services, for example when one service uses up all the CPU, then enabling this feature is recommended.

7. Next steps

The current implementation considers only two factors, average latency and timeout rate. They reflect service capacity to some extent but are not complete. Therefore, in the future we will consider adding indicators such as CPU usage, which better reflect node load, as well as caller-side strategies that adjust weights based on return codes.

Finally, you are welcome to discuss with us and contribute to the TARS open source project.

Author: vivo Internet Server Team - Yang Minshan