Preface

With the popularity of microservices, the stability of calls between services becomes more and more important. We often spend a lot of effort maintaining the stability of our services, and rate limiting and circuit breaking (with degradation) are the two most commonly used means. Some time ago a few colleagues in our group had questions about how to use rate limiting, and recently my company has also done some rate-limiting work, so here I summarize my own views on the topic.

As we just said, rate limiting is one of the means of ensuring service stability, but it cannot guarantee stability in every scenario. As its name implies, it only plays its role under heavy or sudden traffic. For example, suppose our system supports at most 100 QPS, but suddenly 1000 QPS of requests come in; the system might go down outright, leaving it unable to process any subsequent requests. With rate limiting in place, however, no matter how large the incoming traffic, we only process 100 QPS worth of requests and reject the rest directly. Although we rejected 900 QPS of requests, our system did not fail and can keep handling subsequent requests, which is exactly what we want. Some students may say: now that everything is on the cloud, dynamic scaling should be particularly simple; if we find the traffic is particularly large, can't we just scale out automatically to support the target QPS and skip rate limiting altogether? Quite a few students hold this idea, and some may have been misled by boastful articles. The idea works in an ideal world, but in reality it runs into the following problems:

  • Expansion takes time. Scaling out is essentially provisioning a new machine and deploying the code to it, and those of you who work with Java know that a release is usually measured in minutes, not seconds; sometimes, by the time you finish scaling, the traffic spike is already over.
  • How much to expand is a particularly complex question. Working out how many machines to add requires a lot of load-test calculation, and the expansion has to cover the whole call chain: if the machines on your side are scaled out but the machines of other teams downstream are not, the problem remains.

Therefore, simple capacity expansion cannot solve this problem, and rate limiting is still a skill we must master; like the Spring family bucket, it is one of the essentials in our toolkit!

Basic principles

If you want to master rate limiting, you need to master some basic algorithms. Rate-limiting algorithms basically come in three kinds: counter, funnel (leaky bucket), and token bucket; everything else is evolved from these.

Counter algorithm

First, let’s talk about the counter algorithm, which is relatively simple and crude. We only need one cumulative variable, refresh it every second, and check whether the cumulative value exceeds our maximum QPS.

    int curQps = 0;
    long lastTime = System.currentTimeMillis();
    int maxQps = 100;
    Object lock = new Object();

    boolean check() {
        synchronized (lock) {
            long now = System.currentTimeMillis();
            // A new second has started: reset the window
            if (now - lastTime > 1000) {
                lastTime = now;
                curQps = 0;
            }
            curQps++;
            if (curQps > maxQps) {
                return false;
            }
            return true;
        }
    }

This code is simple: we define the current QPS, the time the cumulative variable was last refreshed, our maximum QPS, and a lock. On every check we first decide whether the window needs refreshing; if so, we reset both the timestamp and the QPS counter, and only then increment and test the counter.

The problem with this algorithm is particularly obvious precisely because it is so simple. Suppose our maximum QPS is 100: 100 requests arrive at 0.99 seconds, and another 100 arrive at 1.01 seconds. Both batches pass our check, yet we have actually let 200 requests through within 0.02 seconds. This is definitely not what we expect, because those 200 requests may well bring our machine down directly.

Sliding window counter

To solve the critical-boundary problem above, we can use a sliding window:

As shown in the figure above, we divide the 1s of the ordinary counter into five 200ms windows. The current QPS we report is the sum over the most recent five windows. Back to the question just now: 0.99 seconds and 1.01 seconds both fall within our last five windows, so the two batches are counted together and the critical spike problem no longer occurs.

In fact, looked at from another angle, the ordinary counter is just a sliding window counter with a single window. The more windows we divide into, the more accurate the counter scheme becomes, but the cost of maintaining the windows grows accordingly. We will talk more about how Sentinel does sliding window counting when we get to Sentinel later.
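To make the idea concrete, here is a minimal sliding window counter sketch in plain Java. It is my own illustration, not Sentinel’s implementation; the names (SlidingWindowLimiter, WINDOW_COUNT, WINDOW_SIZE_MS) are invented for this example:

    public class SlidingWindowLimiter {
        private static final int WINDOW_COUNT = 5;       // five 200ms buckets per second
        private static final long WINDOW_SIZE_MS = 200;
        private final int maxQps;
        private final int[] buckets = new int[WINDOW_COUNT];
        private long lastWindowStart = System.currentTimeMillis();
        private int lastIndex = 0;

        public SlidingWindowLimiter(int maxQps) {
            this.maxQps = maxQps;
        }

        public synchronized boolean check() {
            long now = System.currentTimeMillis();
            long windowsPassed = (now - lastWindowStart) / WINDOW_SIZE_MS;
            // Advance the ring, clearing every bucket that expired since the last call
            for (long i = 0; i < Math.min(windowsPassed, WINDOW_COUNT); i++) {
                lastIndex = (lastIndex + 1) % WINDOW_COUNT;
                buckets[lastIndex] = 0;
            }
            lastWindowStart += windowsPassed * WINDOW_SIZE_MS;
            int sum = 0;
            for (int b : buckets) {
                sum += b;                                // total requests over the last ~1s
            }
            if (sum >= maxQps) {
                return false;
            }
            buckets[lastIndex]++;
            return true;
        }
    }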

Funnel algorithm

The critical spike problem of the counter can also be solved with the funnel algorithm, as shown in the figure below:

In the funnel algorithm, we need to pay attention to two things: the leaky bucket and the constant outflow. No matter how much traffic arrives, it first enters the leaky bucket and then flows out at a uniform speed. How do you implement this constant speed in code? For example, if we want a constant speed of 100 QPS, then one unit of traffic flows out every 10ms, which is similar to a queue: every 10ms a request is taken from the head of the queue and released, and the queue itself is our leaky bucket. When the traffic exceeds the queue’s capacity, we reject the excess.
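A minimal sketch of this queue-based design, assuming a bucket capacity of 100 and a single drainer thread; the class and method names are my own, and a real implementation would also shut the executor down:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class LeakyBucketLimiter {
        // The bucket: waiting requests; the capacity is the bucket size
        private final BlockingQueue<CompletableFuture<Void>> bucket = new ArrayBlockingQueue<>(100);

        public LeakyBucketLimiter(long intervalMs) {
            // Drain exactly one request per interval, e.g. every 10ms for 100 QPS
            Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
                CompletableFuture<Void> next = bucket.poll();
                if (next != null) {
                    next.complete(null);   // release one waiting request
                }
            }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
        }

        // Returns a future the caller blocks on with get(); null means overflow -> reject
        public CompletableFuture<Void> tryEnter() {
            CompletableFuture<Void> ticket = new CompletableFuture<>();
            return bucket.offer(ticket) ? ticket : null;
        }
    }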

The funnel algorithm also has a definite disadvantage: it cannot handle sudden bursts of traffic (this is different from the critical spike above; do not confuse the two). For example, when 100 requests arrive in an instant, the leaky bucket can only let them out one by one, and by the time the last request flows out a full second has passed. The funnel algorithm is therefore suited to scenarios where requests arrive evenly and the request rate must be strictly controlled.

Token bucket algorithm

To handle bursts of traffic, we can use the token bucket algorithm, as shown in the figure below:

  • Token production: again assume the maximum QPS is 100. Instead of releasing one unit of traffic every 10ms as in the funnel, we now produce one token every 10ms, until the bucket reaches its maximum number of tokens.
  • Token consumption: every unit of traffic consumes tokens from the bucket. The consumption rule can vary: the simplest is one token per request, but different rules can be applied by traffic type or packet size, for example one token for a query and two tokens for a write.
  • Check whether it passes: if the bucket holds enough tokens, the traffic is allowed through; if not, we can either wait (which can be controlled with a funnel-like queue) or reject the traffic directly. A minimal sketch follows this list.
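Here is the promised minimal token bucket sketch following the three steps above, producing tokens lazily on each call rather than with a background thread; all names are my own:

    public class TokenBucketLimiter {
        private final long intervalMs;   // one token produced every intervalMs (10ms -> 100 QPS)
        private final double maxTokens;  // bucket capacity, caps the burst size
        private double tokens;
        private long lastRefillTime = System.currentTimeMillis();

        public TokenBucketLimiter(long intervalMs, double maxTokens) {
            this.intervalMs = intervalMs;
            this.maxTokens = maxTokens;
            this.tokens = maxTokens;     // start full so an initial burst can pass
        }

        // Try to consume `cost` tokens, e.g. 1 for a query, 2 for a write
        public synchronized boolean tryAcquire(int cost) {
            long now = System.currentTimeMillis();
            // Lazily produce the tokens accumulated since the last call, up to the cap
            tokens = Math.min(maxTokens, tokens + (now - lastRefillTime) / (double) intervalMs);
            lastRefillTime = now;
            if (tokens < cost) {
                return false;            // not enough tokens: reject (or queue and wait)
            }
            tokens -= cost;
            return true;
        }
    }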

Single-machine rate limiting

We have introduced some basic rate-limiting algorithms above. Applied to our distributed services, they fall into two categories: single-machine rate limiting and cluster rate limiting. Single-machine rate limiting means every machine limits its own traffic independently, without affecting the others. Let’s look at how to implement it.

Guava

Guava is Google’s open-source Java core library, which includes collections, caching, concurrency and other useful utilities. It also provides the tool we need here; the core class is RateLimiter.

    // RateLimiter rateLimiter = RateLimiter.create(100, 500, TimeUnit.MILLISECONDS); // rateLimiter with warm-up
    RateLimiter rateLimiter = RateLimiter.create(100); // simple rateLimiter
    boolean limitResult = rateLimiter.tryAcquire();

As shown in the code above, we just build a RateLimiter and call tryAcquire: a return value of true means the traffic passes, otherwise it is limited. RateLimiter comes in two flavors in Guava: one is the ordinary token bucket implementation, and the other is a RateLimiter with warm-up, which gradually increases the token release rate until it reaches the maximum. Sentinel offers a similar warm-up. It is useful on cold systems, for example while a database connection pool is not yet fully filled and is still initializing.
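For completeness, a short sketch of the warm-up flavor alongside the blocking and timeout acquire calls (the 500ms warm-up period and 10ms timeout are arbitrary example values; RateLimiter here is com.google.common.util.concurrent.RateLimiter):

    // Warm-up limiter: ramps from a low rate up to 100 permits/s over 500ms
    RateLimiter warmLimiter = RateLimiter.create(100, 500, TimeUnit.MILLISECONDS);
    // Blocking acquire: sleeps until a permit is free, returns the seconds slept
    double waited = warmLimiter.acquire();
    // Non-blocking with a timeout: wait at most 10ms for a permit
    boolean ok = warmLimiter.tryAcquire(10, TimeUnit.MILLISECONDS);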

Here’s a quick overview of how Guava’s token bucket is implemented:

The ordinary token bucket is implemented by the SmoothBursty class, and its tryAcquire method is the key:

There are four steps:

  • Step 1: take a synchronized lock. Note that Sentinel does not lock here while Guava does; we will come back to the resulting Sentinel issue later.
  • Step 2: judge whether we can acquire from the token bucket. If there are not enough tokens in the bucket and the wait would exceed our timeout, we give up the acquisition here.
  • Step 3: acquire the token and compute the wait time. In tryAcquire the timeout parameter is the maximum wait time; if we just call tryAcquire() with no arguments, there is no waiting at all.
  • Step 4: sleep for the wait time.

The actual token deduction happens in the reserveEarliestAvailable method:

This looks like a lot of work, but if we just call tryAcquire(), only the two steps highlighted in red need our attention:

  • Step 1: tokens are issued based on the latest time. Guava does not use a separate thread to issue tokens asynchronously; instead, the token refresh is performed inside the limiting method on every call. This design is worth learning from, as it avoids the complexity of asynchronous threads.
  • Step 2: deduct the token. We have already verified this in canAcquire, so the deduction is guaranteed to succeed.

Guava currently provides these two kinds of rate limiting. Many middleware and business services use Guava as their tool of choice, but Guava’s capabilities are limited: dynamically changing the limit and richer limiting policies are not supported, so next we introduce Sentinel.

Sentinel

Sentinel is a lightweight flow-control framework open-sourced by Alibaba for distributed services; it has carried the core scenarios of Alibaba’s Double Eleven big-promotion traffic for the past 10 years. Its core is flow control, but it is not limited to that: it also supports circuit breaking, degradation, monitoring and so on.

Rate limiting with Sentinel is slightly more complicated than with Guava; here is the simplest code:

        String KEY = "test";
        // ============== Initialize the rule =========
        List<FlowRule> rules = new ArrayList<FlowRule>();
        FlowRule rule1 = new FlowRule();
        rule1.setResource(KEY);
        // set limit qps to 20
        rule1.setCount(20);
        rule1.setGrade(RuleConstant.FLOW_GRADE_QPS);
        rule1.setLimitApp("default");
        rule1.setControlBehavior(RuleConstant.CONTROL_BEHAVIOR_DEFAULT);
        rules.add(rule1);
        FlowRuleManager.loadRules(rules);
        // ================ Rate-limiting decision ===========
        Entry entry = null;
        try {
            entry = SphU.entry(KEY);
            // do something
        } catch (BlockException e1) {
            // limited traffic throws BlockException
        } finally {
            if (entry != null) {
                entry.exit();
            }
        }
  • Step 1: Sentinel emphasizes the concept of a Resource. Everything we protect or act on is defined in terms of a Resource, so we first need to decide the key of our Resource; here we simply set it to test.
  • Step 2: we then initialize a flow rule for our Resource. Here we choose the QPS-based rule, whose default strategy is the sliding window version of the counter, and load it into the global rule manager. The whole rule setup is quite different from Guava.
  • Step 3: the second important concept in Sentinel is the Entry, which represents one resource invocation; it internally saves the current invocation information, and we must exit the Entry in the finally block. SphU.entry is the key call that executes our flow rules; unlike Guava, if the traffic is blocked it throws a BlockException.

Although Sentinel’s overall usage is more complex than Guava’s, it offers far more algorithm choices for rate limiting.

Based on concurrency (number of threads)

Sentinel provides a strategy based on the number of concurrent threads, with an effect similar to semaphore isolation. We can use this mode when we need to keep a business thread pool from being exhausted by slow calls.

Usually all HTTP interfaces of the same service share one thread pool; for example, with a Tomcat web server we have the Tomcat business thread pool. Suppose there are two HTTP methods, A and B: B is fast and A is slow. If A is called heavily, the thread pool may be drained because A is too slow to release its threads, and B then cannot get a thread at all. We ran into exactly this scenario before, and it caused the whole service to reject every request it received. Some students say it is enough to limit A’s QPS, but note that QPS is per second: if A takes longer than 1s, the QPS counter has already reset by the time the next wave of A requests arrives, so a QPS limit does not cap the number of threads stuck in A.

Setting the rule’s Grade to FLOW_GRADE_THREAD implements this kind of limiting, as sketched below.
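A minimal configuration sketch, reusing the rule setup from the earlier example; the resource key "test" and the count of 10 are arbitrary example values:

    FlowRule threadRule = new FlowRule();
    threadRule.setResource("test");
    // Limit by concurrent threads instead of QPS: at most 10 threads may be
    // inside this resource at once; further concurrent calls are blocked
    threadRule.setGrade(RuleConstant.FLOW_GRADE_THREAD);
    threadRule.setCount(10);
    FlowRuleManager.loadRules(Collections.singletonList(threadRule));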

Based on QPS

For QPS-based rate limiting, Sentinel provides four strategies:

  • Default policy: set Behavior to CONTROL_BEHAVIOR_DEFAULT; this is the sliding window counter mode. It suits situations where the processing capacity of the system is clearly known, for example when the exact water level has been determined through load testing.
  • Warm Up: set Behavior to CONTROL_BEHAVIOR_WARM_UP, similar to the warm-up described for Guava. In this warm-start mode, when a system that has long sat at a low water level suddenly receives a surge of traffic, pulling it straight up to the high water level can crush it. The QPS curve of this mode looks like the figure below (a configuration sketch follows this list):

  • Uniform queuing: set Behavior to CONTROL_BEHAVIOR_RATE_LIMITER. This mode is essentially the funnel algorithm, whose advantages and disadvantages were explained earlier.
  • Warm Up + uniform queuing: set Behavior to CONTROL_BEHAVIOR_WARM_UP_RATE_LIMITER. The plain Warm Up switches to the sliding window algorithm after reaching the high watermark; this mode uses uniform queuing instead.
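As promised above, a minimal Warm Up configuration sketch building on the earlier rule setup; the 100 QPS threshold and 10s warm-up period are arbitrary example values:

    FlowRule rule = new FlowRule();
    rule.setResource("test");
    rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
    rule.setCount(100);
    // Ramp up from a low threshold to the full 100 QPS over 10 seconds
    rule.setControlBehavior(RuleConstant.CONTROL_BEHAVIOR_WARM_UP);
    rule.setWarmUpPeriodSec(10);
    FlowRuleManager.loadRules(Collections.singletonList(rule));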

Based on the call relationship

Sentinel also provides a more complex kind of limiting, which can be applied more flexibly based on the call relationship:

  • Limiting by caller: call ContextUtil.enter(resourceName, origin), where origin is the caller’s identifier, and then use the rule’s limitApp parameter to decide whom to limit (a usage sketch follows this list). It can be set as follows:
    • Set to default: limits traffic from all callers by default.
    • Set to {some_origin_name}: limits traffic from that particular caller only.
    • Set to other: limits traffic from all callers except those already covered by other configured rules for this resource.
  • Relation flow control: Sentinel also supports flow control between two related resources. For example, if two interfaces use the same resource and one is more important than the other, we can set a rule so that when the important interface sees heavy access, the less important interface is limited, preventing a sudden burst on it from affecting the important one.
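As referenced above, a minimal sketch of caller-based limiting on the callee side; "order-service" is a hypothetical caller id, and a rule with rule1.setLimitApp("order-service") would then apply only to that caller:

    // Callee side: record who the caller is when entering the resource.
    // "order-service" is a hypothetical caller id passed by the upstream.
    ContextUtil.enter("test", "order-service");
    Entry entry = null;
    try {
        entry = SphU.entry("test");
        // do something
    } catch (BlockException e) {
        // this particular caller was limited
    } finally {
        if (entry != null) {
            entry.exit();
        }
        ContextUtil.exit();
    }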

Some problems with Sentinel

Sentinel offers many algorithms, but there are some problems:

  • Compared with Guava’s two lines of code, using Sentinel requires learning a number of concepts first. Although Sentinel provides annotations to help simplify usage, it is still more complex than Guava.
  • Sentinel carries a certain operations cost. Using it usually means standing up the Sentinel dashboard server, so compared with Guava’s out-of-the-box experience there is some operations and maintenance overhead.
  • There is no lock in the Sentinel source code. In extreme cases this matters: with a QPS limit of 10, if 100 requests run the limiting logic at the same moment, all of them may pass. Guava does not have this problem.

These problems all arise from comparing Sentinel with Guava’s rate limiting: Sentinel has more features, and they cost more.

Cluster rate limiting

All the rate limiting discussed so far is single-machine. But today we all run microservice clusters, and a service usually has multiple machines. Suppose an order service has 10 machines and we want to limit the whole cluster to 500 QPS; what should we do? The simple answer is to limit each machine to 50 QPS, since 50 * 10 = 500. But real environments have load imbalance: microservice calls go through all kinds of load-balancing algorithms, such as same-machine-room first, round robin, or random, and these can leave the load unevenly distributed. As a result the cluster as a whole may start rejecting traffic at 400 QPS, well short of 500; this is exactly what we have encountered in real scenarios. Since single-machine limiting has this problem, we should design a better cluster-level scheme.

Redis

This scheme does not depend on any rate-limiting framework: the whole cluster just shares one Redis, and we only need to encapsulate a little limiting logic ourselves. Using the simplest counter design, we take the current system time in seconds as the key and set it in Redis (with a suitable expiry time for cleanup). Exploiting Redis’s atomic integer increment, we do +1 for every request and then check whether the current value exceeds our limit.
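A minimal sketch of this counter using the Jedis client; the Redis address, key prefix, and 10s expiry are assumptions, and production code would use a connection pool such as JedisPool rather than a single Jedis instance:

    import redis.clients.jedis.Jedis;

    public class RedisCounterLimiter {
        private final Jedis jedis = new Jedis("localhost", 6379); // assumed address
        private final int maxQps;

        public RedisCounterLimiter(int maxQps) {
            this.maxQps = maxQps;
        }

        public boolean check() {
            // One key per second; every instance in the cluster shares the same key
            String key = "limit:" + (System.currentTimeMillis() / 1000);
            long cur = jedis.incr(key);   // atomic +1 across the whole cluster
            if (cur == 1) {
                jedis.expire(key, 10);    // clean up stale keys after a few seconds
            }
            return cur <= maxQps;
        }
    }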

The Redis scheme is relatively simple to implement overall, but it depends heavily on the system time: if the system clocks of different machines drift apart, the limiting may become inaccurate.

Sentinel

Sentinel also provides a cluster solution, which is unique among rate-limiting frameworks. Sentinel offers two modes:

  • Standalone mode: the limiting service is deployed as an independent token server. As shown in the figure below, all application instances obtain tokens from the separately deployed token-server. This mode works for limiting across services, but even more commonly it is used for cluster limiting within a single service.

  • Embedded mode: the token server is deployed inside one of our application instances, and instances can switch between the server and client roles through an API. We can also introduce some ZooKeeper-based logic to make the leader act as the server and have another machine take over automatically if it goes down. This mode suits limiting within one service cluster and is quite flexible, but note that heavy token-server access may also affect the hosting machine itself.

Sentinel also has fallback strategies: if the token-server goes down, we can degrade to our single-machine limiting mode without affecting normal service.

In practice

We have introduced a lot of rate-limiting tools above, but many students are still confused about how to apply them. When limiting a scenario or a resource, we need to work through the following points:

  • Where to limit the flow
  • How much flow to limit
  • How to select tools

Where to limit the flow

This question is complicated, and every company and team answers it differently. When Meituan pushed its wave of SOA, I remember that every interface of every service had to be rate-limited, and each team was asked to evaluate a reasonable QPS upper bound for its interfaces. In theory this is right: every interface should have a cap to keep the whole system from being dragged down. But the cost of doing so is very high, so most companies still choose where to apply limits selectively.

First, we need to identify the core interfaces, such as placing orders and payment in an e-commerce system; if their traffic is too large, the core path of the system breaks. For an edge interface like reconciliation, which does not affect the core path, we can skip the rate limit.

Second, we do not limit only at the interface layer; we can also limit directly at the gateway layer to prevent traffic from penetrating further into the core system. The front end can take part as well: after it catches the rate-limit error code it can show a please-wait message, which is effectively part of the limiting too. In fact, the further downstream a limit triggers, the more resources are wasted, because by then the upstream has already done a lot of work for the request; if limiting triggers at that point, all the earlier work is in vain, and any rollback it requires adds further burden. So the earlier in the chain the limit triggers, the better.

How much flow to limit

Most of the time the limit is a historical, empirical value: we can watch the daily QPS charts and add a little headroom on top, and that becomes our limit. But note one scenario, the big promotion (this refers to e-commerce systems; for other systems, think of any traffic-peak scenario): the system’s traffic jumps and is no longer the everyday QPS. In that case we usually need to load-test the whole chain before the promotion, determine a reasonable upper bound under pressure, and then set the limit based on that bound.

How to select tools

Generally speaking, larger Internet companies have their own unified rate-limiting tools that can be used directly. For other cases, if there are no requirements such as cluster limiting or circuit breaking, I personally think Guava’s RateLimiter is a good choice: it is relatively easy to use and has essentially no learning cost. If there are further requirements, I personally think Sentinel is the best choice. As for Hystrix, I personally do not recommend it, because it is no longer maintained.

Conclusion

Rate limiting is only two words, but truly understanding it and doing it well is a very difficult thing. These are just my personal views; discussion is welcome!