Why do we need circuit breaking ("fusing") and rate limiting ("current limiting")? To prevent service avalanches. So what is a service avalanche?

Here are a few concepts:

Fail-fast: In Java, fail-fast behaviour appears when multiple threads modify a non-thread-safe collection during iteration: the iterator throws an exception immediately rather than continue on inconsistent data. By analogy, when we develop web applications we also terminate early on illegal parameters (throw an exception or return directly) to avoid tying up downstream resources on invalid work.
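The Java fail-fast behaviour referred to above can be demonstrated directly. A minimal sketch (the class and method names are illustrative, not from the original):

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Fail-fast in Java collections: structurally modifying an ArrayList while
// iterating makes the iterator throw ConcurrentModificationException
// immediately instead of computing on inconsistent state.
class FailFastDemo {
    static boolean triggersFailFast() {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        try {
            for (String s : list) {
                list.remove(s); // structural modification during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the collection failed fast
        }
    }
}
```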

Cache avalanche: The database is often the bottleneck of a web application. To avoid hitting it on every request, a local or distributed cache keeps hot database data in memory; if the data exists in the cache and has not expired, the database read is skipped. Cache entries are usually given a fixed expiry time. If at some moment a large number of entries expire together, the flood of requests falling through to the database is called a cache avalanche.
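A common mitigation for cache avalanches is to add random jitter to each entry's TTL so that keys do not all expire at the same instant. This is standard practice rather than something the text above prescribes, and `CacheTtl` is a hypothetical helper:

```java
import java.util.concurrent.ThreadLocalRandom;

// Mitigating cache avalanches: instead of giving every cache entry the same
// fixed expiry, spread expirations out by adding random jitter to the TTL.
class CacheTtl {
    /** Base TTL in seconds plus up to jitterSeconds of random noise. */
    static long ttlWithJitter(long baseSeconds, long jitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextLong(jitterSeconds + 1);
    }
}
```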

Distributed services: Early systems scaled by simply adding identical servers, which can leave resources under-used because workloads differ (IO-intensive vs. CPU-intensive). Splitting the system into distributed services lets each service run on hardware that suits it, at the cost of network calls over RPC or HTTP.

Service avalanche: Services call each other in chains, and a single core link may invoke a dozen services. As the number of concurrent requests grows, system response time rises sharply past a certain point.

If this happens to one service in the chain, its RT (response time) spikes while upstream services keep sending requests, creating a vicious circle: more and more upstream threads block waiting for results, until the whole chain becomes unusable. This is called a service avalanche.

Solutions:

Borrow the fail-fast idea: when RT exceeds a timeout, return an error result immediately instead of waiting. – circuit breaking

Borrow the lesson of the cache avalanche: avoid massive request volume reaching the application, and keep it within a reasonable range. – rate limiting

Single-machine and distributed rate limiting:

Single-machine rate limiting restricts the QPS, the number of concurrent threads, or a load metric for a code section within the current process, throwing an exception or returning false once the limit is exceeded.

Distributed rate limiting uses a centralized ticket (token) service: for each protected resource the service issues only a fixed number of tickets per second, and before executing the critical-section code a node must first obtain a ticket from the centralized service. Only if it succeeds may it proceed.
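The centralized ticket service can be sketched in-process. This is illustrative only: a real deployment would typically back the counter with Redis or a dedicated quota service shared by all nodes, and `TicketService` is a hypothetical name:

```java
// Centralized ticket issuing, sketched as a single in-process object
// (assumption for illustration; production systems share this state,
// e.g. via Redis). At most ticketsPerSecond tickets are issued per
// second; callers without a ticket must not enter the critical section.
class TicketService {
    private final long ticketsPerSecond;
    private long issuedThisSecond;
    private long currentSecond = System.currentTimeMillis() / 1000;

    TicketService(long ticketsPerSecond) {
        this.ticketsPerSecond = ticketsPerSecond;
    }

    synchronized boolean acquireTicket() {
        long nowSecond = System.currentTimeMillis() / 1000;
        if (nowSecond != currentSecond) {
            // A new second has started: issue a fresh batch of tickets.
            currentSecond = nowSecond;
            issuedThisSecond = 0;
        }
        if (issuedThisSecond < ticketsPerSecond) {
            issuedThisSecond++;
            return true;
        }
        return false;
    }
}
```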

Rate limiting algorithms:

The mainstream rate limiting algorithms in industry are generally the following.

1. Token bucket rate limiting

A token bucket is a fixed-capacity bucket that holds tokens. Tokens are added to the bucket at a fixed rate; when the bucket is full, new tokens are discarded. Because a full bucket can be drained at once, the token bucket tolerates a certain amount of burst traffic, which can be processed as long as tokens remain, and a request may take several tokens at a time. What the bucket holds is tokens (contrast the leaky bucket below, which holds requests).
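The token bucket described above can be sketched in a few lines. A minimal illustrative implementation, not a production limiter (`TokenBucket` is a hypothetical name):

```java
// Token bucket: tokens refill at a fixed rate up to the bucket capacity;
// a request passes only if enough tokens are available, so a full bucket
// allows a burst of up to `capacity` requests at once.
class TokenBucket {
    private final long capacity;        // maximum tokens the bucket holds
    private final double refillPerNano; // tokens added per nanosecond
    private double tokens;              // current token count
    private long lastRefill;            // timestamp of the last refill

    TokenBucket(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.tokens = capacity; // start full, so bursts are allowed
        this.lastRefill = System.nanoTime();
    }

    synchronized boolean tryAcquire(int permits) {
        long now = System.nanoTime();
        // Refill proportionally to elapsed time, capped at capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= permits) {
            tokens -= permits;
            return true;
        }
        return false;
    }
}
```

Note that `tryAcquire(int)` mirrors the "holding multiple tokens at a time" property mentioned above.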

2. Leaky bucket rate limiting

A leaky bucket is a fixed-capacity bucket that releases requests at a constant rate, while the inflow rate is arbitrary. If incoming requests fill the bucket to capacity, new requests are rejected. A leaky bucket can be viewed as a fixed-capacity queue with a fixed outbound rate: it limits the rate at which requests leave. What the bucket holds is requests.
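The queue view of the leaky bucket suggests a direct sketch (illustrative only; `LeakyBucket` is a hypothetical name, and in practice `leakOne()` would be driven by a scheduler at a constant rate):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Leaky bucket: requests queue up in a fixed-capacity bucket; a consumer
// drains them at a constant rate. When the bucket is full, new requests
// are rejected, so the outflow rate is strictly bounded.
class LeakyBucket {
    private final int capacity;
    private final Deque<Runnable> queue = new ArrayDeque<>();

    LeakyBucket(int capacity) {
        this.capacity = capacity;
    }

    /** Offer a request; returns false (rejected) when the bucket is full. */
    synchronized boolean offer(Runnable request) {
        if (queue.size() >= capacity) {
            return false;
        }
        queue.addLast(request);
        return true;
    }

    /** Called by a fixed-rate timer: leak (execute) exactly one request. */
    synchronized void leakOne() {
        Runnable r = queue.pollFirst();
        if (r != null) {
            r.run();
        }
    }

    synchronized int pending() {
        return queue.size();
    }
}
```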

3. Counter rate limiting

Sometimes we use counters to limit the total number of concurrent requests in a given scope, as in database connection pools, thread pools, and flash-sale concurrency control. Counter limiting rejects requests as soon as the total within the period exceeds the set threshold. It is simple, crude, total-quantity limiting rather than average-rate limiting.
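The counter idea maps naturally onto a semaphore, the same mechanism behind the connection pools and thread pools mentioned above. A minimal sketch (`ConcurrencyLimiter` is an illustrative name):

```java
import java.util.concurrent.Semaphore;

// Counter-style limiting: a semaphore caps the total number of concurrent
// executions. This bounds total concurrency, not average rate.
class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Run the task if a slot is free; otherwise reject immediately. */
    boolean tryRun(Runnable task) {
        if (!permits.tryAcquire()) {
            return false; // over the limit: fail fast
        }
        try {
            task.run();
            return true;
        } finally {
            permits.release();
        }
    }
}
```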

4. Fixed window algorithm (and its sliding window refinement)

The implementation keeps a count per window and resets it periodically with a timer. For example, to limit accesses per second, the limiting logic is simply comparing the count against the threshold. The disadvantage is that traffic can flood in across the boundary between two windows, defeating the window: a burst at the end of one window plus a burst at the start of the next can pass up to twice the threshold within one window's length. The sliding window algorithm addresses this by counting over a window that moves with time.
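The fixed-window logic above can be sketched as follows (a minimal illustrative implementation, not from the original post; it checks window rollover lazily on each call rather than with a separate timer thread):

```java
// Fixed-window counter: count requests per window and reset when the
// window rolls over. Note the weakness described above: bursts straddling
// a window boundary can pass up to 2x the threshold.
class FixedWindowLimiter {
    private final int threshold;     // max requests per window
    private final long windowMillis; // window length
    private long windowStart;
    private int count;

    FixedWindowLimiter(int threshold, long windowMillis) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
        this.windowStart = System.currentTimeMillis();
    }

    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            // The window has rolled over: reset the counter.
            windowStart = now;
            count = 0;
        }
        if (count < threshold) {
            count++;
            return true;
        }
        return false;
    }
}
```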

Mainstream industrial schemes

**Guava’s RateLimiter uses the token bucket algorithm for single-machine rate limiting, mainly because it supports burst traffic and is a mature, off-the-shelf implementation.** For rate limiting at the interface level, what matters most is keeping the rate smooth while still allowing a certain amount of burst traffic, which is exactly what the token bucket provides.

For distributed rate limiting there are Hystrix and Sentinel.

Hystrix rate limiting features

  1. Hystrix uses the Command pattern (HystrixCommand) to wrap the logic of a dependency call; each command executes in a separate thread or under a semaphore permit.
  2. Falls back on timeout.
  3. Provides a small thread pool (or semaphore) per dependency; when the pool is full, calls are rejected immediately with no queuing by default, which speeds up failure detection.
  4. Dependency call outcomes: success, failure (exception thrown), timeout, thread-pool rejection, short circuit. Fallback logic runs whenever a request fails (exception, rejection, timeout, or short circuit).
  5. Circuit breaker ("fuse"): trips when the failure ratio exceeds a threshold (default 50%) and stops calls to the dependency for a period of time (default 10 seconds).
  6. Provides near-real-time statistics and monitoring for dependencies.

API usage

Detailed API usage: blog.51cto.com/developeryc…

Use the Command pattern to encapsulate the dependency logic:

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class HelloWorldCommand extends HystrixCommand<String> {
    private final String name;

    public HelloWorldCommand(String name) {
        // Minimum configuration: specify the command group key (CommandGroup)
        super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"));
        this.name = name;
    }

    @Override
    protected String run() {
        // The dependency logic is encapsulated in the run() method
        return "Hello " + name + " thread:" + Thread.currentThread().getName();
    }

    // Invocation example
    public static void main(String[] args) throws Exception {
        // Each command instance can only be executed once;
        // instantiate a new one for every call.
        HelloWorldCommand helloWorldCommand = new HelloWorldCommand("Synchronous-hystrix");
        // Synchronous call: execute() is equivalent to queue().get()
        String result = helloWorldCommand.execute();
        System.out.println("result=" + result);

        helloWorldCommand = new HelloWorldCommand("Asynchronous-hystrix");
        // Asynchronous call: the caller decides when to fetch the result
        Future<String> future = helloWorldCommand.queue();
        // get() must not exceed the timeout defined by the command (default: 1 second)
        result = future.get(100, TimeUnit.MILLISECONDS);
        System.out.println("result=" + result);
        System.out.println("mainThread=" + Thread.currentThread().getName());
    }
}
```

Capabilities supported by Sentinel

  1. Rate limiting by QPS
  2. Rate limiting by number of threads
  3. Rate limiting by caller
  4. Black and white lists
  5. Rate limiting by call chain, for example counting calls into interface C separately from A and from B
  6. Rate limiting based on resource read/write contention
  7. Rate limiting on asynchronous interfaces
  8. Cluster-wide rate limiting and single-machine rate limiting

Thoughts and summary

Sentinel implements local rate limiting by reading annotations: generated proxy code maintains a process-wide context that records call information such as the call chain and QPS, and the proxy wraps the annotated method with the limiting logic.

With Sentinel's cluster-wide rate limiting, we can limit by priority, for example throttling all related services when payment traffic is high.

Why Sentinel

  1. Hystrix supports circuit breaking and degradation only at the API dimension, with fewer features
  2. Sentinel's API is much simpler to use, with very low code intrusion
  3. Rich rate limiting logic to meet various needs
  4. Rich monitoring interfaces and a dashboard
  5. High performance; its slot-chain (pipeline) design, informed by earlier experience, is lightweight and easy to extend
  6. Stable and backed by Alibaba's middleware team
  7. Rate limiting is not restricted to interfaces; any resource invocation can be protected
| Function | Sentinel | Hystrix (Tesla) |
| --- | --- | --- |
| Isolation strategy | Semaphore isolation | Thread pool isolation / semaphore isolation |
| Circuit-breaking and degradation strategy | Based on response time or failure rate | Based on failure ratio |
| Real-time metrics implementation | Sliding window | Sliding window (based on RxJava) |
| Rule configuration | Supports multiple data sources | Supports multiple data sources |
| Extensibility | Multiple extension points | Plug-in form |
| Annotation support | Supported | Supported |
| Rate limiting | Based on QPS; supports limiting by call relationship | Limited support |
| Traffic shaping | Supports slow start and constant-rate modes | Not supported |
| System load protection | Supported | Not supported |
| Console | Out of the box: rule configuration, second-level monitoring, machine discovery, etc. | Incomplete |
| Common framework adaptation | Servlet, Spring Cloud, Dubbo, gRPC, etc. | Servlet, Spring Cloud Netflix |

References

Guava official documentation – RateLimiter class

Hystrix official documentation: segmentfault.com/a/119000001…

Hystrix rate limiting getting-started guide: www.cnblogs.com/yepei/p/716…

Sentinel vs. Hystrix: github.com/alibaba/Sen…