This is the 14th day of my participation in the First Challenge 2022

First of all, I wish you a happy New Year and a New Year of wealth and freedom.

  • Microservices series: Sentinel basic flow control rules for Spring Cloud Alibaba
  • Microservices series: Sentinel advanced flow control rules for Spring Cloud Alibaba

In the previous two articles we studied Sentinel's flow control rules. Starting with this article, we move on to Sentinel's circuit breaking and degradation rules.

Without further ado, let’s begin today’s lesson.

1. Overview

Besides flow control, circuit breaking and degrading unstable resources in a call chain is one of the key measures for ensuring high availability. A service often calls other modules: another remote service, a database, a third-party API, and so on. For example, making a payment may require a remote call to an API provided by UnionPay, and querying the price of an item may require a database query. However, the stability of the dependency is not guaranteed. If the dependency is unstable and its response time grows, the response time of the method that calls it grows too, threads pile up, and eventually the caller's own thread pool may be exhausted, making the caller itself unavailable.

Modern microservice architectures are distributed and consist of a large number of services. Services call one another and form complex call chains, which amplify the problem above: if one link in a complex chain is unstable, the failure can cascade until the whole chain is unavailable. We therefore need to circuit-break unstable weak dependencies, temporarily cutting off the unstable calls so that a local problem does not cause an avalanche across the whole system. Circuit breaking is usually configured on the client (calling) side as a means of self-protection.

This article covers Sentinel 1.8.0 and above, in which the circuit breaking and degradation feature was reworked and upgraded.

2. Circuit breaker strategies

Sentinel offers the following circuit breaker strategies:

  • Slow call ratio (SLOW_REQUEST_RATIO): uses the ratio of slow calls as the threshold. You must set the maximum allowed response time (RT) for a call; a request whose response time exceeds this value is counted as a slow call. If, within one statistics window (statIntervalMs), the number of requests is greater than the configured minimum number of requests and the ratio of slow calls is greater than the threshold, requests are automatically rejected for the following fuse duration. After the fuse duration, the breaker enters the probe recovery state (half-open). If the next request's response time is less than the configured slow-call RT, the breaker closes; if it exceeds the slow-call RT, the breaker opens again.
  • Exception ratio (ERROR_RATIO): if, within one statistics window (statIntervalMs), the number of requests is greater than the configured minimum number of requests and the proportion of requests that throw exceptions is greater than the threshold, requests are automatically rejected for the following fuse duration. After the fuse duration, the breaker enters the half-open probe state: if the next request completes successfully (without an exception), the breaker closes; otherwise it opens again. The exception ratio threshold ranges over [0.0, 1.0], representing 0% to 100%.
  • Exception count (ERROR_COUNT): if the number of exceptions within one statistics window exceeds the threshold, the breaker opens automatically. After the fuse duration, the breaker enters the half-open probe state: if the next request completes successfully (without an exception), the breaker closes; otherwise it opens again.

Note: exception-based degradation only counts business exceptions. Sentinel's own flow control and degradation exception (BlockException) is not counted.
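
The closed, open, and half-open transitions described above can be sketched as a tiny state machine. This is only an illustration in plain Java, not Sentinel's implementation: the class name BreakerStateSketch, its methods, and the injected clock are all hypothetical, and the statistics that decide when to trip are left out.

```java
import java.util.function.LongSupplier;

// Illustrative sketch only; not Sentinel's real circuit breaker.
class BreakerStateSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final long fuseDurationMs;   // how long the breaker stays open
    private final LongSupplier clock;    // injected so the sketch can be tested without sleeping
    private State state = State.CLOSED;
    private long retryAtMs;

    BreakerStateSketch(long fuseDurationMs, LongSupplier clock) {
        this.fuseDurationMs = fuseDurationMs;
        this.clock = clock;
    }

    /** Returns true if the request may pass, false if it is rejected. */
    boolean tryPass() {
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN && clock.getAsLong() >= retryAtMs) {
            state = State.HALF_OPEN;     // fuse duration elapsed: let one probe through
            return true;
        }
        return false;                    // still open, or a probe is already in flight
    }

    /** Called when the statistics exceed the threshold, or when a probe fails. */
    void trip() {
        state = State.OPEN;
        retryAtMs = clock.getAsLong() + fuseDurationMs;
    }

    /** Called when the probe request finishes; healthy means fast enough and without error. */
    void onProbeComplete(boolean healthy) {
        if (state != State.HALF_OPEN) {
            return;
        }
        if (healthy) {
            state = State.CLOSED;
        } else {
            trip();
        }
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        long[] now = {0};
        BreakerStateSketch breaker = new BreakerStateSketch(10_000, () -> now[0]);
        breaker.trip();
        System.out.println(breaker.tryPass()); // false: inside the fuse duration
        now[0] = 10_000;
        System.out.println(breaker.tryPass()); // true: the single probe request
        breaker.onProbeComplete(false);
        System.out.println(breaker.state());   // OPEN: the probe failed
        now[0] = 20_000;
        breaker.tryPass();
        breaker.onProbeComplete(true);
        System.out.println(breaker.state());   // CLOSED: the probe succeeded
    }
}
```

A failed probe starts a fresh fuse duration, which is why an endpoint that is always slow or always failing keeps the breaker open across repeated probes.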

3. Console definition

In the Sentinel console, open the circuit breaking page.

Then add a new circuit breaker rule.

4. Code definition

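import java.util.ArrayList;
import java.util.List;

import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.degrade.DegradeRule;
import com.alibaba.csp.sentinel.slots.block.degrade.DegradeRuleManager;

// KEY is the name of the resource to protect
private void initDegradeRule() {
    List<DegradeRule> rules = new ArrayList<>();
    DegradeRule rule = new DegradeRule();
    rule.setResource(KEY);
    // Slow-call RT threshold: 10 ms
    rule.setCount(10);
    // RT-based (slow call ratio) strategy
    rule.setGrade(RuleConstant.DEGRADE_GRADE_RT);
    // Fuse duration: 10 s
    rule.setTimeWindow(10);
    rules.add(rule);
    DegradeRuleManager.loadRules(rules);
}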

5. Attribute description

A circuit breaker rule (DegradeRule) has the following important properties:

| Field | Description | Default |
| --- | --- | --- |
| resource | Resource name, that is, the object the rule applies to | |
| grade | Circuit breaker strategy: slow call ratio, exception ratio, or exception count | Slow call ratio |
| count | In slow call ratio mode, the slow-call RT threshold (a call that exceeds it counts as slow); in exception ratio or exception count mode, the corresponding threshold | |
| timeWindow | Fuse duration, in seconds | |
| minRequestAmount | Minimum number of requests needed to trigger the breaker; below this value the breaker does not trip even if the exception ratio exceeds the threshold (introduced in 1.7.0) | 5 |
| statIntervalMs | Statistics window in ms; for example, 60*1000 means one minute (introduced in 1.8.0) | 1000 ms |
| slowRatioThreshold | Slow call ratio threshold, used only in slow call ratio mode (introduced in 1.8.0) | |
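
As a rough sketch of how the properties above might be set from code in the 1.8.0 style, the fragment below builds a slow call ratio rule. It needs the sentinel-core dependency on the classpath. The class name SlowCallRuleSketch and the resource parameter are hypothetical, and the values are illustrative (1 s maximum RT, 0.1 ratio, at least 5 requests, a 2 s window, a 10 s fuse duration).

```java
import java.util.Collections;

import com.alibaba.csp.sentinel.slots.block.degrade.DegradeRule;
import com.alibaba.csp.sentinel.slots.block.degrade.DegradeRuleManager;
import com.alibaba.csp.sentinel.slots.block.degrade.circuitbreaker.CircuitBreakerStrategy;

// Hypothetical helper class; only the Sentinel types and setters are real API.
class SlowCallRuleSketch {
    static void loadSlowCallRule(String resource) {
        DegradeRule rule = new DegradeRule(resource);
        // grade: which strategy the rule uses
        rule.setGrade(CircuitBreakerStrategy.SLOW_REQUEST_RATIO.getType());
        // count: in slow call ratio mode this is the maximum RT in ms
        rule.setCount(1000);
        // slowRatioThreshold: trip when more than 10% of calls are slow
        rule.setSlowRatioThreshold(0.1);
        // minRequestAmount: ignore windows with too few requests
        rule.setMinRequestAmount(5);
        // statIntervalMs: statistics window of 2 s
        rule.setStatIntervalMs(2000);
        // timeWindow: fuse duration of 10 s
        rule.setTimeWindow(10);
        DegradeRuleManager.loadRules(Collections.singletonList(rule));
    }
}
```

For the other two strategies the same setters apply, with `CircuitBreakerStrategy.ERROR_RATIO` or `CircuitBreakerStrategy.ERROR_COUNT` as the grade, `count` holding the ratio or count threshold, and no `slowRatioThreshold`.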

Note that Sentinel's circuit breaker had no half-open state before 1.8.0; the half-open probe described above applies to 1.8.0 and later.

6. Hands-on test

All of these concepts are documented on the Sentinel website, but working through real configurations is the best way to understand Sentinel's circuit breaking and degradation rules.

1. Slow call ratio

1.1 Slow call ratio test endpoint: /testD

@GetMapping("/testD")
public String testD() {
    try {
        // Sleep for two seconds to simulate a slow call
        TimeUnit.SECONDS.sleep(2);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    log.info("testD: slow call ratio test");
    return "testD....";
}

1.2 Console definition

The effect of this configuration is:

If, within the 2-second statistics window, the number of requests to /testD is greater than 5 (the minimum number of requests) and the proportion of requests whose response time exceeds 1 second (the maximum RT) is greater than the ratio threshold of 0.1, the circuit breaker opens and rejects every request for the next 10 seconds (the fuse duration). After 10 seconds the breaker enters the half-open state: if the next request's response time is less than the configured slow-call RT, the breaker closes and normal traffic resumes; otherwise it opens again.

Put another way:

If ten requests arrive within 2 seconds (more than the minimum of 5) and more than 10% of them, that is, at least two, take longer than the maximum RT of 1 second, subsequent requests are circuit-broken.

Note that both conditions must be met.
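
The two conditions can be checked by hand with a small, self-contained helper. This is a sketch, not Sentinel code: the class SlowRatioCheck and its method are hypothetical, the ratio comparison is strict because the rule says the ratio must exceed the threshold, and the minimum-request check is written as "at least" as an assumption (the difference from "greater than" only matters when the request count equals the minimum exactly).

```java
// Illustrative helper; the names are hypothetical and not part of Sentinel.
class SlowRatioCheck {
    /** Decides whether one statistics window of response times would trip the breaker. */
    static boolean shouldTrip(long[] responseTimesMs, long maxRtMs,
                              int minRequestAmount, double slowRatioThreshold) {
        int total = responseTimesMs.length;
        // Condition 1 (assumption: "at least" the minimum number of requests)
        if (total < minRequestAmount) {
            return false;
        }
        long slow = 0;
        for (long rt : responseTimesMs) {
            if (rt > maxRtMs) {
                slow++;                  // slower than the maximum RT: a slow call
            }
        }
        // Condition 2: the slow call ratio must strictly exceed the threshold
        return (double) slow / total > slowRatioThreshold;
    }

    public static void main(String[] args) {
        long[] oneSlowOfTen = {50, 50, 50, 50, 50, 50, 50, 50, 50, 2000};
        long[] twoSlowOfTen = {50, 50, 50, 50, 50, 50, 50, 50, 2000, 2000};
        long[] threeAllSlow = {2000, 2000, 2000};

        System.out.println(shouldTrip(oneSlowOfTen, 1000, 5, 0.1)); // false: 0.1 does not exceed 0.1
        System.out.println(shouldTrip(twoSlowOfTen, 1000, 5, 0.1)); // true: 0.2 exceeds 0.1
        System.out.println(shouldTrip(threeAllSlow, 1000, 5, 0.1)); // false: too few requests
    }
}
```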

1.3 JMeter configuration

Set up a thread group that sends 10 requests per second.

The HTTP request targets /testD.

Start the test and watch the result tree:

All 10 requests within the first second succeed. When the browser then accesses /testD again, the circuit breaker has been triggered.

After the breaker has been open for 10 seconds, access /testD again (the breaker is now in the probe recovery state).

Note: the breaker is now half-open, but because /testD always sleeps for 2 seconds, the probe request is itself a slow call, so the breaker opens again and the following requests are still rejected. To get out of this state while testing, restart the project or change the circuit breaker rule.

2. Exception ratio

Once we understand the slow call ratio rule, the exception ratio and exception count rules are easy to learn, since most of the parameters are the same.

2.1 Exception ratio test endpoint: /testE

@GetMapping("/testE")
public String testE() {
    // Division by zero: always throws ArithmeticException
    int a = 10 / 0;
    log.info("testE: exception ratio test");
    return "testE....";
}

This endpoint always throws an exception when called.

2.2 Console definition

The configuration is the same as for the slow call ratio rule, minus the maximum RT field.

The effect of this configuration is:

If, within the 1-second statistics window, the number of requests to /testE is greater than 5 (the minimum number of requests) and the proportion of requests that throw exceptions is greater than 0.1, the circuit breaker opens and rejects every request for the next 10 seconds (the fuse duration). After 10 seconds the breaker enters the half-open state: if the next request completes successfully (without an exception), the breaker closes; otherwise it opens again.

2.3 JMeter configuration

The setup is the same as above, except that the HTTP request targets /testE.

Start the test and watch the result tree:

The circuit breaker is triggered starting from the sixth request.

Accessing the endpoint from the browser afterward behaves exactly as in the slow call ratio test, so I won't repeat it here.
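
The "sixth request" result can be reproduced with a small sequential replay in plain Java. This is a simplified sketch and not Sentinel code: the class ErrorRatioReplay is hypothetical, every request is assumed to fail like /testE, requests are processed one at a time inside a single statistics window, and the minimum-request check is assumed to be "at least", which is the reading that matches the observed result.

```java
// Illustrative replay; the names are hypothetical and not part of Sentinel.
class ErrorRatioReplay {
    /** Returns the 1-based index of the first rejected request, or -1 if none is rejected. */
    static int firstBlockedRequest(int requests, int minRequestAmount, double errorRatioThreshold) {
        int total = 0;
        int errors = 0;
        boolean open = false;
        for (int i = 1; i <= requests; i++) {
            if (open) {
                return i;                // the breaker is open: this request is rejected
            }
            // The request is let through and, like /testE, always throws
            total++;
            errors++;
            // Statistics are checked when the request completes
            // (assumption: "at least" the minimum number of requests)
            if (total >= minRequestAmount && (double) errors / total > errorRatioThreshold) {
                open = true;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstBlockedRequest(10, 5, 0.1)); // 6
        System.out.println(firstBlockedRequest(4, 5, 0.1));  // -1: too few requests to trip
    }
}
```

With real concurrent traffic the exact request at which rejection starts can shift, because several requests may be admitted before the earlier ones have completed and been counted.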

3. Exception count

The exception count rule is even simpler.

For the endpoint we'll use /testE again, as above.

  • Console definition

I won't spell out the effect of this configuration; I trust you can work it out yourself.

The JMeter configuration is the same as above.

Start the test and watch the result tree:

Again, the circuit breaker is triggered starting from the sixth request.

Well, that’s the end of this article.

PS: Since you've read this far, give it a thumbs up!