1. Background

Today we look at Hystrix, the circuit breaker used with Spring Cloud.

We continue to use the Eureka Server from the previous article as the service registry.

The Spring Boot and Spring Cloud versions used are as follows:

  • Spring Boot version: 2.3.5.RELEASE
  • Spring Cloud version: Hoxton.SR9

2. What is Hystrix

In a distributed environment, some of the many service dependencies are bound to fail. Hystrix is a library that helps you control the interaction between these distributed services by adding latency tolerance and fault tolerance logic. It does this by isolating the points of access between services, stopping cascading failures, and providing fallback options, all of which improve the overall resilience of the system.

3. What Hystrix is for

Hystrix is designed to:

  1. Protect against and control latency and failures from dependencies accessed through third-party client libraries, usually over a network.
  2. Prevent cascading failures in complex distributed systems.
  3. Fail fast and recover fast.
  4. Fall back and degrade as gracefully as possible.
  5. Enable near real-time monitoring, alerting, and operational control.

4. What does Hystrix do

Hystrix addresses the avalanche (cascading failure) problem by providing resource isolation, degradation, circuit breaking, caching, and more.

  1. Resource isolation: thread pool isolation and semaphore isolation limit the resources that calls to a distributed service can use, so that problems in one service invocation do not affect other invocations.
  2. Degradation mechanism: a call is degraded when it times out or when resources (threads or semaphore permits) are insufficient. After degradation, the fallback interface can return substitute data.
  3. Circuit breaking: when the failure rate reaches the threshold (for example, because of network failures or timeouts), calls are degraded automatically. The circuit breaker makes calls fail fast and recovers quickly once the dependency is healthy again.
  4. Caching: results are cached, so subsequent requests can be served directly from the cache.
  5. Request merging (collapsing): requests made within a short period (typically requests to the same interface) can be combined and sent to the service provider only once.
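The caching idea in item 4 can be illustrated with a minimal sketch in plain Java. This is not Hystrix's own implementation (Hystrix scopes its request cache to a request context); the class name `RequestCacheSketch` and the `backend` function are hypothetical names introduced only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Illustrative request cache; a simplified stand-in, not Hystrix code. */
final class RequestCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger backendCalls = new AtomicInteger();
    private final Function<String, String> backend; // hypothetical remote call

    RequestCacheSketch(Function<String, String> backend) {
        this.backend = backend;
    }

    /** Returns the cached result for the key and calls the backend only on a miss. */
    String get(String key) {
        return cache.computeIfAbsent(key, k -> {
            backendCalls.incrementAndGet();
            return backend.apply(k);
        });
    }

    /** Number of times the backend was actually invoked. */
    int backendCalls() {
        return backendCalls.get();
    }
}
```

Calling `get("1")` twice and `get("2")` once on such a cache would reach the backend only twice, because the repeated key is served from the map.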

Service circuit breaking and degradation

  • Service circuit breaking: a protective measure that prevents the whole system from failing when a service is overloaded for some reason. Simply put, overload or failure is the triggering condition, and the circuit breaker is the defense mechanism configured for the service.

  • Service degradation: returning simple substitute data instead of the real result. It is one of the ways to handle a tripped circuit breaker.

Resource isolation:

  1. Thread isolation

Hystrix places a thread pool between user requests and the service. It allocates a small thread pool for each dependency call. If the pool is full, the call is rejected immediately, with no queuing by default, which shortens the time it takes to detect a failure. The number of threads is configurable.

Principle: a user request no longer accesses the service directly. Instead, an idle thread from the pool makes the call. If the pool is full, the request is degraded rather than blocked, so the user at least sees some result (for example, a friendly message) instead of waiting endlessly or watching the system crash.

  2. Semaphore isolation

In this mode, the request is received and the downstream dependency is executed on the same thread, so there is no thread context-switching overhead. That makes semaphore isolation attractive for fast, high-volume calls, but it is not a good choice when the dependency may block, because the calling thread stays occupied until the call returns.
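As a rough illustration of semaphore isolation, the sketch below uses `java.util.concurrent.Semaphore` as the counter. It is a simplified model under stated assumptions, not Hystrix's implementation: the class name `SemaphoreIsolationSketch` and the `fallback` parameter are hypothetical, and the parameter name `maxConcurrentRequests` simply mirrors the limit discussed in this article.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Illustrative semaphore isolation: the call runs on the caller's own thread. */
final class SemaphoreIsolationSketch {
    private final Semaphore permits;

    SemaphoreIsolationSketch(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    /** Runs the call if a permit is free; otherwise returns the fallback at once. */
    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // limit reached: reject without queuing
        }
        try {
            return call.get(); // same thread, so no timeout can interrupt it
        } catch (RuntimeException e) {
            return fallback.get(); // the call failed: degrade
        } finally {
            permits.release();
        }
    }
}
```

Because the call runs on the caller's thread, nothing in this sketch can cut a blocked call short, which is the weakness noted above.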

Comparison

| Item | Thread pool isolation | Semaphore isolation |
| --- | --- | --- |
| Circuit breaking | Supported. When the thread pool reaches MaxSize, further requests trigger the fallback interface. | Supported. The fallback is triggered when the semaphore reaches maxConcurrentRequest. |
| Timeout | Supported. The call can return directly on timeout. | Not supported. A blocked call returns only when the underlying call protocol returns. |
| Isolation principle | Each service uses a separate thread pool. | A counter implemented with a semaphore. |
| Asynchronous invocation | Synchronous or asynchronous, depending on the method called. | Synchronous only. |
| Resource consumption | Large. Context switching across many threads can easily cause high machine load. | Small. It is just a counter. |
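The thread pool side of the comparison (immediate rejection when the pool is full, and timeout support) can be sketched with the JDK's `ThreadPoolExecutor`. This is an illustrative model under stated assumptions rather than Hystrix's actual code: `ThreadPoolIsolationSketch`, `poolSize`, `timeoutMillis` and `fallback` are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Illustrative thread pool isolation with rejection and timeout. */
final class ThreadPoolIsolationSketch {
    private final ThreadPoolExecutor pool;
    private final long timeoutMillis;

    ThreadPoolIsolationSketch(int poolSize, long timeoutMillis) {
        // SynchronousQueue holds no tasks, so a busy pool rejects instead of queuing.
        this.pool = new ThreadPoolExecutor(poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>());
        this.timeoutMillis = timeoutMillis;
    }

    /** Runs the call on a pool thread; degrades on rejection, timeout or failure. */
    <T> T execute(Callable<T> call, Supplier<T> fallback) {
        Future<T> future;
        try {
            future = pool.submit(call);
        } catch (RejectedExecutionException e) {
            return fallback.get(); // no worker available
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker and stop waiting
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get(); // the call itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        }
    }

    /** Stops the worker threads. */
    void shutdown() {
        pool.shutdownNow();
    }
}
```

The caller waits on a `Future`, so it can give up after `timeoutMillis` even if the worker thread is still blocked. This is what semaphore isolation cannot offer.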

5. How the circuit breaker works

Three states of the circuit breaker

  1. CLOSED: the circuit breaker is closed and requests are processed normally.
  2. OPEN: the circuit breaker is open and requests are degraded directly.
  3. HALF-OPEN: the circuit breaker is half open. After a sleep window ends, one request is let through to test the dependency.

Circuit breaker configuration parameters (important)

The circuit breaker is controlled by the following six parameters:

1. circuitBreaker.enabled

Whether the circuit breaker is enabled. The default value is true.

2. circuitBreaker.forceOpen

Forces the circuit breaker open. It stays open regardless of its actual state. The default value is false.

3. circuitBreaker.forceClosed

Forces the circuit breaker closed. It stays closed regardless of its actual state. The default value is false.

4. circuitBreaker.errorThresholdPercentage

The error rate threshold. The default value is 50%. For example, if there are 100 requests within a period (10 s) and 54 of them time out or throw an exception, the error rate is 54%, which exceeds the default of 50%, so the circuit breaker opens.

5. circuitBreaker.requestVolumeThreshold

The default value is 20. errorThresholdPercentage is evaluated only when there are at least 20 requests within the period. For example, if there are 19 requests within the period and all of them fail, the error rate is 100%, but the circuit breaker does not open because the total number of requests is below 20.

6. circuitBreaker.sleepWindowInMilliseconds

The sleep duration before a half-open trial. The default value is 5000 ms. For example, 5000 ms after the circuit breaker opens, it lets some traffic through to test whether the dependent service has recovered.
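The interplay between requestVolumeThreshold and errorThresholdPercentage can be checked with a small worked example. The sketch below is an assumption-laden simplification, not Hystrix source: `TripDecisionSketch` and `shouldTrip` are hypothetical names, and the counts are passed in directly instead of coming from a rolling window.

```java
/** Illustrative trip decision based on the two thresholds described above. */
final class TripDecisionSketch {

    /** Returns true when the circuit breaker should open for the given window. */
    static boolean shouldTrip(int totalRequests, int failedRequests,
                              int requestVolumeThreshold, int errorThresholdPercentage) {
        if (totalRequests == 0 || totalRequests < requestVolumeThreshold) {
            return false; // too few requests to judge the error rate
        }
        int errorPercentage = failedRequests * 100 / totalRequests; // integer percentage
        return errorPercentage >= errorThresholdPercentage;
    }

    public static void main(String[] args) {
        // 54 failures out of 100 requests with the default thresholds (20, 50%)
        System.out.println(shouldTrip(100, 54, 20, 50));
        // 19 failures out of 19 requests: below the volume threshold
        System.out.println(shouldTrip(19, 19, 20, 50));
    }
}
```

With the default values, the first call corresponds to the 54% example and reports that the breaker should open, while the second corresponds to the 19-request example and reports that it should not.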

Circuit breaker flow analysis

The detailed process of the circuit breaker is as follows:

The first step is to call allowRequest() to determine whether the request may be submitted to the thread pool.

If the circuit breaker is forced open (circuitBreaker.forceOpen is true), the request is not allowed and the method returns. If the circuit breaker is forced closed (circuitBreaker.forceClosed is true), the request is allowed. In the forced-closed case the actual state of the circuit breaker is not consulted: it still maintains its statistics and switch state, but they do not take effect.

The second step is to call isOpen() to determine whether the circuit breaker is open.

If the circuit breaker is open, go to the third step; otherwise continue. If the total number of requests in the period is less than circuitBreaker.requestVolumeThreshold, allow the request; otherwise continue. If the error rate in the period is less than circuitBreaker.errorThresholdPercentage, allow the request; otherwise, open the circuit breaker and go to the third step.

The third step is to call allowSingleTest() to determine whether a single request may pass in order to check whether the dependent service has recovered.

If the circuit breaker is open and the time since it opened, or since the last test request was let through, exceeds circuitBreaker.sleepWindowInMilliseconds, the circuit breaker enters the half-open state and lets one test request through; otherwise, the request is not allowed.

In addition, to provide a basis for these decisions, each circuit breaker maintains 10 buckets by default, one per second, and the oldest bucket is discarded when a new bucket is created. Each bucket keeps counters for successful, failed, timed-out, and rejected requests, which Hystrix collects and aggregates.
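The three steps above can be condensed into a small state machine. The following is a minimal sketch under several assumptions, not the Hystrix implementation: the class `CircuitBreakerSketch` and the methods `markSuccess()` and `markFailure()` are hypothetical, the forced-open and forced-closed switches are left out, and plain counters replace the rolling buckets, so old results never expire.

```java
import java.util.function.LongSupplier;

/** Illustrative circuit breaker with CLOSED, OPEN and HALF_OPEN states. */
final class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;
    private final int errorThresholdPercentage;
    private final long sleepWindowMillis;
    private final LongSupplier clock; // injectable time source, in milliseconds

    private State state = State.CLOSED;
    private int total;
    private int failures;
    private long openedOrLastTestedAt;

    CircuitBreakerSketch(int requestVolumeThreshold, int errorThresholdPercentage,
                         long sleepWindowMillis, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    /** Decides whether a request may pass. */
    synchronized boolean allowRequest() {
        if (state == State.CLOSED) {
            return true;
        }
        long now = clock.getAsLong();
        if (now - openedOrLastTestedAt > sleepWindowMillis) {
            state = State.HALF_OPEN; // let a single test request through
            openedOrLastTestedAt = now;
            return true;
        }
        return false;
    }

    /** Records a successful call; a successful test request closes the breaker. */
    synchronized void markSuccess() {
        if (state == State.HALF_OPEN) {
            state = State.CLOSED;
            total = 0;
            failures = 0;
        } else {
            total++;
        }
    }

    /** Records a failed call and opens the breaker when both thresholds are met. */
    synchronized void markFailure() {
        if (state == State.HALF_OPEN) {
            state = State.OPEN; // the test request failed: reopen
            openedOrLastTestedAt = clock.getAsLong();
            return;
        }
        total++;
        failures++;
        if (total >= requestVolumeThreshold
                && failures * 100 / total >= errorThresholdPercentage) {
            state = State.OPEN;
            openedOrLastTestedAt = clock.getAsLong();
        }
    }

    synchronized State state() {
        return state;
    }
}
```

Injecting the clock keeps the sketch testable: a caller can advance a fake time past the sleep window and observe the transition from OPEN to HALF_OPEN without actually waiting.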

6. Building the project

6.1 Consumer

Add the dependencies

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-openfeign</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-ribbon</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Add the annotation

Add @EnableCircuitBreaker to the application's startup class:

@EnableCircuitBreaker

Add the Feign client and its fallback class

// UserService.java: Feign client for the producer, with UserServiceImpl as fallback
@FeignClient(name = "ms-feign-producer", path = "/api/user",
        configuration = Config.class, fallback = UserServiceImpl.class)
public interface UserService {

    @GetMapping("/{id}")
    public String selectUser(@PathVariable("id") String id);
}

// UserServiceImpl.java: returned when the call fails or the circuit breaker is open
@Component
public class UserServiceImpl implements UserService {

    @Override
    public String selectUser(String id) {
        return "I'm a circuit breaker ";
    }
}

The configuration file

spring:
  application:
    name: ms-feign-consumer
eureka:
  client:
    service-url:
      defaultZone: http://localhost:8000/eureka
    register-with-eureka: true
  instance:
    prefer-ip-address: true
    #appname: ${spring.application.name}
    instance-id: ${spring.cloud.client.ip-address}:${server.port}
    hostname: ${spring.cloud.client.ip-address}
server:
  port: 8081
feign:
  hystrix:
    enabled: true # Feign's Hystrix support is disabled by default in Hoxton
hystrix:
  command:
    default:
      circuitBreaker:
        requestVolumeThreshold: 5 # at least 5 requests within the time window
        sleepWindowInMilliseconds: 5000
        errorThresholdPercentage: 50
      metrics:
        rollingStats:
          timeInMilliseconds: 5000 # time window

Usage

@RequestMapping("/api/comsumer/user")
@RestController
public class UserController {

    @Autowired
    UserService userService;

    // Delegates to the Feign client; the fallback answers when the call is degraded
    @GetMapping("/{id}")
    public String selectUser(@PathVariable("id") String id) {
        return userService.selectUser(id);
    }
}

6.2 Producer

Add the dependencies

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-openfeign</artifactId>
</dependency>

Modify the configuration file

spring:
  application:
    name: ms-feign-producer
eureka:
  client:
    service-url:
      defaultZone: http://localhost:8000/eureka
    register-with-eureka: true
  instance:
    prefer-ip-address: true
    #appname: ${spring.application.name}
    instance-id: ${spring.cloud.client.ip-address}:${server.port}
    hostname: ${spring.cloud.client.ip-address}
server:
  port: 8082

Provide the service

@RequestMapping("/api/user")
@RestController
public class UserController {

    @Value("${server.port}")
    Integer port;

    @GetMapping("/{id}")
    public String selectUser(@PathVariable("id") String id) {
        if ("1".equals(id)) {
            // Deliberately throws ArithmeticException so that id=1 always fails
            int i = 1 / 0;
        }
        User user = new User();
        user.setId(id);
        user.setName("wangyunqi");
        user.setPort(port);
        return user.toString();
    }
}

Test

  • 1: Request the consumer with id=1 more than 5 times in a row within 5 seconds. After that, each request returns the fallback directly: "I'm a circuit breaker"

Check the circuit breaker status through the health endpoint: http://192.168.1.119:8081/actuator/health

"hystrix": {
            "status": "CIRCUIT_OPEN",
            "details": {
                "openCircuitBreakers": [
                    "ms-feign-producer::UserService#selectUser(String)"
                ]
            }
        },
        "ping": {
            "status": "UP"
        },
        "refreshScope": {
            "status": "UP"
        }
  • 2: After waiting for the sleepWindowInMilliseconds period, request with id=2. The data is returned normally:
User{id='2', name='111', age=0, port=8082}

Check the circuit breaker status through the health endpoint again: http://192.168.1.119:8081/actuator/health

        "hystrix": {
            "status": "UP"
        },
        "ping": {
            "status": "UP"
        },
        "refreshScope": {
            "status": "UP"
        }
    }

7. Circuit breaker workflow

Process description:

  • 1: Create a new HystrixCommand for each call, encapsulating the dependency call in the run() method.
  • 2: Call execute() or queue() for synchronous or asynchronous invocation.
  • 3: Check whether the circuit breaker is open. If it is, go to step 8 for degradation; if it is not, continue to the next step.
  • 4: Check whether the thread pool/queue/semaphore is full. If so, go to step 8; otherwise, continue with the following steps.
  • 5: Call the run() method of HystrixCommand to run the dependency logic.
  • 5a: The dependency call times out. Go to step 8.
  • 6: Check whether the logic was invoked successfully.
  • 6a: Return the result of the successful call.
  • 6b: The call fails. Go to step 8.
  • 7: Calculate the circuit breaker status. All outcomes (success, failure, rejection, timeout) are reported to the circuit breaker, which uses the statistics to decide its state.
  • 8: Run the getFallback() degradation logic.
  • The getFallback() call is triggered in four cases:
  • (1): The run() method throws an exception other than HystrixBadRequestException
  • (2): The run() method call times out
  • (3): The circuit breaker is open and intercepts the call
  • (4): The thread pool/queue/semaphore is full
  • 8a: A command that does not implement getFallback() throws an exception directly.
  • 8b: If the fallback call succeeds, its result is returned.
  • 8c: If the degradation logic fails, an exception is thrown.
  • 9: Return the successful execution result.
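The decision path in the steps above can be traced with a compact sketch in plain Java. It is a simplified model, not the HystrixCommand API: `CommandFlowSketch`, `execute`, and the nested `BadRequestException` (a stand-in for HystrixBadRequestException) are hypothetical names, and both the timeout handling of step 5a and the metrics reporting of step 7 are left out for brevity.

```java
import java.util.concurrent.Callable;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Illustrative command flow: checks, run, then fallback on failure. */
final class CommandFlowSketch {

    /** Stand-in for HystrixBadRequestException: propagates without a fallback. */
    static final class BadRequestException extends RuntimeException {
        BadRequestException(String message) {
            super(message);
        }
    }

    static <T> T execute(BooleanSupplier circuitOpen, BooleanSupplier resourcesFull,
                         Callable<T> run, Supplier<T> fallback) {
        if (circuitOpen.getAsBoolean()) {
            return fallback.get(); // step 3: breaker open, go to step 8
        }
        if (resourcesFull.getAsBoolean()) {
            return fallback.get(); // step 4: pool or semaphore full, go to step 8
        }
        try {
            return run.call(); // steps 5 and 6a: run and return the result
        } catch (BadRequestException e) {
            throw e; // caller error: not a fallback case
        } catch (Exception e) {
            return fallback.get(); // step 6b: failure, go to step 8
        }
    }
}
```

The explicit rethrow mirrors the rule that a bad-request exception signals a caller error, so it reaches the caller instead of being hidden behind the fallback.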