Introduction to circuit breakers

When microservices are used in real projects, there are often a great many services, and the dependencies between them are complex. A single network request usually has to call several services to complete. If one service becomes unavailable, for example because of network delays or a malfunction, the services that depend on it are affected too, and the failure can cascade into an avalanche across the system.



To deal with this avalanche effect, distributed systems introduce a circuit breaker (fuse) mechanism. The term "circuit breaker" comes from electrical engineering: a breaker quickly cuts off a circuit when something goes wrong in it, protecting the rest of the circuit.

In Spring Cloud, either Sentinel or Hystrix can be used for service circuit breaking and degradation.

Hystrix

Introduction

Hystrix is an open-source project from Netflix that provides circuit breaking to prevent cascading failures in distributed systems. It does so by isolating the access points to services and by providing fallback handling for failures, which improves the resilience of the distributed system as a whole.

The latest version of Hystrix on GitHub is 1.5.18. Netflix has announced that Hystrix is no longer under active development and that no new versions will be released. The current stable version, 1.5.18, is sufficient for existing applications, but Hystrix is not recommended with newer versions of Spring Cloud. Hystrix officially recommends Resilience4j as an open-source alternative; Sentinel, described below, is another option.

Hystrix design principles

Hystrix’s design principles are as follows:

  • Prevent the failure of a single service from exhausting the threads of the Servlet container (such as Tomcat) that serves the whole application.
  • Fail fast: if a service fails, requests to it fail immediately instead of tying up threads while they wait.
  • Fallback: provide a preset fallback result when a request fails.
  • Use a circuit breaker to prevent a fault from spreading to other services.
  • Monitor circuit breaker status in real time with the Hystrix Dashboard.

How Hystrix works

The circuit breaker

A circuit breaker wraps a remote method call in a circuit breaker object that monitors the call for failures. Once the number of failed calls within a given period reaches a threshold, the breaker trips, and subsequent calls return immediately from the breaker without invoking the method. This stops the caller from sending requests while the service provider is unavailable, which reduces the consumption of threads in the caller's thread pool.

While open, the breaker prevents pointless calls to the protected method, but once the provider recovers, something has to reset the breaker so that calls can flow again. Rather than relying on external intervention, a well-designed circuit breaker therefore has its own switching logic that controls when it closes again. It moves between three states:

  • Closed: calls pass through to the method and failed calls are counted. When the count reaches a threshold within a period of time, the breaker opens.
  • Open: every call immediately returns a failure and the real method is not invoked. A reset timeout is started; when it expires, the breaker moves to half-open.
  • Half-open: calls are allowed through again. If they all succeed (or the success rate reaches a given percentage), the breaker closes. Otherwise the service is considered not yet recovered and the breaker opens again.

An open circuit breaker lets the caller get a result quickly when it invokes a faulty service, avoiding long synchronous waits and reducing resource consumption on the caller's side. After staying open for a while, the breaker probes the provider again to decide whether it can close and restore normal invocation.
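To make the three states concrete, here is a minimal sketch of the state machine in plain Java. It is not Hystrix's implementation: the class name, the injectable clock, and the choice to count consecutive failures (instead of failures within a sliding time window) are simplifying assumptions.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal three-state circuit breaker sketch; not Hystrix's implementation.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;    // consecutive failures that trip the breaker
    private final long resetTimeoutMillis; // time spent OPEN before probing again
    private final LongSupplier clock;      // injectable time source in milliseconds

    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long resetTimeoutMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.resetTimeoutMillis = resetTimeoutMillis;
        this.clock = clock;
    }

    synchronized State state() {
        // OPEN becomes HALF_OPEN once the reset timeout has elapsed
        if (state == State.OPEN && clock.getAsLong() - openedAt >= resetTimeoutMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // short-circuit: the real method is not called
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized void onSuccess() {
        state = State.CLOSED; // a successful probe closes a HALF_OPEN breaker
        failures = 0;
    }

    private synchronized void onFailure() {
        failures++;
        // a failed probe, or too many failures while CLOSED, (re)opens the breaker
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }
}
```

The state is only read and written under the lock; the wrapped call itself runs outside it, so a slow provider does not serialize all callers.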

Service degradation

A circuit breaker protects against a service avalanche by isolating service callers from faulty service providers. Service degradation means deliberately giving up some non-essential services when overall resources are insufficient, so that the main resources go to the core services and keep them stable. In Hystrix, when a call between services fails, an alternative Fallback method is executed instead of the main method and its result is returned, degrading the failed service. When the number of failed calls to a service exceeds the breaker's threshold within a period of time, the breaker opens and no real calls are made; requests fail fast and go straight to the Fallback logic. Degrading the service this way reduces resource consumption on the caller's side and protects the caller's threads.
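The "main method or Fallback" idea can be sketched with nothing but `java.util.concurrent`. This is an illustration, not Hystrix code: the class and method names are hypothetical, and it assumes Java 9+ for `CompletableFuture.orTimeout`. The primary call runs on a separate pool; if it throws or exceeds the timeout, the fallback's result is returned instead.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper: degrade to a fallback when the primary call fails or times out.
class DegradingCaller {
    static <T> T call(ExecutorService pool, Supplier<T> primary,
                      Supplier<T> fallback, long timeoutMillis) {
        return CompletableFuture.supplyAsync(primary, pool) // run the main method off the caller's thread
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS) // Java 9+: fail if it takes too long
                .exceptionally(ex -> fallback.get())             // failure or timeout: degrade
                .join();
    }
}
```

Note that `orTimeout` only completes the future exceptionally; it does not interrupt the task that is still running on the pool.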

Resource isolation

Cargo ships are divided into separate compartments so that a leak or fire in one compartment does not spread and sink the whole ship. In the same way, Hystrix uses the bulkhead pattern to isolate the service providers in a system, so that rising latency or failure in one provider does not bring down the entire system, and the concurrency of calls to each provider can be controlled.

Hystrix separates the caller's thread from the thread that executes the service call, so the calling thread is not blocked for long and remains free to do other work. Each service provider gets its own thread pool, which isolates and limits calls to it. As a result, high latency or resource saturation in one provider is confined to that provider's thread pool.

For example, if calls to Dependency D fail or become slow, they can block only the five threads in D's own thread pool and do not affect the thread pools of other service providers. Requests to each provider are fully isolated: even if all the threads for one provider are exhausted, other requests in the system are unaffected.
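A bulkhead with one bounded pool per provider can be sketched with `ThreadPoolExecutor`. This is a hypothetical illustration rather than how Hystrix builds its pools: each dependency name gets a fixed-size pool with a `SynchronousQueue` (no waiting queue), so when every thread for that dependency is busy, new submissions are rejected at once instead of piling up.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical bulkhead: one small, fixed-size thread pool per dependency.
class Bulkheads {
    private final Map<String, ThreadPoolExecutor> pools = new ConcurrentHashMap<>();
    private final int threadsPerDependency;

    Bulkheads(int threadsPerDependency) {
        this.threadsPerDependency = threadsPerDependency;
    }

    private ThreadPoolExecutor poolFor(String dependency) {
        return pools.computeIfAbsent(dependency, d -> new ThreadPoolExecutor(
                threadsPerDependency, threadsPerDependency, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(),             // no queue: hand off to a free thread or reject
                new ThreadPoolExecutor.AbortPolicy()  // full pool -> RejectedExecutionException
        ));
    }

    // Runs the call on the dependency's own pool; a saturated pool affects only that dependency.
    <T> Future<T> submit(String dependency, Callable<T> call) {
        return poolFor(dependency).submit(call);
    }

    void shutdown() {
        pools.values().forEach(ThreadPoolExecutor::shutdownNow);
    }
}
```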

Hystrix implementation ideas

The main ideas behind the implementation of Hystrix are:

  • All remote-call logic is encapsulated in HystrixCommand or HystrixObservableCommand objects (the command design pattern), and these calls are executed in separate threads (resource isolation).
  • A timeout is applied to every command: a request that runs longer than the threshold times out. The threshold can be customized in each command's configuration. This timeout does not apply when semaphore isolation is used.
  • A thread pool (or semaphore) is maintained for each service provider. When it is full, requests to that provider are rejected immediately (fail fast) instead of being queued, which reduces time the system spends waiting for resources.
  • A request to a service provider has four possible outcomes: success, failure, timeout, and rejection because the thread pool is full.
  • Once failed requests to a provider exceed a threshold, the circuit breaker cuts off calls to it for a period of time; the breaker can also be tripped manually.
  • When a request is rejected, times out, or is short-circuited (the breaker is open, so the request is never sent), the Fallback method is executed and the service is degraded.
  • Near real-time monitoring and configuration changes are supported.
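The command pattern and the four outcomes listed above can be sketched as a small abstract class. `RemoteCommand`, `Outcome`, and the constructor parameters are hypothetical names chosen for illustration; they mimic the shape of `HystrixCommand` (a `run` method plus a `getFallback` method) but are not its real API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical command abstraction in the spirit of HystrixCommand (not its real API).
abstract class RemoteCommand<T> {
    enum Outcome { SUCCESS, FAILURE, TIMEOUT, REJECTED }

    private final ExecutorService pool; // the provider's dedicated pool (resource isolation)
    private final long timeoutMillis;   // per-command timeout threshold
    private Outcome outcome;

    RemoteCommand(ExecutorService pool, long timeoutMillis) {
        this.pool = pool;
        this.timeoutMillis = timeoutMillis;
    }

    protected abstract T run() throws Exception; // the wrapped remote call
    protected abstract T getFallback();          // the degraded result

    final T execute() {
        Future<T> future;
        try {
            Callable<T> task = this::run;
            future = pool.submit(task);
        } catch (RejectedExecutionException e) {
            outcome = Outcome.REJECTED; // pool full: fail fast
            return getFallback();
        }
        try {
            T result = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            outcome = Outcome.SUCCESS;
            return result;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker running the slow call
            outcome = Outcome.TIMEOUT;
        } catch (ExecutionException e) {
            outcome = Outcome.FAILURE; // run() threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            outcome = Outcome.FAILURE;
        }
        return getFallback();
    }

    Outcome outcome() { return outcome; }
}
```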

Hystrix workflow

The simplified flow is as follows:

  1. Build a HystrixCommand or HystrixObservableCommand object.
  2. Execute the command.
  3. Check whether the result of the same command is already in the request cache.
  4. Check whether the circuit breaker is open.
  5. Check whether the thread pool or semaphore is exhausted.
  6. Call HystrixObservableCommand#construct or HystrixCommand#run to execute the encapsulated remote-call logic.
  7. Calculate the circuit health.
  8. If command execution fails, get the Fallback result.
  9. Return the successful response (an Observable).
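The ordering of steps 3 to 9 can be illustrated with a hypothetical `CommandPipeline` class. It is only a sketch of the sequence, not Hystrix's internals: the circuit state is a simple flag set from outside, health is just two counters, and the "request cache" lives for the lifetime of the object rather than for a single request context.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical walk through the workflow steps; names are illustrative, not Hystrix's API.
class CommandPipeline {
    private final Map<String, Object> requestCache = new ConcurrentHashMap<>();
    private final Semaphore permits;
    private final AtomicInteger successes = new AtomicInteger();
    private final AtomicInteger failures = new AtomicInteger();
    private volatile boolean circuitOpen = false;

    CommandPipeline(int maxConcurrent) { permits = new Semaphore(maxConcurrent); }

    void setCircuitOpen(boolean open) { circuitOpen = open; }

    @SuppressWarnings("unchecked")
    <T> T execute(String cacheKey, Supplier<T> run, Supplier<T> fallback) {
        // Step 3: reuse a cached result for the same command
        if (cacheKey != null && requestCache.containsKey(cacheKey)) {
            return (T) requestCache.get(cacheKey);
        }
        // Step 4: an open circuit short-circuits straight to the fallback
        if (circuitOpen) return fallback.get();
        // Step 5: no free permit means the call is rejected
        if (!permits.tryAcquire()) return fallback.get();
        try {
            T result = run.get();                  // Step 6: the wrapped remote call
            successes.incrementAndGet();           // Step 7: record health metrics
            if (cacheKey != null && result != null) requestCache.put(cacheKey, result);
            return result;                         // Step 9: successful response
        } catch (RuntimeException e) {
            failures.incrementAndGet();            // Step 7: record the failure
            return fallback.get();                 // Step 8: fallback on failure
        } finally {
            permits.release();
        }
    }

    int successes() { return successes.get(); }
    int failures() { return failures.get(); }
}
```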

Other features

Asynchronous execution and asynchronous callbacks

Besides executing commands synchronously, Hystrix can execute them asynchronously, or asynchronously with callbacks. To execute a command asynchronously, the method's return type must be declared as Future.
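The difference between the three styles can be shown with plain `java.util.concurrent`. This does not use Hystrix's annotations; `callRemote` is a hypothetical stand-in for the remote call. The synchronous version blocks the caller, the asynchronous version hands back a `Future` immediately, and the callback version runs a consumer once the result arrives.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical illustration of synchronous, Future-based, and callback-based execution.
class AsyncStyles {
    static String callRemote(String id) { return "user-" + id; } // stand-in for a remote call

    // Synchronous: the caller blocks until the result is available
    static String sync(String id) {
        return callRemote(id);
    }

    // Asynchronous: a Future is returned at once; the caller decides when to wait for it
    static Future<String> async(ExecutorService pool, String id) {
        return pool.submit(() -> callRemote(id));
    }

    // Asynchronous callback: the callback is invoked when the result arrives
    static CompletableFuture<Void> withCallback(ExecutorService pool, String id, Consumer<String> callback) {
        return CompletableFuture.supplyAsync(() -> callRemote(id), pool).thenAccept(callback);
    }
}
```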

Request merging

Hystrix can also merge requests: several requests are combined into one and processed in a single pass, which reduces network traffic and thread pool usage. The cost is latency. With a 10 ms merge window, a request that would otherwise complete in 6 ms may first have to wait for the 10 ms window to close before it is sent, so its total time grows to 16 ms. Request merging is therefore suited to commands that are both highly concurrent and high-latency.
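The mechanics of a merge window can be sketched as follows. `RequestCollapser` is a hypothetical class, not Hystrix's `HystrixCollapser`: the first request in a window schedules a flush after `windowMillis`, every request arriving before the flush joins the same batch, and one batch call answers them all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical request collapser: requests within one window are sent as a single batch call.
class RequestCollapser<K, V> {
    private final Function<List<K>, Map<K, V>> batchCall; // one network call for many keys
    private final long windowMillis;
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final Map<K, CompletableFuture<V>> pending = new HashMap<>(); // guarded by this

    RequestCollapser(Function<List<K>, Map<K, V>> batchCall, long windowMillis) {
        this.batchCall = batchCall;
        this.windowMillis = windowMillis;
    }

    synchronized CompletableFuture<V> submit(K key) {
        if (pending.isEmpty()) {
            // the first request of a new window schedules the flush
            timer.schedule(this::flush, windowMillis, TimeUnit.MILLISECONDS);
        }
        return pending.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    private void flush() {
        Map<K, CompletableFuture<V>> batch;
        synchronized (this) {
            batch = new HashMap<>(pending);
            pending.clear();
        }
        try {
            Map<K, V> results = batchCall.apply(new ArrayList<>(batch.keySet()));
            batch.forEach((k, f) -> f.complete(results.get(k)));
        } catch (RuntimeException e) {
            batch.values().forEach(f -> f.completeExceptionally(e));
        }
    }

    void shutdown() { timer.shutdown(); }
}
```

Each caller still receives its own future, but the provider sees one call per window, which is where both the saving in network traffic and the extra waiting time come from.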

Using the Hystrix Dashboard to monitor circuit breakers

In a microservice architecture, circuit breakers exist to keep service instances available and to prevent threads from blocking when an instance fails. The state of the circuit breakers is therefore an important indicator of a program's availability and robustness. The Hystrix Dashboard is a component that monitors the health of Hystrix circuit breakers, collecting their metrics and presenting them in a user-friendly graphical interface.

Sentinel 

Introduction

Spring Cloud Alibaba Sentinel is a traffic control component for distributed service architectures. Taking traffic as its entry point, it helps developers keep microservices stable across several dimensions: flow control, traffic shaping, circuit breaking and degradation, system load protection, and hotspot protection.

Basic concepts of Sentinel

Resources

Resources are the key concept in Sentinel. A resource can be anything in a Java application: a service the application provides, another application it calls, or even a block of code.

Anything defined through the Sentinel API is a resource that Sentinel can protect. In most cases, a method signature, a URL, or even a service name can be used as the resource name.

Rules

Rules are defined around the real-time state of resources and include flow control rules, circuit-breaking (degradation) rules, and system protection rules. All rules can be adjusted dynamically at runtime.
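The relationship between named resources and dynamically adjustable rules can be sketched with a hypothetical `QpsGuard`. It is not Sentinel's API (in Sentinel itself, code is typically marked as a resource with `SphU.entry(...)`); it keeps a per-resource QPS limit that can be replaced at any time and, for simplicity, counts calls in fixed one-second windows, whereas Sentinel maintains sliding-window statistics.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical resource guard: per-resource QPS rules that can be changed at runtime.
class QpsGuard {
    private final Map<String, Integer> maxQpsByResource = new ConcurrentHashMap<>();
    private final Map<String, long[]> windows = new HashMap<>(); // {windowSecond, count}, guarded by this

    // Load or replace a rule at any time, mirroring dynamically adjustable rules.
    void loadRule(String resource, int maxQps) {
        maxQpsByResource.put(resource, maxQps);
    }

    // Returns true if the call may proceed, false if it is blocked by the rule.
    synchronized boolean tryEntry(String resource, long nowMillis) {
        Integer limit = maxQpsByResource.get(resource);
        if (limit == null) return true; // no rule: the resource is not restricted
        long second = nowMillis / 1000;
        long[] w = windows.computeIfAbsent(resource, r -> new long[] {second, 0});
        if (w[0] != second) { // a new one-second window starts
            w[0] = second;
            w[1] = 0;
        }
        if (w[1] >= limit) return false;
        w[1]++;
        return true;
    }
}
```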

Sentinel features and design concepts

Flow control design concept

Flow control can be approached from the following angles:

  • Resource invocation relationships, such as the invocation chain of a resource and the relationships between resources;
  • Runtime metrics, such as QPS, thread pool usage, and system load;
  • Control effects, such as direct throttling, warm-up (cold start), and queuing.

Sentinel is designed to let you freely choose the angle of control and combine these flexibly to achieve the desired effect.

Circuit breaking and degradation design concept

Sentinel and Hystrix share the same principle: when a resource in the invocation chain becomes unstable, for example its response time grows long or its proportion of exceptions rises, calls to that resource are restricted so that requests fail fast and the failure does not cascade to other resources. In the means of restriction, however, Sentinel and Hystrix take completely different approaches.

Hystrix isolates dependencies (the equivalent of resources in Sentinel) with thread pools. The benefit is the most complete isolation between resources. The drawbacks are the extra cost of thread context switching (too many thread pools mean too many threads) and the need to size a thread pool for each resource up front.

Sentinel addresses this problem in two ways:

  • Limiting the number of concurrent threads

Unlike thread pool isolation, Sentinel reduces the impact of unstable resources on other resources by limiting the number of concurrent threads for each resource. There is no thread-switching overhead, and no thread pool has to be sized in advance. When a resource becomes unstable, for example its response time grows, the direct effect is that threads calling it gradually pile up. Once the number of threads on that resource reaches a limit, new requests to it are rejected until the piled-up threads finish their work.
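Limiting concurrency without a thread pool can be sketched with one counter per resource. `ConcurrencyLimiter` is a hypothetical class, not Sentinel's implementation: the call runs on the caller's own thread (so there is no thread hand-off), and when the number of in-flight calls on a resource would exceed the limit, the new request is rejected immediately.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical per-resource concurrency limiter; calls run on the caller's own thread.
class ConcurrencyLimiter {
    private final int maxThreads;
    private final Map<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

    ConcurrencyLimiter(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    <T> T call(String resource, Supplier<T> action, Supplier<T> onBlocked) {
        AtomicInteger count = inFlight.computeIfAbsent(resource, r -> new AtomicInteger());
        if (count.incrementAndGet() > maxThreads) {
            count.decrementAndGet();
            return onBlocked.get(); // too many threads already on this resource: reject
        }
        try {
            return action.get();    // no thread switch: runs on the calling thread
        } finally {
            count.decrementAndGet();
        }
    }
}
```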

  • Degrading resources by response time

Besides limiting concurrent threads, Sentinel can quickly degrade unstable resources based on response time. If a dependent resource responds too slowly, all access to it is rejected until the specified time window expires.
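A response-time based degrader can be sketched as below. `RtDegrader` and its trigger condition are assumptions for illustration, not Sentinel's degrade rule: after `slowCallsToTrip` consecutive calls slower than `rtThresholdMillis`, every call is rejected until `windowMillis` has passed. The clock is injectable so the behaviour can be tested deterministically.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical response-time degrader; not Sentinel's DegradeRule implementation.
class RtDegrader {
    private final long rtThresholdMillis; // a call slower than this counts as slow
    private final int slowCallsToTrip;    // consecutive slow calls that trigger degradation
    private final long windowMillis;      // how long all access is rejected
    private final LongSupplier clock;     // injectable time source in milliseconds
    private int consecutiveSlow = 0;
    private long deniedUntil = Long.MIN_VALUE;

    RtDegrader(long rtThresholdMillis, int slowCallsToTrip, long windowMillis, LongSupplier clock) {
        this.rtThresholdMillis = rtThresholdMillis;
        this.slowCallsToTrip = slowCallsToTrip;
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    <T> T call(Supplier<T> action, Supplier<T> onDegraded) {
        long start = clock.getAsLong();
        synchronized (this) {
            if (start < deniedUntil) return onDegraded.get(); // inside the window: reject
        }
        T result = action.get();
        long rt = clock.getAsLong() - start;
        synchronized (this) {
            consecutiveSlow = rt > rtThresholdMillis ? consecutiveSlow + 1 : 0;
            if (consecutiveSlow >= slowCallsToTrip) {
                deniedUntil = clock.getAsLong() + windowMillis; // start the degradation window
                consecutiveSlow = 0;
            }
        }
        return result;
    }
}
```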

System adaptive protection

Sentinel also provides adaptive protection at the system level. Preventing avalanches is an important part of system protection. When the system load is high and requests keep coming in, the system may crash and stop responding. In a cluster, the load balancer then forwards the traffic this machine should have handled to other machines. If those machines are also near their limits, the extra traffic crashes them too, and eventually the whole cluster becomes unavailable.

Sentinel's protection mechanism balances incoming traffic against the system load, so that the system handles as many requests as it can within its capacity.

Sentinel working mechanism

The main working mechanism of Sentinel is as follows:

  • Provides adapters for mainstream frameworks, or explicit APIs, to define the resources that need protection, plus facilities for real-time resource statistics and call-chain analysis.
  • Traffic is controlled based on preset rules and real-time resource statistics. Sentinel also provides an open interface for defining and changing rules.
  • Sentinel provides a real-time monitoring system to quickly understand the status of the current system.

Using Sentinel

Feign support

Sentinel provides an adapter for Feign. To use it, in addition to adding the spring-cloud-starter-alibaba-sentinel dependency, take two steps:

  • Enable Sentinel support for Feign in the configuration: feign.sentinel.enabled=true

  • Add the spring-cloud-starter-openfeign dependency so that the auto-configuration classes in the Sentinel starter take effect:

    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-openfeign</artifactId>
    </dependency>

Here is a simple use example of FeignClient:

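    import org.springframework.cloud.openfeign.FeignClient;
    import org.springframework.context.annotation.Bean;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;

    // Feign client for service-provider; EchoServiceFallback supplies the result when a call fails
    @FeignClient(name = "service-provider", fallback = EchoServiceFallback.class, configuration = FeignConfiguration.class)
    public interface EchoService {
        @RequestMapping(value = "/echo/{str}", method = RequestMethod.GET)
        String echo(@PathVariable("str") String str);
    }

    // Registers the fallback bean used by the client above
    class FeignConfiguration {
        @Bean
        public EchoServiceFallback echoServiceFallback() {
            return new EchoServiceFallback();
        }
    }

    // Degraded response returned instead of the remote result
    class EchoServiceFallback implements EchoService {
        @Override
        public String echo(@PathVariable("str") String str) {
            return "echo fallback";
        }
    }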

For a Feign interface, the resource name follows the pattern httpmethod:protocol://requesturl. All attributes of the @FeignClient annotation are supported by Sentinel.

For example, the resource name of the echo method in the EchoService interface is GET:http://service-provider/echo/{str}.
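The naming pattern can be spelled out as a tiny helper. `FeignResourceNames.of` is hypothetical and only shows how the pieces of httpmethod:protocol://requesturl fit together; Sentinel builds these names internally and does not expose this method.

```java
import java.util.Locale;

// Hypothetical helper that composes a name following the httpmethod:protocol://requesturl pattern.
class FeignResourceNames {
    static String of(String httpMethod, String protocol, String serviceName, String path) {
        return httpMethod.toUpperCase(Locale.ROOT) + ":" + protocol + "://" + serviceName + path;
    }
}
```

For the EchoService example, `FeignResourceNames.of("GET", "http", "service-provider", "/echo/{str}")` yields the resource name given above.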
