Moment For Technology

Spring Cloud Feign + Hystrix + Ribbon: calls between services + degradation + circuit breaking + load balancing

Posted on Dec. 2, 2022, 8:16 a.m. by Martyn Williams
Category: Back end  Tag: Spring Cloud

Feign: declarative calls between microservices

Feign is a declarative Web Service client that makes it easier to write Web Service clients. You can use Feign to create and annotate an interface. In short, it helps to encapsulate HTTP requests so that you can request the interface of another service just as if you were calling a local method.
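
To make "calling another service as if it were a local method" concrete, here is a minimal sketch of the dynamic-proxy idea that declarative clients rely on. It is not Feign's implementation: the `@Get` annotation, `StoreApi`, `DeclarativeClientSketch`, and the `transport` function are hypothetical names used only for illustration, and the transport is a stub instead of a real HTTP call.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Proxy;
import java.util.function.Function;

// Hypothetical annotation standing in for a request mapping.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Get {
    String value();
}

// The declarative interface: no implementation is written by hand.
interface StoreApi {
    @Get("/stores/count")
    String countStores();
}

final class DeclarativeClientSketch {
    // Builds a proxy that turns each annotated method call into a request URL
    // and hands it to the supplied transport (a stub here, HTTP in a real client).
    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> api, String baseUrl, Function<String, String> transport) {
        return (T) Proxy.newProxyInstance(
                api.getClassLoader(),
                new Class<?>[] {api},
                (proxy, method, args) -> {
                    Get get = method.getAnnotation(Get.class);
                    if (get == null) {
                        throw new UnsupportedOperationException(method.getName());
                    }
                    return transport.apply(baseUrl + get.value());
                });
    }

    public static void main(String[] args) {
        StoreApi client = create(StoreApi.class, "http://store", url -> "GET " + url);
        System.out.println(client.countStores());
    }
}
```

The caller only sees `client.countStores()`; building the request is hidden behind the interface, which is the convenience the paragraph above describes.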

Dependency: compile group: 'org.springframework.cloud', name: 'spring-cloud-starter-openfeign'

Enable: @EnableFeignClients
Example:
@FeignClient(name = "${feign.name}", url = "${feign.url}")
public interface StoreClient {
    //..
}

contextId:

To split the interfaces of one service across several client classes, give each @FeignClient a distinct contextId. Clients that share the same name (store in the following example) would otherwise be registered under the same bean name, and a bean name conflict would be reported.

@FeignClient(name = "store", contextId="store/goods", url = "${feign.url}")
public interface StoreClient1 {
    //..
}

@FeignClient(name = "store", contextId="store/cash", url = "${feign.url}")
public interface StoreClient2 {
    //..
}

Two design approaches for FeignClient when invoking between services:
  • The service provider wraps the methods it wants to expose in a FeignClient, packages it as a JAR, and lets service consumers import the JAR and call the API methods. The benefits of this are:

    1. The interface in the JAR is published as a capability of the service provider, so consumers know exactly which methods are available to them;

    2. The service provider writes the client once and multiple consumers share the same JAR, which saves each consumer the effort of writing its own FeignClient.

  • The service consumer defines the FeignClient within its own service. The advantage is that the consumer can configure what the degraded method returns through the fallback attribute of the FeignClient.

Service degradation: fallback configuration

Two ways:

1. fallback = Fallback.class

// These types are nested in an enclosing class (not shown), hence the protected and static modifiers.
@FeignClient(name = "test", url = "http://localhost:${server.port}/", fallback = Fallback.class)
protected interface TestClient {

    @RequestMapping(method = RequestMethod.GET, value = "/hello")
    Hello getHello();

    @RequestMapping(method = RequestMethod.GET, value = "/hellonotfound")
    String getException();
}

@Component
static class Fallback implements TestClient {

    @Override
    public Hello getHello() {
        throw new NoFallbackAvailableException("Boom!", new RuntimeException());
    }

    @Override
    public String getException() {
        return "Fixed response";
    }
}

2. fallbackFactory = TestFallbackFactory.class, which you need if the fallback has to see the exception that triggered it

// These types are nested in an enclosing class (not shown), hence the protected and static modifiers.
@FeignClient(name = "testClientWithFactory", url = "http://localhost:${server.port}/", fallbackFactory = TestFallbackFactory.class)
protected interface TestClientWithFactory {

    @RequestMapping(method = RequestMethod.GET, value = "/hello")
    Hello getHello();

    @RequestMapping(method = RequestMethod.GET, value = "/hellonotfound")
    String getException();
}

@Component
static class TestFallbackFactory implements FallbackFactory<FallbackWithFactory> {

    @Override
    public FallbackWithFactory create(Throwable cause) {
        // cause is the exception that triggered the fallback; inspect or log it here
        return new FallbackWithFactory();
    }
}

static class FallbackWithFactory implements TestClientWithFactory {

    @Override
    public Hello getHello() {
        throw new NoFallbackAvailableException("Boom!", new RuntimeException());
    }

    @Override
    public String getException() {
        return "Fixed response";
    }
}

Hystrix: service degradation and circuit breaking

Dependency: compile group: 'org.springframework.cloud', name: 'spring-cloud-starter-netflix-hystrix'

When a service provider has a problem, the service consumer can use Hystrix for fault tolerance, which prevents errors from spreading through the whole service chain.

To use Hystrix, you need to enable it in your feign configuration: feign.hystrix.enabled = true

Degradation: when a service call fails, the fallback method returns base (fallback) data instead.

Circuit breaker: when call failures reach a certain threshold, for example 70% of the requests within a minute fail, the circuit breaker opens. While it is open, every request immediately returns the base data and the service provider is no longer called.

Half-open state: the half-open state is a recovery mechanism. After the circuit breaker has been open for a certain period of time, the system sends a trial request to the service provider. If a successful response is received, the system closes the circuit breaker.
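
The three states described above can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not Hystrix's implementation: `CircuitBreakerSketch` is a hypothetical class, it counts calls cumulatively instead of using a rolling time window, it is not thread-safe, and the current time is passed in explicitly so the behaviour is deterministic. The constructor parameters borrow the names of the Hystrix settings they imitate.

```java
import java.util.function.Supplier;

final class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;   // minimum calls before the breaker may open
    private final int errorThresholdPercentage; // failure percentage that opens the breaker
    private final long sleepWindowMillis;       // how long to stay open before a trial call

    private State state = State.CLOSED;
    private int total;
    private int failures;
    private long openedAt;

    CircuitBreakerSketch(int requestVolumeThreshold, int errorThresholdPercentage,
                         long sleepWindowMillis) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMillis = sleepWindowMillis;
    }

    State state() {
        return state;
    }

    <T> T call(Supplier<T> remote, Supplier<T> fallback, long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAt < sleepWindowMillis) {
                return fallback.get();       // open: the provider is not called at all
            }
            state = State.HALF_OPEN;         // sleep window elapsed: allow one trial call
        }
        try {
            T result = remote.get();
            if (state == State.HALF_OPEN) {  // trial succeeded: close and reset counters
                state = State.CLOSED;
                total = 0;
                failures = 0;
            }
            total++;
            return result;
        } catch (RuntimeException e) {
            total++;
            failures++;
            boolean trip = state == State.HALF_OPEN
                    || (total >= requestVolumeThreshold
                        && failures * 100 >= errorThresholdPercentage * total);
            if (trip) {
                state = State.OPEN;
                openedAt = nowMillis;
            }
            return fallback.get();           // degradation: return base data
        }
    }
}
```

Every failed call returns the fallback (degradation), but only a sufficient failure rate over enough calls opens the breaker, and a single successful trial call after the sleep window closes it again.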

Isolation policy: Hystrix wraps every external call in a HystrixCommand or HystrixObservableCommand object, and these external calls run in a separate thread. Together with circuit breaking and degradation, this isolates the faulty service so that the main business of the whole system is not affected.

The process flow of Hystrix is as follows: [figure: Hystrix process flow diagram]

Common configurations:

hystrix:
  command:
    # Global default configuration
    default:
      # Thread isolation related
      execution:
        timeout:
          # Whether method execution has a timeout (degradation related). Default: true
          enabled: true
        isolation:
          # Request isolation strategy: THREAD (thread pool, the default) or SEMAPHORE
          strategy: THREAD
          thread:
            # Execution timeout in milliseconds. Default: 1000
            timeoutInMilliseconds: 10000
      # Circuit breaker related
      circuitBreaker:
        # Whether to enable the circuit breaker
        enabled: true
        # Minimum number of requests in the rolling window before the breaker can open.
        # With a value of 20, if only 19 requests arrive within the window (10 seconds),
        # the breaker does not open even if all of them fail. Default: 20
        requestVolumeThreshold: 20
        # After the breaker opens, Hystrix rejects new requests for this long;
        # only after this time does it let a trial request through
        sleepWindowInMilliseconds: 5000
        # Failure percentage threshold. If the failure rate exceeds this value, the breaker opens
        errorThresholdPercentage: 50

Ribbon: client-side load balancing

Dependency: compile group: 'org.springframework.cloud', name: 'spring-cloud-starter-netflix-ribbon'

Load balancing can be divided into client load balancing and server load balancing.

Nginx is a typical example of server-side load balancing. When Nginx receives a request, it selects one of the machines behind the requested domain name according to the configured load balancing policy and forwards the request to that machine.

With client-side load balancing, every microservice registers with a registry such as Consul, and each service keeps a local copy of the instance list of every other service. When service A calls service B, A picks one of B's instances from its local list according to the load balancing policy and sends the request to it. Client-side load balancing is the problem Ribbon solves.
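
As an illustration of choosing an instance on the client side, here is a minimal round-robin sketch. Round robin is only one possible policy, and `RoundRobinSketch` and the instance addresses are hypothetical; the class is not part of Ribbon and the instance list is assumed to have been fetched from the registry already.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinSketch {
    private final List<String> instances;                   // local copy of service B's instances
    private final AtomicInteger next = new AtomicInteger(); // position of the next pick

    RoundRobinSketch(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("no instances");
        }
        this.instances = List.copyOf(instances);
    }

    // Returns the instances in turn, wrapping around at the end of the list.
    String choose() {
        int index = Math.floorMod(next.getAndIncrement(), instances.size());
        return instances.get(index);
    }
}
```

With two instances, successive calls to `choose()` alternate between them, so the caller spreads its requests without any intermediate proxy.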

Common configurations:

# Global configuration
ribbon:
  # Number of retries on the current instance, not counting the first request. Default: 0
  MaxAutoRetries: 5
  # Number of instance switches, not counting the first instance. If the registry lists
  # fewer instances than this value, requests cycle through them (A > B > A). Default: 1
  MaxAutoRetriesNextServer: 3
  # Whether to retry all operations
  OkToRetryOnAllOperations: false
  # Connection timeout, in milliseconds
  ConnectTimeout: 3000
  # Read timeout, in milliseconds
  ReadTimeout: 3000

# Per-client configuration (clientName is the name of the client)
clientName:
  ribbon:
    MaxAutoRetries: 5
    MaxAutoRetriesNextServer: 3
    OkToRetryOnAllOperations: false
    ConnectTimeout: 3000
    ReadTimeout: 3000

OkToRetryOnAllOperations: if this value is false, only GET requests are retried. If it is true, all requests, including POST and PUT, are retried. The default is false, which prevents POST and PUT retries from modifying data several times when the interface is not idempotent.

For the same reason, do not modify data in the logic of a GET request. Otherwise the retry mechanism may call the interface several times and modify the data several times, resulting in inconsistency.

What is the relationship between the timeouts of these components and how can they be set?

Timeout of a single HTTP request: the Ribbon and Feign timeouts (ConnectTimeout, ReadTimeout) both control the timeout of a single HTTP request. If Feign and Ribbon timeouts are configured at the same time, the Feign configuration overrides the Ribbon configuration.

Testing with both Feign and Ribbon timeouts configured confirms this.

The override is applied in RetryableFeignLoadBalancer: if a Feign configuration exists, it replaces the Ribbon values.

The total time limit for a request is:

$$\text{ReqTime} = \text{ConnectTimeout} + \text{ReadTimeout}$$

Total number of HTTP requests: the total number of requests is determined by Ribbon's MaxAutoRetries (the number of retries on the current instance, not counting the first request) and MaxAutoRetriesNextServer (the number of instance switches, not counting the first instance). If they are configured for an individual FeignClient, that configuration takes precedence over the global configuration. Total number of requests:

$$\text{TotalNum} = (\text{MaxAutoRetries} + 1) \times (\text{MaxAutoRetriesNextServer} + 1)$$
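
The product in this formula comes from two nested loops: an outer loop over instances and an inner loop of attempts on each instance. The following sketch counts attempts under that model. It is an illustration only, not Ribbon's retry code; `RetryLoopSketch` and the `callSucceeds` predicate are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;

final class RetryLoopSketch {
    // Returns how many HTTP attempts are made before the first success,
    // or the maximum number of attempts if every call fails.
    static int attempts(List<String> instances, int maxAutoRetries,
                        int maxAutoRetriesNextServer, Predicate<String> callSucceeds) {
        int attempts = 0;
        // The first instance plus one pass per allowed instance switch.
        for (int server = 0; server <= maxAutoRetriesNextServer; server++) {
            // With fewer instances than switches, the choice cycles: A, B, A, ...
            String instance = instances.get(server % instances.size());
            // The first attempt plus the allowed retries on the same instance.
            for (int retry = 0; retry <= maxAutoRetries; retry++) {
                attempts++;
                if (callSucceeds.test(instance)) {
                    return attempts;
                }
            }
        }
        return attempts;
    }
}
```

With MaxAutoRetries = 5 and MaxAutoRetriesNextServer = 3, as in the sample configuration, a call that always fails is attempted (5 + 1) × (3 + 1) = 24 times.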

Total time limit for Ribbon retries: the upper bound on the time Ribbon can spend on one call, retries included, is the time limit of a single request multiplied by the total number of requests:

$$\text{TotalRibbonLimitTime} = \text{ReqTime} \times \text{TotalNum}$$

Actual retry time: for requests that do not time out, ActualConnectTime and ActualReadTime are smaller than the configured timeouts (if not, the configuration is faulty and needs to be adjusted). The number of attempts allowed still follows the Ribbon settings. Therefore, the actual total time is:

$$\text{ActualToTalTime} = \text{ActualReqTime} \times \text{TotalNum} = (\text{ActualConnectTime} + \text{ActualReadTime}) \times \text{TotalNum}$$

Hystrix timeout: the Hystrix timeout setting execution.isolation.thread.timeoutInMilliseconds (hereinafter HystrixTimeout) controls when degradation happens. Once a call takes longer than this, Hystrix returns the fallback data.

So when exactly will the downgrade happen?

If the requests time out, Ribbon uses up the full timeout on every attempt, so the degradation time is:

$$\text{FallbackTime} = \min(\text{TotalRibbonLimitTime}, \text{HystrixTimeout})$$

If the request does not time out, but the request return code is not 200, the degradation time is:

$$\text{FallbackTime} = \min(\text{ActualToTalTime}, \text{HystrixTimeout})$$

In summary, the downgrade time should be:

$$\text{FallbackTime} = \min(\text{ActualToTalTime}, \text{TotalRibbonLimitTime}, \text{HystrixTimeout})$$

Therefore, in an actual project configuration, HystrixTimeout should be slightly larger than TotalRibbonLimitTime. If HystrixTimeout < TotalRibbonLimitTime, Hystrix degrades and returns the base data before the retries have finished, and any later retry is wasted even if it succeeds. Setting HystrixTimeout much larger than TotalRibbonLimitTime is also pointless, because the actual fallback time is the smallest of the three values: Min(ActualToTalTime, TotalRibbonLimitTime, HystrixTimeout).
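
As a worked example, the formulas can be applied to the sample values from the configuration listings in this article (ConnectTimeout = 3000, ReadTimeout = 3000, MaxAutoRetries = 5, MaxAutoRetriesNextServer = 3, Hystrix timeout = 10000). `TimeoutBudgetSketch` and its method names are hypothetical helpers that only restate the formulas; all times are in milliseconds.

```java
final class TimeoutBudgetSketch {
    // ReqTime = ConnectTimeout + ReadTimeout
    static long reqTime(long connectTimeout, long readTimeout) {
        return connectTimeout + readTimeout;
    }

    // TotalNum = (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1)
    static long totalNum(int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (long) (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }

    // TotalRibbonLimitTime = ReqTime * TotalNum
    static long totalRibbonLimitTime(long reqTime, long totalNum) {
        return reqTime * totalNum;
    }

    // FallbackTime when every attempt times out
    static long fallbackTime(long totalRibbonLimitTime, long hystrixTimeout) {
        return Math.min(totalRibbonLimitTime, hystrixTimeout);
    }

    public static void main(String[] args) {
        long req = reqTime(3000, 3000);                // 6000
        long num = totalNum(5, 3);                     // 24
        long limit = totalRibbonLimitTime(req, num);   // 144000
        long fallback = fallbackTime(limit, 10000);    // 10000
        System.out.println(req + " " + num + " " + limit + " " + fallback);
    }
}
```

With these sample values, TotalRibbonLimitTime is 144000 ms while HystrixTimeout is only 10000 ms, so in the worst case the fallback is returned after 10 seconds, long before the retries could finish. Following the recommendation above would mean either raising HystrixTimeout above 144000 ms or, more realistically, lowering the retry counts and timeouts.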

In addition, if the server returns a result within the Feign timeout but the status code is not 200 (400, 500, and so on), no retry is performed by default. Ribbon's retry mechanism only retries a non-200 response when its status code is in the list of retryable status codes. This list is configured with retryableStatusCodes and is empty by default.
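
The two retry conditions described in this article (the HTTP method check controlled by OkToRetryOnAllOperations, and the status code check controlled by retryableStatusCodes) can be summarised in a small decision sketch. This is a simplified model of the behaviour as described here, not Ribbon's code; `RetryDecisionSketch` and its method names are hypothetical.

```java
import java.util.Set;

final class RetryDecisionSketch {
    // After a failed call (for example a timeout): GET is always eligible for retry,
    // other methods only when OkToRetryOnAllOperations is true.
    static boolean retryOnFailure(String httpMethod, boolean okToRetryOnAllOperations) {
        return okToRetryOnAllOperations || "GET".equalsIgnoreCase(httpMethod);
    }

    // After a response with a non-200 status: retry only if the status code
    // is listed in retryableStatusCodes (empty by default, so never).
    static boolean retryOnStatus(int status, Set<Integer> retryableStatusCodes) {
        return retryableStatusCodes.contains(status);
    }
}
```

For example, `retryOnStatus(500, Set.of())` is false, which matches the default behaviour described above, while listing 503 in the set makes a 503 response eligible for retry.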
