
This series code address: github.com/JoJoTec/spr…

In the previous section, we glued FeignClient together with Resilience4J to implement retries. Careful readers may ask why that implementation does not include circuit breakers and thread isolation (rate limiting):

@Bean
public FeignDecorators.Builder defaultBuilder(Environment environment, RetryRegistry retryRegistry) {
    String name = environment.getProperty("feign.client.name");
    Retry retry = null;
    try {
        retry = retryRegistry.retry(name, name);
    } catch (ConfigurationNotFoundException e) {
        retry = retryRegistry.retry(name);
    }
    // Override the exception predicate so that we retry only on feign.RetryableException:
    // every exception that needs a retry is wrapped as a RetryableException in our ErrorDecoder
    // and Resilience4jFeignClient
    retry = Retry.of(name, RetryConfig.from(retry.getRetryConfig()).retryOnException(throwable -> {
        return throwable instanceof feign.RetryableException;
    }).build());
    return FeignDecorators.builder().withRetry(retry);
}

The main reason is that circuit breakers and thread isolation at microservice-level granularity have the following drawbacks:

  • As long as one instance of a microservice keeps failing, the entire microservice is cut off
  • As long as one method of a microservice keeps failing, the entire microservice is cut off
  • If one instance of a microservice is slow while the others are fine, round-robin load balancing still routes requests to it, so the shared thread pool gradually fills up with requests to that slow instance and every request to the microservice is dragged down by this one slow instance

Review the microservice retry, circuit breaking, and thread isolation that we want to implement

Request retry

Take a look at a few scenarios:

1. When a service is being released, or an instance is taken offline because of problems, the old service instance has already been deregistered from the registry and shut down, but other microservices still have it in their local service instance cache or are in the middle of calling it. A java.io.IOException is thrown because a TCP connection cannot be established. Different frameworks use different subclasses of this exception, but the message is usually connect time out or no route to host. If we retry, and retry on a different, healthy instance rather than the same one, the call succeeds. As shown below:

2. When calling a microservice returns a non-2xx response code:

A) 4XX: When an interface change is released, both the caller and the callee may need to be redeployed. If the new interface's parameters have changed in a way that is incompatible with the old call, an error occurs, usually a parameter error, i.e. a 4XX response code is returned, for example when a new caller calls an old callee. In this case a retry can get around the problem. But to be on the safe side, for requests that have already been sent out we only retry GET methods (that is, query methods) and non-GET methods that are explicitly marked as retryable; other non-GET requests are not retried. As shown below:

B) 5XX: a 5XX response occurs when, for example, an instance cannot connect to its database or the JVM is in a stop-the-world pause. A retry can help here as well. Again, to be on the safe side, for such requests that have already been sent out we only retry GET methods (that is, query methods) and non-GET methods explicitly marked as retryable; other non-GET requests are not retried. As shown below:

3. Circuit breaker open exception: As we will see later, our circuit breaker is specific to an instance of a microservice at a method level. If a circuit breaker open exception is thrown and the request is not actually sent, we can retry directly.

4. Flow limiting exceptions: As we will see later, we have a separate thread pool isolation for each microservice instance. If the thread pool is full and rejects requests, a flow limiting exception will be thrown, for which direct retries will also be required.

These scenarios are common during online releases and when sudden traffic causes problems on some instances. Without retries, users often see error pages, which hurts the user experience, so retries are necessary in these scenarios. For retries we use Resilience4J, which sits at the core of our entire framework, to implement the retry mechanism.
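To make this concrete, here is a minimal, self-contained sketch of such a Resilience4J Retry (the class and method names are illustrative, not the series' final code): like the bean shown earlier, it only retries when the failure has already been wrapped as a feign.RetryableException.

import java.util.function.Supplier;

import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;

public class RetryOnRetryableExceptionSketch {
    // Wraps an arbitrary call with a Retry that only reacts to feign.RetryableException
    public static <T> T callWithRetry(String name, Supplier<T> doHttpCall) {
        RetryConfig retryConfig = RetryConfig.custom()
                // at most 3 attempts in total (1 original call + 2 retries)
                .maxAttempts(3)
                // only feign.RetryableException triggers a retry; our ErrorDecoder and
                // Resilience4jFeignClient wrap every retryable failure into this exception type
                .retryOnException(throwable -> throwable instanceof feign.RetryableException)
                .build();
        Retry retry = Retry.of(name, retryConfig);
        // executeSupplier re-invokes the supplier according to the retry configuration
        return retry.executeSupplier(doHttpCall);
    }
}

Because connection failures, retryable 4XX/5XX responses, circuit-breaker-open exceptions and rate-limiting exceptions are all wrapped as feign.RetryableException, the retry predicate itself stays trivially simple.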

Thread isolation at the microservice instance level

Here’s another scenario:

Microservice A invokes all instances of microservice B from the same thread pool. If an instance has a problem, the request is blocked, or the response is very slow. Over time, the thread pool will be filled with requests to the exception instance, but microservice B actually has a working instance.

To prevent this, and to limit the concurrency (i.e., flow limiting) of calling each microservice instance, we use different thread pools to call different instances of different microservices. This is also done through Resilience4J.

Circuit breakers at microservice instance method granularity

If an instance responds slowly for a while because it is under pressure, or has been shut down, or mostly returns 500 responses because something is wrong with it, then even with a retry mechanism a large number of requests still go through the sequence "request the faulty instance -> fail -> retry on another instance", which is very inefficient. This is where circuit breakers are needed.

In practice, we find that in most cases only some interfaces of some instances of a microservice are failing, while the other interfaces on those instances are still available. So our circuit breaker must not cut off the entire instance, let alone the entire microservice. Therefore, what we implement with Resilience4J is a circuit breaker at the microservice instance method level (i.e. different methods of different instances of different microservices each get their own circuit breaker).

Use Resilience4J circuit breakers and thread isolation rate limiters

Next, let's look at the circuit breaker configuration to understand how the Resilience4J circuit breaker works:

CircuitBreakerConfig.java

// Decides whether an exception is recorded as a circuit breaker failure.
// By default, every exception counts as a failure.
private Predicate<Throwable> recordExceptionPredicate = throwable -> true;
// Decides whether a returned object is recorded as a circuit breaker failure.
// By default, no return value counts as a failure.
private transient Predicate<Object> recordResultPredicate = (Object object) -> false;
// Decides whether an exception should NOT be counted as a circuit breaker failure.
// By default, no exception is ignored.
private Predicate<Throwable> ignoreExceptionPredicate = throwable -> false;
// Clock function used to produce timestamps, defaults to System.nanoTime()
private Function<Clock, Long> currentTimestampFunction = clock -> System.nanoTime();
private TimeUnit timestampUnit = TimeUnit.NANOSECONDS;
// Exception list: any exception in this set, or a subclass of one, thrown by a call is recorded
// as a failure. Other exceptions are not considered failures, nor are exceptions configured in
// ignoreExceptions. By default, all exceptions are considered failures.
private Class<? extends Throwable>[] recordExceptions = new Class[0];
// Exception whitelist: exceptions in this set and their subclasses are never counted as failures,
// even if they are also configured in recordExceptions. The default whitelist is empty.
private Class<? extends Throwable>[] ignoreExceptions = new Class[0];
// Failure rate threshold in percent; above it the CircuitBreaker becomes OPEN, default 50%
private float failureRateThreshold = 50;
// Number of calls permitted while the CircuitBreaker is HALF_OPEN
private int permittedNumberOfCallsInHalfOpenState = 10;
// Sliding window size. With COUNT_BASED, 100 means the last 100 requests;
// with TIME_BASED, 100 means the requests of the last 100 seconds.
private int slidingWindowSize = 100;
// Sliding window type: COUNT_BASED is a count-based sliding window, TIME_BASED a time-based one.
private SlidingWindowType slidingWindowType = SlidingWindowType.COUNT_BASED;
// Minimum number of calls: the CircuitBreaker only decides whether to open once the number of
// calls in the sliding window reaches this value.
private int minimumNumberOfCalls = 100;
// Whether exception stack traces are filled in when the exceptions are created.
// All circuit-breaker-related exceptions extend RuntimeException, and writableStackTrace is
// specified for them here. If set to false, the exceptions carry no stack trace, which improves performance.
private boolean writableStackTraceEnabled = true;
// If set to true, the OPEN state automatically changes to HALF_OPEN even if no request arrives.
private boolean automaticTransitionFromOpenToHalfOpenEnabled = false;
// Wait time function in the OPEN state, by default a fixed 60s; after the wait time has elapsed,
// the breaker leaves the OPEN state.
private IntervalFunction waitIntervalFunctionInOpenState = IntervalFunction.of(Duration.ofSeconds(60));
// When a certain object or exception is returned, transition directly to another state.
private Function<Either<Object, Throwable>, TransitionCheckResult> transitionOnResult =
        any -> TransitionCheckResult.noTransition();
// When the percentage of slow calls reaches this threshold, the CircuitBreaker becomes OPEN.
// By default slow calls never open the CircuitBreaker, because the default is 100%.
private float slowCallRateThreshold = 100;
// Slow call duration: a call slower than this is recorded as a slow call.
private Duration slowCallDurationThreshold = Duration.ofSeconds(60);
// How long the CircuitBreaker stays HALF_OPEN at most. Defaults to 0, i.e. it stays HALF_OPEN
// until the permitted number of calls in the HALF_OPEN state have all completed (succeeded or failed).
private Duration maxWaitDurationInHalfOpenState = Duration.ofSeconds(0);
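For reference, here is a small sketch of building an equivalent configuration programmatically with the CircuitBreakerConfig builder (the class name and the values are illustrative and simply mirror the YAML shown below):

import java.time.Duration;

import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig.SlidingWindowType;

public class CircuitBreakerConfigSketch {
    public static CircuitBreakerConfig sampleConfig() {
        return CircuitBreakerConfig.custom()
                .slidingWindowType(SlidingWindowType.TIME_BASED) // time-based sliding window
                .slidingWindowSize(10)                           // window covers the last 10 seconds
                .minimumNumberOfCalls(5)                         // judge only after at least 5 calls
                .failureRateThreshold(30)                        // OPEN once 30% of the calls fail
                .permittedNumberOfCallsInHalfOpenState(3)
                .automaticTransitionFromOpenToHalfOpenEnabled(true)
                .waitDurationInOpenState(Duration.ofSeconds(2))  // stay OPEN for 2s before HALF_OPEN
                .recordExceptions(Exception.class)               // every Exception counts as a failure
                .build();
    }
}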

Then there is the configuration of thread isolation:

ThreadPoolBulkheadConfig.java

// Maximum thread pool size, defaults to the number of available processors
private int maxThreadPoolSize = Runtime.getRuntime().availableProcessors();
// Core thread pool size
private int coreThreadPoolSize = Runtime.getRuntime().availableProcessors();
// Capacity of the thread pool's task queue
private int queueCapacity = 100;
// Keep-alive time of idle threads above the core size
private Duration keepAliveDuration = Duration.ofMillis(20);
// Rejection policy used when the pool and the queue are both full
private RejectedExecutionHandler rejectedExecutionHandler = new ThreadPoolExecutor.AbortPolicy();
// Whether exception stack traces are filled in when the exceptions are created.
// If set to false, the exceptions carry no stack trace, which improves performance.
private boolean writableStackTraceEnabled = true;
// Much of the Context passing in Java is based on ThreadLocal, but thread pool isolation is in
// effect a thread switch, and some tasks need their Context carried across that switch;
// ContextPropagator is used for that.
private List<ContextPropagator> contextPropagators = new ArrayList<>();
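Similarly, a minimal sketch of the programmatic equivalent (class name and values are illustrative and mirror the YAML below):

import java.time.Duration;

import io.github.resilience4j.bulkhead.ThreadPoolBulkheadConfig;

public class ThreadPoolBulkheadConfigSketch {
    public static ThreadPoolBulkheadConfig sampleConfig() {
        return ThreadPoolBulkheadConfig.custom()
                .coreThreadPoolSize(10)                   // threads that are always kept alive
                .maxThreadPoolSize(50)                    // upper bound of the pool size
                .queueCapacity(1000)                      // tasks queued before rejection kicks in
                .keepAliveDuration(Duration.ofMillis(20)) // idle time before extra threads are reclaimed
                .build();
    }
}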

After adding the resilience4j-spring-cloud2 dependency described in the previous section, we can configure circuit breakers and thread isolation as follows:

resilience4j.circuitbreaker:
  configs:
    default:
      registerHealthIndicator: true
      slidingWindowSize: 10
      minimumNumberOfCalls: 5
      slidingWindowType: TIME_BASED
      permittedNumberOfCallsInHalfOpenState: 3
      automaticTransitionFromOpenToHalfOpenEnabled: true
      waitDurationInOpenState: 2s
      failureRateThreshold: 30
      eventConsumerBufferSize: 10
      recordExceptions:
        - java.lang.Exception
resilience4j.thread-pool-bulkhead:
  configs:
    default:
      maxThreadPoolSize: 50
      coreThreadPoolSize: 10
      queueCapacity: 1000

How to implement the microservice instance method granularity circuit breaker

What we want to implement is that each method for each instance of each microservice is a different circuit breaker, and we need to get:

  • The service name
  • Instance ID, or a string that uniquely identifies an instance
  • Method name: Can be a URL path or a fully qualified method name.

We use the fully qualified method name here rather than the URL path, because some FeignClient methods put parameters on the path, for example via @PathVariable. If that parameter is something like a user ID, each user would end up with its own circuit breaker, which is not what we expect. Using the fully qualified method name avoids this problem.

So where do we get all of these? Reviewing the core flow of FeignClient, we see that the instance ID is only known during the actual call, after the load balancer has picked an instance, i.e. inside org.springframework.cloud.openfeign.loadbalancer.FeignBlockingLoadBalancerClient once load balancing has completed. So that is where we plug in our circuit breaker code.
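As an illustration, here is a hedged sketch of composing that circuit breaker name once the load balancer has chosen an instance (the class and method names are hypothetical, not the series' actual code):

import java.lang.reflect.Method;

import org.springframework.cloud.client.ServiceInstance;

public class CircuitBreakerNameSketch {
    // service name + instance id + fully qualified method name, so that every method of every
    // instance of every microservice gets its own circuit breaker
    public static String serviceInstanceMethodId(ServiceInstance instance, Method method) {
        return instance.getServiceId()
                + ":" + instance.getHost() + ":" + instance.getPort()
                + ":" + method.getDeclaringClass().getName() + "#" + method.getName();
    }
}

The name produced this way is then used for the registry lookup shown at the end of this section.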

The other is the configuration granularity, which can be configured individually for each FeignClient, not at the method level. Here’s an example:

resilience4j.circuitbreaker:
  configs:
    default:
      slidingWindowSize: 10
    feign-client-1:
      slidingWindowSize: 100

Here is the code: contextId is the FeignClient's contextId, for example feign-client-1, while serviceInstanceMethodId identifies one method of one microservice instance. If no configuration is found for the contextId, a ConfigurationNotFoundException is thrown, and in that case we read and use the default configuration.

try {
    circuitBreaker = circuitBreakerRegistry.circuitBreaker(serviceInstanceMethodId, contextId);
} catch (ConfigurationNotFoundException e) {
    circuitBreaker = circuitBreakerRegistry.circuitBreaker(serviceInstanceMethodId);
}
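Once obtained, the circuit breaker simply wraps the actual call. A hedged sketch (doActualHttpCall is a placeholder for the real HTTP invocation, not an actual method of the framework):

// If the breaker is OPEN, this throws a CallNotPermittedException without sending the request,
// which our retry logic then treats as retryable on another instance
feign.Response response = circuitBreaker.executeSupplier(() -> doActualHttpCall());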

How to implement the microservice instance level thread pool rate limiter

For the thread isolation rate limiter we only need the microservice name and the instance ID, and these thread pools are only used to make the calls. So, just like the circuit breaker, the related code can be placed in org.springframework.cloud.openfeign.loadbalancer.FeignBlockingLoadBalancerClient, after load balancing has completed.
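Here is a hedged sketch of that lookup, mirroring the circuit breaker code above (the class name and the name format are illustrative): the bulkhead name only needs the service name and instance ID, and if no configuration exists for the FeignClient's contextId we fall back to the default configuration.

import org.springframework.cloud.client.ServiceInstance;

import io.github.resilience4j.bulkhead.ThreadPoolBulkhead;
import io.github.resilience4j.bulkhead.ThreadPoolBulkheadRegistry;
import io.github.resilience4j.core.ConfigurationNotFoundException;

public class InstanceBulkheadSketch {
    // Looks up (or creates) the thread pool bulkhead for one specific microservice instance
    public static ThreadPoolBulkhead bulkheadFor(ThreadPoolBulkheadRegistry registry,
                                                 ServiceInstance instance,
                                                 String contextId) {
        // service name + host + port uniquely identifies the instance
        String serviceInstanceId = instance.getServiceId()
                + ":" + instance.getHost() + ":" + instance.getPort();
        try {
            // use the FeignClient-specific configuration if one exists for this contextId
            return registry.bulkhead(serviceInstanceId, contextId);
        } catch (ConfigurationNotFoundException e) {
            // otherwise fall back to the default configuration
            return registry.bulkhead(serviceInstanceId);
        }
    }
}

The call is then submitted to this bulkhead's own thread pool (for example via ThreadPoolBulkhead.executeSupplier), so a slow or faulty instance can only exhaust its own pool instead of blocking calls to the other instances.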
