Principles and practice of three circuit breaker frameworks for distributed systems

Spring Boot hands-on e-commerce project: Mall4j (https://gitee.com/gz-yami/mall4j)

Java open source mall system

With the popularity of microservices, circuit breaking has become widely known as one of the most important protection techniques. When the quality of service of a microservice falls below a critical threshold, the circuit breaker mechanism kicks in and suspends calls to that microservice for a period of time, so that the back-end service does not collapse under continuous overload. This article introduces Hystrix, its new-generation successor Resilience4j, and Alibaba’s open-source Sentinel. Corrections are welcome.

1. Why a circuit breaker is needed

The Circuit Breaker pattern is described in Martin Fowler’s article “Circuit Breaker”. An electrical circuit breaker is a switching device that protects a circuit from overload. When a short circuit occurs, the breaker cuts off the faulty circuit in time to prevent overload, overheating, or even fire.

In a distributed architecture, the circuit breaker pattern plays a similar role. When a service unit fails (like a short circuit in an appliance), the breaker’s fault monitoring (like a fuse blowing) returns an error response to the caller immediately instead of letting it wait for a long time. Threads are therefore not tied up by calls to the faulty service, and the fault does not spread through the distributed system.

To address these problems, circuit breaker frameworks implement a set of service protection functions, such as circuit breaking, thread isolation and flow control. They sit at the points where a system calls other systems, services and third-party libraries, and give those calls greater tolerance of latency and failure.
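To make the fail-fast idea concrete, here is a minimal sketch in plain Java with no framework involved. The class name FailFastBreaker, the consecutive-failure threshold and the fallback supplier are all assumptions made for illustration; the frameworks discussed later use richer rules and also recover automatically, which this sketch does not.

```java
import java.util.function.Supplier;

// Hypothetical illustration only: trips after N consecutive failures and then
// answers with the fallback immediately instead of calling the faulty service.
final class FailFastBreaker {
    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;

    FailFastBreaker(int maxConsecutiveFailures) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    synchronized boolean isOpen() {
        return consecutiveFailures >= maxConsecutiveFailures;
    }

    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get();      // fail fast: the remote service is not touched
        }
        try {
            T result = remoteCall.get();
            reset();                    // a success clears the failure streak
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get();      // degrade instead of propagating the error
        }
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
    }

    // Package-private so that an operator or a timer could close the breaker again.
    synchronized void reset() {
        consecutiveFailures = 0;
    }
}
```

With a threshold of 2, two failing calls reach the remote service; every later call returns the fallback without invoking it, so no thread waits on the broken dependency.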

2. Hystrix

2.1 What is Hystrix

Hystrix is an open-source framework from Netflix. Its features include dependency isolation and system fault tolerance with degradation, which are its two most important uses, as well as request collapsing.

2.2 A simple Hystrix example

2.2.1 Create a Hystrix project and add the dependency

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>

2.2.2 Add the @EnableCircuitBreaker annotation to the bootstrap class to enable the circuit breaker

@SpringBootApplication // assumed: the original snippet omits it, but a bootstrap class needs it
@EnableCircuitBreaker  // enable the circuit breaker
public class TestApplication extends SpringBootServletInitializer {
    public static void main(String[] args) {
        SpringApplication.run(TestApplication.class, args);
    }
}

2.2.3 Add circuit breaker logic to TestProductController

@RequestMapping("/get/{id}")
@HystrixCommand(fallbackMethod = "errorCallBack")
public Object get(@PathVariable("id") long id) {
    Product p = productService.findById(id);
    if (p == null) {
        throw new RuntimeException("null");
    }
    return p;
}

// Fallback invoked when get() fails; it takes the same parameters as get()
public Object errorCallBack(@PathVariable("id") long id) {
    return id + " not present,error";
}

2.3 Summary

This section briefly introduced Hystrix with a simple example. Because Hystrix is no longer under active development (it is officially in maintenance mode), I won’t go into further detail.

3. Resilience4j

3.1 Introduction

After active development of Hystrix stopped, the Hystrix project itself recommended Resilience4j as the new-generation circuit breaker. Resilience4j is a lightweight, easy-to-use fault tolerance library inspired by Netflix Hystrix, but designed for Java 8 and functional programming. It is lightweight because it depends only on Vavr (formerly known as Javaslang) and has no other external library dependencies. Netflix Hystrix, in contrast, has a compile dependency on Archaius, which in turn has many more external dependencies, such as Guava and Apache Commons Configuration. With Resilience4j you don’t need to import everything: you select only the functional modules you need.

3.2 Module Composition

Resilience4j provides several core modules:

  • resilience4j-circuitbreaker: circuit breaking
  • resilience4j-ratelimiter: rate limiting
  • resilience4j-bulkhead: bulkheading
  • resilience4j-retry: automatic retry (synchronous and asynchronous)
  • resilience4j-timelimiter: timeout handling
  • resilience4j-cache: result caching

3.3 Maven setup

Add the dependency:

<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-circuitbreaker</artifactId>
    <version>0.13.2</version>
</dependency>

3.4 Circuit Breaker (CircuitBreaker)

Note that this feature requires the resilience4j-circuitbreaker dependency shown above.

The circuit breaker pattern helps prevent cascading failures when a remote service fails.

After a number of failed requests, we consider the service unavailable or overloaded and short-circuit all subsequent requests, which saves system resources. Let’s look at how to achieve this with Resilience4j.

First, we need to define the settings to use. The easiest way is to use the default settings:

CircuitBreakerRegistry circuitBreakerRegistry  = CircuitBreakerRegistry.ofDefaults();

You can also use custom parameters:

CircuitBreakerConfig config = CircuitBreakerConfig.custom()
  .failureRateThreshold(20)
  .ringBufferSizeInClosedState(5)
  .build();

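Here, we set the failure rate threshold to 20% and the ring buffer size in the closed state to 5, so at least five calls are recorded before the failure rate is evaluated.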

We then create a CircuitBreaker object and invoke the remote service through it:

interface RemoteService {
    int process(int i);
}

CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(config);
CircuitBreaker circuitBreaker = registry.circuitBreaker("my");
Function<Integer, Integer> decorated = CircuitBreaker
  .decorateFunction(circuitBreaker, service::process);

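Finally, let’s verify this behaviour with a JUnit test.

We try to call the service 10 times, and every call fails. Once the five-call ring buffer is full, the failure rate exceeds 20% and the circuit breaker opens, so the service is actually invoked only five times and the remaining calls are rejected.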

when(service.process(any(Integer.class))).thenThrow(new RuntimeException());

for (int i = 0; i < 10; i++) {
    try {
        decorated.apply(i);
    } catch (Exception ignore) {}
}

verify(service, times(5)).process(any(Integer.class));

Three states of the circuit breaker:

  • Closed – the service is healthy and no calls are short-circuited
  • Open – the remote service is down and all requests are short-circuited
  • Half-open – after a configured period in the open state, the breaker lets calls through to check whether the remote service has recovered

We can configure the following settings:

  • The failure rate threshold above which the CircuitBreaker opens
  • The wait duration, which defines how long the CircuitBreaker stays open before switching to half-open
  • The size of the ring buffer when the CircuitBreaker is half-open or closed
  • A custom CircuitBreakerEventListener that handles CircuitBreaker events
  • A custom Predicate that evaluates whether an exception counts as a failure and therefore increases the failure rate
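How the three states and these settings fit together can be sketched in plain Java. This is not Resilience4j’s implementation: the class SketchCircuitBreaker, its constructor parameters and the injectable millisecond clock are assumptions for illustration, and the window simply resets when it is full instead of behaving as a true ring buffer. Event listeners are left out.

```java
import java.util.function.LongSupplier;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch of a three-state circuit breaker driven by a failure rate.
final class SketchCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final float failureRateThreshold;     // percent, e.g. 20
    private final int closedWindowSize;           // calls recorded before evaluating in CLOSED
    private final int halfOpenWindowSize;         // trial calls recorded in HALF_OPEN
    private final long waitMillisInOpenState;
    private final Predicate<Throwable> isFailure; // which exceptions count as failures
    private final LongSupplier clock;             // millisecond clock, injectable for tests

    private State state = State.CLOSED;
    private int calls = 0;
    private int failures = 0;
    private long openedAt = 0;

    SketchCircuitBreaker(float failureRateThreshold, int closedWindowSize, int halfOpenWindowSize,
                         long waitMillisInOpenState, Predicate<Throwable> isFailure, LongSupplier clock) {
        this.failureRateThreshold = failureRateThreshold;
        this.closedWindowSize = closedWindowSize;
        this.halfOpenWindowSize = halfOpenWindowSize;
        this.waitMillisInOpenState = waitMillisInOpenState;
        this.isFailure = isFailure;
        this.clock = clock;
    }

    // Returns the current state, moving from OPEN to HALF_OPEN once the wait time has elapsed.
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= waitMillisInOpenState) {
            transitionTo(State.HALF_OPEN);
        }
        return state;
    }

    <T> T call(Supplier<T> supplier) {
        if (state() == State.OPEN) {
            throw new IllegalStateException("circuit breaker is open");   // short-circuit
        }
        try {
            T result = supplier.get();
            record(false);
            return result;
        } catch (RuntimeException e) {
            record(isFailure.test(e));
            throw e;
        }
    }

    private synchronized void record(boolean failed) {
        if (state == State.OPEN) {
            return;                        // a concurrent call already tripped the breaker
        }
        calls++;
        if (failed) {
            failures++;
        }
        int windowSize = (state == State.CLOSED) ? closedWindowSize : halfOpenWindowSize;
        if (calls < windowSize) {
            return;                        // the rate is only evaluated on a full window
        }
        float failureRate = 100f * failures / calls;
        transitionTo(failureRate >= failureRateThreshold ? State.OPEN : State.CLOSED);
    }

    // Callers hold the lock. Every transition starts a fresh window.
    private void transitionTo(State next) {
        state = next;
        calls = 0;
        failures = 0;
        if (next == State.OPEN) {
            openedAt = clock.getAsLong();
        }
    }
}
```

With a threshold of 20, a closed window of 5 and a service that always throws, the fifth failure opens the breaker and later calls are rejected without reaching the service, which mirrors the JUnit test above.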

3.5 Rate limiter

This feature requires the resilience4j-ratelimiter dependency.

A simple example:

RateLimiterConfig config = RateLimiterConfig.custom().limitForPeriod(2).build();
RateLimiterRegistry registry = RateLimiterRegistry.of(config);
RateLimiter rateLimiter = registry.rateLimiter("my");
Function<Integer, Integer> decorated
  = RateLimiter.decorateFunction(rateLimiter, service::process);

All calls made through the decorated function are now subject to the rate limiter configuration.

You can set the following parameters:

  • The limit refresh period
  • The number of permits available during each refresh period
  • The default time to wait for a permit
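The interplay of the refresh period and the permits per period can be sketched in plain Java as follows. SketchRateLimiter and its injectable nanosecond clock are hypothetical names, not Resilience4j classes, and the sketch never waits for a permit, which corresponds to a wait time of zero.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: at most limitForPeriod permits are granted per refresh period.
final class SketchRateLimiter {
    private final int limitForPeriod;
    private final long refreshPeriodNanos;
    private final LongSupplier nanoClock;   // injectable so that tests can control time
    private long periodStart;
    private int usedPermits = 0;

    SketchRateLimiter(int limitForPeriod, long refreshPeriodNanos, LongSupplier nanoClock) {
        this.limitForPeriod = limitForPeriod;
        this.refreshPeriodNanos = refreshPeriodNanos;
        this.nanoClock = nanoClock;
        this.periodStart = nanoClock.getAsLong();
    }

    // Returns true if the call may proceed, false if the limit for this period is used up.
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        if (now - periodStart >= refreshPeriodNanos) {
            periodStart = now;              // a new period starts: all permits are available again
            usedPermits = 0;
        }
        if (usedPermits < limitForPeriod) {
            usedPermits++;
            return true;
        }
        return false;
    }
}
```

In production the clock would typically be System::nanoTime; a caller that gets false can reject the request or retry after the next refresh.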

3.6 Bulkhead isolation

This feature requires the resilience4j-bulkhead dependency. A bulkhead limits the number of concurrent calls to a particular service.

Let’s look at an example of configuring concurrent calls using the Bulkhead API:

BulkheadConfig config = BulkheadConfig.custom().maxConcurrentCalls(1).build();
BulkheadRegistry registry = BulkheadRegistry.of(config);
Bulkhead bulkhead = registry.bulkhead("my");
Function<Integer, Integer> decorated
  = Bulkhead.decorateFunction(bulkhead, service::process);

To test this, we call a mocked service method that never returns and make sure that the bulkhead does not permit any further calls while the first one is in progress:

CountDownLatch latch = new CountDownLatch(1);
when(service.process(anyInt())).thenAnswer(invocation -> {
    latch.countDown();
    Thread.currentThread().join();
    return null;
});

ForkJoinTask<?> task = ForkJoinPool.commonPool().submit(() -> {
    try {
        decorated.apply(1);
    } finally {
        bulkhead.onComplete();
    }
});
latch.await();
assertThat(bulkhead.isCallPermitted()).isFalse();

We can configure the following settings:

  • The maximum number of concurrent calls allowed
  • The maximum time a thread will wait when entering a saturated bulkhead
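Both settings map naturally onto a counting semaphore. The following plain-Java sketch shows the idea; SketchBulkhead is a hypothetical class written for this article, not the Resilience4j implementation.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: a semaphore caps the number of calls that run at the same time.
final class SketchBulkhead {
    private final Semaphore permits;
    private final long maxWaitMillis;

    SketchBulkhead(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls);
        this.maxWaitMillis = maxWaitMillis;
    }

    <T> T call(Supplier<T> supplier) {
        boolean acquired;
        try {
            // Wait at most maxWaitMillis for a free slot in the bulkhead.
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            acquired = false;
        }
        if (!acquired) {
            throw new IllegalStateException("bulkhead is full");
        }
        try {
            return supplier.get();
        } finally {
            permits.release();              // always free the slot, even on failure
        }
    }

    int availableCalls() {
        return permits.availablePermits();
    }
}
```

With maxConcurrentCalls set to 1 and a wait time of 0, a second call that arrives while the first is still running is rejected immediately.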

3.7 Retry

This feature requires the resilience4j-retry dependency. With the Retry API, a failed call can be retried automatically:

RetryConfig config = RetryConfig.custom().maxAttempts(2).build();
RetryRegistry registry = RetryRegistry.of(config);
Retry retry = registry.retry("my");
Function<Integer, Void> decorated
  = Retry.decorateFunction(retry, (Integer s) -> {
        service.process(s);
        return null;
    });

Now, let’s simulate a situation where an exception is thrown during a remote service call and ensure that the library automatically retries failed calls:

when(service.process(anyInt())).thenThrow(new RuntimeException());
try {
    decorated.apply(1);
    fail("Expected an exception to be thrown if all retries failed");
} catch (Exception e) {
    verify(service, times(2)).process(any(Integer.class));
}

We can also configure:

  • Maximum number of attempts
  • Wait time before retry
  • Custom function to modify the wait interval after a failure
  • A custom Predicate that evaluates whether an exception should result in a retry
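A plain-Java sketch of how the maximum number of attempts, the wait time and the predicate interact is shown below. SketchRetry is a hypothetical class, not part of Resilience4j, and it uses a fixed wait time instead of a custom interval function.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch: re-invokes the supplier until it succeeds, the attempts
// run out, or the predicate says the exception is not worth retrying.
final class SketchRetry {
    private final int maxAttempts;
    private final long waitMillis;
    private final Predicate<Throwable> retryOn;

    SketchRetry(int maxAttempts, long waitMillis, Predicate<Throwable> retryOn) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.waitMillis = waitMillis;
        this.retryOn = retryOn;
    }

    <T> T call(Supplier<T> supplier) {
        for (int attempt = 1; ; attempt++) {
            try {
                return supplier.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !retryOn.test(e)) {
                    throw e;                // out of attempts, or not a retryable exception
                }
                pause(waitMillis);          // fixed wait before the next attempt
            }
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

Note that maxAttempts counts the first call as well, so a value of 2 means one original call plus one retry, which is what the test above verifies with times(2).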

3.8 Cache

The cache module requires the resilience4j-cache dependency. The initialization code is as follows:

javax.cache.Cache cache = ...; // Use an appropriate cache here
Cache<Integer, Integer> cacheContext = Cache.of(cache);
Function<Integer, Integer> decorated
  = Cache.decorateSupplier(cacheContext, () -> service.process(1));

The caching here is built on a JSR-107 (JCache) implementation, and Resilience4j provides methods for working with it.

Note that there is no API for decorating functions (such as Cache.decorateFunction(Function)); only the Supplier and Callable types are supported.
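One way to read the snippet above: decorating a Supplier yields a function from cache key to value, which invokes the supplier only on a cache miss. The plain-Java sketch below illustrates that shape with a java.util.Map standing in for the JSR-107 cache; SketchCache is a hypothetical helper, not the Resilience4j class.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: a Map plays the role of the JSR-107 cache.
final class SketchCache {
    private SketchCache() { }

    // Returns a function from cache key to value. On a miss the supplier is
    // invoked and its result is stored under the key; on a hit it is skipped.
    static <K, V> Function<K, V> decorateSupplier(Map<K, V> cache, Supplier<V> supplier) {
        return key -> cache.computeIfAbsent(key, k -> supplier.get());
    }
}
```

Calling the decorated function twice with the same key triggers the underlying supplier only once.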

3.9 Time limiter

This module requires the resilience4j-timelimiter dependency. A TimeLimiter limits the time spent calling a remote service.

We set a TimeLimiter with a timeout of 1 millisecond to facilitate testing:

long ttl = 1;
TimeLimiterConfig config
  = TimeLimiterConfig.custom().timeoutDuration(Duration.ofMillis(ttl)).build();
TimeLimiter timeLimiter = TimeLimiter.of(config);

Next, let’s verify that Resilience4j calls Future.get() with the expected timeout:

Future futureMock = mock(Future.class);
Callable restrictedCall
  = TimeLimiter.decorateFutureSupplier(timeLimiter, () -> futureMock);
restrictedCall.call();

verify(futureMock).get(ttl, TimeUnit.MILLISECONDS);

We can also use it in conjunction with a CircuitBreaker:

Callable chainedCallable
  = CircuitBreaker.decorateCallable(circuitBreaker, restrictedCall);
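The idea behind decorating a Future supplier can be sketched in plain Java: obtain the Future, wait for it with a timeout, and cancel it if the timeout expires. SketchTimeLimiter is a hypothetical helper written for illustration, and cancelling on timeout is a design choice of this sketch, not a statement about Resilience4j’s behaviour.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch: bounds the time spent waiting for an asynchronous result.
final class SketchTimeLimiter {
    private SketchTimeLimiter() { }

    static <T> Callable<T> decorateFutureSupplier(Duration timeout, Supplier<Future<T>> futureSupplier) {
        return () -> {
            Future<T> future = futureSupplier.get();
            try {
                // Wait for the result, but no longer than the configured timeout.
                return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true);        // give up on the slow call
                throw e;
            }
        };
    }
}
```

Because the result is a Callable, it can be wrapped again by other decorators, in the same way the example above chains the time limiter with a circuit breaker.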

3.10 Additional Modules

Resilience4j also provides a number of additional functionality modules that simplify its integration with popular frameworks and libraries.

Some of the more common integrations are:

  • Spring Boot – resilience4j-spring-boot module
  • Ratpack – resilience4j-ratpack module
  • Retrofit – resilience4j-retrofit module
  • Vertx – resilience4j-vertx module
  • Dropwizard – resilience4j-metrics module
  • Prometheus – resilience4j-prometheus module

3.11 Summary

We have now seen the basic usage of each part of the Resilience4j library and how it can be used to solve various fault tolerance problems in communication between servers. The Resilience4j source code can be found on GitHub.

4. Sentinel

4.1 What is Sentinel?

Sentinel is a lightweight flow control component for distributed service architectures, open-sourced by Alibaba. It takes traffic as its entry point and protects the stability of microservices along several dimensions, including rate limiting, traffic shaping, circuit breaking and degradation, and system load protection.

4.2 Sentinel has the following characteristics:

  • Rich application scenarios: Sentinel has handled the core scenarios of Alibaba’s Double 11 traffic peaks over the past 10 years, such as flash sales (keeping burst traffic within the range the system can bear), message peak shaving and valley filling, cluster flow control, and real-time circuit breaking of unavailable downstream applications.
  • Complete real-time monitoring: Sentinel also provides real-time monitoring. From the console you can see per-second data for a single machine connected to the application, or an aggregated view of a cluster of fewer than 500 machines.
  • Extensive open-source ecosystem: Sentinel provides out-of-the-box integration modules for other open-source frameworks and libraries, such as Spring Cloud, Dubbo, and gRPC. You can integrate Sentinel quickly by adding the appropriate dependencies and a little configuration.
  • Comprehensive SPI extension points: Sentinel provides an easy-to-use, comprehensive SPI extension interface. You can quickly customize its logic by implementing an extension interface, for example to customize rule management or to adapt dynamic data sources.

4.3 Working mechanism:

  • Sentinel provides adapters for mainstream frameworks, or an explicit API, for defining the resources that need protection, along with facilities for real-time statistics and call-chain analysis of those resources.
  • Traffic is controlled based on preset rules and real-time resource statistics. Sentinel also provides an open interface for defining and changing rules.
  • Sentinel provides a real-time monitoring system that allows you to quickly understand the current status of your system.
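The three steps above (define a resource, check it against changeable rules using real-time statistics, expose the statistics for monitoring) can be sketched in plain Java. The sketch does not use Sentinel’s real API: SketchFlowControl, its method names, the per-second counting window and the resource name used in the note below are all assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch of resource-based flow control; not Sentinel's implementation.
final class SketchFlowControl {
    static final class BlockedException extends RuntimeException {
        BlockedException(String resource) {
            super("blocked by flow rule: " + resource);
        }
    }

    // Real-time statistics for one resource: requests passed in the current second.
    private static final class Window {
        long second;
        int passed;
    }

    private final Map<String, Integer> qpsRules = new ConcurrentHashMap<>();   // resource -> max QPS
    private final Map<String, Window> stats = new ConcurrentHashMap<>();
    private final LongSupplier clockMillis;                                    // injectable for tests

    SketchFlowControl(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    // Open interface: rules can be defined or changed while the system is running.
    void loadRule(String resource, int maxQps) {
        qpsRules.put(resource, maxQps);
    }

    // Marks entry into a protected resource; throws if its rule is exceeded.
    void entry(String resource) {
        long second = clockMillis.getAsLong() / 1000;
        Window window = stats.computeIfAbsent(resource, r -> new Window());
        synchronized (window) {
            if (window.second != second) {   // a new second begins: reset the counter
                window.second = second;
                window.passed = 0;
            }
            Integer limit = qpsRules.get(resource);
            if (limit != null && window.passed >= limit) {
                throw new BlockedException(resource);
            }
            window.passed++;
        }
    }

    // Monitoring hook: requests passed in the most recently observed second.
    int passedInLatestWindow(String resource) {
        Window window = stats.get(resource);
        if (window == null) {
            return 0;
        }
        synchronized (window) {
            return window.passed;
        }
    }
}
```

A caller would wrap the protected code in a call to entry("getProduct") and handle BlockedException with a degraded response; resources without a rule are only counted, never blocked.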

4.4 Sentinel Summary:

Sentinel is a high-availability traffic protection component for distributed service architectures. As Alibaba’s circuit breaker middleware, Sentinel has handled the core scenarios of Alibaba’s Double 11 traffic peaks over the past 10 years and stands out for the availability and stability of its traffic protection.

5. Summary

The performance comparison of the three mainstream circuit breaker frameworks is shown in the table:
