Sentinel:

In addition to flow control, circuit breaking and degradation of unstable resources in a call chain is one of the most important measures for ensuring high availability. A service often calls other modules: another remote service, a database, a third-party API, and so on. For example, making a payment may require a remote call to an API provided by UnionPay, and querying the price of an item may require a database query. The stability of these dependencies is not guaranteed, however. If a dependency becomes unstable and its response time grows, the response time of the method that calls it grows too, threads pile up, and eventually the caller's own thread pool may be exhausted, making the caller itself unavailable.

Modern microservice architectures are distributed and consist of a very large number of services. The services call each other and form complex call chains, which magnifies the problem described above: if one link in a chain is unstable, the failure may cascade and make the whole chain unavailable. Therefore, calls to unstable, weakly depended-on services need to be circuit-broken and degraded, cutting them off temporarily so that a local instability does not cause an avalanche across the whole system. Circuit breaking and degradation is usually configured on the client side (the calling side) as a means of self-protection.

Circuit breaking strategies

Sentinel offers the following circuit breaker strategies:

  • Slow call ratio (SLOW_REQUEST_RATIO): uses the ratio of slow calls as the threshold. You need to set the allowed slow-call RT (that is, the maximum response time); a request whose response time exceeds this value is counted as a slow call. If, within one statistics interval (statIntervalMs), the number of requests reaches the configured minimum number of requests and the ratio of slow calls is greater than the threshold, requests are automatically circuit-broken for the following circuit breaking duration. After that duration, the circuit breaker enters the probe recovery state (half-open state). If the response time of the next request is less than the configured slow-call RT, circuit breaking ends; if it is greater than the configured slow-call RT, the circuit breaker opens again.
  • Exception ratio (ERROR_RATIO): if, within one statistics interval (statIntervalMs), the number of requests reaches the configured minimum number of requests and the proportion of exceptions is greater than the threshold, requests are automatically circuit-broken for the following circuit breaking duration. After that duration, the circuit breaker enters the half-open probe state: circuit breaking ends if the next request completes successfully (without error), otherwise the circuit breaker opens again. The threshold range for the exception ratio is [0.0, 1.0], representing 0% to 100%.
  • Exception count (ERROR_COUNT): when the number of exceptions within one statistics interval exceeds the threshold, the circuit breaker opens automatically. After the circuit breaking duration, the circuit breaker enters the half-open probe state: circuit breaking ends if the next request completes successfully (without error), otherwise the circuit breaker opens again.
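
The closed, open, and half-open transitions described above can be sketched in plain Java. This is a minimal illustration, not Sentinel's implementation: the class name SlowRatioBreakerSketch, its methods, and the injected clock are all hypothetical, and the statistics interval is simplified to a single set of counters that resets whenever the state changes.

```java
import java.util.function.LongSupplier;

/** Illustrative breaker for the slow call ratio strategy; not Sentinel's implementation. */
class SlowRatioBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final long slowRtMs;              // calls slower than this count as slow
    private final double slowRatioThreshold;  // e.g. 0.5 means 50% slow calls
    private final int minRequestAmount;       // minimum requests before the ratio is checked
    private final long breakDurationMs;       // how long the breaker stays open
    private final LongSupplier clock;         // injected so the sketch can be tested

    private State state = State.CLOSED;
    private long openedAt;
    private int total;
    private int slow;

    SlowRatioBreakerSketch(long slowRtMs, double slowRatioThreshold,
                           int minRequestAmount, long breakDurationMs, LongSupplier clock) {
        this.slowRtMs = slowRtMs;
        this.slowRatioThreshold = slowRatioThreshold;
        this.minRequestAmount = minRequestAmount;
        this.breakDurationMs = breakDurationMs;
        this.clock = clock;
    }

    synchronized State state() { return state; }

    /** Returns true if the call may proceed. */
    synchronized boolean tryPass() {
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN && clock.getAsLong() - openedAt >= breakDurationMs) {
            state = State.HALF_OPEN;   // let exactly one probe request through
            return true;
        }
        return false;                  // still open, or a probe is already in flight
    }

    /** Records the response time of a call that was allowed to pass. */
    synchronized void onComplete(long rtMs) {
        boolean isSlow = rtMs > slowRtMs;
        switch (state) {
            case HALF_OPEN:            // the probe decides: recover or open again
                if (isSlow) open(); else close();
                break;
            case CLOSED:
                total++;
                if (isSlow) slow++;
                if (total >= minRequestAmount && (double) slow / total > slowRatioThreshold) {
                    open();
                }
                break;
            default:                   // OPEN: late completions are ignored
                break;
        }
    }

    private void open() { state = State.OPEN; openedAt = clock.getAsLong(); total = 0; slow = 0; }

    private void close() { state = State.CLOSED; total = 0; slow = 0; }
}
```

A caller would invoke tryPass() before the protected call and onComplete(rt) after it. The exception ratio and exception count strategies follow the same state machine, with "the call failed" taking the place of "the call was slow".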

Note: exception-based degradation only counts business exceptions; Sentinel's own flow-control and degradation exception (BlockException) is not counted. To have the exception ratio or exception count recorded, report business exceptions through Tracer.trace(ex). Example:

Entry entry = null;
try {
    entry = SphU.entry(key, EntryType.IN, key);

    // Write your biz code here.
    // <<BIZ CODE>>
} catch (Throwable t) {
    // Only business exceptions are recorded; BlockException is skipped.
    if (!BlockException.isBlockException(t)) {
        Tracer.trace(t);
    }
} finally {
    if (entry != null) {
        entry.exit();
    }
}

Open source integration modules such as Sentinel Dubbo Adapter and Sentinel Web Servlet Filter, as well as the @SentinelResource annotation, record business exceptions automatically, so no manual call is needed.

Description of circuit breaker rules

A circuit breaker rule (DegradeRule) contains the following important properties:

| Field | Description | Default value |
|---|---|---|
| resource | The resource name, that is, the object the rule applies to | |
| grade | Circuit breaking strategy: slow call ratio, exception ratio, or exception count | Slow call ratio |
| count | In slow call ratio mode, the slow-call RT threshold (a call that takes longer counts as slow); in exception ratio / exception count mode, the corresponding threshold | |
| timeWindow | Circuit breaking duration, unit: s | |
| minRequestAmount | Minimum number of requests required to trigger circuit breaking; if the number of requests is less than this value, the breaker does not trip even if the exception ratio exceeds the threshold (introduced in 1.7.0) | 5 |
| statIntervalMs | Statistics interval (unit: ms); for example, 60*1000 means one minute (introduced in 1.8.0) | 1000 ms |
| slowRatioThreshold | Slow call ratio threshold, effective only in slow call ratio mode (introduced in 1.8.0) | |
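
To make the double meaning of count concrete, here is a small plain-Java sketch of how the fields above could combine into a trip decision. It is an illustration under assumptions, not Sentinel's DegradeRule class: DegradeRuleSketch, isSlow, and shouldTrip are hypothetical names, the counts are taken as already aggregated over one statIntervalMs, and timeWindow is not modeled.

```java
/** Illustrative only: models the rule fields from the table, not Sentinel's DegradeRule. */
class DegradeRuleSketch {
    enum Grade { SLOW_REQUEST_RATIO, ERROR_RATIO, ERROR_COUNT }

    final Grade grade;
    final double count;               // slow-call RT in slow mode, otherwise the threshold
    final int minRequestAmount;
    final double slowRatioThreshold;  // only read in SLOW_REQUEST_RATIO mode

    DegradeRuleSketch(Grade grade, double count, int minRequestAmount, double slowRatioThreshold) {
        this.grade = grade;
        this.count = count;
        this.minRequestAmount = minRequestAmount;
        this.slowRatioThreshold = slowRatioThreshold;
    }

    /** In slow call ratio mode, count is the RT above which a call is slow. */
    boolean isSlow(long rtMs) {
        return grade == Grade.SLOW_REQUEST_RATIO && rtMs > count;
    }

    /**
     * @param totalRequests requests seen in the current statistics interval
     * @param hits slow calls (slow mode) or business exceptions (exception modes)
     */
    boolean shouldTrip(int totalRequests, int hits) {
        if (totalRequests < minRequestAmount) {
            return false;             // too few requests to judge
        }
        switch (grade) {
            case SLOW_REQUEST_RATIO:
                return (double) hits / totalRequests > slowRatioThreshold;
            case ERROR_RATIO:
                return (double) hits / totalRequests > count;
            case ERROR_COUNT:
                return hits > count;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        DegradeRuleSketch slow = new DegradeRuleSketch(Grade.SLOW_REQUEST_RATIO, 50, 5, 0.6);
        System.out.println(slow.isSlow(80));        // 80 ms exceeds the 50 ms slow-call RT
        System.out.println(slow.shouldTrip(10, 7)); // ratio 0.7 against threshold 0.6
        System.out.println(slow.shouldTrip(4, 4));  // only 4 requests, minimum is 5
    }
}
```

The same count value of 50 would mean "50 ms" in slow call ratio mode but "50 exceptions" in exception count mode, which is why grade and count always have to be read together.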

Circuit breaker event listening

Sentinel supports registering custom event listeners that are notified when a circuit breaker changes state. Example:

EventObserverRegistry.getInstance().addStateChangeObserver("logging",
    (prevState, newState, rule, snapshotValue) -> {
        if (newState == State.OPEN) {
            // The transition to OPEN carries the value that triggered it.
            System.err.println(String.format("%s -> OPEN at %d, snapshotValue=%.2f",
                prevState.name(), TimeUtil.currentTimeMillis(), snapshotValue));
        } else {
            System.err.println(String.format("%s -> %s at %d",
                prevState.name(), newState.name(), TimeUtil.currentTimeMillis()));
        }
    });

Example

Slow call ratio fusing example: SlowRatioCircuitBreakerDemo

Hystrix:

Description of the process in the figure:

  1. Encapsulate the remote service invocation logic in a HystrixCommand.
  2. Each service invocation can be executed synchronously with execute() or asynchronously with queue().
  3. Check whether the circuit breaker is open or half-open. If the circuit breaker is open, go to step 8. If it is closed, go to step 4.
  4. Check whether the thread pool/queue/semaphore (bulkhead isolation) is full. If so, go to the fallback in step 8; otherwise, proceed to step 5.
  5. The actual service invocation is performed in the run() method. a. If the invocation times out, go to step 8.
  6. Determine whether the code in run() executed successfully. a. The command executed successfully. b. If an error occurred, go to step 8.
  7. Every execution outcome (success, failure, rejection, timeout) is reported to the circuit breaker, which uses these statistics to decide its state.
  8. Enter getFallback() to run the fallback logic. a. A call that does not implement getFallback() throws an exception directly. b. The fallback succeeds. c. If the fallback fails, an exception is thrown.
  9. The result is returned.
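
The numbered flow can be mimicked with plain JDK types. The sketch below is not the Hystrix API: CommandFlowSketch and its fields are hypothetical, the method names execute(), run, and fallback only echo the steps above, the circuit state is a flag set from outside, and the timeout branch of step 5 is left out to keep it short.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Mimics steps 3 to 9 with plain JDK types; this is not the Hystrix API. */
class CommandFlowSketch<T> {
    private final Supplier<T> run;       // step 5: the real service call
    private final Supplier<T> fallback;  // step 8: fallback logic, may be null
    private final Semaphore permits;     // step 4: semaphore bulkhead
    private volatile boolean circuitOpen;

    // Step 7: outcome counters that a circuit breaker could read.
    final AtomicInteger successes = new AtomicInteger();
    final AtomicInteger failures = new AtomicInteger();
    final AtomicInteger rejections = new AtomicInteger();

    CommandFlowSketch(Supplier<T> run, Supplier<T> fallback, int maxConcurrent) {
        this.run = run;
        this.fallback = fallback;
        this.permits = new Semaphore(maxConcurrent);
    }

    void setCircuitOpen(boolean open) { this.circuitOpen = open; }

    T execute() {
        if (circuitOpen) {                          // step 3
            rejections.incrementAndGet();
            return doFallback(new IllegalStateException("circuit open"));
        }
        if (!permits.tryAcquire()) {                // step 4
            rejections.incrementAndGet();
            return doFallback(new IllegalStateException("bulkhead full"));
        }
        T result = null;
        RuntimeException error = null;
        try {
            result = run.get();                     // step 5
        } catch (RuntimeException e) {
            error = e;                              // step 6b
        } finally {
            permits.release();
        }
        if (error != null) {
            failures.incrementAndGet();             // step 7
            return doFallback(error);
        }
        successes.incrementAndGet();                // step 7
        return result;                              // step 9
    }

    private T doFallback(RuntimeException cause) {  // step 8
        if (fallback == null) {
            throw cause;                            // 8a: no fallback implemented
        }
        return fallback.get();                      // 8b, or 8c if this throws
    }
}
```

For example, new CommandFlowSketch<>(() -> remoteCall(), () -> "cached", 10).execute() would return "cached" whenever the hypothetical remoteCall() throws, the bulkhead is full, or the circuit flag is set.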

Note: Whether the circuit breaker opens is determined mainly by the error ratio of the dependent calls, that is, number of failed requests / total number of requests. By default Hystrix opens the breaker at a request error rate of 50%, and a second parameter sets the minimum number of requests in a rolling window before the breaker may open (20 by default). A concrete example: with the default minimum of 20 requests, if only 19 requests arrive in one window (10 seconds by default; the rolling window length is configurable), the breaker does not open even if all 19 requests fail and the error rate is 100%. This easily gives the impression that many requests fail without tripping the circuit breaker; the reason is that the number of requests within the rolling window never reached the minimum required to open the breaker.
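
The arithmetic in the note can be checked with a few lines of Java. This is a sketch of the decision rule only, not Hystrix code: the class and method names are hypothetical, the parameter names follow the Hystrix properties circuitBreaker.requestVolumeThreshold and circuitBreaker.errorThresholdPercentage, and it assumes the breaker trips when the error percentage is at or above the threshold.

```java
/** Illustrative trip rule for one rolling window; not taken from the Hystrix source. */
class RollingWindowTripSketch {

    static boolean shouldOpen(int totalRequests, int failedRequests,
                              int requestVolumeThreshold, int errorThresholdPercentage) {
        if (totalRequests < requestVolumeThreshold) {
            return false;  // too few requests in the window, error rate is not even checked
        }
        int errorPercentage = (int) ((double) failedRequests / totalRequests * 100);
        return errorPercentage >= errorThresholdPercentage;
    }

    public static void main(String[] args) {
        // Defaults from the note: minimum 20 requests, 50% error rate.
        System.out.println(shouldOpen(19, 19, 20, 50)); // false: only 19 requests
        System.out.println(shouldOpen(20, 10, 20, 50)); // true: 20 requests, 50% failed
        System.out.println(shouldOpen(20, 9, 20, 50));  // false: 45% failed
    }
}
```

The first call is the case from the note: every request failed, yet the breaker stays closed because the window saw fewer than 20 requests.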