

You may have a bad day, but that doesn’t mean you’ll have a bad life.

[Warm tips]

This article continues from the previous one: 🏹 [Hystrix Technical Guide] (3) The principle and implementation of the timeout mechanism

  • I recommend Martin Fowler’s introduction to the circuit breaker as an authoritative guide; interested readers can study it.

  • Main reference: the description on the official website.

  • See also the introduction in [How Hystrix works].

[Background]

  • As the scale and complexity of distributed systems increase, the availability requirements placed on them become higher and higher. Among the various high availability design patterns, “circuit breaking, isolation, degradation, and rate limiting” are frequently used. Hystrix, the technology behind these patterns, is not new, but it remains the classic implementation.

  • Hystrix is designed to provide circuit breaking and degradation, thus improving system availability.

  • Hystrix is a Java service component library that implements the circuit breaker pattern and the bulkhead pattern on the calling side, improving fault tolerance by avoiding cascading failures in highly available designs.

  • Hystrix implements a resource isolation mechanism.

Business background

Currently, non-core operations, such as storing operation logs after an inventory increase or decrease and sending asynchronous messages (specific business processes), can cause the interface response to time out when the MQ service fails. It is therefore worth introducing service degradation and service isolation for these non-core operations.

Hystrix overview

The official documentation

Hystrix is Netflix’s open source fault tolerance framework. It addresses the problem of a failing external dependency dragging down business systems and even causing an avalanche of cascading failures.

Why do we need Hystrix?

  • In large and medium-sized distributed systems, a system usually has many dependencies (HTTP, Hessian, Netty, Dubbo, etc.). Under highly concurrent access, the stability of these dependencies has a great impact on the system. However, dependencies have many uncontrollable problems, such as slow network connections, busy resources, temporary unavailability, and services going offline.

  • When a dependency blocks, most servers’ thread pools block as well, affecting the stability of the entire online service. Applications with complex distributed architectures have many dependencies, and each of them will inevitably fail at some point. If a heavily used dependency fails without isolation, the current application service is at risk of being dragged down.

For example, consider a system that relies on 30 SOA services, each 99.99% available. The combined availability is 99.99% raised to the 30th power, or about 99.7%. The remaining 0.3% means about 300,000 failures for 100 million requests, which translates into approximately 2 hours of service instability per month. As the number of service dependencies increases, overall availability decays exponentially. Solution: isolate dependencies.
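
The arithmetic in this example can be checked with a few lines of Java. This is a minimal sketch that assumes the 30 dependencies fail independently and that every request needs all of them; the class and method names are illustrative and not part of Hystrix.

```java
class CompoundAvailability {
    // Availability of a request path that needs all n dependencies to be up,
    // assuming their failures are independent (an assumption of this sketch).
    static double combined(double perDependency, int n) {
        return Math.pow(perDependency, n);
    }

    // Expected number of failed requests out of totalRequests.
    static long expectedFailures(double availability, long totalRequests) {
        return Math.round((1.0 - availability) * totalRequests);
    }

    public static void main(String[] args) {
        double a = combined(0.9999, 30);
        // Prints roughly 99.70%, about 300,000 failures, and a little over 2 hours.
        System.out.printf("combined availability: %.2f%%%n", a * 100);
        System.out.println("failures per 100 million requests: "
                + expectedFailures(a, 100_000_000L));
        System.out.printf("unstable hours per 30-day month: %.2f%n", (1 - a) * 30 * 24);
    }
}
```

Changing the second argument of combined() shows how quickly availability drops as more dependencies are added.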

Hystrix design philosophy

To understand how to use Hystrix, you must first understand its core design concept. Hystrix is based on the command pattern, which is easiest to understand through a UML diagram.

  • As you can see, Command is an intermediate layer added between Receiver and Invoker. Command encapsulates the Receiver.

  • An API can act as either the Invoker or the Receiver. You encapsulate such APIs (for example, remote interface calls, database queries, and other operations that can cause delays) by extending the Hystrix core class HystrixCommand.

  • This gives your API resilient protection.
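
To make the roles concrete, here is a minimal plain-Java sketch of the command pattern with a fallback. It does not use the Hystrix library: ProtectedCommand, StockCommand, and InventoryService are hypothetical names invented for this illustration, and the real HystrixCommand does far more (isolation, timeouts, metrics).

```java
// Invoker: only ever talks to the Command, never to the Receiver directly.
class CommandPatternDemo {
    public static void main(String[] args) {
        InventoryService service = new InventoryService();
        System.out.println(new StockCommand(service, "sku-1").execute());
        System.out.println(new StockCommand(service, "").execute());
    }
}

// Receiver: the object that does the real work (hypothetical service).
class InventoryService {
    int stockOf(String sku) {
        if (sku.isEmpty()) {
            throw new IllegalArgumentException("sku must not be empty");
        }
        return 42; // pretend this value came from a remote call
    }
}

// Command: the intermediate layer that wraps one call to the Receiver.
abstract class ProtectedCommand<T> {
    protected abstract T run();         // the dependent call
    protected abstract T getFallback(); // the degraded result

    final T execute() {
        try {
            return run();
        } catch (RuntimeException e) {
            // Any runtime failure of the dependent call is degraded here.
            return getFallback();
        }
    }
}

class StockCommand extends ProtectedCommand<Integer> {
    private final InventoryService receiver;
    private final String sku;

    StockCommand(InventoryService receiver, String sku) {
        this.receiver = receiver;
        this.sku = sku;
    }

    @Override
    protected Integer run() {
        return receiver.stockOf(sku);
    }

    @Override
    protected Integer getFallback() {
        return 0; // degraded answer: report no stock
    }
}
```

The point of the extra layer is that protection logic lives in one place (execute()), so each new dependency only needs its own run() and getFallback().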

How does Hystrix address dependency isolation

  1. Hystrix uses the command pattern, HystrixCommand (Command), to wrap the dependent call logic. Each command is executed in a separate thread or under semaphore authorization.

  2. You can configure the timeout for a dependent call. The timeout is generally set slightly higher than the 99.5th percentile response time. When the call times out, it returns immediately or executes the fallback logic.

  3. Each dependency gets its own small thread pool (or semaphore). If the thread pool is full, the call is rejected immediately, with no queuing by default, which shortens the time needed to detect failure.

  4. A dependent call has one of the following results: success, failure (exception thrown), timeout, thread pool rejection, or short circuit. Fallback logic is executed when the request fails (exception, rejection, timeout, or short circuit).

  5. Hystrix provides a circuit breaker component that can trip automatically or be tripped manually to stop calls to the current dependency for a period of time (for example, 10 seconds). The default error rate threshold of the circuit breaker is 50%; beyond it, the circuit breaker trips automatically.

  6. Hystrix provides near-real-time statistics and monitoring for dependencies.
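
Points 1, 2, and 4 in the list above can be sketched with the JDK alone: run the dependent call on a separate thread, wait for a bounded time, and return a fallback on timeout or failure. This is an illustration of the idea, not how Hystrix implements it; callWithTimeout and the timeout values are assumptions made for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutFallbackDemo {
    // Runs the call on a pool thread and returns the fallback value
    // when the call times out or throws.
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> call,
                                 long timeoutMillis, T fallback) {
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it does not linger
            return fallback;
        } catch (ExecutionException e) {
            return fallback;     // the dependent call threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            String fast = callWithTimeout(pool, () -> "ok", 2000, "fallback");
            String slow = callWithTimeout(pool, () -> {
                Thread.sleep(2000); // simulated slow dependency
                return "late";
            }, 50, "fallback");
            System.out.println(fast + " / " + slow);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Because the caller waits on a Future rather than on the dependency itself, a slow dependency costs the caller at most the configured timeout.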

Hystrix process structure analysis


Process description:

  1. Each call builds a HystrixCommand or HystrixObservableCommand object, encapsulating the dependent call in the run() method (construct() for HystrixObservableCommand).

  2. Call execute() or queue() to make a synchronous or asynchronous call; this is what eventually invokes the real run()/construct().

  3. Check whether the circuit breaker is open. If it is open, go to step 9 for degradation; if it is closed, continue to the next step.

  4. Check whether the thread pool/queue/semaphore is full. If so, go to step 9 for degradation; otherwise, continue with the following steps.

  5. Call HystrixObservableCommand.construct() or HystrixCommand.run() to run the dependent logic.

  6. If the dependent logic call times out, go to step 9.

  7. Determine whether the logic was invoked successfully:

    • 7a. If the call succeeds, return its result.

    • 7b. If the call fails, go to step 9.

  8. Calculate the circuit breaker status: all outcomes (success, failure, rejection, timeout) are reported to the circuit breaker, which uses the statistics to determine its state.

  9. Run the degradation logic, getFallback():

    a. A Command that does not implement getFallback throws an exception directly.

    b. If the fallback logic succeeds, its result is returned.

    c. If the fallback logic fails, an exception is thrown.

  10. Return the result.

The getFallback call is triggered in four cases:

  1. The run() method throws an exception other than HystrixBadRequestException.

  2. The run() method call times out.

  3. The circuit breaker is open and short-circuits the call.

  4. The thread pool/queue/semaphore is full.
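
The decision logic behind these triggers can be sketched in plain Java. This is a simplified model, not Hystrix code: the execute method and its boolean flags are invented for illustration, BadRequestException is a hypothetical stand-in for HystrixBadRequestException, and the timeout case is left out to keep the sketch single-threaded.

```java
import java.util.function.Supplier;

class FallbackFlowDemo {
    // Hypothetical stand-in for HystrixBadRequestException: a caller error
    // that propagates to the caller instead of triggering the fallback.
    static class BadRequestException extends RuntimeException {
        BadRequestException(String message) {
            super(message);
        }
    }

    // Simplified flow: breaker check, capacity check, run, fallback on failure.
    static <T> T execute(boolean circuitOpen, boolean poolFull,
                         Supplier<T> run, Supplier<T> fallback) {
        if (circuitOpen) {
            return fallback.get(); // short circuit: run() is never called
        }
        if (poolFull) {
            return fallback.get(); // rejection: no capacity for this call
        }
        try {
            return run.get();
        } catch (BadRequestException e) {
            throw e;               // bad input is not degraded
        } catch (RuntimeException e) {
            return fallback.get(); // any other failure is degraded
        }
    }

    public static void main(String[] args) {
        System.out.println(execute(false, false, () -> "value", () -> "fallback"));
        System.out.println(execute(true, false, () -> "value", () -> "fallback"));
        System.out.println(execute(false, true, () -> "value", () -> "fallback"));
        System.out.println(execute(false, false,
                () -> { throw new IllegalStateException("boom"); },
                () -> "fallback"));
    }
}
```

Only the first call in main reaches run() and returns its value; the other three are answered by the fallback.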

Circuit Breaker

By default, each circuit breaker maintains 10 buckets, one bucket per second. Each bucket records the counts of successes, failures, timeouts, and rejections. By default, the circuit breaker trips (opens) when there are at least 20 requests within 10 seconds and the error rate reaches 50%.
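
The bucket mechanism can be sketched as a small rolling window. This is a simplified, single-threaded model written for illustration, not the Hystrix implementation: the class name is invented, time is passed in explicitly so the behavior is reproducible, and it treats the thresholds as inclusive (at least 20 requests, at least 50% errors), which is an assumption of this sketch.

```java
import java.util.Arrays;

class RollingWindowBreaker {
    private static final int BUCKETS = 10;            // one bucket per second
    private static final long BUCKET_MILLIS = 1000;
    private static final int REQUEST_THRESHOLD = 20;  // minimum volume in the window
    private static final int ERROR_PERCENT_THRESHOLD = 50;

    private final long[] bucketStart = new long[BUCKETS];
    private final int[] successes = new int[BUCKETS];
    private final int[] failures = new int[BUCKETS];  // failure, timeout, rejection

    RollingWindowBreaker() {
        Arrays.fill(bucketStart, -1); // -1 marks a bucket that was never used
    }

    // Records one outcome in the bucket that covers nowMillis (must be >= 0).
    void record(long nowMillis, boolean success) {
        long start = nowMillis - (nowMillis % BUCKET_MILLIS);
        int i = (int) ((nowMillis / BUCKET_MILLIS) % BUCKETS);
        if (bucketStart[i] != start) { // slot still holds an old second: reset it
            bucketStart[i] = start;
            successes[i] = 0;
            failures[i] = 0;
        }
        if (success) {
            successes[i]++;
        } else {
            failures[i]++;
        }
    }

    // True when the last 10 seconds hold enough requests and enough errors.
    boolean shouldTrip(long nowMillis) {
        long oldest = nowMillis - BUCKETS * BUCKET_MILLIS;
        int ok = 0;
        int bad = 0;
        for (int i = 0; i < BUCKETS; i++) {
            if (bucketStart[i] >= 0 && bucketStart[i] > oldest) {
                ok += successes[i];
                bad += failures[i];
            }
        }
        int total = ok + bad;
        return total >= REQUEST_THRESHOLD
                && bad * 100 >= total * ERROR_PERCENT_THRESHOLD;
    }

    public static void main(String[] args) {
        RollingWindowBreaker breaker = new RollingWindowBreaker();
        for (int i = 0; i < 12; i++) breaker.record(500, false);
        for (int i = 0; i < 8; i++) breaker.record(1500, true);
        System.out.println("trip at 2s:  " + breaker.shouldTrip(2000));
        System.out.println("trip at 30s: " + breaker.shouldTrip(30_000));
    }
}
```

The request volume threshold matters as much as the error rate: without it, a single failed request on a quiet service would count as a 100% error rate and open the circuit.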

Hystrix isolation analysis

Hystrix uses thread or semaphore isolation to limit the concurrency of dependencies and to stop blocking from spreading.

Thread isolation

  • The thread executing the dependent code is separated from the requesting thread (for example, the Jetty thread), and the requesting thread is free to decide when to stop waiting (an asynchronous process).

  • The amount of concurrency can be controlled by the size of the thread pool. When the thread pool is saturated, requests can be rejected early to prevent dependency problems from spreading.

  • It is recommended not to set the thread pool too large; otherwise a large number of blocked threads may slow down the server.
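
The early rejection described above can be reproduced with a plain JDK thread pool. The sketch below assumes a small pool with no waiting queue (a SynchronousQueue) and the abort rejection policy; it illustrates the idea and is not Hystrix's own pool setup. The names newIsolatedPool and countRejections are made up for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ThreadPoolIsolationDemo {
    // A small pool without a waiting queue: once every thread is busy,
    // execute() fails immediately instead of queuing the task.
    static ThreadPoolExecutor newIsolatedPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(), new ThreadPoolExecutor.AbortPolicy());
    }

    // Submits `attempts` blocking tasks and counts how many are rejected.
    static int countRejections(int threads, int attempts) {
        ThreadPoolExecutor pool = newIsolatedPool(threads);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < attempts; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // simulate a dependency that blocks
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // pool saturated: fail fast
                }
            }
        } finally {
            release.countDown(); // unblock the simulated dependency
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + countRejections(3, 5));
    }
}
```

With 3 threads and 5 blocking calls, the last 2 calls are rejected at once, so the blocked dependency can hold at most 3 threads.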

A real-world case:

Netflix considers the overhead of thread isolation small enough that it has no significant cost or performance impact. The Netflix API processes 10 billion HystrixCommand executions per day using thread isolation, with approximately 40+ thread pools per application and approximately 5-20 threads per thread pool.

Semaphore isolation

Semaphore isolation can also limit concurrent access and prevent blocking from spreading. Its biggest difference from thread isolation is that the dependent code runs on the requesting thread itself rather than on a separate thread (the requesting thread must first acquire a permit from the semaphore). If the client is reliable and returns quickly, you can use semaphore isolation instead of thread isolation to reduce overhead.

The size of the semaphore can be adjusted dynamically, but the thread pool size cannot.
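
Semaphore isolation can be sketched with java.util.concurrent.Semaphore. In this minimal illustration the dependent call runs on the caller's own thread and is rejected straight away when no permit is free; the class name, the execute method, and the fallback supplier are assumptions for the example, not Hystrix API.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class SemaphoreIsolationDemo {
    private final Semaphore permits;

    SemaphoreIsolationDemo(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the call on the caller's thread if a permit is free;
    // otherwise returns the fallback without waiting.
    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // concurrency limit reached: reject at once
        }
        try {
            return call.get();
        } finally {
            permits.release();     // always give the permit back
        }
    }

    public static void main(String[] args) {
        SemaphoreIsolationDemo isolation = new SemaphoreIsolationDemo(1);
        // The nested call stands in for a second concurrent request that
        // arrives while the only permit is still held.
        Supplier<String> nested =
                () -> isolation.execute(() -> "inner ran", () -> "inner rejected");
        System.out.println(isolation.execute(nested, () -> "outer rejected"));
        System.out.println(isolation.execute(() -> "ran", () -> "rejected"));
    }
}
```

No extra thread is created, which is why this approach is cheaper than thread isolation. The trade-off is that the caller's thread cannot walk away from a call that hangs.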

The difference between thread isolation and signal isolation is shown below: