Original: Taste of Little Sister (WeChat official account ID: XjjDog). Feel free to share; please keep the source attribution.

First, we need to clarify the context in which these terms appear: distributed, high-concurrency environments. If your product is unpopular and nobody uses it, it doesn't need any of these extras; low-concurrency systems get by just fine without them.

A distributed system is a whole, and its invocation relationships are complex. If one resource misbehaves, cascading failures may follow. When the system is under excessive pressure, its containers or hosts become unusually fragile; load spikes, rejected requests, and even full avalanches can have serious consequences.

Given how fragile distributed systems can be under stress, we have a variety of tools for dealing with these anomalies. Below is a brief look at each scenario, together with the common techniques.

1. Rate limiting

“My posts are being throttled!” Even people who don't work in the Internet industry will say this with confidence. But when they do, they are not talking about rate limiting in the high-concurrency sense; it's only a loose, figurative use of the term.

In web development, Tomcat defaults to a pool of 200 worker threads. As more requests come in and no free thread is available to handle them, requests queue up and wait. To the user this looks like the browser spinner going round and round (as long as the backlog stays within acceptCount), even if all you asked for was a simple Hello World.
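
As a rough illustration, assuming a Spring Boot application with embedded Tomcat (the configuration class below is hypothetical), both the worker-thread cap and the accept backlog can be tuned on the connector:

```java
import org.apache.catalina.connector.Connector;
import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TomcatLimitConfig {

    // Cap the worker pool at 200 threads; requests beyond that queue up to
    // acceptCount connections, after which new connections are refused.
    @Bean
    public WebServerFactoryCustomizer<TomcatServletWebServerFactory> tomcatLimits() {
        return factory -> factory.addConnectorCustomizers((Connector connector) -> {
            connector.setProperty("maxThreads", "200");
            connector.setProperty("acceptCount", "100");
        });
    }
}
```

For a standalone Tomcat deployment, the same maxThreads and acceptCount attributes can be set directly on the Connector in server.xml.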

You can view this process as a form of rate limiting too. In essence, it puts a cap on a resource; requests beyond that cap are buffered or simply fail.

Rate limiting has a particular meaning in high-concurrency scenarios: it mainly protects underlying resources. If you want to call a service, you must first obtain permission to do so. The limit is typically set by the service provider and constrains what callers may do.

For example, a service serves callers A, B, and C. Based on traffic estimates made in advance, requests from A, B, and C are limited to 1,000/s, 2,000/s, and 10,000/s respectively. At any given moment, some callers may see their requests rejected while others run normally. Rate limiting is best viewed as the server's capacity for self-protection.
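
A sketch of what such per-caller quotas could look like on the provider side, using Guava's RateLimiter (a smoothed token bucket) for brevity; the class and method names here are made up:

```java
import com.google.common.util.concurrent.RateLimiter;
import java.util.Map;

// Hypothetical example: one limiter per caller, with the quotas from the
// example above (A = 1,000/s, B = 2,000/s, C = 10,000/s).
public class CallerQuotas {

    private final Map<String, RateLimiter> limiters = Map.of(
            "A", RateLimiter.create(1_000),
            "B", RateLimiter.create(2_000),
            "C", RateLimiter.create(10_000));

    public boolean tryHandle(String caller) {
        RateLimiter limiter = limiters.get(caller);
        // Unknown callers and callers over quota are rejected immediately.
        return limiter != null && limiter.tryAcquire();
    }
}
```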

Common rate-limiting algorithms include counters, leaky buckets, and token buckets. The counter algorithm cannot limit traffic smoothly, however, so it is seldom used in practice.
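
For reference, a minimal token-bucket sketch (illustrative only, not production code): tokens refill continuously at a fixed rate up to the bucket's capacity, and each request must obtain a token or be rejected.

```java
// Minimal token bucket: refill at a fixed rate, reject when no token is left.
public class TokenBucket {

    private final long capacity;
    private final double refillPerNano;
    private double tokens;
    private long lastRefill;

    public TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefill = System.nanoTime();
    }

    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Add the tokens accumulated since the last call, up to capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;   // over the limit: reject the request
    }
}
```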

“High-Concurrency Rate Limiting: What on Earth Is Being Limited?”

“Semaphore Rate Limiting: Secrets of High-Concurrency Scenarios”

2. Circuit breaking

To put it playfully: if the emperor of the microservice kingdom wants a quiet night, he has to rely on the circuit breaker to cut off the troublemaker. The main purpose of a circuit breaker is to prevent a service avalanche.

As shown in the figure, A→B→C call each other in sequence, but service C is the one likely to have problems (excessive traffic, errors, and so on). Those problems make threads wait indefinitely, dragging down the whole call chain and exhausting thread resources.

As the name suggests, a circuit breaker works like a fuse that blows when the load exceeds its limit. Of course, we can reconnect it once the back-end service has recovered. The circuit-breaking function is usually provided on the calling side and applied to minor, bypass requests, so that exceptions or timeouts in those minor services do not disturb the normal, important business logic.

In terms of implementation, you can think of a circuit breaker as a proxy. When the breaker is open, the service stops accessing the protected resource and instead returns a fixed default result, or one that does not require a remote call.
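
A bare-bones sketch of that proxy idea follows (hypothetical class; real projects would normally reach for a library such as Resilience4j, Sentinel, or Hystrix rather than rolling their own):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// After `failureThreshold` consecutive failures the breaker opens and the
// fallback is returned for `openMillis`, after which a trial call is allowed.
public class SimpleCircuitBreaker<T> {

    private final int failureThreshold;
    private final long openMillis;
    private final AtomicInteger failures = new AtomicInteger();
    private final AtomicLong openedAt = new AtomicLong(0);

    public SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    public T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        long opened = openedAt.get();
        if (opened != 0 && System.currentTimeMillis() - opened < openMillis) {
            return fallback.get();                       // breaker open: skip the remote call
        }
        try {
            T result = remoteCall.get();
            failures.set(0);                             // success closes the breaker
            openedAt.set(0);
            return result;
        } catch (RuntimeException e) {
            if (failures.incrementAndGet() >= failureThreshold) {
                openedAt.set(System.currentTimeMillis()); // too many failures: open the breaker
            }
            return fallback.get();
        }
    }
}
```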

“Light Close and Slow Twist: the Microservice Circuit Breaker in Charge”

3. Degradation

Degradation is a vaguer term. Rate limiting and circuit breaking can, to some extent, also be regarded as forms of degradation. But degradation as it is usually meant operates at a higher level.

Degradation generally considers the distributed system as a whole and cuts traffic off at its source. For example, during Double 11, some non-essential services are suspended to protect the trading system and avoid resource contention. Service degradation usually involves human intervention: when certain services are deliberately made unavailable, that is typically a degradation measure.

Where is the best place to degrade? At the entrance: Nginx, DNS, and the like.

Some Internet applications have the concept of a Minimum Viable Product (MVP): the smallest set of functionality that must keep working, with a very high SLA requirement. A series of service-unbundling efforts is organized around this minimum viable core, even if that sometimes means rewriting parts of the system.

For example, in extreme cases an e-commerce system only needs to display goods and sell them. Supporting systems such as reviews and recommendations can be switched off temporarily. Physical deployment and invocation relationships should be designed with these scenarios in mind.
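
A sketch of such a degradation switch (all names here are hypothetical; in practice the flag would usually live in a configuration center and be flipped by operators or automated rules):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// When the switch is flipped (e.g. before a big promotion), the recommendation
// call is skipped entirely and a safe default is returned, freeing resources
// for the core trading path.
public class RecommendationFacade {

    private static final AtomicBoolean DEGRADED = new AtomicBoolean(false);

    public static void setDegraded(boolean degraded) {
        DEGRADED.set(degraded);
    }

    public List<String> recommendFor(String userId) {
        if (DEGRADED.get()) {
            return List.of();                  // degraded: no recommendations shown
        }
        return callRecommendationService(userId);
    }

    private List<String> callRecommendationService(String userId) {
        // Placeholder for the real remote call.
        return List.of("item-1", "item-2");
    }
}
```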

4. Warm-up

Take a look at the following situation.

In a high-concurrency environment, a DB process dies and restarts during the service peak, and the upstream load-balancing policy reallocates traffic. The freshly started DB instantly receives a third of the traffic; its load then climbs wildly until it stops responding altogether.

The cause is that the newly started DB has cold caches, so its state is completely different from normal operation. Perhaps even a tenth of the usual traffic would be enough to kill it.

Similarly, a JVM process that has just started responds slowly on every interface, because the bytecode has not yet been optimized by the JIT compiler. If the load-balancing component calling it does not account for this start-up phase and routes the usual 1/n of traffic to the node, problems arise easily.

Therefore, we want the load-balancing component to ramp traffic up gradually, based on how long the JVM process has been running, warming the service up until it reaches its normal traffic level.
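
One common approach, similar in spirit to the warm-up weighting found in Dubbo-style load balancers (the code below is a simplified sketch, not the actual implementation of any particular framework): a freshly started node gets only a fraction of its nominal weight, growing linearly over the warm-up window.

```java
// Compute a node's effective load-balancing weight during warm-up.
public final class WarmupWeight {

    private WarmupWeight() {}

    public static int weight(long startMillis, int nominalWeight, long warmupMillis) {
        long uptime = System.currentTimeMillis() - startMillis;
        if (uptime <= 0) {
            return 1;                          // just started: minimal traffic
        }
        if (uptime >= warmupMillis) {
            return nominalWeight;              // fully warmed up: normal traffic
        }
        // Linear ramp from ~0 to the nominal weight over the warm-up window.
        int w = (int) (nominalWeight * uptime / (double) warmupMillis);
        return Math.max(1, Math.min(w, nominalWeight));
    }
}
```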

“Without Warm-up, It's Not High Concurrency, Just Concurrency That Happens to Be High”

5. Back pressure

Consider the following two scenarios:

  1. No rate limiting at all. With unbounded incoming traffic, too many concurrent requests can easily crash the back-end service or cause memory overflow
  2. Conventional rate limiting. You give an interface a hard maximum and reject everything above it outright, even though the back-end service could actually have handled those requests

How do you adjust the limit dynamically? That requires a mechanism: the caller needs to know the capacity of the callee, which means the callee must be able to feed that information back. Back pressure is, in effect, a kind of intelligent rate limiting; it refers to a strategy rather than a specific tool.

The idea behind back pressure is that the callee does not simply drop the caller's traffic; instead it continuously feeds back its own processing capacity, and the caller adjusts its sending rate in real time based on that feedback. A classic example is the sliding-window flow control in TCP/IP.

Reactive programming is an embodiment of the observer pattern. Most reactive applications are event-driven, non-blocking, and elastic, delivering data as streams. In this setting, implementing back pressure is much simpler.
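
A small demonstration using the JDK's built-in Flow (Reactive Streams) API: the subscriber pulls one item at a time with request(1), so the publisher can never run ahead of the consumer; that request(n) handshake is the back pressure.

```java
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

public class BackpressureDemo {
    public static void main(String[] args) throws InterruptedException {
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<>() {
                private Flow.Subscription subscription;

                @Override public void onSubscribe(Flow.Subscription s) {
                    this.subscription = s;
                    s.request(1);                        // ask for the first item
                }

                @Override public void onNext(Integer item) {
                    System.out.println("processed " + item);
                    subscription.request(1);             // pull the next item only when ready
                }

                @Override public void onError(Throwable t) { t.printStackTrace(); }
                @Override public void onComplete() { System.out.println("done"); }
            });

            for (int i = 0; i < 5; i++) {
                publisher.submit(i);                     // blocks if the buffer is full
            }
        }
        TimeUnit.MILLISECONDS.sleep(200);                // let the async consumer finish
    }
}
```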

Back pressure makes a system more stable and more efficient, and gives it extra flexibility and intelligence.

Conclusion

A quick summary:

  • Rate limiting: set an upper limit; traffic beyond the system's capacity is denied service
  • Circuit breaking: faults in minor bypass services must not trigger a system avalanche; to master this art, you must first be willing to cut something off
  • Degradation: starting from the request entrance, broadly disable requests that would overload the system
  • Warm-up: give the system time to warm up and load its caches, avoiding resource deadlock
  • Back pressure: the callee reports its capacity back to the caller; gentle calls require solid communication

In simple terms, as long as traffic never enters the system, everything is easy to handle; degradation is the most powerful, most domineering means of all. Once traffic has entered the system, it is governed by the system's internal rules, and rate limiting is the most direct way to keep requests out: the user's request may fail, but my system is still alive. A system without circuit breakers is brittle; it is all too easy for third-rate functions to interfere with first-class ones, so switch the breaker on at the right time. As for warm-up, it is merely the foreplay before the sparks fly, building up to the service's peak. Finally, compared with the fire-and-forget model, if the callee can report its own status back, the caller can increase or decrease power as needed. That is the idea of back pressure.

These are all effective means of dealing with limited resources. But if the company has money and elastic capacity, all of them become secondary. After all, once every service can report its status to a monitoring center, the monitoring center can scale out flexibly; as long as the services are unbundled to a sufficient degree, all we need to do is add instances.

Xjjdog is a public account that keeps programmers from getting sidetracked. It focuses on infrastructure and Linux. Ten years of architecture, tens of billions of requests in daily traffic: come explore the world of high concurrency and enjoy a different flavor. My personal WeChat is xjjdog0; feel free to add me as a friend for further discussion.