In the previous article, we introduced the architecture of a seckill (flash sale) system, which touched on rate limiting. Due to limited space, we did not cover that topic there, so this article focuses on rate limiting in detail.

To make things easier to understand, let’s start with the business scenario.

I. Business scenario

The seckill campaign offers 100 special items in total, each at a very low price, and is scheduled to open at 10:10 PM on October 10.

Our server architecture at the time was as follows: all client API requests first entered an NGINX layer, were forwarded from NGINX to the gateway layer (Java, using Spring Cloud Zuul), and finally reached the backend services (Java).

We predicted a massive influx of users the moment the seckill opened, more than the system could handle. To ensure the servers could withstand the surge, our only option was to rate-limit and let just part of the traffic through to the backend services.

So what is rate limiting? People often discuss it together with circuit breaking, but in fact the two are different.

Circuit breaking happens on the service caller's side. For example, Service A needs to invoke Service B, and after several failed calls it concludes that Service B is faulty and cannot be invoked for now. Service A then trips a circuit breaker and stops calling Service B for a period of time.

Rate limiting generally happens on the side of the service being called, primarily at the gateway layer. For example, if a backend service on an e-commerce site can only handle 100,000 requests per second and a million requests suddenly flood in, what do we do? We can discard 90% of the requests without processing them and concentrate on the remaining 10%, ensuring that at least 100,000 users can operate normally. (That may sound like an exaggeration, but in a real seckill scenario it does not matter if we throw away 99% of the traffic.)

Back to the business scenario: our requirement this time is to cap the TPS of the seckill at 100 transactions per second by rate limiting at some layer (the seckill has only 100 items in stock, so there will be at most 100 completed transactions, and we want those 100 transactions to finish within 1 second). How do we achieve that? This is where the common rate limiting algorithms come in.

II. Rate limiting algorithms

As far as I know, the rate limiting algorithms in common use fall into four types: fixed time window counting, sliding time window counting, leaky bucket, and token bucket. Each is described below.

(1) Fixed time window counting algorithm

Suppose our requirement is that the backend service handle at most 500 requests every 5 seconds. The fixed window approach opens a new time window every 5 seconds and counts the requests arriving in it; once the count reaches 500, any further requests in that window are rejected. To make this easier to follow, we have laid it out in the table below.

The fixed time window algorithm looks like it will do the trick, but it has a flaw. Let's take a closer look.

Suppose 200 requests arrive during seconds 1-4, 300 requests in second 5, 499 requests during seconds 6-9, and 1 request in second 10. That makes 500 requests in the window covering seconds 1-5 and 500 requests in the window covering seconds 6-10.

By this algorithm's accounting, the traffic never exceeds the threshold. But look closely at seconds 5 through 9: they received 300 + 499 = 799 requests, exceeding the 500-request threshold by 299, which the server obviously cannot handle.

Therefore, the fixed time window counting algorithm is not practical in reality.
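
Despite the flaw, the fixed window counter is the simplest of the four to implement and worth seeing once in code. Below is a minimal sketch, assuming a 5-second window and a 500-request cap; the class and method names are illustrative, not from our production code.

```java
// A minimal fixed-time-window counter (illustrative sketch).
public class FixedWindowLimiter {
    private final long windowMillis; // window length, e.g. 5000 ms
    private final int maxRequests;   // limit per window, e.g. 500
    private long windowStart = System.currentTimeMillis();
    private int count = 0;

    public FixedWindowLimiter(long windowMillis, int maxRequests) {
        this.windowMillis = windowMillis;
        this.maxRequests = maxRequests;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            // A new window begins: reset the counter.
            windowStart = now;
            count = 0;
        }
        if (count < maxRequests) {
            count++;      // admit the request
            return true;
        }
        return false;     // window is full, reject
    }
}

// Usage: new FixedWindowLimiter(5000, 500).tryAcquire()
```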

(2) Sliding time window counting algorithm

Suppose our requirement is that the backend service handle 100 requests per second. The sliding window counting algorithm divides time into small intervals of 100 milliseconds each and counts the requests within every interval. It then sums the counts of the most recent 10 intervals; once the total reaches the maximum, any extra requests are discarded. When time moves into the next interval (say, the 11th), we stop counting the 1st interval and instead sum the 2nd through 11th intervals, and so on, as shown in the figure below.

The sliding time window counting algorithm still cannot guarantee a 100% accurate per-second count, but it greatly reduces the chance that a burst exceeding the threshold slips through undetected. For example, if requests pile up at the tail of one 100 ms interval and the head of the next, a burst over the limit within that short span may still go unnoticed.

Of course, we can divide the window more finely, say into 10 ms intervals. The finer the interval, the more accurate the accounting, but also the more resources the bookkeeping consumes.
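
Here is a minimal sketch of a sliding window counter: the window is split into N sub-intervals kept in a ring buffer, and a request is admitted only if the sum over the last N sub-intervals is below the limit. All names are illustrative.

```java
// A minimal sliding-time-window counter (illustrative sketch).
public class SlidingWindowLimiter {
    private final int maxRequests;     // limit per full window, e.g. 100
    private final long intervalMillis; // sub-interval length, e.g. 100 ms
    private final int[] buckets;       // per-interval counters (ring buffer)
    private long lastIndex;            // index of the current sub-interval

    public SlidingWindowLimiter(int maxRequests, int intervals, long intervalMillis) {
        this.maxRequests = maxRequests;
        this.intervalMillis = intervalMillis;
        this.buckets = new int[intervals];
        this.lastIndex = System.currentTimeMillis() / intervalMillis;
    }

    public synchronized boolean tryAcquire() {
        long nowIndex = System.currentTimeMillis() / intervalMillis;
        // Zero out the sub-intervals the window has slid past.
        long gap = Math.min(nowIndex - lastIndex, buckets.length);
        for (long i = 1; i <= gap; i++) {
            buckets[(int) ((lastIndex + i) % buckets.length)] = 0;
        }
        lastIndex = nowIndex;
        int total = 0;
        for (int b : buckets) total += b;
        if (total >= maxRequests) return false; // window is full, drop
        buckets[(int) (nowIndex % buckets.length)]++;
        return true;
    }
}

// Usage: new SlidingWindowLimiter(100, 10, 100) => 100 requests per second
```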

So far the algorithm seems to meet our needs. But consider our scenario: there are only 100 items in stock. If we cap TPS at 100 transactions per second with a 1-second sliding window split into 10 intervals of 100 ms each, the entire quota of 100 requests can be consumed within the first 100 ms, at which point the goods are already sold out.

The question then becomes: what kind of user can complete the whole flow of clicking buy, placing an order, and submitting it within 100 ms? Only robots can, which means the seckill goods would go almost entirely to bots. That is not what we want.

(3) Leaky bucket algorithm

Let's first illustrate the idea of the leaky bucket algorithm with a picture, so you can grasp it at a glance.

From the figure above, we can see that the leaky bucket algorithm works in three steps:

  1. Any incoming request first enters the leaky bucket's queue;
  2. Requests in the queue are processed at a fixed rate;
  3. New requests that exceed the bucket's capacity are simply dropped and never enter the queue.

For the earlier example of 100 requests per second, we can set the outflow rate to 100 r/s (that is, one request is processed every 10 ms) and the bucket size to 100.
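
A minimal sketch of this setup: incoming requests queue up in the bucket (capacity 100) and a drain thread releases one request every 10 ms. The class is illustrative, not our production code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A minimal leaky-bucket sketch: requests queue up in the bucket and a
// drain thread releases them at a fixed rate.
public class LeakyBucketLimiter {
    private final int capacity;             // bucket size, e.g. 100
    private final long drainIntervalMillis; // e.g. 10 ms => 100 r/s
    private final Deque<Runnable> bucket = new ArrayDeque<>();

    public LeakyBucketLimiter(int capacity, long drainIntervalMillis) {
        this.capacity = capacity;
        this.drainIntervalMillis = drainIntervalMillis;
        Thread drainer = new Thread(this::drain);
        drainer.setDaemon(true);
        drainer.start();
    }

    /** Returns false (request dropped) when the bucket is full. */
    public synchronized boolean offer(Runnable request) {
        if (bucket.size() >= capacity) return false;
        bucket.addLast(request);
        return true;
    }

    private void drain() {
        while (true) {
            Runnable request;
            synchronized (this) {
                request = bucket.pollFirst();
            }
            if (request != null) request.run(); // process one request
            try {
                Thread.sleep(drainIntervalMillis); // constant outflow rate
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}
```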

Because the leaky bucket processes requests first-in, first-out, the first 100 requests to arrive are exactly the ones that get processed, which is the same bot problem we saw with the sliding time window.

So if we set the bucket size to 1, the job seems done. But we had other considerations, which we will come back to shortly.

(4) Token bucket algorithm

The idea of the token bucket algorithm is as follows:

  1. Tokens are generated at a fixed rate and placed in the token bucket; if the bucket is full, no new tokens are generated.
  2. Every incoming request must consume a token from the bucket before it is processed.
  3. If there are tokens in the bucket, the request consumes one and proceeds;
  4. If the bucket is empty, the request enters a queue and waits for a new token;
  5. If the waiting queue is also full, the new request is simply dropped.

Applying this to the earlier example of 100 requests per second: with the token bucket algorithm, we set the token generation rate to 100/s and the token waiting queue length to 0, which satisfies our seckill rate limiting requirement.

What about the bucket capacity? If we set it to 100 and the bucket fills up before the seckill begins, then the 100 requests arriving at the instant the seckill starts are all released at once, and the bots grab all the goods again.

So we set the bucket capacity to 10 instead, which means at most 10 bots can grab the goods.
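
A minimal sketch matching those parameters: 100 tokens generated per second, bucket capacity 10, and a zero-length waiting queue, so a request that finds no token is dropped immediately. Names are illustrative.

```java
// A minimal token-bucket sketch with a zero-length waiting queue.
public class TokenBucketLimiter {
    private final int capacity;          // max tokens in the bucket, e.g. 10
    private final double tokensPerMilli; // refill rate, e.g. 100/s = 0.1/ms
    private double tokens;               // current token count
    private long lastRefill = System.currentTimeMillis();

    public TokenBucketLimiter(int capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.tokensPerMilli = tokensPerSecond / 1000.0;
        this.tokens = capacity; // start with a full bucket
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Refill tokens according to elapsed time, capped at capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * tokensPerMilli);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1; // consume one token, admit the request
            return true;
        }
        return false;    // queue length is 0: drop immediately
    }
}

// Usage: new TokenBucketLimiter(10, 100).tryAcquire()
```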

III. Implementation of the scheme

Having sorted out the common rate limiting algorithms, we were ready to design the actual scheme, but first we had to answer the following four questions.

(1) Token bucket or leaky bucket?

As discussed above, both the token bucket and the leaky bucket algorithm can satisfy our requirement. But we wanted the rate limiting mechanism to serve not only the seckill feature but other rate limiting scenarios as well.

For example, when the servers are idle, they could in theory absorb a traffic peak directly. But the leaky bucket drains at a constant rate, so even idle servers would process requests only at the fixed leak rate, which makes the leaky bucket a poor fit for those other scenarios.

The token bucket algorithm does not have this problem, because the bucket can fill with tokens while idle and release a burst at once. So we ended up choosing the token bucket.

(2) Rate limit in NGINX or in the gateway layer?

For the business scenario above, we decided to implement rate limiting at the gateway layer, for two reasons:

  • NGINX does have a rate limiting module that caps requests per client, but it is based on the leaky bucket algorithm, which we had just ruled out;
  • We wanted to adjust the rate limiting configuration dynamically, and since we were not familiar with NGINX + Lua, our configuration management system could not directly manipulate NGINX's settings.

(3) Distributed rate limiting or single-machine rate limiting?

With single-machine rate limiting, we need to know the number of servers in advance and split the 100 TPS evenly across them, converting the global limit into a per-node limit.

With distributed rate limiting, we would store the token bucket data in something like Redis, meaning every request would have to hit Redis. At the start of a seckill the order volume is enormous, and Redis might not withstand such a high QPS.

Choosing the lesser of two evils, we finally settled on single-machine rate limiting.

(4) Which open source technology should we use?

In the end, we implemented rate limiting with the RateLimiter class from Google's open-source Guava library, which is an implementation based on the token bucket algorithm.

To wire it in, we define a Zuul filter and use Guava's RateLimiter to gate the submit-order API.

When using RateLimiter, we need to set the following three configuration items:

  • permitsPerSecond: the number of requests allowed per second;
  • warmupPeriod: how long it takes the token bucket to fill up;
  • the tryAcquire timeout: how long a request may wait for a new token when the bucket is empty.

We configured these items as follows:

  • permitsPerSecond is set to 100/10 = 10, where 100 is the desired overall TPS and 10 is the number of gateway nodes;
  • warmupPeriod is set to 100 ms;
  • the tryAcquire timeout is set to 0, meaning a request that cannot get a token immediately is dropped without waiting.

permitsPerSecond = 10 means each node generates 10 tokens per second. warmupPeriod = 100 ms means the bucket takes 100 ms to fill from empty. Even if each node's bucket holds only 1 token, with 10 gateway servers the cluster-wide bucket size is effectively 10. (As mentioned earlier, we want the total bucket capacity to be 10 in case all the early requests are from bots.)
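
To make this concrete, here is a minimal sketch of such a Zuul pre-filter using Guava's RateLimiter, assuming Spring Cloud Netflix Zuul is on the classpath; the submit-order URI and the filter order are illustrative, not our production values.

```java
import com.google.common.util.concurrent.RateLimiter;
import com.netflix.zuul.ZuulFilter;
import com.netflix.zuul.context.RequestContext;
import org.springframework.stereotype.Component;

import java.util.concurrent.TimeUnit;

// A per-node rate limiting filter for the submit-order API (sketch).
@Component
public class SeckillRateLimitFilter extends ZuulFilter {

    // 10 permits/s per node with a 100 ms warmup, as configured above.
    private final RateLimiter rateLimiter =
            RateLimiter.create(10, 100, TimeUnit.MILLISECONDS);

    @Override
    public String filterType() {
        return "pre"; // run before the request is routed
    }

    @Override
    public int filterOrder() {
        return 1;
    }

    @Override
    public boolean shouldFilter() {
        // Only limit the submit-order API (URI is illustrative).
        RequestContext ctx = RequestContext.getCurrentContext();
        return ctx.getRequest().getRequestURI().contains("/order/submit");
    }

    @Override
    public Object run() {
        // Timeout 0: if no token is available right now, drop the request.
        if (!rateLimiter.tryAcquire(0, TimeUnit.MILLISECONDS)) {
            RequestContext ctx = RequestContext.getCurrentContext();
            ctx.setSendZuulResponse(false); // stop routing to the backend
            ctx.setResponseStatusCode(429); // Too Many Requests
            ctx.setResponseBody("{\"error\":\"rate limited\"}");
        }
        return null;
    }
}
```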

IV. Considerations for the rate limiting scheme

I ran into quite a few problems while building rate limiting schemes. The main points to watch out for are summarized below; I hope they help you.

(1) Return a dedicated error code to the client

To give users a good experience, the interface should surface as few raw errors as possible, so a request dropped by the rate limiter should be answered with a dedicated HTTP status code (such as 429 Too Many Requests) that the client can handle specially.

(2) Real-time monitoring

In practice, we should log every rate limiting decision and aggregate the logs in real time. That lets us monitor the rate limiting behavior live and react quickly if anything goes wrong.

(3) Real-time configuration

Because rate limiting is also needed in scenarios other than the seckill, it is best to manage the token bucket dynamically through a configuration center, with settings taking effect in real time; that also makes other rate limiting scenarios easier to manage.

(4) Rate limiting configuration for scenarios other than the seckill

For this seckill we could trivially set the TPS cap to 100, but in ordinary rate limiting scenarios, the TPS or QPS threshold (other scenarios may limit QPS rather than TPS) must be derived from actual load test results.

V. Contact me

WeChat official account: server technology selection

Personal blog site: http://www.jiangyi.cool