Rate limiter algorithms

Rate limiters generally rely on one of five algorithms: token bucket, leaky bucket, fixed window, sliding log (what is usually meant by "sliding window" in the broad sense), and sliding window (here meaning the combination of the sliding log and fixed window algorithms).

1. Token bucket

The token bucket algorithm is used to control the amount of data sent to the network over a period of time and to allow burst data to be sent.

The algorithm is roughly as follows: assuming r requests per second are allowed, one token is added to the bucket every 1/r seconds, and the bucket holds at most B tokens. When a request of size N arrives, check whether the bucket contains at least N tokens. If so, the token count is reduced by N and the request passes; if not, the rejection policy is triggered.

The bucket has a fixed capacity, and each request has a size in tokens. When a request is checked against the defined limits, the bucket is inspected for enough tokens: if there are, the tokens are removed and the request passes; otherwise another action is taken, usually rejection. Tokens are replenished at the allowed request rate (depending on the bucket size, the observed rate may temporarily exceed this, but it falls back to the replenishment rate as the accumulated tokens are consumed).

If tokens are not consumed, or are consumed more slowly than they are generated, they accumulate until the bucket is full. As you can see, the token bucket allows a degree of burst traffic while keeping the overall request rate bounded.
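
As a point of reference, here is a minimal single-node sketch of the algorithm above; the class and method names (`TokenBucket`, `tryAcquire`) are illustrative and not taken from any particular library, and tokens are refilled lazily from the elapsed time rather than by a background thread:

```java
import java.util.concurrent.TimeUnit;

/** Minimal single-node token bucket: refills lazily based on elapsed time. */
public class TokenBucket {
    private final double capacity;      // B: maximum number of tokens the bucket can hold
    private final double refillPerNano; // r tokens per second, expressed per nanosecond
    private double tokens;              // current token count
    private long lastRefillNanos;       // time of the last (lazy) refill

    public TokenBucket(double capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / TimeUnit.SECONDS.toNanos(1);
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    /** Try to take n tokens; returns false when the request should be rejected. */
    public synchronized boolean tryAcquire(int n) {
        long now = System.nanoTime();
        // Lazy refill: add tokens for the time elapsed since the last call, capped at capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * refillPerNano);
        lastRefillNanos = now;
        if (tokens >= n) {
            tokens -= n;
            return true;
        }
        return false;
    }
}
```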

The implementation of token buckets in a distributed environment needs to consider the following issues:

  1. How is the current state of the token bucket stored? Do you store only the current token count (for example, in a single Redis key-value pair), or the timestamp of every passed request (for example, in a Redis ZSET whose maximum size is the bucket capacity)?
  2. Who replenishes the token bucket? If only the current token count is stored, some process has to keep adding tokens to it at the configured rate, so how do you guarantee there is exactly one such process in a distributed environment, and what happens if it dies? If instead the timestamp of each passed request is stored, how do you keep the number of recorded requests bounded rather than logging every single one, and how do you derive the number of remaining tokens from the current request and the stored timestamps? (One common answer, sketched after this list, is to refill lazily inside the store itself.)
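
A common answer to both questions is to avoid a dedicated refill process entirely: store the token count together with the timestamp of the last refill, and refill lazily inside the store when a request arrives. A rough sketch, assuming Redis with the Jedis client, a hash holding `tokens` and `ts` fields, and a Lua script for atomicity (the key layout and all names are illustrative):

```java
import java.util.Arrays;
import redis.clients.jedis.Jedis;

/** Sketch: token bucket state kept in Redis and refilled lazily inside one atomic Lua script. */
public class RedisTokenBucket {
    // KEYS[1] = bucket key; ARGV = capacity, tokens per second, now (ms), requested tokens.
    private static final String SCRIPT =
        "local data = redis.call('HMGET', KEYS[1], 'tokens', 'ts') " +
        "local capacity = tonumber(ARGV[1]) " +
        "local rate = tonumber(ARGV[2]) " +
        "local now = tonumber(ARGV[3]) " +
        "local requested = tonumber(ARGV[4]) " +
        "local tokens = tonumber(data[1]) or capacity " +
        "local ts = tonumber(data[2]) or now " +
        // Lazy refill based on the time elapsed since the last call, capped at capacity.
        "tokens = math.min(capacity, tokens + (now - ts) / 1000 * rate) " +
        "local allowed = 0 " +
        "if tokens >= requested then tokens = tokens - requested; allowed = 1 end " +
        "redis.call('HMSET', KEYS[1], 'tokens', tokens, 'ts', now) " +
        "return allowed";

    private final Jedis jedis;

    public RedisTokenBucket(Jedis jedis) {
        this.jedis = jedis;
    }

    public boolean tryAcquire(String key, long capacity, double tokensPerSecond, int n) {
        Object result = jedis.eval(SCRIPT, Arrays.asList(key),
                Arrays.asList(String.valueOf(capacity), String.valueOf(tokensPerSecond),
                        String.valueOf(System.currentTimeMillis()), String.valueOf(n)));
        return Long.valueOf(1L).equals(result);
    }
}
```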

2. Leaky Bucket

The leaky bucket forces requests to be consumed at no more than a fixed maximum rate. Like a funnel, the amount of water poured in may vary, but the outflow can only reach a certain rate; unlike the token bucket, it does not allow even small bursts.

The algorithm is basically this: the implementation is a bounded FIFO (first in, first out) queue of size B; if the queue is full, arriving requests trigger the discard policy. Assuming the allowed rate is r requests per second, the requests in the queue are consumed at exactly that rate.
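
A minimal single-node sketch of this, using a bounded in-memory queue drained by a single scheduled consumer (the names `LeakyBucket` and `submit` are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Minimal leaky bucket: a bounded FIFO queue drained at a fixed rate by one consumer. */
public class LeakyBucket {
    private final BlockingQueue<Runnable> queue; // the bucket, capacity B
    private final ScheduledExecutorService drainer =
            Executors.newSingleThreadScheduledExecutor();

    public LeakyBucket(int capacity, double requestsPerSecond) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        long intervalNanos = (long) (TimeUnit.SECONDS.toNanos(1) / requestsPerSecond);
        // Drain at most one request per interval, i.e. at most r requests per second.
        drainer.scheduleAtFixedRate(() -> {
            Runnable request = queue.poll();
            if (request != null) {
                request.run();
            }
        }, 0, intervalNanos, TimeUnit.NANOSECONDS);
    }

    /** Enqueue a request; returns false (discard policy) when the bucket is full. */
    public boolean submit(Runnable request) {
        return queue.offer(request);
    }
}
```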

The implementation of leaky buckets in a distributed environment needs to consider the following issues:

  1. **How is the leaky bucket queue stored?** The queue needs to hold the timestamp of each admitted request and its corresponding consumption so that consumption stays smooth. It should also be lock-free, since a distributed lock would otherwise be involved. The queue is bounded at size B, and each time a request arrives it should be enqueued and the expired entries cleaned up.
  2. **How is consumption implemented?** How are the queued requests actually consumed? When a request arrives, the requests already in the queue can be used to compute how long the current request should wait, and the request then waits that long before executing (a sketch of this variant follows below). Alternatively, a queue with built-in delay support can be used.
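
A rough single-node sketch of the "compute the wait" variant from point 2: each arriving request is assigned the next free execution slot and sleeps until that slot arrives (all names are illustrative):

```java
import java.util.concurrent.TimeUnit;

/** Sketch: leaky bucket without a consumer thread; each request waits for its own slot. */
public class WaitingLeakyBucket {
    private final long intervalNanos; // 1/r seconds between two consecutive executions
    private final long capacity;      // B: at most this many requests may be waiting
    private long nextFreeSlotNanos;   // virtual time of the next free execution slot

    public WaitingLeakyBucket(long capacity, double requestsPerSecond) {
        this.capacity = capacity;
        this.intervalNanos = (long) (TimeUnit.SECONDS.toNanos(1) / requestsPerSecond);
        this.nextFreeSlotNanos = System.nanoTime();
    }

    /** Returns true after waiting until this request's slot, or false if the bucket is full. */
    public boolean acquire() throws InterruptedException {
        long slot;
        synchronized (this) {
            long now = System.nanoTime();
            long earliest = Math.max(now, nextFreeSlotNanos);
            // If the assigned slot is more than B intervals away, the queue is full: reject.
            if (earliest - now > capacity * intervalNanos) {
                return false;
            }
            slot = earliest;
            nextFreeSlotNanos = earliest + intervalNanos;
        }
        long waitNanos = slot - System.nanoTime();
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
        return true;
    }
}
```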

3. Fixed window

The fixed time window is relatively simple: time is divided into slices, and a limited number of requests are processed within each slice. This implementation is not very rigorous, but because it is simple it suits scenarios with less stringent requirements. The algorithm is roughly: assuming at most b requests are processed within n seconds, reset a counter to b every n seconds. When a request arrives, if the counter is sufficient it is decremented and the request passes; if it is insufficient, the rejection policy is triggered.
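
A minimal single-node sketch of the fixed window counter (names illustrative):

```java
/** Minimal fixed-window counter: the counter is reset whenever a new window starts. */
public class FixedWindowLimiter {
    private final long windowMillis; // n seconds, in milliseconds
    private final long maxRequests;  // b: maximum requests allowed per window
    private long windowStart;        // start of the current window
    private long count;              // requests already counted in the current window

    public FixedWindowLimiter(long windowSeconds, long maxRequests) {
        this.windowMillis = windowSeconds * 1000;
        this.maxRequests = maxRequests;
        this.windowStart = System.currentTimeMillis();
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            // A new window has begun: realign the window start and reset the counter.
            windowStart = now - (now - windowStart) % windowMillis;
            count = 0;
        }
        if (count < maxRequests) {
            count++;
            return true;
        }
        return false;
    }
}
```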

The fixed time window is the easiest algorithm to implement, but it has an obvious drawback: in many cases, especially when the rejection policy queues requests instead of dropping them, the budget is consumed quickly at the start of the window and nothing is processed for the remaining time, which is undesirable. Moreover, in the worst case the actual rate can reach twice the configured limit. For example, with a limit of 100 requests per second, suppose 100 requests arrive at 0.99 seconds and another 100 at 1.01 seconds; then 200 requests actually pass between 0.99 and 1.01 seconds, which is not strictly 100 requests per second. The following two algorithms exist to enforce the limit strictly.

4. Sliding Log

The sliding log controls the rate by checking the timestamp of the current request against the cached timestamps of previously received requests, which limits the request rate strictly. The sliding window algorithm generally mentioned on the Internet is in fact this sliding log algorithm; the sliding window described in this article is a further optimization of it, covered later.

The algorithm is roughly: suppose at most b requests are allowed in n seconds. At most b requests together with their timestamps are cached; call this set B. Every time a request arrives, delete from B all entries older than n seconds and check whether the set is full. If not, the request passes and is added to the set; if it is full, the rejection policy is triggered.
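
A minimal single-node sketch of the sliding log, keeping the timestamps of accepted requests in a deque (names illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Minimal sliding log: keeps the timestamps of requests accepted within the last n seconds. */
public class SlidingLogLimiter {
    private final long windowNanos;                      // n seconds, in nanoseconds
    private final int maxRequests;                       // b
    private final Deque<Long> log = new ArrayDeque<>();  // timestamps of accepted requests

    public SlidingLogLimiter(long windowSeconds, int maxRequests) {
        this.windowNanos = windowSeconds * 1_000_000_000L;
        this.maxRequests = maxRequests;
    }

    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Drop every logged request that has fallen out of the window.
        while (!log.isEmpty() && now - log.peekFirst() > windowNanos) {
            log.pollFirst();
        }
        if (log.size() < maxRequests) {
            log.addLast(now);
            return true;
        }
        return false;
    }
}
```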

The implementation of the sliding log in a distributed environment needs to consider the following issues:

  1. The algorithm above already keeps storage small, but in high-concurrency scenarios there may still be a lot of requests to cache (for example, with a limit of 100,000 requests per second, must the cache be able to hold 100,000 entries?). How should this cache be implemented?
  2. In high-concurrency scenarios, deleting all entries older than n seconds from this set must be very fast. If your cache's deletion by timestamp is slow, you can cache more than b requests and clean out the entries older than n seconds periodically instead of on every request; when a request arrives, check whether the b-th most recent entry exists and whether it is less than n seconds old, and if so trigger the rate-limiting policy. A sketch that keeps the log in a Redis sorted set, which addresses both points, follows this list.
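
One common way to address both points is to keep the log in a Redis sorted set, where trimming old entries and counting the rest are single commands. A rough sketch, assuming the Jedis client and a small Lua script for atomicity (the key layout and all names are illustrative):

```java
import java.util.Arrays;
import java.util.UUID;
import redis.clients.jedis.Jedis;

/** Sketch: sliding log kept in a Redis sorted set, trimmed and checked atomically via Lua. */
public class RedisSlidingLogLimiter {
    // KEYS[1] = log key; ARGV = window (ms), max requests b, now (ms), unique member.
    private static final String SCRIPT =
        // Drop every entry whose timestamp has fallen out of the window.
        "redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, tonumber(ARGV[3]) - tonumber(ARGV[1])) " +
        "if redis.call('ZCARD', KEYS[1]) < tonumber(ARGV[2]) then " +
        "  redis.call('ZADD', KEYS[1], ARGV[3], ARGV[4]) " +
        "  redis.call('PEXPIRE', KEYS[1], ARGV[1]) " +
        "  return 1 " +
        "end " +
        "return 0";

    private final Jedis jedis;

    public RedisSlidingLogLimiter(Jedis jedis) {
        this.jedis = jedis;
    }

    public boolean tryAcquire(String key, long windowMillis, int maxRequests) {
        long now = System.currentTimeMillis();
        String member = now + "-" + UUID.randomUUID(); // unique member per request
        Object result = jedis.eval(SCRIPT, Arrays.asList(key),
                Arrays.asList(String.valueOf(windowMillis), String.valueOf(maxRequests),
                        String.valueOf(now), member));
        return Long.valueOf(1L).equals(result);
    }
}
```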

5. Sliding window (sliding log + fixed window)

In the sliding log section we mentioned a problem: there can be a lot of requests to cache. Even when the architecture does not offer a cache well suited to this, the sliding window reduces the number of requests that need to be stored, and shrinking the set also reduces contention on the same collection.

The algorithm is roughly: suppose at most b requests are allowed in n seconds, and divide the n seconds into time slices of m milliseconds each. Only the latest request and its timestamp are cached for the current slice, while each previous slice keeps only a count of requests. This greatly reduces storage at the cost of slightly more computation. For the boundary condition, since there are n/m time slices before the current one, when counting the requests within the last n seconds we compute the percentage of the current slice that has elapsed; if that is, say, 25%, then only 75% of the requests in the oldest slice are counted.
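
A single-node sketch of this counting scheme: one counter per time slice kept in a ring buffer, with the oldest slice weighted by the share of it that still falls inside the window (all names are illustrative):

```java
/** Sketch of the sliding window counter: per-slice counts plus a weighted oldest slice. */
public class SlidingWindowLimiter {
    private final long sliceMillis; // m: size of one time slice in milliseconds
    private final int slices;       // n / m: number of slices covering the window
    private final long maxRequests; // b
    private final long[] counters;  // ring buffer: current slice plus the n/m previous slices
    private long lastSlice;         // absolute index (now / m) of the slice last written to

    public SlidingWindowLimiter(long windowSeconds, long sliceMillis, long maxRequests) {
        this.sliceMillis = sliceMillis;
        this.slices = (int) (windowSeconds * 1000 / sliceMillis);
        this.maxRequests = maxRequests;
        this.counters = new long[slices + 1];
        this.lastSlice = System.currentTimeMillis() / sliceMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        long slice = now / sliceMillis;
        // Zero the counters of slices that were skipped since the last request.
        for (long s = lastSlice + 1; s <= slice && s - lastSlice <= counters.length; s++) {
            counters[(int) Math.floorMod(s, (long) counters.length)] = 0;
        }
        lastSlice = slice;

        // Fraction of the current slice that has already elapsed, e.g. 0.25.
        double elapsed = (now % sliceMillis) / (double) sliceMillis;
        // The oldest slice only contributes its remaining (1 - elapsed) share, e.g. 0.75.
        double total = counters[(int) Math.floorMod(slice - slices, (long) counters.length)]
                * (1.0 - elapsed);
        for (long s = slice - slices + 1; s <= slice; s++) {
            total += counters[(int) Math.floorMod(s, (long) counters.length)];
        }
        if (total + 1 > maxRequests) {
            return false;
        }
        counters[(int) Math.floorMod(slice, (long) counters.length)]++;
        return true;
    }
}
```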