In service rate limiting, the usual approach is to cap the number of requests within a time period, typically with the fixed window algorithm (also known as the counter algorithm), which is simple and efficient to implement. In real-world scenarios, however, traffic is rarely uniform: occasionally a short burst arrives and then quickly subsides. In most cases such a burst does no real harm to the system, but the fixed window algorithm cannot handle it gracefully.

For example, suppose a data query interface is limited to 100 requests per second. Most of the time traffic stays below that number, but occasionally it spikes to 120 requests per second before quickly returning to normal. With the fixed window algorithm, the spike triggers rate limiting and interferes with normal user access, hurting the user experience. If the caller has retry logic, the system may then receive even more requests in subsequent time windows, causing more requests to be limited and more retries to be generated. This cycle places an ever heavier load on the system and, in severe cases, can crash it.

Assuming the 120 requests mentioned above have no real impact on system stability, such momentary bursts can be tolerated to a certain extent, which improves the user experience and avoids the extra load created by the limit-and-retry cycle. This article introduces the token bucket algorithm to handle this situation.

Algorithm principle

So how does the token bucket algorithm solve this problem? See the figure below:

As shown in the figure above, the basic principle of the algorithm is: there is a token bucket with capacity X, and Z tokens are put into the bucket every Y units of time; tokens that would push the count above X are discarded. To pass, a request must first take a token from the bucket; if no token is available, the request is rejected. Choosing X, Y, and Z is therefore critical. Z should be slightly larger than the number of requests expected per Y units of time under normal load, the state the system is in most of the time. X can be set to the maximum instantaneous burst the system can tolerate, a state the system must not stay in for long.
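To make the principle concrete, here is a minimal single-bucket sketch in Python (an illustration of the principle only, not the library's implementation; all names are made up). `capacity`, `refill_interval`, and `refill_amount` play the roles of X, Y, and Z:

```python
import time

def make_bucket(capacity, refill_amount, refill_interval):
    """Token bucket state: capacity X, Z tokens added every Y seconds."""
    return {
        "capacity": capacity,                # X: most tokens the bucket holds
        "refill_amount": refill_amount,      # Z: tokens added per interval
        "refill_interval": refill_interval,  # Y: seconds between refills
        "tokens": capacity,                  # start with a full bucket
        "last_refill": time.monotonic(),
    }

def try_acquire(bucket, amount=1, now=None):
    """Return True if the request may pass, consuming `amount` tokens."""
    now = time.monotonic() if now is None else now
    elapsed = now - bucket["last_refill"]
    if elapsed >= bucket["refill_interval"]:
        # Lazily add tokens for each full interval that has passed,
        # never exceeding the capacity X (excess tokens are discarded).
        intervals = int(elapsed // bucket["refill_interval"])
        bucket["tokens"] = min(
            bucket["tokens"] + intervals * bucket["refill_amount"],
            bucket["capacity"],
        )
        bucket["last_refill"] = now
    if bucket["tokens"] >= amount:
        bucket["tokens"] -= amount
        return True
    return False  # not enough tokens: the request is rejected
```

With X = 120, Y = 1 second, and Z = 100, steady traffic of 100 requests per second always passes, while a full bucket can absorb an occasional burst of up to 120 requests in a single second.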

Algorithm implementation

There are two common implementations: an in-process (in-memory) token bucket, and a token bucket backed by Redis.

In-process or in-memory token bucket algorithm

Here, the token refill is calculated lazily at request time; there is no separate background refill process. This is a bit more involved than the fixed window algorithm, but easy to follow if you read it carefully.

Use a dictionary where the key is the rate limiting target and the value holds the bucket's current token count and the last refill time. In the initial state, each target's bucket is considered full (token count = bucket capacity), but a bucket is only created the first time its target is actually seen during processing.

When a request comes in, look up its rate limiting target in the dictionary:

  • If no token bucket is found, create one with its token count set to: bucket capacity − tokens consumed by this request, and set the last refill time to the current time.

  • If a bucket is found, compute the interval between the current time and the last refill time:

    • If it is greater than or equal to the refill interval, the new token count is: Min(current token count + refill amount, bucket capacity) − tokens consumed by this request, and the last refill time is set to the current time.
    • If it is smaller than the refill interval, the new token count is: current token count − tokens consumed by this request.
    • If the computed token count is less than 0, rate limiting is triggered and the bucket is left unchanged; otherwise the bucket is updated.

In C#, MemoryCache works well here: cache entries have an expiration time, so buckets that are rarely or no longer used are recycled automatically, reducing the memory footprint.
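The steps above can be sketched as follows (illustrative Python rather than the library's C# internals; the constants and names are assumptions for the example):

```python
import time

CAPACITY = 100          # token bucket capacity
REFILL_AMOUNT = 100     # tokens placed per refill interval
REFILL_INTERVAL = 1.0   # refill interval in seconds

# Key: rate limiting target; value: [token count, last refill time].
# A real implementation would use an expiring cache (e.g. MemoryCache in C#)
# so that idle buckets are recycled.
buckets = {}

def allow(target, consume=1, now=None):
    """Return True if the request passes, consuming `consume` tokens."""
    now = time.monotonic() if now is None else now
    bucket = buckets.get(target)
    if bucket is None:
        # No bucket yet: create it as full, minus this request's tokens.
        buckets[target] = [CAPACITY - consume, now]
        return True
    tokens, last_refill = bucket
    if now - last_refill >= REFILL_INTERVAL:
        # Interval elapsed: refill first, capped at capacity, then consume.
        tokens = min(tokens + REFILL_AMOUNT, CAPACITY) - consume
        last_refill = now
    else:
        tokens -= consume
    if tokens < 0:
        return False  # limiting triggered; bucket state is left unchanged
    bucket[0], bucket[1] = tokens, last_refill
    return True
```

Note that when a request is rejected, the stored state is not touched, so a burst of rejected requests cannot drive the token count negative.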

The in-process algorithm is best suited to single-instance services. With multiple instances, requests may be distributed unevenly across them, so the overall limiting effect cannot be guaranteed.

Token bucket algorithm based on Redis

Redis stores data as key-value pairs, much like a dictionary, and also supports expiration. When a request is processed, the rate limiting target is extracted from it and looked up in Redis. The processing rules are the same as in the in-memory algorithm, but two Redis keys are used:

  • The target's token bucket, whose value is the current token count.
  • The target's last refill time, whose value is the timestamp of the last token placement.

This logic can be encapsulated in a Lua script. Because a Lua script executes atomically in Redis, the rate limiting counts remain accurate even in a distributed deployment.
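To illustrate the two-key layout, here is the same logic with a plain dict standing in for Redis (a sketch only; the key names are made up, and in production this whole body would run inside Redis as a single atomic Lua script):

```python
import time

store = {}  # stands in for Redis; both keys would normally carry a TTL

CAPACITY = 100
REFILL_AMOUNT = 100
REFILL_INTERVAL = 1.0

def redis_style_allow(target, consume=1, now=None):
    """Token bucket check using two keys per target, Redis-style."""
    now = time.time() if now is None else now
    tokens_key = f"rl:{target}:tokens"   # key 1: current token count
    refill_key = f"rl:{target}:refill"   # key 2: last refill timestamp
    if tokens_key not in store:
        # First request for this target: create both keys, bucket starts full.
        store[tokens_key] = CAPACITY - consume
        store[refill_key] = now
        return True
    tokens, last_refill = store[tokens_key], store[refill_key]
    if now - last_refill >= REFILL_INTERVAL:
        tokens = min(tokens + REFILL_AMOUNT, CAPACITY) - consume
        last_refill = now
    else:
        tokens -= consume
    if tokens < 0:
        return False  # limited; the stored state is left untouched
    store[tokens_key], store[refill_key] = tokens, last_refill
    return True
```

Outside of a script, the read-modify-write across the two keys would race under concurrent requests; wrapping it in a Lua script makes the whole check-and-update atomic.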

Application of algorithm

This section uses the rate limiting component FireflySoft.RateLimit as an example to apply the token bucket algorithm in ASP.NET Core.

1. Install the NuGet package

There are many ways to install it; choose whichever you like.

Package manager command:

Install-Package FireflySoft.RateLimit.AspNetCore

Or the .NET CLI command:

dotnet add package FireflySoft.RateLimit.AspNetCore

Or add directly to the project file:

<ItemGroup>
  <PackageReference Include="FireflySoft.RateLimit.AspNetCore" Version="2.*" />
</ItemGroup>

2. Using middleware

To use the middleware in Startup, use the following code (explained below):

public void ConfigureServices(IServiceCollection services)
{
    ...
    services.AddRateLimit(new InProcessTokenBucketAlgorithm(
        new[] {
            new TokenBucketRule(30, 10, TimeSpan.FromSeconds(1))
            {
                ExtractTarget = context =>
                {
                    return (context as HttpContext).Request.Path.Value;
                },
                CheckRuleMatching = context =>
                {
                    return true;
                },
                Name = "default limit rule",
            }
        })
    );
    ...
}

public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    ...
    app.UseRateLimit();
    ...
}

You need to register the service and then use middleware.

The rate limiting algorithm and its rules must be provided when registering the service:

  • The in-process token bucket algorithm is used here; distributed services can use RedisTokenBucketAlgorithm instead, which is based on StackExchange.Redis.
  • The bucket has a capacity of 30, and 10 tokens flow in per second. Sustained traffic is therefore limited to about 10 requests per second, while an occasional burst of up to 40 requests in a single second (a full bucket of 30 plus the 10 refilled) can still pass.
  • ExtractTarget extracts the rate limiting target, here the path of each request. You can extract whatever key data you need from the current request to define a variety of targets. If extraction involves an I/O request, the corresponding asynchronous method ExtractTargetAsync is also supported.
  • CheckRuleMatching checks whether the current request should be rate limited; it also receives the current request, making it easy to extract key data for the check. If the check involves an I/O request, the corresponding asynchronous method CheckRuleMatchingAsync is also supported.
  • By default, HTTP status code 429 is returned when a request is limited. You can customize this value, along with the HTTP headers and body, via the optional error parameter of AddRateLimit.
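The burst and sustained-rate bounds in the second point are simple arithmetic (a quick sanity check, not library code):

```python
capacity = 30        # bucket capacity from the rule above
refill_per_sec = 10  # tokens flowing in per second

# Worst case in a single second: drain a full bucket plus that second's refill.
max_burst_per_sec = capacity + refill_per_sec

# Sustained throughput is bounded by the refill rate alone, because a drained
# bucket only ever gains refill_per_sec tokens per second.
sustained_per_sec = refill_per_sec

print(max_burst_per_sec, sustained_per_sec)  # prints: 40 10
```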

The basic usage is what the example above shows.

If your application is still based on the traditional .NET Framework, register the message handler RateLimitHandler in Application_Start instead; the algorithms and rules are shared with the ASP.NET Core version.


FireflySoft.RateLimit is a rate limiting library based on .NET Standard. Its core is simple and lightweight, and it can flexibly handle rate limiting scenarios with a wide range of requirements.

Its main features include:

  • Multiple rate limiting algorithms: fixed window, sliding window, leaky bucket, and token bucket are built in; custom algorithms can also be added.
  • Multiple count storage options: memory and Redis storage are currently supported.
  • Distributed friendly: supports unified counting for distributed applications through Redis storage.
  • Flexible rate limiting targets: various data can be extracted from the request to define the limiting target.
  • Rate limiting penalty: after limiting is triggered, the client can be locked out for a period of time.
  • Dynamic rule changes: rate limiting rules can be changed while the program is running.
  • Custom errors: the error code and error message returned after limiting is triggered can be customized.
  • Universality: in principle, it can meet any scenario that requires rate limiting.

Github open Source: github.com/bosima/Fire…

For more architecture content, follow the public account Firefly Architecture. This is original content; please credit the source when reproducing it.