Frequency control is a classic problem, both in interview questions and in real system design.

As the provider of a back-end service, if you do not limit call frequency, callers may hit the interface at unreasonable rates and put great pressure on the back end; the service may even crash outright, breaking the promised "three nines" (99.9%) availability. As a service caller, you first need to know whether the service is rate-limited and take corresponding measures.

Frequency control scenarios and overall architecture

Frequency control mainly appears in two scenarios:

  • Frequency control in front of a back-end service
  • Browse-record (view-count) logging

In the first scenario, a call passes through the frequency control service first; only if the frequency is within the limit is the request released to the back-end service.

In addition, for content-production websites, an article's view count is an important metric, and we do not want users to inflate it by repeatedly refreshing. When a view is logged, the logging service asks the frequency control service whether the record should count. Although the purposes differ, both scenarios can be served by the same frequency control capability.

Calling the service: frequency control + retry

If the server limits frequency, then frequent client calls will simply fail, and blindly retrying wastes resources. The client should therefore learn the server's frequency control policy and apply the same rate limiting on its own side.

  • The client gets rate-limited and treats every rejected call as an error ❌
  • The client retries failed calls, consuming resources all over again ❌
  • The client applies the same Rate Limiting before calling the service 👌

Client calls may be single-threaded, multi-threaded, or (less commonly) distributed. On a single machine, Guava’s RateLimiter does the job, as sketched below.
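For the single-machine case, a minimal sketch with Guava's RateLimiter might look like this (the 100-permits-per-second figure and the class name are illustrative assumptions, not from the original article):

```java
import com.google.common.util.concurrent.RateLimiter;

// A minimal client-side throttle, assuming the server's policy is
// "at most 100 calls per second". Names and figures are illustrative.
public class ClientThrottle {
    private final RateLimiter limiter = RateLimiter.create(100.0); // permits per second

    public void callBackend() {
        limiter.acquire();           // blocks until a permit is available
        // ... issue the actual RPC or HTTP call here ...
    }

    public boolean tryCallBackend() {
        return limiter.tryAcquire(); // non-blocking: returns false instead of waiting
    }
}
```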

Implementation and step-by-step optimization

What is the goal of a frequency control service? In general, the limit should be neither too strict nor too loose. Typical goals:

  • A maximum of 1000 accesses per second
  • A maximum of 100 accesses per second
  • A maximum of 1000 accesses per minute
  • A maximum of one access in 5 seconds

The limit can be derived from business rules or from service capacity. A service that allows "at most 1000 accesses per second" needs far more capacity than one that allows "at most 1000 accesses per minute".

On the other hand, this is also a tradeoff on queue length. A "max 1000 accesses per minute" limiter needs a queue of length 1000, while a "max 152 accesses per second" limiter only needs one of length 152. Even if the overall throughput works out to be similar, the minute-granularity version needs a buffer, and an ability to absorb instantaneous spikes, many times greater.

Frequency control service implementation

Generally, Redis is a good fit for implementing frequency control, because Redis keys come with a built-in time attribute (key expiry via TTL).

Simple idea

Assume the limit is "at most 100 times per second" and we implement it with Redis. It is easy to come up with pseudocode like this:

```
if key does not exist:
    set key = 1
    expire key 1          # 1-second TTL
else:
    if key >= 100:
        reject (rate limited)
    else:
        incr key
```

The logic seems pretty clear.
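As a concrete illustration, here is a minimal Java sketch of that fixed-window counter using the Jedis client (the key names, Redis address, and limit are assumptions; a production version would wrap the INCR/EXPIRE pair in a Lua script so they run atomically):

```java
import redis.clients.jedis.Jedis;

// Fixed-window counter: at most LIMIT calls per WINDOW_SECONDS per user.
public class FixedWindowLimiter {
    private static final int LIMIT = 100;
    private static final int WINDOW_SECONDS = 1;

    private final Jedis jedis = new Jedis("localhost", 6379); // assumed local Redis

    public boolean allow(String userId) {
        String key = "ratelimit:" + userId;
        long count = jedis.incr(key);            // atomic increment
        if (count == 1) {
            // First hit in this window: start the 1-second TTL.
            // (If the process dies between INCR and EXPIRE, the key never
            // expires; a Lua script avoids this gap.)
            jedis.expire(key, WINDOW_SECONDS);
        }
        return count <= LIMIT;
    }
}
```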

But this implementation is flawed: the target of "at most 100 accesses per second" is not strictly met. If the last 1 ms of one second sees 100 accesses and the first 1 ms of the next second sees another 100, that is 200 accesses within a single one-second span.

Optimization

Suppose the goal of 100 accesses per second must be strictly met. We can record the time of every access and keep a list of the most recent 100 timestamps. When a new access arrives: if the list holds fewer than 100 entries, allow it; otherwise, compare the earliest timestamp in the list with the current time, and allow the access only if the gap is longer than 1 s.

The pseudo-code implementation is as follows:

```
# for one user
key = "ratelimiter-service-user"
if key does not exist:
    set key = [timestamp]
else:
    if list length < 100:
        append timestamp to list
    else:
        if now - earliest timestamp in list < 1s:
            reject (rate limited)
        else:
            pop the earliest timestamp, append the new one, allow the access
```

Redis is single-process and single-threaded, so individual Redis commands are atomic even under concurrent invocation, and this scheme strictly meets the goal. The drawback is memory: every key must maintain a list of timestamps, so the space cost is much higher. A sketch of this idea follows.
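One common way to realize this timestamp-list idea is a Redis sorted set keyed by timestamp. Here is a hedged Java/Jedis sketch of that variant (names are illustrative, and the three commands would need a Lua script or MULTI/EXEC to be atomic as a group):

```java
import redis.clients.jedis.Jedis;

// Sliding-window limiter: keeps timestamps of recent accesses in a
// sorted set and allows at most LIMIT of them per WINDOW_MILLIS.
public class SlidingWindowLimiter {
    private static final int LIMIT = 100;
    private static final long WINDOW_MILLIS = 1000;

    private final Jedis jedis = new Jedis("localhost", 6379); // assumed local Redis

    public boolean allow(String userId) {
        String key = "ratelimit:sliding:" + userId;
        long now = System.currentTimeMillis();

        // Evict timestamps that have fallen out of the window.
        jedis.zremrangeByScore(key, 0, now - WINDOW_MILLIS);

        if (jedis.zcard(key) >= LIMIT) {
            return false; // already LIMIT accesses within the window
        }
        // Record this access; the random suffix keeps members unique
        // when two accesses share the same millisecond.
        jedis.zadd(key, now, now + ":" + Math.random());
        return true;
    }
}
```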

Classic models: leaky bucket, token bucket

You must have heard that there are two classic algorithm models for traffic shaping or limiting, namely the leaky bucket algorithm and the token bucket algorithm.

There are also many tools that implement these algorithms for us, such as Guava’s RateLimiter, which implements the token bucket algorithm.

Guava’s RateLimiter is a single-machine implementation: it does not require Redis and has lower latency overall, which suits the caller-side frequency control scenario mentioned above.

Let’s take a look at what these model algorithms are and how they differ and relate to the implementation above.

Leaky bucket algorithm

The basic idea of the leaky bucket algorithm is to emit call requests at a constant rate:

  • Requests arrive at whatever rate the business produces them and are put into the bucket; if the bucket is full, new requests are discarded
  • The leaky bucket emits requests at a constant maximum rate

If flow control is handed over to a dedicated thread, the implementation logic is clear and simple; the pseudocode is as follows:

```
# requests are placed into a queue (queue capacity = bucket size)
# a dedicated thread takes requests off the queue at a constant rate
loop:
    request = queue.take()
    dispatch(request)
    sleep(fixed interval)
```

The leaky bucket's maximum outflow rate is constant, so it meets a target such as "100 times per second". However, when requests arrive faster than the bucket drains, the overflowing requests are discarded, which is unacceptable in some scenarios. A minimal Java sketch follows.
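To make the idea concrete, here is a minimal single-machine sketch in Java (class and parameter names are made up for illustration; a real implementation would need shutdown handling and error paths):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Leaky bucket: a bounded queue absorbs bursts; a drainer thread
// dispatches requests at a constant rate.
public class LeakyBucket {
    private final BlockingQueue<Runnable> bucket;
    private final long intervalMillis; // time between two dispatches

    public LeakyBucket(int capacity, int ratePerSecond) {
        this.bucket = new ArrayBlockingQueue<>(capacity);
        this.intervalMillis = 1000L / ratePerSecond; // assumes ratePerSecond <= 1000
        Thread drainer = new Thread(this::drain);
        drainer.setDaemon(true);
        drainer.start();
    }

    /** Returns false (request discarded) when the bucket is full. */
    public boolean offer(Runnable request) {
        return bucket.offer(request);
    }

    private void drain() {
        try {
            while (true) {
                bucket.take().run();          // dispatch one request
                Thread.sleep(intervalMillis); // constant outflow rate
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```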

Token bucket algorithm

The basic idea of the token bucket algorithm is that each business request must first obtain tokens, and may proceed only if enough tokens are available:

  • A ticket service generates tokens at a constant rate and puts them into the token bucket
  • A business request asks for the number of tokens it needs
  • If enough tokens are available, the request executes
    • Otherwise, the request is blocked or rejected (rate limited)

If flow control is again handed over to a dedicated thread, the implementation is also very simple:

```
# refill thread: adds tokens at a constant rate
# (the bucket never holds more than one time unit's worth of traffic)
loop:
    if bucket not full: add one token
    sleep(fixed interval)

# business request: take the tokens it needs from the bucket;
# if not enough are available, wait; otherwise execute the request
```
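A hedged single-machine sketch of this scheme, using a Semaphore as the bucket and a scheduler as the refill thread (all names and figures are illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Token bucket: a refill task releases one permit at a constant rate,
// capped at `capacity`; callers take the permits they need.
public class TokenBucket {
    private final Semaphore tokens;
    private final int capacity;

    public TokenBucket(int capacity, int refillPerSecond) {
        this.capacity = capacity;
        this.tokens = new Semaphore(capacity);
        ScheduledExecutorService refiller =
                Executors.newSingleThreadScheduledExecutor();
        refiller.scheduleAtFixedRate(() -> {
            // Never let the bucket exceed its capacity.
            if (tokens.availablePermits() < this.capacity) {
                tokens.release();
            }
        }, 0, 1000L / refillPerSecond, TimeUnit.MILLISECONDS); // assumes refillPerSecond <= 1000
    }

    /** Blocks until n tokens are available, then the caller may proceed. */
    public void acquire(int n) throws InterruptedException {
        tokens.acquire(n);
    }
}
```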

Guava implementation

Guava RateLimiter also implements the token bucket algorithm, but rather than handing flow control to a dedicated thread, it lets the calling business thread compute how long it should block itself, saving the token-refill thread.

The basic scheme is:

```
maintain a RateLimiter instance, limiter
when a business thread calls acquire(n):
    compute how long this call must block
    update the limiter's internal bookkeeping
    sleep for that duration, then proceed with the call
```

One notable detail of Guava RateLimiter's implementation: the time needed to generate the tokens consumed by one call is paid by the next call, so a large acquire returns immediately and pushes its wait onto whoever comes next.
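A small demo makes this visible: `acquire` returns the number of seconds it slept, so the first large request comes back immediately while the following request pays for it (the rate here is illustrative):

```java
import com.google.common.util.concurrent.RateLimiter;

public class GuavaLazyCostDemo {
    public static void main(String[] args) {
        RateLimiter limiter = RateLimiter.create(1.0); // 1 permit per second

        // The first acquire returns immediately, even for a large request...
        double waited = limiter.acquire(5);
        System.out.println("first acquire(5) waited " + waited + "s"); // ~0.0

        // ...its cost is paid by the *next* caller, which waits ~5 seconds.
        waited = limiter.acquire(1);
        System.out.println("next acquire(1) waited " + waited + "s");  // ~5.0
    }
}
```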

Summary of token bucket algorithm

The token bucket algorithm meets the server’s goal of “up to 100 times per second” without the risk of dropping requests.

Conclusion

Ultimately, traffic limiting, or Rate Limiting, exists to bound the use of computing resources and prevent services from crashing. This article analyzed the scenarios and goals of frequency control, sketched the overall architecture, and summarized the classic models and their implementations. I hope it helps.

If you found this helpful, please give it a like, a comment, and a share.