
Small Hub’s digest:

This article introduces the commonly used rate-limiting algorithms, such as the counter, the leaky bucket, and the token bucket, and then covers implementation options such as the Guava RateLimiter and the Nginx rate-limiting modules. Rate limiting is a very common means of protecting a service.


  • By Nick Hao
  • cnblogs.com/haoxinyue/p/6792309.html

Kaitao said in his blog: **there are three powerful tools for protecting the system when developing high-concurrency services: caching, degradation, and rate limiting.** This article introduces some related concepts, algorithms, and common implementations of rate limiting based on the author’s experience.

Caching

Caching is easy to understand. In a large, high-concurrency system, without a cache the database will be overwhelmed in minutes and the system will crash in seconds.

Using a cache not only improves access speed and supports more concurrent access, it also effectively protects the database and the system as a whole.

Large sites are mostly “read”-heavy, where caching is the obvious choice. But caching also often plays a very important role in large “write” systems.

For example, accumulating data and writing it in batches, in-memory cache queues (producer/consumer), and HBase’s write mechanism all use caching to improve throughput or protect the system. Even message middleware can be thought of as a kind of distributed data cache.

Degradation

Service degradation means strategically downgrading some services and pages when server pressure spikes, based on the current load and traffic, so as to free server resources and keep core tasks running normally.

Degradation is often tiered, with different handling for different levels of exceptions.

By service mode: requests can be rejected, delayed, or sometimes served for only a random subset of callers. By scope: a single feature can be cut, or entire modules.

In short, service degradation requires different strategies for different business needs. The guiding principle is that a degraded service is lossy, but better than no service at all.

Rate limiting

Rate limiting can be regarded as a kind of service degradation: it protects the system by limiting its input and output traffic.

Generally speaking, a system’s throughput can be measured. To keep the system running stably, once traffic reaches the threshold at which limiting is needed, measures must be taken to limit the flow, such as delaying processing, rejecting requests, or rejecting a portion of them.

Rate-limiting algorithms

The common rate-limiting algorithms are the counter, the leaky bucket, and the token bucket.

Counter

The counter is the simplest and crudest algorithm.

For example, suppose a service can handle at most 100 requests per second. We can set up a 1-second sliding window divided into 10 cells of 100 milliseconds each; the window slides every 100 milliseconds, and each slide records the current cumulative request count.

We therefore need to keep 10 counts in memory, which can be done with a LinkedList. Each time the window slides, check whether the difference between the newest and oldest counts in the LinkedList exceeds 100; if it does, rate limiting is required.

Obviously, the more cells the sliding window is divided into, the more smoothly it slides and the more accurate the rate-limiting statistics.

Example code is as follows:

import java.util.LinkedList;

public class Counter {

    // Use a LinkedList to record the counts of the sliding window's 10 cells.
    private final LinkedList<Long> ll = new LinkedList<>();
    private long counter = 0L;

    public static void main(String[] args) throws InterruptedException {
        Counter c = new Counter();
        c.doCheck();
    }

    private void doCheck() throws InterruptedException {
        while (true) {
            ll.addLast(counter);
            if (ll.size() > 10) {
                ll.removeFirst();
            }
            // Compare the newest count with the one from 1 second ago.
            if (ll.peekLast() - ll.peekFirst() > 100) {
                // rate limiting is required here
            }
            Thread.sleep(100);
        }
    }
}

Leaky bucket algorithm

The leaky bucket algorithm is a common rate-limiting algorithm that can be used to implement traffic shaping and traffic policing.

A diagram from Wikipedia helps illustrate the idea:

The main concepts of the leaky bucket algorithm are as follows:

  • A leaky bucket with a fixed capacity leaks water droplets at a constant rate;

  • If the bucket is empty, no droplets flow out;

  • Water droplets can flow into the bucket at any rate;

  • If the incoming droplets exceed the bucket’s capacity, the excess overflows (is discarded), while the bucket’s capacity remains unchanged.

The leaky bucket algorithm is easy to implement. In a single-machine system it can be implemented with a queue (.NET’s TPL Dataflow can handle similar problems); in a distributed environment, message middleware or Redis are candidate solutions. A minimal sketch follows.
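
For illustration only (not code from the original article; the LeakyBucket class and its names are my own), here is a minimal single-machine sketch in Java: a bounded queue plays the bucket, and a scheduled task drains it at a constant rate.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LeakyBucket {
    // The bounded queue is the bucket; its capacity is the bucket size.
    private final BlockingQueue<Runnable> bucket;

    public LeakyBucket(int capacity, int leaksPerSecond) {
        this.bucket = new ArrayBlockingQueue<>(capacity);
        ScheduledExecutorService leaker = Executors.newSingleThreadScheduledExecutor();
        // Leak (process) at a constant rate, regardless of the inflow rate.
        leaker.scheduleAtFixedRate(() -> {
            Runnable task = bucket.poll();
            if (task != null) {
                task.run();
            }
        }, 0, 1000L / leaksPerSecond, TimeUnit.MILLISECONDS);
    }

    // Water flows in at any rate; returns false (overflow) when the bucket is full.
    public boolean tryAccept(Runnable task) {
        return bucket.offer(task);
    }
}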

Token bucket algorithm

The token bucket algorithm uses a bucket that holds a fixed capacity of tokens, with tokens added to the bucket at a fixed rate.

The token bucket algorithm can be described by the following concepts:

  • Tokens are placed into the token bucket at a constant rate, for example 10 per second.

  • The bucket holds at most b tokens. When the bucket is full, newly added tokens are discarded or rejected.

  • When a packet of n bytes arrives, n tokens are removed from the bucket and the packet is sent to the network.

  • If fewer than n tokens are in the bucket, no tokens are removed and the packet is throttled (either discarded or buffered).

The diagram below:

The token bucket controls the output rate through the rate at which tokens are issued, that is, the “to network” rate in the figure above. “To network” can be understood as the message handler: executing some service or making an RPC call.
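
To make the bookkeeping concrete, here is a minimal token bucket sketch in Java (an illustration of the concepts above, not a production implementation; the class and names are my own). Tokens are refilled lazily from elapsed time rather than by a background thread.

public class TokenBucket {
    private final long capacity;         // maximum number of tokens (b)
    private final double refillPerNano;  // refill rate, tokens per nanosecond
    private double tokens;               // current token count
    private long lastRefillNanos;

    public TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    // Try to take n tokens; returns false when fewer than n are available,
    // in which case the caller should throttle (discard or buffer).
    public synchronized boolean tryAcquire(long n) {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * refillPerNano);
        lastRefillNanos = now;
        if (tokens < n) {
            return false;
        }
        tokens -= n;
        return true;
    }
}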

Comparison of leaky buckets and token buckets

The token bucket can adjust the data-processing rate at runtime and can absorb a certain amount of burst traffic.

Raising the token-issuing rate increases the overall processing speed, while requiring more tokens per operation slows it down. A leaky bucket cannot do this: its outflow rate is fixed, so the processing rate is fixed as well.
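
Guava’s RateLimiter, covered in the next section, exposes this runtime adjustability directly through its setRate method; a minimal sketch (the 10-to-5 numbers are arbitrary):

RateLimiter limiter = RateLimiter.create(10.0); // start at 10 permits per second
// ... later, when the system comes under pressure, slow the whole pipeline down:
limiter.setRate(5.0);                           // now 5 permits per second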

Overall, the token bucket algorithm is better, but the implementation is more complex.

Rate-limiting implementations

Guava

Guava is a Google open-source project containing several core libraries that are widely used in Google’s Java projects. Among them, RateLimiter provides two token bucket implementations: SmoothBursty and SmoothWarmingUp.

1. Conventional rate:

Create a rate limiter and set the number of tokens placed per second to 2. The returned RateLimiter object guarantees that no more than 2 tokens are issued in one second and places them at a fixed rate, achieving a smooth output.

public void test() {
    /**
     * Create a rate limiter and set the number of tokens placed per second to 2,
     * i.e., a rate of 2 messages per second.
     * The returned RateLimiter guarantees that no more than 2 tokens are issued
     * in 1 second, placed at a fixed rate.
     */
    RateLimiter r = RateLimiter.create(2);
    while (true) {
        /**
         * acquire() obtains one token and returns the time spent waiting for it.
         * If the bucket has no token, it blocks until one becomes available.
         * acquire(n) can obtain multiple tokens at once.
         */
        System.out.println(r.acquire());
    }
}

The output of the code above is shown below: roughly one line every 0.5 seconds. Data is processed, or the interface called, only after a token is obtained, which smooths the output.

The return value of acquire() is the time spent waiting for the token. If bursts of traffic need special handling, a threshold can be set on this return value and different actions taken, such as discarding requests that have waited too long.
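
Alternatively (a suggestion of mine rather than the original text), tryAcquire bounds the wait up front instead of inspecting it afterwards; handleRequest and rejectRequest are hypothetical placeholders:

// Wait at most 500 ms for a token; give up instead of blocking indefinitely.
if (r.tryAcquire(1, 500, TimeUnit.MILLISECONDS)) {
    handleRequest();   // hypothetical: do the real work
} else {
    rejectRequest();   // hypothetical: discard, or return an error such as 429
}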

2. Burst traffic:

A burst can mean demanding more tokens at once, or arriving after a period of no traffic. Let’s start with the first case. The rate is the same as in the example above: 2 tokens per second. The following code passes an argument to the acquire method.

System.out.println(r.acquire(2));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));



You get output similar to the following.

To process more data at once, more tokens are needed. The code acquires 2 tokens first, so the next token is obtained after 1 second instead of 0.5 seconds, after which the rate returns to normal.

That was a burst that demands many tokens at once. For a burst that follows a period of no traffic, consider the following code:

System.out.println(r.acquire(1));
Thread.sleep(2000);
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));



Results similar to the following are obtained:

After waiting two seconds, three tokens have accumulated in the bucket, so they can be retrieved consecutively without waiting. Handling a burst this way is still, in essence, a constant output per unit of time.

Both cases use the SmoothBursty subclass of RateLimiter. The other subclass, SmoothWarmingUp, provides output with a warm-up (buffer) period.

/**
 * Create a rate limiter and set the number of tokens placed per second to 2,
 * with a 3-second warm-up period.
 * The returned RateLimiter guarantees that no more than 2 tokens are issued
 * in 1 second, placed at a fixed rate.
 */
RateLimiter r = RateLimiter.create(2, 3, TimeUnit.SECONDS);
while (true) {
    /**
     * acquire(1) obtains one token and returns the time spent waiting for it.
     * If the bucket has no token, it blocks until one becomes available.
     */
    System.out.println(r.acquire(1));
    System.out.println(r.acquire(1));
    System.out.println(r.acquire(1));
    System.out.println(r.acquire(1));
}

The output is shown below. Because the warm-up period is set to 3 seconds, the token bucket does not start out issuing a token every 0.5 seconds. Instead the frequency ramps up along a smooth linear gradient, reaching the configured rate within 3 seconds, after which output continues at that fixed rate.

The three times circled in red add up to about three seconds. This feature is suitable for scenarios where the system needs a little time to “warm up” after startup.

Nginx

Nginx provides two rate-limiting modules:

  • Connection-count limiting: ngx_http_limit_conn_module

  • Request limiting based on the leaky bucket algorithm: ngx_http_limit_req_module

1.  ngx_http_limit_conn_module

We often run into situations such as abnormal server traffic or excessive load.

High-volume malicious access wastes bandwidth, puts pressure on the server, and affects the business, so limiting the number of connections and the concurrency from a single IP address is a common countermeasure. The ngx_http_limit_conn_module module implements this.

The module can limit the number of connections per key value based on a defined key, for example the number of connections from one source IP.

Not every connection is counted by the module; only a connection with a request currently being processed (one whose header has been fully read) is counted.

We can add the following configuration to the http {} block of nginx.conf:

# Define a zone named "one", keyed by client IP, with 10 MB of shared memory
limit_conn_zone $binary_remote_addr zone=one:10m;
# Log level when the limit is hit; the default is error
limit_conn_log_level error;
# Status code returned to rejected requests; the default is 503
limit_conn_status 503;

Then add the following code to server{} :

limit_conn one 1;

Then we use ab (Apache Bench) to simulate concurrent requests:

ab -n 5 -c 5 http://10.23.22.239/index.html

The concurrency is clearly limited; requests over the threshold receive a 503:

Limiting the total number of connections for a virtual server is configured similarly, keyed by server name instead of client IP:

limit_conn_zone $server_name zone=perserver:10m;
limit_conn perserver 1;

2.  ngx_http_limit_req_module

We used the ngx_http_limit_conn_module module to limit the number of connections. So what about limiting the number of requests?

That is done with the ngx_http_limit_req_module module, which limits the frequency of request processing based on a defined key.

In particular, it can limit the rate at which requests from a single IP address are processed. The limiting method is the leaky bucket algorithm: a fixed number of requests is processed per second, and excess requests are delayed.

If the request rate exceeds the value configured for the zone, processing is delayed or requests are discarded, so all requests are handled at the defined rate.

In http {}:

# Zone named "one", 10 MB in size; the average request rate may not exceed 1 per second
limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;

Configure in server{}

limit_req zone=one burst=5;

The settings above limit each IP address to 1 request per second, while the server buffers up to 5 excess requests per IP; anything beyond that buffer is discarded.

Use ab to simulate 10 consecutive client requests:

ab -n 10 -c 10 http://10.23.22.239/index.html

As shown in the figure below, with burst set to 5 and 10 requests sent, the first request is processed immediately and requests 2 through 6 are queued in the bucket. The bucket is then full, and since nodelay is not set, the remaining 4 requests are discarded.
