Preface

As the saying goes, Rome was not built in a day, and neither is a seckill system. This case took shape two weeks ago and was shared on Gitee (码云), jokingly called China's largest same-sex dating site. Since then I have received many suggestions and complaints from friends. I have never believed that distributed systems, clustering, and seckill (flash sale) should be the exclusive preserve of big companies. On today's Internet you should arm yourself with these skills whenever you can; only then might your opportunity arrive tomorrow.

In developing this seckill system example, earlier posts focused on queues, caching, locks and distributed locks, and page staticization. Caching improves access speed and strengthens the system's processing capacity. Distributed locks solve data safety and consistency under a cluster. Staticization clearly takes pressure off the cache and DB layers.

Rate limiting

However, no matter how powerful the machines are or how well optimized the design is, we still have to handle extreme scenarios. In a seckill, millions of users may be snapping up a product whose stock is far smaller than the number of users. Queueing or caching all of those requests changes nothing about the end result; it only piles up useless data in the background. Therefore, to reduce wasted resources and relieve back-end pressure, we also need to rate-limit the seckill traffic and guarantee normal service for the users who do get through.

For the seckill interfaces, when the access frequency or the number of concurrent requests exceeds what they can tolerate, we must consider rate limiting to keep the interfaces available and to prevent the system from being overwhelmed by unexpected load. The usual strategies are to reject the excess requests or to make them queue for service.

Rate limiting algorithms

Rate limiting is never aimless, and it cannot be solved by flipping a switch. The most commonly used algorithms are the token bucket and the leaky bucket.

The token bucket

The token bucket algorithm is one of the most commonly used algorithms for traffic shaping and rate limiting. Typically it is used to control the amount of data sent to the network while still allowing bursts of data through (encyclopedia definition).

In a seckill, the users' request rate is variable; assume it is 10 r/s. Tokens are placed into the token bucket at a rate of 5 per second, and the bucket holds at most 20 tokens. Think it through: some portion of the requests will always be discarded.
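To make this concrete, here is a minimal token bucket sketch in Java. It is our own illustration, not code from the seckill project; the capacity and refill rate simply mirror the numbers above, and tokens are refilled lazily from elapsed time rather than by a timer thread.

public class TokenBucket {

    private final long capacity;      // maximum tokens the bucket can hold, e.g. 20
    private final double refillRate;  // tokens added per second, e.g. 5
    private double tokens;            // current number of tokens
    private long lastRefillNanos;     // time of the last refill

    public TokenBucket(long capacity, double refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    // Returns true if a token was consumed; false means the request is discarded.
    public synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }

    // Top up the bucket according to how much time has passed, capped at capacity.
    private void refill() {
        long now = System.nanoTime();
        double elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillRate);
        lastRefillNanos = now;
    }
}

With requests arriving at 10 r/s against a 5 r/s refill and a 20-token cap, the initial tokens absorb a short burst, after which roughly half of the requests fail tryAcquire(): that is the discarded portion mentioned above.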

The leaky bucket

The leaky bucket algorithm controls the rate at which data is injected into the network and smooths out bursts. It provides a mechanism by which bursty traffic is shaped into a steady stream (encyclopedia definition).

In short, the leaky bucket says: whatever your arrival rate is, I process requests at my fixed rate, and once the bucket is full I refuse service.
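For contrast, here is a minimal leaky bucket sketch in Java (again our own illustration, with arbitrary parameters): requests enter a bounded queue, a single worker drains it at a fixed rate, and a full queue means service is refused.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class LeakyBucket {

    private final BlockingQueue<Runnable> bucket;  // the bucket: a bounded queue
    private final long leakIntervalMillis;         // fixed time between two processed requests

    public LeakyBucket(int capacity, long leakIntervalMillis) {
        this.bucket = new ArrayBlockingQueue<>(capacity);
        this.leakIntervalMillis = leakIntervalMillis;
        startLeaking();
    }

    // Returns false (service refused) when the bucket is already full.
    public boolean trySubmit(Runnable request) {
        return bucket.offer(request);
    }

    // One worker leaks requests out at a constant rate, whatever the arrival rate is.
    private void startLeaking() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    bucket.take().run();
                    Thread.sleep(leakIntervalMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }
}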

Applying rate limiting

Tomcat

In the Tomcat container, we can customize the thread pool and configure the maximum number of connections, the request queue, and other parameters to achieve rate limiting.

In conf/server.xml, configure a thread pool before the Connector:

<Executor name="tomcatThreadPool"
        namePrefix="tomcatThreadPool-"
        maxThreads="1000"
        maxIdleTime="300000"
        minSpareThreads="200"/>
  • name: the name of the shared thread pool. This is the name the Connector references to share the pool, and it must be unique. Default: none;
  • namePrefix: each running thread on the JVM can have a name string. This attribute sets the prefix of that name for every thread in the pool, and Tomcat appends the thread number to it. Default: tomcat-exec-;
  • maxThreads: the maximum number of threads the pool can hold. Default: 200;
  • maxIdleTime: how long, in milliseconds, an idle thread may live before Tomcat closes it. Idle threads are closed only while the number of active threads exceeds minSpareThreads. Default: 60000 (one minute);
  • minSpareThreads: the minimum number of idle threads Tomcat always keeps open. Default: 25.
Configure the Connector
<Connector executor="tomcatThreadPool"
           port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           minProcessors="5"
           maxProcessors="75"
           acceptCount="1000"/>
  • executor: the name of the thread pool this Connector uses;
  • minProcessors: the number of threads created to process requests when the server starts;
  • maxProcessors: the maximum number of threads that can be created to process requests;
  • acceptCount: the number of requests that may wait in the queue once all processing threads are in use; requests beyond this number are rejected.

API rate limiting

During a spike, the number of requests to an interface can be hundreds or even thousands of times the normal level. The interface may become unavailable, the whole system may go down, and other services may be dragged down with it.

So how do we handle such sudden events? Here we use RateLimiter, a rate-limiting utility class from the open source Guava toolkit, to limit the API. The class is based on the token bucket algorithm and works out of the box.
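Here is a minimal standalone sketch of how the class is used before we wire it into an aspect (the rate of 5 permits per second is an arbitrary value for illustration):

import com.google.common.util.concurrent.RateLimiter;

public class RateLimiterDemo {

    // Issues permits at a steady 5 per second, token bucket style
    private static final RateLimiter LIMITER = RateLimiter.create(5.0);

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            // tryAcquire() never blocks: it returns false when no permit is free
            System.out.println("request " + i + " allowed: " + LIMITER.tryAcquire());
        }
    }
}

The non-blocking tryAcquire() is what the aspect below relies on: a request either gets a permit immediately or is dropped.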

Custom annotation

import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Custom service-layer rate-limiting annotation
 */
@Target({ElementType.PARAMETER, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@Documented
public @interface ServiceLimit {
    String description() default "";
}

Custom aspect

import com.google.common.util.concurrent.RateLimiter;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;

/**
 * Rate-limiting aspect (AOP)
 */
@Component
@Scope
@Aspect
public class LimitAspect {

    // Only 100 tokens are issued per second
    private static RateLimiter rateLimiter = RateLimiter.create(100.0);

    // Pointcut: service-layer methods annotated with @ServiceLimit
    @Pointcut("@annotation(com.itstyle.seckill.common.aop.ServiceLimit)")
    public void ServiceAspect() {
    }

    @Around("ServiceAspect()")
    public Object around(ProceedingJoinPoint joinPoint) {
        Boolean flag = rateLimiter.tryAcquire();
        Object obj = null;
        try {
            if (flag) {
                obj = joinPoint.proceed();
            }
        } catch (Throwable e) {
            e.printStackTrace();
        }
        return obj;
    }
}

Business implementation:

@Override
@ServiceLimit
@Transactional
public Result startSeckil(long seckillId, long userId) {
    ...
}

Distributed rate limiting

Nginx

Nginx can implement basic rate limiting, for example limiting a single IP address to 50 requests per second. With Nginx's limiting modules, we can return a 503 error to the client once the number of concurrent connections exceeds our setting.

Configure nginx.conf

http {
    # Request rate limit: keyed by client IP and URI, at most 50 requests per second
    limit_req_zone $binary_remote_addr $uri zone=api_read:20m rate=50r/s;
    # Connection limit zone keyed by client IP
    limit_conn_zone $binary_remote_addr zone=perip_conn:10m;
    # Connection limit zone keyed by server name
    limit_conn_zone $server_name zone=perserver_conn:100m;

    server {
        listen       80;
        server_name  seckill.52itstyle.com;
        index        index.jsp;

        location / {
            # Allow a burst queue of 5 requests beyond the rate
            limit_req zone=api_read burst=5;
            # At most 2 connections per client IP
            limit_conn perip_conn 2;
            # At most 1000 connections for this server
            limit_conn perserver_conn 1000;
            # Bandwidth limit per connection
            limit_rate 100k;
            proxy_pass http://seckill;
        }
    }

    upstream seckill {
        fair;
        server 172.16.1.120:8080 weight=1 max_fails=2 fail_timeout=30s;
        server 172.16.1.130:8080 weight=1 max_fails=2 fail_timeout=30s;
    }
}

Configuration instructions

limit_conn_zone

Defines a shared-memory zone that stores the session state for each key (here, each IP address). The example defines a 100m zone; at 32 bytes per session it can handle about 3.2 million sessions.

limit_rate 300k;

Limits the bandwidth to 300k per connection. Note that the limit applies per connection, not per IP address; if an IP is allowed two concurrent connections, its total bandwidth is 2 x limit_rate.

burst=5;

This is the size of the bucket. If requests arrive faster than the system can process them, they are placed in the bucket to wait. Once the bucket is full, sorry: the request gets a 503 and the client sees a "server busy" response. Requests cannot sit in the bucket forever either; if the system is processing slowly and a request waits beyond a certain time, it also gets the busy response.

OpenResty

If you followed the Smartisan launch events, you may remember that at the 2015 Smartisan T2 press conference, Luo Yonghao donated the ticket proceeds to the OpenResty project.

Here we use the open source OpenResty rate-limiting scheme. The test case uses the latest OpenResty 1.13.6.1 together with the lua-resty-limit-traffic module, which makes the implementation easier.

Limit the total number of concurrent requests on an interface

During a seckill, a sudden traffic surge can affect the stability of the whole system and even crash it. In this case we need to limit the total number of concurrent requests on the seckill interface.

The resty.limit.conn module is an important component of lua-resty-limit-traffic for capping concurrency; for details see openresty/lua/limit_conn.lua in the project source.
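For readers who want the idea without leaving the JVM, a JDK Semaphore gives a minimal in-process analogue of this concurrency cap. This sketch is our own, unrelated to the Lua module, and the limit of 200 is arbitrary:

import java.util.concurrent.Semaphore;

public class ConcurrencyLimiter {

    // Caps in-flight requests at 200
    private final Semaphore permits = new Semaphore(200);

    public String handle() {
        if (!permits.tryAcquire()) {
            return "server busy";   // over the cap: fail fast
        }
        try {
            return doBusiness();
        } finally {
            permits.release();      // always give the permit back
        }
    }

    private String doBusiness() {
        return "ok";
    }
}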

Limit the number of requests per time window on an interface

In a seckill, not every request comes from a human hand on a mouse. Ticket-grabbing software, such as the tools used against 12306, fires requests far faster than any human could. So we need to limit the number of requests each client can make per unit of time, to keep such bots from running rampant. Of course, ticket-grabbing software will always look for ways around your defenses, which, from another angle, drives technology forward.

This is handled by the resty.limit.count module of lua-resty-limit-traffic; for details see openresty/lua/limit_count.lua in the project source.
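Again as an in-process analogue (our own sketch, not the Lua module's implementation), a fixed-window counter captures the idea; note that a plain fixed window can admit up to twice the limit around a window boundary:

public class FixedWindowLimiter {

    private final int limit;          // max requests allowed per window
    private final long windowMillis;  // window length in milliseconds
    private long windowStart = System.currentTimeMillis();
    private int count = 0;

    public FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    // Returns false once the quota for the current window is used up.
    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            windowStart = now;  // roll over to a fresh window
            count = 0;
        }
        return ++count <= limit;
    }
}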

Smoothly limit the number of requests on an interface

The previous modes allow bursts through; that is, instantaneous spikes are permitted. If bursty traffic goes unrestricted, it can threaten the stability of the whole system. In the seckill scenario we may therefore want requests processed at a smooth, constant average rate, say 20 r/s.

Here we use the resty.limit.req module in lua-resty-limit-traffic, which can realize both leaky bucket and token bucket rate limiting.

The fundamental difference between the leaky bucket and the token bucket is how they treat requests that exceed the configured rate. The leaky bucket queues them and processes them evenly, denying service when the queue is full. The token bucket processes bursts immediately, as long as the bucket still holds enough tokens.

The leaky bucket

Configure the bucket capacity to be greater than zero and use delayed mode: requests that fit in the bucket are queued and processed at the fixed rate, and requests beyond the capacity are rejected.

The token bucket

Configure the bucket capacity to be greater than zero and use non-delayed mode: as long as tokens remain in the bucket, bursts are allowed through immediately; otherwise the request is rejected.

Pressure test

To verify the configuration above, we load test with Apache Bench (ab). On Linux, install it with:

# yum -y install httpd-tools

Test command:

ab -n 1000 -c 100 http://127.0.0.1/

Test results:

Server Hostname:        127.0.0.1       # server IP
Server Port:            80              # request port
Document Path:          /               # document path
Document Length:        12 bytes        # page size
Concurrency Level:      100             # concurrency level
Time taken for tests:   4.999 seconds   # total time taken
Complete requests:      1000            # total requests completed
Failed requests:        0
Write errors:           0
Total transferred:      140000 bytes    # total bytes transferred
HTML transferred:       12000 bytes     # total HTML bytes
Requests per second:    200.06 [#/sec] (mean)   # requests per second
Time per request:       499.9 [ms] (mean)
Time per request:       4.999 [ms] (mean, across all concurrent requests)
Transfer rate:          27.35 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.8      0       4
Processing:     5  474  89.1    500     501
Waiting:        2  474  89.2    500     501
Total:          9  475  88.4    500     501

Percentage of the requests served within a certain time (ms)
  50%    500
  66%    500
  75%    500
  80%    500
  90%    501
  95%    501
  98%    501
  99%    501
 100%    501 (longest request)

Source code: Build a Distributed Seckill System from 0 to 1

Conclusion

The rate-limiting schemes above are just a brief summary for the seckill case. We make no deliberate judgment about which scheme is better: whatever fits your business scenario is the best one.

References

github.com/openresty/l…
blog.52itstyle.com/archives/17…
blog.52itstyle.com/archives/77…


Author: Xiao Qi

Reference: blog.52itstyle.com

Sharing is a joy, and it also documents my personal growth. Most of these articles are summaries of work experience and day-to-day learning. Given the limits of my own knowledge, please point out my mistakes so we can improve together.

The copyright of this article belongs to the author. You are welcome to reprint it, but without the author's consent you must retain this statement and place it prominently on the article page. If you have any questions, please contact me by email ([email protected]).
