Preface

This article records one round of performance optimization: the problems encountered along the way and how they were solved, in the hope of giving you some ideas for your own tuning. One thing to state up front: my approach is not the only one, and any problem you hit on the road to performance optimization certainly has more than one solution.

How to optimize

First of all, let's be clear: talking about optimization divorced from actual requirements is meaningless. Anyone who tells you they achieved millions of concurrent connections on such-and-such a machine, without any context, basically does not know what they are talking about; a raw concurrency number by itself says nothing. Secondly, we must have a goal before optimizing: how far do we actually need to push things? Optimization without a clear target is uncontrollable. Finally, we need to figure out exactly where the performance bottleneck is, rather than poking around aimlessly.

Requirements

This project is a module I was solely in charge of at my previous company. It was originally part of the main site's codebase, but as concurrency grew it was split out so that any problem in it would not drag down the main site's service, and I was responsible for the separation. The requirements after the split were: a stress-test QPS of no less than 30,000, database load no higher than 50%, server load no higher than 70%, a single request taking no longer than 70 ms, and an error rate of no more than 5%.

The environment was as follows:

Server: 4 cores, 8 GB RAM, CentOS 7, SSD
Database: MySQL 5.7, maximum of 800 connections
Cache: Redis, 1 GB capacity

All of the above were purchased as Tencent Cloud services. Load-testing tool: Locust, run distributed using Tencent Cloud auto scaling.
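The original article does not show its Locust script, but a minimal locustfile along the lines below would drive this kind of test. The /popup path and the user_id parameter are assumptions made here for illustration only.

# Minimal Locust sketch (endpoint and parameter names are assumed, not from the original project)
import random
from locust import HttpUser, task, between

class PopupUser(HttpUser):
    # Simulated think time between requests
    wait_time = between(0.1, 0.5)

    @task
    def fetch_popup(self):
        # Each simulated user asks the service whether a popup should be shown
        self.client.get("/popup", params={"user_id": random.randint(1, 1000000)})

A distributed run uses one master and several worker processes (the --master and --worker flags) so that enough load can be generated from multiple machines.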

The business logic is as follows: when a user visits the home page, the service checks whether there is a popup configuration in the database suitable for that user. If there is none, nothing is returned and we wait for the next request. If there is a suitable configuration, it is returned to the front end; the selection involves several conditional branches. If the user clicks the popup, the click is recorded and the configuration is not returned again within the configured period. If the user does not click, the configuration is returned again 24 hours later.
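To make those branches concrete, here is a self-contained sketch of the decision flow as I read it, with an in-memory dict standing in for the real database; all names are hypothetical and not taken from the original code.

# Sketch of the popup decision flow described above; an in-memory dict
# replaces the real database, and all names are hypothetical.
from datetime import datetime, timedelta

POPUP_CONFIGS = {"new_user_promo": {"quiet_hours": 72}}   # sample configuration
next_show_at = {}   # (user_id, config_name) -> time before which we stay silent

def handle_request(user_id):
    # Find a configuration suitable for this user that is not in a quiet period.
    for name, cfg in POPUP_CONFIGS.items():
        if datetime.now() >= next_show_at.get((user_id, name), datetime.min):
            return name, cfg        # return it to the front end
    return None                     # nothing to show; wait for the next request

def on_click(user_id, name):
    # The click is recorded and the popup is suppressed for its configured period.
    hours = POPUP_CONFIGS[name]["quiet_hours"]
    next_show_at[(user_id, name)] = datetime.now() + timedelta(hours=hours)

def on_ignore(user_id, name):
    # Not clicked: the configuration is returned again 24 hours later.
    next_show_at[(user_id, name)] = datetime.now() + timedelta(hours=24)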

Key analysis

From the requirements, several key points stand out:

1. We need to find the popup configuration suitable for the user;
2. We need to work out when the configuration should next be returned to the user and record that time in the database;
3. We need to record the user's click in the database.

Tuning

As we can see, all three of the points above involve database operations, both reads and writes. Without a cache, every request hits the database directly, which is bound to exhaust the connection pool and produce connection-refused errors, while slow SQL execution keeps requests from returning in time. So the first thing to do is to move the database writes off the request path, speed up each request, and take pressure off the database connections. The architecture works as follows: write operations are pushed onto a first-in, first-out message queue, and to keep complexity down a Redis list is used as this queue.
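A minimal sketch of that write path, assuming redis-py and hypothetical queue and payload names (the original code is not shown in the article): the request handler only pushes the write onto a Redis list, and a separate worker process drains the list into MySQL.

# The request only pushes the write onto a Redis list (used as a FIFO queue);
# a separate worker drains it into MySQL. Queue name and payload are hypothetical.
import json
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

def enqueue_write(user_id, event):
    # Called on the hot path: an O(1) push, no MySQL work inside the request.
    r.lpush("popup:write_queue", json.dumps({"user_id": user_id, "event": event}))

def save_to_mysql(job):
    # Placeholder for the real INSERT/UPDATE against MySQL.
    print("would write to MySQL:", job)

def worker():
    # Runs as a separate process: blocks until a job arrives, then writes it out.
    while True:
        _, raw = r.brpop("popup:write_queue")
        save_to_mysql(json.loads(raw))

LPUSH on one end and BRPOP on the other gives first-in, first-out ordering, and the blocking pop means the worker sits idle when the queue is empty.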

Then we ran the load test, with the following results:

At around 6,000 QPS, 502 errors rose to 30%, server CPU bounced back and forth between 60% and 70%, the database connections were fully occupied, and TCP connections hovered around 6,000. Clearly the problem was still the database. After examining the SQL statements, the cause turned out to be that the query looking up a suitable configuration for the user hit the database on every request, using up all the connections. Since we only had 800 connections, with this many requests the database was bound to become the bottleneck. With the problem found, we continued optimizing. In the updated architecture, all the configurations are loaded into the cache, and the database is read only when a configuration is missing from the cache.
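A sketch of the read path after this change, again with assumed key names and a placeholder for the MySQL query: the configurations are served from Redis, and MySQL is touched only on a cache miss.

# Popup configurations are served from Redis; MySQL is consulted only on a
# cache miss. Key name, TTL and the placeholder query are assumptions.
import json
import redis

r = redis.Redis(host="127.0.0.1", port=6379)
CACHE_KEY = "popup:configs"

def fetch_configs_from_mysql():
    # Placeholder for the real SELECT of all active popup configurations.
    return [{"name": "new_user_promo", "quiet_hours": 72}]

def load_configs():
    cached = r.get(CACHE_KEY)
    if cached is not None:
        return json.loads(cached)                   # served from cache, no DB connection used
    configs = fetch_configs_from_mysql()            # cache miss: one trip to MySQL
    r.set(CACHE_KEY, json.dumps(configs), ex=300)   # refresh the cache, e.g. a 5-minute TTL
    return configs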

Then we load-tested again, with the following results: QPS around 20,000, server CPU bouncing between 60% and 80%, database connections around 300, and TCP connections per second around 15,000.

This problem bothered me for a long time: we were pushing 20,000 QPS, yet the number of TCP connections never reached 20,000. I suspected the TCP connection count was the bottleneck, but for the moment I could not figure out why.

Running ulimit -n in the terminal showed 65535, so at first glance the open-file (socket) limit did not look like what was constraining us. To verify the guess anyway, I raised the limit to 100001.

The load test was run again, with the following results:

Server CPU bounced between 60% and 80%, database connections stayed around 300, and TCP connections per second were around 17,000.

There was a slight improvement but no substantial change. Over the next few days I could not find a better solution, which was genuinely frustrating. A few days later, going over the problem once more, I noticed that although there were enough sockets available, they were not all being used. My guess was that after each request the TCP connection was not released immediately, so the socket could not be reused. After digging through some documentation, I found the problem.

A TCP connection is not released immediately after the four-way close; instead it sits in the TIME_WAIT state for a while, so that any late packets from the other end can still be handled. The kernel parameter net.ipv4.tcp_max_tw_buckets controls how many sockets may sit in TIME_WAIT, and it defaults to 180,000. We lowered it to 6,000, then turned on fast recycling of TIME_WAIT sockets and enabled their reuse. The full set of tuned parameters is as follows:

# Maximum number of TIME_WAIT sockets, default is 180000
net.ipv4.tcp_max_tw_buckets = 6000
# Widen the local port range available for outgoing connections
net.ipv4.ip_local_port_range = 1024 65000
# Enable fast recycling of TIME_WAIT sockets
net.ipv4.tcp_tw_recycle = 1
# Enable reuse: allows TIME_WAIT sockets to be reused for new TCP connections
net.ipv4.tcp_tw_reuse = 1
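On CentOS 7 these settings typically go in /etc/sysctl.conf (or a file under /etc/sysctl.d/) and are applied with sysctl -p.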

We load-tested once more, and the results were: 50,000 QPS, server CPU at 70%, database connections normal, TCP connections normal, average response time 60 ms, error rate 0%.

Conclusion

At this point, the development, tuning, and load testing of the whole service were complete. Looking back on this round of tuning, I gained a lot of experience. Most important of all, I came to appreciate that web development is not an isolated discipline but an engineering practice that combines networking, databases, programming languages, operating systems, and more. That requires web developers to have a solid foundation of knowledge; otherwise, when a problem appears, you will not even know where to start analyzing it.

Source | segmentfault.com/a/119000001…