10 billion requests? Let’s give it a try

Author: ppmsn2005#gmail.com project: github.com/xiaojiaqi/1… Wiki: github.com/xiaojiaqi/1…

1. Introduction


Goals achieved: a single machine supported 1 million concurrent connections and simulated the shake-and-send red envelope flow, with a single-machine peak of 60,000 QPS, handling the simulated business smoothly.

Note: this article and all of its contents represent only the author's personal understanding and practice. The process has nothing to do with the WeChat team, and the real production system is different; this is only an exercise around a few technical points, so please keep the two apart. Given the author's limited ability, any mistakes are the author's responsibility; please contact ppmsn2005#gmail.com.

2. Background


PPS: packets per second, the number of network packets transmitted per second.

Shake a red envelope: the client sends a shake request; if the system has a red envelope available, it returns one and the user receives it.

Send a red envelope: produce a red envelope containing a certain amount of money and assign it to several users; each of those users receives a notification and can send a request to open the red envelope and obtain part of the amount.

3. Define your goals


Before starting any system, we should be clear about what load the finished system must bear.

3.1. Total number of users: according to the article there are 638 access servers and the service ceiling is about 1.43 billion users, so the per-server ceiling works out to roughly 2.28 million users. But there are certainly not 1.4 billion users in China online at the same time. According to qiye.qianzhan.com/show/detail…, WeChat had about 800 million users in Q2 2016, of whom about 540 million were active. So during the 2015 Spring Festival, however many users there were, fewer than 540 million would have been online at the same time.

3.2. Number of servers: there are 638 servers in total. Following normal operations practice, I assume not all of them were fully online and that some hardware was kept redundant against sudden failures. Assume 600 access servers actually took traffic.

3.3. Load supported by a single server: 540 million / 600 = 900,000, i.e. about 900,000 users per machine on average. If the real figure is higher than 900,000, the simulation may be somewhat off, but I think QPS matters more in this experiment.

3.4. Single peak QPS:

The article clearly states 14 million QPS. This is a very high number, but with 600 servers the per-machine QPS is 14 million / 600 ≈ 23,000. The article also mentions that the system can support 40 million QPS, which would mean at least 40 million / 600 ≈ 66,000 QPS per machine, about three times the current figure and unlikely to be reached in the short term; but I am sure it was stress-tested.

3.5. Giving out red envelopes:

The article mentions that the system delivers red envelopes at 50,000 per second, so a single machine delivers 50,000 / 600 ≈ 83 per second; that is, each machine should be able to hand out 83 red envelopes per second. Finally, for realism, the system should at least include user login and the take-red-envelope business; a real system would also include services such as chat.

Finally, consider the demand of 10 billion red envelope requests. Assuming they are spread evenly over the 4 hours of the Spring Festival Gala, the per-server QPS would be 10,000,000,000 / 600 / 4 / 3600 ≈ 1157, i.e. a bit over 1,000 requests per second per machine, which is actually not that high. If the peak rate of 14 million QPS were sustained, 10,000,000,000 / 14,000,000 ≈ 714 seconds; in other words, about 12 minutes at peak would be enough to absorb all the requests. One characteristic of Internet products is that the peaks are very high but do not last long.

Conclusion:

Looking at a single server, it needs to meet the following conditions:

1. Support at least 1 million connected users.
2. Handle at least 23,000 QPS; here we set the bar a bit higher, with targets of 30,000 and 60,000 respectively.
3. Shake red envelopes: hand out red envelopes at 83 per second. In other words, of the 23,000 shake requests arriving every second, 83 succeed and the remaining roughly 22,900 are told they did not win. The client and server must also agree on the number of red envelopes and the amounts inside them. Since there is no payment module, we doubled this requirement to a distribution rate of 200 red envelopes per second.
4. Support users sending red envelopes to each other, again ensuring both sides agree on the number of red envelopes and the amounts inside. Here too we set a target rate of 200 red envelopes per second.

Fully simulating the whole system is very difficult: it would require a huge number of servers and hundreds of millions of simulated clients, which is impossible for me. But one thing is certain: the whole system scales horizontally, so we can simulate 1 million clients against a single server, i.e. 1/600 of the full system.

Differences from existing systems: unlike most high-QPS stress tests, this system has a different emphasis. Here is a comparison of the two.

| | Common high-QPS stress test | This stress test |
|---|---|---|
| Number of connections | Generally under 1,000 (a few hundred) | 1 million |
| Per-connection throughput | Very large: tens of megabytes per connection | Very small: tens of bytes per request |
| Number of I/O events required | Not many | Very many |


4. Basic software and hardware

4.1 Software:

Golang 1.8r3, shell, python. (Development used Golang rather than C++ because the initial prototype written in Golang met the system requirements. Golang still has some issues, but compared with the gain in development efficiency the loss is acceptable.)

Server operating system:

Ubuntu 12.04

Client operating system:

Debian 5.0

Server hardware: a Dell R2950, an 8-core physical machine with 16 GB of memory, shared with other workloads rather than dedicated to this test. The hardware is about 7 years old, so we should not expect too much from its performance.

Server Hardware version:



Server CPU information:

Clients: ESXi 5.0 VMs, each configured with 4 cores and 5 GB of memory. There are 17 of them, each holding 60,000 connections to the server, which together simulate 1 million clients.

5. Technical analysis and implementation














5.2) 30000 QPS

This question needs to be looked at in two parts: the client side and the server side.

The client QPS

With 1 million connections on the server and a target of 30,000 QPS, each connection sends one red envelope request roughly every 33 seconds. Because a single IP address can establish only about 60,000 connections, 17 machines simulate the client behaviour in parallel. All we have to do is make sure that, between them, the right number of requests reaches the server every second. The key technical point is coordination among the clients: each client starts and connects at a different time, and connections drop and reconnect, so how does each client decide when to send requests, and how many?

Here is how I solved it: all machines synchronise their clocks with NTP, and each client uses the timestamp to decide how many requests it should send at any given moment. The algorithm is easy to implement: assume 1 million users with IDs 0-999999 and compute a period from the total number of users and the QPS; for example, with a period of 20, if time() % 20 == user ID % 20, then the user with that ID should send a request during this second. This lets multiple clients cooperate: each client only needs to know the total number of users and the target QPS to issue its requests accurately on its own. (Further thinking: what if the QPS is 30,000, which does not divide evenly? How do we keep the number of requests issued by each client as balanced as possible?)
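As an illustration, here is a minimal sketch of this pacing scheme in Go. It is not the project's actual code; the function and variable names are mine, and it uses a target of 50,000 QPS so the period divides evenly into 20:

```go
package main

import (
	"fmt"
	"time"
)

// shouldSendThisSecond reports whether the user identified by userID is
// scheduled to send a request during the current wall-clock second.
// The period is totalUsers / targetQPS, so roughly targetQPS requests
// are issued across the whole cluster every second.
func shouldSendThisSecond(userID, totalUsers, targetQPS int64, now time.Time) bool {
	period := totalUsers / targetQPS // e.g. 1,000,000 / 50,000 = 20
	if period <= 0 {
		period = 1
	}
	return now.Unix()%period == userID%period
}

func main() {
	const totalUsers = 1000000
	const targetQPS = 50000
	userID := int64(123456) // illustrative ID of one simulated user

	// Each simulated client checks once per second (against NTP-synced time)
	// whether it is its turn to send a shake-red-envelope request.
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for now := range ticker.C {
		if shouldSendThisSecond(userID, totalUsers, targetQPS, now) {
			fmt.Println("send one shake request at", now.Unix())
		}
	}
}
```

With NTP keeping all client clocks within a fraction of a second of each other, every client derives the same schedule independently, which is exactly the cooperative behaviour described above.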

The server-side QPS is relatively simple: the server only has to handle whatever the clients send. But to get an objective picture of the situation, we need to do two more things.

First: keep track of the number of requests handled per second, which means embedding counters in the code.

Second: monitor the network, because network throughput objectively reflects the real QPS. For this I wrote a simple tool with a Python script and ethtool, which lets us watch the network packet rates directly and shows objectively how much traffic is passing over the network.
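The author's tool was a Python script built around ethtool. As a rough sketch of the same idea (not the original tool), the following Go snippet samples packet counters from /proc/net/dev once per second and prints the per-second deltas; the interface name is an assumption:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// readCounters returns the cumulative RX/TX packet counts for one interface
// by parsing /proc/net/dev.
func readCounters(iface string) (rxPkts, txPkts uint64, err error) {
	f, err := os.Open("/proc/net/dev")
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, iface+":") {
			continue
		}
		fields := strings.Fields(strings.TrimPrefix(line, iface+":"))
		if len(fields) < 10 {
			return 0, 0, fmt.Errorf("unexpected /proc/net/dev format")
		}
		rxPkts, _ = strconv.ParseUint(fields[1], 10, 64) // receive packets
		txPkts, _ = strconv.ParseUint(fields[9], 10, 64) // transmit packets
		return rxPkts, txPkts, nil
	}
	return 0, 0, fmt.Errorf("interface %q not found", iface)
}

func main() {
	iface := "eth0" // assumed interface name
	prevRx, prevTx, _ := readCounters(iface)
	for range time.Tick(time.Second) {
		rx, tx, err := readCounters(iface)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s rx %d pkt/s, tx %d pkt/s\n", iface, rx-prevRx, tx-prevTx)
		prevRx, prevTx = rx, tx
	}
}
```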

Tool screenshots:

5.3) Red envelope business

The shake-red-envelope business is very simple. The server produces red envelopes at a fixed rate; any that are not taken pile up in a queue. When the server receives a shake request from a client, it replies with a red envelope if one is available and otherwise tells the client there is none. Since a single machine receives 30,000 requests per second, most of them will fail; the main thing is to handle the locking well. To reduce contention, I put users into different buckets, which reduces competition for the locks. If higher performance were needed later, a high-performance queue such as the Disruptor could be used to improve things further.
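A minimal sketch of this bucketing idea, assuming nothing beyond what is described above (the type and function names are mine; the real code is in the repository):

```go
package main

import (
	"fmt"
	"sync"
)

// RedPacket is a stand-in for the real red-envelope object.
type RedPacket struct {
	ID     int64
	Amount int64 // in cents
}

// bucket keeps its own pending red packets behind its own lock, so shake
// requests routed to different buckets never contend with each other.
type bucket struct {
	mu      sync.Mutex
	packets []*RedPacket
}

// Pool shards the pending red packets across a fixed number of buckets.
type Pool struct {
	buckets []*bucket
}

func NewPool(n int) *Pool {
	p := &Pool{buckets: make([]*bucket, n)}
	for i := range p.buckets {
		p.buckets[i] = &bucket{}
	}
	return p
}

// Put is called by the red-envelope generation service.
func (p *Pool) Put(slot int64, rp *RedPacket) {
	b := p.buckets[int(slot)%len(p.buckets)]
	b.mu.Lock()
	b.packets = append(b.packets, rp)
	b.mu.Unlock()
}

// TryTake handles one shake request. The user only looks in the bucket its
// ID maps to; most calls find it empty and return nil, matching the fact
// that most of the 30,000 requests per second do not win a red envelope.
func (p *Pool) TryTake(userID int64) *RedPacket {
	b := p.buckets[int(userID)%len(p.buckets)]
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.packets) == 0 {
		return nil
	}
	rp := b.packets[len(b.packets)-1]
	b.packets = b.packets[:len(b.packets)-1]
	return rp
}

func main() {
	pool := NewPool(64)
	pool.Put(0, &RedPacket{ID: 1, Amount: 100})
	if rp := pool.TryTake(64); rp != nil { // user 64 also maps to bucket 0
		fmt.Println("won red packet", rp.ID)
	}
}
```

Sharding the lock this way means concurrent shake requests mostly hit different mutexes; replacing the slice with a Disruptor-style ring buffer would be a further step if needed.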

Note that payment, a core part of the real service, is missing from my test environment, which makes the implementation much easier. Another set of numbers for comparison: in 2016, Taobao's Double 11 peak was about 120,000 transactions per second, while WeChat handed out red envelopes at 50,000 per second, which is very hard to achieve. (mt.sohu.com/20161111/n4…)

5.4) Sending red envelopes

The send-red-envelope business is also very simple. The system randomly generates some red envelopes and randomly selects some users, then notifies those users that they have a red envelope. Those users only need to send a request to open it, and the system splits off a random part of the amount and gives it to them, completing the flow. Here, too, the core payment service is absent.

5.5) Monitoring

Finally, we need a monitoring system to understand what is going on. I borrowed part of the code from another project of mine (github.com/xiaojiaqi/f…) for the monitoring module. With it, the server and every client send their current counter values to the monitor, which aggregates and displays the data from each node and also writes logs that provide the raw data for later analysis. Production systems tend to use a time-series database such as OpenTSDB; resources here were limited, so a more primitive scheme was used.
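Purely as an illustration of the shape of this reporting (the field names, node name, and monitor URL below are assumptions, not the project's actual format): each node snapshots its atomic counters once per second and POSTs them to the monitor.

```go
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"sync/atomic"
	"time"
)

// Report is an illustrative counter snapshot sent to the monitor.
type Report struct {
	Node       string `json:"node"`
	Unix       int64  `json:"unix"`
	Requests   int64  `json:"requests"`    // requests handled this second
	RedPackets int64  `json:"red_packets"` // red envelopes handed out this second
}

var requests, redPackets int64 // bumped with atomic.AddInt64 on the hot path

// reportLoop snapshots and resets the counters once per second and POSTs
// them to the monitor, which aggregates per-node data and writes log files.
func reportLoop(node, monitorURL string) {
	for range time.Tick(time.Second) {
		r := Report{
			Node:       node,
			Unix:       time.Now().Unix(),
			Requests:   atomic.SwapInt64(&requests, 0),
			RedPackets: atomic.SwapInt64(&redPackets, 0),
		}
		body, _ := json.Marshal(r)
		resp, err := http.Post(monitorURL, "application/json", bytes.NewReader(body))
		if err != nil {
			log.Println("report failed:", err)
			continue
		}
		resp.Body.Close()
	}
}

func main() {
	atomic.AddInt64(&requests, 1)                             // called wherever a request is processed
	go reportLoop("server-1", "http://monitor:8080/report") // hypothetical endpoint
	select {}
}
```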

The monitor logs look something like this

6. Code implementation and analysis

On the code side there are not many tricks involved, just the design ideas and Golang itself. First is controlling the number of goroutines: with at least 1 million connections, an ordinary design would need two or three million goroutines working, which places a heavy burden on the system. Second is managing a million connections; both the connections and the business on top of them can get messy. My design looks like this:

Start by splitting the million connections into different SETs; each SET is an independent, parallel object that manages only a few thousand connections. If a single SET works, I only need to add SETs to increase the system's processing power.

Splitting by SET has another benefit: a SET can be treated as a unit of business, so load of different kinds can be placed on different servers; for example, an 8-core machine might run 10 SETs and a 4-core machine 5 SETs. Pressure can be split at a fine granularity, and migration becomes easy.

The second point is to design the data structures inside each SET carefully, so that the pressure on any SET is never too high and messages do not pile up.

The third point is to reduce the number of goroutines: each connection uses only one goroutine for reading, and each SET uses only one goroutine for sending, which saves about a million goroutines. The whole system therefore needs only about one million plus a few hundred goroutines to run the business, saving a great deal of CPU and memory.

The workflow is as follows: after each client connects successfully, the system assigns it a goroutine that reads the client's messages; each message read is converted into a message object, placed on the SET's receive queue, and then the goroutine goes back to read the next one. Inside each SET there is a single worker goroutine that does a very simple and efficient job: it drains the SET's receive queue, which carries three types of messages:

1. Shake-red-envelope request messages from clients

2. Other client messages, such as chat messages to friends

3. Responses from the server to client messages

The first type, a client's shake request, is handled like this: take the request from the queue and try to grab a red envelope from the SET's red-envelope queue; if one is obtained, return its details to the client, otherwise construct a "nothing won" message and return that to the corresponding client. For the second type, other client messages such as chat, the worker simply takes the message off the queue and forwards it to the back-end chat service queue, which forwards it onward. For the third type, a server response to a client, the SET looks up the stored connection object by the user ID in the message and sends the response back.
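A condensed sketch of this SET structure, assuming only what is described above; all type, field, and function names are mine, and the protocol functions are placeholders:

```go
// Package set sketches the SET design; the real implementation is in the repository.
package set

import "net"

// Message kinds corresponding to the three types the worker handles.
const (
	KindShake = 1 // client shake-red-envelope request
	KindChat  = 2 // other client messages, e.g. chat
	KindReply = 3 // server response destined for a client
)

// Msg is a decoded message envelope.
type Msg struct {
	Kind   int
	UserID int64
	Data   []byte
}

// RedPacket is a stand-in for the real red-envelope object.
type RedPacket struct{ ID int64 }

// Set manages a few thousand connections with a single worker goroutine.
type Set struct {
	recv    chan *Msg          // filled by per-connection reader goroutines
	packets chan *RedPacket    // filled by the red-envelope generation service
	conns   map[int64]net.Conn // userID -> connection, touched only by the worker
	chatOut chan *Msg          // hand-off queue to the back-end chat service
}

// readLoop is the one goroutine per connection: it only decodes messages
// and pushes them onto the SET's receive queue.
func (s *Set) readLoop(userID int64, c net.Conn) {
	for {
		m, err := decode(c)
		if err != nil {
			return
		}
		m.UserID = userID
		s.recv <- m
	}
}

// work is the single worker goroutine of the SET. Because it alone owns
// s.conns and drains s.packets, no locks are needed inside a SET.
func (s *Set) work() {
	for m := range s.recv {
		switch m.Kind {
		case KindShake:
			select {
			case rp := <-s.packets: // a red envelope is available
				sendPacket(s.conns[m.UserID], rp)
			default: // none right now: tell the client it did not win
				sendMiss(s.conns[m.UserID])
			}
		case KindChat:
			s.chatOut <- m // forward to the chat service queue
		case KindReply:
			send(s.conns[m.UserID], m) // route the response back to the user
		}
	}
}

// decode, sendPacket, sendMiss and send stand in for the real wire protocol.
func decode(net.Conn) (*Msg, error)   { return &Msg{}, nil }
func sendPacket(net.Conn, *RedPacket) {}
func sendMiss(net.Conn)               {}
func send(net.Conn, *Msg)             {}
```

Because all state inside a SET is owned by its single worker goroutine, scaling up is simply a matter of adding SETs, as described above.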

The red-envelope generation service is very simple: it just puts red-envelope objects into each SET's red-envelope queue in turn, which keeps the SETs fair. Its workload is also very light, which keeps that part of the business stable.
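A small sketch of such a generator, reusing the Set and RedPacket types from the sketch above (and assuming `import "time"` in the same package); the rate and names are illustrative:

```go
// generator drops newly created red envelopes into the SETs' queues in
// round-robin order so every SET gets a fair share; 200 per second mirrors
// the rate used in the test runs.
func generator(sets []*Set, perSecond int, total int64) {
	tick := time.NewTicker(time.Second / time.Duration(perSecond))
	defer tick.Stop()

	for id := int64(0); id < total; id++ {
		<-tick.C
		sets[id%int64(len(sets))].packets <- &RedPacket{ID: id}
	}
}
```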

See the code github.com/xiaojiaqi/1…

7. Practice


The practice run is divided into three phases.

Phase 1: Start the server and the monitor, then start the 17 clients one by one and let them establish 1 million connections. On the server side, the ss command is used to count how many connections each client has established with the server. The command is as follows: `alias ss2="ss -ant | grep 1025 | grep EST | awk -F: '{print \$8}' | sort | uniq -c"`

The results are as follows:

Phase 2: Using the client's HTTP interface, set all clients' QPS to 30,000 and let them issue requests at a combined 30,000 QPS.

Run the following command:

Observing the network monitor and the feedback from the monitoring side, the QPS reaches the expected value.

Network Monitoring Screenshot

Then start the red-envelope generation service on the server. It delivers 40,000 red envelopes at a rate of 200 per second. Watching the clients' logs on the monitor, you can see the red envelopes being claimed at essentially 200 per second.

Screenshot: shaking red envelopes (raw.githubusercontent.com/xiaojiaqi/1…)

After all those red envelopes are delivered, start the send-red-envelope service. It generates 20,000 red envelopes at 200 per second, each randomly assigned to 3 users, and sends those 3 users a notification; the clients then automatically claim them, and in the end all the red envelopes are taken.

Phase 3

Using the client's HTTP interface, set all clients' QPS to 60,000 and let them issue requests at a combined 60,000 QPS.

As before, start the red-envelope generation service on the server, delivering 40,000 red envelopes at 200 per second. Watching the clients' logs on the monitor, the red envelopes are again claimed at essentially 200 per second. After they are all delivered, start the send-red-envelope service, which generates 20,000 red envelopes at 200 per second, each randomly assigned to 3 users who receive a notification; the clients automatically claim them, and eventually all the red envelopes are taken.

With that, the practice run is complete.

During the run, both the server and the clients send their internal counter values to the monitor, where they become logs. We use simple Python scripts and the gnuplot plotting tool to visualise the run and thereby verify how it went.

The first chart shows the QPS of requests sent by the clients.



The x-axis is time in seconds and the y-axis is QPS, i.e. the total QPS of requests sent by all clients at that moment.

The first segment of the graph, with a few small spikes, is the 1 million clients establishing connections. The second segment is the 30,000 QPS stage, where the data is fairly stable around 30,000. The last segment is the 60,000 QPS stage. As the whole picture shows, however, the QPS does not line up perfectly with the target. This is mainly caused by the following:

  1. When too many goroutines run at the same time, sleep timing becomes inaccurate and drifts. I think this is due to Go's own scheduling; with a more powerful CPU the effect would disappear.
  2. Network effects: a client may be delayed when initiating connections, so the connections are not all up within the first second.
  3. When the server is under heavy load, the 1000M network starts dropping packets (observable with ifconfig), which causes the QPS to fluctuate.

The second chart shows the QPS handled by the server.

The server also shows three intervals corresponding to the client stages and looks very similar to the client side. But we can see a sharp drop in processing capacity at around 22:57, followed by a sharp rise, which suggests the code needs optimisation.

Overall, the server's QPS is fairly stable in the 30,000 QPS range, while at 60,000 QPS the processing becomes unstable. I believe this is down to my code, and with further refinement it would behave better.

Combine the two graphs

They are basically consistent, which also shows that the system behaves as designed.

This is the chart of the number of red envelopes generated over time.

It’s very stable.

This is the chart of red envelopes claimed by clients per second.

In the 30,000 QPS range, the number of red envelopes obtained by clients per second stays around 200. At 60,000 QPS it jitters violently and can no longer hold the value of 200. I think this is mainly because at 60,000 QPS the network jitter intensifies, which causes the red-envelope counts to fluctuate.

Finally, there is Go's own pprof information, which shows GC pauses exceeding 10 ms; that is acceptable considering the hardware is 7 years old and not dedicated to this test.
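For reference, this kind of data can be collected with the standard library alone; the sketch below (not the project's code) registers the net/http/pprof handlers and prints the most recent GC pause from runtime.MemStats once per second:

```go
package main

import (
	"fmt"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/ handlers on the default mux
	"runtime"
	"time"
)

func main() {
	// pprof endpoint, e.g. go tool pprof http://localhost:6060/debug/pprof/profile
	go func() {
		_ = http.ListenAndServe("localhost:6060", nil)
	}()

	// Print the most recent GC pause once per second; pauses over 10 ms
	// would show up here (and in GODEBUG=gctrace=1 output).
	var ms runtime.MemStats
	for range time.Tick(time.Second) {
		runtime.ReadMemStats(&ms)
		last := ms.PauseNs[(ms.NumGC+255)%256]
		fmt.Printf("GC runs=%d last pause=%s\n", ms.NumGC, time.Duration(last))
	}
}
```

Running the process with GODEBUG=gctrace=1 prints similar per-collection pause information to stderr.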

Conclusion:

According to the design goals, we simulated and designed a system that supports 1 million users and sustains at least 30,000 QPS and up to 60,000 QPS per machine, and we simply simulated WeChat's shake-and-send red envelope flow. It can be said that the goal was achieved. If each of 600 hosts could support 60,000 QPS, then 10,000,000,000 / (600 × 60,000) ≈ 278 seconds, i.e. under 5 minutes, would be enough to complete 10 billion red envelope requests.

While this prototype only implements the pre-set business, how does it differ from a real service? The differences are listed below.

| Difference | Real service | This simulation |
|---|---|---|
| Business logic | More complex | Very simple |
| Protocol | Protobuf plus encryption | Simple custom protocol |
| Payment | Complex | None |
| Logging | Complex | None |
| Performance | Higher | Lower |
| User distribution | User IDs are spread across different servers and must be unified by hashing | User IDs are contiguous; many optimisations make the code simple and very efficient |
| Security controls | Complex | None |
| Hot updates and version control | Complex | None |
| Monitoring | Detailed | Simple |

References:

  • The practice of a single machine with a million connections: github.com/xiaojiaqi/C…
  • How do I stress test a million users on AWS: github.com/xiaojiaqi/f…
  • Build your own WeChat-like system: github.com/xiaojiaqi/f…
  • djt.qq.com/article/vie…
  • techblog.cloudperf.net/2016/05/2-m…
  • datacratic.com/site/blog/1…
  • @Hottin's notes: huoding.com/2013/10/30/…
  • gobyexample.com/non-blockin…