Contents

  1. Tuning server parameters
    1. TCP/IP parameter settings
    2. Maximum file descriptors
    3. Application runtime tuning
    4. OutOfMemory Killer
  2. Tuning client parameters
  3. Server testing
    1. Netty server
    2. Spray server
    3. Undertow
    4. Node.js
  4. Reference documentation

Update: I have since added several more frameworks, so the comparison now covers Netty, Undertow, Jetty, Spray, Vert.x, Grizzly, and Node.js. The test data can be found in the follow-up article: Performance comparisons of seven WebSocket frameworks.

The famous C10K problem was raised back in 2001. That article is one of the defining documents of high-performance server development: serving 10,000 concurrent connections on a single machine was a very challenging goal at the time, given the hardware and software of the day. With the rapid development of hardware and software since then, handling 10,000 connections has become trivial; any mainstream language can now provide 10,000 concurrent connections on a single machine. So the goal has been raised a hundredfold, to C1000K: one server serving one million connections. Articles describing C1000K implementations appeared around 2010 and 2011, so in 2015 achieving C1000K should no longer be difficult.

This article is a record of my own practice. My goal is to implement a C1000K server with each of Spray-WebSocket, Netty, Undertow, and Node.js, and to see how hard each framework is to work with and how it performs. The development languages are Scala and JavaScript.

Of course, when talking about performance we also have to consider the number of requests per second per connection (RPS) and the size of each message. A common assumption is that some percentage of connections, say 20%, send or receive a message each second. My requirement is simpler: the server only pushes, the clients never actively send messages, and one message is broadcast to a million connections every minute. The test tool I implemented therefore establishes 60,000 WebSocket connections per client, with 20 clients in total. Since I could not actually use 20 machines, I used two AWS c3.2xlarge (8-core, 16 GB) instances as client machines, running ten clients on each. The server broadcasts a message every minute; the content is simply the server's current time.

Recently I came across the message push system that 360 implemented in Go. Here is their data:

The 360 message push system currently serves more than 50 internal products and several thousand apps on its developer platform, with real-time connections on the order of hundreds of millions and billions of long connections per day. It can complete a broadcast of that scale within one minute, with peak daily deliveries in the billions, running on 400 physical machines and more than 3,000 instances distributed across nine clusters in nearly ten IDCs at home and abroad.

The code for the four servers and the client test tool can be downloaded from GitHub. (There are actually more than four frameworks; the repository includes implementations for Netty, Undertow, Jetty, Spray-WebSocket, Vert.x, Grizzly, and Node.js.)

Each server easily reaches 1.2 million simultaneously active WebSocket connections, but the servers differ in resource usage and delivery time. 1.2 million is just a conservative figure; the servers still handle this many connections comfortably. Next I will test C2000K.

We need to tune some server/client parameters before testing.

Tuning server parameters

You typically need to modify two files, /etc/sysctl.conf and /etc/security/limits.conf, to configure the TCP/IP parameters and the maximum number of file descriptors.

TCP/IP parameter settings

Modify the /etc/sysctl.conf file and set network parameters.

net.ipv4.tcp_wmem = 4096 87380 4161536
net.ipv4.tcp_rmem = 4096 87380 4161536
net.ipv4.tcp_mem = 786432 2097152 3145728

Adjust these values as required. More parameters can be found in a previous article: Tuning the Linux TCP/IP protocol stack. Run /sbin/sysctl -p for the changes to take effect immediately.

Maximum file descriptors

The Linux kernel itself has a maximum limit on file descriptors, which you can change as needed:

  • Maximum number of open file descriptors: /proc/sys/fs/file-max
    1. Temporary setting: echo 1000000 > /proc/sys/fs/file-max
    2. Permanent setting: modify the /etc/sysctl.conf file and add fs.file-max = 1000000
  • Maximum number of open file descriptors for a process

    Use ulimit -n to view the current setting, and ulimit -n 1000000 to set it temporarily.

    To make it permanent, modify the /etc/security/limits.conf file and add the following lines:
*         hard    nofile      1000000
*         soft    nofile      1000000
root      hard    nofile      1000000
root      soft    nofile      1000000

Note that the hard limit cannot exceed /proc/sys/fs/nr_open, so sometimes you also need to raise nr_open first, for example: echo 2000000 > /proc/sys/fs/nr_open

To view the number of open file descriptors in use, run the following command:

[root@localhost ~]# cat /proc/sys/fs/file-nr             
1632    0       1513506

The first value is the number of file descriptors the system has allocated and is currently using, the second is the number of allocated but currently unused file descriptors, and the third is file-max.

To sum up:

  • The number of open file descriptors for all processes cannot exceed /proc/sys/fs/file-max
  • The number of file descriptors opened by a single process cannot exceed the soft limit of nofile in user limit
  • The soft limit of nofile cannot exceed its hard limit
  • The hard limit of nofile cannot exceed /proc/sys/fs/nr_open

Application runtime tuning

  1. Java application memory tuning: the server uses a 12 GB heap and the throughput-first garbage collector:
JAVA_OPTS="-Xms12G -Xmx12G -Xss1M -XX:+UseParallelGC"
  2. V8 engine tuning:
node --nouse-idle-notification --expose-gc --max-new-space-size=1024 --max-new-space-size=2048 --max-old-space-size=8192  ./webserver.js

OutOfMemory Killer

If the server itself does not have much memory, say 8 GB, your server process may be killed before it reaches 1 million connections. You can see this by running dmesg:

Out of memory: Kill process 10375 (java) score 59 or sacrifice child

This is the Linux OOM killer at work. When it is enabled, each process has three extra files under /proc/<pid>/ related to adjusting its OOM score. A temporary workaround is to lower the score of the server process, for example: echo -17 > /proc/$(pidof java)/oom_adj

Tuning client parameters

On a single system, the number of local ports available for connecting to a remote service is limited. A TCP port is a 16-bit integer, so it can only range from 0 to 65535, and ports 0 to 1023 are reserved, leaving 1024 to 65534, i.e. 64511 ports, available. In other words, one machine can create only a little more than 60,000 long connections from a single IP. To get more client connections, you can use more machines or network cards, or you can use virtual IP addresses. For example, the following commands add 19 IP addresses: one is used by the server and the other 18 by the clients, giving 18 * 60000 = 1,080,000 connections.

ifconfig eth0:0  192.168.77.10 netmask 255.255.255.0 up
ifconfig eth0:1  192.168.77.11 netmask 255.255.255.0 up
ifconfig eth0:2  192.168.77.12 netmask 255.255.255.0 up
ifconfig eth0:3  192.168.77.13 netmask 255.255.255.0 up
ifconfig eth0:4  192.168.77.14 netmask 255.255.255.0 up
ifconfig eth0:5  192.168.77.15 netmask 255.255.255.0 up
ifconfig eth0:6  192.168.77.16 netmask 255.255.255.0 up
ifconfig eth0:7  192.168.77.17 netmask 255.255.255.0 up
ifconfig eth0:8  192.168.77.18 netmask 255.255.255.0 up
ifconfig eth0:9  192.168.77.19 netmask 255.255.255.0 up
ifconfig eth0:10 192.168.77.20 netmask 255.255.255.0 up
ifconfig eth0:11 192.168.77.21 netmask 255.255.255.0 up
ifconfig eth0:12 192.168.77.22 netmask 255.255.255.0 up
ifconfig eth0:13 192.168.77.23 netmask 255.255.255.0 up
ifconfig eth0:14 192.168.77.24 netmask 255.255.255.0 up
ifconfig eth0:15 192.168.77.25 netmask 255.255.255.0 up
ifconfig eth0:16 192.168.77.26 netmask 255.255.255.0 up
ifconfig eth0:17 192.168.77.27 netmask 255.255.255.0 up
ifconfig eth0:18 192.168.77.28 netmask 255.255.255.0 up

Also modify the /etc/sysctl.conf file:

net.ipv4.ip_local_port_range = 1024 65535

Run /sbin/sysctl -p for the change to take effect immediately.
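
To make the use of these virtual IP addresses concrete, here is a minimal sketch (in Scala, with Netty 4) of how a test client can pin each outgoing connection to one of the client-side IPs so that every IP contributes its own pool of roughly 60,000 source ports. This is not the actual test tool from the repository; the server address, port, and empty handler are placeholder assumptions.

import java.net.InetSocketAddress
import io.netty.bootstrap.Bootstrap
import io.netty.channel.nio.NioEventLoopGroup
import io.netty.channel.socket.SocketChannel
import io.netty.channel.socket.nio.NioSocketChannel
import io.netty.channel.{ChannelInboundHandlerAdapter, ChannelInitializer}

object ClientBindSketch extends App {
  val group = new NioEventLoopGroup()
  val bootstrap = new Bootstrap()
    .group(group)
    .channel(classOf[NioSocketChannel])
    .handler(new ChannelInitializer[SocketChannel] {
      override def initChannel(ch: SocketChannel): Unit =
        // the real tool adds HTTP codec + WebSocket handshake handlers here
        ch.pipeline().addLast(new ChannelInboundHandlerAdapter)
    })

  val server   = new InetSocketAddress("192.168.77.10", 8080) // assumed server IP and port
  val localIps = (11 to 28).map(i => s"192.168.77.$i")        // the 18 client-side virtual IPs

  // Binding the local side of each connection to a specific virtual IP (port 0 lets the
  // kernel choose a free port) is what allows more than ~64k connections per machine.
  for (ip <- localIps; _ <- 1 to 60000)
    bootstrap.connect(server, new InetSocketAddress(ip, 0))
}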

Server testing

In the actual test I used one AWS c3.4xlarge (16 cores, 32 GB memory) instance as the application server and two AWS c3.2xlarge (8 cores, 16 GB memory) instances as the clients. The two client machines were more than sufficient; each created ten intranet virtual IP addresses, with 60,000 WebSocket connections per IP.

The client configuration is as follows. The /etc/sysctl.conf configuration:

fs.file-max = 2000000
fs.nr_open = 2000000
net.ipv4.ip_local_port_range = 1024 65535

The /etc/security/limits.conf configuration:

* soft    nofile      2000000
* hard    nofile      2000000
* soft nproc 2000000
* hard nproc 2000000

The server configuration is as follows. The /etc/sysctl.conf configuration:

fs.file-max = 2000000
fs.nr_open = 2000000
net.ipv4.ip_local_port_range = 1024 65535

The /etc/security/limits.conf configuration:

* soft    nofile      2000000
* hard    nofile      2000000
* soft nproc 2000000
* hard nproc 2000000

Netty server

  • Establishing 1.2 million connections without sending any messages is easy; about 14 GB of memory remains free.
[root@colobu ~]# ss -s; free -m
Total: 1200231 (kernel 1200245)
TCP:   1200006 (estab 1200002, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 4
Transport Total     IP        IPv6
*         1200245   -         -        
RAW       0         0         0        
UDP       1         1         0        
TCP       1200006   1200006   0        
INET      1200007   1200007   0        
FRAG      0         0         0        
             total       used       free     shared    buffers     cached
Mem:         30074      15432      14641          0          9        254
-/+ buffers/cache:      15167      14906
Swap:          815          0        815
  • Every minute, send a message containing the current server time to all 1.2 million WebSockets. The broadcast here is done from a single thread, and sending to all 1.2 million connections takes about 15 seconds (a sketch of the broadcast logic follows this list).
02:15:43.307 [pool-1-thread-1] INFO  com.colobu.webtest.netty.WebServer$ - send msg to channels for c4453a26-bca6-42b6-b29b-43653767f9fc
02:15:57.190 [pool-1-thread-1] INFO  com.colobu.webtest.netty.WebServer$ - sent 1200000 channels for c4453a26-bca6-42b6-b29b-43653767f9fc
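
As a reference for what this single-threaded broadcast looks like, here is a minimal sketch in Scala using Netty's ChannelGroup. It is not the exact code from the repository (the object and method names are assumptions); the point is that every connected channel is registered in one DefaultChannelGroup and a scheduler thread calls writeAndFlush on the group once per minute.

import java.util.concurrent.{Executors, TimeUnit}
import io.netty.channel.Channel
import io.netty.channel.group.DefaultChannelGroup
import io.netty.handler.codec.http.websocketx.TextWebSocketFrame
import io.netty.util.concurrent.GlobalEventExecutor

object BroadcastSketch {
  // every channel is added here once its WebSocket handshake completes;
  // DefaultChannelGroup removes closed channels automatically
  val channels = new DefaultChannelGroup(GlobalEventExecutor.INSTANCE)

  def register(ch: Channel): Unit = channels.add(ch)

  // one scheduler thread pushes the current server time to all connections every minute
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  def start(): Unit =
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit =
        channels.writeAndFlush(new TextWebSocketFrame(System.currentTimeMillis().toString))
    }, 1, 1, TimeUnit.MINUTES)
}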

The CPU usage is low, and network bandwidth usage is only about 10 MB/s.

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  0   0 100   0   0   0|   0     0 |  60B  540B|   0     0 | 224   440 
  0   0 100   0   0   0|   0     0 |  60B  870B|   0     0 | 192   382 
  0   0 100   0   0   0|   0     0 |  59k   74k|   0     0 |2306  2166 
  2   7  87   0   0   4|   0     0 |4998k 6134k|   0     0 | 169k  140k
  1   7  87   0   0   5|   0     0 |4996k 6132k|   0     0 | 174k  140k
  1   7  87   0   0   5|   0     0 |4972k 6102k|   0     0 | 176k  140k
  1   7  87   0   0   5|   0     0 |5095k 6253k|   0     0 | 178k  142k
  2   7  87   0   0   5|   0     0 |5238k 6428k|   0     0 | 179k  144k
  1   7  87   0   0   5|   0   24k|4611k 5660k|   0     0 | 166k  129k
  1   7  87   0   0   5|   0     0 |5083k 6238k|   0     0 | 175k  142k
  1   7  87   0   0   5|   0     0 |5277k 6477k|   0     0 | 179k  146k
  1   7  87   0   0   5|   0     0 |5297k 6500k|   0     0 | 179k  146k
  1   7  87   0   0   5|   0     0 |5383k 6607k|   0     0 | 180k  148k
  1   7  87   0   0   5|   0     0 |5504k 6756k|   0     0 | 184k  152k
  1   7  87   0   0   5|   0   48k|5584k 6854k|   0     0 | 183k  152k
  1   7  87   0   0   5|   0     0 |5585k 6855k|   0     0 | 183k  153k
  1   7  87   0   0   5|   0     0 |5589k 6859k|   0     0 | 184k  153k
  1   5  91   0   0   3|   0     0 |4073k 4999k|   0     0 | 135k  110k
  0   0 100   0   0   0|   0   32k|  60B  390B|   0     0 |4822   424

Client metrics (20 clients in total; one is selected here to view its numbers). Each client maintains 60,000 connections. The time for a message to travel from the server to the client averages 633 ms, with a very small standard deviation; the time is similar across connections.

Active WebSockets for eb810c24-8565-43ea-bc27-9a0b2c910ca4
             count = 60000
WebSocket Errors for eb810c24-8565-43ea-bc27-9a0b2c910ca4
             count = 0
-- Histograms ------------------------------------------------------------------
Message latency for eb810c24-8565-43ea-bc27-9a0b2c910ca4
             count = 693831
               min = 627
               max = 735
              mean = 633.06
            stddev = 9.61
            median = 631.00
              75% <= 633.00
              95% <= 640.00
              98% <= 651.00
              99% <= 670.00
            99.9% <= 735.00
-- Meters ----------------------------------------------------------------------
Message Rate for eb810c24-8565-43ea-bc27-9a0b2c910ca4
             count = 693832
         mean rate = 32991.37 events/minute
     1-minute rate = 60309.26 events/minute
     5-minute rate = 53523.45 events/minute
    15-minute rate = 31926.26 events/minute

The average message rate per client is about 1,000 per second (60,000 connections × 1 message per minute), or roughly 20,000 messages/second in total. The average latency is 633 ms, the longest 735 ms, and the shortest 627 ms.

Spray server

  • Establishing 1.2 million connections without sending any messages is easy. Memory usage is relatively high, with about 7 GB left free.
[root@colobu ~]# ss -s; free -m
Total: 1200234 (kernel 1200251)
TCP:   1200006 (estab 1200002, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 4
Transport Total     IP        IPv6
*         1200251   -         -        
RAW       0         0         0        
UDP       1         1         0        
TCP       1200006   1200006   0        
INET      1200007   1200007   0        
FRAG      0         0         0        
             total       used       free     shared    buffers     cached
Mem:         30074      22371       7703          0         10        259
-/+ buffers/cache:      22100       7973
Swap:          815          0        815
  • Every minute, send a message containing the current server time to all 1.2 million WebSockets. CPU usage is high, transmission is fast, and network bandwidth reaches about 46 MB/s. A broadcast takes about 8 seconds.
05/22 04:42:57.569 INFO [ool-2-worker-15] c.c.w.s.WebServer - send msg to workers for 8454e7d8-b8ca-4881-912b-6cdf3e6787bf
05/22 04:43:05.279 INFO [ool-2-worker-15] c.c.w.s.WebServer - sent msg to workers for 8454e7d8-b8ca-4881-912b-6cdf3e6787bf. current workers: 1200000
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
 74   9  14   0   0   3|   0    24k|6330k   20M|   0     0 |  20k 1696 
 70  23   0   0   0   6|   0    64k|  11M   58M|   0     0 |  18k 2526 
 75  11   6   0   0   7|   0     0 |9362k   66M|   0     0 |  24k   11k
 82   4   8   0   0   6|   0     0 |  11M   35M|   0     0 |  24k   10k
 85   0  14   0   0   1|   0     0 |8334k   12M|   0     0 |  44k  415 
 84   0  15   0   0   1|   0     0 |9109k   16M|   0     0 |  36k  425 
 81   0  19   0   0   0|   0    24k| 919k  858k|   0     0 |  23k  629 
 76   0  23   0   0   0|   0     0 | 151k  185k|   0     0 |  18k 1075

Client metrics (20 clients in total; one is selected here to view its numbers). Each client maintains 60,000 connections. The time for a message to travel from the server to the client averages 1412 ms, with a large standard deviation; the time varies greatly across connections.

Active WebSockets for 6674c9d8-24c6-4e77-9fc0-58afabe7436f
             count = 60000
WebSocket Errors for 6674c9d8-24c6-4e77-9fc0-58afabe7436f
             count = 0
-- Histograms ------------------------------------------------------------------
Message latency for 6674c9d8-24c6-4e77-9fc0-58afabe7436f
             count = 454157
               min = 716
               max = 9297
              mean = 1412.77
            stddev = 1102.64
            median = 991.00
              75% <= 1449.00
              95% <= 4136.00
              98% <= 4951.00
              99% <= 5308.00
            99.9% <= 8854.00
-- Meters ----------------------------------------------------------------------
Message Rate for 6674c9d8-24c6-4e77-9fc0-58afabe7436f
             count = 454244
         mean rate = 18821.51 events/minute
     1-minute rate = 67705.18 events/minute
     5-minute rate = 49917.79 events/minute
    15-minute rate = 24355.57 events/minute

Undertow

  • Establishing 1.2 million connections without sending any messages is easy. It uses less memory, with about 11 GB remaining.
[root@colobu ~]# ss -s; free -m
Total: 1200234 (kernel 1200240)
TCP:   1200006 (estab 1200002, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 4
Transport Total     IP        IPv6
*         1200240   -         -        
RAW       0         0         0        
UDP       1         1         0        
TCP       1200006   1200006   0        
INET      1200007   1200007   0        
FRAG      0         0         0        
             total       used       free     shared    buffers     cached
Mem:         30074      18497      11576          0         10        286
-/+ buffers/cache:      18200      11873
Swap:          815          0        815
  • Every minute, send a message containing the current server time to all 1.2 million WebSockets. A broadcast takes about 15 seconds (see the sketch after this list).
03:19:31.154 [pool-1-thread-1] INFO  com.colobu.webtest.undertow.WebServer$ - send msg to channels for d9b450da-2631-42bc-a802-44285f63a62d
03:19:46.755 [pool-1-thread-1] INFO  com.colobu.webtest.undertow.WebServer$ - sent 1200000 channels for d9b450da-2631-42bc-a802-44285f63a62d
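
For comparison, the same push pattern with Undertow's WebSocket API can be sketched as follows (in Scala; again, this is not the repository code, and the port, interface, and channel bookkeeping are assumptions). Handlers.websocket() installs the handshake handler, and WebSockets.sendText() performs the asynchronous write to each open channel.

import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}
import io.undertow.{Handlers, Undertow}
import io.undertow.websockets.WebSocketConnectionCallback
import io.undertow.websockets.core.{WebSocketChannel, WebSockets}
import io.undertow.websockets.spi.WebSocketHttpExchange
import scala.collection.JavaConverters._

object UndertowBroadcastSketch extends App {
  // keep track of every open WebSocket so the scheduler below can broadcast to all of them
  // (a complete server would also remove channels when they close)
  private val channels = ConcurrentHashMap.newKeySet[WebSocketChannel]()

  private val handler = Handlers.websocket(new WebSocketConnectionCallback {
    override def onConnect(exchange: WebSocketHttpExchange, channel: WebSocketChannel): Unit = {
      channels.add(channel)
      channel.resumeReceives() // clients never send, but keep the read side active
    }
  })

  // one thread pushes the current server time to every connection each minute
  private val scheduler = Executors.newSingleThreadScheduledExecutor()
  scheduler.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = {
      val msg = System.currentTimeMillis().toString
      channels.asScala.foreach(ch => WebSockets.sendText(msg, ch, null))
    }
  }, 1, 1, TimeUnit.MINUTES)

  Undertow.builder()
    .addHttpListener(8080, "0.0.0.0") // assumed port and interface
    .setHandler(handler)
    .build()
    .start()
}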

Client metrics (20 clients in total; one is selected here to view its numbers). Each client maintains 60,000 connections. The time for a message to travel from the server to the client averages 672 ms, with a small standard deviation; the time does not vary much across connections.

Active WebSockets for b2e95e8d-b17a-4cfa-94d5-e70832034d4d
             count = 60000
WebSocket Errors for b2e95e8d-b17a-4cfa-94d5-e70832034d4d
             count = 0
-- Histograms ------------------------------------------------------------------
Message latency for b2e95e8d-b17a-4cfa-94d5-e70832034d4d
             count = 460800
               min = 667
               max = 781
              mean = 672.12
            stddev = 5.90
            median = 671.00
              75% <= 672.00
              95% <= 678.00
              98% <= 684.00
              99% <= 690.00
            99.9% <= 776.00
-- Meters ----------------------------------------------------------------------
Message Rate for b2e95e8d-b17a-4cfa-94d5-e70832034d4d
             count = 460813
         mean rate = 27065.85 events/minute
     1-minute rate = 69271.67 events/minute
     5-minute rate = 48641.78 events/minute
    15-minute rate = 24128.67 events/minute
Setup Rate for b2e95e8d-b17a-4cfa-94d5-e70832034d4d

Node.js

Node.js is not a framework I seriously considered; it is listed here for reference only. Its performance is also good.

Active WebSockets for 537c7f0d-e58b-4996-b29e-098fe2682dcf
             count = 60000
WebSocket Errors for 537c7f0d-e58b-4996-b29e-098fe2682dcf
             count = 0
-- Histograms ------------------------------------------------------------------
Message latency for 537c7f0d-e58b-4996-b29e-098fe2682dcf
             count = 180000
               min = 808
               max = 847
              mean = 812.10
            stddev = 1.95
            median = 812.00
              75% <= 812.00
              95% <= 813.00
              98% <= 814.00
              99% <= 815.00
            99.9% <= 847.00
-- Meters ----------------------------------------------------------------------
Message Rate for 537c7f0d-e58b-4996-b29e-098fe2682dcf
             count = 180000
         mean rate = 7191.98 events/minute
     1-minute rate = 10372.33 events/minute
     5-minute rate = 16425.78 events/minute
    15-minute rate = 9080.53 events/minute

Reference documentation

  1. HTTP long connection 2 million attempts and tuning
  2. Maximum number of open file descriptors for Linux
  3. The target of 1M concurrent connections was achieved
  4. Zhihu: How to achieve 3 million long connections in a single server?
  5. Build the C1000K server
  6. The secret to multi-millionth concurrency implementation
  7. C1000k New idea: user-mode TCP/IP stack
  8. github.com/xiaojiaqi/C…
  9. 600k concurrent websocket connections on AWS using Node.js
  10. plumbr.eu/blog/memory…
  11. it.deepinmind.com/java/2014/0…
  12. access.redhat.com/documentati…
  13. www.nateware.com/linux-netwo…
  14. warmjade.blogspot.jp/2014_03_22_…
  15. mp.weixin.qq.com/s?__biz=MjM…