1 Simulate a single machine connection bottleneck

As we know, a server usually binds one port when it starts, for example port 8000. The ports available to a client, however, are limited: excluding ports above 65535 and the reserved ports at 1024 and below, only the range 1025~65535 remains, and after deducting some commonly used ports, roughly 60,000 ports are actually usable. So how can a single machine reach a million connections? Suppose the server listens on the 100 ports in the range [8000, 8100]; then 100 × 60,000 gives about 6 million possible connections. This follows from basic TCP knowledge: a connection is identified by the four-tuple of source IP address, source port, destination IP address, and destination port. Even if the source IP address and source port are the same, as long as the destination port differs, the underlying system treats them as two different TCP connections. This is the preparation for the single-machine million-connection experiment, as shown below.

Ports 1024 and below are reserved for the ROOT user, so the client port range is 1025~65535. The following code implements the simulation of a single-machine million-connection scenario. First look at the server class, which opens the 100 listening ports in [8000, 8100] in a loop and waits for clients to connect.

package com.tom.netty.connection;

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;

/**
 * @author Tom
 */
public final class Server {

    public static final int BEGIN_PORT = 8000;
    public static final int N_PORT = 8100;

    public static void main(String[] args) {
        new Server().start(Server.BEGIN_PORT, Server.N_PORT);
    }

    public void start(int beginPort, int nPort) {
        System.out.println("Server starting...");
        EventLoopGroup bossGroup = new NioEventLoopGroup();
        EventLoopGroup workerGroup = new NioEventLoopGroup();
        ServerBootstrap bootstrap = new ServerBootstrap();
        bootstrap.group(bossGroup, workerGroup);
        bootstrap.channel(NioServerSocketChannel.class);
        bootstrap.childOption(ChannelOption.SO_REUSEADDR, true);
        bootstrap.childHandler(new ConnectionCountHandler());
        // Bind every port in [beginPort, nPort] and wait for clients to connect
        for (int i = 0; i <= (nPort - beginPort); i++) {
            final int port = beginPort + i;
            bootstrap.bind(port).addListener(new ChannelFutureListener() {
                public void operationComplete(ChannelFuture channelFuture) throws Exception {
                    System.out.println("Successfully bound listening port: " + port);
                }
            });
        }
        System.out.println("Server started!");
    }
}

The ConnectionCountHandler class counts the current number of client connections: the counter is incremented each time a connection becomes active, decremented when it becomes inactive, and the current value is printed every 2 seconds.

package com.tom.netty.connection;

import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Created by Tom.
 */
@ChannelHandler.Sharable
public class ConnectionCountHandler extends ChannelInboundHandlerAdapter {

    private AtomicInteger nConnection = new AtomicInteger();

    public ConnectionCountHandler() {
        // Print the current connection count every 2 seconds
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(new Runnable() {
            public void run() {
                System.out.println("Number of current client connections: " + nConnection.get());
            }
        }, 0, 2, TimeUnit.SECONDS);
    }

    @Override
    public void channelActive(ChannelHandlerContext ctx) {
        nConnection.incrementAndGet();
    }

    @Override
    public void channelInactive(ChannelHandlerContext ctx) {
        nConnection.decrementAndGet();
    }
}

Now look at the client class. Its main job is to connect to the 100 ports opened by the server in a loop, until the server no longer responds and the thread hangs.

package com.tom.netty.connection;

import io.netty.bootstrap.Bootstrap;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;

/**
 * Created by Tom.
 */
public class Client {

    private static final String SERVER_HOST = "127.0.0.1";

    public static void main(String[] args) {
        new Client().start(Server.BEGIN_PORT, Server.N_PORT);
    }

    public void start(final int beginPort, int nPort) {
        System.out.println("The client has started...");
        EventLoopGroup eventLoopGroup = new NioEventLoopGroup();
        final Bootstrap bootstrap = new Bootstrap();
        bootstrap.group(eventLoopGroup);
        bootstrap.channel(NioSocketChannel.class);
        bootstrap.option(ChannelOption.SO_REUSEADDR, true);
        bootstrap.handler(new ChannelInitializer<SocketChannel>() {
            @Override
            protected void initChannel(SocketChannel ch) {
            }
        });

        int index = 0;
        int port;
        // Connect to the server ports in a loop
        while (!Thread.interrupted()) {
            port = beginPort + index;
            try {
                ChannelFuture channelFuture = bootstrap.connect(SERVER_HOST, port);
                channelFuture.addListener(new ChannelFutureListener() {
                    public void operationComplete(ChannelFuture future) throws Exception {
                        if (!future.isSuccess()) {
                            System.out.println("Connection failed, program closed!");
                            System.exit(0);
                        }
                    }
                });
                channelFuture.get();
            } catch (Exception e) {
            }

            if (port == nPort) {
                index = 0;
            } else {
                index++;
            }
        }
    }
}

Finally, package the server application and deploy it to one Linux server, then package the client application and deploy it to another Linux server. Start the server and client programs respectively. After running for a while, the number of connections the server reports stops at a fixed value, as shown below.

Number of current client connections: 870
Number of current client connections: 870
Number of current client connections: 870
Number of current client connections: 870
Number of current client connections: 870
...

At the same time, the following exception is thrown.

Exception in thread "nioEventLoopGroup-2-1" java.lang.InternalError: java.io.FileNotFoundException: /usr/java/jdk1.8.0_121/jre/lib/ext/cldrdata.jar (Too many open files)
    at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:1040)
    at sun.misc.URLClassPath.getResource(URLClassPath.java:239)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:503)
    at java.util.ResourceBundle$Control.newBundle(ResourceBundle.java:2640)
    at java.util.ResourceBundle.loadBundle(ResourceBundle.java:1501)
    at java.util.ResourceBundle.findBundle(ResourceBundle.java:1465)
    at java.util.ResourceBundle.findBundle(ResourceBundle.java:1419)
    at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1361)
    at java.util.ResourceBundle.getBundle(ResourceBundle.java:845)
    at java.util.logging.Level.computeLocalizedLevelName(Level.java:265)
    at java.util.logging.Level.getLocalizedLevelName(Level.java:324)
    at java.util.logging.SimpleFormatter.format(SimpleFormatter.java:165)
    at java.util.logging.StreamHandler.publish(StreamHandler.java:211)
    at java.util.logging.ConsoleHandler.publish(ConsoleHandler.java:116)
    at java.util.logging.Logger.log(Logger.java:738)
    at io.netty.util.internal.logging.JdkLogger.log(JdkLogger.java:606)
    at io.netty.util.internal.logging.JdkLogger.warn(JdkLogger.java:482)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:876)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:745)

At this point it should be noted that the server has reached the maximum number of client connections it can accept, namely 870. The next task is to figure out how to break through this bottleneck so that a single server can support a million connections, which is the exciting part.

2 Single-machine million-connection tuning solution

2.1 Breaking the local file handle limit

Start by typing the following command on the server to see the maximum number of file handles a single process can open.

ulimit -n

After entering the command, the number 1024 is displayed, which is the maximum number of files a single process can open in Linux by default. Because every TCP connection in Linux is represented by a file, the number of connections is constrained by this limit. So why did the server connection count shown earlier stop at 870, which is less than 1024? Because besides the connections themselves, the class files and JAR files opened by the JVM also count as open files of the process, so 1024 minus the number of files the JVM has open is roughly the number of TCP connections that can be supported. The next step is to break through this limit: on the command line, enter the following command to open the /etc/security/limits.conf file on the server.

sudo vi /etc/security/limits.conf

Then add the following two lines to the end of the file.

* hard nofile 1000000
* soft nofile 1000000

The * means the rule applies to all users, hard is the hard limit, soft is the soft (warning) limit, and nofile means the maximum number of open files. The number 1 000 000 means any user can open up to one million files, which is the maximum the operating system supports, as shown in the following figure.
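Incidentally, it is easy to check from Java how many file descriptors the server process is actually holding. The following is only a minimal sketch, assuming a Linux /proc filesystem; it is an illustration and not part of the demo code.

import java.io.File;

/**
 * A minimal sketch, assuming a Linux /proc filesystem: every open file,
 * JAR and socket of the current process appears as an entry under /proc/self/fd.
 */
public class FdCounter {
    public static void main(String[] args) {
        String[] fds = new File("/proc/self/fd").list();
        if (fds != null) {
            System.out.println("Open file descriptors of this process: " + fds.length);
        } else {
            System.out.println("/proc/self/fd is not available on this system");
        }
    }
}

Running this kind of check inside the server process makes it easier to see why the earlier test stopped at 870 rather than exactly 1024: the JVM's own open class files and JARs also occupy part of the quota.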

Next, type the following command again.

ulimit -n

At this point we find that the value is still 1024, unchanged, so restart the server. Run the server program and the client program again and watch the connection count until it stops at 137 920 and an exception is thrown, as shown below.

Number of current client connections: 137920
Number of current client connections: 137920
Number of current client connections: 137920
Number of current client connections: 137920
Exception in thread "nioEventLoopGroup-2-1" java.lang.InternalError: java.io.FileNotFoundException: /usr/java/jdk1.8.0_121/jre/lib/ext/cldrdata.jar (Too many open files)
...

Why is that? Because there is still another limit on the number of connections; to break through it, the global file handle limit must be raised as well.

2.2 Breaking the global file handle limit

Run the following command on the Linux command line to view the total number of files that can be opened by all user processes in Linux.

cat /proc/sys/fs/file-max

The command above shows the global limit; here the result is 10 000. Naturally, the local (per-process) file handle limit cannot be larger than the global one, so the global limit must also be raised to break through the bottleneck. Switch to the ROOT user first, otherwise there is no permission to change it.

sudo -s
echo 20000 > /proc/sys/fs/file-max
exit

Let's set it to 20 000 first and test it out, then keep experimenting. Start the server and client programs again, and we find that the number of connections has exceeded 20 000.
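While such a test is running, the system-wide handle usage can also be checked from Java. Below is a minimal sketch assuming the Linux /proc interface, where /proc/sys/fs/file-nr reports three numbers: allocated handles, allocated-but-unused handles, and the maximum (the same value as file-max).

import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * A minimal sketch, assuming the Linux /proc interface, for watching
 * global file handle usage while the connection test is running.
 */
public class GlobalFdUsage {
    public static void main(String[] args) throws Exception {
        // file-nr: allocated handles, allocated-but-unused handles, maximum
        String fileNr = new String(Files.readAllBytes(Paths.get("/proc/sys/fs/file-nr"))).trim();
        String fileMax = new String(Files.readAllBytes(Paths.get("/proc/sys/fs/file-max"))).trim();
        System.out.println("file-nr : " + fileNr);
        System.out.println("file-max: " + fileMax);
    }
}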

However, a value written to /proc/sys/fs/file-max with echo is not persistent: after the server is restarted it reverts to 10 000. Therefore, make the change permanent by editing the configuration file with vi.

sudo vi /etc/sysctl.conf

Add the following line to the end of the /etc/sysctl.conf file.

fs.file-max=1000000

The result is shown below.

Next, restart the Linux server and start the server and client programs.

Number of current client connections: 981245
Number of current client connections: 981246
Number of current client connections: 981248
Number of current client connections: 981250
Number of current client connections: 981250
...

The final number of connections is about 980 000, and at this point the bottleneck is largely the performance of the machine itself. With the htop command, the CPU can be seen running close to 100%, as shown in the figure below.

The above covers tuning and performance improvement at the operating system level; what follows is tuning at the Netty application level.

3 Netty application-level performance tuning

3.1 Reproducing the performance bottleneck at the Netty application level

To start with the application scenario, here is a piece of standard server-side application code.

package com.tom.netty.thread;

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.FixedLengthFrameDecoder;

/**
 * Created by Tom.
 */
public class Server {

    private static final int port = 8000;

    public static void main(String[] args) {
        EventLoopGroup bossGroup = new NioEventLoopGroup();
        EventLoopGroup workerGroup = new NioEventLoopGroup();
        final EventLoopGroup businessGroup = new NioEventLoopGroup(1000);
        ServerBootstrap bootstrap = new ServerBootstrap();
        bootstrap.group(bossGroup, workerGroup)
                .channel(NioServerSocketChannel.class)
                .childOption(ChannelOption.SO_REUSEADDR, true);
        bootstrap.childHandler(new ChannelInitializer<SocketChannel>() {
            @Override
            protected void initChannel(SocketChannel ch) {
                ch.pipeline().addLast(new FixedLengthFrameDecoder(Long.BYTES));
                ch.pipeline().addLast(businessGroup, ServerHandler.INSTANCE);
            }
        });

        ChannelFuture channelFuture = bootstrap.bind(port).addListener(new ChannelFutureListener() {
            public void operationComplete(ChannelFuture channelFuture) throws Exception {
                System.out.println("The server started successfully, bound port: " + port);
            }
        });
    }
}

We will focus on the ServerHandler class for logical processing on the server side.

package com.tom.netty.thread;

import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;

import java.util.concurrent.ThreadLocalRandom;

/**
 * Created by Tom.
 */
@ChannelHandler.Sharable
public class ServerHandler extends SimpleChannelInboundHandler<ByteBuf> {

    public static final ChannelHandler INSTANCE = new ServerHandler();

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, ByteBuf msg) {
        ByteBuf data = Unpooled.directBuffer();
        // Copy the timestamp sent by the client
        data.writeBytes(msg);
        Object result = getResult(data);
        // Write the result back to the client
        ctx.channel().writeAndFlush(result);
    }

    // Simulate fetching a result from the database
    protected Object getResult(ByteBuf data) {
        int level = ThreadLocalRandom.current().nextInt(1, 1000);
        int time;
        if (level <= 900) {
            // 90.0% of requests take 1ms (out of 1000 requests, 100 take longer than 1ms)
            time = 1;
        } else if (level <= 950) {
            // 95.0% of requests finish within 10ms (out of 1000, 50 take longer than 10ms)
            time = 10;
        } else if (level <= 990) {
            // 99.0% of requests finish within 100ms (out of 1000, 10 take longer than 100ms)
            time = 100;
        } else {
            // 99.9% of requests finish within 1000ms (out of 1000, 1 takes longer than 1000ms)
            time = 1000;
        }
        try {
            Thread.sleep(time);
        } catch (InterruptedException e) {
        }
        return data;
    }
}

There is a getResult() method in the code above, which can be thought of as querying data from a database and returning the result of each query to the client. To simulate query performance, getResult() simply takes the timestamp passed in by the client and eventually returns it unchanged; before returning, it sleeps for a random amount of time to simulate the processing time of real business logic. The following table shows the performance parameters of the simulated scenario.
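Reconstructed from the branches and comments in getResult(), the simulated latency distribution is approximately as follows.

Proportion of requests    Simulated processing time    Cumulative
90%                       1 ms                         90% of requests within 1 ms
5%                        10 ms                        95% of requests within 10 ms
4%                        100 ms                       99% of requests within 100 ms
1%                        1000 ms                      99.9% of requests within 1000 ms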

Let’s look at the client side, which is also a standard piece of code.

package com.tom.netty.thread;

import io.netty.bootstrap.Bootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;
import io.netty.handler.codec.FixedLengthFrameDecoder;

/**
 * Created by Tom.
 */
public class Client {

    private static final String SERVER_HOST = "127.0.0.1";

    public static void main(String[] args) throws Exception {
        new Client().start(8000);
    }

    public void start(int port) throws Exception {
        EventLoopGroup eventLoopGroup = new NioEventLoopGroup();
        final Bootstrap bootstrap = new Bootstrap();
        bootstrap.group(eventLoopGroup)
                .channel(NioSocketChannel.class)
                .option(ChannelOption.SO_REUSEADDR, true)
                .handler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        ch.pipeline().addLast(new FixedLengthFrameDecoder(Long.BYTES));
                        ch.pipeline().addLast(ClientHandler.INSTANCE);
                    }
                });

        // Open 1000 connections to the server
        for (int i = 0; i < 1000; i++) {
            bootstrap.connect(SERVER_HOST, port).get();
        }
    }
}

As you can see from the code above, the client opens 1 000 connections to the server. Now focus on the client-side logic in the ClientHandler class.

package com.tom.netty.thread;

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Created by Tom.
 */
@ChannelHandler.Sharable
public class ClientHandler extends SimpleChannelInboundHandler<ByteBuf> {

    public static final ChannelHandler INSTANCE = new ClientHandler();

    private static AtomicLong beginTime = new AtomicLong(0);
    // Total response time
    private static AtomicLong totalResponseTime = new AtomicLong(0);
    // Total number of requests
    private static AtomicInteger totalRequest = new AtomicInteger(0);

    public static final Thread THREAD = new Thread() {
        @Override
        public void run() {
            try {
                while (true) {
                    long duration = System.currentTimeMillis() - beginTime.get();
                    if (duration != 0) {
                        System.out.println("QPS: " + 1000 * totalRequest.get() / duration + ", "
                                + "average response time: "
                                + ((float) totalResponseTime.get()) / totalRequest.get() + "ms.");
                        Thread.sleep(2000);
                    }
                }
            } catch (InterruptedException ignored) {
            }
        }
    };

    @Override
    public void channelActive(final ChannelHandlerContext ctx) {
        ctx.executor().scheduleAtFixedRate(new Runnable() {
            public void run() {
                ByteBuf byteBuf = ctx.alloc().ioBuffer();
                // Send the current system time to the server
                byteBuf.writeLong(System.currentTimeMillis());
                ctx.channel().writeAndFlush(byteBuf);
            }
        }, 0, 1, TimeUnit.SECONDS);
    }

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, ByteBuf msg) {
        // Accumulate the response time of this request
        totalResponseTime.addAndGet(System.currentTimeMillis() - msg.readLong());
        // Increment the request counter
        totalRequest.incrementAndGet();
        if (beginTime.compareAndSet(0, System.currentTimeMillis())) {
            THREAD.start();
        }
    }
}

The code above mainly simulates the processing time of Netty in a real business environment; the QPS is about 1 000, and statistics are printed every 2s. Next, start the server and the client and check the console logs. First run the server; its console log is shown in the figure below.

Then run the client and watch its console log, as shown in the figure below. After a while, the QPS stays around 1 000 while the average response time keeps getting longer.

Going back to the getResult() method of the server-side ServerHandler: the thread sleep inside getResult() blocks the thread that runs it, so all requests end up queuing behind the blocked threads. If the following two lines of code are handed off to a thread pool, the effect is completely different.

Object result = getResult(data);
ctx.channel().writeAndFlush(result);

These two lines of code are submitted to a business thread pool and run in the background; as soon as the work finishes, the result is written back to the client.

3.2 Netty application-level performance tuning scheme

Let's modify the code by creating a new ServerThreadPoolHandler class on the server side.

package com.tom.netty.thread;

import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Created by Tom.
 */
@ChannelHandler.Sharable
public class ServerThreadPoolHandler extends ServerHandler {

    public static final ChannelHandler INSTANCE = new ServerThreadPoolHandler();

    private static ExecutorService threadPool = Executors.newFixedThreadPool(1000);

    @Override
    protected void channelRead0(final ChannelHandlerContext ctx, ByteBuf msg) {
        final ByteBuf data = Unpooled.directBuffer();
        data.writeBytes(msg);
        // Hand the blocking business call off to the thread pool
        threadPool.submit(new Runnable() {
            public void run() {
                Object result = getResult(data);
                ctx.channel().writeAndFlush(result);
            }
        });
    }
}

The server-side handler is then registered as ServerThreadPoolHandler in place of the original ServerHandler, as follows.

ch.pipeline().addLast(ServerThreadPoolHandler.INSTANCE);

Then, start the server and client programs to view the console logs, as shown in the figure below.

The response time eventually stabilizes at about 15ms, and the QPS exceeds 1 000. This is still not the optimal state, so keep adjusting: change the thread count of the ServerThreadPoolHandler pool to 20, as shown below.
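The change itself is presumably just the size argument of the fixed thread pool in ServerThreadPoolHandler, along these lines.

// Inside ServerThreadPoolHandler: shrink the business thread pool from 1000 to 20 threads
private static ExecutorService threadPool = Executors.newFixedThreadPool(20);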

Then start the program and find that the average response time is not much different, as shown in the figure below.

The conclusion is that the exact number of threads has to be adjusted and tested repeatedly in the real environment to find the most suitable value. The purpose of this chapter is to demonstrate the method of optimization, not a particular result.
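For this kind of repeated experiment, it can help to read the pool size from a system property instead of hard-coding it. This is only a sketch of the idea; the property name business.pool.size is invented here and is not part of the original code.

// Hypothetical tweak inside ServerThreadPoolHandler: take the pool size from
// -Dbusiness.pool.size=..., defaulting to 20, so different sizes can be tried without recompiling
private static final int POOL_SIZE = Integer.getInteger("business.pool.size", 20);
private static ExecutorService threadPool = Executors.newFixedThreadPool(POOL_SIZE);

The value can then be set on the command line when starting the server, for example with -Dbusiness.pool.size=50.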

This article is shared by Tom. If you found it helpful, feel free to follow and like!