In the Java world, Netty is the undisputed leader for network application development. You don’t need to pay much attention to the complex NIO model or the details of the underlying network. With its rich interfaces, you can implement complex communication functions with little effort.

Compared to Golang’s network modules, Netty still feels bloated. But that is typical of Java frameworks: Java is one of those languages you can hardly write without an IDE.

Recent versions of Netty are split into many fine-grained modules. If you don’t know what each module contains, just depend on netty-all.

Purely in terms of usage, Netty is simple: once you have mastered ByteBuf, Channel, Pipeline, the event model and so on, you can start developing, and you may feel there is not much left to say about it. But Netty is very different from other development models, most notably in its asynchrony. Asynchrony brings a different programming model, harder debugging, and higher coding requirements, because a bug here costs far more than a bug in business code.

At the project level, however, everything from the business layer to the service gateway, plus supporting pieces such as monitoring and configuration, has to be considered. Netty itself is only a small part of the whole.

This article covers the common concerns of Netty development and ends with a Linux configuration that supports 1 million connections on a single server. It does not cover the basics of Netty.

Protocol development

The most important things in network development are the communication format and the protocol. Protobuf, JSON, Avro, MQTT and the like all fall into this category. A protocol has three elements: syntax, semantics and timing.

I’ve seen plenty of middleware that speaks the Redis protocol while the back end is actually MySQL. Even more custom storage systems are built on the MySQL protocol, such as proxy-side sharding middleware and TiDB.

Redis, which we use all the time, speaks a text protocol, while MySQL and others use binary protocols. Netty treats both the same way: you implement a codec by extending one of its Decoder or Encoder base classes. Out of the box, Netty ships codecs for DNS, HAProxy, HTTP, HTTP/2, memcache, MQTT, Redis, SMTP, SOCKS, STOMP, XML and other protocols.

A possible product structure would look like this, providing a consistent external look but different core storage:

A text protocol is intuitive and easy to debug, but its security is poor. A binary protocol has to be analyzed by other means, such as logs and Wireshark, which makes development harder. The notorious problem of sticky packets and split packets is also handled at this layer. Sticky packets are mainly caused by buffering, so both sides have to agree on how messages are framed on the wire. Netty solves this problem to a certain extent.
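One common framing agreement is a length prefix: each message is preceded by its byte length, so the receiver can cut the stream correctly no matter how the bytes were batched or split. The sketch below shows the idea with the JDK only. The class name `LengthPrefixedFramer` and the 4-byte big-endian header are assumptions made for illustration; in a real Netty pipeline a ready-made frame decoder such as `LengthFieldBasedFrameDecoder` would normally play this role.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: a frame is a 4-byte big-endian length followed by the payload. */
final class LengthPrefixedFramer {
    // Bytes received so far that do not yet form a complete frame.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    /** Prepends the payload length so the receiver knows where the message ends. */
    static byte[] encode(String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + payload.length).putInt(payload.length).put(payload).array();
    }

    /** Feeds one arbitrary chunk read from the socket; returns every message completed by it. */
    List<String> feed(byte[] chunk) {
        pending.write(chunk, 0, chunk.length);
        ByteBuffer buf = ByteBuffer.wrap(pending.toByteArray());
        List<String> messages = new ArrayList<>();
        while (buf.remaining() >= 4) {
            buf.mark();
            int length = buf.getInt();
            if (length < 0) {
                throw new IllegalStateException("negative frame length: " + length);
            }
            if (buf.remaining() < length) {
                buf.reset(); // split packet: rewind to the header and wait for more bytes
                break;
            }
            byte[] payload = new byte[length];
            buf.get(payload);
            messages.add(new String(payload, StandardCharsets.UTF_8));
        }
        // Keep only the unread tail for the next call.
        pending.reset();
        pending.write(buf.array(), buf.position(), buf.remaining());
        return messages;
    }
}
```

Feeding two encoded messages in one chunk yields both messages at once (the sticky case), while feeding a chunk that ends in the middle of a frame yields nothing for that frame until the rest arrives (the split case).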

Everyone who sets out to build a network application secretly dreams of designing a new protocol. But protocol design is genuinely hard: you have to understand the business in depth and also think about extensibility. Unless there is a compelling reason, use an existing protocol.

Connection management function

In Netty development, connection management is very important. Communication quality, system status, and a number of special tricks all depend on it.

When Netty creates a connection, whether as a server or a client, you get an object called a Channel. All we have to do is manage these objects, in a class I usually call ConnectionManager.

The management class keeps some objects in memory to collect runtime statistics. Per connection, for example: number of packets sent and received; packet send and receive rates; error count; reconnection count; call latency; connection status. This relies heavily on the classes in Java’s java.util.concurrent package, and it is often a bug-prone area.

But we need more than that: the management class also attaches extra capabilities to each connection. For example, you may want to warm up some functionality after a connection is created, and that state can then take part in routing decisions. In general, attaching user or other meta information to a connection is very important, because it lets you filter connections along multiple dimensions and operate on them in batches, for example for gray releases and overload protection.
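A minimal sketch of such a manager is given here, assuming connections are identified by a string id (in a Netty application the id would typically be derived from the Channel). The names `ConnectionManager`, `ConnectionInfo` and `select` are hypothetical, and only a few of the counters listed above are included.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical registry of live connections with per-connection counters and metadata. */
final class ConnectionManager {
    /** Counters plus free-form attributes (user, client version, region, ...). */
    static final class ConnectionInfo {
        final String id;
        final LongAdder packetsIn = new LongAdder();
        final LongAdder packetsOut = new LongAdder();
        final LongAdder errors = new LongAdder();
        final Map<String, String> attributes = new ConcurrentHashMap<>();

        ConnectionInfo(String id) {
            this.id = id;
        }
    }

    private final Map<String, ConnectionInfo> connections = new ConcurrentHashMap<>();

    /** Returns the existing entry for this id, or creates one atomically. */
    ConnectionInfo register(String id) {
        return connections.computeIfAbsent(id, ConnectionInfo::new);
    }

    void unregister(String id) {
        connections.remove(id);
    }

    /** Selects connections by an arbitrary condition, e.g. for a gray release or load shedding. */
    List<ConnectionInfo> select(Predicate<ConnectionInfo> condition) {
        return connections.values().stream().filter(condition).collect(Collectors.toList());
    }

    int size() {
        return connections.size();
    }
}
```

With this shape, a batch operation is just a filter followed by a loop, for example `manager.select(c -> "2.0".equals(c.attributes.get("version")))` to pick the connections that take part in a gray release.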

The admin console can show information about each connection, filter one or more connections, and enable traffic recording, monitoring, and breakpoint debugging for them. It gives you the feeling of being in control.

Connection management also lets you see the running status of the whole system and adjust load-balancing policies in time. It provides the data needed for scaling out and scaling in as well.

Heartbeat detection

Application layer heartbeat is required, which is a completely different concept from TCP Keepalive.

An application-layer heartbeat checks whether both ends of the connection are alive and how good the connection is, while keepalive only checks whether the connection itself is alive. Moreover, the default keepalive timeout is far too long for modern network environments.

A heartbeat depends on polling, whether on the server or on a client such as GCM. The keep-alive mechanism switches dynamically between application scenarios, for example between app wake-up and background polling strategies.

Netty makes heartbeat control convenient: adding an IdleStateHandler to the pipeline generates idle events. You still have to handle the heartbeat-timeout logic yourself, such as delayed reconnection. However, its polling interval is fixed and cannot be changed dynamically, so advanced behavior has to be custom built.
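For the custom case, the timeout decision can be separated from the timer that drives it. The following is a rough sketch of a per-connection tracker whose interval can be changed at runtime. `HeartbeatTracker` and all of its methods are hypothetical names, and the caller is assumed to supply the current time in milliseconds (which also keeps the logic easy to test).

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-connection heartbeat state; all times are milliseconds supplied by the caller. */
final class HeartbeatTracker {
    private final AtomicLong intervalMillis;
    private final AtomicLong lastSeenMillis;
    private final int missedBeatsAllowed;

    HeartbeatTracker(long intervalMillis, int missedBeatsAllowed, long nowMillis) {
        this.intervalMillis = new AtomicLong(intervalMillis);
        this.missedBeatsAllowed = missedBeatsAllowed;
        this.lastSeenMillis = new AtomicLong(nowMillis);
    }

    /** Call whenever a heartbeat, or any other packet, arrives from the peer. */
    void onBeat(long nowMillis) {
        lastSeenMillis.set(nowMillis);
    }

    /** The interval may change at runtime, e.g. a longer one while a mobile app is in the background. */
    void setIntervalMillis(long newIntervalMillis) {
        intervalMillis.set(newIntervalMillis);
    }

    /** True once the peer has been silent for longer than the allowed number of intervals. */
    boolean isTimedOut(long nowMillis) {
        return nowMillis - lastSeenMillis.get() > intervalMillis.get() * missedBeatsAllowed;
    }
}
```

A scheduled task can call `isTimedOut(System.currentTimeMillis())` periodically and trigger the delayed reconnection when it returns true.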

On some clients, such as Android, frequent heartbeats waste a lot of network traffic and battery, so the heartbeat strategy there is more complex.

Boundary handling

Graceful exit mechanism

Graceful shutdown in Java is usually achieved by registering a JDK shutdown hook.

Runtime.getRuntime().addShutdownHook(new Thread(() -> { /* release resources here */ }));

Use kill -15 (SIGTERM) to stop Java processes, so that the hook can do some cleanup before the process dies.

Note: kill -9 (SIGKILL) kills the process immediately, without giving it a chance to say its last words, which is dangerous.

Netty does a lot of graceful-exit work itself, updating the NIO state through the EventLoopGroup’s shutdownGracefully method, but in many cases this is not enough. It only takes care of shutting down a single instance gracefully.

Traffic may still arrive through the outer routing layer, resulting in failed requests. My usual practice is to first remove the local instance from the outer routing layer to cut off the traffic, and then shut Netty itself down gracefully. This design is simple and works well even without a retry mechanism, provided the routing layer exposes the necessary interfaces in advance.
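The ordering described above can be made explicit in code. Here is a small sketch of an ordered shutdown sequence installed as a JVM shutdown hook. `ShutdownSequence` and the step names are hypothetical; in a real service the first step would call whatever deregistration interface the routing layer exposes, and a later step would call `shutdownGracefully()` on the event loop groups.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical ordered shutdown: steps run in registration order, and a failing step does not block later ones. */
final class ShutdownSequence {
    private final List<String> names = new ArrayList<>();
    private final List<Runnable> steps = new ArrayList<>();

    /** Adds a named step; steps are executed in the order they were added. */
    ShutdownSequence step(String name, Runnable action) {
        names.add(name);
        steps.add(action);
        return this;
    }

    /** Runs every step in order and returns the names of the steps that finished normally. */
    List<String> run() {
        List<String> completed = new ArrayList<>();
        for (int i = 0; i < steps.size(); i++) {
            try {
                steps.get(i).run();
                completed.add(names.get(i));
            } catch (RuntimeException e) {
                System.err.println("shutdown step failed: " + names.get(i) + ": " + e);
            }
        }
        return completed;
    }

    /** Registers the sequence as a JVM shutdown hook; it runs on kill -15 but not on kill -9. */
    void installHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::run, "graceful-shutdown"));
    }
}
```

A typical setup would register a "deregister from router" step first, then a short wait for in-flight requests, then the Netty shutdown, and finally call `installHook()` once at startup.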

Exception handling function

Exception handling is extremely important in Netty because of its asynchronous style and its event mechanism. To keep connections reliable, many exceptions have to be swallowed quietly so that user code never notices them.

Netty propagates exceptions through the pipeline, so they can be handled at any layer, but by convention they are usually left to the outermost handler.

To tell failures apart as precisely as possible, a large number of exception classes are usually defined, with different errors throwing different exceptions. When an exception occurs, you can then choose by type whether to disconnect and reconnect (for example when the codec of a binary protocol gets out of sync) or to reschedule onto another node.
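The type-based decision can be sketched as a small policy function. Everything here is illustrative: the exception classes `CodecOutOfSyncException` and `NodeUnavailableException`, the `Action` enum and the mapping itself are assumptions, not part of Netty. In a Netty application such a function would typically be called from a handler's `exceptionCaught` callback.

```java
/** Hypothetical recovery policy: maps an exception type to what the connection layer should do. */
final class ErrorPolicy {
    enum Action { IGNORE, RECONNECT, REROUTE }

    /** Illustrative: the byte stream can no longer be decoded, so the connection must be rebuilt. */
    static class CodecOutOfSyncException extends RuntimeException {
        CodecOutOfSyncException(String message) {
            super(message);
        }
    }

    /** Illustrative: the remote node cannot serve requests, so traffic should go elsewhere. */
    static class NodeUnavailableException extends RuntimeException {
        NodeUnavailableException(String message) {
            super(message);
        }
    }

    private ErrorPolicy() {
    }

    /** Chooses a recovery action by exception type; anything unrecognized is swallowed. */
    static Action decide(Throwable cause) {
        if (cause instanceof CodecOutOfSyncException) {
            return Action.RECONNECT;
        }
        if (cause instanceof NodeUnavailableException) {
            return Action.REROUTE;
        }
        return Action.IGNORE;
    }
}
```

Keeping the mapping in one place makes it easy to review which errors are silently ignored and which ones cost a reconnection.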

Functional limits

Command mode

A network application should stick to networking. Any communication is expensive. In part 5 (networking) of the “Linux Cast Away” series, we noted that on a server holding a million connections, broadcasting a single 1 KB message generates about 1000 MB of traffic, so not everything belongs in the network application.

The sensible approach for a large network application is to send only the relevant commands. After receiving a command, the client fetches large files by other means, such as HTTP. Many IM systems are designed this way.

Command mode also helps the scalability and stability of the communication system. New capabilities can be added through configuration and take effect immediately, without changing code or restarting the server.
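As a sketch of command mode, the dispatcher below keeps a registry of handlers keyed by command name, so that new commands can be registered while the server is running. The class `CommandDispatcher`, the `NAME argument` line format and the example command are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical command registry: small instructions travel over the connection, bulk data goes elsewhere. */
final class CommandDispatcher {
    private final Map<String, Function<String, String>> handlers = new ConcurrentHashMap<>();

    /** Handlers can be added or replaced at runtime, without restarting the server. */
    void register(String command, Function<String, String> handler) {
        handlers.put(command, handler);
    }

    /** Parses "NAME argument" and runs the matching handler; unknown commands yield an empty result. */
    Optional<String> dispatch(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        Function<String, String> handler = handlers.get(parts[0]);
        if (handler == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(handler.apply(parts.length > 1 ? parts[1] : ""));
    }
}
```

For example, a client could register a `FETCH` handler that downloads the given URL over HTTP, so that the long-lived connection only ever carries the short command line.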

Stability guarantee

Network applications generally carry heavy traffic and are not suited to full logging. Log only the major events, pay particular attention to the exception-handling paths, and keep the volume of log output moderate.

Network applications should also avoid calling slow APIs or any interface that blocks on I/O. Some real-time events should not be pushed out through direct interface calls either; they can go through other asynchronous channels, such as a high-throughput MQ.
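When a slow call cannot be avoided, one option is to hand it to a separate thread pool so that the I/O threads are never blocked. A minimal sketch with the JDK's `CompletableFuture` is given here; the class name `SlowCallOffloader` and the pool size of 4 are arbitrary assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical helper that runs blocking work on its own pool instead of on the I/O threads. */
final class SlowCallOffloader implements AutoCloseable {
    // Pool size is an arbitrary choice for the sketch; size it for the real workload.
    private final ExecutorService blockingPool = Executors.newFixedThreadPool(4);

    /** Schedules the slow call on the dedicated pool and returns immediately with a future. */
    <T> CompletableFuture<T> submit(Supplier<T> slowCall) {
        return CompletableFuture.supplyAsync(slowCall, blockingPool);
    }

    /** Stops accepting new work; tasks already submitted are allowed to finish. */
    @Override
    public void close() {
        blockingPool.shutdown();
    }
}
```

The caller attaches a callback to the returned future (for example with `thenAccept`) and writes the response from there, rather than waiting for the result on the thread that received the request.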

Caching is probably the most used component in network applications. An in-JVM cache can hold per-machine statistics, while Redis and similar stores hold global statistics and intermediate data.

Redis, KV stores and high-throughput MQ are widely used in network applications to respond to user requests quickly. In short, keep your communication layer as clean as possible and you’ll save yourself a lot of worry.

Linux configuration for 1 million connections on a single machine

Supporting 1 million connections on a single machine is feasible, but bandwidth can become a significant bottleneck. Binary protocols with compression enabled save some bandwidth, but they are harder to develop.

The optimizations follow much the same line of thought as the ES configuration mentioned in “LWP Process Resources Run out, Resource Temporarily Unavailable”. This configuration can save you a few days of work, so take it!

Operating system optimization

Change the maximum number of file handles for a process

ulimit -n 1048576

Change the maximum number of file handles that a single process can be allowed to allocate

echo 2097152 > /proc/sys/fs/nr_open

Modify the /etc/security/limits.conf file

*   soft nofile  1048576
*   hard nofile 1048576
*   soft nproc unlimited
root soft nproc unlimited

Remember to clean up /etc/security/limits.d/*

Network optimization

Open /etc/sysctl.conf, add the following configuration, and apply it with sysctl -p

fs.file-max = 1048576
# backlog settings
net.core.somaxconn=32768
net.ipv4.tcp_max_syn_backlog=16384
net.core.netdev_max_backlog=16384
# available local port range
net.ipv4.ip_local_port_range='1000 65535'
# TCP socket read/write buffer settings
net.core.rmem_default=262144
net.core.wmem_default=262144
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.optmem_max=16777216
net.ipv4.tcp_rmem='1024 4096 16777216'
net.ipv4.tcp_wmem='1024 4096 16777216'
# TCP connection tracking settings
net.nf_conntrack_max=1000000
net.netfilter.nf_conntrack_max=1000000
net.netfilter.nf_conntrack_tcp_timeout_time_wait
net.ipv4.tcp_max_tw_buckets=1048576
# FIN-WAIT-2 socket timeout
net.ipv4.tcp_fin_timeout = 15

Conclusion

The focus of Netty development is not Netty itself, but keeping the service highly reliable and stable. A lot of the effort also goes into monitoring and debugging, in order to reduce the cost of fixing bugs.

An in-depth understanding of Netty pays off when you have to dig deep to troubleshoot difficult system problems or to squeeze out demanding performance gains. But for the vast majority of application developers, Netty’s entry cost is low, and digging into the internals doesn’t bring much return.

It’s just a tool. What else can you do with it?