The project code is available at: github.com/HashZhang/s…

In our project we use Undertow as the container instead of the default Tomcat. The raw performance difference is not dramatic, but Undertow lets us use direct memory as the network transmission buffer, which reduces GC pressure on the business and improves overall performance.

Undertow’s official website: undertow.io

But Undertow has some concerns:

  1. Its NIO framework is XNIO. The Undertow 3.0 Announcement stated that Undertow would migrate from XNIO to Netty starting with version 3.0. However, almost two years on, 3.0 still has not been released, and the GitHub 3.0 branch has not been updated in over a year. In practice the 2.x versions are still what everyone uses, and it is unclear whether 3.0 is simply unnecessary for now or effectively stillborn. Netty is far more widely used and familiar than XNIO (especially in China), although XNIO’s design is broadly similar to Netty’s.
  2. The official documentation lags behind releases, sometimes by one or two minor versions, which makes the configuration less than elegant when Spring Boot glues Undertow in. While consulting the documentation, it is best to also read the source code, or at least the configuration classes, to understand what is really going on.
  3. If you read Undertow’s source closely, you will find a lot of defensive-programming scaffolding, feature designs the authors had in mind but never implemented, and plenty of half-finished code. This raises the worry that Undertow lacks development resources and might one day quietly die.

To use Undertow well, we need to:

  1. Enable NIO DirectBuffer, and understand and configure the related parameters.
  2. Make access.log include the necessary timing, call-chain, and other information; by default, some access.log parameters do not show the information we want to see.

Using Undertow as our web service container

For Servlet containers, the dependencies are as follows:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-tomcat</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-undertow</artifactId>
</dependency>

For WebFlux containers, the dependencies are as follows:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-undertow</artifactId>
</dependency>

Undertow basic structure

Undertow (the current 2.x line) is based on Java XNIO, an extension of the JDK NIO classes. XNIO covers much the same ground as Netty, but where Netty is more of a complete wrapper around Java NIO (its basic transmission unit is its own ByteBuf rather than Java NIO’s ByteBuffer), XNIO is more of an extension: its interfaces still use ByteBuffer as the transmission and processing unit. The overall designs are nevertheless very similar; both follow the Reactor model.

Java XNIO consists of the following concepts:

  • Java NIO ByteBuffer: a Buffer is a stateful array that holds data and keeps track of what has been written or read. Its main attributes are capacity (the capacity of the buffer), position (the index of the next position to read or write), and limit (the bound up to which the buffer can currently be read or written). Programs must go through a Buffer to read data from or write data to a Channel. ByteBuffer is a more specialized Buffer that can be allocated in direct memory, allowing the JVM to use it directly for IO and saving a copy step (see my article "Java off-heap memory, zero copy, direct memory, and thoughts on FileChannel in NIO"). It can also map file memory directly, i.e. Java MMAP (see my article "JDK core JAVA source code parsing (5) - JAVA File MMAP principle parsing"). Therefore, most IO operations are performed through ByteBuffer.
  • Java NIO Channel: a Channel is Java’s abstraction of an open connection to some external entity (a hardware device, a file, a network socket, or any component that can perform IO operations). A Channel is the source of IO events; all data written or read must pass through a Channel. For NIO channels, readiness events (such as read-ready and write-ready) are notified through a Selector, and the actual reads and writes then go through a Buffer.
  • XNIO Worker: the Worker is the basic network processing unit in the Java XNIO framework. A Worker contains two different thread pools:
    • The IO thread pool, which mainly calls Selector.select() and runs the callbacks for the corresponding ready events. In principle it must never run blocking tasks, since that would stall every other connection on the same thread. In XNIO this pool’s size is set via WORKER_IO_THREADS, defaulting to one IO thread per CPU, and it consists of two kinds of threads:
      • Read threads: run the callbacks for read events
      • Write threads: run the callbacks for write events
    • The Worker thread pool, which runs blocking tasks (such as servlet requests); its size is set via WORKER_TASK_CORE_THREADS.
  • XNIO ChannelListener: the ChannelListener is the abstraction for listening to Channel events, including: channel readable, channel writable, channel opened, channel closed, channel bound, channel unbound.
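The Buffer, Channel, and Selector mechanics described above can be seen end to end in a few lines of plain JDK NIO. This is a minimal sketch using a `Pipe` for illustration; it is not Undertow- or XNIO-specific code:

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class NioReadinessDemo {
    public static void main(String[] args) throws Exception {
        // A Buffer is stateful: capacity is fixed, position advances on writes.
        ByteBuffer out = ByteBuffer.allocateDirect(16); // direct memory: the JVM can hand it to the OS without an extra copy
        out.put((byte) 42);
        System.out.println(out.position());             // 1
        out.flip();                                     // limit = old position, position = 0: switch to reading

        // A Pipe gives us a pair of channels to demonstrate readiness events.
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);         // selectable channels must be non-blocking

        Selector selector = Selector.open();
        SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);

        pipe.sink().write(out);                         // all written data passes through a Channel

        System.out.println(selector.select());          // 1: one channel became ready
        System.out.println(key.isReadable());           // true: read readiness notified via the Selector

        ByteBuffer in = ByteBuffer.allocate(16);
        pipe.source().read(in);                         // reads also go through a Buffer
        System.out.println(in.position());              // 1
    }
}
```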

Undertow is an XNIO-based web service container. On top of XNIO it adds:

  • Undertow BufferPool: requesting a new ByteBuffer every time one is needed is inefficient, since heap memory goes through the JVM allocation path (TLAB -> heap) and direct memory requires a system call. A memory pool is therefore introduced: the BufferPool. Undertow currently has only one implementation, DefaultByteBufferPool; no others are available yet. Compared with Netty’s ByteBufArena, DefaultByteBufferPool is very simple, similar in spirit to the JVM TLAB mechanism (see my other series: "The most hardcore JVM TLAB analysis on the entire web") but much simpler. We only need to configure the buffer size and enable the use of direct memory.
  • Undertow Listener: by default there are three built-in Listener types, for HTTP/1.1, AJP, and HTTP/2 (HTTPS is enabled by putting SSL on the corresponding HTTP Listener). The Listener parses each request, packages it as an HttpServerExchange, and hands it to the next Handler.
  • Undertow Handler: Handlers process the business logic, and together they form a complete web server.
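Conceptually, a Listener parses bytes into an exchange object and passes it down a chain of Handlers. The sketch below is NOT Undertow’s API; `Exchange` and `Handler` here are hypothetical stand-ins for Undertow’s HttpServerExchange and HttpHandler, modelling the same composition idea:

```java
import java.util.HashMap;
import java.util.Map;

public class HandlerChainSketch {
    // Hypothetical stand-in for Undertow's HttpServerExchange.
    static class Exchange {
        final Map<String, String> headers = new HashMap<>();
        String responseBody;
    }

    // Hypothetical stand-in for Undertow's HttpHandler.
    interface Handler {
        void handle(Exchange exchange);
    }

    public static void main(String[] args) {
        // The business handler at the end of the chain.
        Handler business = exchange -> exchange.responseBody = "hello";

        // A wrapping handler that adds a header before delegating,
        // the same way Undertow composes handlers into a pipeline.
        Handler withHeader = exchange -> {
            exchange.headers.put("X-Server", "sketch");
            business.handle(exchange);
        };

        Exchange exchange = new Exchange();   // what a Listener would build from the parsed request
        withHeader.handle(exchange);
        System.out.println(exchange.responseBody);            // hello
        System.out.println(exchange.headers.get("X-Server")); // sketch
    }
}
```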

Some default configurations for Undertow

The Undertow Builder sets some default parameters.


private Builder() {
    ioThreads = Math.max(Runtime.getRuntime().availableProcessors(), 2);
    workerThreads = ioThreads * 8;
    long maxMemory = Runtime.getRuntime().maxMemory();
    //smaller than 64mb of ram we use 512b buffers
    if (maxMemory < 64 * 1024 * 1024) {
        //use 512b buffers
        directBuffers = false;
        bufferSize = 512;
    } else if (maxMemory < 128 * 1024 * 1024) {
        //use 1k buffers
        directBuffers = true;
        bufferSize = 1024;
    } else {
        //use 16k buffers for best performance
        //as 16k is generally the max amount of data that can be sent in a single write() call
        directBuffers = true;
        bufferSize = 1024 * 16 - 20; //the 20 is to allow some space for protocol headers, see UNDERTOW-1209
    }

}
  • ioThreads is Math.max(availableProcessors, 2), i.e. by default one IO thread per available CPU (with a minimum of 2); these serve as XNIO’s read and write threads.
  • The size of workerThreads is 8 x number of ioThreads.
  • If the memory size is less than 64 MB, direct memory is not used and bufferSize is 512 bytes
  • If the memory size is larger than 64 MB and smaller than 128 MB, direct memory is used and bufferSize is 1024 bytes
  • If the memory size is greater than 128 MB, direct memory is used, and bufferSize is 16 KB minus 20 bytes, which are used for protocol headers.
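The defaults above can be sanity-checked by replaying the Builder’s branch logic. `pickBufferSize` is our own helper name, not Undertow’s; the thresholds are taken directly from the code shown above:

```java
public class BufferDefaults {
    // Mirrors the branch logic of Undertow's Builder shown above.
    static int pickBufferSize(long maxMemory) {
        if (maxMemory < 64 * 1024 * 1024) {
            return 512;                 // < 64 MB: small heap buffers
        } else if (maxMemory < 128 * 1024 * 1024) {
            return 1024;                // < 128 MB: 1 KB direct buffers
        }
        return 1024 * 16 - 20;          // otherwise 16 KB minus 20 bytes for protocol headers
    }

    public static void main(String[] args) {
        System.out.println(pickBufferSize(32L * 1024 * 1024));   // 512
        System.out.println(pickBufferSize(100L * 1024 * 1024));  // 1024
        System.out.println(pickBufferSize(256L * 1024 * 1024));  // 16364
    }
}
```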

Undertow Buffer Pool configuration

DefaultByteBufferPool constructor:

public DefaultByteBufferPool(boolean direct, int bufferSize, int maximumPoolSize,
                             int threadLocalCacheSize, int leakDecetionPercent) {
    this.direct = direct;
    this.bufferSize = bufferSize;
    this.maximumPoolSize = maximumPoolSize;
    this.threadLocalCacheSize = threadLocalCacheSize;
    this.leakDectionPercent = leakDecetionPercent;
    if (direct) {
        arrayBackedPool = new DefaultByteBufferPool(false, bufferSize, maximumPoolSize, 0, leakDecetionPercent);
    } else {
        arrayBackedPool = this;
    }
}

Among them:

  • direct: whether to use direct memory; we need to set it to true to use direct memory.
  • bufferSize: the size of each buffer requested from the pool
  • maximumPoolSize: the maximum size of the buffer pool
  • threadLocalCacheSize: the size of the thread-local buffer cache
  • leakDecetionPercent: the percentage of buffers sampled for memory-leak detection; not of much use in practice
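To make the TLAB-like idea behind DefaultByteBufferPool concrete, here is a deliberately simplified model: a shared queue of buffers plus a small thread-local cache, so hot paths avoid both fresh allocation and cross-thread contention. This is a conceptual sketch, not Undertow’s actual implementation:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.concurrent.ConcurrentLinkedQueue;

public class PoolSketch {
    // Simplified model of a buffer pool with a thread-local fast path.
    static class SimpleBufferPool {
        private final boolean direct;
        private final int bufferSize;
        private final ConcurrentLinkedQueue<ByteBuffer> shared = new ConcurrentLinkedQueue<>();
        private final ThreadLocal<ArrayDeque<ByteBuffer>> local =
                ThreadLocal.withInitial(ArrayDeque::new);

        SimpleBufferPool(boolean direct, int bufferSize) {
            this.direct = direct;
            this.bufferSize = bufferSize;
        }

        ByteBuffer allocate() {
            ByteBuffer b = local.get().poll();          // 1. thread-local cache (no contention)
            if (b == null) b = shared.poll();           // 2. shared pool
            if (b == null) b = direct                   // 3. fresh allocation as a last resort
                    ? ByteBuffer.allocateDirect(bufferSize)
                    : ByteBuffer.allocate(bufferSize);
            return b;
        }

        void free(ByteBuffer b) {
            b.clear();
            local.get().offer(b);                       // return to the thread-local cache
        }
    }

    public static void main(String[] args) {
        SimpleBufferPool pool = new SimpleBufferPool(false, 1024);
        ByteBuffer a = pool.allocate();
        pool.free(a);
        ByteBuffer b = pool.allocate();                 // reuses the freed buffer
        System.out.println(a == b);                     // true
        System.out.println(b.capacity());               // 1024
    }
}
```

The real DefaultByteBufferPool additionally caps the pool size (maximumPoolSize) and samples buffers for leak detection, as the constructor parameters above show.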

For bufferSize, the best choice is the same value as your system’s TCP socket buffer. In our containers, we set the TCP socket read/write buffers of every microservice instance to the same value (since a request sent by one microservice is received by another, keeping the read/write buffers of all microservice containers consistent helps performance; by default the size is computed automatically from system memory).

To check the TCP socket buffer sizes on Linux:

  • /proc/sys/net/ipv4/tcp_rmem(For reading)
  • /proc/sys/net/ipv4/tcp_wmem(For writes)

In our container, they are:

bash-4.2# cat /proc/sys/net/ipv4/tcp_rmem
4096    16384    4194304
bash-4.2# cat /proc/sys/net/ipv4/tcp_wmem
4096    16384    4194304

The values from left to right are the minimum, default, and maximum sizes of read and write buffers of each TCP Socket, in bytes.

We set Undertow’s buffer size to the default TCP socket buffer size, 16 KB. In the Undertow Builder, when memory is greater than 128 MB the buffer size is already 16 KB minus 20 bytes (reserved for protocol headers), so we can simply use the default.

application.yml configuration:

server:
  undertow:
    # If a ByteBuffer were requested every time one is needed, heap memory would go through the
    # JVM allocation path (TLAB -> heap) and direct memory would need a system call, which is
    # inefficient, so a memory pool, the BufferPool, is introduced. UnderTow currently has only
    # one implementation, 'DefaultByteBufferPool'; no others are available yet.
    # Compared to Netty's ByteBufArena, DefaultByteBufferPool is very simple, similar to the JVM TLAB mechanism.
    # It is best to use the same value as your system's TCP socket buffers:
    # '/proc/sys/net/ipv4/tcp_rmem' (for reads), '/proc/sys/net/ipv4/tcp_wmem' (for writes).
    # When memory is greater than 128 MB, bufferSize defaults to 16 KB minus 20 bytes (for protocol headers),
    # i.e. 16364
    buffer-size: 16364

Undertow Worker configuration

The Worker configuration is actually the core XNIO configuration: mainly the IO thread pool and worker thread pool sizes.

By default, the number of IO threads is the number of available CPUs (with a minimum of 2), serving as XNIO’s read and write threads, and the worker thread pool size is the IO thread count x 8.

Since microservice applications involve a large number of blocking operations, the worker thread pool can be made much larger; our application sets it to the IO thread count x 32.
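One common way to reason about this sizing is the classic blocking-coefficient rule of thumb, threads ≈ cores × (1 + wait/compute). This is our own sizing heuristic, not a formula from Undertow; the numbers below are illustrative:

```java
public class WorkerSizing {
    // Rule-of-thumb sizing: threads = cores * (1 + waitTime / computeTime).
    // IO-heavy microservices have a high wait/compute ratio, which is why we
    // raise the worker pool well above Undertow's default of ioThreads * 8.
    static int workerThreads(int cores, double waitTime, double computeTime) {
        return (int) (cores * (1 + waitTime / computeTime));
    }

    public static void main(String[] args) {
        int cores = 4;
        System.out.println(cores * 8);                    // Undertow default: 32
        System.out.println(workerThreads(cores, 31, 1));  // mostly-blocking workload: 128
    }
}
```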

application.yml configuration:

server:
  undertow:
    threads:
      # Number of IO threads. They run the non-blocking main tasks and each is responsible for
      # multiple connections; the default is one read and one write IO thread per CPU core.
      # Number of blocking-task (worker) threads; size it according to how much the system's
      # tasks block. The default is the number of IO threads * 8.
      worker: 128

Undertow configuration in Spring Boot

Spring Boot’s abstraction for the Undertow-related configuration is the ServerProperties class. All the Undertow configuration involved is described below (excluding the access-log-related options, which the next section analyzes in detail):

server:
  undertow:
    # If a ByteBuffer were requested every time one is needed, heap memory would go through the
    # JVM allocation path (TLAB -> heap) and direct memory would need a system call, which is
    # inefficient, so a memory pool, the BufferPool, is introduced. UnderTow currently has only
    # one implementation, 'DefaultByteBufferPool'; no others are available yet.
    # Compared to Netty's ByteBufArena, DefaultByteBufferPool is very simple, similar to the JVM TLAB mechanism.
    # It is best to use the same value as your system's TCP socket buffers:
    # '/proc/sys/net/ipv4/tcp_rmem' (for reads), '/proc/sys/net/ipv4/tcp_wmem' (for writes).
    # When memory is greater than 128 MB, bufferSize is 16 KB minus 20 bytes (for protocol headers)
    buffer-size: 16364
    # Whether to allocate direct memory (off-heap memory directly allocated by NIO). We enable this,
    # so the Java startup parameters need to configure the direct memory size to reduce unnecessary GC.
    # directBuffers is used by default when memory is greater than 128 MB
    direct-buffers: true
    threads:
      # Number of IO threads. They run the non-blocking tasks and each is responsible for multiple
      # connections; the default is one read and one write IO thread per CPU core.
      # Number of blocking-task threads; Undertow takes threads from this pool when executing
      # servlet-style blocking IO. The default is the number of IO threads * 8
      worker: 128
    # Limit on the HTTP POST body size; -1B means unlimited
    max-http-post-size: -1B
    # Limit on the number of request parameters; default 1000
    max-parameters: 1000
    # Limit on the number of HTTP headers; default 200
    max-headers: 200
    # Limit on the number of cookie key-value pairs in HTTP headers; default 200
    max-cookies: 200
    # Whether to allow '/' escaped as %2F. '/' is a URL-reserved word; do not enable this escape
    # unless your application explicitly requires it. Default false
    allow-encoded-slash: false
    # Whether to decode the URL; default true, using the url-charset below
    decode-url: true
    # Charset used for URL decoding
    url-charset: utf-8
    # Whether responses carry the 'Connection: keep-alive' HTTP header; default true
    always-set-keep-alive: true
    # Request timeout; the default is no timeout. Our microservices may run long scheduled tasks,
    # so we do not time out on the server side; the client times out instead, and we keep the default
    no-request-timeout: -1
    # Whether to preserve the original request path on a forward; default false
    preserve-path-on-forward: false
    options:
      # XNIO-related configuration that Spring Boot does not abstract goes here;
      # the configuration class is org.xnio.Options
      socket:
        SSL_ENABLED: false
      # Undertow-related configuration that Spring Boot does not abstract goes here;
      # the configuration class is io.undertow.UndertowOptions
      server:
        ALLOW_UNKNOWN_PROTOCOLS: false

Spring Boot does not abstract every Undertow and XNIO configuration option. To customize the rest, use server.undertow.options as shown at the end of the configuration above: server.undertow.options.socket maps to the XNIO options (configuration class org.xnio.Options), and server.undertow.options.server maps to the Undertow options (configuration class io.undertow.UndertowOptions).