Netty's zero copy

Zero copy in Netty is not quite what we traditionally think of as zero copy.

In the traditional sense, zero copy means that the CPU does not have to copy data between user space and kernel space while the data is being transmitted.

Zero copy in the traditional sense

Zero-Copy describes computer operations in which the CPU does not perform the task of copying data from one memory area to another.

When sending data, the traditional implementation is:

  • File.read(bytes)
  • Socket.send(bytes)
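
The two calls above can be sketched in Java roughly as follows. This is a minimal illustration, not code from Netty: the class name TraditionalCopy and the method copy are hypothetical, and plain streams stand in for the file and the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper illustrating the read-then-send pattern.
final class TraditionalCopy {

    /** Copies everything from in to out through a user-space array; returns the bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] bytes = new byte[8192];          // buffer in user space (JVM heap)
        long total = 0;
        int n;
        while ((n = in.read(bytes)) != -1) {    // for a file: kernel buffer -> user buffer
            out.write(bytes, 0, n);             // for a socket: user buffer -> kernel socket buffer
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for a FileInputStream and a socket's OutputStream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream("hello".getBytes()), out);
        System.out.println("copied " + copied + " bytes");
    }
}
```

With a real file and socket, every pass through the loop moves the data into the byte array and back out again.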

This method requires four data copies and four context switches:

  • Data is read from disk into the kernel’s read buffer
  • Data is copied from the kernel buffer to the user buffer
  • Data is copied from the user buffer to the kernel’s socket buffer
  • Data is copied from the kernel socket buffer to the NIC buffer

Obviously the second and third steps above are not necessary, and Java’s FileChannel.transferTo method can avoid these two extra copies (provided the underlying operating system supports it).

  1. transferTo is called, and the DMA engine copies the data from the file into the kernel read buffer
  2. DMA then copies the data from the kernel read buffer to the NIC buffer

Neither of these steps requires the CPU to copy the data, so this is zero copy.
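
A minimal sketch of calling FileChannel.transferTo is shown here. The class name TransferToExample and the method sendFile are hypothetical, and the sketch assumes a blocking target channel. Whether the transfer is truly zero copy depends on the operating system and on the type of the target channel.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: sends a whole file to a channel without a user-space buffer.
final class TransferToExample {

    /** Transfers the whole file to target; returns the number of bytes transferred. */
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = source.size();
            // transferTo may move fewer bytes than requested, so loop until done.
            // Assumes a blocking target; a non-blocking one could return 0 repeatedly.
            while (position < size) {
                position += source.transferTo(position, size - position, target);
            }
            return position;
        }
    }
}
```

In a server, target would typically be a SocketChannel. Unlike the read-then-send loop, no byte array appears anywhere in this code.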

Zero copy in Netty

Netty also uses the FileChannel.transferTo method, so Netty's zero copy includes the operating-system-level zero copy described above. In addition, Netty provides some zero-copy implementations based on ByteBuf.

As its byte buffer abstraction, Netty provides two interfaces:

  • ByteBuf
  • ByteBufHolder

Netty provides multiple implementations of ByteBuf:

  • Heap ByteBuf: allocated in JVM heap memory
  • Direct ByteBuf: allocated in direct (off-heap) memory rather than heap memory
  • CompositeByteBuf: a composite buffer that combines several buffers

Direct Buffers

Direct buffers allocate their space in direct (off-heap) memory rather than in heap memory.

  • With traditional heap allocation, data sent over a socket is first copied from the heap to direct memory, and then from direct memory to the NIC.

  • A direct buffer provided by Netty holds the data in direct memory from the start, which avoids the heap-to-direct copy. This is what zero copy means here.
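
The difference between heap and direct allocation can be seen with the JDK's own ByteBuffer, which is used here instead of Netty's ByteBuf so that the sketch runs without a Netty dependency. The class name HeapVsDirect and its helper methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class comparing the two allocation styles.
final class HeapVsDirect {

    /** Buffer backed by a byte[] on the JVM heap. */
    static ByteBuffer heapBuffer(int capacity) {
        return ByteBuffer.allocate(capacity);
    }

    /** Buffer backed by memory outside the JVM heap. */
    static ByteBuffer directBuffer(int capacity) {
        return ByteBuffer.allocateDirect(capacity);
    }

    public static void main(String[] args) {
        ByteBuffer heap = heapBuffer(64);
        ByteBuffer direct = directBuffer(64);

        // Fill the direct buffer and flip it so it is ready to be written to a channel.
        direct.put("hello".getBytes(StandardCharsets.UTF_8));
        direct.flip();

        System.out.println("heap:   isDirect=" + heap.isDirect() + ", hasArray=" + heap.hasArray());
        System.out.println("direct: isDirect=" + direct.isDirect() + ", remaining=" + direct.remaining());
    }
}
```

A channel write can hand the direct buffer's memory to the operating system as is, whereas the heap buffer's contents first have to be copied into direct memory.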

Off-heap memory

If I/O operations are performed on data in the JVM heap, the data must first be copied to off-heap memory before the system call can be made. So why can’t the operating system just use JVM heap memory for I/O reads and writes?

There are two main reasons:

  1. The operating system is not aware of the JVM’s heap memory. The memory layout of the JVM differs from the layout the operating system allocates, and the operating system does not read or write data according to the JVM's behavior.

  2. The memory address of an object can change over time as the JVM GC runs. For example, the GC compacts memory to reduce fragmentation, which involves moving objects.

Netty uses off-heap memory for its I/O operations, which avoids copying data from JVM heap memory to off-heap memory.

  • The JDK does not perform NIO operations directly on heap memory. Because the heap is managed by the GC, the GC may compact memory during an I/O write and move the data, so the address handed to the I/O operation would no longer be valid.

  • When JNI (Java Native Interface) calls into a C library to perform I/O, the address must stay valid for the whole write, so the I/O cannot operate directly on the heap. Disabling GC for the duration of the I/O operation is also an option, but if the I/O takes too long, the heap may run out of space.

Composite Buffers

With traditional ByteBuffers, if we want to combine the data from two buffers, we first create a new array of size = size1 + size2 and then copy the data from both buffers into the new array. The CompositeByteBuf provided by Netty avoids this: it does not actually merge the buffers but keeps references to them, so no data is copied and zero copy is achieved.
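
The idea of keeping references instead of copying can be sketched with plain JDK buffers. The CompositeView class below is hypothetical and read-only. It is not Netty's CompositeByteBuf, only an illustration of the same principle.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical read-only view over several buffers; illustrates the principle only.
final class CompositeView {
    private final ByteBuffer[] parts;   // references to the components, no data copied
    private final int[] offsets;        // offsets[i] = logical index where parts[i] starts
    private final int length;

    CompositeView(ByteBuffer... buffers) {
        parts = new ByteBuffer[buffers.length];
        offsets = new int[buffers.length];
        int total = 0;
        for (int i = 0; i < buffers.length; i++) {
            parts[i] = buffers[i].slice();   // shares content with the original buffer
            offsets[i] = total;
            total += parts[i].remaining();
        }
        length = total;
    }

    int length() {
        return length;
    }

    /** Reads the byte at a logical index by delegating to the component that holds it. */
    byte get(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        int i = parts.length - 1;
        while (offsets[i] > index) {
            i--;                             // walk back to the component containing index
        }
        return parts[i].get(index - offsets[i]);
    }

    public static void main(String[] args) {
        ByteBuffer header = ByteBuffer.wrap("HEAD".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer body = ByteBuffer.wrap("body".getBytes(StandardCharsets.US_ASCII));
        CompositeView view = new CompositeView(header, body);
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < view.length(); i++) {
            text.append((char) view.get(i));
        }
        System.out.println(text);
    }
}
```

Because the view only holds references, a later change to one of the original buffers is visible through the view, which shows that no copy was made.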

The use of FileChannel.transferTo

Netty uses FileChannel’s transferTo method, which relies on the operating system to achieve zero copy.

Conclusion

Netty’s zero copy is reflected in three aspects:

  1. Netty receives and sends data using direct buffers, which use off-heap direct memory for socket reads and writes, so no second copy of the byte buffer is needed.

    • If traditional heap buffers are used for socket reads and writes, the JVM copies the heap buffer into direct memory before writing it to the socket, so sending a message costs one more buffer copy than with direct off-heap memory.

  2. Netty provides a composite buffer object that can aggregate multiple ByteBuf objects. Users can operate on the composite buffer as conveniently as on a single buffer, which avoids the traditional approach of merging several small buffers into one large buffer through memory copies.

  3. Netty uses the transferTo method to transfer files, which sends the data in the file buffer directly to the target Channel and avoids the memory copies caused by the traditional read-then-write approach.

About reclamation of off-heap memory

Reclaiming off-heap memory ultimately depends on the GC mechanism:

  • First, at the Java level, only the DirectByteBuffer object associated with an off-heap allocation records the base address and size of that memory. Since the DirectByteBuffer object itself is subject to GC, the GC can indirectly manage the corresponding off-heap memory by collecting DirectByteBuffer objects.

  • A DirectByteBuffer object is associated with a PhantomReference when it is created. A PhantomReference is used to track when an object is collected and does not affect GC decisions.

  • If the GC finds that an object is referenced only by a PhantomReference and by nothing else, it places the reference on the pending queue of java.lang.ref.Reference. When the GC completes, the ReferenceHandler daemon thread is notified to perform post-processing. The reference associated with DirectByteBuffer is a subclass of PhantomReference, and its post-processing frees the DirectByteBuffer’s off-heap memory block through Unsafe's free interface.
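
The pattern of attaching a cleanup action to an object can be tried out with the public java.lang.ref.Cleaner API (Java 9 and later), which is also built on phantom reachability. This is only an analogy, under the assumption that the public API is close enough to illustrate the idea: DirectByteBuffer uses a JDK-internal class, and the NativeBlock and Release names below are hypothetical. A flag stands in for actually freeing native memory.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical resource holder; a boolean flag stands in for a native memory block.
final class NativeBlock implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Cleanup action. It must not reference the NativeBlock, or the block could never become unreachable. */
    private static final class Release implements Runnable {
        private final AtomicBoolean freed;

        Release(AtomicBoolean freed) {
            this.freed = freed;
        }

        @Override
        public void run() {
            freed.set(true);   // a real implementation would free the native memory here
        }
    }

    private final AtomicBoolean freed = new AtomicBoolean(false);
    private final Cleaner.Cleanable cleanable;

    NativeBlock() {
        // The action runs once: on an explicit clean(), or after the object becomes phantom reachable.
        cleanable = CLEANER.register(this, new Release(freed));
    }

    boolean isFreed() {
        return freed.get();
    }

    @Override
    public void close() {
        cleanable.clean();     // explicit release, without waiting for the GC
    }

    public static void main(String[] args) {
        NativeBlock block = new NativeBlock();
        block.close();
        System.out.println("freed: " + block.isFreed());
    }
}
```

If close is never called, the cleanup only runs some time after a GC cycle has found the object unreachable, which is the delay the following section is concerned with.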

Why call System.gc() explicitly

System.gc() requests a full GC, which reclaims unreachable DirectByteBuffer objects in both the old and young generations, along with their associated off-heap memory.

The DirectByteBuffer object itself is actually small, but it can be associated with a very large out-of-heap memory, so it is often referred to as an iceberg object.

During a young GC, unreachable DirectByteBuffer objects in the young generation and their off-heap memory are reclaimed, but DirectByteBuffer objects in the old generation and their off-heap memory are not. This is the biggest problem we often encounter.

If a large number of DirectByteBuffer objects have been promoted to the old generation and no CMS GC or full GC runs, physical memory may run out while the heap still has plenty of free space, so it is not obvious what is going wrong.
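
One way to make this situation visible is to read the JVM's direct buffer pool statistics through the standard BufferPoolMXBean. The sketch below is an assumption about how one might monitor it, and the class name DirectMemoryStats is hypothetical. Off-heap memory allocated by other means than ByteBuffer.allocateDirect may not be counted in this pool.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper that reports how much direct buffer memory the JVM is tracking.
final class DirectMemoryStats {

    /** Returns the bytes used by the "direct" buffer pool, or -1 if the pool is not reported. */
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);   // hold 1 MiB off-heap
        System.out.println("direct bytes in use: " + directMemoryUsed()
                + " (this demo holds " + buffer.capacity() + ")");
    }
}
```

Logging this value next to heap usage shows when direct memory keeps growing while the heap itself looks healthy.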

Learning resources

www.jianshu.com/p/61a7916b3…