For Java applications, memory is allocated by the program and released by the garbage collector (GC), which simplifies the programmer's life but puts more pressure on the JVM. Many Java programmers rely too heavily on GC. However well the JVM's garbage collection works, memory is a limited resource, so even though GC does most of the cleanup for us, it is important to pay attention to memory usage while coding. The main purpose of memory optimization is to reduce the frequency of young GC and full GC. Excessive young and full GCs consume more system resources (mainly CPU) and reduce the throughput of the whole system.

For high-performance services such as gateways, memory optimization deserves particular attention. It reduces the number of GC cycles, improves memory utilization, and helps the program run as efficiently as possible.


Netty ChannelHandler

In Netty, each Channel has its own ChannelPipeline, and each ChannelPipeline manages a chain of ChannelHandlers.

When a client connects to a server, Netty creates a new ChannelPipeline to handle the connection's events. Several custom or Netty-provided ChannelHandlers are added to each ChannelPipeline. If a new ChannelHandler instance is created for every client connection, the server has to hold a large number of ChannelHandler instances when there are many connections. For example, with 100,000 connections, 100,000 * (number of ChannelHandlers added) objects are created, which consumes a lot of memory.

Netty provides a way to solve this problem: a ChannelHandler can be marked @Sharable as long as it is stateless (i.e. it does not need to store any per-connection state). No matter how many connections there are, you then need only one ChannelHandler instance, shared by all ChannelPipelines.

So when implementing a custom ChannelHandler, it is best to make it stateless. Note that codecs such as ByteToMessageDecoder are stateful and cannot use the @Sharable annotation.

Also, the ChannelHandlers added to a ChannelPipeline are invoked sequentially, so it is best to keep the number of ChannelHandlers small. In Mercury, apart from the codec-related ChannelHandlers, I implemented only a single stateless handler.

How @Sharable works

The io.netty.channel.DefaultChannelPipeline#addLast() method, which adds a ChannelHandler, checks for the @Sharable annotation. If a ChannelHandler instance without the @Sharable annotation has already been added, an exception is thrown.

As the source code shows, Netty performs only a very simple check to prevent instances without the @Sharable annotation from being used as singletons; it is not particularly smart. It does not know whether a ChannelHandler instance is actually stateless. The developer has to guarantee that, otherwise the handler may not be thread-safe. As mentioned above, the codec ChannelHandlers provided by Netty should never be singletons shared by all Channels.
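The check described above can be modeled in a few lines of plain Java. This is a simplified sketch, not Netty's source: `SharableSketch`, `HandlerSketch`, `PipelineSketch`, and the two handler classes are hypothetical names invented for illustration, and the exception type is an assumption. It only shows the idea of "annotation present, or not yet added".

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a "sharable" marker annotation (not Netty's class).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SharableSketch {}

abstract class HandlerSketch {
    boolean added; // becomes true once the instance is in any pipeline

    boolean isSharable() {
        return getClass().isAnnotationPresent(SharableSketch.class);
    }
}

final class PipelineSketch {
    private final List<HandlerSketch> handlers = new ArrayList<>();

    PipelineSketch addLast(HandlerSketch handler) {
        // Only the annotation and the "added" flag are inspected.
        // Nothing here can verify that the handler is really stateless.
        if (!handler.isSharable() && handler.added) {
            throw new IllegalStateException(handler.getClass().getName()
                    + " is not marked sharable, so it can't be added multiple times");
        }
        handler.added = true;
        handlers.add(handler);
        return this;
    }

    int size() {
        return handlers.size();
    }
}

@SharableSketch
final class StatelessHandler extends HandlerSketch {}

final class StatefulDecoder extends HandlerSketch {
    int bytesBuffered; // per-connection state, so the instance must not be shared
}

final class SharableCheckDemo {
    public static void main(String[] args) {
        StatelessHandler shared = new StatelessHandler();
        PipelineSketch first = new PipelineSketch().addLast(shared);
        PipelineSketch second = new PipelineSketch().addLast(shared); // accepted

        StatefulDecoder decoder = new StatefulDecoder();
        first.addLast(decoder);
        try {
            second.addLast(decoder); // rejected: same instance, no annotation
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that annotating `StatefulDecoder` would silence the check without making the class safe to share, which is exactly why the responsibility stays with the developer.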


AtomicXXXFieldUpdater

In high-concurrency scenarios, **AtomicXXX** objects such as the common AtomicInteger or AtomicLong are often used to ensure thread safety. Their underlying implementation is CAS (compare-and-swap).

However, many open source frameworks such as Netty make wide use of AtomicXXXFieldUpdater instead, in order to save memory.

AtomicInteger and AtomicIntegerFieldUpdater serve as the example here:

An AtomicInteger has only one member variable, an int value, which does not seem to take up much memory. But an AtomicInteger is an object, and the size of an object is properly calculated as:
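To make the CAS idea concrete, here is a minimal sketch of an increment written as an explicit compare-and-set retry loop. The class and method names (`CasCounterDemo`, `casIncrement`, `runConcurrently`) are made up for this example. In real code you would normally just call `incrementAndGet()`.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasCounterDemo {
    // Increment with an explicit CAS retry loop: read the current value,
    // compute the new one, and publish it only if nobody changed it meanwhile.
    static int casIncrement(AtomicInteger counter) {
        while (true) {
            int current = counter.get();
            int next = current + 1;
            if (counter.compareAndSet(current, next)) {
                return next;
            }
            // Another thread won the race; loop and retry with the fresh value.
        }
    }

    // Runs the increment from several threads and returns the final count.
    static int runConcurrently(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    casIncrement(counter);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 10_000)); // prints 40000, no lost updates
    }
}
```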

Object header + actual data size + alignment padding

| Item (bytes) | 32-bit | 64-bit | 64-bit with pointer compression (64-bit only, enabled by default) |
|---|---|---|---|
| Object header | 8 | 16 | 12 |
| Array object header | 12 | 24 | 16 |
| Reference | 4 | 8 | 4 |

On 64-bit machines, the AtomicInteger object takes up the following memory:

With pointer compression off: 16 (object header) + 4 (instance data) = 20, which is not a multiple of 8, so padding is added: 16 + 4 + 4 (padding) = 24

With pointer compression on (-XX:+UseCompressedOops): 12 + 4 = 16, which is already a multiple of 8, so no padding is required.

Because the AtomicInteger is a separate object, its owner also needs a reference to it, so the real cost is:

Pointer compression off: 24 + 8 = 32

Pointer compression on: 16 + 4 = 20
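The arithmetic above can be reproduced with a small helper. This is only a sketch of the formula (header + data, rounded up to a multiple of 8, plus the reference held by the owner), using the header and reference sizes from the table. `ObjectSizeSketch` and its methods are hypothetical names, and the code does not measure anything on a real JVM.

```java
final class ObjectSizeSketch {
    static final int ALIGNMENT = 8; // objects are padded to a multiple of 8 bytes

    // Round a size up to the next multiple of ALIGNMENT.
    static int alignUp(int size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Object header + instance data + alignment padding.
    static int objectSize(int headerBytes, int instanceDataBytes) {
        return alignUp(headerBytes + instanceDataBytes);
    }

    // Cost of an AtomicInteger (one int field) plus the reference that points to it.
    static int atomicIntegerCost(int headerBytes, int referenceBytes) {
        return objectSize(headerBytes, 4) + referenceBytes;
    }

    public static void main(String[] args) {
        System.out.println(objectSize(16, 4));        // 24: compression off
        System.out.println(objectSize(12, 4));        // 16: compression on
        System.out.println(atomicIntegerCost(16, 8)); // 32: compression off
        System.out.println(atomicIntegerCost(12, 4)); // 20: compression on
    }
}
```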

Take Netty's AbstractReferenceCountedByteBuf as an example. Those familiar with Netty know that it manages memory itself: its ByteBuf implementations extend AbstractReferenceCountedByteBuf, and ByteBuf instances are created in large numbers.

If **AtomicInteger** were used, then with pointer compression enabled this would take:

(Number of ByteBufs) * 20 bytes

With AtomicIntegerFieldUpdater, the field itself is declared as a **volatile int**, and no matter how many objects exist, only one **AtomicIntegerFieldUpdater** needs to be created.
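A minimal sketch of this pattern, loosely modeled on a reference-counted object: `RefCountedSketch` and its methods are hypothetical names, and the overflow and underflow checks that a real reference counter needs are left out on purpose.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

final class RefCountedSketch {
    // One static updater per class, shared by every instance.
    private static final AtomicIntegerFieldUpdater<RefCountedSketch> REF_CNT_UPDATER =
            AtomicIntegerFieldUpdater.newUpdater(RefCountedSketch.class, "refCnt");

    // The updater requires a non-static volatile int field.
    // Each instance pays only for these 4 bytes, not for a separate AtomicInteger object.
    private volatile int refCnt = 1;

    int refCnt() {
        return refCnt;
    }

    // Atomically increments the counter (no overflow check in this sketch).
    RefCountedSketch retain() {
        REF_CNT_UPDATER.incrementAndGet(this);
        return this;
    }

    // Atomically decrements the counter; returns true when it reaches zero.
    boolean release() {
        return REF_CNT_UPDATER.decrementAndGet(this) == 0;
    }

    public static void main(String[] args) {
        RefCountedSketch buf = new RefCountedSketch();
        buf.retain();
        System.out.println(buf.refCnt());  // 2
        System.out.println(buf.release()); // false
        System.out.println(buf.release()); // true
    }
}
```

The updater is created with the class and the field name, so a typo in `"refCnt"` or a missing `volatile` only fails at class initialization, not at compile time.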

The **AtomicIntegerFieldUpdater** takes 16 bytes and each **volatile int** takes 4 bytes, so the total memory is:

(Number of ByteBufs) * 4 bytes + 16 bytes

This may not be obvious with a small number of objects, but when there are hundreds of thousands, millions, or tens of millions of objects, the savings can reach tens of MB, hundreds of MB, or even several GB.

The Mercury gateway has to maintain a large number of Channel connections and handles many complex scenarios, so I used AtomicIntegerFieldUpdater for this optimization. In earlier load tests the results were very good.
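To get a feel for the scale, the two formulas above can be compared directly. The sketch below uses only the article's figures (20 bytes per object with AtomicInteger, 4 bytes per object plus a one-time 16 bytes with the field updater, pointer compression on). `UpdaterSavingsSketch` is a hypothetical name and the numbers are estimates, not measurements.

```java
final class UpdaterSavingsSketch {
    // (number of objects) * 20 bytes
    static long withAtomicInteger(long objects) {
        return objects * 20;
    }

    // (number of objects) * 4 bytes + 16 bytes for the single shared updater
    static long withFieldUpdater(long objects) {
        return objects * 4 + 16;
    }

    static long savedBytes(long objects) {
        return withAtomicInteger(objects) - withFieldUpdater(objects);
    }

    public static void main(String[] args) {
        long[] counts = {100_000L, 1_000_000L, 10_000_000L, 100_000_000L};
        for (long count : counts) {
            double savedMb = savedBytes(count) / 1_000_000.0;
            System.out.printf("%,d objects: about %.1f MB saved per counter field%n", count, savedMb);
        }
    }
}
```

With one million objects the difference is about 16 MB for a single counter field, and it grows linearly with the number of objects and with the number of fields converted this way.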


JOL

JOL stands for Java Object Layout. It is a small tool for analyzing object layouts in the JVM, including how much memory an object occupies, what an instance references, and so on. If you find it tedious to calculate object sizes with formulas like the one above, you can use this tool for a quick analysis.

<dependency>
            <groupId>org.openjdk.jol</groupId>
            <artifactId>jol-core</artifactId>
            <version>0.10</version>
</dependency>

 System.out.println(ClassLayout.parseClass(AtomicInteger.class).toPrintable());

The printed output shows the object's memory layout and size.

There is also a JOL plugin for IntelliJ IDEA. Once it is installed, the layout can be viewed directly in the IDE.

Right-click the object and select Show Object Layout to see how much memory it uses.