Preface

This chapter covers Netty's memory pool:

  • Memory classification
  • PoolArena
  • PoolChunkList
  • PoolChunk
  • PoolSubpage

One, Memory classification

1. Classification by size

Classification | Lower limit | Upper limit | Specifications
Tiny           | > 0B        | <= 496B     | 16B, 32B ... 496B, arithmetic sequence with common difference 16; 31 specifications in total
Small          | >= 512B     | <= 4096B    | Four specifications with ratio 2: 512B, 1024B, 2048B, 4096B
Normal         | >= 8192B    | <= 16MB     | Page-based, from 8192B (one page) up to 16MB
Huge           | > 16MB      | (no limit)  | Unpooled allocation
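
As a quick illustration, here is a minimal sketch (not Netty's actual code; the method name is made up) of how a size, once rounded up to a standard specification, maps onto these classifications:

// Minimal sketch, assuming the default 8K page / 16MB chunk configuration
static String sizeClassOf(int normalizedSize) {
    if (normalizedSize <= 496) {
        return "Tiny";                    // 16B, 32B ... 496B
    } else if (normalizedSize <= 4096) {
        return "Small";                   // 512B, 1024B, 2048B, 4096B
    } else if (normalizedSize <= 16 * 1024 * 1024) {
        return "Normal";                  // page-based, up to 16MB
    } else {
        return "Huge";                    // allocated outside the pool
    }
}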

2. Classification by structure

  • Page: a Page is an 8KB block of memory. Tiny and Small blocks are carved out of pages by Subpages; the memory a Subpage manages belongs to a Chunk, and the Subpage itself is created by that Chunk.
  • Chunk: a Chunk is a 16MB block of memory; 16MB is the unit in which Netty requests memory from the system. Requests larger than 16MB bypass pooling entirely. You can think of a Chunk as a collection of 16MB / 8KB = 2048 pages.

Two, PoolArena

PoolArena is the entry point to the memory pool and does the following:

  • Requests memory from the system, creates Chunks, and manages Chunks.
  • Manages Subpages, which allocate the Tiny and Small sizes. Note that every Subpage is created by a Chunk, and the Chunk also hangs it into the Subpage pool of its owning Arena.

The PoolArena member variables are listed below (the PoolArenaMetric parts are omitted; those xxxMetric members only exist so subclasses can expose statistics):

abstract class PoolArena<T> implements PoolArenaMetric {
    // Size classification
    enum SizeClass {
        Tiny,   // <= 496B
        Small,  // >= 512B and <= 4K
        Normal  // >= 8K and <= 16M
    }
    // Tiny pool size = 32
    static final int numTinySubpagePools = 512 >>> 4;
    // Allocator
    final PooledByteBufAllocator parent;
    // Tree depth is 11
    private final int maxOrder;
    // A Page size = 8192B = 8K
    final int pageSize;
    // log2(8192) = 13
    final int pageShifts;
    // Chunk size = 16MB
    final int chunkSize;
    // -pageSize = -8192
    final int subpageOverflowMask;
    // Small pool size = pageShifts - 9 = 4
    final int numSmallSubpagePools;
    // Direct cache memory alignment 0
    final int directMemoryCacheAlignment;
    // Direct memory cache alignment mask = directMemoryCacheAlignment - 1 = -1
    final int directMemoryCacheAlignmentMask;
    // Tiny pool: 16B, 32B, 48B ... 496B, arithmetic sequence with common difference 16;
    // array size 32, actual subscripts start at 1
    private final PoolSubpage<T>[] tinySubpagePools;
    // Small pool: specifications 512B, 1024B, 2048B, 4096B with ratio 2; array size 4
    private final PoolSubpage<T>[] smallSubpagePools;

    // PoolChunkList
    private final PoolChunkList<T> q050;
    private final PoolChunkList<T> q025;
    private final PoolChunkList<T> q000;
    private final PoolChunkList<T> qInit;
    private final PoolChunkList<T> q075;
    private final PoolChunkList<T> q100;
}

In general, an Arena is divided into two parts: the Chunks connected through PoolChunkLists, and the PoolSubpage pools. The actual Arena structure looks like this:

To explain the figure above:

  • PoolChunkList: each PoolChunkList maintains a doubly linked list of Chunks, and the PoolChunkLists themselves are linked to each other inside the Arena. Every newly created Chunk (16MB requested from the system) is placed into qInit. Because actual usage fluctuates, a Chunk moves back and forth among the PoolChunkLists; the overlap between the usage ranges of adjacent PoolChunkLists prevents a Chunk sitting right at a threshold from bouncing between two lists. The PoolChunkList specifications are:
Classification | Lower limit       | Upper limit
qInit          | Integer.MIN_VALUE | 25%
q000           | 1%                | 50%
q025           | 25%               | 75%
q050           | 50%               | 100%
q075           | 75%               | 100%
q100           | 100%              | Integer.MAX_VALUE
  • PoolSubpage[32] tinySubpagePools: stores the PoolSubpages handed over by Chunks and is responsible for allocating the Tiny sizes. Array subscripts 1-31 correspond to 16B, 32B ... 496B, the 31 Tiny specifications. The Subpages at each subscript form a doubly linked list; the head node is empty and stores no data.

  • PoolSubpage[4] smallSubpagePools: stores the PoolSubpages handed over by Chunks and is responsible for allocating the Small sizes. Array subscripts 0-3 correspond to the four Small specifications: 512B, 1024B, 2048B, 4096B. The Subpages at each subscript form a doubly linked list; the head node is empty and stores no data. How a size maps to a subscript is sketched right after this list.
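
The mapping from a normalized size to an array subscript can be sketched as follows; this mirrors the idea of Netty's tinyIdx/smallIdx helpers, and is meant as an illustration rather than the exact source:

// Tiny: subscript = size / 16, so 16B -> 1, 32B -> 2, ... 496B -> 31
static int tinyIdx(int normCapacity) {
    return normCapacity >>> 4;
}

// Small: 512B -> 0, 1024B -> 1, 2048B -> 2, 4096B -> 3
static int smallIdx(int normCapacity) {
    int tableIdx = 0;
    int i = normCapacity >>> 10; // 512 -> 0, 1024 -> 1 ...
    while (i != 0) {
        i >>>= 1;
        tableIdx++;
    }
    return tableIdx;
}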

Now look at the Arena constructor, which mostly computes the various parameters and builds the Subpage pools and the PoolChunkList chain.

protected PoolArena(PooledByteBufAllocator parent, int pageSize,
      int maxOrder, int pageShifts, int chunkSize, int cacheAlignment) {
    this.parent = parent;
    // Some page chunk related size calculations
    this.pageSize = pageSize; // 8192 B = 8 KB
    this.maxOrder = maxOrder; // 11
    this.pageShifts = pageShifts; // 13
    this.chunkSize = chunkSize; // 16777216 B = 16 MB = pageSize << maxOrder
    directMemoryCacheAlignment = cacheAlignment; // 0
    directMemoryCacheAlignmentMask = cacheAlignment - 1; // -1
    subpageOverflowMask = ~(pageSize - 1); // -pageSize
    // Create a Tiny specification Subpage pool
    tinySubpagePools = newSubpagePoolArray(numTinySubpagePools); // 32 length array
    for (int i = 0; i < tinySubpagePools.length; i ++) {
        tinySubpagePools[i] = newSubpagePoolHead(pageSize); // Construct the PoolSubpage header node
    }
	// Create a Small Subpage pool
    numSmallSubpagePools = pageShifts - 9; // pageShifts - 9 = 4
    smallSubpagePools = newSubpagePoolArray(numSmallSubpagePools); // 4 length array
    for (int i = 0; i < smallSubpagePools.length; i ++) {
        smallSubpagePools[i] = newSubpagePoolHead(pageSize); // Construct the PoolSubpage header node
    }
    // instantiate PoolChunkList
    q100 = new PoolChunkList<T>(this, null, 100, Integer.MAX_VALUE, chunkSize);
    q075 = new PoolChunkList<T>(this, q100, 75, 100, chunkSize);
    q050 = new PoolChunkList<T>(this, q075, 50, 100, chunkSize);
    q025 = new PoolChunkList<T>(this, q050, 25, 75, chunkSize);
    q000 = new PoolChunkList<T>(this, q025, 1, 50, chunkSize);
    qInit = new PoolChunkList<T>(this, q000, Integer.MIN_VALUE, 25, chunkSize);
    // PoolChunkList link
    q100.prevList(q075);
    q075.prevList(q050);
    q050.prevList(q025);
    q025.prevList(q000);
    q000.prevList(null);
    qInit.prevList(qInit);
}
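
Note how the chain ends up wired: each constructor call's second argument is the nextList, and the prevList calls set the reverse direction, giving qInit -> q000 -> q025 -> q050 -> q075 -> q100. Two details stand out: q000.prevList(null) means a Chunk whose usage keeps dropping in q000 has nowhere to move back to (this is how a fully freed Chunk can eventually be released back to the system), and qInit.prevList(qInit) points at itself, so a barely used new Chunk stays in qInit instead of moving backward.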

PoolArena is an abstract class that requires subclasses to implement several methods.

// Whether to use direct memory
abstract boolean isDirect();
// Create a new Chunk
protected abstract PoolChunk<T> newChunk(int pageSize, int maxOrder, int pageShifts, int chunkSize);
// Create unpooled chunks
protected abstract PoolChunk<T> newUnpooledChunk(int capacity);
// Create a PooledByteBuf
protected abstract PooledByteBuf<T> newByteBuf(int maxCapacity);
// Memory copy
protected abstract void memoryCopy(T src, int srcOffset, PooledByteBuf<T> dst, int length);
// Destroy a Chunk
protected abstract void destroyChunk(PoolChunk<T> chunk);

These methods are left to subclasses because of the generic type T: PoolArena itself cannot know whether the underlying memory is a heap byte array or a direct ByteBuffer, so PoolArena has two implementation classes. Part of the direct-memory implementation, DirectArena, looks like this.

static final class DirectArena extends PoolArena<ByteBuffer> {

    DirectArena(PooledByteBufAllocator parent, int pageSize, int maxOrder,
            int pageShifts, int chunkSize, int directMemoryCacheAlignment) {
        super(parent, pageSize, maxOrder, pageShifts, chunkSize,
                directMemoryCacheAlignment);
    }

    @Override
    boolean isDirect() {
        return true;
    }

    @Override
    protected PoolChunk<ByteBuffer> newChunk(int pageSize, int maxOrder,
            int pageShifts, int chunkSize) {
        if (directMemoryCacheAlignment == 0) {
            return new PoolChunk<ByteBuffer>(this,
                    allocateDirect(chunkSize), pageSize, maxOrder,
                    pageShifts, chunkSize, 0);
        }
        final ByteBuffer memory = allocateDirect(chunkSize
                + directMemoryCacheAlignment);
        return new PoolChunk<ByteBuffer>(this, memory, pageSize,
                maxOrder, pageShifts, chunkSize,
                offsetCacheLine(memory));
    }

    private static ByteBuffer allocateDirect(int capacity) {
        return PlatformDependent.useDirectBufferNoCleaner() ?
                PlatformDependent.allocateDirectNoCleaner(capacity) : ByteBuffer.allocateDirect(capacity);
    }

    @Override
    protected PooledByteBuf<ByteBuffer> newByteBuf(int maxCapacity) {
        if (HAS_UNSAFE) {
            return PooledUnsafeDirectByteBuf.newInstance(maxCapacity);
        } else {
            return PooledDirectByteBuf.newInstance(maxCapacity);
        }
    }
}

Three, PoolChunkList

Chunks are divided among different PoolChunkList instances according to usage ranges (minUsage and maxUsage). Chunks in the same usage range are stored in the same PoolChunkList instance, linked together as a list (head). The Arena maintains PoolChunkLists for the different usage ranges, connected to each other by each PoolChunkList's prevList and nextList pointers.

In a freshly created Arena, all PoolChunkList head pointers are null. A newly created Chunk is first added to the qInit instance; afterwards, as its usage fluctuates, the Chunk moves back and forth among the PoolChunkLists (q000-q100).

Member variables

final class PoolChunkList<T> implements PoolChunkListMetric {
    // Arena this list belongs to
    private final PoolArena<T> arena;
    // Successor PoolChunkList
    private final PoolChunkList<T> nextList;
    // Predecessor PoolChunkList
    private PoolChunkList<T> prevList;
    // Chunk usage lower limit
    private final int minUsage;
    // Chunk usage upper limit
    private final int maxUsage;
    // The upper limit of Chunk allocated memory managed by the current instance (calculated by minUsage)
    private final int maxCapacity;
    // The Chunk header node is initially NULL
    private PoolChunk<T> head;
    // The lower limit of free memory. Chunks less than or equal to this value need to be moved to nextList
    private final int freeMinThreshold;
    // The upper limit of free memory. Chunks larger than this value need to be moved to prevList
    private final int freeMaxThreshold;
}

Constructor

PoolChunkList(PoolArena<T> arena, PoolChunkList<T> nextList, int minUsage, int maxUsage, int chunkSize) {
    this.arena = arena;
    this.nextList = nextList;
    this.minUsage = minUsage;
    this.maxUsage = maxUsage;
    // maxCapacity calculation
    maxCapacity = calculateMaxCapacity(minUsage, chunkSize);
    freeMinThreshold = (maxUsage == 100) ? 0 : (int) (chunkSize * (100.0 - maxUsage + 0.99999999) / 100L);
    freeMaxThreshold = (minUsage == 100) ? 0 : (int) (chunkSize * (100.0 - minUsage + 0.99999999) / 100L);
}
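
As a worked example (assuming the default 16MB chunkSize), the thresholds for q025 (minUsage = 25, maxUsage = 75) come out roughly as follows:

int chunkSize = 16 * 1024 * 1024; // 16777216
// Move a Chunk forward to nextList once freeBytes <= freeMinThreshold
int freeMinThreshold = (int) (chunkSize * (100.0 - 75 + 0.99999999) / 100L); // ~4.16MB free, usage ~74%
// Move a Chunk back to prevList once freeBytes > freeMaxThreshold
int freeMaxThreshold = (int) (chunkSize * (100.0 - 25 + 0.99999999) / 100L); // ~12.16MB free, usage ~24%

The 0.99999999 term appears to guard against losing a whole percentage point to integer truncation, so a Chunk is moved just before its usage crosses the nominal 75%/25% boundaries.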

Focus here on the logic of the calculateMaxCapacity method.

private static int calculateMaxCapacity(int minUsage, int chunkSize) {
    // minUsage = max(1, minUsage): clamp minUsage to at least 1 for the calculation below
    minUsage = minUsage0(minUsage);
	
    if (minUsage == 100) {
        return 0;
    }
    return  (int) (chunkSize * (100L - minUsage) / 100L);
}

Why is maxCapacity a member variable? When the Arena allocates memory through a PoolChunkList, the instance must first decide whether any Chunk it manages could possibly serve the request. Suppose a PoolChunkList only manages Chunks whose usage is between 20% and 50%: even its emptiest possible Chunk (20% used) can offer at most 80% x 16MB. If a client asks for 81% x 16MB, the request exceeds maxCapacity, and the PoolChunkList returns false immediately without scanning its Chunks.
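
A simplified sketch of the corresponding guard at the top of PoolChunkList's allocate method (abridged, for illustration; the real method has more bookkeeping):

boolean allocate(PooledByteBuf<T> buf, int reqCapacity, int normCapacity) {
    if (normCapacity > maxCapacity || head == null) {
        // Even the emptiest Chunk this list may hold (minUsage used)
        // cannot satisfy the request: fail fast without walking the list.
        return false;
    }
    for (PoolChunk<T> cur = head; cur != null; cur = cur.next) {
        if (cur.allocate(buf, reqCapacity, normCapacity)) {
            if (cur.freeBytes <= freeMinThreshold) {
                remove(cur);       // usage crossed the upper bound:
                nextList.add(cur); // hand the Chunk to the next list
            }
            return true;
        }
    }
    return false;
}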

Four, PoolChunk

PoolChunk manages its memory as a full binary tree, stored in the memoryMap array. It is created by the Arena and lives inside a PoolChunkList.

Take a look at the member variables first, and then explain the diagram above.

final class PoolChunk<T> implements PoolChunkMetric {
	// Identifies which Arena the Chunk belongs to
    final PoolArena<T> arena;
    // The actual 16MB memory block: a JDK ByteBuffer for Direct, a byte array for Heap
    final T memory;
    // Ignore memory alignment and consider it 0
    final int offset;
    // tree, with the same initial value as depthMap
    private final byte[] memoryMap;
    // Node ID -> node depth
    private final byte[] depthMap;
    // Subpages created from this Chunk (if the Chunk has never allocated a block <= 4KB, there are no Subpage instances)
    private final PoolSubpage<T>[] subpages;
    // Mask -8192, used to check whether a size is at least 8K: (x & -8192) != 0 means x >= 8K
    private final int subpageOverflowMask;
    // 8192 page size
    private final int pageSize;
    // log2(page size) = 13
    private final int pageShifts;
    // Tree depth = 11
    private final int maxOrder;
    // Chunk size 16MB
    private final int chunkSize;
    // log2(chunk size) = 24
    private final int log2ChunkSize;
    // Subpage array size = 2048 = 16MB/8KB
    private final int maxSubpageAllocs;
    // Mark bit = tree depth + 1 = 11 + 1 = 12
    private final byte unusable;
    // Cached ByteBuffers, to reduce object creation and GC
    private final Deque<ByteBuffer> cachedNioBuffers;
    // Remaining free bytes = 16MB - allocated bytes
    int freeBytes;
	// Specifies which PoolChunkList is currently in
    PoolChunkList<T> parent;
    // Predecessor node
    PoolChunk<T> prev;
    // Successor node
    PoolChunk<T> next;
}

Focus on the following variables:

  • memoryMap: the complete binary tree stored as an array of 4096 elements (2^(maxOrder+1) = 2^12), of which subscripts 1-4095 are used. Each node's ID is its memoryMap subscript. The initial value of each node equals its depth, e.g. memoryMap[1] = 0, memoryMap[2] = 1, memoryMap[2048] = 11. Once a node's value is greater than maxOrder, the memory of that node has been allocated. How the memoryMap values change at runtime is discussed later.

  • depthMap: contains 4096 elements, of which subscripts 1-4095 are used. The subscripts correspond to memoryMap subscripts (node IDs), and the element values record each node's depth. depthMap never changes at runtime.

  • subpages: contains 2048 elements (maxSubpageAllocs = 1 << maxOrder = 2048), one slot per leaf node, i.e. per page; a leaf's Subpage is stored at the subscript matching that leaf's position among the 2048 leaves.

  • maxOrder: tree depth, 11.

  • unusable: fixed value = tree depth + 1 = 12. When stored as a memoryMap value, it means the memory of that node and all of its children has been allocated and is unavailable.

  • memory: the actual 16MB memory block, a JDK ByteBuffer for direct memory and a byte array for heap memory. After the Chunk is created, this 16MB block is carved into pieces of different sizes (Normal/Small/Tiny); each allocation only records the offset and length it occupies within the block. You can think of memory as the raw memory resource behind everything the Chunk manages.

  • freeBytes: the number of bytes still available for allocation.

As shown in the figure above, the Chunk tree has 4095 nodes, corresponding to memoryMap subscripts 1-4095. Each node's initial value, stored in memoryMap[id], is the node's depth; the depth itself is recorded in the depthMap array and does not change at runtime.

When a Chunk allocates memory, the size requested by the user is first normalized (discussed later). For an allocation of 8K or more, a free node at the matching depth of the memoryMap is selected and marked unusable = 12. For an allocation of 4K or less, a free leaf node of the memoryMap is selected and marked unusable = 12, and a Subpage is created and recorded in subpages; the subpages subscript corresponds to the leaf node's position (see figure). The Subpage is then responsible for splitting the page into smaller blocks (such as 32B) for clients. The depth arithmetic for the >= 8K case is sketched below.
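
A sketch of that depth arithmetic with this Chunk's constants (log2 here is the usual integer base-2 logarithm; the variable names are illustrative):

static int log2(int v) {
    return 31 - Integer.numberOfLeadingZeros(v);
}

int pageShifts = 13, maxOrder = 11;
int normCapacity = 16 * 1024; // normalized request: 16K
// Target depth: every level up from the leaves doubles the run size
int d = maxOrder - (log2(normCapacity) - pageShifts); // 11 - (14 - 13) = 10
// 8K  -> depth 11 (one leaf, i.e. one page)
// 16K -> depth 10 (two pages)
// 16M -> depth 0  (the whole Chunk)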

Finally, look at the Chunk constructor.

PoolChunk(PoolArena<T> arena, T memory, int pageSize, int maxOrder, int pageShifts, int chunkSize, int offset) {
    // Set some variables
    unpooled = false;
    this.arena = arena;
    this.memory = memory;
    this.pageSize = pageSize;
    this.pageShifts = pageShifts;
    this.maxOrder = maxOrder;
    this.chunkSize = chunkSize;
    this.offset = offset;
    unusable = (byte) (maxOrder + 1);
    log2ChunkSize = log2(chunkSize);
    subpageOverflowMask = ~(pageSize - 1);
    freeBytes = chunkSize;
    assert maxOrder < 30 : "maxOrder should be < 30, but is: " + maxOrder;
    maxSubpageAllocs = 1 << maxOrder;
    // Create the tree
    memoryMap = new byte[maxSubpageAllocs << 1];
    depthMap = new byte[memoryMap.length];
    int memoryMapIndex = 1;
    for (int d = 0; d <= maxOrder; ++ d) {
        int depth = 1 << d;
        for (int p = 0; p < depth; ++ p) {
            // Initial value of every node = its depth
            memoryMap[memoryMapIndex] = (byte) d;
            depthMap[memoryMapIndex] = (byte) d;
            memoryMapIndex ++;
        }
    }
    // An empty array of 2048 elements
    subpages = newSubpageArray(maxSubpageAllocs);
    cachedNioBuffers = new ArrayDeque<ByteBuffer>(8);
}

Five, PoolSubpage

Member variables

final class PoolSubpage<T> implements PoolSubpageMetric {
    // Chunk to which Subpage belongs
    final PoolChunk<T> chunk;
    // memoryMap subscript (a leaf node of the Chunk tree, 2048-4095)
    private final int memoryMapIdx;
    // Offset allocated to a ByteBuffer or byte array
    // Remember that the entire 16M memory block is in the memory member variable of PoolChunk, and Subpage only gets part of it
    // Use this offset to determine the offset of the current instance
    private final int runOffset;
    // Page size 8K
    private final int pageSize;
    // A long[8] array can mark up to 64 * 8 = 512 bits,
    // enough for the smallest block size: 8192 / 16 = 512 blocks of 16B
    private final long[] bitmap;
    // Predecessor
    PoolSubpage<T> prev;
    // Successor
    PoolSubpage<T> next;
    // memory block size managed by Subpage, for example, 16B or 32B
    int elemSize;
    // Maximum number of blocks this page can be divided into, given elemSize
    // e.g. 8192 / 32 = 256
    private int maxNumElems;
    // Number of longs of the bitmap array actually in use
    // e.g. managing 16B blocks needs 512 bits = 8 longs;
    // managing 32B blocks needs 256 bits = 4 longs
    private int bitmapLength;
    // Bitmap subscript of the next available block
    private int nextAvail;
    // The number of memory blocks that can be allocated
    private int numAvail;
}

First, a Subpage is created by a Chunk. The Chunk assigns it a memoryMapIdx, a leaf node of the Chunk tree, and a runOffset, the offset of this Subpage within the 16MB memory block. Since the page size is 8K, that offset is always an integer multiple of 8192.

Since a Subpage manages many small memory blocks of one size, it also computes each block's offset on top of its own runOffset. Suppose a Subpage with runOffset = 8192 manages 32B blocks: the first 32B block sits at offset 8192 + 0 within the 16MB block, the second at 8192 + 32 = 8224, and so on.

The bitmap is an array of longs; each long holds 64 bits, and each bit marks whether one small block has been allocated. For the Subpage managing 32B blocks above, 8192 (page size) / 32 (block size) = 256 bits are needed (maxNumElems), and 256 / 64 = 4, so only four longs of the bitmap array are actually used (bitmapLength).
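
Putting the last two paragraphs into numbers (a worked example, not library code):

int pageSize = 8192, elemSize = 32, runOffset = 8192;
int maxNumElems  = pageSize / elemSize;   // 256 blocks of 32B on this page
int bitmapLength = maxNumElems >>> 6;     // 256 / 64 = 4 longs actually used
// The block tracked by bitmap bit i starts at runOffset + i * elemSize:
int bitmapIdx = 1;                                   // the second 32B block
int offsetIn16MB = runOffset + bitmapIdx * elemSize; // 8192 + 32 = 8224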

When a Subpage is created for the first time along the path Arena -> Chunk -> Subpage, it is hung into the Tiny or Small pool of the corresponding Arena, so that later allocations of the same size can be served directly from Arena -> TinyPool/SmallPool without going through a Chunk.

In addition, as the Subpage hands out small blocks at runtime and bitmap bits get set, numAvail decreases, tracking how many blocks remain allocatable. When a small block is returned to the Subpage, its bitmap bit is reset to 0 and its bitmap subscript is saved into nextAvail for direct reuse on the next allocation. (As you'll see later, because of the ThreadLocal cache, small memory blocks are not necessarily returned directly to the Subpage.)
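
The bit manipulation behind allocating and freeing a block can be sketched like this (illustrative of what PoolSubpage does internally):

long[] bitmap = new long[8];
int bitmapIdx = 70;           // say, the 71st block on the page
int q = bitmapIdx >>> 6;      // which long holds the bit: 70 / 64 = 1
int r = bitmapIdx & 63;       // which bit inside that long: 70 % 64 = 6
bitmap[q] |= 1L << r;         // allocate: set the bit to 1
bitmap[q] ^= 1L << r;         // free: flip the bit back to 0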

Six, PooledByteBuf

PooledByteBuf is the base class of pooled ByteBufs. It extends the reference-counted ByteBuf abstract class, and its implementations split into PooledDirectByteBuf and PooledHeapByteBuf: the former's memory is backed directly by a JDK ByteBuffer, the latter uses heap memory backed by a byte array.

abstract class PooledByteBuf<T> extends AbstractReferenceCountedByteBuf {
    // Object pool related (discuss later)
    private final Handle<PooledByteBuf<T>> recyclerHandle;
    // Which Chunk does it belong to
    protected PoolChunk<T> chunk;
    // The low 32 bits are the memoryMap subscript, assigned by the Chunk
    // The high 32 bits are the bitmap subscript, assigned by the Subpage (only set for blocks split out of an 8K page)
    protected long handle;
    // The 16MB memory block: a JDK ByteBuffer for Direct, a byte array for Heap
    protected T memory;
    // Start offset of this buffer within memory
    protected int offset;
    // Requested memory size (before normalization)
    protected int length;
    // Actual allocated memory size (after normalization);
    // this buffer occupies [offset, offset + maxLength) of the 16MB block
    int maxLength;
    // The thread cache this buffer belongs to
    PoolThreadCache cache;
    // Allocator
    private ByteBufAllocator allocator;
}

Here are a few variables to focus on:

  • chunk: identifies the Chunk the current Buffer belongs to.
  • memory: the actual 16MB memory block the current Buffer lives in.
  • handle: an important variable that encodes where the Buffer sits inside its Chunk and Subpage.
    • Low 32 bits: the node ID in the Chunk tree, i.e. the memoryMap subscript.
    • High 32 bits: for blocks of 8K or more, which are not allocated by a Subpage, these bits are 0. Blocks of 4K or less, the Tiny and Small sizes, are allocated by a Subpage, and these bits record the bitmap subscript (counting the bits of the bitmap's longs as one continuous sequence).
  • offset, length, maxLength: offset is the start offset of the current Buffer within the 16MB block; length is the size originally requested by the user; maxLength is the normalized size, i.e. what was actually carved out of the 16MB block. Decoding the handle is sketched right after this list.
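
Decoding the handle is just splitting the long in two; a sketch consistent with the description above (Netty's real handle also carries a flag bit so a Subpage allocation with bitmap subscript 0 can be told apart from a Normal one):

static int memoryMapIdx(long handle) {
    return (int) handle;          // low 32 bits: node ID in the Chunk tree
}

static int bitmapIdx(long handle) {
    return (int) (handle >>> 32); // high 32 bits: bitmap subscript from the Subpage
}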

Conclusion

  • Memory classification:

    • By size: Tiny, Small, Normal, Huge
    • By structure: Chunk and Page
  • PoolArena: holds the two Subpage pools and the PoolChunkList chain. The Arena is the entry point through which users allocate memory; it creates Chunks and manages Chunks and Subpages.

  • PoolChunkList: Chunks are grouped by usage range into different PoolChunkList instances; Chunks in the same range are stored in the same instance as a linked list. The Arena maintains PoolChunkLists for the different usage ranges, connected to each other through each list's predecessor and successor pointers. At runtime, Chunks are passed from one PoolChunkList to another based on the thresholds each list defines.

  • PoolChunk: a Chunk is the smallest unit in which Netty requests memory from the system, 16MB in size: a JDK ByteBuffer for a DirectBuffer, a byte array for a HeapBuffer. The Chunk is organized as a full binary tree with depth 11 and 4095 nodes; each node represents a block of memory at some offset, and a node value of 12 means the node and its descendants are fully allocated and unavailable. The Chunk is also responsible for creating Subpages and hanging them into the Tiny or Small pool of the owning Arena.

  • PoolSubpage: a PoolSubpage is created by a PoolChunk and maintained in the Tiny or Small pool of the PoolArena. Each PoolSubpage owns one 8KB page, which it divides into n blocks of its managed size (for example, 32B). The offsets of these small blocks are tracked by its internal bitmap.