This series of articles is a translation of the InnoDB series from Jeremy Cole’s Blog. This paper is the fourth of 16 articles. Page Management in InnoDB Space Files

Due to the limited amount of translation, in order to avoid misunderstanding to the reader, some proper nouns will be followed by [] marking the original text.

Page management in InnoDB space file

In On Learning InnoDB: A Journey to the Core, I described the project Innodb_Diagrams for documenting the internal structure of InnoDB. The diagrams used in this paper can be found in this project.

The basic structure of space and The basic structure of each page are described in The article “Basics of InnoDB Space File Layout”. Next, we will describe in detail The structure related to page management, section management, and free space management in InnoDB. And how it tracks the usage of pages assigned to different purposes.

Extents and extent descriptors

As described earlier, InnoDB pages are typically 16KB and grouped into 1MB blocks, each containing 64 contiguous pages, known as “extences”. InnoDB allocates FSP_HDR and XDES pages at fixed locations within the space to keep track of which extents are in use and which pages within each extent are in use. The structure of these pages is very simple:

They contain the generic FIL Header and FIL Trailer, an FSP Header (discussed later), 256 “extent descriptors,” and a lot of unused space.

An extent descriptor has the following structure:

The purpose of each field in the extent descriptor is as follows:

  • File Segment ID(8), the file sectionID: If an extent belongs to a file segment, this field indicates the extent of the file segment to which the extent belongsID
  • List node for XDES list(12).XDESList node of a list: A pointer to one extent up and the next extent in the extent descriptor list
  • State(4), extents’ current state: Only four values are currently defined:FREE.FREE_FRAG.FULL_FRAGThese three states indicate that the extent belongs toFREEList /FREE_FRAGEList /FULL_FRAGEList),FSEGThis state indicates that the extent belongs to a file segment, and the file segment ID is stored inXDES EntryThe structure of theFile segment IDIn the field)
  • Page State Bitmap(16), page status bitmap: Allocated for each page in this sectiontwoBitmap (64 x 2 = 128position16Bytes). The first digit indicates whether the page is free; The second bit is reserved to indicate whether the page is clean [clean] (clean means there is no dirty data that has not been flushed to disk), but this bit is currently not used and is always set to1

Other structures that reference extents are implemented using a combination of the page number of the FSP_HDR page (or XDES page) on which the extent descriptor is located and the byte offset within the page of the extent descriptor entry itself. For example, page 0 offset 150 refers to the first extent in the space and contains pages 0-63, while page 16384 offset 270 contains pages 16576-16639.

The translator’s note: Here’s a more detailed calculation for the second example. For page 16384 offset 270, first page 16384 is the first page in the 256th extent (XDES type), so this page records the extent descriptors in the 256-511 extent. Offset 270 refers to the fourth XDES entry on this page (38+112+40×3 = 270), so it describes the 259th section of information, i.e., pages 16576-16639.

Linked list base node and linked list node

Linked lists (InnoDB calls them “free lists”) are a very generic structure that links multiple related structures together. It consists of two complementary structures, “base list node” and “list node”, which together make up a good bidirectional list on disk. The structure of the “base node of the list” is as follows:

“List base nodes” are stored only once in some high-level structure, such as FSP headers. It contains the length of the list and Pointers to the first and last nodes in the list. The structure of a “linked list node” looks very similar, except that instead of storing Pointers to the first and last nodes, a “linked list node” stores Pointers to the previous and next nodes:

All Pointers contain a page number (which must be in the same space) and a byte offset from the page where the linked list node can be found. All Pointers point to the beginning of the list node structure (that is, N+0), not necessarily to the beginning of the structure to which the list node belongs. For example, in a linked list of extent descriptor entries, because the list node is offset by 8 in the XDES entry structure, the code reading the XDES entry must “know” that the XDES structure begins eight bytes before the offset of the list node, and start reading the structure from there. (It might be a good idea to make sure that list nodes are first in any structure, but they’re not.)

File space header and extent list

In addition to storing the extent descriptor entry itself, the FSP_HDR page (which is always page 0 in space) also stores the FSP Header structure, which contains a large number of linked lists, which is why I didn’t describe this structure earlier. The structure of the FSP Header is as follows:

Non-list-related fields in the FSP Header (out of order) :

  • Space ID(4): Indicates the ID of the current space
  • Highest page number in the space (size)(4):sizeRepresents the highest valid page number and increases as the file grows. However, not all pages assigned with page numbers are initialized (some may be zero-filled) because space expansion is a multi-step process
  • Highest page number initialized (free limit)(4).free limitIs that allFIL headerThe maximum number of pages in all pages that have been initialized, which involves storing the page number in the page itself.free limitIs always less than or equal tosize.
  • Flags(4): Stores space-related tags
  • Next Unused Segment ID(8): The file segment that will be used in the next allocated file segmentID. (This is an auto-incrementing integer.)
  • Number of pages used in the FREE_FRAG list: This field is stored for performance optimization purposes so that it can be computed quicklyFREE_FRAGThe sum of free pages for all extents in the list, without traversing all extents in the list and summing up the free pages available in each extent.

The list base nodes for the following list of extent descriptors are also stored in the FSP Header:

  • FREE_FRAGLinked list: The extents in this list are all pages that are free and can be allocated in a “fragmented” manner, meaning that individual pages can be allocated according to different needs, rather than the entire extent. For example, each hasFSP_HDRThe page orXDESSections of the page will be placed inFREE_FRAGList so that the remaining free pages in the extent can be allocated for other purposes.
  • FULL_FRAGList: withFREE_FRAGExactly the same, but the linked list holds extentswith no remaining pages available. whenFREE_FRAGEWhen an extent in a linked list becomes full (all pages are used), the extent is removed fromFREE_FRAGThe list moves toFULL_FRAGLinked list, if any pages are released, these sections are moved backFREE_FRAGLinked lists, because they’re no longer full.
  • FREELinked list: The extents in the list are all completely unused extents and can be used to form a whole by the entire extent.FREEExtents can be assigned to a file segment (and placed in the appropriateINODEList), or move toFREE_FRAGThe linked list is allocated according to the “fragmentation” mode of page use.

File segments and inodes

File segments and inodes are perhaps the most obscure areas of InnoDB terminology and documentation. InnoDB overloads the common file system term INODE, using it for INODE entries (single small structures) and INODE pages (pages containing many INODE entries). Naming confusion aside, an INODE entry in InnoDB describes only a single file segment (usually called an FSEG), which we’ll refer to as a “file segment INODE” for the rest of this article. The INODE page that contains this information has the following structure:

Each INODE page contains 85 file segment INODE entries (for a 16KB page), each of which is 192 bytes. In addition, each INODE page contains a linked list node that can be used to form the following two linked lists of INODE pages:

  • FREE_INODES: At least one is freeINODEThe entry ofINODEA linked list of pages.
  • FULL_INODES: No leisureINODEThe entry ofINODEA linked list of pages. When using”file per table“Type table space, the list in each table space will be empty unless the table has more than one42Because each index consumes exactly two file segmentsINODEEntries.

The base node of the INODE page list is stored in the FSP Header structure of the FSP_HDR page mentioned above.

A file segment INODE entry has the following structure:

The non-list fields of a file segment INODE entry are used for the following purposes:

  • File Segment ID, the file sectionID: The file segmentINODEThe file segment of the entry description (FSEG)ID. ifID0, the entry is not used.
  • Magic Number, magic number: fixed storage97937874As the file segmentINODEThe flag that the entry was properly initialized.
  • Number of used pages in "NOT_FULL" list.NOT_FULLNumber of pages already used in the list: with spaceFREE_FRAGLinked list (saved inFSP HeaderSimilarly, this field stores isNOT_FULLThe number of pages that have been used in the list so that you can quickly calculate the number of free pages in the list without having to traverse and sum all the sections in the list.
  • Fragement Array, fragment page array: from spaceFREE_FRAGList orFULL_FRAGIn the Fragments section of a linked list32An array of page numbers for each individually allocated page. Once this array becomes full, only full extents can be allocated to file segments.

As the table grows, InnoDB first allocates individual pages in each file segment until the array of fragmented pages in the file segment becomes full, then changes to allocating one extent at a time, and finally allocating four at a time.

Each file segment INODE entry also contains the base node of the extent descriptor list:

  • FREELinked list: Contains the completely free extents allocated to this file segment.
  • NOT_FULLLinked list: Contains the extent assigned to this file segment and has at least one used page. When the last free page is used, the section is moved toFULLChain in the table.
  • FULLLinked list: Contains extences allocated to this file segment that have no free pages. If a page in one of the sections becomes free, the section moves toNOT_FULLIn the list.

If the last used page in an extent of the NOT_FULL list is freed, the extent can be moved to the FREE list of the file segment, but in fact the extent is directly moved back to the FREE list of the space.

How does the index use segments

While we haven’t covered the INDEX page yet, it’s time to take a look at one small aspect of it. The root page of each index contains Pointers to file segment INODE entries that describe the file segments used by the index. Each index uses one file segment for leaf pages and one file segment for non-leaf (internal) pages. This information is stored in the FSEG Header structure (which is a structure in the INDEX page) :

The Space ids stored there are a bit redundant — they will always be the same as the current Space. The Page Number and Offset point to a file segment INODE entry in the INODE Page. These two file segments will always exist, even though they may be completely empty.

For example, in a newly created table, the only page that exists is the root page, which is also a leaf node page but exists in an “internal” file segment (so that you don’t have to move it when you add data). The INODE list and fragment page array for the “leaf” file segment are both empty. The INODE list for the “internal” file segment is also empty, and the only allocated root page exists in the fragmented page array.

Put it all together

The following diagram links the entire multilevel structure of the index together:

The index root page points to two file segments, each of which has an array of shard pages (pointing to individual pages in the shard page list, up to 32), and a list of several full extents-linked together by a list node pointer in the extent descriptor. The extent descriptor is used to reference the extent and to track the free pages within the extent. Very simple!

What’s next

Next, we’ll look at the structure of INDEX pages, one of the most important page types in InnoDB, from the user’s perspective. Then we’ll look at how InnoDB builds indexes on a macro level.