Today’s sharing started, please give us more advice ~

What is the page

First, we need to know that Pages are the smallest unit in InnoDB that manages data. Buffer pools store pages of data. For example, InnoDB loads the entire page into the Buffer Pool when the data we are querying is not in the Buffer Pool. Similarly, dirty pages from the Buffer Pool are flushed to disk on a per-page basis.

The overview page

The data we insert into MySQL ends up in pages. In InnoDB’s design, pages are connected by a two-way linked list.

The data stored row by row in a page is connected by a singly linked list.

The User Records area in the figure above is used to store row data. So why does InnoDB design this way? Assuming we don’t have the concept of pages, how do we get results quickly from thousands of queries? MySQL is known for its good performance, but without pages, we’d be left with nothing but iterating through data item by item.

How does that page do that fast lookup? In the current page, it can be traversed through the single linked list of each record in User Records. If it is not found in the current page, it can quickly skip to the next page for query by using the next page pointer.

Supremum and Infimum

One could argue that you don’t solve User Records by traversing it, you simply group the data. If my data is not on the current page at all, do I still have to go through every single piece of data on the previous page? It’s also inefficient.

Of course, MySQL takes this into account, so there is actually a section in The page called The Infimum and Supremum Records, which represents The maximum and minimum Records in The previous page.

With Infimum Record and Supremum Record, a query now does not need to traverse all User Records on a page, but only compare the two Records to the target Record to be queried. For example, if I want to query for data id = 101, that is obviously not on the current page. You can then use the next page pointer to jump to the next page for retrieval.

Use the Page Directory

Some people may say, isn’t your User Records full of single linked lists? Even if I know that the data I’m looking for is on the current page, in the worst case, wouldn’t I still have to go back and forth 100 times to find the data I’m looking for? You call that efficient?

Admittedly, this is a problem, but one that MySQL has already taken into account. Yes, it’s really inefficient to go through them one by one. To solve this problem, MySQL adds another section to the Page Directory.

As the name implies, Page Directory is a Directory with a number of Slots, each of which points to a record in User Records. As you can see, every few pieces of data, a slot is created. In fact, the data presented in my diagram is very strictly set, with a Slot every 6 data points in a full page.

The Page Directory design does not remind you of another data structure, the hop table, but only abstracts one level of index.

MySQL creates slots for new data. With Page Directory, you can perform a rough binary lookup of data on a Page. As for why it is rough, after all, the Page Directory is not complete data, the result of binary search can only be a rough position, after finding this rough position, we need to go back to User Records to continue to traverse the match one by one.

But it’s much more efficient than the original version we started talking about.

The true appearance of the page

If I start by throwing out the various components of the page, the various concepts, first of all, I can’t accept myself, it would be very stiff. Second, people who are not familiar with pages may not be able to understand why pages are designed the way they are. So I query a set of ideas in accordance with a data, the general appearance of the page presented to you.

In fact, there are many other fields stored on the page, as well as other areas, but these do not affect our understanding of the page. So, with a clearer picture of the page, we can take a look at what the actual page looks like.

The figure above is the actual composition of the Page, with a few more that we haven’t talked about before, such as File Header, Page Header, Free Space, and File Tailer. Let’s take it one by one.

File Header

The previous and next Pointers mentioned above are actually part of the File Header, along with a lot of other data.

I actually prefer not to list a bunch of parameters, telling you how big this is and what that is for. For the page we need to know in detail, in fact, only two need to know enough for the time being, respectively:

FIL_PAGE_PREV

FIL_PAGE_NEXT

These two variables are the previous and next page Pointers mentioned above, which are actually the offset of the page on disk.

Page Header

The data in the Page Header is much more familiar to us than the File Header, and I’ve drawn a diagram that lists the contents in detail.

This is all listed here because understanding what these parameters mean and why they are set helps us understand the principles and structure of the page.

For example, the parameter PAGE_HEAP_TOP is called HEAP. It’s the same as if you’re writing comments to Init and calling it an initialization. It’s better not to write it. In fact, if you look into it, the heap here actually refers to User Records.

PAGE_N_HEAP and PAGE_N_RECS are the number of Records currently in User Records. The only difference is that the PAGE_N_HEAP contains Records marked for deletion. And PAGE_N_RECS is actually all the data we can query.

Infimum & Supremum Records

Infimum & Supremum Records Records of the maximum and minimum size of the current page. Actually not exactly, more accurately described is the open interval between the minimum record and the maximum record. Because Infimum Records will actually be smaller than the minimum value on the current page, and Supremum Records will be larger than the maximum value on the current page.

User Records

User Records can be said to be the part we usually contact the most, after all, our data is ultimately there. After the page is initialized, there is no data in User Records. As the system runs and data is generated, the data in User Records will continue to expand, and the corresponding Free Space will gradually shrink.

I’ve talked about the concept of User Records before. Here’s just one thing that I think is crucial, and that’s the order.

We know that in a clustered index, keys are actually sorted in the order of the Primary Key. Is that true in User Records? When we insert a new data into User Records, will we also reorder the existing data in the order of the Primary Key?

The answer is no, because it would slow down MySQL processing.

The data in User Records is guaranteed by the pointing of the singly linked list pointer, that is, the actual representation of the row data on disk is queued in insertion order, with the data arriving first and the data coming after. It’s just a single linked list of rows in User Records that creates an order by Primary Key.

As shown in the figure, it is roughly as follows:

Free Space

In fact, this was discussed in other modules in disguise. At first, User Records were completely empty, and when new data came in, they would apply for Space in Free Space. When Free Space ran out of Space, it meant that they needed to apply for a new page.

File Trailer

This is mainly to prevent the page in the process of brushing into the disk, due to extreme unexpected circumstances (network problems, fire, natural disaster) caused by failure, and resulting in data inconsistency, that is, the formation of dirty pages.

It has only one component:

conclusion

At this point, I think we’re done talking about pages, understanding the underlying principles of pages, which I personally think will help us use MySQL in a more friendly, intelligent way, and make it perform as well as it can.

Today’s share has ended, please forgive and give advice!