Basic structure of a Buffer Pool

Does the Buffer Pool produce memory fragmentation?

Since the size of the Buffer Pool is configured by you, it is quite possible that after the Buffer Pool has been divided up into cache pages and description data blocks, a little memory is left over, and that leftover sliver is too small to hold another cache page.

How to reduce memory fragmentation?

When the database divides the Buffer Pool into cache pages, it places all the cache pages and description data blocks closely one after another, minimizing wasted memory and reducing fragmentation as far as possible.

If the cache pages in your Buffer Pool were instead scattered all over the place, there would inevitably be many gaps between them, which would produce a large amount of memory fragmentation.

What is the free list used for when reading data pages from disk into the Buffer Pool?

How is the Buffer Pool initialized when the database is started?

A Buffer Pool contains multiple cache pages, and each cache page has a corresponding description data block.

  1. Once the database starts, it requests a memory area from the operating system, slightly larger than the configured Buffer Pool size, to serve as the Buffer Pool's memory space.

  2. Once the memory area is allocated, the database divides it up into cache pages of the default size of 16KB together with their corresponding description data blocks of roughly 800 bytes each.

    At this point all the cache pages are empty. Only after the database starts running and serving insert, delete, update, and query operations will the corresponding data pages be read from disk and placed into the Buffer Pool's cache pages.
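As a rough back-of-the-envelope sketch of this division (not InnoDB's actual allocator; the 16KB page size and roughly 800-byte description block come from the text above, while the 128MB pool size is just an example), you can see both how many cache pages fit and where the leftover fragment comes from:

```python
PAGE_SIZE = 16 * 1024            # default cache page size (16KB)
DESC_SIZE = 800                  # approximate description data block size
POOL_SIZE = 128 * 1024 * 1024    # example Buffer Pool size (hypothetical)

# Each cache page needs room for the page itself plus its description block.
pages = POOL_SIZE // (PAGE_SIZE + DESC_SIZE)
leftover = POOL_SIZE - pages * (PAGE_SIZE + DESC_SIZE)

print(f"{pages} cache pages, {leftover} bytes left over as a fragment")
```

That leftover sliver is exactly the fragmentation discussed at the start of this section.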

How do I know which cache pages are free?

The database continually reads data pages from disk files into the Buffer Pool's cache pages to cache the data, and subsequent insert, delete, update, and query operations are then performed on those cache pages.

Therefore, reading data pages from disk into cache pages inevitably involves the question of which cache pages are free.

By default, data pages and cache pages have the same size, 16KB: one data page per cache page.

The database maintains a free list for the Buffer Pool. It is a doubly linked list in which each node is (the address of) the description data block of a free cache page.

  • Whenever a cache page is free, its description data block is added to the free list, with each node linked bidirectionally to its predecessor and successor, forming a doubly linked list.
  • In addition, the free list has a base node that points to the first and last nodes of the list and records how many description data blocks, that is, how many free cache pages, the list currently holds.

How much space does the free list take up?

The free list itself is made up of the description data blocks in the Buffer Pool. You can think of each description data block as containing two pointers, free_pre and free_next, which point to the previous and the next node of the free list respectively.

In other words, the free list is threaded through the description data blocks already sitting in the Buffer Pool via those two pointers, so it costs almost no extra memory.

The only part of the free list that does not belong to the Buffer Pool is its base node, about 40 bytes in size, which stores the address of the list's head node, the address of its tail node, and a count of how many nodes the list contains.
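A minimal sketch of the structure described above (hypothetical field names; InnoDB's real control blocks carry far more state):

```python
class BlockDesc:
    """Description data block for one cache page."""
    def __init__(self, page_addr):
        self.page_addr = page_addr   # the cache page this block describes
        self.free_pre = None         # previous node in the free list
        self.free_next = None        # next node in the free list


class FreeList:
    """The ~40-byte base node: head/tail pointers plus a node count."""
    def __init__(self):
        self.head = None
        self.tail = None
        self.count = 0

    def push(self, desc):
        """A cache page became free: link its description block in at the head."""
        desc.free_pre, desc.free_next = None, self.head
        if self.head:
            self.head.free_pre = desc
        else:
            self.tail = desc
        self.head = desc
        self.count += 1

    def pop(self):
        """Take a free page's description block (here: the head), or None."""
        desc = self.head
        if desc:
            self.head = desc.free_next
            if self.head:
                self.head.free_pre = None
            else:
                self.tail = None
            desc.free_pre = desc.free_next = None
            self.count -= 1
        return desc
```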

How do I read a data page from disk into a cache page in the Buffer Pool?

  1. First take a description data block from the free list (like reading the listing information posted by the landlord), and from it find the corresponding free cache page (the address of the house).
  2. Read the data page from disk into that cache page (move your belongings in), and write some description information into the cache page's description data block (I live here now; these are my pots and pans).
  3. Remove the description data block from the free list (the landlord takes down the listing; the house is rented out).
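Putting those three steps together in code (building on the FreeList sketch above; read_page_from_disk is a hypothetical stand-in for the real disk I/O):

```python
def read_page_from_disk(space_id, page_no):
    # Hypothetical helper: a 16KB read from the tablespace's data file.
    return b"\x00" * 16 * 1024


def load_page(free_list, space_id, page_no):
    desc = free_list.pop()             # step 1: take a listing from the landlord
    if desc is None:
        raise RuntimeError("no free cache pages; eviction needed (see LRU below)")
    desc.data = read_page_from_disk(space_id, page_no)   # step 2: move in
    desc.space_id, desc.page_no = space_id, page_no      # fill in description info
    return desc                        # step 3: pop() already took down the listing
```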

How do I know if a data page is cached?

If it is not cached, we take a free cache page from the free list, read the data page from disk into it, write the description information into its description data block, and remove that description data block from the free list.

But if the data page is already cached, it will be used directly.

Hash table

The database also maintains a hash table data structure that uses the tablespace number + data page number as the key and the cache page address as the value. When you want to use a data page, you look it up with "tablespace number + data page number" as the key: if there is no entry, the data page must be read from disk; if there is one, the data page is already cached.

Each time a page is read into the cache, a key-value pair is written into the hash table: the key is the tablespace number + data page number and the value is the address of the cache page. The next time you need that page, the hash table tells you directly that it is already in a cache page.
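That lookup path can be sketched with a plain dict standing in for InnoDB's page hash (load_page is the hypothetical helper from the previous section):

```python
page_hash = {}   # key: (tablespace number, data page number) -> description block

def get_page(free_list, space_id, page_no):
    key = (space_id, page_no)
    desc = page_hash.get(key)
    if desc is None:                                     # miss: not cached yet
        desc = load_page(free_list, space_id, page_no)   # read it in from disk
        page_hash[key] = desc                            # record the key-value pair
    return desc                                          # hit: already in a cache page
```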

Tablespaces and data pages

We work with tables and rows in SQL statements, but what is their relationship to tablespaces and data pages? In short, the former are logical concepts while the latter are physical ones.

Tables, columns, and rows are all logical concepts. We only know that the database contains a table, that the table has certain fields, and that it holds some number of rows; how the data in those tables is actually stored on the database's disk is not something we care about at this level. That is why they are logical concepts.

Tablespaces and data pages, on the other hand, are physical concepts. At the physical level, the data of a table lives in a tablespace; the tablespace is made up of a pile of data files on disk; those data files store the table's data; and within them the data is organized data page by data page. That is the difference between the two.

Flush list

Dirty pages, whose data has been updated in memory, must be flushed back to the disk files.

However, it makes no sense to flush every cache page back to disk, because some cache pages were only read into the Buffer Pool for queries and may not have been modified at all!

So the database introduces another list, the flush list. Like the free list, it is essentially a doubly linked list formed by two pointers in the cache pages' description data blocks.

Whenever a cache page is modified, its description data block is added to the flush list. The name "flush" means these are dirty pages, and they will be flushed to disk (asynchronously).
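Continuing the sketch, marking a page dirty just threads its description block into a second list; the flush list's base node can reuse the FreeList shape from earlier, and flush_pre/flush_next are hypothetical names mirroring free_pre/free_next:

```python
def mark_dirty(flush_list, desc):
    """Called when a cache page is modified: link it into the flush list."""
    if getattr(desc, "in_flush_list", False):
        return                                    # already recorded as dirty
    desc.flush_pre, desc.flush_next = None, flush_list.head
    if flush_list.head:
        flush_list.head.flush_pre = desc
    else:
        flush_list.tail = desc
    flush_list.head = desc
    flush_list.count += 1
    desc.in_flush_list = True
```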

If there are not enough cache pages in the Buffer Pool, how is the LRU algorithm used to evict some of them?

As you keep loading data pages from disk into free cache pages, won't the number of free cache pages in the free list keep shrinking?

Because whenever you load a data page into a free cache page, there is one less free cache page in the free list.

So, if you keep loading data pages from disk into free pages and removing free pages from the free list, sooner or later you will find that there are no free pages left in the free list.

What if you want to load a data page into a free cache page?

If all the cache pages are filled with data, you cannot load new pages from disk into them, so you have only one option: evict some of the cache pages.

Flushing a cache page means writing the modified data in that cache page back to the corresponding data page on disk, and then emptying the cache page so that it becomes a free cache page again.

New data pages that need to be loaded from disk can then be placed into the freed cache page.

Cache hit ratio

Suppose you now have two cache pages. The data in the first one is frequently queried and modified: say 30 out of 100 requests touch it. In that case we say the cache hit ratio is high. Why? Because 30 of the 100 requests can operate on the cache directly, without loading data from disk.

The data in the second cache page, by contrast, was queried and modified just once after being loaded from disk, and none of the next 100 requests touched it. For that page we say the cache hit ratio is low, because most requests still have to go to disk for their data; what they want to operate on is not in the cache.

So for these two cache pages, if you had to choose one whose data gets flushed to disk to free up a cache page, which would you pick?

The second cache page, of course.

LRU

MySQL uses an LRU list to determine which cache pages are Least Recently Used.

  1. When we load a data page from disk into a cache page, we put that cache page's description data block at the head of the LRU list. Every cache page that holds data is thus somewhere in the LRU list, and the most recently loaded one is at its head.

  2. Now suppose the description data block of some cache page currently sits at the tail of the LRU list. As soon as you query or modify that cache page's data, it is moved to the head of the LRU list. In other words, the most recently accessed cache pages are always at the head of the LRU list.

So, when none of your cache pages are free and you need to find the least recently accessed cache page to flush to disk, what do you do?

You simply take the cache page at the tail of the LRU list: it must be the least recently accessed one! You flush that tail cache page to disk and load the data page you need into the cache page thus freed.
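A minimal sketch of this naive policy, with Python's OrderedDict standing in for the doubly linked list (this is the simple scheme the next section picks apart, not InnoDB's real one; flush_if_dirty is a hypothetical stub):

```python
from collections import OrderedDict


def flush_if_dirty(key, page):
    pass   # hypothetical: write the page back to disk if it was modified


class SimpleLRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()                    # front = most recently used

    def access(self, key, load):
        if key in self.pages:
            self.pages.move_to_end(key, last=False)   # rule 2: move to the head
        else:
            if len(self.pages) >= self.capacity:
                victim, page = self.pages.popitem(last=True)   # evict the tail
                flush_if_dirty(victim, page)
            self.pages[key] = load(key)
            self.pages.move_to_end(key, last=False)   # rule 1: new page at the head
        return self.pages[key]
```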

What problems might a simple LRU list cause when a Buffer Pool is actually running?

Problems with the prefetch mechanism

The first problem comes from MySQL's prefetch (read-ahead) mechanism: when you load a data page from disk, MySQL may also load data pages adjacent to it into the cache.

Suppose there are two free cache pages. When one data page is loaded, a neighboring data page is loaded into the cache as well: exactly one free cache page for each data page! But then only one of those cache pages is actually accessed; the other one, loaded by the prefetch mechanism, is never touched, and yet both end up at the front of the LRU list.

Which conditions trigger the MySQL prefetch mechanism?
  1. One parameter is innodb_read_ahead_threshold, whose default value is 56. If pages within an extent are accessed sequentially and the number of pages accessed exceeds this threshold, the prefetch mechanism is triggered and all the pages of the next adjacent extent are loaded into the cache.

  2. If the Buffer Pool already holds 13 consecutive pages of an extent and those pages are accessed frequently, the prefetch mechanism is triggered and the remaining pages of that extent are loaded into the cache.

    This second rule is controlled by the innodb_random_read_ahead parameter, which defaults to OFF, i.e. the rule is disabled.
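Schematically, the first (linear) rule boils down to a threshold test; this is only a sketch of the decision, with the threshold from the text (one 1MB extent holds 64 pages of 16KB):

```python
READ_AHEAD_THRESHOLD = 56   # innodb_read_ahead_threshold default
PAGES_PER_EXTENT = 64       # 1MB extent / 16KB pages

def should_read_ahead(pages_accessed_sequentially):
    """Linear read-ahead: after enough sequential reads within one extent,
    prefetch all PAGES_PER_EXTENT pages of the next adjacent extent."""
    return pages_accessed_sequentially > READ_AHEAD_THRESHOLD

assert not should_read_ahead(56)   # at the threshold: no prefetch yet
assert should_read_ahead(57)       # beyond it: load the next extent
```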

So by default it is mainly the first rule that can trigger the mechanism and suddenly load a large number of data pages from the adjacent extent into the cache. If those cache pages all land at the front of the LRU list at once yet nobody actually accesses them, they push cache pages that are already cached and frequently accessed toward the tail of the LRU list.

Then, once some cache pages must be flushed to disk to free space, as described above, it is precisely those frequently accessed cache pages at the tail of the LRU list that get flushed and evicted! That is completely unreasonable!

Problems caused by full table scans

A full table scan means an SQL statement like the following: SELECT * FROM USERS

Without any WHERE condition, such a query forces the database to read all the data pages of the table from disk into the Buffer Pool.

It may therefore load every data page of the table, each into its own cache page, in one go!

The front of the LRU list may then be a long run of cache pages all loaded in by the full table scan!

What if, after this full table scan, the data in this table is hardly ever used again?

At that point, the tail of the LRU list may consist entirely of cache pages that were frequently accessed before!

Then, when you want to evict some cache pages to make room, you end up evicting the frequently accessed cache pages at the tail of the LRU list, while a huge number of rarely accessed cache pages loaded by the earlier full table scan remain behind!

Summary

So a simple LRU list mechanism is actually flawed: the prefetch mechanism or a full table scan can suddenly load in a lot of data pages that may hardly ever be accessed afterwards, and then the front of the LRU list is entirely cache pages that may rarely be touched in the future!

Meanwhile, the cache pages that genuinely are accessed frequently may end up at the tail of the LRU list!

If some cache pages then need to be flushed to disk to make room for new data pages, the only candidates are those frequently accessed cache pages at the tail of the LRU list!

Why did MySQL design the prefetch mechanism?

When loading a data page into the cache, why load adjacent data pages as well? What is the point, and what scenario is it designed for?

To improve performance.

If you read data page 01 into the cache and you are reading sequentially, the next read will likely be data page 02; without prefetching, wouldn't reading data page 02 trigger another disk I/O?

To optimize performance, MySQL designed the prefetch mechanism: if you read multiple data pages of an extent sequentially, say data page 01 through data page 56, MySQL judges that you are likely to keep reading subsequent pages in order.

In that case, the Buffer Pool reads a large number of the following data pages in advance (say, data page 57 through data page 72). Later, when you read data page 60, it can be fetched directly from the Buffer Pool.

Of course, that is the ideal: plump in theory, skinny in practice. If the prefetched data pages occupy the front of the LRU list but are never actually used afterwards, the prefetch mechanism has backfired.

The LRU algorithm optimized with a hot/cold data separation scheme

To solve the problems of the simple LRU list, MySQL's real LRU list design is actually based on the idea of separating hot and cold data.

To put it bluntly, the whole series of problems above was caused by mixing all the cache pages into a single LRU list.

So the real LRU list is split into two parts: one for hot data and one for cold data. The ratio between them is controlled by the innodb_old_blocks_pct parameter, whose default value is 37, meaning the cold data area takes up 37% of the list.

When a data page is loaded into a cache page for the first time, that cache page is placed at the head of the cold data area of the LRU list.

MySQL adds a rule for this, governed by the innodb_old_blocks_time parameter, whose default value is 1000 milliseconds.

After a data page is loaded into a cache page, you must access that cache page again more than 1 second later for it to be moved to the head of the hot data area.

The reasoning: if you load a data page into a cache page and still access it 1s later, you are likely to access it frequently from then on. The 1s window acts as the filter, so only an access after it moves the cache page to the head of the hot data area of the list. (This is the test for hot data.)

How does the hot/cold separated LRU solve the problems?

  1. Cache pages loaded by prefetch or a full table scan are placed at the head of the cold data area of the LRU list.

  2. Will they stay in the cold data area?

    If it was just a full table scan, the query loads a pile of cache pages and then accesses them all within 1s, usually finishing inside that window. Under the mechanism above, you can be sure those cache pages will not be moved from the cold data area to the hot data area!

    Only if a cache page in the cold data area is accessed again after 1 second does the database judge that it is likely to be accessed frequently in the future and move it to the head of the hot data list!

  3. What if there are not enough cache pages (no free cache pages left)?

    Go straight to the tail of the cold data area of the LRU list. The cache pages there were loaded a while ago and have not been accessed since the 1s window (confirming they are cold data). Those cache pages at the tail of the cold data area are evicted directly and flushed to disk.
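The whole policy fits in a small sketch (a toy model, not InnoDB's actual implementation: it applies the 1000ms default of innodb_old_blocks_time but, for brevity, skips rebalancing the areas to the 37% innodb_old_blocks_pct split):

```python
import time
from collections import OrderedDict

OLD_BLOCKS_TIME = 1.0   # innodb_old_blocks_time default (1000 ms), in seconds


class ColdHotLRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.hot = OrderedDict()    # front = head of the hot data area
        self.cold = OrderedDict()   # front = head of the cold data area
        self.loaded_at = {}

    def load(self, key, page):
        if len(self.hot) + len(self.cold) >= self.capacity and self.cold:
            victim, _ = self.cold.popitem(last=True)   # 3. evict from the cold tail
            self.loaded_at.pop(victim, None)
        self.cold[key] = page
        self.cold.move_to_end(key, last=False)         # 1. new page at cold head
        self.loaded_at[key] = time.monotonic()

    def access(self, key):
        if key in self.hot:
            self.hot.move_to_end(key, last=False)      # hot page: move to hot head
        elif key in self.cold:
            if time.monotonic() - self.loaded_at[key] > OLD_BLOCKS_TIME:
                self.hot[key] = self.cold.pop(key)     # 2. promote cold -> hot head
                self.hot.move_to_end(key, last=False)
            # accessed within 1s of loading: it stays put, still cold
```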

To sum up:

Under this scheme, cache pages are loaded into separate hot and cold areas, promotion from cold data to hot data is gated by a time threshold, and eviction preferentially takes cache pages from the cold data area. On that basis, all the problems identified earlier are neatly solved.

Because most of the data pages loaded by the prefetch and full table scan mechanisms are accessed only within the first second, if ever again, their cache pages simply remain in the cold data area, while frequently accessed cache pages stay in the hot data area.

When cache pages must be evicted, choosing those at the tail of the cold data area first makes perfect sense! Freshly loaded cache pages can no longer seize the head of the LRU list while frequently accessed cache pages are pushed to its tail.

Cold data and hot data should be separated to prevent cold data from affecting hot data access.

Application of hot and cold isolation in cache design

What might be the problem if you put a lot of your business system’s cache data in Redis that contains both hot and cold data?

Could you consider applying hot/cold isolation in your cache design when optimizing or refactoring it?

A common scenario is product cache data in an e-commerce system. Suppose you have 100 million products, and whenever a product lookup misses the cache you put that product into the cache. Do this and a huge number of rarely accessed products end up in the Redis cache! Frequently accessed products are the hot data; rarely accessed products are the cold data. We should try to fill Redis with frequently accessed hot data rather than piles of cold data, because stuffing it with rarely accessed products consumes a great deal of memory for data that will hardly be read again. (Letting cold data occupy memory is unreasonable: memory is a finite resource and should go to hot data as far as possible.)

So when designing a cache mechanism, we often plan for hot-data preloading. That is, every day we record which products are accessed the most, and at night the system runs a scheduled job that preloads that hot product data (which must be kept up to date) into Redis. The next day, requests for hot products naturally hit the Redis cache first. (Cache warming.)
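A sketch of such a nightly warm-up job, using the standard redis-py client (the key format and the two helper functions are hypothetical stand-ins for the real statistics and database reads):

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)


def get_top_product_ids(n):
    # Hypothetical: would query yesterday's access statistics in a real system.
    return range(1, n + 1)


def load_product_from_db(product_id):
    # Hypothetical: stand-in for the real product table read.
    return {"id": product_id, "name": f"product-{product_id}"}


def warm_product_cache(top_n=10000):
    """Scheduled job: preload the hottest products into Redis (cache warming)."""
    for product_id in get_top_product_ids(top_n):
        product = load_product_from_db(product_id)
        # Expire after a day so the next night's run refreshes the hot set.
        r.set(f"product:{product_id}", json.dumps(product), ex=24 * 3600)
```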

In fact, Redis's LRU memory eviction policy is itself a form of hot/cold data handling: the data accessed least recently is evicted first.

How does MySQL optimize the performance of LRU lists to the utmost?

How is the hot data area of the LRU list optimized?

Within the hot data area, should accessing a cache page immediately move it to the head of the hot data area of the LRU list?

The cache pages in the hot data area are probably all accessed frequently, so constantly shuffling them around performs poorly; is all that movement really necessary?

So the access rule for the hot data area of the LRU list is optimized: a cache page is moved to the head of the list only if it sits in the last three quarters of the hot data area. If it is in the first quarter, accessing it does not move it.

How do you understand that?

If the hot data area holds 100 cache pages, the first 25 are not moved to the head even when accessed; any of the remaining 75, however, is moved to the head whenever it is accessed.

In this way, the amount of node movement in the list is kept as low as possible.
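The promotion check thus reduces to a position test (a sketch; position 0 is the head of the hot data area):

```python
def should_move_to_head(position, hot_area_len):
    """Move on access only if the page sits in the last 3/4 of the hot area."""
    return position >= hot_area_len // 4

# With 100 hot cache pages: positions 0-24 stay put, 25-99 move to the head.
assert not should_move_to_head(10, 100)
assert should_move_to_head(30, 100)
```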

How, then, are cache pages at the tail of the LRU list evicted and flushed to disk?

A review of cache pages and the various linked lists

While the Buffer Pool is in use, data pages are constantly being loaded from disk into cache pages, and the free list, flush list, and LRU list are all in use at the same time:

  • For example, when data is loaded into a cache page, that cache page is removed from the free list and placed at the head of the cold data area of the LRU list.
  • Then, if you modify a cache page, that dirty page is recorded in the flush list, and the LRU list may move it from the cold data area to the head of the hot data area.
  • If you query a cache page, it is likewise moved within the LRU list from the cold data area into the hot data area, or perhaps to the head within the hot data area.

In short, as MySQL performs inserts, deletes, updates, and queries, it first operates heavily on cache pages and the corresponding linked lists. Then, when the cache pages run out, some of them must be flushed to disk, cleared, and refilled with newly loaded data pages.

Periodically flushing some cache pages at the tail of the LRU list to disk

Cache pages at the tail of the LRU's cold data area are not flushed to disk only at the moment the cache pages run out.

Instead, a background thread runs a scheduled task that periodically flushes some cache pages at the tail of the cold data area of the LRU list to disk, empties them, and adds them back to the free list. (They are removed from the flush list and the LRU list.)

Periodically flushing some cache pages from the flush list to disk

It’s not enough to just flush the cache pages of the cold data area of the LRU list to disk.

Many cache pages in the hot data area of the LRU list may also be modified frequently; surely they are not left unflushed forever?

So the background thread also flushes cache pages from the flush list to disk whenever MySQL is not busy, ensuring that all the data you have modified reaches disk sooner or later.

Whenever a batch of flush-list cache pages is flushed to disk, those pages are removed from the flush list and the LRU list and added back to the free list!

It can be interpreted as follows:

  • We load data into cache pages and query and modify them: the free list shrinks, the flush list grows, and the LRU list grows and its nodes move around.

  • Meanwhile, the background thread keeps flushing the cold data area of the LRU list and the flush list to disk to clear out cache pages: the flush list and LRU list shrink and the free list grows.

What if there are no free cache pages?

It may happen that the free list is completely used up, while the flush list holds a pile of modified cache pages and the LRU list holds a pile of cache pages.

  1. If you want to load a data page from disk into a free cache page, the database finds a cache page at the tail of the cold data area of the LRU list; it is bound to be the least recently used one!

  2. It flushes that page to disk, clears it, and then loads the new data page into the freed cache page!

Conclusion

This is how MySQL’s Buffer Pool works. We thoroughly studied the loading and use of cached pages, the use of free lists, Flush lists, and Iru lists, including how cached pages are flushed to disk to make free cached pages, and what to do with cached pages when they are not free.

Production experience

Optimize the concurrency performance of the database with multiple Buffer Pools

  • When multiple threads access a single Buffer Pool concurrently, they must lock it; one thread then performs its whole sequence of operations (loading data pages into cache pages, updating the free list, updating the LRU list, and so on), releases the lock, and only then can the next thread proceed.
  • This locking has an impact on the concurrency performance of the database.
  • MySQL can therefore be configured with multiple Buffer Pools to improve concurrency.
    • In general, MySQL’s default rule is that if you allocate less than 1GB of memory to a Buffer Pool, you will be given at most one Buffer Pool. However, if the machine has a large amount of memory, it is necessary to allocate a large amount of memory to the Buffer Pool, such as 8 GB of memory. Therefore, multiple Buffer pools can be set at the same time, such as the following MySQL server configuration. 【server】 Innodb buffer pool size=8589934592 Innodb buffer pool instances=4 We then set it to have four Buffer pools, each of which is 2GB
    • In this case, each Buffer Pool manages a portion of the cache pages and description blocks and has its own separate LRU, Flush, and free linked lists
    • Multiple threads can then access different Buffer Pools concurrently, which greatly reduces contention.
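Which instance a given page belongs to can be pictured as a simple deterministic hash (a sketch only; the detail of InnoDB's actual page-to-instance mapping is omitted here):

```python
BUFFER_POOL_INSTANCES = 4   # innodb_buffer_pool_instances, per the config above

def pool_instance_for(space_id, page_no):
    # The same page always lands in the same instance, so each instance can
    # use its own lock and its own free/flush/LRU linked lists independently.
    return hash((space_id, page_no)) % BUFFER_POOL_INSTANCES
```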

The database supports dynamic resizing of the Buffer Pool at runtime through chunks

What is a chunk?

MySQL has designed a chunk mechanism:

A Buffer Pool is actually made up of many chunks. The chunk size is controlled by the innodb_buffer_pool_chunk_size parameter, whose default value is 128MB.

So we can work through an example: if the total Buffer Pool size is 8GB split across four Buffer Pools, then each Buffer Pool is 2GB and consists of a series of 128MB chunks, which means each Buffer Pool has 16 chunks.

Each chunk in a Buffer Pool holds its own series of description data blocks and cache pages, while all the chunks of one Buffer Pool share a single set of free, flush, and LRU linked lists.

Suppose the total Buffer Pool size is 8GB and you want to grow it to 16GB at runtime. You only need to request a series of additional 128MB chunks, each of which requires just 128MB of contiguous memory, and then assign those chunks to the Buffer Pools.

With this chunk mechanism, there is no need to request a fresh 16GB of contiguous memory and copy all the existing data across.
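The arithmetic, for concreteness (constants as quoted in the text):

```python
GB = 1024 ** 3
MB = 1024 ** 2

pool_total = 8 * GB        # innodb_buffer_pool_size
instances = 4              # innodb_buffer_pool_instances
chunk_size = 128 * MB      # innodb_buffer_pool_chunk_size default

per_pool = pool_total // instances         # 2GB per Buffer Pool
chunks_per_pool = per_pool // chunk_size   # 16 chunks in each Buffer Pool

# Growing the total from 8GB to 16GB only requires new 128MB chunks:
extra_chunks = (16 * GB - pool_total) // chunk_size   # 64 additional chunks
print(chunks_per_pool, extra_chunks)
```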

References

From scratch to become a MySQL combat optimization master