Preface

We have already seen that the buffer pool maintains a free list of free cached pages and a flush list of dirty pages waiting to be flushed to disk. When a data page is loaded from disk into a free cached page, that page's description data block is removed from the free list. As data is continually loaded and modified, the number of free cached pages keeps shrinking. At some point the database finds that there are no free cached pages left in the free list. What does it do then?

1. Eliminate cached pages

If there are no free cached pages in the buffer pool, the buffer pool must evict some of its cached pages.

The so-called elimination of cached pages means selecting certain cached pages, flushing any modified data on them to disk, and then emptying those pages so that they become free again.

Which cached pages should be selected and flushed to disk?

2. Cache hit ratio

Suppose there are two cached pages. The data on the first page is frequently modified and queried; say it is hit by 30 out of every 100 requests. Its cache hit ratio is high.

The second page is modified and queried only once after being loaded from disk, and after that is not hit by any of the next 100 requests. Its cache hit ratio is low.

If one of these two pages must be flushed to disk, the second page is clearly the better choice.

3. Determining whether a cached page is commonly used: the LRU linked list

First, whenever a data page is loaded from disk into a cached page, that cached page's description data block is placed at the head of the LRU list. From then on, as long as a cached page holds data, it stays in the LRU list, with the most recently loaded pages closest to the head.

Now assume the description block of some cached page sits at the tail of the LRU list. Any subsequent query or modification of that page moves its description block back to the head. In other words, the most recently accessed pages are always near the head of the LRU list.

At this point, if no cached pages are free, the cached pages at the tail of the LRU list are flushed to disk to free them up.
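The plain LRU behavior described above can be sketched in a few lines of Python. This is a minimal simulation for illustration, not InnoDB's actual implementation: loading or accessing a page moves it to the head, and eviction takes whatever page sits at the tail.

```python
from collections import OrderedDict

class SimpleLRU:
    """Minimal sketch of a plain LRU list of cached pages (not InnoDB's
    real data structures): newly loaded and recently accessed pages go to
    the head; eviction takes the page at the tail."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # first key = head = most recently used

    def load(self, page_no):
        # Loading a page places its descriptor at the head of the list.
        self.pages[page_no] = True
        self.pages.move_to_end(page_no, last=False)
        if len(self.pages) > self.capacity:
            # No free pages left: evict (flush) the page at the tail.
            evicted, _ = self.pages.popitem(last=True)
            return evicted
        return None

    def access(self, page_no):
        # A query or modification moves the page back to the head.
        self.pages.move_to_end(page_no, last=False)

lru = SimpleLRU(capacity=3)
lru.load(1); lru.load(2); lru.load(3)
lru.access(1)            # page 1 moves to the head
evicted = lru.load(4)    # pool full: tail page 2 is evicted
print(evicted)           # -> 2
```

Note how page 1, accessed most recently, survives while page 2, which drifted to the tail, is the one flushed out.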

4. Hidden dangers of simple LRU linked lists

MySQL has a read-ahead mechanism: when loading a data page into a cached page, it also loads neighboring data pages in advance. This causes a problem: the prefetched pages are placed at the head of the LRU list even though they may never be accessed.

When eviction happens, the pages at the tail of the LRU list are flushed to disk first, even though they may be accessed frequently, while the prefetched pages that nobody accesses stay in memory. That is clearly unreasonable. A full table scan makes this worse: it loads a large number of pages into the cache that nobody will ever access again.

MySQL triggers the read-ahead mechanism in two ways:

  1. The first is controlled by the innodb_read_ahead_threshold parameter, whose default value is 56. If the number of pages accessed sequentially in one extent exceeds this threshold, read-ahead is triggered and all pages of the next adjacent extent are loaded into the cache.
  2. The second: if the buffer pool already holds 13 consecutive pages of one extent and those pages are accessed frequently, random read-ahead is triggered and all remaining pages of that extent are loaded into the cache.

The second, random read-ahead mechanism (controlled by innodb_random_read_ahead) is turned off by default.
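The linear read-ahead decision can be sketched as a simple threshold check. This is a hedged illustration, not InnoDB's actual code: it assumes an extent of 64 pages and the default threshold of 56, and ignores the sequential-order bookkeeping the real mechanism performs.

```python
EXTENT_PAGES = 64           # pages per extent in InnoDB
READ_AHEAD_THRESHOLD = 56   # innodb_read_ahead_threshold default

def linear_read_ahead_triggers(accessed_pages, extent_no):
    """Sketch of the linear read-ahead decision (not InnoDB's real code):
    if at least READ_AHEAD_THRESHOLD pages of the given extent were
    accessed, the whole next extent would be prefetched."""
    start = extent_no * EXTENT_PAGES
    in_extent = [p for p in accessed_pages if start <= p < start + EXTENT_PAGES]
    return len(in_extent) >= READ_AHEAD_THRESHOLD

# Accessing 56 of the 64 pages of extent 0 triggers prefetch of extent 1.
print(linear_read_ahead_triggers(list(range(56)), extent_no=0))  # -> True
print(linear_read_ahead_triggers(list(range(30)), extent_no=0))  # -> False
```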

5. Optimized LRU algorithm based on cold and hot data separation

Although MySQL places all loaded pages in one LRU list, it applies the idea of separating hot and cold data.

The LRU list is split into two parts, one for hot data and one for cold data. The ratio is controlled by the innodb_old_blocks_pct parameter, whose default value is 37, meaning the cold data area takes up 37% of the list.

1) Where is a data page placed in the LRU list when it is first loaded?

Many readers will already have guessed the answer: it is placed at the head of the cold data area.

2) When is a cached page in the cold data area moved to the hot data area?

You might guess that a cached page in the cold data area is moved to the hot data area as soon as it is accessed.

But that rule would not work well. If a page is moved to the head of the hot data area the moment it is accessed, even within one second of being loaded, it may well never be accessed again, so promoting it immediately is not desirable either.

MySQL instead sets a rule via the innodb_old_blocks_time parameter, whose default value is 1000, that is, 1000 milliseconds.

It means that a data page is moved to the head of the hot data area only if it is accessed again more than one second after it was loaded into the cache.

Why one second?

Because pages loaded through the read-ahead mechanism or a full table scan are typically loaded and then accessed within that first second. During that second they sit in the cold data area, waiting to be flushed and emptied, with essentially no chance of entering the hot data area. Only if someone accesses a page again more than one second after it was loaded is it moved to the head of the hot data area.

3) Evicting cached pages

When there are not enough free cached pages in the buffer pool, the cached pages at the tail of the cold data area are the best candidates to be flushed and evicted.

4) Optimizing the hot data area

Cached pages in the hot data area are accessed frequently, and moving a page to the head of the LRU list on every access is not friendly to performance.

Therefore, the access rule for the hot data area is optimized: a cached page is moved to the head only if it sits in the rear 3/4 of the hot data area. Pages in the front 1/4 are not moved even when they are accessed.
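Putting the cold/hot rules together, here is a small Python sketch of the optimized LRU. It is a simplification for illustration, not InnoDB's data structures; the promotion window and the front-1/4 rule mirror innodb_old_blocks_time=1000ms and the optimization just described, and the `now` timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import deque

class ColdHotLRU:
    """Sketch of the hot/cold-split LRU described above (a simplification,
    not InnoDB's implementation)."""

    def __init__(self, old_blocks_time_ms=1000):
        self.hot = deque()             # index 0 = head
        self.cold = deque()
        self.old_blocks_time = old_blocks_time_ms / 1000.0
        self.first_access = {}         # page -> time it was first loaded

    def load(self, page, now):
        # 1) A freshly loaded page goes to the HEAD of the cold area.
        self.cold.appendleft(page)
        self.first_access[page] = now

    def access(self, page, now):
        if page in self.cold:
            # 2) Promote to the hot head only if accessed again more than
            #    old_blocks_time after it was first loaded.
            if now - self.first_access[page] > self.old_blocks_time:
                self.cold.remove(page)
                self.hot.appendleft(page)
        elif page in self.hot:
            # 3) Hot-area optimization: only pages in the rear 3/4 are
            #    moved to the head; the front 1/4 stays where it is.
            if self.hot.index(page) >= len(self.hot) // 4:
                self.hot.remove(page)
                self.hot.appendleft(page)

    def evict(self):
        # 4) Eviction always takes the TAIL of the cold area.
        return self.cold.pop() if self.cold else None

lru = ColdHotLRU()
lru.load("A", now=0.0)
lru.access("A", now=0.5)   # within 1s of loading: stays in the cold area
lru.access("A", now=2.0)   # more than 1s later: promoted to the hot head
print(list(lru.hot), list(lru.cold))   # -> ['A'] []
```

With these rules, prefetched and table-scan pages linger in the cold area and are evicted from its tail without ever polluting the hot area.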

6. When are cached pages flushed?

Does the flushing of cached pages in the LRU list have to wait until there are no free cached pages left?

No. Cached pages are flushed at several points:

  • A background thread runs a timed task: every so often it flushes some cached pages from the tail of the cold data area to disk, empties them, removes them from the LRU list, and adds them to the free list.
  • The flush list holds the cached pages that have been modified; background threads also flush these from time to time, remove them from the flush and LRU lists, and add them to the free list.
  • If there are no free cached pages left in the buffer pool, the pages at the tail of the cold data area are flushed and emptied on the spot, to make room for the pages being loaded.
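All three paths above end with the same bookkeeping: the page is written to disk, leaves the LRU list (and the flush list, if it was dirty), and its descriptor returns to the free list. A toy model of that bookkeeping, not InnoDB's implementation:

```python
class BufferPool:
    """Toy model of the list bookkeeping when a cached page is flushed
    (not InnoDB's implementation)."""

    def __init__(self, n_pages):
        self.free = list(range(n_pages))  # descriptors of free cached pages
        self.lru = []                     # pages currently holding data
        self.flush = []                   # dirty subset of the LRU pages
        self.page_desc = {}               # page -> descriptor it occupies

    def load_page(self, page):
        self.page_desc[page] = self.free.pop()  # take a free cached page
        self.lru.insert(0, page)                # new page at the LRU head

    def modify_page(self, page):
        if page not in self.flush:
            self.flush.append(page)       # now dirty: track in flush list

    def flush_page(self, page):
        # Writing the page to disk is elided; then update the three lists.
        self.lru.remove(page)
        if page in self.flush:
            self.flush.remove(page)
        self.free.append(self.page_desc.pop(page))

pool = BufferPool(n_pages=2)
pool.load_page("A"); pool.modify_page("A")
pool.flush_page("A")
print(len(pool.free), pool.lru, pool.flush)   # -> 2 [] []
```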

Therefore, while CRUD operations run, the buffer pool settles into a dynamic equilibrium.

As data pages are loaded, the number of cached pages in the free list decreases, the number in the flush list increases, and pages in the LRU list are constantly added and moved around.

Meanwhile, background threads constantly flush pages from the cold data area of the LRU list (and from the flush list) to disk, so the LRU and flush lists shrink while the free list grows again.