Before we begin

  1. Why does the InnoDB buffer pool need an LRU cache?
  2. Why can't the naive LRU strategy be used to evict pages from memory?
  3. What problem does the improved LRU cache solve, and how?

What is a buffer pool

As we know, MySQL (with the InnoDB storage engine) stores data on disk. When data needs to be read or modified, the page containing that data is first loaded into an in-memory area called the buffer pool, and the operation is performed there

  • If a read is performed, the page is not removed from memory immediately after use; it stays cached so that later accesses to the same page avoid disk I/O
  • If the page is modified, it becomes a dirty page. The dirty page is also not flushed back to disk immediately; instead the change is recorded in the redo log and background tasks periodically flush dirty pages back to disk. This guarantees durability while improving I/O performance by merging many random I/Os into a few sequential ones (a query for watching this is sketched just below)
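As a rough illustration, and assuming a MySQL 5.7+/8.0 server where the information_schema.INNODB_BUFFER_POOL_STATS table is available (column names may differ slightly across versions), the accumulation of dirty pages can be watched like this:

-- Per buffer pool instance: total pages vs. dirty pages waiting to be flushed
SELECT POOL_ID,
       POOL_SIZE,                -- total pages in this buffer pool instance
       DATABASE_PAGES,           -- pages currently holding table/index data
       MODIFIED_DATABASE_PAGES   -- dirty pages not yet written back to disk
FROM information_schema.INNODB_BUFFER_POOL_STATS;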

Evicting cached pages with the LRU policy

The size of the buffer pool is limited. To check its current size, run the following command:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 4294967296 |
+-------------------------+------------+

The query result shows that the buffer pool of this MySQL instance is 4294967296 bytes, i.e. 4 GB
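For reference, on MySQL 5.7 and later the buffer pool can be resized online (earlier versions require a restart); the value below is purely an example, not a sizing recommendation:

-- Resize the buffer pool to 8 GB; InnoDB rounds the value up to a multiple of
-- innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances
SET GLOBAL innodb_buffer_pool_size = 8589934592;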

As the server runs, if the pages that need to be cached exceed the capacity of the buffer pool, some old pages must be evicted to make room for the pages about to be used. Which pages should be evicted? To raise the cache hit ratio and reduce disk I/O, one strategy is to evict the least recently used page: by the principle of temporal locality, a page that has not been used for the longest time is also the least likely to be used in the near future, so it is reasonable to evict it. This is the naive LRU (Least Recently Used) strategy

So when a page is accessed:

  • If the page is already in the buffer pool, it is unlinked from its current position and moved to the head of the LRU list
  • If the page is not in the buffer pool, it is loaded from disk and likewise added to the head of the LRU list; when there is not enough space, the page at the tail of the list is evicted (if that page is dirty, it is flushed back to disk first)

If a page in the buffer pool stays unused, it gradually drifts toward the tail of the list and is eventually evicted
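How well this eviction policy performs in practice can be estimated from the standard InnoDB status counters; a minimal sketch (the hit ratio formula is only an approximation):

-- Logical reads served from the buffer pool
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
-- Reads that missed the cache and had to go to disk
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
-- Approximate hit ratio = 1 - Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests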

The problem with the naive LRU strategy

The eviction strategy above works fine under normal circumstances, but in the following two situations the buffer pool ends up evicting pages that are not really the "least recently used" ones, which defeats the purpose of the LRU policy, lowers the cache hit ratio, and hurts overall performance

  • Read-ahead (prefetch): according to the principle of spatial locality, MySQL asynchronously loads the pages around the currently accessed page into the buffer pool if it judges that they are likely to be used soon. There are two kinds: linear read-ahead and random read-ahead

    • Linear read-ahead: if the number of pages accessed sequentially in an extent (an extent contains 64 pages) exceeds a threshold, all pages of the next extent are read into memory in advance. The threshold is controlled by the system variable innodb_read_ahead_threshold, which defaults to 56
    • Random read-ahead: when a certain number of pages from the same extent are already in the buffer pool, the remaining pages of that extent are also read into the buffer pool in advance. This behaviour is controlled by the system variable innodb_random_read_ahead and is disabled by default

Read-ahead can reduce response time and improve application performance if the prefetched pages are used soon. But if they are never used, and loading them evicts older pages that will be needed again, the effect is the opposite (a sketch of how to observe read-ahead activity follows this list)

  • Full table scan: a scan loads every page of the table into the buffer pool. If the table has many pages, the pages originally in the buffer pool may all be evicted, so when the application later wants one of those earlier pages again it has to read it from disk once more

Generally speaking, full table scans are triggered infrequently, so the behaviour above amounts to caching pages that will not be used again any time soon while expelling pages that probably will be used, exactly the opposite of what an efficient cache should do, and performance suffers badly
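For the read-ahead case specifically, here is a sketch of how its settings and effectiveness can be inspected, assuming the standard read-ahead status counters; the SET statements merely restate the defaults and are not tuning advice:

-- Current read-ahead configuration
SHOW VARIABLES LIKE 'innodb_read_ahead_threshold';   -- linear read-ahead trigger, default 56
SHOW VARIABLES LIKE 'innodb_random_read_ahead';      -- random read-ahead, default OFF
-- Pages brought in by read-ahead, and how many were evicted without ever being used
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_ahead';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_ahead_evicted';
-- Example adjustments (require appropriate privileges)
SET GLOBAL innodb_read_ahead_threshold = 56;
SET GLOBAL innodb_random_read_ahead = OFF;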

The improved LRU

MySQL therefore splits the LRU list into two parts:

  • The front part stores hot data that is accessed frequently; it is called the young region
  • The rear part stores cold data that is accessed less frequently; it is called the old region

The proportion of each part is controlled by the system variable innodb_old_blocks_pct. The default value is 37, meaning the old (cold data) region takes up 37% of the list

+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_old_blocks_pct | 37    |
+-----------------------+-------+
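The split can be adjusted at runtime; a hedged example (the documented valid range is 5-95, and the query assumes the MySQL 5.7+ INNODB_BUFFER_POOL_STATS columns):

-- Shrink the old (cold) region to 20% of the LRU list
SET GLOBAL innodb_old_blocks_pct = 20;
-- How many pages currently sit in the old region of each buffer pool instance
SELECT POOL_ID, DATABASE_PAGES, OLD_DATABASE_PAGES
FROM information_schema.INNODB_BUFFER_POOL_STATS;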

When a page is accessed and is not yet in memory, i.e. it is being accessed for the first time, it is loaded from disk and placed at the head of the old region

This way, if a prefetched page is never used afterwards, it is gradually phased out of the old region, and the frequently accessed pages in the young region are unaffected

So far the young region has not come into play. When can a page be promoted from the old region to the young region? In other words, under what conditions does the system consider a page hot?

  • Let's look at a first strategy: a second access to the page triggers promotion

    • Apart from the access that loaded the page, any subsequent access promotes it to the young region. This works well for the read-ahead scenario, but not for a full table scan: the scan loads a page into the buffer pool and then immediately reads every record on it, and each record read counts as one access to the page. If each record is small, say 100 bytes, a 16 KB page holds roughly 160 records, so the page is accessed about 160 times in a short burst. In other words, not just a second access but many accesses can occur within a very short period, yet the page is still not hot data from the application's point of view
  • What to do, then? MySQL uses a second strategy: promotion based on the access interval

    • A full table scan has the characteristic that all accesses to a given page are concentrated within a very short window of time, so the policy is:

When a page is first accessed, its access time is recorded. If a later access occurs and the time elapsed since that first access exceeds a certain value, the page is moved to the hot data (young) region of the LRU list

That "certain value", i.e. the threshold on the time interval, is controlled by the system variable innodb_old_blocks_time, which defaults to 1000 ms (1 s); accessing all the records on an in-memory page usually takes far less than 1 s.

+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| innodb_old_blocks_time | 1000  |
+------------------------+-------+
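The threshold is also adjustable at runtime; the values below are purely illustrative and assume a session with the required privileges:

-- Require at least 2 s between the first and a later access before promoting a page
SET GLOBAL innodb_old_blocks_time = 2000;
-- Restore the default of 1000 ms (1 s)
SET GLOBAL innodb_old_blocks_time = 1000;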

In other words, if a page is accessed again more than one second after it was first loaded from disk, then regardless of whether it entered the buffer pool because of a full table scan, it is treated as a frequently accessed page, i.e. hot data. Most pages that enter the buffer pool because of a full table scan, however, are never accessed again; they are cold data and are gradually evicted from memory

In the full table scan scenario, this strategy also effectively keeps the pages that really are in frequent use from being evicted
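Whether promotions behave as expected can be checked from the buffer pool statistics; a minimal sketch, assuming the PAGES_MADE_YOUNG / PAGES_NOT_MADE_YOUNG columns of MySQL 5.7+ (the same counters appear as "made young" / "not young" in SHOW ENGINE INNODB STATUS):

-- Pages promoted from the old region to the young region, vs. pages accessed in
-- the old region but not promoted (e.g. re-accessed within innodb_old_blocks_time)
SELECT POOL_ID, PAGES_MADE_YOUNG, PAGES_NOT_MADE_YOUNG
FROM information_schema.INNODB_BUFFER_POOL_STATS;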

Conclusion

  • Because the space of the InnoDB buffer pool is limited, when no free space is left some pages must be chosen for eviction in order to load the pages about to be used
  • The naive LRU eviction strategy removes the least recently used pages and improves the cache hit ratio in most scenarios, but it does not handle the read-ahead and full table scan scenarios well
  • By splitting the LRU list into two parts, placing a page in the cold data region on its first access, and promoting it to the hot data region only when a later access comes after a certain interval, the truly hot pages stay in memory even in those two scenarios and the cache hit ratio is preserved