Hello, this is yes.

Is there a relationship between LRU and Innodb Buffer Pool?

There is.

In fact, I touched on this in a previous post back in March of this year, but I only mentioned the Innodb Buffer Pool in passing while writing about Kafka's hot and cold partitions.

Today, let's take a closer look at the connection between the two; it's actually quite inspiring.

Buffer Pool

Buffer Pool means buffer pool, and what it buffers is the database's data.

The database's data is stored on disk, but disk access is too slow. So the data on disk needs to be loaded into memory; that way the database reads data directly from memory, reducing disk I/O and making things fast.

This could be done transparently through the operating system's page cache, but that makes it inconvenient for the database to manage its own data, because plenty of other applications on the operating system also use the page cache.

MySQL creates its own Buffer Pool to manage data.

LRU

However, the Buffer Pool is limited in size, because memory is limited. Generally speaking, memory is nowhere near as large as the hard disk, right?

So trying to load all the data on the hard disk into memory is simply not practical. When the requested data is not in memory, we have to go to the disk, and then the query gets slow; the user's direct impression is: why is this broken website so laggy?

Therefore, we want to avoid this as much as possible; in other words, we want to improve the cache hit ratio.

How to improve?

When memory is full, keep the data users access frequently and evict the data that is rarely accessed, making room for newly accessed data.

Wouldn’t that improve the cache hit ratio?

Hot and cold data behaves exactly this way. It's like trending searches on Weibo: the hotter the topic, the more visits it gets, while unpopular data gets visited by fewer and fewer people.

LRU stands for Least Recently Used. With this algorithm, the least recently used data is the data selected for eviction ~

Level five

If you only answer at the level above, it won't be enough to meet the interviewer's expectations.

Let's think about what problems a plain LRU implementation runs into in this buffer-management scenario.

The following figure shows what happens after data items 6 and 3 are accessed one after another.

As you can see, accessed data is moved to the head of the list, and when memory is insufficient, the data at the tail is discarded.
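To make the plain LRU above concrete, here is a minimal sketch in Python (purely an illustration, nothing to do with Innodb's actual code): every access moves the entry to the head of the list, and when capacity is exceeded the entry at the tail is evicted.

```python
from collections import OrderedDict

class PlainLRU:
    """Minimal plain LRU: most recently used at the head, evict from the tail."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()  # order = recency, front of the dict = head

    def get(self, key):
        if key not in self.data:
            return None                            # cache miss: caller reads from disk
        self.data.move_to_end(key, last=False)     # accessed entry moves to the head
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key, last=False)     # newly loaded entry also goes to the head
        if len(self.data) > self.capacity:
            self.data.popitem(last=True)           # evict the least recently used (tail)
```

With this structure, after 6 and then 3 are accessed they sit at the head, and whatever has gone untouched the longest drifts toward the tail, waiting to be evicted.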

What are the problems with this implementation in a Buffer Pool?

The first thing you need to understand is the principle of locality.

  • Temporal locality: if a piece of data is accessed now, it is likely to be accessed again multiple times in the near future.
  • Spatial locality: if a piece of data is accessed, the data stored near it is likely to be accessed soon as well.

Hardware, operating systems, and applications are all built around the principle of locality. For example, disks have a read-ahead feature to reduce disk I/O.

The Buffer Pool implements prefetching as well. Innodb preloads pages into the Buffer Pool when sequentially accessed data pages reach a certain number, or when enough pages of a certain extent (the logical unit in which pages are managed) have already been loaded.
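As a rough illustration of the linear read-ahead trigger, here's a toy sketch (not Innodb's implementation; the threshold of 56 mirrors MySQL's innodb_read_ahead_threshold default, and an extent here is 64 contiguous 16KB pages, i.e. 1MB):

```python
EXTENT_PAGES = 64            # pages per extent: 64 x 16KB = 1MB
READ_AHEAD_THRESHOLD = 56    # mirrors the innodb_read_ahead_threshold default

def pages_to_prefetch(sequential_hits_in_extent: int, next_extent_first_page: int):
    """If enough pages of the current extent were read sequentially, return the
    page numbers of the whole next extent to preload; otherwise nothing."""
    if sequential_hits_in_extent >= READ_AHEAD_THRESHOLD:
        return list(range(next_extent_first_page,
                          next_extent_first_page + EXTENT_PAGES))
    return []

# Example: 60 pages of the current extent were read in order, so the next extent
# (starting at page 128) would be prefetched in one go.
print(pages_to_prefetch(60, 128)[:5])  # [128, 129, 130, 131, 132]
```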

Prefetching is a good thing, but if you use a plain LRU to decide what gets evicted, it causes a bit of a problem.

Because the prefetched data also gets moved to the head, the hot data originally at the head is pushed further back and faces the risk of being evicted. If the prefetched data turns out to be useful, fine. If not, this wave is doing harm with good intentions.

So what do we do?

Hot and cold partitioning, also known as the new and old generations

Innodb divides the Buffer Pool into a new generation and an old generation. By default, the 63% at the head is the new generation and the 37% at the tail is the old generation.

When data is loaded from disk into the Buffer Pool for the first time, it is placed at the head of the old generation, not the new generation, so that even with prefetching, the hot data at the front of the Buffer Pool doesn't get squeezed out.

Then, the next time that data is accessed, it is moved from the old generation to the head of the new generation.
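Here's a rough Python sketch of that midpoint idea, continuing the plain LRU sketch above (again just an illustration; the real Innodb keeps 16KB pages in a single list with a midpoint pointer, not two separate structures):

```python
from collections import OrderedDict

class MidpointLRU:
    """Sketch of the new/old generation split: ~63% hot head region, ~37% cold tail region."""

    def __init__(self, capacity: int, old_pct: float = 0.37):
        self.old_cap = int(capacity * old_pct)        # old generation (tail, ~37%)
        self.new_cap = capacity - self.old_cap        # new generation (head, ~63%)
        self.new = OrderedDict()                      # head region of the logical list
        self.old = OrderedDict()                      # tail region; eviction happens here

    def load(self, page_id, page):
        """First load from disk (including prefetch): insert at the head of the old generation."""
        self.old[page_id] = page
        self.old.move_to_end(page_id, last=False)
        self._shrink()

    def access(self, page_id):
        if page_id in self.new:
            self.new.move_to_end(page_id, last=False)   # already hot: refresh its position
            return self.new[page_id]
        if page_id in self.old:
            page = self.old.pop(page_id)
            self.new[page_id] = page                    # accessed again: promote to the
            self.new.move_to_end(page_id, last=False)   # head of the new generation
            self._shrink()
            return page
        return None                                     # miss: caller would load() it from disk

    def _shrink(self):
        # Overflow from the new generation spills into the old generation's head;
        # the real eviction point is the tail of the old generation.
        while len(self.new) > self.new_cap:
            k, v = self.new.popitem(last=True)
            self.old[k] = v
            self.old.move_to_end(k, last=False)
        while len(self.old) > self.old_cap:
            self.old.popitem(last=True)
```

This way, prefetched pages that are never touched again simply slide down the old generation and fall off the tail, without ever disturbing the hot head region.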

Seems perfect, right? Now let's look at another case: the full table scan.

A full table scan is the kind of query we try to avoid in day-to-day development, but sometimes it's genuinely needed, or it sneaks in by accident.

At that point, a lot of cold data gets loaded into the Buffer Pool and placed in the old generation. Then, as the full table scan processes those rows, the data is accessed again and moved to the head of the new generation, which evicts the hot data on a large scale.

One full table scan replaces a lot of hot data and drags down the cache hit ratio; this wave hurts a bit.

So what do we do?

Add a time factor.

Since most of the data touched by a full table scan is accessed again right away and then never used, Innodb sets a time window, 1s by default.

That is, a page in the old generation is promoted to the new generation only when the interval between its first access and a later access exceeds 1s; otherwise it stays in the old generation, so it doesn't pollute the hot data in the new generation.
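Continuing the sketch, the time window only needs a timestamp recorded when a page first enters the old generation, and promotion happens only once that window has passed. The 1s here corresponds to MySQL's innodb_old_blocks_time parameter (default 1000 ms); the code itself remains a pure illustration.

```python
import time

class MidpointLRUWithWindow(MidpointLRU):
    """Old-generation pages are promoted only when re-accessed more than
    `window` seconds after they first entered the Buffer Pool."""

    def __init__(self, capacity: int, old_pct: float = 0.37, window: float = 1.0):
        super().__init__(capacity, old_pct)
        self.window = window
        self.first_seen = {}   # page_id -> time it entered the old generation
                               # (evicted pages' timestamps aren't cleaned up in this sketch)

    def load(self, page_id, page):
        self.first_seen[page_id] = time.monotonic()
        super().load(page_id, page)

    def access(self, page_id):
        if page_id in self.old:
            age = time.monotonic() - self.first_seen.get(page_id, 0.0)
            if age <= self.window:
                # Re-accessed too soon (e.g. by the very same full table scan):
                # keep it in the old generation so the new generation stays clean.
                return self.old[page_id]
        return super().access(page_id)
```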

That's a pretty slick move.

So, Innodb improves on the plain LRU eviction of cached pages with generational partitioning and a time window, in order to raise the cache hit ratio and query efficiency.

So, if the interviewer asks you what the connection between LRU and the Innodb Buffer Pool is, just answer with the sentence above.

Then wait for them to dig further, and explain the reasoning; this wave is fine.

If the interviewer asks: Anything else?

You can answer: yes, there is.

With a plain LRU implementation, page accesses in the new generation would constantly move data to the head. That movement has a cost and is largely unnecessary; think about it: it's all hot data shuffling around among itself, what's the point?

So the new generation can itself be divided into sections: data accessed in the first quarter of the new generation is not moved to the head, while data accessed in the remaining three quarters is moved to the head. (This point comes from the book "Understanding MySQL from the Root"; I haven't read the source code, so I don't know whether Innodb really implements it this way.)
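If you want to picture that optimization, here's how the earlier sketch could be tweaked. Since, as said above, I haven't verified this against the source, treat it purely as an illustration of the idea, not as what Innodb actually does:

```python
class QuarteredNewGenLRU(MidpointLRUWithWindow):
    """A hit on a page already in the first quarter of the new generation is left
    in place, so hot pages don't get shuffled around pointlessly."""

    def access(self, page_id):
        if page_id in self.new:
            # Position 0 is the head; an O(n) lookup is fine for a sketch.
            position = list(self.new).index(page_id)
            if position < len(self.new) // 4:
                return self.new[page_id]       # already near the head: don't move it
        return super().access(page_id)         # otherwise fall back to normal promotion
```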

Answering at this level puts you at roughly level five.

There are also many LRU variants, such as TLRU used in CDNs and PLRU used in CPU caches; interested readers can look them up on their own, I won't expand on them here.

Finally

That's about it for LRU and the Innodb Buffer Pool.

By the way, the 63%/37% split isn't fixed; it's controlled by the parameter innodb_old_blocks_pct (default 37), and the 1s time window corresponds to innodb_old_blocks_time (default 1000 ms).

Also, in the description above I moved heads and tails around in units of individual data items just to make things easier to understand; the Buffer Pool actually manages data in units of pages. I'll talk more about the Buffer Pool in the next article.

I'm yes, from a little bit to a billion bits. See you next time.