The importance of caching

To access even a single record on a page, InnoDB must first load the entire page into memory. Once the page is in memory, it is available for reads and writes. After the access finishes, InnoDB is in no hurry to free the memory occupied by the page; instead, it keeps the page cached so that later requests for the same page can skip the disk I/O. This greatly improves efficiency.

The InnoDB Buffer Pool

The concept of the Buffer Pool

InnoDB caches pages that live on disk. When the MySQL server starts up, InnoDB requests a contiguous piece of memory from the operating system, which it calls the Buffer Pool. How big is it? If the machine has 512 gigabytes of memory, you can give the Buffer Pool a few hundred gigabytes; on a smaller machine, you can configure a smaller one. By default, the Buffer Pool is only 128MB. If that size doesn't suit you, set it with the innodb_buffer_pool_size option (the value is in bytes):

[server]
innodb_buffer_pool_size = 268435456

The value above is 256MB. Note that the Buffer Pool cannot be too small: the minimum size is 5MB (any smaller setting is automatically raised to 5MB).

Internal structure of the Buffer Pool

Pages in the Buffer Pool default to the same size as pages on disk, 16KB. To better manage the cached pages in the Buffer Pool, the InnoDB designers created a piece of control information for each cached page. This control information includes the page's tablespace number, page number, the address of the cached page within the Buffer Pool, linked-list node information, some lock information, and LSN information. The control information for every cached page occupies the same amount of memory, so the memory occupied by one page's control information is called a control block. Control blocks and cached pages correspond one to one, and both live in the Buffer Pool: the control blocks are stored at the front of the Buffer Pool, and the cached pages are stored behind them.

Each control block corresponds to one cached page. After enough control blocks and cached pages have been allocated, the remaining space may be too small to hold another control block / cached page pair. This unused leftover memory is called fragmentation. Of course, if you size the Buffer Pool just right, there may be no fragmentation at all.
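The arithmetic above can be sketched in a few lines. This is a rough illustration only: the ~800-byte control block size is an assumed figure, since the real size depends on the InnoDB version and build.

```python
# Sketch: how many (control block, cache page) pairs fit in a Buffer Pool,
# and how much memory is left over as fragmentation.
PAGE_SIZE = 16 * 1024          # 16KB cache page
CONTROL_BLOCK_SIZE = 800       # assumed control block size, for illustration
POOL_SIZE = 128 * 1024 * 1024  # default 128MB Buffer Pool

pair_size = PAGE_SIZE + CONTROL_BLOCK_SIZE
num_pairs = POOL_SIZE // pair_size            # pairs that fit in the pool
fragment = POOL_SIZE - num_pairs * pair_size  # leftover bytes: "fragmentation"

print(num_pairs, fragment)
```

Because the leftover is always smaller than one pair, the fragmentation can never cost more than a single cached page plus its control block.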

Free list management

Since there is otherwise no way to tell which Buffer Pool memory is in use and which is free, InnoDB links the control blocks of all unused cached pages into a free list. When a page is loaded from disk into the Buffer Pool, a free cached page is taken from the free list and its control block is filled in with the page's information (the tablespace the page belongs to, the page number, and so on). The node corresponding to that cached page is then removed from the free list, indicating that the cached page is now in use.
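The free-list mechanism can be sketched as follows. The names (BufferPool, load_page) are illustrative, not InnoDB's real identifiers, and a deque stands in for the real linked list of control blocks.

```python
from collections import deque

# Sketch of free-list management: free slots are linked together, and
# loading a page from disk takes one slot off the free list.
class BufferPool:
    def __init__(self, n_pages):
        self.control = [None] * n_pages         # one control block per slot
        self.free_list = deque(range(n_pages))  # every slot starts out free

    def load_page(self, space_id, page_no):
        slot = self.free_list.popleft()         # take a free slot
        # fill the control block with the page's identity
        self.control[slot] = (space_id, page_no)
        # (real InnoDB would now read the 16KB page from disk into this slot)
        return slot

pool = BufferPool(4)
slot = pool.load_page(space_id=1, page_no=99)
print(slot, len(pool.free_list))  # one slot in use, three still free
```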

Hash lookup of cached pages

How do we access data, and how do we know whether a given page has already been loaded into the Buffer Pool? InnoDB hashes pages using tablespace number + page number as the key, with the cached page as the value. A lookup in this hash table tells us whether the page is cached: if it is there, we use it directly; if not, we load the data in from disk.
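A minimal sketch of that lookup-or-load logic, with a plain dict standing in for InnoDB's internal page hash table (all names are illustrative):

```python
# key: (tablespace number, page number) -> value: the cached page
page_hash = {}

def read_from_disk(space_id, page_no):
    # stand-in for a real 16KB disk read
    return b"page-%d-%d" % (space_id, page_no)

def get_page(space_id, page_no):
    key = (space_id, page_no)
    if key in page_hash:                      # hit: already in the pool
        return page_hash[key]
    page = read_from_disk(space_id, page_no)  # miss: load from disk
    page_hash[key] = page                     # cache it for next time
    return page

first = get_page(1, 7)   # miss: triggers the simulated disk read
second = get_page(1, 7)  # hit: served straight from the hash table
print(first == second)
```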

Management of flush linked lists

If a page in the Buffer Pool is modified, it becomes inconsistent with the copy on disk; we give such a page a state and call it a dirty page. The simplest approach would be to flush every modification to disk immediately, keeping the two copies tightly consistent, but frequent flushing hurts performance, so in general we flush at some later point in time or when some condition is met. How, then, can we quickly locate the pages that need flushing, i.e. the dirty pages? As soon as a cached page is modified, it is placed on a linked list of dirty pages, called the flush list, and flushed when needed.
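The dirty-page bookkeeping can be sketched like this. All names are illustrative; dicts and a list stand in for the pool, the disk, and the flush linked list.

```python
disk = {}        # pretend disk: (space_id, page_no) -> content
cache = {}       # Buffer Pool pages
flush_list = []  # dirty pages waiting to be written back

def modify_page(key, content):
    cache[key] = content
    if key not in flush_list:   # the page now differs from its disk copy
        flush_list.append(key)  # record it as dirty, once

def flush():
    while flush_list:
        key = flush_list.pop(0)
        disk[key] = cache[key]  # write the dirty page back to disk

modify_page((1, 7), "v1")
modify_page((1, 7), "v2")       # still a single flush-list entry for this page
flush()
print(disk[(1, 7)], flush_list)
```

Batching writes this way means repeated modifications to the same hot page cost only one eventual disk write instead of one per change.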

Management of LRU linked lists

In fact, the Buffer Pool is not infinite and will eventually fill up, so how do we decide which data to evict to make room for new pages? The answer is that the least recently used pages should be eliminated first, which is exactly the famous LRU eviction algorithm. One important caveat: before evicting a page, we must make sure it is not a dirty page. If it is dirty, we cannot simply discard it, because its changes have not been synchronized to disk and dropping it would mean data loss; it must be flushed first.
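An LRU eviction sketch that respects the dirty-page rule above. This is illustrative only: InnoDB's real LRU list is more elaborate (it is split into young and old regions), and all names here are made up.

```python
from collections import OrderedDict

lru = OrderedDict()  # key -> (content, dirty flag); LRU end is at the front
disk = {}
CAPACITY = 2         # tiny pool so eviction triggers quickly

def touch(key, content=None, dirty=False):
    if key in lru:
        lru.move_to_end(key)            # recently used -> back of the list
        if content is not None:
            lru[key] = (content, dirty)
        return
    while len(lru) >= CAPACITY:         # pool full: evict from the LRU end
        victim, (data, is_dirty) = next(iter(lru.items()))
        if is_dirty:
            disk[victim] = data         # flush first, never lose data
        del lru[victim]
    lru[key] = (content, dirty)

touch("A", "a1", dirty=True)
touch("B", "b1")
touch("C", "c1")                        # pool full: evicts A, flushing it first
print("A" in lru, disk.get("A"))
```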