Preface:

MySQL also uses memory, so what is the difference between MySQL and Redis? Why choose Redis rather than MySQL? In truth, choosing middleware means evaluating the business first, which requires both a solid grasp of the business and a corresponding analysis of the characteristics and performance of each middleware option.

Later I met an interviewer who spent a long time discussing the selection of composite indexes with me, so I need to understand InnoDB well.

Before we proceed with the analysis in this chapter, we need to note a few figures:


Mechanical hard disk:
1. Sequential average read speed: 84.0 MB/s
2. Sequential average write speed: 79.0 MB/s
3. Random read speed with 512-byte blocks: only 0.033 MB/s; with 4 KB blocks: only 0.226 MB/s
4. Random write speed with 512-byte blocks: only 0.083 MB/s; with 4 KB blocks: only 0.576 MB/s

We can easily see that for a mechanical hard disk the ratio of sequential to random read speed is roughly 2500:1, and the ratio of sequential to random write speed is roughly 1000:1. This is a clear reminder that when you must write to disk, you should merge writes into sequential writes wherever possible. That is exactly what InnoDB tries to do: key InnoDB features such as the change buffer, the adaptive hash index, asynchronous IO, and neighbor-page flushing all work toward this goal.
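As a quick sanity check of those ratios, here is a throwaway calculation using the 512-byte figures quoted above, nothing more:

```python
# Sequential vs. random throughput ratios for the mechanical-disk figures above (MB/s).
seq_read, seq_write = 84.0, 79.0
rand_read_512b, rand_write_512b = 0.033, 0.083

print(f"read ratio:  {seq_read / rand_read_512b:.0f}x")    # ~2545, on the order of 2500x
print(f"write ratio: {seq_write / rand_write_512b:.0f}x")  # ~952, on the order of 1000x
```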

However, when we do this kind of analysis we generally ignore the time spent reading and writing memory, because memory access is so fast. The following comparison explains why it can be ignored.

Comparison of speed levels:
1. DDR3 memory reads and writes at roughly 10 GB/s (10,000 MB/s); a solid state drive at roughly 300 MB/s is about 1/30 of memory speed, and a mechanical hard drive at roughly 100 MB/s is about 1/100.
2. DDR4 memory reads and writes at roughly 50 GB/s (50,000 MB/s); the same solid state drive is then about 1/200 of memory speed, and the mechanical hard drive about 1/500.

1. InnoDB memory model

As described in the preface, InnoDB's memory model is something we cannot ignore when comparing it with other systems.

The only difference between a data page and an index page is their content: one holds data, the other holds index entries.

After looking at its memory model, we have to consider what happens when memory is full; this is the memory replacement strategy. InnoDB uses an LRU scheme, keeping the most frequently used pages toward the front of the list and evicting pages from the tail. InnoDB modifies the traditional LRU algorithm, so let's first think about what problem the traditional version has:

In the traditional LRU algorithm, the most recently accessed page is always placed at the front of the list. By the principle of locality this is perfectly reasonable. In a database, however, it is easy for a page to be accessed once and never again, so its slot is wasted. Because the LRU list is bounded and usually running at full capacity, admitting one page means evicting another; if the evicted page is one that is accessed frequently, performance suffers.

This raises another question: if the operating system is designed that way, does the database not satisfy the principle of locality? Not quite, because a database also performs a large number of scans (full table scans, index scans) that load a great deal of data into memory while actually using very little of it.

Whether or not InnoDB's designers took a cue from the JVM's generational garbage collection, they eventually arrived at a similarly generational approach to memory management. The modified LRU algorithm works as follows:

The new LRU algorithm adds a midpoint position, configurable by the user, which by default sits at 5/8 of the list length from the head. When a page is read in for the first time it is inserted at the midpoint; the part of the list before the midpoint (toward the head) is the new (young) sublist, and the part after it (toward the tail) is the old sublist. Only after a newly inserted page has stayed in the old sublist longer than innodb_old_blocks_time will its next access move it to the head of the entire LRU list, a promotion based on elapsed time. If a full table scan occurs, only the old sublist after the midpoint gets churned, and the new sublist can continue to serve hot pages. By tuning innodb_old_blocks_time you can protect the new sublist, that is, the hot data, more effectively.
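To make this concrete, here is a minimal sketch of the idea in Python. It is not InnoDB's actual implementation: the class name, the two-OrderedDict layout, and the simplified eviction rule are illustrative assumptions; it only shows midpoint insertion plus the innodb_old_blocks_time-style delayed promotion.

```python
import time
from collections import OrderedDict

class MidpointLRU:
    """Toy midpoint LRU: new pages enter the old sublist and are promoted to the
    new (young) sublist only if touched again after `old_blocks_time` seconds.
    Illustrative sketch, not InnoDB's real data structure."""

    def __init__(self, capacity, old_blocks_time=1.0):
        self.capacity = capacity
        self.old_blocks_time = old_blocks_time  # plays the role of innodb_old_blocks_time
        self.new = OrderedDict()   # head of the LRU list: hot pages, most recent first
        self.old = OrderedDict()   # tail of the LRU list: pages seen once, scan victims
        self.first_seen = {}

    def access(self, page_id):
        now = time.monotonic()
        if page_id in self.new:                          # already hot: move to list head
            self.new.move_to_end(page_id, last=False)
        elif page_id in self.old:                        # re-accessed page in the old sublist
            if now - self.first_seen[page_id] >= self.old_blocks_time:
                del self.old[page_id]                    # promote to head of the new sublist
                self.new[page_id] = True
                self.new.move_to_end(page_id, last=False)
        else:                                            # miss: insert at the midpoint
            self.first_seen[page_id] = now
            self.old[page_id] = True
            self.old.move_to_end(page_id, last=False)
            self._evict_if_needed()

    def _evict_if_needed(self):
        # A full table scan only churns the old sublist; the new sublist survives.
        while len(self.new) + len(self.old) > self.capacity:
            victim, _ = self.old.popitem() if self.old else self.new.popitem()
            self.first_seen.pop(victim, None)
```

A scan that touches thousands of pages once each only fills and empties the old sublist, while pages that keep being re-read after the delay stay in the new sublist.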

All this discussion of LRU memory management ultimately comes back to the speed figures above: if memory is managed poorly, page misses multiply, disk accesses increase, and our query time grows far too much.

2. Checkpoint technology

Consider a few questions:

1. The database goes down. How do we recover? From the very beginning?
2. What happens when the buffer pool is insufficient? We flush dirty pages to disk, but how do we mark how far we have flushed?
3. What do we do with dirty pages when the redo log becomes unavailable?

For InnoDB, Checkpoint technology is proposed to solve these problems.


Pages prior to the Checkpoint need to be flushed to disk; in other words, the Checkpoint guarantees that all pages before it have already been flushed to disk.

So, with Checkpoint in place, those problems have solutions:

1. Recovery starts from the Checkpoint.
2. When dirty pages are flushed, the Checkpoint is pushed forward accordingly.
3. When the redo log is about to become unavailable, the Checkpoint is forced forward at least to the position the current redo log requires.

Because 2 and 3 are enforced, the guarantee above always holds, so we can always start recovery from the Checkpoint.

Checkpoint is implemented through the Log Sequence Number (LSN), an 8-byte number. Every page, the redo log, and the Checkpoint itself carry an LSN, which marks the version of the data and tells us where the Checkpoint currently stands.
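A minimal sketch of the recovery idea, with hypothetical names (RedoRecord, recover, and the page objects are stand-ins, not InnoDB structures): redo records at or below the checkpoint LSN are already durable and can be skipped, and newer ones are replayed only onto pages whose own LSN shows they have not yet seen the change.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RedoRecord:
    lsn: int                 # log sequence number of this redo entry
    page_id: int
    apply: Callable          # re-applies the physical change to a page

def recover(pages, redo_log, checkpoint_lsn):
    """Toy crash recovery: everything at or before checkpoint_lsn is guaranteed
    to be on disk already, so only newer redo records are replayed."""
    for rec in sorted(redo_log, key=lambda r: r.lsn):
        if rec.lsn <= checkpoint_lsn:
            continue                      # already durable before the Checkpoint
        page = pages[rec.page_id]
        if page.lsn < rec.lsn:            # the page has not seen this change yet
            rec.apply(page)
            page.lsn = rec.lsn
```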

3. Change buffer

Again, let's think about a few questions before introducing this concept:

  1. When inserting rows into a table, we generally insert in increasing primary-key order, so the clustered index is written sequentially.
  2. When updating a secondary index, rows arriving in primary-key order are almost never in secondary-index order, so secondary index updates are inevitably discrete (random). What can we do about that?

To solve this problem of discrete secondary index updates, InnoDB adopts the change buffer, which merges many discrete updates into one sequential update. We already know there is a roughly thousand-fold gap between sequential and random disk access, so with a large number of secondary indexes the savings can be substantial.

However, the change buffer can only be used when the following conditions are met:


1. The index is a secondary index.
2. The index is not unique.

The reason is simple: updating a unique index requires checking uniqueness first, which means reading the secondary index page anyway, so there is nothing to save. And the primary key index is, by definition, a unique index.

The implementation of Change buffer is as follows:

  1. For each pending update to a non-unique secondary index, a change buffer node is created (each update yields a [table_space, offset] pair), and the nodes are sorted into the change buffer tree by [table_space, offset].
  2. The change buffer tree is a B+ tree. Non-leaf nodes store only [table_space, offset], while leaf nodes store [table_space, offset] plus the details of the update operation.
  3. When a secondary index page is read into memory, the nodes in the change buffer associated with that page are merged into it, so that when the page is flushed to disk the writes go out sequentially (see the sketch below).
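Here is a minimal sketch of that merge-on-read behaviour. It is only an illustration under simplifying assumptions: the real change buffer is itself a B+ tree living in the system tablespace, whereas this toy uses a plain dict keyed by (table_space, offset).

```python
from collections import defaultdict

class ChangeBuffer:
    """Toy change buffer: remember updates to non-unique secondary index pages,
    keyed by (table_space, offset), and apply them in one go when the page is
    finally read into the buffer pool."""

    def __init__(self):
        self.pending = defaultdict(list)   # (table_space, offset) -> buffered changes

    def buffer_change(self, table_space, offset, change):
        # Instead of randomly touching the index page on disk right now,
        # just record the change against the page it belongs to.
        self.pending[(table_space, offset)].append(change)

    def merge_on_read(self, table_space, offset, page):
        # When the page enters memory, apply all buffered changes at once,
        # so the eventual flush is a single write instead of many random ones.
        for change in self.pending.pop((table_space, offset), []):
            page.apply(change)
        return page
```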

An interesting point: the change buffer does not exist only in memory; it also has space on disk in the shared tablespace, which ensures that secondary index updates can still be applied correctly if something goes wrong.

4. Double Write

This technique is also about compensating for failures. Consider the following scenario:

What happens when you’re halfway through writing a page and the database crashes?

It is easy to think: isn't that exactly what the redo log is for? Why not just replay the redo log?

But there’s actually a problem:

  1. The redo log records physical operations on a page, for example "at offset XXX, write record 'TTT'".
  2. But what if the page itself is corrupted? Say the record starts at offset 50, and the bytes before offset 50 were scrambled for some unknown reason; replaying the write onto this page would then be meaningless.

In a nutshell: we first need a way to restore an intact, uncorrupted copy of the page. Hence the doublewrite technique.

When updating dirty pages, the process is as follows:

1. Copy the dirty pages into the doublewrite buffer (in memory).
2. Write the dirty pages in the doublewrite buffer sequentially to the doublewrite area of the shared tablespace on disk.
3. Flush the dirty pages to their real locations on disk.

Compared with a single write, the extra overhead is step 2, writing the doublewrite buffer into the shared tablespace. But because that write is sequential, the extra time is small, and it is well worth it for the guarantee of data integrity and safety.
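A minimal sketch of why the sequential copy saves us, assuming a toy setup where tablespaces are dicts and a page carries a checksum (the function names and the dict-based "files" are illustrative, not InnoDB's on-disk format):

```python
import zlib

def make_page(page_no, data: bytes):
    # Toy page: payload plus a checksum, analogous to the checksum InnoDB stores in each page.
    return {"page_no": page_no, "data": data, "checksum": zlib.crc32(data)}

def page_is_torn(page):
    # After a crash in the middle of a write, the stored checksum no longer matches the payload.
    return page is None or zlib.crc32(page["data"]) != page["checksum"]

def flush_dirty_page(page, doublewrite_area, tablespace):
    doublewrite_area[page["page_no"]] = dict(page)   # steps 1+2: sequential copy goes first
    tablespace[page["page_no"]] = page               # step 3: write to the real location (may be torn)

def recover_torn_page(page_no, doublewrite_area, tablespace):
    """On restart: if the on-disk page is torn, restore the intact copy from the
    doublewrite area; only then is it safe to replay the redo log against it."""
    if page_is_torn(tablespace.get(page_no)):
        tablespace[page_no] = doublewrite_area[page_no]
```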

5. Adaptive hash index, asynchronous IO, and neighbor-page flushing

Since a hash lookup is clearly faster than a B+ tree search, InnoDB also provides an adaptive hash index, which is built automatically by the engine and lives entirely in memory.

The conditions for building one are:

1. The same access pattern is used continuously, for example where a = '...' or where a = '...' and b = '...'.
2. The pattern has been used more than a threshold number of times, generally 100.
3. The page has been accessed N times through this pattern, where N = (records on the page) / 16.

To sum up, once a page has been accessed through the same pattern more often than these thresholds, an adaptive hash index can be built for it.

You do not need to worry about the cost of building the adaptive hash index: by the time the conditions are met, the page is already in the buffer pool, so the hash index can be built directly from memory.
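A minimal sketch of that decision, under obvious simplifications (the class, the counters, and the page interface are hypothetical; real InnoDB tracks this per index and per page internally):

```python
from collections import defaultdict

PATTERN_THRESHOLD = 100   # "the pattern has been used more than ~100 times"

class AdaptiveHashTracker:
    """Toy version of the build-or-not decision for an adaptive hash index."""

    def __init__(self):
        self.pattern_hits = defaultdict(int)   # (page_id, pattern) -> access count
        self.hash_indexes = {}                 # page_id -> {key: row}

    def access(self, page, pattern, key):
        page_threshold = max(1, len(page.rows) // 16)      # N = records on the page / 16
        self.pattern_hits[(page.page_id, pattern)] += 1
        hits = self.pattern_hits[(page.page_id, pattern)]
        if hits >= PATTERN_THRESHOLD and hits >= page_threshold:
            # The page is already in the buffer pool at this point,
            # so the hash index is built straight from memory.
            self.hash_indexes.setdefault(page.page_id, dict(page.rows))
        ahi = self.hash_indexes.get(page.page_id)
        if ahi is not None and key in ahi:
            return ahi[key]                                 # O(1) hash lookup
        return page.search_btree(key)                       # fall back to the B+ tree path
```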

An advantage of asynchronous IO is that multiple IO requests can be merged into a single IO, especially when the pages being read are contiguous.

Neighbor-page flushing (a feature that can be disabled) means that when a page is flushed, adjacent pages that are also dirty are flushed along with it. The downside is that if a just-flushed neighbor becomes dirty again soon afterwards, it ends up being flushed repeatedly.
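As a rough illustration of how neighbor flushing turns several random writes into one larger sequential write (the extent size and the two callbacks are assumptions for the sketch, not InnoDB internals):

```python
def flush_with_neighbors(dirty_page_no, is_dirty, write_pages, pages_per_extent=64):
    """Toy neighbor flush: when flushing one dirty page, also flush any dirty pages
    in the same extent so they go out as one contiguous write."""
    extent_start = (dirty_page_no // pages_per_extent) * pages_per_extent
    batch = [p for p in range(extent_start, extent_start + pages_per_extent) if is_dirty(p)]
    write_pages(batch)   # one sequential IO instead of len(batch) random IOs
    return batch
```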

6. Summary

This post has summarized InnoDB's memory model and analyzed several of the techniques InnoDB uses, such as the change buffer and doublewrite, and the problems they solve. When analyzing database implementation techniques, the point I keep coming back to is this: turn random disk reads and writes into sequential ones.