Preface

MySQL is the mainstream database of the Internet, and InnoDB is the preferred storage engine for core application tables. This article focuses on the architecture and key features of the InnoDB storage engine.

1 InnoDB Architecture

As shown in the figure below, the InnoDB storage engine has multiple memory blocks that together make up a large memory pool, which is responsible for the following tasks:

  1. Maintain multiple internal data structures that all processes/threads need to access.
  2. Cache data from disk for quick reads, and buffer changes before they are written back to the data files on disk.
  3. Buffer the redo log.

Background threads are responsible for refreshing the data in the memory pool so that the buffer pool caches the most recent data, and for flushing modified data back to the disk files.

1.1 Background Threads

InnoDB has 7 background threads: 4 IO threads (insert buffer thread, log thread, read thread, write thread), 1 master thread, 1 lock monitor thread, and 1 error monitor thread.
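You can verify these threads in the FILE I/O section of the engine status output; a minimal check:

    -- The FILE I/O section lists each IO thread (insert buffer thread,
    -- log thread, read thread, write thread) and its current state:
    SHOW ENGINE INNODB STATUS\G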

1.2 Memory

InnoDB memory consists of the following three parts:

  1. Buffer pool
  2. Redo log buffer
  3. Additional memory pool

• Read: Database files are read into the buffer pool page by page (16K per page), and cached pages are then retained in the buffer pool according to the LRU algorithm.
• Write: When a database file is modified, the page in the buffer pool is modified first (the page becomes a dirty page after modification), and dirty pages in the buffer pool are then flushed to the file at a certain frequency.
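Both behaviors hang off two well-known parameters; a quick sanity check (note that innodb_page_size is exposed as a variable only in newer MySQL versions):

    -- Total size of the buffer pool, in bytes:
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

    -- Page size, 16K by default (a visible variable in MySQL 5.6+):
    SHOW VARIABLES LIKE 'innodb_page_size';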

You can run the SHOW ENGINE INNODB STATUS command and look at the BUFFER POOL AND MEMORY section of the output: Buffer pool size indicates the total number of buffer frames (each frame is 16K), Database pages indicates the number of frames already in use, and Modified db pages indicates the number of dirty pages.
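An abridged, purely illustrative excerpt of that section (the actual numbers will differ on every server):

    SHOW ENGINE INNODB STATUS\G
    -- ----------------------
    -- BUFFER POOL AND MEMORY
    -- ----------------------
    -- Buffer pool size   512    <- total buffer frames, 16K each
    -- Database pages     401    <- frames currently in use
    -- Modified db pages  0      <- dirty pages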

Page types cached in the buffer pool include index pages, data pages, undo pages, insert buffer pages, the adaptive hash index, lock information stored by InnoDB, data dictionary information, and so on.

InnoDB first puts redo log information into the redo log buffer and then flushes it to the redo log files at a certain frequency, by default once per second. Because of this, the buffer generally does not need to be set very large.
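Its size is controlled by a single variable:

    -- Redo log buffer size; since it is flushed roughly once per
    -- second, a few MB is usually sufficient:
    SHOW VARIABLES LIKE 'innodb_log_buffer_size';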

The additional memory pool is the region from which InnoDB allocates memory for some of its own internal data structures.
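In older MySQL versions this region had its own sizing variable (it was removed in MySQL 5.7):

    -- Present only in older MySQL versions:
    SHOW VARIABLES LIKE 'innodb_additional_mem_pool_size';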

2 Master Thread

2.1 Working Principles of the Master Thread

We now know that the InnoDB storage engine does most of its work in a single background thread, the master thread, which has the highest priority. It consists of four loops: the main loop, the background loop, the flush loop, and the suspend loop. Most operations are done in the main loop, which performs two major sets of operations: operations once per second and operations once every ten seconds.

Operations once per second:

• Flush the redo log buffer to the redo log files on disk, even if the transaction has not yet been committed. This also explains why even very large transactions commit quickly.
• InnoDB checks the number of IO operations in the last second. If it is less than 5, it considers the current IO pressure low and merges the insert buffer.
• InnoDB checks whether the percentage of dirty pages in the buffer pool exceeds the configured threshold (the innodb_max_dirty_pages_pct parameter, default 90, meaning 90%; see the check below) and writes up to 100 dirty pages to disk if the threshold is exceeded.
• If there is no current user activity, switch to the background loop.

Operations once every ten seconds:

• InnoDB checks whether the number of disk IO operations in the past 10 seconds is less than 200. If so, it considers that it has sufficient disk IO capacity and flushes 100 dirty pages to disk.
• Merge up to 5 insert buffers.
• Flush the log buffer to disk.
• Perform a full purge to delete unusable undo pages. When update and delete operations are performed on a MySQL table, the original rows are marked as deleted, but their old versions must be kept for consistent reads. During the full purge, InnoDB determines whether the deleted rows of the current transaction system can be removed (for example, a query may still need to read the undo information of a previous version); if so, it removes them immediately.
• Flush 100 dirty pages to disk if more than 70% of the pages are dirty, or 10 dirty pages if less than 70%.
• Generate a checkpoint: write the pages with the oldest LSN to disk.
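The dirty-page threshold referenced in the per-second loop is the innodb_max_dirty_pages_pct system variable and can be checked directly:

    -- Threshold that the master thread's per-second loop compares the
    -- dirty-page percentage against (default 90 in older versions):
    SHOW VARIABLES LIKE 'innodb_max_dirty_pages_pct';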

2.2 Potential Problems with the Master Thread

• InnoDB flushes at most 100 dirty pages to disk and merges at most 20 insert buffers per cycle, even though far more than 100 dirty pages or 20 insert buffer entries may actually be generated per second. This hard-coded rule severely limits disk IO utilization when solid-state drives are available. In addition, data that has not been flushed back to disk before a server crash can take a long time to recover.
• By default, InnoDB only starts flushing dirty pages when they exceed 90% of the buffer pool capacity. This ratio is too high and easily causes a backlog of dirty pages, which puts pressure on the database server.
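Later InnoDB releases address both problems: assuming the InnoDB plugin or MySQL 5.5 and later, the hard-coded flush and merge limits scale with a configurable IO capacity, and the dirty-page threshold default was lowered to 75. Both parameters are dynamic:

    -- Raise the IO capacity so the flush/merge limits scale up
    -- (a common choice for SSDs; pick a value for your hardware):
    SET GLOBAL innodb_io_capacity = 2000;

    -- Flush earlier by lowering the dirty-page threshold:
    SET GLOBAL innodb_max_dirty_pages_pct = 75;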

3 Key Features

3.1 Insert Buffer

Insert buffering is one of the most exciting key features of the InnoDB storage engine. MySQL has clustered indexes and non-clustered (secondary) indexes. Inserts into a clustered index are usually sequential and do not require random reads from disk. For a non-clustered, non-unique index, however, the nature of the B+ tree means that leaf node inserts are no longer sequential; discrete access to the non-clustered index pages is required, and insert performance degrades. The insert buffer was introduced to solve this.

With the insert buffer, an insert or update of a non-clustered index is not applied directly to the index page every time. Instead, InnoDB first checks whether the target non-clustered index page is already in the buffer pool. If it is, the insert is performed directly; if not, the change is first placed in the insert buffer, and the insert buffer is then merged with the leaf nodes of the non-clustered index at a certain frequency. Multiple inserts can usually be merged into a single operation, which greatly improves the performance of inserting into and modifying non-clustered indexes. But the insert buffer has two limitations (a sketch of a qualifying table follows the list):

• The index must be a secondary index.
• The index must not be unique.
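A minimal sketch of a table whose index qualifies for insert buffering (the table and column names are hypothetical); merge activity is reported in the INSERT BUFFER AND ADAPTIVE HASH INDEX section of the engine status:

    -- id is the clustered index, so inserts into it are sequential;
    -- idx_user_id is a non-unique secondary index, so inserts into it
    -- can go through the insert buffer.
    CREATE TABLE orders (
        id      INT AUTO_INCREMENT PRIMARY KEY,
        user_id INT NOT NULL,
        amount  DECIMAL(10, 2),
        KEY idx_user_id (user_id)
    ) ENGINE = InnoDB;

    -- Insert buffer merges appear under
    -- "INSERT BUFFER AND ADAPTIVE HASH INDEX" in:
    SHOW ENGINE INNODB STATUS\G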

3.2 Double Write

If insert buffering brings performance to the InnoDB storage engine, doublewrite brings reliability to its data. A partial page write occurs when the database goes down while a page is only partly written (for example, the first 6K of a 16K page), corrupting the page. You might be thinking: don't panic, we have the redo log. That's a good idea, but remember that the redo log records physical operations on a page (for example, writing a record at a given offset within a page). If the page itself is already corrupted, there is no intact page to apply those operations to, and redoing them is meaningless.

So the designers introduced a copy of each page: when a write failure occurs, the page is first restored from its copy, and the redo log is then applied. This is called doublewrite.



As shown in the figure above, when InnoDB flushes dirty pages, it first copies them into the doublewrite buffer in memory, then writes the doublewrite buffer to the doublewrite area of the shared tablespace on the physical disk in two chunks of 1MB each (sequential writes). Only after that are the pages in the doublewrite buffer written to their actual positions in the individual tablespace data files (discrete writes this time). This is the doublewrite process.

What happens if the system crashes while a page is being written to its data file? The shared tablespace holds a copy of the page, so InnoDB can copy it back into the tablespace file and then apply the redo log to recover the data.
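Whether doublewrite is active, and how effectively it batches writes, can be checked with standard variables and counters:

    -- Doublewrite is ON by default:
    SHOW VARIABLES LIKE 'innodb_doublewrite';

    -- Pages written through the doublewrite buffer vs. the number of
    -- doublewrite operations; a high pages/writes ratio means the
    -- sequential 1MB batches are being used effectively:
    SHOW GLOBAL STATUS LIKE 'Innodb_dblwr%';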

3.3 Adaptive Hash Index

The InnoDB storage engine monitors index lookups on tables, and if it observes that building a hash index would speed them up, it builds one, which is why the index is called adaptive. The adaptive hash index is built from the B+ tree pages already in the buffer pool, so it is fast to construct, and InnoDB does not hash the entire table, only certain hot pages.
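The feature can be inspected and toggled at runtime, and its hit statistics appear in the same status section as the insert buffer:

    -- ON by default; can be disabled dynamically:
    SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';
    -- SET GLOBAL innodb_adaptive_hash_index = OFF;

    -- Hash searches/s vs. non-hash (B+ tree) searches/s are reported
    -- under "INSERT BUFFER AND ADAPTIVE HASH INDEX" in:
    SHOW ENGINE INNODB STATUS\G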