preface

In the previous article, MySQL architecture was introduced. Today, MySQL’s most mainstream storage engine, InnoDB, is introduced.

InnoDB storage structure

InnoDB logical storage units are divided into table Spaces, segments, extents and pages. The hierarchical relationship is shown as follows:

Table space

After MySQL5.7, tablespaces are divided into system tablespaces, independent tablespaces, universal tablespaces, undo tablespaces, and temporary tablespaces.

System table space

The system tablespace is named ibData1. When a new database is created, InnoDB storage engine initializes a tablespace file named IBData1. By default, this file stores all table data, such as invisible system tables. Data dictionaries, double-write buffers, update buffers, and undo logs are also stored. After MySQL 5.6, you can set the storage location of undo log by parameter, which can be stored independently of ibData1 file.

If the size of the system tablespace is insufficient, you can configure automatic expansion. Use the following parameters to specify the path, initial size, and automatic expansion policy of the system tablespace. The default automatic expansion size of the database is 64MB, although you can increase the size of the system table space by adding another data file.

innodb_data_file_path
Copy the code

The default ibdata1 size of MySQL is 12MB (MySQL 8). It is not recommended to use the default ibdata1 size. In case of high concurrent transactions, it is recommended to set the initial ibdata1 size to 1GB

Independent table space

In addition to system table Spaces, there are file-per-tables. Each table space contains the data and indexes of a single InnoDB table and is stored in a single data File on the File system.

MySQL > create tablespace table_name; create tablespace table_name;

What is the difference between system tablespaces and standalone tablespaces?

  • Truncation or deletion of a table created in a separate tablespace will free disk space. Truncation or deletion of a table created in a system tablespace will create free space in the system tablespace data file. This space can only be used to store InnoDB data.
  • You can import tables in a separate table space from another MySQL instance, while system table Spaces cannot
  • Check out more…



Generic table space

Similar to system tablespaces, generic tablespaces are shared tablespaces that can store data for multiple tables. You can divide tables based on activity and store them on different disks to reduce metadata storage overhead.

The undo tablespace

The Undo tablespace contains the Undo Log, which is a collection of records that contains information about how to undo the most recent changes made by a transaction to the clustered index record.

Undo logs are stored in system tablespaces by default, but can also be stored in one or more Undo tablespaces. Using undo tablespaces reduces the amount of undo log space in any one tablespace. Undo log’s I/O mode also makes undo space an ideal choice for SSD storage.

Temporary table space

The innodb_temp_data_file_path tablespace is named ibtmp1 and the default size is 12MB

Period of

Table space is composed of a period, also can interpret a table for a period, usually by the data segment, rollback segments, such as index segments, each segment by N area and on page 32 bits, period of space expansion was extended area, under normal circumstances, create an index will create two section at the same time, the leaf nodes and leaf nodes, respectively.

area

Extents consist of contiguous pages, which are physically contiguous allocated segments of space, each of which is fixed in size at 1MB

page

InnoDB’s minimum physical storage unit is a page, with data page rollback pages, etc. Typically, an area consists of 64 contiguous pages, the default page size is 16KB, and you can adjust the page size. So 64*16KB is equal to 1MB

line

InnoDB storage engine is column oriented, meaning data is stored in rows. The row format can be divided into four types: Compact, Dynamic, Redundant, compressed, MySQL5.7 uses dynamic row format by default. Compact is currently the most used, but the default is Dynamic. What’s the difference?

Row overflow simply means that the data that needs to be stored is split into multiple pages for storage outside the current storage page. For data stored in fields of large data types such as TEXT or BLOB, the actual data used by Dynamic is stored in the overflow page, and the data page only holds Pointers to the first 20 bytes. In compact row format, the overflow column stores only 768 prefix bytes. Dynamic row format is more efficient for new pages where the overflow column resides. Therefore, dynamic row format is recommended for current production environment.

Redundant is the earliest row recording format and is not recommended because it consumes more storage space than Compact.

Compressed is a compressed row format that compresses data and index pages. But only for the physical level of compression, in memory is not compressed, when the data call to memory involves conversion, there will be a lot of useless CPU consumption, but also low efficiency. Compression ratio is not high, about 1/2 of the ratio, compression will make the TPS of the database decline, very affect online business, not recommended.

InnoDB architecture

The InnoDB architecture consists of three parts:

  • Memory structure
  • thread
  • Disk file

Memory structure

The memory structure includes a buffer pool (innodb_buffer_pool), redo log buffer (redo log_buffer), and additional memory pool (innodb_additional_mem_pool_size).


innodb_buffer_pool

In database systems, there is a huge speed difference between CPU and disk speed. Disk-based database systems usually use buffer pool technology to improve overall database performance. A buffer pool is an area of memory that compensates for slow disk speeds.

A database page read operation starts by storing the page read from disk into the buffer pool, called fixing the page to the buffer pool. The next time the same page is read, it first checks whether the page exists in the buffer pool. If so, the page is directly read. If not, the process of fix is carried out.

Page modification operations in the database are performed by modifying pages in the buffer pool and then brushing them to disk at a certain frequency. Flushing from the buffer pool is not done after each page change, but via a checkPoint mechanism.

The size of the buffer pool directly affects the overall performance of the database. For the InnoDB storage engine, the buffer pool configuration is set with the parameter Innodb_buffer_pool_size.

The data types in the buffer pool are: Index page, Data page, Change buffer, Adaptive Hash Index, Lock info, Dictionary Cache

Buffer pool full displacement algorithm

LRU List

The buffer pool in the database is managed by LRU algorithm. The most recently used pages are placed at the front of the LRU list, and the least used pages are placed at the end of the LRU list. When the buffer pool cannot store new data, the pages at the end of the LRU list are released first.

FREE List

LRU manages pages that have been read, and when the database starts, the LRU list is empty with no pages. At this time, all the pages are placed in the Free List. When paging from the buffer pool is required, check whether the Free List has Free pages first. If so, delete the page from the Free List and add it to the LRU List.

FLUSH List

When a page in the LRU list is modified, it is said to be a dirty page. That is, the page in the buffer pool has inconsistent data with the page on disk. The dirty pages are flushed back to disk using the checkpoint mechanism, and the pages in Flush List are the dirty pages.


Three features of InnoDB:

Insert buffering, double writes, and adaptive hash indexes make up InnoDB’s three main features that give the storage engine better performance and reliability.

Insert buffer

The main performance problem affecting a database is I/O, and the insert buffer is used to change DML operations on ordinary indexes from random I/O to sequential I/O, improving I/O efficiency. If the index page is in the buffer pool, it can be directly inserted. If the index page is not in the buffer pool, it can be inserted into the change buffer first, and then merge the change buffer and the normal index. This improves the performance of normal indexes in one fell swoop.

Double Write

Insert buffering improves insert performance for normal indexes, while double Write ensures write security, preventing MySQL downtime and InnoDB from partial data page writes. If a database instance crashes, the redo log can be used to restore the database instance. However, the redo log files record the physical operation of the page. If the page is damaged, no recovery operation can be performed. A copy of the page is required. If the instance is down, you can restore the page using the copy and redo it using the Redo log. That’s what double Write does.

Adaptive Hash Index

The InnoDB storage engine has a mechanism to monitor index searches, and if InnoDB notices that queries can be optimized by creating hash indexes, it will do so automatically. This can be controlled with the innodb_adaptive_hash_index parameter. Innodb_adaptive_hash_index is turned on by default.

Starting with MySQL5.7, the adaptive hash index search system is partitioned, with each index bound to a special partition and each partition protected by its own latch. Partitions can be controlled by the innodb_adaptive_hash_index_parts parameter, which defaults to 8 and can be set to a maximum of 512. By setting the partition value, you can reduce contention and improve concurrency.

Redo Log Buffer

InnoDB caches redo log information in the redo log buffer and periodically flusher it to the redo log file on disk.

The redo log buffer is flushed to the redo log file on the external disk in the following three cases.

  • The Master Thread flushes the redo log buffer to the redo log file every second
  • The redo log buffer is flushed to the redo log file as each transaction commits
  • When the redo log buffer space is less than half, the redo log buffer is flushed to the redo log file

Innodb_additional_mem_pool_size

Innodb_additional_mem_pool_size is the size of the memory pool that InnoDB uses to store data dictionary information and other internal data structures, in bytes. The default value is 8 meters. The larger the number of tables in the database, the larger the value should be. If InnoDB runs out of memory in the memory pool, it allocates memory from the operating system and inserts an alarm message in the Error log.

thread

Background threads include Master threads, Page Cleaner threads, Read /Write threads, Redo log threads, and Change buffer threads

  • Master Threads are responsible for asynchronously refreshing data from the buffer pool to disks to ensure data consistency, including refreshing dirty pages and merging and inserting buffers.
  • The Page Cleaner Thread, introduced after Innodb1.2. x, turns the dirty Page refresh operation that was originally carried out in the Master Thread into a separate Thread.
  • Read thread/Write thread indicates the Read /Write request threads of the database. The default value is four. You can increase the value if high speed disks are used.
  • The Redo log Thread refreshes the log buffer to the Redo log file.
  • The Change buffer Thread is responsible for flushing the contents of the inserted buffer to disk

Disk file

Every table in the InnoDB engine generates an.ibd file that stores data information and index information.

conclusion