Welcome to pay attention to github.com/hsfxuebao, I hope to help you, if you think it can trouble to click on the Star ha

For the convenience of management, connection management, query caching, syntax parsing, query optimization and other functions that do not involve real data storage are divided into MySQL Server functions, and real data access functions are divided into storage engine functions. Therefore, after MySQL Server completes query optimization, it only needs to call the API provided by the underlying storage engine according to the generated execution plan and return the data to the client.

The storage engine concept is mentioned in MySQL. In short, a storage engine is a table type. In fact, the storage engine used to be called the table processor, later renamed the storage engine, its function is to receive the instructions handed down from the top, and then the data in the table for extraction or write operations.

1. View the storage engine

Check out what storage engines mysql provides:

show engines;
Copy the code

show engines \G;
Copy the code

The following is explicit:

*************************** 1. row ***************************
   Engine: InnoDB
  Support: DEFAULT
  Comment: Supports transactions, row-level locking, and foreign keys
Transactions: YES
     XA: YES
Savepoints: YES
*************************** 2. row ***************************
   Engine: MRG_MYISAM
  Support: YES
  Comment: Collection of identical MyISAM tables
Transactions: NO
     XA: NO
Savepoints: NO
*************************** 3. row ***************************
   Engine: MEMORY
  Support: YES
  Comment: Hash based, stored in memory, useful for temporary tables
Transactions: NO
     XA: NO
Savepoints: NO
*************************** 4. row ***************************
   Engine: BLACKHOLE
  Support: YES
  Comment: /dev/null storage engine (anything you write to it disappears)
Transactions: NO
     XA: NO
Savepoints: NO
*************************** 5. row ***************************
   Engine: MyISAM
  Support: YES
  Comment: MyISAM storage engine
Transactions: NO
     XA: NO
Savepoints: NO
*************************** 6. row ***************************
   Engine: CSV
  Support: YES
  Comment: CSV storage engine
Transactions: NO
     XA: NO
Savepoints: NO
*************************** 7. row ***************************
   Engine: ARCHIVE
  Support: YES
  Comment: Archive storage engine
Transactions: NO
     XA: NO
Savepoints: NO
*************************** 8. row ***************************
   Engine: PERFORMANCE_SCHEMA
  Support: YES
  Comment: Performance Schema
Transactions: NO
     XA: NO
Savepoints: NO
*************************** 9. row ***************************
   Engine: FEDERATED
  Support: NO
  Comment: Federated MySQL storage engine
Transactions: NULL
     XA: NULL
Savepoints: NULL
Copy the code

2. Set the default storage engine

View the default storage engine:

show variables like '%storage_engine%'; # or SELECT @ @ default_storage_engine;Copy the code

Change the default storage engine InnoDB is used as the storage engine for tables if the table creation statement does not explicitly specify the storage engine. If we want to change the default storage engine of the table, we can write the command line to start the server like this:

SET DEFAULT_STORAGE_ENGINE=MyISAM;
Copy the code

Or modify the my.cnf file:

Default-storage-engine =MyISAM # restart mysqld.serviceCopy the code

3. Set the storage engine for the table

Storage engines are responsible for extracting and writing data in tables. We can set different storage engines for different tables, that is, different tables can have different physical storage structures, and different extraction and writing methods.

3.1 Specify a Storage engine when creating a table

Our previous table creation statements did not specify the storage engine of the table, so we will use the default storage engine InnoDB. If we wanted to explicitly specify the storage engine for the table, we could write:

CREATE TABLE name; ENGINE = Storage ENGINE name;Copy the code

3.2 Modifying the Storage Engine of a Table

If the table is already created, we can also use the following statement to change the storage engine of the table:

ALTER TABLE table_name ENGINE = storage ENGINE;Copy the code

For example, if we modify the storage engine of engine_demo_table:

mysql> ALTER TABLE engine_demo_table ENGINE = InnoDB;
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0  Warnings: 0
Copy the code

Engine_demo_table = engine_demo_table

mysql> SHOW CREATE TABLE engine_demo_table\G *************************** 1. row *************************** Table: engine_demo_table Create Table: CREATE TABLE 'engine_demo_table' (' I 'int(11) DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf8 1 row in set (0.01sec)Copy the code

4. Engine Introduction

4.1 InnoDB engine: transaction storage engine with foreign key support

MySQL has included the InnoDB storage engine since 3.23.34A.InnoDB engine is used by default after 5.5.
InnoDB is MySQLDefault transactional engine, which is designed to handle a large number of short-lived transactions. You can ensure a complete Commit and Rollback of transactions.
In addition to add and query, you also need to update and delete operations, so InnoDB storage engine should be preferred.
Unless there is a very specific reason to use another storage engine, the InnoDB engine should be preferred.
Data file structure:
- Table name. FRM stores table structure (MySQL8.0, merge in table name.ibd)
- Table name. ibD stores data and indexes
InnoDB isDesigned for maximum performance in handling large data volumes 。
- In previous versions, dictionary data was stored in metadata files, non-transactional tables, and so on. The metadata files are now deleted. Such as:.frm,.par,.trn,.isl,.db.optMySQL8.0 does not exist anymore.
Comparing MyISAM’s storage engine,InnoDB writes are less efficientAnd takes up more disk space to hold data and indexes.
MyISAM caches only indexes, not real data;InnoDB caches real data as well as indexes.High memory requirementsAnd memory size has a decisive impact on performance.

4.2 MyISAM engine: the main non-transactional storage engine

MyISAM offers a number of features, including full-text indexing, compression, spatial functions (GIS), etc., but MyISAMTransactions, row-level locks, and foreign keys are not supportedThere is an undoubted drawback thatUnable to safely recover from a crash 。
The default storage engine before 5.5
The advantage is to visitSpeed is fast, do not require transaction integrity or SELECT, INSERT – based applications
There is additional constant storage for data statistics. Therefore, count(*) queries are very efficient
Data file structure:
- Table name FRM Stores the table structure
- Table name. MYD Store data (MYData)
- Table name. MYI Storage index (MYIndex)
Application scenario: Read-only applications or read – based services

4.3 Archive Engine: Used for data archiving

archiveIt means file,Only insert and direct polling are supportedTwo functions (rows cannot be modified after being inserted).
Indexing is supported after MySQL5.5. |
Have a good compression mechanism, useZlib compression libraryCompression is performed in real time while recording requests and is often used as a repository.
createARCHIVE tableThe storage engine creates a file whose name starts with the table name. Data fileThe extension is.arz.
According to the results of the English test,For the same amount of data, the Archive table is about 75% smaller than the MyISAM table and about 83% smaller than the transactional workNNODB table.
ARCHIVE Storage engineUsing theRow-level locks. The ARCHIVE bow | with supportAUTO_INCREMENTList of attributes.AUTO_INCREMENTcanHas a unique or non-unique index. Attempting to create an index on any other column results in an error.
Archive tableSuitable for log and data collection (archive) applications;Suitable for storing large amounts of independent data as historical records. Has high insert speed, but poor query support.
The following table shows the ARCHIVE storage engine function

4.4 Blackhole Engine: The write operation is discarded. The read operation returns empty content

The Blackhole engine does not implement any storage mechanism, it willDiscard all inserted data, do not do any save.
But the server logs Blackhole tables, so it can be used to copy data to a standby repository, or simply to log. However, there are many problems with this approach, so it is not recommended.

4.5 CSV engine: When storing data, separate data items with commas

The CSV engine willNormal CSV files are treated as MySQL tables, but does not support line | bow.
The CSV engine can serve as oneMechanism for data exchangeVery useful.
CS gy’s data can be read directly in the system, using a text editor, or Excel.
It has obvious advantages for fast data import and export.

When a CSV table is created, the server creates a plain text data file whose name begins with the table name and carries. CSV extension. When you store data in a table, the storage engine stores it in a data file as a comma-separated value.

Case as follows

mysql> CREATE TABLE test (i INT NOT NULL, c CHAR(10) NOT NULL) ENGINE = CSV;
Query OK, 0 rows affected (0.06 sec)

mysql> INSERT INTO test VALUES(1,'record one'),(2,'record two');
Query OK, 2 rows affected (0.05 sec)
Records: 2 Duplicates: 0  Warnings: 0

mysql> SELECT * FROM test;
+---+------------+
| i | c     |
+---+------------+
| 1 | record one |
| 2 | record two |
+---+------------+
2 rows in set (0.00 sec)
Copy the code

Creating a CSV table also creates a meta-file to store the status of the table and the number of rows in the table. The name of this file is the same as the name of the table, with the suffix CSM. As is shown in

If you examine the file in the test.csv database directory created by executing the above statement, its contents will be opened using Notepad++ as follows:

"1","record one"
"2","record two"
Copy the code

This format can be read and even written by spreadsheet applications such as Microsoft Excel. Use Microsoft Excel to open as shown

4.6 Memory Engine: Tables stored in Memory

Summary:

Memory uses the logical medium of Memory, which responds quickly but loses data when the mysqld daemon crashes. In addition, data is required to be stored in a format of constant length, for example, Blob and Text data are not available (with variable length).

Main Features:

The Memory at the same timeSupports HASH indexes and B+ tree indexes 。
- Hash indexes are fast for equality, but much slower for ranges
- Hash indexes are used by default, which is faster than using a BTREE index
- If you want to use b-tree indexes, you can choose to use them during index creation
The Memory table at leastAn order of magnitude faster than MyISAM tables 。
MEMORY Tables are limited in size. The size of the table depends on two parameters, respectivelymax_rowsandmax_heap_table_size. Among them,max_rowsYou can specify this at table creation time;max_heap_table_sizeThe default size is 16MB and can be expanded as needed.
Data files are stored separately from index files.
- Using the Memory engineThe table actually corresponds to a disk file, the file name is the same as the table name and the type isFRM type, the file stores only the structure of the table, while theData files are stored in memory
- This facilitates fast data processing and provides efficiency for the entire table
Disadvantages:Its data is easy to lose and its life cycle is short. Because of this flaw, you need to be careful when choosing a MEMORY storage engine.

Scenarios using Memory storage engines:

The target data is small and frequently accessed. The data is stored in memory. If the data is too large, the memory will overflow. The max_heap_table_size parameter can be used to control the size of the Memory table and limit the maximum size of the Memory table.
If the data is temporary and must be immediately available, it can be placed in memory.
It does not matter much if data stored in Memory tables is suddenly lost.

4.7 Federated engine: Accesses remote tables

The Federated engine is a proxy to access other MySQL servers, and although it appears to offer a good cross-server flexibility, it often causes problems and is disabled by default.

4.8 Merge engine: Manages a set of MyISAM tables

4.9 NDB Engine: dedicated storage engine for MySQL cluster

NDB Cluster storage engine, also called NDB Cluster storage engine, is mainly used in MySQL Cluster distributed Cluster environment, similar to Oracle RAC Cluster.

4.10 Engine Comparison

In MySQL, different tables in the same database can use different storage engines. The following table compares common storage engines.

The characteristics of	MyISAM	InnoDB	MEMORY	MERGE	NDB
Storage limits	There are	64TB	There are	There is no	There are
The transaction	support
Locking mechanism	Table locks, which lock the entire table even when operating on a single record, are not suitable for high-concurrency operations	A row lock is used to lock only one row without affecting other rows. It is suitable for high-concurrency operations	Table locks	Table locks	Row locks
B-tree indexes	support	support	support	support	support
The hash index			support		support
The full text indexing	support
The cluster index		support
Data cache		support	support		support
The index buffer	Only indexes are cached, not real data	Caching not only indexes but also real data requires a lot of memory, and memory size has a decisive impact on performance	support	support	support
Data compressibility	support
Space use	low	high	N/A	low	low
Memory usage	low	high	medium	low	high
The speed of batch inserts	high	low	high	high	high
Support foreign keys		support

In fact, we most commonly use InnoDB and MyISAM, sometimes mentioning Memory.

5. The difference between MyISAM and InnoDB

Many people have questions about InnoDB and MylSAM. Which one is better?

The default storage engine was MylSAM before MySQL5.5, and InnoDB after 5.5.

First, the InnoDB storage engine provides good transaction management, crash recovery and concurrency control. Because InnoDB storage engine supports transactions, it is necessary to choose InnoDB for occasions requiring transaction integrity. For example, data operations include many update and delete operations in addition to inserts and queries, and systems requiring high data accuracy such as financial systems. The disadvantage is that its read and write efficiency is slightly poor, occupying relatively large data space.

Secondly, if the MyISAM storage engine is a small application, the system is mainly read operation and insert operation, only a few update, delete operation, and the transaction requirements are not so high, you can choose this storage engine. The advantages of MyISAM storage engine lie in the small footprint and fast processing speed; The missing point is that transaction integrity and concurrency are not supported.

These two engines have their own characteristics, but you can also choose different storage engines in MySQL for different tables.”

Compare the item	`MyISAM`	`InnoDB`
A foreign key	Does not support	support
The transaction	Does not support	support
Line table lock	`Table locks`, even a single record will lock the entire table, not suitable for high-concurrency operations	`Row locks`, locks only one line during the operation and does not affect other lines. It is suitable for high-concurrency operations
The cache	`Cache only indexes`Real data is not cached	`Cache not only indexes but also data`, has high memory requirements, and memory size has a decisive impact on performance
Built-in system table for use	Y	N
concerns	Performance: Saves resources, consumes less, and provides simple services	Transactions: Concurrent writes, transactions, larger resources
The default installation	Y	Y
Use the default	N	Y

6. Which is alibaba or Taobao

Percona is an improvement to the MySQL database server, providing significant improvements in functionality and performance over MySQL.
This release improves InnoDB performance under high loads and provides some very useful performance diagnostics tools for DBAs. There are more parameters and commands to control server behavior.
The company has created a new storage engine called Xtradb that can replace Innodb with better performance and concurrency
Most of Alibaba’s mysql databases are actually modified using the Percona prototype.

7. Focus on InnoDB storage engine

7.1 Advantages of InnoDB

InnoDB storage engine has many advantages in practical application, such as convenient operation, improved database performance, low maintenance cost. If the server crashes due to hardware or software, no additional operations need to be performed after the server restarts. InnoDB crash recovery automatically finalizes previously committed content, then uncommits processes and restarts to resume execution from the crash point.

The InnoDB storage engine maintains a buffer pool in main memory, where frequently used data is processed directly. This caching method is applied to a variety of information and speeds up the processing process

On dedicated servers, up to 80% of the physical memory is applied to the buffer pool. If you need to insert data into different tables, you can set foreign keys to enforce data integrity. Update or delete data, the associated data will be automatically updated or deleted. If you try to insert data into a secondary table, but there is no corresponding data in the primary table, the inserted data will be removed automatically. If data on disk or in memory crashes, the checksum mechanism warns before using dirty data. When the primary key of each table is set properly, operations related to these columns are automatically optimized. Insert, update, and delete operations are optimized with automatic change buffering. InnoDB not only supports current reads and writes, but also buffers changed data to the data stream disk.

InnoDB’s performance benefits are not limited to large tables with long-running queries. Adaptive hash indexes can speed up queries when the same column is queried multiple times. With InnoDB you can compress tables and associated indexes, and you can create or delete leads without affecting performance and availability. For large text and BLOB data, this storage layout is more efficient using dynamic row form. Through the query

The tables in the INFORMATION_SCHEMA library can monitor the internal workings of the storage engine. InnoDB tables can be mixed with other storage engine tables in the same statement. Even if some operating systems limit the file size to 2GB, InnoDB can still handle it. When dealing with large amounts of data, InnoDB takes care of the CPU for maximum performance.

7.2 InnoDB and ACID model

The ACID model is a set of database design rules that emphasize reliability, which is important for business data and mission-critical applications. MySQL includes innoDB-like storage engine components that are tightly linked to the ACID model so that when things go wrong, data doesn’t crash and results don’t get distorted. If you rely on the ACID model, you can dispense with consistency checking and crash recovery mechanisms. With additional software protection, highly reliable hardware or applications that can tolerate a small percentage of data loss and inconsistencies can be tuned to MySQL Settings that rely on only part of the ACID feature for higher performance. Here are four ways that the InnoDB storage engine works in the same way as the ACID model

7.2.1 atoms

The atomic aspect of ACID mainly involves InnoDB transactions, and MySQL related features mainly include:

Automatically commit Settings.
COMMIT statement.
The ROLLBACK statement.
Manipulate the table data in the INFORMATION_SCHEMA library.

7.2.2 Consistency aspect

The CONSISTENCY of the ACID model involves internal InnoDB processing that protects data from crashes. Features associated with MySQL include:

InnoDB double-write cache.
InnoDB crash recovery.

7.2.3 Isolation aspects

Isolation is the level that applies to transactions, and the main features associated with MySQL include: auto-commit Settings.

SET ISOLATION LEVEL statement.
Low level information about InnoDB locks.

7.2.4 Durability aspect

Durability of the ACID model is primarily concerned with MySQL software features that interact with hardware configurations. Due to the complexity and diversity of hardware, there are no specific rules for durability. MySQL features include:

InnoDB double-write cache, configured with the Innodb_doublewrite configuration item.
Innodb_flush_log_at_trx_commit Configuration item.
Configuration item sync_binlog.
Innodb_file_per_table configuration item.
Write cache of storage device.
Spare battery cache for storage devices.
Run the MySQL operating system.
Continuous power supply.
Backup policy.
For distributed or hosted applications, the most important factors are hardware location and network availability

7.3 InnoDB Architecture

The following diagram briefly describes the InnoDB storage engine architecture:

InnoDB storage engine has multiple memory blocks that make up a large memory pool. Background threads are responsible for flushing data in the memory pool, flushing modified data to disk, and so on. Next we introduce background threads and memory pools respectively.

7.3.1 Background Threads

InnoDB has several different threads in the background for different tasks. The main ones are as follows:

Master Thread: the core Thread is responsible for asynchronously refreshing data from the buffer pool to the disk to ensure data consistency, including updating dirty pages, merging and inserting buffers, and recycling UNDO pages.
IO Threads: The InnoDB storage engine uses a lot of asynchronous I/OS to handle write I/O requests. IO threads are mainly responsible for the callback of these I/O requests.
Purge Thread: The Undo log may not be needed after the transaction is committed, so Purge Thread is needed to reclaim the undo pages that have been used and allocated. InnoDB supports multiple Purge threads to speed up undo page recycling.
Page Cleaner Thread : Page Cleaner Thread is newly introduced in InnoDB 1.2.x version. Its function is to refresh dirty pages in the previous version into a separate Thread to complete, so as to reduce the work of Master threads and the blocking of user query threads.

7.3.2 memory

The InnoDB storage engine is disk-based, meaning data is stored on disk. Due to the gap between CPU speed and disk speed, the InnoDB storage engine uses buffer pool technology to improve overall database performance. A buffer pool is simply an area of memory. When a page is read in the database, the page read from the disk is stored in the buffer pool. When the same page is read next time, the system checks whether the page is in the buffer pool. If yes, the system says the page is hit in the buffer pool and reads the page directly. Otherwise, the page on disk is read. When a page is modified in the database, it is first modified in the buffer pool and then flushed to disk at a certain frequency, not flushed back to disk every time a page changes.

1. Buffer Pool

The data page types cached in the buffer pool include: index page, data page, undo page, insert buffer, adaptive hash index, InnoDB lock information, data dictionary information, etc. Index pages and data pages make up a large portion of the buffer pool (these pages are known for now and are described in a later section). In InnoDB, the default page size in the buffer pool is 16KB.

Now that we know that the Buffer Pool is actually a contiguous memory space, we are faced with the question: how do we cache the pages on disk into the Buffer Pool? Buffer Pool = Buffer Pool = Buffer Pool = Buffer Pool InnoDB creates some control information for each cached page. These control information includes the tablespace number of the page, page number, page address in the Buffer Pool, some lock information and LSN information (lock and LSN can be ignored here). There’s also some other control information.

The size of memory occupied by the control information corresponding to each cache page is the same, so we call the memory occupied by the control information corresponding to each page a control block. The control block and the cache page correspond one by one, and they are stored in the Buffer Pool, where the control block is stored in the front of the Buffer Pool. Cache pages are stored behind the Buffer Pool, so the entire Buffer Pool will look like this:

What is that fragment between the control block and the cache page? Think about it, each control block corresponds to a cache page, so after allocating enough control blocks and cache pages, the remaining space may not be enough for the size of a pair of control blocks and cache pages, naturally, this unused bit of memory space is called fragmentation. Of course, if you set the size of the Buffer Pool just right, it may not cause fragmentation

2. Redo the log buffer

Innodb memory area has redo log buffer in addition to buffer pool. Innodb storage engine writes redo log information to the buffer and then flusher it to redo log files on a regular basis. Typically, the master thread flushes the cache to the log file every second. So the user only needs to keep the number of transactions per second within this buffer size. This is controlled by the innodb_log_buffer_size parameter. The default of 8M redo log buffers is sufficient for most applications. Because the following conditions are flushed to the redo log file

The master thread flushes the redo log buffer to the redo log file every second
eachWhen a transaction commitsWill redo the log bufferFlush to redo log files
When the redo log buffer pool space is less than 1/2, the redo log buffer is flushed to the redo log file

3. Free List

We saw the structure of the buffer pool earlier. How does InnoDB storage engine manage buffer pools? Let me talk about my confusion at that time. (I can skip this part, because some nouns are not involved.) I read the book “inside Mysql Technology”, which talks about LRU List a lot. As a result, I could not imagine the management of buffer pool, but I finally figured it out after looking for information on the Internet. Take a look:

When we first start MySQL server, we need to complete the initialization process of the Buffer Pool, that is, allocate the memory space of the Buffer Pool, divide it into several control blocks and cache pages. However, there are no actual disk pages cached in the Buffer Pool (because they are not used yet). Later, as the program runs, there will be more and more disk pages cached in the Buffer Pool. Which cache page should be placed when reading a page from disk into a Buffer Pool? Or how do you tell which pages in the Buffer Pool are idle and which are already in use? We’d better keep track of which pages are available somewhere, and we can wrap all the Free pages into a node to form a linked list, which can also be called a Free list (or Free list). Since all cached pages in the newly initialized Buffer Pool are Free, each cached page will be added to the Free list. Assuming that the number of cached pages in the Buffer Pool is N, the effect of adding the Free list looks like this:

As can be seen from the figure, in order to manage the Free linked list well, we have specially defined a control information for the linked list. The inner side contains the address of the first node, the address of the last node, and the number of nodes in the current linked list. We record the address of a cache page control block in each Free linked list node, and each cache page control block records the corresponding cache page address, so it is equivalent to each Free linked list node corresponds to a Free cache page.

Whenever a page needs to be loaded from disk into the Buffer Pool, a Free page is taken from the Free list and the corresponding control block is filled in. Then the corresponding Free list node is removed from the list. Indicates that the cache page is already in use ~

Just to recap, why do we talk about free list? Buffer pool management. The free list is a data structure that maintains a buffer pool of free cached pages when the database service has just started.

4. LRU list

Let’s briefly review how Buffer pools work. The Buffer Pool has two main functions: one is to speed up reads and the other is to speed up writes. What about speed-reading? When you need to access a data page, if the page is already in the cache pool, you don’t need to access disk, and the content of the page can be retrieved directly from the buffer pool. What about accelerated writing? When a page needs to be modified, the page is first modified in the buffer pool, and the related redo log is recorded, the page modification is complete. When the modified page is actually flushed to disk is done by the background refresh thread.

When implementing the above two functions, we need to consider the limitations of objective conditions. Because the memory size of the machine is limited, so the size of the MySQL InnoDB Buffer Pool is also limited. If the memory size of pages to be cached exceeds the size of the Buffer Pool, There are no more pages in the Free list. If you remove some of the old cached pages from the Buffer Pool, then add new pages to the Buffer Pool.

To answer this question, we also need to go back to the original purpose of our Buffer Pool, which is to reduce THE I/O interaction with the disk. Ideally, each time a page is accessed, it is already cached in the Buffer Pool. Assuming that we have accessed a total of n pages, then the number of pages that have been accessed divided by n is called the cache hit ratio, and the expectation is to make the cache hit ratio as high as possible ~

How to improve cache hit ratio? InnoDB Buffer Pool uses the classic LRU algorithm for page culling to improve cache hit ratio. When there are no more idle pages in the Buffer Pool, you need to flush out some of the recently infrequently used pages. However, how do we know which cache pages have recently been used heavily and which have recently been used sparingly? The Least Recently Used LRU list is the Least Recently Used LRU list. The Least Recently Used LRU list is the Least Recently Used LRU list. When we need to access a page, we can handle the LRU list like this:

If the page is not in the Buffer Pool, the cache page is wrapped as a node into the head of the linked list when the page is loaded from disk into the cache page in the Buffer Pool.
If the page is in the Buffer Pool, the corresponding LRU list node is moved directly to the head of the list.

However, there are some performance issues, such as hot data flushing in one full table scan or logical backup, which can cause buffer pool contamination problems! All the data pages in the Buffer Pool are bloodied once. Other query statements have to be loaded from disk to the Buffer Pool again. However, the full table scan statement is not executed frequently, and the cache pages in the Buffer Pool are bloodied once. This severely affects the use of the Buffer Pool by other queries and seriously reduces the cache hit ratio!

So InnoDB storage engine has made some improvements to the traditional LRU algorithm and added midpoint to InnoDB. The newly read page, although the most recently accessed page, is not inserted directly into the head of the LRU list, but into the MIDpoint location of the LRU list. The algorithm is called midpoint Insertion Testy. The default configuration is inserted at 5/8 of the list length. Midpoint is controlled by the innodb_old_blocks_pct parameter.

The list before MIDPoint is called a new list, and the list after it is called an old list. Pages in the New list can simply be thought of as the most active hot data.

The InnoDB storage engine also introduced innodb_old_blocks_time to indicate how long a page must wait after reading the mid position before being added to the hot end of the LRU list. You can set this parameter to prevent hotspot data from being flushed out easily.

5. Flush list

Okay, now that we’ve basically got the LRU list, let’s move on. Previously, we talked about pages that are updated first in the cache pool, which is not the same as pages on disk. Such cached pages are also called dirty pages. So when do these modified pages flush to disk? In what order are they flushed to disk? Of course, the simplest thing to do would be to synchronize every change to the corresponding page on disk immediately, but writing data to disk frequently can seriously affect program performance (after all, disks are slow as tortoises). Therefore, every time a cache page is modified, we do not rush to synchronize the changes to disk immediately, but at a certain point in the future, and the background refresh thread will flush the changes to disk in turn to achieve the landing of the changes to disk.

But if we don’t synchronize to disk immediately, how do we know which pages in the Buffer Pool are dirty and which have never been modified when we synchronize again? If the Buffer Pool is set to a large size (say 300GB), then it will take a long time to synchronize all the data.

As a result, we had to create a dirty page list, and any page that had been modified in the LRU list would be added to this list, also known as FLUSH, or sometimes abbreviated as FLU, because pages in this list need to be flushed to disk. The structure of a linked list is similar to that of a Free list. Dirty Page modification refers to a Page that has been modified for the first time since it was loaded into the Buffer Pool. Only when the Page is modified for the first time is it added to the FLUSH list (oldest_modification == 0 in the Page header). If this page is modified again it will no longer be in the FLUSH list because it already exists. It is important to note that the dirty page data is actually still in the LRU list, whereas the dirty page records in the FLUSH list are Pointers to dirty pages in the LRU list. Dirty pages in the FLUSH linked list are flushed to disk in order of oldest_LSN (the LSN number that was first modified, corresponding to oldest_modification, at the head of each page). Smaller pages are flushed first to avoid data inconsistencies.

Note: Dirty pages exist in both the LRU list and the Flush list. The LRU list is used to manage the availability of pages in the buffer pool, and the Flush list is used to manage flushing pages back to disk.

6. Summary of Free List, LRU List and Flush List

The relationship between the three important lists (LRU List, Free List, and Flush list) can be expressed as follows:

The relationship between the Free list and the LRU list is reciprocal, and pages are swapped back and forth between the two lists. The FLUSH list, which records dirty pages, is also a pointer to the LRU list, so the FLUSH list is wrapped by the LRU list.

7.4 CheckPoint technology

With buffer pools out of the way, CheckPoint technology. CheckPoint technology is designed to solve the following problems:

Shorten the database recovery time
When the buffer pool is insufficient,Flush dirty pages to disk
When the redo log is not available,Refresh the dirty pages

To shorten the database recovery time, the checkpoint is recorded in redo logs, and the pages before this point have been flushed back to the disk. You only need to restore the redo logs after the checkpoint. This greatly reduces the recovery time.

When the buffer pool is insufficient, the LRU algorithm overflows the least recently used pages. If the pages are dirty, checkpoint is performed to flush the dirty pages back to the disk.

The redo log is unavailable. This part of the redo log cannot be overwritten. Why? Because: Because redo logs are designed to be recycled. The data corresponding to this part is not flushed to disk. If these logs are not needed during database recovery, they can be overwritten. If necessary, checkpoint must be forced to flush the pages in the buffer pool at least to the current redo log location.

Question: How many pages are flushed to disk at checkpoint? Where do I pick up dirty pages each time? When to trigger checkpoint?

Inside InnoDB storage engine, there are two checkpoint types:

Sharp Checkpoint
Fuzzy Checkpoint

Sharp Checkpoint occurs when the database is shutdown and all dirty pages are flushed back to disk. This works by default with the parameter innodb_fast_shutdown=1. Does not apply to database runtime refresh.

When the database is running, The InnoDB storage engine uses Fuzzy Checkpoint to refresh only part of the dirty pages. Several Fuzzy Checkpoint situations occur as follows:

MasterThread Checkpoint
- Asynchronous flush, flush a percentage of pages back to disk from the buffer pool dirty page list every second or every 10 seconds. Asynchronous refresh, that is, InnoDB storage engine can do other operations, the user query thread is not blocked.
FLUSH_LRU_LIST Checkpoint
- The InnoDB storage engine needs to ensure that there are approximately 100 free pages available in the LRU list. Before InnoDB 1.1.x, the user query thread checked to see if the LRU list had enough space to manipulate. If not, the LRU algorithm overflows the pages at the end of the LRU list. If there are dirty pages on these pages, checkpoint is required. Therefore, it is called flush_lru_list checkpoint.
- Starting with InnoDB 1.2.x, this inspection is carried out in a separate process (Page Cleaner). Benefits: 1. Reduce pressure on master threads 2. Reduce user Thread blocking.
- Setting parameters:innodb_lru_scan_dept: controls the number of available pages in the LRU list. This defaults to 1024
Async/Sync Flush Checkpoint
- The page is selected from the dirty page list when the redo log is unavailable and the page needs to be forcibly refreshed to disk.
- In this case, the redo log is available, which means that the redo log has too little space to cycle over, or in other words, a large number of redo logs are generated in a very short period of time.
- And then there are a couple of variables, and it’s not too hard to graph, so take a look.
  - The InnoDB storage engine marks versions by Log Sequence Number (LSN), which is an 8-byte Number. Each page has an LSN, redo logs have an LSN, and checkpoint has an LSN.
  - LSN written to logs: redo_lSN
  - Refresh the latest page of disk LSN:checkpoint_lsn
  - It has the following definitions:
  - checkpoint_age = redo_lsn - checkpoint_lsn
  - async_water_mark = 75% * total_redo_file_size
  - sync_water_mark = 90% * total_redo_file_size
  - The refresh process is as follows:
Dirty Page too much Checkpoint
- If there are too many dirty pages, force checkpoint. Ensure there are enough available pages in the buffer pool.
- Parameter Settings:innodb_max_dirty_pages_pct = 75Indicates: Checkpoint is forced when the number of dirty pages in the buffer pool is 75%. After 1.0.x, the default is 75

7.5 InnoDB Key Features

7.5.1 Inserting a Buffer

Insert Buffer is one of the most exciting and exciting of InnoDB’s key storage engine features. However, the name might suggest that the insert buffer is part of the buffer pool. No, it’s nice to have Insert buffers in the InnoDB Buffer pool, but Insert buffers are as much a part of the physical page as data pages.

In general, a primary key is a unique identifier for a row. Typically, row records in an application are inserted in order of primary key increment. Therefore, the insertion of clustered indexes is generally sequential and does not require a random read from disk. Because, for this type of insertion, it’s pretty fast. (If the primary key class is a class like UUID, then inserts are just as random as secondary indexes.)

If the index is non-clustered and not unique. During the insertion operation, the storage of data is not sequential for the insertion of non-clustered index leaf nodes. In this case, the non-clustered index pages need to be accessed discreetly, which leads to the degradation of the performance of the insertion operation due to the existence of random reads. This is because the property of B+ trees determines the discreteness of non-clustered index inserts.

The Insert Buffer is designed to determine whether a non-clustered index page is in the Buffer pool, and if it is not, it is inserted into an Insert Buffer object first. The non-clustered index of the database is inserted into the leaf node when it is not, but stored in a different location. The merge of Insert Buffer and secondary index page children is then performed with a certain frequency and circumstance, and it is usually possible to merge multiple inserts into a single operation (because it is in a single index page), which greatly improves performance for non-clustered index inserts.

Two conditions need to be met:

The index isSecondary index;
The indexIs not the only.

Secondary indexes cannot be unique because the database does not look up the index page to determine the uniqueness of the inserted record when the buffer is inserted. If you look it up, you’re bound to get discrete reads, making the Insert Buffer meaningless.

7.5.2 write twice

If insert buffering is for write performance, write twice is for reliability.

Partial write failure: Imagine a scenario where the database is writing a data page from memory to disk, and the database goes down. As a result, only part of the page is written. This is called partial write failure, which results in data loss. This is not possible with redo logs because redo logs record physical changes to the page and cannot do anything if the page itself is corrupted.

From the above analysis, we know that in the case of partial write failure, we need a copy of the original page before applying redo log. Two writes are used to solve this problem. Here is the schematic diagram:

Two writes require two additional sections:

Two write buffers in memory (doublewrite buffer), the size is 2MB
On the disk128 contiguous pages in the shared tablespace, also 2MB in size

It works like this:

When a dirty buffer pool page is flushed, it is not written directly to the data file, but ratherFirst copy to memory twice write buffer.
And then write the buffer twiceThe data is written to the disk shared tablespace in two times, write 1MB each time
After step 2 completes, write two more write buffers to the data file

So we can solve the problem of writing which is mentioned in the above failure, because of existing data in the Shared disk tablespace page copy of a copy of the page if the database in writing data files in the process of downtime, during instance recovery, you can find a copy of the page from sharing a table space, to copy it cover the original data page, application of redo logs again can.

Step 2 is an additional performance overhead, but it is not significant because the disk shared table space is contiguous. You can disable the doublewrite function with the skip_innodb_doublewrite parameter. This function is enabled by default and is strongly recommended.

7.5.3 Adaptive Hash Index

A hash is a very fast lookup method with O(1) time in the general case. However, the search times of B+ tree depend on the height of B+ tree. In the generation environment, the height of B+ tree is usually 3-4 layers, and there is no need to search 3-4 times.

The InnoDB storage engine monitors queries to index pages on tables. If it is observed that creating a Hash Index can improve the speed, a Hash Index is established, called Adaptive Hash Index (AHI). The AHI is constructed from the B+ tree pages of the buffer pool. So build very fast, and do not build hash indexes on the entire table. The InnoDB storage engine automatically hashes some hot pages based on the frequency and mode of access.

AHI has a requirement that the continuous access pattern (query criteria) for this page be the same. For example, the access mode of the joint index (a, B) can be as follows:

WHERE a=XXX;
WHERE a= XXX AND b= XXX. InnoDB storage engine does not construct an AHI for the page if the above two queries are interchangeably performed. AHI also has the following requirements:
Accessed 100 times in this mode;
The page is accessed through this mode N times, where N= record on the page /16. According to official documentation, with AHI enabled, read and write speeds are two times faster, and link operations responsible for indexing are five times faster. The idea is to liberalize the database without the need for DBA to manually adjust the database.

7.5.4 asynchronous I/o (AIO)

In order to improve the performance of disk operations, the current database systems use asynchronous I/O to process disk operations. So it is with InnoDB.

Sync I/O corresponds to AIO. After performing an I/O operation, you can perform subsequent operations only after the operation is complete. However, if the user issues an index scan query, the SQL statement may need to scan multiple index pages, which requires multiple IO operations. It is not necessary to scan each page and wait for it to complete before the next scan. After sending an I/O request, you can immediately send another I/O request. After all I/O requests are sent, all I/O operations are completed, which is called AIO.

Another advantage of AIO is I/O Merge, which merges multiple I/OS into one I/O operation to improve IOPS.

Prior to InnoDB 1.1.x, the implementation of AIO was simulated through code in InnoDB storage engine. Since then, however, kernel-level AIO support has been provided, called Native AIO. Native AIO requires operating system support. Windows and Linux support it, but Mac does not. Consider this when choosing the operating system for the MySQL database server.

MySQL can use the innodb_use_native_aiO parameter to determine whether to enable Native AIO. In InnoDB storage engine, read ahead is done through AIO, and dirty page refresh is also done through AIO.

7.5.5 Refreshing the adjacency page

When InnoDB’s storage engine flushes a dirty page, it detects all pages in the extent of the page and flushes them together if they are dirty. The advantage of this is that with AIO you can combine multiple IO writes into one IO operation. This working mechanism has significant advantages over traditional mechanical disks. But there are two things to consider:

Do you write pages that are not so dirty and then become dirty very quickly?
Solid-state drives have high IOPS. Do you still need this feature?

For this reason, InnoDB storage engine 1.2.x starts to provide the parameter innodb_flush_NEIGHBORS to determine whether to enable it or not. It is recommended for traditional mechanical drives, but can be turned off for solid-state drives.

7.6 the InnoDB architecture

Buffer pool A buffer pool is a portion of main memory used to cache used table and index data. Buffer pools are often used

Data can be obtained directly in memory, thus increasing speed.

Change cache The change cache is a special data structure that caches the helper string when the affected index page is not in the cache

Changes to the citation page. When the index page is loaded into the cache pool by other read operations, cached changes are merged. Unlike cluster indexing, secondary indexing is not unique. When the system is largely idle, the sweep operation runs periodically, flushing updated index pages to the disk. Updating cache merges can significantly degrade query performance. In memory, the update cache takes up part of the InnoDB buffer pool. On disk, the update cache is part of the system table space. The data type of the update cache is governed by the Innodb_change_buffering configuration item.

Adaptive hash index Adaptive hash index combines load with enough memory to make InnoDB run like an in-memory database,

There is no need to degrade performance or reliability on transactions. This feature is configured with the innodb_adaptive_hash_index option, or with the — skip-innodb_adaptive_hash_index command line when the service is started.

Redo log cache The redo log cache holds data to be placed in redo logs. Redo log cache size passed

Innodb_log_buffer_size Configuration item configuration. The redo log cache periodically flushers log files to disk. A large redo log cache allows large transactions to run without writing to disk.

** System tablespaces include InnoDB data dictionaries, double-write caches, update caches and undo logs, as well as table and index data. Multiple tables are shared and the system table space is treated as a shared table space.

Dual-write cache The dual-write cache resides in the system table space and is used to write data pages flushed from the cache pool. Only after flushing and writing to the double-write cache

InnoDB will then write the data page to the appropriate location.

Undo log An undo log is a collection of undo records related to a transaction, including how to undo the most recent changes to the transaction. If the other

For transactions to query raw data, unchanged data can be traced from the undo log. Undo logs exist in undo log fragments that are contained in rollback fragments.

Table space per file Table space per file means that each individual table space is created in its own data file.

Not in the system tablespace. This is enabled through the innodb_file_per_table configuration item. Each table space is represented by a separate.ibd data file, which is created by default in the database directory.

Use the CREATE TABLESPACE syntax to CREATE a shared InnoDB TABLESPACE. Generic table Spaces can be created in MySQL number

A table that can manage multiple tables outside of the data directory and supports all row formats.

Undo table Spaces An undo table space consists of one or more files that contain undo logs. The number of tablespace unloads is determined by

Innodb_undo_tablespaces configuration item Configuration.

Temporary table Spaces Temporary table Spaces created by users and disk-based internal temporary tables are created in temporary table Spaces.

The innodb_TEMP_DATA_FILe_PATH configuration item defines the associated path, name, size, and properties. If this value is null, an automatically expanded data file is created by default in the directory specified by the innodb_datA_home_DIR variable.

Redo logs Redo logs are disk-based data structures used during crash recovery to correct data. During normal operation,

Redo logs encode requests that alter InnoDB table data. After an unexpected crash, unfinished changes are automatically reperformed during initialization

Refer to the article

InnoDB storage engine (2nd edition) chapter 1 and chapter 2: InnoDB storage engine “Database index design and optimization” MySQL InnoDB features DoubleWrite InnoDB feature – DoubleWrite

MySQL8 Architecture Chapter 5- Storage Engine (Key InnoDB feature)