The steps that a query statement goes through

This time we will look at the execution process of a SQL statement from the perspective of MySQL's overall architecture, as shown in the following figure:


The architecture is divided into two parts: the Server layer and the storage engine layer. Here I use InnoDB to represent the engine layer. Storage engines are designed as plug-ins and can be swapped freely.

Server layer

  1. Connector: establishes the connection with the client, checks permissions, and maintains and manages the connection;
  2. Query cache: the server's query cache. If a query hits the cache, the cached result set is returned directly, skipping parsing, optimization, and execution;
  3. Parser: builds a parse tree from the query statement and checks the statement against the syntax rules, for example whether the SQL keywords are correct and in the correct order;
  4. Optimizer: converts the parse tree into a query plan. A query can generally be executed in many ways that all return the same result; the optimizer finds the lowest-cost execution plan (see the EXPLAIN sketch below);
  5. Executor: runs the execution plan by calling the query execution engine through a series of APIs.
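
As a minimal illustration, EXPLAIN asks the optimizer for the plan it would choose, without executing the query. The table here is hypothetical, created only for the example:

    -- Hypothetical table used only for this example.
    CREATE TABLE t (id INT PRIMARY KEY, a INT, KEY idx_a (a)) ENGINE=InnoDB;

    -- Shows the chosen plan, e.g. that the idx_a index is used.
    EXPLAIN SELECT * FROM t WHERE a = 1;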

InnoDB

  1. Background threads: refresh the data in the memory pool so that the cached pages are up to date, flush modified data to the disk files, and make sure the database can recover to a consistent state after an abnormal shutdown;
  2. Memory pool: the pool, also called the buffer pool, mainly compensates for the disk being much slower than memory. On a query, page data is first read from disk into the memory pool, and subsequent reads of the same page are served directly from the pool; on a modification, the data is changed in the memory pool first, and background threads then flush it to disk at a certain rate;
  3. Files: mainly tablespace files, plus some log files.

The overall MySQL 5.6 architecture, with the details of the memory pool, files, and background threads, is not introduced here; we will bring out those details in later articles. A picture of the MySQL 5.6 overall architecture is attached.
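
If you want to glance at these components on a live server, the InnoDB status report prints sections for the buffer pool, background threads, I/O, and logs (a sketch; \G is the mysql client's vertical-output terminator):

    -- One report with BUFFER POOL AND MEMORY, LOG, and
    -- BACKGROUND THREAD sections, among others.
    SHOW ENGINE INNODB STATUS\G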

How does InnoDB store data

This part builds on the previous one: to understand how InnoDB is composed, we need to go into the details of the files, the memory pool, and the background threads. We will explain each of the three in turn:

Files

The files are divided into two parts: storage files and log files:

Storage files

The storage files hold the table data itself. The overall storage structure is shown below:


Tablespaces come in two kinds of files: a shared tablespace and a separate tablespace for each table. A separate tablespace stores the data and indexes of one table, while the shared tablespace mainly stores transaction information and rollback information. Tablespaces are made up of segments, extents, pages, and rows.
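
Whether each table gets its own tablespace file is controlled by innodb_file_per_table, and the known tablespaces can be listed from information_schema (a sketch; the view is named INNODB_TABLESPACES in MySQL 8.0):

    -- ON means every table gets its own .ibd file.
    SHOW VARIABLES LIKE 'innodb_file_per_table';

    -- Tablespaces InnoDB currently knows about (MySQL 5.6/5.7 name).
    SELECT SPACE, NAME FROM INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES LIMIT 5;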

  1. Segment

    Common segments include the data segment, index segment, and rollback segment. The data segment holds the leaf nodes of the B+ tree, and the index segment holds the non-leaf nodes of the B+ tree, as shown in the diagram below:



    Each time an index is created, an index segment is created along with it, and the leaf nodes of the index segment point into the data segment; it is through this combination that we find data when querying. Therefore, the more indexes are created, the more index segments must be built and maintained, which increases data insertion time.

  2. Extent: the extent is the basic element that makes up a segment; a segment consists of several extents. An extent is a piece of physically contiguous allocated space. If an extent cannot hold more data, a new extent is allocated to store the new data. The amount of space a segment manages is unbounded and can grow indefinitely, but the smallest unit of growth is the extent. Each extent is fixed at 1MB in size and consists of pages. InnoDB usually requests 4 to 5 extents from disk at a time to ensure that the pages within an extent are contiguous. With the default page size of 16KB, an extent consists of 64 consecutive pages.
  3. Page: a page is the basic unit of an extent and the smallest unit of InnoDB disk management; pages are contiguous both logically (page numbers increase consecutively) and physically. When data is inserted into a table and the current page is full, the system allocates a new free page from the current extent; once all 64 pages of the current extent have been used up, a new extent is allocated for the segment, and pages are then allocated from that new extent.
  4. Row: InnoDB stores data in rows. Each page can hold at most 16KB of data; when a row is larger than that, the row overflows and the excess is stored in an external page (Uncompressed BLOB Page). A small inspection sketch follows this list.
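
To make the hierarchy concrete, you can check the page size and see what kinds of pages InnoDB is currently holding (a sketch using standard information_schema views; the second query can be slow on a large buffer pool):

    -- The fixed page size, 16KB by default.
    SHOW VARIABLES LIKE 'innodb_page_size';

    -- Cached pages by type; INDEX pages carry the B+ tree
    -- leaf (data) and non-leaf (index) records.
    SELECT PAGE_TYPE, COUNT(*) AS pages
    FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE
    GROUP BY PAGE_TYPE;
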
Log files

There are three log files: the binlog, the redo log, and the undo log.

binlog

The binlog records the write operations performed by the database (it does not record queries) and saves them on disk in binary format. The binlog is MySQL's logical log and is written by the Server layer, so it is recorded no matter which storage engine the database uses. The size of each binlog file is set with the max_binlog_size parameter; when a file reaches the specified size, a new file is created to continue storing the log.
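
You can confirm the binlog is enabled and watch the files rotate (a sketch; SHOW BINARY LOGS requires the binlog to be on):

    -- Is the binlog enabled, and at what size do files rotate?
    SHOW VARIABLES LIKE 'log_bin';
    SHOW VARIABLES LIKE 'max_binlog_size';

    -- The binlog files currently on disk.
    SHOW BINARY LOGS;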

Binlog log formats
  1. ROW: row-based replication. It does not record the context of each SQL statement, only which rows were modified. Advantages: none of the cases where stored procedures, functions, or trigger calls cannot be replicated correctly. Disadvantages: because every modified row must be logged, the log volume can skyrocket;
  2. STATEMENT: statement-based replication. Every SQL statement that modifies data is recorded in the binlog. Advantages: no need to record each row change, which reduces the binlog volume, saves I/O, and improves performance. Disadvantages: can cause master/slave data inconsistency in some cases, for example when non-deterministic functions such as sysdate() are used;
  3. MIXED: replication based on a mix of the STATEMENT and ROW modes. In general, STATEMENT mode is used to write the binlog for replication; for operations that STATEMENT mode cannot replicate safely, ROW mode is used instead (a sketch for checking and switching the format follows below).
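
The format is exposed as an ordinary server variable (a minimal sketch; changing it affects replication, so do so with care):

    -- Check and switch the binlog format.
    SHOW VARIABLES LIKE 'binlog_format';
    SET GLOBAL binlog_format = 'ROW';
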
Usage scenarios

Binlog can be used in two scenarios: master/slave replication and data recovery.

  1. For Master/Slave replication, enable binlog on the Master and then send the binlog to each Slave. The Slave replays the binlog to achieve data consistency between the Master and Slave.
  2. For data recovery, use the mysqlbinlog tool to replay the binlog and restore data to a point in time (you can first inspect a file's events, as sketched below).
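
Before replaying a binlog you can inspect its events from SQL (a sketch; the file name is only an example, use one listed by SHOW BINARY LOGS):

    -- Peek at the first events recorded in a binlog file.
    SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 10;
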
Flush timing

For the InnoDB storage engine, the binlog is only recorded when a transaction commits, and at that point the records are still in memory. MySQL controls the flush timing of the binlog with the sync_binlog parameter, whose value ranges from 0 to N: N means the binlog is flushed to disk every N commits, while 0 leaves the flush timing to the operating system. Setting innodb_support_xa to 1 ensures that the binlog and InnoDB's redo log are kept in sync through two-phase commit.
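
A minimal sketch of the durability knob (sync_binlog = 1 is the safest setting):

    -- 1 = flush the binlog on every commit; N = every N commits;
    -- 0 = let the operating system decide when to flush.
    SHOW VARIABLES LIKE 'sync_binlog';
    SET GLOBAL sync_binlog = 1;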

redo log

The redo log consists of two parts: the redo log buffer and the redo log file. The redo log buffer lives in memory and the redo log file on disk; writes go to the redo log buffer first and reach the redo log file when certain conditions are met. So when is the buffer written to the file?

  1. The Master Thread among InnoDB's background threads flushes the redo log buffer to disk once per second.
  2. The innodb_flush_log_at_trx_commit parameter controls the flush timing. When set to 1, every transaction commit writes the log buffer to the OS buffer and calls fsync() to persist it to the log file on disk; no data is lost even if the system crashes, but I/O performance is poor because every commit writes to disk. When set to 0, the log buffer is not written at commit time; instead it is written to the OS buffer once per second, and fsync() is called to persist it to the log file, so roughly one second of data can be lost if the system crashes. When set to 2, every commit writes only to the OS buffer, and fsync() is called once per second to persist the OS buffer to the log file on disk (the knob is sketched below).
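
A minimal sketch of the trade-off (1 is the fully durable setting):

    -- 1 = fsync on every commit; 0/2 = up to ~1 second of
    -- log can be lost if the system crashes.
    SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
    SET GLOBAL innodb_flush_log_at_trx_commit = 1;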


Redo log format

The redo log records changes to data pages. The redo log files are fixed in size and written in a circular fashion: when the write reaches the end, it wraps around to the beginning, essentially forming a loop.


The redo log is designed to reduce how often data pages must be flushed to the actual data files. The relationship between the data files, the redo log, and the checkpoint looks like this:


The ring in the figure consists of 4 ib_logfile_* files; these are the redo log files. The number of files is controlled by the innodb_log_files_in_group parameter and their size by innodb_log_file_size. Do not set the size too large, or crash recovery will be slow; do not set it too small either, or a single transaction may have to switch log files several times.
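
The sizing knobs are ordinary server variables, though not dynamic (a sketch; they are set in my.cnf and take effect after a restart):

    -- Size of each redo log file, and how many are in the group.
    SHOW VARIABLES LIKE 'innodb_log_file_size';
    SHOW VARIABLES LIKE 'innodb_log_files_in_group';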


Write pos and checkpoint: the space between write pos and the checkpoint is empty redo log; the space between the checkpoint and write pos holds the data page changes recorded in the redo log. When write pos catches up with the checkpoint, the checkpoint must be moved forward to make room for new logs.


When InnoDB starts, it attempts to recover the database regardless of whether the last shutdown was clean. There are two cases:

  1. The checkpoint records the LSN up to which the logged changes have been flushed to the data pages on disk; the LSN is essentially the total number of log bytes written. Only the log from the checkpoint onward needs to be recovered, and only for committed transactions. When the database starts, the LSN of the data pages on disk is checked: if a page's LSN is smaller than the LSN in the log, recovery replays the log starting from the checkpoint.
  2. Data pages may be flushed faster than the log. After a crash, the LSN recorded in a data page may be larger than the LSN recorded in the log; this is detected during recovery at restart. In that case the portion that is ahead of the log's progress is not redone, because that work has already been done.

Because the redo log records the physical changes to data pages, recovery with it is much faster than with logical logs such as the binlog.

Usage scenarios

The redo log ensures the durability of transactions. It records the state after each change made by a transaction and is used to recover the updates of committed transactions that had not yet been written to the data files. If dirty pages have not been written to disk when a failure occurs, MySQL replays the redo log when the service restarts, thereby achieving transaction durability.

undo log

The undo log records the logical changes of data; it supports both the rollback of user transactions and MVCC. Undo logs are stored as rollback segments in the shared tablespace.

Undo log format

The undo log is a logical log: when a transaction is rolled back, it logically restores the data to its state before the transaction. For example, the undo record for an INSERT is logically a DELETE, and the undo record for an UPDATE is the reverse UPDATE.

Usage scenarios

The undo log guarantees atomicity: the version of the data from before the transaction can be used for rollback. It also provides multi-version concurrency control (MVCC), also known as non-locking reads.

Cleanup timing

After a transaction commits, its undo log cannot be deleted immediately, because other transactions may still need the earlier versions it records. Instead it is placed on a linked list of undo logs to be cleaned, and the Purge thread decides whether the undo log space can actually be reclaimed by checking whether any other transaction still uses version information older than the last transaction in the undo segment.

Memory pool

The InnoDB storage engine is disk-based, meaning the data is stored on disk. Because of the gap between CPU speed and disk speed, InnoDB uses buffer pool technology to improve the overall performance of the database. A memory pool is simply an area of memory. When a page is read, the page fetched from disk is stored in the memory pool; the next time the same page is read, the system checks whether it is in the memory pool and, if so, reads it directly, otherwise it reads the page from disk. When a page is changed, it is first modified in the memory pool and then flushed to disk at a certain frequency, rather than being flushed back on every change.


The information cached in the memory pool includes index pages, data pages, the insert buffer, the adaptive hash index, lock info, and data dictionary information. Index pages and data pages make up a large portion of the buffer pool. In InnoDB, the default page size in the memory pool is 16KB, the same as the default page size on disk. Having introduced the storage structure of the data files, you should already have some sense of what these cached structures contain, so we will not introduce them separately; instead we will focus on the insert buffer and the adaptive hash index, and then on the design principles of the memory pool itself.
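
A quick look at what the pool is holding (a sketch using the standard INNODB_BUFFER_POOL_STATS view):

    -- Total pages, free pages, cached data pages, and dirty pages.
    SELECT POOL_SIZE, FREE_BUFFERS, DATABASE_PAGES, MODIFIED_DATABASE_PAGES
    FROM INFORMATION_SCHEMA.INNODB_BUFFER_POOL_STATS;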

Insert Buffer

The insert buffer works as follows: when a record is inserted, InnoDB checks whether the non-clustered index page it belongs to is in the buffer pool. If it is, the insert is applied directly; if not, the change is first put into an Insert Buffer object, as though the insert had already reached the leaf node of the non-clustered index, while it is actually stored elsewhere. The insert buffer is then merged into the secondary index pages at a certain frequency and under certain conditions, which usually improves performance for non-clustered index inserts. The downside is that if the MySQL database goes down while a large insert buffer has not yet been merged into the non-clustered index pages, recovery can take a long time. Two conditions must be met for the insert buffer to be used: the index is a non-clustered index, and the index is not unique. We will talk about the implementation next time.
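
The mechanism is exposed through the change buffering setting, its modern and more general name (a sketch; 'all' also buffers deletes and purges):

    -- none / inserts / deletes / changes / purges / all
    SHOW VARIABLES LIKE 'innodb_change_buffering';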

Adaptive hash index

The InnoDB storage engine monitors queries against the indexes on its tables. If it observes that building a hash index would improve speed, it builds one automatically; such an index is called an adaptive hash index (AHI). The AHI is constructed from the B+ tree pages already in the buffer pool, so it is built very quickly, and a hash index is not built over the entire table: InnoDB automatically hashes only certain hot pages, based on the frequency and pattern of access.
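
The AHI can be checked and toggled at runtime (a sketch; SHOW ENGINE INNODB STATUS also reports hash searches per second):

    -- On by default; can be switched off if it hurts your workload.
    SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';
    SET GLOBAL innodb_adaptive_hash_index = ON;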

Background threads

Master Thread

This is the core thread. It is mainly responsible for asynchronously flushing data from the buffer pool to disk to keep the data consistent, including flushing dirty pages and merging the insert buffer.

IO Thread

The InnoDB storage engine makes heavy use of asynchronous I/O (AIO) to handle I/O requests. The IO threads are mainly responsible for the callback processing of these I/O requests.

Purge Thread

After a transaction commits, its undo log may no longer be needed, so the Purge Thread reclaims the undo pages that have been allocated and used. InnoDB supports multiple Purge Threads to speed up the recycling of undo pages.

With the overall features introduced, let's talk about how data is inserted into the InnoDB engine. Assume the following scenario:

    create table T(id int primary key, a int not null, name varchar(16), index(a)) engine=InnoDB;
    insert into T (id,a,name) values (id1,a1,'hahaha'),(id2,a2,'hahaha');

There are two possible cases when inserting the data. The first case assumes that the page for id1 is in the memory pool:

  1. Update the index page and data page in the Buffer Pool;
  2. Write the redo log, which enters the prepare state;
  3. Write the binlog;
  4. Commit the transaction; the redo log enters the commit state (two-phase commit);
  5. Background threads write the changes to the index segment and data segment of the data file.

The second case assumes that the page for id2 is not in the memory pool:

  1. The data is written into the memory pool: the non-clustered index change goes to the Insert Buffer, and the rest of the data to the data page;
  2. The subsequent steps are the same as the remaining steps above.

Further reading

Let’s talk about how a Buffer Pool works, from the following three aspects:

  1. How are cached pages managed?

    InnoDB creates control information for each cached page, including the page's tablespace number, page number, address in the Buffer Pool, LSN, and other information. The control information for each cached page occupies the same amount of memory; let's call the memory occupied by one page's control information a control block. Control blocks and cache pages correspond one to one, and both are stored in the Buffer Pool: the control blocks at the front and the cache pages at the back. So the entire Buffer Pool looks like this:



    The fragment in the figure is simply leftover memory that is too small to hold another control block and cache page pair.

    When the MySQL server first starts, the Buffer Pool must be initialized: its memory is allocated and divided into control blocks and cache pages. At this point no real disk pages are cached yet; pages accumulate in the Buffer Pool as the server runs. This raises a question: how do we tell which cache pages in the Buffer Pool are free and which are in use? We had better keep track of the free pages somewhere. We can wrap each free page as a node and link the nodes together, forming what is called the Free list. Since all cache pages in a newly initialized Buffer Pool are free, every cache page starts out on the Free list. The overall design is as follows:


    As the figure shows, the Free list records the address of its first node, the address of its last node, and the number of nodes currently in the list. Each Free list node records the address of a cache page's control block, and each control block records the address of its cache page, so each Free list node effectively corresponds to one free cache page.

    Whenever a page needs to be loaded from disk into the Buffer Pool, a free cache page is taken from the Free list, the corresponding control block information is filled in, and the page's Free list node is removed from the list, indicating that the cache page is now in use.

  2. How are pages evicted from the cache? The machine's memory is limited, so the size of the MySQL InnoDB Buffer Pool is also limited. If the pages that need caching exceed the size of the Buffer Pool, InnoDB evicts pages with the classic LRU algorithm in order to keep the cache hit ratio high. When there are no free pages left in the Buffer Pool, some of the least recently used pages must be evicted. When we need to access a page, the LRU list is handled as follows: 1. if the page is not in the Buffer Pool, the cache page it is loaded into is wrapped as a node and inserted at the head of the list; 2. if the page is already in the Buffer Pool, its LRU node is moved directly to the head of the list. Plain LRU has performance problems, however: a single full table scan or logical backup can flush out the hot data and pollute the buffer pool! All the data pages in the Buffer Pool are replaced once, and other queries have to load their pages from disk all over again, even though the full table scan statement is executed rarely; this severely affects other queries' use of the Buffer Pool and lowers the cache hit ratio. For this scenario the InnoDB storage engine optimizes the traditional LRU algorithm by adding a midpoint. A newly read page, although it is the most recently accessed, is not inserted directly at the head of the LRU list but at the midpoint of the list. This is called the midpoint insertion strategy. By default the midpoint sits at 5/8 of the list length, controlled by the innodb_old_blocks_pct parameter. The part of the list before the midpoint is called the new list and the part after it the old list; pages in the new list can simply be thought of as the most active hot data (see the sketch after this list).
  3. How are dirty pages flushed? If an update is made in the cache pool first, the cached page no longer matches the page on disk; such cached pages are called dirty pages. So when are these modified pages flushed to disk? The simplest approach would be to synchronize every change to the corresponding disk page immediately, but writing to disk that frequently would seriously hurt performance. So rather than rushing each modification to disk, the changes are synchronized at some later point, when a background refresh thread flushes them in turn. But if we do not synchronize immediately, how do we know, when the time comes, which pages in the Buffer Pool are dirty and which were never modified? We need a list of dirty pages: every page in the LRU list that has been modified is added to it. The list is called the Flush list because all pages on it must eventually be flushed to disk. A page is added to the Flush list the first time it is modified after being loaded into the Buffer Pool; later modifications do not add it again, because it is already there. Note that the dirty page data itself still lives in the LRU list; the entries in the Flush list are pointers to the dirty pages in the LRU list.
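
Both mechanisms can be observed from SQL (a sketch using the standard information_schema views; scanning the buffer pool tables can be slow on a large pool):

    -- LRU knobs: the share of the list given to the old list
    -- (default 37%, i.e. the midpoint at roughly 5/8) and how long
    -- a page must stay there before being promoted to the new list.
    SHOW VARIABLES LIKE 'innodb_old_blocks_pct';
    SHOW VARIABLES LIKE 'innodb_old_blocks_time';

    -- Walk the LRU list from the head; OLDEST_MODIFICATION > 0
    -- marks a dirty page, i.e. one that is also on the Flush list.
    SELECT TABLE_NAME, PAGE_TYPE, LRU_POSITION,
           OLDEST_MODIFICATION > 0 AS is_dirty
    FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE_LRU
    ORDER BY LRU_POSITION
    LIMIT 10;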

The end

If this was helpful, a follow and a like are appreciated. Thank you!