Refer to the Dark Horse Architecture course

Part One: Storage Engines

1.1 MySQL Architecture



As shown in the figure above, the MySQL server is organized into layers:

(Architecture diagram, top to bottom: Connection Pool · Management Services & Utilities · SQL Interface · Parser · Optimizer · Caches & Buffers · Pluggable Storage Engines · File System)

1. The connection layer

At the top are the client and connection services, including local socket communication and most client/server TCP/IP-based communication. This layer handles connection management, authorization, authentication, and the related security schemes. The concept of a thread pool is introduced here to provide threads for clients that have passed authentication. SSL-based secure connections can also be implemented at this layer. The server also verifies the operation permissions of every securely connected client.

2. The service layer

The second layer implements most of the core service functions, such as the SQL interface, query caching, SQL analysis and optimization, and the execution of built-in functions. All cross-storage-engine functionality, such as procedures and functions, is also implemented at this layer. Here the server parses the query, builds the corresponding internal parse tree, and performs optimizations such as determining the join order of tables and whether to use an index, finally generating the execution plan. For SELECT statements, the server also consults the query cache; if the cache is large enough, this can greatly improve performance in read-heavy environments.

3. The engine layer

The storage engine layer is what actually handles the storage and retrieval of data in MySQL; the server communicates with the storage engines through APIs. Different storage engines provide different capabilities, so we can select the appropriate engine for our own needs.

4. Storage layer

The data storage layer stores data on the file system and interacts with the storage engines.

Compared with other databases, MySQL's storage engine architecture is pluggable: query processing and other system tasks are separated from data storage and retrieval. This architecture lets you select the appropriate storage engine based on business and practical requirements.

1.2 Introduction to storage Engines

1. Overview You can select an optimal storage engine for different storage requirements. A storage engine is the implementation of how data is stored, how indexes are built, and how data is updated and queried. Storage engines are table-based, not database-based, so a storage engine can also be called a table type.

2. View the MySQL storage engine
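The engines available in a MySQL instance can be listed from the client; a minimal sketch (the table name `t_demo` is a placeholder, not from the text above):

```sql
-- List all storage engines known to the server, with their support status
-- (DEFAULT marks the default engine, typically InnoDB).
show engines;

-- Show which engine an existing table uses (t_demo is a placeholder name).
show create table t_demo;
```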

3. The two most common MySQL engines are InnoDB and MyISAM



MySQL uses InnoDB as its default storage engine
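As a sketch (table and column names are illustrative), the engine can be chosen explicitly when a table is created; if the ENGINE clause is omitted, the default InnoDB is used:

```sql
-- Explicitly request a storage engine at table creation time.
create table my_myisam (
    id   int,
    name varchar(10)
) engine = myisam;

-- Without an ENGINE clause, the table uses the default engine (InnoDB).
create table my_innodb (
    id   int,
    name varchar(10)
);
```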

Part Two: In-depth Analysis of the InnoDB Engine

2.1 InnoDB architecture



1. The buffer pool

Introduction

The InnoDB storage engine is based on disk file storage. To bridge the gap in I/O efficiency between accessing the physical disk and accessing memory, frequently used data is loaded into the buffer pool, so that not every access incurs disk I/O.

InnoDB’s buffer pool not only caches index and data pages, but also contains undo pages, insert caches, adaptive hash indexes, and InnoDB lock information.

Read: when a page is read, the page data read from disk is placed in the buffer pool. The next time the same page is needed, the system first checks whether it exists in the buffer pool; only if it does not is the page read from disk.

Update: for page modifications, the page in the buffer pool is modified first and then flushed to disk at a certain frequency, keeping the buffer pool and the disk consistent. A page is not flushed to disk on every update; for overall performance, flushing is driven by the Checkpoint mechanism.

Parameter configuration: on a dedicated database server, up to 80% of physical memory is typically allocated to the buffer pool. To check the setting:

show variables like 'innodb_buffer_pool_size';
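Beyond the configured size, the runtime state of the buffer pool can be inspected through status counters; a minimal sketch:

```sql
-- Total/data/free/dirty pages in the buffer pool (16KB each by default).
show status like 'Innodb_buffer_pool_pages%';

-- Read requests served from memory vs. reads that had to go to disk;
-- a high reads/read_requests ratio suggests the pool is too small.
show status like 'Innodb_buffer_pool_read%';
```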



The InnoDB engine allows multiple buffer pool instances; pages are assigned to instances by hash value, which reduces resource contention inside the database and improves concurrent processing capability. Parameter configuration:

vi /etc/my.cnf

innodb_buffer_pool_size=268435456
# number of buffer pool instances (illustrative value)
innodb_buffer_pool_instances=4

2. Background threads Master Thread: the core background thread, responsible for asynchronously flushing buffer pool data to disk to keep data consistent. Its work includes flushing dirty pages, merging insert buffers, and reclaiming undo pages.

IO Thread

The InnoDB storage engine makes extensive use of AIO to process I/O requests, which greatly improves database performance; the IO Threads are responsible for the callbacks of these I/O requests.


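The number of IO Threads handling read and write requests is configurable (four of each by default); a sketch of how to inspect them:

```sql
-- Number of background threads servicing read and write AIO requests.
show variables like 'innodb_%io_threads';

-- The FILE I/O section of the engine status lists each IO thread's state.
show engine innodb status\G
```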

Purge Thread

The Purge Thread reclaims the undo logs of committed transactions. Once a transaction has committed, its undo log may no longer be needed, so this thread reclaims it.

The Page Cleaner Thread is a newer thread that assists the Master Thread in flushing dirty pages to disk; it relieves the Master Thread's workload and reduces blocking.

3. Files frm file: stores the metadata of each table, including the table structure definition.

System table space

The system tablespace is the storage area for the InnoDB data dictionary, the doublewrite buffer, the change buffer, and undo logs. A system tablespace can have one or more data files. By default, a tablespace data file named ibdata1 is created in the data directory. The file name can be specified with the innodb_data_file_path parameter:

file_name:file_size[:autoextend[:max:max_file_size]]
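For example, a my.cnf entry following the syntax above (this sketch matches the common default layout of one autoextending 12MB file):

```ini
[mysqld]
# One 12MB data file that grows automatically as needed.
innodb_data_file_path=ibdata1:12M:autoextend
```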

File-per-table tablespaces: if innodb_file_per_table is set to 1/ON, each InnoDB table stores its data and index data in its own tablespace, so a separate tablespace file (.ibd) is generated per table.
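Whether file-per-table mode is active can be checked, and toggled for subsequently created tables; a minimal sketch:

```sql
-- ON means each new table gets its own .ibd tablespace file.
show variables like 'innodb_file_per_table';

-- Affects tables created after the change, not existing ones.
set global innodb_file_per_table = on;
```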

Redo log: used to replay the page changes of committed transactions, guaranteeing transaction durability. It mainly solves the case where a transaction has committed but its changes were not yet fully written to the data pages when the database crashed; during recovery, the data can be restored from the redo log. The InnoDB storage engine first writes redo records into the redo log buffer, then flushes the buffer to the redo log files according to various strategies and frequencies. The redo log files are named ib_logfile0 and ib_logfile1.
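The redo log's size, file count, and flush-on-commit behavior are controlled by a few variables; a sketch of how to inspect them:

```sql
-- Size of each redo log file and the number of files in the group.
show variables like 'innodb_log_file_size';
show variables like 'innodb_log_files_in_group';

-- 1 (default) flushes the log buffer to disk at every commit (safest);
-- 0 and 2 trade some durability for speed.
show variables like 'innodb_flush_log_at_trx_commit';
```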

Binlog: the binary log, which records statements that change the database, including both DDL and DML.
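Whether binary logging is enabled, and which log files are currently on disk, can be checked as follows (a sketch; binary logging must be enabled in the server configuration):

```sql
-- OFF unless the server was started with binary logging enabled.
show variables like 'log_bin';

-- Format of recorded changes: STATEMENT, ROW, or MIXED.
show variables like 'binlog_format';

-- The binary log files currently kept by the server.
show binary logs;
```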

Others: the error log, the general query log, and the slow query log.

2.2 InnoDB Logical Storage Structure



1. The table space

The tablespace is the highest level of InnoDB's logical structure, and most data resides in the shared tablespace ibdata1. If innodb_file_per_table is enabled, each table has its own tablespace (xxx.ibd) holding that table's data, indexes, and insert buffer bitmap pages. Other data, such as undo logs, insert buffer index pages, system transaction information, and the doublewrite buffer, remains in the shared tablespace.

2. Segments A tablespace consists of segments, such as data segments, index segments, and rollback segments. The InnoDB storage engine is index-organized, so the data is the index and the index is the data: the data segment holds the leaf nodes of the B+ tree, and the index segment holds the non-leaf nodes. Segment management is handled by the engine itself; there is no need to manage it manually.

3. Extents An extent is a unit structure of the tablespace; each extent is 1MB in size. With InnoDB's default page size of 16KB, an extent contains 64 consecutive pages.

4. Pages The page is the smallest component of an extent and the smallest unit of disk management in the InnoDB storage engine; each page is 16KB by default. To ensure page continuity, the InnoDB storage engine requests 4-5 extents from disk at a time.

5. Rows The InnoDB storage engine is row-oriented, that is, data is stored by rows. Each page can store at most 16KB/2 - 200 = 7992 rows.

Each row also contains hidden columns. trx_id: every time a clustered index record is changed, the corresponding transaction ID is written into the trx_id hidden column. roll_pointer: every time a clustered index record is changed, the old version is written to the undo log, and this hidden column serves as a pointer to the previous version of the record.

2.3 Checkpoint

1. Introduction During everyday DML operations, the buffer pool is modified first rather than writing directly to disk, which can leave the data in memory temporarily inconsistent with the data on disk. A Checkpoint is the act of flushing dirty pages from memory to disk when certain conditions are met.

If a crash occurs while flushing page data from the buffer pool to disk, the data could not be recovered; to avoid this, the Write Ahead Log (WAL) strategy is used. When a transaction commits, the redo log is written first, then the buffer pool data page is modified, and the data is flushed to disk later by the Checkpoint mechanism. This way, committed transactions can be recovered thanks to the log, while transactions that were never logged had not yet executed, so no data is lost.

2. Functions A. Shorten database recovery time: when the database crashes, it does not need to redo all the logs, because the pages before the Checkpoint have already been flushed to disk. It only needs to redo the logs after the Checkpoint, which greatly reduces recovery time.

B. Flush dirty pages when the buffer pool is insufficient: when the buffer pool runs short, the LRU algorithm evicts the least recently used pages; if an evicted page is dirty, a Checkpoint is forced to flush it to disk.

C. Flush dirty pages to disk when redo logs are unavailable.

The redo log has a fixed size; in the current InnoDB engine it is designed to be used cyclically, not to grow without bound. The reusable portion of the redo log is the part that is no longer needed and can be overwritten at any time. If a portion is still needed, a Checkpoint must be forced, flushing at least enough pages from the buffer pool to disk so that the checkpoint can advance to the current position in the redo log.



write pos is the position where logging currently takes place; when ib_logfile1 is full, logging wraps around to ib_logfile0. checkpoint marks the position up to which the logged changes have been flushed to disk; once the data has been flushed, the log records before the checkpoint can be discarded. The space from write pos to checkpoint is the empty part of the redo log, available for new records; the space from checkpoint to write pos holds the change records not yet flushed to disk. When write pos catches up with checkpoint, logging stops until the checkpoint is pushed forward to free space for new logs.

A. Sharp Checkpoint occurs when the database shuts down: all dirty pages are flushed back to disk. This is the default behavior at shutdown.

B. Fuzzy Checkpoint is used while the InnoDB storage engine is running; only some of the dirty pages are flushed at a time.

2.4 InnoDB Key Features

1. Insert buffer The primary key is the unique identifier of a row. In applications, rows are generally inserted in increasing primary key order, so clustered index inserts are mostly sequential and do not require random disk reads; such inserts are therefore usually fast.

However, a table rarely has only the clustered index; more often it also has several non-unique secondary indexes. For example, if we also need to search by a name column that is not unique, we get a non-clustered, non-unique index. On insert, data pages are still written in primary key order, but leaf-node inserts into the non-clustered index are no longer sequential. The non-clustered index pages must then be accessed discretely, and insert performance drops. This is not a defect of the index on the name column; the nature of the B+ tree dictates the discreteness of non-clustered index inserts.

For this, the InnoDB storage engine innovatively designed the insert buffer. An insert or update of a non-clustered index is not applied to the index page directly each time; instead, InnoDB first checks whether the target non-clustered index page is in the buffer pool. If it is, the change is applied directly; if not, the change is first placed in the insert buffer, as if tricking the database into believing the non-clustered index leaf node has already been updated. Then, at a certain frequency, the insert buffer is merged with the actual non-clustered index leaf nodes, at which point multiple changes can usually be merged into a single operation (because they fall in the same index page). This greatly improves the performance of inserts and updates on non-clustered indexes.
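In recent MySQL versions the insert buffer has been generalized into the change buffer, which also caches deletes and purges on secondary indexes; its behavior is tunable (a sketch):

```sql
-- Which secondary-index operations are buffered:
-- all (default), none, inserts, deletes, changes, purges.
show variables like 'innodb_change_buffering';

-- Maximum share of the buffer pool the change buffer may use (percent).
show variables like 'innodb_change_buffer_max_size';
```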

2. Doublewrite (write twice) When the database is writing a physical page and crashes partway through, the consistency of that physical page can be broken.

One might ask: isn't the redo log exactly the mechanism for recovering physical pages? Yes, but only if the physical page itself is consistent. That is, if the physical page is entirely in its pre-write state, it can be recovered with the redo log; if it was entirely written, it can also be recovered with the redo log. But if the first 2KB of the page holds new data while the rest still holds old data (a partial page write), the redo log cannot be used for recovery.

Writing twice ensures the physical page stays consistent, so recovery with the redo log remains possible even after a crash. A page is not written directly to its actual location; it is first written to a temporary area, and only after that write completes is it written to the real physical page. As a result:

A. If the crash happens while writing to the temporary area, the physical page is still intact and can be recovered with the redo log.
B. If the crash happens while writing the physical page, the copy in the temporary area can be used to restore the physical page.

Concretely, each page is first written to the doublewrite buffer in memory, then from the doublewrite buffer to the doublewrite area on disk, and finally written to its actual location in the data file.
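Doublewrite is enabled by default; a sketch of how to confirm it and observe its activity:

```sql
-- ON by default; disabling it risks torn pages on crash.
show variables like 'innodb_doublewrite';

-- Pages and writes that went through the doublewrite buffer.
show status like 'Innodb_dblwr%';
```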

3. Adaptive hash index InnoDB uses the B+ tree index structure by default. A B+ tree index supports range scans and keeps data in order, which makes sorting easy. A hash index can only serve =, in lookups; it cannot serve range queries, and its data is stored unordered.

MySQL uses B+ trees as indexes by default because B+ trees have advantages that Hash indexes do not, so why do you need adaptive Hash indexes?

This is because the number of lookups in a B+ tree depends on its height, which in a production environment is typically 3-4 levels, so a lookup needs 3-4 page accesses. A hash index, by contrast, is very efficient for point lookups, usually needing only O(1) complexity, i.e. a single access. Although hash indexes have many usage restrictions, their advantage is obvious. The InnoDB storage engine monitors queries on each index page of a table and, if it observes that a hash index would improve the speed, builds one automatically; this is called the Adaptive Hash Index (AHI).

Note that by adaptive, we mean that there is no need for manual specification, and the system will do it automatically.

When is an adaptive hash index built? If certain data is accessed frequently and certain conditions are met, the address of its data page is stored in a hash table, so the next query can locate the page directly. Note again that the hash index can serve only =, in lookups, not other query types such as range matches. The adaptive hash index also stores only hot data (frequently accessed data), not the whole table, so its volume is not large; it is kept in the buffer pool, which further improves lookup efficiency.
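AHI is on by default and can be toggled at runtime; a minimal sketch:

```sql
-- ON by default; workloads with heavy contention sometimes disable it.
show variables like 'innodb_adaptive_hash_index';

-- Can be switched off dynamically if monitoring shows it does not help.
set global innodb_adaptive_hash_index = off;
```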

4. Asynchronous IO To improve disk performance, the InnoDB storage engine uses asynchronous, non-blocking I/O (AIO) to access the disk.

The counterpart of AIO is Sync IO: with synchronous I/O, each operation must complete before the next one can start. But if a query such as an index scan needs to read multiple index pages, i.e. perform multiple I/O operations, it is unnecessary to wait for each page before requesting the next.

With AIO, after issuing one I/O request you can immediately issue the next; once all requests have been issued, you wait for all of them to complete. This is AIO.

InnoDB also provides flushing of neighbor pages: when flushing a dirty page, it checks all pages in the same extent and flushes any other dirty pages along with it. The advantage is that write I/O requests can be merged via AIO; the disadvantage is that barely dirty pages get written, and if they are modified again soon after, they quickly become dirty again, causing frequent flushing. On solid-state disks, consider turning this off (set innodb_flush_neighbors to 0).
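A sketch of checking and disabling neighbor-page flushing on SSD-backed servers:

```sql
-- 0: off; 1: flush contiguous dirty neighbors in the same extent;
-- 2: flush all dirty pages in the extent, even non-contiguous ones.
show variables like 'innodb_flush_neighbors';

-- Recommended for SSDs, where sequential-write locality matters little.
set global innodb_flush_neighbors = 0;
```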

2.5 InnoDB Transactions

A transaction is a program execution unit that accesses and updates data items in a database; it may consist of a single SQL statement or a complex group of SQL statements. Within a transaction, this group of statements either all succeed or all fail.

1. Transactions have four properties (ACID): Atomicity, Consistency, Isolation, and Durability.

2. Isolation level

Problems with concurrent transactions: dirty writes, dirty reads, non-repeatable reads, and phantom reads.



To solve the concurrency problems above, databases provide a transaction isolation mechanism. The stricter the isolation, the fewer concurrency side effects, but the higher the cost, because isolation essentially "serializes" transactions to some degree, which is obviously at odds with "concurrency."

The isolation levels of a database, from lowest to highest, are Read Uncommitted, Read Committed, Repeatable Read, and Serializable. These four levels successively solve the problems of dirty writes, dirty reads, non-repeatable reads, and phantom reads.
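The isolation level can be inspected and changed per session; a minimal sketch (the variable is named transaction_isolation in MySQL 8.0, tx_isolation in older versions):

```sql
-- Current isolation level (REPEATABLE-READ is the InnoDB default).
select @@transaction_isolation;

-- Change the level for the current session only.
set session transaction isolation level read committed;
```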

3. Implementation

1). redo log

Redo logs guarantee transaction durability. The log consists of two parts: the redo log buffer in memory and the redo log file on disk. After a transaction commits, all the modification information is stored in this log; it is used for data recovery when an error occurs while flushing dirty pages to disk. Example:



Perform transaction operations

start transaction;
select balance from bank where name = 'Tom';
-- create redo log: balance = 8000
update bank set balance = balance - 2000;
update finance set account = account + 2000;
commit;

process



To improve performance, MySQL does not synchronize every change to disk in real time. Instead, changes are made in the Buffer Pool, which acts as a cache, and background threads later flush the cached pages to disk.

Because flushing happens later, data could be lost on a crash or power failure before the flush completes. The redo log was therefore introduced: it records the changes of successfully committed transactions and is persisted to disk at transaction commit; after a restart, the redo log is read to restore the latest data.

Simply put, redo logs are used to recover data and to protect the persistence of committed transactions;

2). undo log

The undo log, also called the rollback log, records the data as it was before modification, exactly the opposite of the redo log, which records the data after modification. The undo log records logical changes: to be able to roll back previous operations, every operation is recorded so that it can be undone when an error occurs or a rollback is requested.



The undo log keeps the pre-modification version of the data, so if a transaction is rolled back, whether due to a system error or an explicit rollback, the undo log information can be used to restore the data to its state before the modification.
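The rollback path can be observed directly in the client; a sketch using an illustrative table (account is a hypothetical table, not from the text above):

```sql
-- Hypothetical demo table.
create table account (id int primary key, balance int);
insert into account values (1, 1000);

start transaction;
update account set balance = balance - 200 where id = 1;
-- The pre-image (balance = 1000) is kept in the undo log.
rollback;

-- After the rollback, the undo information has restored the original value:
select balance from account where id = 1;  -- 1000
```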

Part Three: Storage Engine Application Scenarios

When selecting a storage engine, select an appropriate one based on the characteristics of the application system. For complex application systems, you can combine multiple storage engines based on actual conditions. The following are some common storage engine environments.

InnoDB: the default MySQL storage engine, used for transactional applications; it supports foreign keys and row-level locks. If an application has high requirements for transaction integrity and for data consistency under concurrency, and its data operations include many updates and deletes in addition to inserts and queries, InnoDB is a good choice: it effectively reduces locking caused by deletes and updates, and it guarantees complete commit and rollback of transactions. In an e-commerce system, InnoDB is the most suitable choice for storing items (SPU, SKU, category, brand), orders, users, and similar data.

MyISAM: if an application is dominated by reads and inserts, with few updates and deletes, and its requirements for transaction integrity and concurrency are not high, this engine is a very good fit. In an e-commerce system, MyISAM is a suitable choice for storing operation logs, user comments, browsing footprints, and similar data.