1. Common storage engines

1.1 InnoDB

InnoDB has been the default storage engine since MySQL 5.5. It is known for high reliability and high performance, and has the following advantages:

  • DML operations fully follow the ACID model: transactions and crash recovery are supported, which strongly protects user data;
  • Multi-version concurrency control (MVCC) is supported: old versions of data are retained in order to support concurrent access and transaction rollback;
  • Row-level locking and Oracle-style consistent reads are supported, allowing the engine to withstand highly concurrent access;
  • InnoDB organizes data in a clustered index on the primary key by default, which makes primary-key lookups very efficient. For frequently accessed data, InnoDB also builds in-memory hash indexes to speed up equality queries; these are called adaptive hash indexes.
  • InnoDB storage is disk-based, and all records are managed in pages. To bridge the gap between CPU speed and disk speed, InnoDB uses a buffer pool to improve overall performance: on a query, the target page is read into the cache; on a modification, the page is first changed in the buffer pool and later flushed back to disk following the checkpoint mechanism. Cached pages are evicted periodically using a least-recently-used (LRU) policy.
  • InnoDB supports the doublewrite buffer, which ensures data safety and improves system reliability.
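As a minimal sketch of these guarantees (table and column names are hypothetical), the following shows an InnoDB table whose changes are made atomically inside a transaction:

```sql
-- Hypothetical example: an InnoDB table plus an explicit transaction.
CREATE TABLE accounts (
    id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- clustered index key
    balance DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;

START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- takes a row-level X lock
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- on failure, ROLLBACK restores the old versions via the undo log
```

Both updates become visible together or not at all; the undo log keeps the pre-modification versions until the transaction's fate is decided.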

The complete memory structure and disk structure of an InnoDB engine are shown below:

1.2 MyISAM

MyISAM was the default storage engine before MySQL 5.5. When a MyISAM table is created, two files with the same name as the table are created:

  • A file with the .MYD (MYData) extension: stores the table's data;
  • A file with the .MYI (MYIndex) extension: stores the table's index information.

In MySQL 8.0 and later, only the above two files are created, because table definitions are now stored in the MySQL data dictionary; before MySQL 8.0, an additional file with the .frm extension stored the table structure. The main differences between MyISAM and InnoDB are that MyISAM supports only table-level locking and does not support row-level locking, transactions, or automatic crash recovery; however, MyISAM tables can be checked and repaired with the built-in mysqlcheck and myisamchk tools.
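As an illustration (the table name is made up), a MyISAM table can be checked and repaired from SQL with CHECK TABLE and REPAIR TABLE, which is what the mysqlcheck client invokes on the server side:

```sql
CREATE TABLE app_log (
    id  INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    msg VARCHAR(255)
) ENGINE=MyISAM;

CHECK TABLE app_log;   -- verify table integrity
REPAIR TABLE app_log;  -- repair a possibly corrupted MyISAM table
```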

1.3 MEMORY

The MEMORY storage engine (also known as the HEAP storage engine) stores table data in memory and has the following characteristics:

  • The table definition of a MEMORY table is stored in the MySQL data dictionary, while the actual data lives in memory, divided into blocks. When the server restarts, the table itself therefore survives, but all data in it is lost.
  • The MEMORY storage engine supports both HASH and BTREE indexes; HASH is the default.
  • MEMORY tables use a fixed-length row storage format, even for VARCHAR columns.
  • MEMORY supports AUTO_INCREMENT columns, but not BLOB or TEXT columns.
  • MEMORY tables differ from MySQL internal temporary tables: a MEMORY table always stays in memory, whereas an internal temporary table is automatically converted to disk storage once a size threshold is reached.

Based on these characteristics, MEMORY tables are mainly suited to temporary data such as session state and real-time locations.
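A sketch of a session-state table (names are illustrative); the index type can be declared explicitly, even though HASH is already the default for MEMORY:

```sql
CREATE TABLE sessions (
    session_id CHAR(32) NOT NULL,
    user_id    INT,
    PRIMARY KEY (session_id) USING HASH  -- HASH is the MEMORY default
) ENGINE=MEMORY;
```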

1.4 CSV

The CSV storage engine stores data as comma-separated values in text files. When a CSV table is created, two files with the same name as the table are created simultaneously:

  • A file with the .CSV extension stores the table's data. The format is plain text, so the file can be edited with a spreadsheet application (such as Microsoft Excel), and such edits are directly reflected in the database table.
  • A file with the .CSM extension stores the state of the table and the number of rows in it.
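A minimal sketch of a CSV table (the name is illustrative). Note that CSV tables require every column to be NOT NULL and do not support indexes:

```sql
CREATE TABLE export_data (
    id   INT NOT NULL,          -- CSV columns must be NOT NULL
    name VARCHAR(50) NOT NULL
) ENGINE=CSV;
```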

1.5 ARCHIVE

By default, the ARCHIVE storage engine compresses data with the zlib lossless compression algorithm, so it can store a large amount of data in a small space. When an ARCHIVE table is created, the storage engine creates a file with the same name as the table and the .ARZ extension to store the data. ARCHIVE also has the following characteristics:

  • The ARCHIVE engine supports INSERT, REPLACE, and SELECT, but not DELETE or UPDATE.
  • The ARCHIVE engine supports the AUTO_INCREMENT attribute and indexes on its corresponding columns. If you attempt to index a column without the AUTO_INCREMENT attribute, an exception will be thrown.
  • The ARCHIVE engine does not support partitioning.
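These rules can be sketched as follows (the table name is illustrative); only the AUTO_INCREMENT column may carry an index:

```sql
CREATE TABLE audit_log (
    id    INT NOT NULL AUTO_INCREMENT,
    event VARCHAR(255),
    KEY (id)                    -- allowed: index on the AUTO_INCREMENT column
) ENGINE=ARCHIVE;

INSERT INTO audit_log (event) VALUES ('login');  -- supported
-- UPDATE and DELETE statements against this table would fail
```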

1.6 MERGE

The MERGE storage engine, also known as the MRG_MyISAM engine, represents a collection of identical MyISAM tables. "Identical" means that all tables must have the same column data types and index information. A MERGE table can be created with the UNION=(list-of-tables) option, as follows:

mysql> CREATE TABLE t1 ( a INT NOT NULL AUTO_INCREMENT PRIMARY KEY, message CHAR(20)) ENGINE=MyISAM;
mysql> CREATE TABLE t2 ( a INT NOT NULL AUTO_INCREMENT PRIMARY KEY, message CHAR(20)) ENGINE=MyISAM;
mysql> INSERT INTO t1 (message) VALUES ('Testing'),('table'),('t1');
mysql> INSERT INTO t2 (message) VALUES ('Testing'),('table'),('t2');
mysql> CREATE TABLE total (a INT NOT NULL AUTO_INCREMENT,message CHAR(20), INDEX(a))
       ENGINE=MERGE UNION=(t1,t2) INSERT_METHOD=LAST;

The INSERT_METHOD option, specified when the table is created, controls inserts into a MERGE table: FIRST or LAST means an insert is performed on the first or last underlying table, respectively; if INSERT_METHOD is omitted or set to NO, inserts into the MERGE table are not allowed. MERGE tables support SELECT, INSERT, UPDATE, and DELETE statements, as shown in the following example:

mysql> SELECT * FROM total;
+---+---------+
| a | message |
+---+---------+
| 1 | Testing |
| 2 | table   |
| 3 | t1      |
| 1 | Testing |
| 2 | table   |
| 3 | t2      |
+---+---------+

2. Indexes

2.1 B+ tree Data structure

Unless otherwise specified, most database indexes are B+ tree indexes, built on the B+ tree data structure. Why a B+ tree rather than a balanced binary tree (AVL) or a red-black tree? Assuming the index values are 1 through 16, the different data structures perform as follows:

Balanced binary tree data structure:

Red-black tree data structure:

Btree data structure:

B+ Tree data structure

The pictures above were generated automatically by the Data Structure Visualizations website; interested readers can try it themselves.

From the above diagram, we can see that the B+ Tree has the following advantages:

  • All values in the non-leaf nodes of a B+ tree (for example, 003 and 007) also appear redundantly in the leaf nodes, and all leaf nodes are linked together in a list. The advantage is that a range query can obtain all index information just by traversing the leaf nodes.
  • Every non-leaf node of a B+ tree can store multiple values, depending on the node size. In MySQL each node (page) is 16 KB, which gives a large fan-out, i.e. a lower tree height for the same amount of data.
  • Non-leaf nodes store only index values, not actual data; only leaf nodes store pointer or data information. With 16 KB nodes, the tree height is usually between 3 and 6 even for tens of millions of rows (depending on the byte size of the index values), so query performance is excellent.
  • What the leaf nodes store depends on the database implementation and, in MySQL's case, on the storage engine used and on whether the index is the primary key index.

2.2 B+ tree Index

In the case of InnoDB, the primary key index is a clustered index, so its leaf nodes store the actual row data, while the leaf nodes of a non-primary-key (secondary) index store the primary key value:

For MyISAM, since the primary key index is a non-clustered index, its leaf nodes store only pointers to the data's location:

To sum up, the B+ tree structure is well suited to range searches, sorting, and grouping. Because a B+ tree is built in key (lexicographical) order, it supports the following query types:

  • Full value match: an exact lookup on the index. For example, emp_no is an index and the search condition is emp_no = 10008.
  • Leftmost prefix match: uses a prefix of a composite index as the search condition. For example, (emp_no, dept_no) is a composite index and the search condition is emp_no = 10008.
  • Column prefix match: matches the beginning of an index column's value. For example, dept_no is an index and the query condition is dept_no LIKE 'd1%'. Leftmost prefix matching and column prefix matching are both forms of index prefixes, and are sometimes called prefix indexes.
  • Range match: matches a range of values on an index column. For example, emp_no is an index and the search condition is emp_no > 10008.
  • Index-only queries: for example, emp_no is an index and the query is SELECT emp_no FROM employees. Here the emp_no index is called a covering index for the query: all required information can be obtained from the index alone, without accessing the data in the actual table.
  • Exact match on one column plus a range on another: for example, (dept_no, emp_no) is a composite index and the search condition is dept_no = 'd004' AND emp_no < 10020. The index column order could be either (emp_no, dept_no) or (dept_no, emp_no); EXPLAIN reports type range for both, but (dept_no, emp_no) performs better.
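The query types above can be verified with EXPLAIN, assuming a hypothetical employees table with an index on emp_no and a composite index on (dept_no, emp_no), matching the examples above:

```sql
EXPLAIN SELECT emp_no FROM employees WHERE emp_no = 10008;  -- full value match; also a covering index
EXPLAIN SELECT * FROM employees WHERE dept_no LIKE 'd1%';   -- column prefix match
EXPLAIN SELECT * FROM employees WHERE emp_no > 10008;       -- range match
EXPLAIN SELECT * FROM employees
WHERE dept_no = 'd004' AND emp_no < 10020;                  -- exact column + range column
```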

2.3 Hash Index

When a hash index is used, the storage engine hashes the value of the indexed column and stores the computed hash together with a pointer to the row. Hash indexes are therefore better suited to equality comparisons than range queries, and they cannot be used for operations such as sorting or grouping. When creating a hash index, choose highly selective columns, i.e. columns whose values rarely repeat (such as ID-card numbers), to minimize hash collisions. Because a hash index does not need to store the indexed column's data, its structure is compact and lookups are correspondingly fast.
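In MySQL, explicitly declared hash indexes are available only in certain engines such as MEMORY; a sketch with illustrative names:

```sql
CREATE TABLE id_lookup (
    id_card   CHAR(18) NOT NULL,            -- highly selective column
    person_id INT,
    INDEX idx_id_card (id_card) USING HASH  -- fast for =, unusable for ranges or ORDER BY
) ENGINE=MEMORY;
```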

The InnoDB engine has a feature called the adaptive hash index: when certain index values are used frequently, InnoDB builds a hash index in memory on top of the B+ tree index, giving the B+ tree index the benefit of fast hash lookups.

2.4 Advantages of indexes

  • Indexes greatly reduce the amount of data that the server needs to scan;
  • Indexes help the server avoid sorting and temporary tables;
  • Indexes can convert random I/O into sequential I/O.

2.5 Using Policies

  • You should avoid using functions or expressions on indexed columns when querying.
  • For multi-column indexes, the union index should be established in descending order of frequency of use.
  • Try to avoid creating redundant indexes. If an index on (A, B) exists, a separate index on A is redundant, because A is a prefix of the (A, B) index.
  • When building indexes, you should consider the sorting and grouping requirements for queries. MySQL uses indexes to sort results only if the index columns are in exactly the same ORDER as the ORDER BY clause and follow the same ascending or descending rules.
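The first strategy can be illustrated as follows (table and column names are hypothetical): wrapping the indexed column in a function hides it from the index, while an equivalent range predicate keeps the index usable:

```sql
-- Defeats an index on create_time: the column is wrapped in a function
SELECT * FROM orders WHERE DATE(create_time) = '2021-01-01';

-- Equivalent range predicate that can use the index
SELECT * FROM orders
WHERE create_time >= '2021-01-01' AND create_time < '2021-01-02';
```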

3. Locks

3.1 Shared and exclusive Locks

InnoDB storage engine supports two standard row-level locks:

  • Shared lock (S lock, also called a read lock): allows the transaction holding the lock to read a row of data;
  • Exclusive lock (X lock, also called a write lock): allows the transaction holding the lock to update or delete a row of data.

The compatibility of exclusive and shared locks is as follows:

        X              S
X       Incompatible   Incompatible
S       Incompatible   Compatible

3.2 Intention shared locks and intention exclusive locks

To illustrate the purpose of intention locks, consider an example. Transaction A uses an S lock to lock a row in a table, making it readable but not writable. Transaction B then attempts to acquire a write lock on the entire table. If transaction B succeeded, it could in theory modify any row in the table, which conflicts with the row lock held by transaction A. To resolve this, the database must know that some row in the table is locked, so that it can block transaction B's attempt to acquire a table-wide write lock. It could discover this by scanning every row of the table, which works but is inefficient, so InnoDB instead introduces intention locks.

  • Intention shared lock (IS lock): indicates that a shared lock exists on one or more rows of the table;
  • Intention exclusive lock (IX lock): indicates that an exclusive lock exists on one or more rows of the table.

Under the intention-lock rules, when transaction A locks a row in the table, an IS lock is also placed on the table. When transaction B later tries to acquire an X lock on the table, it is blocked, because the table-level X lock is incompatible with the IS lock. The full compatibility matrix is:

        X              IX             S              IS
X       Incompatible   Incompatible   Incompatible   Incompatible
IX      Incompatible   Compatible     Incompatible   Compatible
S       Incompatible   Incompatible   Compatible     Compatible
IS      Incompatible   Compatible     Compatible     Compatible

3.3 Consistent Read

1. Consistent non-locked read

A consistent nonlocking read means that in the InnoDB storage engine, if a row being read is undergoing a DELETE or UPDATE operation, the reader does not wait for the row lock to be released; instead, it reads the row's snapshot data from the undo log, as follows:

  • At the READ COMMITTED transaction isolation level, the latest committed snapshot of the locked row is read;
  • At the REPEATABLE READ transaction isolation level, the row version as of the start of the transaction is read.

Based on multi-version concurrency control and consistent unlocked read, the lock acquisition wait can be avoided and the performance under concurrent access can be improved.

2. Consistent locking read

A consistent locking read lets users manually lock data during SELECT operations according to their own requirements, in the following ways:

  • SELECT … FOR SHARE: lock S on read rows;
  • SELECT … FOR UPDATE: Lock X on read rows.
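A short sketch of both locking reads inside one transaction (the table name is illustrative; before MySQL 8.0 the S-lock form is written LOCK IN SHARE MODE):

```sql
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1 FOR SHARE;   -- S lock on the row
SELECT balance FROM accounts WHERE id = 2 FOR UPDATE;  -- X lock on the row
COMMIT;  -- both locks are released at commit (or rollback)
```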

3.4 Lock algorithm

InnoDB storage engine supports three locking algorithms:

1. Record Lock: a record lock, which locks a single row record. Example:

-- Use a record lock to prevent other transactions from updating or deleting the row
SELECT c1 FROM t WHERE c1 = 10 FOR UPDATE;

2. Gap Lock: a gap lock, which locks a range but not the record itself; it is mainly used to prevent phantom reads. Example:

-- The gap lock prevents other transactions from inserting a value such as 15 into column t.c1
SELECT c1 FROM t WHERE c1 BETWEEN 10 AND 20 FOR UPDATE;

3. Next-Key Lock: equivalent to a record lock plus a gap lock, locking both the range and the record itself. It can be used to prevent phantom reads for current reads.

4. Transactions

4.1 ACID definition

The InnoDB storage engine fully supports the ACID model:

1. Atomicity

A transaction is an indivisible minimum unit of work. All operations of a transaction are either committed successfully or rolled back on failure. There is no partial success.

2. Consistency

The database remains in a consistent state before and after the transaction, and the integrity of the database is not compromised.

3. Isolation

Multiple concurrent transactions are allowed to operate on data at the same time, but changes made by one transaction are not visible to other transactions until the final commit.

4. Durability

Once a transaction commits, its changes are stored in the database forever. It will not be lost even if there is a downtime.

4.2 Implementation of transactions

Isolation is implemented with the locks described in the previous section, while atomicity, consistency, and durability are implemented with undo logs and redo logs.

  • Undo log: stored in undo log segments inside the undo tablespace or the global temporary tablespace, it records the state of data before modification. It is mainly used for transaction rollback and for MVCC features such as consistent nonlocking reads.
  • Redo log: Logs modified data to ensure transaction persistence.
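The durability guarantee provided by the redo log is tunable; as an illustration, the innodb_flush_log_at_trx_commit system variable controls when the redo log is flushed to disk:

```sql
-- 1 (the default): flush the redo log to disk at every commit, for full durability;
-- 0 or 2 trade some durability for throughput.
SELECT @@innodb_flush_log_at_trx_commit;
```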

4.3 Concurrency Problems

In a concurrent environment, data changes typically cause four kinds of problems:

1. Lost updates

The update of one transaction is overwritten by the update of another transaction, leaving the data inconsistent:

2. Dirty reads

Under different transactions, one transaction reads uncommitted data from another transaction:

3. Non-repeatable reads

Within the same transaction, two reads of the same data return different results, because the data was modified by another transaction in between:

4. Phantom reads

Between two reads within the same transaction, rows are inserted or deleted by other transactions, so the second read contains rows absent from the first (or vice versa), as if the earlier read had been an "illusion":

4.4 Isolation Level

InnoDB supports four isolation levels. The default isolation level is repeatable read:

  • Read Uncommitted: At this level, changes in one transaction are visible to other transactions even if they are not committed.
  • Read Committed: At this level, changes in one transaction are visible to other transactions only if they have been committed.
  • Repeatable read: Guarantees that the same data is read multiple times in the same transaction.
  • Serializable: all transactions are forced to execute serially; since there is no concurrency, none of the concurrency problems described above can occur.
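The session isolation level can be inspected and changed as follows (the @@transaction_isolation variable name applies to MySQL 8.0; older versions use @@tx_isolation):

```sql
SELECT @@transaction_isolation;  -- e.g. REPEATABLE-READ, the InnoDB default
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```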

At each level, whether concurrency problems are possible is as follows:

Isolation level     Dirty read     Non-repeatable read   Phantom read
READ UNCOMMITTED    Possible       Possible              Possible
READ COMMITTED      Not possible   Possible              Possible
REPEATABLE READ     Not possible   Not possible          Possible
SERIALIZABLE        Not possible   Not possible          Not possible

At the database level, lost updates cannot occur at any of these isolation levels. In the InnoDB storage engine, for example, modifying a row requires an X lock on that row (and a corresponding IX lock on the table), so any other transaction must wait for the lock before performing its own modification.

5. Database design paradigm

Three common normal forms in database design are as follows:

First normal form: attributes are indivisible

Each column in the table is required to be an atomic item that cannot be subdivided. This is the minimum paradigm requirement and can usually be met.

Second normal form: Attributes are completely dependent on primary keys

Requires that non-primary key columns must be completely dependent on primary key columns, and that no partial dependencies exist. The following is an example:

Mechanism_id (organization code)   Employee_id (employee ID)   Ename (employee name)   Mname (organization name)
28193182                           10001                       heibaiying              XXXX company

The table above records active personnel across the city; its primary key is organization code + employee ID. The employee name in the table depends on the full composite primary key, but the organization name depends only on the organization code. This is a partial dependency and thus violates the second normal form. The common solution is to create a dictionary table mapping organization codes to organization names.
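The dictionary-table fix could be expressed in SQL as follows (all names and column types are illustrative):

```sql
CREATE TABLE mechanism (
    mechanism_id CHAR(8) PRIMARY KEY,   -- organization code
    mname        VARCHAR(60) NOT NULL   -- organization name
);
-- The personnel table then keeps only mechanism_id, removing the partial dependency.
```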

Third normal form: avoid transitive dependencies

A non-primary-key column must not depend on another non-primary-key column; when a non-primary-key column depends on another non-primary-key column (which in turn depends on the primary key), a transitive dependency occurs. The following is an example:

Employee_id (employee ID)   Ename (employee name)   Dept_no (department number)   Dname (department name)
10001                       heibaiying              06                            Development department

The employee name and department number depend on the primary key employee_id, but the department name depends on the department number. In this case, non-primary key columns depend on other non-primary key columns, which violates the third normal form. In this case, the common solution is to create a department table to maintain department-related information.
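The department-table fix could be sketched as follows (names and column types are illustrative):

```sql
CREATE TABLE dept (
    dept_no CHAR(4) PRIMARY KEY,
    dname   VARCHAR(40) NOT NULL
);

CREATE TABLE employee (
    employee_id INT PRIMARY KEY,
    ename       VARCHAR(40) NOT NULL,
    dept_no     CHAR(4),
    FOREIGN KEY (dept_no) REFERENCES dept (dept_no)  -- dname now lives only in dept
);
```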

Denormalized design

As the examples above show, fully following the three normal forms may require maintaining many additional tables. In day-to-day development, other factors may therefore lead to designs that do not fully follow, or even deliberately violate, the normal forms; this is known as denormalized design.

