A few words first: I do not recommend the approach of grabbing a pile of interview question lists and reciting them from memory. It does very little for your actual skills, it does not help in a serious interview, and the moment the interviewer digs a little deeper you will be left stunned.

My personal advice is to treat interview questions as a checklist for review, in the spirit of the Feynman learning technique: for each topic, explain it to yourself first and see whether you are satisfied with the explanation; if not, go back and study that point again. Repeat this loop and a good offer will not be far away. Go for it!

These notes are taken from GitHub: JavaKeeper, an arsenal of essential skills for Internet development.

MySQL architecture

MySQL is a bit different from other databases: its architecture can be applied, and applied well, in many different scenarios. This is mainly reflected in its storage engine architecture. The pluggable storage engine architecture separates query processing and other system tasks from data storage and extraction, so you can choose an appropriate storage engine for each workload.

  • Connection layer: at the top are the clients and connection services. This layer handles connection processing, authorization and authentication, and the related security schemes. The concept of a thread pool is introduced here to provide threads for clients that pass authentication. SSL-based secure connections can also be implemented at this layer. The server also verifies the operation permissions of every client that connects securely.

  • Service layer: The second service layer, which performs most of the core service functions, including query parsing, analysis, optimization, caching, and all the built-in functions. All cross-storage functions are implemented in this layer, including triggers, stored procedures, views, and so on

  • Engine layer: The third layer is the storage engine layer. The storage engine is really responsible for the storage and extraction of data in MySQL. The server communicates with the storage engine through API. Different storage engines have different functions, so you can select them according to your actual needs

  • Storage layer: The fourth layer is the data storage layer, which stores data on the file system running on the device and interacts with the storage engine

You may even be asked to draw the MySQL architecture diagram; questions this demanding do get asked.

What is the query flow of MySQL? How does an SQL statement execute in MySQL?

Client request –> connector (verify user identity, grant permissions) –> query cache (if the result is already cached, return it directly; otherwise continue) –> analyzer (lexical analysis and syntax analysis of the SQL) –> optimizer (choose the best execution plan for the SQL) –> executor (first checks whether the user has execution permission, then calls the engine layer to fetch the data and return it; if the query cache is enabled, the result is also cached)
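If you want to watch these stages on a real statement, MySQL's session profiling (old but still available) breaks a query down into its individual steps; a minimal sketch, assuming some existing table named user:

    SET profiling = 1;                 -- enable per-statement profiling for this session
    SELECT * FROM user WHERE id = 1;   -- any query against an existing table
    SHOW PROFILES;                     -- recent statements with their query IDs and total time
    SHOW PROFILE FOR QUERY 1;          -- per-stage timing: checking permissions, parsing, optimizing, executing...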


What storage engines does MySQL have? What are the differences?

Storage engines

The storage engine is a component of MySQL that processes SQL operations for different table types. Different storage engines provide different storage mechanisms, indexing techniques, locking levels, and other functions. Different storage engines can be used to obtain specific functions.

You can choose which engine to use flexibly. Multiple tables in a database can use different engines to meet various performance and practical requirements. Using an appropriate storage engine will improve the performance of the entire database.

MySQL server uses a pluggable storage engine architecture that can be loaded or unloaded from a running MySQL server.

Viewing storage Engines

    SHOW VARIABLES LIKE 'storage_engine';
    SHOW CREATE TABLE tablename;
    SHOW TABLE STATUS LIKE 'tablename';
    SHOW TABLE STATUS FROM database WHERE name = 'tablename';
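You can also list every engine the server knows about and whether it supports transactions, XA and savepoints; a small sketch:

    SHOW ENGINES;                                  -- one row per engine: Support / Transactions / XA / Savepoints
    SHOW VARIABLES LIKE 'default_storage_engine';  -- the engine used when CREATE TABLE omits ENGINE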

Setting the Storage Engine

Specify the storage engine when creating a table:

    CREATE TABLE t1 (i INT) ENGINE = INNODB;
    CREATE TABLE t2 (i INT) ENGINE = CSV;
    CREATE TABLE t3 (i INT) ENGINE = MEMORY;
    ALTER TABLE t ENGINE = InnoDB;
    SET default_storage_engine = NDBCLUSTER;

By default, a warning is generated whenever CREATE TABLE or ALTER TABLE cannot use the default storage engine. To prevent confusing unexpected behavior when the desired engine is not available, you can enable the NO_ENGINE_SUBSTITUTION SQL mode. If the desired engine is not available, this setting will generate an error rather than a warning, and the tables will not be created or changed
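A hedged sketch of enabling that mode for the current session (note this overwrites the session's existing sql_mode list; in real use you would usually append to it):

    SET SESSION sql_mode = 'STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION';
    -- with NO_ENGINE_SUBSTITUTION set, asking for an engine that is not available
    -- raises an error instead of silently falling back to the default engine:
    CREATE TABLE t_bad (i INT) ENGINE = FEDERATED;   -- errors if FEDERATED is disabled on this server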

Storage Engine Comparison

Common storage engines are InnoDB, MyISAM, Memory, NDB.

InnoDB is now MySQL’s default storage engine with support for transactions, row-level locking and foreign keys

File storage structure comparison

The.frm file is used to store the meta information of each data table, including the definition of the table structure, etc., which is independent of the database storage engine. That is, any storage engine must have a. FRM file named as a. FRM table name, such as user.frm.

The location of these files can be checked with show variables like 'data%'.

MyISAM physical file structure is:

  • .frm file: stores table metadata, including the definition of the table structure
  • .MYD (MYData) file: specific to the MyISAM storage engine; stores the data of a MyISAM table
  • .MYI (MYIndex) file: specific to the MyISAM storage engine; stores the index information of a MyISAM table

InnoDB physical file structure is:

  • .frm file: stores table metadata, including the definition of the table structure

  • .ibd or ibdata files: both are used to store InnoDB data. There are two kinds of files because InnoDB's storage mode can be configured to use either a private (per-table) tablespace or a shared tablespace:

    .ibd files: one per table (private tablespace). ibdata files: shared by all tables (shared tablespace).

PS: in any serious company these things (data backup, restore and the like) are done by professional ops people. If you make a Javaer do it, shouldn't there be extra pay?

Interview answer:

  1. InnoDB supports transactions, MyISAM does not. This is one of the main reasons MySQL changed its default storage engine from MyISAM to InnoDB;
  2. InnoDB supports foreign keys, while MyISAM does not. Converting an InnoDB table with foreign keys to MYISAM will fail;
  3. InnoDB uses a clustered index, MyISAM uses a non-clustered index. In a clustered index the data file and the index file are one: the leaf nodes of the primary key index store the data itself, so InnoDB must have a primary key, and lookups through the primary key are very efficient. A lookup through a secondary index, however, needs two passes: first find the primary key, then find the row through the primary key. The primary key therefore should not be too large, because every secondary index stores it, and a large primary key makes all other indexes large as well. MyISAM uses non-clustered indexes: the data file is separate, the index stores pointers to the data file, and the primary index and secondary indexes are independent of each other.
  4. InnoDB does not store the exact row count of a table, so executing select count(*) from table requires a full table scan. MyISAM keeps the total row count of the table in a variable, so the same statement only reads that variable, which is very fast;
  5. InnoDB's minimum lock granularity is the row lock; MyISAM's minimum lock granularity is the table lock. In MyISAM an update statement locks the whole table, blocking all other queries and updates, which severely limits concurrency. This is another of the main reasons MySQL changed its default storage engine from MyISAM to InnoDB;
| Comparison item | MyISAM | InnoDB |
| --- | --- | --- |
| Primary/foreign keys | Foreign keys not supported | Supported |
| Transactions | Not supported | Supported |
| Row/table locks | Table locks: even operating on one record locks the whole table; not suitable for highly concurrent operations | Row locks: an operation locks only one row and does not affect other rows; suitable for highly concurrent operations |
| Caching | Caches only indexes, not real data | Caches both indexes and real data; needs a lot of memory, and memory size has a decisive impact on performance |
| Table space | Small | Large |
| Focus | Performance | Transactions |
| Installed by default | Yes | Yes |

MySQL has a table with an auto-increment primary key. After inserting 17 records, you delete records 15, 16 and 17, then restart MySQL and insert one more record. What will its ID be?

If the table type is MyISAM, the new ID is 18. MyISAM stores the table's maximum auto-increment ID in the data file, so restarting MySQL does not lose the maximum ID.

If the table type is InnoDB, it is 15. InnoDB only keeps the maximum auto-increment ID of the table in memory, so restarting the database or running an OPTIMIZE operation on the table causes the maximum ID to be lost.
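The behavior is easy to reproduce with a throwaway table; a sketch (the server restart has to happen outside the SQL session, and note that InnoDB in MySQL 8.0+ persists the counter, so there the answer is 18 as well):

    CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY, v INT) ENGINE = InnoDB;
    INSERT INTO t (v) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17);
    DELETE FROM t WHERE id IN (15, 16, 17);
    -- restart mysqld here, then:
    INSERT INTO t (v) VALUES (100);
    SELECT MAX(id) FROM t;   -- InnoDB before 8.0: 15 (counter rebuilt from max(id) + 1); MyISAM: 18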

Which storage engine executes select count(*) faster, and why?

MyISAM is faster because MyISAM maintains a counter that can be called directly.

  • In MyISAM, the total number of rows in a table is stored on disk, and when select count(*) from t, the total number of rows is returned.

  • In InnoDB storage engine, unlike MyISAM, the total number of rows is not stored on disk. When executing select Count (*) from t, the data is first read out, row by row, and the total number is returned.

In InnoDB, a count(*) statement scans the table to compute the total row count when it runs, so as the data grows the statement becomes more and more time-consuming. Why doesn't the InnoDB engine store the total row count on disk the way MyISAM does? Because of InnoDB's transactional nature: with multi-version concurrency control (MVCC), "how many rows this table should return" is itself uncertain across different transactions.

Data types

It mainly includes the following five categories:

  • Integer types: BIT, BOOL, TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT
  • Floating-point types: FLOAT, DOUBLE, and DECIMAL
  • String types: CHAR, VARCHAR, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT, TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB
  • Date type: Date, DateTime, TimeStamp, Time, Year
  • Other data types: BINARY, VARBINARY, ENUM, SET, Geometry, Point, MultiPoint, LineString, MultiLineString, Polygon, GeometryCollection, etc

The difference between CHAR and VARCHAR?

char is fixed-length, varchar is variable-length:

In both char(n) and varchar(n), the n in parentheses is a number of characters, not bytes. For example, char(30) can store 30 characters.

char allocates its full declared storage space regardless of the actual data length, while varchar allocates storage according to the data actually stored.

Similarities:

  1. In both char(n) and varchar(n), n is a number of characters
  2. Both char and varchar truncate the string when it exceeds the maximum length n.

Differences:

  1. char occupies n characters of storage no matter how many characters are actually stored, while varchar occupies only the bytes the actual characters need plus 1 (actual length, 0 <= length < 255) or plus 2 (length > 255). This is because varchar stores an extra byte alongside the string to record its length (two bytes if the declared column length is greater than 255).
  2. A char column can hold at most 255 characters.
  3. char strips trailing spaces when a value is stored, whereas varchar does not.

char is suitable for storing very short, typically fixed-length strings. For example, char is a good fit for storing the MD5 hash of a password, because that is a fixed-length value. For very short columns, char is also more storage-efficient than varchar.
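The trailing-space behavior is easy to check directly; a small sketch using a throwaway table:

    CREATE TABLE vc_test (c CHAR(4), vc VARCHAR(4));
    INSERT INTO vc_test VALUES ('ab  ', 'ab  ');           -- both values end with two spaces
    SELECT CONCAT('(', c, ')'), CONCAT('(', vc, ')') FROM vc_test;
    -- CHAR strips trailing spaces on retrieval:  (ab)
    -- VARCHAR returns exactly what was stored:   (ab  )
    SELECT CHAR_LENGTH(c), CHAR_LENGTH(vc) FROM vc_test;   -- 2 vs 4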

What can be the string type of a column?

A string column can be of type SET, BLOB, ENUM, CHAR, TEXT, or VARCHAR.

What’s the difference between BLOB and TEXT?

A BLOB is a binary large object that can hold a variable amount of data. There are four BLOB types: TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB

TEXT is a case insensitive BLOB. There are four TEXT types: TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT.

BLOB holds binary data, TEXT holds character data.


Indexes

What do you understand about MySQL indexes?

Why B+ tree, why not binary tree?

What is the difference between a clustered index and a non-clustered index?

InnoDB engine index strategy, know?

What are the ways to create an index?

Clustered vs. non-clustered indexes, the underlying implementation of MySQL indexes, why not a B-tree, why not a hash, whether leaf nodes store the data itself or the address of the data, and what to watch out for when using indexes?

  • MYSQL officially defines an Index as: an Index is a data structure that helps MYSQL obtain data efficiently. Therefore, the essence of an Index is a data structure

  • The purpose of index is to improve the efficiency of query, can be analogous to dictionary, train station table, book catalog, etc.

  • In addition to the data itself, the database also maintains a data structure that satisfies a specific lookup algorithm. These data structures refer to (point to) the data in a way that enables the implementation of advanced lookup algorithms on top of these data structures. This data structure is called an index. An example of a possible indexing approach is shown below.

    The data table on the left has two columns of seven records, and the one on the far left is the physical address of the data record

    To speed up lookups on Col2, a binary search tree like the one shown on the right can be maintained. Each node holds a key value and a pointer to the physical address of the corresponding data record, so binary search can locate the matching record within a bounded number of comparisons and quickly retrieve the qualifying row.

  • Indexes themselves are also very large and cannot be stored in memory. They are usually stored on disk in the form of index files

  • Unless stated otherwise, an index means one organized as a B+ tree (a multi-way search tree, not necessarily binary). Clustered indexes, secondary indexes, covering indexes, composite indexes, prefix indexes and unique indexes all use B+ tree indexes by default and are collectively called indexes. Besides these there are also hash indexes and others.

Basic syntax:

  • Create:

    • CREATE [UNIQUE] INDEX indexName ON mytable(username(length));

      For CHAR, VARCHAR, length can be smaller than the actual field length. If it is BLOB and TEXT, length must be specified.

    • ALTER table tableName ADD [UNIQUE] INDEX indexName(columnName)

  • Drop: DROP INDEX [indexName] ON mytable;

  • See: SHOW INDEX FROM table_name\G – you can format the output by adding \G.

  • Using the ALTER command

    • ALTER TABLE tbl_name ADD PRIMARY KEY (column_list): adds a primary key, which means the indexed values must be unique and cannot be NULL.
    • ALTER TABLE tbl_name ADD UNIQUE index_name (column_list): creates a unique index, whose values must be unique (except that NULL may appear multiple times).
    • ALTER TABLE tbl_name ADD INDEX index_name (column_list): adds a normal index; index values may appear more than once.
    • ALTER TABLE tbl_name ADD FULLTEXT index_name (column_list): creates a FULLTEXT index, used for full-text search.

Advantages

  • Improve data retrieval efficiency and reduce database IO costs

  • Reduce the cost of data sorting and reduce CPU consumption

Disadvantages

  • An index is itself a table that stores the primary key and the indexed columns and points to the rows of the actual table, so an index also takes up space
  • Although indexes greatly speed up queries, they slow down updates to the table, such as INSERT, UPDATE and DELETE, because every time a column that belongs to an index is changed, MySQL must also update the index file

How are MySQL indexes classified?

Data structure perspective

  • B+ tree index
  • Hash index
  • Full-text index
  • R-tree index

From a physical storage perspective

  • Clustered Index

  • Non-clustered index (also called secondary index)

    Both clustered and non-clustered indexes are B+ tree structures

Logically

  • Primary key index: A primary key index is a special unique index that does not allow empty values
  • Plain index or single-column index: Each index contains a single column. A table can have multiple single-column indexes
  • Multi-column index (composite index, combined index): an index created on multiple columns. The index is used only when the first of those columns appears in the query condition; follow the leftmost prefix rule when using composite indexes
  • Unique index or non-unique index
  • Spatial index: A spatial index is an index of spatial data types, which are GEOMETRY, POINT, LINESTRING, and POLYGON in MYSQL. MYSQL is extended with the SPATIAL keyword to enable the creation of SPATIAL indexes using the same syntax used to create regular index types. Columns that create a spatial index must be declared NOT NULL. A spatial index can only be created in a table whose storage engine is MYISAM

Why do MySQL indexes use a B+ tree rather than a B-tree or another tree, and why not a hash index?

Clustered index/non-clustered index, MySQL index base implementation, does the leaf node store the data or the memory address pointing to the data?

Does using indexed queries necessarily improve query performance? Why is that?

What data structures do MySQL indexes use?

The first thing to understand is that indexes are implemented at the storage Engine level, not the server level. Not all storage engines support all index types. Even if multiple storage engines support an index type, their implementation and behavior may differ.

B+Tree index

MyISAM and InnoDB both use the B+Tree data structure for their indexes. Compared with a B-Tree, in a B+Tree all data is stored in the leaf nodes, and the leaf nodes are linked together by pointers into an ordered list, which speeds up access to adjacent data.

Learn the differences between B-tree and B+Tree

B-Tree

B-tree is a balanced search Tree designed for external storage devices such as disks.

When the system reads data from disks to memory, the basic unit is disk blocks. The data in the same disk block is read at a time instead of what is needed.

InnoDB storage engine has the concept of pages, the smallest unit of disk management. The default page size in the InnoDB storage engine is 16KB. You can use the innodb_page_size parameter to set the page size to 4K, 8K, or 16K. In MySQL, you can run the following command to check the page size: show variables like ‘innodb_page_size’;

The storage space of a system disk block is usually not that large, so InnoDB uses several contiguous disk blocks to achieve a page size of 16KB each time it requests disk space. InnoDB reads data from disk to disk on a page basis. If each piece of data on a page helps locate data records, this will reduce disk I/O times and improve query efficiency.

Data in the B-tree structure enables the system to efficiently locate the disk block where the data resides. To describe a B-tree, first define a record as a binary group [key, data]. Key is the key value of the record, corresponding to the primary key value in the table, and data is the data in a row except the primary key. The key values are different for different records.

An m-order B-tree has the following characteristics:

  1. Each node has at most m children
  2. Every node except the root and the leaf nodes has at least ceil(m/2) children
  3. If the root node is not a leaf node, it has at least 2 children
  4. All leaf nodes are on the same level and carry no extra keyword information
  5. Each non-leaf node contains n keywords and n+1 pointers: (P0, K1, P1, K2, ..., Kn, Pn)
  6. The number of keywords n satisfies ceil(m/2) - 1 <= n <= m - 1
  7. Ki (i = 1, ..., n) are the keywords, stored in ascending order
  8. Pi (i = 0, ..., n) are pointers to the subtree roots; all keywords in the subtree pointed to by P(i-1) are less than Ki and greater than K(i-1)

Each node in the B-tree can contain a large number of keyword information and branches according to the actual situation. The following figure shows a third-order B-tree:

Each node occupies one disk block. A node holds two keywords in ascending order and three pointers to child nodes; the pointers store the addresses of the disk blocks where the children reside. The two keywords divide the key space into three ranges, which correspond to the data ranges of the subtrees the three pointers point to. For the root node the keywords are 17 and 35: the subtree pointed to by P1 holds data less than 17, the subtree pointed to by P2 holds data between 17 and 35, and the subtree pointed to by P3 holds data greater than 35.

Simulate the process of finding keyword 29:

  1. Locate disk block 1 based on the root node and read it into memory. [Disk I/O operation 1]
  2. Compare keyword 29 in the interval (17,35) to find pointer P2 to disk block 1.
  3. Locate disk block 3 according to P2 pointer and read into memory. [Disk I/O operation 2nd]
  4. Compare keyword 29 in interval (26,30) to find pointer P2 to disk block 3.
  5. Locate disk block 8 according to P2 pointer and read into memory. [Disk I/O operation 3]
  6. Find keyword 29 in the keyword list in disk block 8.

Analyzing the process above, we find it required three disk I/O operations and three in-memory lookups. Since the keywords inside a node form an ordered list, binary search can be used to speed up the in-memory lookups. The three disk I/Os are what determine the efficiency of the whole B-tree search. Compared with an AVL tree, the B-tree reduces the number of nodes (and hence the height of the tree), so the data brought in by each disk I/O is put to use, which improves query efficiency.

B+Tree

B+Tree is an optimization based on B-Tree to make it more suitable for external storage index structure. InnoDB storage engine uses B+Tree to achieve its index structure.

From the B-tree structure diagram in the previous section, you can see that each node contains not only the key values but also the data. Since the storage space of each page is limited, large row data reduces the number of keys a page can hold; when the total amount of data is large, this makes the B-tree deeper, which increases the number of disk I/Os and hurts query efficiency. In a B+Tree, all data records are stored, ordered by key, in the leaf nodes at the bottom level, while non-leaf nodes store only key values. This greatly increases the number of keys each node can hold and lowers the height of the B+Tree.

B+Tree is different from B-tree in the following aspects:

  1. Non-leaf nodes only store key-value information;
  2. There is a chain pointer between all leaf nodes;
  3. Data records are stored in leaf nodes

Optimizing the B-tree from the previous section: since the non-leaf nodes of a B+Tree store only key information, if we assume each disk block can hold four keys plus their pointers, the resulting B+Tree structure looks like this:

There are usually two head Pointers on a B+Tree, one to the root node and the other to the leaf node with the smallest keyword, and there is a chain-ring structure between all the leaf nodes (that is, data nodes). Therefore, there are two kinds of lookup operations on B+Tree: a range and paging lookup on the primary key, and a random lookup starting from the root node.

There are only 22 data records in the above example, so we can not see the advantages of B+Tree. Here is a calculation:

The InnoDB storage engine page size is 16KB. A typical table's primary key type is INT (4 bytes) or BIGINT (8 bytes), and pointers are generally 4 or 8 bytes as well, which means one page (one B+Tree node) can hold roughly 16KB / (8B + 8B) = 1K keys (taking K = 10^3 for convenience). A B+Tree index of depth 3 can therefore maintain about 10^3 * 10^3 * 10^3 = 1 billion records.

In practice, each node may not be fully filled, so in the database, the height of B+Tree is usually 2-4 levels. MySQL’s InnoDB storage engine is designed to have the root node resident in memory, meaning that finding a row record for a particular key requires at most one to three disk I/O operations.

Properties of the B+Tree

  1. From the analysis above, we know that the number of I/Os depends on the height h of the B+ tree. Suppose the table holds N records and each disk block holds m entries; then h = log_(m+1) N. With N fixed, the larger m is, the smaller h is. The size of a disk block, i.e. the size of one data page, is fixed, so the smaller each entry is, the more entries a block can hold and the lower the tree is. That is why each entry, i.e. each index field, should be as small as possible: an INT takes 4 bytes, half of a BIGINT's 8 bytes. It is also why B+ trees put the real data only in the leaf nodes and not in the interior nodes: with data in the interior nodes, the number of entries per disk block would drop sharply and the tree would grow taller; in the extreme where a block holds only one entry, it degenerates into a linked list.
  2. When the entries of a B+ tree are composite, such as (name, age, sex), the tree orders them by comparing the fields from left to right. When a record such as (Tom, 20, F) is looked up, the B+ tree first compares name to decide which branch to follow; if the names are equal it then compares age and sex in turn to find the record. But when a query like (20, F), with no name, arrives, the tree does not know which branch to take next, because name is the first comparison factor the tree was built on and the search must go through name first. Likewise, with a query like (Tom, F) the tree can use name to choose the search direction, but the next field age is missing, so it can only find all records whose name is Tom and then filter them for sex = F. This is the very important leftmost-matching property of composite indexes, illustrated by the sketch below.
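A sketch of the leftmost-prefix rule in action, using a hypothetical staff table (the extra addr column is there only so that a later example has a column outside the index):

    CREATE TABLE staff (
      id   INT AUTO_INCREMENT PRIMARY KEY,
      name VARCHAR(20),
      age  INT,
      sex  CHAR(1),
      addr VARCHAR(50),
      KEY idx_name_age_sex (name, age, sex)    -- composite index; leftmost column is name
    ) ENGINE = InnoDB;

    EXPLAIN SELECT * FROM staff WHERE name = 'Tom' AND age = 20;  -- can use idx_name_age_sex (leftmost prefix present)
    EXPLAIN SELECT * FROM staff WHERE age = 20 AND sex = 'F';     -- name missing: the index cannot be used, full table scan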
MyISAM primary key index and secondary index structure

MyISAM engine index files and data files are separate. The data fields of the leaf nodes of the MyISAM engine index structure do not store the actual data records, but the addresses of the data records. Index files are separated from data files, and such indexes are called “non-clustered indexes.” The primary index of MyISAM is not much different from the secondary index except that the primary key index cannot have duplicate keys.

In MyISAM, indexes (including leaf nodes) are stored in a separate.myi file. Leaf nodes store physical address offsets of data (access by offsets is random and fast).

A primary index is a primary key index. The key value cannot be repeated. Secondary indexes are normal indexes that may have duplicate key values.

The process of finding data by index: first find the index node from the index file, get the file pointer of the data, and then locate the specific data by the file pointer in the data file. Secondary indexes are similar.

InnoDB primary key index and secondary index structure

The data field of the leaf nodes of an InnoDB index stores the actual data records (for the primary index this means all the table's data records are stored in the leaf nodes; in other words, InnoDB's data file is itself the primary key index file). Such an index is called a "clustered index", and each table can have only one clustered index.

Primary key index:

As we know, an InnoDB index is a clustered index: the index and the data are stored in the same .ibd file, so the index structure keeps index and data in the same tree. As shown in the figure below, the bottom leaf nodes contain three rows of data, corresponding to the id, stu_id and name columns of the table.

InnoDB index pages are divided into leaf nodes and non-leaf nodes. The non-leaf nodes are stored separately in the index segment, like the index pages of the Xinhua Dictionary, while the leaf nodes are arranged sequentially in the data segment. InnoDB data files can be split per table (by enabling innodb_file_per_table), in which case each table is stored in its own xxx.ibd file; by default they are not split and everything is stored in the shared ibdata files.

Secondary (non-primary key) indexes:

In this example we create a secondary index on the name column of the table. Its index structure is quite different from that of the primary key index: in the bottom leaf nodes, the string in the first row is the secondary index key, sorted by ASCII code, and the leaf also stores the corresponding primary key value.

This means that a conditional search for the name column requires two steps:

(1) Name is retrieved on the secondary index, and the corresponding primary key is obtained by reaching the leaf node;

② Use the primary key to perform the corresponding retrieval operation on the primary index

This is called a “back table query”

InnoDB index structure points to note

  1. Data files are themselves index files

  2. The table data file itself is an index structure file organized by B+Tree

  3. Clustered index middle nodes contain complete data records

  4. InnoDB tables must have primary keys, and integer increment primary keys are recommended

As described above, InnoDB stores index and data together: whether through the primary key index or a secondary index, a lookup finds the index node and from there the corresponding data. If we do not explicitly specify a primary key when designing the table structure, MySQL picks a column that contains no duplicate values to build the clustered index on; if no such column exists, InnoDB automatically generates a hidden 6-byte integer field to act as the primary key.

Why is it recommended to increment the primary key by integer instead of UUID?

  • UUID is a string that consumes more storage space than an integer.

  • Searching in a B+ tree requires comparing the size with the value of the node that passes by. The comparison operation of integer data is faster than that of string.

  • Auto-increment integer values are stored contiguously on disk, so a range query such as where id > 5 and id < 20 can read adjacent rows together; UUIDs are generated randomly, so logically adjacent rows end up physically scattered, which makes such range queries inefficient.

  • When inserting or deleting data, an auto-increment integer primary key simply appends new entries at the end of the leaf level without disturbing the structure of the existing tree; a UUID primary key easily forces the B+ tree to reorganize itself in order to maintain its ordering, which consumes more time.

Why do non-primary key index leaf nodes store primary keys?

To guarantee data consistency and to save storage space. Think of it this way: the order table of a shopping-mall system stores a user ID as an associated foreign key rather than the full user information, because when the information in the user table (real name, phone number, shipping address...) is modified, the order table does not need to be updated as well, and storage space is saved.

A Hash index

  • The main method is to convert database field data into a fixed-length Hash value by using Hash algorithms (common Hash algorithms include direct addressing, square center, folding, divisor mod, and random number), and store the row pointer of the data in the corresponding position of the Hash table. If a Hash collision occurs (two different keywords have the same Hash value), they are stored in a linked list under the corresponding Hash key.

    Search algorithm: Performs the same Hash algorithm again for the keyword to obtain the Hash value and retrieve the data from the corresponding position of the Hash table. If a Hash collision occurs, you need to filter the value. At present, there are not many databases using Hash index, mainly Memory, etc.

    MySQL currently has a Memory engine and an NDB engine to support Hash indexing.

Full-text Full-text index

  • Full-text indexing is also a special type of index for MyISAM, mainly used for full-text indexing, which InnoDB supports from MYSQL5.6.

  • It is used to replace the less efficient LIKE fuzzy matching operation, and it can fully fuzzy match multiple fields at once through the full-text index of multi-field combination.

  • B-tree is also used to store index data, but a specific algorithm is used. The field data is divided and then indexed (generally divided every 4 bytes). The index file stores the set of index strings before segmentation, and the index information after segmentation. The node corresponding to the Btree structure stores the word information after segmentation and its position in the index string set before segmentation.

R-tree Indicates the space index

Spatial index is a special index type of MyISAM that is mainly used for geospatial data types

Interviewer: Why a B+ tree rather than a B-tree?

In a B-tree every node stores data, while in a B+ tree only the leaf nodes store data. As a result, for the same amount of data the B-tree is taller and a search needs more I/O. Database indexes are stored on disk; when the data volume is large, the whole index cannot be loaded into memory, so it is loaded one disk page (one index tree node) at a time. MySQL further optimizes the B+ tree: the leaf nodes form a doubly linked list, and the head and tail of the list point to each other in a ring.

Interviewer: Why not Hash?

The underlying structure of a hash index is a hash table, a key-value storage structure, so there is no ordering relationship at all between the stored entries; range queries cannot be answered through the index and require a full table scan. Hash indexes are therefore only suitable for equality lookups. A B+ tree is a multi-way balanced search tree, so its nodes are naturally ordered (left child smaller than parent, parent smaller than right child) and range queries do not need a full table scan.

Hash indexes do not support the leftmost matching rule for multi-column joint indexes. If there are a large number of duplicate keys, hash indexes are inefficient because of hash collisions.

In which case you need to create an index

  1. The primary key automatically creates a unique index

  2. Fields frequently used as query criteria

  3. The foreign key relationship is used to index the fields associated with other tables in the query

  4. Single key/composite index selection problem, high concurrency tends to create composite index

  5. A sorted field in a query that is significantly faster through index access

  6. Statistics or grouping fields in the query

When not to create an index

  1. Too few table records
  2. A watch that is often added, deleted, or modified
  3. For table columns whose data is repetitive and evenly distributed, build indexes only on the columns that are queried and sorted most often (an index does not help much when a column contains too many duplicate values)
  4. Frequently updated columns are not good candidates for indexes (every update must also maintain the index, adding I/O burden)
  5. Indexes are not created for fields that are not needed in the WHERE condition

Talk about MySQL's covering index

A covering index (index covering) is usually described as a query that needs no back-table operation

  • MySQL can use an index to return the columns in the select list without going back to the data file to read the row again; in other words, the queried columns are covered by the index.

  • An index is an efficient way to locate rows, but the database can also use an index to fetch a column's data directly, so it does not have to read the whole row. After all, index leaf nodes store the values they index; when reading the index alone already yields the data you need, there is no need to read the rows. An index that contains (covers) all the data required by the query result is called a covering index.

  • How to tell

    Use EXPLAIN and look at the Extra column of the output: an index-covered query shows "Using index" there. The MySQL query optimizer decides whether an index-covered query can be used before the query is executed
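A minimal sketch of a covered vs. non-covered query, reusing the hypothetical staff table and its idx_name_age_sex index from the composite-index example above:

    -- idx_name_age_sex contains name, age, sex plus the primary key id,
    -- so this query can be answered from the index alone:
    EXPLAIN SELECT id, name, age FROM staff WHERE name = 'Tom';   -- Extra: Using index (no back-table lookup)

    -- addr is not in the index, so the full row has to be fetched:
    EXPLAIN SELECT * FROM staff WHERE name = 'Tom';               -- Extra no longer shows Using index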

MySQL queries

The difference between count(*), count(1) and count(column)

Execution effect:

  • count(*) includes all columns and is equivalent to counting rows; rows whose columns contain NULL are not ignored
  • count(1) also covers all rows, using the constant 1 as a placeholder for each row; rows containing NULL values are not ignored
  • count(column) counts only that column; rows where the column value is NULL are ignored, i.e. a NULL in that column is not counted

Execution efficiency:

  • If the column is the primary key, count(column) is faster than count(1)
  • If the column is not the primary key, count(1) is faster than count(column)
  • If the table has multiple columns and no primary key, count(1) performs better than count(*)
  • If the table has a primary key, select count(primary key) performs best
  • If the table has only one column, select count(*) performs best

What is the difference between in and exists in MySQL?

  • exists: the outer query is evaluated row by row, and for each row the condition inside exists is checked. If the subquery inside exists returns at least one row (no matter how many, as long as it returns something), the condition is true and the current outer row is kept; if it returns no rows, the current row is discarded. The exists condition behaves like a boolean: true if the subquery can return a result set, false if it cannot
  • in: an IN query is equivalent to a stack of multiple OR conditions
    SELECT * FROM A WHERE A.id IN (SELECT id FROM B);
    SELECT * FROM A WHERE EXISTS (SELECT * FROM B WHERE B.id = A.id);

There is little difference between in and EXISTS if the two tables queried are of the same size.

If one of the two tables is smaller and the other larger, use exists when the subquery table is the larger one, and in when the subquery table is the smaller one.

The difference between a UNION and a UNION ALL?

Both UNION and UNION ALL combine two result sets into one. The number of SQL statement fields to be joined must be the same, and the field types must be “consistent”.

  • A UNION will filter out duplicate data records after table join (inefficient), while a UNION ALL will not remove duplicate data records.

  • A UNION sorts by field order, whereas a UNION ALL simply merges the two results and returns.
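A quick sketch of the difference:

    SELECT 1 AS n UNION     SELECT 1;   -- duplicates removed: one row
    SELECT 1 AS n UNION ALL SELECT 1;   -- duplicates kept:    two rows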

SQL Execution sequence

  • Handwritten order (the order the SQL is written in)

    SELECT DISTINCT <select_list>
    FROM  <left_table> <join_type>
    JOIN  <right_table> ON <join_condition>
    WHERE  <where_condition>
    GROUP BY  <group_by_list>
    HAVING <having_condition>
    ORDER BY <order_by_condition>
    LIMIT <limit_number>
  • Machine-read order (the order MySQL actually evaluates it in)

    FROM  <left_table>
    ON <join_condition>
    <join_type> JOIN  <right_table> 
    WHERE  <where_condition>
    GROUP BY  <group_by_list>
    HAVING <having_condition>
    SELECT
    DISTINCT <select_list>
    ORDER BY <order_by_condition>
    LIMIT <limit_number>
  • Summary (see the annotated example below)
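As a summary, the logical evaluation order can be annotated on a concrete query; a sketch against hypothetical customers and orders tables:

    SELECT DISTINCT c.name, SUM(o.amount) AS total   -- 7. SELECT   8. DISTINCT
    FROM customers c                                 -- 1. FROM
    JOIN orders o ON o.customer_id = c.id            -- 2. ON   3. JOIN
    WHERE o.status = 'paid'                          -- 4. WHERE
    GROUP BY c.name                                  -- 5. GROUP BY
    HAVING SUM(o.amount) > 100                       -- 6. HAVING
    ORDER BY total DESC                              -- 9. ORDER BY
    LIMIT 10;                                        -- 10. LIMIT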

What is the difference between inner join, left join and right join in mysql?

What is inner join, outer join, cross join, Cartesian product?

The Join graph


MySQL transactions

What are the isolation levels for transactions? What is the default isolation level for MySQL?

What is magic, dirty, unrepeatable reading?

MySQL transaction features and implementation principle

Are you familiar with MVCC, the underlying principles?

MySQL transactions are mainly used to handle operations with large data volume and high complexity. For example, in a personnel management system, deleting an employee means deleting not only the employee's basic information but also everything related to that employee, such as mailbox and articles; these database statements together constitute a transaction!

ACID – Transaction basics

A transaction is a logical processing unit consisting of a set of SQL statements with four properties, often referred to simply as the ACID property of the transaction.

  • A (Atomicity) : All operations in the whole transaction are either completed or not completed, and cannot be stopped at some intermediate stage. If a transaction fails during execution, it will be rolled back to the state before the transaction began, as if the transaction had never been executed
  • C (Consistency) Consistency: The integrity constraint of the database is not broken before and after the transaction
  • I (Isolation) Isolation: The execution of a transaction cannot be interfered by other transactions. That is, the operations and data used within a transaction are isolated from other concurrent transactions. Concurrent transactions cannot interfere with each other
  • D (Durability) : Changes made to the database by the transaction persist in the database and do not get rolled back after the transaction completes

Problems with concurrent transaction processing

  • Lost Update: When transaction A and transaction B select the same row and then Update the row based on the value originally selected, the Lost Update problem occurs because both transactions are unaware of each other’s existence
  • Dirty Reads: When transaction A Reads the data updated by transaction B and then transaction B rolls back the data, A Reads Dirty data
  • Non-repeatable Reads: When transaction A Reads the same data for many times, transaction B updates and commits the data during the process of reading the same data for many times, resulting in inconsistent results when transaction A Reads the same data for many times.
  • Phantom Reads: Phantom Reads are similar to unrepeatable Reads. It occurs when one transaction, A, reads A few rows of data, and then another concurrent transaction, B, inserts some data. In subsequent queries, transaction A will find more records that did not originally exist, as if an illusion occurred, so it is called phantom read.

The difference between phantom and unrepeatable reads:

  • The point of non-repeatable reads is to modify: in the same transaction, the data read for the first time is different from the data read for the second time under the same conditions. (Because other transactions committed changes in the middle)
  • The key point of phantom reads is insertion or deletion: within the same transaction, under the same conditions, the number of records read the first time and the second time is different. (Because another transaction committed an insert or delete in between)

Solutions to the problems associated with concurrent transactions:

  • “Update lost” is usually something that should be avoided entirely. However, preventing update loss cannot be solved by the database transaction controller alone, but requires the application to add necessary locks to the data to be updated. Therefore, preventing update loss should be the responsibility of the application.

  • “Dirty reads”, “unrepeatable reads”, and “phantom reads” are actually database read consistency problems, which must be solved by the database to provide certain transaction isolation mechanism:

    • One is locking: data is locked before it is read, preventing other transactions from modifying the data.
    • The other is the MultiVersion Concurrency Control (MVCC or MCC), also known as a multi-version database: Create a Snapshot of data at a point in time without locking and use this Snapshot to provide consistent reads at a certain level (statement level or transaction level). From the user’s point of view, it seems that the database can provide multiple versions of the same data.

Transaction isolation level

There are four isolation levels for database transactions, from lowest to highest

  • READ-UNCOMMITTED: the lowest isolation level; it allows reading data changes that have not yet been committed and can lead to dirty reads, phantom reads and non-repeatable reads.
  • READ-COMMITTED: allows reading data that concurrent transactions have already committed; it prevents dirty reads, but phantom reads and non-repeatable reads can still occur.
  • REPEATABLE-READ: Multiple reads of the same field are consistent, unless the data is modified by the transaction itself. This can prevent dirty reads and unrepeatable reads, but phantom reads are still possible.
  • SERIALIZABLE: Highest isolation level, fully subject to ACID isolation level. All transactions are executed one by one so that interference between transactions is completely impossible. That is, this level prevents dirty reads, unrepeatable reads, and phantom reads.

To view the transaction isolation level of the current database:

show variables like 'tx_isolation'

The following illustrates the relationship between dirty reads, unrepeatable reads, phantom reads and transaction isolation levels in the concurrent operation of a transaction.

The more stringent the transaction isolation of the database, the less concurrency side effects, but the higher the cost, because transaction isolation essentially serializes transactions to a certain extent, which is obviously contradictory to “concurrency.” At the same time, different applications have different requirements on read consistency and transaction isolation. For example, many applications are not sensitive to “unrepeatable reads” and “phantom reads” and may be more concerned with the ability of concurrent data access.

Read uncommitted

Read uncommitted means that one transaction can read data from another uncommitted transaction.

Example: The boss wants to pay programmers, and the programmer’s salary is 36,000 yuan/month. However, the boss accidentally pressed the wrong number when the salary was paid, and pressed 39,000 / month. The money had been hit to the programmer’s account, but the transaction had not been submitted. At this moment, the programmer went to check his salary this month, and found that he had increased his salary by 3,000 yuan more than usual. However, the boss noticed the error in time and immediately rolled back the transaction that had almost been committed and changed the number to 36,000.

Analysis: The actual programmer’s salary is still 36k this month, but the programmer sees 39k. He sees the data before the boss commits the transaction. This is dirty reading.

So how do you solve dirty reads? Read committed! Reading only committed data solves the dirty read problem.

Read committed

Read commit, as the name implies, is when a transaction waits for another transaction to commit before it can read data.

Example: a programmer takes his credit card out to enjoy life (the card of course only has 36,000 on it). When he pays the bill (the programmer's transaction starts), the payment system checks that his card has 36,000. At exactly this moment!! his wife transfers all the money out for household expenses and commits. When the payment system goes to actually deduct the money, it checks the balance again and finds there is nothing left (naturally, the second check can only see the new balance after the wife's transfer transaction has committed). The programmer is very depressed: the card clearly had money in it...

If a transaction updates the data, the read transaction can read the data only after the UPDATE transaction is committed. In this case, however, two identical queries within the scope of a transaction return different data, which is called a non-repeatable read.

So how do you solve the possible unrepeatable read problem? Repeatable read!

Repeatable read

Repeatable read: once a transaction has started and read some data, other transactions are not allowed to modify it (no UPDATE operations on it). This is MySQL's default transaction isolation level

Example: Programmer takes credit card to enjoy life (the credit card is only 36k, of course), when he pays the bill (the transaction is open, no UPDATE operation is allowed for other transactions), the charging system detects 36K in his credit card beforehand. At this point his wife can’t transfer the amount. The next payment system can deduct money.

Analysis: Repeat read can solve the problem of non-repeat read. At this point, it should be understood that non-repeatable reads correspond to modifications, or UPDATE operations. But there may also be phantom problems. Because phantom problems correspond to INSERT operations, not UPDATE operations.

When does phantom reading occur?

Example: one day a programmer goes out and spends 2,000 yuan. His wife then checks his spending records for the day (a full table scan; the wife's transaction is open) and indeed finds the 2,000 yuan. At that moment the programmer spends another 10,000 yuan on a computer, i.e. INSERTs a new spending record and commits it. When the wife prints the list of the programmer's spending records (the wife's transaction commits), it shows 12,000 yuan spent; it feels like an illusion. This is a phantom read.
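The story can be sketched as two sessions against a hypothetical spending table, assuming READ COMMITTED isolation (under InnoDB's REPEATABLE READ, a plain SELECT reads from a consistent snapshot and would not see the new row):

    -- Session A (wife)
    START TRANSACTION;
    SELECT COUNT(*), SUM(amount) FROM spending;   -- 1 row, 2000

    -- Session B (programmer), meanwhile
    INSERT INTO spending (amount) VALUES (10000);
    COMMIT;

    -- back in Session A, still the same transaction
    SELECT COUNT(*), SUM(amount) FROM spending;   -- 2 rows, 12000: a "phantom" row has appeared
    COMMIT;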

So how do you solve the phantom read problem? Serializable!

Serializable

Serializable is the highest transaction isolation level, where transactions are serialized and sequentially executed to avoid dirty reads, unrepeatable reads, and phantom reads. In short, Serializable locks every row of data that is read, which can cause a lot of timeouts and lock contention. This transaction isolation level is inefficient and costly to database performance, and is generally not used.

Comparison

| Transaction isolation level | Read data consistency | Dirty read | Non-repeatable read | Phantom read |
| --- | --- | --- | --- | --- |
| Read uncommitted | Lowest level; only guarantees that physically corrupted data is not read | Yes | Yes | Yes |
| Read committed | Statement level | No | Yes | Yes |
| Repeatable read | Transaction level | No | No | Yes |
| Serializable | Highest level, transaction level | No | No | No |

It should be noted that the transaction isolation level and the concurrency of data access are antithetical, and the higher the transaction isolation level, the worse the concurrency. There is no one-size-fits-all rule for determining the appropriate transaction isolation level for your application.

The default isolation level of the MySQL InnoDB storage engine is REPEATABLE-READ. We can check it with SELECT @@tx_isolation; (in MySQL 8.0: SELECT @@transaction_isolation;)
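A sketch of checking and changing the level at runtime (the variable name differs between versions, as noted above):

    SELECT @@transaction_isolation;    -- MySQL 5.7.20+ / 8.0 (older versions: SELECT @@tx_isolation;)
    SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;   -- affects the current session only
    SELECT @@transaction_isolation;    -- READ-COMMITTED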

One thing to note here: unlike the SQL standard, the InnoDB storage engine uses the Next-Key Lock algorithm at the REPEATABLE READ transaction isolation level, which avoids phantom reads; this differs from other database systems (such as SQL Server). So InnoDB's default REPEATABLE READ level already fully guarantees the isolation requirements of transactions, i.e. it reaches the SQL standard's SERIALIZABLE isolation level, while retaining good concurrency performance.

Most database systems default to READ-COMMITTED isolation, because the lower the isolation level, the fewer locks have to be held; but remember that the InnoDB storage engine uses REPEATABLE READ by default without any loss of performance.

MVCC multi-version concurrency control

Most of MySQL’s transactional storage engine implementations are not simple row-level locking. To improve concurrency, multiple versions of concurrency control (MVCC) have been implemented, including Oracle and PostgreSQL. Just the implementation mechanism is different.

MVCC can be considered a variant of row-level locking, but it avoids locking in many cases and is therefore less expensive. Implementation mechanisms vary, but most implement non-blocking reads, and writes lock only the necessary rows.

MVCC is implemented by saving a snapshot of the data at a point in time. This means that no matter how long it takes to execute, everything will see the same data.

The typical MVCC implementation is divided into optimistic concurrency control and pessimistic concurrency control. Below is a simplified version of InnoDB’s behavior to illustrate how MVCC works.

InnoDB’s MVCC is implemented by storing two hidden columns at the end of each row. These two columns, one holds the creation time of the row and the other holds the expiration time (deletion time) of the row. Of course, the stored time is not the real time, but the system version number. The system version number is automatically incremented each time a new transaction is started. The system version number at the start of the transaction is used as the version number of the transaction and is compared with the version number of each row of records queried.

REPEATABLE READ How MVCC works at isolation level:

  • SELECT

    InnoDB checks each row based on two criteria:

    • InnoDB only looks for rows whose version predates the current version of the transaction. This ensures that the rows read by the transaction either existed before the transaction started or were inserted or modified by the transaction itself

    • The delete version number of the row is either undefined or greater than the current transaction version number, which ensures that the row read by the transaction is not deleted before the transaction begins

    Only those that meet the above two conditions will be queried

  • INSERT: InnoDB stores the current system version number as the row version number for each newly inserted row

  • DELETE: InnoDB saves the current system version number for each deleted row as a row deletion identifier

  • UPDATE: InnoDB saves the current system version number as the row version number for an inserted record, and saves the current system version number to the original row as an delete mark

Save these two additional system versions so that most operations are unlocked. Data manipulation is simple, performance is good, and it is also guaranteed that only rows that meet the requirements will be read. The downside is that each row requires additional storage, more row checking, and some additional maintenance.

MVCC only works at the READ COMMITTED and REPEATABLE READ isolation levels.

The transaction log

InnoDB uses logging to reduce the overhead of committing transactions. Because transactions are already recorded in the log, there is no need to flush dirty blocks from the buffer pool to disk with each transaction commit.

Transaction modified data and indexes are often mapped to random locations in the table space, so flushing those changes to disk requires a lot of random IO.

InnoDB assumes that using regular disks, random IO is much more expensive than sequential IO because an IO request takes time to move the head to the correct position and then wait for the desired part to be read from the disk before going to the starting position.

InnoDB uses logs to turn random IO into sequential IO. Once the logs are safely written to disk, transactions are persisted and InnoDB can replay the logs and restore committed transactions even if there is a power outage.

InnoDB uses a background thread to intelligently refresh these changes to data files. This thread can batch combine writes to make data write more sequentially for efficiency.

Transaction logging can help improve transaction efficiency:

  • With transaction logging, the storage engine only needs to modify the in-memory copy of a table’s data and record the modification to a transaction log that persists on disk, rather than persisting the modified data itself to disk each time.
  • Transaction logging is appending, so the operation of logging is sequential I/O in a small area of the disk, unlike random I/O, which requires moving the head in multiple places on the disk, so transaction logging is relatively faster.
  • After the transaction log is persisted, the modified data in memory can be slowly flushed back to disk in the background.
  • If the data changes are recorded in the transaction log and persisted, but the data itself is not written back to disk, the system crashes and the storage engine automatically recovers the modified data upon restart.

Currently, this is what most storage engines implement, often referred to as write-ahead Logging, where data changes require two disk writes.
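How aggressively InnoDB flushes the redo log at commit time is controlled by innodb_flush_log_at_trx_commit; a hedged sketch of inspecting and relaxing it (value 1 is the safe default):

    SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
    -- 1: write and fsync the redo log on every commit (full durability, the default)
    -- 2: write to the OS cache on commit, fsync about once per second (may lose ~1s of transactions on an OS crash)
    -- 0: write and fsync about once per second (may lose ~1s even on a mysqld crash)
    SET GLOBAL innodb_flush_log_at_trx_commit = 2;   -- trade a little durability for commit throughput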

Transaction implementation

Transactions are implemented by the underlying storage engine, so different storage engines support transactions differently. The storage engines that support transactions in MySQL are InnoDB and NDB.

The implementation of a transaction is how ACID properties are implemented.

Isolation of transactions is achieved through locks, while atomicity, consistency, and persistence of transactions are achieved through transaction logging.

Interviewers like to dig into how transactions are implemented through logging, so the more in-depth your answer, the better.

Transaction logs include the redo log and the undo log (rollback log):

  • Redo logs implement persistence and atomicity

    In innoDB’s storage engine, transaction logging is implemented through redo logs and innoDB’s storage engine’s Log Buffer. When a transaction is started, the operations in the transaction are first written to the log buffer of the storage engine. Before the transaction is committed, the cached logs need to be flushed to disk for persistence, which is referred to as “write-ahead Logging” by DBAs. After the transaction commits, the data files mapped in the Buffer Pool are flushed to disk slowly. If the database crashes or is down, the database can be restored to its original state according to the redo log when the system restarts. An unfinished transaction can be committed or rolled back, depending on the recovery strategy.

    A contiguous storage space is allocated for redo logs at startup. Redo logs are sequentially appended to improve performance. All transactions share redo log storage, and their redo logs are recorded alternately in the order in which they are executed.

  • Undo log implements consistency

    Undo log is used to roll back transactions. In addition to redo logs, a certain amount of undo logs are logged during transaction execution. The undo log records the status of data before each operation. If a transaction needs to be rolled back during execution, the undo log can be used to roll back data. The rollback of a single transaction only rolls back the operations of the current transaction and does not affect the operations of other transactions.

    The undo log also records transactions that were only partially completed but already written to disk, so that they can be recovered. By default, rollback logs are recorded in the shared table space or in separate (exclusive) undo table spaces.

Both types of logging can be considered as a recovery operation, redo_log is a page operation that restores committed transaction changes, and undo_log is a rollback of row records to a specific version. The contents of redo_log and undo_log are different. Redo_log is a physical log that records physical page changes, while undo_log is a logical log that records each row.

Do you know how many types of logs MySQL has?

  • Error log: Records error information, warning information or correct information.

  • Query logging: Records information about all requests to the database, whether or not they are executed correctly.

  • Slow query log: Set a threshold. All SQL statements whose running time exceeds this threshold are recorded in the slow query log file.

  • Binary log: Records all the changes made to the database.

  • Relay log: also a binary log, used on the slave to replay the master's changes (used to recover/update the slave library)

  • Transaction logs: the redo log and the undo log (rollback log). (A quick way to check which of these logs are enabled follows below.)
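
These switches are exposed as standard system variables, so a quick check looks like this:

SHOW VARIABLES LIKE 'log_error';         -- error log location
SHOW VARIABLES LIKE 'general_log%';      -- general query log switch and file
SHOW VARIABLES LIKE 'slow_query_log%';   -- slow query log switch and file
SHOW VARIABLES LIKE 'log_bin%';          -- binary log switch and base name
SHOW BINARY LOGS;                        -- list the existing binlog files (requires the binlog to be enabled)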

2PC and 3PC

MySQL support for distributed transactions

There are many ways to implement distributed transactions, either using InnoDB’s native transaction support, or using message queues to achieve the ultimate consistency of distributed transactions. Here we’ll focus on InnoDB’s support for distributed transactions.

MySQL's InnoDB storage engine has supported XA distributed transactions since version 5.0.3. A distributed transaction involves multiple actions that are themselves transactional; all of them must either complete successfully together or be rolled back together.

In MySQL, using distributed transactions involves one or more resource managers and a transaction manager.

Figure 1 shows MySQL’s distributed transaction model. The model is divided into three parts: application program (AP), resource manager (RM) and transaction manager (TM) :

  • Application: defines transaction boundaries and specifies which transactions need to be done.
  • Resource manager: provides a way to access transactions, usually a database is a resource manager;
  • Transaction manager: Coordinates transactions that participate in a global transaction.

Distributed transactions take the form of two-phase commit:

  • In phase 1 (prepare), all transaction nodes start preparing and tell the transaction manager that they are ready.
  • In phase 2, the transaction manager tells each node whether to commit or roll back; if any node fails, every node rolls back globally to guarantee the atomicity of the transaction. (See the sketch below.)
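
Against a single MySQL resource manager, the XA statements look roughly like this (the xid 'xa_demo' and the account table are illustrative only):

XA START 'xa_demo';
UPDATE account SET balance = balance - 100 WHERE id = 1;
XA END 'xa_demo';
XA PREPARE 'xa_demo';    -- phase 1: this branch reports it is ready
-- the transaction manager collects every branch's prepare result, then decides:
XA COMMIT 'xa_demo';     -- phase 2: commit (or XA ROLLBACK 'xa_demo' if any branch failed)
XA RECOVER;              -- lists prepared but unresolved XA transactions after a crash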

MySQL lock mechanism

Optimistic and pessimistic locks for databases?

What types of locks are available in MySQL?

MySQL InnoDB engine row lock how to implement?

How do you deal with deadlocks in MySQL?

A lock is a mechanism by which a computer coordinates concurrent access to a resource by multiple processes or threads.

In a database, besides contention for traditional computing resources (CPU, RAM, I/O and so on), data itself is a resource shared by many users. Simply put, the locking mechanism is the set of rules the database uses to guarantee data consistency and to make concurrent access to shared resources orderly.

For example, suppose we buy a product on Taobao and there is only one item left in stock. If someone else wants to buy it at the same moment, how do we decide who gets it? This clearly needs a transaction: read the quantity from the inventory table, insert the order, insert the payment record after payment, and then update the item quantity. In this process, locks protect the limited resource and resolve the conflict between isolation and concurrency.

The classification of the lock

By type of operation on data:

  • Read lock (shared lock) : Multiple read operations can be performed simultaneously for the same data without affecting each other

  • Write lock (exclusive lock) : It blocks other write locks and read locks until the current write operation is complete

From the granularity classification of data operations:

To improve database concurrency, each lock should cover as small a data range as possible; in theory, locking only the data currently being operated on yields the maximum concurrency. However, managing locks is expensive (acquiring, checking and releasing them all cost time), so the database system has to strike a balance between high concurrency and system performance, which gives rise to the concept of lock granularity.

  • Table lock: low overhead, fast locking; no deadlocks; large locking granularity, the highest probability of lock conflicts and the lowest concurrency (the MyISAM and MEMORY storage engines use table-level locking).

  • Row-level lock: high overhead, slow locking; deadlocks can occur; the smallest locking granularity, the lowest probability of lock conflicts and the highest concurrency (the InnoDB storage engine supports both row-level and table-level locking, and uses row-level locking by default).

  • Page lock: overhead and locking time between those of table locks and row locks; deadlocks can occur; locking granularity between table locks and row locks, with average concurrency.

Application: from a locking perspective, table-level locks suit applications that are mostly queries with only small amounts of data updated by index conditions, such as web applications; row-level locks suit applications with large numbers of concurrent updates of small amounts of different data by index conditions together with concurrent queries, such as online transaction processing (OLTP) systems.

|         | Row locks | Table locks | Page locks |
| ------- | --------- | ----------- | ---------- |
| MyISAM  |           | √           |            |
| BDB     |           | √           | √          |
| InnoDB  | √         | √           |            |
| Memory  |           | √           |            |

MyISAM table locks

MyISAM table locks have two modes:

  • Table Read Lock: does not block other users' read requests on the same table, but blocks all write requests to it.
  • Table Write Lock: blocks other users' read and write operations on the same table.

The MyISAM table is serial between read and write operations and between write operations. When a thread acquires a write lock on a table, only the thread holding the lock can update the table. Read and write operations on other threads wait until the lock is released.

By default, write locks have a higher priority than read locks: when a lock is released, the lock is given priority to the lock acquisition requests waiting in the write lock queue, and then to the lock acquisition requests waiting in the read lock queue.

InnoDB row locks

InnoDB implements two types of row locking:

  • Shared lock (S): allows a transaction to read a row and prevents other transactions from obtaining an exclusive lock on the same data set.
  • Exclusive lock (X): allows the transaction that holds it to update the data and prevents other transactions from obtaining shared read locks or exclusive write locks on the same data set.

InnoDB also has two types of Intention Locks for internal use, both of which are table Locks:

  • Intention shared lock (IS): a transaction that intends to set shared locks on rows of a table must first acquire an IS lock on that table.
  • Intention exclusive lock (IX): a transaction that intends to set exclusive locks on rows of a table must first acquire an IX lock on that table.

Index failure causes row locks to escalate to table locks, for example when a varchar column is queried without single quotes (the implicit type conversion invalidates the index).

Locking mechanism

Optimistic locking and pessimistic locking are two concurrency control ideas that can be used to solve the lost update problem

Optimistic locking "optimistically" assumes that concurrent update conflicts will not happen, so data is accessed and processed without taking locks; only when the data is updated does it check, via a version number or timestamp, whether a conflict occurred, handling the conflict if there is one and committing otherwise. The version-number (Version) mechanism is the most common implementation of optimistic locking.

Pessimistic locking "pessimistically" assumes a high probability of concurrent update conflicts: it takes an exclusive lock before the data is accessed and processed, keeps the data locked for the whole operation, and releases the lock only when the transaction commits or rolls back. In contrast to optimistic locking, pessimistic locking is implemented by the database itself; to use it we simply call the database's own statements.
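
A minimal sketch of both ideas on a hypothetical goods table (column names and the version value are made up):

-- Optimistic locking: no lock is held while reading
SELECT stock, version FROM goods WHERE id = 1;            -- suppose version = 5
-- ... business logic ...
UPDATE goods SET stock = stock - 1, version = version + 1
WHERE id = 1 AND version = 5;                             -- 0 rows affected means someone else updated first

-- Pessimistic locking: hold an exclusive lock for the whole operation
START TRANSACTION;
SELECT stock FROM goods WHERE id = 1 FOR UPDATE;
UPDATE goods SET stock = stock - 1 WHERE id = 1;
COMMIT;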

Lock mode (InnoDB has three row lock algorithms)

  • Record Locks: Locks on a single row Record. Locks the index entry to lock the eligible rows. Other transactions cannot modify or delete locked entries;

    SELECT * FROM table WHERE id = 1 FOR UPDATE;

    It places a record lock on the row where id=1 to prevent other transactions from inserting, updating, or deleting the row

    A record lock is also used when an UPDATE hits a row through the primary key index or a unique index, for example:

    UPDATE table SET age = 50 WHERE id = 1;
  • Gap Locks: When we retrieve data using a range condition rather than an equality condition and request shared or exclusive Locks, InnoDB Locks index entries for existing data records that meet the condition. Records whose key values are in the condition range but do not exist are called gaps.

    InnoDB also locks this “gap”. This locking mechanism is called gap locking.

    A gap lock locks the "gaps" between index records, or the range before the first record or after the last record, excluding the index records themselves. Other transactions cannot insert data into the locked range, which prevents them from adding phantom rows.

    Gap locking applies to non-unique indexes and locks index records within a range. It is part of the next-key locking algorithm mentioned below. Bear in mind that a gap lock locks an interval, not merely each record inside the interval.

    SELECT * FROM table WHERE id BETWEEN 1 AND 10 FOR UPDATE;

    That is, the interval (1, 10) is locked: inserts of rows with id 2, 3, 4, 5, 6, 7, 8 or 9 will be blocked, while the existing rows with id 1 and 10 are not themselves locked by the gap lock.

    The purpose of the gap lock is to prevent phantoms between two current reads within the same transaction.

  • Next-Key Locks: a combination of a record lock and a gap lock, locking both the index record and the range in front of it. Their main purpose is also to avoid phantom reads; if the transaction isolation level is lowered to READ COMMITTED (RC), next-key locking is disabled.

    A next-key lock can be understood as a special kind of gap lock, or as a special algorithm, and it is what solves the phantom read problem. On a non-unique index column, when a transaction holds a next-key lock on a row, it locks a left-open, right-closed interval of data. Note that InnoDB row-level locking is index-based: next-key locks are only associated with non-unique index columns; on unique index columns (including the primary key) there are no next-key locks.

    This is used for range queries over rows, mainly to solve the phantom read problem. (See the sketch below.)
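
A sketch of the interval behaviour on a hypothetical table t with a non-unique index on age and existing values 10, 20 and 30:

-- Transaction A (REPEATABLE READ), current read on the non-unique index column
START TRANSACTION;
SELECT * FROM t WHERE age = 20 FOR UPDATE;
-- InnoDB takes a next-key lock on (10, 20] plus a gap lock on (20, 30):
-- another transaction's INSERT with age = 15 or age = 25 blocks until A commits,
-- which is exactly how phantom rows are prevented.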

What does SELECT ... FOR UPDATE do? Does it lock tables or rows, or something else?

FOR UPDATE works only with InnoDB and takes effect only inside a transaction (BEGIN/COMMIT). While the transaction runs, MySQL adds an exclusive lock to every row in the query's result set, so other threads that try to update or delete those records will block. The exclusive lock may be a row lock or a table lock.

InnoDB's row locks are implemented on indexes, which means InnoDB uses row locks only when the data is retrieved through an index condition; otherwise it uses a table lock! Let's say we have a table called products with two columns, id and name, where id is the primary key.

  • Primary key specified explicitly and matching rows exist: row lock
SELECT * FROM products WHERE id='3' FOR UPDATE;
SELECT * FROM products WHERE id='3' AND type=1 FOR UPDATE;
  • Primary key specified explicitly but no matching rows: no lock
SELECT * FROM products WHERE id='-1' FOR UPDATE;
  • No primary key (condition on a non-indexed column): table lock
SELECT * FROM products WHERE name='Mouse' FOR UPDATE;
  • Primary key condition not definite: table lock
SELECT * FROM products WHERE id<>'3' FOR UPDATE;
  • Primary key condition not definite: table lock
SELECT * FROM products WHERE id LIKE '3' FOR UPDATE;

Note 1: FOR UPDATE only applies to InnoDB and must be used inside BEGIN/COMMIT to take effect. Note 2: to test the locking behaviour, you can open two windows in the MySQL command-line client.

How do you solve the deadlock problem in MySQL?

A deadlock

Deadlock generation:

  • A deadlock is a vicious cycle in which two or more transactions occupy resources and request locks on the resources occupied by each other
  • Deadlocks can occur when transactions try to lock resources in different orders, and also when multiple transactions lock the same resources at the same time
  • Locking behaviour and order depend on the storage engine: given the same sequence of statements, some storage engines produce deadlocks and some do not. Deadlocks have two causes: genuine data conflicts, and the way the storage engine is implemented.

Deadlock detection: various deadlock detection and deadlock timeout mechanisms are implemented in database systems. InnoDB storage engine can detect deadlocked loop dependencies and immediately return an error.

Deadlock recovery: After a deadlock occurs, a deadlock can only be broken by partially or completely rolling back one of the transactions. InnoDB currently handles deadlocks by rolling back the transactions that hold the least row-level exclusive locks. So transactional applications must be designed with deadlocks in mind, and in most cases simply need to re-execute transactions rolled back by deadlocks.

Deadlock detection with external locks: InnoDB can automatically detect a deadlock, making one transaction release its locks and roll back so that the other can acquire the locks and finish. However, when an external lock or a table lock is involved, InnoDB cannot detect the deadlock automatically; this has to be handled with the lock wait timeout parameter innodb_lock_wait_timeout.

Deadlocks affect performance rather than causing serious errors, because InnoDB automatically detects deadlock conditions and rolls back one of the affected transactions. On highly concurrent systems, deadlock detection itself can cause slowdowns when many threads wait for the same lock. Sometimes it may be more effective to disable deadlock detection (via the innodb_deadlock_detect option) and rely on the innodb_lock_wait_timeout setting to roll back transactions instead.

MyISAM avoids deadlocks:

  • In the case of automatic locking, MyISAM always obtains all the locks required by the SQL statement at once, so MyISAM tables do not deadlock.

InnoDB avoids deadlocks:

  • To avoid deadlocks when performing multiple concurrent writes on a single InnoDB table, acquire the necessary locks at the start of the transaction by issuing a SELECT ... FOR UPDATE on every row that is expected to be modified, even if the statements that change those rows are executed later.
  • In a transaction, if you want to update records, you should directly apply for a lock of sufficient level, that is, exclusive lock, rather than apply for a shared lock first and then apply for an exclusive lock during the update. At this time, when users apply for an exclusive lock again, other transactions may have already obtained the shared lock of the same record, resulting in lock conflict or even deadlock
  • If a transaction needs to modify or lock more than one table, lock statements should be used in the same order in each transaction. In an application, if different programs concurrently access multiple tables, try to agree to access the tables in the same order, which can greatly reduce the chance of deadlock
  • After obtaining a read lock on a row with SELECT ... LOCK IN SHARE MODE, a deadlock is likely if the current transaction then needs to update that record.
  • Change the transaction isolation level

If a deadlock occurs, run SHOW ENGINE INNODB STATUS; to determine the cause of the most recent deadlock. The output includes details of the transactions involved, such as the SQL statements that triggered the deadlock, the locks each transaction had acquired and was waiting for, and which transaction was rolled back. From this we can analyse the cause of the deadlock and improve the design.
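
A classic way to reproduce a deadlock for study, using a hypothetical table t that already contains rows with id 1 and 2 (run the two sessions interleaved):

-- Session A
START TRANSACTION;
UPDATE t SET val = val + 1 WHERE id = 1;   -- A holds the row lock on id = 1

-- Session B
START TRANSACTION;
UPDATE t SET val = val + 1 WHERE id = 2;   -- B holds the row lock on id = 2
UPDATE t SET val = val + 1 WHERE id = 1;   -- B now waits for A

-- Session A again
UPDATE t SET val = val + 1 WHERE id = 2;   -- A waits for B: a cycle, so InnoDB reports a deadlock
-- one session receives "ERROR 1213 (40001): Deadlock found when trying to get lock" and is rolled back;
-- SHOW ENGINE INNODB STATUS then shows the details under LATEST DETECTED DEADLOCK.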


MySQL tuning

How do you optimize SQL in your daily work?

What are the general steps of SQL optimization, how to look at the execution plan (EXPLAIN), how to understand the meaning of each field?

How to write SQL to effectively use composite indexes?

How do you optimize a SQL that takes too long to execute?

What is the leftmost prefix principle? What is the leftmost matching principle?

Factors affecting mysql performance

  • Impact of business requirements on MySQL

  • Storage location Impact on MySQL

    • Data that does not fit into MySQL
      • Binary multimedia data
      • Stream queue data
      • Very large text data
    • Data that needs to be cached
      • System configuration and rule data
      • Basic information data of active users
      • Active user customization information data
      • Quasi-real-time statistical data
      • Other frequently accessed data that changes less
  • Impact of Schema design on system performance

    • Minimize requests for database access
    • Minimize query requests for useless data
  • The impact of the hardware environment on system performance

Performance analysis

MySQL Query Optimizer

  1. MySQL has the optimizer module specially responsible for the optimization of SELECT statements. Its main function is to provide the optimal execution plan for the Query requested by the client by calculating and analyzing the statistics collected in the system.

  2. When the client requests a Query, the command parser module classifies it as a SELECT and hands it to the MySQL Query Optimizer. The optimizer first optimizes the Query as a whole: it evaluates constant expressions and converts them directly into constant values, and simplifies and transforms the query conditions, for example removing useless or obviously true conditions and adjusting the structure. It then parses the hint information in the Query (if any) to see whether the hints alone fully determine the execution plan. If there is no hint, or the hints are not enough to fully determine the plan, it reads the statistics of the objects involved, performs the corresponding calculation and analysis for the Query, and finally produces the execution plan.

MySQL common Bottlenecks

  • CPU: CPU saturation occurs when data is loaded into memory or read from disk

  • IO: Disk I/O bottlenecks occur when far more data is loaded than there is memory

  • Server hardware performance bottlenecks: use top, free, iostat and vmstat to check the system's performance status

Causes of performance degradation (slow SQL: long execution time, long wait time)

  • Query statements are poorly written
  • Index failure (single value, compound)
  • Too many joins for associated queries (design flaw or forced requirement)
  • Server tuning and parameter Settings (buffering, thread count, etc.)

MySQL common performance analysis methods

In the optimization of MySQL, it is usually necessary to analyze the database. Common analysis methods include slow query log, EXPLAIN analysis query, profiling analysis and show command to query system status and system variables. Only by locating the bottleneck of analysis performance, can the performance of the database system be better optimized.

Performance Bottleneck Location

You can use the show command to check MySQL status and variables to find bottlenecks in the system:

mysql> show variables;           -- display system variables (can also filter: show variables like 'XXX')
mysql> show processlist;         -- view the SQL currently being executed and the state of each connection
shell> mysqladmin variables -u username -p password        -- display system variables
shell> mysqladmin extended-status -u username -p password  -- display status information
Explain (execution plan)

What it is: Use the Explain keyword to simulate the optimizer’s execution of SQL queries to see how MySQL processes your SQL statements. Analyze performance bottlenecks in your query or table structure

Can do:

  • The read order of the table
  • Operation type of data read operation
  • Which indexes are available
  • Which indexes are actually used
  • References between tables
  • How many rows per table are queried by the optimizer

How to play:

  • Explain + SQL statements
  • Information contained in the execution plan (and partitions if there are partitioned tables)

Description of each field

  • Id (Serial number of the SELECT query, containing a set of numbers indicating the order in which the SELECT clause or operation table is executed in the query)

    • The ids are the same and the execution sequence is from top to bottom
    • The ids are different. For sub-queries, the ID sequence increases. A larger ID has a higher priority and is executed earlier
    • Ids that are both the same and different exist: among different ids, the larger id is executed first; rows with the same id are executed from top to bottom
  • Select_type (query type, used to distinguish common query, federated query, subquery and other complex query)

    • SIMPLE: A SIMPLE select query that does not contain subqueries or unions
    • PRIMARY: If the query contains any complex subparts, the outermost query is marked as PRIMARY
    • SUBQUERY: Contains subqueries in a SELECT or WHERE list
    • DERIVED: Subqueries contained in the FROM list are marked as DERIVED, and MySQL executes these subqueries recursively, putting the results in temporary tables
    • UNION: If the second select appears after UNION, it is marked as UNION, and if UNION is included in the subquery of the FROM clause, the outer select is marked as DERIVED
    • UNION RESULT: Select the RESULT from the UNION table
  • Table (shows which table this row is about)

  • Type (shows what type the query uses, System > const > eq_ref > ref > fulltext > ref_or_NULL > index_merge > unique_subquery > index_subquery > Range > index > ALL)

    • System: the table has only one row (equal to the system table). This is a special case of const type
    • Const is used to compare the primary key or unique index. Since only one row of data is matched, mysql can quickly convert the query to a constant, such as by placing the primary key in a WHERE list
    • Eq_ref: Unique index scan. For each index key, only one record in the table matches it. This is common in primary key or unique index scans
    • Ref: a non-unique index scan whose range matches all rows of a single value. It is essentially an index access that returns all rows that match a single value. However, it may also find multiple rows that match the criteria, as it should be a mixture of lookup and scan
    • Range: Retrieves only rows in a given range, using an index to select rows. The key column shows which index is being used, typically between, <, >, in, etc., in your WHERE statement. This type of range index scan is better than a full table scan because it only needs to start at one point in the index and end at another, rather than scanning all the indexes
    • Index: Full index Scan. The index type is different from ALL. Only the index tree is traversed. It is usually faster than ALL because index files are usually smaller than data files. (all and index are both read from the full table, but index is read from the index, and all is read from the hard disk)
    • ALL: Full Table Scan: scans the entire Table to find matching rows

    Tip: In general, make sure the query is at least range and preferably ref

  • Possible_keys (Shows possible indexes in the table, one or more. Possible_keys will be listed if there are indexes in the table, but not necessarily used by the query)

  • key

    • The actual used index, if NULL, is not used

    • If a covering index is used in the query (the index covers the selected fields), that index appears only in the key column and not in possible_keys

  • key_len

    • Represents the number of bytes used in the index, which is used to calculate the length of the index used in the query. With no loss of accuracy, the shorter the length, the better
    • The value key_len displays is the maximum possible length of the index field, not the actual length, that is, key_len is calculated from the table definition, not retrieved from the table
  • Ref (which column of the index is used, if possible, is a constant. Which columns or constants are used to find values on index columns)

  • Rows (approximate estimate of the number of rows to read to find the desired record based on table statistics and index selection)

  • Extra (contains important additional information that is not suitable for display in other columns)

    1. Using filesort: indicates that mysql uses an external index to sort data, instead of reading data in the order of the indexes in the table. A sort operation in mysql that cannot be done with an index is called “file sort”. This is common in order by and group by statements

    2. Using temporary: A temporary table is used to hold intermediate results. Mysql uses temporary tables when sorting query results. Common in sort order by and group by queries.

    3. Using index: the select operation uses a covering index, avoiding access to the table's data rows. If Using where also appears, the index is used to perform key lookups; otherwise the index is used only to read data rather than to perform lookups

    4. Using WHERE: Where filtering is used

    5. Using join buffer: The join buffer is used

    6. Impossible WHERE: the WHERE clause is always false, so no rows can be retrieved

    7. Select tables optimized away: without a GROUP BY clause, operations such as COUNT(*) optimized via an index (or MyISAM's stored row count) do not need to wait for the execution stage; the optimization is finished while the query execution plan is generated

    8. Distinct: optimizes the DISTINCT operation by stopping the search for further matches as soon as the first matching row is found

case:

  1. Row 1 (execution order 4): the id is 1, the first SELECT within the union; select_type PRIMARY marks it as the outermost query; its table column is a derived table produced by the SELECT whose id is 3. [the select d1.name ... part]

  2. Row 2 (execution order 2): the id is 3, the third SELECT in the whole query; select_type is DERIVED because the query is contained in the FROM clause. [select id, name from t1 where other_column='']

  3. Row 3 (execution order 3): select_type is SUBQUERY, the second SELECT in the whole query. [select id from t3]

  4. Row 4 (execution order 1): select_type is UNION, marking the SELECT that appears after the UNION keyword; it is executed first. [select name, id from t2]

  5. Row 5 (execution order 5): the stage of reading rows from the UNION's temporary table; <union1,4> in the table column means the results of the first and fourth SELECTs are unioned. [union of the two results; a possible reconstruction of the whole query follows below]
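
The original EXPLAIN screenshot is not reproduced here, but the five rows above are consistent with explaining a query of roughly this shape (a reconstruction from the fragments in brackets, not the article's exact statement):

EXPLAIN
SELECT d1.name, (SELECT id FROM t3) d2
FROM (SELECT id, name FROM t1 WHERE other_column = '') d1
UNION
(SELECT name, id FROM t2);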

Slow Query logs

The slow query log of MySQL is used to record statements whose response time exceeds the threshold. In particular, SQL statements whose response time exceeds the value of long_query_time are recorded in the slow query log.

  • long_query_timeThe default value is 10, which means that the statement runs for more than 10 seconds
  • By default, slow log query is disabled for the MySQL database. You need to manually enable the slow log query function

Viewing the Enabled Status

SHOW VARIABLES LIKE '%slow_query_log%'

Enable slow log query

  • Temporary configuration:
mysql> set global slow_query_log='ON';
mysql> set global slow_query_log_file='/var/lib/mysql/hostname-slow.log';
mysql> set global long_query_time=2;

You can also set the file location. By default, the system gives a default file host_name-slow.log

Using the set operation to enable slow log query takes effect only for the current database. If MySQL is restarted, it will be invalid.

  • Permanent configuration

    Modify the my.cnf or my.ini configuration file by adding two configuration parameters under the line [mysqld]

[mysqld]
slow_query_log = ON
slow_query_log_file = /var/lib/mysql/hostname-slow.log
long_query_time = 3

Note: the log-slow-queries (slow_query_log_file) parameter indicates where the slow query log is stored; this directory must be writable by the account MySQL runs as, and is usually the MySQL data directory. long_query_time=2 means queries running longer than two seconds are recorded. Adding log-queries-not-using-indexes to my.cnf or my.ini also records queries that do not use indexes.

You can verify this with Select sleep(4).

In a production environment it is still hard to search and analyse the SQL in the log manually, so MySQL provides the log analysis tool mysqldumpslow.

Use mysqldumpslow --help to view its usage information

  • Get the 10 SQL that return the most recordsets

    mysqldumpslow -s r -t 10 /var/lib/mysql/hostname-slow.log

  • Get the 10 most frequently accessed SQL

    mysqldumpslow -s c -t 10 /var/lib/mysql/hostname-slow.log

  • Get the top 10 queries in chronological order that contain the left join

    mysqldumpslow -s t -t 10 -g "left join" /var/lib/mysql/hostname-slow.log

  • It can also be used with pipes

    mysqldumpslow -s r -t 10 /var/lib/mysql/hostname-slow.log | more

You can also use pt-query-digest to analyze RDS MySQL slow query logs

Show Profile Analysis query

Slow log query enables you to know which SQL statements are inefficiently executed. Explain enables you to know the execution status and index usage of SQL statements. You can also view the execution status with the Show Profile command.

  • Show Profile is a resource that MySQL provides to analyze the resource consumption of statement execution in the current session. Can be used for SQL tuning measurement

  • By default, the parameter is turned off and the results of the last 15 runs are saved

  • Analysis steps

    1. Check whether the current mysql version supports it

      mysql> show variables like 'profiling';  -- disabled by default; enable it before use
    2. Enable the function. It is disabled by default and must be enabled before use

      mysql>set profiling=1;  
    3. Run the SQL

    4. View the results


    mysql> show profiles;
    +----------+------------+---------------------------------+
    | Query_ID | Duration   | Query                           |
    +----------+------------+---------------------------------+
    |        1 | 0.00385450 | show variables like "profiling" |
    |        2 | 0.00170050 | show variables like "profiling" |
    |        3 | 0.00038025 | select * from t_base_user       |
    +----------+------------+---------------------------------+

    1. Diagnose a specific SQL statement: show profile cpu, block io for query 3; (the number is the Query_ID from the show profiles output)

    2. Conclusions to watch out for in daily development

      • Converting HEAP to MyISAM: the query result is too large and memory is insufficient, so the temporary table is moved to disk.

      • Creating tmp table: a temporary table is created to hold intermediate results.

      • Copying to tmp table on disk: an in-memory temporary table is being copied to disk.

      • Locked

In what cases do you not use indexes in a query?

Performance optimization

The index optimization

  1. Full-value matching is best.
  2. Follow the leftmost-prefix rule: with a composite index on (a, b, c), the index can effectively be used as (a), (a, b) or (a, b, c). (See the sketch after this list.)
  3. Doing anything on an indexed column (calculations, functions, automatic or manual type conversion) invalidates the index and causes a full table scan.
  4. The storage engine cannot use columns to the right of a range condition in the index.
  5. Use covering indexes (queries that only access the index, i.e. the indexed columns cover the queried columns) as much as possible, and reduce SELECT *.
  6. IS NULL and IS NOT NULL also cannot use the index.
  7. LIKE 'xxxx%' can use an index, but LIKE '%xxxx' and LIKE '%xxxx%' cannot: when LIKE starts with a wildcard ('%abc...'), the index fails and the query degrades to a full table scan.
  8. A string compared without single quotation marks invalidates the index.
  9. Use OR sparingly: using it to join conditions will invalidate the index.
  10. <, <=, =, >, >=, BETWEEN and IN can use the index; <>, NOT IN and != cannot, and lead to a full table scan.
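
A small sketch of rules 2, 3 and 4 on a hypothetical table t with a composite index on (a, b, c):

ALTER TABLE t ADD INDEX idx_a_b_c (a, b, c);

EXPLAIN SELECT * FROM t WHERE a = 1;                       -- index used (prefix a)
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;   -- index fully used (a, b, c)
EXPLAIN SELECT * FROM t WHERE a = 1 AND b > 2 AND c = 3;   -- a and b used; c, to the right of the range, is not
EXPLAIN SELECT * FROM t WHERE b = 2 AND c = 3;             -- index not used: the leftmost column a is missing
EXPLAIN SELECT * FROM t WHERE LEFT(a, 1) = '1';            -- function on the index column: index fails, full table scan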

General Recommendation

  • For single-key indexes, try to select indexes with better filtering for the current Query

  • When choosing a composite index, the better a field's filtering power in the current query, the earlier that field should appear in the index's field order.

  • When selecting a composite index, try to select an index that contains as many fields as possible from the WHERE clause in the current query

  • Select the appropriate index by analyzing statistics and adjusting the way query is written whenever possible

  • Use hints to force indexes sparingly

Query optimization

Always let small tables drive large tables (small data sets drive large data sets)

select * from A where id in (select id from B)
-- equivalent to:
--   outer loop: select id from B
--   inner:      select * from A where A.id = B.id

When table B's data set is smaller than table A's, IN performs better than EXISTS.

select * from A where exists (select 1 from B where B.id = A.id)
-- equivalent to:
--   outer loop: select * from A
--   inner:      select * from B where B.id = A.id

Exists is better than in when the data set of table A is smaller than that of table B

Note: The ID field of table A and table B should be indexed.

Order by keyword optimization

  • Order by clause, try to use Index rather than FileSort

  • MySQL supports two kinds of sorting methods, FileSort and Index. Index is efficient, which means that MySQL scans the Index itself to complete sorting. FileSort is inefficient.

  • ORDER BY can use the index in two situations: when the ORDER BY statement itself uses the leftmost prefix of the index, or when the combination of the WHERE clause and the ORDER BY clause satisfies the leftmost prefix of the index

  • Sort the index column as much as possible, following the best prefix for the index

  • When the index cannot be used, MySQL falls back to filesort, which has two algorithms: two-pass (double) sort and single-pass sort

    • Two-pass sort: used before MySQL 4.1; literally, the disk is scanned twice to get the sorted data
    • Single-pass sort: all columns needed by the query are read from disk at once, sorted in the buffer by the ORDER BY column, and the sorted list is then scanned for output; this is more efficient than two-pass sort
  • Optimization strategy

    • Increases the sort_buffer_size parameter setting
    • Increases the max_length_for_sort_data parameter setting

GROUP BY keyword optimization

  • Group by is essentially sorted and then grouped, following the best left prefix for the index
  • Increases when the index column cannot be usedmax_length_for_sort_dataParameter Settings, increasessort_buffer_sizeParameter Setting
  • WHERE is more efficient than HAVING: any condition that can be written in WHERE should not be placed in HAVING

Data type optimization

MySQL supports a wide variety of data types, and choosing the right data type is critical to achieving high performance. No matter what type of data you store, a few simple rules can help you make a better choice.

  • Smaller is usually better: In general, you should try to use the smallest data type that can store data correctly.

  • Simple is good: simple data types generally require fewer CPU cycles. For example, integers are cheaper to compare than characters, because character sets and collation rules make character comparison more complicated than integer comparison.

  • Try to avoid NULL: It is usually best to specify a NOT NULL column


Nine, partitioning, table splitting and database splitting

MySQL partition

Normally each table we create corresponds to a set of storage files: a .MYI and .MYD file for the MyISAM storage engine, or an .ibd plus a .frm (table structure) file for the InnoDB storage engine.

When the amount of data is large (usually more than 10 million records), the performance of MySQL begins to degrade. At this time, we need to spread the data into multiple storage files to ensure the efficiency of the single file

Can do

  • Logical data segmentation
  • Improve the speed of single write and read applications
  • Improves the speed of partition range read queries
  • Split data can have multiple different physical file paths
  • Save historical data efficiently

How to play

First check whether the current database supports partitioning

  • MySQL 5.6 and earlier:

    SHOW VARIABLES LIKE '%partition%';

  • MySQL 5.6 and later:

    show plugins;

Partition type and operation

  • RANGE partitioning: rows are assigned to partitions based on column values falling within a given contiguous interval. MySQL places the data into different table files according to the chosen split strategy; it is as if the table were cut into small pieces on paper, yet to clients it still looks like one table, so the partitioning is transparent. (A minimal example follows after this list.)

    Ranging is usually done by a time range, for example transaction or sales tables that store data by year. It can cause hot-spot problems, since the newest data receives most of the traffic.

    The advantage of range partitioning is that it is very easy to expand.

  • LIST partitioning: Similar to partitioning by RANGE, each partition must be clearly defined. The main difference is that each partition in the LIST partition is defined and selected based on the fact that the values of a column are subordinate to a value in a LIST of values, whereas the RANGE partition is subordinate to a set of consecutive interval values.

  • HASH partition: A partition selected based on the return value of a user-defined expression computed using the column values of the rows to be inserted into the table. This function can contain any valid expression in MySQL that produces a non-negative integer value.

    Hash distribution has the benefit of spreading the data volume and the request load evenly across the libraries; the downside is that expansion is troublesome, because it involves a data-migration process in which the existing data must be re-hashed and redistributed to different libraries or tables

  • KEY partitioning: Similar to HASH partitioning, except that KEY partitioning supports only one or more columns and the MySQL server provides its own HASH function. One or more columns must contain integer values.
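
A minimal RANGE-partitioning sketch (the sales table and its columns are illustrative; note that the partitioning column must be part of every unique key, including the primary key):

CREATE TABLE sales (
  id        BIGINT NOT NULL,
  sale_date DATE   NOT NULL,
  amount    DECIMAL(10, 2),
  PRIMARY KEY (id, sale_date)
)
PARTITION BY RANGE (YEAR(sale_date)) (
  PARTITION p2021 VALUES LESS THAN (2022),
  PARTITION p2022 VALUES LESS THAN (2023),
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);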

Partitioned tables look attractive, so why do most Internet companies still prefer to scale horizontally by splitting databases and tables themselves?

  • With a partitioned table, the partition key design is inflexible; if a query does not use the partition key, it can easily end up locking the whole table
  • Once concurrency and data volume grow, doing joins across a partitioned table becomes a disaster
  • With self-managed database and table splitting, the business scenarios and access patterns are under your own control; with a partitioned table, developers write SQL without being sure how MySQL will behave, which is much less controllable

As the business develops, services become more complex, more application modules are added, the total amount of data grows large, and highly concurrent reads and writes exceed what a single database server can handle.

This is where data sharding comes in. Data sharding means splitting the data of a single database across multiple databases or tables along some dimension; for relational databases, the effective way to shard data is to split databases and split tables.

Sharding differs from partitioning: partitions generally stay on a single machine, are most often ranged by time, and are convenient for archiving. Splitting databases and tables has to be implemented in code, whereas partitioning is implemented inside MySQL. Table splitting and partitioning do not conflict and can be used together.

Talk about the design of sub-database and sub-table

MySQL table splitting

There are two ways to split tables: vertical splitting and horizontal splitting.

  • Vertical splitting

    Vertical table splitting usually groups the main, hot fields that are used most frequently by the business into the primary table, and gathers the rarely used fields by business attribute into different secondary tables. The relationship between the primary table and a secondary table is usually one-to-one.

  • Horizontal split (data sharding)

    Keep the size of a single table below roughly 5 million rows; otherwise horizontal splitting is recommended. The table is copied into several tables with the same structure, and the data is divided by some rule and stored across them, so that no single table grows too large and performance improves. These tables with identical structure can live in one database or in several.

    Several methods of horizontal segmentation:

    • MD5 hashing: hash the UID with MD5, take the first few characters (say the first two), and route different UIDs into different user tables (user_xx).
    • Put data into different tables by time, for example article_201601, article_201602.
    • Split by heat: entries with a high click rate get their own tables, while low-heat entries stay together in one large table; once the low-heat entries grow past a certain number, they are split out into their own tables as well.
    • Split by ID range: the first million users go into user_0000, the second million into user_0001, and so on; as the number of users grows, new user tables are simply added.

MySQL database splitting

Why split databases?

In a database cluster with many slaves, read operations are basically covered; however, write operations, or large volumes of frequently written data, put heavy pressure on the master. When a single library can no longer handle large-scale concurrent writes, splitting into multiple databases is considered.

What is database splitting?

When a single database contains too many tables, the massive amount of data degrades system performance, so the tables that were originally kept in one database are split and stored across multiple libraries, usually divided by functional module and degree of relationship and deployed to different libraries.

Advantages:

  • Reduce the impact of locks on queries when incremental data is written

  • As the number of single tables decreases, common query operations reduce the number of records that need to be scanned, reducing the number of rows required for a single query, reducing disk I/OS, and shortening the latency

However, it cannot solve the problem of too much data in a single table

Difficult problems after splitting databases and tables

Distributed transaction issues, data integrity and consistency issues.

Data operation dimension problems: users, transactions and orders are analysed and compared from different dimensions, for example the user query dimension versus the product data analysis dimension. Cross-library joins become a problem: cross-node count, order by, group by and aggregate functions may have to be executed twice, getting results on each node and merging them on the application side. There is also an extra data-management burden (for example, navigating to the right data table) and extra data-operation pressure (statements must run on multiple nodes and the results combined in program code), which raises development difficulty. No framework solves this perfectly; it depends largely on how the business chooses to split and merge, which is a genuinely hard problem.

A serious company will not leave this work to the ordinary application developer (Javaer), but you should still be aware of these issues.

Master/slave replication

The basic principle of replication

  • The slave reads the binlog from the master for data synchronization

  • Three steps

    1. The master records the changes in its binary log (these records are called binary log events);
    2. The slave copies the master's binary log events to its relay log;
    3. The slave replays the events in the relay log, applying the changes to its own database. MySQL replication is asynchronous and serialized. (See the sketch below.)
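
On the slave, the classic way to wire these steps up looks like this (host, user, password, binlog file and position are placeholders for your own environment):

CHANGE MASTER TO
  MASTER_HOST = '192.168.1.10',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 154;
START SLAVE;
SHOW SLAVE STATUS\G   -- Slave_IO_Running and Slave_SQL_Running should both show Yes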

Basic principles of replication

  • Each slave has only one master
  • Each slave can have only one unique server ID
  • Each master can have multiple slaves

The biggest problem with replication

  • Time delay

11. Other issues

Name the three normal forms

  • First Normal Form (1NF) : Fields in database tables are single-attribute and non-divisible. This single attribute is made up of basic types, including integer, real, character, logical, date, and so on.
  • Second normal Form (2NF) : There is no partial function dependence of non-key fields on any candidate key field in a database table (partial function dependence refers to the presence of some field in the combined key determining the non-key field), that is, all non-key fields are completely dependent on any set of candidate keys.
  • Third normal Form (3NF) : On the basis of second normal form, the data table conforms to third normal form if there is no transfer function dependence of non-key fields on any candidate key field. The so-called transfer function dependence refers to the dependence of C transfer function on A if there is A “A → B → C” decision relation. Therefore, database tables that meet the third normal form should not have the following dependencies: key field → non-key field X → non-key field Y

How do I delete data at the million level or above

About indexes: an index carries extra maintenance cost, because the index file is a separate file, so adding, modifying or deleting data also triggers extra operations on the index files; these operations consume extra IO and lower the efficiency of inserts, updates and deletes. So when we want to delete millions of rows, the MySQL official manual points out that the time needed to delete data grows with the number of indexes on the table.

  1. So when we want to delete millions of rows, we can drop the indexes first.
  2. Then delete the useless data (this process takes only a couple of minutes).
  3. Re-create the indexes after the deletion; with far less data this is also fast, around ten minutes.
  4. Compared with deleting directly, this is much faster; moreover, if a direct delete is interrupted, the whole delete is rolled back, which is even worse. (A sketch of this workflow follows below.)
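
A sketch of that workflow (the table, index names and filter condition are hypothetical):

-- 1. Drop the secondary indexes first
ALTER TABLE big_table DROP INDEX idx_status, DROP INDEX idx_created_at;

-- 2. Delete the useless data, in batches to keep each transaction small
DELETE FROM big_table WHERE created_at < '2020-01-01' LIMIT 100000;
-- repeat until 0 rows are affected

-- 3. Re-create the indexes now that the table is small again
ALTER TABLE big_table ADD INDEX idx_status (status), ADD INDEX idx_created_at (created_at);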

Reference and thanks:

zhuanlan.zhihu.com/p/29150809

juejin.cn/post/684490…

blog.csdn.net/yin76783337…