paradigm

A paradigm is a set of principles that a table is designed to follow, reflecting the relationships between attributes in the table

(1) First paradigm

Emphasize that each column of attributes in the database cannot be further split, for example, if one column is a phone, it can be split between a work phone and a life phone, which is not in the first paradigm

(2) Based on the first normal form, the second normal form emphasizes that some attributes in the table cannot be partially dependent on unique codes;

The unique code does not refer to the increment ID of, but to the fact that each row of records is uniquely identified by several attributes, which together uniquely identify a record

For example, a table has student numbers, names, subject grades

Only code is student number + course, only certain is result

The name here is partially dependent on the student number + subject, because the same name corresponds to the same name for different subjects.

So names don’t fit the second normal form here, and to do so, you have to split it into two tables

(3) Third normal form

The third normal form is based on the second normal form, further eliminating the dependence of some attributes on unique code passing

It shows that unique code A uniquely identifies all other columns B,C, and D

But B can determine C and D

Therefore, dependency transfer occurs, which should be split into tables A, B and B,C,D

Index and lock reference links

Index and lock

B tree

(1) B tree has multiple layers, each layer has multiple nodes, each layer contains multiple elements, and each element contains key and value. Different keys in each node are in order, and the key of each node is also in order, increasing from left to right, and value corresponds to specific data

(2) Elements in all layer nodes of the B tree will not be repeated (the key of the element may be repeated, but the value will not be repeated)

(3) Each node of the B tree corresponds to a page in the disk, so each node is limited in size

(4) The order of B tree is determined by the maximum value of child nodes among nodes. For example, a node in a B tree has five child nodes, which is the largest number in the tree, so a B tree has five order, and the order is very important. Each node has a maximum of several child nodes depending on the size of each element. If the element is small, the current node can contain more elements, and each element represents one child node. So we can have more degrees in the B tree

(5) The root node has only 2 child nodes, and the number of child nodes of other nodes n meets the order / 2 <= n <= between the order.

(6) The number of elements that each node can contain K satisfies (order / 2-1) <= K <= (Order -1)

(7) B tree is a self-balanced multi-fork search tree. When adding or deleting elements, if the number of elements of a node does not meet the requirements, the new elements will split the node (the middle node is upgraded to the upper layer, recursively), and the deletion of elements will make the nodes merge

The core is to set limits on the number of nodes and node elements to ensure performance

B + tree

On the basis of the B tree

(1). Except for the last layer, each node in the upper layer stores only key values instead of value values, so that each node (a disk block) can store more elements, that is, for the same data, the order of B+ tree is higher than that of B tree. With more nodes in the upper layer, the B+ tree becomes shorter and finds shorter paths.

(2) The elements of the node in the last layer include all the elements. Different nodes in the last layer are connected together and can be accessed sequentially

(3) B+ trees and everything else is just like B trees

The hop table is a B+ tree in memory form. The hop table is used by Redis, and the data is in memory. Therefore, the hop table is designed according to the idea of B+ tree

How do B trees and B+ trees get from one node to the other

(1) Record the root node to the memory according to the disk address to undergo an I/O

(2) Obtain the disk address of the next node based on the root node and perform I/O

(3) Load the content into memory according to the disk address, find the target element, and return.

So when you use an index, for a B tree you get a few IO’s for a few nodes, and for a B+ tree, the IO’s are always the height of the tree.

Is the part of the mysql B tree or B+ tree that stores data stored on disk or in the data itself?

It has to do with data engines

In the InnoDB

The index tree corresponding to the primary key ID is called the clustered index. All cotyledon nodes store the data itself, while other index trees except the primary key ID store the value of the primary key ID, and then query the specific value through the clustered index.

That is, when the disk of the last cotyledons in the cluster index reads memory, the value of each element is the data itself

The value of each element is the primary key ID when the disk in the last cotyledon of a non-clustered index reads into memory

The MyISAM

Without clustering indexes, when the cotyledon values are read from disk to memory, the value of each element is still a disk address, which needs to go through another IO

Mysql index

The characteristics and differences of different indexes

(1) B+ tree index

InnoDB uses this index as the base index, usually referring to B+ tree index rather than hash index.

If each column is indexed, then each column corresponds to a B+ tree, and the values of each column and row correspond to the key values of the elements of each node in the B+ tree, depending on the case

Because of its orderliness, the index can be used for lookup, sorting, and grouping

Applies to the full key value, the key value range, and the key prefix search, where the key prefix search only applies to the leftmost prefix search. Indexes cannot be used if they are not looked up in the order of the index columns.

(2) Hash index

Used for precise lookup, not range lookup

Cannot be used for sorting or grouping

Generally used to complement B+ tree indexes, such as a value that is always accurately searched, a new hash index is added

(3) Full-text index

MyISAM and InnoDB storage engines are supported after MySQL 5.6.4

Is an inverted index used to find keywords

Search criteria use MATCH AGAINST instead of the normal WHERE

InnoDB primary key index and cluster index

A primary key index may not exist, but a cluster index must exist. Because InnoDB data is stored in a clustered index, if there is no clustered index, then nothing can be found.

Clustering index details

Clustered index and non-clustered index

What is a full table scan?

Mysql > select * from a table where rows are stored in a file. Mysql > select * from a table where rows are stored in a file. Mysql > select * from a table where rows are stored in a file.

Actually it’s not, for example InnoDb, it puts all the data together into a clustered index, which is called an index, but it’s actually a table of contents, which is equivalent to a full table scan, which is traversing all the cotyledons of that table of contents. In MyIsam, cotyledon nodes store the addresses of data, so when scanning the full table, we need to read the data in the corresponding addresses.

Index usage issues

Mysql has two types of single-column and multi-column indexes

Single index

Index cannot be used:

    1. Select * from XXX where id + 1 = 2;
    1. Select * from XXX where sum(id) = 1;
    1. Select * from XXX where id = “1”; Id is defined as int in mysql, and the string 1 passed as the argument hides the conversion function

Indexes are not used in any of the above three cases

Single-column index, when the retrieval condition is (A, B, C), it will first determine which result is the least according to A, B, and C respectively (on the premise that there are indexes on A, B, and C). Assuming that the table meets the conditions of 10W A, 12W B, and 5W C, then 5W results will be obtained according to C. Then the whole table traverses the 5W data, and the result is obtained

The column index more

A multi-column index is better than a single-column index because a multi-column index is used by each column index. If each column has an index, only the most selective column index is used when multiple parameters are used.

Multi-column index in B+ tree. When the retrieval condition is (A, B, C), the key value of non-cotyledons in the upper layer will only store the value of INDEX A, and the last leaf node will have multiple nodes with the same key. Then the value value is the value of index B and C stored by column, and the same key. Different values, sorted in order of value.

Benefits of multi-column indexes

How do I avoid back table queries and what is index overwrite?

(1) Avoid back table

If the index tree already contains all the fields of our query, we can just look up the index tree and get the result without going back to the table.

(2) Index coverage

Index override has two meanings:

  • (1) If the query is not sorted, the required fields are set up in the composite index to avoid back to the table.
  • (2) If it is a sort query, add the sort field to the index to ensure that the nodes of the index tree contain the sort field.

If max_LENGTH_FOR_sorT_data is shorter than max_LENGTH_FOR_sort_data, higher query performance will be achieved.

The size of the sort field also affects sort performance

(3) The multi-column index is not sequential search but binary search

Does InnoDB look up sequentially?

After a multi-column index finds a cotyledon node based on the first column, a binary search is performed for each element in that page because the elements are ordered.

Multiple column index order selection problem

A multi-column index follows the left-most prefix. A multi-column index retrieves the first index and then the second index in the result, so the order of the index selection should be on the left.

Mysql now has an optimizer. If you do not use the left-most prefix, the optimizer will automatically optimize to match the left-most prefix.

How many values should be queried for the IN field query?

MySQL has a configuration parameter eq_range_index_dive_limit, which is used to query values (e.g. In query), if the number of equivalent conditions is less than the configuration parameter, the query cost is analyzed by scanning index tree. If the value is greater than or equal to the configuration parameter, the query cost is analyzed by index statistics.

If the value of IN query field ID IN the SQL is smaller than eq_RANGE_INDEx_DIve_limit, the query cost is analyzed by index tree scanning. If the value is greater than or equal to eq_RANGE_INDEx_DIve_limit, the query cost is analyzed by index statistics.

  • Index tree scan analysis: The cost results obtained are accurate, which means that MySQL can choose the correct execution plan to ensure the performance of the statement query
  • Index statistical analysis: query cost analysis is fast and cheap. However, its disadvantages are also obvious, because there is no need to scan the index tree, through the rough statistics of index use, the query cost, MySQL may choose the wrong execution plan, so that the PERFORMANCE of SQL query decreases.

Therefore, the value of the IN query field should not exceed the parameter eq_range_INDEx_dive_limit, so that MySQL can correctly select the execution plan and ensure the performance of SQL queries.

The default value of the eq_range_INDEx_dive_limit parameter was updated to 200 in version 5.7.

Disable index

Mysql creates indexes for primary keys by default. Indexes can greatly improve query speed and performance. However, mysql also takes time to build an index, so creating an index when inserting data slows down the speed of inserts. To solve this problem, we can turn the index off and then on again during mass inserts of database data.

(1) Disable indexes

Alter table table_name DISABLE KEYS;

Alter table table_name ENABLE KEYS;

(2) force the specified index to be used or not to be used in a SQL statement.

Mysql > select * from user where id = ‘PRI’;

Such as:

select * from table force index(PRI) limit 2; Select * from table force index(ziduan1_index) limit 2; select * from table force index(ziduan1_index) limit 2; (forcibly use index" ziduan1_index") select * from table force index(PRI,ziduan1_index) limit 2; (Forced use of indexes "PRI and ziduan1_index")Copy the code

Mysql > disallow index PRI (primary key PRI)

Such as:

select * from table ignore index(PRI) limit 2; Select * from table ignore index(ziduan1_index) LIMIT 2; Select * from table ignore index(PRI,ziduan1_index) limit 2; (Disallow index" PRI,ziduan1_index")Copy the code

Disable uniqueness checking

When inserting data, MySQL performs a uniqueness check (unique key index) on the inserted record. This check also slows down the insertion speed.

Disable unique checks: SET UNIQUE_CHECKS = 0;

SET UNIQUE_CHECKS = 1;

InnoDB row level lock thinking

  • Disadvantages of row locking: high overhead; Lock is slow; Deadlocks can occur
  • Advantages of row lock: the granularity of lock is small and the probability of lock conflict is low. Strong ability to handle concurrency

Row-level locks

In InnoDB, row-level locks are implemented by locking index entries. InnoDB uses row-level locks only when index conditions are used to retrieve data. Otherwise, InnoDB uses table locks. Which index tree (primary key index, or normal index) is locked depends on which index is hit by the SQL statement.

If a SQL statement uses a primary key index, mysql will lock the primary key index. If a statement operates on a non-primary key index, mysql will first lock the non-primary key index and then lock the primary key index.

In practice, pay special attention to InnoDB’s row locking feature, otherwise it can lead to a large number of lock conflicts, which can affect concurrency performance. Here are some practical examples to illustrate.

  • (1) InnoDB does use table locks instead of row locks when queries are not indexed.
  • (2) MySQL row lock is a lock on the index, not on the record, so although different rows are accessed, if the same key is used, there will be lock conflicts.
  • (3) When a table has multiple indexes, different transactions can use different indexes to lock different rows. In addition, InnoDB uses row locks to lock data whether using primary key indexes, unique indexes or normal indexes.
  • (4) even in conditions were used in the field, but whether to use the index to retrieve the data by MySQL by judging the cost of different execution plans to determine, if the MySQL think a full table scan is more efficient, such as for some small table, it will not use the index, in this case the InnoDB lock table, rather than row locks. Therefore, when analyzing lock conflicts, don’t forget to check the EXECUTION plan of your SQL to see if indexes are actually being used.

Three row-level locks

  • Record Lock: A Lock on a single Record
  • Gap Lock: A Gap Lock that locks a range but not the record book
  • Next-key Lock: Gap Lock+Record Lock locks a range, and locks the Record itself

For example, an index has the values 10,11,13,20. InnoDB can use Record Lock on 10, 11,13,20 as needed, and Gap Lock on (-∞,10), (10,11), (11,13), (13,20), (20, +∞). Next-key Locking is similar to the combination of the above two types of locks. It can be locked in (-∞,10], (10,11], (11,13], (13,20], (20, +∞). It can be seen that it not only locks a range, but also locks the record itself. Some rules of InnoDB storage engine lock algorithm are shown below, and the corresponding experimental cases and detailed explanations will be given in the following sections.

InnoDB locks all records in a table without indexing criteria. Therefore, for performance purposes, all fields in a conditional query in a WHERE statement should be indexed.

InnoDB uses indexes to lock rows, not records. Therefore, when two different records of an operation have the same index, the wait also occurs because the row lock is locked.

Due to InnoDB’s indexing mechanism, database operations use the primary key index, InnoDB locks the primary key index; When InnoDB uses a non-primary key index, InnoDB locks the non-primary key index first and then the primary key index.

When the index of the query is a unique index (no two rows have exactly the same Key value), InnoDB storage engine degrades next-key Lock to Record Lock, which locks only the index itself, not the range.

InnoDB has special treatment for secondary indexes. It not only locks the range of secondary indexes, but also applies a Gap LOCK to the value of the next key.

InnoDB uses next-key Lock mechanism to avoid Phantom Problem.

Which indexes do row locks apply to?

InnoDB storage engine row locking is implemented by locking indexes, not records. This is the key to understanding many database locking problems.

Due to InnoDB’s special indexing mechanism, InnoDB locks primary key indexes when database operations use primary key indexes. When InnoDB uses a non-primary key index, InnoDB locks the non-primary key index first and then the primary key index.

As shown in the figure below, field A is the primary key and field B is an attribute. When InnoDB locks non-primary key index B, it also locks its corresponding primary key index, so if InnoDB locks non-primary key index B with value 2 and 3, then InnoDB also locks its corresponding primary key index A with value 6 and 5.

Deadlock situations can occur in this case, as shown in the following figure:

Why InnoDb can implement row locking?

InnoDB because of the existence of the clustering index, all the query finally certainly is a clustering index (both hit the index, and a full table scan), the last is to get the data through clustering index, thus row-level locks will eventually add on clustering index, thus ensure the, there will be no row-level locks and a full table scan conflict problems.

The deadlock problem

Deadlocks occur when multiple transactions simultaneously hold and request locks on the same resource resulting in circular dependencies. So transaction A locks row X, and then needs to lock row Y, and then transaction B locks row Y, and needs to lock row x. Thus, A and B transactions are waiting for each other to lock, causing A deadlock.

In InnoDb, row-level locking is more prone to deadlocks.

The show engine Innodb status command displays deadlock logs

How does Mysql handle deadlocks

Mysql itself can solve deadlock problems.

MySQL has two ways to handle deadlocks:

  • Wait until timeout (innodb_lock_WAIT_timeout =50s).
  • Initiate deadlock detection that actively rolls back one transaction to allow other transactions to continue (innodb_deadlock_detect=on).

Deadlock detection is generally used to handle deadlocks for performance reasons.

Deadlock detection

The principle of deadlock detection is to construct a directed graph with transaction as vertex and lock as edge, and judge whether there is a ring in the graph.

The rollback

When a deadlock is detected, the transaction rollback with the least number of rows inserted or deleted is selected, based on the TRx_weight field in the Information_schema.innoDB_TRX table.

How do I avoid deadlocks

Deadlock solution (1) Add index

Refine the index granularity to prevent row locks from locking too much data at once

(2) The sequence of acquiring locks between different transactions should be as consistent as possible

For example, if transaction A wants to lock A and B, then transaction B also locks A and B (do not lock B and then lock A), then B will fail to lock A.

(3) If multiple resources need to be locked, lock them at one time. Either they will be locked successfully or they will fail directly (enter waiting).

For example, lock multiple ids at once.

Select * from xxx where id in (xx,xx,xx) for update
Copy the code

Explain the usage of the

The EXPLAIN statement returns the execution plan for MYSLQ. With the information it returns, we can see how the MYSQL optimizer executes SQL statements. Explain only returns the execution plan, not the actual execution.

MYSQL 5.6.3 previously could only EXPLAIN SELECT; EXPLAIN SELECT,UPDATE,DELETE from MYSQL5.6.3

Explain the key field type returned

Type The possible return value indicates the link type of the index

Eq_ref > ref > range > index > ALL The good or bad here is actually the number of lookups using the index tree, and the fewer the lookups, the better the performance.

System: The table has only one row (equal to the system table). This is a special case of the const join type.

Const: For unique indexes, data can be found after a search of the index tree, for primary key indexes, for single table queries. For example, select * from XXX where id = 1;

Eq_ref: also applies to a unique index, but refers to the multi-table query, that is, in the multi-table query, the primary key index is used. Select t1, t2 from table1 t1, table2 T2 where t1.id = t2.id;

Select * from XXX where name = “zhangsan”; select * from XXX where name = “zhangsan”;

Range: finding all data in the index tree is equivalent to multiple const or ref. Select * from XXX where id in (1, 2, 3);

Index: if there is no where condition, only the index column is queried, and only the entire index tree is scanned. For example, select name from XXX; Where name is the normal index column,

All: scans all tables without indexes. For example, select * from XXXX. No rows with indexed columns are hit here

Explain other important fields

Row: indicates the number of affected rows

Possible_keys: Possible index

Key: indicates the used index. The unused index is null

Mysql data engine differences

InnoDB: transaction support, row locking support, four isolation levels, clustering index support

MyISAM: simple design, does not support transactions, does not support row locking, low requirements can be used