What is the index of MySQL? How do you optimize?

Index is similar to the bibliographic index of university library, which can improve the efficiency of data retrieval and reduce the IO cost of database. MySQL performance starts to decline around 3 million records, although official documentation says 500~800w records, so indexing is necessary for large data volumes. MySQL provides Explain, which displays detailed information about SQL execution and allows for index optimization.

1. Causes of slow SQL execution:

1. Hardware problems. For example, the network speed is slow, the memory is insufficient, the I/O throughput is small, and the disk space is full.

2. There is no index or the index is invalid. In Internet companies, dbAs lock tables in the middle of the night and re-index them because when you delete data, the tree structure of the index is incomplete. So Internet companies do fake deletions of their data. One is to do data analysis, and two is not to break the index.)

3. Too much data (separate databases and tables)

4. Server tuning and parameter setting (adjust my.cnf)

Second, when analyzing the reasons, we must find a breakthrough point:

1. Enable slow query logs and set thresholds (for example, SQL queries that exceed 3 seconds are slow). After running in the production environment for one day, check which SQL queries are slow.

2.Explain and slow SQL analysis. Examples include poorly written SQL statements, missing or invalid indexes, too many associated queries (sometimes due to design flaws or unnecessary requirements), and so on.

3.Show profiles are execution details that go a step further than Explain, showing what was done for each SQL execution and how many seconds it took.

4. Ask the DBA or O&M to tune the parameters of the MySQL server.

What is an index?

An Index is a data structure that helps MySQL obtain data efficiently. We can simply understand it as: a kind of data structure that can be quickly found and sorted. Mysql index has two main structures: B+Tree index and Hash index. Index (B+Tree) index (B+Tree) The index is shown below:

The outermost light-blue disk block 1 contains data 17 and 35 (dark blue) and Pointers P1, P2, and P3 (yellow). P1 indicates disk blocks smaller than 17, P2 indicates disk blocks between 17 and 35, and P3 indicates disk blocks larger than 35. The real data resides in the cotyledons, the bottom layer 3, 5, 9, 10, 13…… Non-leaf nodes do not store real data, but only the data items that guide the search direction, such as 17 and 35.

Search process: for example, to search for 28 data items, first load disk block 1 into memory, an I/O occurs, and use binary search to determine the P2 pointer. Then it is found that 28 is between 26 and 30, and disk block 3 is loaded into memory through the address of P2 pointer, and a second I/O occurs. Find disk block 8 in the same way, and a third I/O occurs.

The reality is that the B+Tree at the top 3 levels can represent millions of data, millions of data with only three I/ OS instead of millions of I/ OS, and the time improvement is huge.

4. Explain analysis

After the foreshadowing is completed, enter the practical operation part. First, insert the data required by the test:

CREATE TABLE `user_info` (
  `id`   BIGINT(20)  NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(50) NOT NULL DEFAULT '',
  `age`  INT(11)              DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `name_index` (`name`)
)ENGINE = InnoDB DEFAULT CHARSET = utf8;

INSERT INTO user_info (name, age) VALUES ('xys', 20);
INSERT INTO user_info (name, age) VALUES ('a', 21);
INSERT INTO user_info (name, age) VALUES ('b', 23);
INSERT INTO user_info (name, age) VALUES ('c', 50);
INSERT INTO user_info (name, age) VALUES ('d', 15);
INSERT INTO user_info (name, age) VALUES ('e', 20);
INSERT INTO user_info (name, age) VALUES ('f', 21);
INSERT INTO user_info (name, age) VALUES ('g', 23);
INSERT INTO user_info (name, age) VALUES ('h', 50);
INSERT INTO user_info (name, age) VALUES ('i', 15);

CREATE TABLE `order_info` (
  `id`           BIGINT(20)  NOT NULL AUTO_INCREMENT,
  `user_id`      BIGINT(20)           DEFAULT NULL,
  `product_name` VARCHAR(50) NOT NULL DEFAULT '',
  `productor`    VARCHAR(30)          DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `user_product_detail_index` (`user_id`, `product_name`, `productor`)
)ENGINE = InnoDB DEFAULT CHARSET = utf8;

INSERT INTO order_info (user_id, product_name, productor) VALUES (1, 'p1', 'WHH');
INSERT INTO order_info (user_id, product_name, productor) VALUES (1, 'p2', 'WL');
INSERT INTO order_info (user_id, product_name, productor) VALUES (1, 'p1', 'DX');
INSERT INTO order_info (user_id, product_name, productor) VALUES (2, 'p1', 'WHH');
INSERT INTO order_info (user_id, product_name, productor) VALUES (2, 'p5', 'WL');
INSERT INTO order_info (user_id, product_name, productor) VALUES (3, 'p3', 'MA');
INSERT INTO order_info (user_id, product_name, productor) VALUES (4, 'p1', 'WHH');
INSERT INTO order_info (user_id, product_name, productor) VALUES (6, 'p1', 'WHH');
INSERT INTO order_info (user_id, product_name, productor) VALUES (9, 'p8', 'TE');Copy the code

Initial experience of the effect of implementing Explain:

There are three columns possible_keys, key, and KEY_len for index usage. Next, we’ll explain them from left to right.

1.id

Explain select u.*, O.* from user_info u,order_info o where u.id= o.usser_id;Copy the code

Select * from user_info where id=(select user_id from order_info where product_name ='p8'); select * from user_info where id=(select user_id from order_info where product_name ='p8')Copy the code

2.select_type

You can see the following types of execution instances of id:

SIMPLE: indicates that the query does not contain UNION query or subquery
PRIMARY: indicates that this query is the outermost query
SUBQUERY: The first SELECT in a SUBQUERY
UNION: Indicates that this query is the second or subsequent query to the UNION
DEPENDENT UNION: The second or subsequent query statement in the UNION, depending on the external query
The UNION RESULT
DEPENDENT SUBQUERY: The first SELECT in a SUBQUERY, DEPENDENT on the external query. That is, subqueries depend on the results of outer queries.
DERIVED: DERIVED, representing the SELECT (subquery FROM the clause) FROM the DERIVED table

3.table

Table represents the tables involved in the query or derived tables:

explain select tt.* from (select u.* from user_info u,order_info o where u.id=o.user_id and u.id=1) ttCopy the code

The <derived2> with id 1 is derived from the u and O tables with ID 2.

4.type

The Type field is important because it provides an important basis for determining whether a query is efficient. Through the type field, we can determine whether the query is a full table scan or an index scan, etc.

The common values of type are:

System: There is only one data in the table. This type is a special const type.
Const: An equivalent query scan against a primary key or unique index that returns at most one row. Const is very fast because it reads only once. For example, the following query uses the primary key index, so type is const: explain select * from user_info where id = 2;
Eq_ref: This type is typically found in multi-table JOIN queries and means that for each result of the previous table, only one row of the subsequent table can be matched. In addition, the comparison operation of query is usually =, which indicates high query efficiency. For example, explain select * from user_info, order_info where user_info.id = order_info.user_id;
Ref: This type typically occurs in join queries with multiple tables, for non-unique or non-primary key indexes, or for queries that use the left-most prefix rule index. For example, in this example, a ref type query is used :explain select * from user_info, order_info where user_info.id = order_info.user_id AND order_info.user_id = 5
Range: indicates that the index range query is used to obtain some data records in the table by the index field range. This type usually appears IN =, < >, >, > =, <, < =, IS NULL, the < = >, BETWEEN, IN () operation. Explain select * from user_info where id between 2 and 8;
Index: full index scan, similar to ALL except that ALL scans ALL indexes, while index scans only ALL indexes. The data to be queried can be retrieved directly from the index tree without scanning the data. When this is the case, the Extra field shows Using Index.
ALL: indicates a full table scan. This type of query is one of the least high-performance queries. In general, our queries should not have all-type queries, because such queries can be a disaster for database performance in the case of large data volumes. If a query is an All-type query, it can generally be avoided by adding indexes to the corresponding fields.

In general, the performance relationship between different types of type is as follows: ALL < index < range ~ index_merge < ref < eq_ref < const < System ALL Type Because it is a full table scan, it is the slowest in the same query conditions. An index query is not a full table scan, but it scans ALL indexes and is therefore slightly faster than an ALL query. The latter types all use indexes to query data, so they can filter some or most of the data, so the query efficiency is relatively high.

5.possible_keys

It represents the index that mysql may use when querying. Note that possible_keys, even though some indexes appear in possible_keys, doesn’t mean they can actually be used by mysql. It depends on the key field.

6.key

This field is the actual index used by mysql in the current query. For example,possible_keys means how many people could be there while key means how many people could actually be there. When we do not build indexes:

explain select o.* from order_info o where  o.product_name= 'p1' and  o.productor='whh';
create index idx_name_productor on order_info(productor);
drop index idx_name_productor on order_info;Copy the code

Create compound index before query:

7.key_len

Represents the number of bytes of the index used by the query optimizer. This field evaluates whether the composite index is fully used.

8.ref

This shows which column of the index is used and, if possible, is a constant. The type attribute also contains ref. Note the difference

9.rows

Rows is also an important field. The mysql query optimizer estimates the number of rows of data that the SQL needs to scan and read to find the result set based on the statistics. This value is a very intuitive indication of how efficient SQL is, and in principle the fewer rows the better. Compare this to the example in key, where rows is 9 without indexing and 4 after indexing.

10.extra

A lot of additional information in Explain will be displayed in the Extra field. Common ones are the following:

Using filesort: indicates that mysql requires additional sorting operations. Using filesort is recommended to remove this query because it consumes large CPU resources.
Using index: overwrites the index scan, indicating that the query can find the required data in the index tree without scanning the table data files, usually indicating good performance
Using temporary: A query uses temporary tables, which are used for sorting, grouping, or join of multiple tables. Optimization is recommended because the query efficiency is not high.
Using WHERE: Table names are filtered using where

5. Optimization cases

explain select u.*,o.* from user_info u LEFT JOIN  order_info o on u.id=o.user_id;Copy the code

Execute result, type has ALL, and no index:

Select * from ‘ALL’, ‘ref’ and ‘rows’ from’ 9 ‘to’ 1 ‘;

There is a general rule is: the left link index is added to the right table above, right link index is added to the left table above.

Do you need to create an index?

Indexes can speed up queries very efficiently, but slow down table updates. The index is actually a table that holds primary keys and index fields and points to records in the entity table, so index columns also take up space.

I am an ordinary programmer, the level is limited, the article inevitably has mistakes, welcome to sacrifice their precious time readers, on the content of this article, my purpose is just hope to help readers.