concept

Official definition: An index is a data structure that helps mysql efficiently retrieve data. So the essence of an index is a data structure, which can be understood as a sort of fast lookup data structure. In addition to the data itself, the database also maintains a data structure that satisfies a specific lookup algorithm. These data structures point to the data in a certain way, so that advanced lookup algorithms can be implemented based on these data structures. This data structure is called an index.

advantage

  • Similar to library bibliographic index, improve data retrieval efficiency and reduce IO cost of database
  • Sorting data through indexed columns reduces the cost of sorting data and CPU consumption

disadvantage

While indexes greatly speed up queries, they slow down the speed of updating tables, such as INSERTS, UPDATES, and DELETES. When updating a table, MySQL not only updates the data, but also updates the index file. Indexes also take up some disk space.

classification

  • Single-value index An index contains only a single column, and a table can have multiple single-column indexes
  • Unique index The value of an index column must be unique, but null values are allowed
  • A composite index is an index that contains multiple columns

grammar

  • create
    1. CREATE [UNIQUE] INDEX indexName ON tableName ;
    2. ALTER TABLE tableName ADD [UNIQUE] INDEX indexName ;
  • DROP INDEX indexName ON tableName;
  • Check SHOW INDEX FROM tableName

Best practices for indexing

In which case an index is created

  1. The primary key automatically creates a unique index
  2. Columns that are frequently used as query criteria should be indexed (following the WHERE statement or columns that appear in the ORDER BY statement)
  3. The foreign key relationship is used to index the fields associated with other tables in the query
  4. Single key, composite index selection problem (recommended to create composite index under high concurrency)
  5. A sorted field in a query that can be accessed through an index to speed sorting
  6. Statistics or group fields in the query

Which cases do not create indexes

  1. Too few table records
  2. A watch that is often added, deleted, or modified
  3. Table fields with repetitive and evenly distributed data, so that indexes are created only for the most frequently queried and ordered data columns. Note that if a data column contains a lot of duplicate content, there is not much practical value in indexing it. (For example, 1.4 billion Chinese are of Chinese nationality, so this type of field can not be indexed, or gender)
  4. Indexes are not created for fields that are not needed in the Where condition

Performance analysis

Use the EXPLAIN keyword to simulate the optimizer’s execution of SQL queries to see how mysql handles your SQL statements. Analyze performance bottlenecks in your query or table structure.

grammar

The explain SQL statements

Related field

Each column is defined as follows

  • id: SELECT the identifier of the query. Each SELECT is automatically assigned a unique identifier
    1. The ids are the same and the execution sequence is from top to bottom
    2. The ID is different. For subqueries, the ID sequence increases. A larger ID has a higher priority and is executed earlier
  • select_type: SELECT Query type. It is mainly used to distinguish ordinary query, joint query, sub-query and so on complex query
    1. SIMPLE: A SIMPLE select query that does not contain subqueries or unions
    2. PRIMARY: The outermost query is marked as PRIMARY if it contains any complex subparts
    3. DERIVED: Subqueries contained in the FROM list are labeled as DERIVED, and MySQL performs these subqueries recursively, putting the results in temporary tables.
    4. SUBQUERY: Contains subqueries in a SELECT or WHERE list
    5. UNION: If the second SELECT appears after the UNION, it is marked as UNION; If UNION is included in the subquery of the FROM clause, the outer SELECT will be marked: DERIVED
    6. UNCACHEABLE SUBQUREY: a subquery that cannot be cached
    7. DEPENDENT SUBQUERY: Contains subqueries in a SELECT or WHERE list based on the outer layer
    8. UNION RESULT: SELECT the RESULT from the UNION table
  • Table: Which table is queried and which table the data in this row is about
  • typeThe: JOIN type (which determines performance) shows the type used in the query, from best to worst: system>const>eq_ref>ref>range>index>ALL
    1. System: The table has only one row (equal to the system table). This is a const column that does not normally occur and can be ignored
    2. Const: indicates that the index was found once, and const is used to compare primary key or unique indexes. Because only one row of data is matched, MySQL can quickly convert the query to a constant if the primary key is placed in the WHERE list
    3. Eq_ref: Unique index scan, for each index key, only one record in the table matches it. Common for primary key or unique index scans
    4. Ref: a non-unique index scan that returns all rows matching a single value. It is essentially an index access that returns all rows that match a single value. However, it may find multiple rows that match the criteria, so it should be a mixture of lookup and scanning (e.g., people are queried by name; names are repeatable, not unique fields).
    5. Range: Retrieves only rows in a given range, using an index to select rows. A range scan is better than a full table scan because it only needs to start at one point in the index and end at another point without scanning all the indexes.
    6. Index: Full IndexScan. The Index type differs from ALL. Only the Index tree is traversed. This is usually faster than ALL because index files are usually smaller than data files. (all and Index are both read from the full table, but Index is read from the Index, and all is read from the hard disk)
    7. Note: In general, ensure that the query reaches at least the range level and preferably the ref level.
  • Possible_keys: Possible indexes in this query can reveal one or more possible indexes that could be applied to the table. If an index exists on a field involved in the query, the index is listed but not necessarily used by the query
  • Key: indicates the exact index used in the query
  • Key_len: Indicates the number of bytes used in the index. This column is used to calculate the length of the index used in the query. The key_len field helps you check to see if you are making full use of the index
  • Ref: Which field or constant is used with the key. Shows which column of the index is used, if possible, as a constant. Which columns or constants are used to find values on index columns
  • Rows: Shows how many rows were scanned for this query. This is an estimate.
  • Filtered: indicates the percentage of data filtered by this query condition
  • extra: contains additional information that is important but not suitable for display in other columns
    1. Using filesort: indicates that mysql uses an external index sort for data, instead of reading data in the order of the indexes in the table. A sort operation in MySQL that cannot be done using an index is called “file sort”.
    2. Using WHERE: Indicates that where filtering is used
    3. Using Index: Covering Index (Covering Index scan), not back to the table query (” Covering Index scan “, query in the Index tree can find the required data, do not have to scan table data files, often shows good performance)
    4. Using temporary: Enables temporary tables to hold intermediate results. MySQL uses temporary tables when sorting query results. Common in sort order by and group by queries.
    5. Covering Index: a column is covered by an Index

Main fields: Id/type/key/rows/Extra

Query optimization

Index invalidation and index optimization

Test the SQL

CREATE TABLE staffs (
 id INT PRIMARY KEY AUTO_INCREMENT,
 NAME VARCHAR (24) NULL DEFAULT ' ' COMMENT 'name',
 age INT NOT NULL DEFAULT 0 COMMENT 'age',
 pos VARCHAR (20) NOT NULL DEFAULT ' ' COMMENT 'job',
 add_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Entry Time'
) CHARSET utf8 COMMENT 'Staff Record' ;

INSERT INTO staffs(NAME,age,pos,add_time) VALUES('z3'.22.'manager'.NOW());
INSERT INTO staffs(NAME,age,pos,add_time) VALUES('July'.23.'dev'.NOW());
INSERT INTO staffs(NAME,age,pos,add_time) VALUES('2000'.23.'dev'.NOW());
INSERT INTO staffs(NAME,age,pos,add_time) VALUES(null.23.'dev'.NOW());
SELECT * FROM staffs;

ALTER TABLE staffs ADD INDEX idx_staffs_nameAgePos(name, age, pos);
Copy the code
  1. Full value matches my favorite index idx_staffs_nameAgePos in the order of name, age, pos. EXPLAIN SELECT * FROM STAFFS WHERE NAME = ‘July’;















EXPLAIN SELECT * FROM staffs WHERE age = 25 AND pos = ‘dev’;
































General Recommendation

  • For single-key indexes, try to select indexes with better filtering for the current Query
  • When selecting a composite index, the filtering field in the current Query is higher in the index field order, the better. (Avoid index failure with good filters)
  • When selecting a composite index, try to select an index that contains as many fields as possible from the WHERE clause in the current query
  • Select the appropriate index by analyzing statistics and adjusting the way query is written whenever possible

Single table query optimization

sql

CREATE TABLE IF NOT EXISTS `article` (
`id` INT(10) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
`author_id` INT(10) UNSIGNED NOT NULL.`category_id` INT(10) UNSIGNED NOT NULL.`views` INT(10) UNSIGNED NOT NULL.`comments` INT(10) UNSIGNED NOT NULL.`title` VARBINARY(255) NOT NULL.`content` TEXT NOT NULL
);

INSERT INTO `article`(`author_id`.`category_id`.`views`.`comments`.`title`.`content`) VALUES
(1.1.1.1.'1'.'1'),
(2.2.2.2.'2'.'2'),
(1.1.3.3.'3'.'3');

SELECT * FROM article;
Copy the code
  • Query category_id (category_id = 1 and Comments > 1) and category_id (category_id = 1 and category_id = 1) EXPLAIN SELECT id,author_id FROM article WHERE category_id = 1 AND comments > 1 ORDER BY views DESC LIMIT 1;









article
category_id
comments
views





1.2 第2 行 EXPLAIN SELECT ID, CATEGORY_ID FROM article WHERE category_id = 1 AND comments >1 ORDER BY views DESC LIMIT 1 1;


DROP INDEX idx_article_ccv ON article;

ALTER TABLE article ADD INDEX IDx_article_CV (category_id, category_id, category_id); create index idx_article_cv on article(category_id,views);

1 第3 行 EXPLAIN SELECT ID, CATEGORY_ID FROM article WHERE category_id = 1 AND comments > 1 ORDER BY views DESC LIMIT 1 1;

Associated query optimization

sql

CREATE TABLE IF NOT EXISTS `class` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`card` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`id`));CREATE TABLE IF NOT EXISTS `book` (
`bookid` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`card` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`bookid`));INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND(*)20)));


INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND(*)20)));

Copy the code
  • Explain SELECT * FROM class LEFT JOIN book ON class.card = book.card;






book
card






Explain DROP INDEX Y ON book; ALTER TABLE class ADD INDEX X (card); EXPLAIN SELECT * FROM class LEFT JOIN book ON class.card = book.card;

advice

  • Ensure that the join field of the driven table is indexed
  • When left JOIN, select the small table as the driver table and the large table as the driven table.
  • When an inner join is performed, mysql itself selects the tables of small result sets as driver tables for you.
  • Try not to put subqueries in the driven table, it may not use the index.

Subquery optimization

Select sql_no_cache sum(sal) from emp where deptno in (select deptno from dept); select sql_no_cache sum(sal) from emp where exists (select 1 from dept where emp.deptno=dept.deptno); Exists returns a record. Exists is used as a query, so it does not matter what is returned after a select. select sql_no_cache sum(sal) from emp inner join dept on emp.deptno=dept.deptno;

Select SQL_no_cache sum(e.sal) from (select * from EMp where ID <10000) e where exists (select 1 from EMp where e.deptno=emp.deptno); select sql_no_cache sum(e.sal) from (select * from emp where id<10000) e inner join (select distinct deptno from emp) m on m.deptno=e.deptno; select sql_no_cache sum(sal) from emp where deptno in (select deptno from dept);

Select SQL_no_cache sum(e.sum) from (select * from emp where ID <10000) e where exists (select 1 from emp where e.deptno=emp.deptno); select sql_no_cache sum(e.sal) from (select * from emp where id<10000) e inner join (select distinct deptno from emp) m on m.deptno=e.deptno; select sql_no_cache sum(sal) from emp where deptno in (select deptno from dept);

Select sql_no_cache sum(sal) from emp where deptno in (select deptno from dept); select sql_no_cache sum(sal) from emp where exists (select 1 from dept where emp.deptno=dept.deptno); select sql_no_cache sum(sal) from emp inner join dept on emp.deptno=dept.deptno; Inner join is the best way to drive a large table from a small table without an index. In exists No index distinct consumes high performance. Therefore, exists has the best performance. In has the second worst performance. In and EXISTS performance is slightly better than 5% but inner Join is better than using join buffer so it is much faster if left JOIN is the slowest

Order by keyword optimization

MySQL supports two kinds of sorting methods, FileSort and Index, which are efficient. It means that MySQL scans the index itself for sorting. The FileSort method is inefficient. ORDER BY (Index)

  • The ORDER BY statement uses the left-most front row of the index
  • Use the combination of the Where clause and the Order BY clause condition columns to satisfy the left-most front of the index
  • The order by index is invalidated if a range query is present in the WHERE clause.

If not on the index column, filesort has two algorithms: double sort and single sort

  • MySQL 4.1 previously used double-sort, which literally means scanning the disk twice to get the data, reading the row pointer and Order Derby columns, sorting them, then scanning the sorted list and re-reading the corresponding data output from the list according to the values in the list. Multipath sorting requires disk sorting. So get the data. I’ve arranged to get the data. Two I/O operations were performed. A slow single-path sort saves the sorted data in memory, eliminating an I/O operation. Therefore, it is fast, but requires sufficient memory space.
  • Fetch the sort fields from disk, sort them in buffer, and fetch the other fields from disk.

Single way sorting

  • Fetching a batch of data requires two scans of the disk. I\O is notoriously time-consuming, so after mysql4.1, a second improved algorithm was introduced, namely single-way sort. It is faster to read all the columns needed for the query from disk, sort them in buffer by order Derby columns, and then scan the sorted list for output, avoiding a second read of the data. And it turns random IO into sequential IO, but it uses more space because it saves each line in memory.

In sort_buffer, method B takes up A lot more space than method A. Because method B takes out all fields, the total size of data taken out may exceed the capacity of sort_buffer, so only sort_buffer size can be taken each time. Sort_buffer = sort_buffer; sort_buffer = sort_buffer; So multiple I/ OS. An attempt to save one I/O operation resulted in a large number of I/O operations, which was more than worth the loss.

Optimization strategy

  • Increases the size of memory used for single-path sorting by setting sort_buffer_SIZE
  • Increases the set single sort field size of the max_LENGTH_FOR_sorT_DATA parameter. (Single sort request)
  • Select * from (select * from); select * from (select * from)

GROUP BY keyword optimization

  • Group by is essentially sorted and then grouped, following the best left prefix for the index
  • When index columns are unavailable, increase the max_LENGTH_FOR_sort_data parameter setting + increase the sort_BUFFer_SIZE parameter setting
  • Where having is higher than where having can be qualified.

Heavy to optimize

Try not to use distinct keywords for duplicates such as: T_mall_sku table id shp_id KCDZ — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — 3 1 1 4 of the changping district of Beijing changping district of Beijing tianjin 5 5 of Beijing changping district 6 3 chongqing 8 8 examples: Select KCDZ form t_mall_sku where id in(3,4,5,6,8) Select distinct KCDZ form t_mall_sku where id in(3,4,5,6,8) Select KCDZ form t_mall_sku where id in(3,4,5,6,8) group by KCDZ

case

.