Mysql high-performance index

When the amount of DB reaches a certain order of magnitude, the efficiency of each full table scan is very low, so a common solution is to create some necessary indexes as an optimization method, then the problem is:

  • So what is an index?
  • How is the index implemented?
  • What is the difference between a clustered index and a non-clustered index?
  • How do you create and use indexes?

I. Index introduction

MySQL defines an index as: An index is a data structure that helps MySQL obtain data efficiently. In short, indexes are data structures

1. The structure of some trees

A. B + tree

In a word, it is a balanced binary tree designed for disk or other storage devices. In B+tree, all records are stored in leaves according to the size of the key, and the leaves are directly connected with Pointers

B. binary tree

The rule of a binary tree is that the parent is larger than the left child and smaller than the right child

C. Balanced binary tree

The first is a binary tree, but the height difference between the left and right children of any node is not greater than 1

D. B tree

The first is a balanced binary tree, but requires that every leaf node is the same distance from the root node

So what’s the difference between a B tree and a B plus tree?

  • A leaf node of a B+ tree can contain a pointer to another leaf node
  • There are non-leaf nodes in the copy of B+ tree key value; The key value + record is stored in the leaf node

2. InnoDB engine B+ tree

Mysql InnnoDB engine uses B+ tree, only leaf nodes store the corresponding data column, has the following benefits

  • Leaf nodes usually contain more records and have higher fan-out (which can be interpreted as the number of lower nodes corresponding to each node), so the tree height is low (3~4). The tree height also determines the number of disk I/O, which affects the performance of the database. In general, the number of I/OS is the same as the height of the tree
  • For combined indexes, B+tree indexes are sorted by index column names (from left to right). Therefore, random I/O can be converted into sequential I/O to improve I/O efficiency. And can support ordering requirements such as order by \group; Suitable range query

3. A hash index

Compared with B tree, hash index does not need to traverse from the root node to the leaf node, and can locate the location at one time. The query efficiency is higher, but its disadvantages are also obvious

  • Only “=”,”IN” and “<=>” queries can be satisfied. Scope queries cannot be used
    • Because the calculation is based on the hash value, the query can only be accurate. The hash value is not regular, and the order can not be guaranteed to be the same as the original. Therefore, the range query cannot be used
  • I can’t sort
    • The reason as above
  • Partial indexes are not supported
    • The hash value is calculated based on the complete index columns. If one or more of them are missing, the hash value cannot be calculated
  • Hash collision

4. Clustered and non-clustered indexes

A. Clustered indexes

InnoDB’s data file itself is the index file. The data on the leaf of the B+Tree is the data itself, and the key is the primary key. Non-leaf nodes are stored in <key,address>, and address is the address of the next layer

The structure of the cluster index:

B. Non-clustered indexes

For a non-clustered index, the data on the leaf node is the primary key (that is, the primary key of the clustered index). Why store the primary key instead of the address of the record? The reason is quite simple, because the address of the record is not guaranteed, but the primary key is

Non-clustered index structure diagram:

From the structure of non-clustered indexes, we can see the positioning process in this scenario:

  • First, locate the corresponding leaf node through the non-clustered index and find the corresponding primary key
  • Based on the primary key found above, locate the corresponding leaf node in the clustered index.

5. Advantages of indexing

  • Avoid full table scans (when indexes cannot be accessed, they must be matched one by one; If you go by index, you can locate according to B tree)
  • Using indexes helps servers avoid sorting or temporary tables (Pointers on leaf nodes, which effectively support range queries; In addition, the leaves themselves are sorted by key.)
  • Indexes change random IO to sequential IO

6. Application scope

Indexes are not for every situation. Applicable to medium and large tables. For small tables full table scan is more efficient. For very large tables, consider “partitioning” techniques

II. Principles for the use of indexes

When creating a table, you need to specify a primary key so that you can determine the clustered index.

1. Several syntax for indexing

Create indexes

Create index
create index `idx_img` on newuser(`img`);

- to see
show create table newuser\G;
Copy the code

The output

show create table newuser\G
*************************** 1. row ***************************
       Table: newuser
Create Table: CREATE TABLE `newuser` (
  `userId` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'user id',
  `username` varchar(30) DEFAULT ' ' COMMENT 'User login name',
  `nickname` varchar(30) NOT NULL DEFAULT ' ' COMMENT 'User nickname',
  `password` varchar(50) DEFAULT ' ' COMMENT 'User login password & ciphertext root',
  `address` text COMMENT 'User address',
  `email` varchar(50) NOT NULL DEFAULT ' ' COMMENT 'User Mailbox',
  `phone` bigint(20) NOT NULL DEFAULT '0' COMMENT 'User phone number',
  `img` varchar(100) DEFAULT ' ' COMMENT 'User Profile picture',
  `extra` text,
  `isDeleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `created` int(11) NOT NULL,
  `updated` int(11) NOT NULL,
  PRIMARY KEY (`userId`),
  KEY `idx_username` (`username`),
  KEY `idx_nickname` (`nickname`),
  KEY `idx_email` (`email`),
  KEY `idx_phone` (`phone`),
  KEY `idx_img` (`img`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8
Copy the code

Another common way to add indexes

alter table newuser add index `idx_extra_img`(`isDeleted`.`img`);

-- View index
show index from newuser;
Copy the code

The output

+---------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+----- -+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +---------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+----- -+------------+---------+---------------+ | newuser | 0 | PRIMARY | 1 | userId | A | 3 | NULL | NULL | | BTREE | | | | newuser | 1 | idx_username | 1 | username | A | 3 | NULL | NULL | YES | BTREE | | | | newuser | 1 | idx_nickname | 1 | nickname | A | 3 | NULL | NULL | | BTREE | | | | newuser | 1 | idx_email | 1 | email | A | 3 | NULL | NULL | | BTREE | |  | | newuser | 1 | idx_phone | 1 | phone | A | 3 | NULL | NULL | | BTREE | | | | newuser | 1 | idx_img | 1 | img | A | 3  | NULL | NULL | YES | BTREE | | | | newuser | 1 | idx_extra_img | 1 | isDeleted | A | 3 | NULL | NULL | | BTREE | | | |  newuser | 1 | idx_extra_img | 2 | img | A | 3 | NULL | NULL | YES | BTREE | | | +---------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+----- -+------------+---------+---------------+Copy the code

Remove the index

drop index `idx_extra_img` on newuser;
drop index `idx_img` on newuser;

-- View index
show index from newuser;
Copy the code

The output

show index from newuser; +---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------ +------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------ +------------+---------+---------------+ | newuser | 0 | PRIMARY | 1 | userId | A | 3 | NULL | NULL | | BTREE | | | | newuser | 1 | idx_username | 1 | username | A | 3 | NULL | NULL | YES | BTREE | | | | newuser | 1 | idx_nickname | 1 | nickname | A | 3 | NULL | NULL | | BTREE | | | | newuser | 1 | idx_email | 1 | email | A | 3 | NULL | NULL | | BTREE | |  | | newuser | 1 | idx_phone | 1 | phone | A | 3 | NULL | NULL | | BTREE | | | +---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------ +------------+---------+---------------+Copy the code

A way to force an index to go

Select * from table force index where XXX

explain select * from newuser force index(PRIMARY) where userId not in (3.2.5);
-- +----+-------------+---------+-------+---------------+---------+---------+------+------+-------------+
-- | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
-- +----+-------------+---------+-------+---------------+---------+---------+------+------+-------------+
-- | 1 | SIMPLE | newuser | range | PRIMARY | PRIMARY | 8 | NULL | 4 | Using where |
-- +----+-------------+---------+-------+---------------+---------+---------+------+------+-------------+


explain select * from newuser where userId not in (3.2.5);
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
-- | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
-- | 1 | SIMPLE | newuser | ALL | PRIMARY | NULL | NULL | NULL | 3 | Using where |
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
Copy the code

2. Index usage rules

When there are multiple indexes in a table, how do you determine whether your SQL has reached the index and which index?

You can use the explain keyword to help you judge. Of course, when writing SQL, it is necessary to understand the rules of index matching, to avoid setting some redundant indexes, or write some SQL that does not reach the index

The table structure for the test is as follows

*************************** 1. row ***************************
       Table: newuser
Create Table: CREATE TABLE `newuser` (
  `userId` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'user id',
  `username` varchar(30) DEFAULT ' ' COMMENT 'User login name',
  `nickname` varchar(30) NOT NULL DEFAULT ' ' COMMENT 'User nickname',
  `password` varchar(50) DEFAULT ' ' COMMENT 'User login password & ciphertext root',
  `address` text COMMENT 'User address',
  `email` varchar(50) NOT NULL DEFAULT ' ' COMMENT 'User Mailbox',
  `phone` bigint(20) NOT NULL DEFAULT '0' COMMENT 'User phone number',
  `img` varchar(100) DEFAULT ' ' COMMENT 'User Profile picture',
  `extra` text,
  `isDeleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `created` int(11) NOT NULL,
  `updated` int(11) NOT NULL,
  PRIMARY KEY (`userId`),
  KEY `idx_username` (`username`),
  KEY `idx_nickname_email_phone` (`nickname`,`email`,`phone`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8
Copy the code

A. Left-most prefix matching rule

Nickname (nickname, email, phone) = nickname (nickname, email, phone) = nickname (nickname, email, phone) = nickname (nickname, email, phone) = nickname (nickname, email, phone

- walk index
explain select * from newuser where nickname='Little grey' and email='[email protected]';

-- 1. Match nickname, and you can go through the index
explain select * from newuser where nickname='Little grey';

- output:
-- +----+-------------+---------+------+--------------------+--------------------+---------+-------+------+---------------- -------+
-- | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
-- +----+-------------+---------+------+--------------------+--------------------+---------+-------+------+---------------- -------+
-- | 1 | SIMPLE | newuser | ref | idx_nickname_email | idx_nickname_email | 92 | const | 1 | Using index condition |
-- +----+-------------+---------+------+--------------------+--------------------+---------+-------+------+---------------- -------+


-- 2. Although the email is matched, it does not meet the left-most match, so the index is not used
explain select * from newuser where email='[email protected]';

- output
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
-- | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
-- | 1 | SIMPLE | newuser | ALL | NULL | NULL | NULL | NULL | 3 | Using where |
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
Copy the code

B. A column cannot be skipped to use a subsequent index column

(nickname, email, phone) (idx_nickname_email_phone) (nickname, email, phone) (idx_nickname_email_phone

C. The indexes of the range queried columns cannot be used

For example, “>”, “<“, “between”, and “like” are range queries. “>”, “<“, “between”, and “like” are range queries

select * from newuser where nickname like 'gray %' and email='[email protected]' and phone=15971112301 limit 10;
Copy the code

D. Column as a function argument or part of an expression

-- can't get to index
explain select * from newuser where userId+1=2 limit 1;


- output
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
-- | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
-- | 1 | SIMPLE | newuser | ALL | NULL | NULL | NULL | NULL | 3 | Using where |
-- +----+-------------+---------+------+---------------+------+---------+------+------+-------------+
Copy the code

3. Index shortcomings

  • While indexes greatly speed up queries, they also slow down table updates such as INSERT, UPDATE, and DELETE. When you update the table, MySQL not only saves the data, but also saves the index file.
  • Index files that occupy disk space. Normally this is not a serious problem, but if you create multiple composite indexes on a large table, the index file size will swell very quickly.

4. Precautions

  • Indexes do not contain columns with NULL values
  • Using short indexes
  • Index column sorting
    • MySQL queries use only one index, so columns in order BY do not use indexes if they are already used in the WHERE clause. Therefore, do not use the sort operation if the default database sort can meet the requirements; Try not to include more than one column sort, if necessary, it is best to create composite indexes for these columns
  • Like statement operation
    • The use of the like operation is generally discouraged, and if it has to be used, how it is used is a problem. Like “%aaa%” does not use an index whereas like “aaa%” can use an index
  • Do not operate on columns
    • select * from users where YEAR(adddate)<2007;
  • Try NOT to use NOT IN and <> operations

5. SQL usage policy

A. Use one SQL instead of multiple SQL

It is often recommended to use a single SQL instead of multiple SQL queries

Of course, if the execution efficiency of SQL is low or a table lock operation such as delete occurs, you can use multiple SQL statements to avoid blocking other SQL statements

B. Decompress associated query

Make the associated join as small and simple as possible in the application

  • The decomposed SQL is simple and easy to use with mysql cache
  • Execute the decomposed SQL to reduce lock contention
  • Better scalability and maintainability (SIMPLE SQL)
  • Associated SQL uses nestLoop, a nested loop algorithm, while applications can use structures such as HashMap to process data more efficiently

c. count

  • Count (*) counts the number of rows
  • Count (column name) counts the number of columns that are not null

d. limit

  • limit offset, size; In paging query, offset + size entries are displayed and the final size entries are obtained

Such as limit 1000, 20 will query 1020 pieces of data that meet the condition, and then return the last 20, so try to avoid paging queries

e. union

You need to put the WHERE, ORDER BY, and LIMIT restrictions into each subquery to improve efficiency. In addition, if not necessary, try to use Union ALL, because Union will add DISTINCT to the temporary tables of each subquery and perform uniqueness check for each temporary table, which is inefficient.

6. Mysql use query

A. View the index

-- The unit is GB
SELECT CONCAT(ROUND(SUM(index_length)/(1024*1024*1024), 6), ' GB') AS 'Total Index Size'
FROM information_schema.TABLES WHERE table_schema LIKE 'databaseName';
Copy the code

B. View the tablespace

SELECT CONCAT(ROUND(SUM(data_length)/(1024*1024*1024), 6), ' GB') AS 'Total Data Size'   
FROM information_schema.TABLES WHERE table_schema LIKE 'databaseName'; 
Copy the code

C. View the information about all tables in the database

SELECT CONCAT(table_schema,'. ',table_name) AS 'Table Name',   
    table_rows AS 'Number of Rows'.CONCAT(ROUND(data_length/(1024*1024*1024),6),' G') AS 'Data Size'.CONCAT(ROUND(index_length/(1024*1024*1024),6),' G') AS 'Index Size' ,   
    CONCAT(ROUND((data_length+index_length)/(1024*1024*1024),6),' G') AS'Total'  
FROM information_schema.TABLES   
WHERE table_schema LIKE 'databaseName';
Copy the code

IV. The other

reference

  • In-depth understanding of Mysql – High performance indexes and high performance SQL

Personal blog:A gray Blog

Hexo + Github Pages is a personal blog built by hexo + Github Pages. You are welcome to visit it

The statement

Do not as good as the letter, has been on the content, purely a statement, because I am general ability, limited insight, such as found bugs or have better suggestions, at any time welcome criticism and correction

  • Micro-blog address: small gray Blog
  • QQ: A gray /3302797840

Scanning attention