preface

A friend went to ali for an interview, he said the interviewer gave several query SQL, asked: how many tree search operations need to be performed? My friend was a little ignorant at that time, and then calm thinking, only to find that is to test the index of a few basic knowledge points ~~ this article we are divided into nine index knowledge points, together to discuss it. If there are incorrect words, welcome to point out ha, learn together ~

  • Public number: a boy picking up snails
  • Github address, thanks to each star

Github.com/whx123/Java…

  • What is the index of the interviewer’s test sites?
  • The index type of the interviewer’s questions
  • Why did the interviewer choose B+ tree as the index structure
  • The interviewer points to an index search process
  • Index of coverage of interviewers’ test sites
  • Index failure scenario of interviewer’s test site
  • The left-most prefix of the interviewer’s question
  • The index of the interviewer’s test points is pushed down
  • Add index to large table of interviewer test site

What is the index of the interviewer’s test site?

  • Index is a kind of data structure which can improve the efficiency of database query. It can be compared to the contents of a dictionary, which can help you find the corresponding records quickly.
  • Indexes are usually stored in files on disk and take up physical space.
  • Just as water can carry a boat, it can also overturn it. Appropriate indexes can improve query efficiency, but too many indexes will affect the insert and update function of database tables.

What are the types of indexes

Data structure dimensions

  • B+ tree index: All data is stored on leaf nodes and the complexity is O(Logn). It is suitable for range query.
  • Hash index: suitable for equivalent query, high retrieval efficiency, one in place.
  • Full text indexes: Both MyISAM and InnoDB support the use of full text indexes, typically created on the text types char,text, vARCHar.
  • R-tree index: used to create SPATIAL indexes for GIS data types

Physical storage dimensions

  • Clustered index: A clustered index is an index that is created with the primary key and stores the data in the table in the leaf node.
  • Non-clustered index: A non-clustered index is an index created with a non-primary key. The primary key and index columns are stored on the leaf node.

Logic dimension

  • Primary key index: A special unique index that does not allow null values.
  • Plain index: The basic index type in MySQL that allows null and duplicate values.
  • Joint index: An index created for multiple fields that follows the left-most prefix rule.
  • Unique index: Values in index columns must be unique, but null values are allowed.
  • Spatial indexing: Spatial indexing is supported after MySQL5.7 and follows the rules of the OpenGIS Geometric data model for spatial indexing.

Why did the interviewer choose B+ tree as the index structure

The problem can be looked at in several dimensions, whether the query is fast, whether it is stable, how much data is stored, how many times the disk is searched, and so on. Why not hash structure? Why not a binary tree, why not a balanced binary tree, why not a B tree, but a B+ tree?

When we write business SQL queries, most of the time, we do range queries, such as SQL

select * from employee where age between 18 and 28;
Copy the code

Why not use a hash structure?

We know that a hash structure, like a K-V structure, is a one-to-one relationship between key and value. It works fine for equivalence queries, but it doesn’t work for range queries.

Why not use a binary tree?

First recall the binary tree related knowledge ~ the so-called binary tree, characteristics are as follows:

  • Each node has at most two subtrees, called left and right subtrees respectively.
  • The value of the left child node is less than the value of the current node, and the current node is less than the value of the right child node
  • The top node is called the heel node, and the node with no children is called the leaf node.

It’s easy to imagine a binary tree structure like this:

But there are special binary trees that might look something like this:

If the binary tree is specialized as a linked list, it is equivalent to a full table scan. So why index? Therefore, binary trees in general are not suitable as index structures.

Why not use a balanced binary tree?

Balanced binary tree features: it is also a binary search tree, the maximum difference in height between the two subtrees of any node is 1. So you don’t have to specialize a linked list.

But:

  • Balanced binary tree insertion or update requires left-right rotation to maintain balance, which is costly to maintain
  • If there are a lot of them, the trees will be very tall. Because the data is stored on disk, using it as an index structure, each time a node is read from disk, the IO operation is many times.

Why not use B trees?

If you have a large amount of data, the height of the balanced binary tree will be very high, which will increase the IO. So why not choose a shorter B tree with the same amount of data?

B trees can store more data and have a lower height than balanced binary trees. But why choose B+ trees in the end? Because the B+ tree is an upgraded version of the B tree:

  • Non-leaf nodes of a B+ tree do not store data, only store key values, while B tree nodes store not only key values, but also data. Innodb’s default page size is 16KB. If innoDB does not store data, it stores more key values, and the corresponding tree order (tree of nodes’ children) is larger, the tree is shorter and fatter, which again reduces the number of I/OS needed to find data on disk and makes data queries more efficient.
  • All data of B+ tree index is stored in leaf node, and the data is arranged in order, linked list. So B+ trees make range lookup, sort lookup, group lookup, and de-lookup incredibly easy.

A B+ tree index search process

Interviewer: Suppose you have the following table structure and these items of data

CREATE TABLE `employee` ( `id` int(11) NOT NULL, `name` varchar(255) DEFAULT NULL, `age` int(11) DEFAULT NULL, `date` datetime DEFAULT NULL, `sex` int(1) DEFAULT NULL, PRIMARY KEY (`id`), KEY `idx_age` (`age`) USING BTREE ) ENGINE=InnoDB DEFAULT CHARSET=utf8; Insert into EMPLOYEE values(100,' 21-01-20', 80,' 21-01-20','0'); Insert into EMPLOYEE values(200,' 21-01-21', 48,' 21-01-21','0'); Insert into VALUES (300,' 2020-01-21','1'); Insert into EMPLOYEE values(400,' 2020-01-21', 32,'2020-01-21','0'); Insert into VALUES (500,' 2020-01-21','1'); Insert into EMPLOYEE values(600,' 21-01-21', 49,'2021-01-21','0'); Insert into EMPLOYEE values(700,' xiaoyan ',28,'2021-01-21','1');Copy the code

Interviewer: How many tree searches are required if the following query SQL is executed? You can draw the corresponding index structure diagram ~

select * from Temployee where age=32;
Copy the code

The interviewer is testing the candidate’s familiarity with B+ tree index structure. You can answer it like that

  • First to drawidx_ageThe index structure diagram is as follows:

  • Then draw the primary key index of id. Let’s draw the cluster index structure as follows:

Therefore, the execution of this SQL query looks like this:

    1. searchidx_ageIndex tree, load disk block 1 into memory, since 32<37, search the left branch to disk address disk block 2.
    1. Load disk block 2 into memory, continue traversing in memory, find the record age=32, obtain id = 400.
    1. If id=400, go back to the primary key index tree.
    1. searchId primary keyIndex tree, load disk block 1 into memory, traverse in memory, find 400, but B+ tree index non-leaf nodes do not store data. The index continues to search the right branch of 400 to disk address disk block 3.
    1. Load disk block 3 into memory, walk through memory, find id=400, get row R4, ok, done.

Therefore, this SQL query performs several tree searches, is not a step. In particular, after the primary key ID is found in the idx_age secondary index tree, the search process of the primary key index is called back to the table.

What is a return table? The process of retrieving a primary key and then going back to the primary key index is called back to the table

The coverage index of the interviewer’s test site

Interviewer: How many tree searches are performed using select ID,age instead of select *?

Analysis: this question, mainly investigate the candidate’s coverage index knowledge points. Back in the idx_AGE index tree, you can see that the query options ID and age are on the leaf node. Therefore, the query results can be provided directly, and there is no need to return to the table

Overwrite index: in the data column of the query, you do not need to go back to the table to look up, directly from the index column can fetch the desired result. In other words, the index column data used in your SQL overwrites the column of the query result.

Therefore, compared with the previous problem, the tree search operation back to the table is omitted.

The index of the interviewer’s test site is invalid

Interviewer: If I now add a normal index to the name field and then use a like fuzzy search, how many queries will I run? SQL is as follows:

Select * from employee where name like '% jay %';Copy the code

SQL > select * from ‘explain’; SQL > select * from ‘explain’; In fact, like fuzzy search, will not go to the index, as follows:

SQL > select * from table_name where table_name = ‘index’ and table_name = ‘index’;

  • If the query condition contains OR, the index may become invalid
  • If the field type is a string, where must be quoted, otherwise the index will be invalid
  • The like wildcard may invalidate the index.
  • Union index, the query condition column is not the first column in the union index, index failure.
  • Index invalid when using mysql’s built-in functions on index columns.
  • Index column operations (e.g., +, -, *, /) invalidate the index.
  • Index fields using (! = or < >, not in) may cause index invalidation.
  • If is null is not null is used on an index field, the index may become invalid.
  • Left-link query or right-link query The encoding format of the field associated with the query is different, which may cause index failure.
  • Mysql estimates that a full table scan is faster than an index, so no index is used.

The left-most prefix principle of the interviewer’s joint index

Interviewer: If I now add a federated index to the name and age fields, how many tree searches will the following SQL perform? Let’s draw the index tree, okay?

Select * from employee where name like 'small %' order by age desc;Copy the code

The left – most prefix of a joint index is like. The schematic diagram of the combined index tree is as follows:

The joint index entries are sorted by first name and then by age if the names are the same. Idx_name_age = idx_name_age = idx_name_age = idx_name_age = idx_name_age

Select * from idx_name_age; select * from idx_name_age; select * from idx_name_age; select * from idx_name_age; The left – most prefix is the left – most M characters of the string index. In fact,

  • The leftmost prefix can be the leftmost N field of a federated index. For example, the combined index (A, B, C) can be equivalent to building three indexes (a),(a,b),(a, B, C), which greatly improves the index reuse ability.
  • The leftmost prefix can also be the leftmost M characters of the string index.

8. Push down the index of the interviewer’s test points

Interviewer: We are still living in the composite index idx_name_age. How many tree searches does the following SQL perform?

Select * from employee where name like '%' and age=28 and sex='0';Copy the code

Select * from idx_name_age; select * from idx_name_age; select * from idx_name_age; select * from idx_name_age; As shown in figure:

(name,age) synth index Why not look at the age and return to the table after selecting the word “small”? Isn’t it more efficient? For this reason, MySQL 5.6 introduced the index pushdown optimization, which can be used to determine the index fields in the index traversal process, and directly filter out the records that do not meet the criteria to reduce the number of times back to the table.

Mysql > select * from MySQL5.6; select * from MySQL5.6; select * from MySQL5.6;

Nine, the interviewer test site of the large table add index

Interviewer: If a table has tens of millions of data levels, how do you add indexes to the table?

We need to know that when adding an index to a table, the table will be locked. If not careful operation, production accidents may occur. You can refer to the following methods:

  • 1. Create table B with the same data structure as table A.
  • 2. Add new indexes to new table B.
  • 3. Add table A to table B
  • 4. Rename table B to the name of the original table A and rename the original table A to another table.

Summary and Practice

This article mainly explains the index of 9 key knowledge points, hope to help you. Next, give you a, about my recent business development encountered add index SQL, look at everyone is how to answer, interested can contact me ha ~ topic as follows:


select * from A where type ='1' and status ='s' order by create_time desc;
Copy the code

Given that there are nine types of type and status is fairly discriminative (there are three types), how do you index it?

  • Type is indexed single
  • Or (type, status, create_time) combined index
  • Or (type, create_time) joint index?

See and thank you

  • What index types does MySQL have?
  • Large table plus index scheme
  • MySQL > MySQL