Avoid using select* to find fields. Instead, write specific fields after select.

The reason for this is obvious: to reduce the amount of data transferred.

But I want to talk about is another reason: the use of select *, is virtually impossible to use to cover index (behind what is cover index, speak), it is will a should have can be used to cover the index of the query into the query cannot use cover index, leads to a random I/O or back to the table query (back to the table query in article introduces clustering index of time speak).

Type of index

1. The b-tree index

Most MySQL engines support this type of index, which uses a b-tree data structure to store data (InnoDB uses B+TREE). B+TREE is a variant of B-tree. The difference is that B+TREE adds a chain pointer to the next leaf node for all leaf nodes, and all keywords appear in the leaf node.

A B-tree usually means that all values stored in it are in order, and queries are faster because they are based on index structures rather than full table scans.

Applicable enquiries:

Full value matching: matches all columns in the index.

Match left-most prefix: this is the left-most prefix rule for multi-column indexes. For example, if A multi-column index is (A, B, C), you can use the index if the query contains either A or A, B or A, B, or C. If only B exists, you cannot use the index.

Match column prefixes: For example, like ‘ABC %’ can use indexes, but like ‘% ABC %’ cannot.

Matching range value: actually range query. Note, however, that when one column of a multi-column index is used for a range query, the indexes following that column are not used. Select * from (A, B, C) where A=1, B>1, C=1; select * from (A, B, C) where A=1, B>1, C=1;

Index-only queries: overwrite index queries.

Of course, if the query meets the above criteria, then these columns can also be used to sort.

2, hash index

Hash indexes are implemented based on hash tables and only support exact index queries. In MySQL, currently only the Memory engine supports hash indexes, but we can customize hash indexes. The idea is as follows:

Create a column in the table to store the hash value, and then use the B-tree index to look it up again.

Here is an example:

For example, a large number of urls need to be stored in a table. If a B-tree is used to store urls, the contents stored in the table will be large, resulting in a large index. If we add a column (url_hash) to store the URL hash, and then create a B-tree index on that column, performance will be much higher. When the amount of data is very large, there will be hash conflicts in the hash, so the URL and URL_hash columns are used for filtering in the query. Such as:

select id from url where url=’www.baidu.com’ and url_hash=’123213512987′;

As for the hash algorithm, consider implementing a simple 64-bit hash function yourself. Note that this custom function must return an integer, not a string.

3. Full-text index

This is a special type of index. We will discuss full-text indexes separately later, but we won’t go into details here.

Second, the advantages of indexes

Indexes greatly reduce the amount of data that the server needs to scan;

Indexes help the server avoid sorting and temporary tables;

Indexes can turn random I/ OS into sequential I/ OS.

Is an index necessarily the best solution? We all know that maintaining an index requires some extra work, so simply put, an index works when the benefits outweigh the risks.

High performance index strategy

1. Independent columns

It is common to see improper queries that prevent MySQL from using existing indexes. Such as:

select id from users where id+1=5;Copy the code

This type of query does not use indexes and can be written as:

select id from users where id=4;Copy the code

So we should get in the habit of simplifying WHERE conditions by always placing the index column on the side of the comparison symbol.

Prefix index and index selectivity

Sometimes we need to index long string columns, so we need to use prefix indexes. In MySQL, we must use prefix indexes for TEXT, BLOB, and long character columns, because MySQL does not allow indexes of all the length of these columns. Accordingly, prefix indexes must reduce index selectivity. Index selectivity refers to the ratio of non-duplicate index columns to the total number of records in the data table.

So how to find a balance between prefix indexes and index selectivity? To paraphrase an example from High Performance MySQL:

A field in a table stores the names of various cities. First, we found a list of the most common cities:



Then try starting with 3 prefixes:



You can see that this is a pretty big gap. After trying, we found that a prefix index length of 7 is appropriate:



There’s another algorithm that we can use to do this: calculate selectivity.



This is the selectivity of the full column. Then let’s see what the selectivity is when the prefix index is 3,4,5,6,7:



There may be a misconception here that an index prefix length of 4 or 5 is sufficient. So let’s use the previous method to verify:



You can see that the number of prefixes that are most common is much larger than the number of cities that are most common. Even though they’re less selective. After finding the prefix index length, we can create the prefix index:

mysql> ALTER TABLE city ADD KEY (city(7));Copy the code

Prefix indexes are an effective way to make indexes smaller and faster. MySQL cannot use prefix indexes to do ORDER BY and GROUP BY, nor can it use prefix indexes to do override scans.

3, multiple column index

Here you can open up your own database table and see if the structure is a separate index for each column, which is what our company does now.



This is a common mistake. Prior to MySQL5.0, the following queries would not use the index and would require a full table scan:

select id from `shops_orders` where user_id=11 and shop_id=1;Copy the code

After MySQL5.0, the concept of “index merging” was introduced. This algorithm includes: union of OR conditions (UNION), intersection of AND conditions (INTERSECT), union AND intersection of the first two cases of combination.

Let’s first look at the union of OR conditions:



You’ll see that in the EXTRA column, there is a Using Union (). AND the (intersect) condition will have a Using intersect().

This index merge strategy is a result of optimization, but it also indirectly indicates that the index on your table is poorly built:

When a server intersects multiple indexes (usually multiple AND conditions), it usually means that a multi-column index with related columns is required, rather than multiple independent individual indexes.

When a server joins multiple indexes (usually multiple OR conditions), it usually consumes a large amount of CPU and memory resources for algorithm caching, sorting, and merging operations. In particular, some of these indexes are not highly selective and need to merge large amounts of data returned from the scan.

More importantly, this index merging strategy is not calculated into the “query cost” (cost) by the optimizer, which only cares about reading random pages.

4, select the appropriate index column order

Since you are building multi-column indexes, it is important to choose the right order. A good rule of thumb for selecting the right index order is to place the most selective column at the top of the index.

Initially, you can use this rule to create multiple indexes, because it will filter out the least amount of data while using the first index column. As you gain experience, you will have your own experience sorting index columns.

5. Cluster index

Clustered index is not a separate index type, but a way of data storage. Clustered indexes always store rows in leaf pages, so there can only be one clustered index in a table.

Clustered indexes are not supported by all storage engines and we will focus on InnoDB here. In InnoDB, a clustered index is actually a primary key index. If no primary key is defined in the table, InnoDB selects a unique non-empty index as the primary key. If there is no such index, InnoDB implicitly defines a primary key as the cluster index. The advantages of clustered indexes are as follows:

You can keep related data together;

Faster data access;

Queries that use an overridden index scan can directly use the primary key value in the page node.

If tables are designed and queried to take advantage of these features, performance will be greatly improved. Of course, clustered indexes also have their drawbacks:

Clustered indexes maximize performance for I/ O-intensive applications, but they have no advantage if all data is stored in memory.

The insertion speed depends heavily on the insertion sequence. This is why InnoDB typically sets an incremented INT column as the primary key.

Updating the clustered index is expensive because InnoDB is forced to move every updated row to a new location.

If new data is inserted out of order, “page splitting” may occur.

Secondary indexes may be larger than expected. Because the leaf nodes of the secondary index contain primary key columns that reference rows.

Secondary index access may require table-back queries.

One might wonder: what is a table-back query? Why does a secondary index need to be queried?

The answer is the essence of the “row pointer” stored in the secondary index. The secondary index does not hold a pointer to the physical location of the row in the leaf node, but rather the primary key value of the row.

If the query is not an override query, the primary key value of the row stored in the secondary index leaf node is used to perform a secondary query in the table. That’s when we get the data we really want, which results in using two B-tree queries instead of one. This is another reason to avoid using SELECT *, as mentioned at the beginning of this article.

Take a look at the data distribution of the clustered index:



6. Overwrite indexes

This gives you an idea of what an overwrite index is.

If an index contains or overwrites all the field values to be queried, it is called a “overwriting index.” So an index may be an override index for some queries and not for others. It’s actually a secondary index, but it satisfies a certain condition.

Overwriting indexes is a very useful tool that can greatly improve performance. Imagine the benefits of a query that only needed to scan the index without having to return the table query twice:

Index rows are usually much smaller than data rows, so if you only need indexes, MySQL will greatly reduce the number of data visits.

Because indexes are stored sequentially, there is less I/O for an I/ O-intensive range query than for a random read of each row from disk.

Overwrite indexes are especially useful for InnoDB because of its clustered indexes.

When issuing an override query, you can see the “Using Index” message in the Extra column in Explain. Let’s go back to the select issue: No index can cover all columns, so select can result in queries that would otherwise use an override index being unavailable.

7. Use index scans to sort

The limit of ORDER BY is the same as that of lookup queries: the left-most prefix of indexes must be met; otherwise, MySQL cannot use index sort. But there is a special case where the leading column is constant.

For example, if there is an index (A, B, C), then the SQL statement will be sorted by index as follows:

select id from table where A=2 order by B,C;Copy the code

This one doesn’t:

select id from table where A>2 order by B,C;Copy the code

Index case study

The best way to understand an index is with an example, so I’ll use an order table from our company. The current index is a separate index on each required column.

The company_ID column is certainly not very selective in the first place, but it will be used in almost every query. The Status and Type columns are also generally less selective, but are also used in many queries. The company_id,status, and type are used as prefixes to create indexes of different combinations.

At this point, many of you may be wondering: We mentioned earlier that as a rule of thumb, the most selective column should be placed in front of a multi-column index, so why create the index on the least selective column and use it as the index prefix column?

There are two reasons for this:

These three columns are used in almost every query, and you can even design it so that orders must be queried by state.

More importantly, even if we don’t use these three columns, we can “trick” around them.

The “trick” is that if we have a query that looks for orders IN all states, we can add AND status IN(1,2,3,4) to the WHERE condition to make MySQL select the index. To do so, one thing is that IN the IN () condition, the combinations that the optimizer needs to make increase exponentially.

For the three columns we mentioned earlier, if company_id has 5 values, status has 4 values, and type has 3 values, the optimizer translates to 5x4x3=60 combinations. 60 combinations is not a big deal for MySQL, but you need to be careful when the number of combinations reaches thousands.

In our side, the most commonly used is to query orders according to the store, customer phone number, customer name and time period. There are dozens of shops with the least value inside. So it obviously doesn’t fit the “trick” mentioned above.

So we can create several indexes:

(company_id,status,type,shop_id,click_at)

(company_id,status,type,custom_name,click_at)

(company_id,status,type,custom_mobile,click_at)

When creating multi-column indexes, it is important to avoid multiple range conditions, such as the above three indexes. If you add any columns after click_AT, the indexes will not be hit. Unless there is a special reason, such as you need to add a field to make a query override, it is best not to waste space by adding other columns after a range column.

Iv. ICP Deduction under Index Conditions

As an aside, MySQL introduced an optimization called Index Condition Pushdown in version 5.6. MySQL official website manual is described as follows:

The goal of ICP is to reduce the number of full-record reads and thereby reduce IO operations. For InnoDB clustered indexes, the complete record is already read into the InnoDB buffer. Using ICP in this case does not reduce IO.

The purpose of ICP is to reduce IO operations by reducing the number of full record reads. For InnoDB clustered indexes, the full record is already read into the InnoDB buffer, in which case ICP does not reduce IO.

Let’s turn off ICP:

set optimizer_switch='index_merge_intersection=off';Copy the code

In the absence of ICP, look at the following SQL:

explain select * from `shops_orders` where `code` like '2018010419%';Copy the code

The result is:



Then we open ICP and run the SQL above:



We can see that the following SQL uses ICP. So what is ICP?

In the absence of ICP, the filtering of columns that are not used by indexes in the WHERE condition is in the MySQL service layer; With ICP, this filtering is done directly in the storage engine layer before the secondary index returns to the table, which avoids a lot of data transfer and thus reduces IO.






DBAplus community