High-performance index optimization Strategy (1) : Isolate data columns and prefix indexes

Proper index creation and use is critical to query performance. There are many ways to select and use indexes effectively because of the variety of optimizations and behaviors for specific scenarios. Therefore, deciding how to use indexes is a skill that takes experience and time to develop. Here are some ways to use indexes effectively.

Quarantined data column

In general, we find that query statements prevent MySQL from using indexes. MySQL does not use indexes for columns unless they are independent in a query statement. “Isolated” means that the index column should not be part of an expression or in the body of a query function. For example, the following example will not match the index actor_id.

SELECT `actor_id` FROM `actor` WHERE `actor_id` + 1 = 2;
Copy the code

For humans, it’s easy to know that the query condition is actually actor_id = 4, but MySQL doesn’t do that, so get in the habit of simplifying the WHERE decision condition, which means that the index column is on one side of the comparison operator independently. Here’s another example of a common mistake:

SELECT.WHERE TO_DAYS(CURRENT_DATE) - TO_DAYS(date_col) < = 10;
Copy the code

Prefix index and index selectivity

Sometimes you need to build indexes on columns with very long characters, but this can lead to large index space and slow queries. One strategy is to use hash index emulation, but sometimes this may not be good enough.

It is usually possible to replace full field indexes with partial characters before index columns to improve performance and save space. But that makes the selectivity worse. Index selectivity refers to the proportion of data selected by individual index values in the whole data set. Highly selective indexes allow MySQL to filter out more irrelevant data. For example, the selectivity of a unique index is 1. Column prefixes usually provide good enough performance in terms of selectivity. If you use BLOB or TEXT or very long VARCHAR columns, you must define prefixed indexes because MySQL does not allow full-length indexes.

You need to strike a balance between using longer prefixes for better selectivity and ones that are short enough to save storage. To determine an appropriate prefix length, find the most frequent value and compare it with the most frequent prefix. For example, in a city table, we can use the following statement:

SELECT COUNT(*) as cnt, `name` FROM `common_city` GROUP BY `name` ORDER BY cnt DESC LIMIT 10
Copy the code

You can see that these city names appear more frequently. We can now find the most frequent city name prefixes using 1 word prefixes.

SELECT COUNT(*) as cnt, LEFT(`name`, 1) as pref FROM `common_city` GROUP BY pref ORDER BY cnt DESC LIMIT 10
Copy the code

As you can see, there are more data sets for 1 word, which leads to fewer independent selections, so you need to adjust the prefix length. For example, to three words.

SELECT COUNT(*) as cnt, LEFT(`name`, 3) as pref FROM `common_city` GROUP BY pref ORDER BY cnt DESC LIMIT 10
Copy the code

As you can see, this is similar to the full length, so the actual three-word prefix is enough. Another approach is to use the ratio of the number of prefixes of different lengths to the number of full fields to evaluate how appropriate it is. Such as:

SELECT 
  COUNT(DISTINCT LEFT(`name`, 1)) / COUNT(`name`) as pref1, 
  COUNT(DISTINCT LEFT(`name`, 2)) / COUNT(`name`) as pref2, 
  COUNT(DISTINCT LEFT(`name`, 3)) / COUNT(`name`) as pref3, 
  COUNT(DISTINCT LEFT(`name`, 4)) / COUNT(`name`) as pref4 
FROM `common_city`
Copy the code

The closer the value is to 1, the better the effect is. However, it can also be seen that the improvement space is smaller with the increase of prefix length. It’s not a good idea to just look at the average and check the worst case. 3-4 words might be enough, but if the data is distributed unevenly, there can be pitfalls. Therefore, we also need to check whether there is a case in which the number of prefixes corresponding to one prefix is extremely large compared to other prefixes. Finally, the specified column can be prefixed with an index.

ALTER TABLE `common_city` ADD KEY (name(3));
Copy the code

Prefix indexes are good for saving space and improving efficiency, but they also have a drawback, that is, they cannot be used on ORDER BY and GROUP BY (actual validation is also useful in MySQL 5.7 and above). Another common scenario is in long hexadecimal strings, such as stored session ids, where indexing with the first eight bits prefixes filters out a lot of irrelevant data and works well.

High-performance index optimization Strategy (1) : Isolate data columns and prefix indexes

Quarantined data column

Prefix index and index selectivity

Related Posts

Talk about ten common software architecture patterns

A bug that has been lurking for five years

After reading this paragraph, I dare not say I am proficient in C++ any more..