[TOC] [index] [index]

1. What is a prefix index

A prefix index is simply an index of the first few characters of the text. The index is smaller, so the query is faster. This is similar to how Oracle uses the Left function to create a functional index for a field, except that MySQL does not need to use the Left function because the prefix index is automatically matched internally.

So why not index the entire field? Generally use the prefix index, may be because the entire field data volume is too big, there is no need for the entire field index, prefix index is just choosing a field of some characters as the index, index on the one hand can save a space so, on the other hand, can improve the efficiency of the index, of course it is clear that this way will reduce the selectivity of index.

Again, what is index selectivity?

2. What is index selectivity

Index Selectivity refers to the ratio of non-repeating Index values (also known as cardinality) to the total number of records ina table, ranging from 0 to 1. The more selective an index is, the more efficient the query is, because a more selective index allows MySQL to filter out more rows in a lookup.

The more selective the index, the better? Of course not! Index selectivity is 1, if the index selectivity is 1, it is the only index, the search can be directly through the search criteria to locate a specific row of records! This time, although the performance is the best, but also the most space, which is not in line with our original intention to create a prefix index.

We started to create the prefix index is not the only index, index is hoping to find a balance between performance and space, we hope to be able to choose a long enough prefix to ensure high selectivity (so there is no need to scan a lot in the process of query line), but also want to index don’t occupy too much storage space.

So how do we choose an appropriate index selectivity? The index prefix should be long enough so that the selectivity of the prefix index is close to the entire column of the index, that is, the cardinality of the prefix should be close to the cardinality of the entire column.

First we can get full column selectivity with the following SQL:

SELECT COUNT(DISTINCT column_name) / COUNT(*) FROM table_name;
Copy the code

Select a length prefix by using the following SQL:

SELECT COUNT(DISTINCT LEFT(column_name, prefix_length)) / COUNT(*) FROM table_name;
Copy the code

When executing the SQL above, we need to select the appropriate prefix_length until the result is approximately equal to the full column selection.

3. Create a prefix index

3.1 A small case

For example, let’s create a prefix index.

The data sample songo used here is a test script he found online, with 300W+ pieces of data, which is enough for SQL test optimization. The friends replied to mysql-data-samples on the background of the official account to obtain the download link of the script.

Let’s look at the table structure roughly:

This table has a user_UUID field, which we will work on.

Git is a tool you can use. Different from Svn, the version number on Git is not a number but a Hash string. However, when you use Git, for example, you need to roll back the version, you do not need to enter the complete version number. You only need to enter the first few characters of the version number, because the publication number can be determined according to the previous part.

If we want to index user_uuid, we don’t need to index the entire string, we only need to index part of the string.

SQL > select user_uuid from user_uuid; select user_uuid from user_uuid;

You can see user_uuid and I only need to give a part to lock a unique record.

Of course, the above SQL is tested by Songo, given the ‘39352F %’ condition can not be shorter, any shorter will find two or more records.

As you can see from the above example, if you index the user_uuid field, you may not need to index the entire string, but only a portion of the prefix string.

What about indexing the first few strings? This is not a head shot. It’s a scientific calculation. Let’s move on.

3.2 Prefix Index

SQL > select user_uUID;

SELECT COUNT(DISTINCT user_uuid) / COUNT(*) FROM system_user;
Copy the code

As you can see, the result is 1. A selectivity of 1 for all columns means that the values in this column are unique and not duplicated.

Let’s first try a few different prefix_lengths to see how selective they are.

There are 5 different prefix_lengths tested here.

8 and 9 are equally selective, because in the UUID string, the ninth string is -, and all UUID ninth strings are the same, so 8 characters and 9 strings are equally distinct.

When prefix_length is 10, the selectivity is already 1, meaning that only the first 10 characters are needed to uniquely locate a specific record in the 300 rows using user_uuid.

Create a prefix index:

alter table system_user add index user_uuid_index(user_uuid(10));
Copy the code

View the prefix index you just created:

show index from system_user;
Copy the code

As you can see, the second line is the prefix index we just created.

Next we analyze whether the index is used in the query:

select * from system_user where user_uuid='39352f81-165e-4405-9715-75fcdf7f7068';
Copy the code

As you can see, the prefix index is already used.

The specific search process is as follows:

  1. fromuser_uuid_indexThe first value found in the index is39352f81-1Record (first ten characters of user_uUID).
  2. Because user_uuid is a secondary index, the leaf node holds the primary key value, so it gets the primary key id of 1.
  3. Go back to the table with the primary key ID, find the full record of the row with ID 1 on the primary key index, and return it to the Server layer.
  4. The server layer checks whether its user_UUID is39352f81-165e-4405-9715-75fcdf7f7068The Extra of the execution plan is Using where.
    1. If not, the row is discarded.
    2. If so, add the record to the result set.
  5. Index leaf nodes are linked by one-way lists, so proceed from the first search to the next record, then repeat steps 2, 3, 4 until the value of user_uuid_index is not39352f81-1, the loop ends.

If we have a prefix index and the prefix index selectivity is 1, then step 5 is not needed. If the prefix index selectivity is less than 1, step 5 is required.

From the above case, you can see that we both save space and improve search efficiency.

3.3 A Question

SQL > select * from prefix index;

select user_uuid from system_user where user_uuid='39352f81-165e-4405-9715-75fcdf7f7068';
Copy the code

This time, instead of select *, select user_uUID, as in Songo’s previous article (it’s time to check if the gesture with the index is correct!). As you know, we should use the overwrite index here, let’s look at the execution plan:

Hey, what about index coverage? (Notice how Extra is Using where).

In a prefix index, the value of the user_uUID column is not stored in the B+Tree, but must be returned to the table. So, if you use a prefix index, you don’t use an overwrite index.

4. Summary

Ok, this is the prefix index, please use it according to the actual needs of your project. Today’s talk so much, the rest of us later to talk about it ~

References:

  1. Blog.csdn.net/dhrome/arti…