“This is the 28th day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021.”

Author: Tangyuan

Personal blog: Javalover.cc

Preface

There are several ways to index a string field. The simplest is to index the entire field.

There are other options as well: indexing only the front part of the string, indexing the string after storing it in reverse order, and indexing a hash of the string.

Here are a few ways to index a string field.

Contents

  1. Add an index to the entire field
  2. Add an index to the front part of the field
  3. Add an index to a field in reverse order
  4. Add an index on a hash of the field

Body

The following SQL statement is used as the example throughout, where id_card is a user's ID card number:

select * from t_user where id_card = '142733xxxxx';

1. Add an index to the entire field

If you simply index the entire id_card field, the query process is straightforward.

A typical query proceeds as follows:

  1. Search the id_card index tree and locate the entry with the value 142733xxxxxx;
  2. Use the primary key value stored in that index entry to go back to the table, read the full row, and add it to the result set;
  3. Move to the next entry in the index tree. If it still satisfies the condition, repeat step 2; otherwise, stop.

The biggest advantage of this approach is that every row read from the table is a row that satisfies the condition.

That is, the index match is exact rather than approximate.
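The three steps above can be sketched in a few lines of Python. This is only a toy model, not how MySQL implements its B+ tree: the index is assumed to be a sorted list of (value, primary key) pairs, and the id_card values and the `lookup_full` helper are made up for illustration.

```python
import bisect

# Made-up sample rows: primary key -> row. The id_card values are invented.
table = {
    1: {"id": 1, "id_card": "142733199001011234"},
    2: {"id": 2, "id_card": "142733199205052345"},
    3: {"id": 3, "id_card": "110101198807073456"},
}

# Whole-field index modeled as a sorted list of (indexed value, primary key).
full_index = sorted((row["id_card"], pk) for pk, row in table.items())

def lookup_full(value):
    """Return (matching rows, number of table lookups) for id_card = value."""
    results, table_lookups = [], 0
    # Step 1: locate the first index entry holding this value.
    pos = bisect.bisect_left(full_index, (value,))
    # Steps 2 and 3: walk forward while the entries still match.
    while pos < len(full_index) and full_index[pos][0] == value:
        pk = full_index[pos][1]
        results.append(table[pk])  # go back to the table by primary key
        table_lookups += 1
        pos += 1
    return results, table_lookups

rows, lookups = lookup_full("142733199001011234")
print(len(rows), lookups)  # 1 1
```

In this model the number of table lookups always equals the number of result rows, which is the property described above.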

2. Add an index to the front part of the field

alter table t_user add index index_card(id_card(6));

Here we specify 6 to index the first six characters;

The steps for such a query are as follows:

  1. Search the id_card index tree and locate the entry whose prefix is 142733 (there may be more than one; start with the first);
  2. Use the primary key value in that index entry to go back to the table, and check whether the row's id_card value equals the id_card value in the WHERE condition:
    1. If it does, read the full row and add it to the result set;
    2. If it does not, discard the row and move on to the next index entry with the prefix 142733;
  3. Move to the next entry in the index. If its prefix is still 142733, repeat step 2; otherwise, stop.

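As you can see, a prefix index may go back to the table several times, because the index only holds the first few characters of id_card and several entries can share the same prefix.

That is, the index match is approximate rather than exact.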
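To make the extra table lookups concrete, here is a minimal Python sketch of the same procedure. It is a toy model under stated assumptions: the index is a sorted list of (prefix, primary key) pairs, and the sample id_card values and the `lookup_prefix` helper are invented for illustration.

```python
import bisect

PREFIX_LEN = 6  # same length as id_card(6) in the index definition

# Made-up sample data: primary key -> id_card.
table = {
    1: "142733199001011234",
    2: "142733199205052345",
    3: "142733198512123456",
    4: "110101198807074567",
}

# Prefix index: only the first PREFIX_LEN characters are stored.
prefix_index = sorted((v[:PREFIX_LEN], pk) for pk, v in table.items())

def lookup_prefix(value):
    """Return (matching primary keys, number of table lookups)."""
    prefix = value[:PREFIX_LEN]
    results, table_lookups = [], 0
    pos = bisect.bisect_left(prefix_index, (prefix,))
    while pos < len(prefix_index) and prefix_index[pos][0] == prefix:
        pk = prefix_index[pos][1]
        table_lookups += 1        # every candidate costs a table lookup
        if table[pk] == value:    # the full value is only known in the table
            results.append(pk)
        pos += 1
    return results, table_lookups

print(lookup_prefix("142733199205052345"))  # ([2], 3)
```

Three rows share the prefix 142733, so the sketch reads three rows from the table to return one.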

The advantage is that it saves space: an ID number has 18 characters, but with the prefix index each index entry only needs to store 6 of them.

But the disadvantages are also obvious:

  1. Multiple rows are scanned, and which of them actually match is only known after going back to the table;
  2. Covering indexes are affected: with a prefix index, every lookup has to go back to the table to check whether the full value is equal (because the index only contains the first few characters). So even if the query could otherwise be answered by a covering index, that optimization no longer applies.

Covering index: if the query only needs the indexed value and the primary key value, the index alone can answer it and there is no need to go back to the table.

Q: How do you optimize that?

A: Choose a suitable prefix length, so that the indexed prefix is enough to tell the strings apart. The higher the selectivity of the prefix (the more distinct values it has), the better the prefix index performs.

Take the ID number above: the first six digits encode the province, city, and county (details are shown in the figure below). People registered in the same county or urban district therefore share the same first six digits, which greatly increases the work each query has to do.

In this case, we can lengthen the prefix by a few digits so that it also covers the date of birth. The selectivity of the index then rises sharply and the work per query drops considerably.
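One way to compare candidate prefix lengths is to compute, for each length, the number of distinct prefixes divided by the number of rows. The Python sketch below does this on a handful of made-up ID numbers; `prefix_selectivity` is a hypothetical helper name, and a real measurement would of course run against the actual table.

```python
def prefix_selectivity(values, length):
    """Distinct prefixes of the given length divided by the number of rows."""
    return len({v[:length] for v in values}) / len(values)

# Made-up ID numbers; four of them share the same six-digit region code.
id_cards = [
    "142733199001011234",
    "142733199001025678",
    "142733199205052345",
    "142733198512123456",
    "110101198807074567",
]

for length in (6, 10, 14):
    print(length, prefix_selectivity(id_cards, length))
# 6 0.4
# 10 0.8
# 14 1.0
```

On this sample, six characters distinguish only two groups, while fourteen characters (region code plus date of birth) already make every row distinct. A ratio close to 1 means few wasted table lookups.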

3. Add an index to a field in reverse order

That is, store the string in reverse order and then add an index on the front part of the stored value: reverse first, then prefix-index.

This may seem like a bit of a detour, but it works for the id number example above;

The function used for reversing is reverse().

Because the last few digits of an ID number are far more selective than the first few, storing the value reversed puts the selective digits at the front. When querying, we only need to apply reverse() to the value we are looking for, as shown below:

select * from t_user where id_card = reverse('142733xxxxx');
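The effect of reversing can be checked with a small Python sketch. The ID numbers are made up and all share one region code; `stored_form` is a hypothetical helper that stands in for what reverse() does before the value is stored.

```python
PREFIX_LEN = 6

def stored_form(id_card):
    """Value as it would be stored in the column: the string reversed."""
    return id_card[::-1]

# Made-up ID numbers from the same region, so the first six digits are equal.
id_cards = [
    "142733199001011234",
    "142733199205052345",
    "142733198512123456",
    "142733198807074567",
]

# Distinct prefixes when indexing the original value vs. the reversed value.
front = {v[:PREFIX_LEN] for v in id_cards}
back = {stored_form(v)[:PREFIX_LEN] for v in id_cards}
print(len(front), len(back))  # 1 4

# Querying: reverse the search value too, then compare with the stored values.
stored = [stored_form(v) for v in id_cards]
query = "142733199205052345"
matches = [s for s in stored if s == stored_form(query)]
print(len(matches))  # 1
```

A six-character prefix on the original values cannot tell these rows apart at all, while the same prefix length on the reversed values separates every one of them.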

4. Add an index on a hash of the field

Hash the field value first, then index the hash.

This is similar to adding an index to reverse order storage, which is to process the existing field value before storing it;

However, different values can produce the same hash, so the hash cannot replace the original value; it has to be stored in an additional field next to it.

The hash function used here is MySQL's crc32().

So when inserting a row, compute crc32() of the field value and store the result in the additional field.

When querying, match on the hash field first, and also compare the original field to filter out any collisions:

select * from t_user where id_card_crc=crc32('142733xxxx') and id_card='142733xxxx'
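Why the query checks both fields can be illustrated with a short Python sketch. It is a simplified model with assumptions: `zlib.crc32` stands in for MySQL's crc32(), a dictionary stands in for the index on id_card_crc, the sample data is made up, and a collision is simulated by hand rather than found.

```python
import zlib
from collections import defaultdict

def crc32_of(text):
    """Stand-in for MySQL's crc32(): an unsigned 32-bit checksum."""
    return zlib.crc32(text.encode("utf-8"))

# Made-up sample data: primary key -> id_card.
rows = {
    1: "142733199001011234",
    2: "142733199205052345",
    3: "142733198512123456",
}

# Index on the hash field: hash value -> primary keys.
crc_index = defaultdict(list)
for pk, id_card in rows.items():
    crc_index[crc32_of(id_card)].append(pk)

# Simulate a collision by filing row 3 under the same hash as row 2.
crc_index[crc32_of(rows[2])].append(3)

def lookup(id_card):
    """Find candidates by hash, then keep only rows whose id_card matches."""
    candidates = crc_index.get(crc32_of(id_card), [])
    return [pk for pk in candidates if rows[pk] == id_card]

print(lookup("142733199205052345"))  # [2]
```

Without the final comparison on id_card, the simulated collision would return row 3 as well.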

Conclusion

The four indexing methods are described above. The first two are relatively simple: the whole-field index and the prefix index.

The latter two are a bit more complicated, so here’s a summary of the differences:

| | Reverse-order storage | Hash storage |
| --- | --- | --- |
| Storage space | Normal | One extra field to store the hash value |
| CPU consumption | Calls the reverse() function | Calls crc32(), which is a bit more expensive than reverse() |
| Query efficiency | Multiple rows may be scanned | As long as the hash values do not collide, only one row is scanned |
| Range queries | Not supported | Not supported |