In addition to the B-tree index, MySQL also provides the following index types:

  • Hash index

Supported by the Memory engine (and NDB); suited to simple equality-lookup scenarios.

  • R-tree index

A special index type of MyISAM, used primarily for geospatial data types.

  • Full-text index

A special index type of MyISAM, used mainly for full-text search. InnoDB has supported full-text indexes since MySQL 5.6.

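+-----------------+---------------+---------------+---------------+
| Index type      | MyISAM        | InnoDB        | Memory        |
+-----------------+---------------+---------------+---------------+
| B-tree index    | Supported     | Supported     | Supported     |
| Hash index      | Not supported | Not supported | Supported     |
| R-tree index    | Supported     | Supported     | Not supported |
| Full-text index | Supported     | Supported     | Not supported |
+-----------------+---------------+---------------+---------------+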

The most commonly used indexes are B-tree and hash indexes, and only the Memory and NDB engines support hash indexes. A hash index is well suited to key-value lookups, where it is faster than a B-tree index. However, hash indexes do not support range lookups such as <, >, <=, and >=; the Memory engine uses a hash index only for equality comparisons (= and <=>).

MySQL did not support functional indexes until 8.0. Prefix indexes, however, have long been available: you can index only the leading part of a column, for example only the first 10 characters of the title field. This can greatly reduce the size of the index. The drawback is that a prefix index cannot be used for ORDER BY or GROUP BY operations.

create index idx_title on film(title(10));
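
One way to see why a prefix index cannot serve ORDER BY or GROUP BY is the minimal Python sketch below. It is not MySQL code: the sample titles and the helper name prefix_key are hypothetical, and it only mimics what title(10) would keep in the index. It shows that distinct titles can collapse to the same 10-character entry.

```python
PREFIX_LEN = 10  # mirrors title(10) in the CREATE INDEX statement


def prefix_key(title: str, length: int = PREFIX_LEN) -> str:
    """Return the part of the value that a prefix index would store."""
    return title[:length]


# Hypothetical sample titles, chosen so that two of them share a prefix.
titles = ["ACADEMY DINOSAUR", "ACADEMY DINNER", "ACE GOLDFINGER"]
index_entries = [prefix_key(t) for t in titles]

# Two different titles share one index entry, so the index alone cannot
# decide their relative order or put them into separate groups.
same_prefix = [t for t in titles if prefix_key(t) == "ACADEMY DI"]
```

Because the index entry for both "ACADEMY" titles is identical, the server would have to read the full column values to sort or group them.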

1 Hash index characteristics

A hash table stores values in an array: a hash function converts the key to a position in the array, and the value is placed at that position. Hash collisions are unavoidable, and MySQL resolves them with separate chaining, where each array slot holds a linked list of the entries that hash to it.

Hash indexes are implemented on top of hash tables. A hash index can be used only when the query conditions exactly match all the columns in the index. For each row, the storage engine computes a hash code over all the indexed columns, and the hash index stores that hash code together with a pointer to the row.

  • For example, suppose a table maintains ID card numbers and names, and names are looked up by ID card number. To find the username for ID_card_n4 through the hash index:

  1. Apply the hash function to ID_card_n4 to get the array position A.
  2. Walk through the entries stored at position A in order until User4 is found.

The ID_card_n values are not stored in increasing order. The advantage is that adding a new user is fast, because the entry only needs to be appended. The disadvantage is just as clear: because the entries are unordered, range queries on a hash index are slow. For example, to find all users whose ID numbers fall in the interval [ID_card_X, ID_card_Y], you have to scan every entry.
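
The lookup steps and the range-query weakness described above can be sketched in Python. This is a toy model under stated assumptions, not MySQL's implementation: the class name ChainedHashIndex, the bucket count, and the use of Python's built-in hash() are all illustrative choices.

```python
class ChainedHashIndex:
    """Toy hash index: an array of buckets, each bucket a list (the chain)."""

    def __init__(self, bucket_count: int = 8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key: str) -> list:
        # The hash function maps the key to a position in the array.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key: str, value: str) -> None:
        # New entries are simply appended to the chain, so inserts are cheap.
        self._bucket(key).append((key, value))

    def get(self, key: str):
        # Equality lookup: hash once, then walk a single chain in order.
        for stored_key, value in self._bucket(key):
            if stored_key == key:
                return value
        return None

    def range_scan(self, low: str, high: str) -> list:
        # Buckets are unordered, so a range query must visit every entry.
        return sorted(
            value
            for bucket in self.buckets
            for stored_key, value in bucket
            if low <= stored_key <= high
        )


index = ChainedHashIndex()
for n in (3, 1, 4, 2):  # insertion order does not need to be sorted
    index.insert(f"ID_card_n{n}", f"User{n}")
```

Here index.get("ID_card_n4") touches one bucket only, while index.range_scan("ID_card_n2", "ID_card_n3") has to walk all of them.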

2 Hash index defects

  • A lookup takes two steps: the index holds only hash codes and row pointers, so after finding the entry MySQL still has to read the row itself
  • Partial-key lookups and range lookups are not supported
  • Hash codes can collide. If the hash algorithm is poorly designed, too many collisions will degrade performance
  • The index stores hash values, so only =, <=>, and IN comparisons are supported
  • The index cannot be used for sorting, because entries are stored by hash value and the order of hash values bears no relation to the order of the key values
  • Reading and comparing rows cannot be avoided: Memory tables allow hash indexes on non-unique values, and different index keys may have the same hash value, so every matching entry has to be checked against the actual row
  • Since a hash table is a data structure that locates memory positions directly from keys, a hash index built on this principle needs all of its data loaded into memory, which is very memory intensive
  • If every query is an equality query, a hash index is faster, but in practice range queries are more common
  • Only full-value matches on the index key can be processed
  • The hash function determines the size of each index entry
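
To illustrate the "full-value match only" limitation, here is a small Python sketch of a hash index over two columns. The column names, the sample rows, and the helper names are hypothetical, and Python's built-in hash() stands in for whatever hash function the engine actually uses.

```python
# Hypothetical rows of (first_name, last_name).
rows = [("Ada", "Lovelace"), ("Alan", "Turing"), ("Grace", "Hopper")]


def hash_code(first_name: str, last_name: str) -> int:
    """Hash both indexed columns together into a single code."""
    return hash((first_name, last_name))


# The index maps a hash code to the positions of the rows that produced it.
hash_index = {}
for position, (first_name, last_name) in enumerate(rows):
    hash_index.setdefault(hash_code(first_name, last_name), []).append(position)


def find(first_name: str, last_name: str) -> list:
    # Full-key equality: recompute the code, then verify the real values
    # to rule out collisions.
    candidates = hash_index.get(hash_code(first_name, last_name), [])
    return [rows[p] for p in candidates if rows[p] == (first_name, last_name)]


# There is no way to ask this index for "first_name = 'Alan'" alone:
# the code was computed from both columns, so a partial key cannot be
# hashed to the same value and the lookup falls back to scanning rows.
partial_matches = [row for row in rows if row[0] == "Alan"]
```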

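InnoDB and MyISAM do not let you create hash indexes explicitly, but you can emulate one with a pseudo-hash index. This is not the same thing as InnoDB's adaptive hash index, which InnoDB builds automatically in memory for frequently accessed index pages.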

You can add a field to store the hash value, index the hash value, and create a trigger to automatically add the calculated hash value to the table when inserting or updating.

The hash table structure is suitable for scenarios where there are only equality queries, such as Memcached.

3 Case Application

Suppose you have a very large table and user login needs to retrieve the user by email. If you index the email column directly, each lookup has to compare strings in addition to walking the index. That is fine for short emails, but the query cost is relatively high for long ones. If you instead store a hash of the email as an int and index that, lookups compare integers, which is much faster than comparing strings.

The Hash algorithm

To build a hash index, first choose the hash algorithm. "High Performance MySQL" uses the CRC32 algorithm.
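
As a rough illustration of why CRC32 fits an int unsigned column, the Python sketch below computes a CRC-32 checksum with the standard zlib module. The helper name email_hash and the sample address are hypothetical. Python's zlib.crc32 implements the standard CRC-32, which should correspond to MySQL's CRC32() for the same bytes, though that is an assumption worth checking against your own server and character set.

```python
import zlib


def email_hash(email: str) -> int:
    """Return the CRC-32 checksum of the email as an unsigned 32-bit integer."""
    return zlib.crc32(email.encode("utf-8")) & 0xFFFFFFFF


# Hypothetical sample address.
code = email_hash("user@example.com")

# A CRC-32 value always fits in 32 bits, i.e. in an int unsigned column.
fits_int_unsigned = 0 <= code <= 2**32 - 1
```

A 4-byte integer is a much smaller and cheaper comparison key than a long email string, at the cost of occasional collisions.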

INSERT, UPDATE, and SELECT operations

Add the hash field to the table and index it:

ALTER TABLE `User`
  ADD COLUMN email_hash int unsigned NOT NULL DEFAULT 0,
  ADD INDEX (email_hash); -- index the hash column so lookups can use it

The next step is to update the email_hash field automatically on INSERT and UPDATE via triggers:

DELIMITER |
CREATE TRIGGER user_hash_insert BEFORE INSERT ON `User` FOR EACH ROW BEGIN
SET NEW.email_hash=crc32(NEW.email);
END;
|
CREATE TRIGGER user_hash_update BEFORE UPDATE ON `User` FOR EACH ROW BEGIN
SET NEW.email_hash=crc32(NEW.email);
END;
|
DELIMITER ;

The SELECT request would then become:

SELECT `email`, `email_hash` FROM `User` WHERE email_hash = CRC32('[email protected]') AND `email` = '[email protected]';
+----------------------------+------------+
| email                      | email_hash |
+----------------------------+------------+
| [email protected]             | 2765311122 |
+----------------------------+------------+


The extra condition AND `email` = '[email protected]' guards against hash collisions: without it, rows whose emails merely share the same CRC32 value would also be returned.
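
The two-step check in that query can be modelled in Python as follows. This is only a sketch of the idea: the function names and sample addresses are hypothetical, and a deliberately weak hash (the string length) is swapped in for CRC32 in the usage lines purely to force a collision.

```python
import zlib
from collections import defaultdict


def crc32(text: str) -> int:
    """CRC-32 of the UTF-8 bytes, standing in for MySQL's CRC32()."""
    return zlib.crc32(text.encode("utf-8"))


def build_hash_index(emails, hash_fn=crc32):
    """Map each hash value to the list of emails that produce it."""
    index = defaultdict(list)
    for email in emails:
        index[hash_fn(email)].append(email)
    return index


def find_email(index, email, hash_fn=crc32):
    # Step 1: narrow the search with the integer hash
    # (the email_hash = CRC32(...) condition).
    candidates = index.get(hash_fn(email), [])
    # Step 2: compare the real value (the AND email = ... condition)
    # so that colliding rows are dropped.
    return [candidate for candidate in candidates if candidate == email]


# Hypothetical addresses of equal length; len() is a deliberately bad
# hash function used here only to make both land in the same bucket.
emails = ["ann@example.com", "bob@example.com"]
weak_index = build_hash_index(emails, hash_fn=len)
colliding = weak_index.get(len("ann@example.com"))
exact = find_email(weak_index, "ann@example.com", hash_fn=len)
```

With the weak hash both addresses collide, yet find_email still returns only the requested one, which is the role the second condition plays in the SELECT.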