MySQL index underlying data structure

The data structures used by Mysql indexes include BTree index and Hash index. In the case of a Hash index, the underlying data structure is a Hash table, so the best performance is to use a Hash index in most cases where a single record query is required. The BTree index is recommended for most other scenarios.

Why do indexes speed up queries

The basic storage structure of MySQL is pages. Pages are linked by two-way linked lists, and records within pages are linked by one-way linked lists. When querying data, the bidirectional linked list is first used to query the page where the data resides. When querying within the page, if the where clause is a primary key, the query is carried out according to binary search; if the where clause is not a primary key, the query is carried out by traversing the linked list. The total time complexity is order n.

After using the index, you can use the BTree data structure to find the data in order of log n time.

Disadvantages of using indexes

  • Index creation and index maintenance take time, which increases with the amount of data.
  • Indexes need to occupy physical space. The larger the amount of data, the larger the space.
  • During addition, deletion, and modification, indexes need to be maintained, which reduces efficiency. Therefore, it is not appropriate to create too many indexes for a table with a large number of addition, deletion, and modification operations.

MySQL Index type

  1. Primary key Index: a special index that uniquely identifies a record and cannot be empty.
    ALTER TABLE `table_name` ADD PRIMARY KEY ( `column` ) 
    Copy the code
  2. Unique index: The value of the index column must be unique and cannot be empty. For a composite index, the combination of column values must be unique.
    ALTER TABLE `table_name` ADD UNIQUE ( `column` ) 
    Copy the code
  3. Ordinary index: the most basic index, which has no restrictions.
    ALTER TABLE `table_name` ADD INDEX index_name ( `column` )
    Copy the code
  4. FULLTEXT index: FULLTEXT index can only be used for MyISAM engine tables; Applies to columns of type CHAR, VARCHAR, or TEXT.
    ALTER TABLE `table_name` ADD FULLTEXT ( `column`) 
    Copy the code
  5. Joint index: Several columns are retrieved as a single index, using the leftmost matching principle.
    ALTER TABLE `table_name` ADD INDEX index_name ( `column1`, `column2`, `column3` )
    Copy the code

Associative indexedLeftmost matching principle

In simple terms, when a federated index is created, after the left-most index or indexes are identified, the following indexes are ordered. (as shown in the figure below, after the joint index of a and b is established, the value of a is [1,1,2,2,3,3], and the value of b is [1,2,1,4,1,2].) when the value of a is not determined, the value of b is unordered in structure. At this time, if only b is used for search in the where subquery, the full table will be searched. So when we use where a = 1 and b = 2 to query, we use the upper index to query b because b is ordered after a is determined. Select * from a where a > 1 and b = 2; select * from a where a > 1 and b = 2;

  • conclusion
  • Left-most prefix matching rule. MySQL will continue to match to the right until it reaches a range query (>,<,BETWEEN,LIKE) and stops matching. For example: A = 1 AND c > b = 2 AND 3 AND d = 4, if establish (a, b, c, d) order index, d is to use less than index, if establish (a, b, d, c) of the index, it can be used, a, b, d sequence can be arbitrary adjustment.
  • Theta is equal to theta and in can be out of order. For example, if a = 1 AND b = 2 AND c = 3 create (a,b,c) indexes in any order, MySQL’s query optimizer will help you optimize the patterns that the indexes can recognize

Clustered indexes and non-clustered indexes

Simple generalization

  • A clustered index is an index created with a primary key
  • A non-clustered index is an index that is not created with a primary key

The difference between

  • In a clustered index, leaf nodes store data from the table
  • In a non-clustered index, leaf nodes store primary keys and index columns
  • When using the non-clustered index to query the data, get the primary key on the leaf and then look up the data you want to find. (The process of getting the primary key and looking it up is called a callback.)
  • Create multiple non-clustered indexes. Each clustered index generates an index tree, so creating multiple indexes takes up more disk space.

Index and query optimization considerations

  • Leading fuzzy queries cannot use indexes, such asWhere like '% index 'Unable to use index
  • Union, IN, or can match the index. In is recommended
  • Negative condition queries cannot use indexes and can be optimized as in queries where negative condition is! =, <>, not in, not exists, not like, etc
  • When creating a federated query, the most distinguished field is on the far left
  • Place computations at the business layer rather than the database layer because computations on fields do not match indexes
  • The cast will scan the full table, and if the phone field is of type VArcher, the following SQL will not match the index.Select * fromuser where phone=13800001234
  • If a column is not null, use a not NULL constraint and the default value is not null. If a column is not null, then the index is invalid.
  • Limit 1 can be more efficient if you know that only one query will result, such as when verifying logins
  • Use proper paging to increase efficiency.select id,name from product limit 866613, 20When using the above SQL statement for paging, you may find that as the amount of table data increases, the direct use of limit paging queries will become slower and slower. The optimization method is as follows: you can take the id of the maximum number of rows on the previous page, and then limit the starting point of the next page based on this maximum ID. For example, in this column, the maximum id on the previous page was 866612. SQL can be written as follows:Select ID,name from product WHERE ID > 866612 LIMIT 20.

After the interview

Q: Please talk about the considerations of indexing, or talk about the advantages and disadvantages of indexing.

References:

  • Two database artifacts [index and lock]
  • MySQL learning – index (average index, the only index, full-text index, index matching principle and index hit, etc.)
  • MySQL Index (pros and cons, when to create indexes/not to create indexes, index and SQL statement optimization)