This is the seventh day of my participation in the August More text Challenge. For details, see:August is more challenging

An overview of the

Indexes were created to improve the efficiency of querying data, just like the contents of a book. If you want to find a point in a 500-page book quickly, without a table of contents, I suspect you’ll have to look for a while. Similarly, for a database table, an index is essentially its “catalog.”

Indexes also have a number of negative effects: creating and maintaining indexes takes time, which increases with the amount of data; Indexes need to occupy physical space. Not only tables need to occupy data space, but also each index needs to occupy physical space. Indexes must be maintained dynamically when adding, deleting, changing, or changing tables, which slows data maintenance.

Principles of indexing:

  1. Create indexes on the most frequently used fields to narrow the query;
  2. Create indexes on frequently used fields that need to be sorted.

Where indexing is not appropriate:

  1. Do not create indexes for columns rarely involved in the query or columns with many duplicate values.
  2. It is not appropriate to create indexes for some special data types, such as text fields.

The underlying data structure of the index

The data structure of the index is related to the implementation of the specific storage engine. The most commonly used indexes in MySQL are Hash index, B+ tree index, etc. The default index implementation of the InnoDB storage engine that we often use is B+ tree index.

B + number

  1. B+ tree is implemented based on B tree and sequential access pointer of leaf node. It has the balance of B tree and improves the performance of interval query by sequential access pointer.
  2. In a B+ tree, the keys of a node are non-decreasing from left to right. If the adjacent keys of a pointer are key I and key I +1 respectively and are not null, all keys of the pointer to a node are greater than or equal to key I and less than or equal to key I +1.
  3. In the search operation, the binary search is first performed on the root node to find a pointer to the key, and then the search is recursively performed on the node pointed to by the pointer. Until the leaf node is found, and then binary search is performed on the leaf node to find the data corresponding to the key.
  4. Insert and delete operations will destroy the balance of the balanced tree, so after the insert and delete operations, the tree needs to be split, merged, rotated and other operations to maintain the balance.

B+ tree and B tree

If we use B+ tree instead of B tree, we consider the impact of IO on performance. Each node of B tree stores data, while only the leaf nodes of B+ tree store data. Therefore, if we look for the same amount of data, the height of B tree is higher and I/O is more frequent. Database indexes are stored on disk. When the amount of data is large, the entire index cannot be loaded into memory. Only each disk page (the node corresponding to the index tree) can be loaded one by one.