The function of index is to improve the efficiency of query. There are many ways to realize index, such as hash table, ordered list, search tree and so on.

Hash table

A structure in which data is stored as key-value pairs, and the corresponding value can be found by the specified key. Hash puts the value in an array, converts the key to a certain location using a hash function, and then puts the value in that location in the array. However, after conversion of multiple key values through Hash function, the same value may appear, that is, Hash conflict. The common solution is linked address method, which puts all keys with the same Hash value into a linked list. In this way, no matter how many conflicts, only add nodes to the single linked list at the current location. This method is applicable to the scenario where only equivalent query is performed, and interval query is slow.

An ordered list

Equivalent query and range query are supported, but the cost of updating data is relatively high. This is useful for statically storing indexes, such as unmodified data such as the population of a city in 2017.

The tree

1. Binary tree:

The left son of each node is smaller than the parent, and the parent is smaller than the right son. The time complexity of searching and updating a node is O(log(N)), and the search efficiency is the highest.

2.B tree (multi-fork tree) :

The root node has at least two children, each increasing in size from left to right.

3. B + tree:

The leaf node of the B+ tree saves all the key values and the data corresponding to the key values of the parent node. The key values of each leaf node are linked from small to large, but the non-leaf node does not save the data corresponding to the key values, which greatly increases the key values saved by each node of the B+ tree. Since the non-leaf node of B+ tree only carries out data index and does not store the data corresponding to the actual key value, all data can only be obtained from the leaf node, so the number of data query is the same every time. Although binary trees are efficient, most databases do not choose binary trees because indexes are not only in memory, but also written to disk to minimize disk reads and IO times.

Imagine a balanced binary tree of 1 million nodes, 20 tall. A query may require access to 20 data blocks. In the days of mechanical hard disks, reading a block of data randomly from a disk took about 10 ms of addressing time. That is, for a 1 million row table, it may take 20 10 ms times to access a row individually if binary trees are used for storage. InnoDB uses a B+ tree index model, where all data is stored in a B+ tree, with each index corresponding to a B+ tree.

Take an integer field index of InnoDB for example, the N is about 1200. When this tree is 4, we can store 1,200 to the third, which is 1.7 billion. Considering that the root of the data block is always in memory, an index of an integer field on a billion row table requires only three disk visits at most to find a value. In fact, the second level of the tree also has a high probability of being in memory, so the average number of disk accesses is even smaller. Conclusion: InnoDB uses B+ tree structure, because B+ tree can well cooperate with the read and write characteristics of disk, reduce the number of disk access in a single query, reduce I/O, and improve performance.

The above content hopes to help you, more free PHP factory PDF, PHP advanced architecture video materials, PHP wonderful good article can be wechat search concerns: PHP open source community

2021 Jinsanyin four big factory interview real questions collection, must see!

Four years of PHP technical articles collation collection – PHP framework

A collection of four years’ worth of PHP technical articles – Microservices Architecture

Distributed Architecture is a four-year collection of PHP technical articles

Four years of PHP technical essays – High Concurrency scenarios

Four years of elite PHP technical article collation collection – database