This article is based on the author Takashi Shenling

Mysql engine has InnoDB and MyIsAM. This section focuses on InnoDB

InnoDB base: B+tree

MyIsAM underlying: Hash

Index: an ordered data structure that helps MySQL efficiently retrieve data

3, create index (describe table level effect) reason

Control search times and improve search efficiency

4. Why InnoDB chooses B+tree

Select * from B+Tree; select * from B+Tree; select * from B+Tree;

Common data structures include binary Tree, red-black Tree, B-tree, and B+Tree

In the final analysis, B+tree is selected because the height of the tree is small, which can improve the query efficiency. Therefore, we should choose a data structure with tens of millions of levels of data but a relatively small tree height. Height 3-4, data can be stored tens of millions, the answer is ready — B+Tree, the following is the verification process.

Visual data structure

1. Binary tree

Concept features check baidu encyclopedia…

Features: Right more than left

Why not: If a binary tree has only a right or left subtree, it is no more efficient than a linked list query

2. Red-black tree – binary balanced tree

The height of the tree could be very, very high, although it’s self-balanced but the height of the tree is not controllable, so it’s not ideal

3, b-tree – multi-path search Tree

The leaf nodes have the same depth and the pointer to the leaf node is null

All index elements are not duplicated

The data index in the node is sorted incrementally from left to right

— Expanded nodes horizontally and reduced height

A more optimal solution – B + tree

4, B+Tree (b-tree variant)

Has the advantages of B-tree and:

Non-leaf nodes do not store data, but only indexes (redundancy). You can put more indexes on the same node to reduce the height of the tree

The leaf node contains all the index fields

Leaf nodes use pointer connections (b-tree does not use Pointers) to improve interval access performance

5, InnoDB B+Tree store data amount and Tree height calculation

B+tree a node == a page ==16kb

Bigint is 8byte in mysql, 6byte in MYSQL (C).

So the amount of data on an index page ==16KB/(8+6)B ==1170 == Number of nodes

Most databases are overloaded with 1kb of data in a row, so the number of nodes on which data is stored ==16kb/1kb==16

If the depth is three, then 1170*1170*16==21902400, can store more than 20 million data

If the depth is 3, it means that if you go to the B+Tree index, three disk I/OS can be found. And walk the full table query: need millions of levels of query…

In some versions of mysql, the efficiency is further improved by putting all the indexed nodes into memory, so the ten-thousandth lookup only takes about as long as it takes to fetch the data from the node.

The height of a Tree is determined by the number of indexes stored in non-leaf nodes