Art is long, life is long

Introduction to the

An M order B+ tree has the following characteristics:

  1. The middle node of a k subtree contains K elements (k-1 elements in a B tree). Each element does not hold data, but is used only for indexing. All data is stored in the leaf node.
  2. All leaf nodes contain information of all elements and Pointers to records containing these elements, and the leaf nodes themselves are linked in large order according to the size of the key word.
  3. All intermediate node elements coexist in child nodes, which are the largest (or smallest) element in the child node element.

Let’s first look at the structure of the B+ tree:

In the tree above, the conclusion is:

  • The root node element 8 is the largest element of the child nodes 2,5,8 and the largest element of the leaf node 6,8.
  • Element 15 of root node is the largest element of child node 11,15, and leaf node 13,15.
  • The largest element of the root node is the largest element of the entire B+ tree. No matter how many elements are inserted or deleted, the largest element should always be kept in the root node.
  • Since all the elements of the parent node appear in the child node, all leaf nodes contain all element information, and each leaf node has a pointer to the next node, forming an ordered linked list.

Satellite data:

Refers to the data record that the index element points to, such as a row in a database. In a B-tree, both the middle node and the leaf node have satellite data.

Satellite Information in B-tree:

Satellite Information in B+ tree:

Only the leaf nodes have satellite data, while the other intermediate nodes are only indexes without any data association.

It should be added that in the Clustered Index of the database, the leaf nodes directly contain satellite data. In the NonClustered Index, leaf nodes have Pointers to satellite data.

B+ tree query

The following are analyzed by single row query and range query respectively:

Single line query:

For example, the element we are looking for is 3:

The B+ tree looks for nodes layer by layer from top to bottom, eventually finding matching leaf nodes.

First disk IO:

Second disk IO:

Third disk I/O:

B+ tree query and B- data comparison:

  • The middle nodes of the B+ tree have no satellite data, so the same size disk can hold more node elements;
  • In the case of the same amount of data, the STRUCTURE of B+ tree is shorter and fatter than that of B- tree, so the I/O times of query are fewer.
  • A B+ tree’s query must eventually find a leaf node, whereas a B- tree simply finds a matching element, whether the matched element is in an intermediate node or a leaf node.
  • The lookup performance of b-trees is inconsistent (search only root nodes at best, leaf nodes at worst). And every lookup of a B+ tree is stable.

Range query:

For example, we want to query data from 3 to 11:

B-tree range query process:

From the top down, find the lower limit of the range (3) :

Middle order traversal to element 6:

Middle order traversal to element 8:

Middle order traversal to element 9:

Middle order traversal to element 11, traversal ends:

B+ tree range search process:

From the top down, find the lower limit of the range (3) :

Iterate through the list pointer to elements 6, 8:

Conclusion:

B+ trees have three advantages over B- trees:

  1. A single node stores more elements, resulting in fewer I/OS for queries.
  2. All queries need to find leaf nodes, and the query performance is stable.
  3. All leaf nodes form an ordered linked list for easy range query.