B-Tree

B-tree is also called B-tree, which differs from balanced binary Tree in that B-tree is multi-fork Tree (balanced multi-path search Tree). The index technology of Oracle and MongoDB is based on b-tree data structure, and B-tree can also be regarded as an extension of 2-3 search Tree.

An m-order B-tree has the following properties

  1. Each node has at most M child nodes.
  2. Each non-leaf node (except root node) contains at least M /2 child nodes.
  3. If the root node is not a leaf node, then the root node has at least two children.
  4. On each node, all the keywords are in order, from left to right, in descending order;
  5. The mean value of the left subtree of each keyword is smaller than the current keyword, and the mean value of the right subtree is larger than the current keyword.
  6. Each node has an index and data;
  7. For a non-leaf node, it can store at most M-1 keywords.
  8. All leaf nodes are in the same layer.

B tree search

Let’s take finding 13 as an example:

Disk I/O for the first time: If the value is smaller than 17, select the leftmost subtree.

Second disk I/O: If the value is greater than 12, select the rightmost subtree.

Third disk I/O: Locate the disk to 13 and check the corresponding data. The search is complete.

B tree insert

For an m-order B-tree, the new nodes are generally inserted in the leaf layer, but it is necessary to consider whether fission is needed according to the actual situation.

1. If the number of key codes in the object is less than M-1, insert the key codes directly.

2. If the node number is equal to the key words in the m – 1, will be divided, with a keyword for boundary points among nodes are divided into 2, create a new node, and the key words in the middle of the insert to the parent node, continue to judge whether the number of keywords of the parent node is equal to the m – 1, in turn, determine whether or not split, the worst situations have been divided to the root node, Add a layer to the whole tree.

B tree delete

B tree deletion is also very complicated

If the original key tree of the node where the keyword resides is greater than or equal to (m/2), the deletion still satisfies the B-tree structure, and you can delete the keyword directly.

If the deletion no longer meets the b-tree structure, certain adjustment process is required:

If there are redundant keywords in the left and right sibling nodes, that is, the number of keywords in the nodes adjacent to this node is greater than (m/2) -1, the largest (left sibling) or smallest (right sibling) of the sibling nodes will be moved to the husband node. Then, the keyword that is smaller than (right sibling up) or larger than (left node up) in the parent node is moved down to the node with the deleted keyword.

If the left and right siblings have no extra keywords, the situation becomes very, very complicated; Need to put the delete key bytes and the left (or right) of keyword merged into brother nodes (the parent node points to the delete key bytes point node pointer to the left (right) of the brothers) is pointing to brother nodes, if therefore the key words of the parent node number is less than the specified value, then you need to do the same with the parent node, the worst case will make the whole tree to reduce a layer.

conclusion

The difference between B tree and balanced binary tree is that each node contains more keywords, especially when THE B tree is applied to the database, which makes full use of the principle of disk blocks (Disk data storage is stored in the form of blocks, each block size is 4K, each IO data read, The data of the same disk block can be read at one time) limit the node size and make full use of the disk block size range; After increasing the number of node keywords in the tree, the tree level is much less than the original binary tree, which greatly reduces the number of data search and comparison and improves the efficiency.

B + tree

A B+Tree with N keywords has N branches, while a B Tree with N keywords has N +1 branches.

In B+Tree, the number of keywords in each non-root node is >= (m /2) and <= M, whereas in B Tree, the number of keywords is >= (m /2) -1 and <=(m-1).

In B+Tree, the number of keywords in the root node is >=1 and <= M, whereas in B Tree, the number of keywords is >=1 and <=(m-1).

B+Tree is an upgraded version of B Tree, because the non-leaf node of B+Tree does not store the pointer of the keyword record, so compared with B Tree, B+Tree makes more full use of the node space, making the search speed more stable, and its speed is completely close to binary search.

1. The non-leaf nodes of B+ tree do not save the pointer to the keyword record, but only index the data, so that the non-leaf nodes of B+ tree can greatly improve the ability to save the keyword, and the tree level will be less;

2. The leaf node of B+ tree stores the pointer of the keyword record of its parent node, so the relevant data can be really obtained only when the leaf node is queried each time. Moreover, the characteristic of the flat fork tree is that the hierarchy difference of all child nodes is no more than one, so the query speed is relatively stable.

3. The keywords of the leaf nodes of B+Tree are arranged in order from small to large, and the data ending on the left saves the pointer to the data starting from the node on the right;

4. Tree of child nodes of non-leaf nodes = key words.

B + tree search

The search specification of a B+ tree is the same as that of a B tree, which is carried down layer by layer, but the difference is that it must be carried out to the leaf node, because only the leaf node stores the pointer to the data.

B + tree insert

All insertion operations take place in the leaf node

1. If it is an empty tree, create a leaf node and insert the record. The leaf node is also the root node.

2. If the node where the keyword to be inserted has less than M keywords, insert the keyword directly.

3. If the keyword number of the node where the keyword is inserted is equal to M, split the node into two nodes, move the keyword m/2 up to the parent node, and check whether the keyword number of the parent node is greater than M. If the keyword number of the parent node is greater than M, split the node according to the preceding procedure.

B + tree deletion

1. If the number of keywords on the node where the keyword resides is greater than M /2, delete the keyword directly.

2. If the number of keywords on the node where the keyword is to be deleted is equal to M /2, if the sibling node contains redundant keywords, you can borrow the keywords from the sibling node to delete the keyword.

3. If the sibling node has no redundant keywords, it needs to be merged with other sibling nodes.

4. If the parent node does not conform to the B+ tree structure, adjust the parent node structure according to the preceding rules.

5. Note the structure of the B+ tree (non-leaf nodes store index information, and only leaf nodes store data Pointers). After the modification, you need to modify the index value of its parent node.

conclusion

1. B+ tree has fewer levels;

2. B+ tree query speed is more stable;

3.B+ tree naturally has the sorting function. Since all the leaf node data of B+ tree constitute an ordered linked list, it is more convenient to query the data within the range and the data is very close;

4. Full-node traversal of B+ tree is faster: a B+ tree traverses the whole tree only requires traversal of all leaf nodes, while a B tree requires traversal of each layer, so B+ tree is more conducive to full-table scanning;

B * tree

The B* tree is a variation of the B+ tree, with the following differences

1. Limit the number of keywords: m/2 for B+ tree initialization, 2m/3 for B* tree initialization, so there are fewer levels of B* tree;

2. When the B+ tree node is full, it will split, while when the B* tree is full, it will check whether the sibling node is full. If the sibling node is not full, the key will be transferred to the sibling node. This feature makes it split less than B+ trees.

3. Non-leaf nodes in the B* tree besides the root node also store Pointers to sibling nodes;

conclusion

A B* Tree has a larger initial capacity, stores more keywords, has fewer levels, and has fewer fission times than a B+Tree.