B tree

Before introducing B+ tree, we will briefly introduce B tree. There are similarities and differences between the two data structures. Finally, we will compare the differences between the two data structures.

Now is the golden 9 silver 10 recruitment season, are you ready? I have collected the interview questions of some major manufacturers and the latest information of this year (2020). The following are some screenshots of the information (all the information has been integrated into a document, compressed PDF package processing).

If there is a need for friends can scan the following TWO-DIMENSIONAL code



B tree concept

B tree is also called B- tree, it is a multi – path balanced search tree. Binomial trees I think everyone is familiar with, in fact, B trees and B+ trees are also from the simplest binary tree transformation, there is no mysterious place, let’s look at the definition of B trees.

  • Each node has a maximum of M-1 keywords (key-value pairs that can be stored).
  • The root node can have at least one keyword.
  • The non-root node has at least m/2 keywords.
  • The keywords in each node are arranged in descending order, with all the keywords in the left subtree of each keyword being smaller than it and all the keywords in the right subtree being larger than it.
  • All leaf nodes are at the same level, or the length from the root node to each leaf node is the same.
  • Each node has an index and data, i.e. a corresponding key and value.

Therefore, the keyword quantity range of the root node is 1 <= K <= m-1, and the keyword quantity range of the non-root node is m/2 <= k <= m-1. In addition, we need to pay attention to the concept that when describing a B-tree, we need to specify its order, which indicates the maximum number of child nodes in a node. Generally, the letter M is used to represent the order. Let’s take another example to illustrate the above concept, for example, here is a b-tree of order 5, the range of root nodes: 1 <= K <= 4, the range of non-root nodes: 2 <= k <= 4. Below, we through an example of insert, explain the process of insert B tree, and then explain the process of delete keywords.

B tree insert

When inserting, we need to remember a rule: determine whether the number of keys in the current node is less than or equal to m-1. If so, insert the key directly; if not, divide the node into left and right parts by dividing the middle node into the parent node.

Example: In a 5-order B-tree, a node has at most 4 keys and at least 2 keys (note: the following nodes use the same node for key and value).

Insert 18,70,50,40



Insert the 22



When 22 is inserted, it is found that the keyword of this node is already greater than 4, so it needs to be split. The rules of splitting have been described above. After splitting, it is as follows.



Then insert 23,25,39



I split it and I get the bottom one.

I’m not going to talk much about the insert process, but I’m sure you already know how to insert with this example.

Deleting a B tree Deleting a B tree is a bit more complicated than inserting a B tree, but, you know, it’s pretty easy to remember a few things.

Now I have an initial state of B tree like this, and I’m going to delete it.



Delete 15. In this case, the element of the leaf node is deleted. If the number of nodes is still greater than M /2 after deletion, it can be deleted directly.

Next, we delete 22. The rule in this case is that 22 is a non-leaf node. For a non-leaf node to be deleted, we need to overwrite the deleted key with a successor key, and then delete the successor key in the child of the successor key. To delete 22, you need to move the successor element 24 to the node where 22 was deleted.



At this time, it is found that the node where 26 is located has only one element, less than 2 (m/2), and this node does not meet the requirements. At this time, the rule (borrow elements from brother nodes) is as follows: If the leaf node is deleted, if the number of elements is less than (m/2) and its sibling node has more elements than (m/2), that is to say, the sibling node has more elements than the minimum value m/2, the elements of the parent node will be moved to this node first, and then the elements of the sibling node will be moved to the parent node. So that’s good enough.

It becomes clearer when we look at the operation.



Then delete 28, delete the leaf node, delete the requirements, so we need to consider borrowing elements from the sibling node, but the sibling node does not have more nodes (2), can not borrow, what to do? In this case, the elements of the parent node are first moved to the node, and then the keys of the current node and its siblings are merged to form a new node.



After the move, merge with the sibling node.



Delete only the above several cases, according to different circumstances can be deleted.

The above introduction, I believe that we have a certain understanding of B trees, the next part, we continue to explain B+ trees, I believe that with the comparison of B+ trees, it will be more clear.

B + tree

Summary of B + tree

B+ trees are actually very similar to B trees. Let’s first look at the similarities:

  • The root node has at least one element

– Range of non-root node elements: m/2 <= k <= m-1

Difference:

There are two types of nodes in a -B+ tree: internal nodes (also known as index nodes) and leaf nodes. Internal nodes are non-leaf nodes. Internal nodes do not store data, but only indexes. Data is stored in leaf nodes. – For a key in an internal node, all keys in the left tree are smaller than it, and all keys in the right subtree are greater than or equal to it. Records in leaf nodes are also arranged by key size. – Each leaf node has a pointer to the neighboring leaf nodes, and the leaf nodes themselves are linked in a descending order according to the size of the keyword. The parent node holds the index of the first element of the right child.

Let’s look at an example of a B+ tree and feel it.



2.2 Insert Operations

For insertion operations, it is easy to remember just one trick: when the number of node elements is greater than m-1, the middle element is split into the left and right parts, and the middle element is split into the parent node for index storage, but the middle element itself is split into the right part.

The following takes the insertion process of a 5-order B+ tree as an example. The node of the 5-order B+ tree has at least 2 elements and at most 4 elements.

Insert 5,10,15,20



I insert 25, now I have more than 4 elements, I split



Then we insert 26,30, and we continue to split



With these examples, I’m confident that the insert operation is ok, so let’s move on to the delete operation.

Delete operation

The deletion operation is simpler than that of the B-tree, because the leaf node has Pointers. When borrowing elements from the sibling node, it does not need to go through the parent node, but can move directly through the sibling node (provided that the element of the sibling node is greater than M /2), and then update the index of the parent node. If the sibling node has no more than m/2 elements (and no extra elements), merge the current node with the sibling node and remove the key from the parent node. Let’s look at an example.

The initial state



Delete 10, after deleting, the requirement is not satisfied, find the left sibling node has redundant elements, so borrow elements, finally, modify the parent node index



Delete element 5, it is found that the requirements are not met, and it is found that the left and right sibling nodes have no redundant elements, so you can choose to merge with the sibling node, and finally modify the index of the parent node



It is found that the parent index does not meet the criteria, so you need to do the same as in the previous step



So, the deletion operation of B+ tree is also completed, is not after reading, feel very simple!

03 Summary of B tree and B+ tree

B+ trees have some advantages over B trees, which boil down to the following points.

A single node stores more elements, resulting in fewer IO times for queries, which makes it more suitable as the underlying data structure of MySQL database. All queries need to find leaf nodes, and the query performance is stable, while b-tree, each node can find data, so it is unstable. All the leaf nodes form an ordered linked list, which is easier to find.

At the end of the article

This article about B tree and B+ tree to share here, in addition, I collate and collect more than 20 years of company interview knowledge collation, as well as a variety of Java core knowledge free to share with you, if you want information, please scan the TWO-DIMENSIONAL code