This article comes from my personal public account: TechFlow. Original writing isn't easy; your follow is appreciated.


This post was originally scheduled for last Friday, but it was so hardcore that it slipped to this Friday. You can probably get a sense of that from the title alone.

Friday's topic is big data and distributed systems, and my original plan was to share how LSM trees are used in distributed storage engines. But to really appreciate the essence and cleverness of the LSM tree, you first need some background on the B-trees and B+ trees used by traditional databases. Hence today's article.

Although I have implemented a complete B-tree myself, I don't recommend it for beginners: forcing your way through a difficult data structure mostly just discourages you. So despite the title, I won't post much code in this article. And even in interviews, unless your resume brands you as an ACMer, you are unlikely to be asked to hand-write these trees. Understanding the principles matters more than implementing them yourself.

Even if you are a beginner and don't know much about data structures, please don't leave. I am confident you can understand how B-trees work.


Binary trees and binary search trees


Before going into the details of how this works, let’s review some simple concepts.

Let's start with the binary tree. The concept itself is simple: every node has at most two children.

For example, the tree below is a binary tree:

On its own, a binary tree is not especially useful; it is just another tree structure. Then someone had a clever idea: what if we give the elements an order, so that everything smaller than a node goes into its left subtree and everything larger goes into its right subtree? Wouldn't that let us find things very quickly?

It was a genius idea, and the binary search tree was born.

The figure above shows a classic binary search tree. Say we want to find the element 4. We first compare it with the root node 8; since 4 is less than 8, we move to 8's left subtree and reach 3. 4 is greater than 3, so we go to 3's right subtree and reach 6. 4 is less than 6, so we go to 6's left subtree, which is 4 itself: we have found the element, and the search ends.
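To make that walk concrete, here is a minimal sketch of binary search tree lookup; the TreeNode class is a hypothetical definition just for this illustration:

class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def bst_search(node, target):
    # Walk down the tree, discarding one subtree at every step
    while node is not None:
        if target == node.val:
            return node
        # Smaller elements live on the left, larger on the right
        node = node.left if target < node.val else node.right
    return None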

Ideally, each additional level of the binary tree doubles the number of elements it can store. In other words, a binary search tree with n elements has height O(log n), so finding an element and inserting an element both take O(log n) time.

But that is the ideal situation, and in practice things are often less rosy. For a simple example, if the data is inserted in ascending or descending order, all the elements line up on one side and the tree degenerates into a linked list; lookup and insertion efficiency obviously degrades. In the algorithmic world, it is acceptable for a data structure to be imperfect as long as it is reliable most of the time, but it is not acceptable for its reliability to be unpredictable.

To solve the problem of unbalanced binary search trees, a variety of balanced trees were proposed; the AVL tree from the data structure textbooks is a classic example. The idea behind a balanced tree is simple: check the balance whenever an element is inserted or deleted, and if the operation would unbalance the tree, perform rotations to restore the balance.

It is precisely this rotation mechanism that guarantees the performance of the binary search tree, and it is also what makes this originally simple data structure so much harder to implement.

The figure above shows a classic rotation. Rotations come up constantly in insert, delete, update, and search operations to keep the whole tree balanced, and they demand a lot of logical and spatial reasoning from the programmer; they scare off many beginners. We won't go into the details of rotation here, because B-trees don't use it; just know that it is a way of adjusting the tree's structure to restore balance.


Binary search trees and multi-way search trees


Once you understand binary search trees, multi-way search trees follow naturally. The underlying principle is the same; the only difference is that a multi-way search tree allows more than one element per node. That is, a node can store multiple elements and have multiple subtree branches.

A very clever piece of the design is that the number of children per node is the number of elements plus one, and, just as in a binary search tree, elements and subtrees are related by order.

As shown in the figure, the root node holds two elements, 3 and 9, and has three children, which correspond exactly to three intervals: less than 3, between 3 and 9, and greater than 9. So depending on the size of the element we are looking for, it is easy to decide which branch to take. And since the elements within a node are ordered, we can use binary search inside the node as well.
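To sketch how that branch selection works inside a node (the names here are illustrative, not this article's final code), we can lean on Python's bisect module:

from bisect import bisect_left

keys = [3, 9]                    # the root's elements, in ascending order
target = 7
pos = bisect_left(keys, target)  # first index with keys[pos] >= target
if pos < len(keys) and keys[pos] == target:
    print("found in this node at index", pos)
else:
    # here pos is 1: descend into the child covering the (3, 9) interval
    print("descend into child number", pos)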

Let's pause on a question: since binary search trees already support efficient insertion, deletion, update, and lookup, why invent multi-way search trees at all? What unique advantage do they have?

We can't see it from the structure alone; we need to look at where B-trees are actually used. B-trees appear mainly in file systems and database systems, where the volume of data is far too large to keep entirely in memory. To cope, each tree node stores the disk addresses of its children, and a child node is loaded from disk into memory only when it needs to be accessed. In a database, traversing the tree therefore means performing disk reads along the way.

As we discussed earlier in the MapReduce articles, random disk reads and writes are very time consuming. Obviously, the deeper the tree, the more disk accesses a traversal makes and the more I/O overhead it incurs. B-trees were designed to optimize exactly this: because each node stores more than two elements and children, the tree depth is significantly smaller than a binary search tree's, so fewer disk reads and writes are needed and the I/O overhead drops. This is why B-trees suit file engines as well as database engines.
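A rough back-of-envelope comparison shows how dramatic the difference is; the numbers below are illustrative, not from the article:

import math

n = 10 ** 9                         # a billion keys
print(math.ceil(math.log2(n)))      # ~30 levels for a balanced binary tree
print(math.ceil(math.log(n, 512)))  # ~4 levels if every node has ~512 children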

Let’s take a look at a picture to get a feel for it:

As the example above shows, for the same elements the B-tree clearly has a smaller depth.


Definition of B tree


Although a B-tree is a multi-way search tree, not every multi-way search tree is a B-tree. A B-tree also places restrictions on its nodes: the number of elements and children each node may hold is not arbitrary.

Let's look at the specific definition (a small validity-check sketch follows the list):

  1. The number of elements and subtrees of each node in a B-tree is bounded: every node holds at most M-1 elements and has at most M subtrees (such a tree is called an M-order B-tree).
  2. The root node has at least two subtrees (unless it is a leaf), and every other non-leaf node with K subtrees holds K-1 elements, where ⌈M/2⌉ ≤ K ≤ M; the elements within a node are stored in ascending order.
  3. All leaf nodes are on the same level.
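To make these constraints concrete, here is a hedged sketch of a per-node validity check; the is_root flag is assumed, and the node fields match the Node class defined later in this article, but the function itself is just an illustration:

import math

def node_is_valid(node, M, is_root):
    k = len(node.keys)
    # Children must always number elements + 1 on internal nodes
    if not node.is_leaf and len(node.childs) != k + 1:
        return False
    # No node may hold more than M - 1 elements
    if k > M - 1:
        return False
    # The root only needs two subtrees (unless it is a leaf)
    if is_root:
        return node.is_leaf or len(node.childs) >= 2
    # Every other node needs at least ceil(M/2) - 1 elements
    return k >= math.ceil(M / 2) - 1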

In other words, a B-tree maintains these three properties through a unique set of mechanisms, so that after every insert, delete, update, or search the tree still satisfies them. And although we speak of four operations, update and search share the same logic (an update is a search followed by a modification), so there are really only three core operations.

Before introducing the mechanisms, let me stress once more: B-trees are not complicated in principle, and they involve no tree rotations or other complex transformations.


B tree search


We'll start with the simplest operation, as usual. In almost any search tree, the easiest part is lookup, because lookup does not change the tree's structure; we only need to understand its logic.

Searching a B-tree is essentially the same as searching a binary search tree; in fact, a binary search tree can be regarded as a special case of a B-tree.

The essence of the search operation is to narrow the search range by examining the current node's elements. As we have seen, a node with K subtrees holds K-1 elements, so we only need to compare the key against the node's elements to determine which range the data falls into.

To simplify the description, let’s define the nodes in the B-tree:

class Node:
    def __init__(self):
        self.keys = []       # elements, kept in ascending order
        self.childs = []     # subtrees delimited by the elements in keys
        self.is_leaf = True  # used by the search code below
        self.length = 0      # number of elements, i.e. len(self.keys)

A B-tree node with K subtrees holds K-1 elements; we store them in keys and childs. The elements in keys are kept in order, and the subtrees in childs correspond to the intervals delimited by the elements in keys.

Let's assume that the element we're looking for is key and the current node is node.

First we find the position of the first element in node.keys that is greater than or equal to key; call it pos. If pos == len(node.keys) or node.keys[pos] != key, the current node does not contain what we are looking for, and we need to continue searching in a subtree.

Which subtree? node.childs[pos], the one covering the interval that key falls into. Note that the edge case works out too: if key is larger than every element in node.keys, then pos equals len(node.keys), and node.childs[pos] is node.childs[-1], the rightmost subtree, which is exactly where key would have to be. So accessing node.childs[pos] is always correct, and we recurse into it; otherwise the key is in node itself, and we simply return.

Here’s an example:

For example, suppose we search for 7. We first find the first position in the root whose element is greater than or equal to 7; the element there is 9, which is not equal to 7, so the current node does not contain 7. Since the subtrees correspond to the intervals carved out by the elements, if 7 exists it can only be in the subtree just before 9. We therefore recurse into node.childs[1].

Let’s look at the code:

def _query(self, node, key):
    """
    :param node: a node in the B-tree
    :param key: the key to search for
    :return: (found, node, pos) -- whether key exists, the node where the
             search ended, and key's index within that node
    """
    # A leaf node has no children, so the search ends here
    if node.is_leaf:
        # An empty node cannot contain the key
        if node.length == 0:
            return False, node, 0
        # Binary-search for the first position >= key
        pos = bisect_left(node.keys, key)
        # Not found in this node
        if pos == node.length or node.keys[pos] != key:
            return False, node, 0
        else:
            return True, node, pos
    else:
        # Internal node: locate where key would sit
        pos = bisect_left(node.keys, key)
        # key is in the current node
        if pos != node.length and node.keys[pos] == key:
            return True, node, pos
        # Otherwise recurse into the corresponding subtree
        return self._query(node.childs[pos], key)

We use bisect_left, from Python's built-in bisect binary-search library, to do the binary search for us: it returns the index of the first element greater than or equal to key, or the length of the array if every element is smaller.
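If bisect_left is unfamiliar, a quick demonstration of its contract:

from bisect import bisect_left

keys = [3, 6, 9]
print(bisect_left(keys, 6))   # 1 -- index of the first element >= 6
print(bisect_left(keys, 7))   # 2 -- 7 would slot in just before 9
print(bisect_left(keys, 10))  # 3 -- larger than everything: len(keys)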


B tree insertion


B-tree insertion is a little more complicated than lookup.

A rule of thumb for B-tree insertion is that all inserts happen at leaf nodes. This is easy to see: if the key we are inserting already exists in a non-leaf node, the insert becomes an update, and we simply change the value stored with the key. If the key does not appear in any non-leaf node, then the search for it obviously runs all the way down to a leaf.

Take this figure, for example: the elements 3 and 9 in the root divide the whole key space into three intervals: less than 3, between 3 and 9, and greater than 9. Whatever key we insert, it either equals one of the interval boundaries or falls inside one of the intervals. If the key already appears among the keys of a non-leaf node, we directly modify its value; if it falls inside an interval, we clearly need to add it to a leaf node.
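Here is a minimal sketch of that descent, reusing the Node class from the search section; overflow handling is deferred to the split step described next, so treat this as an illustration rather than this article's full code:

from bisect import bisect_left

def insert(node, key):
    pos = bisect_left(node.keys, key)
    # The key already exists: in a key-value tree this would be an update
    if pos < len(node.keys) and node.keys[pos] == key:
        return
    if node.is_leaf:
        # The actual insertion always happens at a leaf
        node.keys.insert(pos, key)
        # The caller must now check for overflow and split if needed
    else:
        insert(node.childs[pos], key)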

However, adding elements to leaf nodes raises a problem: a B-tree limits the number of elements per node, so we have to handle the case where a node ends up with more elements than allowed.

The B-tree's answer to this problem is very clever yet very simple; in one sentence: reduce the element count by splitting the node.

One wrinkle remains: when we split a node, the parent must also gain an element, because the number of children of the parent must equal its number of elements plus one. If a child splits and the parent gains a branch, the parent obviously needs one more element as well.

Indeed, for exactly this reason, after a leaf splits, one element is pushed up into the parent so that the parent's element and subtree counts stay legal. Let's look at an example:

This is a B-tree of order 4. We first insert 5, which brings the middle leaf to 3 elements, still within the limit. Then we insert 6:

Now, after two consecutive insertions, the leaf's element count has reached M, so we split it in two and push the middle element up into the parent. After this adjustment the parent gains one element and one branch, and the B-tree is legal again.

The final result is as follows:

But, you might ask: what if pushing an element up causes the parent to exceed its limit too? The answer is exactly what you would expect: we simply keep splitting.

Consider the following example:

If we insert 13, the last leaf exceeds the limit, so we split it and push the middle element 11 up into the parent:

But pushing into the parent may in turn overfill it, so we must recursively check whether the parent needs splitting as well. Here the parent's element count has reached M, so we continue to split:

The newly created node sits one level above the original root and becomes the new root, and the original root's subtrees are divided between the two halves as the new root's children. Besides keys, we must also split childs and reassign the parent pointers.
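Putting the splitting mechanics together, here is a hedged sketch, again reusing the Node class from earlier plus an assumed reference to the parent; it illustrates the idea rather than reproducing this article's complete code:

from bisect import bisect_left

def split(node, parent, M):
    mid = len(node.keys) // 2
    up_key = node.keys[mid]          # the middle element moves up

    right = Node()                   # a new sibling takes the larger half
    right.keys = node.keys[mid + 1:]
    node.keys = node.keys[:mid]
    if not node.is_leaf:
        right.is_leaf = False
        right.childs = node.childs[mid + 1:]  # children split along with keys
        node.childs = node.childs[:mid + 1]

    # The promoted element and the new branch go into the parent
    pos = bisect_left(parent.keys, up_key)
    parent.keys.insert(pos, up_key)
    parent.childs.insert(pos + 1, right)
    # If the parent is now overfull, the caller splits it recursively,
    # creating a brand-new root when the old root itself overflows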


B tree deletion


This brings us to the hardest part, hard not so much because of the algorithm itself but because at first glance there seem to be so many cases. In fact every change is traceable: as long as we keep hold of the cause and purpose of each adjustment, these seemingly complicated transformations stop being a problem.


All deletions happen at leaf nodes

First, let's establish one point: whatever element we delete, the removal ultimately takes place at a leaf node. In other words, every case can be reduced to deleting from a leaf.

That may sound odd; after all, the element we are deleting may live not in a leaf but in an internal node partway up the tree, so how can every deletion be a leaf deletion?

It's not hard to arrange; it only takes a very simple transformation.

For example, in the figure below, suppose we want to delete the element 11. Since 11 sits in the root, the position we want to delete is clearly not in a leaf.

To avoid deleting from a non-leaf node, we first find 11's successor, the smallest element in the tree that is larger than it; here the successor of 11 is 12. We copy 12 over 11 and then recurse downward to delete 12 instead, as shown in the second figure:

Of course, the successor we pick may itself not be in a leaf; that is fine, we just repeat the transformation. We can guarantee that the successor always lies deeper in the tree than the current element, never shallower, and the tree's depth is finite, so after finitely many transformations the deletion lands on a leaf node.

This is the premise behind everything that follows, so it is worth making sure it is clear.
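A minimal sketch of this reduction, reusing the Node class from the search section (illustrative only; rebalancing after the leaf deletion is discussed below):

def delete_key(node, pos):
    if node.is_leaf:
        node.keys.pop(pos)         # the real removal happens at a leaf
        return node                # the caller rebalances this leaf if needed
    # The successor is the leftmost element of the subtree just to the right
    succ = node.childs[pos + 1]
    while not succ.is_leaf:
        succ = succ.childs[0]
    node.keys[pos] = succ.keys[0]  # overwrite with the successor...
    return delete_key(succ, 0)     # ...then delete the successor from its leaf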


The method of finding the successor node

Finding a successor is a very common operation in binary search trees. The idea itself is simple: find the smallest element larger than a given key. This is essentially the binary-search upper_bound.

But doing it across tree nodes is a little trickier and less intuitive than a plain binary search. For a given node, if no element in the node is greater than key, there is only one possibility: the successor, if it exists, lies in the rightmost subtree, because the rightmost subtree is the only part larger than every element in the node.

Here’s an example:

For example, if we want the successor of 11 and no element in the root is greater than 11, then the successor, if it exists, must be in the rightmost subtree. And if we ask for the successor of 15, then even searching the rightmost subtree turns up no valid answer.

If the current node does contain an element greater than key, there are two possibilities: the successor is in a subtree, or the successor is in the node itself.

Again, say we want the successor of 10. We find 11 greater than 10 in the root, but until we search the subtree below we cannot tell whether the final answer is 11 or something better inside the subtree.

To handle this, we pass 11 down into the subtree as a fallback. If the subtree finds a better answer it returns that; otherwise the fallback is the answer and we return it.

We write pseudocode:

def successor(node, key, backup):
    # Find the index of the first element greater than key
    pos = binary_search(node.keys, key)
    # A leaf node ends the recursion
    if node.is_leaf:
        # If found, return it: every element here is smaller than backup
        if pos < len(node.keys):
            return node.keys[pos]
        else:
            return backup
    # Nothing greater here: the successor can only be in the rightmost subtree
    if pos == len(node.keys):
        return successor(node.childs[-1], key, backup)
    # Found a candidate: it becomes the new fallback
    backup = node.keys[pos]
    # A better answer may exist below, so recurse with the new fallback
    return successor(node.childs[pos], key, backup)


Two cases after deletion

With that groundwork laid, we can finally get on with actually deleting elements from nodes.

As we said before, a legal node in a B-tree holds K-1 elements, where ⌈M/2⌉ ≤ K ≤ M.

Since a B-tree imposes a minimum on the number of elements per node, deleting an element can violate that minimum. We treat the two situations separately.

If the leaf has elements to spare, we can delete one and it remains legal. This case is trivial: there is nothing to say, just delete.

Please refer to the following figure for details:

In the figure above M=4, which means a node may have at most four branches and must hold at least ⌈4/2⌉ - 1 = 1 element.

If we delete the element 19, its node has elements to spare; removing one still satisfies the constraint, so nothing more needs to be done. But if we delete 10, the node holding 10 has only one element, and removing it breaks the minimum element count.

In that case, there is only one way out: delete, then borrow from other nodes.


Borrowing from sibling nodes

Let's first consider borrowing an element from a sibling, a very natural idea: siblings share the same parent as the current node, so elements can be shuffled between them while preserving the tree's properties.

Obviously, a leaf has at most two siblings: one on the left and one on the right. Since elements in a B-tree increase from left to right, let's call the right sibling the "older brother" and the left one the "younger brother". The order of borrowing affects the final shape of the tree but not its legality, so borrowing from the older brother first or the younger brother first are both fine; I personally borrow from the older brother first.

Using the previous example:

Say we want to delete 10, whose node has only one element. We first check the older brother: it has three elements, so it can spare one and remain legal, and we borrow from it.

The borrowing itself is simple. All the elements in the older sibling are larger than those in the current node, so to keep everything ordered we take its first element, which is 13. But 13 is larger than the 12 in the parent, so we cannot simply drop 13 into 10's old spot: we first pull 12 down from the parent into the node, then push 13 up into 12's place. If you know balanced trees, you will recognize this as a left rotation; if not, don't worry about it.

The rebalanced result looks like this:

Similarly, suppose we delete 23. Its node has no older brother, only a younger one, and the younger brother can spare an element, so we borrow from it with the mirror-image logic: pull an element down from the parent, then push the younger sibling's largest element up into the parent. The result is as follows:
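In code, borrowing from the older brother looks roughly like this; i is the poor node's index among its parent's children, and everything else reuses the Node class from earlier (a sketch, not this article's full code):

def borrow_from_right(parent, i):
    node, right = parent.childs[i], parent.childs[i + 1]
    # The separator element in the parent comes down into the poor node...
    node.keys.append(parent.keys[i])
    # ...and the older brother's smallest element moves up to replace it
    parent.keys[i] = right.keys.pop(0)
    # On internal nodes the sibling's leftmost subtree moves across too
    if not right.is_leaf:
        node.childs.append(right.childs.pop(0))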


When the siblings are poor too

Since some siblings can lend, obviously some cannot: a sibling that is itself only barely legal has nothing to spare. In that case we have to turn to the parent node instead. If the parent is a little richer, still legal after giving up one element, then we take one from it. But note that if the parent gives up an element, its number of subtrees must shrink by one as well, or the B-tree property breaks; we achieve this by merging two of its subtrees.

Let's make it more intuitive with an example. In the figure below, we again delete 10:

After deletion, you get:

You will notice something strange about this picture: beyond being ugly, the real problem is that the root has two elements but four subtrees, which clearly violates the B-tree properties.

The violation arose because the root's element 9 was borrowed by a subtree. To fix it, we merge the two adjacent subtrees on either side: the subtrees [6, 7] and [9] are combined, giving:

If you think about it, this is exactly the inverse of the split performed during insertion: splitting a node and pushing one element up into the parent, versus borrowing one element from the parent and merging two subtrees, are mirror images of the same mechanism.

Now, we have one last problem. When we borrow from the parent to merge subtrees, the parent loses an element. What if the parent itself falls below the minimum after giving one up? You might ask: can it borrow from a child? A little thought shows it cannot; even if we find a rich child, taking its element does not give the parent the extra subtree it would then need.

The solution is not difficult: we apply the borrowing procedure recursively, letting the parent borrow from its own siblings and its parent. In extreme cases this changes the height of the tree, for example:

The figure above is a B-tree of order 5. If we delete 7, then following our convention we borrow the element 6 from the parent and merge it with the subtree [1, 3], giving:

But this borrowing pushes the parent below the B-tree's minimum element count, so we must recursively maintain the parent: it repeats the borrowing procedure in turn. We find that the node [10] has no rich sibling and can only borrow from its own parent, which triggers another merge. Finally, we get the following result:

Looking at how the tree changed above, you can see that borrowing an element from the parent always goes hand in hand with merging into a sibling, and merging means maintaining both nodes' elements and all of their subtrees; and if each node also tracks its parent and its position within the parent, those fields must be maintained too. This is why B-trees are harder to implement than to understand.
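For completeness, here is a hedged sketch of the merge itself, with i the left node's index among its parent's children; as with the other sketches, this illustrates the mechanism rather than reproducing this article's code:

def merge_with_right(parent, i):
    node, right = parent.childs[i], parent.childs[i + 1]
    # The separator comes down from the parent and glues the halves together
    node.keys.append(parent.keys.pop(i))
    node.keys.extend(right.keys)
    node.childs.extend(right.childs)   # adopt the sibling's subtrees
    parent.childs.pop(i + 1)           # the right sibling disappears
    # The parent lost an element and a branch: the caller must re-check it
    # recursively, shrinking the tree's height when the root empties out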

In addition, both deletion and insertion can change the root node, so we must take care to update the tree's root reference whenever that happens.

And with that, the article is finally over. It runs to more than ten thousand words and covers a lot of ground; if you have stuck with it to this point, you are doing great. My biggest takeaway from writing it: before learning B-trees I, too, found them intimidating and difficult, but once I broke each point down, chewed through it, and wrote it up, they turned out to be quite approachable.

Learning always begins in a fog that only gradually lifts. As long as we stay patient and work through the diagrams carefully, these problems stop being problems.

Finally, if you are interested in the complete code, reply "B tree source code" on the official account and I will share mine with you.

That's all for today's article. If you feel you've gained something, please scan the QR code and follow; it means a lot to me.