First, the interview

Interviewer: Do you know what data structures are used to store file indexes or database indexes?

Xiaoqiu: Yeah, it’s usually stored in a tree structure.

Interviewer: Can you explain why the tree structure is used for storage?

Xiaoqiu: Tree structure, such as B tree, B+ tree and binary search tree are all ordered, so the query efficiency is very high. You can find the target data in O(logn) time complexity.

Interviewer: Can you ask what kind of tree structure is used for file indexes, such as database indexes?

Pelle: Mostly B+ trees, a few B trees. (B and B+ trees are too complicated.)

Interviewer: Why use B tree instead of binary search tree? Or why not use a hash table? Hash table lookup is also very fast.

Pelle: Although the hash table can find the target data again O(1), but if we want to carry out fuzzy search, we can only traverse all the data, and if there is an extreme case, too many elements in the hash table conflict, it will also lead to linear time search efficiency.

Interviewer: Why not use a binary search tree?

Pelle: Well… . Actually I also don’t know, at that time was another interview question to see the answer, hee hee.

Interviewer: Then you can go home and wait for the announcement… .

Why not use binary search tree?

Pelle is a little frustrated after the hatred, ran over to ask shuai about some knowledge of the B tree and the mind of the question.

Pelle: Today I went to an interview. The interview asked me why the file index should use a B-tree instead of a binary search tree. Then I thought about it.

Handsome to: indeed, if the search efficiency (that is, the more times) is, in fact the binary tree can be said to be the fastest, but we are stored in a disk file index, so we should not only consider the search efficiency, consider the addressing of disk loading times oh, and that is why we want to use B tree.

Pelle: Does the binary search tree result in more disk loads? Can you tell me more about it?

Handsome ground: can ah, however understood, feel I speak of good, you want to remember to give me many points praise, forward oh.

Pelle: No problem.

What is a B tree?

Shuadi: To understand this problem, let’s first understand what is B tree, in fact, B tree and binary search tree, are trees, B tree is equivalent to a multi-fork search tree, for a m order B tree has the following characteristics:

1. The root node has at least two children.

2. Each intermediate node contains K-1 elements and K children, where m/2 <= k <= m.

3. Each leaf node contains K-1 elements, where M /2 <= k <= M.

4. All leaf nodes are on the same side.

5. The elements in each node are arranged from small to large, and k-1 elements in the node are exactly the range division of the elements contained by K children.

Xiao Qiu: Oh, my god, it’s so complicated, I can’t remember. I still choose not to learn, woo.

Shuadi: don’t worry, I can’t remember these rules, just let you know some of these rules, under normal circumstances, we don’t need to recite its rules completely. To understand this, let me give you an example of a B tree. Such as:

In the figure, there is a third-order B tree with m = 3. It can be seen that some nodes in the tree have multiple elements, and the values of all elements of the left node are smaller than those of the parent element, just like the binary search tree. For example, for the node (3, 7). The two elements divide this node into three ranges, that is, three children. 2 is the left child node of 3, while (4,6) is the right child of 3, which is also the left child of 7, and 9 is the right child of 7.

It’s very similar to a binary search tree, in that it’s ordered, and the left child is small, and the right child is large, except that a node of a B tree can have multiple elements and multiple branches. These branches and the number of elements can be found in the five rules above. To be honest, I’m too lazy to remember those rules, only know the general and B tree application is ok.

The article comes from wechat public: “helpless and painful code farmers”, more articles can be searched for attention

Four, Pelle’s doubts

Xiao Qiu: I see, but can such a multi-fork tree really be more efficient than a binary search tree? Why don’t I think so?

Handsome ground: that you can say oh.

For example, the above B tree has 11 elements. According to these 11 elements, we build a binary search tree as follows, as shown in the picture.

Let’s compare query efficiency

1. Number of lookups in B tree. Now suppose we want to query for element 9, for the B tree we need to make four comparisons, for example: the first comparison: 10 compare, smaller than 10, so find the child to the left of 10.

Second and third comparison: compare with 3, bigger than 3, at this time we also have to compare with 7.

Fourth comparison: Compare with 9, equal, find the target tree, return.

So the final result requires four comparisons.

2. Comparison results in binary trees

In order to save space, I’m not going to compare each of them, but as you can see at a glance, it takes four comparisons. As shown in figure

Pelle: The same is four comparisons, and each node of the B tree, if there are more elements, then the comparison of the B tree will be more. Why is the efficiency of B faster than binary search tree?

5. Solve puzzles

Shuty: Yes, binary lookup trees are as good as B-trees in terms of number of comparisons alone, but you’re missing one very important point, which is the number of times the disk is addressed.

We know that when loading data from disk into memory, it is loaded on a page basis, and we also know that data from node to node is discontinuous, so different nodes are likely to be distributed in different disk pages. So for the binary search tree above, we might need to do 4 addressing loads, as shown below:

For a B tree, since each node of the B tree can store multiple elements, the number of disk addressing loads is relatively small. For example, in the example above, the number of disk addressing loads is only 3, as shown in the figure:

In memory, as we all know, the speed is very fast, at least than addressing disk loading speed, fast hundreds, but when we compare the numerical, was conducted in memory, though the comparison of B tree number may be more than binary search trees, but fewer disk operation, so in general, or B tree much faster, That’s why we use B trees for storage.

Pelle: I see. I didn’t know that before.

The article comes from wechat public: “helpless and painful code farmers”, more articles can be searched for attention

Shuadi: I don’t know if you noticed, but actually the load times of the disk are basically related to the height of the tree, the higher the height, the more load times, the shorter, the less load times. So for this kind of file index storage, we usually choose the chunky tree structure. For example, with 1000 elements, the height of a binary search tree may be as high as 10 levels, whereas with a 10-order B tree, it only takes three or four levels.

But I still have a question. Most file indexes or database indexes use B+ trees, and only a small part of them use B trees. Can I ask why we use B+ trees instead of B trees? Also, are there any other applications for B trees?

Smartly: The b-tree is used in a small number of file indexes (database indexes), the most used is the file system. Why we use B trees instead of B+ trees, and why most database indexes use B+ trees instead of B trees, will be discussed in the next section.

Pelle: I’m looking forward to it.

Shuadi: If you feel there is a harvest, you can help to forward, like, share oh, this is also my motivation to write articles.

conclusion

About B tree and B+ tree, in the interview process, or ask a lot of drops, especially when asked about the database, the basic will ask index, and then asked about B+ tree, which will also pull the B tree. Therefore, it is very important to master the application and principle of the two kinds of trees. Although their rules are very complicated, if you are dealing with interviews, you do not need to memorize those rules, but only need to know about and know their principle.

Have a harvest? Hope the old iron people come to a triple whammy, give more people to see this article

1, like, can let more people see this article

2, pay attention to the original wechat public number “Shuaidi play programming”, in order to consolidate the basic knowledge of computer (computer network + operating system + database +Linux) and algorithm, recently opened a wechat public number “Shuaidi play programming”, interested in can pay attention to, focus on the algorithm related articles, hee hee.