Tree structure series (3) : B tree, B+ tree

The article was first published on “Chen Shuyi” public account and personal blog shuyi.tech

The search efficiency of balanced binary tree is very high and can be improved by reducing the tree depth. However, when the amount of data is very large, the number of elements stored in the tree is limited. As a result, the binary search tree structure is too deep, resulting in frequent disk I/O reads and writes, which leads to low query efficiency.

B trees are designed to solve this problem by reading a lot of data at once. Instead of just storing a number, a node stores a shard of data. In this way, frequent disk data reads can avoid frequent I/O access, resulting in search speed bottlenecks.

B tree

B-Tree B-Tree B-Tree B-Tree

B trees are not to be confused with binary trees, which are not binary trees but self-balancing tree data structures. It maintains ordered data and allows search, sequential access, insertion, and deletion in logarithmic time. A b-tree is a generalization of a binary search tree because a node of a B-tree can have more than two children.

Unlike other self-balanced binary search trees, B-trees are well suited to storage systems that read and write relatively large blocks of data, such as discs. It is commonly used in databases and file systems. For example, mysql’s InnoDB engine uses a data structure that is a variant of the B tree B+ tree.

A B-tree is a balanced multitree, usually called a b-tree of order M, which must satisfy the following conditions:

Each node has at most m child nodes.
Each non-leaf node (except the root) has at least ⌈m/2⌉ child nodes.
If the root is not a leaf node, the root has at least two child nodes.
A non-leaf node with K child nodes contains K-1 keys.
All leaves appear at the same level without any information (highly consistent).

The order of a B-tree refers to the maximum number of children of a node in a b-tree. For example, in the book above, “13,16,19” has the largest number of child nodes, with a total of four child nodes (gray nodes). So this b-tree is of order 4, and this tree is called a b-tree of order 4. In practice, B-tree is applied to MongoDb index.

B + tree

A B+ tree is a variant of a B tree that is required by the file system. B+ tree features:

The middle node of an M subtree contains m elements (k-1 in a B tree). Each element does not hold data and is used only for indexing.
All leaf nodes contain the information of all keywords and Pointers to records containing these keywords, and the leaf nodes themselves are linked in the order of small and large according to the size of keywords. The leaf nodes of the B tree do not contain all the information to look for.
All non-terminal nodes can be regarded as index parts, which only contain the maximum (or minimum) keyword of the sub-root node. The non-final nodes of the B tree also contain valid information to look up. For example, the root node 8 in the figure below is the largest element in the left subtree and 15 is the largest element in the right subtree.

Compared with B tree, B+ tree has the following advantages:

B+ trees have lower disk read and write costs

The internal node of B+ tree does not have a pointer to the specific information of the keyword, so its internal node is smaller than that of B tree. If all the keywords of the same internal node are stored in the same disk block, the number of keywords that can be contained in the disk block is larger. Therefore, more keywords to be searched are read into the memory at a time. Relatively speaking, the number of IO reads and writes is reduced, and the search speed is faster.

B+ tree query efficiency is more stable

Because non-endpoints are not the nodes that ultimately point to the contents of the file, they are only the indexes of the keywords in the leaf nodes. So any keyword lookup in a B+ tree must take a path from root to leaf. The length of all keyword query paths is the same, resulting in the same query efficiency of each data. For b-trees, because each node stores specific data, the query speed may be faster, but it is not stable.

B+ trees facilitate range queries (most importantly, range lookups are the norm in databases)

While B tree improves IO performance, it does not solve the problem of low efficiency of element traversal. To solve this problem, B+ tree applications were created. B+ trees can traverse the entire tree simply by traversing the leaves. Range-based queries are so frequent in databases that MySQL’s Innodb engine uses B+ trees as its index data structure.

conclusion

B tree is to solve the large amount of data search problem and was born, in fact, the generalization of binary search tree. By storing more data on each node, the B-tree is flatter than the binary search tree, which reduces THE I/O read frequency and improves the search speed.

The biggest differences between A B+ tree and a B tree are that the non-leaf nodes no longer store specific data, and that the leaf nodes are linked list structures. Non-leaf nodes no longer store specific data, which makes the B+ tree flatter and lookup more efficient. Leaf nodes are linked list structures, which makes B+ trees more suitable for range lookup scenarios.

Learn here, our tree structure avenue basically finished school, to the overall review.

The resources

B tree _ Baidu Encyclopedia
B + tree _ Baidu Encyclopedia
B/B+ tree comparison.
B tree, B + tree details – Assassin MASON – Expo garden
Comics B tree B – tree _sinat_36118365 blog – CSDN blog
Comics: What is a B + tree? – zhihu
MySQL (Innodb) index principle – Lonely smoke – Blog garden

Tree structure series (3) : B tree, B+ tree

B tree

B + tree

conclusion

The resources

Related Posts

Nodejs preliminary

Dear lu! Only Log4j is known to print logs?

Python crawler combat, data capture and analysis of bra sales record data, found surprising secrets