A few days ago a friend invited me to answer a question: why does MongoDB (index) use B- trees while Mysql uses B+ trees? I think that’s a very good question, and there’s no better way to learn about data structures from a practical application point of view. Because large, time-tested software like Mysql and MongoDB are highly designed, why do they choose these data structures? 🙂

This paper introduces and analyzes B- tree and B+ tree from the perspective of practical application.


B – tree

Definition: B-tree is a class of trees, including B-tree, B+ tree, B* tree, etc., is a self-balanced search tree, it is similar to the ordinary balanced binary tree, the difference is that B-tree allows each node to have more child nodes. B-trees are designed specifically for external storage, such as disks, and are generally used in file systems and databases because they are good at reading and writing large chunks of data.

The definition just needs to know that b-trees allow more children per node. The number of child nodes is typically in the thousands, depending on the nature of the external memory.

Let’s take a look at why data structures like B-trees exist.

There are many traditional balanced binary trees used for search, such as AVL tree, red black tree and so on. These trees are great for query performance in general, but they are useless when the data is very large. Cause When a large amount of data is generated, the memory is insufficient. Most of the data can only be stored on disks. Only the required data can be loaded into the memory. Typically, memory access takes about 50 ns, while disk access takes about 10 ms. The speed difference was nearly five orders of magnitude, and the disk read time far exceeded the time it took to compare the data in memory. This means that the program blocks disk IO most of the time. So how do we improve program performance? Reduce disk I/O times, balanced binary trees such as AVL trees, red and black trees are not designed to “cater” to disk. About the disk can refer to the storage model in the computer (four) disk

The figure above shows a simple balanced binary tree. The balanced binary tree is balanced by rotation, and rotation is an operation on the whole tree, which cannot be completed if part of the tree is loaded into memory. Secondly, the height of the balanced binary tree is relatively large log n (base 2), so that logically close nodes may actually be too far away to make good use of disk prefetch (locality principle), so the selection of such balanced binary tree on the database and file system is passed.

The principle of spatial locality: if a location in a memory is accessed, then locations near it are also accessed.

Let’s look at b-tree design from the point of view of “catering” to disks.

The efficiency of an index depends on the disk I/O count. A quick index can effectively reduce the disk I/O count. The principle of indexing is to constantly narrow the search scope, just as we usually use a dictionary to look up words, first find the first letter narrow, then the second letter and so on. A balanced binary tree splits the range into two intervals at a time. To be faster, the B-tree splits the range into multiple intervals at a time. The more intervals, the faster and more accurate the positioning data. So if the nodes are interval ranges, each node is larger. Therefore, when creating a node, apply for a page size space (disks are divided into blocks, usually 512 bytes). Disk IO reads several blocks at a time, called a page, depending on the operating system, typically 4 K, 8 K, or 16 K). Computer memory allocation is page aligned, so that a node only needs one IO.

The figure above shows a simplified B-tree. The benefits of multi-fork are very obvious, and the height of b-tree is effectively reduced by a large log N base, which is related to the number of child nodes of a node. Generally, the height of a B-tree is about 3 layers. As the number of layers is low, the scope of each node area is more accurate and the scope is reduced faster. It says that a node needs to do one IO, so the total number of IO is reduced to log n. Each node of a B-tree is n ordered sequences (A1, A2, A3… An), and the sub-nodes of this node are divided into n+1 intervals for indexing (X1< A1, A2 < X2 < A3,… , an+1 < Xn < anXn+1 > an).


b-tree

The figure above is a B-tree. Each node of the B-tree has D ~ 2D keys. The factor 2 indicates the rules of splitting and merging of the tree, which maintain the balance of the B-tree.

Insertion and deletion of b-trees are not covered in detail, but there are many sources describing this process. In a normal balanced binary tree, rotation is performed if the conditions are not met after insertion and deletion, while in a B-tree, splitting and merging are performed if the conditions are not met after insertion and deletion.

A brief description of split and merge operations.

Split: If a node has 2D keys and a 2d+1 key is added, each node in the B-tree does not comply with the preceding rules. Each node has D to 2D keys. If the number of keys is larger than 2D, the node is split into two NODES with D keys and the median key is returned to the parent node. Merge: If a node has D keys and one key is deleted, the number of keys becomes D-1. If each node in the B-tree does not comply with the preceding rules and has D to 2D keys less than D, the node is merged. If the merging conditions are met, the merging is complete.

B- tree lookup

Let’s look at the b-tree search. Suppose each node has n keys, divided into n+1 intervals. Note that each key follows the data field, indicating that the KEY and data of the B-tree are aggregated together. In general, the root nodes are in memory. B-tree takes each node as a disk IO. For example, in the figure above, if the data with key 25 is searched, binary search is performed on the root node first (because keys are ordered, binary search is the fastest), and key 25 is judged to be less than key 50. So locate the leftmost node, perform disk I/O, read the node from disk into memory, and continue the process until the key is found.

Find pseudocode

Data* BTreeSearch(Root *node, Key key) { Data* data; if(root == NULL) return NULL; data = BinarySearch(node); if(data->key == key) { return data; }else{ node = ReadDisk(data->next); BTreeSearch(node, key); }}Copy the code

B + tree

A B+ tree is a variant of the B- tree, which differs from the B- tree in that:

  • In a B+ tree, copies of keys are stored on internal nodes, and real keys and data are stored on leaf nodes.
  • The node pointer field with n keys is n instead of n+1.

A B+ tree is shown below:

Since the inner node does not store data, the size of leaf nodes and inner nodes in B+ trees are generally different, while the size of each node in B- trees is generally the same, which is one page.

To increase interval access, B+ trees are generally optimized. The B+ tree with sequential access is shown below.


B- tree and B+ tree

1. Nodes in the B+ tree do not store data, and all data is stored on leaf nodes. As a result, the query time complexity is fixed at log N. On the other hand, the b-tree query time complexity is not fixed and depends on the position of the key in the tree. The best value is O(1).

Query data whose key is 50 in the B-tree or B+ tree, as shown below.

b-tree

As can be seen from the figure above, the node with key 50 is in the first layer, and the B-tree only needs one disk IO to complete the search. So the best time complexity for a b-tree query is O(1).


B + tree

Since all data fields of the B+ tree are at the root node, the node whose query key is 50 must be indexed from the root node to the leaf node, and the time complexity is fixed as O(log n).


2. When the B+ leaf nodes are connected in pairs, the interval access can be greatly increased and the range query can be used, etc., while the key and data of each node in the B-tree cannot be searched in the interval.

According to the principle of spatial locality: if a location of a memory is accessed, then the location near it will also be accessed.

B+ tree can make good use of locality principle. If we access node key 50, nodes with key 55, 60, and 62 May also be accessed in the future. We can use disk prefetch principle to read these data into memory in advance, reducing the number of disk I/O. Of course, B+ trees are also good for range queries. For example, query nodes whose key values are between 50 and 70.


3.B+ trees are better for external storage. Since the internal nodes have no data field, each node can index a larger and more accurate range

This makes sense because each key inside a B-tree node has a data field, whereas a B+ tree node stores only a copy of the key. Real keys and data fields are stored in leaf nodes. Disk I/O reads several blocks at a time, depending on the operating system. Therefore, the size of disk I/O data is fixed. In an I/O, the smaller the size of a single element, the larger the number of blocks. This means that the B+ tree has more disk I/O information than the B- tree. In this sense, the B+ tree has fewer DISK I/O times than the B- tree.

As can be seen from the figure above, the B-tree has only 2 keys in the same area, while the B+ tree has 3 keys.


Why MongoDB index select B- tree, Mysql index select B+ tree

With that in mind, let’s take a look at why MongoDB indexes are b-trees and Mysql (InooDB engine) indexes are B-+ trees.

Mysql everyone should be more familiar with the traditional relational database, the next introduction to MongoDB.

Take a look at the definition of MongoDB on Wikipedia:

MongoDB (from humongous) is a cross-platform document-oriented database. Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON)

MongoDB is a document-type database, noSQL, that stores data in jSON-like formats.

Document database and our common relational database is different, generally use XML or Json format to store data, belonging to the aggregation database.

The key value database also belongs to the aggregate database, the students who are familiar with Redis should have a good understanding.

Here’s an example:

Join us to establish an e-commerce website, similar to Taobao to sell goods to users, then must store user information, catalog, order, delivery address, billing address, payment method, etc.

Take a look at how traditional relational databases are stored:

Aggregated database storage model:

It is expressed in jSON-like format as follows:

//Customer
{
        "id":1,
        "name":Tom,
        "billingAddress":[{"city":"China"}]
}

//Orders
{
        "id":99,
        "orderItem":[
                "productId"27,
                "price":100,
                "productName":book
         ],
         "shippingAddress":[{"city":"china"}],
         "orderPayment":[
            ...
        ]   
}Copy the code

Compared with Mysql relational databases, noSQL such as MongoDB is suitable for situations with simple data model and high performance requirements


Why does MongoDB use B-trees

MongoDB is a type of NOSQL that is also stored on disk and is designed for applications with simple data models and high performance requirements. High performance requirements, look at the difference between B/B+ trees first point:

Nodes in the B+ tree do not store data, and all data is stored on leaf nodes. As a result, the query time complexity is fixed at log N. However, the time complexity of b-tree query is not fixed and depends on the position of key in the tree. It is better to set it to O(1).

As we said, having as little disk IO as possible is an effective way to improve performance. MongoDB is an aggregated database, and the B-tree happens to aggregate the key and data fields.


Why does Mysql use B+ trees

Mysql is a relational database, interval access is a common situation, and B-tree does not support interval access (see the figure above), and B+ tree because all data is stored in leaf nodes and connected by Pointers, so it is easy to perform interval traversal or even full traversal. See the difference between B/B+ trees and the second point:

When the B+ leaf nodes are connected in pairs, the interval access can be greatly increased and the range query can be used, etc., while the key and data of each node in the B-tree cannot be searched in the interval.

Secondly, the query efficiency of B+ tree is more stable. All data are stored in leaf nodes, and the query time complexity is fixed at O(log N).

And finally, the third point:

B+ trees are better suited for external storage. Since the internal nodes have no data field, each node can index a larger and more accurate range


At the end of this article, if there are mistakes, but also hope to correct 🙂

— From XiyouLinux Group By WWH