B tree and B plus tree


B tree, B+ tree detail


A B+ tree is a storage structure that is often indexed in a database.

Prepare knowledge

A tree of order m: the maximum number of tree bifurcations is m, that is, the maximum number of child nodes is m;

Root node: A node that has no parent;

A node that has no children;

Internal node: A node that is not a root or leaf node;

Binary search tree: the value in the left subtree is smaller than the value of the root node, and the value in the right subtree is larger than the value of the root node;

Balanced binary tree: a special case of binary search tree, left and right subtrees are the same height;

A and B tree

A number B is a balanced multitree, a B-tree of order M, with the following properties:

  1. The internal node must have at least Ceil (m/2) child nodes;
  2. If the root node is not a leaf node, there must be at least two child nodes, that is, order 2;
  3. A node of order m contains m-1 data;
  4. All leaf nodes are highly consistent;

Second, the B + tree

B+ tree is a variant of B tree, with some changes to its rules. The changes are as follows:

  1. A leaf node consists of an ordered array and a pointer to a leaf node to its right;
  2. Non-leaf nodes consist of an ordered array, but array elements consist of an index value and a pointer;

    1. Pointer: Pointers to a leaf node;
    2. Index value: The smallest index value in the leaf node that points to;
  3. Non-leaf nodes, which are tool nodes, are used to quickly find the specified leaf nodes, and only the leaf nodes store the actual data (a row of data);
  4. The leaves are like an ordered linked list;
  5. A node of order m contains m data;

2.1 Why are B+ trees suitable for databases?

  1. B+ trees facilitate range queries, which is the most important thing.

You just look for the leftmost range, and when you get there, you iterate over the leaf to the right until you hit the right range, and then you screen out all the ranges.

B tree range lookup uses middle order traversal, while B+ tree uses linked list traversal;

  1. B+ trees cost less to read and write from disk.

The internal nodes of the B+ tree do not have Pointers to keyword specific information. So its internal nodes are smaller than the B tree. If all the keys of the same internal node are stored in the same disk block, the number of keywords that the disk block can contain is also greater. The more keywords that need to be looked up are read into memory at once. Relatively, the number of IO reads and writes is reduced;

  1. B+ tree query efficiency is more stable

Because non-endpoints are not the nodes that ultimately point to the contents of the file, they are just the indexes of the keywords in the leaf nodes. So any keyword search must go from the root to the leaf. The path length of all keyword queries is the same, resulting in the same query efficiency for each data.