Why do database index data structures use B+ trees instead of XXX?

December 23, 2023

by Kirk Martin

No Comments

The article directories

A, why database index can not use binary search tree?
Why are red-black trees not suitable for database indexing?
Why not use Hash data structure as index data structure?
Why not use b-tree
Why can use B+ tree

This question is actually very interesting, I wrote in the last article: 1, why database index can not use binary sort tree; 2, why database index can not use red black tree;

This article added: 1, why can’t use hash table; 2. Why not use b-tree? 3, why can use B+ tree.

A, why database index can not use binary search tree?

According to the above demonstration, it is also possible to look at binary search tree, it is also quite fast. But why isn’t it appropriate to use it at the bottom of a database? This is a common interview question.

We can demonstrate: www.cs.usfca.edu/~galles/vis…

Let’s suppose we index Col1 and then insert the binary search tree in order: 1, 2, 3, 4, 5, 6, 7;

And you can see it degenerates into a linked list.

When we query 7, the time complexity becomes a singly linked list.

From big to small:

The summary is as follows:

If the bottom of the database uses binary search tree, it will degenerate into a single linked list when the data is extreme, so it is not appropriate;

Imagine if we were to use the index data structure of a binary search tree for the increment column. That’s the extreme. It’s all on one side.

Why are red-black trees not suitable for database indexing?

A red-black tree is also called a binary balanced tree

Red-black trees should be familiar to Java developers as the underlying data structure in HashMap in JDK8 uses red-black trees.

Red-black trees are used in the JDK, so why not index data structures in databases?

So again, let’s say we add Col1 to the index of the red-black tree.

The process is dynamically demonstrated as follows:

If we execute:

select * from table1 where Col1 = 7;
Copy the code

Dynamic demonstration is as follows:

As you can see, it took us four queries to find it. There is still a larger effect than before adding this index, at least not all scans.

Conclusion:

As you can see, almost every insertion adjusts the binary tree, keeping the height balanced. If the data volume is very large, it is also very time consuming, so red black trees are not suitable.

Why not use Hash data structure as index data structure?

By the time you click on this article, you’ll be familiar with Hash tables.

Hash tables have the following characteristics:

1. The position of data insertion is determined by the hash value, and the order is out of order;
2. Insert quickly;
3, the search is also fast;

Let’s insert a set of data into a hash table:

100,13,101,14,103,109
Copy the code

We use www.cs.usfca.edu/~galles/vis… To dynamically simulate Hash tables;

To show why the Hash table doesn’t work with the database, we insert the prepared data sequentially:

Dynamic demonstration is as follows:

The results are as follows:

1.

We often use SQL in databases to query a range of data for example:

select * from t where id < 15;
Copy the code

We know that hash tables are unordered, so just by virtue of that, it’s hard.

I should know, hash table is definitely not ok.

2,

As you can see from the dynamic demonstration of inserting data, the hashes of 100 and 13 are both 13.

Then it moves backwards (which is also a way for hash tables to resolve conflicts).

For example, we insert 100 first, then 13;

If we wanted to find 13, it would be slower.

Two numbers might not work. 10,000? 100000? How about a million numbers? As you can imagine, this is equivalent to a full table scan.

So, hash tables, in general, don’t work.

Why not use b-tree

B-Tree B-Tree B-Tree

Let’s continue the simulation: www.cs.usfca.edu/~galles/vis…

Insert 1-10 after 10 numbers:

B trees do solve the problem of the scope of the hash table we talked about above.

We execute the following SQL:

select * from t where id > 5;
Copy the code

(1) First find 5 search path: 4 – >6 – >5;

(2) then go back one level and find 6 (3) then find 6 (4)…

You can see that there is a loop, and as the amount of data grows, there is more loop, and more time is wasted.

Why can use B+ tree

We use this to simulate: www.cs.usfca.edu/~galles/vis…

Construct a B+ tree of numbers 1-10;

Let’s start with this tree:

It is divided into two parts, leaf node and non-leaf node.

Leaf nodes are implemented by linked lists, where all inserted data is linked together.

Non-leaf nodes only store keys;

Leaf nodes store both key and value.

Key: 0-10.

Value: indicates the address of the number from 0 to 10.

The problem of roundabout search in B tree is solved. Search efficiency is also improved overall.

Such as:

select * from t where id > 5;
Copy the code

See below:

As you can see, non-leaf node 5 is searched, then 7, then 6, and finally leaf node 5.

Once you find it, you can take it out in sequence, so you don’t have to go back to the next level.