First released on wechat public account: [Code Nong in Singapore], welcome to follow

background

How many rows can be stored in a B+ tree? This is an interesting interview question. Maybe you would guess 10 million, 20 million, or billions of numbers?

When you finish reading this article, you will have a good idea. Most importantly, this article will give you a deeper understanding of all aspects of InnoDB’s B+ tree index.

After reading this article, you can answer the following interview questions about InnoDB B+ trees simultaneously:

  • How much data can MySQL InnoDB store in a B+ tree?
  • How many branches can a MySQL InnoDB B+ tree have per non-leaf node?
  • Why MySQL InnoDB uses B+ tree instead of B tree /Hash?
  • MySQL InnoDB is recommended to use auto-increment ID as primary key.

InnoDB

The index type

As we all know, MySQL has many storage engines, such as InnoDB, MyISAM, Memory, Merge, Archive, CSV, Blackhole, etc. One of the most commonly used storage engine is InnoDB, so learning InnoDB well is also the core of understanding MySQL.

InnoDB supports multiple indexes:

  • B+ tree index – Traditionally, a B+ tree index cannot find specific row data by key value. A B+ tree index can only find the page where the row data is locked, and then read the page into memory to find the row data in memory. The B+ tree index is also the most common and frequently used index.
  • Full-text index – full-text index is a special index, generally based on inverted index to achieve, can be solvedlike %xxx%Statement, index will be invalidated.
  • Hash index – created by the database itself according to your usage, there is no human intervention, so called adaptive hash index, using hash table data structure. So it’s very fast for dictionary type queries, but not for range queries.

The key to understanding InnoDB’s performance is its B+ tree index.

File structure

First we need to understand some basic concepts.

InnoDB storage engine minimum storage unit Page (Page), the default size of a Page is 16K, we can customize the size by parameter (modify source code recompile).

In InnoDB, a node in the B+ tree is one page, i.e. 16K. One page is set because for most businesses, one page is sufficient. However, if a row in a table is longer than 16K, then a row overflow occurs, and the overflow is stored in a separate page called an uncompresse blob page.

If the database is only stored this way, finding the data becomes a problem because we don’t know which page the data is on and we can’t walk through all the pages. That would be too slow. So they came up with a way to organize the data as a B+ tree.

B + tree

We first sort the data records by primary key and store them in separate pages. In addition to data pages that hold data, there are pages that hold key + pointer. The non-leaf node in the figure stores keys and Pointers to the data page. Such a page is composed of multiple keys and Pointers. And of course it’s also sorted. This form of data organization is called an index organization table. Now, if you want to find a piece of data, how do you do that?

Select * from TAB where id = 5;

So here ID is the primary key, and we look it up through this B+ tree, so first we find the root page, how do we know where the root page of a table is? In fact, the root page position of each table is fixed in the tablespace file, that is, the page with page number=3 (after finding the root page, use binary search method to locate the data page with ID =5, also use binary search method to find the record with ID =5.

All the data are in the leaf node, and each leaf node has a pointer to the next node, forming an ordered linked list. Why is it ordered? It’s actually for range queries.

Select * from TAB where id > 1 and id < 100;

When 1 is found, all data nodes can be accessed at one time by traversing the sequence of nodes and Pointers, which greatly improves the query efficiency of the range. Hash indexes are powerless against such range queries.

To sum up:

  1. The smallest storage unit of the InnoDB storage engine is pages, which are stored in leaf nodes in a B+ treedata, stored in non-leaf nodesKey + pointer.
  2. The index organization table determines which page the data is in through the binary search method of non-leaf nodes and Pointers, and then finds the required data in the data page.

Here are the advantages of B+ trees:

  • A single node stores more elements and reduces IO
  • All queries must find the leaf node, the query is stable
  • All leaf nodes form an ordered linked list for convenient range query

In general, the height of the DATABASE’S B+ tree is usually between 2 and 4 layers, which means that it takes 1 to 3 logical IO at most to find a row record of a certain key value, which is very fast. (The root node is resident in memory, so a three-level tree query requires only two disk I/OS.) Of course, as you can see from the figure above, the number of I/OS for range lookups depends on the number of range lookups.

Storage calculation

The size of the primary key ID and the size of the row are assumed: we assume that the primary key ID is bigint, 8 bytes. The size of a row is about 1K. So we can store 16 rows of this data on a page. Data page 16K is a data page that contains file header/page header/page tail structures. So these are just estimates.

What about non-leaf nodes? In fact, this is also easy to calculate, we assume that the primary key ID is bigint and the length is 8 bytes, and the pointer size is set to 6 bytes in the InnoDB source code, making a total of 14 bytes. How many such units we can store in a page is actually the number of Pointers, i.e. 16384/14=1170.

So we have at most 1170 children per non-leaf node.

Then a B+ tree of 2 height can hold 1170*16=18720 such data records.

By the same principle we can calculate that a B+ tree of height 3 can hold: 1170*1170*16= 2,190,2400 (21 million) such records.

How about four layers: 1170*1170*1170*16= 25.6 billion. Most InnoDB B+ trees have 3 or 4 levels, and 3 levels provide better performance.

If the primary key is 4 bytes, or if there is less data in a row, then the same number of rows can be stored more.

Now we know that the maximum amount of data you can store is not fixed. Generally speaking, in order to maintain three layers of B+ number of layers, it is about ten million levels of data.

The purpose of the calculation here is to let you have a deeper understanding of InnoDB B+ tree knowledge, do their own know. The point of this interview question is not to get a specific number, but to analyze the question itself. That way you know how big your table is and then you can start dividing tables.

Related interview questions

Now that we know the nature of InnoDB B+ trees, we know the answers to the following common interview questions.

Number of branches per non-leaf node

From the above we actually get the answer:

  • If our primary key is of type Bigint (8 bytes),16384 / (8 + 6) = 1170.
  • If our primary key is of type int (4 bytes), it is16348 / (4 + 6) = 1634.

If you use strings that take up more space, such as UUID, the number will be even smaller.

Why use B+ trees instead of B trees or hashes

B tree One of the most important differences between A B tree and a B+ tree is that a B+ tree has only leaf nodes for storing data, while the rest of the nodes are used for indexing, whereas a B tree has each index node for storing data. Storing data means less space for indexing, fewer children per node, more layers and more disk I/OS to store the same amount of data. And range look-ups are not as convenient as B+ numbers being connected directly through a linked list.

Hash The Hash function is efficient. However, it can only be used for “=” and “IN” queries, not for range queries. And Hash cannot use indexed data to sort.

Why is it recommended to use auto-increment primary key

This is also a common interview question.

It is recommended to use auto-increment ID as the primary key. As we know from above, InnoDB’s minimum storage unit is pages. When a data page is full, MySQL will request a new data page to store data.

  • If the primary key is an auto-increment ID, insert the next page only when the page is full.
  • If the primary key is not an auto-increment ID, in order to maintain the order of the B+ tree, frequent page splitting and page rotation will result in slow insertion speed.

So the primary key of a clustered index should be a continuously growing value, not a random value (no random string or UUID).

For the primary key of InnoDB, use integer as much as possible, and increase integer. This is very efficient in storage/query.

Use smaller primary keys To meet service requirements, use smaller primary keys.

  • The larger the space occupied by primary keys, the smaller the number of primary keys stored on each page, and the longer the depth of the B+ tree, resulting in more I/O times.
  • The leaf node of the normal index stores the value of the primary key ID. If the primary key ID occupies a large space, it will multiply the MySQL space.

< Full text >

How many rows can a B+ tree hold? (WX Official Account)

Welcome to follow my wechat official account: [Miannong in Singapore], have more good technology to share.