B+ tree and why use B+ tree as mysql index

Mysql index uses B+ tree as the data structure, so you need to understand the principle and use of B+ tree

Balanced binary trees

The difference between a balanced binary tree and a normal binary tree is that the height difference between the left and right subtrees of any node in a binary tree cannot be greater than 1

Binary tree (tree structure query fast)-> balanced binary tree (solve the problem of binary tree extreme cases)->B tree (multi-node, the height is much smaller than the red black tree, etc.)->B+ tree (external linked list, increase range search function. Non-leaf nodes only store indexes but not data to save space.

Why we chose B+ trees

  1. Similar to the “directory” function, the tree only stores indexes, and the linked list below stores data
  2. Add bidirectional linked list, accord with the function of range search
  3. InorDB uses clustered indexes for primary keys so that physical space is also stored contiguous, making it easier to find pages

How are pages and mysql indexes implemented

The operating system obtains data from the disk by page, that is, one page at a time =4Kb. Mysql obtains data from the disk by page, but the default page is 16kb

Page composition:

  • User data area: Stored sequentially, forming a long linked list (long linked list queries can be slow, so page directories are needed)

  • Page table of contents: Simplifies query operations, where the index information is stored in the page (analogous to the upper three layers in the B+ tree above)

  • Header: contains front and back Pointers to memory addresses of other pages, using which all pages form a long list of bidirectional Pointers

To quickly locate the page of the data being queried, the long list of bidirectional Pointers uses a B+ tree to form an index

This is equivalent to a primary key index (clustered index)

Conclusion:

A non-leaf node of a B+ tree is a tree structure of all page data

The leaf nodes of THE B+ tree store the data information of each page, which are connected with each other using a bidirectional linked list

So the query operation here becomes

  1. First, go to the page directory and use binary search to find the page to be retrieved
  2. Retrieves the page from disk
  3. Find the group information for the data area assignment in the page directory
  4. Locate the corresponding data in the data area

Mysql index and index failure

The purpose of an index in storage is to serve as a table of contents composed of pages. Improve the speed of page retrieval (because pages exist on disk, if the full table scan one by one is convenient, it will require many IO operations. Hitting the page with the index only requires one IO operation.)

Primary key index:

  1. A tree structure maintained using B+ trees where leaf nodes are corresponding pages. A non-leaf node is a query directory of pages
  2. A page contains a page directory and a data area. The data in a page directory is divided into different areas for easy searching

Left-most prefix rule

If there is no left field, then there is no way to judge the direction in the B+ tree

That is, when you create a federated index, you actually create the index from left to back

For example,index(a,b,c) is used to create indexes such as index(a),index(a,b), and index(a,b,c). A query against only B and C will not hit the index

Clustered index and non-clustered index

The purpose of using non-clustered indexes:

  1. Reduce space usage, as backing up data for all pages is space-consuming if you create an index
  2. The update operation requires that all index data be modified

Index of the failure

Rules:

  1. The principle of the best left prefix for joint indexing
  2. Select * from t where left(name,4)=’July’ select * from t where left(name,4)=’July’
  3. Multiple returns to the table invalidate the index (overwrite index: the query field is the index result information, without the need to retrieve the table)
  4. Use on index! =,<>,is null,is not nullWill the failure
  5. Starting with a wildcard character like will not workSelect * from t where name like '% fish 'It will fail, butSelect * from t where name like '%'It doesn’t have to fail
  6. The index field is a string, but the string is invalidated without a ‘ ‘
  7. Index field usageorIt will fail. Recommended useunionModify the

String index

The characters are sorted according to the character rules of the string, which may be different for different countries

The problem

Interview questions

A brief introduction to the understanding of mysql indexes

Mysql index includes primary key index, unique index, normal index, union index and MyISAM engine’s own full text index, etc

InnoDB engine default index implementation data structure is B+ tree, in order to faster match the query data stored in the disk page, thus reducing the query time and disk IO times

To create an index, use create [index attribute] index name on table name

Index is a kind of system optimization with space for time

By default, a primary key index is generated after the table is created. If the primary key index is not specified, the system automatically obtains a unique index in the table and generates a primary key index. If the roWIDs are not specified, the system automatically virtualizes a column of ROwiDs as primary key indexes

Why innoDB uses B+ tree as index data structure

B+ tree data structure has the following characteristics to facilitate mysql search

  • Each node can have multiple values (joint index)
  • Leaf nodes contain all index data (result query)
  • Leaf nodes are pointed using bidirectional Pointers (for range lookup)

Index evolution is also based on requirements. For details, see the index evolution process

Let’s talk about the difference between clustered and non-clustered indexes

A cluster index is also called a primary key index, and its leaf nodes are the data of the entire table

The value of the leaf node of the non-clustered index is the key information of the clustered index. After using a non-clustered query, you need to perform a “back table” operation

What is the difference between a primary key index, a normal index, and a federated index

The answer can be distinguished by referring to the union index structure described above and the differences between primary and normal indexes described below

Why are primary keys recommended to use autoincrement instead of UUID

  • When inserting data, the primary key maintains a clustered index using a B+ tree implementation, which is much more efficient at adding data from the middle than from the end
  • Data stored in a page, if inserted from the middle of the data, resulting in subsequent data needs to “page feed”

Why does query efficiency plummet when a table is considered to have 25 million rows

  • According to the calculation of each row of data, a row of data is 1KB, plus the pointer directory, etc., basically we can find that the structure of B+ tree at the second layer is about 25 million rows of data
  • If there are more than 25 million rows, the index of the B+ tree needs to be increased to three levels

What role does an index play in a lookup?

  • Locating the page data that needs to be retrieved from disk requires a full table scan to retrieve data from disk multiple times
  • Fast location similar to binary search

The interview asks you when a federated index hits an index

  • Left-most prefix rule

  • Mysql also has its own validation optimizer, which means it will not index data if it determines that the cost of a walk index is greater than the cost of a full table scan

    A full table scan requires only one query, but if a non-clustered index requires many back-table operations, it will not be indexed

If no primary key index is created when the table is created

  1. Mysql checks for unique indexes and defines primary key indexes by default
  2. If there is no unique index, a hidden primary key index (rowId) is created

Common interview index hit questions

Table (a, b, c, d, e), the index (b, c, d), primary (a)

The query Whether to use indexes why
select * from t where b>1 no Because you need to perform multiple table operations
select a,b,c,d from t where b>1 is The query criteria can be obtained on non-clustered indexes without the need to return to the table
select b from t is Use an index, but an index scan.

Because indexes store small amounts of data, fewer pages are allocated,

Fewer I/O operations are performed on the database
select * from t order by b,c,d Can be No index (small amount of data): All tables are scanned and need to be sorted

When the amount of data is small, it is faster to sort directly in memory. If the amount of data is too high, the memory is insufficient

To constitute a sort operation requires some regular operations (such as comparing parts at a time).

Using indexes: Using index sorts without sorting requires a lot of back-table operations

What is an overwrite index

Note that this is not an index type, but an index operation

That is, the query result information can be directly obtained at the end of the index tree, and there is no need to go back to the table to obtain data. This query operation is called an overwrite index

Say index push down

Mysql5.6 added query optimization

In normal index query, if there are other query fields, the system uses the index field to match some data, and then uses the other fields for comparison. After the operation back to the table to obtain the entire table data return

For example,

Select * from t1 where age=10 select * from t1 where age=10 select * from t1 where age=10 select * from t1 where age=10 select * from t1 where age=10 select * from t1 where age=10 select * from t1 where age=10 Mysql > select * from mysql where age =10; mysql > select * from mysql where age =10; mysql > select * from mysql where age =10Copy the code

In other words,mysql has added the step of checking whether another parameter meets the condition after the value of the non-clustered index is obtained

A brief introduction to the use of Explain

This is a little bit big, I’ll add later, but you can get to know the author and Jack Sler goes into a lot of detail

Normal and unique indexes

Difference: The fields specified by a unique index are guaranteed to be unique

Performance analysis:

The query efficiency

After the index matches the first data, the common index will continue to search for the power saving data until it is not the corresponding parameter

Unique index, returns after the first hit, does not continue to query the data

However, innoDB queries by page data, so the cost of querying adjacent data is very small (not excluding the case that two adjacent data happens to be on different pages). So the difference is minimal

Insert and update efficiency

Unique index characteristics: Index fields are not allowed to duplicate

When data is in a memory page:

  • Plain index: Finds the location and inserts the data
  • Unique index: find the location, determine whether it is duplicate, insert data

When the data to be inserted is not in the memory page:

  • Normal index: Update records to change buffer(no IO operation designed)
  • Unique index: get data page from disk, determine whether it is duplicate, insert this value into disk (involving two IO operations)

Change buffer cannot be used for unique indexes. (Since it is necessary to check whether duplicate data is generated, the data on disk needs to be queried into the cache. This operation has already violated the original purpose of change buffer(more consumption), so change will not be used Buffer, but query out directly update)

String field creates index

Alter table User add index idx_index1(tel) alter table User add index idx_index1

This is often the case in business

Find a phone number to the first three characters is to move them roughly “134135136137, 138139147150151, 152157158159178182183184187188198170171, 165 “select * from User where tel like ‘134%’;

Select * from User where tel =’12345678912′;

Alter table User add index IDx_INDEx1 (tel(3)) alter table User add index IDx_index1 (tel(3)

However, if prefix indexes are used, you need to perform one more step than normal indexes in the query. After obtaining the index value, you need to check whether it is the whole result

Advantage of using a prefix index: you can reduce the memory footprint of the index. Instead of indexing the entire TEL, you can only use the first three characters of the index. Disadvantages are: when some query operations, more query operation behavior, increase the consumption of query

change buffer

When mysql updates data, if the updated data happens to be in memory, it will be updated directly. If the updated data is not in memory, it will be updated directly. The updated operations are first cached in the Change buffer.

The change buffer actually updates data in the following situations

  • The next time you access the data page, it will alsochange bufferIs executed to update the data being looked for
  • Automatically updates data to disk data pages periodically:mergeoperation

The function of the change buffer is to reduce disk I/O operations

Innodb_change_buffer_max_size Innodb_change_buffer_MAX_size specifies the percentage of buffer pool space that change buffer can use

Usage scenario: In a service scenario where many operations are written but few operations are read, disable the change Buffer if all update operations are followed by queries

So in a production environment,mysql has a memory hit ratio (the higher the hit ratio, the more efficient the operation used)

learning

Zhuge teacher said very dry (dry goods full): Zhuge teacher Bilibili

Index invalidation and optimization: zhang Baa

Give it a thumbs up if you think it’s good. Give it a thumbs up if you don’t