preface

Recently, I have been working on Mysql. My understanding of Mysql is limited to transaction and isolation levels. I don’t know more about it, so I am learning and making up for it. The materials I learned were mainly “Ribs to understand Mysql” this book and Dinqi teacher “Mysql45 talk”, from “ribs to understand Mysql” this book to see InnoDB data page structure felt difficult, so later mainly read “Mysql45 talk”, read with gusto. So here is a summary of my own.

wedge

Teacher Dinky’s first sentence is: learning technology should not be directly into the details, should first have a bird’s eye view of the whole picture, which can help to understand the problem from a higher dimension. I agree with that. Therefore, I think learning Mysql should also have a general framework, know which part belongs to which part, and have the knowledge vein in my mind. I think there are four parts of Mysql: underlying storage, deployment architecture, index type and data backup principle. Interview questions can be asked from these four parts. This article is also about index types.

What are the common index structures

An index is a data structure designed to improve the efficiency of data queries

The hash

A hash table is a structure that stores data in key-value format

The subscripts for where each value is stored are randomly calculated using a hash, and one of the problems you can expect with a hash is hash collisions. Because there could be multiple different values hashed to the same location. One way to solve hash conflict is the chained address method. When multiple key values are hashed to the same position through the conversion of hash function, a linked list is maintained in this position, and the data hashed out in this position are put in this linked list. It’s important to note that the elements in the list are not ordered, and because they’re not ordered, it’s fast to insert elements, but it’s slow to find them, to scan them all. So: Hash tables are only good for equivalence queries, but not for range queries

Orderly array

Ordered arrays perform well in both equivalent query and range query scenarios, because they are ordered, dichotomy can be used naturally, and the dichotomy time complexity is O(log(N)). While the query is efficient, updating the data is cumbersome, as is the case with ordered arrays, where to insert an array in the middle you have to move all the subsequent arrays. So ordered array indexes are only good for static storage engines, which store historical data that doesn’t change.

Search tree

We all know that the sequence traversal time in binary search tree is an ordered array, so the time complexity of query can reach O(log(N)). To maintain order log(N) query complexity, you need to keep this tree a balanced binary tree. To do that, the update time is order log(N), so it’s a balanced binary search tree. But this is only in memory, for falling disk persistent data, reading a block of data is very slow. In order for a query to read as little disk as possible, the query procedure must access as few blocks of data as possible, that is, the tree stores as much data as possible, so use an N-tree,

Index used by InnoDB

In the InnoDB:

(1) The leaf node of the primary key index stores the whole row of data. In InnoDB, primary key indexes are also called clustered indexes. (2) The contents of leaf nodes that are not primary key indexes are primary key values. In InnoDB, non-primary key indexes are also called secondary indexes and are also non-clustered indexes.

InnoDB creates a Rowid primary key by default. The key of a tree node is the primary key of a row, and the value is the other data of that row. Adding an index is equivalent to adding a B+ tree, so it means that there is at least one B+ tree in a table, and that tree is the primary B+ tree. The clustered index uses the primary key as the key, the leaf node stores the row record, the non-clustered index uses the defined index field as the key, and the leaf node stores the primary key value

So what’s the difference between a primary key index and a non-primary key index? As we know, the primary key index that B + tree is stored in a whole line of record, if we want to be part of one row, and that by clustering index of B + tree suddenly found, because the row is stored in the main B + tree, and by the clustering index of B + tree is need to find the corresponding primary key value, then return to the main B + tree to find the records, This process is called table-back. That is, queries based on non-primary key indexes need to scan one more index tree, which gives us an idea for SQL optimization: primary key queries should be used as much as possible.

Should I use the business logic field primary key?

As we know, the leaf node of the secondary index will store the value of the primary key. If a business logic field is very long and is used as the primary key, the leaf node of the secondary index will occupy many bytes and occupy large space. Therefore, it is not suitable to use the business logic field as the primary key. Therefore, the smaller the primary key length, the smaller the leaf node of the common index, and the smaller the space occupied by the common index

Some business scene demand is for only one index, and the index must be a unique index, aiming at this situation, the list will be only a B + tree, there is no other B + tree generated by the secondary indexes, also won’t have to consider other indexes of the size of the leaf nodes, so the business logic can be set as the primary key field, meet the use principle of primary key of the query.

SQL optimization means

Knowing the indexes, it’s natural to think about how to optimize SQL. In fact, in the end, SQL optimization is to try to avoid back to the table, reduce the number of back to the table. One way is to overwrite the index, the other is the left-most prefix rule.

Cover index

It is important to understand that an overwrite index is not the name of any index, but a principle to be followed by SQL optimization, a means of SQL performance optimization.

I’m going to try to explain what an overwrite index is as colloquially as I can understand itOverwrite index is to try to make SQL in the query, in the secondary index of the B+ tree can find the value you want, and do not have to go to the primary B+ tree table query.Guided by this principle, we can create a federated index to cover the fields we want to query.Many syndication indexes are created to support overwriting indexes, which can greatly improve efficiency for specific businesses. Do not use select * when writing SQL, because it is sure to return the table, not do overwrite index

Left-most prefix rule

When creating a federated index, follow the left-most prefix rule. When you create a combined index (a, B,c), the index entries are sorted in the order in which they appear in the index definition, so when you write SQL, you should also write in the order in which the fields are defined, in order to satisfy the left-most prefix and use the index to speed up the retrieval. For example, where a=1 and b=1 and c=1, not where a=1 and C =1 and b=1. Another note: the leftmost prefix can be the leftmost N field of a joint index, or the leftmost M characters of a string index. How do you arrange the fields in order to create a federated index? Index reuse first, then other factors such as space footprint. If your index order is sufficient and you do not need to maintain another index, that is good. If this factor does not affect the maintenance of another index in addition to the union index, consider choosing the index with a small footprint.

An index pushdown

Index push-down is an optimization introduced after Mysql5.6 to reduce the number of times a table is called back. In the index traversal process, you can judge the fields contained in the index first and filter out the records that do not meet the conditions to reduce the number of entries back to the table. How do you understand that? There is this statement:

select * from table where name like Chen '%' and age = 24 and height =170
Copy the code

Mysql > select * from table where id = ID1; mysql > select * from table where id = ID1; mysql > select * from table where id = ID1; According to ID2 to the primary key index tree back table query ID2 corresponding to the row records, ID3 is the same, a table back to the table three times. Age,(ID1,age=31),(ID2,age=21),(ID3,age=24); select * from MySQL5.6 where (ID1,age=31); Age =31 and age=21 are not satisfied, so I’m not going to go back to the table.

The resources

  • MySQL45 speak
  • Provide a good article: damn, why does the leftmost prefix principle fail?