In this article we look at the basic structure of InnoDB in MySQL from the bottom up: from rows to pages, and then to indexes. By the end you should have a working understanding of indexes, which helps a great deal when writing SQL and optimizing slow queries.

1. Rows

MySQL has several row storage formats. Since MySQL 5.7 the default is DYNAMIC, which is basically the same as the COMPACT row format, so let's first look at the structure of the COMPACT row format. The actual data recorded is the table data we usually see through a SELECT statement, but each row in the table also carries additional storage information underneath, such as:

  • Variable-length field list: the actual number of bytes used by each variable-length field, stored in reverse column order. Because some columns may be of a type such as VARCHAR, this list tells the engine how many bytes to read for a given column.
  • NULL value list: records, for each column that allows NULL, whether the value in this row is NULL. Because NULL is tracked separately from ordinary values, compare against it with "WHERE column IS NULL" or "WHERE column IS NOT NULL" instead of "WHERE column = NULL".
  • Record header: describes the record itself and has a fixed size of 5 bytes.
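To make the two lists more concrete, here is a minimal Python sketch of how they could be derived for one row. It is a simplified model, not the real on-disk encoding: the function name build_row_prefix and the column tuples are hypothetical, lengths are kept as plain integers, and the NULL list is assumed to use one bit per nullable column, also in reverse column order.

```python
def build_row_prefix(columns, row):
    """Return (variable_length_list, null_list) for one row.

    columns: list of (name, is_variable_length, is_nullable) in table order.
    row:     dict mapping column name to a str value or None.
    """
    var_lengths = []
    null_bits = []
    for name, is_variable_length, is_nullable in columns:
        value = row[name]
        if is_nullable:
            # 1 means "this value is NULL" (assumption of this sketch)
            null_bits.append(1 if value is None else 0)
        if is_variable_length and value is not None:
            # NULL values take no space, so they get no length entry
            var_lengths.append(len(value.encode("utf-8")))
    # Both lists are kept in reverse column order in this model
    return list(reversed(var_lengths)), list(reversed(null_bits))


columns = [
    ("id", False, False),
    ("name", True, True),
    ("city", True, True),
    ("note", True, True),
]
row = {"id": "1", "name": "abc", "city": None, "note": "hello"}
var_lengths, null_bits = build_row_prefix(columns, row)
print(var_lengths, null_bits)
```

Note that the NULL city contributes a bit to the NULL list but no entry to the length list.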

The focus here is on the record header, as shown in the figure. delete_mask records whether the row has been deleted. When data is deleted, MySQL does not immediately reclaim the storage space occupied by the deleted row and its index entries. Instead, it sets delete_mask to 1 to mark the row as deleted, leaves the space in place, and waits for new data to fill the gap. The downside is that if no new data arrives to fill the gap, the space is wasted. For tables that are written to frequently, you can reclaim this space on a regular basis; as with memory fragmentation in an operating system, a large amount of fragmentation can affect performance.

mysql> OPTIMIZE TABLE ad_visit_history;
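The mark-then-reuse behaviour can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions: records live in a plain list, any new record may reuse any gap, and the helper names are hypothetical. The real engine has stricter rules about when a gap can be reused.

```python
def delete(records, key):
    """Mark a record as deleted instead of removing it."""
    for rec in records:
        if rec["key"] == key and rec["delete_mask"] == 0:
            rec["delete_mask"] = 1
            return True
    return False


def insert(records, key):
    """Reuse a deleted gap if one exists, otherwise append."""
    for rec in records:
        if rec["delete_mask"] == 1:
            rec["key"] = key
            rec["delete_mask"] = 0
            return "reused"
    records.append({"key": key, "delete_mask": 0})
    return "appended"


def wasted_slots(records):
    """Count gaps that are still waiting for new data."""
    return sum(rec["delete_mask"] for rec in records)


records = [{"key": k, "delete_mask": 0} for k in range(1, 6)]
delete(records, 2)
delete(records, 4)
print(wasted_slots(records))   # two gaps, still occupying space
print(insert(records, 9))      # the new record fills one gap
print(wasted_slots(records))   # one gap remains until more data arrives
```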

2. Pages

Rows are simple enough to understand; next we get to know pages, working from shallow to deep. Suppose you are given 100 sorted records and asked to find the one whose id is 12. If the records are stored in an array, we would usually either iterate through the array or use binary search. Can a data structure improve the lookup speed?

1. Suppose there are 100 records, divided into groups of ten.
2. Each group is given a slot, and each slot records the address of the largest record in its group.

At this point, we can quickly find a record by binary search over the slots:

Say there are 10 groups and I want to find the record whose id is 12. First compute the middle slot, (1+10)/2=5 (rounding down), and follow its address to the fifth group. The largest id in the fifth group is 50, and 12 is less than 50, so continue halving: (1+5)/2=3, where the largest id is 30; then (1+3)/2=2, where the largest id is 20; then (1+2)/2=1, where the largest id is 10. Since 12 is greater than 10 and no greater than 20, the record must be in the second group.

Summary: 1. First, find the slot of the group that holds the data by binary search. 2. Then traverse that group's singly linked list to reach the record.
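The two steps in the summary can be sketched in Python. This is a minimal model, not InnoDB's actual page directory: the sorted list `records` stands in for the linked list of rows, `slots` holds the position of the largest record in each group, and the function name `find` is made up for illustration.

```python
GROUP_SIZE = 10
records = list(range(1, 101))  # ids 1..100, already sorted
# slots[i] is the position of the largest record in group i
slots = [start + GROUP_SIZE - 1 for start in range(0, len(records), GROUP_SIZE)]


def find(records, slots, target):
    """Return (group_index, position) of target, or None if absent."""
    # Step 1: binary search for the first slot whose largest id >= target
    lo, hi = 0, len(slots) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if records[slots[mid]] < target:
            lo = mid + 1
        else:
            hi = mid
    # Step 2: walk the group, starting right after the previous group's
    # largest record (the "next" pointer in its record header)
    start = slots[lo - 1] + 1 if lo > 0 else 0
    for pos in range(start, slots[lo] + 1):
        if records[pos] == target:
            return lo, pos
    return None


print(find(records, slots, 12))  # group index 1 is the second group
```

Group indexes here are zero-based, so the second group from the walkthrough appears as index 1.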

It may look as though the first group and the second group are separate. In fact, the records are linked: from the last record of the first group you can continue along the singly linked list, because the header of the record with id 10 holds an address that points to the record with id 11. The description above is simplified for ease of understanding; the real storage structure is as follows:

3. Data is divided at a boundary of one thousand records. We call the structure made up of one thousand records a page, and the pages are connected to each other by a doubly linked list.
4. With many pages, we also need a special page that records what each page stores. This special page is a directory, so we call it a directory page.

For example, suppose I am looking for the record with id 500. We search the directory page to learn that the record is on page 1, then use binary search on that page's slots to find the right slot, and then traverse the group that the slot points to.

5. As the data volume keeps growing, directory pages can be added just as data pages are. If there are too many directory pages, finding the right directory page becomes slow, so the directory pages in turn get a directory page above them.

The figure above is a tree structure, and I am sure you can guess that it is a B+ tree, which is the index tree structure in MySQL. Assume that each data page stores 100 user records and each index page stores 1000 entries (primary key id and page number):

  • 1 layer: 100 records
  • 2 layers: 1000 * 100 = 100,000 records
  • 3 layers: 1000 * 1000 * 100 = 100,000,000 records
  • 4 layers: 1000 * 1000 * 1000 * 100 = 100 billion records
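The capacity arithmetic can be written as a small Python function. It only restates the assumption made in the text (100 rows per data page, 1000 entries per index page); the function name max_rows is hypothetical, and real page capacities depend on row size.

```python
def max_rows(levels, rows_per_data_page=100, entries_per_index_page=1000):
    """Maximum rows a B+ tree with the given number of levels can hold.

    The bottom level consists of data pages; every level above it
    consists of index (directory) pages.
    """
    if levels < 1:
        raise ValueError("a tree needs at least one level")
    return rows_per_data_page * entries_per_index_page ** (levels - 1)


for levels in range(1, 5):
    print(levels, max_rows(levels))
```

Each extra level multiplies the capacity by the fan-out of an index page, which is why a lookup in even a very large table only has to visit a handful of pages.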

Of course, there is a practical reason to understand the structure of the page. When we create a table, we usually make its primary key auto-increment. Have you ever thought about why?

From the index tree in the earlier figure, we know that rows are stored in order of the id index. If the id is not auto-incrementing, an extreme case can arise: when we insert a row, the index determines which page it belongs to, and that page turns out to be full. The page then has to be split, moving its largest records to the next page, before the new row can be inserted. If every following page is also full, the insert sets off a cascade that requires moving data on page after page, resulting in extremely poor write performance.
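The cost difference can be seen in a toy simulation in Python. This is a sketch under stated assumptions rather than InnoDB's real split algorithm: pages are small sorted lists with a hypothetical capacity of 4, a full page is split in half, and appending a key larger than everything stored so far simply starts a new page. The function counts how many existing records had to be moved.

```python
import bisect


def insert_all(keys, capacity=4):
    """Insert keys one by one; return (pages, number_of_records_moved)."""
    pages = [[]]
    moved = 0
    for key in keys:
        # Target page: the first page whose largest key is >= key,
        # otherwise the last page
        idx = len(pages) - 1
        for i, page in enumerate(pages):
            if page and key <= page[-1]:
                idx = i
                break
        page = pages[idx]
        if len(page) < capacity:
            bisect.insort(page, key)
            continue
        if idx == len(pages) - 1 and key > page[-1]:
            # Ever-increasing key: start a fresh page, nothing moves
            pages.append([key])
            continue
        # Page is full and the key belongs in the middle: split it
        half = capacity // 2
        right = page[half:]
        del page[half:]
        moved += len(right)
        pages.insert(idx + 1, right)
        bisect.insort(page if key < right[0] else right, key)
    return pages, moved


_, moved_increasing = insert_all(range(1, 21))
_, moved_decreasing = insert_all(range(20, 0, -1))
print(moved_increasing, moved_decreasing)
```

With increasing keys no record is ever moved, while keys arriving in any other order force splits that relocate existing records.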

3. Indexes

If we want to see how efficiently an SQL query is executed, we can add EXPLAIN before the SQL, for example: EXPLAIN SELECT * FROM table_1; The type column of the result shows the access method that was chosen, for example:

  • const: equality comparison on a primary key or unique index column
  • ref: equality comparison between an ordinary secondary index column and a constant
  • range: range query on an index column
  • index: scan of all index pages

const

const means constant level: the cost is negligible. If a primary key or unique secondary index is composed of multiple columns, every column in the index must be compared for equality for the access method to be const, because a unique record can be located only when all columns in the index are matched. Such an index is created with PRIMARY KEY or UNIQUE KEY.

ref

An access method that searches by equality comparison between a secondary index column and a constant is called ref. As shown in the figure, for an ordinary secondary index, an equality comparison on the index column may match multiple consecutive records, unlike a primary key or unique secondary index, which can match at most one record. Even so, when the number of matched records is small, the efficiency is still high. Such an index is created as an ordinary (non-unique) KEY.

range

An access method that uses an index to match a range of values is called range; with a secondary index, the matching rows are then fetched by going back to the table. Note: the index used for range matching can be either the clustered index or a secondary index.

Conditions using IN, >, <, >=, <=, and BETWEEN ... AND can be served by range access.

Note: as a rule of thumb, the index is used when the number of matching rows is less than about one-fifth of the total number of rows; when it exceeds about one-fifth, a full table scan is used instead.
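Both ref and range rely on the same property: entries in a secondary index are sorted, so all matches sit next to each other. The Python sketch below models a secondary index as a sorted list of (key, primary key) pairs; the data and the function name range_scan are made up for illustration.

```python
import bisect

# (indexed column value, primary key), kept sorted like index entries
index = sorted([(30, 1), (10, 2), (20, 3), (20, 4), (40, 5), (20, 6)])
keys = [key for key, _ in index]


def range_scan(low, high):
    """Return primary keys of entries with low <= key <= high."""
    start = bisect.bisect_left(keys, low)    # first entry >= low
    end = bisect.bisect_right(keys, high)    # one past the last entry <= high
    return [pk for _, pk in index[start:end]]


print(range_scan(20, 20))  # equality, like ref: several consecutive matches
print(range_scan(15, 30))  # a range, like range
```

Each primary key returned would then cost one lookup back in the table, which is why the number of matching rows matters so much.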

index

Find the desired index columns directly in the secondary index B+ tree. For example, given the index

KEY index_part1_part2_part3 (key_part1, key_part2, key_part3) USING BTREE

and the query

SELECT key_part1, key_part2, key_part3 FROM single_table WHERE key_part2 = 'abc';

key_part1, key_part2, and key_part3 are the columns of the joint index index_part1_part2_part3. Although key_part2 is not the leftmost column, so the B+ tree cannot be searched by it directly, it is stored in the secondary index B+ tree, and all three columns in the SELECT list are in that tree as well. The query can therefore scan the index pages without going back to the table, which is faster than scanning the full table.

Next, let's focus on joint indexes.

KEY joint_index (name, birthday, results) USING BTREE

Next, let's look at a set of SQL statements. The second SQL cannot use the index. As the index tree shows, if we look only at the birthday column across the whole index, the birthdays are not ordered; only birthdays under the same name are ordered. But the third SQL can still be queried using the index. Why? This is where a term called ICP (index condition pushdown) comes in.
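The ordering claim can be checked with a small Python model in which a joint index is a list of tuples sorted column by column. The rows and function name are invented for illustration; the column order (name, birthday, results) follows the index definition above.

```python
import bisect

# Table rows: (primary key, name, birthday, results)
rows = [
    (1, "Bob", "1992-03-01", 70),
    (2, "Alice", "1995-07-12", 88),
    (3, "Carol", "1990-01-20", 91),
    (4, "Alice", "1991-05-30", 65),
    (5, "Bob", "1989-11-11", 80),
]
# Joint index entries: sorted by name, then birthday, then results
index = sorted((name, birthday, results, pk) for pk, name, birthday, results in rows)


def search_by_name(index, name):
    """Leftmost column: binary search to the first match, then read on."""
    start = bisect.bisect_left(index, (name,))
    matches = []
    for entry in index[start:]:
        if entry[0] != name:
            break
        matches.append(entry)
    return matches


birthdays = [entry[1] for entry in index]
print(birthdays)                      # not sorted across the whole index
print(search_by_name(index, "Bob"))   # sorted by birthday within one name
```

Because birthdays are only sorted within a single name, a condition on birthday alone cannot be answered by binary search and has to examine every entry.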

4. Index optimization

1. ICP

ICP is one of MySQL's low-level optimizations. Take the SQL query in the figure below as an example. If the joint index were not fully used, the query would be executed as in scheme one; MySQL instead takes full advantage of the joint index to improve query efficiency, which is scheme two.
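Since the figure is not reproduced here, the following Python sketch uses a made-up query to show the general idea: find rows whose name starts with a prefix and whose results exceed a threshold, using an index on (name, birthday, results). The data, function names, and the choice of conditions are assumptions for illustration, and the loop scans the whole index rather than only the matching range to keep the code short. The sketch counts how many rows must be fetched from the table in each case.

```python
# Index entries: (name, birthday, results, primary key), sorted
index = sorted([
    ("Adam", "1993-02-02", 95, 6),
    ("Alice", "1991-05-30", 65, 4),
    ("Alice", "1995-07-12", 88, 2),
    ("Bob", "1989-11-11", 80, 5),
    ("Carol", "1990-01-20", 91, 3),
])


def query_without_icp(index, prefix, min_results):
    """Only the name condition is checked in the index; the results
    condition is checked after fetching each full row from the table."""
    fetched, matches = 0, []
    for name, _birthday, results, pk in index:
        if name.startswith(prefix):
            fetched += 1                  # back-to-table lookup
            if results > min_results:
                matches.append(pk)
    return matches, fetched


def query_with_icp(index, prefix, min_results):
    """Both conditions are checked on the index entry first, so only
    rows that already qualify are fetched from the table."""
    fetched, matches = 0, []
    for name, _birthday, results, pk in index:
        if name.startswith(prefix) and results > min_results:
            fetched += 1                  # back-to-table lookup
            matches.append(pk)
    return matches, fetched


print(query_without_icp(index, "A", 80))
print(query_with_icp(index, "A", 80))
```

Both versions return the same rows; the difference is only in how many table lookups were needed to get there.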

2. MRR

If we need to query rows for a set of ids, we usually use the condition id IN (...) in SQL. However, if there are many ids and they are unordered, then without optimization MySQL would search the tree structure repeatedly in random order, which may be less efficient than a full table scan. For this case MySQL has the MRR (multi-range read) optimization.
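A rough way to see why ordering helps is to count how often consecutive lookups land on a different page. The Python sketch below assumes a hypothetical layout in which each page holds 100 consecutive ids; it models the effect of reading in sorted order, which is the core idea behind MRR, not MySQL's actual implementation.

```python
def page_switches(ids, rows_per_page=100):
    """Count how many times consecutive lookups move to a different page."""
    switches = 0
    current_page = None
    for row_id in ids:
        page = row_id // rows_per_page
        if page != current_page:
            switches += 1
            current_page = page
    return switches


ids = [950, 12, 430, 15, 980, 401, 47, 905]
print(page_switches(ids))          # unordered: jumps between pages every time
print(page_switches(sorted(ids)))  # sorted: each page is visited once
```

Sorting the ids first turns scattered reads into a pass that visits each page a single time.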