Reading Instructions:

Every technology or feature is created to solve a problem. Knowing why is more important than knowing how. So I always start with a question, and I encourage you to think about whether you know the answers to these questions, and then compare the answers I’ve given, and if you don’t agree, think about who has the wrong answer, and prove it. This ability to improve myself by solving one problem after another is the value THAT I hope to bring to you from one article to another.

I will focus on one or two points per article and try not to spread out too much. Of course, technical articles cannot avoid a knowledge point implicated in another knowledge point. When you don’t know something, I suggest you know that there is such a thing and concentrate on the current problem rather than spreading it too far. Of course, I will fill the pit according to the feedback.

The problem

  1. What is Index nested-loop Join?
  2. What is Block nested-loop Join?
  3. What does join buffer do?
  4. Should join be used at work?


The preparatory work

CREATE TABLE `school` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `student` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `school_id` int(11) DEFAULT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_school` (`school_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

delimiter;;
CREATE PROCEDURE idata ( ) BEGIN
    DECLARE
        i INT;
    
    SET i = 1;
    WHILE
            ( i <= 1000 ) DO
            INSERT INTO school
        VALUES
            ( i, 'school' );
        SET i = i + 1;
        
    END WHILE;
    
END;;

delimiter;
CALL idata ( );


drop procedure idata;
delimiter;;
CREATE PROCEDURE idata ( ) BEGIN
    DECLARE
        i INT;
    
    SET i = 1;
    WHILE
            ( i <= 1000 ) DO
            INSERT INTO student
        VALUES
            ( i, i%2+1, 'stu');
        SET i = i + 1;
        
    END WHILE;
    
END;;

delimiter;
CALL idata ( );Copy the code

There are now two tables, a Shool table and a student table. Insert 1000 pieces of data each

Index Nexted-Loop Join

EXPLAIN select * from student STRAIGHT_JOIN school on school.id = student.school_idCopy the code

STRAIGHT_JOIN was used to prevent the Mysql optimizer from automatically optimizing. The optimization logic is not explained here, but will be discussed later.

Execution Result:

Student = driven; school = driven;

Execution process:

  1. First, I did a full table scan on student, where I scanned 1000 rows.
  2. Select ‘school’ from ‘school’ where ‘school_ID’ = ‘school_id’;Find matching data, into the result set. Because the school ID is the primary key, using the primary key index can quickly locate that row.
  3. At the end of the loop, output the result set to the client.

What is the difference between Left join and Right join? The difference is that in the second step above, the inner join needs to find the matching data to be placed in the result set, while the outer join needs to add the row that drives the table to the result set regardless of whether the data is found.

Index nested-loop Join is a method that loops through the driven table and uses indexes in the query process, regardless of whether it is an inner Join or an outer Join. The nested-loop is a nested-loop.



Block Nested-Loop Join

What if the search field on the driven table has no index?

EXPLAIN select * from student STRAIGHT_JOIN  school on school.name = student.nameCopy the code

(Although the actual scenario does not use the school name field to associate the name of a student, this is only the case if the school name field without index is used as the association condition)



You can see that the school cannot use the index, so it must perform a full table scan.

So we guess the execution process is:

  1. Scan the student table and fetch 1000 rows.
  2. Cycle through the 1000 rows of data taken out, and search each row in the school table. Since there is no index in the name field, it is necessary to scan the whole table of school to find the corresponding data. Find the data and put it in the result set.
  3. Output the result set.

During this process we can see that the number of rows to scan is 1000 * 1000. If the number of schools or students is larger, 10,000 schools, millions of students, then the efficiency of this query can be expected to be very low.

So Mysql optimizes this type of join where the driven table cannot use the index, not the process we guessed above.


join buffer

First, mysql introduced the concept of join buffer. The size of the join buffer can be configured using the startup parameter or the system variable join_BUFFer_SIZE. The default size is 256K and the minimum size can be set to 128 bytes.

So the real flow is:

  1. scanningThe driver tableStudent, insert 1000 rows into the join buffer.
  2. Scan the driven table school, extract one row and compare it with the data in JOIN_buffer. If join conditions are met, add it to the result set. After comparing a row, remove the next row until the full table scan is complete.

Therefore, the above situation is that the data in school is extracted line by line and compared with the data in join_buffer to find the result that meets the conditions. So the number of rows to scan is 1000 of school, and each row is compared with 1000 of join_buffer.

The number of rows that need to be scanned in this case is 1000 rows of student, which is used to put into join_buffer, and 1000 rows of school. That’s 2000 rows scanned. The number of judgments in a single memory is 1000 * 1000;

It can be seen that by introducing join buffer, disk scan becomes memory judgment, thus improving efficiency


What if the driver table is too large to fit into the join buffer? For example, if student has 100 million rows, the join buffer can only hold 1000 rows at a time.

The process then becomes:

1. Scan student until the join buffer is full.

2. Scan the school and take out a row and compare it with the data in JOIN_buffer. If the join conditions are met, add it to the result set.

  1. After school scanning is complete, clear the Join buffer.
  2. Continue scanning the rest of student, repeating step 2,3. Until all students have been scanned.

Block nested-loop join is the way to read Block by Block into the join buffer and compare it.

Optimize the way

It can be seen that if Block nested-loop join is used during join, its execution efficiency is related to the size of join buffer. The larger the join buffer is, the smaller the number of blocks is, and the higher the efficiency is.

Of course, making full use of the join Buffer space is another way. The data put into the join buffer space are the query and search fields that drive the table. Therefore, avoiding the use of * and clearly listing the fields that need to be queried also helps to utilize the join buffer space.

Of course, the optimal optimization method is to add an Index to change Block nested-loop Join to Index nested-loop Join.

Indexes are not added randomly. If the field is rarely searched in a business scenario, adding an index for a particular join is not worth the loss, and it needs to be considered for the specific business.


conclusion

The query modes can be divided into Index nested-loop join and Block nested-loop join according to whether the driven table can use indexes during the join process

  • Index Nested-Loop Join High efficiency, can be used in business.
  • Block Nested-Loop JoinAlthough mysql introduces join buffer for optimization, the efficiency is still low and should be avoided as far as possible.

ps: In my opinion, even though Index nested-loop join is used in the join process of multiple tables in actual business scenarios, each join is a process of Cartesian product. If the data amount of one or more tables increases with the development of business, Will lead to the whole SQL query efficiency is not very ideal. Therefore, avoid multi-table JOIN scenarios as much as possible. A maximum of 2 to 3 tables can be joined.


Please pay attention to my personal public number