This article has participated in the “Digitalstar Project” and won a creative gift package to challenge the creative incentive money.

We often use joins to join multiple tables when querying multiple tables. In fact, the efficiency ratio of join is not good, so we should avoid using join as much as possible. The essence of join is that each table matches in a Loop. In fact, it is to improve the efficiency of join execution.

1. Simple nested-loop Join

Simple nested-loop Join (NLJ) algorithm reads one row at a time from the first table in the Loop, passing each row to a Nested Loop that matches data consistently. Select * from User u left join User_info info on U.id = info.user_id from UserInfo The pseudo-code logic should be

for(User u:Users){
    for(UserInfo info:UserInfos){
        if(u.id == info.userId){
            // Get the matching data}}}Copy the code

A simple and crude algorithm that retrieves one data item at a time from the User table, scans all records in User_info for matches, and returns the combined data.

If the driver table User has 10 entries, and the driven table UserInfo also has 10 entries, then the driver table User will be scanned 10 times, and the driven table will be scanned 10*10=100 times. Especially the driven table. Each scan actually reads data from the hard disk and loads it into the memory, namely, one IO. Currently, IO is the biggest bottleneck

2. Index nested-loop Join

Index nesting loops use indexes to reduce the number of scans to improve efficiency, so they require indexes on non-driven tables.

In the query, the driver table (User) will query according to the index of the associated field. When the corresponding value is found in the index, the query will be back to the table. If the associated field of User_info (user_id) is the primary key, the query is very efficient (the leaf node of the primary key index structure contains the full row data (InnoDB)). Each time an index is matched, a table-back query (based on the primary key ID of the secondary index (non-primary key index)) is performed. The performance of the query is definitely lower than that of the primary key query.

Mysql > select * from B+ tree; mysql > select * from B+ tree; mysql > select * from B+ tree

Nested-loop Join(caches)

If the join column does not have an index, the driven table will be scanned for too many times. Each time the drive table is accessed, the records in the table will be loaded into memory, and then one entry from the drive table will be matched with it. After the matching, the memory will be cleared. Then load a record from the driver table and then load the records of the driven table into memory matching, so that the cycle, greatly increasing the number of IO. In order to reduce the I/O times of the driven table, Block nested-loop Join is used.

Instead of obtaining the data of the driver table one by one, the join buffer buffer is introduced to cache some data columns related to the driver table join (size is the limit of the join buffer) into the Join buffer, and then scan the driven table in the whole table. Each record of the driven table is matched (in-memory operation) with all the driven table records in the Join buffer at once, combining multiple comparisons in a simple nested loop to reduce the frequency of access to the non-driven table.

By default, join_BUFFer_size = 256K. When querying, the join buffer will cache all columns, not just join columns. In an SQL with N join associations, n-1 join buffers are allocated. Therefore, when querying, try to reduce unnecessary fields, so that the join buffer can store more columns.

Show variables like ‘%join_buffer%’ This value can be changed as required.

If Block nested_join algorithm is used, the block_nested_loop of optimizer management configuration of optimizer_switch is on, which is enabled by default. To check the block_nested_loop status, run show variables like ‘%optimizer_switch%’.

The above three algorithms can be understood. In fact, in practical work, as long as we can use indexes well, we should pay attention to whether the associated fields are indexed, or we should be good at using indexes to provide query efficiency.

If you have any questions, please communicate and learn from each other.