“This is the 21st day of my participation in the Gwen Challenge in November. See details: The Last Gwen Challenge in 2021”

preface

In the last article, I shared the initial optimization of JOIN and the two corresponding algorithms Index nested-loop JOIN (NLJ) and Block nested-loop JOIN (BNL). NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ: NLJ

This article will explain the following points:

  1. What is a return table
  2. What is the MRR algorithm
  3. What is the BKA algorithm
  4. How is BNL optimized
  5. NLJ algorithm optimization

What is a return table

Here we need to understand a knowledge point, what is the back table? Here is a general explanation of how MySQL queries data against secondary indexes:

Select a from t and select a from t

select * from t where a = 50

  1. Primary key ID=500; secondary key ID=500; secondary key ID=500;
  2. Since the result we need in our SQL also contains field B, we need to continue with step 3;
  3. Traverse the B+ tree of the cluster index on the right to get the final data.

Select id,a,b from t where a > 10 and a< 100; select id,a,b from t where a > 10 and a< 100 There will be a lot of random access in the process of returning to the table, which will greatly affect performance.

So how do you solve this situation? Well, here’s MRR algorithm.

MRR algorithm recognition

MRR algorithm, which is called multi-range-read, is a new feature introduced in MySQL 5.6 to reduce random disk access and use sequential disk reads as much as possible.

In order to ensure that the disk is read in sequence, performance can be improved. In order to ensure that the disk is read in sequence, performance can be improved.

For example, the range query a at range (10,100) in the above example can be broken down into the following steps:

  1. Query all ids of the interval (10,100) according to secondary index A and put them into memory read_rnd_buffer;
  2. In memory, incrementing the id sort;
  3. Query the data in the primary key index and return the result set, in order.

Read_rnd_buffer: Specifies the memory size defined by the read_rnd_buffer_size variable. The default value is 256K

The above is the execution process of MRR. The advantage of MRR is that it can sort a large number of primary key ids for range lookup statements to ensure that primary key indexes are read and written in sequence, thus improving performance.

How do I enable the MRR

MRR can be turned on by setting parameters

set optimizer_switch="mrr_cost_based=off"
Copy the code

By executing the plan, you can see in Extra that we have turned on MRR.

NLJ algorithm optimization

MySQL 5.6 introduces Batched Key Access (BKA), which is an optimization of the NLJ algorithm. The NLJ algorithm is already known in the previous chapter, so you can refer to the previous article for details.

  • NLJ algorithm

NLJ algorithm is efficient, but it obtains results through single value matching, so can we pass multiple values to t2 table at the same time to query, now we have learned the ABOVE MRR, do you have an idea to optimize NLJ through MRR?

In fact, BKA is indeed based on MRR algorithm. Observe the following figure. When querying, the data extracted from the driver table is put into join_buffer.

How do I enable the BKA

BAK can be turned on by setting parameters. The first two parameters are used to set MRR, because BAK depends on MRR

set optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';
Copy the code

BNL algorithm optimization

We learned about the BNL algorithm in the previous chapter, and we also know the disadvantages of the algorithm, which can be summarized as follows:

  1. The join process requires M*N comparison times (M and N are the number of rows of two tables), which is quite terrible for large tables.
  2. If the join statement is used to scan a cold table for several times and the execution time of this statement exceeds 1 second, the data page of the cold table will be moved to the head of the LRU linked list when the cold table is scanned again, thus causing the hot data of the Buffer Pool to be eliminated and affecting the memory hit ratio. This content will be explained in detail in the following chapters. Here is a brief understanding.

For the above problem, the simplest method is to create a new index on the driven table, but this method is not suitable for all cases, for example, in our example, the driven table has tens of thousands of data, and the SQL query is low frequency SQL, it is very wasteful to directly add the index.

Another way we can add a temporary table is as follows:

  1. Create temporary table temp;
  2. Insert the data that meets the criteria into the new table;
  3. Alter table temp add index temp;
  4. Use the driver table and the temporary table temp for join operations

Overall, the goal is to be able to use indexes to trigger the BAK algorithm to improve performance.

conclusion

Through this article, the following points can be summarized:

  1. Use BKA algorithm as much as possible;
  2. BNL algorithm is the least efficient and can be converted to BKA algorithm by adding driven table indexes.
  3. Based on actual requirements, temporary table schemes can be considered.