In practice, most people will use joins for table joins, but some find that joins seem to be inefficient and take different execution times depending on the table driver. So how exactly does a join work?

I have two tables here, T1, T2

explain select * from t1 left join t2 on t1.a=t2.a;
Copy the code

T1 is the driver table, t2 is the driven table, and t1 is the driver table

Select * from left join where index A is used. Select * from left join where index A is used. Select * from left join where index A is used.

  1. Read a row from table T1
  2. From the data row, select field A into table T2
  3. Select the rows in table T2 that meet the conditions and form a row with the data in table T1 as part of the result set
  4. Repeat the above steps until t1 runs out of data

This process is to first traverse table T1, and then query the records that meet the conditions in table T2 according to the value of A in each row of data retrieved from table T1. Here we call “Index nested-loop Join”, or NLJ for short.

As mentioned above, if we choose the driver table, we should choose the small table to do the driver table. Otherwise, the large table does the driver table is to query all, the efficiency will be much lower. Of course, if the index of the driven table can be used

Here we change the SQL statement:

explain select * from t1 left join t2 on t1.a=t2.b;
Copy the code

The B field of the T2 table is unindexed

The result is that both tables are scanned Using where; Using join buffer (Block Nested Loop)

MySQL MySQL has implemented an optimization for full table scan without join.

BNL process:

  1. Read table T1 into join_buffer. Here we are putting table T1 into join_buffer.
  2. Scan table T2, extract each row from table T2, compare it with join_buffer, and return the rows that meet join conditions as part of the result set.

Here, both tables are doing a full table scan, so the execution cost is the same regardless of which table is doing the drive.

If the data in a table is too large to hold all the data, it is segmented. You can also change join_buffer_SIZE.

Batched Key Access (BKA) has been added to MySQL5.6 for indexed driven tables

For a multi-table JOIN statement, when MySQL accesses the first join table using an index, it uses JOIN_buffer to collect the values of the associated columns generated by the first operation object. After the KEY is constructed by the BKA, it is submitted to the engine for query through the MRR interface.

– step:

  1. Put the columns associated with the driver table into JOIN_buffer.
  2. Batch send keys (index Key values) to the MRR interface.
  3. The MRR sorts the received key according to its ROWID and then reads the data.

BKA is the same as BNL, but the BKA is used to sort the table by the index field when the table is driven by an index. The BKA is used to sort the table by the index field when the index is not a primary key index.

MRR

MySQL5.6 began supporting multi-range Read (MRR) optimizations. MRR is designed to reduce random disk access and convert random access to sequential data access. MRR is applicable to range, REF, and EQ_REF queries

MRR optimization has the following benefits:

  1. MRR makes data access more sequential. When the secondary index is queried, it is sorted by primary key and bookmarks are searched in the order of primary key sorting
  2. Reduce the number of times pages are replaced in the buffer pool
  3. Batch processing of key query operations

MRR is designed because most data is inserted in the ascending order of the primary key. Therefore, it can be considered that if the data is queried in the ascending order of the primary key, the disk reads are closer to the sequential read, which improves the read performance.

The key to MRR’s performance improvement is that the query does a range query (that is, a multi-value query) on index A and can get enough primary key ids. In this way, after sorting, then go to the primary key index to look up the data, in order to reflect the “sequential” advantage.