What is Sort Merge Join

Before we start reading the source code, let’s take a look at what Sort Merge Join (SMJ) is, defined on Wikipedia. Simply put, the two Join tables are sorted according to Join attributes and then scanned and merged to obtain the final result. The biggest consumption of this algorithm lies in sorting internal and external data. When joining index columns, we can use the orderality of indexes to avoid the consumption caused by sorting. Therefore, SMJ can be considered in the case of joining index columns in the query optimizer.

TiDB Sort Merge Join implementation

Implementation process

The implementation code for TiDB is in TiDB /executor/merge_join.go. SELECT * FROM A JOIN B ON A.a = B.a; SELECT * FROM A JOIN B ON A.a = B.a; SELECT * FROM A JOIN B ON A.a = B.a;

  1. Insert rows of the same keys into array A1, insert rows of the same keys into array A2, insert rows of the same keys into array A2. Exit if external data or inner table data is finished reading.

  2. Read the current first row of data from A1, set to v1. Reads the current first row of data from a2, set to v2.

  3. According to the comparison of V1 and v2 by join-keys, the results can be divided into several situations:

    • CmpResult > 0 indicates that v1 is greater than v2. The data of current A2 is discarded and the next batch of data is read from the inner table. The reading method is the same as 1. Repeat 2.
    • CmpResult < 0 indicates that v1 is less than v2, indicating that the outer v1 has no inner table with the same value. The outer data is output to the resultGenerator (Different connection types have different output results, for example, the outer connection will output the mismatched outer data).
    • CmpResult is equal to 0, which means v1 is equal to v2. So the data in A1 is traversed, and the data in A2 is output to the resultGenerator for a connection.
  4. Go back to Step 1.

The following figure shows the SMJ process:

Read inner table/outer data

FetchNextInnerRows or fetchNextOuterRows read the inner table and the outer table, respectively. The two functions implement similar functions, and I will only detail the implementation of the function fetchNextInnerRows here.

The MergeSortExec operator reads data through the readerIterator, which reads data sequentially. The MergeSortExec operator maintains two readeriterators: outerIter and innerIter, which are constructed in the buildMergeJoin function.

Real data read operation is in readerIterator nextSelectedRow, here through ri. Reader. NextChunk read a Chunk of data at a time, about the relevant contents of the Chunk, Check out our previous TiDB source for an introduction to Chunk and the implementation framework in our series (10).

. Here it is important to note that our via expression VectorizedFilter on appearance data filtering, return a curSelected Boolean array, each row of data is used for appearance meet the filter filter conditions. Select * from t1 left outer join T2 on t1.a=100; For example, filter is t1.a=100. For rows that do not pass this filter condition, We through ri. JoinResultGenerator. Send to resultGenerator emitToChunk function, the resultGenerator is an interface, specifically the line data, It depends on the type of join. For example, outer join is printed and inner join is ignored. For details about resultGenerator, refer to the previous article: TiDB source Code Read series (9) Hash Join

RowsWithSameKey continuously reads the next row of data through nextSelectedRow, and determines whether the join keys of each row of data belong to the same join keys. If so, Rows of the same join-keys are placed into the innerChunkRows and outerIter4Row arrays respectively. We create iterators innerIter4Row and outerIter4Row, respectively. During the execution of SMJ, these two iterators will be used to obtain data for real comparison to obtain join result.

Merge-Join

For the Merge – Join logic code in the function MergeJoinExec. JoinToChunk, internal appearance of the iterator current data according to their respective Join – keys compared, have the following results:

  • CmpResult > 0 indicates that the current data in the outer table is greater than the data in the inner table, so fetchNextInnerRows directly reads the next inner table data, and then re-compares the data.

  • CmpResult < 0, which means that the current data in the outer table is less than the current data in the inner table. In this case, there are several cases. If it is an outer join, then the outer data + NULL is output; if it is an inner join, then the outer data is ignored. Unified by the “e.r esultGenerator to control, we only need to put the appearance data through” e.r esultGenerator. EmitToChunk calls it. FetchNextOuterRows is then used to read the next external data and recompare.

  • CmpResult = = 0, represents the appearance of the current data is equal to the table in the current data, this looks up data to make a table in the current data connection, through the “e.r esultGenerator. EmitToChunk generate results. Then the outer and inner tables get the next data and start the comparison again.

Repeat the above process until the outer or inner table data is traversed, and exit the merge-JOIN process.

More and more

Our analysis code above is based on the source-code branch. You may have noticed some problems, such as reading the inner Join group (the same key) at once. If there are many same keys, there is a risk of OOM memory. To address this problem, we did a few things in the latest Master branch to optimize:

  1. It is not necessary to read the same keys at once. It is only necessary to iterate over the keys and join them one by one with the inner table. This can at least reduce the appearance of OOM problems, can greatly reduce the probability of OOM.

  2. Tracker is used to record the memory size of the intermediate results used by the inner table. If it exceeds the threshold set by us, we will output logs or terminate the SQL to continue running to avoid OOM. Memory. Tracker is not covered here, but keep an eye out for a future source code analysis article.

In the future, we will make some optimization in merge-join, for example, we can do multi-way merging, save intermediate results and save external ones, etc. Please stay tuned.

Author: Yao Wei