Brief introduction: In this article, we focus on the latest version of MySQL 8.0.25 about the SQL basic elements table, column, function, aggregation, grouping, sorting and other elements of the analysis, setting and conversion process. In this article, we continue the complex transformation process for more complex subqueries, partitioned tables, and joins.

The authors | | passenger source ali technology to the public

Background and architecture

In this article, we focus on the latest version of MySQL 8.0.25 about the SQL basic elements table, column, function, aggregation, grouping, sorting and other elements of the analysis, setting and conversion process. In this article, we will continue the complex transformation process for more complex subqueries, partitioned tables, and joins, as follows:

Transformation

  • remove_redundant_subquery_clause :

Permanently remove redundant parts from the query if 1) This is a subquery 2) Not normalizing a view. Removal should take place when a query involving a view is optimized, not when the view is created.

  • remove_base_options:

Remove SELECT_DISTINCT options from a query block if can skip distinct

  • resolve_subquery :Resolve predicate involving subquery, perform early unconditional subquery transformationsConvert subquery predicate into semi-join, orMark the subquery for execution using materialization, orPerform IN->EXISTS transformation, orPerform more/less ALL/ANY -> MIN/MAX rewriteSubstitute trivial scalar-context subquery with its value
  • transform_scalar_subqueries_to_join_with_derived:

Transform eligible scalar subqueries to derived tables.

  • flatten_subqueries :

Convert semi-join subquery predicates into semi-join join nests. Convert candidate subquery predicates into semi-join join nests. This transformation is performed once in query lifetime and is irreversible.

  • apply_local_transforms :delete_unused_merged_columns : If query block contains one or more merged derived tables/views, walk through lists of columns in select lists and remove unused columns.simplify_joins : Convert all outer joins to inner joins if possible. Prune_partitions: Perform partition pruning for a given table and condition.
  • push_conditions_to_derived_tables :

Pushing conditions down to derived tables must be done after validity checks of grouped queries done by apply_local_transforms();

  • Window::eliminate_unused_objects:

Eliminate unused window definitions, redundant sorts etc.

Detailed conversion process

1 Resolving subquery (resolve_subquery)

Parse statements with subqueries in conditions and do some early unrestricted subquery conversions, including:

  • OPTIMIZER_SWITCH_SEMIJOIN and HINT do not limit subqueries to IN/=ANY and EXIST Subquery predicate subqueries are simple query blocks rather than UNION subqueries without implicit and explicit groups The BY subquery has no HAVING, and the WINDOW function Resolve stages are Query_block::RESOLVE_CONDITION and Query_block::RESOLVE_JOIN_NEST and does not use the latest Hyper Optimizer. External query blocks can support semijoins with at least one table, Subquery_strategy::UNSPECIFIED Subquery_strategy::UNSPECIFIED Parent query also has at least one table. Neither parent query nor child query can have straight Join parent query block. RAND determines whether the result of a subquery needs to be converted to true or false and NULL, and whether antiJoin or semijoinAntijoin is supported, or whether semiJoinOffset and limit are valid for semJoin. Offset starts from the first line and limit is not 0. Set Subquery_strategy::CANDIDATE_FOR_SEMIJOIN and add sj_candidates

  • Mark whether the subQuery is executed using the materialization scheme. If it does not conform to the transformation semijoin, try materialization. Optimzier switch Subquery_to_derived =on subquery is IN/=ANY or EXISTS Predicate subquery is simple query block NOT UNION There must be no aggregate Subquery predicate in the WHERE clause (currently not implemented in the ON clause), The Subquery_strategy::UNSPECIFIED parent query has at least one table, and the ANDs or ORs expression tree parent query supports the Semiquery_strategy ::UNSPECIFIED subquery. RAND determines whether the result of the subquery needs to be converted to true or false, and whether the result is NULL. Judgment can do antijoin or semijoin does not support the parameter is not multi – column sub-query (WHERE (outer_subq) = ROW (derived. Col1, derived. Col2)) is converted to a derived the subquery does not support Table (m_subquerY_to_deriveD_is_impossible) Set Subquery_strategy::CANDIDATE_FOR_DERIVED_TABLE and add the sj_candidates

  • If the above two policies are not available, select transformerItem_singlerow_subselect based on type :: select_Transformer for simple standard quantum queries, directly replace the query with the execution result

    select * from t1 where a = (select 1); => select * from t1 where a = 1;

  • Item_in_subselect/Item_allany_subselect::select_transformer->select_in_like_transformer

  • Select_in_like_transformer function to handle IN/ALL/ANY/SOME subquery transformation

  • SELECT 1 from Item_in_optimizer

  • If there is currently no way to execute a subquery, that is, a subquery that cannot be executed using semiJoin/AntiJoin, an IN->EXISTS conversion is performed, essentially choosing between materialized execution and iterative cyclic execution. The IN syntax means that an unrelated subquery is executed only once, materializing the query result into a temporary table, and then dematerializing the table when the result is needed. EXISTS indicates that the subquery is executed once for each external record, in an iterative loop. The subquery policy is set to Subquery_strategy::CANDIDATE_FOR_IN2EXISTS_OR_MAT

  • Rewrite single-column IN/ALL/ANY subquery (single_value_transformer)

    oe
    c m p cmp
    (SELECT ie FROM … WHERE subq_where … HAVING subq_having) =>

    • oe
      c m p cmp
      (SELECT MAX(…) ) // handled by Item_singlerow_subselect
    • oe
      c m p cmp
      <max>(SELECT …) // handled by Item_maxmin_subselect

    fails=>Item_in_optimizer

    • For the already Materialized scheme, no transformation is performed
    • Convert IN to EXISTS via equ-join
  • If it is an ALL/ANY single-valued subquery predicate, try the MIN/MAX subquery conversion

    SELECT * FROM t1 WHERE a < ANY (SELECT a FROM t1); => SELECT * FROM t1 WHERE a < (SELECT MAX(a) FROM t1)

  • If not, call single_value_IN_TO_EXISTS_transformer to convert the IN to EXISTS transformation to set the subquery to a related subquery, UNCACHEABLE_DEPENDENT If the subquery contains aggregate functions, window functions, GROUP syntax, and HAVING syntax, add the criteria to the HAVING clause, and use ref_or_NULl_helper to distinguish NULL from False results. NULL IN (SELECT…) You also need to wrap it in an Item_func_trig_cond trigger.

    SELECT … FROM t1 WHERE t1.b IN (SELECT <expr of SUM(t1.a)> FROM t2) => SELECT … FROM t1 WHERE t1.b IN (SELECT <expr of SUM(t1.a)> FROM t2 [trigcond] HAVING t1.b=ref-to-<expr of SUM(t1.a)>)

  • If the subquery does not contain aggregate functions, window functions, or GROUP syntax, it will be placed in the WHERE query condition, and of course the HAVING clause (Item_func_trig_cond+Item_is_not_null_test) if NULL is required.

    Subqueries that do not need to distinguish between NULL and FALSE:

    SELECT 1 FROM … WHERE (oe
    c m p cmp
    ie) AND subq_where

    SELECT 1 FROM… WHERE subq_where AND trigcond((oe cmpcmpcmp ie) OR (ie IS NULL)) HAVING trigcond(@

    (ie))
    @>

  • JOIN::optimize() calculates the cost of the materialization and EXISTS conversions to select m_SUBquerY_to_deriveD_is_impossible = true

  • ROW value conversion, via Item_in_optimizer, does not support ALL/ANY/SOME (ROW_value_TRANSFORMER) Item_in_subselect:: ROW_value_IN_TO_EXISTS_transformer

    for (each left operand) create the equi-join condition if (is_having_used || ! abort_on_null) create the “is null” and is_not_null_test items if (is_having_used) add the equi-join and the null tests to HAVING else add the equi-join and the “is null” to WHERE add the is_not_null_test to HAVING

  • No HAVING expression

    (l1, l2, l3) IN (SELECT v1, v2, v3 … WHERE where) => EXISTS (SELECT … WHERE where and (l1 = v1 or is null v1) and (l2 = v2 or is null v2) and (l3 = v3 or is null v3) [ HAVING Is_not_null_test (v1) and is_not_NULl_test (v2) and is_not_NULl_test (v3))]

  • HAVING expressions

    (l1, l2, l3) IN (SELECT v1, v2, v3 … HAVING having) => EXISTS (SELECT … HAVING having and (l1 = v1 or is null v1) and (l2 = v2 or is null v2) and (l3 = v3 or is null v3) and is_not_null_test(v1) and is_not_null_test(v2) and is_not_null_test(v3))

Transform_scalar_subqueries_to_join_with_derived Table transform_SCALar_subqueries_TO_JOIN_WITH_Derived Table

This feature was introduced in 8.0.16 to support analysis push-down by the Secondary Engine(Heapwave). The difference between conversion and non-conversion execution plans can be seen intuitively:

root:test> set optimizer_switch = 'subquery_to_derived=off'; Query OK, 0 rows affected (0.00 SEC) root:test> EXPLAIN SELECT B, MAX(a) AS ma FROM t4 GROUP BY b HAVING ma < (SELECT MAX(t2.a) FROM t2 WHERE t2.b=t4.b); +----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+------- ----------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+------- -- -- -- -- -- -- -- -- -- -- + | 1 | PRIMARY | t4 | NULL | | NULL ALL | NULL | NULL | NULL 10 | | | 100.00 Using temporary | | | 2 DEPENDENT SUBQUERY | t2 | NULL | | NULL ALL | NULL | NULL | NULL | 3 | | 33.33 Using the where | +----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+------- ----------+ 2 rows in set, 3 warnings (0.00 SEC) root:test> set optimizer_switch = 'subquery_to_derived=on'; Query OK, 0 rows affected (0.00 SEC) root:test> EXPLAIN SELECT B, MAX(a) AS ma FROM t4 GROUP BY b HAVING ma < (SELECT MAX(t2.a) FROM t2 WHERE t2.b=t4.b); +----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------- -----------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 1 | PRIMARY | t4 | NULL | | NULL ALL | NULL | NULL | NULL 100.00 | | | 10 Using Temporary | | 1 | PRIMARY | < derived2 > | NULL | | NULL ALL | NULL | NULL | NULL | 3 | | 100.00 Using the where; Using the join buffer (hash join) | | 2 | DERIVED | t2 | NULL | | NULL ALL | NULL | NULL | NULL | 3 | | 100.00 Using temporary | +----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 3 rows in the set, three warnings (0.01 SEC)Copy the code
  • Transform_SCALar_subqueries_TO_JOIN_WITH_DERIVED First collect standard queries (Item:: Collect_SCALar_subqueries) that can be converted from JOIN conditions, WHERE conditions, HAVING conditions, and SELECT list. Iterate over these subqueries to see if you can add an additional transformation (transform_grouped_to_DERIVED) : turn the implicit GROUP BY standard quantum query into a Derived Table.

    SELECT SUM(c1), (SELECT SUM(c1) FROM t3) scalar FROM t1; Convert to => SELECT derived0.summ, derived1.scalar FROM (SELECT SUM(a) AS summ FROM t1) AS derived0 LEFT JOIN (SELECT SUM(b) AS scalar FROM t3) AS derived1 Explain SELECT SUM(a), (SELECT SUM(C1) FROM T3) scalar FROM T1; +—-+————-+————+————+——+—————+——+———+——+——+———-+——— ———————————–+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +—-+————-+————+————+——+—————+——+———+——+——+———-+——— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — + | 1 | PRIMARY | | NULL | | NULL ALL | NULL | NULL | NULL | 1 | | NULL | 100.00 | 1 | PRIMARY | | NULL | | NULL ALL | NULL | NULL | NULL | 1 | | 100.00 Using the where; Using the join buffer (hash join) | | | 4 DERIVED | t3 | NULL | | NULL ALL | NULL | NULL | NULL | 1 | | NULL | 100.00 | 3 | DERIVED | t1 | NULL | | NULL ALL | NULL | NULL | NULL | 2 | | NULL | 100.00 +—-+————-+————+————+——+—————+——+———+——+——+———-+——— ———————————–+

  • Collects a list of unique aggregate functions, collect_aggregates, which are replaced by the columns of the new Derived Table. You also need to add all the fields that refer to those items, including those directly in the SELECT list, Window function parameters, ORDER by, Partition by, and the ORDER by column in the query block, because they are all drawn into the Derived Table. Query_expression/Query_block (create_query_expr_and_block) needed to create Derived Table. Add the Derived Table to the query block and top_JOIN_list. Keep the old subquery unit block, if it contains Derived units that can be converted, in the Query_block below the Derived Table, if it does not, in the original subquery block. Insert the previous list of aggregate functions Item into the query block of the Derived Table. Collect all but the columns in the GROUP AGG expression, and since the fields have been moved into the Derived Table, remove unreasonable fields references. After collecting all the unique columns and View references, add them to the new Derived Table list. A flatten_subqueries/setup_tables redo the flatten_subqueries/setup_tables on the new Derived Table, disregarding the converted subqueries. Process the newly added aggregate function Item in the Derived Table HAVING condition and refer to new_Derived ->base_ref_items via Item_aggregate_refs instead of the previous parent query block base_ref_items. Permanently replace the list of aggregate functions in the parent query block, become columns in the Derived Table, and delete them. The only columns and View references that were previously saved and added to the Derived Table are also replaced with new fields.

  • This subquery is not currently supported in HAVING expressions and can be converted.

    SELECT SUM(a), (SELECT SUM(b) FROM t3) scalar FROM t1 HAVING SUM(a) > scalar; Convert to => SELECT derived0.summ, derived1.scalar FROM (SELECT SUM(a) AS summ FROM t1) AS derived0 LEFT JOIN (SELECT SUM(b) AS scalar FROM t3) AS derived1 ON TRUE WHERE derived0.sum > derived1.scalar;

  • Next, we iterate over all the subqueries that can be converted, converting them to Derived tables, and replacing the corresponding expressions into columns (transform_subquery_to_DERIVED). Derived table TABLE_LIST (synthesize_derived). Set where_cond, which you can move to the Derived Table, to join_cond. Add derived Table to the set of tables in the query block. Decorrelate_derived_scalar_subquery_pre Adds non-related reference columns (NCF) to the SELECT list that are referenced by the JOIN condition, and another field that contains columns associated with the external query, We call this’ lifted_WHERE ‘and add COUNT(*) to the SELECT list so that the converted query block can be checked in cardinality. For example, there are no aggregate functions in the subquery. If you do include an aggregate function, the return line must be NCF and also in the GROUP BY list. Add NCF to the GROUP list of the subquery, if it is already there, to the end of the list. Add Item_func_any_value (non-aggregated column) to the SELECT list if the column with GROUP BY failed due to dependency checking. For NCF, derived. Field and Derived. Count (field) are created. Set some preparation for materialization (setup_materialized_derived). Decorrelate_derived_scalar_subquery_post: Create the corresponding ‘lifted_fields’. Update the reference to the related column in the JOIN condition, replacing the query outside the reference with the column related to the Derived Table.

  • Expressions that replace subqueries in WHERE, JOIN, HAVING conditions, and SELECT List become columns in the corresponding Derived Table.

The following diagram shows the conversion process and results of this function:

3 Flattening sub-queries (flatten_subqueries)

This function converts semi-join subqueries to nested joins only once and is not reversible.

  • Create SEMI JOIN (IT1… ItN), and added to the execution plan of the outer query block. Add the child query’s WHERE condition and JOIN condition to the parent query’s WHERE condition. Removes the subquery predicate from the parent query’s decision predicate.

  • Because MySQL has a finite number of tables that can join a query block (MAX_TABLES), not all sj_candidates can do a math that rolls _subqueries. This nesting needs to be prioritised first. The priority rule is as follows: Subqueries with more related subqueries taking precedence over non-related inner tables are greater than those before positions with fewer inner tables are greater than those after positions

    subq_item->sj_convert_priority = (((dependent * MAX_TABLES_FOR_SIZE) + // dependent subqueries first child_query_block->leaf_table_count) * 65536) + // then with many tables (65536 – subq_no); // then based on position

  • In addition, because a recursive call to the mathematical _subqueries is bottom-up, the flattener rolls the underlying subqueries into the outer query block.

    for SELECT#1 WHERE X IN (SELECT #2 WHERE Y IN (SELECT#3)) :
    
    Query_block::prepare() (select#1)
       -> fix_fields() on IN condition
           -> Query_block::prepare() on subquery (select#2)
               -> fix_fields() on IN condition
                    -> Query_block::prepare() on subquery (select#3)
                    <- Query_block::prepare()
               <- fix_fields()
               -> flatten_subqueries: merge #3 in #2
               <- flatten_subqueries
           <- Query_block::prepare()
       <- fix_fields()
       -> flatten_subqueries: merge #2 in #1
    Copy the code
  • Iterate through the list of subqueries to delete subqueries marked as Subquery_strategy::DELETED with Item::clean_up_after_removal and set sj_convert_PRIORITY according to the priority rule. Sort by priority.

  • Iterate through the sorted subquery list. For the subquery of Subquery_strategy::CANDIDATE_FOR_DERIVED_TABLE policy, Convert subquery ([NOT] {IN, EXISTS}) to JOIN’s Derived table (transform_table_subquery_to_join_WITH_Derived)

    FROM [tables] WHERE … AND/OR oe IN (SELECT ie FROM it) … => FROM (tables) LEFT JOIN (SELECT DISTINCT ie FROM it) AS derived ON oe = derived.ie WHERE … AND/OR derived.ie IS NOT NULL …

  • Set the policy to Subquery_strategy::DERIVED_TABLEsemijoin subquery and AntiJoin subquery cannot be nested with each other, or the number of external query tables exceeds MAX_TABLE. Otherwise, mark it as Subquery_strategy::SEMIJOIN policy. Determines whether the WHERE condition of the subquery is constant. If the judgment condition is always FALSE, the subquery result is always empty. In this case, the subquery is DELETED by calling Item::clean_up_after_removal marked Subquery_strategy::DELETED. If Subquery_strategy::DELETED/ Set the re-marking of the Subquery_strategy::UNSPECIFIED policy to Subquery_strategy::UNSPECIFIED. Replace the WHERE condition (replace_subcondition) of the outer query with the condition (replace_subcondition). If the condition in the subquery is not always FALSE, or if the condition is always FALSE, it needs to be changed to antiJoin (In the antiJoin case, the result of the subquery is always empty, The outer query condition always passes. Change the condition to always True. Subqueries are always FALSE and are not AntiJoin. Change the condition in the outer query to always False.

  • Item_subselect::EXISTS_SUBS does not support SQLIN/=ANY predicates that hold the aggregation operation convert_subquery_to_semijoin

  • If the condition is disassociated, disassociate the decorrelate_condition

  • Add the disassociated inner table expression to the SELECT List

  • Collect externally relevant Derived table or join conditions in the FROM clause

  • Delete the association identifier UNCACHEABLE_DEPENDENT and update the Used table

  • Derived Table subquery adds SELECT_DISTINCT identity

  • Convert the subquery to a Derived table and insert it into the query block FROM which it belongs (transform_subquery_to_DERIVED)

  • Create the Derived Table and its join conditions

  • Iterate over the WHERE of the parent query block and replace the subquery Item with the derived table (replace_subcondition)

  • Iterate through the sorted list of subqueries for the Subquery_strategy::CANDIDATE_FOR_SEMIJOIN policy.

  • Determine whether semiJoin can be converted

  • Iterate through the sorted list of subqueries. For Subquery_strategy::SEMIJOIN, Start converting to semijoin/ antiJoin (convert_subquery_to_semiJOIN) The convert_subquery_to_semiJOIN function resolves the SQLIN/=ANY predicate for the following schema

    SELECT … FROM ot1 … otN WHERE (oe1, … oeM) IN (SELECT ie1, … , ieM FROM it1 … itK [WHERE inner-cond]) [AND outer-cond] [GROUP BY …] [HAVING …] [ORDER BY …] => SELECT … FROM (ot1 … otN) SJ (it1 … itK) ON (oe1, … oeM) = (ie1, … , ieM) [AND inner-cond] [WHERE outer-cond] [GROUP BY …] [HAVING …] [ORDER BY …]

  • EXISTS predicates

    SELECT … FROM ot1 … otN WHERE EXISTS (SELECT expressions FROM it1 … itK [WHERE inner-cond]) [AND outer-cond] [GROUP BY …] [HAVING …] [ORDER BY …] => SELECT … FROM (ot1 … otN) SJ (it1 … itK) [ON inner-cond] [WHERE outer-cond] [GROUP BY …] [HAVING …] [ORDER BY …]

  • The NOT EXISTS predicates

    SELECT … FROM ot1 … otN WHERE NOT EXISTS (SELECT expressions FROM it1 … itK [WHERE inner-cond]) [AND outer-cond] [GROUP BY …] [HAVING …] [ORDER BY …] => SELECT … FROM (ot1 … otN) AJ (it1 … itK) [ON inner-cond] [WHERE outer-cond AND is-null-cond(it1)] [GROUP BY …] [HAVING …] [ORDER BY …]

  • NOT IN predicate

    SELECT … FROM ot1 … otN WHERE (oe1, … oeM) NOT IN (SELECT ie1, … , ieM FROM it1 … itK [WHERE inner-cond]) [AND outer-cond] [GROUP BY …] [HAVING …] [ORDER BY …] => SELECT … FROM (ot1 … otN) AJ (it1 … itK) ON (oe1, … oeM) = (ie1, … , ieM) [AND inner-cond] [WHERE outer-cond] [GROUP BY …] [HAVING …] [ORDER BY …]

  • Find where semi-join nesting and its generated conditions can be inserted, such as for T1 LEFT join T2, embedding_join_nest is T2, and T2 can also be nested, such as T1 LEFT join (T2 join t3).

  • Generate a new semiJoin nested TABLE_LIST table

  • Processing Antijoin

  • Merge potential tables from subqueries into the above join table (TABLE_LIST::merge_underlying_tables)

  • Insert the leaf table of the subquery after the leaf table of the current query block, reset the leaf table of the subquery sequence number and dependent appearance. Resets the leaf table of the subquery.

  • Outer JOIN, pass nullability in the join list

  • De-correlating the association conditions in the inner sub-query, which are added to the semiJoin list. These conditions must be determinate AND support only simple judgment conditions or an AND condition consisting of simple judgment conditions (Query_block::decorrelate_condition) to determine whether the left AND right conditions depend only on the inner AND outer layer tables, Add their expressions separately to the list of expressions inside semiJoin (decorrelate_equality)

  • Decorrelate_condition = Query_block::decorrelate_condition

  • Removes the subquery expression from the parent query AST (Query_express::exclude_level)

  • Update table bitmap based on WHERE/ join conditions generated by semi-join nested (Query_block::fix_tables_after_pullout)

  • Update the use table information by pulling up the WHERE condition of the subquery (Item_cond_and::fix_after_pullout())

  • Create AND condition according to semiJoin’s condition list. If any condition is constant True, remove the condition. If constant is False, the whole condition is removed (Query_block::build_sj_cond)

  • Add the created semiJoin condition to the WHERE condition of the outer query

  • As with the Subquery_strategy::UNSPECIFIED policy, the IN->EXISTS override (select_Transformer) is performed for the untransformed subquery list. If the original subquery already has an alternative Item, Call replace_subcondition parsing and add them to the appropriate WHERE or ON clause.

  • Clears all sj_candidates lists

  • Semijoin can be executed in five ways. This article does not introduce the Optimizer and Execution process. For details, please refer to the introduction of semiJoin in the article.

    SELECT @@optimizer_switch\G *************************** 1. row *************************** @@optimizer_switch: …… materialization=on,semijoin=on,loosescan=on, firstmatch=on, subquery_materialization_cost_based=on, ……

  • The following figure illustrates the conversion process:

    SELECT * FROM t1 WHERE t1.a in (SELECT t2.c1 FROM t2 where t2.c1 > 0); => /* SELECT t1 #1 */ SELECT t1.a AS a FROM T1 SEMI JOIN (t2) WHERE (t1.a = t2.c1) and (t2.c1 > 0)) explain SELECT * FROM t1 WHERE t1.a in (SELECT t2.c1 FROM t2 where t2.c1 > 0); +—-+————-+——-+————+——+—————+——+———+——+——+———-+————– ———————————————+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +—-+————-+——-+————+——+—————+——+———+——+——+———-+————– — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — + | | 1 SIMPLE | t2 | NULL | | NULL ALL | NULL | NULL | NULL | 1 | | 100.00 Using where; Start temporary | | | 1 SIMPLE | t1 | NULL | | NULL ALL | NULL | NULL | NULL | 2 | | 50.00 Using the where; End temporary; Using join buffer (hash join) | +—-+————-+——-+————+——+—————+——+———+——+——+———-+————– ———————————————+

4 Applying current query block transforms (apply_local_transforms)

This function is called bottom-up after flattern subqueries:

Delete unnecessary columns (delete_unused_merged_columns)

If the query block has removed some Derived tables/views, iterate over the columns of the SELECT list, removing unnecessary columns

Simplified Joins (simplify_joins)

This function reduces the nested join of top_join_list in Query_block to a flattened join list. Nested join table1 join table2 and table1 (table2, table3). If the simplification process shown:

Static prune_partitions for partitioned tables

Pruning depends on HASH/RANGE/LIST and secondary partitions, so prune_partitions are called in the prepare and optimize phases, and some constant quantum queries are evaluated and executed.

struct TABLE { ...... partition_info *part_info{nullptr}; /* Partition related information */ /* If true, all partitions have been pruned away */ bool all_partitions_pruned_away{false}; . } SQL tranformation phase SELECT_LEX::apply_local_transforms --> prune_partitions for example, select * from employee where company_id = 1000 ; SQL optimizer phase JOIN::prune_table_partitions --> prune_partitions ------> based on tbl->join_cond_optim() or JOIN::where_cond for example, explain select * from employee where company_id = (select c1 from t1);Copy the code
  • Examples of the following RANGE pruning process:

    root:ref> CREATE TABLE R2 ( -> a INT, -> d INT -> ) PARTITION BY RANGE(a) ( -> PARTITION p20 VALUES LESS THAN (20), -> PARTITION p40 VALUES LESS THAN (40), -> PARTITION p60 VALUES LESS THAN (60), -> PARTITION p80 VALUES LESS THAN (80), -> PARTITION p100 VALUES LESS THAN MAXVALUE -> ); Query OK, 0 rows affected (0.09 sec)

    root:ref> Select * From R2 where a > 40 and a < 80;

  • The detailed pruning process is as follows: Because pruning requires intersection of pruning results generated under different conditions, bitmap such as read_partitions should be used to store whether the corresponding partitions are used during pruning. In addition, the pruning process is similar to iterative judgment, so the part_iterator is introduced to store the start, end, and present, as well as the corresponding endpoint function that needs to get the range and the iterator function that gets the next value, next. Pointers are cleverly used to accommodate different partition types Hash/Range/List, as shown below:

  • Get SEL_TREE red black tree (get_mm_tree) in join_cond or m_where_cond

  • Call find_used_partitions to retrieve the required partitions, for each SEL_TREE interval: 1. 2. Continue to get the next partition from the left until the end of the right endpoint. After each call, the itersection that needs to be bitmapped is called.

  • Set read_PARTITIONS and lock_PARTITIONS

  • Find_used_partitions recurse according to the structure of SEL_TREE, traversing next_KEY_part (and condition) from left to right, and then traversing the left and right (i.e., up and down, or condition) deep recursion of SEL_TREE.

    (start) | $ | Partitioning keyparts $ subpartitioning keyparts | $ | ... . $ | | | $ | +---------+ +---------+ $ +-----------+ +-----------+ \-| par1=c1 |--| par2=c2 |-----| subpar1=c3|--| subpar2=c5| +---------+ +---------+ $ +-----------+ +-----------+ | $ | | | $ | +-----------+ | $ | | subpar2=c6| | $ | +-----------+ | $ | | $ +-----------+ +-----------+ | $ | subpar1=c4|--| subpar2=c8| | $ +-----------+ +-----------+ | $  | $ +---------+ $ +------------+ +------------+ | par1=c2 |------------------| subpar1=c10|--| subpar2=c12| +---------+ + $-- -- -- -- -- -- -- -- -- -- -- -- + + -- -- -- -- -- -- -- -- -- -- -- -- + | $... $Copy the code

    For example, the stack for the first row (par1=c1 and par2=c2 and subpar1=c3 and subpar2=c5) would be: in find_used_partitions(key_tree = “subpar2=c5”) (*) in find_used_partitions(key_tree = “subpar1=c3”) in find_used_partitions(key_tree = “par2=c2”) () in find_used_partitions(key_tree = “par1=c1”) in prune_partitions(…) And then I’m going to continue with the following condition, Or (par1=c1 and par2=c2 and subpar2= c3 and subpar2=c6) or (par1=c1 and par2=c2 and subpar1=c4 and subpar2=c8) Or (par1=c2 and subpar1=c10 and subpar2=c12)

  • The following figure shows the structure and process of pruning:

Derived Table (push_conditions_to_derived_tables)

This function pushes the conditional down to derived tables, as detailed in WL#8084 – Condition pushdown to materialized derived table.

root:test> set optimizer_switch = 'derived_merge=off'; // Disable dervied_merge test pushdown capability Query OK, 5 Rows affected (0.00 SEC) root:test> EXPLAIN FORMAT=tree SELECT * FROM (SELECT c1,c2 FROM T1) as dt WHERE C1 > 10; +----------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------+ | EXPLAIN | +----------------------------------------------------------------------------------------------------------------------- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | - > Table scan on dt (cost = 2.51.. 2.51 rows=1) -> Materialize (cost=2.96.. 2.96 rows = 1) - > Filter: (t1. C1 > 10) (cost = 0.35 rows = 1) - > Table scan on t1 (cost = 0.35 rows = 1) | +----------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------+Copy the code

The process is as follows:

  • Derived table (can_push_condition_to_condition_condition_condition_condition_derived) Derived Table has UNIONDerived Table has LIMITDerived table cannot be an inner table in outer join, and the rows that would result in more NULL compensation cannot be the Derived table contained in CTE
  • Condition_pushdown::make_cond_for_derived (Condition_pushdown::make_cond_for_derived)
  • Condition_pushdown::get_remainder_cond
  • Push_conditions_to_derived_tables is a top-down recursive call

The process is illustrated in detail as follows:

Three reviews

The two articles focus on the rule-based optimization part of the optimizer and do not cover more cost-based optimization. You can see that for the speed of execution caused by direct rule optimization, you can directly transform, especially for the variable class transformation above the query structure, such as Merge_derived. If the application of rule optimization cannot determine whether it will accelerate the execution, the optimizer will retain some temporary structures to provide more choices for the subsequent cost estimation, such as IN/EXIST/Materialized transformation. Of course, there are still some, which change the query structure and cannot determine whether rule conversion results in execution speed, MySQL does not currently support. Although the article is detailed, but can not cover the whole situation, is also to throw a brick to introduce a stone, but also requires readers to further understand the specific process of a certain TYPE of SQL through debugging methods.

Iv Reference Materials

MySQL 8.0 Server Layer Architecture

WL#13520: Transform scalar subqueries

WL#8084 – Condition pushdown to materialized derived table

WL#2980: Subquery optimization: Semijoin

  • WL#3740: Subquery optimization: Semijoin: Pull-out of inner tables
  • WL#3741: Subquery optimization: Semijoin: Duplicate elimination strategy
  • WL#3750: Subquery optimization: Semijoin: First-match strategy
  • WL#3751: Subquery optimization: Semijoin: Inside-out strategy

WL#4389: Subquery optimizations: Make IN optimizations also handle EXISTS

WL#4245: Subquery optimization: Transform NOT EXISTS and NOT IN anti-join

WL#2985: Perform Partition Pruning of Range conditions

MySQL · Source Code Analysis · Semi-Join Optimized Execution Code Analysis

MySQL· Source analysis · Sub-query optimization source analysis

Optimizing Subqueries, Derived Tables, View References, and Common Table Expressions

The original link

This article is the original content of Aliyun and shall not be reproduced without permission.