This paper sort the: kangaroo cloud hui | SQL optimization case (2) : optimize the OR conditions

Data Stack is a cloud native – site-based data central-platform PaaS. We have an interesting open source project on Github: github.com/DTStack/fli…

FlinkX is a data synchronization tool based on Flink. It can collect static data, such as MySQL, HDFS, etc., as well as real-time changing data, such as MySQL binlog, Kafka, etc. It is a data synchronization engine that integrates whole domain, heterogeneous and batch data. Welcome to github community to find us to play ~

In MySQL, the same query condition, if the position of OR is changed in THE SQL statement, then the query result will be different, in more complex cases, may bring poor index selection performance hidden trouble, in order to avoid the problem of greatly reduced execution efficiency, It is appropriate to consider using Union All to separate SQL with more complex query logic.

For common OR usage scenarios, please read the following cases:

Case 1: Query different columns using the OR condition

  1. Scenarios to be optimized

SELECT .. . FROMt1 a WHERE a.token= ‘16149684’ AND a.store_id= ‘242950’ AND(a.registrationId IS NOT NULL AND a.registrationId<> ”) OR a.uid= 308475 AND a.registrationId IS NOT NULL AND a.registrationId<> ”

The execution plan

+————–+———————–+—————–+—————-+——————-+——————-+—– ———-+—————-+———————————————+ | id | select_type | table | type | key | key_len | ref | rows | Extra | +————–+———————–+—————–+—————-+——————-+——————-+—– ———-+—————-+———————————————+ | 1 | SIMPLE | a | range |idx_registrationid | 99 | | 100445 | Using index condition; Using where | +————–+———————–+—————–+—————-+——————-+——————-+—– ———-+—————-+———————————————+

A total of 1 row is returned, which takes 5 ms.

  1. Scenario analysis

According to the query conditions, token and UID filters are very good. However, because OR is used, index merge is required to achieve better performance. However, the MySQL optimizer defaults to using indexes on registrationId during actual execution, resulting in poor SQL performance.

  1. Scene optimization

Let’s rewrite our SQL as a union all.

SELECT … . FROMt1 a WHERE a.token = ‘16054473’ AND a.store_id = ‘138343’ AND b.is_refund = 1 AND (a.registrationId IS NOT NULL AND a.registrationId <> ”) union all SELECT … . FROMt1 a where a.uid = 181579 AND a.registrationId IS NOT NULL AND a.registrationId <> ”

+————–+———————–+—————–+—————-+——————————+————– -+——————-+——————————+—————-+————————————+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +————–+———————–+—————–+—————-+——————————+————– – + — — — — — — — — — — — — — — — — — — – + — — — — — — — — — — — — — — — — — — — — — — — — — — — — — – + — — — — — — — — — — — — — — — – + — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — + | | 1 PRIMARY | a | ref | IDX_TOKEN,IDX_STORE_ID_TOKEN | IDX_TOKEN | 63 | const | 1 | Using index condition; Using where | | 1 | PRIMARY | b | eq_ref | PRIMARY | PRIMARY | 4 | youdian_life_sewsq.a.role_id | 1 | Using where | | 2 | UNION | a | const | PRIMARY | PRIMARY | 4 | const | 1 | | | 2 | UNION | b | const | PRIMARY | PRIMARY | 4 | const | 0 | the unique row not found | | | the UNION RESULT | < union1, 2 > | ALL | | | | | | Using temporary | +————–+———————–+—————–+—————-+——————————+————– -+——————-+——————————+—————-+————————————+

A total of 5 rows were returned, which took 5 ms.

By comparing the execution plan before and after optimization, it can be clearly seen that splitting SQL into two sub-queries and using Union to merge the results has better stability, security and performance.

Case 2: Use the OR query condition for the same column

  1. Scenarios to be optimized

select …. . from t1 as mci left join t1 as ccv2_1 on ccv2_1.unique_no = mci=category_no1 left join t1 as ccv2_2 on ccv2_2.unique_no = mci=category_no2 left join t1 as ccv2_3 on ccv2_3.unique_no = mci=category_no3 left join( select product_id, count(0) count from t2 pprod inner join t3 pinfo on pinfo.promotion_id = pprod.promotion_id and pprod.is_enable =1 and ppinfo.is_enable=1 and pinfo.belong_t0 =1 and pinfo.end_time >=now() and not ( pinfo.onshelv_time>’2019-06-30 00:00:00′ or pinfo.end_time>’2018-12-05 00:00:00′ )group by pprod.product_id )as pc on pc.product_id = mci.product_id where Is_enable =0 and McI.odifty_type in (‘1’, ‘5’, ‘6’) and (pc.count =0 or pc.count isnull) limit 0,5;

The execution plan

  1. Scenario analysis

In this example, there is a sub-query in the SQL query. The sub-query is used as the driver table and generates auto_key. The test is carried out through THE SQL split.

  1. Scene optimization

Count =0, or pc.count is null. Start by writing a similar SQL

Select col from test where col =100 or col is null; + — — — — — — — — + | col | + — — — — — — — — + | 100 | | NULL | + — — — — — — — — + 2 rows in the set (0.00 SEC)

In this case, we see the same column with different values, which can be converted using case when.

Select col From test where case when col is null then 100 else col =100 end; + — — — — — — — — + | col | + — — — — — — — — + | 100 | | NULL | + — — — — — — — — + 2 rows in the set (0.00 SEC)

Go back to the original SQL and rewrite it.

select …. . from t1 as mci left join t1 as ccv2_1 on ccv2_1.unique_no = mci=category_no1 left join t1 as ccv2_2 on ccv2_2.unique_no = mci=category_no2 left join t1 as ccv2_3 on ccv2_3.unique_no = mci=category_no3 left join( select product_id, count(0) count from t2 pprod inner join t3 pinfo on pinfo.promotion_id = pprod.promotion_id and pprod.is_enable =1 and ppinfo.is_enable=1 and pinfo.belong_t0 =1 and pinfo.end_time >=now() and not ( pinfo.onshelv_time>’2019-06-30 00:00:00′ or pinfo.end_time>’2018-12-05 00:00:00′ )group by pprod.product_id )as pc on pc.product_id = mci.product_id where mci.is_enable =0 and mci.comodifty_type in (‘1’, ‘5’, ‘6’) and case when pc.count is null then 0 else PC. count end=0 LIMIT 0,5;

You can see that the optimized SQL is 30 seconds faster than the original SQL, and the execution efficiency is about 50 times higher.

Case 3: Optimizing associated SQL OR conditions

  1. Scenarios to be optimized

SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’,… FROM user_msg LEFT JOIN user ON user_msg.user_id = user.user_id LEFT JOIN group ON user_msg.group_id = group.group_id WHERE user_msg.gmt_modified >= date_sub(‘2018-03-29 09:31:44’, INTERVAL30SECOND) OR user.gmt_modified >= date_sub(‘2018-03-29 09:31:44’, INTERVAL 30 SECOND) OR group.gmt_modified >= date_sub(‘2018-03-29 09:31:44’, INTERVAL 30 SECOND)

2. Scenario analysis

After careful analysis of the above query statements, we found that although the business logic only needs to query the data changed within half a minute, the execution process must associate all the data, resulting in unnecessary performance loss.

3. Scene optimization

SQL > alter table SQL > alter table SQL > alter table SQL > alter table SQL

SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’,… FROM user_msg LEFT JOIN user ON user_msg.user_id = user.user_id LEFT JOIN group ON user_msg.group_id = group.group_id WHERE user_msg.gmt_modified >= date_sub(‘2018-03-29 09:31:44’, INTERVAL 30 SECOND)

Sql-01 uses the USER_MSG table as the driver and uses the GMT_modified index to filter the latest data.

Part 2 OF SQL-02 is as follows:

SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’,… FROM user_msg LEFT JOIN user ON user_msg.user_id = user.user_id LEFT JOIN group ON user_msg.group_id = group.group_id WHERE user.gmt_modified >= date_sub(‘2018-03-29 09:31:44’, INTERVAL 30 SECOND)

MSG user_id index filters rows very well.

Part 3 SQL-03 is as follows:

SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’,… FROM user_msg LEFT JOIN user ON user_msg.user_id = user.user_id LEFT JOIN group ON user_msg.group_id = group.group_id WHERE group.gmt_modified >= date_sub(‘2018-03-29 09:31:44’, INTERVAL 30 SECOND)

Sql-03 uses the group driven table and uses the GMT_modified index to filter the latest data.

conclusion

MySQL OR conditional optimization is used in the following scenarios:

1. The same column can be replaced by IN

2. For different columns and complex cases, union All can be used for separation

3, associate SQL OR conditions

We need to combine the actual scene, analysis and optimization.