Welcome to this article on MySQL technology shared by the GreatSQL community. If you have any questions or questions you would like to learn, please leave them in the comments section below

Authors: Jing Yunli, Lu Hao, Song Yuandong

  • GreatSQL community original content is not allowed to use without authorization, please contact xiaobian and indicate the source.

The introduction

Update a=a+1 where PK > 500 update a=1 where PK =1; update a=12 where pk=7; . This scenario of multiple consecutive UPDATE statements is rare.

By accident or necessity, during the development of GreatDB 5.0, we needed to optimize the multi-statement batch update scenario.

When we do multi-row updates to a table, we usually encounter two situations

Update a=a+1 where PK > 500

Update a=1 where pk=1; update a=12 where pk=7; …).

Let’s do the actual operation to compare the time consumed in updating the same number of rows in the two scenarios.

Data preparation

Database version: MySQL 8.0.23

T1 table, construct the table sentence and prepare the initial data 1000 lines

create database if not exists test; Create table t1(c1 int primary key,c2 int); DROP PROCEDURE IF EXISTS insdata; DELIMITER $$ CREATE PROCEDURE insdata(IN beg INT, IN end INT) BEGIN WHILE beg <= end DO INSERT INTO test.t1 values (beg, end); SET beg = beg+1; END WHILE; END $$ DELIMITER ; Call insdata(1,1000)Copy the code

1. Batch update of a single statement

Update statement

update  t1 set c2=10 where c1 <=1000;
Copy the code

The execution result

mysql> update t1 set c2=10 where c1 <=1000; Query OK, 1000 rows affected (0.02sec) Rows matched: 1000 Changed: 1000 Warnings: 0Copy the code

2. Batch update of multiple statements

The following script is used to generate 1000 lines of UPDATE statements that update c2 to a random number up to 1000

#! /bin/bash for i in {1.. 1000} do echo "update t1 set c2=$((RANDOM%1000+1)) where c1=$i;" >> update.sql doneCopy the code

Generate the following SQL statement

update t1 set c2=292 where c1=1; update t1 set c2=475 where c1=2; update t1 set c2=470 where c1=3; update t1 set c2=68 where c1=4; update t1 set c2=819 where c1=5; . . update t1 set c2=970 where c1=1000;Copy the code

Because the source/SSD/TMP/TMP / 1000 / update. SQL; The result is as follows. The execution time is not easy to calculate:

Query OK, 1 row affected (0.00 SEC) Rows matched: 1 Changed: 1 Warnings: 0 Query OK, 1 row affected (0.01sec) Rows matched: 1 Changed: 1 Warnings: 0 Query OK, 1 row affected (0.00 SEC) Rows matched: 1 Changed: 1 Warnings: 0 Query OK, 1 row affected (0.00 SEC) Rows matched: 1 Changed: 1 Warnings: 0 Query OK, 1 row affected (0.01sec) Rows matched: 1 Changed: 1 Warnings: 0 Query OK, 1 row affected (0.00 SEC) Rows matched: 1 Changed: 1 Warnings: 0 Query OK, 1 row affected (0.01sec) Rows matched: 1 Changed: 1 Warnings: 0Copy the code

So use Linux timestamp for statistics:

#! /bin/bash start_time= 'date +%s%3N'/SSD/TMP /mysql/bin/mysql -h127.0.0.1 -uroot -p3316-pabc123 -e "use test; source /ssd/tmp/tmp/1000/update.sql;" End_time = ` date + % s % 3 n ` echo "execution time is:" $(($end_time - $start_time)) "ms"Copy the code

Execution Result:

[root@computer-42 test]# bash update.sh mysql: [Warning] Using a password on the command line interface can be insecure. The execution time is 4246msCopy the code

Execution time: 4246ms=4.246 SEC

The result comparison

conclusion

From the above example, we can see that 1000 rows of data are also updated. The execution efficiency of single statement batch update differs greatly from that of multi-statement batch update.

In addition to the network and statement parsing overhead of the 1000 lines SQL statement itself, performance is affected by the following factors:

1. If the session is auto_COMMIT =1, commit is performed after each update statement. The commit operation takes a long time and results in two disk synchronizations (binlog and redo log writing). In comparison testing, try to put multiple statements into a transaction to ensure that only one transaction is committed.

2. When multiple statements are sent to the back end, each statement processed by the back end returns a response package to the client for interaction. Network IO interaction should be a major performance impact if multiple statements use a single transaction. The CPU usage of the NIC driver was found to be very high during the performance test.

Our goal is to get the second scenario down to less than a second by the time we update 1000 rows.

Optimization of the second scenario

Let’s explore how to improve the efficiency of updating a table with multiple actions with different values.

Simple analysis

In terms of the update statement itself, the table structure used in both scenarios has been simplified to the maximum extent possible. The UPDATE statement is very simple, and the WHERE condition is the primary key, so there is no room for optimization in theory.

If from other aspects, according to the above reasons, there will be three optimization ideas:

Reduce the parsing time of executing statements to improve execution efficiency. 2. Reduce the impact of commit operations on performance. 3. Combine multiple statements into one statement to improve execution efficiency

Solution 1: Use the prepare statement to reduce the parsing time

The following script is used to generate the prepare execution statement

#! /bin/bash echo "prepare pr1 from 'update test.t1 set c2=? where c1=? ';" > prepare.sql for i in {1.. 1000} do echo "set @a=$((RANDOM%1000+1)),@b=$i;" >>prepare.sql echo "execute pr1 using @a,@b;" >> prepare.sql done echo "deallocate prepare pr1;" >> prepare.sqlCopy the code

Generate the following statement

prepare pr1 from 'update test.t1 set c2=? where c1=? '; set @a=276,@b=1; execute pr1 using @a,@b; set @a=341,@b=2; execute pr1 using @a,@b; set @a=803,@b=3; execute pr1 using @a,@b; . . set @a=582,@b=1000; execute pr1 using @a,@b; deallocate prepare pr1;Copy the code

Execute the statement

#! /bin/bash start_time= 'date +%s%3N'/SSD/TMP /mysql/bin/mysql -h127.0.0.1 -uroot -p3316-pabc123 -e "use test; source /ssd/tmp/tmp/test/prepare.sql;" End_time = ` date + % s % 3 n ` echo "execution time is:" $(($end_time - $start_time)) "ms"Copy the code

Execution Result:

[root@computer-42 test]# bash prepare_update_id.sh mysql: [Warning] Using a password on the command line interface can be insecure. The execution time is 4518msCopy the code

Compared with before optimization

Unfortunately, the total execution time has increased.

Here, I have a bit of a guess because the original update statement was split into two:

set @a=276,@b=1;
execute pr1 using @a,@b;
Copy the code

This increases the number of times the MySQL client communicates with the MySQL process, thus increasing the total elapsed time.

Because the prepare preprocessing statement can only be executed using user variables, the following execution statement will report an error

Mysql > execute pr1 using 210,5; ERROR 1064 (42000): You have an error in your SQL syntax; Check the manual that buta collection to your MySQL server version for the right syntax to use near '210,5' at line 1Copy the code

Unable to syntactically recombine the two statements, I used the following alternative execution

Execute the statement

#! /bin/bash start_time= 'date +%s%3N'/SSD/TMP /mysql/bin/mysql -h127.0.0.1 -uroot -p3316-pabc123 <<EOF use test; DROP PROCEDURE IF EXISTS pre_update; DELIMITER $$ CREATE PROCEDURE pre_update(IN beg INT, IN end INT) BEGIN prepare pr1 from 'update test.t1 set c2=? where c1=? '; WHILE beg <= end DO set @a=beg+1,@b=beg; execute pr1 using @a,@b; SET beg = beg+1; END WHILE; deallocate prepare pr1; END $$ DELIMITER ; Call pre_update (1100); EOF end_time = ` date + % s % 3 n ` echo "execution time is:" $(($end_time - $start_time)) "ms"Copy the code

Execution Result:

[root@computer-42 test]# bash prepare_update_id.sh mysql: [Warning] Using a password on the command line interface can be insecure. The execution time is 3862msCopy the code

Compared with before optimization:

The range of optimization is consistent with the theoretical expectation of prepare statement, but it is still not ideal.

Scheme 2: Execute multiple UPDATE statements in a transaction and commit once

The following script is used to generate 1000 lines of UPDATE statements in a transaction that update c2 values equal to or less than 1000 random numbers

#! /bin/bash echo "begin;" > update.sql for i in {1.. 1000} do echo "update t1 set c2=$((RANDOM%1000+1)) where c1=$i;" >> update.sql done echo "commit;" >> update.sqlCopy the code

Generate the following SQL statement

begin; update t1 set c2=279 where c1=1; update t1 set c2=425 where c1=2; update t1 set c2=72 where c1=3; update t1 set c2=599 where c1=4; update t1 set c2=161 where c1=5; . . update t1 set c2=775 where c1=1000; commit;Copy the code

The method of performing time statistics is as above

[root@computer-42 test]# bash update.sh mysql: [Warning] Using a password on the command line interface can be insecure. The execution time is 194msCopy the code

The execution time is 194ms=0.194 SEC

Compared with before optimization:

As you can see, multiple COMMIT operations have a significant impact on performance.

Solution 3: Combine multiple UPDATE statements using special SQL syntax

Merge multiple UPDATE statements Here we introduce an uncommon MySQL syntax:

1) Before optimization:

Update multi-line execution statements like “update XXX; update xxx; update xxx; . …”

2) After optimization:

Instead, the statement to be updated is first assembled into a view (result set table) and then updated with the result set table associated with the source table. This type of update has an implicit restriction of associating updates by primary key or unique index.

UPDATE t1 m, ( SELECT 1 AS c1, 2 AS c2 UNION ALL SELECT 2, 2 UNION ALL SELECT 3, 3 ... . UNION ALL SELECT n, 2 ) r SET m.c1 = r.c1, m.c2 = r.c2 WHERE m.c1 = r.c1;Copy the code

3) Concrete examples:

Create table t1(c1 int primary key,c2 int); # # # insert five rows of data insert into t1 values (1, 1), (2, 1), (3, 1), (4, 1), (5, 1); select * from t1; SELECT 1 AS c1, 2 AS c2 UNION ALL SELECT 2, 3 UNION ALL SELECT 3, 4 UNION ALL SELECT 4, 5 UNION ALL SELECT 5, 6 ) r SET m.c1 = r.c1, m.c2 = r.c2 WHERE m.c1 = r.c1; Select * from t1;Copy the code

Execution Result:

mysql> create table t1(c1 int primary key,c2 int); Query OK, 0 rows affected (0.03 SEC) mysql> insert into t1 values(1,1),(2,1),(3,1),(4,1),(5,1); Query OK, 5 rows affected (0.00 SEC) Records: 5 Duplicates: 0 Warnings: 0 mysql> select * from T1; +----+------+ | c1 | c2 | +----+------+ | 1 | 1 | | 2 | 1 | | 3 | 1 | | 4 | 1 | | 5 | 1 | +----+------+ 5 rows in set (0.00 SEC) mysql> update t1 m,(select 1 as c1,2 as c2 union all select 2,3 union all select 3,4 union all select 4,5 Union all select 5,6) r set m.c. =r.c.,m.c. =r.c. Query OK, 5 rows affected (0.01sec) rows matched: 5 Changed: 5 Warnings: 0 mysql> select * from t1; +----+------+ | c1 | c2 | +----+------+ | 1 | 2 | | 2 | 3 | | 3 | 4 | | 4 | 5 | | 5 | 6 | +----+------+ 5 rows in set (0.00 SEC)Copy the code

4) Further proof

Here, the author chooses to prove the correctness of the optimization method by observing the binlog generated by the execution of the statement.

The first is the unoptimized statement:

begin;
update t1 set c2=2 where c1=1;
update t1 set c2=3 where c1=2;
update t1 set c2=4 where c1=3;
update t1 set c2=5 where c1=4;
update t1 set c2=6 where c1=5;
commit;
Copy the code
. ### UPDATE `test`.`t1` ### WHERE ### @1=1 /* INT meta=0 nullable=0 is_null=0 */ ### @2=1 /* INT meta=0 nullable=1 is_null=0 */ ### SET ### @1=1 /* INT meta=0 nullable=0 is_null=0 */ ### @2=2 /* INT meta=0 nullable=1 is_null=0 */ . ### UPDATE `test`.`t1` ### WHERE ### @1=2 /* INT meta=0 nullable=0 is_null=0 */ ### @2=1 /* INT meta=0 nullable=1 is_null=0 */ ### SET ### @1=2 /* INT meta=0 nullable=0 is_null=0 */ ### @2=3 /* INT meta=0 nullable=1 is_null=0 */ . ### UPDATE `test`.`t1` ### WHERE ### @1=3 /* INT meta=0 nullable=0 is_null=0 */ ### @2=1 /* INT meta=0 nullable=1 is_null=0 */ ### SET ### @1=3 /* INT meta=0 nullable=0 is_null=0 */ ### @2=4 /* INT meta=0 nullable=1 is_null=0 */ . ### UPDATE `test`.`t1` ### WHERE ### @1=4 /* INT meta=0 nullable=0 is_null=0 */ ### @2=1 /* INT meta=0 nullable=1 is_null=0 */ ### SET ### @1=4 /* INT meta=0 nullable=0 is_null=0 */ ### @2=5 /* INT meta=0 nullable=1 is_null=0 */ . ### UPDATE `test`.`t1` ### WHERE ### @1=5 /* INT meta=0 nullable=0 is_null=0 */ ### @2=1 /* INT meta=0 nullable=1 is_null=0 */ ### SET ### @1=5 /* INT meta=0 nullable=0 is_null=0 */ ### @2=6 /* INT meta=0 nullable=1 is_null=0 */ .Copy the code

Then there is the optimized statement:

UPDATE t1 m, (
  SELECT 1 AS c1, 2 AS c2
  UNION ALL
  SELECT 2, 3
  UNION ALL
  SELECT 3, 4
  UNION ALL
  SELECT 4, 5
  UNION ALL
  SELECT 5, 6
 ) r
SET m.c1 = r.c1, m.c2 = r.c2
WHERE m.c1 = r.c1;
Copy the code
### UPDATE `test`.`t1` ### WHERE ### @1=1 /* INT meta=0 nullable=0 is_null=0 */ ### @2=1 /* INT meta=0 nullable=1 is_null=0 */ ### SET ### @1=1 /* INT meta=0 nullable=0 is_null=0 */ ### @2=2 /* INT meta=0 nullable=1 is_null=0 */ ### UPDATE `test`.`t1` ### WHERE ### @1=2 /* INT meta=0 nullable=0 is_null=0 */ ### @2=1 /* INT meta=0 nullable=1 is_null=0 */ ### SET ### @1=2 /* INT meta=0 nullable=0 is_null=0 */ ### @2=3 /* INT meta=0 nullable=1 is_null=0 */ ### UPDATE `test`.`t1` ### WHERE ### @1=3 /* INT meta=0 nullable=0 is_null=0 */ ### @2=1 /* INT meta=0 nullable=1 is_null=0 */ ### SET ### @1=3 /* INT meta=0 nullable=0 is_null=0 */ ### @2=4 /* INT meta=0 nullable=1 is_null=0 */ ### UPDATE `test`.`t1`  ### WHERE ### @1=4 /* INT meta=0 nullable=0 is_null=0 */ ### @2=1 /* INT meta=0 nullable=1 is_null=0 */ ### SET ### @1=4 /* INT meta=0 nullable=0 is_null=0 */ ### @2=5 /* INT meta=0 nullable=1 is_null=0 */ ### UPDATE `test`.`t1` ### WHERE ### @1=5 /* INT meta=0 nullable=0 is_null=0 */ ### @2=1 /* INT meta=0 nullable=1 is_null=0 */ ### SET ### @1=5 /* INT meta=0 nullable=0 is_null=0 */ ### @2=6 /* INT meta=0 nullable=1 is_null=0 */Copy the code

As you can see, the SQL statements recorded in the binlog are the same before and after optimization. This also shows that our optimized statement is equivalent to the original execution statement.

5) Analysis from the perspective of grammar

SELECT 1 AS c1, 2 AS c2 UNION ALL SELECT 2, 3 UNION ALL SELECT 3, 4 UNION ALL SELECT 4, 5 UNION ALL SELECT 5, 6) r -- SET m.cl1 = r.cl1, m.cl2 = r.cl2 WHERE m.cl1 = r.cl1Copy the code

Select * from temporary table r;

Mysql > select 1 as c1,2 as c2 union all select 2,3 union all select 3,4 union all select 4,5 union all select 5,6; + - + - + | c1 | c2 + - + - + | | | 1 | 2 | 2 | 3 | | | 3 | 4 4 5 | | | | | | 6 + - + - 5 + 5 rows in the set (0.00) sec)Copy the code

And you can see that this is the part of the data that we want to update, after the update. Join update t1 table and r table, we can update the corresponding part of t1 table to what we want, complete the operation of using one statement to complete the multi-row update.

6) Look at the execution plan

The following is the explain execution plan, using nested loop join, the outer loop table T1 as M filters out 5 pieces of data according to the condition M.c1 = R.C1, and the inner loop table R needs to be scanned once for every updated row of data, circulating for 5 times in total:

If you look at the execution plan, it doesn’t look like this statement is very efficient, so let’s actually execute it.

7) Practice test

The following script is used to generate an optimized UPDATE statement that updates the value of C2 to a random number up to 1000

#! /bin/bash echo "update t1 as m,(select 1 as c1,2 as c2 ">> update-union-all. SQL for j in {2.. 1000} do echo "union all select $j,$((RANDOM%1000+1))" >> update-union-all.sql done echo ") as r set m.c2=r.c2 where m.c1=r.c1" >> update-union-all.sqlCopy the code

Generate the following SQL statement

Update T1 as M,(select 1 as c1,2 as c2 union all select 2,644 union all select 3,322 union all select 4,660 union all Select 5,857 union all select 6,752... . Union all select 999,225 union all select 1000,77) as r set m.c. =r.c. Where m.c. =r.cCopy the code

Execute the statement

#! /bin/bash start_time= 'date +%s%3N'/SSD/TMP /mysql/bin/mysql -h127.0.0.1 -uroot -p3316-pabc123 -e \ "use test; source /ssd/tmp/tmp/1000/update-union-all.sql;" End_time = ` date + % s % 3 n ` echo "execution time is:" $(($end_time - $start_time)) "ms"Copy the code

Execution Result:

[root@computer-42 test]# bash update-union-all.sh mysql: [Warning] Using a password on the command line interface can be insecure. The execution time is 58msCopy the code

Compared with before optimization:

The comparison results of multiple tests are as follows:

conclusion

According to the above theoretical analysis and practical verification, we find an optimization method for batch update scenarios.

Enjoy GreatSQL 🙂

Article recommendation:

Technology sharing | MGR Best practices (MGR Best Practice) mp.weixin.qq.com/s/66u5K7a9u…

Share | wanli database MGR Bug fixes mp.weixin.qq.com/s/IavpeP93h road…

Macos system compile percona and some functions on Macos systems operation difference mp.weixin.qq.com/s/jAbwicbRc…

Technology sharing | use systemd manage MySQL standalone multi-instance mp.weixin.qq.com/s/iJjXwd0z1…

Products | GreatSQL, create a better ecological mp.weixin.qq.com/s/ByAjPOwHI MGR…

Products | GreatSQL MGR optimization reference mp.weixin.qq.com/s/5mL_ERRIj…

About GreatSQL

GreatSQL is a MySQL branch maintained by Wanli Database, which focuses on improving the reliability and performance of MGR. It supports InnoDB parallel query feature and is a MySQL branch version suitable for financial applications.

Gitee: gitee.com/GreatSQL/Gr…

Making: github.com/GreatSQL/Gr…

Wechat &QQ Group:

You can search to add GreatSQL Community Assistant wechat friends, and send the verification message “Add group” to join the GreatSQL/MGR communication wechat group

QQ group: 533341697 wechat assistant: WanliDBC

This article is published by OpenWrite!