GreatSQL feature, InnoDB parallel query optimization test

GreatSQL will soon be open source with two new features: InnoDB transaction lock optimization and InnoDB engine parallel query optimization, which is a combination of Patch contributed by Huawei Cloud.

How is InnoDB parallel query optimization implemented?

According to the characteristics of the B+ tree, the B+ tree can be divided into several subtrees, in which multiple threads can scan different parts of the same InnoDB table in parallel. The execution plan of each sub-thread is consistent with the original execution plan of MySQL, but each sub-thread only needs to scan part of the data of the table, and then summarize the results after the sub-thread completes the scanning. Multi-thread transformation can make full use of multi-core resources and improve query performance.

After optimization, it has excellent performance in TPC-H test, with a maximum improvement of 30 times and an average improvement of 15 times.

This feature applies to SAP and financial statistics services such as periodic data summary reports, for example, batch running at the beginning and end of the month.

Restrictions on use:

Subquery is not supported at present, but can be converted to JOIN.
For now, only ARM architecture platform is supported, and the optimization of X86 architecture platform will be completed as soon as possible.

About the Patch for details, please see: support.huaweicloud.com/fg-kunpengd…

In this paper, the parallel query optimization feature of InnoDB engine is compared and tested.

1. Test environment

Server: Shenzhou Kuntai R222, Huawei Hi1616 * 2,256 GB memory. Operating system: Docker 20.10.2, CentOS Linux release 7.9.2009, Linux 4.15.0-29-generic In this test, TPC-H and DBGEN were used to construct the test data parameter DBGEN-VF-S 50. After importing, the physical size of the database was about 70G. GreatSQL Key configuration:

Innodb_buffer_pool_size =96G #InnoDB parallel query optimization #global level, set parallel query switch Bool, on/off. The default value is off. The parallel query feature is disabled. It can be dynamically modified online. The force_parallel_execute = ON #global level sets the total number of parallel query threads in the system. Valid values range from 0 to ULONG_MAX, and the default is 64. Parallel_max_threads = 64 #global level, the maximum amount of memory used by the leader and worker threads for parallel execution. The range of valid values is (0, ULONG_MAX) and the default is 1G parallel_memory_limit = 32GCopy the code

2. Test data

During testing, be careful to ensure that each query is based on pure memory scenarios, that is, ensure that innodb_BUFFer_POOL_size is greater than the database physical size, and ensure that no additional physical I/O occurs during the query.

Add the value of the temptable_max_ram option to create a temporary table. The default value of this option is 1 GB. Given the amount of data tested above, it needs to be increased to 4 GB. If The value of this option is not enough, you may run an error such as The table ‘/ TMP /# SQL57_a1_0 ‘is full and exit The query.

The InnoDB parallel query feature can be easily used with the HINT syntax. First check that the feature is enabled (it can be dynamically opened online) :

$ mysqladmin var|grep force_parallel_execute
| force_parallel_execute                                   | ON
Copy the code

So by default, all SQL can automatically use parallel query as long as it meets the conditions, by checking the execution plan to confirm:

mysql> EXPLAIN SELECT ... FROM ... WHERE ... . Parallel execute (4 workers) ...Copy the code

You can see that the execution plan output contains the Parallel execute (4 workers) keyword, which means that up to four threads can be queried in Parallel.

You can also view the tree execution plan:

mysql> EXPLAIN FORMAT=TREE SELECT ... FROM ... WHERE ... . | -> Limit: 1 row(s) -> Sort: lineitem.l_returnflag, lineitem.l_linestatus, limit input to 1 row(s) per chunk -> Table scan on <temporary> -> Aggregate using temporary table -> Parallel scan on <temporary> -> Sort: lineitem.l_returnflag, lineitem.l_linestatus -> Table scan on <temporary> -> Aggregate using temporary table -> Filter: (lineitem.l_shipdate <= <cache>((DATE'1998-12-01' - interval '88' day)) (cost=6342898.28 rows=19669815) -> PQblock scan On lineitem (cost=6342898.28 rows=59015354)...Copy the code

PQblock scan on… Cost =6342898.28, which is one of the conditions for enabling parallel queries. That is, cost exceeds the threshold switch set by parallel_cost_threshold = 1000.

If you do not want to enable parallel query, add the corresponding HINT:

mysql> SELECT /*+ NO_PQ */ ... FROM ... WHERE ...
Copy the code

You can also dynamically adjust the number of parallel threads to a maximum of 64:

mysql> SELECT /*+ PQ(64) */ ... FROM ... WHERE ...
Copy the code

Ok, let’s go straight to the result comparison data:

TPCH	Parallel scan (default) Time (seconds)	Parallel scanning (after parameter optimization) Time (seconds)	Before optimization Time (seconds)	Parallel scanning vs improvement before optimization	Parallel scan optimization improves before and after
Q1	616.407015	43.688772	1396.791060	31.971	14.109
Q3	139.579648	24.343778	330.946837	13.595	5.734
Q5	343.604734	30.501792	338.576433	11.100	11.265
Q6	248.830780	20.128220	233.490352	11.600	12.362
Q10	155.077042	41.948881	263.921069	6.291	3.697
Q12	325.281718	24.850585	582.405888	23.436	13.089
Q19	17.475904	5.296522	42.447522	8.014	3.300

Here’s a quick summary of the results:

1. The average increase is about 14 times, and the highest increase is about 32 times.
2. If the concurrency is higher, the optimization effect is better.
3. The original SQL performance of Q5 is not improved much, but the performance of Q5 is improved significantly (from 28% to 11 times) after the JOIN order is adjusted.

The full text.

Enjoy GreatSQL 🙂

read

GreatSQL officially open source for financial grade applications
MGR Best Practice
Wanli Database MGR Bug repair road
MySQL high availability automatic switch MGR
Install GreatSQL/MySQL on Linux
GreatSQL, create a better MGR ecology

Scan code to add GreatSQL Community Assistant wechat friends

Send “Add group” to join GreatSQL/MGR exchange wechat group \

GreatSQL feature, InnoDB parallel query optimization test

1. Test environment

2. Test data

read

Related Posts

List of commonly used Atom plug-ins (2020.07.30 continuously updated…)

Overview of Spring Security (1

Cs107 Programming Paradigm (12)