GreatSQL will soon be open source with two new features: InnoDB transaction lock optimization and InnoDB engine parallel query optimization, which is a combination of Patch contributed by Huawei Cloud.

How is InnoDB parallel query optimization implemented?

According to the characteristics of the B+ tree, the B+ tree can be divided into several subtrees, in which multiple threads can scan different parts of the same InnoDB table in parallel. The execution plan of each sub-thread is consistent with the original execution plan of MySQL, but each sub-thread only needs to scan part of the data of the table, and then summarize the results after the sub-thread completes the scanning. Multi-thread transformation can make full use of multi-core resources and improve query performance.

After optimization, it has excellent performance in TPC-H test, with a maximum improvement of 30 times and an average improvement of 15 times.

This feature applies to SAP and financial statistics services such as periodic data summary reports, for example, batch running at the beginning and end of the month.

Restrictions on use:

  • Subquery is not supported at present, but can be converted to JOIN.
  • For now, only ARM architecture platform is supported, and the optimization of X86 architecture platform will be completed as soon as possible.

About the Patch for details, please see: support.huaweicloud.com/fg-kunpengd…

In this paper, the parallel query optimization feature of InnoDB engine is compared and tested.

1. Test environment

Server: Shenzhou Kuntai R222, Huawei Hi1616 * 2,256 GB memory. Operating system: Docker 20.10.2, CentOS Linux release 7.9.2009, Linux 4.15.0-29-generic In this test, TPC-H and DBGEN were used to construct the test data parameter DBGEN-VF-S 50. After importing, the physical size of the database was about 70G. GreatSQL Key configuration:

Innodb_buffer_pool_size =96G #InnoDB parallel query optimization #global level, set parallel query switch Bool, on/off. The default value is off. The parallel query feature is disabled. It can be dynamically modified online. The force_parallel_execute = ON #global level sets the total number of parallel query threads in the system. Valid values range from 0 to ULONG_MAX, and the default is 64. Parallel_max_threads = 64 #global level, the maximum amount of memory used by the leader and worker threads for parallel execution. The range of valid values is (0, ULONG_MAX) and the default is 1G parallel_memory_limit = 32GCopy the code

2. Test data

During testing, be careful to ensure that each query is based on pure memory scenarios, that is, ensure that innodb_BUFFer_POOL_size is greater than the database physical size, and ensure that no additional physical I/O occurs during the query.

Add the value of the temptable_max_ram option to create a temporary table. The default value of this option is 1 GB. Given the amount of data tested above, it needs to be increased to 4 GB. If The value of this option is not enough, you may run an error such as The table ‘/ TMP /# SQL57_a1_0 ‘is full and exit The query.

The InnoDB parallel query feature can be easily used with the HINT syntax. First check that the feature is enabled (it can be dynamically opened online) :

$ mysqladmin var|grep force_parallel_execute
| force_parallel_execute                                   | ON
Copy the code

So by default, all SQL can automatically use parallel query as long as it meets the conditions, by checking the execution plan to confirm:

mysql> EXPLAIN SELECT ... FROM ... WHERE ... . Parallel execute (4 workers) ...Copy the code

You can see that the execution plan output contains the Parallel execute (4 workers) keyword, which means that up to four threads can be queried in Parallel.

You can also view the tree execution plan:

mysql> EXPLAIN FORMAT=TREE SELECT ... FROM ... WHERE ... . | -> Limit: 1 row(s) -> Sort: lineitem.l_returnflag, lineitem.l_linestatus, limit input to 1 row(s) per chunk -> Table scan on <temporary> -> Aggregate using temporary table -> Parallel scan on <temporary> -> Sort: lineitem.l_returnflag, lineitem.l_linestatus -> Table scan on <temporary> -> Aggregate using temporary table -> Filter: (lineitem.l_shipdate <= <cache>((DATE'1998-12-01' - interval '88' day)) (cost=6342898.28 rows=19669815) -> PQblock scan On lineitem (cost=6342898.28 rows=59015354)...Copy the code

PQblock scan on… Cost =6342898.28, which is one of the conditions for enabling parallel queries. That is, cost exceeds the threshold switch set by parallel_cost_threshold = 1000.

If you do not want to enable parallel query, add the corresponding HINT:

mysql> SELECT /*+ NO_PQ */ ... FROM ... WHERE ...
Copy the code

You can also dynamically adjust the number of parallel threads to a maximum of 64:

mysql> SELECT /*+ PQ(64) */ ... FROM ... WHERE ...
Copy the code

Ok, let’s go straight to the result comparison data:

TPCH Parallel scan (default)

Time (seconds)
Parallel scanning (after parameter optimization)

Time (seconds)
Before optimization

Time (seconds)
Parallel scanning vs improvement before optimization Parallel scan optimization improves before and after
Q1 616.407015 43.688772 1396.791060 31.971 14.109
Q3 139.579648 24.343778 330.946837 13.595 5.734
Q5 343.604734 30.501792 338.576433 11.100 11.265
Q6 248.830780 20.128220 233.490352 11.600 12.362
Q10 155.077042 41.948881 263.921069 6.291 3.697
Q12 325.281718 24.850585 582.405888 23.436 13.089
Q19 17.475904 5.296522 42.447522 8.014 3.300

Here’s a quick summary of the results:

  • 1. The average increase is about 14 times, and the highest increase is about 32 times.
  • 2. If the concurrency is higher, the optimization effect is better.
  • 3. The original SQL performance of Q5 is not improved much, but the performance of Q5 is improved significantly (from 28% to 11 times) after the JOIN order is adjusted.

The full text.

Enjoy GreatSQL 🙂

read

  • GreatSQL officially open source for financial grade applications
  • MGR Best Practice
  • Wanli Database MGR Bug repair road
  • MySQL high availability automatic switch MGR
  • Install GreatSQL/MySQL on Linux
  • GreatSQL, create a better MGR ecology

Scan code to add GreatSQL Community Assistant wechat friends

Send “Add group” to join GreatSQL/MGR exchange wechat group \