Author: Rao Longhui Huawei cloud database kernel expert

Huawei cloud database team recently launched the expert technical interpretation series, which is based on GaussDB products and provides in-depth analysis on one core technology point in each phase. In the last issue, Ms. Lu Manyi, chief scientist of Database Lab of Huawei Swedish Institute, interpreted “How Powerful is the Vertical Integration of GaussDB(for MySQL) Cloud Stack?” In this issue, Rao Longhui, kernel expert of Huawei cloud database, will explain in detail the key feature of GaussDB(for MySQL) called “PQ: Parallel Query” introduced in Lv Manyi’s article.

In general, there are two very broad application scenarios for databases, namely “transaction processing” and “query analysis”. As one of the most used relational databases in China, open source MySQL began to support transaction processing scenarios as early as version 5.1. As the technology evolves, open source MySQL has continued to enhance its transaction processing capabilities in the iteration process, and the latest MySQL 8.0 version has started to support many query analysis features, such as Window Function, CTE, Hash Join, column histogram, parallel COUNT calculation, and so on.

Huawei cloud GaussDB(for MySQL) is huawei’s latest generation of high-performance enterprise-level distributed relational databases. It is rooted in the MySQL open source community. It not only inherits all open source query analysis features, but also develops many enhanced features for enterprise-level application query load, such as parallel query and query computing push-down (NDP: Near Data Processing, etc.

In traditional data solutions, “transaction” and “query analysis” are performed on different types of libraries, because transaction processing focuses on data addition, deletion, modification, and small data query, with more emphasis on real-time response, high throughput, and transactional requirements. “Query analysis” focuses on complex computation with large data volume, and pays more attention to data capacity expansion and complex computing capability. The following figure shows a traditional data solution: AN OLTP database for online transaction processing and an OLAP data warehouse for analytical queries.

What kind of database does the customer need?

At present, the industry’s universal requirements for database are: real-time response, high throughput, transactional, good scalability and support for complex queries. This has a number of advantages for consumers: first, it reduces the deployment cost. Users only need to deploy a set of database clusters; Secondly, the problem of data migration delay is solved. Update and query run in the same cluster. There is also support for transactional query, to meet the requirements of consistency, principle, isolation and persistence, query results will be more accurate.

Huawei cloud native database GaussDB(for MySQL) is capable of transaction processing and query analysis through technological innovation. GaussDB(for MySQL) is an OLTP transaction relational database that is 100% compatible with MySQL syntax. In the architecture, computing and storage resources are separated. Computing resources and storage resources can be independently expanded online. The storage layer uses a huawei-developed distributed storage system DFV (data virtualization: Data Function Virtualisation), up to 128TB capacity, a single compute node can support a maximum of 64 core cpus, 512 gb memory capacity, and a maximum of 1 write node and 15 read-only nodes.

In addition to powerful resource expansion capabilities, GaussDB(for MySQL) has been optimized for transaction processing and query analysis to greatly enhance transaction read/write and query analysis capabilities. This article focuses on one of the enhanced query analysis capabilities: parallel queries.

Improve query analysis: Release CPU multi-core computing resources

As is known to all, the improvement of software computing capability on the one hand benefits from the enhancement of CPU hardware capacity, on the other hand also benefits from the software design level to make full use of CPU computing resources. Currently, cpus use multiple cores. For example, GaussDB(for MySQL) supports a maximum of 64 cores. Single-threaded query can use only one CPU core at most, but the performance improvement is limited, which is far from meeting the requirements of reducing latency in the scenario of large data volume query. Therefore, the complex query analytical computing process must take full advantage of the CPU’s multi-core computing resources and allow multiple cores to participate in the parallel computing task to greatly improve the processing efficiency of query computing.

The following is an example of the process of computing count(*) of a table in parallel using multiple CPU cores: the table data is sliced and distributed to multiple cores for parallel computing. Each core computes part of the data to obtain an intermediate count(*) result, and aggregates all the intermediate results to obtain the final result.

GaussDB(for MySQL) Query analysis promotion: Parallel query

GaussDB(for MySQL) supports parallel query, which reduces the processing time in analytical query scenarios and meets the requirements for low query latency in enterprise-level applications. As mentioned above, the basic implementation principle of parallel query is to divide query tasks and distribute them to multiple CPU cores for calculation, and make full use of CPU multi-core computing resources to shorten the query time. The performance improvement multiple of parallel query is positively correlated with the number of CPU cores in theory, that is, the higher the parallelism, the more CPU cores can be used and the higher the performance improvement multiple is.

The following figure shows the query duration for COUNT(*) of 100 gb data on the 64U instance of GaussDB(for MySQL). The query duration varies with the concurrency. The higher the concurrency, the shorter the query duration.

GaussDB(for MySQL) supports multiple types of parallel query operators to meet different complex query scenarios. The latest version (2021-9) supports the following parallel query scenarios:

  • Primary key query, secondary index query
  • Primary key scan, index scan, range scan, index equivalent query, index reverse query
  • Parallel condition filtering (WHERE/HAVING), projection calculation
  • Parallel multi-table JOIN (including HashJoin, NestLoopJoin, SemiJoin, etc.) query
  • Parallel aggregation function calculation, including SUM/AVG/COUNT/BIT_AND/BIT_OR/BIT_XOR, etc
  • Parallel expression operation, including arithmetic operation, logic operation, general function operation and mixed operation
  • Parallel group by, sort order BY, limit/offset, distinct operation
  • Parallel UNION, subquery, view query
  • Parallel partitioned table query
  • Data types supported by parallel queries include: integer, character, time, floating point, and so on
  • Other queries

The following figure shows the performance test result of the parallel query of GaussDB(for MySQL) for 22 TPC-H queries. The test data amount is 100 GB, and the concurrent thread data is 32. The following figure shows the performance improvement of a parallel query compared to a traditional MySQL single-thread query: under 32 parallel execution, the performance of a complex query with a single table is up to 26 times better, and the performance of a general query is 20+ times better. Multi-table JOIN complex queries can improve performance by up to 27 times and generally by 10+ times. Sub-query performance is also greatly improved.

In a word, the parallel query of GaussDB(for MySQL) fully invokes CPU multi-core computing resources, greatly reducing the processing time of analytical query scenarios, greatly improving database performance, and meeting customers’ requirements for low latency in multiple complex query scenarios. At present, the parallel query function of GaussDB(for MySQL) is available on the whole network. More parallel query scenarios are being unlocked. Please visit huawei cloud official website to experience them.

www.huaweicloud.com/product/gau…

Ps: Didn’t you enjoy it? Next time, another expert will explain GaussDB(for MySQL) “parallel DDL” technology, more exciting, please look forward to!

Huawei will hold Huawei Connect 2021 at Shanghai World Expo Center & Expo Convention & Exhibition Center from September 23 to 25, 2021. With the theme of “Deep Digitization”, huawei will gather industry thought leaders, business elites, technology giants, pioneer enterprises, ecological partners, application service providers and developers to discuss how to deepen the industry scene. Combine digital technology with industry knowledge deeply, truly integrate into the main business process of government and enterprise, solve core business problems, promote experience improvement, efficiency improvement and model innovation; We also release scenario-oriented products and solutions, share the latest achievements and practices of customers and partners, and build an open and win-win healthy ecology.

For more information, please visit our official website **www.huawei.com/hc2021**