Shu Lin, more than 20 years of database kernel development experience. Former IBMDB2 database kernel expert, specializing in database kernel performance optimization, SQL query optimization, MPP distributed data warehouse technology, etc. Now I am working for Huawei Research Institute in Canada. I am fully involved in the research and development of RDS for MySQL and GaussDB(for MySQL), and familiar with GaussDB(for MySQL) full stack technology. Responsible for the overall architecture design and implementation of NDP, and successfully launched it. He has many technical invention patents and co-authored SIGMOD 2020 Taurus(GaussDB(for MySQL)) Paper. He is currently focusing on the research of intelligent optimizer for next generation cloud database.

Service growth poses new challenges to database throughput and responsiveness

Demand for cloud-based database as a Service (DBaaS) products is also growing rapidly as businesses and government agencies migrate their applications to the cloud. In traditional DBaaS products, cloud service providers deploy regular databases on cloud VMS based on the existing database software and use local or cloud storage. This approach is easy to implement, but does not provide sufficient performance and scalability, and storage costs are high due to the need to replicate data.

To address these challenges, cloud service providers are beginning to build new cloud-native relational database systems, designed specifically for cloud infrastructure, often with designs that separate computing and storage into independently extended distributed layers. This approach has several advantages, including automatic scaling of database storage, pay-per-use capabilities, high reliability for deployment across multiple AZs, and fast failover and recovery from failures. These cloud native designs also help reduce data update latency for read-only copies and improve hardware sharing and scalability. Huawei cloud database GaussDB(for MySQL) is a cloud native distributed database with the above advantages.

As computing and storage nodes communicate over networks, network bandwidth and latency are often bottlenecks. To overcome this challenge, GaussDB(for MySQL) optimizes network traffic associated with writes by removing page writes from database nodes and pushing checkpoint operations down to storage nodes. GaussDB(for MySQL) database nodes send REDO logs to storage nodes instead of data pages. Because REDO logs (which record changes to data pages) are typically much smaller than the modified data pages, this approach reduces network traffic. A storage node (also known as a page store) builds database pages from REDO logs and returns pages to the database node in response to requests from the database node.

In a traditional database, the SQL execution engine fetches data from the store and performs steps including projection, predicate calculation, and aggregation. For analysis queries that often involve large table scans, the SQL execution engine must read a large number of data pages from the store. When storage nodes are separated from computing nodes and communicate over networks, large table scanning translates into increased network traffic. Is a typical example of a very large table to count query, query object table all pages must be from the Page (Page stores) sent to the storage pool to count the database node, after the database node will discard most of these pages, because the buffer pool can’t load so much data Page, this is a waste of bandwidth resources. Huawei Cloud’s innovative Near Data Processing (NDP) solution solves this problem.

GaussDB(for MySQL) NDP details

NDP is designed to avoid moving data around a distributed system and let data processing take place where it is stored. In cloud native databases, storage nodes typically consist of a large number of powerful servers, and CPU resource utilization on these storage nodes is often low, which presents an excellent opportunity for near data processing (NDP).

The NDP function of GaussDB(for MySQL) pushes selected SQL operations down to the page store. The page store filters out unnecessary data and returns only a subset of matched data to the database node for further processing. For example, to process count queries, the page data store can count rows and return the count instead of the actual data page to the database node. This avoids a lot of network traffic and improves query response times using this technique.

GaussDB(for MySQL) can push three SQL operations to page storage: column projection, predicate evaluation, and aggregation.

  • Column projection: Page storage reduces the length of rows by keeping only the columns needed for the query and discarding the rest.

  • Predicate evaluation: The page data store retains only the rows that satisfy the predicate and discards those that do not.

  • Aggregation: The page aggregates multiple rows into a single row as required by the aggregation function in the query, discarding the original rows.

These three SQL operations can appear in any combination in the NDP. For example, an NDP operation might contain only column projections, or it might contain all three SQL operations. Let’s look at a sample SQL query:

Sele ctsum(salary) from worker where age< 40 and join_date>= date ‘2010-01-01’ and join_date< date ‘2010-01-01’ + The interval is 1 year

For each row in the “worker” table, The page stores the calculation predicate “AGE < 40 and JOIN_DATE >= date ‘2010-01-01’ and Join_date < date ‘2010-01-01’ + interval ‘1’ year”. If the row does not satisfy the predicate, it is immediately discarded. If the row satisfies the predicate, it is aggregated into the sum(salary) value and the original row is discarded. If the page data store cannot aggregate rows (due to some internal processing requirements), it can still project three columns (salary, age, and Join_date) from the row and generate narrower rows. After that, the original row is discarded. Finally, a significantly reduced dataset is returned to the database node.

The following figure shows the NDP architecture of GaussDB (for MySQL). Database nodes send NDP requests to the page store (note that there are usually multiple page stores serving each database node). To reduce IOPS (IO per second), multiple pages are grouped into one NDP request (batch page read request). The NDP operator in the page store can perform the above three SQL operations. The smaller dataset is returned to the database node. The database node can be either a master node or a read-only copy node, both of which support NDP.

Batch reading and parallel processing in NDP

In a cloud native database system, even though database nodes and page storage are connected over a high-speed RDMA network, latency is still very high compared to local storage in a traditional database. The negative impact of latency can be reduced by reducing network IOPS and executing multiple IO in parallel. In NDP functionality, we implement the concept of “batch reads”. The idea is to look ahead in the B+ leaf data nodes that are needed for queries that are doing near-data processing and group adjacent leaf data nodes into a batch request. Batch reads are an excellent way to reduce IOPS. If we send one page per request, the number of IO will equal the number of pages. If we group 1000 pages into a single request, the number of IO is reduced by a factor of 1000.

The following figure illustrates how batch reads work. The database node sends batch requests, SAL identifies the page store where the page resides, and splits the batch read into multiple sub-reads: one sub-read is stored per page. The child reads are then sent in parallel to the page store. With this approach, multiple page stores can be used simultaneously to service NDP requests.

The page store receives NDP requests containing multiple pages that have no dependencies between them and can therefore be processed in any order using NDP. This provides flexibility and enables the page store to allocate pages to multiple threads for parallel processing.

GaussDB(for MySQL) uses the enhanced SQL optimizer to automatically determine whether NDP is likely to be beneficial to a particular query. If advantageous, it automatically enables NDP, the SQL optimizer looks at factors such as scan size, and whether SQL operations can significantly reduce the data set size if pushed to page storage. In general, NDP is not conducive to small scans, for example, when indexes can be used to reduce the amount of data to be scanned.

At the same time, the NDP has its own resource demands. In the database node, NDP is primarily a memory resource because it needs memory to hold NDP pages. In GaussDB(for MySQL) database nodes, NDP pages share the same memory pool (aka buffer pool) as regular pages, with no memory reserved exclusively for NDP. The advantage of this approach is that the entire buffer pool can be used for normal processing when there is no NDP in the system. However, once the page memory is occupied by the NDP operation, it cannot be used by other queries until the NDP operation is complete. This is why it is important to control the number of NDP pages allocated to avoid regular pages being stripped of memory.

NDPQ(NDP+PQ), release query extreme performance, define a new direction of distributed database

Parallel Query (PQ) is the de facto standard for commercial relational database systems, providing high-performance support for analysis workloads. PQ usually adopts a “leader-worker” design, in which the table to be processed is divided into non-overlapping data blocks and these data blocks are distributed among multiple workers for processing. Each worker will produce intermediate results, and the leader will accumulate these results and further process them to produce the final results. PQ provides parallelism in the database node, using multiple cpus to process queries concurrently. Huawei Cloud GaussDB(for MySQL) has the PQ feature. NDP and PQ work together to further improve query performance. NDP can be enabled for PQ workers. Some SQL operations performed by PQ workers can be pushed to the page storage area. By combining NDP and PQ, we activate the magic of parallel processing in the three layers of database nodes of GaussDB system, between multiple page storage and within a page storage to further improve query performance.

How do I enable NDP?

GaussDB(for MySQL) automatically determines whether NDP helps query and enables NDP for query. All the user needs to do is turn on the system variable “ndP_mode”. Ndp_mode can be turned on for the entire database or just for the current session. To enable ndP_mode for the entire database, add the “global” keyword to the “set” command, as follows:

set[global] ndp_mode = on

You can use the Explain query to see if NDP is enabled for the query. For example, here is the interpreted output of tPC-H query 14 in tree format. NDP is enabled for LINEITEM table scans, and projections and predicate calculations are pushed to the data page store. In addition, PQ is enabled for LINEITEM table scanning.

Here is another example, a count query on the LINEITEM table, which we will name Q002. Predicate calculations and aggregations are pushed to the page store, and PQ is enabled.

Let’s show how NDP and PQ can improve query efficiency by running multiple queries on a 100GB TPC-H database.

Test environment:

  • Huawei Cloud GuassDB (for MySQL) in Shanghai -1 region

  • CPU: 16, memory: 64GB, buffer pool size: 20GB

  • Increase join_buffer_size and sort_buffer_size to 1MB, as these two buffers are important for hash join and sort performance

  • PQ concurrency is set to 16

The Y-axis below shows the query response time acceleration factor. Definition of acceleration factor: If the original query time is 100 seconds and the query time changes to 50 seconds after PQ is enabled, the acceleration factor is 2.

As you can see from the test results below, the NDP+PQ accelerates the Q002 by over 100 times.

NDP decouples database nodes from storage nodes, a feature that will become a standard for future cloud native database systems. Large scans are common in OLAP workloads, and NDP will greatly improve the efficiency of such operations.

To sum up, NDP can:

  • Reduce network bandwidth usage

  • Reducing network IOPS

  • Concurrent use of multiple page data stores for NDP parallel processing

  • Improves performance for SQL queries that require large table scans

  • Reduce the CPU utilization of the database node to enable the database node to support more OLTP workloads