Abstract: With the development of information technology, human beings have entered the era of big data, and data volumes are growing explosively. Data in the financial sector carries core business, so even when various hardware or software errors or disasters occur, the business must be able to recover, and to recover quickly. Backup and recovery capability has therefore become one of the most critical capabilities of a data warehouse.

This article is shared from the Huawei Cloud community by the author "Magician at the End of the Century": Fault Tolerance for GaussDB(DWS) Services: Physical Fine-grained Backup and Recovery Technology.

1. Technical Overview

1.1 Value and main contents

With the development of information technology, human beings have entered the era of big data, and data volumes are growing explosively. Data in the financial field carries core business, so even when various hardware or software errors or disasters occur, the business must be able to recover, and to recover quickly. GaussDB(DWS) supports physical fine-grained backup and recovery: users can back up an entire cluster or a subset of database elements and flexibly restore single or multiple tables, which effectively reduces the time window and storage space required for backup data and lets users focus on backing up and recovering the key tables in their service scenarios.

Physical fine-grained backup and recovery supports the following scenarios:

1. Centralized restoration of single or multiple tables from a fine-grained cluster-level full backup;

2. Back up the full data of a specified schema and restore single or multiple tables from that backup set.

2. Principle of technical solution

2.1 NBU Backup and Restoration Scheme

Roach is the GaussDB(DWS) database backup tool and supports various backup and recovery types and solutions. In the general Roach architecture, each cluster node runs a Roach Agent process that backs up that node's data, and the first node runs an additional Roach Master process for distributed cluster backup. Roach provides a non-invasive NBU backup solution, on which physical fine-grained backup and recovery is also based, so it is explained first. The Roach Client plug-in is deployed on the NBU Media Server; it receives backup data sent by the Roach Agent and forwards the data to the NBU Server.

The NBU cluster deployment mode is as follows:

Figure 1 NBU cluster deployment mode

Architecture of non-invasive NBU backup solution:

Figure 2 NBU non-invasive deployment scheme

As shown in the preceding figure (a three-node GaussDB(DWS) cluster), the NBU backup data flow is as follows:

1. The Roach Agent transmits compressed data to the Roach Client in fragments.

2. The Roach Agent invokes the XBSA interface of the NBU client to request NBU backup.

3. The NBU client forwards the request to the NBU Master Server.

4. The NBU Master Server allocates storage resources to the NBU Media Server.

5. The Roach Client invokes the XBSA interface to transfer the backup data to the NBU Media Server.

6. The NBU Media Server stores the backup data on the mounted tape drives or disks.
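As a rough illustration of this flow, the sketch below models the components as plain Python functions; the fragment size, function names, and return values are assumptions for illustration only and do not correspond to the real Roach or NBU (XBSA) interfaces.

    # A loose, illustrative model of the backup data flow above; every name here
    # is a hypothetical placeholder, not a real Roach or NBU (XBSA) API.
    FRAGMENT_SIZE = 4096  # assumed fragment size, for illustration only

    def nbu_master_allocate_media():
        """NBU Master Server allocates storage on an NBU Media Server (steps 3-4)."""
        return []  # stands in for a mounted tape drive or disk

    def roach_client_forward(media, fragment):
        """Roach Client transfers one fragment to the NBU Media Server (steps 5-6)."""
        media.append(fragment)

    def roach_agent_backup(compressed_data):
        """Roach Agent requests a backup and sends compressed data in fragments (steps 1-2)."""
        media = nbu_master_allocate_media()  # backup request routed via the NBU client/master
        for i in range(0, len(compressed_data), FRAGMENT_SIZE):
            roach_client_forward(media, compressed_data[i:i + FRAGMENT_SIZE])
        return media

    if __name__ == "__main__":
        stored = roach_agent_backup(b"\x00" * 10_000)
        print(f"{len(stored)} fragments written to the media server")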

2.2 Fine-grained meta-information generation scheme

To recover single or multiple tables from the backup set in a fine-grained manner, the metadata DDL of all schemas and tables in the database needs to be obtained, persisted, and backed up to the media. Note that DDL export takes a long time, so in the design DDL export backup and data backup are carried out in parallel to improve performance. The design idea of how Roach obtains the DDL is as follows:

Figure 3 Metadata acquisition scheme of physical fine-grained backup and recovery
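Assuming the parallel design implied above, a minimal sketch of overlapping DDL export with data backup might look like the following; export_ddl and backup_data are hypothetical stand-ins for the real Roach steps.

    # Illustrative only: run DDL export and data backup concurrently so that the
    # slow DDL export does not extend the overall backup window.
    import threading
    import time

    def export_ddl(database):
        time.sleep(0.1)  # DDL export is comparatively slow
        print(f"DDL of all schemas/tables in {database} exported, persisted, and backed up")

    def backup_data(database):
        time.sleep(0.1)
        print(f"physical data files of {database} backed up to the media")

    def fine_grained_backup(database):
        workers = [threading.Thread(target=export_ddl, args=(database,)),
                   threading.Thread(target=backup_data, args=(database,))]
        for w in workers:
            w.start()
        for w in workers:
            w.join()

    fine_grained_backup("test_tpch1")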

During the backup, to support fine-grained recovery, a mapping (MAP) needs to be obtained and backed up. It is built layer by layer from the database elements and covers:

Agent –> Instance –> Database –> Schema –> Table –> Related Relations

In actual execution, the meta mapping information is queried and stored in parallel for each physical node and instance. The design logic is as follows:

Figure 4 Obtaining the MAP of fine-grained backup and recovery files
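To make the layered MAP concrete, here is a minimal sketch of one way such a mapping could be represented; the structure, field names, and sample identifiers (instance and relation names) are illustrative assumptions, not Roach's actual format.

    # Hypothetical in-memory form of the fine-grained backup MAP:
    # Agent -> Instance -> Database -> Schema -> Table -> related relations.
    backup_map = {
        "agent_node1": {
            "dn_6001": {                               # instance
                "test_tpch1": {                        # database
                    "public": {                        # schema
                        "customer": {                  # table
                            "related_relations": ["customer_index", "customer_toast"],
                            "relfilenodes": [16400, 16401],
                        },
                    },
                },
            },
        },
    }

    # Walking the map layer by layer yields the minimal set of physical files a
    # table depends on; the restore path later uses this to filter backup files.
    def files_for_table(bmap, node, instance, database, schema, table):
        return bmap[node][instance][database][schema][table]["relfilenodes"]

    print(files_for_table(backup_map, "agent_node1", "dn_6001",
                          "test_tpch1", "public", "customer"))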

2.3 Producer-consumer model of data backup

Fine-grained full backup of GaussDB(DWS) is implemented in the Roach backup tool. When Roach performs a full cluster backup, it interacts with the GaussDB(DWS) kernel: after pg_start_backup() is executed, it backs up row-store data files and WAL files in sequence, and then runs pg_stop_backup() and backs up column-store data files. This sequence ensures that data is flushed to disk and backed up to the media paths managed by NBU in an orderly manner, with complete transaction guarantees.
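The ordering described above can be sketched as follows; run_sql and copy_files_to_media are hypothetical helpers standing in for Roach's interaction with the GaussDB(DWS) kernel and with the NBU-managed media path, so this is an illustration of the sequence rather than the actual implementation.

    # Minimal sketch of the backup ordering (illustrative only).
    def run_sql(statement):
        print(f"kernel <- {statement}")  # placeholder for the real kernel interaction

    def copy_files_to_media(kinds):
        print(f"backed up {', '.join(kinds)} files to the NBU-managed media path")

    def backup_instance():
        # Enter backup mode through the PostgreSQL-compatible interface.
        run_sql("SELECT pg_start_backup('roach_backup');")
        # Back up row-store data files and WAL files in sequence.
        copy_files_to_media(("row-store", "WAL"))
        # Leave backup mode, then back up column-store data files.
        run_sql("SELECT pg_stop_backup();")
        copy_files_to_media(("column-store",))

    backup_instance()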

During fine-grained data backup, the list of physical files to be backed up is first worked out from the MAP queried from the system tables in Section 2.2, layer by layer down to each database, schema, and table. This yields the minimum set of associated files that a table depends on, and the smallest logical granularity of the backup blocks created is the schema: tables in the same schema are written consecutively into the same logical block, which is physically cut according to the segment configuration (usually 4 GB). Writing the backup data to the media relies on the producer-consumer model described below. The data writer thread (producer) reads the data files under the instance block by block, compresses them, and writes them into a buffer. The data sending thread (consumer) takes data blocks from the buffer and calls the standard XBSA API to stream the data to the media layer, where the NBU Master allocates a Media Server and the data finally lands on disk. During appending, if a segment file exceeds the size limit, the backup is cut into files such as file_0.rch and file_1.rch. The producer-consumer model is shown below:

Figure 5 Producer-consumer model for backing up physical fine-grained data to the media
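The producer-consumer pipeline can be sketched roughly as follows. The block size, buffer depth, compression call, and local segment files are illustrative assumptions; the real consumer streams each block to the media layer through the XBSA API instead of writing local .rch files.

    # Illustrative producer-consumer sketch of the backup data pipeline.
    import queue
    import threading
    import zlib

    SEGMENT_SIZE = 4 * 1024 ** 3      # segment cut threshold (usually 4 GB)
    buffer = queue.Queue(maxsize=16)  # bounded buffer between the two threads

    def data_writer(data_files):
        """Producer: read data files block by block, compress, put into the buffer."""
        for path in data_files:
            with open(path, "rb") as f:
                while block := f.read(1 << 20):
                    buffer.put(zlib.compress(block))
        buffer.put(None)  # end-of-stream marker

    def data_sender(media_prefix):
        """Consumer: drain the buffer and stream blocks to the media, cutting a new
        segment file (file_0.rch, file_1.rch, ...) once the size limit is exceeded."""
        segment_index, written = 0, 0
        out = open(f"{media_prefix}_file_{segment_index}.rch", "wb")
        while (block := buffer.get()) is not None:
            if written + len(block) > SEGMENT_SIZE:
                out.close()
                segment_index, written = segment_index + 1, 0
                out = open(f"{media_prefix}_file_{segment_index}.rch", "wb")
            out.write(block)  # the real consumer calls the XBSA API here
            written += len(block)
        out.close()

    if __name__ == "__main__":
        with open("/tmp/example.dat", "wb") as f:
            f.write(b"\x01" * (3 << 20))  # sample input for the illustration
        producer = threading.Thread(target=data_writer, args=(["/tmp/example.dat"],))
        consumer = threading.Thread(target=data_sender, args=("/tmp/backup",))
        producer.start(); consumer.start()
        producer.join(); consumer.join()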

2.4 Fine-grained multi-table recovery solution

Currently, fine-grained recovery of multiple tables can be performed from a cluster-level full backup set or a Schema-level backup set. The core technologies of the two scenarios are the same. The scenarios are as follows:

  • One or more tables can be restored from a cluster-level full backup set at a time. The names of the tables to be restored are written into a configuration file, whose name is specified by the restore parameter --table-list.

  • When restoring single or multiple tables from a cluster-level full backup set, the specified tables to be restored can span multiple schemas.

  • The restored table can be placed in a schema different from that of the original table, but within the same database, and it can be given a new table name. The specified target schema may or may not already exist; if it does not, it is created during restoration. The target tables are configured in the file specified by --restore-target-list (an illustrative sketch of both list files follows this list). To restore all tables under their original names, --table-list and --restore-target-list can point to the same configuration file.

  • If the specified target table (under the original or the new table name) already exists, you can specify the --clean parameter to drop the table together with its dependent objects (such as views, indexes, and permissions) in a cascading manner before it is restored. If this parameter is not set, you need to manually confirm and drop the table before restoring it. This mainly covers backup-and-restore scenarios where the table names before and after are the same but the table definitions differ.

  • Fine-grained data is restored online: no data is cleared and the cluster is not stopped. After the recovery is complete, the table can be used directly, without extra time-consuming steps such as a build.
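For illustration, the two list files referenced above might look like the following, assuming one schema-qualified table name per line and a line-for-line correspondence between the two files; the actual file format expected by GaussRoach.py is not documented here, so treat this layout as an assumption.

    /home/omm/table.list (tables to restore from the backup set; format assumed):
        public.customer
        public.nation

    /home/omm/target.list (restore targets, line-for-line with table.list; format assumed):
        liding11.customer11
        liding22.nation22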

The following figure shows the main scenario design for recovery:

Figure 6 Logical diagram of fine-grained online recovery of single or multiple tables

The recovery steps are described as follows:

Step 1: After receiving the data requests from the Roach Agent corresponding to each instance on each node, the Roach Client establishes a connection with the NBU media, which starts a List search, obtains the files, and sends them to the Roach Client.

Step 2: The Roach Client and the Roach Agent of each node forward the data to be restored, including metadata and instance data, over a TCP connection; once the instance data has been obtained, it is stored in a buffer.

Step 3: Roach reads the list of tables to be restored, constructs a filter for the files to be written to disk during recovery, and restores only the backup files of the target tables.

Step 4: From the restored DDL, Roach parses out the meta information of the tables to be restored and creates the intermediate temporary (TMP) tables and the final target tables according to that meta information.

Step 5: A meta information map is created for the new TMP tables and mapped, entry by entry, to the MAP information of the originally backed-up tables in order to filter the files written to disk.

Step 6: The relfilenode of the backup data files is swapped to the relfilenode of the new TMP tables.

Step 7: The data in the TMP tables is queried and inserted into the final target tables.
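A highly simplified sketch of Steps 3 to 7 follows. All helpers passed into restore_tables are hypothetical stand-ins: in reality the relfilenode swap is performed by Roach on the physical files restored from the NBU media, and the DDL comes from the backed-up metadata rather than a lambda.

    # Illustrative outline of the fine-grained restore path (Steps 3-7 above).
    def restore_tables(tables_to_restore, ddl_of, backup_files_of, run_sql, swap_relfilenode):
        """tables_to_restore maps source table -> target table (possibly renamed)."""
        for source, target in tables_to_restore.items():
            # Step 3: filter the restored backup files down to this table only.
            files = backup_files_of(source)

            # Step 4: from the backed-up DDL, create a TMP table and the target table.
            tmp = f"roach_tmp_{target.replace('.', '_')}"
            run_sql(ddl_of(source, new_name=tmp))
            run_sql(ddl_of(source, new_name=target))

            # Steps 5-6: map the TMP table to the original table's MAP entry and swap
            # the restored data files' relfilenode to the TMP table's relfilenode.
            swap_relfilenode(files, tmp)

            # Step 7: load the data from the TMP table into the final target table.
            run_sql(f"INSERT INTO {target} SELECT * FROM {tmp};")

    if __name__ == "__main__":
        restore_tables(
            {"public.customer": "liding11.customer11"},
            ddl_of=lambda src, new_name: f"CREATE TABLE {new_name} (...);",
            backup_files_of=lambda src: [f"{src}_file_0.rch"],
            run_sql=print,
            swap_relfilenode=lambda files, tmp: print(f"swap relfilenode: {files} -> {tmp}"),
        )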

3. Application test of fine-grained backup and recovery technology

3.1 Test Environment

3.2 Execution of fine-grained recovery use cases

Here are some examples of use case execution for typical scenarios:

  • Schema level backup, restore single/multiple tables

Verification points:

  • The specified schema is backed up successfully.

  • Multiple tables are restored from the schema backup set to the target tables.

  • The data structure

  • Perform a schema-level backup:

    python $GPHOME/script/GaussRoach.py -t backup --master-port 9500 --media-destination nbu_policy --media-type NBU --metadata-destination $GAUSSHOME/roachbackup/metadata --physical-fine-grained --schemaname public --dbname test_tpch1 --parallel-process 3 --nbu-on-remote --nbu-media-list /home/omm/media.list --client-port 9200

  • Restore customer (a column-store table) and public.nation (a row-store table) from this schema backup set to the specified targets liding11.customer11 and liding22.nation22.

  • Select * from table_name;

    python $GPHOME/script/GaussRoach.py -t restore --master-port 9500 --media-destination nbu_policy --media-type NBU --metadata-destination $GAUSSHOME/roachbackup/metadata --physical-fine-grained --backup-key 20201226_101715 --dbname test_tpch1 --table-list /home/omm/table.list --parallel-process 3 --restore-target-list /home/omm/target.list --clean --nbu-on-remote --nbu-media-list /home/omm/media.list --client-port 9200

  • Data validation

4. Technical summary

This article has analyzed the physical fine-grained backup and recovery technology of GaussDB(DWS) from several dimensions: technical value, application scenarios, technical principles, and test demonstration. Physical fine-grained backup and recovery is an effective enhancement to full data backup and restoration. Customers can plan hot and cold data more flexibly and back up or restore at a smaller logical granularity, which saves valuable backup storage space and CPU resources and minimizes the impact on online services. At the recovery level, unlike traditional cluster-level full recovery, which requires stopping the cluster and clearing its data, fine-grained online recovery does not affect any online services and carries no risk of data loss from clearing the cluster before recovery. This technology therefore has broad prospects and far-reaching significance.
