How does CloudQuery implement data import and export on the cloud

With the “data security law” landing, data security has become a hot topic. When we talk about data value within an organization, we obviously don’t mean data lying in static Excel spreadsheets. The concept of “big data” has emerged amid an exponential rise in data volumes and increasingly complex data structures. The appearance of massive data causes the capacity of database to undergo great test at any time, and the performance fluctuation also leads to more and more frequent backup, import, and export actions of DBA.

While the industry is embracing the era of cloud and big data, the tools we use every day are still in the PC era. When data is processed in batches, single-point tools cannot cope with the increasing data volume and more flexible operation scenarios. Therefore, CloudQuery in version 1.4.0 introduces the concept of “DTS”, or “Database Toolbox Service”, which aims to provide data administrators with various tools for data batch operation and greatly improve work efficiency. In this article we will elaborate on the first tool of CloudQuery “DTS”, “Data Import and Export”.

In the past, DBAs usually need to use existing desktop tools to implement data backup, data import and export, which can not only guarantee the success rate, but also difficult to achieve both in the selection of file format. If the data is exported in a data-friendly Excel format, it has high requirements on the amount of data. Once the amount of data exported exceeds the capacity of excel, the export operation will fail. If you choose SQL format, although the problem of data volume limitation is avoided, it is difficult to locate data, and special fields cannot be imported again.

In addition, the import and export tools on the market generally do not have the cloud data storage capability, and the exported files can only be stored on the local export machine. If you need to save or share data, you need to use other cloud disk tools. However, because the files are often large, uploading and downloading take a long time and are not flexible.

In addition, import and export is a high-risk action in an enterprise. Once database account information leakage or theft occurs, it will bring great threats to enterprise data security. It is also mentioned in the opening chapter that data is usually the foundation of an enterprise, and the leakage of internal data is often more deadly than external damage.

Therefore, CloudQuery evaluates the usage scenarios of data administrators and provides unique “import and export” functions based on actual service requirements. The main features are as follows:

① Export files to the cloud storage system

The platform provides the function of personal folder, each user belongs to their own cloud space. All exported data is kept on the server, enabling cross-client data synchronization and enabling operators to obtain remote data from different clients at any time without sharing data through various mobile hard disks or cloud disks.

② The tool permission is controlled

CloudQuery includes the tool permission into the platform permission control system. When an employee uses the import and export function, the administrator must grant the corresponding permission to the employee. Otherwise, the tool cannot be used. The inclusion of tool use permission into the platform permission control system implements the security principles of CloudQuery as a data control platform, incorporating all user actions on the platform into the management and control system to reduce the risk of data leakage as much as possible and prevent massive data leakage and dragging table events caused by account leakage or theft.

③ Cloud native form, more flexible operation mode

In the enterprise, DBAs are usually faced with various and complex data interaction scenarios, such as scheduled backup of data, import and export of data from production to test database, etc. However, due to the inflexible existence of existing desktop tools, DBAs will face numerous troubles when performing heretics recovery. The exported data provided by CloudQuery removes the strong coupling between the data operator and the terminal and improves the data control within the enterprise. The exported files are only stored in the user’s personal folder, preventing data leakage caused by the data file being stolen and spread by other personnel in the hard disk. At the same time, CloudQuery supports custom parameter functions to help DBAs choose to export data flexibly, such as ignore=y full=y and so on.

④ Data source feature support

Considering the data interaction in various complex scenarios, we designed the import and export modules to support specific features for each data source, such as mysql’s mysqldump –single-transaction. When exporting, we changed the session transaction isolation level to RR isolation level and started a transaction. At the RR isolation level, the exported data is the snapshot of the data when transactions are started, ensuring the consistency of exported data.

As for the choice of development language, most export actions require direct interaction with database terminal commands, so we abandon the mainstream Java in the market and use Go language for code development, aiming to make the command line tools SaaS and access to various database tools to facilitate expansion. Each DTS tool has its own description file, and the description language can be used to define new command-line tools along the way. Similarly, front-end interactive pages, back-end tools, commands, and so on can be implemented through description files. All real-time status output is pushed by WebSocket for front-end page interaction.

Using mysqldump as an example:

As shown in the figure above, when adding tool support, you need to define the description file first. When the DTS service is started, the files are automatically mapped to dynamic classes. The classes are independent of each other, facilitating subsequent expansion. After successful startup, the front end will automatically retrieve dynamic classes in the rendering page tool library and render dynamic page forms with the server side. After the specified task is selected, it will be converted into operation commands according to the parameters selected and customized by the user. Execute when the user requests submitted to the server, the server will block differences between tool, through the parameters for the difference between different processing and dynamically generated database command, asynchronously subsequently submitted to the database, will be carried out in real time during the execution of log information and the execution state collection, finally the result feedback and log output in the form of ws, In addition, the system works with the persistence layer cache to facilitate log downloading and viewing.

As you can see in the above scenario, CloudQuery’s tooling support has gone beyond the desktop limitations of the PC era. Gradually improve of the construction of the server with the clouds, in the DTS will integrate more data manipulation tools, while improve tool performance is more close to the complete data within the enterprise operating scenarios, from the perspective of different users build the corresponding path operation, intuitive and convenient way to reconstruct the DBA integrated intelligent tool box.

How does CloudQuery implement data import and export on the cloud

Related Posts

This is an RDB snapshot, which can record the actual data

Huawei cloud UGO official open test: four core advantages to solve the problem of heterogeneous database migration

Agenda announced! 2021 MongoDB Chinese Community Hangzhou Conference