“In recent years, emerging Internet services and traditional industries such as telecommunications, finance, and transportation have seen explosive growth in data assets. These assets are mostly unstructured and semi-structured, and storing and processing data at PB or even EB scale cheaply and efficiently has become a great challenge.”

With the arrival of the big data era, data volumes keep growing, enterprises keep expanding, and business systems are becoming more diverse. Our ability to analyze and process data from the same source and from different sources is being severely tested, and distributed systems place extremely high demands on data consistency. At the same time, the trend of moving large volumes of business data to the cloud has demonstrated the convenience of cloud data warehouses. However, in actual data warehouse work, only 20% of the time is spent on data mining; most of the rest goes to data synchronization, task scheduling, data cleaning, and other work needed to produce qualified data. Against this background, this article gives an overview of the new features in CloudQuery 1.4 and the plans for future iterations.

Introducing the “DTS” concept: integrated data tools

As mentioned above, database work involves a great deal of time spent processing data, and CloudQuery aims to solve the problems users encounter when manipulating data. Before 1.4, we focused on optimizing the editor experience and tackling its key difficulties. As we gradually improved the data manipulation workbench, we found that manipulating data through SQL (hand-written SQL, visual query editing, or terminal operations) is only part of what database staff do. Much of the time they need to work with data in batches, for example batch imports and data migration, and SQL or a terminal falls far short of what we expect for such tasks.

Therefore, CloudQuery 1.4 introduces the concept of “DTS”, which stands for “Database Toolbox Service”. As a sub-module of “DTS”, “Import and Export” makes data transfer more convenient for DBAs and operations staff.

In 1.4.0, we will first launch dump-format export for MySQL and Oracle databases. Later, we will add tool support for each data source, such as dump-based import and export for PostgreSQL. The database tools will also be covered by permission control: users need authorization from an administrator before they can use them.
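
As a rough illustration of a permission-gated export, the sketch below builds a `mysqldump` command only for users who have been granted the tool. The `User` class, the `"export"` tool name, and the function name are hypothetical and are not part of CloudQuery; only the `mysqldump` options shown are standard.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    # Tool names an administrator has granted to this user (hypothetical model).
    granted_tools: set = field(default_factory=set)


def build_mysql_dump_command(user, host, db_user, database, out_file):
    """Return the mysqldump argument list, or raise if the user is not authorized."""
    if "export" not in user.granted_tools:
        raise PermissionError(f"{user.name} is not authorized to use the export tool")
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={db_user}",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        f"--result-file={out_file}",
        database,
    ]


# Example: an authorized user gets a command that could be passed to subprocess.run.
alice = User("alice", {"export"})
command = build_mysql_dump_command(alice, "db.example", "reader", "shop", "shop.sql")
print(" ".join(command))
```

Keeping the authorization check in the same function that builds the command means no code path can produce an export command for an unauthorized user.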

In addition, Data Migration will join the CloudQuery DTS family as a tool during the 1.4 iteration. Because today’s distributed business systems rarely store data in a single data source, we designed the data migration module around two distinctions: homologous versus heterologous migration, depending on whether the source and target are the same kind of data source, and homogeneous versus heterogeneous migration, depending on whether their database structures match. The migration options are therefore quite diverse:

  • Migration volume

    • Full
    • Incremental
  • Migration timing

    • Real-time
    • Scheduled
  • Migration endpoints

    • Homologous (same kind of data source)
    • Heterologous (different kinds of data source)

      • Relational to relational
      • Relational to non-relational
      • Relational/non-relational to big data/data warehouse
    • Homogeneous (same database structure)
    • Heterogeneous (different database structure)
  • Selective migration

    • Horizontal partitioning
    • Vertical partitioning
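
The options in this list can be pictured as a small data model. The following is a minimal sketch under our own assumptions: every class, field, and function name is hypothetical and does not reflect how CloudQuery stores a migration task. It also shows the usual meaning of the two selective options: horizontal partitioning picks a subset of rows, vertical partitioning picks a subset of columns.

```python
from dataclasses import dataclass
from enum import Enum


class Volume(Enum):
    FULL = "full"
    INCREMENTAL = "incremental"


class Timing(Enum):
    REAL_TIME = "real-time"
    SCHEDULED = "scheduled"


@dataclass(frozen=True)
class MigrationPlan:
    source_type: str       # e.g. "MySQL"
    target_type: str       # e.g. "PostgreSQL"
    same_structure: bool   # True for a homogeneous migration
    volume: Volume
    timing: Timing

    @property
    def same_source_type(self) -> bool:
        """True when source and target are the same kind of data source."""
        return self.source_type.lower() == self.target_type.lower()


def selection_sql(table, columns=None, row_filter=None):
    """Build the SELECT for a selective migration.

    columns restricts the columns (vertical partitioning);
    row_filter restricts the rows (horizontal partitioning).
    Identifiers and the filter are assumed to be trusted input in this sketch.
    """
    column_list = ", ".join(columns) if columns else "*"
    sql = f"SELECT {column_list} FROM {table}"
    if row_filter:
        sql += f" WHERE {row_filter}"
    return sql


plan = MigrationPlan("MySQL", "PostgreSQL", False, Volume.FULL, Timing.SCHEDULED)
print(plan.same_source_type)
print(selection_sql("orders", columns=["id", "total"]))
print(selection_sql("orders", row_filter="created_at >= '2021-01-01'"))
```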

Because data migration is relatively complex and broad in scope, we will keep improving it throughout the 1.4 releases. We also welcome your suggestions as you use it, and we will adjust the feature after careful analysis.

“Visualization” module: work with data without asking for help

If “DTS” works close to the underlying data in a database, “Visualization” is closer to the business scenario. Visualization is the second major feature of CloudQuery 1.4 and consists of two modules: Visualized Assisted Query and ER Modeling.

Visualized Assisted Query

Not every user in an enterprise who needs to query data is fluent in SQL. Users who lack an SQL background but still need to query data need “visualization” to assist them.

CloudQuery 1.4.0 adds “Visualized Assisted Query”, which lets users work with data graphically and makes conditions such as filtering and sorting easier to understand. Even users who know neither SQL nor databases can get the results they need.

We will also let users save the query canvas, or save the SQL statement generated from it, so the results are easy to retrieve later.
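
To make the idea of generating SQL from a canvas concrete, here is a minimal sketch. The canvas is modelled as a plain dictionary, which is purely our assumption; the real canvas format is not described in this article. Filter values are passed as parameters rather than pasted into the statement.

```python
ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">=", "LIKE"}


def canvas_to_sql(canvas):
    """Translate a (hypothetical) canvas description into (sql, params)."""
    column_list = ", ".join(canvas.get("columns") or ["*"])
    sql = f"SELECT {column_list} FROM {canvas['table']}"

    clauses, params = [], []
    for column, operator, value in canvas.get("filters", []):
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
        clauses.append(f"{column} {operator} ?")  # value travels as a parameter
        params.append(value)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)

    order = canvas.get("order_by")
    if order:
        column, direction = order
        if direction.upper() not in ("ASC", "DESC"):
            raise ValueError(f"unsupported sort direction: {direction}")
        sql += f" ORDER BY {column} {direction.upper()}"
    return sql, params


canvas = {
    "table": "orders",
    "columns": ["id", "total"],
    "filters": [("total", ">", 100)],
    "order_by": ("total", "desc"),
}
sql, params = canvas_to_sql(canvas)
print(sql)
print(params)
```

For the canvas above, the generated statement is `SELECT id, total FROM orders WHERE total > ? ORDER BY total DESC` with the parameter list `[100]`. Saving either the dictionary or the generated statement corresponds to the two save options described above.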

ER Modeling

“ER Modeling” is aimed at relatively advanced users. It renders the table relationships in a database as an ER diagram, making primary keys, foreign keys, and constraints more intuitive.

Changes to table structures, such as adding, designing, or dropping tables and adding constraints, are also supported on the ER diagram canvas. The canvas can also be exported as an image, which makes it convenient for DBAs to organize the relationships between database elements and circulate them within the business.
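
An ER diagram is essentially a graph whose edges are foreign keys. As an illustration only, the sketch below collects those edges from an in-memory SQLite database using Python’s standard `sqlite3` module. SQLite is our choice for a self-contained example and the function name is hypothetical; it says nothing about how CloudQuery reads metadata from its supported data sources.

```python
import sqlite3


def foreign_key_edges(conn):
    """Return (child_table, child_column, parent_table, parent_column) tuples."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    edges = []
    for table in tables:
        # Each PRAGMA row is (id, seq, parent_table, from_col, to_col, ...).
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            edges.append((table, row[3], row[2], row[4]))
    return edges


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    );
    """
)
for child, child_col, parent, parent_col in foreign_key_edges(conn):
    print(f"{child}.{child_col} -> {parent}.{parent_col}")
```

A diagram renderer would draw one box per table and one line per edge in this list.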

New data source support to cover all types of databases

As a unified entry point to databases, CloudQuery treats data source support as its most basic function. As a community-centered product, we also take users’ needs very seriously. We collected suggestions from the community throughout the 1.3 iteration, and after evaluation we will add the following data sources in the 1.4 iteration:

  • Hive
  • Elasticsearch (ES)
  • DB2
  • PolarDB
  • OceanBase

CloudQuery will keep expanding the range of data sources without losing sight of the characteristics of each one. In the near future, we will optimize the automatic hints in the data operation area and the presentation of result sets.

OpenAPI: leveraging the power of the community

CloudQuery is a product that keeps growing with its community, but there is a limit to what we can do on our own. As we iterate, we will therefore keep opening up APIs so that users with development skills can integrate third-party applications, organizational structures, and other internal enterprise resources.

Next, we will give priority to opening part of the interfaces of the “user” module. Before an interface can be called, the system administrator must activate the developer identity of the specified user in the developer center of the platform. Once the identity is activated, the user obtains an AppID and a corresponding secret, which serve as the credentials for authenticating and calling the API.
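
This article does not describe how the AppID and secret are used on the wire. One common pattern for such credential pairs is to sign each request with an HMAC, sketched below using only Python’s standard library. The header names, the signed message layout, and the five-minute validity window are all assumptions for illustration, not the CloudQuery API.

```python
import hashlib
import hmac
import time


def sign_request(app_id, secret, body, timestamp=None):
    """Return hypothetical auth headers: AppID, timestamp, HMAC-SHA256 signature."""
    ts = str(int(time.time())) if timestamp is None else str(timestamp)
    message = "\n".join([app_id, ts, body]).encode("utf-8")
    signature = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return {"X-App-Id": app_id, "X-Timestamp": ts, "X-Signature": signature}


def verify_request(headers, secret, body, max_age=300, now=None):
    """Server-side check: reject stale timestamps, then compare signatures."""
    current = int(time.time()) if now is None else now
    if abs(current - int(headers["X-Timestamp"])) > max_age:
        return False
    expected = sign_request(headers["X-App-Id"], secret, body, headers["X-Timestamp"])
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected["X-Signature"], headers["X-Signature"])


headers = sign_request("demo-app", "demo-secret", '{"user": "alice"}')
print(verify_request(headers, "demo-secret", '{"user": "alice"}'))
```

With a scheme like this the secret itself never travels with the request; only the AppID and a signature derived from the secret do.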

Conclusion

This is the overall functionality and iteration plan for CloudQuery 1.4. We will publish a series of articles on the 1.4 release, detailing each new feature and the technology behind it, so you can better understand the architecture of CloudQuery. We will also keep improving our core capabilities to bring community users a faster and more convenient data operation and interaction experience.

Official website address: https://cloudquery.club/