At the recent Apache Kylin Innovation Meetup, Li Dong, technical partner and director of ecosystem cooperation at Kyligence, presented Kylin's latest data source development capabilities. Apache Kylin v1.6 added support for Kafka data sources, opening the door to streaming OLAP analysis. As more and more enterprise users wanted to connect traditional data warehouses and databases to Apache Kylin for analysis, Apache Kylin v2.1 began to support JDBC data sources, meeting users' requirements for OLAP analysis and exploration of data in SQL-on-Hadoop engines, RDBMSs, and other sources.



However, JDBC data sources differ from one another, and developers must put considerable effort into adaptation to achieve deep integration with a given source. Therefore, Apache Kylin v2.6 introduced the Data Source SDK, which helps developers quickly build an adaptor for a JDBC data source and connect new data sources.

What is the Data Source SDK?

The SDK exists to improve development efficiency: with it, a developer writes an adaptor for a particular data source, enabling Apache Kylin to synchronize tables from that source, build Cubes, and push queries down to it. The yellow diamonds in the figure below are the extension interfaces provided by the Data Source SDK.

First, take metadata synchronization as an example. The standard JDBC interface already provides APIs for retrieving databases, tables, and columns, but different databases implement them differently, and some developers do not want to expose system schemas and system tables to analysts. Both requirements can be handled in the adaptor.
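The filtering idea can be sketched as follows. This is an illustrative example only: the class and method names are hypothetical, not the actual Data Source SDK API, and the system-schema names are example values for a MySQL-like source.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: an adaptor's database-listing hook can drop
// system schemas before Kylin shows the list to analysts.
public class SchemaFilterSketch {
    // Example system schemas to hide (MySQL-style names, for illustration).
    private static final List<String> SYSTEM_SCHEMAS =
            Arrays.asList("information_schema", "mysql", "performance_schema", "sys");

    /** Keep only user schemas from the raw list returned by JDBC metadata. */
    public static List<String> filterUserSchemas(List<String> rawSchemas) {
        return rawSchemas.stream()
                .filter(s -> !SYSTEM_SCHEMAS.contains(s.toLowerCase()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("SALES", "information_schema", "HR", "sys");
        System.out.println(filterUserSchemas(raw)); // [SALES, HR]
    }
}
```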

The JDBC data source build process is shown in the following figure: Kylin relies on Sqoop to flatten the source tables, transfers the data to the Hadoop cluster, and then runs the subsequent build steps as a series of MapReduce or Spark jobs. The adaptor's main role in the build process is to convert the flat-table SQL statements generated by Apache Kylin into the SQL dialect supported by the data source.

The same applies to query pushdown. Apache Kylin can route queries that do not hit a Cube to the source engine for execution and return the results to the user. During pushdown, the adaptor again performs SQL dialect conversion, translating the user's SQL statement from Apache Kylin's dialect into the dialect of the underlying data source engine.
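The core of such a conversion can be sketched as a mapping-driven rewrite. This is not the SDK's actual converter; the function mappings below are hypothetical examples for a target engine that uses different function names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: rewrite a query from Kylin's SQL dialect into a
// target engine's dialect using a function-name mapping table, similar
// in spirit to the mappings a conversion template would define.
public class DialectConvertSketch {
    private static final Map<String, String> FUNCTION_MAP = new LinkedHashMap<>();
    static {
        // Hypothetical mappings for an engine that spells these differently.
        FUNCTION_MAP.put("SUBSTRING", "SUBSTR");
        FUNCTION_MAP.put("CEIL", "CEILING");
    }

    public static String convert(String kylinSql) {
        String out = kylinSql;
        for (Map.Entry<String, String> e : FUNCTION_MAP.entrySet()) {
            // Word-boundary replace so e.g. CEIL does not match inside CEILING.
            out = out.replaceAll("\\b" + e.getKey() + "\\b", e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(convert("SELECT SUBSTRING(NAME, 1, 3), CEIL(PRICE) FROM T"));
        // SELECT SUBSTR(NAME, 1, 3), CEILING(PRICE) FROM T
    }
}
```

A production converter works on a parsed SQL syntax tree rather than string replacement, but the mapping-table idea is the same.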

How to develop an Adaptor?

As shown in the figure below, the Data Source SDK is essentially a set of function interfaces. Developers only need to implement the corresponding interfaces according to the characteristics of their data source.

The Data Source SDK ships with a default implementation. Building on it, developers can complete a data source adaptor simply by adding a conversion template in XML format. The figure below shows such a template, which defines configuration items, function expressions, and data type mappings for a particular SQL dialect. Developers can quickly support a new data source by filling in the corresponding entries according to its features. If a requirement is not covered by the template, developers can still override the interface functions to extend the default implementation.
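A conversion template of this kind might look like the sketch below. The element and attribute names here are hypothetical, modeled only on the idea described above, and are not copied from the SDK.

```xml
<!-- Illustrative sketch of a dialect conversion template; names are hypothetical. -->
<DATASOURCE_DEF NAME="example" DIALECT="examplesql">
  <!-- Configuration item for this dialect -->
  <PROPERTY NAME="sql.case-sensitive" VALUE="false"/>
  <!-- Function expression mapping: how a Kylin function renders in this dialect -->
  <FUNCTION_DEF ID="CONCAT" EXPRESSION="$0 || $1"/>
  <!-- Data type mapping: Kylin type to the engine's type -->
  <TYPE_DEF ID="DOUBLE" EXPRESSION="FLOAT8"/>
</DATASOURCE_DEF>
```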

Apache Kylin solution

With the capabilities of the Data Source SDK, Apache Kylin can support a wider range of data sources, making it possible for enterprises to optimize their big data platform architecture. Previously, data analysis applications such as reporting had to connect to different underlying technologies depending on the scenario, for example multidimensional analysis to Apache Kylin and ad hoc analysis to SQL on Hadoop. Now, enterprises can use Apache Kylin as a unified big data OLAP platform, providing a single data access point for BI applications and simplifying system architecture and development.

As shown on the right side of the figure above, we tested with a Tableau report: the same report rendered correctly in both pushdown-query and Cube-access modes without any changes to its content, while Cube access delivered a 14x efficiency improvement. This can effectively help enterprise users migrate data analysis from traditional technologies to a big data platform.

Reference material

This article has given only a brief introduction to the function and framework of the Data Source SDK. For deeper technical details, please refer to the following links: kylin.apache.org/development… kylin.apache.org/blog/2019/0…

Speaker's slides


Contact us:

[email protected]

Kyligence website