I. Business background and demand sorting

At the beginning of the annual plan of the year, WE planned to go on BI this year, but the reasons for the epidemic were delayed and delayed, and it was just at that time that the demand broke out. Various business problems and various data analysis made us realize that we could not go on and could not carry the load, so we began to set up projects and conduct BI selection research.

Here to share my research notes in a written form, only on behalf of personal views.

Companies’ data needs now fall into two main categories:

Temporary demand: The business suddenly wants to see how effective this wave of activity is (the definition of data metrics can be changed and added at any time)
Solidified requirements: weekly to see, monthly to see data, (data is very clearly defined).

For these two types of requirements, our current solution is as follows:

For temporary requirements: write HQL to Hive to check, and then convert the result into Excel and send it to the requirements personnel.
For solidified requirements: write scripts, run results with Hive, write the results into the corresponding DB library, and then summarize and show them through third-party open source BI tools.

This is neat and clear, but the problem is obvious:

Development costs were too high: Every requirement, whether temporary or long-term, required custom development, and we were swamped with manpower.
Inflexible use: a report can only be displayed without analysis function. If you want to analyze it, you need to copy the data into Excel for processing and analysis. However, our data users may not have such ability.
Waste of resources: Reports developed by different people, in many cases with a lot of double counting.
The hue query speed is very slow. It takes more than one minute to perform a select from query.

In this case, we are going to build a multi-dimensional analysis platform that allows the business side to take numbers themselves, preferably without SQL, because most of our business side does not know SQL(although we have SQL training, there is still a threshold).

Based on the above, we conducted product research on BI selection.

Ii. Product trial analysis

Based on the discussion and understanding of BI tools recommended by peers in several groups, after a round of screening, the following products were selected as key research objects: Superset, Metabase and FineBI.

1, the Superset

Overall, after personal experience, the feeling is as follows:

Installing the tutorial is a pain, just like installing mysql
Table joins are not supported. The speed of calculation depends on the speed of your database
Visualization options are very rich, and there are several geographic location visualization schemes based on latitude and longitude
Permissions are very fine, down to each function key;
Unfortunately, the biggest problem is that the user experience of business analysts is not very good. The visualization process requires setting corresponding parameters for different graphic schemes, and the permission control is also very complicated.

The details are as follows:

1) Data source and data management

The supported databases are very rich: Druid, Hive, Impala, Kylin, Spark SQL, BigQuery, Pinot, ClickHouse, Google Sheets, Greenplum, IBM Db2, MySQL, Oracle, PostgreSQL, Presto, Snowflake, SQLite, SQL Server, Teradata, and Vertica support uploading local CSV files
Data table model management, can set the field type, dimension/can filter/can do time column, secondary processing fields, statistical indicators
Chart available tables have to be added one by one from the database (SQL toolbox can see all), which is not very convenient.
Deep support for DURID

2) Chart chart & Dashboard kanban

Chart making process: Select data source (table or view) > Select chart type > Set chart parameters (indicator/dimension/filter criteria). The data source can only be selected from the data table list page. After entering the analysis page, the data source cannot be changed. Because when switching different chart types, it is necessary to fill in parameters according to different charts, which is not convenient to use in self-help analysis.
It supports a variety of visualization graphics, 48 kinds of visualization schemes;
Kanban’s filtering function is very weak, even the most basic date filtering component is not implemented through a filter component in a single graph, can only be made for a single table, and then applied to kanban, this feature is also very inconvenient.
Provides simple chart drilling and exploration (jump straight into a single chart), but does not support chart linkage;
Kanban can not be cloned directly, to copy a Kanban can only be re-edited to select a single graph;
Kanban supports automatic refresh with a minimum refresh time of 10 seconds

3) SQL query

Support to fill field/table information by association
Supports cross-library associated query
A multi-tabbed environment that processes multiple queries at once
The query results should be visualized and saved as a view before jumping to chart page. And you need to weight that view (very inconvenient process)
Can do search on query history records;
Support for templated using the Jinja template language, which allows macros to be used in SQL code

4) Permission management

By setting permissions for roles, users can specify roles to control permissions
The granularity of permission control is very fine, supporting functional permission control (table modification can be subdivided to delete, new operations), support menu, data source, data table, field, chart, kanban permission control
The configuration of permissions is very complicated and tedious
Data row level control is not supported

Secondary development

Technical architecture: Python+Flask+Recat+Redux+SQLAlchemy
Originally an open source project of Airbnb, it is supported by a large company team for maintenance, version update, bug repair and secondary development
Support for restful API

Metabase (Open Source, Gihub Star 15,670)

Overall, after personal experience, the highlights are as follows:

The interactive experience is business-friendly. Through a global search function for kanban and single figure, it creates an intelligent scene of “ask a question”, that is, the system will tell you the answer through the search box consultation, and the interface of the whole product is very simple and clear.
It is very simple to create a single graph, data-centric, to select different graphics (optional graphics are automatically grayed). Basically, it takes half a minute to complete a single graph analysis.
However, the biggest disadvantage is that the permission management is too weak, only modifiable/visible coarse-grained control, whether the table can be deleted can not be independently controlled.

The details are as follows:

1) Data source and data management

The supported databases are relatively weak: Postgres, MySQL, SQL Server, Redshift, SQLite, Google BigQuery, H2, Oracle, Vertica, Snowflake, MongoDB, Druid, Presto, SparkSQL Druid does not support SQL queries. Druid does not support SQL queries. Also no Hive support,Kylin)
Unified data model management portal to set dimension/measure fields after adding tables/views (this section is very detailed and the types of Settings are extended greatly)
Provides scheduled tasks, database synchronization (hour level)
Self-service table field information perspective function, intelligent exploration, automatic kanban, automatic distribution of associated data (bonus points cool features)

2) Chart chart & Dashboard kanban

The process of making a single graph is very simple: select data source -> Filter criteria -> Analysis indicators -> Group dimensions -> Visual type
The types of visualizations supported only meet basic requirements, 14 visualization schemes (including funnels, numbers with changes, maps)
Some charts can be controlled in detail, such as the form according to the condition control line color, adjust field position, display mini color bar, suffix setting
Support for basic filtering criteria, including date segments (associations between fields passed by filters and fields in a single graph)
Provides simple chart drilling function, but does not support chart linkage
Copy existing kanban with one click
Automatically refreshing data in the minimum granularity of 1 minute
Sharing support: public links, public embedding (blog pages), embedding in the application
Using Trap to send data to Slack (a foreign chat tool) or email

3) SQL query

Support to fill field/table information by association
SQL query results can be directly switched to graphical display scheme
Cross-library associated queries are not supported
Variables in native queries allow dynamic substitution of values in queries using filtering components or URL parameters

4) Permission management

By setting permissions for roles, users can specify roles to control permissions
The permission setting is very weak, only the access permission can be set (the accessible data may be directly deleted)
The permission setting object is shallow: it can only control the permission of data source, data table, chart and analysis item set, not the data row level
Field control at the field level can be set visible or not (sensitive field scenarios), but cannot be managed by role

5) Secondary development

Technical architecture: Clojure+Recat+Redux
Complete API documentation is provided, and many secondary developments can be completed with rich API and documentation

3. FineBI (Business)

(Small program has been added here, please go to toutiao client to view)

5 minutes to complete the data analysis, zero code operation, click and drag to complete the analysis, can make a data report within half an hour.
Meet a variety of analysis needs, data processing, exploratory OLAP analysis, self-service data analysis
With the self-service data set function, ordinary business personnel can screen, cut, sort and summarize the data, so as to achieve the desired data results flexibly
One-click realization of data sharing and control, detailed and accurate data permission control, data and reports can be shared throughout the company, sharing results updated in real time
It supports large amount of data analysis, adopts advanced column storage, has efficient computing capability and powerful data compression capability, and supports fast data analysis at the front end.

Overall, after personal experience, the highlights are as follows:

Getting started requires getting used to the process of configuring data, handling self-service data sets, and then visualizing dashboards and charts, which is a bit confusing
Creating the visualizations is very simple, and the interaction is a bit like Tableau, dragging and dropping data fields into dimension boxes, rendering the visualizations immediately, and then building dashboards based on the visualized components
Linkage drill function is large, but also very intelligent, can automatically associate to the common field
Data processing function is very powerful, self-help data set, can deal with a lot of data functions, including group summary, modify data fields, table merge and so on
The biggest highlight is that the authority control is very detailed and practical. It can be subdivided into original data sources, processed data sets and dashboards. It can manage users of different roles, including positions and departments, with authority control of administrators and users

The details are as follows:

1) Data source and data management

The supported databases are very rich, as you can see in the figure below
There is a business bundle capability to sort different data sources, such as by department or by business requirements
Data table can be visual management, data preview, blood analysis, associated view, etc

2) Chart chart & Dashboard kanban

Visual production process: connect data source (database or import Excel) — self-service data set processing data — make chart components based on data — make visual dashboard.
There are more than 50 kinds of basic charts and over 100 kinds of styles with overlapping charts.
There are three types of table: grouping, cross and detail. Especially complex table is not supported. There is another report software finereport
The dashboard has powerful filtering functions, such as time, text, value, and query, as well as user-defined condition filtering.
Provide data drilling, linkage, jump functions, can directly interact in the dashboard, can also interact with other dashboards, support chart linkage;
Components and kanban can be reused and copied directly
There are rich function functions, in the visualization of the data can be filtered twice, summary, sorting, as well as their own formula calculation
The dashboard has the timed refresh function, which can automatically refresh a single dashboard, multiple dashboards, and a single component. JS is required to write the timed refresh frequency
Dashboard public links are supported on Share and can be attached to its decision system or embedded in web pages
Dashboards share directly, dashboards create public links, and dashboards hang out.

3) SQL query

Connect to the database directly through JDBC

Supports cross-library associated query
Support SQL data sets, allowing write SQL fetch
Supports visual data preview. After data tables are added and updated in the service package, the data preview area exists on the service package editing page, where you can view the edited table data.
Visual association tables and consents analysis
BI project that provides real-time data and extracted data in two computing modes

4) Permission management

You can set permissions based on roles. Permission recipients include departments, roles, duties, and users
It can set permissions for personnel management, directory permissions, management system, data connection, data permissions (data tables), sharing permissions, timing scheduling management permissions, etc., with rich permissions setting
The permission setting object is deep, down to the component or data row level

5) Secondary development

Pure Java development, is basically zero code tools
Support certain secondary development, API documentation

conclusion

Finally, in terms of comprehensive selection, the choice of BI selection falls between Metabase and FineBI, with the former open source and the latter commercial.

Open source has its disadvantages, such as weak rights and functions, lack of platform operation and maintenance functions, and the interface is all in English. Some development applications in our department are ok. However, considering that BI may be promoted to the whole company in the future, the selection of tools needs to be accepted by the business side, and product stability needs to be guaranteed by technology and service.

In either case, mature platforms are to be purchased, so I prefer the latter within the range of acceptable budget. The function satisfaction reaches 90%, and the specific performance has to be tested. The price is about 200,000 ~ hundreds of thousands, and there are also service projects.

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Ali entrepreneurship employees share the company’s BI selection road! From research, open source pit have stepped on

Ii. Product trial analysis

1, the Superset

Metabase (Open Source, Gihub Star 15,670)

3. FineBI (Business)

conclusion

Ali entrepreneurship employees share the company’s BI selection road! From research, open source pit have stepped on

Ii. Product trial analysis

1, the Superset

Metabase (Open Source, Gihub Star 15,670)

3. FineBI (Business)

conclusion

Related Posts

The 2021 Popular Creators list is officially open

APP is being abused by a cute new “abuse”? !

Android network programming (2) Basic HTTP protocol packet structure