1. Background and requirements

1.1 Challenges of big data visualization

With the rise of big data, storage and computing technologies keep emerging, but presenting and exploring the data visually has become just as important. This area is far less crowded than the storage and computing stack. Have you run into any of these problems when visualizing big data?

  1. Traditional visualization tools connect to traditional databases but work poorly, or not at all, with big data components such as Hive, Spark, Presto, ElasticSearch, and ClickHouse. Every time, data has to be copied redundantly from the big data cluster into a traditional database.
  2. Commercial products are expensive and set up technical barriers; many even require you to adopt the vendor's own big data stack.
  3. Users are used to dragging and dropping in Excel and to the convenience of SQL, and they resist new tools that work differently. They also prefer logging in to a web page over downloading and logging in to a client.
  4. The company has few developers and no spare manpower to build a big data visualization platform, yet decision makers want a unified one.

If all of this gives you a headache, here is a remedy: Metabase, an open source tool for big data analysis, exploration, and visual reporting.

1.2 Target architecture of big data visualization

It still helps to set a target architecture first; working toward it makes everything else easier. As shown in Figure 1.2, the architecture is divided into three tiers.

  1. Tier 1: Data is stored in a strong OLAP engine such as ClickHouse, DorisDB, or Kylin and queried through that engine's own connector, which responds quickly and supports both real-time and offline data. The visualization platform connects to it directly and presents data to users subject to permission control.
  2. Tier 2: Data is stored in Hive or in the NoSQL store HBase and reaches the visualization platform through an efficient engine such as Presto, Flink, or Spark. It is presented to users subject to permission control.
  3. Tier 3: Everything else, that is, special-case sources such as MySQL and temporary files.

Note: Other technical architectures are also in common use, such as ELK, which is made up of ElasticSearch, Logstash, and Kibana. ElasticSearch is an open source distributed search engine; its features include distribution, zero configuration, automatic discovery, automatic index sharding, index replicas, a RESTful interface, multiple data sources, and automatic search load balancing. Logstash is a fully open source tool that collects, parses, and stores logs for later use (for example, searching). Kibana is also open source and free; it provides a web interface for analyzing the logs handled by Logstash and ElasticSearch, helping you aggregate, analyze, and search important log data. That is a topic for later; let's get back to Metabase.

2. Introduction to Metabase

2.1 What is Metabase?

Metabase is a simple yet powerful open source analytics tool that works with both big data systems and traditional databases. It helps everyone in a company explore and learn from enterprise data, supporting better data-driven operations and decisions.

Metabase is a simple and powerful analytics tool which lets anyone learn and make decisions from their company’s data. No technical knowledge required! We hope you love it.

Website: www.metabase.com/

All Versions download: Available Versions

GitHub: github.com/metabase/me…

Development language: Clojure. The official website distributes it as a prebuilt Jar package, which is very convenient.

2.2 What can Metabase do?

  1. Connect to different databases for data visualization;
  2. Analyze data and display it on dashboards;
  3. Produce data reports.

2.3 Supported data sources (the most common concern)

At present, Metabase does not support ClickHouse out of the box, so you need to add a ClickHouse driver Jar. The driver has to be installed manually and kept up to date from GitHub, and it lags behind Metabase releases. For details, see section 5.1, Connecting to ClickHouse.

2.4 Comparison with Apache Superset

The blogger has also written a post about Apache Superset, another open source platform for big data exploration, analysis, and visual reporting. So how does Metabase compare with Apache Superset?

  • Built-in data source support: Metabase is slightly behind Apache Superset.
  • Chart types: Metabase is slightly behind Apache Superset.
  • Interface polish and smoothness: Metabase beats Apache Superset.
  • Drag-and-drop operation: Metabase beats Apache Superset.

Metabase is aimed at business staff with data needs, while Apache Superset is aimed at staff who work in SQL. The dashboards generated by the two can be tied together with hyperlinks from a single web page to form a unified reporting platform.

3. Get started

First, a look at Metabase's most impressive feature: without writing any code, it generates charts of sales and sales share for each product category, saves them to a dashboard, refreshes them on a schedule, and lets you share them with others. This is the blogger's favorite feature; follow the blogger for more.

4. Deploy and install

4.1 Deployment mode and Version

Metabase supports a variety of deployment methods. For details, see Installing and Running Metabase on the official website. The common ones are:

  • Run the Jar package directly in an environment with a JRE
  • Run as an app on the Mac
  • Run on Docker
  • Run on K8s
  • Run on cloud platforms such as AWS Elastic Beanstalk
  • Other

The blogger chose the easiest way: running the Jar directly in an environment with a JRE. Choosing a version takes a little more care. If you do not need a ClickHouse database connection, pick the latest release or one or two versions back. If you do, note that Metabase does not support ClickHouse by itself (at least as of 2021-07-22), so you need to download a matching driver package.

GitHub project for the driver: metabase-clickhouse-driver

As of 2021-07-23, the latest Metabase version supported by the ClickHouse driver is Metabase 0.38.1.

4.2 Configuration Requirements

In principle, any platform with a JDK that can run a Jar package will do. The blogger's configuration is:

  • JDK 1.8+
  • CentOS 7

4.3 Download and Installation

Go to the Metabase home page and download metabase.jar, as shown in Figure 4.3. Take care to select the version you want; the default is the latest version.

All Versions download: Available Versions

Installation is even easier. Upload the downloaded metabase.jar to a server with a JDK installed; the blogger puts it in /data/metabase/metabase-0.38.1.

# Rename the Jar so that its name carries the version
cd /data/metabase/metabase-0.38.1
mv metabase.jar metabase-0.38.1.jar
# Start in the foreground
java -jar ./metabase-0.38.1.jar
# Started this way, the logs print to the terminal. Once startup succeeds, stop it and use the background startup and shutdown procedure that follows.
# The point of this first run is to see the startup logs clearly. If an error appears, fix it according to the message, for example by installing Java if it is missing.

After a normal startup, Metabase listens on port 3000 by default. On a machine that can reach port 3000 on the server, open a browser and enter the installation server's IP address followed by :3000 (for example, 10.215.23.506:3000). The registration page appears; fill in a login name (email) and password, and you are taken to the web home page shown in Figure 4.3.1, where you can try the quick-start example.
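Before opening the browser, it can help to confirm from the command line that Metabase has finished starting. The following is a minimal sketch, assuming curl is installed and relying on Metabase's health endpoint GET /api/health, which answers {"status":"ok"} once the instance is ready. The function name wait_for_metabase and the example address are illustrative, not part of Metabase.

```shell
# Hypothetical helper: succeeds once Metabase reports itself healthy.
# Usage: wait_for_metabase BASE_URL [MAX_TRIES] [DELAY_SECONDS]
wait_for_metabase() {
  local base_url="$1" max_tries="${2:-30}" delay="${3:-2}"
  local try=1 body
  while [ "$try" -le "$max_tries" ]; do
    # -s: silent, -f: treat HTTP errors (such as 503 during startup) as failure
    body="$(curl -sf --max-time 5 "${base_url}/api/health" 2>/dev/null)" || body=""
    case "$body" in
      *'"status":"ok"'*)
        echo "Metabase is up at ${base_url}"
        return 0
        ;;
    esac
    sleep "$delay"
    try=$((try + 1))
  done
  echo "Metabase did not become healthy at ${base_url}" >&2
  return 1
}

# Example (replace the address with your own server):
# wait_for_metabase "http://127.0.0.1:3000"
```

Polling with a bounded number of tries keeps the check usable in a deployment script, where a startup that never completes should fail instead of hanging.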

4.4 Startup and Shutdown

Redirect the logs to a file and start the process in the background.

# Start
# Point the symlink current at the newest Metabase version, so that a later upgrade only needs to repoint it
cd /data/metabase
ln -sfn metabase-0.38.1 current
cd /data/metabase/metabase-0.38.1
mkdir -p log
java -jar ./metabase-0.38.1.jar >> ./log/metabase.log 2>&1 &

>> ./log/metabase.log appends standard output to metabase.log in the log folder, creating the file if it does not exist; 2>&1 sends standard error to the same place.

0 is standard input (stdin), 1 is standard output (stdout), and 2 is standard error (stderr). The trailing & runs the process in the background.
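The effect of combining >> with 2>&1 is easy to check in isolation. Here is a small sketch that uses a stand-in function instead of Metabase; the names noisy and demo_log are made up for the demonstration.

```shell
# Stand-in for a program that writes to both stdout (1) and stderr (2).
noisy() {
  echo "normal output"
  echo "error output" >&2
}

demo_log="$(mktemp)"

# Append stdout to the file, then point stderr at wherever stdout now goes.
# Both lines end up in the file.
noisy >> "$demo_log" 2>&1

# Order matters: here stderr is duplicated from the original stdout first,
# and only afterwards is stdout redirected, so "error output" still goes
# to the terminal and only "normal output" reaches the file.
noisy 2>&1 >> "$demo_log"

cat "$demo_log"
```

After both calls the file holds two "normal output" lines and one "error output" line, which is why 2>&1 has to come after the file redirection in the start command.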

Shut down the service:

# Use jps to find the process ID of metabase-0.38.1.jar
jps
2578 Jps
2523 metabase-0.38.1.jar
# Kill the process by its ID
kill -9 2523
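kill -9 gives the JVM no chance to shut down cleanly. A gentler variant, sketched here on the assumption that a plain SIGTERM is acceptable in your setup, sends TERM first and falls back to KILL only if the process is still alive after a timeout. stop_process is a hypothetical helper name.

```shell
# Usage: stop_process PID [TIMEOUT_SECONDS]
stop_process() {
  local pid="$1" timeout="${2:-10}" waited=0
  # Ask the process to exit; if the signal cannot be sent, it is already gone.
  kill -TERM "$pid" 2>/dev/null || return 0
  # kill -0 sends no signal; it only tests whether the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      kill -KILL "$pid" 2>/dev/null   # last resort, same as kill -9
      break
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example with the process ID reported by jps:
# stop_process 2523
```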

5. User manual

5.1 Connecting to ClickHouse

First, download the ClickHouse driver package metabase-clickhouse-driver from GitHub. The driver must be compatible with your Metabase version and with your ClickHouse version (here the 20.8.12.2-1.el7.x86_64.rpm package). As of 2021-07-23, the compatible versions are as follows:

Upload clickhouse.metabase-driver.jar to /data/metabase/metabase-0.38.1/plugins, then restart Metabase, and ClickHouse appears as a database type, as in Figure 5.1.2. Configure a connection right away to confirm that you can actually reach the ClickHouse cluster, since this depends on version compatibility.
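The upload step can be scripted. This is a minimal sketch, assuming the driver Jar has already been downloaded and that Metabase loads drivers from the plugins folder next to its Jar, as in the path used in this article. install_driver is a hypothetical helper.

```shell
# Usage: install_driver DRIVER_JAR METABASE_DIR
install_driver() {
  local jar="$1" metabase_dir="$2"
  if [ ! -f "$jar" ]; then
    echo "driver not found: $jar" >&2
    return 1
  fi
  # Create the plugins folder if this is the first driver being added.
  mkdir -p "$metabase_dir/plugins"
  cp "$jar" "$metabase_dir/plugins/"
  echo "installed $(basename "$jar") into $metabase_dir/plugins"
}

# Example, using the paths from this article:
# install_driver clickhouse.metabase-driver.jar /data/metabase/metabase-0.38.1
# Metabase has to be restarted afterwards to pick up the driver.
```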

Note: Metabase has no Test Connection button. When adding a database, just click Save; if the connection settings are wrong, saving fails with an error such as a connection timeout. A friendly reminder: if ClickHouse compatibility matters to you, consider Superset, which supports ClickHouse natively.

5.2 Creating a Database Connection

Click Settings in the upper right corner of the home page (Figure 5.2) -> Administrator -> Database in the menu bar, or choose Add a database in the start wizard. On the Add database page, configure the database instance, user name, and password, then click Save.

5.3 Creating questions and visualizing, saving, sharing, and deleting charts (key points)

When you create a question, you choose between a simple query, a custom query, and a native query.

  • Simple and custom queries: These are built by dragging and clicking, so users without SQL skills can explore data and build reports. A visualized result can be saved straight to a dashboard, or the question can first be saved to a directory and added to a dashboard later. As shown in Figure 5.3.1, none of this requires writing code; you only click through the graphical options.

Click Visualize to go to the view in Figure 5.3.2.

  • Native query: As shown in Figure 5.3.3, this is simply a SQL query. You write a SQL statement, visualize the result, and save it directly to a dashboard, or first save the question to a directory and add it to a dashboard later.

Every question you create can be visualized, as shown in Figure 5.3.4, and saved to one of several directories: Our analytics, Your personal collection, or Other users' personal collections. You can also create new directories, and each directory can have its own visitor permissions. View archive works much like a recycle bin: obsolete questions and dashboards must be archived before they can be deleted.

5.4 Creating a Dashboard (Report)

A dashboard can be thought of as a report. There are two ways to create one. The first: after you create and save a question, a pop-up asks whether to add it to an existing dashboard or to a new one. The second: click the + button in the menu bar of the main interface, create a new dashboard, and add existing questions and charts to it.

6. Settings

Open the settings menu in the upper right corner of the main interface, as shown in Figure 6.0.

  • Account settings: Change account information such as user name and password.
  • Administrator (key): Add people and groups, add databases, turn on sharing and embedding in other applications, and so on.
  • Activity, Help, About Metabase: These relate to the company that develops and runs Metabase; explore them if you are interested.
  • Logout: Logs out of the account.

The rest of this chapter focuses on the settings under Administrator. Note that Administrator is a group: any member of the administrator group can change these settings.

6.1 People management and permissions

People management is divided into People and Groups. People is where you invite new users, by email by default. Groups starts with two groups, Administrators and All Users; you can add more, and permissions are then assigned per group. A person can belong to several groups: if any group that person A belongs to has a permission, A can access the corresponding charts and dashboards and create questions.

Permissions are assigned by group, including data permissions and folder permissions, as shown in Figure 6.1.1.

  • Data access: Permissions can be set down to table level at the finest granularity. The default is database-level permissions.
  • Folder permissions: Permissions on the saved questions, charts, and dashboards in each directory.

6.2 Databases

As described earlier, adding a database connection loads the database and table models into Metabase for data exploration and visualization, as shown in Figure 6.2.0.

6.3 Data Model

Settings here go down to the individual table. The main purpose is to annotate the tables you have added so that users can work with the data more easily. The page is divided into data, filters, and metrics, as shown in Figure 6.3.0.

  • Data: Set column attributes such as visibility, entity key, entity name, and foreign key.
  • Filters: Define filter conditions that leave out unneeded data, for example keeping only orders whose ID is not equal to 250.
  • Metrics: Define a new column calculated from the original data; it is loaded with the table as a metric column.

6.4 Embedding dashboards in business systems or sharing them with non-Metabase users

As shown in Figures 6.4.0 and 6.4.1, first turn on the Embedding in other applications and Public sharing switches so that both features take effect. Turning on Embedding in other applications also generates a key.

Then choose Settings -> Exit Administrator in the upper right corner and open a dashboard in a folder you have permission for, as shown in Figure 6.4.2. Click Sharing and embedding to reach Figure 6.4.3, where you can generate public links and public embeds.

Figure 6.4.3: generating public links and public embeds

  • Public link: Copy the link and send it to whoever needs it, say person A. Even without a registered Metabase account, A can view the dashboard's report page, and changes to the dashboard show up for A as well, as shown in Figure 6.4.4.
  • Public embed: A snippet of code to embed in a front end, so that when the dashboard changes, the embedding application changes with it.
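The key generated when Embedding in other applications is switched on is used for signed embeds, which are separate from the public ones listed above. As a rough sketch only: it assumes openssl is available and follows Metabase's signed-embedding convention of an HS256 JWT whose payload names the dashboard, placed in a URL of the form /embed/dashboard/<token>. The function names, the key, and the dashboard ID 1 are placeholders, and the dashboard would also have to be published for embedding in Metabase.

```shell
# base64url without padding, the encoding JWT uses for each part
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }

# Usage: embed_token SECRET_KEY DASHBOARD_ID
embed_token() {
  local secret="$1" dashboard_id="$2" header payload signature
  header="$(printf '%s' '{"alg":"HS256","typ":"JWT"}' | b64url)"
  payload="$(printf '{"resource":{"dashboard":%s},"params":{}}' "$dashboard_id" | b64url)"
  # The signature is an HMAC-SHA256 over "header.payload" with the secret key.
  signature="$(printf '%s' "$header.$payload" \
    | openssl dgst -sha256 -hmac "$secret" -binary | b64url)"
  printf '%s.%s.%s\n' "$header" "$payload" "$signature"
}

# Example (placeholder key and server address):
# token="$(embed_token "$METABASE_SECRET_KEY" 1)"
# echo "http://<server-ip>:3000/embed/dashboard/${token}#bordered=true&titled=true"
```

Because the token is signed on the server side, the key never reaches the browser; a real integration would normally also add an exp claim so that tokens expire.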

6.6 Troubleshooting Errors

Here you can view task logs and related information. The two tabs on this page are the task logs and the scheduler information; when you hit a bug, this is the place to look.

That concludes this simple tutorial on installing, deploying, and using Metabase. For more, see the Metabase Documentation on the official website, or leave a comment to discuss.