1. Kylin binary package directory structure

  • bin: shell scripts used to start/stop Kylin, back up/restore Kylin metadata, check ports, and obtain Hive/HBase dependencies.
  • conf: XML configuration files for Hadoop tasks. For details, see the Configuration page.
  • lib: Jar files for external applications, such as Hadoop task JARs, JDBC drivers, and the HBase coprocessor.
  • meta_backups: the default backup directory created after running bin/metastore.sh backup.
  • sample_cube: files used to create the sample Cube and its tables.
  • spark: the built-in Spark.
  • tomcat: the built-in Tomcat that runs the Kylin service.
  • tool: Jar files used to execute some command-line tools.
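As a sketch of how the bin/ scripts above are typically used (the install path below is hypothetical; the script names come from a standard Kylin binary package and assume KYLIN_HOME points at the unpacked directory):

```shell
# Assumes KYLIN_HOME points at the unpacked Kylin binary package (path is an example).
export KYLIN_HOME=/usr/local/apache-kylin

$KYLIN_HOME/bin/check-env.sh     # verify the environment before the first start
$KYLIN_HOME/bin/kylin.sh start   # start the Kylin service
$KYLIN_HOME/bin/metastore.sh backup   # back up metadata into meta_backups/
$KYLIN_HOME/bin/kylin.sh stop    # stop the Kylin service
```

These commands require a running Hadoop/HBase environment, so treat them as a usage sketch rather than something to paste blindly.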

2. HDFS directory structure

Kylin generates files in HDFS. The root directory is /kylin (this can be customized in conf/kylin.properties), and the metadata table name of the Kylin cluster, kylin_metadata by default, is used as the name of the second-level directory.

In general, the /kylin/kylin_metadata directory has several subdirectories: cardinality, coprocessor, kylin-<job_id>, resources, and jdbc-resources.
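As a minimal sketch, the second-level directory name can be derived from two kylin.properties values (the values below are the usual defaults, hard-coded here purely for illustration):

```shell
# Default property values from conf/kylin.properties (assumptions for illustration).
hdfs_working_dir="/kylin"            # kylin.env.hdfs-working-dir
metadata_url="kylin_metadata@hbase"  # kylin.metadata.url: table name + storage type

# The part before '@' is the metadata table name, which names the 2nd-level directory.
metadata_table="${metadata_url%%@*}"
echo "${hdfs_working_dir}/${metadata_table}"   # prints /kylin/kylin_metadata
```

Changing either property in kylin.properties moves where Kylin writes its HDFS files accordingly.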

  1. cardinality: when Kylin loads a Hive table, it starts a MapReduce task to calculate the cardinality of each column. The output is stored in this directory temporarily, so this directory is safe to clean. The cardinality calculation of each column is shown in the figure below:

  2. coprocessor: the directory where Kylin stores the HBase coprocessor jar; do not delete it.
  3. kylin-<job_id>: data directory for the Cube build process; do not delete it. If you need to clean it up, please follow the [storage cleanup guide](http://kylin.apache.org/cn/docs/howto/howto_cleanup_storage.html). During a Cube build, intermediate files are generated in this directory, as shown below:

If the Cube builds successfully, the directory is deleted automatically; if the Cube build fails, you need to remove the directory manually.

  4. resources: Kylin stores metadata in HBase by default, but files that are too large, such as dictionaries or snapshots, are saved in this HDFS directory. Do not delete them; to clean them up, follow the "cleanup resources from metadata" steps in the storage cleanup guide.
  5. jdbc-resources: same as above; it only exists when MySQL is used to store metadata.
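For reference, the cleanup mentioned above is normally done with Kylin's own tool classes rather than by deleting HDFS files by hand; a sketch (assumes KYLIN_HOME is set, and always dry-run first):

```shell
# Dry run: list unused storage (HDFS build dirs, HBase tables, Hive intermediates).
$KYLIN_HOME/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete false

# After reviewing the output, actually delete the garbage:
$KYLIN_HOME/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true

# Clean unreferenced big files (dictionaries, snapshots) out of the metadata store:
$KYLIN_HOME/bin/metastore.sh clean --delete true
```

Running the dry run first matters because the tool deletes whatever it classifies as unreferenced.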

When Kylin's sample.sh script is executed, the sample data is temporarily loaded into /tmp/kylin/sample_cube; after the script finishes, the directory is deleted.

3. Zookeeper storage

After Kylin starts successfully, the /kylin znode is registered in Zookeeper; the job_engine and create_htable znodes under it are related to job scheduling and the HBase service.
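To see what Kylin registered, you can inspect the znodes with the standard ZooKeeper CLI (zk-host:2181 is a placeholder for your own ZooKeeper quorum address):

```shell
# Non-interactive inspection of Kylin's znodes; zk-host:2181 is a placeholder.
zkCli.sh -server zk-host:2181 ls /kylin
zkCli.sh -server zk-host:2181 ls /kylin/kylin_metadata
```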

4. Hive

Kylin's source data is obtained from Hive. When a Cube is built, intermediate tables are generated in Hive. If the Cube builds successfully, the intermediate tables are deleted automatically; if the build fails, they are left behind in Hive and need to be cleaned up manually.
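Leftover intermediate tables follow Kylin's kylin_intermediate_ naming convention, so they can be located and dropped from the Hive CLI; a sketch (the table name in the DROP statement is a made-up example):

```shell
# List leftover intermediate tables (Kylin names them kylin_intermediate_*).
hive -e "SHOW TABLES LIKE 'kylin_intermediate_*'"

# Drop one after confirming it belongs to a failed or obsolete build (example name):
hive -e "DROP TABLE IF EXISTS kylin_intermediate_my_cube_a1b2c3"
```

The StorageCleanupJob tool also removes these tables, so manual drops are only needed for one-off cleanup.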

5. HBase tables

Kylin maintains a large amount of metadata, including Cube definitions, star schema definitions, job information, job output information, dimension directory information, and so on. Both metadata and Cube data are stored in HBase; metadata goes into the HBase table kylin_metadata by default, serialized as JSON strings.

Orphaned tables may be left behind in HBase when purging, deleting, or merging Cubes. If you need to clean them up, consult the storage cleanup guide.

That's all for this article. Creating content is not easy, and your support and recognition are the biggest motivation for my writing. See you in the next article!

If there are any mistakes in this post, please point them out in the comments. Thank you very much!