Background

Global dictionary

  • Used for exact COUNT DISTINCT
  • Builds globally unique, contiguous int ids
  • Maps each value (for example, a string) to an int so that it can be stored in a bitmap
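The three points above can be illustrated with a small in-memory sketch. It is not Kylin's implementation: the class `GlobalDictSketch` and its methods are hypothetical, and `java.util.BitSet` stands in for whatever bitmap Kylin actually uses. It shows values receiving contiguous int ids, exact COUNT DISTINCT as the number of set bits, and why the ids must be global: bitmaps built separately can be OR-ed together and still count each value once.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalDictSketch {
    // Hypothetical stand-in for a global dictionary: each distinct value gets the
    // next contiguous int id (0, 1, 2, ...), and an id is never reassigned.
    private final Map<String, Integer> ids = new HashMap<>();

    int idOf(String value) {
        Integer id = ids.get(value);
        if (id == null) {
            id = ids.size(); // next contiguous id
            ids.put(value, id);
        }
        return id;
    }

    int size() {
        return ids.size();
    }

    // Encode values through the dictionary and set one bit per id.
    BitSet toBitmap(List<String> values) {
        BitSet bitmap = new BitSet();
        for (String v : values) {
            bitmap.set(idOf(v));
        }
        return bitmap;
    }

    // Exact COUNT DISTINCT is the number of set bits.
    long countDistinct(List<String> values) {
        return toBitmap(values).cardinality();
    }

    public static void main(String[] args) {
        GlobalDictSketch dict = new GlobalDictSketch();
        System.out.println(dict.countDistinct(List.of("u1", "u2", "u1", "u3"))); // prints 3

        // Because ids are global, bitmaps built separately (e.g. per day) can be
        // OR-ed together and still count each value once.
        BitSet day1 = dict.toBitmap(List.of("u1", "u2"));
        BitSet day2 = dict.toBitmap(List.of("u2", "u4"));
        day1.or(day2);
        System.out.println(day1.cardinality()); // prints 3
    }
}
```

Since ids are only ever appended, the map never shrinks, which is the same growth behavior that makes a real global dictionary slower to build over time.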

Drawback: the global dictionary keeps growing as new values are added, so each build gets slower than the last.

Build

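  1. At build time, a COUNT DISTINCT measure with exact (precise) calculation was defined on a BigInt field.

  2. Kylin then added a global dictionary for that field.

  3. Problem: an int value can be put into the bitmap directly, but a BigInt value cannot, so the BigInt field is dictionary-encoded the same way a string field is.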
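One plausible reading of the problem, sketched under the assumption that the bitmap is indexed by a non-negative 32-bit int (as `java.util.BitSet` is): an INT value is already a usable bit position, but a BIGINT value may fall outside the int range and cannot be used as-is. The class and method names are hypothetical and this is not Kylin's code.

```java
import java.util.BitSet;

public class BigIntBitmapSketch {
    // Assumption for illustration: the bitmap is indexed by a non-negative 32-bit
    // int, as java.util.BitSet is. A value can be used as a bit index directly only
    // if it falls in [0, Integer.MAX_VALUE]; otherwise it must first be mapped to
    // a small int id, which is what a dictionary does.
    static boolean tryAddDirectly(BitSet bitmap, long value) {
        if (value < 0 || value > Integer.MAX_VALUE) {
            return false;
        }
        bitmap.set((int) value);
        return true;
    }

    public static void main(String[] args) {
        BitSet bitmap = new BitSet();
        System.out.println(tryAddDirectly(bitmap, 42L));            // prints true
        System.out.println(tryAddDirectly(bitmap, 3_000_000_000L)); // prints false
    }
}
```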

Troubleshooting

Reading the source code shows that, for its values to be used directly without a global dictionary, the field must be one of four types: “tinyint”, “smallint”, “int”, or “integer”.
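The rule described above can be expressed as a simple type check. This is a hypothetical helper that only restates the four types named here; it is not the method Kylin uses, and matching case-insensitively is an assumption to cover spellings such as “SmallINT” or “INTEGER”.

```java
import java.util.Locale;
import java.util.Set;

public class DictionaryRuleSketch {
    // The four types that can be used directly, per the source code reading above.
    private static final Set<String> DIRECT_TYPES =
            Set.of("tinyint", "smallint", "int", "integer");

    // Hypothetical helper, not Kylin's actual method: returns true when an exact
    // COUNT DISTINCT on a column of this type would need a global dictionary.
    static boolean needsGlobalDictionary(String columnType) {
        return !DIRECT_TYPES.contains(columnType.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String t : new String[] {"SmallINT", "INTEGER", "bigint", "varchar(64)"}) {
            System.out.println(t + " -> " + needsGlobalDictionary(t));
        }
    }
}
```

Under this rule a BIGINT column always takes the dictionary path, which matches what was observed during the build.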

Source code: github.com/apache/kyli…

Source code: github.com/apache/kyli…

Solution

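The data is currently stored in Hive as BIGINT. INT is a signed 32-bit type whose maximum value is 2,147,483,647 (just over 2 billion), which is large enough for the values in this table, so changing the column type to INT meets the requirement.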
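Before narrowing BIGINT to INT it is worth confirming that every existing value actually fits. A minimal sketch, assuming the column's minimum and maximum have already been fetched from Hive (for example with a MIN/MAX query); the class and method names are hypothetical.

```java
public class NarrowingCheckSketch {
    // INT is a signed 32-bit type: its range is [-2,147,483,648, 2,147,483,647].
    // A BIGINT column can be narrowed to INT without losing data only if every
    // existing value lies in that range, i.e. if its MIN and MAX do.
    static boolean fitsInInt(long min, long max) {
        return min >= Integer.MIN_VALUE && max <= Integer.MAX_VALUE;
    }

    // Variant for a sample of values already loaded into memory.
    static boolean allFitInInt(long[] values) {
        for (long v : values) {
            if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(fitsInInt(1L, 1_500_000_000L)); // prints true
        System.out.println(fitsInInt(1L, 3_000_000_000L)); // prints false
        System.out.println(allFitInInt(new long[] {7L, 2_147_483_647L})); // prints true
    }
}
```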

ALTER TABLE [table_name] CHANGE COLUMN [old_column_name] [new_column_name] [new_column_type]

Remaining issues

In cluster deployment mode, it is recommended to run only one node in “all” mode (job + query) or “job” mode. With three nodes deployed in “all” mode, the global dictionary build fails.
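The recommendation above can be turned into a pre-deployment check. Kylin selects a node's role with the `kylin.server.mode` setting (`all`, `job`, or `query`); this sketch counts the nodes that run build jobs. Representing each node's `kylin.properties` as a `java.util.Properties` object, and treating a missing setting as `all`, are assumptions of the sketch, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Properties;

public class ServerModeCheckSketch {
    // Counts nodes whose kylin.server.mode runs build jobs ("all" or "job").
    // Assumption: an unset mode is treated as "all".
    static long jobNodeCount(List<Properties> nodes) {
        return nodes.stream()
                .map(p -> p.getProperty("kylin.server.mode", "all").trim().toLowerCase(Locale.ROOT))
                .filter(mode -> mode.equals("all") || mode.equals("job"))
                .count();
    }

    // Builds a stand-in for one node's kylin.properties with the given mode.
    static Properties node(String mode) {
        Properties p = new Properties();
        p.setProperty("kylin.server.mode", mode);
        return p;
    }

    public static void main(String[] args) {
        List<Properties> cluster = List.of(node("all"), node("query"), node("query"));
        long jobNodes = jobNodeCount(cluster);
        System.out.println(jobNodes <= 1 ? "ok" : "more than one job node: " + jobNodes);
    }
}
```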