OLAP databases with MPP architectures such as Doris typically handle large amounts of data by increasing concurrency. Essentially, Doris’s data is stored in a data structure like SSTables (Sorted String Table). This structure is an ordered data structure that can be stored sorted by a specified column. On this kind of data structure, it is very efficient to search by the condition of the sequence.

limit

  • inCount(*)Syntactically, the native mode is not particularly high performing and needs to be self-optimized (http://doris.apache.org/docum…)
  • There are no field types other than dimensions and metrics, and if multiple requirements scenarios need to be implemented, multiple table types need to be created to achieve redundant data

Data storage structure

In Doris, data is logically described in the form of tables. A table contains rows and columns. A Row is a Row of data for the user. Column is used to describe the different fields in a row of data.

Column can be divided into two broad categories: Key and Value. From a business perspective, Key and Value can correspond to dimension columns and metric columns, respectively.

The data model of Doris is mainly divided into three categories:

  • Aggregate
  • Uniq
  • Duplicate

Aggregate model

In Doris, the aggregation granularity of value is determined by key.

CREATE TABLE IF NOT EXISTS example_db.expamle_tbl (' user_id 'LARGEINT NOT NULL COMMENT "user id", 'date' date NOT NULL in 'city' VARCHAR(20) in 'user's city', 'age' SMALLINT in 'user's age', 'sex' TINYINT COMMENT "gender ", 'last_visit_date' DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT" last time accessed ", 'cost' BIGINT SUM DEFAULT "0" COMMENT "total consumption ", 'max_dwell_time' INT MAX DEFAULT "0" COMMENT" maximum dwell_time ", 'min_dwell_time' INT MIN DEFAULT "99999",) AGGREGATE KEY(' user_id ', 'date', 'timestamp', 'city', `age`, `sex`) ... /* Omit Partition and Distribution information */;

Fields with tags like REPLACE, SUM, MAX, and MIN are values, user_id, date, timestamp, city, age, and sex are keys.

Uniq model

There is no aggregation requirement for this type of data, just to ensure that the primary key is unique.

CREATE TABLE IF NOT EXISTS example_db.expamle_tbl (' user_id 'LARGEINT NOT NULL COMMENT "user id", 'username' VARCHAR(50) NOT NULL in 'city' VARCHAR(20) in 'city', 'age' smallInt in 'age', 'sex' TINYINT in COMMENT 'phone' LARGEINT in COMMENT 'address' VARCHAR(500) in COMMENT 'address' VARCHAR(500) in COMMENT 'address' 'register_time' DATETIME COMMENT) UNIQUE KEY(' user_id ', 'user_name')... /* Omit Partition and Distribution information */;

Duplicate model

In some multidimensional analysis scenarios, the data has neither primary keys nor aggregation requirements. Therefore, we introduce the DUPLICATE data model to meet such requirements.

This data model is different from Aggregate and Uniq models. The data is stored exactly as it is in the imported file, without any aggregation. Even if the two rows of data are exactly the same, they are preserved. The DUPLICATE KEY specified in the construction clause is just used to indicate which columns the underlying data is sorted according to.

In the selection of DUPLICATE KEY, we suggest that the first 2-4 columns should be selected appropriately.

CREATE TABLE IF NOT EXISTS example_db.expamle_tbl (' timestamp 'DATETIME NOT NULL COMMENT); 'type' INT NOT NULL COMMENT ", 'error_code' INT COMMENT ", 'error_msg' VARCHAR(1024) COMMENT ", DUPLICATE KEY(' timestamp ', 'type'); 'op__id' BIGINT COMMENT 'DATETIME' COMMENT '); /* Omit Partition and Distribution information */;

Data model selection recommendations

Because the data model is determined when the table is built and cannot be modified. Therefore, it is very important to choose a suitable data model.

  1. Aggregate model can greatly reduce the amount of scanned data and the computation amount of query in Aggregate query by pre-aggregation, which is very suitable for the report query scenario with fixed mode. But the model is not friendly to count(*) queries. At the same time, because the aggregation method on the Value column is fixed, semantic correctness needs to be considered when conducting other types of aggregation queries.
  2. Uniq model can guarantee the uniqueness of a primary key for situations where a unique primary key constraint is required. But you can’t take advantage of the query benefits of pre-aggregations like ROLLUP (since it’s essentially REPLACE, there’s no SUM aggregation).
  3. Duplicate is suitable for ad-hoc queries of any dimension. Although it is also impossible to take advantage of pre-aggregation, it is not constrained by the aggregation model and can take advantage of the column storage model (only related columns are read, not all Key columns are read).

The prefix index

In Aggregate, Uniq and Duplicate three data models. The underlying data storage is sorted and stored according to the columns specified in AGGREGATE KEY, UNIQ KEY and DUPLICATE KEY in the respective table construction sentences.

Prefix index, on the basis of sorting, realizes a fast query data index method according to the given prefix column.

We index the first 36 bytes of a row of data as a prefix for that row. When a VARCHAR type is encountered, the prefix index is truncated directly. Let’s give an example:

  1. The following table structure has a prefix index of user_id(8Byte) + age(4Bytes) + message(prefix 24 Bytes).
ColumnName Type
user_id BIGINT
age INT
message VARCHAR(100)
max_dwell_time DATETIME
min_dwell_time DATETIME
  1. The following table structure has a prefix index of USER_NAME (20 Bytes). Even if it does not reach 36 bytes, it is simply truncated, not continued, because a VARCHAR is encountered.
ColumnName Type
user_name VARCHAR(20)
age INT
message VARCHAR(100)
max_dwell_time DATETIME
min_dwell_time DATETIME

When our query condition is the prefix of the prefix index, we can greatly speed up the query. For example, in the first example, we execute the following query:

SELECT * FROM table WHERE user_id=1829239 and age=20;

This query will be much more efficient than the following query:

SELECT * FROM table WHERE age=20;

Therefore, the correct selection of column order can greatly improve the query efficiency when building a table.

Materialized View (Rollup)

Rollup means “ROLLUP” in multidimensional analysis, where data is further aggregated at a specified granularity.

In Doris, we call the Table created by the user through the build clause the Base Table (Base Table). The BASE table holds the underlying data that is stored in the manner specified by the user’s build clause.

From the BASE table, we can create as many ROLLUP tables as we want. The ROLLUP data is generated based on the BASE table and is physically stored independently.

The basic purpose of the ROLLUP table is to get more coarse-grained aggregated data from the BASE table.

A ROLLUP can essentially be understood as a materialized index of the Base Table. You can select only a few columns from the Base Table as the Schema when setting up ROLLUP. The order of the fields in the Schema can also be different from that in the Base Table.

The trigger after ROLLUP creation is automatic and does not require any other specification or configuration.

For example, when a user_id (key), cost (value) format rollup is created, it is fired when the following statement is executed.

SELECT user_id, sum(cost) FROM table GROUP BY user_id;

For both Aggregate and Uniq data storage formats, using ROLLUP will change the granularity of Aggregate data, but for DUPLICATE, it just adjusts the prefix index.

Because the column order was specified when the table was built, there is only one prefix index for a table. This may not be efficient for queries that use other columns that cannot hit the prefix index as a condition. Therefore, we can artificially adjust the column order by creating a ROLLUP. Let me give you an example.

Base table structure as follows:

ColumnName Type
user_id BIGINT
age INT
message VARCHAR(100)
max_dwell_time DATETIME
min_dwell_time DATETIME

We can create a ROLLUP table from this:

ColumnName Type
age INT
user_id BIGINT
message VARCHAR(100)
max_dwell_time DATETIME
min_dwell_time DATETIME

As you can see, the columns in the ROLLUP and BASE tables are exactly the same, with the order of USER_ID and AGE reversed. So when we do the following query:

SELECT * FROM table where age=20 and massage LIKE "%error%";

The ROLLUP table is preferred because the ROLLUP prefix index matches better.

Create the ROLLUP syntax

ALTER TABLE table1 ADD ROLLUP rollup_city(citycode, pv); ALTER TABLE ROLLUP FROM table1;

ROLLUP adjusts the prefix index

Because the column order was specified when the table was built, there is only one prefix index for a table. This may not be efficient for queries that use other columns that cannot hit the prefix index as a condition. Therefore, we can artificially adjust the column order by creating a ROLLUP.

A few notes for Rollup

  • The fundamental purpose of ROLLUP is to improve the query efficiency of some queries (whether by aggregating to reduce the amount of data or changing the column order to match the prefix index). So the meaning of a ROLLUP goes beyond “ROLLUP”. That’s why in the source code, we call it Materized Index.
  • ROLLUP is attached to the BASE table and can be viewed as a secondary data structure to the BASE table. You can create or drop a ROLLUP from the BASE table, but you cannot explicitly query a ROLLUP in the query. Whether a Rollup is hit or not is automatically determined by the Doris system.
  • Rollup’s data is physically stored independently. Therefore, the more rollups created, the more disk space will be used. There is also an impact on the import speed (the ETL phase of the import automatically generates all the ROLLUP data), but there is no decrease in query efficiency (just better).
  • Rollup data updates are fully synchronized with the Base representation. The user should not be concerned.
  • The columns in ROLLUP are aggregated in exactly the same way as the BASE table. Rollup does not need to be specified and cannot be modified during creation.
  • A necessary (but not sufficient) condition for a query to hit a ROLLUP is that all the columns involved in the query (including select list, query condition columns in where, etc.) exist in the columns of the ROLLUP. Otherwise, the query can only hit the BASE table.
  • Some types of queries, such as COUNT (*), cannot hit ROLLUP under any conditions.
  • Can be achieved byEXPLAIN your_sql;Command to get the query execution plan, in the execution plan, to see if ROLLUP was hit.
  • Can be achieved byDESC tbl_name ALL;Statement displays the BASE table and all created ROLLUP.

There is no limit to the number of rollups, but more will consume more memory. Support SQL to change the number of ROLLUP fields.

Partitioning and buckets

Doris supports two levels of partition storage, the first level is a RANGE partition, the second level is a HASH bucket.

1.3.1 Range Partition

Range partitioning is used to divide data into different intervals. Logically, it can be understood that the original table is divided into multiple child tables. In business, most users will choose to partition according to time. The advantages of partition according to time are as follows: * can distinguish hot and cold data * can use Doris hierarchical storage (SSD + SATA) function * when data is deleted according to partition, it is more rapid

1.3.2. Hash Buckets

Divide the data into buckets based on the hash value. * In order to facilitate data recovery, it is recommended that the size of a single bucket should not be too large and should be kept within 10GB. Therefore, please consider the number of buckets reasonably when creating a table or increasing a partition. Where different partitions can specify different number of buckets.

Sparse indexes and Bloom filters

Doris stores the data in order, and establishes a sparse index on the basis of the ordered data, with index granularity of block(1024 rows).

The sparse index selects fixed length prefixes in Schema as the index content, while Doris currently selects a prefix of 36 bytes as the index.

  • When building the table, it is recommended to put the common filter fields in the query in front of the Schema. The more distinguished the query fields are, the more frequently they are put earlier.
  • The one special thing about this is the VARCHAR type of field. A field of type VARCHAR can only be the last field in a sparse index. The index is truncated at VARCHAR, so if VARCHAR appears first, the index may be less than 36 bytes long. See Data Model, Rollup, and Prefix Indexes for details.
  • In addition to sparse indexes, Doris also provides BloomFilter indexes, which are effective in filtering columns with large differentiations. If you consider that VARCHAR cannot be placed in a sparse index, you can create a BloomFilter index.

Broadcast/Shuffle Join

The default way for the system to realize Join is to filter the small table conditionally, broadcast it to each node where the large table is, form an in-memory Hash table, and then stream the data of the large table for a Hash Join. However, if the amount of data filtered by the small table cannot be put into memory, then the Join cannot be completed, and the usual error should cause the memory limit in the first place.

If you encounter any of the above, it is recommended to use a SHUFFLE JOIN, also known as Partitioned Join. This means that both the small table and the large table are hashed according to the key of the Join, and then a distributed Join is performed. This memory consumption is then spread over all the compute nodes in the cluster.

The problem

  1. Table structure field changes and rollup index changes based on tables already created?

Supported, but the data schema cannot be changed once the table is created.

  1. Is there a quantity limit for Rollup?

No, but more ROLLUP memory resources will consume more and import data will be slower.

  1. Does the index composed of (A, B, C) support only field A as A query condition?

Support, but have order requirement.

conclusion

The Doris table structure is composed of Key and Value. Key is the dimension and Value is the statistical index. It is suitable for simple aggregate calculation and dimensional calculation, and has high performance with relatively low hardware conditions.

  • Query: Satisfies MySQL syntax
  • Improve query performance: Use prefix index + ROLLUP or Partition, Bloom filter.
  • Improved query performance in join mode: SHUFFLE JOIN.
  • Both the table structure and the index support changes, but the data schema does not.

Doris has also officially released a Dev version of Docker to test out the feature.https://hub.docker.com/r/apac…