At a ClickHouse Meetup earlier this year, the Kuaishou team shared their in-house practice with ClickHouse Projection. The talk covered the concept, usage, and performance testing of Projection. According to the performance test data, Projection delivered up to a hundredfold improvement in query performance: queries that previously took minutes now respond in seconds, and queries that took seconds now respond in milliseconds, a much better experience for users.

After reading Kuaishou's share on ClickHouse Projection, several questions arose in my mind.

  1. What were the problems with ClickHouse prior to the Projection feature?
  2. How does ClickHouse Projection solve these problems?
  3. What scenarios is ClickHouse Projection suitable for?
  4. What should you watch out for with ClickHouse Projection?

What were the problems with ClickHouse prior to the Projection feature?

ClickHouse is an OLAP engine that sits at the top layer of the data platform and connects directly to the platform's users, so query performance directly determines the user experience.

  1. Although ClickHouse's query performance is excellent, problems remain in scenarios with very large data volumes. The reason is that ClickHouse is an MPP-architecture analytical database built on in-memory computing. Unlike computing frameworks such as Spark, Hive, and MapReduce, it has no disk-spill option for intermediate data during computation: during a query, data is loaded into memory, and if the configured memory is insufficient, the query fails and the stability of the ClickHouse cluster suffers.
  2. In data query scenarios, users have habitual patterns; for example, certain dashboards are viewed at certain times each day. These dashboards contain full-range statistics, complex query logic, and so on. Compared with other queries, they can be classified as outlier queries: they may fail because of memory limits, or take too long because of complex computation, affecting the queries of other users on the platform. A sketch of the memory limit involved follows this list.
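Such failures typically surface as ClickHouse's memory-limit error, governed by the max_memory_usage setting. A minimal sketch (the limit value is illustrative, not from the original test):

SET max_memory_usage = 10000000000; -- cap a single query at ~10 GB

-- A heavy aggregation that exceeds the cap fails with an error like:
-- DB::Exception: Memory limit (for query) exceeded ...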

How does ClickHouse Projection solve these problems?

In the field of OLAP, there are two types of data models: ROLAP (Relational OLAP) and MOLAP (Multidimensional OLAP). ROLAP expresses data as a two-dimensional relational model, similar to the relational database model, with stronger expressive power and an external SQL interface. MOLAP physically stores the multidimensional data used in OLAP analysis as multidimensional arrays, forming a "cube" structure: dimension attribute values are mapped to array subscripts or subscript ranges, and summary data is stored as the values of the array cells. MOLAP adopts the idea of pre-aggregation to accelerate queries, but its data model is less flexible.

ClickHouse is a typical representative of ROLAP, and its single-table query performance on pure columnar storage is nearly unmatched. Projection takes its name from Vertica and corresponds to the traditional materialized view. It borrows the pre-aggregation idea of MOLAP: when data is written, aggregate data is computed according to the expression defined by the Projection and written together with the original data. During a query, if the SQL can be answered from the aggregated data, querying the aggregated data directly reduces computation cost and avoids the memory problems caused by large data volumes.

On the underlying storage, Projection data is an extension of the part directory that holds the original data, and it can be understood as a form of query index.

Looking at the core code of the data-writing logic (ClickHouse version 21.7), each projection is stored as a subdirectory under the part directory, holding data aggregated from the original data. Projections are written in sync with the original data, and only data written after a projection is created is materialized, which guarantees consistency.

In MergeTreeDataWriter.cpp, around line 390: if projections are configured, the projection parts are added to new_data_part.

if (metadata_snapshot->hasProjections())
{
    for (const auto & projection : metadata_snapshot->getProjections())
    {
        /// 1. Get the execution plan for the projection query.
        /// 2. Use the current Block as input and compute the aggregate result.
        auto in = InterpreterSelectQuery(
                      projection.query_ast,
                      context,
                      Pipe(std::make_shared<SourceFromSingleChunk>(block, Chunk(block.getColumns(), block.rows()))),
                      SelectQueryOptions{
                          projection.type == ProjectionDescription::Type::Normal
                              ? QueryProcessingStage::FetchColumns
                              : QueryProcessingStage::WithMergeableState})
                      .execute()
                      .getInputStream();
        in = std::make_shared<SquashingBlockInputStream>(in, block.rows(), std::numeric_limits<UInt64>::max());
        in->readPrefix();
        auto projection_block = in->read();
        if (in->read())
            throw Exception("Projection cannot grow block rows", ErrorCodes::LOGICAL_ERROR);
        in->readSuffix();
        if (projection_block.rows())
        {
            /// 5. Add the aggregated data (.proj) to new_data_part.
            new_data_part->addProjectionPart(
                projection.name, writeProjectionPart(projection_block, projection, new_data_part.get()));
        }
    }
}

p2.proj is the data directory for the p2 projection under the data part. The aggregated dimension columns and the aggregate-function results are stored as separate column files.

├── checksums.txt
├── columns.txt
├── count.txt
├── default_compression_codec.txt
├── dim1.bin
├── dim1.mrk2
├── dim2.bin
├── dim2.mrk2
├── dim3.bin
├── dim3.mrk2
├── event_key.bin
├── event_key.mrk2
├── event_time.bin
├── event_time.mrk2
├── p2.proj
│   ├── checksums.txt
│   ├── columns.txt
│   ├── count%28%29.bin
│   ├── count%28%29.mrk2
│   ├── count.txt
│   ├── default_compression_codec.txt
│   ├── dim1.bin
│   ├── dim1.mrk2
│   ├── dim3.bin
│   ├── dim3.mrk2
│   ├── groupBitmap%28user%29.bin
│   └── groupBitmap%28user%29.mrk2
├── user.bin
└── user.mrk2
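In recent ClickHouse versions the projection parts can also be inspected through SQL. A sketch, assuming the system.projection_parts table is available in your version (the table name event_projection1 refers to the example table defined below):

SELECT name, parent_name, rows
FROM system.projection_parts
WHERE table = 'event_projection1' AND active;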

What scenarios is ClickHouse Projection suitable for?

To explore the scenarios for which Projection is applicable, a typical user-behavior data set of 100 million rows is prepared. The data model is an event model, containing the events a user has performed and the dimensions of those events.

For the dimensions, dim1 and dim2 are ordinary dimensions with 10 distinct values each, while dim3 is a high-cardinality dimension with 100,000 distinct values.

| Column | Meaning |
| --- | --- |
| event_key | Event identifier |
| event_time | Event time |
| dim1 | Ordinary dimension |
| dim2 | Ordinary dimension |
| dim3 | High-cardinality dimension |

How to construct a Projection for a data table?

  1. Define the projection together with the source table in the CREATE TABLE statement. The projection definition contains a basic SELECT statement and omits the FROM clause.
CREATE TABLE event_projection1
(
    `event_key` String,
    `user` UInt32,
    `event_time` DateTime64(3, 'Asia/Shanghai'),
    `dim1` String,
    `dim2` String,
    `dim3` String,
    PROJECTION p1
    (
        SELECT
            groupBitmap(user),
            count(1)
        GROUP BY dim1
    )
)
ENGINE = MergeTree()
ORDER BY (event_key, user, event_time)
  2. Add the PROJECTION definition with an ALTER TABLE statement.

ALTER TABLE event_projection1

ADD PROJECTION p2
(
    SELECT
        count(1),
        groupBitmap(user)
    GROUP BY dim1, dim3
)
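Before testing queries, a few rows can be inserted to make the walkthrough reproducible. The sample values below are made up for illustration and follow the column order of the table definition:

INSERT INTO event_projection1 VALUES
    ('click', 1001, '2021-09-01 10:00:00.000', 'a', 'x', 'd-00001'),
    ('view',  1002, '2021-09-01 10:01:00.000', 'b', 'y', 'd-00002'),
    ('click', 1001, '2021-09-01 10:02:00.000', 'a', 'y', 'd-00003');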

How does a query hit a projection?

  1. The SELECT expressions must be a subset of the SELECT expressions in the projection definition.
  2. The GROUP BY clause must be a subset of the GROUP BY clause in the projection definition.
  3. The WHERE clause keys must be a subset of the GROUP BY columns in the projection definition. Illustrative queries follow this list.
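For instance, given the p2 definition above, a sketch (these specific queries are illustrative, not from the original test):

-- Hits p2: the SELECT and GROUP BY expressions are subsets of the
-- projection definition, and the WHERE key dim1 is a GROUP BY column of p2.
SELECT dim1, count(1)
FROM event_projection1
WHERE dim1 = 'a'
GROUP BY dim1;

-- Does not hit p2: dim2 appears in neither the projection's SELECT
-- expressions nor its GROUP BY columns.
SELECT dim2, count(1)
FROM event_projection1
GROUP BY dim2;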

How do I know if the projection was hit?

  1. In the EXPLAIN output, ReadFromStorage (MergeTree(with projection)) indicates the projection was hit.
EXPLAIN actions = 1
SELECT dim1, count(1)
FROM event_projection1
GROUP BY dim1

Expression ((Projection + Before ORDER BY))
Actions: INPUT :: 0 -> dim1 String : 0
         INPUT :: 1 -> count() UInt64 : 1
Positions: 0 1
  SettingQuotaAndLimits (Set limits and quota after reading from storage)
    ReadFromStorage (MergeTree(with projection))
  2. Check the ClickHouse server log for projection-related entries.
(SelectExecutor): Choose aggregate projection p1
(SelectExecutor): Projection required columns: dim1, count()
(SelectExecutor): Reading approx. 63 rows with 4 streams
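Depending on version, the query log also records which projections a query used. A sketch, assuming your ClickHouse version includes the projections column in system.query_log:

SELECT query, projections
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 10;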

How does the query perform?

Scenario: SELECT dim1, count(1) FROM event_projection1 GROUP BY dim1

| Projection definition | Query time | Storage | Insert time |
| --- | --- | --- | --- |
| No projection | 5.347 s | 650 MB | 7 min |
| Aggregated by dim1 | 0.018 s | 654 MB | 12 min |
| Aggregated by (dim1, dim3) | 0.319 s | 923 MB | 20 min |
  1. A query that hits a projection shows a dramatic performance improvement over one that does not.
  2. Constructing a projection adds overhead to storage and data insertion.
  3. A projection built on a high-cardinality dimension still speeds up the query substantially (roughly 17x here, versus roughly 300x for the dim1-only projection), but its storage and insertion overhead are markedly higher.

Scenario: Comparison of performance improvement effects of different aggregation functions

| Aggregate function | No projection | Ordinary-dimension projection (dim1) | High-cardinality projection (dim1, dim3) |
| --- | --- | --- | --- |
| count(1) | 5.347 s | 0.018 s | 0.319 s |
| groupBitmap(user) | 7.936 s | 0.040 s | 5.840 s |

  1. Under the same conditions, groupBitmap benefits less from projections than the count aggregate does.
  2. In the high-cardinality scenario, the groupBitmap query performs almost the same whether the projection is hit or not, yet it still pays the extra storage overhead.
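For reference, groupBitmap aggregates integer values into a bitmap and returns its cardinality, so it is commonly used for distinct-user counts. A minimal sketch against the example table:

SELECT dim1, groupBitmap(user) AS distinct_users
FROM event_projection1
GROUP BY dim1;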

It can be concluded that high-cardinality dimensions are not projection-friendly: the query gain is limited while the overhead is significant. Including high-cardinality dimensions when constructing a projection is not recommended.

What should you watch out for with ClickHouse Projection?

  1. Additional storage overhead

As mentioned above, each projection is stored in a separate subdirectory under the part directory, holding aggregate data computed from the original data. Projection data can be understood abstractly as an aggregation table, and its storage overhead varies with the dimensions chosen and the degree of aggregation.

  2. Affects data write speed

The source-code analysis above shows that projection writing is part of the original data-writing process: each data-part write computes aggregate data from the original data block according to the projection definition. This increases the overhead and duration of data writes and reduces data freshness.

  3. Historical data is not automatically materialized

Projection works at part granularity, and consistency with data writes is guaranteed only for data inserted after the projection is created; earlier parts are not materialized. A merge between parts includes a merge between their projections, and inconsistent projections across parts will cause the part merge to fail. Projection data across parts can be aligned through a projection materialization operation.

Projection materialization: projection computation is based on raw data blocks, and for large parts it is prone to memory problems. ClickHouse therefore builds an INSERT SELECT pipeline to simulate the generation of new data: multiple temporary small parts are produced in the process, and the projections in those small parts are then merged in multiple passes. A sketch of the DDL involved follows.
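A minimal sketch of materializing a projection for historical parts, using standard ClickHouse DDL with the table and projection names from the examples above. MATERIALIZE PROJECTION runs as a mutation, so progress can be tracked in system.mutations:

ALTER TABLE event_projection1 MATERIALIZE PROJECTION p2;

-- Track the mutation until is_done = 1
SELECT command, is_done
FROM system.mutations
WHERE table = 'event_projection1';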

  4. Projection may not be hit when there are too many parts

A query uses the projection only when the parts containing the projection cover at least 50% of the parts to be read. In some scenarios, frequent data writes generate many small parts; the growing part count inflates the denominator of the coverage calculation, so the hit condition is no longer met. As parts are merged and their number decreases, subsequent queries may hit the projection again. A sketch of a manual remedy follows.
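When small parts pile up, a manual merge can restore projection coverage. A sketch for illustration only: OPTIMIZE ... FINAL is expensive on large tables, and background merges are normally sufficient.

OPTIMIZE TABLE event_projection1 FINAL;

-- Inspect the remaining active parts for the table
SELECT name, rows
FROM system.parts
WHERE table = 'event_projection1' AND active;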

This article is just a brief introduction to the ClickHouse Projection feature and a basic performance test.

The effect of high-cardinality dimensions on ClickHouse Projection was also observed during performance testing. Later articles will explain ClickHouse's query process and underlying storage in detail, and analyze the internal reasons for this influence.