Indicator System Computing Architecture Design

Preface

In the previous article, “Index Management System Design”, I discussed the problems an indicator system needs to solve, along with its high-level structure and model design. That article did not make clear what the computing and storage architecture looks like in an actual implementation. In this chapter, I will focus on the design of the indicator computing architecture.

Past implementation issues

Indicator systems are similar to tag (label) systems: both contain a large number of fields, and to some extent one can depend on the other. For example, a tag system can use an indicator system as its data foundation, but that is a separate topic. Below are the problems I ran into in a tag system and in some report development work I took part in earlier.

The following figure shows a logic fragment from a tag processing system. The script calculates all the tags in the tag system one by one.

The following is a fragment of a report SQL script, which likewise calculates multiple fields of the report in one pass, with very complex processing logic.

From a software design perspective, this style of development is highly coupled, which causes a number of problems:

All the logic is coupled together, which makes the script hard to read later.
To remove, add, or update an indicator or tag, you have to change the original script, which is costly. If a bug is introduced, the existing indicators and tags are affected as well.
When multiple reports contain an indicator with the same definition, its processing logic has to be written again in each report, and every change must then be made in each of those places, which this approach makes unavoidable.

Computing Architecture Design

The main idea for solving these problems is low coupling and high cohesion. Reports are broken down to indicator granularity, so the unit of computation and storage is the indicator rather than the report. This makes indicators more reusable and makes adding, deleting, and modifying an individual indicator more efficient, which improves the robustness of the whole system.
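As a minimal sketch of this idea (not taken from the original system), the Python below registers each indicator as its own independent task and lets reports simply list the indicators they need. The names ORDERS, INDICATORS, REPORTS, and build_report, as well as the sample data, are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical order records standing in for the base data warehouse layer.
ORDERS: List[dict] = [
    {"region": "Central China", "amount": 100.0},
    {"region": "Central China", "amount": 50.0},
    {"region": "East China", "amount": 200.0},
]

IndicatorTask = Callable[[List[dict]], Dict[str, float]]

# Registry: one entry per indicator, each with its own independent task.
INDICATORS: Dict[str, IndicatorTask] = {}


def indicator(name: str):
    """Register a function as the calculation task for one indicator."""
    def register(task: IndicatorTask) -> IndicatorTask:
        INDICATORS[name] = task
        return task
    return register


@indicator("sales_by_region")
def sales_by_region(orders):
    totals: Dict[str, float] = {}
    for order in orders:
        totals[order["region"]] = totals.get(order["region"], 0.0) + order["amount"]
    return totals


@indicator("order_count_by_region")
def order_count_by_region(orders):
    counts: Dict[str, float] = {}
    for order in orders:
        counts[order["region"]] = counts.get(order["region"], 0) + 1
    return counts


# Reports only name the indicators they need; they contain no calculation logic.
REPORTS: Dict[str, List[str]] = {
    "regional_overview": ["sales_by_region", "order_count_by_region"],
    "finance_summary": ["sales_by_region"],  # reuses the same indicator
}


def build_report(report_name: str, orders: List[dict]) -> Dict[str, Dict[str, float]]:
    """Assemble a report by running only the indicator tasks it lists."""
    return {name: INDICATORS[name](orders) for name in REPORTS[report_name]}


if __name__ == "__main__":
    print(build_report("regional_overview", ORDERS))
```

Because both reports refer to sales_by_region by name, a change to that indicator is made in one place, and adding or removing another indicator leaves it untouched.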

The entire computing architecture is divided into four layers, as shown below:

Base data warehouse layer: this layer mainly holds the data warehouse models.
Indicator calculation layer: one calculation task per indicator (a task can be SQL or another processing script), built on the underlying base data warehouse layer.
Indicator storage layer: each processed indicator is stored in its own table.
Report layer: a business report is produced by flexibly combining indicator tables with JOINs.
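The four layers can be sketched end to end with Python's built-in sqlite3 module standing in for a real warehouse such as Hive. The table names (dw_orders and the ind_ tables), the sample rows, and the run_indicator helper are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Layer 1: base data warehouse layer (a hypothetical order fact table).
conn.execute("CREATE TABLE dw_orders (region TEXT, product_line TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO dw_orders VALUES (?, ?, ?)",
    [
        ("Central China", "Women's clothing", 100.0),
        ("Central China", "Men's clothing", 200.0),
        ("North China", "Men's clothing", 300.0),
    ],
)

# Layer 2: indicator calculation layer, one SQL task per indicator.
# The dictionary key doubles as the name of the indicator's result table.
INDICATOR_TASKS = {
    "ind_sales_by_region":
        "SELECT region, SUM(amount) AS sales FROM dw_orders GROUP BY region",
    "ind_order_count_by_region":
        "SELECT region, COUNT(*) AS order_count FROM dw_orders GROUP BY region",
}


def run_indicator(table_name: str) -> None:
    """Layer 3: rebuild the storage table that belongs to a single indicator."""
    conn.execute(f"DROP TABLE IF EXISTS {table_name}")
    conn.execute(f"CREATE TABLE {table_name} AS {INDICATOR_TASKS[table_name]}")


for table_name in INDICATOR_TASKS:
    run_indicator(table_name)

# Layer 4: report layer, a JOIN over the indicator tables it needs.
report = conn.execute(
    """
    SELECT s.region, s.sales, c.order_count
    FROM ind_sales_by_region AS s
    JOIN ind_order_count_by_region AS c ON s.region = c.region
    ORDER BY s.region
    """
).fetchall()

if __name__ == "__main__":
    for row in report:
        print(row)
```

Each indicator can be recomputed on its own by calling run_indicator with its table name, and the report query contains no aggregation logic of its own.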

Why one table per indicator

Different indicators use different numbers of dimensions, so their result data have different fields, and the values of all indicators cannot be stored in a single unified table. For example, the indicator for sales of each region over the last 30 days uses one dimension, the region:

Region	Sales
Central China	500000
North China	600000
East China	700000
...	...

The indicator for sales of each product line in each region over the last 30 days uses two dimensions, region and product line:

Region	Product line	Sales
Central China	Women’s clothing	10000
Central China	Men’s clothing	20000
North China	Men’s clothing	30000
North China	Children’s clothing	40000
...	...	...
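To make the difference concrete, here is a small sketch that stores the two example indicators above in separate tables, again using sqlite3 in place of Hive. The table and column names are hypothetical; the rows are the sample values from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Indicator 1: sales over the last 30 days by region (one dimension).
conn.execute("CREATE TABLE ind_sales_30d_by_region (region TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO ind_sales_30d_by_region VALUES (?, ?)",
    [("Central China", 500000), ("North China", 600000), ("East China", 700000)],
)

# Indicator 2: sales over the last 30 days by region and product line (two dimensions).
conn.execute(
    "CREATE TABLE ind_sales_30d_by_region_product_line "
    "(region TEXT, product_line TEXT, sales REAL)"
)
conn.executemany(
    "INSERT INTO ind_sales_30d_by_region_product_line VALUES (?, ?, ?)",
    [
        ("Central China", "Women's clothing", 10000),
        ("Central China", "Men's clothing", 20000),
        ("North China", "Men's clothing", 30000),
        ("North China", "Children's clothing", 40000),
    ],
)


def columns(table: str) -> list:
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


if __name__ == "__main__":
    print(columns("ind_sales_30d_by_region"))
    print(columns("ind_sales_30d_by_region_product_line"))
```

The two column lists differ, so forcing both indicators into one shared table would mean either leaving product_line empty for the first indicator or reshaping every indicator into a generic layout; one table per indicator avoids both.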

Is one table per indicator too many?

As long as indicators are not built redundantly, roughly ten thousand of them can cover nearly all the reporting requirements of an average company. Tens of thousands of tables are not a big problem for a typical big data processing system such as Hive. Moreover, these indicator tables are numerous, but their data volume is not necessarily large.

