Preface

In the previous article, “Index Management System Design”, I covered the problems an indicator system needs to solve, along with its overall structure and model design. That article did not make clear what the computing and storage architecture looks like in an actual implementation. This article focuses on the design of the indicator computing architecture.

Past implementation issues

Indicator systems resemble label systems: both contain a large number of fields, and to some extent one can depend on the other. For example, a label system can use an indicator system as its data foundation, but that is a separate topic. Below are problems I ran into in a label system and in some report development work I took part in.

The following figure shows a logic fragment from a label processing system. The script calculates every label in the label system, one by one.

The following is a fragment of a report SQL script. It likewise calculates multiple fields of the report in a single pass, with very complex processing logic.

From a software design perspective, this development approach is highly coupled, which causes several problems:

  • All the logic is coupled together, which makes the script hard to read later.
  • Removing, adding, or updating an indicator/label means changing the original script, which is costly. If a bug is introduced, the existing indicators/labels are affected as well.
  • When multiple reports use an indicator with the same definition (caliber), its processing logic has to be written again in each report, and any change has to be made in every one of those places. Under this approach the duplication is unavoidable.

Computing Architecture Design

The main idea for solving these problems is low coupling and high cohesion. Reports are broken down to indicator granularity, so the unit of computation and storage is the indicator rather than the report. This makes indicators more reusable and makes adding, deleting, or modifying a single indicator cheaper, which improves the robustness of the whole system.
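To make the idea of "the indicator as the unit of computation" concrete, here is a minimal Python sketch. The registry, the indicator names, and the sample order records are all hypothetical and only serve as illustration; a real system would more likely run each indicator as a separate scheduled SQL or script task.

```python
from typing import Callable, Dict, List

# Hypothetical registry: indicator name -> function that computes it.
INDICATORS: Dict[str, Callable[[List[dict]], dict]] = {}


def indicator(name: str):
    """Register a function as the single calculation task for one indicator."""
    def register(fn):
        INDICATORS[name] = fn
        return fn
    return register


@indicator("sales_by_region")
def sales_by_region(orders):
    totals = {}
    for order in orders:
        totals[order["region"]] = totals.get(order["region"], 0) + order["amount"]
    return totals


@indicator("order_count_by_region")
def order_count_by_region(orders):
    counts = {}
    for order in orders:
        counts[order["region"]] = counts.get(order["region"], 0) + 1
    return counts


def run_all(orders):
    """Run every indicator on its own; a failure in one does not stop the others."""
    results, errors = {}, {}
    for name, fn in INDICATORS.items():
        try:
            results[name] = fn(orders)
        except Exception as exc:
            errors[name] = exc
    return results, errors


# Made-up sample data for illustration only.
ORDERS = [
    {"region": "Central China", "amount": 100},
    {"region": "North China", "amount": 200},
    {"region": "Central China", "amount": 50},
]
```

Adding or removing an indicator here touches only its own function, and a bug in one indicator shows up in `errors` without affecting the results of the others.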

The computing architecture is divided into four layers, as shown below:

  • Base data warehouse layer: this layer mainly holds the data warehouse models.
  • Indicator calculation layer: one calculation task per indicator (a task can be SQL or other processing script code), each processing data from the underlying base data warehouse layer.
  • Indicator storage layer: each processed indicator is stored in its own table.
  • Report layer: a business report is produced by flexibly combining indicator tables with JOINs.
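The four layers can be sketched end to end with Python's built-in `sqlite3` module standing in for the warehouse. This is only an illustration under assumptions: the table names (`dw_orders`, `ind_sales_by_region`, `ind_order_count_by_region`), the columns, and the sample rows are hypothetical, and a production system would run these statements on an engine such as Hive.

```python
import sqlite3

# Made-up base data for illustration only.
BASE_ROWS = [
    ("Central China", "Women's clothing", 10000),
    ("Central China", "Men's clothing", 20000),
    ("North China", "Men's clothing", 30000),
    ("North China", "Children's clothing", 40000),
]

# Indicator calculation layer: one SQL task per indicator.
# The key doubles as the name of the indicator's own storage table.
INDICATOR_TASKS = {
    "ind_sales_by_region":
        "SELECT region, SUM(amount) AS sales FROM dw_orders GROUP BY region",
    "ind_order_count_by_region":
        "SELECT region, COUNT(*) AS order_count FROM dw_orders GROUP BY region",
}

# Report layer: combine indicator tables with a JOIN on the shared dimension.
REPORT_SQL = """
    SELECT s.region, s.sales, c.order_count
    FROM ind_sales_by_region AS s
    JOIN ind_order_count_by_region AS c ON s.region = c.region
    ORDER BY s.region
"""


def build_report(conn):
    cur = conn.cursor()
    # Base data warehouse layer (hypothetical schema).
    cur.execute(
        "CREATE TABLE dw_orders (region TEXT, product_line TEXT, amount INTEGER)"
    )
    cur.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", BASE_ROWS)
    # Indicator storage layer: each task writes to its own table.
    for table, sql in INDICATOR_TASKS.items():
        cur.execute(f"CREATE TABLE {table} AS {sql}")
    cur.execute(REPORT_SQL)
    return cur.fetchall()
```

Calling `build_report(sqlite3.connect(":memory:"))` returns one row per region with its sales and order count. A second report that needs regional sales would reuse `ind_sales_by_region` instead of repeating its logic.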

Why one table per indicator

Different indicators use different numbers of dimensions, so their result data have different fields, and all indicator values cannot be stored in one unified table. For example, the indicator “sales of each region in the last 30 days” uses one dimension, region:

region          sales
Central China   500000
North China     600000
East China      700000
...

The indicator “sales of each product line in each region in the last 30 days” uses two dimensions, region and product line:

region          product line         sales
Central China   Women’s clothing     10000
Central China   Men’s clothing       20000
North China     Men’s clothing       30000
North China     Children’s clothes   40000
...
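The schema difference between the two indicators above can be checked with a small sketch. It uses Python's `sqlite3` module and hypothetical table and column names (`dw_orders`, `ind_sales_by_region`, `ind_sales_by_region_product_line`); the rows are made up and the 30-day filter is left out for brevity.

```python
import sqlite3

# Made-up base rows for illustration only.
ROWS = [
    ("Central China", "Women's clothing", 10000),
    ("North China", "Men's clothing", 30000),
]

# Two indicators over the same base table, with one and two dimensions.
INDICATOR_SQL = {
    "ind_sales_by_region":
        "SELECT region, SUM(amount) AS sales FROM dw_orders GROUP BY region",
    "ind_sales_by_region_product_line":
        "SELECT region, product_line, SUM(amount) AS sales "
        "FROM dw_orders GROUP BY region, product_line",
}


def indicator_columns():
    """Materialize each indicator table and return its column names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dw_orders (region TEXT, product_line TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", ROWS)
    columns = {}
    for table, sql in INDICATOR_SQL.items():
        conn.execute(f"CREATE TABLE {table} AS {sql}")
        cursor = conn.execute(f"SELECT * FROM {table}")
        columns[table] = [col[0] for col in cursor.description]
    conn.close()
    return columns
```

The first table ends up with the columns `region, sales` and the second with `region, product_line, sales`, which is why a single fixed-schema table does not fit both.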

Is one table per indicator too many tables?

As long as indicators are not built redundantly, close to ten thousand of them can cover all the reporting requirements of an average company. Tens of thousands of tables are not a big problem for a typical big data processing system such as Hive. Moreover, the indicators are numerous, but the data volume of each is not necessarily large.
