This is my ninth original article

Here comes the useful dry stuff!

I have a friend private letter, said that I hope to know the overall construction of warehouse details and templates. What’s the point of that? Dry stuff!

A panoramic view of the town building

00

The construction process

The process of warehouse modeling is divided into business modeling, domain modeling, logical modeling, and physical modeling, but these steps are relatively abstract. In order to facilitate landing, I summarized the above seven steps based on my own experience: sorting out business processes, vertical segmentation, index system sorting, entity relationship research, dimension sorting, data warehouse stratification and physical model building. For each step, tools, templates, and cases are presented instead of theories.

01

The business process

1

Find the company’s core business process, find who, in what links, what key actions to do, what results to get.

2

Comb through the customers and focus of each business node to find where the data is.

02

Domain/subject

3

Decide the construction way of storehouse, fast hand in work, use bottom – up construction. To fully support, top-level planning, step by step implementation, delivery slightly slower.

4

Subject areas are also divided by business domain. The topic domain can be divided by business flow (recommendation), demand, responsibility and product function.

03

The indicator system

5

The significance of index lies in unified language and unified caliber. So the definition of indicators must have strict standards. Otherwise, such as water without roots.

Indicators can be divided into atomic indicators, derived indicators and derived indicators. Their meanings and naming rules are illustrated as follows:

6

According to the standard of index system construction, the index system was sorted out. The whole system should also be combed with business as the core. At the same time, the dimensions required by each business process are sorted out. Dimensions are the way you view the business, and metrics are the quantitative results that measure how good or bad the business is doing.

Note that you cannot be limited by existing data at this point. If the analysis shows that the business process should have this metric, but there is no data, please note it and ask for data collection.

04

Entity relationship

7

Data is generated for every business action. We will be able to obtain the data, extract entities, and draw ER diagrams for later dimensional modeling.

8

The business process is also taken as the starting point to comb down, and the core at this time is the business table. Sort out the dimensions and indicators involved in each table.

05

Dimension to sort out

9

Dimension standardization is the process of unifying the same dimensions in various business systems. The field name, code, name may be different, we need to fully grasp and standardize.

The dimension standards refer to national standards and industry standards as far as possible. For example, regions can refer to national administrative region codes.

In addition, some dimensions have hierarchies, such as the province, city, and county of a region. Cascades in most business systems are multilevel dimensions.

06

Several positions layered

10

Data warehouse is generally divided into 4 layers, the name may be different, but its purpose and construction method are basically the same:

Each layer uses a different modeling approach, the core of which is layer by layer decoupling. The lower you go, the closer you are to the record of the business happening, and the higher you go, the closer you are to the business goals.

11

Depending on the design theory of storehouse layering, we can sort out the overall data flow diagram according to the actual business scenario. This map will tell everyone exactly where the data comes from, where it goes, and what services it ultimately provides.

07

model

12

This is when you really get into pure code. Data warehouse, ETL tool selection; ETL process development; Establishment of Cube; Task scheduling, setting update mode and update frequency; Check logs daily, monitor ETL execution, etc.

The ETL will become very clear

bootleg

1, warehouse construction must come from the business, to the business;

2. The purpose of warehouse layering is business decoupling;

3. Either way, the core of modeling is business entities;

4. Construction by field can deliver work quickly, sequelae will break out after 2 years, and it is difficult to solve;

5. Warehouse construction should spend 75% of your time in the design phase. If not, you’re screwed.

6. The warehouse itself can also be iterated.

7. Traditional data warehouse does not have a model called “wide table model”, a new term born in the era of big data, because the join cost of many big data components is very high. In effect, paradigm degeneration.

Wait ~~~ click “watching” before you go ~~