This is Peng Wenhua’s 95th original article

iQiyi takes copyright very seriously. After I published yesterday’s article, their assistant contacted me and asked me to take it down. So we will have to make do with this trimmed-down version.

I attended another DataFun event. This time it was held at iQiyi headquarters, where speakers from iQiyi, Meituan, and ByteDance shared their experience building data platforms.

Here I will share the entry-level talk: iQiyi’s practice in building its data warehouse platform and services. I call it entry-level because the data warehouse was the most foundational topic of the event; the other talks all build on warehouse work.

First, a photo of the speaker, Zhang Di, to head the post.

[Photo of the speaker]

The evolution of iQiyi’s data warehouse

[Data Warehouse 1.0 Architecture Diagram]

This is the architecture of iQiyi Data Warehouse 1.0. It has four layers: ODS, detail layer, aggregation layer, and application layer. One attendee who did not know much about data warehouse construction cornered Zhang Di with questions, and some viewers on the live stream mocked the questioner for being unprofessional. I have written a comprehensive article on this before; readers who have not worked with data warehouses can refer to it: How to Build a Data Warehouse.
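To make the four layers concrete, here is a minimal sketch of data moving from ODS through the detail and aggregation layers to the application layer. The event fields, function names, and sample values are all hypothetical and only illustrate the layering; the talk did not show how iQiyi implements it.

```python
from collections import defaultdict

# Hypothetical raw playback events, as they might land in the ODS layer.
ods_events = [
    {"user": "u1", "video": "v1", "seconds": "120", "day": "2020-01-01"},
    {"user": "u1", "video": "v2", "seconds": "30", "day": "2020-01-01"},
    {"user": "u2", "video": "v1", "seconds": "bad", "day": "2020-01-01"},
]

def to_detail(events):
    """Detail layer: clean and type the raw records, dropping unparseable rows."""
    detail = []
    for event in events:
        if event["seconds"].isdigit():
            detail.append({**event, "seconds": int(event["seconds"])})
    return detail

def to_aggregate(detail):
    """Aggregation layer: total watch time per video per day."""
    totals = defaultdict(int)
    for row in detail:
        totals[(row["day"], row["video"])] += row["seconds"]
    return dict(totals)

def to_application(agg, day):
    """Application layer: a report-ready ranking of videos for one day."""
    rows = [(video, secs) for (d, video), secs in agg.items() if d == day]
    return sorted(rows, key=lambda r: r[1], reverse=True)

report = to_application(to_aggregate(to_detail(ods_events)), "2020-01-01")
print(report)
```

Each layer only reads from the one below it, which is the property the layered design is meant to enforce.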

Zhang Di disclosed on the spot that 1.0 was built bottom-up following Kimball, and the result was one silo (“chimney”) after another. The silos referenced each other, reinvented the same wheels, used different metric definitions, and the teams argued every day. There were all kinds of warehouse standards, but anyone who has built a warehouse knows the trade-off: silo-style construction has a small scope, is fast to build, and pays off quickly, but the silos are cut off from each other, metric definitions are hard to unify, and everyone has their own version of the numbers. These problems are inherent to silo-style construction and hard to cure.

To cut costs, improve efficiency, and reduce arguments, iQiyi began building Data Warehouse 2.0. In Zhang Di’s words, it was almost a complete tear-down and rebuild.

[Data Warehouse 2.0 Architecture Diagram]

Notice the three boxes in the middle of this diagram: the unified warehouse at the bottom, with the subject warehouses and business marts on top. Also note the unified metric system and the conformed dimensions on the right.

This is the fundamental difference between iQiyi Data Warehouse 2.0 and 1.0. It looks like a few boxes on a slide, but I can tell that the effort behind it has to be counted in years, with countless rounds of wrangling, compromise, arguments, overtime, and late nights.

In fact, this is the CIF architecture, which combines Inmon’s top-down approach with Kimball’s bottom-up approach. My article “What’s the Difference Between a Traditional Data Warehouse and a Big Data Warehouse?” covers the details.

[Schematic diagram of Data Warehouse 2.0]

The advantage of this approach is that all of the company’s underlying data has a single outlet (the unified warehouse), which solves the problem of reinventing wheels. On top of it, subject warehouses and business marts are built separately to meet individual needs. The unified metric system and conformed dimensions solve the problem of conflicting numbers. Then the world is at peace.
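As a rough illustration of why a unified metric system and conformed dimensions stop the arguments, the sketch below keeps one shared definition per metric and one shared channel dimension that every mart must use. The names `METRICS`, `CHANNEL_DIM`, and `compute_by_channel` are hypothetical and not from the talk.

```python
# One shared definition per metric; every mart computes it the same way.
METRICS = {
    "play_count": lambda rows: len(rows),
    "watch_seconds": lambda rows: sum(r["seconds"] for r in rows),
}

# Conformed dimension: a single channel table that every mart joins against.
CHANNEL_DIM = {"c1": "Drama", "c2": "Variety"}

def compute_by_channel(metric, rows):
    """Group rows by the conformed channel dimension, then apply the shared metric."""
    if metric not in METRICS:
        raise KeyError(f"metric {metric!r} is not in the unified metric system")
    groups = {}
    for row in rows:
        groups.setdefault(CHANNEL_DIM[row["channel"]], []).append(row)
    return {name: METRICS[metric](group) for name, group in groups.items()}

rows = [
    {"channel": "c1", "seconds": 100},
    {"channel": "c1", "seconds": 50},
    {"channel": "c2", "seconds": 20},
]
result = compute_by_channel("watch_seconds", rows)
print(result)
```

Because a mart cannot compute a metric that is not registered, two teams cannot end up with two private definitions of the same number.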

iQiyi’s data warehouse construction experience

[Data Platform Architecture Diagram]

My photography skills are poor; please bear with the picture.

This diagram sums up iQiyi’s warehouse construction experience. Set it aside for now and keep reading. I will go through the parts in detail, and it will make sense when you come back to it.

[Schematic diagram of unified dimension organization]

[Schematic diagram of metric system organization]

These two diagrams show the methodology for organizing dimensions and metrics. I covered this in the article mentioned above; here it is again: How to Build a Data Warehouse.

I have also previously shared the theory books and process templates for warehouse construction. You can download them here: [Resource Pack] Complete Data Warehouse Construction Resource Pack. I won’t repeat that material.

[Unified warehouse modeling flow chart]

[Data mart and subject warehouse modeling flow chart]

Zhang Di’s slides are very detailed and can serve as a standard reference for warehouse modeling. The modeling process is: business modeling, then data modeling, then physical modeling. There are two diagrams because unified warehouse modeling starts from the bottom, while the data marts and subject warehouses are modeled on top of the unified warehouse. Those upper layers can therefore skip business modeling (they still review the business, but they do not need to confirm the basic subject areas, entities, and facts again). Someone in the audience asked about modeling methods, so here is an earlier article of mine that covers most warehouse modeling methods: Data Warehouse Modeling Methods in One Go (recommended reading for data warehouse architects).
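The three modeling stages can be sketched as follows: the business model names the subject area and business process, the data model adds dimensions and measures, and the physical model turns that into a table definition. The class, the `layer_subject_process` naming convention, and the column types are assumptions made for illustration; the slides did not show iQiyi’s actual conventions.

```python
from dataclasses import dataclass

@dataclass
class DataModel:
    # Business modeling: which subject area and business process we describe.
    subject_area: str
    business_process: str
    # Data modeling: the dimensions and measures of the fact.
    dimensions: list
    measures: list

def physical_ddl(model, layer="dwd"):
    """Physical modeling: render the data model as a CREATE TABLE statement.

    The table naming scheme and the STRING/BIGINT types are hypothetical.
    """
    table = f"{layer}_{model.subject_area}_{model.business_process}"
    columns = [f"  {dim}_id STRING" for dim in model.dimensions]
    columns += [f"  {measure} BIGINT" for measure in model.measures]
    return f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n)"

playback = DataModel(
    subject_area="video",
    business_process="playback",
    dimensions=["user", "channel"],
    measures=["watch_seconds"],
)
ddl = physical_ddl(playback)
print(ddl)
```

A mart built on top of the unified warehouse would reuse the same `DataModel` entries instead of redefining the subject area and facts, which is the step the upper layers get to skip.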

[Data model page display]

This is the finished model; the content has been heavily redacted. As the picture shows, they also built a platform to manage everything in the warehouse. The platform is called the “data graph”.

iQiyi data governance

[Introduction to data graph]

Here is the introduction to the data graph. The industry variously calls this a data map, data governance, data management, data asset management, and so on. They all mean roughly the same thing, so it is enough to know what is meant.

[Schematic diagram of metadata Management]

The first feature is metadata management. Most of it is standard: metadata collection, unified services, technical metadata, and business metadata are part of any metadata management system. But two points in the architecture caught my attention: JanusGraph and lineage collection.

This means they use HiveHook and similar mechanisms to automatically collect lineage between layers and store it in JanusGraph. This is interesting because it suggests they are using Apache Atlas: that is how Atlas stores lineage in JanusGraph. They also store technical and business metadata in ES (Elasticsearch), which should support fast metadata query and search services.

[Data graph function introduction]

This is what iQiyi’s data graph covers. Feel free to copy their homework.

[Data lineage display diagram]

This is their data flow diagram. The speaker also revealed that every report has its own level label, partly to set its priority and partly so that it can be traced upstream all the way to the source. When something goes wrong, the problem can be located quickly, which is quite useful.
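Tracing a report upstream to its sources is a graph walk over lineage edges. The sketch below uses a plain dictionary and breadth-first search; the table names are hypothetical, and a real deployment would query the lineage store (JanusGraph, according to the slides) rather than an in-memory dict.

```python
from collections import deque

# Hypothetical lineage edges: each table maps to the tables it reads from.
UPSTREAM = {
    "report_daily_plays": ["agg_plays_by_video"],
    "agg_plays_by_video": ["detail_playback"],
    "detail_playback": ["ods_playback_log", "dim_video"],
}

def trace_sources(node, upstream):
    """Walk lineage edges upstream and return the tables that have no parents."""
    seen = {node}
    sources = []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        parents = upstream.get(current, [])
        if not parents:
            # Nothing feeds this table, so it is an original source.
            sources.append(current)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(sources)

sources = trace_sources("report_daily_plays", UPSTREAM)
print(sources)
```

When a report shows a wrong number, the same walk gives the list of tables to check, in order of distance from the report.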

[Data graph page display diagram]

This is the data graph from the business perspective. Business users can freely find the data they need in this interface.

Note what the figure above offers: tables, dimensions, and metrics, all of which can be searched as metadata.

Then look at the picture below. It’s a little unclear. I guess it looks something like this:

The labels in the upper left corner of the figure below should read: Switch warehouse: unified warehouse, business mart, subject warehouse. Steps 1 to 5 that follow are:

  • Select the associated business, such as the mall;
  • Select the business process, such as playback (this should be the business domain);
  • Choose the data model, which is the subject model; the entries shown all look like test data;
  • Select the physical table; it is heavily redacted, so I cannot read it;
  • Finally, look at the dimensions and metrics. This should list every dimension and metric that has data within the selected scope, and may even assemble a wide table for you.
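The five steps above amount to narrowing a metadata catalog one filter at a time and then listing what remains. Here is a minimal sketch under that reading; the catalog entries, field names, and `drill_down` helper are hypothetical, since the actual screen was too blurry to read.

```python
# Hypothetical metadata catalog; each entry describes one physical table.
CATALOG = [
    {"warehouse": "subject", "business": "mall", "process": "playback",
     "model": "play_subject", "table": "t_play_daily",
     "dimensions": ["day", "channel"], "metrics": ["play_count"]},
    {"warehouse": "subject", "business": "mall", "process": "playback",
     "model": "play_subject", "table": "t_play_user",
     "dimensions": ["day", "user"], "metrics": ["watch_seconds"]},
    {"warehouse": "unified", "business": "mall", "process": "order",
     "model": "order_subject", "table": "t_order",
     "dimensions": ["day"], "metrics": ["order_count"]},
]

def drill_down(catalog, **filters):
    """Keep only the entries that match every selected filter."""
    return [entry for entry in catalog
            if all(entry.get(key) == value for key, value in filters.items())]

# Steps 1 to 3: pick warehouse type, business, process, and model.
matches = drill_down(CATALOG, warehouse="subject", business="mall",
                     process="playback", model="play_subject")

# Step 4: the physical tables left in scope.
tables = [entry["table"] for entry in matches]

# Step 5: every dimension and metric available within that scope.
dimensions = sorted({d for entry in matches for d in entry["dimensions"]})
metrics = sorted({m for entry in matches for m in entry["metrics"]})
print(tables, dimensions, metrics)
```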

Conclusion

A quick summary:

1. Build with Kimball first to meet business needs quickly. As the problems pile up, rebuild with CIF. Tip: this is the best path when starting from zero. Don’t start with CIF or Inmon, or you will suffer badly.

2. Establish a sound methodology, sort out dimensions and metrics, and only then start modeling (business modeling, data modeling, physical modeling). This article covers the methodology in more detail; please refer to it.

3. The data graph appears to be built on Atlas, with metadata stored in ES. Metadata and lineage are collected automatically, and an API provides query and search services so that business users can locate metadata quickly.

Time was limited and Zhang Di did not disclose more construction details, which is a pity. Fortunately, iQiyi’s OLAP lead, Lin Hao, has shared their OLAP experience with Apache Kylin. Those slides are public, so there is no infringement concern.

Reply “iQiyi” to get “Apache Kylin’s Practice at iQiyi”. When I have time, I will write an analysis of iQiyi’s OLAP.

If anything here is unclear, leave me a message and let’s talk.

Best enjoyed with the following articles

In depth | Data Warehouse Modeling Methods in One Go

In depth | What Does It Mean to Understand the Business? Five Levels of Analysis

In depth | How to Build a Data Warehouse

[Resource Pack] Complete Data Warehouse Construction Resource Pack

I need your thumbs up. I love you so much