Brief introduction: The data middle platform is an upgrade of the traditional data warehouse: a complete system for data collection, construction, management, and use. Dataphin is a powerful tool for building a data middle platform. Its core advantage is that it brings the OneModel methodology, accumulated over many years of Alibaba’s own data middle platform work, into data construction and management.

Preface

The data middle platform is currently the most advanced approach to data construction in the big data field. It was not created from scratch: it is an upgrade of the traditional data warehouse, and a complete system for data acquisition, construction, management, and use. Dataphin is a powerful tool for building a data middle platform. Its core advantage is that it incorporates the OneModel methodology (one component of the OneData system), which Alibaba has accumulated over years of building and managing data. This article focuses on the design philosophy behind one of Dataphin’s core functions, planning.

OneModel

OneModel divides the construction of the data middle platform into four layers:

  1. Topic domain modeling: In the data middle platform, a topic corresponds to a macro-level area of analysis; sales analysis, for example, analyzes the topic “sales”. A collection of closely related topics is called a topic domain. Each industry can be broken down into a topic domain model consisting of multiple topic domains (roughly ten, though the number varies).
  2. Conceptual modeling: Building on the topic domain model, entities and the relationships between them are added to each topic domain.
  3. Logical modeling: The attributes of each entity, together with the constraints on those attributes, are added to the conceptual model.
  4. Business analysis modeling: This layer captures the important, commonly used analytical methods and perspectives of the industry. Based on the logical model, each business analysis question is expressed as Dataphin-specific derived metrics, which are then broken down into atomic metrics and business constraints.
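
As a rough illustration of the fourth layer, a derived metric can be thought of as an atomic metric combined with a set of business constraints. The sketch below is not Dataphin's API: the class names, the `to_sql` helper, and the `sales_order` table with its columns are all hypothetical, chosen only to show the decomposition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicMetric:
    """Hypothetical atomic metric: one aggregation over an entity attribute."""
    name: str
    expression: str


@dataclass(frozen=True)
class DerivedMetric:
    """Hypothetical derived metric: an atomic metric plus business constraints."""
    name: str
    atomic: AtomicMetric
    constraints: tuple = ()  # business constraints written as SQL predicates

    def to_sql(self, table: str) -> str:
        # With no constraints, fall back to a predicate that is always true.
        where = " AND ".join(self.constraints) or "TRUE"
        return (f"SELECT {self.atomic.expression} AS {self.name} "
                f"FROM {table} WHERE {where}")


# Assumed example: "paid amount of online orders" built from "paid amount".
pay_amount = AtomicMetric("pay_amount", "SUM(pay_amount)")
online_pay = DerivedMetric(
    "online_pay_amount",
    pay_amount,
    ("channel = 'online'", "order_status = 'paid'"),
)
print(online_pay.to_sql("sales_order"))
```

Keeping the atomic metric separate from its constraints means the same aggregation can be reused under different business constraints without redefining it.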

Planning

Of the four layers of OneModel, topic domain modeling and conceptual modeling are carried out by the planning function of Dataphin. The four layers are not applied to the enterprise-level data middle platform as a whole. Instead, they are developed around a single independent business, and the enterprise-level data middle platform is then realized by connecting these businesses through common dimensions. The planning function of Dataphin therefore also covers the division of the enterprise into independent businesses, that is, the division into business segments. Planning does not affect the accuracy or validity of data, but it is an important data (asset) management function that affects data search, data understanding, access control, and so on.

Business segments

Enterprises vary in size, complexity, and span of business. Data reflects the business, so the data middle platform of each enterprise is different as well. The first step in building a data middle platform is planning, and the first step of planning is to survey the business structure of the enterprise comprehensively and divide it into independent businesses. In Dataphin, this is the division into business segments.

The general principle of dividing business segments is high cohesion and low coupling. The specific process is as follows:

  1. Look at all the business processes of the enterprise. If two business processes have an upstream/downstream relationship, or share common business objects, they should be placed in the same business segment. For example, the procurement process (purchase orders) is generally followed by a logistics process (the enterprise’s inbound purchase logistics). Logistics depends on procurement, and goods are the common business object of the two processes, so procurement and logistics should belong to the same business segment. Then widen the scope: list the upstream and downstream processes and the business objects of every business process. Business processes that are connected directly or indirectly should belong to the same business segment. For example, in the retail business, the chains procurement -> procurement logistics -> warehousing -> sales delivery and marketing -> sales -> fulfillment -> after-sales contain processes that are related as upstream and downstream, and processes that are connected through goods, so they all belong to the “retail” business segment.
  2. Conversely, if two business processes have no direct or indirect upstream/downstream relationship and no direct or indirect common business objects, they should not be placed in the same business segment. For example, the same enterprise may run both a retail business and a real estate business. In the real estate business, the chain land acquisition -> design -> development -> sales has no upstream/downstream relationship with the retail business processes, and the two cannot be connected through any business object. Therefore, two separate business segments, “retail” and “real estate”, should be created.
  3. Note that some business objects are common at the enterprise level, such as the company’s employees and administrative geographical divisions (yes, these are business objects too). Such objects would link every business process of the company into one large network. Therefore, these enterprise-level business objects should be identified first. Business processes that are connected only through these objects (and have no upstream/downstream relationship) should have those connections cut and be assigned to different business segments.
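
The three rules above amount to finding connected components among business processes, while ignoring enterprise-level business objects as connectors. The following is a minimal sketch of that idea using a union-find structure. It is an illustration, not how Dataphin itself divides segments; the function name, the input shapes, and the sample processes and objects are all assumptions made for the example.

```python
from collections import defaultdict


def partition_segments(processes, upstream_edges, enterprise_objects):
    """Group business processes into candidate business segments.

    processes: dict mapping process name -> set of business objects it uses
    upstream_edges: iterable of (upstream, downstream) process pairs
    enterprise_objects: enterprise-level objects that must NOT act as links
    """
    parent = {p: p for p in processes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Rule 1: upstream/downstream relationships connect processes.
    for up, down in upstream_edges:
        union(up, down)

    # Rules 1 and 3: shared business objects connect processes,
    # except for enterprise-level objects, whose links are cut.
    ignored = set(enterprise_objects)
    by_object = defaultdict(list)
    for proc, objects in processes.items():
        for obj in objects - ignored:
            by_object[obj].append(proc)
    for procs in by_object.values():
        for other in procs[1:]:
            union(procs[0], other)

    # Rule 2: processes left unconnected fall into different segments.
    groups = defaultdict(set)
    for proc in processes:
        groups[find(proc)].add(proc)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical enterprise with retail and real estate processes.
processes = {
    "procurement": {"goods", "employee"},
    "procurement_logistics": {"goods", "employee"},
    "warehousing": {"goods"},
    "sales": {"goods", "employee"},
    "land_acquisition": {"land", "employee"},
    "design": {"land"},
    "development": {"land", "employee"},
}
edges = [
    ("procurement", "procurement_logistics"),
    ("procurement_logistics", "warehousing"),
    ("land_acquisition", "design"),
    ("design", "development"),
]
segments = partition_segments(processes, edges, {"employee"})
for segment in segments:
    print(segment)
```

If `employee` is not declared as an enterprise-level object, the same input collapses into a single segment, which is exactly the problem rule 3 guards against.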

Topic domain modeling

Topic domain modeling further divides the business within a business segment into multiple topic domains. There is no objective rule for drawing the boundaries between topic domains; the division relies mainly on the industry experience and business understanding of the data modeler. The retail industry is used as an example below.

The retail industry is divided into the following topic domains. The core topic domains are “people”, “goods”, and “site”:

  1. Common topic domain: data that is referenced in all business processes, such as geolocation data and the enterprise’s personnel and organization data
  2. Consumer (people) topic domain: mainly business activity data related to the retailer’s user (consumer) operations
  3. Commodity (goods) topic domain: business activity data related to commodity management (category management, brand management, etc.), commodity structure management (grouping goods), and so on
  4. Merchant (site) topic domain: data related to offline stores, online e-commerce (self-operated or third-party), and other sales channels
  5. Traffic topic domain: data on consumer visits to stores and other related data
  6. Transaction topic domain: information flow and money flow data that takes the form of contracts between retailers and consumers, such as sales orders, payments, refunds, and returns
  7. Fulfillment topic domain: optional. Retailers deliver goods to consumers in accordance with contracts (orders); this is the logistics data from retailers to consumers
  8. Service topic domain: mainly after-sales and related data
  9. Interaction topic domain: optional. Data on non-contractual information flow between retailers and consumers, such as interactions between retailers and consumers on social media, and consumers’ comments, shares, and favorites on e-commerce platforms
  10. Marketing topic domain: data on advertising, events, coupons, and so on
  11. Content topic domain: optional. Content that retailers produce in order to attract traffic, such as commercial advertorials, live-stream selling, and promotional publications
  12. Supply chain topic domain: data on the three flows between retailers and suppliers, as well as logistics and information flow data within retailers

Conceptual modeling

The conceptual model builds on the topic domain model by defining the entities in each topic domain and the relationships between those entities.

The conceptual model has the following terms:

  1. Entity: The projection of a business object or business activity into the data world. Entities typically correspond one-to-one to tables. Several entities may share characteristics (seen as many attributes in common); these can be abstracted into a generalized entity, which has no corresponding data table.
  2. Business object: A kind of entity. It is a person or thing that participates in the business, or it can be a pure concept. Examples are consumers (people), commodities (things), and categories (concepts). In some versions of Dataphin, business objects are also referred to as “dimensions”.
  3. Business activity: A kind of entity. It is a behavior that changes business objects or interacts with business objects, such as visiting or selling. In some versions of Dataphin, business activities are also referred to as “business processes”.
  4. Entity relationship: There are two main kinds of relationships between entities.
     a. Reference relationship: one entity is an attribute of another entity. For example, if the user entity has an address attribute, and the address is itself an entity, then the user entity references the address entity. Likewise, the buyer, seller, and item are all participating entities of an order, so the order entity references the buyer entity, the seller entity, and the item entity. In technical terms, a reference corresponds to a join (“association”) in SQL. There are three types of reference relationships: one-to-one, one-to-many, and many-to-many. They describe the quantitative relationship between instances (records) of the two entities involved.
     b. Inheritance relationship: an entity A is subordinate to another entity B, and A is conceptually more detailed and specific than B. For example, a retail business can define a “user” entity. Both the “buyer” and the “member” are users, but more specific ones (a buyer is a user who has made a transaction, and a member is a user who has joined the membership program), so the “buyer” entity and the “member” entity both inherit from the “user” entity.
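
The terms above can be sketched as a small data structure. This is only an illustrative model of the concepts, not Dataphin's internal representation; the class names, the `refer_to` and `ancestors` helpers, and the cardinalities chosen for the order example are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EntityKind(Enum):
    BUSINESS_OBJECT = "business object"      # a.k.a. "dimension"
    BUSINESS_ACTIVITY = "business activity"  # a.k.a. "business process"


class Cardinality(Enum):
    ONE_TO_ONE = "1:1"
    ONE_TO_MANY = "1:N"
    MANY_TO_MANY = "N:M"


@dataclass
class Entity:
    name: str
    kind: EntityKind
    parent: Optional["Entity"] = None               # inheritance relationship
    references: dict = field(default_factory=dict)  # target name -> Cardinality

    def refer_to(self, target: "Entity", cardinality: Cardinality) -> None:
        """Record a reference relationship from this entity to `target`."""
        self.references[target.name] = cardinality

    def ancestors(self) -> list:
        """Names of the entities this entity inherits from, nearest first."""
        chain, node = [], self.parent
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


# Inheritance: buyer and member are more specific kinds of user.
user = Entity("user", EntityKind.BUSINESS_OBJECT)
buyer = Entity("buyer", EntityKind.BUSINESS_OBJECT, parent=user)
member = Entity("member", EntityKind.BUSINESS_OBJECT, parent=user)

# Reference: an order references its participating entities.
item = Entity("item", EntityKind.BUSINESS_OBJECT)
order = Entity("order", EntityKind.BUSINESS_ACTIVITY)
order.refer_to(buyer, Cardinality.ONE_TO_MANY)   # assumed: one buyer, many orders
order.refer_to(item, Cardinality.MANY_TO_MANY)   # assumed: orders hold many items

print(buyer.ancestors())
print(sorted(order.references))
```

In this sketch, a generalized entity such as `user` is simply an entity that others name as their `parent`; whether it has a backing table is left outside the model.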

This is the design philosophy behind planning, one of Dataphin’s core functions. I hope it helps you make better use of Dataphin’s planning functionality.
