Introduction: A data center is an upgrade of the traditional data warehouse: a complete system for data collection, construction, management, and use. Dataphin is a powerful tool for building a data center. Its core advantage is that it brings the OneModel methodology, accumulated over years of Alibaba’s data center construction, into data construction and management.

Preface

The data center is the most advanced data construction system in the field of big data, but it was not created from scratch. It is an upgrade of the traditional data warehouse: a complete system for data collection, construction, management, and use. Dataphin is a powerful tool for building a data center. Its core advantage is that it brings the OneModel methodology (one of the components of the OneData system), accumulated over many years of Alibaba’s data center construction, into data construction and management. This article focuses on the design concepts behind planning, one of Dataphin’s core functions.

OneModel

OneModel divides the construction of the data center into four layers:

  1. Topic domain modeling: In the data center, a topic corresponds to a macro-level analysis area; sales analysis, for example, analyzes the topic of “sales”. A collection of closely related topics is a topic domain. Each industry can be broken down into a topic domain model consisting of multiple (ten or so) topic domains.
  2. Conceptual modeling: Based on the topic domains, entities and the relationships between the entities are added within each topic domain.
  3. Logical modeling: Add attributes and constraints to each entity based on the conceptual model.
  4. Business analysis modeling: This layer captures the important and common analysis methods and perspectives in the industry. Based on the logical model, business analysis questions are translated into Dataphin-specific derived metrics, from which atomic metrics and business constraints are then extracted.
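
As a rough illustration of the fourth layer, a derived metric can be thought of as an atomic metric narrowed by one or more business constraints. The Python sketch below is only a toy model of that idea: the class names, field names, and the in-memory order records are all hypothetical, and it is not Dataphin’s actual metric definition syntax.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict  # one record of an entity, e.g. a sales order (hypothetical shape)


@dataclass(frozen=True)
class AtomicMetric:
    name: str
    aggregate: Callable[[list], float]  # how the matching rows are summarized


@dataclass(frozen=True)
class BusinessConstraint:
    name: str
    predicate: Callable[[Row], bool]  # which rows are counted


@dataclass(frozen=True)
class DerivedMetric:
    atomic: AtomicMetric
    constraints: tuple

    @property
    def name(self):
        # Hypothetical naming rule: constraint names followed by the atomic metric name.
        return "_".join([c.name for c in self.constraints] + [self.atomic.name])

    def evaluate(self, rows):
        # Keep only rows that satisfy every constraint, then aggregate them.
        kept = [r for r in rows if all(c.predicate(r) for c in self.constraints)]
        return self.atomic.aggregate(kept)


pay_amount = AtomicMetric("pay_amount", lambda rows: sum(r["amount"] for r in rows))
online = BusinessConstraint("online", lambda r: r["channel"] == "online")
online_pay_amount = DerivedMetric(pay_amount, (online,))

orders = [
    {"channel": "online", "amount": 30},
    {"channel": "store", "amount": 50},
    {"channel": "online", "amount": 20},
]
result = online_pay_amount.evaluate(orders)  # sums only the two online orders
```

Keeping the atomic metric and the constraint as separate, reusable pieces is the point of the sketch: the same “pay_amount” could be combined with other constraints without redefining the aggregation.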

Planning

Topic domain modeling and conceptual modeling, the first two of OneModel’s four layers, are carried out with Dataphin’s planning capabilities. The four layers are not applied to the enterprise-level data center as a whole; they are developed around a single independent business. Multiple independent businesses are then combined into the enterprise-level data center through common dimensions. Dataphin’s planning capabilities therefore also include the division into independent businesses, i.e. business blocks. Planning does not affect the accuracy or timeliness of data, but it is an important part of data (asset) management and affects data search, understanding, and permission control.

Business blocks

Enterprises vary in size, business complexity, and span. Data reflects business, so each enterprise’s data center is different as well. The first step of data center construction is planning, and the first step of planning is to sort out the enterprise’s business structure and divide it into independent businesses. In Dataphin, this is the division into business blocks.

The general principle of business block division is high cohesion and low coupling. The specific process is as follows:

  1. Examine all business processes in the enterprise. If two business processes have an upstream/downstream relationship or share a business object, they should be placed in the same business block. For example, the procurement process (purchase orders) is generally followed by logistics (inbound logistics for the goods the enterprise has purchased). Logistics depends on procurement, and goods are the business object common to both processes, so procurement and logistics belong to the same business block. Then expand the scope: list the upstream processes, downstream processes, and business objects of each business process; all processes that are connected directly or indirectly belong to the same business block. In a retail business, for example, procurement -> procurement logistics -> storage -> sales and delivery, and marketing -> sales -> fulfillment -> after-sales, are chains in which some processes have upstream/downstream relationships and others are connected through goods, so they all belong to the “retail” business block.
  2. Conversely, if two business processes have no direct or indirect upstream/downstream relationship and no direct or indirect common business object, they should not be placed in the same business block. For example, the same enterprise may run both retail and real estate businesses. There is no upstream/downstream relationship between the real estate chain (land acquisition -> design -> development -> sales) and the retail business processes, nor can they be connected through a business object. Therefore, two business blocks, “retail” and “real estate”, should be created.
  3. Note that some business objects are shared at the enterprise level, such as the company’s employees and administrative geographic divisions (yes, this is also a business object). These objects connect all business processes across the entire company into a single, large network. Therefore, enterprise-level business objects need to be identified first, and links that exist only through these objects (with no upstream/downstream relationship) should be cut, so that the processes they join fall into different business blocks.
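
The procedure above amounts to finding connected components in a graph whose nodes are business processes and whose edges are upstream/downstream relationships or shared business objects, once enterprise-level objects are excluded. The Python sketch below illustrates the principle with a small union-find; the function name, process names, and object names are hypothetical (loosely taken from the retail and real estate examples), and nothing here is Dataphin functionality.

```python
from collections import defaultdict


def divide_business_blocks(flows, objects_by_process, enterprise_objects):
    """Group business processes into business blocks (connected components).

    flows: iterable of (upstream, downstream) process pairs.
    objects_by_process: dict mapping a process to the business objects it uses.
    enterprise_objects: objects shared company-wide; they never link processes.
    """
    parent = {}

    def find(p):
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    def union(a, b):
        parent[find(a)] = find(b)

    # Step 1: upstream/downstream relationships connect processes.
    for upstream, downstream in flows:
        union(upstream, downstream)

    # Step 1 (cont.) and step 3: a shared business object connects processes,
    # unless it is an enterprise-level object.
    excluded = set(enterprise_objects)
    first_user = {}
    for process, objects in objects_by_process.items():
        find(process)  # register processes that appear in no flow
        for obj in set(objects) - excluded:
            if obj in first_user:
                union(process, first_user[obj])
            else:
                first_user[obj] = process

    # Step 2: processes that were never connected end up in different blocks.
    blocks = defaultdict(set)
    for process in list(parent):
        blocks[find(process)].add(process)
    return sorted(sorted(block) for block in blocks.values())


flows = [
    ("procurement", "procurement_logistics"),
    ("procurement_logistics", "storage"),
    ("marketing", "sales"),
    ("sales", "fulfillment"),
    ("land_acquisition", "design"),
    ("design", "development"),
    ("development", "property_sales"),
]
objects_by_process = {
    "procurement": {"goods", "employee"},
    "storage": {"goods"},
    "sales": {"goods", "employee"},
    "property_sales": {"property", "employee"},
}
blocks = divide_business_blocks(flows, objects_by_process, {"employee"})
```

With “employee” treated as an enterprise-level object, the processes split into a real estate block and a retail block. Passing an empty set instead merges everything into one block, which is the “single, large network” that step 3 warns about.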

Topic domain modeling

Topic domain modeling further divides the business within a business block into topic domains. There are no objective principles for this division; it relies on the industry experience and business understanding of the data modeler. Take the retail industry as an example.

The topic domains of the retail industry are divided as follows, with “people”, “goods”, and “place” as the core topic domains:

  1. Common topic domain: data referenced in all business processes, such as geographic location data and the enterprise’s personnel and organization data
  2. Consumer (people) topic domain: mainly business activity data related to how the retail enterprise operates its users (consumers)
  3. Commodity (goods) topic domain: business activity data related to commodity management (category management, brand management, etc.), commodity structure management (grouping goods), and so on
  4. Merchant (place) topic domain: data related to offline stores, online e-commerce (self-owned or third-party), and other sales channels
  5. Traffic topic domain: data on consumer visits to stores and related behavior
  6. Transaction topic domain: information flow and capital flow data in contractual form between retailers and consumers, such as sales orders, payments, refunds, and returns
  7. Fulfillment topic domain: optional. Retailer-to-consumer logistics data, generated when retailers deliver goods to consumers according to contracts (orders)
  8. Service topic domain: mainly after-sales data
  9. Interaction topic domain: optional. Non-contractual information flow data between retailers and consumers, such as retailers’ interactions with consumers on social media and consumers’ comments, shares, and favorites on e-commerce platforms
  10. Marketing topic domain: advertising, campaigns, coupons, and other data
  11. Content topic domain: optional. Content that retailers create to attract traffic, such as advertorials, live-stream selling, and promotional publications
  12. Supply chain topic domain: data on the three flows between retailers and suppliers, as well as logistics and information flow data within retailers

Conceptual modeling

On the basis of the topic domain model, the conceptual model is built from the entities in each topic domain and the relationships between them.

The conceptual model has the following terms:

  1. Entity: the projection of a business object or business activity into the data world. Entities typically correspond one-to-one to data tables. Several entities may share characteristics (that is, have many attributes in common); such entities can be abstracted into a generalized entity, which has no corresponding data table.
  2. Business object: an entity that is a person or thing participating in the business, or a pure concept. For example: consumer (person), commodity (thing), category (concept), and so on. In some versions of Dataphin, business objects are referred to as “dimensions”.
  3. Business activity: an entity that represents a change in a business object or an interaction between business objects. For example: access behavior, sales behavior, and so on. In some versions of Dataphin, business activities are referred to as “business processes”.
  4. Entity relationship: there are two main types of relationship between entities:
     a. Reference relationship. One entity is an attribute of another. For example, the user entity has an address attribute, and the address is itself an entity, so the user entity references the address entity. As another example, the buyer, the seller, and the commodity are all participating entities of an order, so the order entity references the buyer entity, the seller entity, and the commodity entity. Technically, a reference corresponds to a join (“association”) in SQL. There are three types of reference relationship: one-to-one, one-to-many, and many-to-many. They describe the quantitative relationship between instances (records) of the two entities involved.
     b. Inheritance relationship. An entity A is subordinate to another entity B, and A is conceptually more specific than B. For example, a retail business might define a “user” entity. “Buyer” and “member” are both users, but more specific ones: a buyer is a user who has made a transaction, and a member is a user who has joined a membership program. The “buyer” entity therefore inherits the “user” entity, and so does the “member” entity.
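
The terms above can be sketched as a small data structure. The following Python example is a minimal illustration under stated assumptions: the class names, method names, and the sample entities are hypothetical, and the cardinalities attached to the sample references are only plausible choices, not something the article specifies.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EntityKind(Enum):
    BUSINESS_OBJECT = "business object"      # person, thing, or pure concept
    BUSINESS_ACTIVITY = "business activity"  # change or interaction


class Cardinality(Enum):
    ONE_TO_ONE = "1:1"
    ONE_TO_MANY = "1:N"
    MANY_TO_MANY = "N:M"


@dataclass
class Entity:
    name: str
    kind: EntityKind
    parent: Optional["Entity"] = None               # inheritance relationship
    references: dict = field(default_factory=dict)  # name -> (Entity, Cardinality)

    def refer_to(self, target, cardinality):
        """Record a reference relationship from this entity to target."""
        self.references[target.name] = (target, cardinality)

    def is_a(self, other):
        """True if this entity is other or inherits from it, directly or not."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


user = Entity("user", EntityKind.BUSINESS_OBJECT)
buyer = Entity("buyer", EntityKind.BUSINESS_OBJECT, parent=user)
member = Entity("member", EntityKind.BUSINESS_OBJECT, parent=user)
commodity = Entity("commodity", EntityKind.BUSINESS_OBJECT)

order = Entity("order", EntityKind.BUSINESS_ACTIVITY)
order.refer_to(buyer, Cardinality.ONE_TO_MANY)       # assumed: one buyer, many orders
order.refer_to(commodity, Cardinality.MANY_TO_MANY)  # assumed: many items per order
```

Here `order.references` plays the role of the joins that a query over the order table would need, while `buyer.is_a(user)` captures the inheritance relationship between the more specific and the more general entity.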

These are the design concepts behind planning, one of Dataphin’s core functions. We hope they help you make better use of it.


This article is the original content of Aliyun and shall not be reproduced without permission.