Abstract: This paper describes how the idea of "data-driven improvement, tool-solidified specification" is put into action as business teams adopt it, and defines the key roles and the organizational operating model that support the whole process.

Purpose

To make the cloud service development process trustworthy, the trusted-transformation actions of each service product department must be driven by data: data collection, progress visualization, goal traction, and capability evaluation, with goal achievement ultimately demonstrated by data. This is fundamentally different from the traditional operating mode of "driving business team improvement through manually publicized data and the 6+1 indicator measurements". Based on the objective data generated on the unified operation tools, we identify broken points in the basic process and missing quality actions in the R&D process, and reach consensus with the business teams on the targets. We call this idea "data-driven improvement, tool-solidified specification": we not only tell the business team where the problems are, but also help it improve on the basis of our operation tools.

This paper describes how the idea of "data-driven improvement, tool-solidified specification" is put into action as business teams adopt it, and defines the key roles and the organizational operating model that support the whole process.

Data-driven improvement means paying attention to the collection, statistics, analysis, and feedback of measurement data throughout software delivery, objectively reflecting the status of the whole R&D process through visualized data, analyzing system constraints from a global perspective, and reaching consensus with the business team on objective, effective improvement goals. Tool-solidified specification means analyzing the identified gaps and key problems, formulating the template specifications that can be carried by the operation tools, and defining the capability requirements that call for engineers to change their behavior. The operation tools then check how well these specification requirements are implemented, and data measures the improvement effect. Finally, improvement projects are summarized and shared to build a learning organization, drive continuous improvement and value delivery, and drive the transformation of the R&D team's model and culture.

In 2020, process trustworthiness in R&D revolves around four areas: CleanCode, build, open source, and end-to-end (E2E) traceability. Among the trustworthy-transformation requirements set by the company, these are the most basic and most important areas, and the ones with the highest return on investment.

Overall process description

The whole operating process revolves around data and follows the sequence "define software engineering specifications -> tools carry the specifications -> define the data analysis model for data measurement and analysis -> data operation finds deviations between actual software engineering activities and the specifications -> tools assist the team in improving -> solidify the software engineering specifications into tools", with stage summaries of the results. As the capability of the business teams improves and the software engineering standards and development mode evolve, the software engineering specifications defined at the beginning are gradually refined, iterated, and continuously optimized, so that the business teams ultimately achieve business success while complying with the trustworthy R&D process specifications required by the company.

1) Define software engineering specifications: centering on the company's trusted-transformation goal, the BU sets R&D mode specifications and capability requirements for each service product department, and the COE formulates software engineering specifications suited to the BU's current situation;

2) Define the data model: the COE extracts a core, targeted, tool-measurable data model from the established software engineering specifications and aligns it with each service product department;

3) The tools realize data measurement and analysis: according to these data models, the data analysis tool automatically collects, summarizes, and calculates data from the data sources and presents the results on the data kanban; the business team can drill down from the summary data into the detailed data to self-check its actions against the specifications and improve;

4) Data operation finds deviations between actual software engineering activities and the specifications: the data governance team analyzes the metric data generated in actual operation and identifies the gaps and key issues where the business team's actual software engineering activities are inconsistent with the requirements and specifications (a minimal sketch of such a check appears after this list);

5) Tools assist the business team in improving: the COE formulates improvement measures for the identified gaps and key problems, and the operation tools carry the template rectification of the process specifications. For non-standard behaviors of the business team, convention requirements suited to each service product department are formulated to drive capability improvement of the business team members;

6) Tools solidify the software engineering specifications: the business team's convention requirements are checked on the operation tools. Ultimately the operation tools carry the full set of software engineering specification requirements, and the specification requirements are incorporated into the operating process.
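To make step 4) concrete, here is a minimal sketch of comparing measured metric data against specification targets and flagging deviations; the metric names, target values, and measured values are hypothetical examples, not the actual BU data model.

```python
# Minimal sketch (assumptions): compare metric data collected from the operation tools
# against specification targets and flag deviations. Metric names, targets, and values
# are hypothetical examples.
from dataclasses import dataclass

@dataclass
class MetricSpec:
    name: str      # metric defined in the COE data model
    target: float  # target agreed with the service product department

def find_deviations(specs: list[MetricSpec], measured: dict[str, float]) -> list[str]:
    """Return the metrics whose measured value falls short of the specification target."""
    gaps = []
    for spec in specs:
        value = measured.get(spec.name, 0.0)
        if value < spec.target:
            gaps.append(f"{spec.name}: measured {value:.1%} < target {spec.target:.1%}")
    return gaps

specs = [MetricSpec("static_check_config_compliance_rate", 0.95),
         MetricSpec("committer_gate_coverage_rate", 1.00)]
measured = {"static_check_config_compliance_rate": 0.87,
            "committer_gate_coverage_rate": 1.00}
for gap in find_deviations(specs, measured):
    print(gap)  # static_check_config_compliance_rate: measured 87.0% < target 95.0%
```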

Three-layer data analysis model

We adopt a three-layer data analysis model. The operation tools automatically record detailed data on user behavior in the R&D process, and the data analysis tool summarizes and computes this data in quasi real time to present the overall goals. The three layers of data systematically help the business team identify the specification gaps and capability gaps in its R&D process, making it a team that "knows what it is and why it is". The three layers go progressively deeper, are improved iteratively, and each lower layer supports the layer above it.

Layer 1: goal, progress, and result data analysis. Align with the company's trusted-transformation goal and combine it with the BU's actual situation to form the BU's overall trustworthiness requirements; present on the data analysis kanban the process trustworthiness goals each service product department must achieve, its daily improvement progress, and its final completion status. For example, each service product department is required to meet the CleanCode goals.

Layer 2: lexical/syntactic analysis data. The COE decomposes the layer-1 traction targets into measures for specific implementation steps; only when these decomposed indicators are met can the layer-1 target be achieved. The purpose of this layer of data is to help the business team analyze its capability shortcomings and make targeted improvements. By drilling down from the summary data, the detailed data is used to analyze which key actions the business team has missed in the DevSecOps software engineering specification process; specific improvement requirements are then formulated, and the operation tools or the business team are pulled in to make up the missing actions. For example, achieving the CleanCode process-trustworthiness goal can be decomposed into three sub-goals: the static check configuration compliance rate, the Committer entry guarantee rate, and keeping the code repository Clean. Only when all three sub-goals are met can the overall CleanCode goal be considered achieved.

Layer 3: semantic analysis data. The COE drills into the layer-2 data to see not only whether the key actions have been done, but also how effective they are; the final effect shows up in the improvement of the business team's DevSecOps software engineering practice. This layer of analysis focuses on preventing layer-2 data from being gamed "for the metrics", checking instead whether the business team is genuinely following the BU's specification-driven goals to improve performance, trustworthiness, and quality capabilities in business delivery and ultimately to produce actual business results. By drilling into each team's detailed data, we can examine whether the key actions performed by the business team conform to the specification, whether they are executed at the appropriate stage, and whether their execution is effective, and we periodically summarize and refine the experience into knowledge assets solidified into the operation tools. For example, the layer-2 static check configuration compliance rate can be decomposed into static check configuration validity and static check execution validity. Configuration validity covers the number of static check tools configured, whether the configuration complies with the BU configuration specification, and whether checks are configured where code merges into the master trunk. Execution validity mainly concerns whether a static check is performed on every MR submission and whether problems are found at the earliest possible stage of development. Only after the layer-3 action metrics are achieved can the layer-2 goal be considered achieved.
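A minimal sketch of this drill-down, assuming hypothetical field names, thresholds, and records: the layer-1 CleanCode goal holds only when the layer-2 sub-goals hold, and the static check sub-goal holds only when the layer-3 configuration-validity and execution-validity metrics reach their targets.

```python
# Minimal sketch (assumptions): a drill-down across the three data layers, from the
# layer-1 CleanCode goal, to the layer-2 sub-goals named in the text, down to layer-3
# per-MR evidence. Field names, thresholds, and records are hypothetical examples.
from dataclasses import dataclass

@dataclass
class MergeRequest:               # layer-3 evidence, one record per MR
    static_check_executed: bool   # was a static check run on this MR submission?
    config_compliant: bool        # does the check configuration follow the BU spec?

def layer3_execution_validity(mrs: list[MergeRequest]) -> float:
    """Share of MR submissions on which a static check was actually executed."""
    return sum(mr.static_check_executed for mr in mrs) / len(mrs) if mrs else 0.0

def layer3_config_validity(mrs: list[MergeRequest]) -> float:
    """Share of MRs whose static check configuration complies with the BU spec."""
    return sum(mr.config_compliant for mr in mrs) / len(mrs) if mrs else 0.0

def layer2_static_check_compliance(mrs: list[MergeRequest]) -> bool:
    # Layer 2 is met only when both layer-3 validity metrics reach their targets.
    return layer3_config_validity(mrs) >= 0.95 and layer3_execution_validity(mrs) >= 0.95

def layer1_cleancode_goal(mrs: list[MergeRequest],
                          committer_gate_ok: bool,
                          repo_clean: bool) -> bool:
    # Layer 1 is met only when all three layer-2 sub-goals are met.
    return layer2_static_check_compliance(mrs) and committer_gate_ok and repo_clean

mrs = [MergeRequest(True, True), MergeRequest(True, False), MergeRequest(False, True)]
print(layer1_cleancode_goal(mrs, committer_gate_ok=True, repo_clean=True))  # False
```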

Data governance process flow chart

To achieve the goal of "data-driven improvement, tool-solidified specification", accurate, consistent, and real-time data is the key. However, because data collection is incomplete, business team behavior is not yet standardized, and data dimensions are inconsistent, data accuracy improves only gradually, so data governance is needed at every layer. Throughout the data governance process, five roles work closely together: the business team, the operation tools, the governance team, the data analysis tool (Archimedes), and the COE. They set annual or half-year targets, continuously summarize experience, and iterate to optimize the process.

A) COE: pull and align with the company's trusted-transformation goal and, combined with the BU's current capability, form the BU's overall trustworthiness requirements;

B) COE: define software engineering specification requirements suited to the BU based on its business status; Business teams: align on the software engineering specification guidance goals for all areas published by the BU;

C) COE: decompose core metrics from the specifications and develop the metric data models;

D) R&D users: conduct R&D activities using the operation tools; Operation tools: carry the behavioral data deposited by each BU service product department as the tools are used;

E) Data analysis tool (Archimedes): access the operation tools' data in quasi real time and show the current R&D capability status of each service product department (a sketch of this aggregation appears after this list);

F) COE: agree with each service product department on its annual traction targets;

G) Data analysis tool (Archimedes): present with data the current status of each service product department's traction targets and capabilities, and unify the data caliber; present detailed monthly/weekly/daily data and the data views supporting gap analysis and key-issue analysis;

H) COE: analyze the causes of gaps and the key issues based on the current status of the traction targets and capabilities; Governance team: during data operation, check that the data presented is consistent with the team's actual capability status;

I) R&D users: can log in to the data tool (Archimedes) at any time to view the detailed data at every level;

J) Governance team: based on the quasi-real-time progress data, analyze the actual problems in the team's current R&D process and summarize them for the COE;

K) COE: combine the fine-grained analysis data with the practical problems of each service product department summarized by the governance team to formulate specifications and improvement measures, including specifications to be carried by the operation tools and conventions on R&D users' actions and behaviors;

L) Operation tools: carry the specification requirements that land on the operation tools; Governance team: act as the interface for the R&D engineers' code-of-conduct conventions and take responsibility for implementing them in light of each service product department's actual situation;

M) R&D users: standardize their R&D process behavior according to the specification requirements and self-check against the data;

N) R&D tools: automatically check whether R&D users' conduct meets the requirements; the ultimate goal is for the entire software engineering specification to be solidified into and carried by the tools;

O) Data analysis tool (Archimedes): present the detailed data and summary targets as improved according to the specifications; R&D users: view the detailed post-rectification data on a self-service basis;

P) COE: based on the data improvement effect and the problems exposed in the process, summarize and continuously improve to form experience assets.
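As a rough illustration of the data path in steps D), E), and G), the sketch below aggregates behavioral records from the operation tools into a per-department compliance status; the record fields, department names, and compliance rule are assumptions for illustration only, not the actual Archimedes model.

```python
# Minimal sketch (assumptions): behavioral records deposited on the operation tools are
# aggregated per service product department to show current capability status in quasi
# real time. Record fields and department names are hypothetical examples.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class BehaviorRecord:          # one action deposited by an R&D user on the operation tool
    department: str
    action: str                # e.g. "static_check", "mr_merge"
    compliant: bool            # did the action meet the current specification?

def capability_status(records: list[BehaviorRecord]) -> dict[str, float]:
    """Per-department share of compliant actions, refreshed on each quasi-real-time pull."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # dept -> [compliant, total]
    for r in records:
        totals[r.department][0] += int(r.compliant)
        totals[r.department][1] += 1
    return {dept: ok / total for dept, (ok, total) in totals.items()}

records = [BehaviorRecord("Dept A", "static_check", True),
           BehaviorRecord("Dept A", "mr_merge", False),
           BehaviorRecord("Dept B", "static_check", True)]
print(capability_status(records))  # {'Dept A': 0.5, 'Dept B': 1.0}
```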

Data flow diagram

Process trustworthiness data is collected, transferred, aggregated, analyzed, and presented across the tool systems. The overall data flow is as follows:

Within this flow, six important full data sources are identified:

A) Code repository data: the code repositories configured on Fuxi's service information tree, plus the code repositories manually configured on Archimedes, together form the complete set of code released by each cloud service to the production warehouse (a sketch of this union appears after this list);

B) Workitem information flow data: the requirements, problems, and tasks currently identified on Vision, together with the issues on Gitlab/Codeclub, constitute the complete set of identifiable Workitem data;

C) SRE live-network package data: includes the package data of all deployment types, CCE, CPS, and CDK, which together constitute the full live-network package data;

D) Open source binary package data: the open source central repository data (covering the four languages Java, Python, Go, and NodeJS), plus the company's C/C++ data, constitute the full open source binary package data;

E) R&D process configuration data: the Committer data configured on Archimedes is the full Committer data, and the master branches identified on Archimedes are the full master branch (logical "master") data;

F) Fuxi R&D process data: the three Fuxi data stores, namely the static check and access-control data in MongoDB, the test and release data in MySQL, and the package-to-pipeline mapping data in MySQL, together constitute the full Fuxi R&D process data, with "package" as the dimension.
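As an illustration of item A), here is a minimal sketch of how the full code repository set could be assembled as the union of the two configured sources; the helper function and repository identifiers are hypothetical.

```python
# Minimal sketch (assumptions): the "full" code repository set in item A) above is the
# union of the repositories configured on Fuxi's service information tree and those
# manually configured on Archimedes. Names and identifiers are hypothetical examples.
def full_code_repo_set(fuxi_tree_repos: set[str], archimedes_manual_repos: set[str]) -> set[str]:
    """Union of both configured sources, deduplicated, as the complete code repo data."""
    return fuxi_tree_repos | archimedes_manual_repos

fuxi_tree_repos = {"cloud-service-a/core", "cloud-service-b/api"}
archimedes_manual_repos = {"cloud-service-b/api", "cloud-service-c/sdk"}
print(sorted(full_code_repo_set(fuxi_tree_repos, archimedes_manual_repos)))
# ['cloud-service-a/core', 'cloud-service-b/api', 'cloud-service-c/sdk']
```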

Operation organization

Data governance operations team

For the BU's landing strategy, data governance operations teams are set up in the four process-trustworthiness areas of CleanCode, build, open source, and E2E traceability. Each team consists of three roles: the data analysis tool (Archimedes), the COE, and the interface governance team drawn from each service product department. Data governance is carried out under the principle of "metrics as traction, data as the objective presentation of the status quo, and business value feedback as the end point".

Responsibilities of the COE:

1) Pull and align with the company's trusted-transformation goal and, combined with the BU's current capability, form the BU's overall trustworthiness requirements; define software engineering specification requirements suited to the BU; decompose the core metrics from the specifications and develop the metric data models;

2) Use the data generated by the operation tools, together with the governance team, to analyze and identify data quality problems, and drill down layer by layer according to the three-layer data analysis model to identify the capability gaps of the business teams;

3) Analyze typical problems, identify the broken points in the operation flow that need to be repaired and the non-standard actions of the business teams, formulate the specification and convention requirements, and gradually improve data quality;

4) Through summarization, identify mechanism problems such as missing processes, missing organization, and missing responsibilities, and solidify the fixes into the operation tools.

Governance Team:

1) Implement the COE's data governance specifications in each service product department in light of that department's actual situation;

2) Identify the actual problems that arise while each service product department implements the data governance actions, analyze them with the COE, propose systematic solutions, and finally solidify them into the operation tools;

3) Track the progress of process trustworthiness as the business teams land it, and be responsible for the business teams achieving the trusted-transformation goals and for the actual business value generated by the improvement process.

Data analysis tool (Archimedes):

1) Ensure that the ingested data is accurate, real-time, and consistent, and use the data to reflect the capability status of each BU service product department in real time, providing data support for the data operation of the COE and the governance team;

2) Systematically implement the COE's scheme design to provide a unified, standard data kanban for the whole BU, one that clearly identifies the business teams' capability gaps through data and pulls the business teams toward the overall improvement goals;

3) Present data according to the three-layer data model and support drilling down layer by layer, so the business team "knows what it is and why it is" and everyone on the business team can make improvements on their own;

4) Through data analysis, identify the key problems in landing the DevSecOps software engineering specifications in the BU's business teams, as well as the process and system deficiencies behind them, and push for the final specifications to be solidified into the operation tools.

Meeting setup

The "Data-Driven DevSecOps Capability Enhancement Meeting" is the meeting for seeking help on and adjudicating data governance issues in the R&D domain.

The meeting is divided into three stages:

1) In the first stage, routine topics, similar to a "physical examination report", reflect the current status and problems of the business teams with data;

2) In the second stage, topics are declared in a form similar to an "expert consultation": problems in a specific data governance process are discussed and help is sought on the top difficulties;

3) In the third stage, topics are arranged flexibly in a form similar to a "problem summary": specific problems of a given type are discussed, summarized, and defined in a centralized way to form the BU's standard processes and charter.

Master data hosting system

Master data is a single, accurate, and authoritative data source that has high business value and can be reused across multiple business units within an enterprise. Compared with business data and analytical data, master data mainly has the following characteristics (a sketch illustrating them follows the list):

1) Feature consistency: whether the key features of master data can be kept highly consistent across different applications and systems directly affects the success or failure of data governance;

2) Identification uniqueness: within a system, a platform, or even the whole enterprise, the same master data entity requires a unique data identifier, that is, a data code;

3) Long-term validity: master data remains valid throughout the life cycle of its business object, or even longer; when master data does lose effect, the system usually soft-deletes it rather than physically deleting it;

4) Business stability: during business processes, the identification information and key features of master data are inherited, referenced, and copied by the data those processes generate; unless the characteristics of the master data itself change, it is not modified by other systems in the course of business.
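A minimal sketch of a master data record exhibiting these characteristics, with a hypothetical entity type and field names: a unique identifier, key features that other systems reference but do not modify, and soft deletion instead of physical deletion.

```python
# Minimal sketch (assumptions): a master data record showing the characteristics above --
# identification uniqueness (2), soft deletion instead of physical deletion (3), and key
# features kept consistent and not modified by other systems (1, 4). The entity type and
# field names are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class CodeRepoMasterRecord:
    repo_id: str            # unique data code within the enterprise (identification uniqueness)
    owner_department: str   # key feature, kept consistent across systems (feature consistency)
    deleted: bool = False   # soft-delete flag; the record is never physically removed

def soft_delete(record: CodeRepoMasterRecord) -> CodeRepoMasterRecord:
    """Mark a master record invalid without removing it (long-term validity)."""
    return CodeRepoMasterRecord(record.repo_id, record.owner_department, deleted=True)

repo = CodeRepoMasterRecord("cloud-service-a/core", "Service Product Dept A")
print(soft_delete(repo).deleted)  # True; identification and key features stay unchanged
```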

Master data source identification principles:

A) If multiple data sources constitute the master data for the same type of data, there are two processing strategies:

1) Select one source system and gradually integrate the data of the other source systems into it, so that it becomes the only master data source;

2) If 1) cannot be realized, the multiple data source systems are encapsulated and shielded by the Archimedes system, and Archimedes becomes the only data source for this type of data; once 1) is implemented, Archimedes ceases to be the master data source for this type of data;

3) When data flows through multiple operating systems, the criterion for deciding which system is the master data source is whether actual business actions on the data are generated in that system, rather than the system merely carrying the data flow.

B) Once a system is identified as the unique data source, other systems consuming this type of data must not conflict with that source.

All data can only be generated in its data source; other systems may only read it and must not modify it. Data quality problems found downstream must be corrected at the data source.

C) Consumers of master data must not change the original data, but may extend it.

Data consumers are not allowed to add, delete, or modify the data they obtain, but they may extend attributes on top of it, as in the sketch below.
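A minimal sketch of this principle, with hypothetical extension fields: a downstream consumer (for example, the data analysis tool) references the master identifier and adds its own attributes without writing anything back to the master record.

```python
# Minimal sketch (assumptions): a downstream consumer extends a master record with its
# own attributes, per principles B) and C), without touching the original fields. The
# extension fields are hypothetical examples.
from dataclasses import dataclass

@dataclass(frozen=True)
class RepoAnalyticsView:
    repo_id: str                          # reference to the master record's identifier (read-only here)
    static_check_compliance_rate: float   # consumer-side extension, maintained by the analysis tool
    last_analyzed: str                    # consumer-side extension, e.g. an ISO date string

view = RepoAnalyticsView(repo_id="cloud-service-a/core",
                         static_check_compliance_rate=0.92,
                         last_analyzed="2020-06-30")
print(view.repo_id, view.static_check_compliance_rate)
```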

D) Fully share information on the premise of information security; reasonable data sharing requests must not be rejected.

If data does not flow, it not only produces no business value but also adds storage cost. Only when data keeps flowing and produces real value for the business teams can we get feedback on how it is used and further improve its value.

The principle is: security first for core assets, efficiency first for non-critical assets.

Type-1 master data sources

Type-2 master data sources

