1. Preface

For an engineering team, building a sustainable, multi-faceted quality assurance and delivery system creates a highway for delivering business value quickly, while also safeguarding quality throughout the delivery process. This article describes how the continuous delivery system at Autonavi evolved and was put into practice.

2. Continuous delivery

As summarized in the preceding section, we need a continuous delivery system that lets us keep improving business value delivery without degrading quality. So let’s first understand what continuous delivery is and what guidance the group provides for building it.

2.1 Continuous delivery concept

In an article published in 2013, Martin Fowler explains the concept of continuous delivery as follows: Continuous Delivery is a software development discipline where you build software in such a way that the software can be released to production at any time.

Two key points can be extracted from this definition:

  • It is a discipline, that is, a standardized practice for software development

  • The software can be released at any time

When has a team reached a state of continuous delivery? Martin Fowler gives the following criteria:

  • Your software is deployable throughout its lifecycle

  • Your team prioritizes keeping the software deployable over working on new features

  • Anybody can get fast, automated feedback on the production readiness of their systems any time somebody makes a change to them

  • You can perform push-button deployments of any version of the software to any environment on demand

Based on the above views, we need to grasp the following key points when building our own continuous delivery system:

  • A standardized process flow

  • Quick, accurate, and automatic feedback when changes come in

  • Solving deployment problems takes precedence over feature development

  • One-click release

2.2 Continuous delivery at the group level

With a preliminary understanding of the theory behind continuous delivery, let’s look at how the group defines continuous delivery capability and what performance improvement goals it sets. See the article published by Ali Technology, “How to measure R&D efficiency? A senior Ali technical expert suggests 5 sets of indicators”.

That article divides the ability to continuously deliver value into five groups of indicators across three levels, measuring it from different angles.

With these professional metrics in mind, how do we define a good continuous delivery measurement goal?

“If you can’t measure it, you can’t improve it,” said Drucker, the father of management. Measurement helps us understand R&D performance more deeply, set the direction of improvement, and measure its effect. The premise of improving performance is therefore to identify the quality and efficiency bottlenecks in the delivery process.

As a result, the group developed a 2-1-1 vision based on the best practices of some business units (BUs).

  • The 1-hour release lead time is a requirement on infrastructure capability: once a change meets the delivery standards, it can be packaged, deployed, and verified through the delivery pipeline within one hour.

  • The 1-week development cycle poses challenges for requirement splitting, collaboration between R&D and QA, continuous testing, and rapid feedback.

  • The 2-week requirement delivery cycle builds on the previous two items. It involves not only product, R&D, and testing, but also cooperation with other collaborating departments, to ensure that business value is delivered quickly.
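The three 2-1-1 targets can be read as simple thresholds on timestamps recorded for each requirement. The following is a minimal sketch under stated assumptions: the `Requirement` fields and the start and end point chosen for each interval are hypothetical, since this article does not define exactly how the group measures them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Targets taken from the 2-1-1 vision.
TARGETS = {
    "delivery_cycle": timedelta(weeks=2),     # assumed: accepted -> released
    "development_cycle": timedelta(weeks=1),  # assumed: dev started -> meets delivery standards
    "release_lead_time": timedelta(hours=1),  # assumed: meets delivery standards -> released
}


@dataclass
class Requirement:
    # Hypothetical timestamps; the measurement points are assumptions.
    accepted_at: datetime
    dev_started_at: datetime
    ready_at: datetime      # the change meets the delivery standards
    released_at: datetime


def check_211(req: Requirement) -> dict:
    """Return, for each 2-1-1 target, whether this requirement met it."""
    actual = {
        "delivery_cycle": req.released_at - req.accepted_at,
        "development_cycle": req.ready_at - req.dev_started_at,
        "release_lead_time": req.released_at - req.ready_at,
    }
    return {name: actual[name] <= limit for name, limit in TARGETS.items()}
```

Aggregating these per-requirement results over a period would show which of the three targets is the current bottleneck.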

3. Continuous delivery at Autonavi

Guided by the group’s vision, we found many efficiency silos in the delivery process of Autonavi’s server side. These silos become bottlenecks across the whole delivery process and delay the delivery of business value.

Let’s start with a milestone review of the key points in time for continuous delivery:

  • 2018/08 Conception and engineering capability building: in the start-up phase of the project, the engineering efficiency team and the business lines agreed on the goal of continuous delivery and started building engineering capability

  • 2018/12 Initial landing and pilot: in the pilot stage of the project, we completed the initial build of the continuous delivery process and, in one project, verified the sticking points in the process and the basic capability of the quality standards. As a result, basic quality standards were established and the time spent in the process was reduced

  • Promote access and platform optimization: in the promotion stage of the project, we optimized the quality items of the continuous delivery project and rolled them out across 6 business lines of Autonavi’s server side. In September, continuous delivery onboarding was completed for 6 business lines and 11 applications

  • Review and outlook: summarize the project rollout, review the whole rollout process and look ahead to follow-up work on continuous delivery, and report the problems encountered in business rollout together with the improvements made. Future: make small innovations in the delivery process that fit each business line and dig deeper into performance bottlenecks, in combination with vertical platforms such as coverage and accurate regression, the Yunge Case platform, and the code scanning platform

With this overview of how Autonavi’s continuous delivery system evolved, the following sections review the implementation process and the improvements in detail.

3.1 The delivery process before adopting continuous delivery

First, we describe how Autonavi’s server side was developed and released iteratively before the continuous delivery system was adopted.

Like most Internet companies, we break software delivery into multiple cycles and deliver iteratively, so that user value arrives incrementally. The figure above depicts the development, test, and release process in a normal iteration cycle, which breaks down into the following steps:

1. The iteration cycle starts with changes to the code base

2. After feature development is complete, R&D runs smoke tests through the CI system to verify that the service starts normally and basic functions are available

3. Before the scheduled test submission time, R&D merges the feature branch into the iteration branch through CR and MR, and deploys it to the daily environment for test submission

4. QA starts testing in the daily environment after receiving the test submission email

5. When testing in the daily environment is complete, QA produces the test report and confirms that the daily environment tests have passed, so the version can be released to the pre-release environment

6. After deployment to the pre-release environment, the system runs traffic replay tests, and the version finally passes online grayscale verification before being released to the official environment

From the figure and description above, we can see that this seemingly complete software delivery process still has the following quality and efficiency problems:

1. Requirements are accumulated, tested, and released in batches:

At present, most services on Autonavi’s server side release requirements on a fixed iteration cycle. Every requirement planned into an iteration, regardless of its size, has to wait for the iteration’s test submission point to be tested and for the iteration’s release window to go online. The advantage of this mode is a fixed version rhythm and strong overall iteration planning. However, because the test and release windows are fixed, the delivery of business value also has to wait. It is therefore necessary to reduce coupling between requirements by splitting them, and to reduce the waiting around test submission by changing how R&D and QA develop and test, so that business value is delivered more efficiently.

2. Quality standards are not transparent and feedback is not timely:

From code submission to final product release, there are usually several stages: daily, pre-release, grayscale, and official release. Each stage has its own problems to solve and its own quality requirements. At present, results are collected, summarized, and counted manually by the version owner, who then notifies project members by email. Enforcement of the standards therefore rests entirely with the owner of each version, so information is not transparent and feedback is not timely. Establishing quality standards and publishing dashboard results in a transparent and timely way removes inefficient communication and information loss in transmission, improving communication efficiency and avoiding misunderstandings. Once transparency and timely notification are addressed, two further areas need optimizing:

  • Categorize and prioritize notifications to minimize their negative impact

  • Optimize the content of notifications to help services quickly locate and troubleshoot problems

3. Deployment and stage flow require manual participation:

For a continuous release process, human involvement inevitably reduces efficiency. We therefore split the problem into two aspects, stage flow and deployment:

  • Stage flow: based on the stage standards above, a program determines whether the current quality status meets the standard and whether the version may flow to the next stage, eliminating human factors and time spent in stage flow and making the decision accurate

  • Deployment: extract the configuration of each environment and, building on Docker, turn activities such as packaging, deployment, and health checks into standardized machine execution, so that standardization avoids errors or deployment failures caused by human participation

4. Official release verification across multiple machine rooms is supervised manually:

At present, because the official release of an application involves a large number of machine rooms and machines, the business verifies the release in batches. After a batch of machines is released, R&D notifies QA to spot-check some machines in that batch (partly with automated tests), which is also inefficient. How to save the manpower consumed by each release is therefore another problem to solve in pursuit of maximum efficiency.

Each of these details creates obstacles on our way to rapid business value delivery. Therefore, in order to achieve the goal of delivering business value earlier (faster), we must optimize in terms of delivery efficiency, quality standards, and quick feedback of results.

3.2 Putting continuous delivery into practice at Autonavi

Of the four problems identified in the previous section, the first one, splitting and decoupling requirements, is tied to iteration scheduling. It requires long-term practice and planning and depends on support from product, R&D, testing, project management, and even other departments, so it has to be explored and adjusted gradually. From the engineering perspective, we therefore focus on the last three problems, hoping to build rapid release capability in a short time and eliminate the inefficient points in the delivery process.

To solve the efficiency problems, and building on the group’s release process and solid deployment capability, we broke the work down into the following dimensions:

  • Based on the group’s release process, establish a corresponding standardized flow mechanism in the continuous delivery system

  • Establish the server-side quality standard system, align the quality standards, and eliminate manual control

  • Open up a fast feedback mechanism for every step and control the release process, so that the results of changes are visible at any time

  • Reduce human participation in the release process and make the whole release process unattended

The continuous delivery flow chart below shows how these four focus areas tie together Autonavi’s overall continuous delivery process and how they are implemented in the server-side delivery process.

Establish a standardized process flow mechanism

In FY19, about 12% of the online problems on Autonavi’s server side were caused by changes or releases. Given this data, we hope to reduce or avoid such problems by establishing a complete delivery flow that controls and manages changes.

Based on this, and taking the current characteristics of server-side delivery into account, we first adopted the group’s standard release process as the pilot, to get the overall continuous delivery process working end to end. Second, we optimized the flow through pipeline configuration according to the needs of each application, such as a performance environment or a coverage environment. Finally, we settled on a standardized process flow mechanism for each service. Through this approach of adopting the standard rigidly first, then optimizing, then solidifying, several standard delivery processes were established on the server side, avoiding skipped delivery steps and non-standard operations.
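The idea of a fixed standard flow with per-application additions can be sketched as follows. This is only an illustration: the stage names follow the stages mentioned in this article, while the `build_pipeline` helper and the position at which each optional stage is inserted are assumptions, not the group’s actual pipeline configuration format.

```python
# Standard stages every application passes through, in order.
STANDARD_STAGES = ["daily", "pre-release", "grayscale", "official"]

# Optional stages an application may request, mapped to the standard
# stage each one is inserted after (the placement is an assumption).
OPTIONAL_STAGES = {"coverage": "daily", "performance": "pre-release"}


def build_pipeline(extra_stages=()):
    """Return the ordered list of stages for one application."""
    unknown = set(extra_stages) - set(OPTIONAL_STAGES)
    if unknown:
        raise ValueError(f"unsupported optional stages: {sorted(unknown)}")
    pipeline = []
    for stage in STANDARD_STAGES:
        pipeline.append(stage)  # standard stages can never be skipped
        pipeline.extend(
            name for name, after in OPTIONAL_STAGES.items()
            if after == stage and name in extra_stages
        )
    return pipeline
```

For example, `build_pipeline(["performance"])` yields the standard flow with a performance stage after pre-release. Because applications can only add stages and never remove them, no delivery step can be omitted.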

Establish and implement the server-side quality standard system

In Autonavi’s existing delivery process, most quality assurance measures are carried out in the daily stage. During iterative delivery, QA staff still manually collect and summarize which quality assurance measures were run and what their results were, and manually decide whether a stage is approved. As version owners rotate, quality items may be omitted and quality standards become hard to enforce.

To address these problems, we want to replace manual control with machine control and establish a standardized quality template, so that implementation standards are no longer opaque and execution results are no longer lost. In addition, an aligned standard further prevents quality checkpoints from being omitted for non-key services.

Through communication with the business teams, in the first phase we broke down the existing server-side quality assurance measures, extracted 12 relatively important quality items across the different stages, and replaced manual statistics with machine supervision. Specifically, the following dimensions were covered:

Open up a fast feedback mechanism for every step and control the release process, so that the results of changes are visible at any time

With an effective quality system in place, including the quality requirements and entry criteria for each stage, the problem of information collection is solved. The next question is how to aggregate all this information and feed it back effectively to the project’s stakeholders to support subsequent decisions, and how to control the project’s stage flow effectively when a stage’s exit criteria are not met.

We divided the problem into two aspects: effective feedback with decision support, and control of the process flow.

From the aspects of effective feedback and decision support:

Before onboarding to continuous delivery, most automated test tasks of different types in each business line reported their results through Jenkins or the test case project. This kind of feedback has a serious flaw: each result is reported individually, so the overall picture is not visible, which is not enough to support subsequent decisions.

After onboarding, in addition to the business’s original feedback mechanism, the platform provides an overview of the overall status of the current version, showing whether the version is ready for release or what is still missing. With the two combined, project executors can still learn the quality result of a single item through the original feedback mechanism. Those who need the overall picture, such as version owners and first-line and second-line managers, can see clearly from the quality dashboard how far the current version is from a releasable state, which supports their subsequent decisions and helps them adjust their focus.
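The difference between single-point feedback and a version-level overview can be illustrated with a small aggregation. The sketch below is hypothetical: the `QualityItem` fields, the example item names, and the "actual value must reach a threshold" rule are assumptions, since the article does not list the 12 quality items or how they are evaluated.

```python
from dataclasses import dataclass


@dataclass
class QualityItem:
    stage: str        # e.g. "daily" or "pre-release"
    name: str         # hypothetical item name, e.g. "line coverage"
    actual: float     # measured value
    threshold: float  # minimum value required to pass (assumed rule)

    @property
    def passed(self) -> bool:
        return self.actual >= self.threshold


def release_overview(items):
    """Summarize one version: is it releasable, and what is missing per stage?"""
    gaps = {}
    for item in items:
        if not item.passed:
            gaps.setdefault(item.stage, []).append(
                f"{item.name}: {item.actual} < {item.threshold}"
            )
    return {"releasable": not gaps, "gaps": gaps}
```

Each `QualityItem` corresponds to the single-point result an executor already receives, while `release_overview` is the aggregated view a version owner or manager would consult.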

From the perspective of process control:

Before onboarding to continuous delivery, deployable artifacts could be manually deployed to any environment regardless of whether they had passed stage verification. This was flexible, but it carried quality risks.

When designing the continuous delivery process, we discussed the trade-off between flexibility and standardization with colleagues from the business teams. Overall, to avoid missed tests or other online incidents caused by irregular processes, we decided to standardize the process flow in the initial release and thereby reduce the quality risk of flexible deployment. The platform connects to the group’s deployment and release system through the group’s laboratory plug-in, and prevents the release process from entering the next stage when the quality items of the current stage do not meet the standard.
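The blocking behavior described here amounts to a gate in front of every stage transition. The following is a minimal sketch assuming a linear sequence of stages; `promote` and `GateBlocked` are hypothetical names and do not represent the interface of the group’s plug-in or release system.

```python
STAGES = ["daily", "pre-release", "grayscale", "official"]


class GateBlocked(Exception):
    """Raised when a version tries to leave a stage with failing quality items."""


def promote(current_stage, failed_items):
    """Return the next stage, or raise GateBlocked if the current stage is not clean.

    failed_items: names of the quality items that did not meet the standard
    in the current stage (how they are computed is outside this sketch).
    """
    if failed_items:
        raise GateBlocked(f"{current_stage}: {', '.join(sorted(failed_items))}")
    index = STAGES.index(current_stage)
    if index == len(STAGES) - 1:
        raise ValueError("already in the final stage")
    return STAGES[index + 1]
```

Raising an exception rather than returning a flag makes it hard for a caller to ignore a failed gate by accident, which matches the intent of removing manual overrides from the flow.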

After the basic continuous delivery process was in place, to meet the business’s need for flexibility, we have also been trying multi-environment distribution and deployment through custom pipelines, which increases deployment flexibility and adapts to different business forms while keeping the flow through the main stages under control.

Reduce human participation in the release process and make the whole process unattended

We know that deployment in an online environment is much more complex than deployment in a daily and pre-release environment. For some service lines, the number of online machines is large and distributed in different equipment rooms. To ensure service availability, thousands of online machines are deployed in batches.

Before onboarding to continuous delivery, to ensure service availability after deployment and to meet high quality standards, QA had to verify each batch after it was deployed, either the whole batch or a sample of it. Only after verification passed did the next batch get released and verified. Although the verification itself ran through automated scripts, the whole release and verification process could take hours because of the large number of machines and batches, which was a significant efficiency problem.

Having identified this bottleneck, we connected the upstream and downstream systems (the group’s standard process, the group’s release system, and the business’s original online verification project) so that the release verification strategy can be configured for each business’s release scenario. By listening for deployment messages, the system obtains the list of machines deployed in each batch and automatically verifies them according to the verification policy configured for each service. In addition, combined with alarm monitoring in the online stage, when a batch fails release verification the system can immediately identify which machine in which batch has the problem, helping the business quickly locate deployment problems.
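One way to picture this message-driven verification is a handler that runs once per deployed batch. The sketch below makes several assumptions: the shape of the deployment message (a batch number plus a machine list), the two policy names, and the `verify_machine` callback standing in for the business’s own verification project are all hypothetical.

```python
import random


def select_targets(machines, policy="all", sample_size=1):
    """Pick the machines of a batch to verify: every machine or a random sample."""
    machines = list(machines)
    if policy == "all":
        return machines
    if policy == "sample":
        return random.sample(machines, min(sample_size, len(machines)))
    raise ValueError(f"unknown policy: {policy}")


def on_batch_deployed(batch_number, machines, verify_machine,
                      policy="all", sample_size=1):
    """Handle one 'batch deployed' message and decide whether the release may continue.

    verify_machine: callable(machine) -> bool, the service's own check.
    """
    targets = select_targets(machines, policy, sample_size)
    failed = [m for m in targets if not verify_machine(m)]
    return {
        "batch": batch_number,
        "verified": targets,
        "failed": failed,               # pinpoints which machines failed
        "continue_release": not failed,
    }
```

Because the result records both the batch number and the failing machines, a failure can be traced to a specific machine in a specific batch without anyone watching the release.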

Business architecture for continuous delivery systems

4. Results

The continuous delivery system has now been running on Autonavi’s server side for some time. So far:

  • Business line coverage: the continuous delivery system covers most of the key businesses of Autonavi’s server side

  • Quality items built across the stages: 12 items

  • Official release: 50%~90%

Beyond these quantitative indicators, what is more valuable is the underlying change in development and testing habits. From R&D, QA, and project management proactively shortening project cycles to QA holding quality issues to a higher standard, there is a growing emphasis on delivering business value early and with high quality. Of course, there is still a gap to the goal of delivering business value earlier (and faster), which we and the business lines need to close together in the future.

5. The future of continuous delivery

Some people describe continuous delivery as a highway for value delivery. Putting continuous delivery in place means that the fast lane for delivering value to users has been built. But ultimately, delivering business value earlier (and faster) depends on the vehicles on that road.

Following this analogy, besides making sure the highway has no potholes, we also have to consider the capability and performance of the vehicle itself. Before a vehicle sets off, we need to inspect its condition more thoroughly, so that once on the highway it is not held back by its own problems.

5.1 Vehicle condition inspection

At present, the existing continuous integration system can only ensure that the vehicle is able to drive on this road, and inspection of the vehicle’s condition starts only after it is on the highway (most quality assurance measures are deployed in the daily environment). Following the analogy above, we need to inspect as early as possible and more comprehensively (shifting quality assurance measures left).

Based on this goal and the good practices of other BUs in the group, we hope to carry out such comprehensive inspection as early as possible by means of code gating. Putting code gating into practice is a challenge for the engineering efficiency team as well as for the business R&D and QA teams. We need to make the following changes:

QA:

  • Build the capability for quality assurance to proceed in step with development

  • Optimize the stability and time cost of quality assurance

RD:

  • Change the RD code submission process

  • Build unit test capability

  • Hold regular Code Reviews and summarize coding conventions

Capability support:

  • Code coverage and business scenario coverage

  • Code merge gating capability

  • Code scanning combined with putting Code Review summaries into practice

Once the above tasks are gradually completed, the mutual waiting caused by delivering business value in batches can be eliminated, and vehicles can run faster and more steadily on the highway of continuous delivery.

5.2 Vehicle performance improvement

The pre-departure inspection checks and safeguards vehicles before they hit the road, shifting quality assurance left into the development stage. Vehicle performance improvement, on the other hand, aims to make the vehicle drive faster once it is on the road and to raise its top speed.

Improving vertical test capabilities:

  • Accurate regression: by sensing code changes, infer the Cases affected by those changes, making quality assurance more precise and less time-consuming

  • Scenario coverage: combined with online traffic replay, use code coverage and scenario coverage to find and fill gaps, making quality assurance more complete

  • Problem location: using failed Cases, locate problems and give feedback quickly

  • Synchronized authoring: based on the Yunge Case platform, strengthen the ability to write test code in parallel with R&D code through interface definitions, and explore how to reduce the cost of writing and maintaining Cases

  • Reduce data interference: following the principles of high frequency, isolation, and disposability, reduce data interference in the daily environment and make quality assurance more effective
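Accurate regression, the first item in the list above, can be illustrated with a simple lookup from changed files to the Cases that exercise them. This is a sketch under stated assumptions: it presumes a per-Case coverage map recorded from an earlier run, and both `affected_cases` and the file-level granularity are hypothetical rather than a description of the actual platform.

```python
def affected_cases(changed_files, coverage_map):
    """Return the Cases whose recorded coverage touches any changed file.

    coverage_map: dict mapping Case name -> set of source files the Case
    exercised (assumed to be collected from a previous coverage run).
    """
    changed = set(changed_files)
    return sorted(
        case for case, files in coverage_map.items() if files & changed
    )
```

A change touching only a shared file would select every Case that covers that file and skip the rest. A real system would also need a fallback, such as running the full regression set, for changed files that no recorded Case covers.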

Big data analysis combined with online data mining:

  • Online log analysis: produce models of real online scenarios, reducing the time spent preparing corpora for the stress testing platform and achieving accurate and efficient scenario screening

  • Big data application: combining real online scenarios with scenario coverage, construct the offline regression Case set to reduce the maintenance cost of business regression Cases, improve Case efficiency, and locate problems quickly

  • Use the intermediate artifacts of scenario replay and record-and-replay to solve the problem of constructing scenarios in unit tests

With the fast lane of continuous delivery in place, we will take the continuous delivery system as an opportunity to dig deeper along multiple vertical dimensions and keep improving it, so that business value is delivered earlier (and faster) with higher quality and lower labor cost, strengthening Gaode’s advantage in fierce market competition.