To understand what a grayscale release system has to do, it helps to first understand the concept and process of a grayscale release. Once the purpose of the release is clear and we know which steps of the process tooling can support, the points to consider when implementing such a system follow naturally. The purpose of a grayscale release is to make the upgrade from an old version of an application to a new version smooth. During the upgrade, a portion of user traffic is usually selected according to a release strategy and routed to the new version first. The release team then collects feedback from those users, along with the logs, performance, and stability data of the new-version instances. Based on that review, they either continue adding new-version instances and increasing the new version's share of traffic until the upgrade is complete, or roll back to the old version if problems are found. The corresponding grayscale release flow chart is as follows:

With the concept and process of a grayscale release defined as above, the problems a grayscale release system needs to address become clear.

1. Customize release policies

In a grayscale release, the deployment of the new version is usually divided into multiple stages, and the number of new-version instances is increased step by step. For example, a release might be divided into three stages, with the number of new-version instances growing from 10 to 50 to 100. This keeps the application as a whole stable while the new version is introduced. At the end of each stage, we can observe and review how the new version is behaving, then decide whether to continue adding new-version instances or to roll back if problems are found. To make the release process more automated, the system should also support advancing automatically from one stage to the next. Some users will instead want a manual review between stages, so the platform needs to support that as well. A grayscale release system therefore needs to support customizable multi-stage release policies.
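
A multi-stage policy of this kind can be sketched as a small data structure. The following is a minimal illustration in Python, not the implementation of any particular platform; the names `Stage`, `ReleasePolicy`, and `next_action` are hypothetical, and the 10/50/100 stage sizes are taken from the example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    new_instances: int     # new-version instances running at the end of this stage
    manual_approval: bool  # True: wait for a reviewer before starting the next stage


@dataclass
class ReleasePolicy:
    stages: List[Stage]

    def next_action(self, stage_index: int, healthy: bool) -> str:
        """Decide what to do once the stage at stage_index has been reviewed."""
        if not healthy:
            return "rollback"            # problems found: restore the old version
        if stage_index == len(self.stages) - 1:
            return "complete"            # last stage passed: upgrade is finished
        if self.stages[stage_index].manual_approval:
            return "wait_for_approval"   # a person must confirm the next stage
        return "advance"                 # move on automatically


# Three stages (10 -> 50 -> 100 instances), with a manual review after the first.
policy = ReleasePolicy([Stage(10, True), Stage(50, False), Stage(100, False)])
print(policy.next_action(0, healthy=True))
```

Keeping the approval flag on each stage lets one policy mix automatic and manually reviewed transitions.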

2. Traffic ratio

In a grayscale release, when the load balancer at the entrance simply balances traffic by number of instances, the traffic ratio equals the ratio of instance counts between the application versions. In some scenarios this limits how users can configure traffic. For example, a user with limited resources who wants a small number of new-version instances to handle a larger share of traffic cannot do so. The grayscale release platform therefore still needs to support setting the traffic ratio between the old and new versions. Combined with the customizable release policies described in the previous point, this gives users more precise control over the share of user traffic handled by the new version. Some grayscale release implementations, such as NetEase's, are combined with Service Mesh technology to precisely control the traffic ratio for each version of the application.
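
The difference between the two modes can be illustrated with a short Python sketch. The function names are hypothetical, and the weighted router below is only a simplified stand-in for what a service mesh or load balancer would do; it is not NetEase's implementation.

```python
import random


def instance_based_ratio(old_instances: int, new_instances: int) -> float:
    """Share of traffic the new version receives when the load balancer
    spreads requests evenly across all instances."""
    total = old_instances + new_instances
    if total == 0:
        raise ValueError("no instances are running")
    return new_instances / total


def pick_version(new_weight: float, rng: random.Random) -> str:
    """Route one request by an explicit weight, independent of instance counts.
    new_weight is the desired share of traffic for the new version (0.0 to 1.0)."""
    return "new" if rng.random() < new_weight else "old"


# With 90 old and 10 new instances, instance-based balancing fixes the share at 10%.
print(instance_based_ratio(90, 10))

# With explicit weights, the same 10 new instances can be sent 30% of the traffic.
rng = random.Random(0)
routed = [pick_version(0.3, rng) for _ in range(1000)]
print(routed.count("new") / len(routed))
```

Decoupling the weight from the instance count is what allows a resource-constrained user to send more traffic to a small number of new-version instances.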

3. Logs and monitoring

At every stage of a grayscale release, the release engineers need to decide, based on how the new version is running, whether to continue the upgrade or roll back immediately when a problem is found. The grayscale release system should therefore give users as many indicators and as much reference data as possible to support that decision. For example, it should let users view the runtime logs of deployed instances, and it should provide monitoring data such as CPU usage, memory usage, and network interface traffic as a basis for judging the functionality and stability of the new version.

4. Fast rollback

For a deployment system, any online upgrade of an application requires the ability to roll back quickly, so that the stable old version can be restored in time and the damage contained when problems occur. A rollback takes the new-version instances offline or deletes them, recreates the old-version instances, and switches traffic back to the old version.
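
The rollback operations can be sketched as a simple plan builder. This is a hypothetical Python illustration: the function name and step wording are invented, and the ordering it uses (restore old capacity first, then switch traffic, then remove new instances) is an assumption made for the sketch rather than something the text above prescribes.

```python
from typing import List


def plan_rollback(old_target: int, old_running: int, new_running: int) -> List[str]:
    """Return the ordered steps needed to restore the old version.

    old_target:  number of old-version instances before the release started
    old_running: old-version instances still running now
    new_running: new-version instances currently deployed
    """
    steps: List[str] = []
    missing = max(old_target - old_running, 0)
    if missing:
        # Assumption: bring old capacity back before moving traffic onto it.
        steps.append(f"recreate {missing} old-version instances")
    steps.append("route 100% of traffic to old version")
    if new_running:
        steps.append(f"take {new_running} new-version instances offline")
    return steps


for step in plan_rollback(old_target=100, old_running=50, new_running=50):
    print(step)
```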

5. Alarm function

The release system is responsible for the entire release process. In working with users, I have heard feedback that the old and new versions sometimes coexist for a long time during a grayscale release, and that users want timely alarms for releases that have not been finished. For example, after a new version of a mobile app is launched, it may need to run for a while so that user feedback on the new features can be collected. In that situation it is valuable for the release system to remind users promptly of the unfinished grayscale release and of the old and new application versions involved. The release system also needs to provide alarms on monitoring indicators. For example, if releasing a new version causes CPU or memory usage to rise, the release engineers should be notified in a timely manner.
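
Both kinds of alarm can be sketched in a few lines of Python. The function name, metric names, thresholds, and the seven-day limit below are all illustrative assumptions, not values from any real system.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def release_alarms(
    started_at: datetime,
    now: datetime,
    max_duration: timedelta,
    metrics: Dict[str, float],
    thresholds: Dict[str, float],
) -> List[str]:
    """Collect alarm messages for one in-progress grayscale release."""
    alarms: List[str] = []
    # Alarm 1: old and new versions have coexisted for longer than allowed.
    if now - started_at > max_duration:
        alarms.append("release unfinished for longer than allowed")
    # Alarm 2: a monitored indicator of the new version exceeds its limit.
    for name, limit in sorted(thresholds.items()):
        value = metrics.get(name)
        if value is not None and value > limit:
            alarms.append(f"{name} at {value} exceeds limit {limit}")
    return alarms


alarms = release_alarms(
    started_at=datetime(2024, 1, 1),
    now=datetime(2024, 1, 10),
    max_duration=timedelta(days=7),
    metrics={"cpu_percent": 92.5, "memory_percent": 40.0},
    thresholds={"cpu_percent": 80.0, "memory_percent": 85.0},
)
for message in alarms:
    print(message)
```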

Based on NetEase Cloud's years of experience designing and developing DevOps products, the five points above are indispensable for a grayscale release system. The NetEase Light Boat DevOps product currently implements grayscale release for both hosts and containers according to these requirements. When running a grayscale release on the Light Boat platform, users can customize the instance ratio and traffic ratio of the old and new versions at each stage, and control whether the system advances to the next stage automatically at the end of each stage or waits for a manual review at key points. If a problem is found, users can roll back quickly. The platform also integrates application log and monitoring data viewing, alarms, application version management, product management, and other functions, providing closed-loop management of application releases.