Definition of grayscale release

Internet products need to be developed and launched quickly while quality is still guaranteed. If a newly launched system has problems, their impact must be contained quickly, which is why a grayscale release system is needed.

A grayscale release system directs user traffic to the newly launched system according to its configuration, so that new functions can be verified quickly and any problem can be repaired immediately. In short, it is an A/B test system.

A grayscale release may go online with bugs, as long as they are not fatal. These are, of course, unknown bugs; a known bug would be fixed right away.

Design of a simple grayscale release system

The simple grayscale architecture is shown in the figure above. Its necessary components are as follows:

1. The policy configuration platform, which stores the grayscale strategies

2. The program that executes the grayscale function

3. The registered services, each carrying IP/port/name/version

These three components together make up a complete grayscale platform.
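As a minimal sketch of the third component, the registration record and a version-aware lookup could look like the following. The names `ServiceInstance` and `Registry` are hypothetical and used only for illustration; a real registry would also handle health checks and deregistration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInstance:
    """One registered service instance: name/version/IP/port."""
    name: str
    version: str
    ip: str
    port: int


class Registry:
    """Hypothetical in-memory service registry."""

    def __init__(self):
        self._instances = []

    def register(self, instance: ServiceInstance) -> None:
        self._instances.append(instance)

    def lookup(self, name: str, version: str) -> list:
        """Return all instances of a service that run the given version."""
        return [
            inst for inst in self._instances
            if inst.name == name and inst.version == version
        ]
```

Because each record carries a version, the program that executes the grayscale function can choose between old and new instances of the same service.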

Grayscale strategy

A grayscale release must have a grayscale strategy. The common strategies are as follows:

1. Traffic segmentation based on the request header

2. Traffic segmentation based on cookies

3. Traffic segmentation based on request parameters

For example, suppose traffic is segmented by the user UID carried in the request and the grayscale range is 1%. Take the uid modulo 100: requests whose result is 0 access the new version of the service, and requests whose result is 1~99 access the old version.
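The modulus rule above can be sketched in a few lines. The function name `choose_version` and the labels `"new"` and `"old"` are assumptions made for this illustration.

```python
def choose_version(uid: int, gray_percent: int = 1, buckets: int = 100) -> str:
    """Route by uid modulus: the lowest `gray_percent` buckets get the new version."""
    if not 0 <= gray_percent <= buckets:
        raise ValueError("gray_percent must be between 0 and buckets")
    # With gray_percent=1 only remainder 0 is grayscaled; 1~99 stay on the old version.
    return "new" if uid % buckets < gray_percent else "old"
```

Widening the grayscale range only means raising `gray_percent`; users who already see the new version keep seeing it because their remainder does not change.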

Grayscale strategies can be divided into two types: single strategies and combination strategies.

Single strategy: for example, taking the modulus of the user UID, token, or IP address.

Combination strategy: multiple services are grayscaled at the same time. For example, with three services A/B/C, A and C need to be grayscaled together while B does not. In this case, a tag field is required.

Execution control of grayscale release

In the simple grayscale release architecture above, the services are divided into upstream and downstream. The upstream service is the program that actually executes the grayscale strategy. It can be Nginx, or it can be the microservice gateway or the business logic layer in the architecture. The following analyzes how grayscale is implemented for each kind of upstream service.

Nginx

If the upstream service is Nginx, grayscale policy configuration and forwarding have to be implemented by extending Nginx with Lua, because Nginx itself has no grayscale policy execution.

Nginx also cannot receive grayscale policies from the service configuration management platform (SCM) on its own.

Solution: deploy an Agent locally (it has to be developed) that receives the grayscale policy issued by the service configuration management platform, updates the Nginx configuration, and gracefully restarts the Nginx service.
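A rough sketch of such an Agent is shown below. It assumes a made-up policy format (a dict mapping each version to a list of host/port pairs) and an `upstream backend_<version>` naming scheme, neither of which comes from a real platform. The Lua routing logic that chooses between the upstreams is not shown. The sketch renders the upstream blocks, writes the file atomically, and then runs `nginx -s reload`, which reloads the configuration without dropping in-flight requests.

```python
import os
import subprocess
import tempfile


def render_upstreams(policy: dict) -> str:
    """Render one nginx `upstream` block per service version."""
    blocks = []
    for version, servers in sorted(policy["versions"].items()):
        lines = [f"upstream backend_{version} {{"]
        lines += [f"    server {host}:{port};" for host, port in servers]
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


def apply_policy(policy: dict, conf_path: str, reload_nginx: bool = True) -> None:
    """Write the rendered config atomically, then ask nginx to reload."""
    directory = os.path.dirname(os.path.abspath(conf_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(render_upstreams(policy))
    # Rename within the same directory so nginx never reads a half-written file.
    os.replace(tmp_path, conf_path)
    if reload_nginx:
        # The master process starts new workers and gracefully stops the old ones.
        subprocess.run(["nginx", "-s", "reload"], check=True)
```

How the Agent receives the policy (polling, long connection, or push) depends on the configuration management platform and is left out here.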

Gateway layer/business logic layer/data access layer

These layers only need to integrate the SDK of the configuration management platform. The SDK receives the grayscale policy sent by the service configuration management platform and executes it.
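To illustrate the difference from the Nginx case, here is a sketch of what such an SDK might look like inside a service process. `GrayPolicyClient`, `on_policy_update`, and the `gray_percent` policy key are hypothetical names, not the API of any real configuration platform. The point is that a pushed policy takes effect in memory, without a restart.

```python
import threading


class GrayPolicyClient:
    """Hypothetical in-process SDK client; not a real library."""

    def __init__(self, gray_percent: int = 0):
        self._lock = threading.Lock()
        self._gray_percent = gray_percent

    def on_policy_update(self, policy: dict) -> None:
        """Callback assumed to be invoked when the platform pushes a new policy."""
        percent = int(policy["gray_percent"])
        if not 0 <= percent <= 100:
            raise ValueError("gray_percent must be between 0 and 100")
        with self._lock:
            self._gray_percent = percent

    def choose_version(self, uid: int) -> str:
        """Execute the current policy for one request."""
        with self._lock:
            percent = self._gray_percent
        return "new" if uid % 100 < percent else "old"
```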

Complex grayscale release scenarios

Here are two slightly more complex grayscale release scenarios. Assume the grayscale strategy takes 1% of users by uid modulus. Let's see how each scenario can be implemented.

Scenario 1: Grayscale multiple services on the call chain simultaneously

A function upgrade involves changes to multiple services. The gateway layer and the data access layer are grayscaled, while the business logic layer remains unchanged. How should the grayscale be performed in this case?

Solution:

Requests that pass through the new version of the gateway layer are tagged with tag T. All requests carrying tag T are then forwarded to the new version of the data access layer, and all requests without tag T are forwarded to the old version of the data access layer.
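The tag flow can be sketched as follows. The request is modeled as a plain dict, and the function names and the `path` field (used only to record which versions handled the request) are assumptions for illustration. The key point is that the unchanged business logic layer must pass the tag through and use it when choosing the data access layer.

```python
GRAY_TAG = "T"


def gateway_handle(uid: int, gray_percent: int = 1) -> dict:
    """Gateway layer: pick a gateway version by uid modulus; the new one adds tag T."""
    version = "new" if uid % 100 < gray_percent else "old"
    request = {"uid": uid, "tags": set(), "path": [f"gateway:{version}"]}
    if version == "new":
        request["tags"].add(GRAY_TAG)
    return request


def business_handle(request: dict) -> dict:
    """Business logic layer: not grayscaled, routes onward by tag only."""
    request["path"].append("business:old")
    target = "new" if GRAY_TAG in request["tags"] else "old"
    request["path"].append(f"data_access:{target}")
    return request
```

With these definitions, a request for uid 200 passes through the new gateway, the old business logic layer, and the new data access layer, while uid 201 stays on old versions throughout.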

Scenario 2: Grayscale services involving data

A grayscaled service that involves data uses a database, and the table fields may differ between versions: the old version has three fields A/B/C, while the new version has four fields A/B/C/D. During grayscale, the new version cannot modify the old version's database, so the data has to be copied out for the new version to use.

A database has no concept of grayscale, so the only option is to make a copy of the data for reading and writing. Writes must be full (double writes). You cannot write 90% of the data to the old version and 10% to the new version, because then neither database holds the full data.

During the offline full data copy, writes that arrive while the copy is running would be missing from the new database. To handle this, the business logic layer also writes a copy of the data to MQ. After the full copy is complete, the new data access layer writes the MQ data to the new DB to achieve data consistency, which is the main purpose of MQ here.
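The copy-then-replay sequence can be sketched with in-memory stand-ins: dicts for the two databases and a `deque` for MQ. The class name, the method names, and the choice to fill the new field D with `None` are assumptions for illustration. Replaying the queue in order as upserts is idempotent, so a write captured by both the snapshot and the queue does no harm.

```python
from collections import deque


class DualWriteMigration:
    """In-memory stand-ins for the old DB, the new DB, and the MQ."""

    def __init__(self):
        self.old_db = {}
        self.new_db = {}
        self.mq = deque()

    def write(self, key, row):
        """Business logic layer: write to the old DB and copy the write to MQ."""
        self.old_db[key] = dict(row)
        self.mq.append((key, dict(row)))

    def snapshot(self):
        """Freeze the old DB contents that the offline full copy works from."""
        return {key: dict(row) for key, row in self.old_db.items()}

    def _to_new_schema(self, row):
        # Assumption: the new field D starts out empty for migrated rows.
        return {**row, "D": None}

    def full_copy(self, snapshot):
        """Offline full copy of the snapshot into the new DB."""
        for key, row in snapshot.items():
            self.new_db[key] = self._to_new_schema(row)

    def replay_mq(self):
        """New data access layer: apply queued writes in order as upserts."""
        while self.mq:
            key, row = self.mq.popleft()
            self.new_db[key] = self._to_new_schema(row)
```

A write that lands between `snapshot()` and `full_copy()` is absent from the snapshot but present in the queue, so `replay_mq()` brings the new DB up to date.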

During the grayscale process, the data in the two databases must be compared to check that they are consistent. This way, no data is lost whether the grayscale fails and the new DB is abandoned, or the grayscale succeeds and traffic switches to the new DB.
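A consistency check along these lines might compare only the fields that both versions share (A/B/C in the example above). The function name `diff_databases` and the dict-based database model are assumptions for this sketch; a real check would typically run in batches against both stores.

```python
SHARED_FIELDS = ("A", "B", "C")


def diff_databases(old_db: dict, new_db: dict, fields=SHARED_FIELDS) -> list:
    """Return the keys that are missing from one DB or differ on a shared field."""
    mismatched = []
    for key in sorted(old_db.keys() | new_db.keys()):
        old_row, new_row = old_db.get(key), new_db.get(key)
        if old_row is None or new_row is None:
            mismatched.append(key)
        elif any(old_row.get(f) != new_row.get(f) for f in fields):
            mismatched.append(key)
    return mismatched
```

An empty result means the two databases agree on the shared fields. Any returned key needs investigation before switching to the new DB.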