Ant Financial's blue-green release practice

This article is reprinted from the public account “Financial-Grade Distributed Architecture”.


Blue-green deployment is a release pattern that allows a smooth transition between versions. In a blue-green release, the applications of the whole site are divided into two equal units, A and B. Unit A is updated with the new product code and initially receives a small share of user traffic, while unit B continues to run the old code. If the new code in A runs in production without any sign of a problem, and user behavior shows no unusual reaction to the change, more user traffic is gradually introduced until all users have access to the new product. Blue-green release therefore protects the stability of the overall system: problems can be found and corrected early in the rollout, while the scope of impact remains under control. This capability provides a strong safety net for frequent online changes and makes code changes safer and more reliable.

Before introducing the practice of blue-green release, let’s briefly review how release at Alipay has evolved.

Single release

In the early days of Alipay, the whole site consisted of a single application called Pay, and several servers formed a service cluster behind a load-balancing strategy. Release at this stage was very simple: servers were released one after another. For each server, traffic was cut off, the product code on the server was updated, and traffic was then restored.

Group release

As service volume grew and the architecture evolved toward SOA, the number of application servers increased steadily and the responsibilities of the single application were split across multiple applications. Each release therefore involved more applications and more servers, and took longer and longer. To speed releases up, the servers of each application were divided into several groups and released in batches, and multiple applications were released in parallel as far as possible. The release sequence was determined by the dependencies between applications: applications that others depend on are released first, followed by the applications that depend on them. This mode adds a new complexity, namely the dependency order in which applications are released. Before a release, the release sequence of the applications must be evaluated, and if that evaluation is wrong, service errors may occur during the release.
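
The dependency-ordered release described above can be sketched as a topological sort. This is only an illustration, not Alipay's actual release tooling: the application names and the dependency map are hypothetical, and the sketch assumes Python 3.9 or later for the standard `graphlib` module.

```python
from graphlib import TopologicalSorter


def release_waves(depends_on):
    """Group applications into waves that respect dependency order.

    depends_on maps each application to the set of applications it calls.
    Every application in a wave can be released in parallel, because
    everything it depends on was released in an earlier wave.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on a cyclic dependency
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # dependencies already released
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical applications: "cashier" calls "trade" and "account".
deps = {
    "cashier": {"trade", "account"},
    "trade": {"account"},
    "account": set(),
}
print(release_waves(deps))  # [['account'], ['trade'], ['cashier']]
```

A long dependency chain produces many small waves, which is one way to see why ordered release limits parallelism. A cyclic dependency cannot be ordered at all, and `prepare()` rejects it.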

With the rapid growth of Alipay’s business and the full rollout of the third-generation architecture, release grew into a huge undertaking: hundreds of applications, thousands of servers for a single application, and routine releases twice a week. Dependencies between applications became complex, the release order became harder and harder to coordinate, and cyclic dependencies could even occur. The limited parallelism also made it difficult to increase release speed. A routine release that updated hundreds of systems in sequence often took more than ten hours, which is unacceptable in an era of Internet finance where agility is key.

Beta release

A beta release is a special form of group release designed to control the risk of taking a project live. For high-risk projects, it is common to release to a small number of servers first, observe them for an extended period, confirm that the business is running normally, and then release to the remaining servers. Compared with a normal group release, a beta release can detect some problems in advance. If a problem appears, only a small number of servers need to be rolled back, and the impact on running business is small. However, beta releases also have their limitations:

A beta release controls the traffic reaching new product code only through the number of servers released, so traffic cannot be adjusted freely.
A beta release carries very little traffic, so its ability to expose problems is limited.
When several interacting systems are involved, for example when A and B each beta a few servers at the same time and A needs to call B, there is no way to ensure that a server running A’s new code calls a server running B’s new code, so old and new code call each other. This makes business verification harder for observers and requires various compatibility issues to be considered.
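
The limitations above can be made concrete with a little arithmetic. The cluster sizes below are invented for illustration, and the sketch assumes that the load balancer spreads requests evenly across servers, which the original text does not state.

```python
from fractions import Fraction


def beta_share(beta_servers, total_servers):
    """Fraction of requests served by new code under evenly spread load."""
    return Fraction(beta_servers, total_servers)


# Hypothetical cluster sizes, for illustration only.
share_a = beta_share(2, 100)  # application A: 2 of 100 servers run new code
share_b = beta_share(5, 100)  # application B: 5 of 100 servers run new code

# A request handled by A's new code reaches B's new code only by chance.
new_to_old = 1 - share_b
# Fraction of all requests that exercise the new code of both A and B.
both_new = share_a * share_b

print(share_a, new_to_old, both_new)  # 1/50 19/20 1/1000
```

Under these assumptions, 95% of the calls made by A's new code land on B's old code, and only one request in a thousand exercises the new code of both applications together.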

We have therefore been actively exploring an agile development and operations approach that lets code changes in an increasingly large and complex system be made both quickly and safely. The unitized architecture opened up new and enormous possibilities for this exploration. The new release capability it brought about is the blue-green release pattern.

The evolution toward the unitized architecture (LDC) brought two important capabilities: applications can be deployed by unit, and traffic can be allocated flexibly. This allows us to divide applications into closed units, each of which can independently serve the whole site, and to allocate user traffic flexibly at the entry point of each unit. Building on the logical zone structure of LDC, blue-green release, which releases zone by zone, was created to overcome the difficulties of the earlier release modes.

The principle of blue-green release

In the LDC deployment architecture, applications are divided by zone into two peer units, blue and green. Each unit contains several RZones and one GZone. The units are isolated from each other, and the call chain triggered by a service request stays within a single unit. In daily operation, blue and green each carry 50% of online traffic.

Step 1. Before the release, reduce the traffic of “blue” to 0%, then release all applications in “blue” in two groups.

Step 2. Route 1% of traffic to “blue” and observe. If nothing abnormal appears, gradually increase the share to 100%.

Step 3. With the traffic of “green” now at 0%, release all applications in “green” in two groups, with no required release order among the applications.

Step 4. Restore the daily running state, in which the blue and green units each carry 50% of online traffic.
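
The four steps can be written down as a sequence of traffic splits. The following is a minimal sketch: the function name is hypothetical, and the intermediate ramp percentages (10 and 50) are assumptions, since the text only specifies starting at 1% and ending at 100%.

```python
def blue_green_plan(ramp=(1, 10, 50, 100)):
    """Return (description, blue %, green %) stages for one release.

    Only the 1% start and the 100% end of the ramp come from the steps
    above; the intermediate percentages are assumptions.
    """
    stages = [("daily operation", 50, 50)]
    stages.append(("step 1: drain blue, release blue", 0, 100))
    for pct in ramp:
        stages.append((f"step 2: observe blue at {pct}%", pct, 100 - pct))
    # After the ramp, green already carries 0% and can be released.
    stages.append(("step 3: release green", 100, 0))
    stages.append(("step 4: restore daily operation", 50, 50))
    return stages


for name, blue, green in blue_green_plan():
    assert blue + green == 100  # some unit always serves every request
    print(f"{name:40s} blue={blue:3d}% green={green:3d}%")
```

At every stage the two shares add up to 100%, so the site stays fully available, and the unit being released never carries traffic.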

Smooth launch of new products

Blue-green release effectively reduces the risk of launching a new product. We can freely control how many users reach the new product and open access gradually from zero, so that problems are found and corrected early in the launch while the scope of impact stays under control.

Fast rollback on exceptions

In the normal release mode, the speed of handling an exception depends on how quickly the code can be rolled back, which can take from tens of minutes to hours. In blue-green release mode, if a major exception occurs, all that is needed is to cut off traffic to the new product, which can take only a few seconds. Once traffic to the new product has been isolated, there is ample time to decide how to resolve the exception.
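
The idea that rollback is a routing change rather than a redeployment can be sketched as follows. The `UnitRouter` class is hypothetical, and hashing the user ID into 100 buckets is an assumption made for illustration; the original text does not describe how traffic is actually split.

```python
import zlib


class UnitRouter:
    """Send a configurable percentage of users to the newly released unit."""

    def __init__(self, new_unit_percent=0):
        self.new_unit_percent = new_unit_percent

    def route(self, user_id):
        # A stable hash keeps each user in the same bucket across requests.
        bucket = zlib.crc32(user_id.encode("utf-8")) % 100
        return "new" if bucket < self.new_unit_percent else "old"

    def rollback(self):
        # Only the routing rule changes; no code is redeployed.
        self.new_unit_percent = 0


router = UnitRouter(new_unit_percent=1)
users = [f"user-{i}" for i in range(10_000)]
on_new = sum(router.route(u) == "new" for u in users)
print(f"{on_new} of {len(users)} users reach the new code")

router.rollback()
assert all(router.route(u) == "old" for u in users)
```

Because `rollback()` only resets one number, it takes effect on the next request, while the new code stays deployed for later diagnosis.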

Call isolation between old and new code

In blue-green release mode, calls are confined within a unit and calls between units are isolated. A unit runs either entirely new or entirely old product code, so there are no cross-calls between old and new code, which avoids compatibility problems between them during the release.
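
One way to picture unit-internal call control is a service registry that only ever returns endpoints from the caller's own unit. The `Registry` class, the service name, and the addresses below are all hypothetical; this is a sketch of the idea, not the mechanism LDC actually uses.

```python
class Registry:
    """Service registry that resolves a call only within the caller's unit."""

    def __init__(self):
        self._endpoints = {}  # (unit, service) -> list of addresses

    def register(self, unit, service, address):
        self._endpoints.setdefault((unit, service), []).append(address)

    def lookup(self, caller_unit, service):
        # Never fall back to another unit: that would mix old and new code.
        endpoints = self._endpoints.get((caller_unit, service))
        if not endpoints:
            raise LookupError(f"{service} is not deployed in unit {caller_unit}")
        return endpoints


registry = Registry()
registry.register("blue", "trade", "10.0.0.1:8080")   # hypothetical address
registry.register("green", "trade", "10.0.1.1:8080")  # hypothetical address

print(registry.lookup("blue", "trade"))  # ['10.0.0.1:8080']
```

Failing loudly when a service is missing from a unit, instead of silently calling the other unit, preserves the guarantee that a request sees only one version of the code.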

Improved release efficiency

Blue-green release also solves the problem of release order. Because a unit carries no traffic while it is being released and there are no cross-calls between old and new code, all systems in a unit can be released in parallel, which brings a qualitative improvement in release speed.

Ability to monitor and verify service correctness

How can problems be found quickly and comprehensively while traffic is being introduced? Fine-grained monitoring grouped by blue and green becomes very important, including finer-grained service health monitoring and database-level monitoring. In addition, automated test verification in production helps find problems caused by differences between the online and offline environments during the traffic ramp-up stage.
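
As a rough illustration of grouped monitoring, the sketch below compares the error rate of the newly released unit with that of the unit still running old code and decides whether the traffic ramp should continue. The function name and every threshold are assumptions chosen for the example, not values from the original article.

```python
def ramp_decision(new, baseline, max_ratio=2.0, floor=0.001, min_requests=500):
    """Compare per-unit error rates and decide what to do with the ramp.

    new and baseline are (error_count, request_count) pairs for the newly
    released unit and for the unit still running old code. All thresholds
    are illustrative assumptions.
    """
    new_errors, new_requests = new
    base_errors, base_requests = baseline
    if new_requests < min_requests or base_requests == 0:
        return "hold"  # too little traffic to judge; keep observing
    new_rate = new_errors / new_requests
    base_rate = base_errors / base_requests
    # The floor avoids a rollback over noise when the baseline is near zero.
    if new_rate > max(base_rate * max_ratio, floor):
        return "rollback"
    return "continue"


print(ramp_decision(new=(0, 1_000), baseline=(10, 100_000)))   # continue
print(ramp_decision(new=(50, 1_000), baseline=(10, 100_000)))  # rollback
print(ramp_decision(new=(1, 10), baseline=(10, 100_000)))      # hold
```

The "hold" outcome reflects the point that very small traffic has limited power to expose problems, so a decision waits until the new unit has served enough requests.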

Rapid, automated deployment of applications at large scale

Releasing all of the site’s applications concurrently overturns the sequential release model and places great demands on the performance of the release platform, particularly in resource distribution.

Automatic and accurate traffic scheduling capability

A powerful PaaS platform enables fine-grained traffic scheduling in multiple modes, according to the release requirements of different products.

As the unitized architecture continues to evolve and mature after its rollout, and as the operations PaaS platform is continuously built out and strengthened, the blue-green release pattern has played a major role in making code changes in the production environment agile, efficient, and reliable while keeping services continuously available. It is one of the key capability modules of Ant Financial’s R&D and operations system.
