Those familiar with Kubernetes know that once an application is deployed to Kubernetes, the platform manages it automatically: when a Pod fails, it is rescheduled and rebuilt so that the service stays continuously available. However, Kubernetes’ native release strategy falls short of production-grade release requirements. This article introduces a release model widely used inside Alibaba, batch release, and explains how Cloud Effect implements it on Kubernetes.


Kubernetes rolling upgrade

RollingUpdate is the upgrade strategy Kubernetes provides natively. It is designed to update an application without interrupting the service it exposes to the outside.


With native RollingUpdate, users can tune the upgrade policy: maxSurge controls how many extra Pods may be started beyond the desired count, and maxUnavailable caps how many Pods may be unavailable at any moment, so that some Pods remain available throughout the rolling upgrade.


Veteran Kubernetes users will also add livenessProbe and readinessProbe probes so that Kubernetes can tell whether the service is actually available.
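As a minimal sketch, a Deployment that combines the rolling-update settings with both probes might look like the following (the application name my-app, the image, port 8080, and the /healthz path are all placeholders, not values from Cloud Effect):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra Pod above the desired count during the upgrade
      maxUnavailable: 1    # at most one Pod may be unavailable at any moment
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:v1   # placeholder image
        ports:
        - containerPort: 8080
        livenessProbe:            # restart the container if it stops responding
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:           # keep the Pod out of Service endpoints until it is ready
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
```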


However, reality rarely lives up to the ideal. In a real release, the new image starts successfully and the service is nominally upgraded, but that does not mean the release is complete.


Readers interested in continuous delivery may have heard of release strategies such as blue-green release and grayscale (canary) release. At their root, these strategies are designed to separate deployment from release, leaving room for human intervention to confirm that the upgrade really meets the business need before the new version actually goes live.


Alibaba’s batch release model


Batch release is an online release mode widely used inside Alibaba. Put simply, it means upgrading a service in batches, one batch of instances at a time.




A crucial action in a batch release is the pause. While the release is paused, users can manually verify the instances that have just been upgraded; if everything is correct, they continue with the remaining batches of service instances.


The value of batch release is that it lets manual or automatic (unattended) validation intervene in the release process, and that problems, once found, can be rolled back quickly.


Implementing batch release on Kubernetes


In Kubernetes’ application model, Pods generally do not address each other directly; traffic between applications inside the cluster, as well as traffic coming from outside the cluster, passes through a separate Service object.
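Sticking with the hypothetical my-app example from above, such a Service is just a label selector plus a port mapping:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app        # matches every Pod carrying this label, whichever Deployment owns it
  ports:
  - port: 80           # port exposed by the Service
    targetPort: 8080   # container port of the Pods
```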




In Cloud Effect’s deployment model, the Service is abstracted as the target application of a deployment. During a batch release, Cloud Effect automatically creates a new-version copy of the Deployment object associated with the current Service, and users define how many batches the release is executed in.


As shown below, during a batch release Cloud Effect controls how many Pod instances of each version are running by adjusting the replica counts of the current-version and the new-version Deployment objects:
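The following is a rough sketch of the cluster state in the middle of such a release; the names, labels, and replica counts are illustrative rather than Cloud Effect’s actual resource layout, and probes and ports are omitted for brevity. Both Deployments keep the app: my-app label selected by the Service, while a version label tells them apart:

```yaml
# Current version: scaled down batch by batch
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-current
spec:
  replicas: 2                  # was 4 before the release started
  selector:
    matchLabels:
      app: my-app
      version: current
  template:
    metadata:
      labels:
        app: my-app            # still selected by the my-app Service
        version: current
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:v1   # placeholder image
---
# New version: scaled up batch by batch
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-new
spec:
  replicas: 2                  # first batch of the new version
  selector:
    matchLabels:
      app: my-app
      version: new
  template:
    metadata:
      labels:
        app: my-app            # also selected by the my-app Service
        version: new
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:v2   # placeholder image
```

As each batch is confirmed, the new Deployment is scaled up and the current one is scaled down until the old version reaches zero replicas; aborting the release simply reverses the scaling.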



After the first batch completes, the whole process is suspended automatically. At this point you can verify the deployment result directly in the cluster and, if everything checks out, confirm that the release should continue. If the release is abnormal, you can roll back the entire release, and the application is automatically restored to its pre-release state.

To ensure that Service traffic never flows to Pod instances that are still starting up, livenessProbe and readinessProbe are used together so that the Service stays continuously available throughout the batch release.


Enhancing batch release with Istio


In Kubernetes’ native Service load balancing, traffic is routed from the ClusterIP to the Pod IPs through iptables rules, and the iptables --probability option is used to split traffic across the endpoints.




In the example above, if the release were split into two batches, the new and old Pods would each receive roughly 50% of the traffic after the first batch. In a batch release built purely on native Kubernetes, the traffic ratio between the new and old versions can only be tuned coarsely, by adjusting the number of replicas of each version.


For users already running Istio, Cloud Effect’s batch release makes it easy to implement finer-grained traffic control rules: during the release, Cloud Effect automatically adds version labels to the Deployment instances it manages.


Based on these version labels, Istio users can control the traffic ratio between versions with a RouteRule, or implement A/B tests directly based on cookies.
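For illustration, here is a sketch using VirtualService and DestinationRule, the successors of RouteRule in newer Istio releases; the resource names, the 80/20 split, and the version: current / version: new labels are assumptions standing in for whatever labels the release process applies:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-app
spec:
  host: my-app                 # the Kubernetes Service created earlier
  subsets:
  - name: current
    labels:
      version: current         # version label on the current Deployment's Pods
  - name: new
    labels:
      version: new             # version label on the new Deployment's Pods
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts:
  - my-app
  http:
  - route:
    - destination:
        host: my-app
        subset: current
      weight: 80               # keep most traffic on the current version
    - destination:
        host: my-app
        subset: new
      weight: 20               # send a small share to the new version
```

A cookie-based A/B test follows the same pattern, with an http match clause on the cookie header deciding which subset a request is routed to.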


Of course, Cloud Effect will integrate these capabilities directly into its pipelines in the future, making the whole process even smoother.