Author | Hao Shuwei (flow), Senior R&D Engineer at Alibaba

Preface

Software technology evolves quickly, but our goal stays the same: to increase the frequency of application deployments and shorten product iteration cycles while remaining secure and stable. The benefit is that an enterprise can deliver product value in a shorter time, gather customer feedback sooner, and respond to customer demand faster, further strengthening the product's competitiveness. In addition, the enterprise can free up resources to invest in the research and development of innovative businesses and create more value. This is a virtuous cycle.

Rapid application iteration brings all kinds of benefits, but also challenges. A higher release frequency means a greater risk of unpredictable failures in the online business. So besides fully testing and validating each iteration in a pre-release environment, choosing the most suitable release strategy for the application is also a very important subject, because it minimizes the risk of business failure and the resulting losses.

Key points for cloud native application delivery

We said that frequent product iteration means a greater risk of failure, and this holds for both traditional and cloud native applications. Because cloud native applications are usually deployed in a distributed fashion on the cloud, and each application may be composed of multiple functional components that together provide the complete service, each component has its own independent iteration process and plan. In this case, the more functional components there are, the greater the chance of error. So how do we address these pain points at the application delivery level? We have summarized the following key points for cloud native application delivery.

  • How to take advantage of cloud native infrastructure. This advantage can be summed up in two points: elasticity and high availability;
  • How to achieve cross-platform portability and delivery. The underlying compute, storage, and network resources of different infrastructures vary greatly. In the past, upper-layer applications had to absorb these infrastructure differences, whereas cloud native application delivery requires the ability to migrate and deliver across platforms;
  • How to implement application O&M autonomy. Autonomy is not the same as automation: automation means that triggering a process produces an expected result when the process ends, while autonomy means the application stays highly available at runtime. If a replica of a functional component fails, the faulty replica is removed and a new one is added automatically;
  • How to make application delivery more predictable. The final delivered state of the application is determined when we write the orchestration template, and the more predictable application delivery becomes, the lower the risk;
  • How to improve the mean time to recovery of an application. If a failure requires human intervention beyond the application's capacity for self-healing, a shorter mean time to recovery means smaller business losses.

Kubernetes is a portable, extensible, open source platform for managing containerized workloads and services that facilitates declarative configuration and automation. Its platform capabilities already meet most of the requirements mentioned above. Kubernetes deploys applications using container technology, whose benefits include but are not limited to:

  • More agile application creation and deployment
  • portability
  • Environmental consistency
  • Loosely coupled and distributed
  • Resource isolation
  • High efficiency and high density resource utilization

Kubernetes also provides powerful capabilities for application management, scheduling, monitoring, and O&M:

  • Service discovery and load balancing capabilities
  • Automatic deployment and rollback capabilities of applications
  • The autonomous repair capability of the application
  • Storage orchestration capability
  • Secret and configuration management capabilities

However, there are also capabilities that Kubernetes does not provide itself but allows to be extended, such as log collection and monitoring and alerting. The figure below shows the architecture of Alibaba Cloud Container Service, which builds on standard Kubernetes with the enhancements that matter most to users: highly elastic, low-cost global access; strong security architecture support; and deep integration with the underlying cloud resource services, all validated and refined through Double 11 across a massive user base. It supports dedicated, managed, serverless, edge, and X-Dragon bare metal product forms. All of the demonstrations later in this article are done on this platform.

Application delivery boundaries

What are the boundaries of application delivery in Kubernetes?

From a simple point of view, we can consider what an application delivers to be its network service mode, the back-end resources behind the service, and the persistent storage of its business data. In Kubernetes these are abstracted as Service, Deployment/Pod, and Volume resources respectively.

Take a WordPress application as an example. It has two functional components: a front-end component that handles user requests and a back-end component that stores data. The front-end component consists of a frontend Service and three Pods; the back-end component consists of a backend Service and one Pod. So the resources delivered by this WordPress application are two Services and four back-end Pods in total. The back-end Pods are managed by a Deployment in Kubernetes, and each Service acts as a load balancer that routes requests to the Pods behind it. Access involves both calls between components inside the cluster and access to in-cluster services by external users, so Services fall into different categories.
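The persistent-storage piece of this delivery boundary is declared the same way. As a minimal sketch, a hypothetical PersistentVolumeClaim that the back-end Pod could mount for its data might look like this (the claim name and size are illustrative, not part of the original example):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wordpress-data       # hypothetical claim for the backend's data
spec:
  accessModes:
    - ReadWriteOnce          # volume is mounted read-write by a single node
  resources:
    requests:
      storage: 20Gi          # illustrative size

The back-end Deployment would then reference this claim under spec.template.spec.volumes and mount it into the container, which is what ties the Volume abstraction to the Pods that use it.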

Services can be divided into the following types according to how they are exposed:

ClusterIP

A ClusterIP Service is assigned a fixed virtual IP that is reachable only from within the cluster. This is the most common Service type.

apiVersion: v1
kind: Service
metadata:
  name: wordpress
spec:
  type: ClusterIP      # Default service type, services exposed only for internal cluster access
  ports:
    - port: 80         # Service port exposed within the cluster
      targetPort: 80   # The port on which the container listens
      protocol: TCP
  selector:
    app: wordpress     # Forward requests to back-end pods with the same label
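Inside the cluster, other components reach this Service through its ClusterIP or, more commonly, its DNS name. A sketch, assuming the Service lives in the default namespace:

$ curl http://wordpress.default.svc.cluster.local

The name is resolved by the cluster DNS service, so callers never need to know the Pod IPs behind the Service.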

NodePort

NodePort maps the service port to a static port on each cluster node. If you do not specify the port, the system selects a random one. Most of the time you should let Kubernetes choose the port; it is too costly for users to track available ports themselves.

apiVersion: v1
kind: Service
metadata:
  name: wordpress
spec:
  type: NodePort       # NodePort exposes the Service on a static port for access from outside the cluster
  ports:
    - port: 80         # Service port exposed within the cluster
      targetPort: 80   # The port on which the container listens
      protocol: TCP
      nodePort: 31570  # External clients can access the service through this port
  selector:
    app: wordpress     # Forward requests to back-end pods with the same label

The NodePort approach exposes services for access from outside the cluster, but it also has several disadvantages:

  • Each port can serve only one service
  • The port range is limited, typically 30000-32767
  • If node IP addresses change, you need to make adjustments to accommodate that

This is generally not recommended in production, but it can be used if your application is cost-sensitive and can tolerate a window of service unavailability.

LoadBalancer

LoadBalancer is the standard way to expose a service outside the cluster or to the public network, but it relies on a load balancer provided by the cloud provider. The load balancer is assigned a separate IP address and listens on the specified port of the back-end Service, forwarding request traffic to the corresponding back-end Pods through that port.

apiVersion: v1
kind: Service
metadata:
  name: wordpress
spec:
  type: LoadBalancer   # LoadBalancer Service type, which generally relies on the load balancing capability of a public cloud vendor
  ports:
    - port: 80         # Service port exposed within the cluster
      targetPort: 80   # The port on which the container listens
      protocol: TCP
  selector:
    app: wordpress     # Forward requests to back-end pods with the same label

Ingress

NodePort can be used to expose services for access, but each service occupies a port on every node, which increases the complexity of port management; LoadBalancer usually requires the support of a third-party cloud provider and comes with certain restrictions. Ingress is different from the previous three: it is not actually a Service type, but acts as an entry point to the cluster's services, routing traffic to the corresponding back-end Service based on the paths or subdomains you configure. It works more like an "intelligent router".
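As a sketch of path-based routing, an Ingress for the WordPress example might look like the following; the host name and the second service are hypothetical, and the extensions/v1beta1 format matches the Ingress examples later in this article:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: wordpress
spec:
  rules:
    - host: blog.example.com           # hypothetical domain for the cluster entry point
      http:
        paths:
          - path: /                    # send site traffic to the wordpress Service
            backend:
              serviceName: wordpress
              servicePort: 80
          - path: /admin               # send a sub-path to a hypothetical second Service
            backend:
              serviceName: wordpress-admin
              servicePort: 80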

We can give each application's Pods and Service the same label. This labeling mechanism is the key to the application release strategies we will cover next.

Application release strategies

In a Kubernetes cluster, in addition to choosing a service exposure method based on business requirements, it is important to choose the right release strategy so that the application keeps serving smoothly during upgrades.

Rolling release

The first and most common release strategy is the rolling release. It deploys the new version of the application progressively, replacing instances one by one until all instances have been replaced.

As shown in the figure below, the service currently runs version v1 with three back-end replicas. When I update to version v2, the replicas are replaced one by one until eventually all back ends of the service run version v2.
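A Kubernetes Deployment performs exactly this kind of replacement by default, and the pace can be tuned with maxSurge and maxUnavailable. The example below relies on the defaults, but an explicit configuration would be a fragment like this inside the Deployment spec (the values are illustrative):

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # allow at most 1 pod above the desired replica count during the update
      maxUnavailable: 0   # never take an old pod down before its replacement is ready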

The orchestration file for an example application looks like this:

  • go-demo-v1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: go-demo
  template:
    metadata:
      labels:
        app: go-demo
    spec:
      containers:
      - name: go-demo
        image: registry.cn-hangzhou.aliyuncs.com/haoshuwei24/go-demo:v1
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: go-demo
spec:
  ports:
  - port: 80
    targetPort: 8080
    name: go-demo
  selector:
    app: go-demo
  type: ClusterIP
  • Deploy version v1
$ kubectl apply -f go-demo-v1.yaml
  • View the Pod running status
$ kubectl get po
NAME                       READY   STATUS    RESTARTS   AGE
go-demo-78bc65c564-2rhxp   1/1     Running   0          19s
go-demo-78bc65c564-574z6   1/1     Running   0          19s
go-demo-78bc65c564-sgl2s   1/1     Running   0          19s
  • Access the application service
$ while sleep 0.1; do curl http://172.19.15.25; done
Version: v1
Version: v1
Version: v1
Version: v1
Version: v1
Version: v1
  • Update go-demo-v1.yaml to go-demo-v2.yaml and change the image tag
...
        image: registry.cn-hangzhou.aliyuncs.com/haoshuwei24/go-demo:v2
...
  • Deploy version v2
$ kubectl apply -f go-demo-v2.yaml
  • You can see that the pods are replaced one by one with new-version pods
$ kubectl get po -w
NAME                                READY   STATUS              RESTARTS   AGE
application-demo-8594ff4967-85jsg   1/1     Running             0          3m24s
application-demo-8594ff4967-d4sv8   1/1     Terminating         0          3m22s
application-demo-8594ff4967-w6lpz   0/1     Terminating         0          3m20s
application-demo-b98d94554-4mwqd    1/1     Running             0          3s
application-demo-b98d94554-ng9wx    0/1     ContainerCreating   0          1s
application-demo-b98d94554-pmc5g    1/1     Running             0          4s
  • If you access the service during the rolling upgrade, you will see responses from both v1 and v2; the mix depends on how quickly the application starts up
$ while sleep 0.1; do curl http://172.19.15.25; done
Version: v1
Version: v2
Version: v1
Version: v1
Version: v2
Version: v1
Version: v1
Version: v2

The advantage of a rolling release is that it is relatively simple and does not consume too many extra compute resources. The disadvantages are:

  • Instances are replaced by the new version slowly, one by one
  • The rollout can take quite some time
  • There is no way to control how traffic is split between versions

In terms of the application's final state in the cluster, the back end is either entirely version 1 or entirely version 2. If version 2 turns out to be defective, the faulty service is exposed to the entire user base, and although we have mechanisms to roll back quickly (see the sketch below), a failure that affects all users is too costly.
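As a side note on that rollback mechanism: a Deployment keeps a revision history, so rolling back is a one-liner. A sketch against the go-demo Deployment from the example above:

$ kubectl rollout undo deployment/go-demo      # roll back to the previous revision
$ kubectl rollout status deployment/go-demo    # watch the rollback progress

kubectl rollout history deployment/go-demo lists older revisions, and a specific one can be targeted with --to-revision.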

Blue-green release

The second strategy is the blue-green release, in which back-end Pods for both version 1 and version 2 of the application are deployed in the environment, and a traffic switch determines which version is live. Unlike a rolling release, the final state of the cluster under a blue-green strategy can contain Pods of both version 1 and version 2 at the same time, and we decide which version of the back end the Service uses by switching its traffic.

The orchestration files for an example application are shown below.

  • go-demo-v1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-demo-v1
spec:
  replicas: 4
  selector:
    matchLabels:
      app: go-demo
      version: v1
  template:
    metadata:
      labels:
        app: go-demo
        version: v1
    spec:
      containers:
      - name: go-demo
        image: registry.cn-hangzhou.aliyuncs.com/haoshuwei24/go-demo:v1
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
  • go-demo-v2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-demo-v2
spec:
  replicas: 4
  selector:
    matchLabels:
      app: go-demo
      version: v2
  template:
    metadata:
      labels:
        app: go-demo
        version: v2
    spec:
      containers:
      - name: go-demo
        image: registry.cn-hangzhou.aliyuncs.com/haoshuwei24/go-demo:v2
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
  • service.yaml
apiVersion: v1
kind: Service
metadata:
  name: go-demo
spec:
  ports:
  - port: 80
    targetPort: 8080
    name: go-demo
  selector:
    app: go-demo
    version: v1
  type: ClusterIP
  • Deploy the preceding three resources
$ kubectl apply -f go-demo-v1.yaml -f go-demo-v2.yaml -f service.yaml
  • Access the service; you can see that only version 1 is currently served
$ while sleep 0.1; do curl http://172.19.8.137; done
Version: v1
Version: v1
Version: v1
Version: v1
Version: v1
  • Modify service.yaml, setting spec.selector version to v2
apiVersion: v1
kind: Service
metadata:
  name: go-demo
spec:
  ports:
  - port: 80
    targetPort: 8080
    name: go-demo
  selector:
    app: go-demo
    version: v2
  type: ClusterIP
  • Redeploy
$ kubectl apply -f service.yaml
  • Revisit the service and you can see that it quickly switches to version 2
[root@iZbp13u3z7d2tqx0cs6ovqZ blue-green]# while sleep 0.1; do curl http://172.19.8.137; done
Version: v2
Version: v2
Version: v2

We just said that a rolling upgrade takes time, and even a rollback needs a certain amount of time to finish. With the blue-green strategy, if the new version turns out to be defective, we can quickly switch traffic back from v2 to v1, so the switching time is much shorter than with a rolling upgrade. But a blue-green release shares a defect with the rolling release: a fault still affects the entire user base, because the service is switched one hundred percent to either version 2 or version 1. It is an all-or-nothing operation, and even though blue-green can greatly shorten the recovery time, that is still unacceptable in some scenarios. In addition, Pods of both versions exist in the cluster at the same time, so the resource consumption is twice that of a rolling release.
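Incidentally, the selector switch does not have to go through editing and re-applying the file; a one-line patch achieves the same cutover. A sketch, assuming the same go-demo Service:

$ kubectl patch service go-demo -p '{"spec":{"selector":{"app":"go-demo","version":"v2"}}}'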

Canary release (grayscale release)

The third strategy is the canary release, where both version 1 and version 2 of the application are deployed in the environment, and a user request may be routed to either back end, which lets a portion of users reach the version 2 application. With this strategy we can adjust the traffic percentage to gradually shift over to the new version. Compared with blue-green deployment, it not only inherits the advantages of blue-green but also avoids the doubled resource usage, and if the new version is defective, only a small portion of users are affected, keeping the loss to a minimum.

As for grayscale release, some people consider it the same thing as canary release, while others consider them different. The process is the same as a canary release, but the purpose is different:

  • In a canary release, I may not be sure whether the user experience of the new version will be well received by the public, so I want timely feedback from a slice of online users and, after adjusting the product experience accordingly, I will iterate on a v3 version;
  • In a grayscale release, the product function has already been designed and developed thoroughly; now the old version needs to be replaced gradually online, and the rollout is staged to control the risk the release may bring.

The first example application follows. In this example we control the traffic ratio through the number of Pods.

  • go-demo-v1.yaml, with the number of replicas set to 9
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-demo-v1
spec:
  replicas: 9
  selector:
    matchLabels:
      app: go-demo
      version: v1
  template:
    metadata:
      labels:
        app: go-demo
        version: v1
    spec:
      containers:
      - name: go-demo
        image: registry.cn-hangzhou.aliyuncs.com/haoshuwei24/go-demo:v1
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
  • go-demo-v2.yaml, with the number of replicas set to 1
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-demo-v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-demo
      version: v2
  template:
    metadata:
      labels:
        app: go-demo
        version: v2
    spec:
      containers:
      - name: go-demo
        image: registry.cn-hangzhou.aliyuncs.com/haoshuwei24/go-demo:v2
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
  • service.yaml
apiVersion: v1
kind: Service
metadata:
  name: go-demo
spec:
  ports:
  - port: 80
    targetPort: 8080
    name: go-demo
  selector:
    app: go-demo
  type: ClusterIP
  • Deploy the preceding three resources
$ kubectl apply -f go-demo-v1.yaml -f go-demo-v2.yaml -f service.yaml
  • Access the service; you can see that roughly 10% of the traffic goes to version 2
$ while sleep 0.1; do curl http://172.19.8.248; done
Version: v1
Version: v2
Version: v1
Version: v1
Version: v1
Version: v1
Version: v1
Version: v1
Version: v1
Version: v1
Version: v1

In addition, we can use the NGINX Ingress Controller to control the traffic switch, which is more precise.

  • go-demo-v1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-demo-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: go-demo
      version: v1
  template:
    metadata:
      labels:
        app: go-demo
        version: v1
    spec:
      containers:
      - name: go-demo
        image: registry.cn-hangzhou.aliyuncs.com/haoshuwei24/go-demo:v1
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
  • go-demo-v2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-demo-v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-demo
      version: v2
  template:
    metadata:
      labels:
        app: go-demo
        version: v2
    spec:
      containers:
      - name: go-demo
        image: registry.cn-hangzhou.aliyuncs.com/haoshuwei24/go-demo:v2
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
  • service-v1.yaml
apiVersion: v1
kind: Service
metadata:
  name: go-demo-v1
spec:
  ports:
  - port: 80
    targetPort: 8080
    name: go-demo
  selector:
    app: go-demo
    version: v1
  type: ClusterIP
  • service-v2.yaml
apiVersion: v1
kind: Service
metadata:
  name: go-demo-v2
spec:
  ports:
  - port: 80
    targetPort: 8080
    name: go-demo
  selector:
    app: go-demo
    version: v2
  type: ClusterIP
  • ingress.yaml, setting the annotation nginx.ingress.kubernetes.io/service-weight to go-demo-v1: 100, go-demo-v2: 0, i.e. 100% of traffic to version 1 and 0% to version 2
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/service-weight: |
        go-demo-v1: 100, go-demo-v2: 0
  name: go-demo
  labels:
    app: go-demo
spec:
  rules:
    - host: go-demo.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: go-demo-v1
              servicePort: 80
          - path: /
            backend:
              serviceName: go-demo-v2
              servicePort: 80
  • Deploy the preceding five resource files
$ kubectl apply -f go-demo-v1.yaml -f go-demo-v2.yaml -f service-v1.yaml -f service-v2.yaml -f ingress.yaml
  • Access the service; you can see 100% of the traffic goes to version 1
$ while sleep 0.1; do curl http://go-demo.example.com; done
Version: v1
Version: v1
Version: v1
Version: v1
Version: v1
Version: v1
  • Update ingress.yaml to set the traffic ratio to 50:50
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/service-weight: |
        go-demo-v1: 50, go-demo-v2: 50
  name: go-demo
  labels:
    app: go-demo
spec:
  rules:
    - host: go-demo.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: go-demo-v1
              servicePort: 80
          - path: /
            backend:
              serviceName: go-demo-v2
              servicePort: 80
  • Access the service and you will see roughly 50% of the traffic going to each version
$ while sleep 0.1; do curl http://go-demo.example.com; done
Version: v2
Version: v1
Version: v1
Version: v1
Version: v2
Version: v2
Version: v1
Version: v1
Version: v2
Version: v2
  • Update ingress.yaml to set the traffic ratio to 0:100
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/service-weight: |
        go-demo-v1: 0, go-demo-v2: 100
  name: go-demo
  labels:
    app: go-demo
spec:
  rules:
    - host: go-demo.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: go-demo-v1
              servicePort: 80
          - path: /
            backend:
              serviceName: go-demo-v2
              servicePort: 80
  • Access the service; you can see 100% of the traffic goes to version 2
$ while sleep 0.1; do curl http://go-demo.example.com; done
Version: v2
Version: v2
Version: v2
Version: v2
Version: v2
Version: v2
Version: v2
Version: v2
Version: v2
Version: v2

The downside of both canary and grayscale releases is that the release cycle is much slower.

Among these release strategies,

  • Use a rolling release when publishing updates in a development or test environment;
  • In production, a rolling update or blue-green release can be used if the new version has been fully tested in advance;
  • If a new version of the application must minimize risk and limit the impact of any fault on users, use a canary or grayscale release.

These are some of the release strategies we use in Kubernetes.

“Alibaba Cloud Native focuses on technical fields such as microservices, Serverless, containers, and Service Mesh, follows popular cloud native technology trends and large-scale cloud native practice, and aims to be the technical community that best understands cloud native developers.”