As a CNCF member, Weave Flagger offers continuous integration and delivery capabilities. Flagger supports three types of progressive delivery:

  • Canary release (grayscale release): progressively shifts traffic to the canary version
  • A/B testing: routes user requests to the A or B version based on request information (HTTP headers and cookies)
  • Blue/Green release: traffic switching and mirroring

This article walks through the practice of progressive grayscale (canary) release with Flagger on ASM (Alibaba Cloud Service Mesh).

Set up Flagger

1 Deploy Flagger

To deploy Flagger, execute the following commands (see demo_canary.sh for the full script):

alias k="kubectl --kubeconfig $USER_CONFIG"
alias h="helm --kubeconfig $USER_CONFIG"

cp $MESH_CONFIG kubeconfig
k -n istio-system create secret generic istio-kubeconfig --from-file kubeconfig
k -n istio-system label secret istio-kubeconfig istio/multiCluster=true

h repo add flagger https://flagger.app
h repo update
k apply -f $FLAAGER_SRC/artifacts/flagger/crd.yaml
h upgrade -i flagger flagger/flagger --namespace=istio-system \
    --set crd.create=false \
    --set meshProvider=istio \
    --set metricsServer=http://prometheus:9090 \
    --set istio.kubeconfig.secretName=istio-kubeconfig \
    --set istio.kubeconfig.key=kubeconfig
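
Before moving on, it can help to confirm that the Flagger controller is up. A minimal check, assuming the chart's default Deployment name flagger and the k alias defined above:

k -n istio-system rollout status deploy/flagger
k -n istio-system logs deploy/flagger --tail=20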

2 Deploy the Gateway

During the grayscale release, Flagger asks ASM to update the VirtualService that carries the grayscale traffic configuration. This VirtualService references a gateway named public-gateway, so we first create the corresponding Gateway configuration file public-gateway.yaml as follows:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"

Deploy the Gateway by issuing the following command:

kubectl --kubeconfig "$MESH_CONFIG" apply -f resources_canary/public-gateway.yaml

3 Deploy flagger-loadtester

flagger-loadtester is an application used during the grayscale release phase to run checks against the grayscale pod instances (the acceptance-test and load-test webhooks shown later in the Canary configuration).

Deploy flagger-loadtester with the following command:

kubectl --kubeconfig "$USER_CONFIG" apply -k "https://github.com/fluxcd/flagger//kustomize/tester?ref=main"

4 Deploy Podinfo and its HPA

We start with Flagger's stock HPA configuration (an operations-level HPA) and switch to an application-level HPA later in the walkthrough.

Execute the following command to deploy Podinfo and its HPA:

kubectl --kubeconfig "$USER_CONFIG" apply -k "https://github.com/fluxcd/flagger//kustomize/podinfo?ref=main"

Progressive grayscale release

1 Deploy the Canary

Canary is Flagger’s core CRD for grayscale release; see How It Works. We first deploy the following Canary configuration file, podinfo-canary.yaml, to run through the full progressive grayscale process, and then build on it with application-level monitoring metrics to achieve an application-aware progressive grayscale release.

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # service port number
    port: 9898
    # container port number or name (optional)
    targetPort: 9898
    # Istio gateways (optional)
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    # Istio virtual service host names (optional)
    hosts:
    - '*'
    # Istio traffic policy (optional)
    trafficPolicy:
      tls:
        # use ISTIO_MUTUAL when mTLS is enabled
        mode: DISABLE
    # Istio retry policy (optional)
    retries:
      attempts: 3
      perTryTimeout: 1s
      retryOn: "gateway-error,connect-failure,refused-stream"
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 30s
        metadata:
          type: bash
          cmd: "curl -sd 'test' http://podinfo-canary:9898/token | grep token"
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"

Execute the following command to deploy Canary:

kubectl --kubeconfig "$USER_CONFIG" apply -f resources_canary/podinfo-canary.yaml

After the Canary is deployed, Flagger clones the Deployment named podinfo into podinfo-primary and scales podinfo-primary up to the minimum number of pods defined by the HPA, then gradually scales the original podinfo Deployment down to zero. In other words, podinfo serves as the grayscale version and podinfo-primary as the production version.

At the same time, Flagger creates three Services: podinfo, podinfo-primary, and podinfo-canary. The first two point to the podinfo-primary Deployment, and the last points to the podinfo Deployment.
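
Once initialization finishes, this split can be observed directly; a sketch of the check, with the expected objects noted as comments:

# Expect podinfo scaled to 0, podinfo-primary at the HPA minimum,
# and the Services podinfo, podinfo-primary, and podinfo-canary.
kubectl --kubeconfig "$USER_CONFIG" -n test get deploy,svc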

2 Upgrade podinfo

Upgrade the Grayscale Deployment version from 3.1.0 to 3.1.1 by executing the following command:

kubectl --kubeconfig "$USER_CONFIG" -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1

3 Progressive grayscale release

At this point Flagger begins the same incremental grayscale release process described in Part 1 of this series. Here’s a quick overview of the main process:

  1. Scale up the grayscale pods step by step and verify them
  2. Shift traffic progressively and verify
  3. Roll the upgrade out to the production Deployment and validate
  4. Shift 100% of the traffic back to production
  5. Scale the grayscale pods down to zero

We can observe the progressive traffic-shifting process with the following command:

while true; do kubectl --kubeconfig "$USER_CONFIG" -n test describe canary/podinfo; sleep 10s; done

The output log information is shown as follows:

Events:
  Type     Reason  Age                  From     Message
  ----     ------  ----                 ----     -------
  Warning  Synced  39m                  flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
  Normal   Synced  38m (x2 over 39m)    flagger  all the metrics providers are available!
  Normal   Synced  38m                  flagger  Initialization done! podinfo.test
  Normal   Synced  37m                  flagger  New revision detected! Scaling up podinfo.test
  Normal   Synced  36m                  flagger  Starting canary analysis for podinfo.test
  Normal   Synced  36m                  flagger  Pre-rollout check acceptance-test passed
  Normal   Synced  36m                  flagger  Advance podinfo.test canary weight 10
  Normal   Synced  35m                  flagger  Advance podinfo.test canary weight 20
  Normal   Synced  34m                  flagger  Advance podinfo.test canary weight 30
  Normal   Synced  33m                  flagger  Advance podinfo.test canary weight 40
  Normal   Synced  29m (x4 over 32m)    flagger  (combined from similar events): Promotion completed! Scaling down podinfo.test
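
Besides describing the Canary, the route weights that Flagger writes during traffic shifting can be read from the VirtualService it manages (a sketch, assuming Flagger names the VirtualService after the Canary, i.e. podinfo, and that Istio resources are read through the mesh kubeconfig configured earlier):

kubectl --kubeconfig "$MESH_CONFIG" -n test get virtualservice podinfo \
  -o jsonpath='{.spec.http[0].route[*].weight}'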

The corresponding Kiali view (optional) is shown below:

At this point, we have completed a full progressive grayscale release process. The sections below extend this basic flow.

Application-level scaling during the grayscale release

After completing the progressive grayscale release above, let’s take a look at the HPA referenced in the Canary configuration:

  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo

This HPA, named podinfo, is Flagger’s stock configuration: it scales up when the grayscale Deployment reaches 99% CPU utilization. The complete configuration is as follows:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # scale up if usage is above
          # 99% of the requested CPU (100m)
          averageUtilization: 99

We described application-level scaling in practice in the previous article; here we apply it to the grayscale release process.

1 An HPA aware of application QPS

Execute the following command to deploy an HPA that is aware of the application’s request rate and scales up when QPS reaches 10 (see advanced_canary.sh for the full script):

kubectl --kubeconfig "$USER_CONFIG" apply -f resources_hpa/requests_total_hpa.yaml

Accordingly, the Canary configuration is updated to:

  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo-total

2 Upgrade podinfo

Upgrade the Grayscale Deployment version from 3.1.0 to 3.1.1 by executing the following command:

kubectl --kubeconfig "$USER_CONFIG" -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1

3 Verify the progressive grayscale release and the HPA

The following command watches the progressive traffic-shifting process:

while true; do k -n test describe canary/podinfo; sleep 10s; done

During the progressive grayscale release (after the message Advance podinfo.test canary weight 10 appears; see the figure below), we use the following commands to send requests through the ingress gateway and increase the QPS:

INGRESS_GATEWAY=$(kubectl --kubeconfig $USER_CONFIG -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
hey -z 20m -c 2 -q 10 http://$INGRESS_GATEWAY

Use the following command to observe the progress of the progressive grayscale release:

watch kubectl --kubeconfig $USER_CONFIG get canaries --all-namespaces

Use the following command to observe the change in the number of HPA replicas:

watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total

The results are shown in the figure below: during the progressive grayscale release, once the traffic shifted to the grayscale version reaches 30%, the number of replicas of the grayscale Deployment grows to 4.

Application-level monitoring metrics in the grayscale release

Having completed application-level scaling during the grayscale release, we finally look at the metrics configuration in the Canary above:

  analysis:
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)

1 Flagger’s built-in monitoring metrics

So far, the metrics used in the Canary analysis have been Flagger’s two built-in metrics: request-success-rate and request-duration. On Istio, Flagger obtains these values from the Mixerless Telemetry data introduced in the first article of this series, as shown in the figure below.

2 Custom monitoring metrics

To demonstrate the flexibility that telemetry data brings to validating the grayscale environment during a release, we again take istio_requests_total as an example and create a MetricTemplate named not-found-percentage, which measures the requests returning a 404 error code as a percentage of total requests.

The configuration file metrics-404.yaml is as follows (see advanced_canary.sh for the full script):

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: not-found-percentage
  namespace: istio-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.istio-system:9090
  query: |
    100 - sum(
        rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace="{{ namespace }}",
              destination_workload="{{ target }}",
              response_code!="404"
            }[{{ interval }}]
        )
    )
    /
    sum(
        rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace="{{ namespace }}",
              destination_workload="{{ target }}"
            }[{{ interval }}]
        )
    ) * 100

To create the above MetricTemplate, execute the following command:

k apply -f resources_canary2/metrics-404.yaml
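
The template can be checked before wiring it into the Canary (a sketch, using the k alias from the script):

k -n istio-system get metrictemplate/not-found-percentage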

Accordingly, the Metrics configuration in Canary is updated to:

  analysis:
    metrics:
      - name: "404s percentage"
        templateRef:
          name: not-found-percentage
          namespace: istio-system
        thresholdRange:
          max: 5
        interval: 1m

3 Final verification

Finally, we run the entire experiment in one go. The script advanced_canary.sh is outlined as follows:

#!/usr/bin/env sh
SCRIPT_PATH="$(
    cd "$(dirname "$0")" >/dev/null 2>&1
    pwd -P
)/"
cd "$SCRIPT_PATH" || exit

source config

alias k="kubectl --kubeconfig $USER_CONFIG"
alias m="kubectl --kubeconfig $MESH_CONFIG"
alias h="helm --kubeconfig $USER_CONFIG"

echo "#### I Bootstrap ####"

echo "1 Create a test namespace with Istio sidecar injection enabled:"
k delete ns test
m delete ns test
k create ns test
m create ns test
m label namespace test istio-injection=enabled

echo "2 Create a deployment and a horizontal pod autoscaler:"
k apply -f $FLAAGER_SRC/kustomize/podinfo/deployment.yaml -n test
k apply -f resources_hpa/requests_total_hpa.yaml
k get hpa -n test

echo "3 Deploy the load testing service to generate traffic during the canary analysis:"
k apply -k "https://github.com/fluxcd/flagger//kustomize/tester?ref=main"
k get pod,svc -n test
echo "......"
sleep 40s

echo "4 Create a canary custom resource:"
k apply -f resources_canary2/metrics-404.yaml
k apply -f resources_canary2/podinfo-canary.yaml
k get pod,svc -n test
echo "......"
sleep 120s

echo "#### III Automated canary promotion ####"

echo "1 Trigger a canary deployment by updating the container image:"
k -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1

echo "2 Flagger detects that the deployment revision changed and starts a new rollout:"
while true; do k -n test describe canary/podinfo; sleep 10s; done

Execute the complete experiment script with the following command:

sh progressive_delivery/advanced_canary.sh

The experimental results are shown as follows:

#### I Bootstrap ####
1 Create a test namespace with Istio sidecar injection enabled:
namespace "test" deleted
namespace "test" deleted
namespace/test created
namespace/test created
namespace/test labeled
2 Create a deployment and a horizontal pod autoscaler:
deployment.apps/podinfo created
horizontalpodautoscaler.autoscaling/podinfo-total created
NAME            REFERENCE            TARGETS              MINPODS   MAXPODS   REPLICAS   AGE
podinfo-total   Deployment/podinfo   <unknown>/10 (avg)   1         5         0          0s
3 Deploy the load testing service to generate traffic during the canary analysis:
service/flagger-loadtester created
deployment.apps/flagger-loadtester created
NAME                                      READY   STATUS     RESTARTS   AGE
pod/flagger-loadtester-76798b5f4c-ftlbn   0/2     Init:0/1   0          1s
pod/podinfo-689f645b78-65n9d              1/1     Running    0          28s
NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/flagger-loadtester   ClusterIP   172.21.15.223   <none>        80/TCP    1s
......
4 Create a canary custom resource:
metrictemplate.flagger.app/not-found-percentage created
canary.flagger.app/podinfo created
NAME                                      READY   STATUS    RESTARTS   AGE
pod/flagger-loadtester-76798b5f4c-ftlbn   2/2     Running   0          41s
pod/podinfo-689f645b78-65n9d              1/1     Running   0          68s
NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/flagger-loadtester   ClusterIP   172.21.15.223   <none>        80/TCP    41s
......
#### III Automated canary promotion ####
1 Trigger a canary deployment by updating the container image:
deployment.apps/podinfo image updated
2 Flagger detects that the deployment revision changed and starts a new rollout:
Events:
  Type     Reason  Age                   From     Message
  ----     ------  ----                  ----     -------
  Warning  Synced  10m                   flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
  Normal   Synced  9m23s (x2 over 10m)   flagger  all the metrics providers are available!
  Normal   Synced  9m23s                 flagger  Initialization done! podinfo.test
  Normal   Synced  8m23s                 flagger  New revision detected! Scaling up podinfo.test
  Normal   Synced  7m23s                 flagger  Starting canary analysis for podinfo.test
  Normal   Synced  7m23s                 flagger  Pre-rollout check acceptance-test passed
  Normal   Synced  7m23s                 flagger  Advance podinfo.test canary weight 10
  Normal   Synced  6m23s                 flagger  Advance podinfo.test canary weight 20
  Normal   Synced  5m23s                 flagger  Advance podinfo.test canary weight 30
  Normal   Synced  4m23s                 flagger  Advance podinfo.test canary weight 40
  Normal   Synced  23s (x4 over 3m23s)   flagger  (combined from similar events): Promotion completed! Scaling down podinfo.test
