Brief introduction: Service Mesh ASM's Mixerless Telemetry technology provides non-intrusive telemetry data for business containers, which serves both as monitoring metrics for mesh observability and as the cornerstone of application-level scaling (HPA) and progressive grayscale (canary) releases (Flagger). This series covers that practice in three parts: telemetry data (monitoring metrics), application-level scaling, and progressive grayscale releases.

Preface

Service Mesh ASM’s Mixerless Telemetry technology provides non-intrusive telemetry data for business containers. On the one hand, the telemetry data is collected by ARMS Prometheus or a self-built Prometheus as monitoring metrics for service mesh observability. On the other hand, it is consumed by HPA and Flagger, and becomes the cornerstone of application-level scaling and progressive grayscale releases.

This series focuses on the practice of using telemetry data for application-level scaling and progressive grayscale releases. It is divided into three parts, covering telemetry data (monitoring metrics), application-level scaling, and progressive grayscale releases.

The overall architecture

The overall architecture of this series is shown in the figure below:

  1. ASM delivers the Mixerless Telemetry-related EnvoyFilter configuration to each ASM sidecar (Envoy) to enable the collection of application-level monitoring metrics.
  2. Business traffic enters through the Ingress Gateway, and each ASM sidecar begins to collect the relevant monitoring metrics.
  3. Prometheus scrapes the monitoring metrics from each POD.
  4. HPA queries the monitoring metrics of the relevant PODs from Prometheus through the Prometheus Adapter and scales them out according to its configuration.
  5. Flagger queries the monitoring metrics of the relevant PODs from Prometheus and issues a VirtualService configuration update to ASM according to its configuration.
  6. ASM delivers the updated VirtualService configuration to each ASM sidecar, thereby achieving a progressive grayscale release.
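As an illustration of step 4, assuming a Prometheus Adapter has been configured to expose the Istio request rate as a custom metric named `istio_requests_per_second` (the metric name and thresholds here are illustrative, not from the original article), an HPA driven by it might look like:

```yaml
# Hypothetical HPA scaling the podinfo Deployment on a custom metric
# served by the Prometheus Adapter (names and values are illustrative).
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: istio_requests_per_second   # exposed by the Prometheus Adapter
        target:
          type: AverageValue
          averageValue: "10"
```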

Flagger’s progressive release process

Flagger’s website describes the progressive release process as follows.

  1. Detect the new version and update the grayscale Deployment to it
  2. Scale the number of grayscale POD instances up from 0
  3. Wait for the number of grayscale POD instances to reach the minimum number of replicas defined by the HPA
  4. Check the health of the grayscale POD instances
  5. The Flagger-LoadTester instance initiates acceptance-test validation
  6. Abort the grayscale release if the validation fails
  7. The Flagger-LoadTester instance initiates load-test validation
  8. If traffic replication is configured, start replicating production traffic in full to grayscale
  9. Query monitoring metrics such as request success rate and request latency from Prometheus every minute
  10. Abort the grayscale release if the number of failed metric checks reaches the threshold
  11. Stop traffic replication when the configured number of iterations is reached
  12. Start routing live traffic to the grayscale POD instances
  13. Update the production Deployment to the new version
  14. Wait for the rolling upgrade of the production Deployment to complete
  15. Wait for the number of production POD instances to reach the minimum number of replicas defined by the HPA
  16. Check the health of the production POD instances
  17. Switch live traffic back to the production POD instances
  18. Scale the grayscale POD instances down to 0
  19. Send a notification with the grayscale release analysis result

The original text reads:

With the above configuration, Flagger will run a canary release with the following steps:

  • detect new revision (deployment spec, secrets or configmaps changes)
  • scale from zero the canary deployment
  • wait for the HPA to set the canary minimum replicas
  • check canary pods health
  • run the acceptance tests
  • abort the canary release if tests fail
  • start the load tests
  • mirror 100% of the traffic from primary to canary
  • check request success rate and request duration every minute
  • abort the canary release if the metrics check failure threshold is reached
  • stop traffic mirroring after the number of iterations is reached
  • route live traffic to the canary pods
  • promote the canary (update the primary secrets, configmaps and deployment spec)
  • wait for the primary deployment rollout to finish
  • wait for the HPA to set the primary minimum replicas
  • check primary pods health
  • switch live traffic back to primary
  • scale to zero the canary
  • send notification with the canary analysis result
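The process above is driven by a Flagger Canary resource. The sketch below shows what such a resource can look like for the podinfo example; the intervals, thresholds, and webhook URLs are illustrative assumptions, not taken from the original article:

```yaml
# Hypothetical Flagger Canary resource matching the steps above
# (intervals, thresholds, and URLs are illustrative).
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  provider: istio
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    port: 9898
  analysis:
    interval: 1m          # metrics are checked every minute
    threshold: 5          # abort after 5 failed metric checks
    iterations: 10        # stop traffic mirroring after 10 iterations
    mirror: true          # replicate production traffic to the canary
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99         # abort if success rate drops below 99%
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500        # abort if P99 latency exceeds 500 ms
        interval: 1m
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
      - name: load-test
        url: http://flagger-loadtester.test/
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"
```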

Prerequisites

  • An ACK cluster has been created. For details, see Creating a Managed Kubernetes Cluster.
  • An ASM instance has been created. For details, see Creating an ASM Instance.

Setup Mixerless Telemetry

This article describes how to configure and collect application-level monitoring metrics (such as the total number of requests istio_requests_total and the request latency istio_request_duration) based on ASM. The main steps are creating the EnvoyFilter, verifying the telemetry data on the Envoy side, and verifying that Prometheus scrapes the telemetry data.

1 EnvoyFilter

Log in to the ASM console, choose Service Mesh > Mesh Management in the left navigation pane, and go to the feature settings page of the ASM instance.

  • Check the option to enable the collection of Prometheus monitoring metrics
  • Check the option to enable a self-built Prometheus, and fill in the Prometheus service address `prometheus:9090` (this series uses the community version of Prometheus, which is used later in this article). If you use the Aliyun product ARMS, see Integrating ARMS Prometheus for mesh monitoring.
  • Check the option to enable Kiali (optional)

After clicking OK, we can see the list of related EnvoyFilters generated by ASM on the control plane:

2 Prometheus

2.1 Install

Execute the following command to install Prometheus (see demo_mixerless.sh for the full script).

kubectl --kubeconfig "$USER_CONFIG" apply -f $ISTIO_SRC/samples/addons/prometheus.yaml

2.2 Configure Scraping

After installing Prometheus, we need to add the Istio-related monitoring metrics to its configuration. Log in to the ACK console, choose Configurations > ConfigMaps in the left navigation pane, find the prometheus ConfigMap under istio-system, and click Edit.

In the prometheus.yaml configuration, append the configuration from scrape_configs.yaml to scrape_configs.
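As a sketch of what such a scrape job can look like, the following is modeled on the envoy-stats job from Istio's sample Prometheus configuration (the exact content of scrape_configs.yaml may differ):

```yaml
# Sketch of an Envoy-stats scrape job, based on Istio's sample
# Prometheus configuration; the actual scrape_configs.yaml may differ.
- job_name: 'envoy-stats'
  metrics_path: /stats/prometheus
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Keep only pod ports named *-envoy-prom (the sidecar's metrics port)
    - source_labels: [__meta_kubernetes_pod_container_port_name]
      action: keep
      regex: '.*-envoy-prom'
```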

After saving the configuration, choose Workloads > Pods in the left navigation pane, find the Prometheus POD under istio-system, and delete it to ensure that the configuration takes effect in the newly created POD.

To view the job_name entries in the Prometheus configuration, execute the following command:

kubectl --kubeconfig "$USER_CONFIG" get cm prometheus -n istio-system -o jsonpath='{.data.prometheus\.yml}' | grep job_name
- job_name: 'istio-mesh'
- job_name: 'envoy-stats'
- job_name: 'istio-policy'
- job_name: 'istio-telemetry'
- job_name: 'pilot'
- job_name: 'sidecar-injector'
- job_name: prometheus
  job_name: kubernetes-apiservers
  job_name: kubernetes-nodes
  job_name: kubernetes-nodes-cadvisor
- job_name: kubernetes-service-endpoints
- job_name: kubernetes-service-endpoints-slow
  job_name: prometheus-pushgateway
- job_name: kubernetes-services
- job_name: kubernetes-pods
- job_name: kubernetes-pods-slow

Mixerless validation

1 podinfo

1.1 Deployment

Deploy the sample application podinfo for this series with the following commands:

kubectl --kubeconfig "$USER_CONFIG" apply -f $PODINFO_SRC/kustomize/deployment.yaml -n test
kubectl --kubeconfig "$USER_CONFIG" apply -f $PODINFO_SRC/kustomize/service.yaml -n test

1.2 Load generation

Use the following command to send requests to podinfo and generate monitoring metric data:

podinfo_pod=$(kubectl --kubeconfig "$USER_CONFIG" get po -n test -l app=podinfo -o jsonpath={.items..metadata.name})
for i in {1..10}; do
  kubectl --kubeconfig "$USER_CONFIG" exec $podinfo_pod -c podinfod -n test -- curl -s podinfo:9898/version
  echo
done

2 Confirm generation (Envoy)

The monitoring metrics that this series focuses on are istio_requests_total and istio_request_duration. First, we confirm that these metrics have been generated inside the Envoy container.

2.1 istio_requests_total

Use the following command to fetch the stats from the Envoy and confirm that they contain istio_requests_total.

kubectl --kubeconfig "$USER_CONFIG" exec $podinfo_pod -n test -c istio-proxy -- curl -s localhost:15090/stats/prometheus  | grep istio_requests_total

The result information is as follows:

:::: istio_requests_total ::::
# TYPE istio_requests_total counter
istio_requests_total{response_code="200",reporter="destination",source_workload="podinfo",source_workload_namespace="test",source_principal="spiffe://cluster.local/ns/test/sa/default",source_app="podinfo",source_version="unknown",source_cluster="c199d81d4e3104a5d90254b2a210914c8",destination_workload="podinfo",destination_workload_namespace="test",destination_principal="spiffe://cluster.local/ns/test/sa/default",destination_app="podinfo",destination_version="unknown",destination_service="podinfo.test.svc.cluster.local",destination_service_name="podinfo",destination_service_namespace="test",destination_cluster="c199d81d4e3104a5d90254b2a210914c8",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="mutual_tls",source_canonical_service="podinfo",destination_canonical_service="podinfo",source_canonical_revision="latest",destination_canonical_revision="latest"} 10
istio_requests_total{response_code="200",reporter="source",source_workload="podinfo",source_workload_namespace="test",source_principal="spiffe://cluster.local/ns/test/sa/default",source_app="podinfo",source_version="unknown",source_cluster="c199d81d4e3104a5d90254b2a210914c8",destination_workload="podinfo",destination_workload_namespace="test",destination_principal="spiffe://cluster.local/ns/test/sa/default",destination_app="podinfo",destination_version="unknown",destination_service="podinfo.test.svc.cluster.local",destination_service_name="podinfo",destination_service_namespace="test",destination_cluster="c199d81d4e3104a5d90254b2a210914c8",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="unknown",source_canonical_service="podinfo",destination_canonical_service="podinfo",source_canonical_revision="latest",destination_canonical_revision="latest"} 10

2.2 istio_request_duration

Use the following command to fetch the stats from the Envoy and confirm that they contain istio_request_duration.

kubectl --kubeconfig "$USER_CONFIG" exec $podinfo_pod -n test -c istio-proxy -- curl -s localhost:15090/stats/prometheus  | grep istio_request_duration

The result information is as follows:

:::: istio_request_duration ::::
# TYPE istio_request_duration_milliseconds histogram
istio_request_duration_milliseconds_bucket{response_code="200",reporter="destination",source_workload="podinfo",source_workload_namespace="test",source_principal="spiffe://cluster.local/ns/test/sa/default",source_app="podinfo",source_version="unknown",source_cluster="c199d81d4e3104a5d90254b2a210914c8",destination_workload="podinfo",destination_workload_namespace="test",destination_principal="spiffe://cluster.local/ns/test/sa/default",destination_app="podinfo",destination_version="unknown",destination_service="podinfo.test.svc.cluster.local",destination_service_name="podinfo",destination_service_namespace="test",destination_cluster="c199d81d4e3104a5d90254b2a210914c8",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="mutual_tls",source_canonical_service="podinfo",destination_canonical_service="podinfo",source_canonical_revision="latest",destination_canonical_revision="latest",le="0.5"} 10
istio_request_duration_milliseconds_bucket{response_code="200",reporter="destination",source_workload="podinfo",source_workload_namespace="test",source_principal="spiffe://cluster.local/ns/test/sa/default",source_app="podinfo",source_version="unknown",source_cluster="c199d81d4e3104a5d90254b2a210914c8",destination_workload="podinfo",destination_workload_namespace="test",destination_principal="spiffe://cluster.local/ns/test/sa/default",destination_app="podinfo",destination_version="unknown",destination_service="podinfo.test.svc.cluster.local",destination_service_name="podinfo",destination_service_namespace="test",destination_cluster="c199d81d4e3104a5d90254b2a210914c8",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="mutual_tls",source_canonical_service="podinfo",destination_canonical_service="podinfo",source_canonical_revision="latest",destination_canonical_revision="latest",le="1"} 10
...

3 Confirm acquisition (Prometheus)

Finally, we verify that the monitoring metrics generated by the Envoy are scraped by Prometheus in real time. Expose the Prometheus service externally and open it in a browser. Then enter istio_requests_total in the query box, and we get the result shown in the figure below.
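Beyond looking up the raw counter, the same metric can drive rate queries of the kind HPA and Flagger rely on. For example, a PromQL expression like the following (the label selector is an illustration based on the metric labels shown above) computes the per-second request rate of podinfo over the last five minutes:

```
sum(rate(istio_requests_total{destination_app="podinfo", reporter="destination"}[5m]))
```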

Copyright Notice: The content of this article is contributed by Aliyun real-name registered users, and the copyright belongs to the original author. The Aliyun developer community does not own the copyright and does not bear the corresponding legal liability. For specific rules, please refer to the User Service Agreement of the Alibaba Cloud Developer Community and the Guidance on Intellectual Property Protection of the Alibaba Cloud Developer Community. If you find any suspected plagiarism in the community, fill in the infringement complaint form to report it; once verified, the community will immediately delete the suspected infringing content.