[TOC]

Istio Trace: a link-tracing solution

Istio Trace support

Envoy trace support

Envoy natively supports integration with distributed tracing systems such as Jaeger and Zipkin, and the official Envoy Tracing documentation shows that Envoy supports the following trace features:

  • Generate a request ID and fill in the HTTP header x-request-id
  • Integrate with external trace services, such as LightStep, Zipkin, or any Zipkin-compatible backend (such as Jaeger)
  • Add a client trace ID

For more information, refer to the Jaeger tracing documentation or Envoy's official documentation, as well as the tracing code in the Envoy source.

Istio trace support

According to Istio's distributed tracing introduction, Istio's Envoy proxy proactively reports to the trace system after intercepting traffic. The address of the trace system is specified by the proxy parameter zipkinAddress, so reports do not pass through Mixer; Envoy interacts with the trace system directly. The general flow:

  • If an incoming request carries no trace-related headers, a root span is created before traffic enters the pod

  • If an incoming request does carry trace-related headers, the sidecar proxy extracts the span context from them and creates a new span, inheriting from the previous span, before traffic enters the pod

Since Istio's proxy is Envoy, and Envoy supports Jaeger natively, Istio naturally supports Jaeger. This is explained in the official Distributed Tracing document.

By default, reporting goes directly through the Envoy proxy, but reporting via Mixer is also feasible; you can configure it and perform additional processing there. For details, see Istio 08: Is the embedding point of the call chain really "zero modification"?

However, the trace scheme Envoy currently supports is relatively simple: its sampling options cannot cover all of Jaeger's sampling policies, and there is no support for different sampling strategies per business, so Istio's configuration is likewise global.

The relevant source code in ~/goDev/Applications/src/istio.io/istio/pilot/pkg/networking/core/v1alpha3/listener.go is as follows:

	if env.Mesh.EnableTracing {
		tc := model.GetTraceConfig()
		connectionManager.Tracing = &http_conn.HttpConnectionManager_Tracing{
			OperationName: httpOpts.direction,
			ClientSampling: &envoy_type.Percent{
				Value: tc.ClientSampling,
			},
			RandomSampling: &envoy_type.Percent{
				Value: tc.RandomSampling,
			},
			OverallSampling: &envoy_type.Percent{
				Value: tc.OverallSampling,
			},
		}
		connectionManager.GenerateRequestId = &google_protobuf.BoolValue{Value: true}
	}

Persistent storage of trace data (Jaeger)

Jaeger’s status in Istio

For more information about Jaeger tracing, see the architecture introduction on the official website.

Simple deployment of Jaeger

The jaegertracing/all-in-one image includes three components: jaeger-agent, jaeger-collector, and jaeger-query. The jaeger-collector stores the trace data, but the all-in-one image currently stores it only in memory, i.e. temporary storage: if the pod is deleted and restarted, the Jaeger data is lost (see the all-in-one image's memory-storage notes).

For this reason, we need to consider how to point the jaeger-collector data store at our own storage service, such as an ES cluster. We can either deploy a full set of Jaeger services ourselves, or point the Jaeger collection address at our own service.
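As a sketch of what this looks like, the fragment below points a jaeger-collector container at an external ES cluster. SPAN_STORAGE_TYPE and ES_SERVER_URLS are documented jaeger-collector settings; the Elasticsearch URL is a placeholder for your own environment, and the Zipkin-compatible port must be enabled via the collector's own flags:

```yaml
containers:
- name: jaeger-collector
  image: jaegertracing/jaeger-collector   # pin a concrete version tag
  env:
  - name: SPAN_STORAGE_TYPE               # use ES instead of in-memory storage
    value: elasticsearch
  - name: ES_SERVER_URLS                  # address of your own ES cluster
    value: http://elasticsearch:9200
  ports:
  - containerPort: 9411                   # zipkin-compatible collector port
```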

References: doc.istio.cn/en/help/faq… and istio.io/docs/refere…

In the YAML configuration, only trace_zipkin_url is set:

      containers:
      - name: mixer
        image: "docker.io/istio/mixer:1.0.0"
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9093
        - containerPort: 42422
        args:
          - --address
          - unix:///sock/mixer.socket
          - --configStoreURL=k8s://
          - --configDefaultNamespace=istio-system
          - --trace_zipkin_url=http://zipkin:9411/api/v1/spans

This configures tracing for the Mixer component only; for Envoy's own trace, modify the parameters of proxyv2.

Jaeger persistent storage scheme

In the open-source version of Istio, the call chain is reported to Jaeger directly from Envoy, not through Mixer.

Because the call-chain data volume will be large, reliability and scale must be considered, so we need to use our own service. To configure our own Jaeger service, we use kubectl to modify zipkinAddress in the Istio configmap to point to our own service address. Note also that Jaeger's data format is Zipkin-compatible.

kubectl get configmap istio -n istio-system -o yaml | grep zipkinAddress returns two zipkinAddress entries, both as follows: zipkin.istio-system:9411

For Huawei Cloud, traces are reported to Huawei Cloud's APM service, which guarantees the reliability and performance of what it receives.

Connect to external trace system [Jaeger]

K8S independently deploys Jaeger components

Deploy the Jaeger containers on K8S according to the official Jaeger Kubernetes deployment documentation, taking care to use the production deployment mode described in the document.

Elasticsearch must first be a Running pod; the rest of the Jaeger components depend on Elasticsearch.

To access the Jaeger query UI, look up the jaeger-query Service with kubectl get service jaeger-query. See, for example, 2.2 Jaeger Query UI for Cluster Independent Deployment.

Deploying Jaeger this way is relatively troublesome: some parameters need to be set, which requires some learning and an understanding of the ports and protocols Jaeger exposes, plus some knowledge of storage engines such as ES.

In addition, deploying via a binary installation is relatively easy; you only need to pay attention to the startup parameters.

Modify Istio to an existing Jaeger service

First find the zipkinAddress entries via kubectl get configmap istio -n istio-system -o yaml | grep zipkinAddress, then change the address to the jaeger-collector Service address with port 9411, which is the Zipkin-compatible port. The Envoy proxy reads the zipkinAddress from an environment variable, which defaults to zipkin.istio-system:9411. Note that if you only edit the configmap, a later helm upgrade will overwrite the change, so the modification must also be made at the source: modify install/kubernetes/helm/istio/templates/configmap.yaml, and also modify the zipkin address under install/kubernetes/helm/istio/charts/mixer/templates.

If you deploy from the istio.yaml template file, you need to:

  • Modify zipkinAddress: zipkin.istio-system:9411 in istio.yaml

    • Change it to: zipkinAddress: 10.233.61.200:9411
  • Modify the Mixer-related trace_zipkin_url address in istio.yaml

    • Change it to: --trace_zipkin_url=http://10.233.61.200:9411/api/v1/spans [the ClusterIP is only reachable within the K8S cluster]
  • Modify the zipkinAddress args for proxyv2 in istio.yaml

    • Change it to: 10.233.61.200:9411 [the ClusterIP is only reachable within the K8S cluster]

Other notes:

  • Change all zipkinAddress-related addresses to the address of the jaeger-collector Service. If the Service is within the K8S cluster, you can configure the ClusterIP address directly; otherwise you need to configure a globally resolvable domain name

    • Istio specifies the zipkinAddress in the configmap, and the proxy consumes it through an environment variable; but already-deployed pods do not pick up the change when injection is manual, because the proxy's environment variables are set only at istioctl kube-inject time. Therefore you need to delete the Deployment and redeploy for the change to take effect. With auto-injection, restarting the pod should be enough.
  • If Istio is configured with an external Jaeger address, reporting does not pass through the jaeger-agent service, which can therefore be disabled

  • Since the data is stored in ES, if the Jaeger-related pods are now killed, the data will still exist after the restart

  • When using an external Jaeger service, Istio's own Jaeger service can be shut down

Modify and configure the sampling policy

Jaeger itself supports adjusting sampling policies on the client side and via the collector, but Istio has no Jaeger client, only the trace support in Envoy, and directly modifying Envoy's trace source code is unfriendly. However, Istio provides a global setting: the sampling policy can be controlled via a pilot parameter.

Istio's sampling flow is roughly as follows: in pilot's v1alpha3 traffic-management code, when building HTTP filters for listeners, it checks whether tracing is enabled for the mesh; if so, it reads the sampling configuration and creates the trace settings. Sampling is configured through the environment variable PILOT_TRACE_SAMPLING, with a range of 0.0-100.0.

There are two ways to modify it:

  • Set the Helm installation option pilot.traceSampling

  • Modify the PILOT_TRACE_SAMPLING variable via kubectl -n istio-system edit deploy istio-pilot
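The second option can also be done non-interactively. The sketch below assumes the pilot Deployment is named istio-pilot in the istio-system namespace; adjust for your install:

```shell
# Set the global sampling percentage to 50 on the pilot deployment; pilot
# restarts and pushes the new sampling config to the proxies.
kubectl -n istio-system set env deploy/istio-pilot PILOT_TRACE_SAMPLING=50

# Verify the value took effect.
kubectl -n istio-system get deploy istio-pilot -o yaml | grep -A1 PILOT_TRACE_SAMPLING
```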

The specific details

When PILOT_TRACE_SAMPLING is 100, it means full sampling: every request is sampled. This can be verified: visit the test page, record the number of requests made, then check the Jaeger trace UI. Looking at the traces where Service is productpage, the number of requests matches the number of traces.

When PILOT_TRACE_SAMPLING is 50, it means 1/2 sampling: on average, one of every two requests is sampled. After modifying, wait a short time and then verify. The observed result is not a strict 1:2 ratio; it behaves as a probability of 1/2 rather than an exact ratio. Understanding the precise behavior requires analyzing Jaeger's internals, but it at least proves that the policy change takes effect.

Connect to the Mtrace system

The existing Mtrace system is based on the original Jaeger with some adjustments: it uses the Protobuf protocol and adds Kafka. So, to connect to the Mtrace system, some proxy changes are needed, not just configuring parameters and addresses.

Precautions for service access Trace links

A. Services process HTTP headers

Istio can intercept traffic and automatically send span reports, but to link the entire process into a single trace, the business must handle the trace-related HTTP headers in its code so that the proxy can correctly attribute the spans it sends to the same trace. The specific details:

In systems such as Mtrace or native Jaeger, a client role creates and initializes a trace and handles the trace ID. With Istio, however, there is no client-side Mtrace SDK, so the HTTP headers must be handled in the business code. Non-HTTP protocols such as raw TCP would need extended support and are not considered for now; gRPC, being HTTP-based, can be supported in several ways.

Therefore, the trace part is not completely non-intrusive: a small change in the business code is required, to read the specified HTTP headers and pass them along.

B. Istio sets the sampling percentage

The default percentage is 100, i.e. full sampling; it can be changed to any value from 0 to 100. There are two ways to modify it:

  • Set the Helm installation option pilot.traceSampling

  • Modify the PILOT_TRACE_SAMPLING variable via kubectl -n istio-system edit deploy istio-pilot

The specific details

This setting is global; there is no way to apply a specific sampling strategy to a particular business. If you need per-service sampling, you must configure a static JSON strategies file for the collector, which also requires client support. For details, refer to the Jaeger official documentation.
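For reference, Jaeger's collector accepts such a static strategies file via its --sampling.strategies-file flag. A hypothetical example, with placeholder service names and rates:

```json
{
  "service_strategies": [
    { "service": "productpage", "type": "probabilistic", "param": 0.8 },
    { "service": "reviews", "type": "ratelimiting", "param": 5 }
  ],
  "default_strategy": { "type": "probabilistic", "param": 0.5 }
}
```

Note that these strategies only apply when clients poll the collector for sampling decisions, which Envoy's Zipkin-style reporting does not do; that is exactly the limitation described above.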

Question & TODO

  1. HTTP headers need to be handled in business code, but what about non-HTTP protocols?

    • Both HTTP and gRPC require the business to forward the headers itself
    • TCP requires custom extension fields
  2. Report via Mixer instead of directly from Envoy

  3. Support different sampling policies for different services

References

Istio distributed tracing documentation

Jaeger-kubernetes production deployment

Istio 08: Is the call chain burying point really “zero modification”?

Welcome to follow my WeChat official account, "Linux server system development"; quality articles will continue to be published there.