Zhao Huabing, senior engineer at Tencent Cloud, Istio member, member of the ServiceMesher management committee, and Istio project contributor, is keen on open source, networking, and cloud computing. He currently focuses on open source and R&D work for service mesh.

Introduction

TCM (Tencent Cloud Mesh) is a Service Mesh hosting service that is enhanced on top of Istio and fully compatible with the Istio API. With TCM, users can quickly adopt the traffic management and service governance capabilities of a Service Mesh at low migration and maintenance cost. This series of articles shares best practices on TCM; this installment describes how to use Spring and OpenTracing to simplify passing the tracing context in applications, and how to implement method-level fine-grained call tracing on top of the inter-process call tracing provided by Istio.

Distributed call tracing and the OpenTracing specification

What is distributed call tracing?

One of the major changes from a traditional monolithic application is that the different modules of the application are split into separate processes. In a microservice architecture, what was once an in-process method call becomes a cross-process RPC call. Compared with method calls within a single process, debugging and fault analysis of cross-process calls is very difficult, and traditional debuggers or log printing are of little use for viewing and analyzing a distributed call.

As shown in the figure above, a request from a client goes through multiple microservice processes. To analyze such a request, information about all the services the request passes through must be collected and correlated; this is known as "distributed call tracing".

What is OpenTracing?

CNCF OpenTracing project

OpenTracing is a project under the CNCF (Cloud Native Computing Foundation) that includes a standard specification for distributed call tracing, as well as APIs, programming frameworks, and libraries for various languages. The purpose of OpenTracing is to define a standard for distributed call tracing and unify its various implementations. There are many Tracer implementations that support the OpenTracing specification, including Jaeger, SkyWalking, and LightStep. By adopting the OpenTracing API to implement distributed call tracing in microservice applications, we can avoid vendor lock-in and interoperate with any OpenTracing-compatible infrastructure at minimal cost.

OpenTracing conceptual model

The conceptual model of OpenTracing is shown in the following figure:

(Figure from opentracing.io)

As shown in the figure, OpenTracing mainly includes the following concepts:

  • Trace: Describes an end-to-end transaction in a distributed system, such as a request from a client.
  • Span: An operation with a name and a duration, such as a REST call or a database operation. A Span is the smallest unit of a distributed call trace; a Trace consists of multiple Spans (see the sketch after this list).
  • Span context: The context information of a distributed call trace, including the Trace ID, the Span ID, and any other content that needs to be passed to downstream services. An OpenTracing implementation must pass the Span context across process boundaries via some serialization mechanism (a Wire Protocol), in order to associate the Spans in different processes with the same Trace. These Wire Protocols can be text-based, such as HTTP headers, or binary.
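To make these concepts concrete, here is a minimal sketch using the opentracing-java API. The operation name, tags, and the use of GlobalTracer are illustrative assumptions, not code from this article's sample program:

import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.util.GlobalTracer;

public class SpanDemo {
    public static void main(String[] args) {
        // GlobalTracer.get() returns a no-op tracer unless a concrete
        // Tracer implementation (e.g. Jaeger or Brave) has been registered.
        Tracer tracer = GlobalTracer.get();

        // A Span: a named operation with a start time, a finish time, and tags
        Span span = tracer.buildSpan("getOrder")
                .withTag("http.method", "GET")
                .withTag("http.url", "/order/123")
                .start();
        try {
            // ... the work being traced ...
        } finally {
            span.finish();
        }
    }
}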

OpenTracing data model

A Trace can be thought of as a directed acyclic graph (DAG) consisting of multiple related Spans. Here is a Trace made up of 8 Spans:

        [Span A]  ←←←(the root Span)
             |
      +------+------+
      |             |
  [Span B]      [Span C] ←←←(Span C is a `ChildOf` Span A)
      |             |
  [Span D]      +---+-------+
                |           |
            [Span E]    [Span F] >>> [Span G] >>> [Span H]
                                        ↑
                                        ↑
                                        ↑
                          (Span G `FollowsFrom` Span F)

The trace above can also be shown in chronological order as follows:

––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–> time

 [Span A···················································]
   [Span B··············································]
      [Span D··········································]
    [Span C········································]
         [Span E·······]        [Span F··] [Span G··] [Span H··]

The data structure for Span contains the following:

  • Name: the operation name of the Span, for example the resource name of a REST interface.
  • Start timestamp: the start time of the operation represented by the Span.
  • Finish timestamp: the end time of the operation represented by the Span.
  • Tags: a set of tags, each consisting of a key-value pair. Tags can carry any information useful for call analysis, such as a method name or a URL.
  • SpanContext: used to pass Span-related information across process boundaries, in combination with a Wire Protocol.
  • References: references to other Spans; two types are currently defined, ChildOf and FollowsFrom:
    • ChildOf: the most commonly used reference type, indicating a direct dependency between a Parent Span and a Child Span, for example the relationship between an RPC server Span and an RPC client Span, or between a database SQL insert Span and an ORM save-action Span.
    • FollowsFrom: used when the Parent Span does not depend on the execution result of the Child Span. For example, an online store sends an email notification to the user after payment is made, but whether or not the email is sent successfully does not affect the payment status; this case fits FollowsFrom. (See the sketch after this list.)
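A minimal sketch of how the two reference types look in the opentracing-java API. The span names and the use of GlobalTracer are illustrative assumptions; the email scenario mirrors the example above:

import io.opentracing.References;
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.util.GlobalTracer;

public class ReferenceDemo {
    public static void main(String[] args) {
        Tracer tracer = GlobalTracer.get();
        Span payment = tracer.buildSpan("payment").start();

        // ChildOf: payment waits for the result of the charge operation
        Span charge = tracer.buildSpan("chargeCreditCard")
                .asChildOf(payment)
                .start();
        charge.finish();

        // FollowsFrom: payment does not depend on the email being sent
        Span email = tracer.buildSpan("sendEmailNotification")
                .addReference(References.FOLLOWS_FROM, charge.context())
                .start();
        email.finish();

        payment.finish();
    }
}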

Call information propagation across processes

SpanContext is one of the more confusing concepts in OpenTracing. In the OpenTracing conceptual model, SpanContext is described as the context for passing a distributed call across process boundaries. However, OpenTracing only defines an abstract SpanContext interface, which encapsulates the context of a Span in a distributed call, including the Trace ID of the Trace the Span belongs to, the Span ID, and any other information that needs to be delivered to downstream services. SpanContext itself does not implement cross-process context passing; the Tracer (an implementation of the OpenTracing specification, such as the Tracer of Jaeger or SkyWalking) is responsible for serializing the SpanContext and passing it via a Wire Protocol to the next process, where the Tracer deserializes it to obtain the context information and uses that context to generate Child Spans.

To give concrete implementations maximum flexibility, OpenTracing only requires that the SpanContext be passed across processes; it does not specify how a particular SpanContext implementation should be serialized and passed over the network. Different Tracers can choose different Wire Protocols to pass the SpanContext according to their own situation.
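As a sketch of what this looks like at the API level (assuming opentracing-java 0.32+, where TextMapAdapter is available; the method and span names are illustrative):

import io.opentracing.Span;
import io.opentracing.SpanContext;
import io.opentracing.Tracer;
import io.opentracing.propagation.Format;
import io.opentracing.propagation.TextMapAdapter;

import java.util.HashMap;
import java.util.Map;

public class ContextPropagationDemo {

    // Client side: the Tracer serializes the SpanContext into a header map,
    // using whatever Wire Protocol the Tracer implements (e.g. B3 headers)
    static Map<String, String> injectContext(Tracer tracer, Span clientSpan) {
        Map<String, String> headers = new HashMap<>();
        tracer.inject(clientSpan.context(), Format.Builtin.HTTP_HEADERS,
                new TextMapAdapter(headers));
        return headers;
    }

    // Server side: the Tracer deserializes the incoming headers back into a
    // SpanContext, which is then used as the parent of a new Span
    static Span extractContext(Tracer tracer, Map<String, String> incomingHeaders) {
        SpanContext parent = tracer.extract(Format.Builtin.HTTP_HEADERS,
                new TextMapAdapter(incomingHeaders));
        return tracer.buildSpan("handleRequest").asChildOf(parent).start();
    }
}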

In HTTP-based distributed calls, HTTP headers are typically used to pass the contents of the SpanContext. Common Wire Protocols include the B3 HTTP headers used by Zipkin, the uber-trace-id HTTP header used by Jaeger, the x-ot-span-context HTTP header used by LightStep, and so on. Istio/Envoy supports the B3 headers and the x-ot-span-context header, and can connect to Zipkin, Jaeger, and LightStep. Below is an example of the B3 HTTP headers:

X-B3-TraceId: 80f198ee56343ba864fe8b2a57d3eff7
X-B3-ParentSpanId: 05e3ac9a4f6e3b90
X-B3-SpanId: e457b5a2e4d86bd1
X-B3-Sampled: 1

Istio support for distributed call tracing

Istio/Envoy provides distributed call tracing for microservices out of the box. In a microservice system with Istio and Envoy installed, the Envoy sidecar intercepts the inbound and outbound requests of each service and automatically generates call trace data for every call to a microservice. By plugging a distributed tracing backend, such as Zipkin or Jaeger, into the service mesh, you can see the details of a distributed request, such as which services it passed through, which REST interfaces were invoked, and how much time was spent on each REST interface.

Note that while Istio/Envoy does most of the work, it still requires a small change to the application code: the application must copy the B3 headers from the upstream HTTP request it receives into the headers of the HTTP requests it sends downstream, so that the call tracing context is propagated to the downstream services. Envoy cannot do this for the application, because Envoy is not aware of the business logic of the service it proxies and cannot correlate incoming and outgoing requests according to that logic. Although the amount of code involved is small, it must be added everywhere an HTTP request is made, which is tedious and easy to miss. Of course, you can make this easier by encapsulating the header-copying code into a library for business modules to use, as sketched below.
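For example, one possible way to package the header copying is a Spring RestTemplate interceptor that reads the inbound request via RequestContextHolder. This is a sketch under the assumption of a servlet-based Spring application; the class and its name are not part of the sample repository:

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import javax.servlet.http.HttpServletRequest;

import org.springframework.http.HttpRequest;
import org.springframework.http.client.ClientHttpRequestExecution;
import org.springframework.http.client.ClientHttpRequestInterceptor;
import org.springframework.http.client.ClientHttpResponse;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;

public class TracingHeaderInterceptor implements ClientHttpRequestInterceptor {

    private static final List<String> TRACING_HEADERS = Arrays.asList(
            "x-request-id", "x-b3-traceid", "x-b3-spanid", "x-b3-parentspanid",
            "x-b3-sampled", "x-b3-flags", "x-ot-span-context");

    @Override
    public ClientHttpResponse intercept(HttpRequest request, byte[] body,
            ClientHttpRequestExecution execution) throws IOException {
        // Look up the inbound servlet request of the current thread, if any
        ServletRequestAttributes attrs =
                (ServletRequestAttributes) RequestContextHolder.getRequestAttributes();
        if (attrs != null) {
            HttpServletRequest inbound = attrs.getRequest();
            // Copy each tracing header from the inbound request to the outbound one
            for (String name : TRACING_HEADERS) {
                String value = inbound.getHeader(name);
                if (value != null) {
                    request.getHeaders().set(name, value);
                }
            }
        }
        return execution.execute(request, body);
    }
}

Registering the interceptor once, with restTemplate.getInterceptors().add(new TracingHeaderInterceptor()), would spare every call site from copying the headers by hand.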

The following is a simple example application for an online store that shows how Istio provides distributed call tracing. The sample program consists of four microservices: eshop, inventory, billing, and delivery; the structure is shown in the figure below.

The eshop microservice receives the request from the client and then calls the REST interfaces of the inventory, billing, and delivery microservices to implement the checkout business logic for a user purchasing goods. The code for this example can be downloaded from GitHub: github.com/aeraki-framework/method-level-tracing-with-istio

As shown in the code below, we need to pass the B3 HTTP headers along in the application code of the eshop microservice.

@RequestMapping(value = "/checkout")
public String checkout(@RequestHeader HttpHeaders headers) {
    String result = "";
    // Use HTTP GET in this demo. In a real-world use case, we should use
    // HTTP POST instead.
    // The three services are bundled in one jar for simplicity. To make it work,
    // define three services in Kubernetes.
    result += restTemplate.exchange("http://inventory:8080/createOrder", HttpMethod.GET,
            new HttpEntity<>(passTracingHeader(headers)), String.class).getBody();
    result += "<BR>";
    result += restTemplate.exchange("http://billing:8080/payment", HttpMethod.GET,
            new HttpEntity<>(passTracingHeader(headers)), String.class).getBody();
    result += "<BR>";
    result += restTemplate.exchange("http://delivery:8080/arrangeDelivery", HttpMethod.GET,
            new HttpEntity<>(passTracingHeader(headers)), String.class).getBody();
    return result;
}
private HttpHeaders passTracingHeader(HttpHeaders headers) {
    HttpHeaders tracingHeaders = new HttpHeaders();
    extractHeader(headers, tracingHeaders, "x-request-id");
    extractHeader(headers, tracingHeaders, "x-b3-traceid");
    extractHeader(headers, tracingHeaders, "x-b3-spanid");
    extractHeader(headers, tracingHeaders, "x-b3-parentspanid");
    extractHeader(headers, tracingHeaders, "x-b3-sampled");
    extractHeader(headers, tracingHeaders, "x-b3-flags");
    extractHeader(headers, tracingHeaders, "x-ot-span-context");
    return tracingHeaders;
}
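The extractHeader helper is not shown above; a plausible implementation (a sketch, not necessarily the repository's actual code) simply copies a single header when it is present:

private void extractHeader(HttpHeaders headers, HttpHeaders tracingHeaders, String name) {
    String value = headers.getFirst(name);
    if (value != null) {
        tracingHeaders.set(name, value);
    }
}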

Let's test the eshop example. We could build our own Kubernetes cluster and install Istio for testing; for convenience, we directly use TCM, the fully hosted service mesh provided on Tencent Cloud, and add a TKE container service cluster to the created mesh for testing.

Deploy the program in a TKE cluster to see the effects of Istio distributed call tracing.

git clone [email protected]:aeraki-framework/method-level-tracing-with-istio.git
cd method-level-tracing-with-istio
git checkout without-opentracing
kubectl apply -f k8s/eshop.yaml
  • Open http://${INGRESS_EXTERNAL_IP}/checkout in your browser to trigger a call to the eshop sample program's REST interface.
  • Open the TCM in a browser and view the generated distributed call tracing information.

The TCM graphical interface intuitively shows the details of this call. You can see that the client request enters the system through the ingressgateway and then calls the checkout interface of the eshop microservice. The checkout span has three child spans, corresponding to the REST interfaces of the inventory, billing, and delivery microservices respectively.

Use OpenTracing to pass the distributed trace context

OpenTracing provides Spring-based instrumentation, so we can use the OpenTracing Spring framework to handle the passing of these HTTP headers and avoid hard-coding it ourselves. Using OpenTracing to pass the distributed tracing context in Spring takes just the following two steps:

  • Declare a dependency on the OpenTracing Spring Cloud Starter in the Maven POM file. In addition, since Istio uses Zipkin's reporting interface, we also need to introduce Zipkin's dependencies.
  • Declare a Tracer bean in the Spring application, as shown below. Note that we need to set Istio's Zipkin reporting address in the OkHttpSender.
@Bean
public io.opentracing.Tracer zipkinTracer() {
    // Report spans to the Zipkin endpoint provided by Istio
    String zipkinEndpoint = System.getenv("ZIPKIN_ENDPOINT");
    if (zipkinEndpoint == null || zipkinEndpoint.isEmpty()) {
        zipkinEndpoint = "http://zipkin.istio-system:9411/api/v2/spans";
    }
    OkHttpSender sender = OkHttpSender.create(zipkinEndpoint);
    Reporter spanReporter = AsyncReporter.create(sender);
    Tracing braveTracing = Tracing.newBuilder()
            .localServiceName("spring-boot")
            .spanReporter(spanReporter)
            .propagationFactory(B3Propagation.FACTORY)
            .traceId128Bit(true)
            .sampler(Sampler.ALWAYS_SAMPLE)
            .build();
    return BraveTracer.create(braveTracing);
}

Deploy the version of the program that uses OpenTracing to pass the HTTP headers. The call trace information it produces is shown below.

As you can see from the figure above, compared with passing the HTTP headers directly in the application code, the same call instrumented with OpenTracing has seven additional spans prefixed with spring-boot, which are generated by the OpenTracing tracer. Although we do not create these spans explicitly in the code, OpenTracing's instrumentation automatically generates a span for each REST request and associates the spans according to the calling relationships.

The spans generated by OpenTracing give us more detailed distributed call tracing information, from which we can analyze how much time each step of an HTTP call takes: from the application code on the client side, to the client-side Envoy, to the server-side Envoy, and finally to the server receiving the request. As the figure shows, Envoy's forwarding takes about 1 millisecond, which is very short compared with the processing time of the business code. For this application, Envoy's processing and forwarding have little impact on the efficiency of handling business requests.

Add method level call trace information to the Istio call trace chain

Istio/Envoy provides call chain information across service boundaries. In most cases, call chain information at service granularity is sufficient for analyzing system performance and faults. But for some services, finer-grained call information is needed, for example how much time the business logic and the database access inside a single REST request each take. In this case, we need to instrument the service code itself and associate the call trace data reported from the service code with the call trace data generated by Envoy, so that the two can be presented together.

The code for adding call tracing to methods is largely identical, so we use AOP plus an annotation to keep it simple. Start by defining a Traced annotation and the corresponding AOP logic:

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@Documented
public @interface Traced {
}
@Aspect
@Component
public class TracingAspect {
    @Autowired
    Tracer tracer;

    @Around("@annotation(com.zhaohuabing.demo.instrument.Traced)")
    public Object aroundAdvice(ProceedingJoinPoint jp) throws Throwable {
        String className = jp.getTarget().getClass().getName();
        String methodName = jp.getSignature().getName();
        Span span = tracer.buildSpan(className + "." + methodName)
                .withTag("class", className)
                .withTag("method", methodName)
                .start();
        try {
            return jp.proceed();
        } finally {
            // Finish the span even if the annotated method throws
            span.finish();
        }
    }
}

Then annotate the methods that need call tracing:

@Component
public class DBAccess {

    @Traced
    public void save2db() {
        try {
            // Simulate a database operation that takes up to 100 ms
            Thread.sleep((long) (Math.random() * 100));
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
@Component
public class BankTransaction {

    @Traced
    public void transfer() {
        try {
            // Simulate a bank transaction that takes up to 100 ms
            Thread.sleep((long) (Math.random() * 100));
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

The master branch of the demo program already contains the method-level tracing code and can be deployed directly.

git checkout master
kubectl apply -f k8s/eshop.yaml


The effect is shown in the figure below: the spans of the transfer and save2db methods have been added to the trace.

You can open a method's span to see its details, such as the name of the Java class and the method called. If needed, you can also attach other information in the AOP code, such as the exception stack when an exception occurs.

Conclusion

Istio/Envoy provides distributed call tracing for microservice applications, improving the visibility of service calls. Instead of hard-coding the header passing in the application, we can use OpenTracing to pass the HTTP headers related to distributed tracing; we can also use OpenTracing to add method-level call information to the call chain trace that Istio/Envoy provides by default, for finer-grained call tracing.

Next steps

In addition to synchronous calls, asynchronous messaging is also a common form of communication in microservice architectures. In the next article, I will continue to use the eshop demo program to explore how to incorporate Kafka asynchronous messages into Istio's distributed call tracing through OpenTracing.

References
  1. Source code of the eshop sample program in this article
  2. OpenTracing docs
  3. OpenTracing specification
  4. OpenTracing wire protocols
  5. Istio trace context propagation
  6. Zipkin B3 propagation
  7. OpenTracing Project Deep Dive