In software engineering, Tracing is closely related to, but distinct from, Logging and Metrics. The three concepts are:

  • Logging: records discrete events, such as the debugging or error information an application emits when execution reaches a certain point or stage. Logs are what we use to diagnose problems.
  • Metrics: records aggregable data. Each metric is a logical unit of measurement (a gauge, a counter, or a histogram) observed over time. For example, the current depth of a queue can be defined as a gauge that is updated whenever an item is written or read; the number of incoming HTTP requests can be defined as a counter that simply accumulates; the execution time of requests can be defined as a histogram that is updated and aggregated over a specified time window.
  • Tracing: records processing information within the scope of a single request, including the services invoked and the time spent in each. Examples include the execution of an RPC call to a remote service, an actual SQL query, or the service ID of an HTTP request. Traces are our tool for troubleshooting system performance problems.
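The three metric shapes mentioned above (gauge, counter, histogram) can be sketched in plain Java. This is a minimal illustration using only the JDK; the class names `MetricsSketch`, `Counter`, `Gauge`, and `Histogram` are hypothetical and do not come from any particular metrics library.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical, minimal metric types; real metrics libraries offer far more.
final class MetricsSketch {

    /** Counter: only ever increases, e.g. the number of incoming HTTP requests. */
    static final class Counter {
        private final LongAdder total = new LongAdder();
        void increment() { total.increment(); }
        long value() { return total.sum(); }
    }

    /** Gauge: a value that moves up and down, e.g. the current depth of a queue. */
    static final class Gauge {
        private final AtomicLong current = new AtomicLong();
        void add(long delta) { current.addAndGet(delta); }
        long value() { return current.get(); }
    }

    /** Histogram: counts observations per bucket, e.g. request durations. */
    static final class Histogram {
        private final long[] upperBoundsMs; // inclusive upper bound of each bucket
        private final long[] counts;        // one extra slot for values above all bounds

        Histogram(long... upperBoundsMs) {
            this.upperBoundsMs = upperBoundsMs.clone();
            this.counts = new long[upperBoundsMs.length + 1];
        }

        synchronized void observe(long durationMs) {
            int i = 0;
            while (i < upperBoundsMs.length && durationMs > upperBoundsMs[i]) {
                i++;
            }
            counts[i]++;
        }

        synchronized long[] snapshot() { return counts.clone(); }
    }
}
```

A gauge would be adjusted with `add(1)` on enqueue and `add(-1)` on dequeue, while a histogram created with `new Histogram(10, 100)` sorts durations into "up to 10 ms", "up to 100 ms", and "above 100 ms".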

As architectures shift from monoliths to microservices, a single request often involves calls between multiple services. As the number of services grows and the internal call chain becomes more complex, it becomes difficult to see the whole picture with logs and performance monitoring alone, whether for troubleshooting or for performance analysis.

A distributed tracing system (Tracing) aims to show which services lie behind a request, the order in which they are called, how long each call takes, and where and why errors occur. This lets developers analyze the request path intuitively, locate performance bottlenecks quickly, and gradually optimize service dependencies. It also helps developers understand the whole distributed system from a more macro perspective.

As early as 2005, Google deployed a distributed tracing system called Dapper internally, and it later published the paper "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure" (2010), which describes the design and implementation of the system and can be regarded as the origin of the distributed tracing field. Since then, various vendors and communities have released excellent distributed tracing systems, such as Jaeger (Uber), Zipkin (Twitter), X-Ray (AWS), and SkyWalking. However, the APIs of these systems are not compatible with one another, which is why OpenTracing was created.

1. OpenTracing

OpenTracing is a lightweight standardization layer that sits between applications or libraries and tracing or log-analysis systems. By providing a platform-independent, vendor-neutral API, it lets developers add (or replace) a tracing system easily, solves the problem of incompatible APIs across distributed tracing systems, and makes it possible to add distributed tracing support to common libraries. Specifically, it provides:

  • A backend-independent set of interfaces. A traced service only needs to call these interfaces and can then be supported by any tracing backend that implements them (such as Zipkin or Jaeger). Conversely, a tracing backend that implements these interfaces can trace any service that calls them.
  • Standardized management of the Span, the smallest unit of tracing: APIs are defined for starting a Span, finishing a Span, and recording the Span's timing.
  • A standardized way of passing trace data between processes: an API is defined to carry trace data across process boundaries.
  • Standardized management of the current Span within a process: APIs are defined for storing and retrieving the current Span.
  • No encoding standard: OpenTracing does not standardize the encoding of trace data passed between processes or of trace data sent to the backend. Each tracing backend decides which encoding suits it best.
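The idea of a backend-independent interface can be sketched as follows. This is not the real OpenTracing API: `SimpleTracer`, `SimpleSpan`, `NoopTracer`, and `InMemoryTracer` are hypothetical names chosen for illustration. It only shows that application code written against the interface does not change when the backend is swapped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this mirrors the idea, not the real OpenTracing API.
final class NeutralApiSketch {

    /** The only types that application code is allowed to see. */
    interface SimpleSpan { void finish(); }
    interface SimpleTracer { SimpleSpan startSpan(String operationName); }

    /** Backend A: accepts every call and does nothing. */
    static final class NoopTracer implements SimpleTracer {
        @Override
        public SimpleSpan startSpan(String operationName) {
            return () -> { };
        }
    }

    /** Backend B: records the names of finished operations in memory. */
    static final class InMemoryTracer implements SimpleTracer {
        final List<String> finished = new ArrayList<>();

        @Override
        public SimpleSpan startSpan(String operationName) {
            return () -> finished.add(operationName);
        }
    }

    /** Application code depends on the interface only, never on a backend. */
    static String handleRequest(SimpleTracer tracer) {
        SimpleSpan span = tracer.startSpan("handleRequest");
        try {
            return "ok";
        } finally {
            span.finish();
        }
    }
}
```

Passing a `NoopTracer` or an `InMemoryTracer` to `handleRequest` yields the same application behavior; only what happens to the trace data differs.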

OpenTracing has joined the CNCF and provides unified concepts and data standards for distributed tracing across multiple languages: github.com/opentracing… The API for Java (github.com/opentracing…) consists of the following modules:

  • opentracing-api: the pure API, with no dependencies.
  • opentracing-noop: implements the API with empty operations that do nothing; depends on opentracing-api.
  • opentracing-util: contains GlobalTracer and a simple ThreadLocal-based ScopeManager implementation; depends on opentracing-api and opentracing-noop.
  • opentracing-mock: test support, containing a simple MockTracer that stores data in memory; depends on opentracing-api, opentracing-noop, and opentracing-util.
  • opentracing-testbed: used for testing and trying out new features.
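To illustrate what a ThreadLocal-based scope manager does, here is a minimal sketch using only the JDK. It is not the opentracing-util implementation; `ThreadLocalScopeSketch`, `activate`, and `activeSpan` are hypothetical names, and a span is represented by a plain string to keep the example short.

```java
// Hypothetical sketch of a thread-local "current span" holder; names are illustrative.
final class ThreadLocalScopeSketch {

    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    /** Restores the previously active span when closed. */
    static final class Scope implements AutoCloseable {
        private final String previous;

        private Scope(String previous) {
            this.previous = previous;
        }

        @Override
        public void close() {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    /** Makes spanName the active span for the calling thread. */
    static Scope activate(String spanName) {
        Scope scope = new Scope(CURRENT.get());
        CURRENT.set(spanName);
        return scope;
    }

    /** Returns the active span of the calling thread, or null if there is none. */
    static String activeSpan() {
        return CURRENT.get();
    }
}
```

Used with try-with-resources, nested activations unwind in order: closing the inner scope makes the outer span active again, so code deep in the call stack can find its parent span without it being passed as a parameter.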

There are three important, interrelated concepts in the OpenTracing data model: Trace, Span, and SpanContext. Traces are created and propagated through the Tracer interface.

  • Trace: a complete request path (call chain).
  • Span: a single call within the trace; it must have a start time and an end time.
  • SpanContext: the global context information of the Trace, such as the traceId.

Trace

A Trace represents the execution of a transaction or process through a (distributed) system. A Trace (call chain) can be thought of as a directed acyclic graph (DAG) of Spans and is implicitly defined by the Spans that belong to it.

The Tracer interface creates Spans and handles Inject (serialization) and Extract (deserialization) so that trace data can be passed across process boundaries.
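Inject and Extract can be pictured as writing a context into a carrier (for example, HTTP headers) on the calling side and reading it back on the receiving side. The sketch below uses only the JDK; the class names and the header keys `x-sketch-trace-id` and `x-sketch-span-id` are made up for illustration, since each real tracer defines its own wire format.

```java
import java.util.Map;

// Hypothetical carrier keys and class names; real tracers define their own wire formats.
final class PropagationSketch {

    static final String TRACE_ID_KEY = "x-sketch-trace-id";
    static final String SPAN_ID_KEY = "x-sketch-span-id";

    /** The part of a span that must survive a process boundary. */
    static final class Context {
        final String traceId;
        final String spanId;

        Context(String traceId, String spanId) {
            this.traceId = traceId;
            this.spanId = spanId;
        }
    }

    /** Inject: serialize the context into a carrier such as HTTP headers. */
    static void inject(Context ctx, Map<String, String> carrier) {
        carrier.put(TRACE_ID_KEY, ctx.traceId);
        carrier.put(SPAN_ID_KEY, ctx.spanId);
    }

    /** Extract: rebuild the context on the receiving side, or return null if absent. */
    static Context extract(Map<String, String> carrier) {
        String traceId = carrier.get(TRACE_ID_KEY);
        String spanId = carrier.get(SPAN_ID_KEY);
        if (traceId == null || spanId == null) {
            return null; // the caller was not traced
        }
        return new Context(traceId, spanId);
    }
}
```

The receiving service would use the extracted context as the parent of the Span it creates, which is what links Spans from different processes into one Trace.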

Span

A Span represents a logical unit of work in the system that has a start time and a duration. Logical causal relationships between Spans are established by nesting or ordering them. A Span can represent a method call, the execution of a code block, an RPC, or a database access: anything with a complete time span. Each Span contains the following state:

  • An operation name.
  • A start timestamp.
  • A finish timestamp.
  • Span Tags: a set of key-value pairs. The key must be a string, and the value can be a string, a Boolean, or a number.
  • Span Logs: a collection of log entries. Each entry contains a key-value pair and a timestamp. The key must be a string, and the value can be of any type.
  • A SpanContext, the Span's context object.
  • References: relationships to other Spans, of two kinds: ChildOf (the parent Span depends to some extent on the child Span) and FollowsFrom (the parent Span does not depend in any way on the result of the child Span).
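The state listed above maps naturally onto a small data class. The following sketch is an assumption-laden illustration using only the JDK, not the OpenTracing `Span` interface: `SpanModelSketch` and all of its members are hypothetical names, timestamps are plain microsecond values supplied by the caller, and the SpanContext is reduced to a span ID.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical data model mirroring the listed Span state; not the OpenTracing interfaces.
final class SpanModelSketch {

    enum ReferenceType { CHILD_OF, FOLLOWS_FROM }

    static final class Reference {
        final ReferenceType type;
        final String referencedSpanId;

        Reference(ReferenceType type, String referencedSpanId) {
            this.type = type;
            this.referencedSpanId = referencedSpanId;
        }
    }

    static final class LogEntry {
        final long timestampMicros;
        final String key;
        final Object value; // log values may be of any type

        LogEntry(long timestampMicros, String key, Object value) {
            this.timestampMicros = timestampMicros;
            this.key = key;
            this.value = value;
        }
    }

    static final class Span {
        final String spanId;
        final String operationName;
        final long startMicros;
        long finishMicros = -1; // -1 until finish() is called
        final Map<String, Object> tags = new LinkedHashMap<>();
        final List<LogEntry> logs = new ArrayList<>();
        final List<Reference> references = new ArrayList<>();

        Span(String spanId, String operationName, long startMicros) {
            this.spanId = spanId;
            this.operationName = operationName;
            this.startMicros = startMicros;
        }

        /** Tag values are restricted to string, Boolean, or numeric types. */
        Span setTag(String key, Object value) {
            if (!(value instanceof String || value instanceof Boolean || value instanceof Number)) {
                throw new IllegalArgumentException("tag value must be a string, Boolean, or number");
            }
            tags.put(key, value);
            return this;
        }

        Span log(long timestampMicros, String key, Object value) {
            logs.add(new LogEntry(timestampMicros, key, value));
            return this;
        }

        Span addReference(ReferenceType type, String referencedSpanId) {
            references.add(new Reference(type, referencedSpanId));
            return this;
        }

        void finish(long finishMicros) { this.finishMicros = finishMicros; }

        long durationMicros() { return finishMicros - startMicros; }
    }
}
```

A database call made while serving an HTTP request would be modeled as a second `Span` that carries a `CHILD_OF` reference to the request's span ID.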

SpanContext

The SpanContext represents the state that is passed across process boundaries to child Spans. Each SpanContext contains the following state:

  • Implementation-dependent state that identifies a unique Span and that any OpenTracing implementation needs in order to carry the current call chain across process boundaries (for example, the trace ID and span ID).
  • Baggage Items: data that accompanies the Trace. They are a collection of key-value pairs that live in the Trace and must also be transported across process boundaries.
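The difference between the identifying state and Baggage Items can be shown with a small immutable context class. This is a sketch under assumptions, using only the JDK: `BaggageSketch`, `withBaggageItem`, and `childContext` are hypothetical names, and the sketch assumes that a child context keeps the trace ID and inherits all baggage from its parent.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical immutable context; assumes children inherit the parent's baggage.
final class BaggageSketch {

    static final class Context {
        final String traceId;
        final String spanId;
        private final Map<String, String> baggage;

        Context(String traceId, String spanId, Map<String, String> baggage) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.baggage = Collections.unmodifiableMap(new LinkedHashMap<>(baggage));
        }

        /** Returns a copy of this context with one more baggage item. */
        Context withBaggageItem(String key, String value) {
            Map<String, String> copy = new LinkedHashMap<>(baggage);
            copy.put(key, value);
            return new Context(traceId, spanId, copy);
        }

        /** A child keeps the trace ID and baggage but gets its own span ID. */
        Context childContext(String childSpanId) {
            return new Context(traceId, childSpanId, baggage);
        }

        String getBaggageItem(String key) {
            return baggage.get(key);
        }
    }
}
```

Because baggage travels with every downstream call, it is best kept small; each item adds to the size of every request in the trace.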

Users of OpenTracing only need to work with SpanContext and References when creating Spans and when injecting trace data into, or extracting it from, a transport protocol.

2. Jaeger

Jaeger is Uber's open-source distributed tracing system (github.com/jaegertraci…). It is compatible with the OpenTracing API (Java language support: github.com/jaegertraci…). Its main components are:

  • jaeger-client: the Jaeger client library, which implements the OpenTracing API and is available for mainstream programming languages. The client is integrated directly into the application and sends trace information to the jaeger-agent according to the configured sampling strategy. Adding these calls to the application is commonly referred to as instrumentation.
  • jaeger-agent: a network daemon that listens on UDP ports for trace information and sends the data to the jaeger-collector in batches. It is designed as an infrastructure component deployed on every host. The agent decouples the client from the collector, hiding the details of routing to and discovering collectors from the client.
  • jaeger-collector: receives the data sent by the jaeger-agent, processes it asynchronously, and stores it in the database. It is designed as a stateless component, so any number of jaeger-collector instances can run at the same time.
  • jaeger-query: receives query requests, retrieves trace information from the database, and displays it through the UI. The query service is stateless, so multiple instances can be started and deployed behind a load balancer such as Nginx.
  • jaeger-ingester: reads data from Kafka and writes it to Jaeger's storage backend, such as Cassandra or Elasticsearch.
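The agent's role of buffering spans and forwarding them in batches can be illustrated with a few lines of plain Java. This is not Jaeger code: `BatchingAgentSketch` is a hypothetical class, spans are represented by strings, and the `Consumer` stands in for the network call to a collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of batching between a client and a collector; not Jaeger code.
final class BatchingAgentSketch {

    private final int batchSize;
    private final Consumer<List<String>> collector; // stands in for the network call
    private final List<String> buffer = new ArrayList<>();

    BatchingAgentSketch(int batchSize, Consumer<List<String>> collector) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
        this.collector = collector;
    }

    /** Called by the client for every finished span. */
    synchronized void report(String span) {
        buffer.add(span);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered, e.g. on a timer or at shutdown. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        collector.accept(new ArrayList<>(buffer)); // hand over a copy, then reset
        buffer.clear();
    }
}
```

With a batch size of 2, five reported spans produce two full batches immediately and leave one span buffered until `flush()` is called. A production agent would also flush on a timer so that spans are not held back indefinitely under low traffic.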

A distributed tracing system is generally divided into three parts: data collection, data persistence, and data display.

  • Data collection: instrumenting the code, that is, marking the phases of a request that should be reported and recording the parent phase to which each recorded phase belongs.
  • Data persistence: storing the reported data. Jaeger, for example, supports multiple storage backends, such as Cassandra and Elasticsearch.
  • Data display: querying the phases associated with a request by TraceId from the front end and presenting them in the UI.
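The display step amounts to selecting the stored spans that share a TraceId and arranging them by their parent links. A minimal sketch using only the JDK follows; `TraceViewSketch` and `StoredSpan` are hypothetical names, the "storage" is an in-memory list, and the sketch assumes the parent links of a trace contain no cycles.

```java
import java.util.List;

// Hypothetical query-and-render step; the "database" is just a list of stored spans.
final class TraceViewSketch {

    static final class StoredSpan {
        final String traceId;
        final String spanId;
        final String parentSpanId; // null for the root span of a trace
        final String operationName;

        StoredSpan(String traceId, String spanId, String parentSpanId, String operationName) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
            this.operationName = operationName;
        }
    }

    /** Renders one trace as an indented tree, one operation per line. */
    static String render(List<StoredSpan> store, String traceId) {
        StringBuilder out = new StringBuilder();
        for (StoredSpan span : store) {
            if (span.traceId.equals(traceId) && span.parentSpanId == null) {
                append(store, traceId, span, 0, out);
            }
        }
        return out.toString();
    }

    // Depth-first walk over the children of the given span (assumes no cycles).
    private static void append(List<StoredSpan> store, String traceId,
                               StoredSpan span, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(span.operationName).append('\n');
        for (StoredSpan candidate : store) {
            if (candidate.traceId.equals(traceId) && span.spanId.equals(candidate.parentSpanId)) {
                append(store, traceId, candidate, depth + 1, out);
            }
        }
    }
}
```

A real UI would also use the start and finish timestamps to draw each span as a bar on a timeline, which is what makes slow phases stand out.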

3. dd-trace-java

dd-trace-java (github.com/DataDog/dd-…) is Datadog's open-source APM client for Java.

The entry point is the premain method of AgentBootstrap, which calls:

  • AgentInstaller.installBytebuddyAgent: registers the instrumentation plug-ins that support different components
  • TracerInstaller.installGlobalTracer: registers a global Tracer

```java
public class AgentInstaller {

    // Builds a Byte Buddy agent, lets every Instrumenter plug-in register itself,
    // and installs the result on the JVM's Instrumentation instance.
    public static ResettableClassFileTransformer installBytebuddyAgent(final Instrumentation inst) {
        AgentBuilder agentBuilder =
                new AgentBuilder.Default()
                        .disableClassFormatChanges()
                        .with(AgentBuilder.RedefinitionStrategy.RETRANSFORMATION)
                        .with(new RedefinitionLoggingListener())
                        .with(AgentBuilder.DescriptionStrategy.Default.POOL_ONLY)
                        .with(AgentTooling.poolStrategy())
                        .with(new TransformLoggingListener())
                        .with(new ClassLoadListener())
                        .with(AgentTooling.locationStrategy())
                        // Classes matched below are ignored, i.e. never instrumented.
                        .ignore(any(), skipClassLoader())
                        .or(nameStartsWith("datadog.trace."))
                        .or(nameStartsWith("datadog.opentracing."))
                        .or(nameStartsWith("datadog.slf4j."))
                        .or(nameStartsWith("java.").and(not(nameStartsWith("java.util.concurrent."))))
                        .or(nameStartsWith("com.sun."))
                        .or(nameStartsWith("sun.").and(not(nameStartsWith("sun.net.www."))))
                        .or(nameStartsWith("jdk."))
                        .or(nameStartsWith("org.aspectj."))
                        .or(nameStartsWith("org.groovy."))
                        .or(nameStartsWith("com.p6spy."))
                        .or(nameStartsWith("org.slf4j."))
                        .or(nameContains("javassist"))
                        .or(nameContains(".asm."))
                        .or(nameMatches("com\\.mchange\\.v2\\.c3p0\\..*Proxy"));

        // Discover instrumentation plug-ins through the ServiceLoader mechanism.
        for (final Instrumenter instrumenter : ServiceLoader.load(Instrumenter.class)) {
            log.info("Loading instrumentation {}", instrumenter.getClass().getName());
            agentBuilder = instrumenter.instrument(agentBuilder);
        }
        return agentBuilder.installOn(inst);
    }
}
```

By extending Instrumenter.Default, you can develop plug-ins that add instrumentation for different components. For example:

```java
@AutoService(Instrumenter.class)
public class MDCInjectionInstrumentation extends Instrumenter.Default {

    // Evaluates to "org.slf4j.MDC" at runtime.
    private static final String mdcClassName = "org.TMP.MDC".replaceFirst("TMP", "slf4j");

    @Override
    protected boolean defaultEnabled() {
        return Config.get().isLogsInjectionEnabled();
    }

    // Match only the MDC class.
    @Override
    public ElementMatcher<? super TypeDescription> typeMatcher() {
        return named(mdcClassName);
    }

    @Override
    public void postMatch(
            final TypeDescription typeDescription,
            final ClassLoader classLoader,
            final JavaModule module,
            final Class<?> classBeingRedefined,
            final ProtectionDomain protectionDomain) {
        if (classBeingRedefined != null) {
            MDCAdvice.mdcClassInitialized(classBeingRedefined);
        }
    }

    // Apply MDCAdvice to the type initializer of the matched class.
    @Override
    public Map<? extends ElementMatcher<? super MethodDescription>, String> transformers() {
        return singletonMap(
                isTypeInitializer(), MDCInjectionInstrumentation.class.getName() + "$MDCAdvice");
    }

    @Override
    public String[] helperClassNames() {
        return new String[] {LogContextScopeListener.class.getName()};
    }

    public static class MDCAdvice {
        @Advice.OnMethodExit(suppress = Throwable.class)
        public static void mdcClassInitialized(@Advice.Origin final Class mdcClass) {
            try {
                final Method putMethod = mdcClass.getMethod("put", String.class, String.class);
                final Method removeMethod = mdcClass.getMethod("remove", String.class);
                // Hand the reflective MDC.put/remove methods to a scope listener.
                GlobalTracer.get().addScopeListener(new LogContextScopeListener(putMethod, removeMethod));
            } catch (final NoSuchMethodException e) {
                org.slf4j.LoggerFactory.getLogger(mdcClass).debug("Failed to add MDC span listener", e);
            }
        }
    }
}
```
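For readers unfamiliar with Java agents, the underlying JDK mechanism can be sketched without Byte Buddy. The example below uses only `java.lang.instrument`; `AgentSketch`, `LoggingTransformer`, and `SEEN` are hypothetical names, and the transformer merely records matching class names instead of rewriting bytecode. Packaging it as a real agent (a class named in the `Premain-Class` manifest attribute and started with `-javaagent`) is not shown.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical agent skeleton; it observes class loading but changes no bytecode.
final class AgentSketch {

    /** Internal names (e.g. "org/slf4j/MDC") of the classes the transformer matched. */
    static final List<String> SEEN = new CopyOnWriteArrayList<>();

    static final class LoggingTransformer implements ClassFileTransformer {
        private final String prefix;

        LoggingTransformer(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (className != null && className.startsWith(prefix)) {
                SEEN.add(className); // a real agent would rewrite classfileBuffer here
            }
            return null; // null tells the JVM to keep the original bytecode
        }
    }

    /** Called by the JVM before main() when the jar is loaded with -javaagent. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingTransformer(agentArgs == null ? "" : agentArgs));
    }
}
```

Libraries such as Byte Buddy build on this hook: instead of returning null, the transformer returns modified bytecode with tracing calls woven into the matched methods.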

End

OpenTracing defines a standard that resolves the incompatibility between the instrumentation APIs of different distributed tracing systems (similar to what SLF4J does for logging). Uber's open-source Jaeger provides a complete distributed tracing solution that is compatible with the OpenTracing API and covers data collection, data persistence, and data display. Datadog's open-source dd-trace-java is an APM client for Java (relying on jaeger-client-java) that instruments applications through bytecode injection (Java agent) and supports plug-in development for different components.