
This article is based on a share I recently gave inside the company, reposted here in the hope that more readers will understand distributed link tracing.

Over the past 10-15 years, against the background of the mobile Internet era, apps with tens of millions of users have been born every few days. Any operation we perform on the client usually requires the cooperation of multiple modules, basic components, and machines of the backend service to process and respond to the request. Across this series of requests, which may be serial or parallel, how do we determine which applications and modules are invoked behind a single client operation, which nodes the request passes through, in what order each module is invoked, and where each module's performance problems lie? As business systems grow ever more complex, distributed systems urgently need a link tracing system to resolve these pain points.

Background of distributed link tracing

In the monolith era, we didn't need to spend time worrying about call links. That said, link tracing is not exclusive to distributed scenarios; it exists in single applications too. For example, if we treat each service interface in an application as a link node, then connecting all the method interfaces traversed between the request input and the returned response forms a complete link, as shown in the figure below:

For a monolithic application, if access to a resource fails, we can quickly identify the machine involved and then locate the problem by querying the logs on that machine.

But in a microservices architecture, this approach is very weak. For a larger application, a request may span quite a number of service nodes; in that case, if a request does not get a successful response, it is not immediately clear which node is at fault:

1. ApplicationC may have multiple instances

2. ApplicationC may fail because its call to ApplicationD fails

Therefore, when facing a service system implemented on such a complex distributed cluster, we need some means to understand each application's online call behavior and to analyze its remote calls.

Distributed link tracing classification

Currently, there are a number of well-known link tracing systems in the industry, such as Twitter's Zipkin, Naver's Pinpoint, EagleEye, Dianping's CAT, Huawei's SkyWalking, Spring Cloud Sleuth (essentially Zipkin's Brave), and Ant's SOFATracer. These products roughly fall into two categories:

  • Narrow-sense link tracing: only the collection of link tracing data

    • Spring Cloud Sleuth, SOFATracer
  • Broad-sense link tracing (APM systems): the collection, storage, and presentation of link tracing data

    • Pinpoint, SkyWalking, CAT

Modern distributed link tracing theory

The accepted origin of modern distributed link tracing is Google's 2010 paper "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure". Cross-service tracing systems such as X-Trace, Magpie, and Pinpoint existed before that, but in their early stages these systems tended to be research-oriented and lacked demonstration in engineering practice. By contrast, Dapper had been running in mass production at Google since 2004, so the paper attracted wide attention. Most of the distributed link tracing products mentioned above were directly influenced by the Dapper paper.

Dapper interpretation

This figure summarizes Dapper's practice in distributed link tracing. I won't delve into how Dapper implements such a system (it is closed source), but the concepts, goals, and requirements the authors proposed set the basic direction for distributed link tracing, so that in the ten years after the paper's release a large number of excellent distributed link tracing products emerged, leaving a bold mark on the microservices field.

OpenTracing from Dapper

If Dapper laid the theoretical foundation for modern distributed link tracing, then OpenTracing defined the overall model for building distributed link tracing systems.

Neither Dapper itself nor HTrace and X-Trace, products evolved from companies' internal scenarios (some even commercial products), could escape the problem of lacking a platform-independent, vendor-independent API. Although these systems all expose very similar APIs, developers in various languages still had a hard time integrating their own systems (built with different languages and technologies) with a specific distributed tracing system. This is unacceptable for companies with large scale and complex technology stacks, or for cloud vendors, and it is also reflected in the large number of similar products emerging from the community and cloud service vendors.

The emergence of OpenTracing makes it possible for these independent distributed link tracing systems to interconnect and remain compatible.

OpenTracing

OpenTracing enables developers to easily add (or switch) tracing implementations by providing platform-independent, vendor-independent APIs.

This sentence may sound abstract and hard to picture; the two scenarios below illustrate the problems OpenTracing solves.

Scenarios

Interaction between systems using different tracing products

Systems A and B are instrumented with two different tracing components, SkyWalking and Pinpoint respectively. Because SkyWalking and Pinpoint use different propagation protocols, the link data passed from A to B cannot be parsed by B, and the link breaks.

Switching an application's tracing system

Four systems A-B-C-D were previously instrumented with Pinpoint. A cloud-vendor migration now requires switching to Zipkin. How can compatibility be preserved?

These two small cases show that different link tracing systems differ in their data models and APIs. When a distributed system mixes several instrumentation components, or wants to switch between them, things become very troublesome or even unworkable.

Going back to the introduction of OpenTracing, it is now easy to understand what it means to make it easy for developers to add (or switch) tracing implementations by providing platform-independent, vendor-independent APIs.

OpenTracing

In addition to being platform and vendor independent, OpenTracing is actually language independent.

Application code and OSS packages are programmed against the abstract OpenTracing API, which describes the path of a request within each process and its propagation between processes. The OpenTracing implementation controls the buffering and encoding of trace span data, as well as the semantics of process-to-process trace context propagation. Application code can therefore describe and propagate traces without making any assumptions about the concrete OpenTracing implementation.
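As a minimal sketch of this idea (assuming the io.opentracing Java API; the OrderService class is invented for illustration, and SpanBuilder method names vary slightly across API versions), application code only ever touches the abstract Tracer and Span types:

```java
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.util.GlobalTracer;

// Hypothetical business class: written purely against the abstract
// OpenTracing API, so the concrete tracer (Zipkin, Jaeger, SOFATracer...)
// can be swapped without touching this code.
public class OrderService {
    public void placeOrder() {
        Tracer tracer = GlobalTracer.get();                  // abstract entry point
        Span span = tracer.buildSpan("placeOrder").start();  // describe the request path
        try {
            // ... business logic ...
        } finally {
            span.finish();  // the bound implementation decides how the span
        }                   // is buffered, encoded, and reported
    }
}
```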

In other words, application code is not tied to a specific implementation and can adapt to different implementations in different scenarios. Isn't this somewhat similar to SLF4J's facade pattern? In API design philosophy, OpenTracing is indeed similar to SLF4J, except that OpenTracing also unifies the data model; see: opentracing.io/docs/overvi… (PS: Wu Sheng's Chinese translation lags the official version; it was last updated in 2018, but the overall differences are small.)

Rather than walking through the OpenTracing specification item by item here, you can read the official documentation directly.

For example, a Tracer should provide the ability to create Spans, and a Span should provide a finish (close) interface. I'll skip over the details here.

The data model

trace

Traces in OpenTracing are implicitly defined by their Spans: a Trace can be viewed as a directed acyclic graph (DAG) of Spans, where the relationships between Span boundaries are called References.

```
        [Span A]  ←←←(the root span)
            |
     +------+------+
     |             |
 [Span B]      [Span C] ←←←(Span C is a `ChildOf` Span A)
     |             |
 [Span D]      +---+-------+
               |           |
           [Span E]    [Span F] >>> [Span G] >>> [Span H]
                                       ↑
                                       ↑
                                       ↑
                         (Span G `FollowsFrom` Span F)
```

span

  • Name

    • An operation name, such as a method name
  • Start time
  • End time
  • A set of Tags (see the sketch after this list)

    • For example, the HTTP code returned by Web MVC: http.code: 200
  • A set of Logs

    • Timestamped events; for example, the HTTP code returned by Web MVC at time XX: http.code: 200
  • SpanContext

    • As described in the next section, the SpanContext organizes the data that is passed across process boundaries
  • References

    • Causal relationships between spans, generally maintained through the SpanContext rather than the Span itself
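A minimal sketch of how these fields map onto the OpenTracing Java API (the operation name and tag values are invented for illustration):

```java
import io.opentracing.Span;
import io.opentracing.SpanContext;
import io.opentracing.Tracer;
import io.opentracing.util.GlobalTracer;
import java.util.Map;

public class SpanFieldsDemo {
    public static void main(String[] args) {
        Tracer tracer = GlobalTracer.get();
        Span span = tracer.buildSpan("GET /orders").start(); // name + start time
        span.setTag("http.code", 200);                       // a Tag: span-wide key-value
        span.log(Map.of("event", "response-sent"));          // a Log: timestamped key-values
        SpanContext ctx = span.context();                    // the part that crosses processes
        span.finish();                                       // end time; typically triggers reporting
    }
}
```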

SpanContext

The SpanContext needs to include the traceId, spanId, parentId, sampling flags, and some Baggage (also K-V pairs) of the current span. Normally, Baggage stores information that needs to be passed through to the downstream link. Baggage can be subdivided into bizBaggage and sysBaggage; authentication information, for example, can be stored in bizBaggage (internal services then only need to check whether authentication information exists in the Baggage, with authentication verified once at the traffic entry point).
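For instance, baggage in the OpenTracing Java API is a simple set/get on the span, and, unlike tags, it propagates downstream with the SpanContext (a sketch; the key name and spans are illustrative):

```java
// At the traffic entry point, after authentication succeeds:
entrySpan.setBaggageItem("auth-token", "eyJhbGciOi...");

// On any downstream node of the same trace:
String token = downstreamSpan.getBaggageItem("auth-token");
```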

Shortcomings of OpenTracing

OpenTracing currently defines two relationships between spans:

  • child-of
  • follow-from

For example: B, C, and D are all child-of A, and C is follow-from B.

There is one scenario that is not covered: the fork-join (divide-and-conquer) model. In a divide-and-conquer framework each task runs in its own thread, independent of the others, and their execution results are merged into the final result. Neither child-of nor follow-from can describe this merge step. This issue has been open against OpenTracing for two years with no progress.

SOFATracer: Ant's distributed link tracing component

Let's use SOFATracer's implementation to break down, point by point, what a distributed link tracing component needs to take care of when instrumenting.

Globally unique traceId

The unique identifier of a link is the traceId, which is passed along with the SpanContext to every service node a request traverses. As mentioned in the Dapper paper, the initial node takes slightly longer than the others, mainly because some time is spent creating the traceId. For a mature link tracing component, traceId generation should adhere to two basic principles:

  • Low overhead
  • Avoid conflict (i.e., no duplication)
| Field   | IP       | Timestamp     | Sequence | PID    |
|---------|----------|---------------|----------|--------|
| Length  | 32 bit   | 64 bit        | 16 bit   | 32 bit |
| Example | 0ad1348f | 1403169275002 | 1003     | 56696  |

The table above shows the generation rule of the SOFATracer traceId. The IP and PID can be cached directly, so the cost is mainly in obtaining the current timestamp and generating the sequence segment.
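A sketch of a generator following the layout in the table above (the IP-to-hex helper and the 1000-9000 sequence window are illustrative details, not SOFATracer's exact code):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class TraceIdGenerator {
    private static final String IP_HEX = ipToHex("10.209.52.143"); // cached: "0ad1348f"
    private static final String PID = "56696";                     // cached process id
    private static final AtomicInteger SEQUENCE = new AtomicInteger(1000);

    public static String next() {
        // Only the timestamp and sequence are computed per call
        int seq = SEQUENCE.getAndUpdate(n -> n >= 9000 ? 1000 : n + 1);
        return IP_HEX + System.currentTimeMillis() + seq + PID;
    }

    private static String ipToHex(String ip) {
        StringBuilder sb = new StringBuilder();
        for (String part : ip.split("\\.")) {
            String hex = Integer.toHexString(Integer.parseInt(part));
            sb.append(hex.length() == 1 ? "0" + hex : hex); // two hex chars per octet
        }
        return sb.toString();
    }
}
```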

Non-intrusive instrumentation

Non-intrusive here means instrumentation at the business-logic level without modifying any business code. Mainstream link tracing products all achieve this, mainly in one of two ways:

  • Agent injection
  • Starter dependency import

SOFATracer achieves non-intrusive instrumentation by importing tracer-sofa-boot-starter.

Pluggable

The link tracing system provides optional plugins or switches so that you pull in only the components that need instrumentation. For example, if my system only serves Web MVC, there is no need to bring in any Dubbo-related instrumentation plugin dependency.

All optional plug-ins provided by SOFATracer: github.com/sofastack/s…

Data reporting

  • Print to disk
  • Report to Zipkin
  • Report to Jaeger (pending merge)
  • Report to SkyWalking (pending merge)

Reporting timing

Take the above call as an example:

  • Client Send (CS): at the traffic exit, Span-1 is generated
  • Server Receive (SR): the server receives the request at the traffic entry, and Span-2 is generated
  • Server Send (SS): all server-side logic is complete, so the Span-2 generated at SR ends and must be closed
  • Client Receive (CR): the client receives the response, so the Span-1 generated at CS ends and must be closed

Span-1 is the parent of Span-2; Span-2 is a child of Span-1 (sketched below).
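A sketch of the four points in OpenTracing terms (a fragment assuming a `tracer` in scope; operation names are invented, and in reality the server extracts the parent context from the request rather than holding a reference to Span-1):

```java
// CS: the traffic exit creates Span-1 on the client
Span span1 = tracer.buildSpan("call-order-service").start();
// ... request sent; on the server: ...
// SR: the traffic entry creates Span-2 as a child of Span-1 (context
//     normally extracted from the request; see the pass-through section)
Span span2 = tracer.buildSpan("handle-order").asChildOf(span1.context()).start();
// ... server business logic ...
span2.finish();   // SS: server processing done, Span-2 is closed and reported
// ... response received back on the client ...
span1.finish();   // CR: Span-1 is closed and reported
```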

Reporting process

Disruptor-based lock-free log flushing

This logic borrows from Log4j2's asynchronous disk-flush implementation, built primarily on the Disruptor. Rough flow: SOFATracer's report behavior is triggered when the span#finish method executes; the report ultimately places the current span data into the Disruptor queue and publishes a SofaTracerSpanEvent. The Disruptor consumer, an EventHandler implementation, listens for events on the queue and flushes the span data to disk in its onEvent callback.
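A simplified sketch of this pattern (class and method names here are illustrative, not SOFATracer's actual internals), assuming the LMAX Disruptor library:

```java
import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import java.util.concurrent.Executors;

public class AsyncSpanReporter {
    // Event object reused by the ring buffer; carries one finished span
    static class SpanEvent { Object span; }

    private final Disruptor<SpanEvent> disruptor =
            new Disruptor<>(SpanEvent::new, 1024, Executors.defaultThreadFactory());

    public AsyncSpanReporter() {
        // Consumer side: runs on the Disruptor's consumer thread and
        // flushes span data to disk in the event callback
        disruptor.handleEventsWith(
                (EventHandler<SpanEvent>) (event, sequence, endOfBatch) -> writeToDisk(event.span));
        disruptor.start();
    }

    // Producer side: called from span#finish, publishes without blocking
    // the business thread
    public void report(Object span) {
        RingBuffer<SpanEvent> ringBuffer = disruptor.getRingBuffer();
        long sequence = ringBuffer.next();
        try {
            ringBuffer.get(sequence).span = span;
        } finally {
            ringBuffer.publish(sequence);
        }
    }

    private void writeToDisk(Object span) {
        // buffered append to the trace log file
    }
}
```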

Reporting to Zipkin

The data models are compatible, so spans are reported directly to Zipkin.

Reporting to Jaeger

The following is a partial diagram of Jaeger's data reporting. The CommandQueue holds flush or append commands; the producers are the sampler and the flush timer, and the consumer is the queue processor. When the sampler decides a span should be reported, it adds an AppendCommand to the CommandQueue. The flush timer keeps adding FlushCommands to the queue at the configured flushInterval. The queue processor continuously reads commands from the CommandQueue and checks their type: a FlushCommand sends the data currently in the byteBuffer to the receiver, while an AppendCommand adds the span to the buffer.
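The command pattern described above can be sketched like this (simplified, with illustrative names rather than the actual Jaeger client classes):

```java
// Both producers (sampler, flush timer) enqueue commands into one queue;
// a single queue-processor thread drains them and acts on the shared buffer.
interface Command {
    void execute(SpanByteBuffer buffer);
}

// Produced by the sampler when a span is selected for reporting
class AppendCommand implements Command {
    private final Object span;
    AppendCommand(Object span) { this.span = span; }
    public void execute(SpanByteBuffer buffer) { buffer.append(span); }
}

// Produced by the flush timer at every flushInterval
class FlushCommand implements Command {
    public void execute(SpanByteBuffer buffer) { buffer.sendToReceiver(); }
}

class SpanByteBuffer {
    void append(Object span) { /* stage encoded span bytes */ }
    void sendToReceiver() { /* ship buffered bytes to the agent/collector */ }
}
```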

Jaeger supports two reporting protocols, HTTP and UDP. SofaTracer uses a UdpSender to send span data to the Jaeger Agent, or an HttpSender to send data directly to the Jaeger Collector.

Reporting to SkyWalking

Community support is still in progress. Compared with Zipkin and Jaeger, SkyWalking reporting is relatively complicated, mainly because SkyWalking describes a node with a segment rather than an OpenTracing span.

Segment data is reported to the backend over HTTP in JSON format. The unit of reporting is a Message, and multiple segments are combined into one Message. The process is shown in the figure below: after a span ends, the transformed segment is appended to a segment buffer array, while another thread keeps draining the array into a Message. When the Message reaches its maximum size (2 MB by default) or the send interval elapses, the data is sent.
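An illustrative sketch of this size-or-time flush policy (these are not SkyWalking's actual classes; only the 2 MB default comes from the text above):

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentBuffer {
    private static final int MAX_MESSAGE_BYTES = 2 * 1024 * 1024; // 2 MB default

    private final List<byte[]> segments = new ArrayList<>();
    private int bufferedBytes = 0;

    // Called after a span ends and is transformed into a JSON segment
    public synchronized void add(byte[] segmentJson) {
        segments.add(segmentJson);
        bufferedBytes += segmentJson.length;
        if (bufferedBytes >= MAX_MESSAGE_BYTES) {
            flush(); // the Message is full: send immediately
        }
    }

    // Also invoked periodically by a sender thread when the wait time elapses
    public synchronized void flush() {
        if (segments.isEmpty()) {
            return;
        }
        httpReport(segments); // one Message carrying multiple segments
        segments.clear();
        bufferedBytes = 0;
    }

    private void httpReport(List<byte[]> message) {
        // POST the combined message to the backend
    }
}
```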

Sampling

It is not necessary to collect every link in a distributed system; in practice, collecting a fraction of them is sufficient, which reduces overhead. Dapper discusses sampling as follows:

Each request makes use of a large number of servers in high-throughput online services; this is one of the scenarios where effective tracing is needed most. It also means huge amounts of trace data are generated, and these services are the most sensitive to performance overhead. After lowering the sampling rate below 1/16, both latency and throughput losses fall within the experimental error range.

In practice, we found that even with the sampling rate set to 1/1024 there is still enough trace data to follow a large number of services. Keeping the baseline performance cost of the link tracing system very low is important, because it gives applications a relaxed environment in which to use the full Annotation API without fear of performance penalties. A lower sampling rate has the added benefit that trace data persisted to disk can be retained longer before the garbage-collection mechanism processes it, which gives the collection component more flexibility.

The overhead of any given process in a distributed link tracing system is proportional to that process's sampling rate per unit time. At lower sampling rates and lower transmission loads, however, important events can be missed, while higher sampling rates require a correspondingly acceptable performance cost. When deploying variable sampling, we therefore parameterize the sampling rate: instead of a uniform scheme, we use an expected sampling rate per unit time. Low traffic and low load automatically raise the sampling rate, while high traffic and high load lower it, so the cost stays under control. The actual sampling rate used is recorded along with the trace itself, which facilitates accurate analysis when troubleshooting from trace data.

The in-Ant version of SOFATracer uses a single all-or-nothing sampling strategy, plus the ability to switch components off. Under high concurrency, overhead can be reduced on the one hand by allowing queued data to be discarded, and on the other hand by disabling the instrumentation of some components on the link, which reduces the total volume collected.

SOFATracer's sampling design does not ship a set of preset strategies the way Jaeger does. By default, SOFATracer only provides sampling by percentage, and leaves users the flexibility to extend it through an exposed SPI.

SOFATracer executes the basic link-data sampling flow via com.alipay.common.tracer.core.samplers.Sampler, built by the SamplerFactory:

  1. When the link tracer is constructed, the SamplerFactory generates a Sampler of the specified policy according to the fully qualified class name of a user-defined sampling-rule implementation. A sampler extended by the user takes priority; the default policy samples at a fixed rate (a sketch follows this list).
  2. When the Reporter reports span data (reportSpan), or when a link span (SofaTracerSpan) starts, the sample method is called to check whether the link should be sampled, returning a SamplingStatus whose isSampled field is the sampling flag.
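A sketch of a fixed-percentage sampler of the kind the default policy describes (the interface is simplified; SOFATracer's actual Sampler SPI receives more context than a bare traceId):

```java
public class PercentageSampler {
    private final int percentage; // e.g. 10 means "sample 10% of traces"

    public PercentageSampler(int percentage) {
        this.percentage = percentage;
    }

    // A stable function of the traceId, so if any node re-evaluates the
    // decision it reaches the same result as the first node in the link
    public boolean isSampled(String traceId) {
        return Math.abs(traceId.hashCode() % 100) < percentage;
    }
}
```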

The sampling decision is made by the first node in the link.

Link pass-through (cross-process)

This follows OpenTracing's Inject and Extract. OpenTracing's inject and extract provide the pass-through data format types, such as Text Map or Binary; these are called Carrier formats.

Beyond the format, the data itself must be organized: everything that needs to propagate is arranged in the SpanContext data model.

Finally, there is the pass-through itself: in HTTP's case, the SpanContext is serialized (JSON, Hessian, or whatever) and passed downstream in an HTTP header.
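A sketch using the OpenTracing text-map carrier over HTTP headers (a fragment assuming a `tracer` and `span` in scope; TextMapAdapter is the 0.32-era adapter class, while earlier versions use TextMapInjectAdapter/TextMapExtractAdapter):

```java
// Client: serialize the SpanContext into the outgoing HTTP headers
Map<String, String> headers = new HashMap<>();
tracer.inject(span.context(), Format.Builtin.HTTP_HEADERS, new TextMapAdapter(headers));
// ... attach `headers` to the HTTP request and send it ...

// Server: rebuild the SpanContext from the incoming request headers
SpanContext upstream = tracer.extract(Format.Builtin.HTTP_HEADERS, new TextMapAdapter(headers));
```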

In-thread passing and cross-thread pass-through

In-thread passing operates on a single thread's context; cross-thread pass-through refers to carrying the context across threads.

Thread passing support in OpenTracing 0.30.x

It is worth mentioning that OpenTracing 0.30.x has an elegant design for in-thread span passing. Again, take this picture:

Span-1 and Span-2 are nested (here we consider in-process span relationships). If the model is understood as a stack, then creating the two spans is a pushing process, as follows:

Since a stack is FILO, Span-2's life cycle ends when Span-2 is popped off the stack, which triggers the reporting of its span data. OpenTracing 0.30.x encapsulates this idea to solve span passing within a thread; the two core interfaces are Scope and ScopeManager.
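A sketch of that stack-like API (using the startActive style; exact signatures differ between OpenTracing Java releases, and the fragment assumes a `tracer` in scope):

```java
// Each startActive pushes a span onto the thread's scope stack;
// closing the scope pops it, and finishSpanOnClose = true finishes
// the span, triggering its report.
try (Scope parent = tracer.buildSpan("span-1").startActive(true)) {
    try (Scope child = tracer.buildSpan("span-2").startActive(true)) {
        // span-2 is the active span here (top of the stack)
    }
    // span-2 was popped and reported; span-1 is active again
}
```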

In-thread passing in SOFATracer

SOFATracer's in-thread passing of the tracer context is also based on ThreadLocal; it is not fundamentally different from OpenTracing's, except that OpenTracing takes a more global perspective. (SOFATracer was built on OpenTracing 0.22.0, before 0.30.x existed.)

Cross-thread pass-through in SOFATracer

ThreadLocal loses the tracer context as soon as threads are crossed, so three problems need care (a sketch of the usual fix follows this list):

1. Forgetting to pass the parent thread's tracer context through to the child thread (broken link)

2. After passing it through to the child thread, failing to clear the tracer context when the child thread ends (OOM risk)

3. Tracer context reuse (link pollution)
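A sketch of the usual fix for all three pitfalls: wrap the task so the parent's context is copied in and always cleared on exit (the TracerContext holder here is illustrative, not SOFATracer's actual class):

```java
public class TraceAwareTask {
    // Minimal illustrative ThreadLocal holder
    static class TracerContext {
        private static final ThreadLocal<TracerContext> CURRENT = new ThreadLocal<>();
        static void setCurrent(TracerContext ctx) { CURRENT.set(ctx); }
        static void clear() { CURRENT.remove(); }
        TracerContext copy() { return new TracerContext(); /* copy fields */ }
    }

    // Capture the parent thread's context at submit time, restore a copy
    // in the child thread, and clear it in finally so a pooled thread
    // cannot leak (pitfall 2) or pollute the next task (pitfall 3).
    static Runnable wrap(Runnable task, TracerContext parentContext) {
        return () -> {
            TracerContext.setCurrent(parentContext.copy()); // pass through (fixes 1)
            try {
                task.run();
            } finally {
                TracerContext.clear();                      // fixes 2 and 3
            }
        };
    }
}
```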

Solution in the SOFATracer: www.sofastack.tech/projects/so…

Performance evaluation

Here, it helps to spell out the extra overhead a distributed link tracing component introduces:

1. traceId generation; measurements show this is indeed a time-consuming action, but it is generally performed once by the traffic-entry service (such as a gateway)

2. Span creation (a new object allocated in memory)

3. Serialization and deserialization of pass-through data

4. Disk flushing or data reporting

These four costs are unavoidable for any tracing product. Products differ little on points 1-3; for point 4, the main techniques are:

  • Sampling
  • gzip compression
  • Asynchronous batch reporting (message queue or gRPC)
  • Degradation by allowing data loss

Data analysis

The uses of link data are not covered here; we only mention the statistical log data provided by SOFATracer.

Stat log: statistics logs generated at a fixed interval, currently 60 seconds.

This statistical log is essentially a form of metric, similar to a Timer: a total count over a period of time. Ant's current traffic monitoring relies on these statistical logs.

Distributed link component development and thinking

In the cloud-native direction, Jaeger's admission to the CNCF has already proven the trend, and SkyWalking also supports Istio monitoring; both, from some angle, seek to achieve instrumentation without modifying the application (whether via agent or interceptor). Looking at the community's overall direction, the further sinking of infrastructure advocated by cloud native is bound to bring a new iteration of distributed link tracing systems: for example, interception at the network layer based on Service Mesh, proxying traffic in sidecar form (much service-governance capability already lives in the sidecar today), making tracing truly cross-language and non-intrusive.

From the end of 2018 to the following year's Double 11 (November 11), Ant's core business links were all connected to MOSN (Ant's self-developed traffic proxy component, similar in function to Envoy). For link instrumentation, a Go version of SOFATracer compatible with the OpenTracing specification has been implemented, and the instrumentation in MOSN takes the sidecar form.

Conclusion

This article introduced the background of distributed link tracing, starting from problem investigation and fault location in the day-to-day operation of distributed systems, and gave a brief account of existing distributed link tracing products and modern distributed link tracing theory. From Dapper to OpenTracing, the continuous evolution from theory to practice has allowed the community to produce a batch of excellent distributed link tracing products, which we compared briefly. Finally, taking SOFATracer as an example and following the narrow-sense classification above, we decomposed the points a distributed link component needs to attend to and showed how SOFATracer implements them. At the end, we briefly discussed the future of distributed link components: cloud native is no longer an unfamiliar topic, and from microservices to cloud native, the most important change is the sinking of infrastructure.

References

A review of distributed tracking technologies

Magpie_Online_Modelling_and_Performance-aware_Syst.pdf

storage.googleapis.com/pub-tools-p…

bigbully.github.io/Dapper-tran…

www.4k8k.xyz/article/xvs…

medium.com/opentracing…