
Article | Zhao Chen (SOFA Open Source Summer Link Project Team)

Master's student in Computer Engineering, Wuhan University of Technology

Research area: automatic coloring of Thangka line drawings

Proofreading | SOFATracer Committer

This article has 6,971 words and takes about 18 minutes to read


Background

I was lucky to take part in the Open Source Software Supply Chain Lighting Program, Summer 2021, which supports open source projects. At present, SOFATracer can already report its instrumentation (trace) data to Zipkin. The main goal of this project is to report the generated trace data to Jaeger and SkyWalking for visual display.

PART. 1 SOFATracer

SOFATracer is a distributed tracing system developed by Ant Group based on the OpenTracing specification. Its core idea is to connect the parts of one request that are spread across service nodes through a global TraceId. The unified TraceId is used to record every network call of the invocation link in logs, which makes the network calls visible end to end. The trace data can be used for fault discovery and service governance.

SOFATracer can write logs to disk asynchronously and can report trace data to Zipkin for distributed tracing display. The task for this Open Source Summer project was to report trace data to Jaeger and SkyWalking for display.

SOFATracer data reporting

Span#finish is the last method executed in a span's life cycle and is the entry point for all data reporting. SOFATracer's span reporting consists of two parts: reporting to a remote backend and writing logs to disk. Note that SOFATracer does not separate the remote data reporters from the log writers; instead, it calls SOFATracer#invokeReportListeners before the logs are written. This method finds every instance in the system that implements the SpanReportListener interface and has been added to SpanReportListenerHolder, and calls its onSpanReport method to report the span data to the data collector. The code snippet below is the implementation of the invokeReportListeners method.

    protected void invokeReportListeners(SofaTracerSpan sofaTracerSpan) {
        // Fetch every listener that was registered at application startup.
        List<SpanReportListener> listeners = SpanReportListenerHolder
            .getSpanReportListenersHolder();
        if (listeners != null && listeners.size() > 0) {
            for (SpanReportListener listener : listeners) {
                // Hand the finished span to each remote reporter.
                listener.onSpanReport(sofaTracerSpan);
            }
        }
    }

Instances are added to SpanReportListenerHolder at project startup, in different ways for Spring Boot applications and Spring applications:

  • In Spring Boot applications, the auto-configuration class SOFATracerSpanRemoteReporter saves all current bean instances of type SpanReportListener into the List object of SpanReportListenerHolder. The SpanReportListener instances themselves are injected into the IoC container by their respective AutoConfiguration classes.

  • In Spring applications, a bean implements the Spring-provided lifecycle interface InitializingBean, instantiates the SpanReportListener in its afterPropertiesSet method, and adds it to SpanReportListenerHolder.

To upload trace data to Jaeger and SkyWalking, SOFATracer needs implementations of the SpanReportListener interface, with the corresponding instances added to SpanReportListenerHolder at application startup.
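The listener mechanism described above can be sketched roughly as follows. This is only an illustration: SpanStub, ReportListener, ListenerHolder and CollectingListener are hypothetical stand-ins for SofaTracerSpan, SpanReportListener, SpanReportListenerHolder and a Jaeger or SkyWalking reporter, and the real SOFATracer classes carry far more state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, heavily simplified stand-in for SofaTracerSpan.
class SpanStub {
    final String traceId;
    final String operationName;

    SpanStub(String traceId, String operationName) {
        this.traceId = traceId;
        this.operationName = operationName;
    }
}

// Plays the role of SpanReportListener (assumed shape: one callback per finished span).
interface ReportListener {
    void onSpanReport(SpanStub span);
}

// Plays the role of SpanReportListenerHolder: a process-wide list filled at startup.
class ListenerHolder {
    private static final List<ReportListener> LISTENERS = new CopyOnWriteArrayList<>();

    static void add(ReportListener listener) {
        LISTENERS.add(listener);
    }

    static List<ReportListener> all() {
        return LISTENERS;
    }
}

// A listener that a Jaeger or SkyWalking reporter could be modeled on.
class CollectingListener implements ReportListener {
    final List<String> sent = new ArrayList<>();

    @Override
    public void onSpanReport(SpanStub span) {
        // A real listener would convert the span to the backend's model and send it.
        sent.add(span.traceId + ":" + span.operationName);
    }
}

class ListenerDemo {
    // Mirrors invokeReportListeners: fan the finished span out to every listener.
    static void report(SpanStub span) {
        for (ReportListener listener : ListenerHolder.all()) {
            listener.onSpanReport(span);
        }
    }
}
```

Registering a new backend then amounts to calling ListenerHolder.add once at startup; the reporting path itself does not change.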

PART. 2 Jaeger data reporting

The following is a partial diagram of Jaeger data reporting. The CommandQueue holds flush and append commands; the producers are the sampler and the flush timer, and the consumer is the queue processor. When the sampler determines that a span needs to be reported, an AppendCommand is added to the CommandQueue. The flush timer keeps adding FlushCommands to the queue according to the configured flushInterval. The queue processor continuously reads commands from the CommandQueue and checks whether each one is an AppendCommand or a FlushCommand: for a flush command it sends the data in the current byteBuffer to the receiving end, and for an append command it adds the span to the byteBuffer.

In reporting to Jaeger, the main work is converting between the Jaeger Span model and the SOFATracer Span model. After the conversion, the logic above is used to send the span to the backend.

The UML diagram of the Sender in Jaeger shows two types of Sender, HTTPSender and UDPSender, which send data over HTTP and UDP respectively. In implementing SOFATracer's Jaeger reporting, UDPSender is used to send span data to the Jaeger Agent, and HTTPSender is used to send data directly to the jaeger-collector.
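As a rough illustration of the queue described above, the sketch below models append and flush commands on a single queue. It is not Jaeger's code: the class CommandQueueSketch, the use of strings in place of spans, and the synchronous drain method (instead of a dedicated consumer thread) are all assumptions made to keep the example small and deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class CommandQueueSketch {
    // Both command types share one interface, so the consumer just executes them in order.
    interface Command {
        void execute();
    }

    private final BlockingQueue<Command> commandQueue = new LinkedBlockingQueue<>();
    // Stands in for the byte buffer that accumulates spans between flushes.
    private final List<String> buffer = new ArrayList<>();
    // Stands in for the receiving end; each entry is one batch that was sent.
    final List<List<String>> sentBatches = new ArrayList<>();

    // Producer: called when a sampled span needs to be reported (append command).
    void append(String span) {
        commandQueue.add(() -> buffer.add(span));
    }

    // Producer: called by the flush timer on every flush interval (flush command).
    void flush() {
        commandQueue.add(() -> {
            if (!buffer.isEmpty()) {
                sentBatches.add(new ArrayList<>(buffer));
                buffer.clear();
            }
        });
    }

    // Consumer: executes queued commands in arrival order.
    // A real queue processor would loop in its own thread using take().
    void drain() {
        Command command;
        while ((command = commandQueue.poll()) != null) {
            command.execute();
        }
    }
}
```

Because only the consumer touches the buffer, the producers never need to lock it; they only enqueue commands.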

Jaeger Span and SOFATracer Span model conversion

Model transformation comparison

Treatment of TraceId and SpanId

TraceId conversion:

  • The TraceId generation rule in SOFATracer is: server IP address + ID generation time + auto-increment sequence + current process ID

For example, in 0ad1348f1403169275002100356696 the first 8 characters, 0ad1348f, are the IP of the machine that generated the TraceId. It is a hexadecimal number in which every two characters represent one segment of the IP; converting each pair to decimal gives the familiar IP representation 10.209.52.143. By this rule you can also find the first server the request passed through. The next 13 digits, 1403169275002, are the time at which the TraceId was generated. The next four digits, 1003, are an auto-incrementing sequence that rises from 1000 to 9000 and returns to 1000 after reaching 9000. The last five digits, 56696, are the current process ID, appended to the TraceId to prevent conflicts between multiple processes on a single machine. (Quoted from: TraceId and SpanId generation rules)
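To make the layout of the example TraceId concrete, here is a small parsing sketch. The class TraceIdParts and its field names are hypothetical, and the code assumes the fixed widths given in the quoted rule (8 hex characters, 13 digits, 4 digits, then the process ID).

```java
class TraceIdParts {
    final String ip;
    final long timestampMillis;
    final int sequence;
    final String processId;

    private TraceIdParts(String ip, long timestampMillis, int sequence, String processId) {
        this.ip = ip;
        this.timestampMillis = timestampMillis;
        this.sequence = sequence;
        this.processId = processId;
    }

    static TraceIdParts parse(String traceId) {
        if (traceId == null || traceId.length() < 26) {
            throw new IllegalArgumentException("TraceId too short: " + traceId);
        }
        // First 8 characters: four hex pairs, one per IPv4 segment.
        StringBuilder ip = new StringBuilder();
        for (int i = 0; i < 8; i += 2) {
            if (i > 0) {
                ip.append('.');
            }
            ip.append(Integer.parseInt(traceId.substring(i, i + 2), 16));
        }
        // Next 13 digits: generation time in milliseconds.
        long timestamp = Long.parseLong(traceId.substring(8, 21));
        // Next 4 digits: auto-increment sequence.
        int sequence = Integer.parseInt(traceId.substring(21, 25));
        // Remainder: process ID.
        String processId = traceId.substring(25);
        return new TraceIdParts(ip.toString(), timestamp, sequence, processId);
    }
}
```

For the example above, parse yields the IP 10.209.52.143, the timestamp 1403169275002, the sequence 1003 and the process ID 56696.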

In SOFATracer, TraceId is a String, but in Jaeger, TraceId is two Long integers that form the final TraceId.

The solution

Jaeger converts TraceIdHigh and TraceIdLow to a String internally. During concatenation, TraceIdAsString converts each of the two IDs to its hex string, and pads the front with 0 when a hex string is shorter than 16 characters.

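    // Left-pads a hex id with '0' up to desiredLength characters.
    // The method signature is not part of the original excerpt; the name
    // padLeft and its parameter list are assumed here for completeness.
    static String padLeft(String id, int desiredLength) {
        StringBuilder builder = new StringBuilder(desiredLength);
        int offset = desiredLength - id.length();

        for (int i = 0; i < offset; i++) {
            builder.append('0');
        }
        builder.append(id);
        return builder.toString();
    }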
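As a compact illustration of the two-long representation, the sketch below joins a high and a low long into one 32-character hex string and splits it again. The class TraceIdHex and its method names are hypothetical; it always pads both halves to 16 characters, which is a simplification and not necessarily what Jaeger does in every case.

```java
class TraceIdHex {
    // Formats both halves as zero-padded, unsigned 16-digit hex and concatenates them.
    static String toHex(long high, long low) {
        return String.format("%016x%016x", high, low);
    }

    // Splits a 32-character hex string back into {high, low}.
    static long[] fromHex(String hex) {
        if (hex == null || hex.length() != 32) {
            throw new IllegalArgumentException("expected 32 hex characters");
        }
        long high = Long.parseUnsignedLong(hex.substring(0, 16), 16);
        long low = Long.parseUnsignedLong(hex.substring(16), 16);
        return new long[] {high, low};
    }
}
```

Padding matters because, without it, different high/low pairs could concatenate to the same string.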

SpanId conversion

  • The problem: SpanId is a Long in Jaeger but a String in SOFATracer.

  • The solution: as with the earlier SpanId conversion for Zipkin, an FNV hash is used to map the String to a long with a low collision rate.
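For reference, a 64-bit FNV hash over a SpanId string can be sketched as below. The text only says "FNV Hash"; the FNV-1a variant and the class name Fnv64 are assumptions, and the hash actually used in SOFATracer may differ in variant or width.

```java
import java.nio.charset.StandardCharsets;

class Fnv64 {
    // Standard 64-bit FNV parameters.
    private static final long OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long PRIME = 0x100000001b3L;

    // FNV-1a: XOR each byte into the hash, then multiply by the prime.
    static long hash(String s) {
        long h = OFFSET_BASIS;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= PRIME; // overflow wraps modulo 2^64, as the algorithm requires
        }
        return h;
    }
}
```

A SOFATracer SpanId such as "0.1" maps to a single long this way; the mapping is one-way, so the original string cannot be recovered from the Jaeger side.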

Two reporting methods

Working with the Jaeger Agent

The Jaeger agent is a network daemon that listens for spans sent over UDP, which it batches and sends to the Collector. It is designed to be deployed to all hosts as an infrastructure component. The agent abstracts the routing and discovery of the Collectors away from the client.

The Jaeger Agent is designed to be deployed on every host as an infrastructure component, which decouples routing to and discovery of the Collector from the client. The Agent accepts only Thrift data sent over UDP, so using the Jaeger Agent requires UDPSender.

Reporting to the Collector over HTTP

When reporting through the Jaeger Agent over UDP, the Agent should be deployed on the host where the service runs so that data is not lost in transit. If that requirement cannot be met, data can be sent directly to the Collector over HTTP using HTTPSender.

PART. 3 SkyWalking data reporting

SkyWalking is an application performance monitoring tool for distributed systems, designed for microservices, cloud-native, and container-based architectures. It provides an all-in-one solution for distributed tracing, service mesh telemetry analysis, metric aggregation, and visualization. SkyWalking uses bytecode injection so that application code does not need to be modified, and it performs well. SkyWalking's receiver-trace module accepts trace data in SkyWalking format through gRPC and HTTP RESTful services. For reporting to SkyWalking, this project uses the HTTP RESTful service.

Model transformation comparison

Conversion of SegmentId, SpanId and ParentSpanId

In SOFATracer, SpanId is a string, but in SkyWalking, SpanId and ParentSpanId are ints, and the SpanIds within each segment are numbered from 0. The maximum SpanId is bounded by the configured maximum number of spans per segment. A SpanId must be assigned during conversion. Because each converted segment currently contains only one span, the ID of that span can be fixed to 0.

SegmentId uniquely identifies a segment. If two segments share a SegmentId, the later one overwrites the earlier one and spans are lost. The segmentId finally used is constructed as segmentId = traceId + SpanId hash + 0/1, where 0 and 1 represent server and client respectively. The client/server suffix is needed because of the server -> server situation in Dubbo and SOFARPC: the client span and the server span of one RPC call have the same SpanId and parentId. Without the suffix to tell them apart, the client span would be overwritten.

Handling Dubbo and SOFARPC

The basic model is client -> server -> client -> server -> and so on. In Dubbo and SOFARPC, however, there is a server -> server situation, in which the client span and the server span are identical except for their type.

  • parentSegmentId

To find parentSegmentId outside SOFARPC and Dubbo, the link follows server -> client and client -> server: the parent span of a client span can only be a server span, and the parent span of a server span can only be empty or a client span. The conversion is:

Server span: parentSegmentId = traceId + parentId hash + client(1)

Client span: parentSegmentId = traceId + parentId hash + server(0)

In SOFARPC and Dubbo, based on how the links of the two are displayed when reported with the SkyWalking Java Agent, the conversion is:

Server span: parentSegmentId = traceId + spanId hash + client(1)

Client span: parentSegmentId = traceId + parentId hash + server(0)
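The segmentId and parentSegmentId rules listed above can be sketched as follows. This is an illustration under assumptions: the class SegmentIdSketch, the rpc flag, the hex rendering of the hash, plain string concatenation, and the FNV-1a hash are all choices made for the example, since the text gives only the formula "traceId + hash + 0/1".

```java
import java.nio.charset.StandardCharsets;

class SegmentIdSketch {
    static final int SERVER = 0;
    static final int CLIENT = 1;

    // Assumed hash: 64-bit FNV-1a rendered as hex. The text only says "hash".
    static String hash(String id) {
        long h = 0xcbf29ce484222325L;
        for (byte b : id.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return Long.toHexString(h);
    }

    // segmentId = traceId + SpanId hash + 0/1 (0 = server, 1 = client).
    static String segmentId(String traceId, String spanId, int kind) {
        return traceId + hash(spanId) + kind;
    }

    // Returns null when the span has no parent (a root server span).
    static String parentSegmentId(String traceId, String spanId, String parentId,
                                  int kind, boolean rpc) {
        if (kind == SERVER) {
            if (rpc) {
                // Dubbo/SOFARPC: the calling client span shares this span's SpanId.
                return traceId + hash(spanId) + CLIENT;
            }
            if (parentId == null || parentId.isEmpty()) {
                return null;
            }
            // Otherwise the parent of a server span is a client span.
            return traceId + hash(parentId) + CLIENT;
        }
        // The parent of a client span is always a server span.
        return traceId + hash(parentId) + SERVER;
    }
}
```

With these rules, the parentSegmentId computed for an RPC server span equals the segmentId of the client span that shares its SpanId, which is what links the two segments together.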

  • Peer field and networkAddressUsedAtPeer field:

Peer field

In Dubbo, the peer field can be composed from two tags, remote.host and remote.port. In SOFARPC, the IP and port are both contained in one remote tag. This is because the span reported by the server cannot be used by the client.

networkAddressUsedAtPeer

With Dubbo and SOFARPC, the local IP address cannot be obtained from the span. Instead, the first valid IPv4 address of the local host is used. It carries no port number, so only the IP address is used in the peer field.

Displaying the topology graph

During link construction, the key fields are peer, networkAddressUsedAtPeer, parentService, parentServiceInstance, and parentEndpoint. peer and networkAddressUsedAtPeer indicate the peer address and the address the client used to call the current instance, respectively. These two fields connect the instances on the link; if they are missing, the link is broken. During conversion they are obtained by looking in the span tags or by taking the first valid IPv4 address of the host. The last three fields identify the parent instance node. If they are not set, an empty instance is generated, as shown in the following figure. Only TraceId, SpanId, parentId, sysBaggage, and bizBaggage can be propagated in the SOFATracer context. To show the topology, seven fields have been added to the SOFATracer context: service, serviceInstance, endpoint, parentService, parentServiceInstance, parentEndpoint, and peer. With them, information about the parent service can be obtained during conversion.

Asynchronous upload

The segment data is reported to the backend in JSON format over HTTP. The unit of reporting is a Message, and multiple segments are combined into one Message.

The process is shown in the figure below. After a span finishes, the converted segment is added to the segment buffer array, and another thread continuously moves data from the array into a Message. When the Message reaches its maximum size or the waiting time reaches the configured value, the data is sent. The default maximum message size is 2 MB.
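The size-or-timeout batching described above can be sketched as follows. The class SegmentBatcher, the parameters maxMessageBytes and maxWaitMillis, and the choice to join segments into a JSON array are assumptions for illustration; the sentMessages list stands in for the HTTP call, and time is passed in explicitly so the behavior is deterministic.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SegmentBatcher {
    private final int maxMessageBytes;
    private final long maxWaitMillis;
    private final List<String> pending = new ArrayList<>();
    private int pendingBytes = 2;        // the enclosing "[" and "]"
    private long firstQueuedAt = -1;
    // Stands in for the HTTP request; each entry is one Message that was sent.
    final List<String> sentMessages = new ArrayList<>();

    SegmentBatcher(int maxMessageBytes, long maxWaitMillis) {
        this.maxMessageBytes = maxMessageBytes;
        this.maxWaitMillis = maxWaitMillis;
    }

    // Called for each converted segment, already serialized as JSON.
    void add(String segmentJson, long nowMillis) {
        int size = segmentJson.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for a separator
        // Send first if this segment would push the Message past its maximum size.
        if (!pending.isEmpty() && pendingBytes + size > maxMessageBytes) {
            send();
        }
        if (pending.isEmpty()) {
            firstQueuedAt = nowMillis;
        }
        pending.add(segmentJson);
        pendingBytes += size;
    }

    // Called periodically by the flushing thread: send if the oldest segment waited too long.
    void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - firstQueuedAt >= maxWaitMillis) {
            send();
        }
    }

    private void send() {
        sentMessages.add("[" + String.join(",", pending) + "]");
        pending.clear();
        pendingBytes = 2;
    }
}
```

A batcher configured as new SegmentBatcher(2 * 1024 * 1024, 1000) would match the 2 MB default mentioned above with an assumed one-second wait.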

PART. 4 Stress testing

Test configuration

  • Windows 10

  • Memory 16G

  • Disk 500GB SSD

  • Intel(R) Core(TM) i7-7700HQ CPU @2.80GHz 2.80GHz

Test method

Deploy an invocation link with six services. Three groups were compared:

  • No sampling

  • 50% sampling

  • Full sampling

Jaeger test results

Several parameters in the test are set as follows:

Reporting via Jaeger Agent

Full sampling

50% sampling

No sampling

Reporting to the Jaeger Collector

Full sampling

50% sampling

No sampling

SkyWalking test results

Full sampling

50% sampling

No sampling

Test summary

With full sampling, reporting to SkyWalking has the lowest throughput, only 512.75/sec, which is about 14% lower than reporting via the Jaeger Agent and 11.89% lower than reporting to the Jaeger Collector. Comparing, for each method, the throughput with full sampling against no sampling: reporting via the Jaeger Agent reduces throughput by 14.6%, reporting to the Jaeger Collector by 17%, and reporting to SkyWalking by about 23%.

Link visualization for SOFATracer will be available in the next release.

“Takeaways”

I was very lucky to take part in the Open Source Summer. While reading the SOFATracer source code I learned many excellent design ideas and implementations, and I also learned a lot by imitating parts of the source code in my own implementation. During the project I also discovered some problems of my own. For example, when solving a problem I would start on an idea without first checking whether it was feasible, and this bad habit wasted a lot of time. This was my first time taking part in open source community activities, and through it I learned how an open source community operates. I will work harder to improve my coding ability and try to contribute to the open source community in the future.

In particular, I would like to thank Mr. Song Guolei for his patient guidance. During the project he answered many of my questions, and I learned a lot from him. I would also like to thank the SOFAStack community for their help throughout the process, and the organizers for providing this platform.

