Background

Sleuth provides a complete service tracing solution, including call-chain tracing and performance analysis. In a distributed system, Sleuth collects the tracing data and Zipkin presents it. Sleuth links together the nodes a request passes through, which makes problems easier to locate and troubleshoot. However, we recently ran into a problem while using Sleuth.

As shown in the figure, an external request first passes through the Gateway, which forwards it to the back-end services using an OkHttp utility class. When the request reaches the Gateway, Sleuth automatically generates a traceId. However, once the request reaches the back-end service serviceA or serviceB, the traceId generated in the Gateway is replaced by a new one, so the traceId returned to the client cannot be used to trace the whole call chain. Requests made through Feign, on the other hand, kept their traceId, which suggests that Sleuth itself is not at fault.

Problem analysis

Checking TraceFilter

The traceId returned to the client is carried in the HTTP response, so the first place to look is TraceFilter.

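import java.io.IOException;

import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

import brave.Span;
import brave.Tracer;
import org.springframework.cloud.sleuth.instrument.web.TraceWebServletAutoConfiguration;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.GenericFilterBean;

@Component
@Order(TraceWebServletAutoConfiguration.TRACING_FILTER_ORDER + 1)
public class TraceFilter extends GenericFilterBean {

    private final Tracer tracer;

    TraceFilter(Tracer tracer) {
        this.tracer = tracer;
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
        throws IOException, ServletException {
        Span currentSpan = this.tracer.currentSpan();
        if (currentSpan == null) {
            // No span on this thread: pass the request through untouched
            chain.doFilter(request, response);
            return;
        }
        // Expose the Sleuth-generated traceId to the client in a response header
        String traceId = currentSpan.context().traceIdString();
        ((HttpServletResponse) response).addHeader("TRACE-ID", traceId);
        chain.doFilter(request, response);
    }
}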

TraceFilter simply copies the traceId held by the Tracer into the HTTP response, so it is certainly not the cause. However, its @Order annotation refers to TraceWebServletAutoConfiguration. Looking up what that class does leads us to the main character of this article: Sleuth itself.

How Sleuth traces the traceId

There is plenty of material about Sleuth online, but most of it only explains how to use it. The official documentation already covers that well, so those articles add little. Since calls made through Feign keep their traceId, Sleuth evidently supports Feign, and it was natural to ask whether it also supports OkHttp. A search shows that Sleuth integrates with many components, including Feign, gRPC, Zuul, Redis, and messaging, as well as HTTP clients such as RestTemplate, WebClient, Netty HttpClient, and HttpClientBuilder. Unfortunately, OkHttp is not among them. Does that mean OkHttp cannot be used with Sleuth at all? To answer that, we need to look at Sleuth's source code.

Sleuth source analysis

In Spring Cloud Sleuth, traceId handling for web requests is built on a servlet Filter. However many layers are wrapped around it, the basic principle stays the same. So we go straight to the TracingFilter class in the source code and analyze its implementation.

  public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest httpRequest = (HttpServletRequest) request;
    HttpServletResponse httpResponse = servlet.httpResponse(response);

    // Prevent duplicate spans for the same request
    TraceContext context = (TraceContext) request.getAttribute(TraceContext.class.getName());
    if (context != null) {
      // A forwarded request might end up on another thread, so make sure it is scoped
      Scope scope = currentTraceContext.maybeScope(context);
      try {
        chain.doFilter(request, response);
      } finally {
        scope.close();
      }
      return;
    }

    Span span = handler.handleReceive(extractor, httpRequest);

    // Add attributes for explicit access to customization or span context
    request.setAttribute(SpanCustomizer.class.getName(), span.customizer());
    request.setAttribute(TraceContext.class.getName(), span.context());

    Throwable error = null;
    Scope scope = currentTraceContext.newScope(span.context());
    try {
      // any downstream code can see Tracer.currentSpan() or use Tracer.currentSpanCustomizer()
      chain.doFilter(httpRequest, httpResponse);
    } catch (IOException | ServletException | RuntimeException | Error e) {
      error = e;
      throw e;
    } finally {
      scope.close();
      if (servlet.isAsync(httpRequest)) { // we don't have the actual response, handle later
        servlet.handleAsync(handler, httpRequest, httpResponse, span);
      } else { // we have a synchronous response, so we can finish the span
        handler.handleSend(ADAPTER.adaptResponse(httpRequest, httpResponse), error, span);
      }
    }
  }
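The try/finally around chain.doFilter is what lets downstream code call Tracer.currentSpan(). The sketch below models that scoping idea with the JDK alone, assuming a ThreadLocal-based store similar in spirit to Brave's default CurrentTraceContext. SimpleCurrentTraceContext and FilterDemo are hypothetical names, and the trace context is reduced to a plain traceId string; this is not Brave's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of scoping: the current trace id lives in a
// ThreadLocal, and newScope() installs a value until the returned Scope is closed.
final class SimpleCurrentTraceContext {
    private final ThreadLocal<String> current = new ThreadLocal<>();

    /** AutoCloseable whose close() declares no checked exception. */
    interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    /** Trace id visible to code running on the current thread, or null. */
    String get() {
        return current.get();
    }

    /** Makes traceId current; closing the Scope restores the previous value. */
    Scope newScope(String traceId) {
        String previous = current.get();
        current.set(traceId);
        return () -> {
            if (previous == null) {
                current.remove();
            } else {
                current.set(previous);
            }
        };
    }
}

final class FilterDemo {
    /** Mimics the filter: open a scope, run the "chain", then close the scope. */
    static List<String> handle(SimpleCurrentTraceContext ctx, String traceId) {
        List<String> seen = new ArrayList<>();
        try (SimpleCurrentTraceContext.Scope scope = ctx.newScope(traceId)) {
            seen.add(ctx.get()); // stands in for downstream code calling currentSpan()
        }
        seen.add(String.valueOf(ctx.get())); // scope closed, so this is "null"
        return seen;
    }
}
```

Restoring the previous value rather than clearing it is what keeps nested scopes (for example a forwarded request) from wiping out the outer request's context.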

The span is created by Span span = handler.handleReceive(extractor, httpRequest). Following that call further leads to nextSpan:

  /** Creates a potentially noop span representing this request */
  Span nextSpan(TraceContextOrSamplingFlags extracted, Req request) {
    Boolean sampled = extracted.sampled();
    // only recreate the context if the http sampler made a decision
    if (sampled == null && (sampled = sampler.trySample(adapter, request)) != null) {
      extracted = extracted.sampled(sampled.booleanValue());
    }
    return extracted.context() != null
        ? tracer.joinSpan(extracted.context())
        : tracer.nextSpan(extracted);
  }

If extracted.context() is not null, joinSpan reuses the trace context, and therefore the traceId, carried by the incoming HTTP request; otherwise a new span, with a new traceId, is created. So the key is how the TraceContextOrSamplingFlags is created.
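To make the branch concrete, here is a simplified sketch of the decision nextSpan makes, with the extracted context reduced to an optional traceId string. ServerSpans, traceIdFor, and newId are hypothetical helpers, not Brave APIs, and the sketch ignores the sampling flags that the real code also handles.

```java
import java.security.SecureRandom;
import java.util.Optional;

// Hypothetical model of the join-or-new decision in nextSpan() (not Brave's code).
final class ServerSpans {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Random 16-hex-digit id, the size of a 64-bit B3 id. */
    static String newId() {
        long id;
        do {
            id = RANDOM.nextLong();
        } while (id == 0L); // an all-zero id is treated as invalid
        return String.format("%016x", id);
    }

    /** The traceId a server span would use for an incoming request. */
    static String traceIdFor(Optional<String> extractedTraceId) {
        // present: like joinSpan(extracted.context()), keep the caller's traceId
        // absent:  like nextSpan(extracted), start a fresh trace
        return extractedTraceId.orElseGet(ServerSpans::newId);
    }
}
```

In the Gateway scenario, the OkHttp request arrives at serviceA without any B3 headers, so it always takes the "absent" branch, which is exactly the symptom described in the background.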

  static final class ExtraFieldExtractor<C, K> implements Extractor<C> {
    final ExtraFieldPropagation<K> propagation;
    final Extractor<C> delegate;
    final Propagation.Getter<C, K> getter;

    ExtraFieldExtractor(ExtraFieldPropagation<K> propagation, Getter<C, K> getter) {
      this.propagation = propagation;
      this.delegate = propagation.delegate.extractor(getter); // the wrapped (delegate) extractor is created here
      this.getter = getter;
    }
    // ... rest of the class omitted
  }

The extractor that does the real work is the one created by B3Propagation:

    @Override public TraceContextOrSamplingFlags extract(C carrier) {
      if (carrier == null) throw new NullPointerException("carrier == null");

      // try to extract single-header format
      TraceContextOrSamplingFlags extracted = singleExtractor.extract(carrier);
      if (!extracted.equals(TraceContextOrSamplingFlags.EMPTY)) return extracted;

      // Start by looking at the sampled state as this is used regardless
      // Official sampled value is 1, though some old instrumentation send true
      String sampled = getter.get(carrier, propagation.sampledKey);
      Boolean sampledV = sampled != null
          ? sampled.equals("1") || sampled.equalsIgnoreCase("true")
          : null;
      boolean debug = "1".equals(getter.get(carrier, propagation.debugKey));

      String traceIdString = getter.get(carrier, propagation.traceIdKey);
      // It is ok to go without a trace ID, if sampling or debug is set
      if (traceIdString == null) return TraceContextOrSamplingFlags.create(sampledV, debug);

      // Try to parse the trace IDs into the context
      TraceContext.Builder result = TraceContext.newBuilder();
      if (result.parseTraceId(traceIdString, propagation.traceIdKey) // checks that the request carries X-B3-TraceId
          && result.parseSpanId(getter, carrier, propagation.spanIdKey) // checks that the request carries X-B3-SpanId
          && result.parseParentId(getter, carrier, propagation.parentSpanIdKey)) {
        if (sampledV != null) result.sampled(sampledV.booleanValue());
        if (debug) result.debug(true);
        return TraceContextOrSamplingFlags.create(result.build());
      }
      return TraceContextOrSamplingFlags.EMPTY; // trace context is malformed so return empty
    }

A TraceContextOrSamplingFlags holding a trace context is created only when all three parse calls above succeed; otherwise the method returns TraceContextOrSamplingFlags.EMPTY (or, when X-B3-TraceId is missing entirely, only the sampling flags). So the fix is clear: as long as the request carries the X-B3-TraceId, X-B3-SpanId, and X-B3-ParentSpanId headers, the back-end service should join the existing trace and the traceId should no longer be lost.
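The multi-header check can be modeled in plain Java to show why a request without these headers starts a new trace. B3HeaderExtractor and its Context type are hypothetical; unlike Brave, the sketch only validates id length and lowercase hex. It treats a missing parent id as acceptable (as we understand the B3 convention, a root span has no parent) but rejects a malformed one; treat that detail as an assumption.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical, simplified model of the multi-header B3 check (not Brave's code).
final class B3HeaderExtractor {

    /** Minimal stand-in for TraceContext. */
    static final class Context {
        final String traceId;
        final String spanId;
        final String parentId; // null when the header is absent

        Context(String traceId, String spanId, String parentId) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
        }
    }

    /** True if s is lowercase hex with one of the allowed lengths. */
    private static boolean isLowerHex(String s, int... allowedLengths) {
        if (s == null) return false;
        boolean lengthOk = false;
        for (int len : allowedLengths) {
            lengthOk |= s.length() == len;
        }
        return lengthOk && s.chars().allMatch(c -> (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f'));
    }

    /** A context only when trace id, span id and (if present) parent id are valid. */
    static Optional<Context> extract(Map<String, String> headers) {
        // HTTP header names are case-insensitive
        Map<String, String> h = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        h.putAll(headers);
        String traceId = h.get("X-B3-TraceId");
        String spanId = h.get("X-B3-SpanId");
        String parentId = h.get("X-B3-ParentSpanId");

        if (!isLowerHex(traceId, 16, 32)) return Optional.empty(); // server starts a new trace
        if (!isLowerHex(spanId, 16)) return Optional.empty();      // span id is required
        if (parentId != null && !isLowerHex(parentId, 16)) return Optional.empty(); // malformed
        return Optional.of(new Context(traceId, spanId, parentId));
    }
}
```

An empty result corresponds to the "new traceId" branch of nextSpan; a present one corresponds to joinSpan.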

The solution

Based on the above analysis, the solution is as follows:

  1. Obtain the current Tracer.
// Brave's static accessor; it returns null if no Tracing instance has been created
Tracer tracer = Tracing.currentTracer();
  2. Before OkHttp sends the request, add the X-B3-TraceId, X-B3-SpanId, and X-B3-ParentSpanId headers.
private void addTracer(Map<String, String> headers, Tracer tracer) {
    Span currentSpan = tracer.currentSpan();
    if (currentSpan == null) {
        return; // no active span, so there is no trace context to propagate
    }
    TraceContext context = currentSpan.context();
    headers.put("X-B3-TraceId", context.traceIdString());
    headers.put("X-B3-SpanId", context.spanIdString());
    if (StringUtils.isBlank(context.parentIdString())) {
        // root span: reuse the current span id as the parent id
        headers.put("X-B3-ParentSpanId", context.spanIdString());
    } else {
        headers.put("X-B3-ParentSpanId", context.parentIdString());
    }
}
  3. Verify that the result meets expectations.
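For step 3, a JDK-only simulation of the round trip can serve as a quick sanity check before testing against real services. It assumes the same header names and parent-id fallback as addTracer; B3RoundTrip and GatewayContext are hypothetical stand-ins for the Gateway and Brave's TraceContext, and the back-end check is reduced to "both trace and span ids are present".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation: the Gateway injects B3 headers, the back end reads them.
final class B3RoundTrip {
    /** Stand-in for the Gateway's current TraceContext. */
    static final class GatewayContext {
        final String traceId;
        final String spanId;
        final String parentId; // null for a root span

        GatewayContext(String traceId, String spanId, String parentId) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
        }
    }

    /** Gateway side: same header names and parent fallback as addTracer. */
    static void inject(GatewayContext ctx, Map<String, String> headers) {
        headers.put("X-B3-TraceId", ctx.traceId);
        headers.put("X-B3-SpanId", ctx.spanId);
        boolean noParent = ctx.parentId == null || ctx.parentId.trim().isEmpty();
        headers.put("X-B3-ParentSpanId", noParent ? ctx.spanId : ctx.parentId);
    }

    /** Back-end side: the traceId it would join, or null if it would start a new trace. */
    static String joinedTraceId(Map<String, String> headers) {
        String traceId = headers.get("X-B3-TraceId");
        String spanId = headers.get("X-B3-SpanId");
        return (traceId != null && spanId != null) ? traceId : null;
    }

    /** The expectation to verify: the traceId survives the hop unchanged. */
    static boolean traceIdSurvives(GatewayContext ctx) {
        Map<String, String> headers = new HashMap<>();
        inject(ctx, headers);
        return ctx.traceId.equals(joinedTraceId(headers));
    }
}
```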

Summary

  1. The same approach should also work for other HTTP clients that Sleuth does not support.
  2. Admittedly, this is not an elegant solution. A better one would be to wrap OkHttp calls the way Sleuth integrates RestTemplate and other components.
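The second point can be sketched as a decorator that adds the B3 headers to every outgoing call, so callers no longer need to remember to call addTracer. To stay runnable with the JDK alone, it uses a hypothetical HttpCall interface instead of OkHttp; with OkHttp, the same role could be played by an interceptor registered on the OkHttpClient. How the headers are produced (for example, with addTracer) is left to the supplier.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical client abstraction standing in for an OkHttp call.
interface HttpCall {
    String execute(String url, Map<String, String> headers);
}

// Decorator: every call made through it carries the current B3 headers.
final class TracingHttpCall implements HttpCall {
    private final HttpCall delegate;
    private final Supplier<Map<String, String>> currentB3Headers;

    TracingHttpCall(HttpCall delegate, Supplier<Map<String, String>> currentB3Headers) {
        this.delegate = delegate;
        this.currentB3Headers = currentB3Headers;
    }

    @Override
    public String execute(String url, Map<String, String> headers) {
        // Copy so the caller's map is never modified
        Map<String, String> withTrace = new HashMap<>(headers);
        // Caller-supplied headers win; trace headers are only added if missing
        currentB3Headers.get().forEach(withTrace::putIfAbsent);
        return delegate.execute(url, Collections.unmodifiableMap(withTrace));
    }
}
```

Wrapping the client once, for example where the OkHttp utility class is built, keeps tracing out of every call site, which is essentially what Sleuth's own integrations do for the clients it supports.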