This article is part of a series:

  • Quick start (1 minute): creating releases with the latest Docker Sentry-CLI
  • Quick start (30 seconds): uploading Source Maps with Docker Sentry-CLI
  • Sentry for React
  • Sentry for Vue
  • Sentry-CLI usage details
  • Sentry Web performance monitoring – Web Vitals
  • Sentry Web performance monitoring – Metrics
  • Sentry Web performance monitoring – Trends
  • Sentry Web front-end monitoring – best practices (official tutorial)
  • Sentry back-end monitoring – best practices (official tutorial)
  • Sentry monitoring – Discover: a big-data query and analysis engine
  • Sentry monitoring – Dashboards: large-screen data visualization
  • Sentry monitoring – Environments: separating event data by deployment environment
  • Sentry monitoring – Security Policy reporting
  • Sentry monitoring – Search
  • Sentry monitoring – Alerts
  • Sentry monitoring – Distributed Tracing

Welcome to part 1 of our series on Distributed Tracing for full-stack developers. In this series, we’ll learn the details of distributed tracing and how it can help you monitor the increasingly complex requirements of full-stack applications.

In the early days of the Web, writing web applications was simple. Developers used languages such as PHP to generate HTML on the server, communicated with a single relational database such as MySQL, and most interaction was driven by static HTML form components. While the debugging tools were primitive, understanding the flow of code execution was simple.

In today’s modern Web stack, that is no longer the case. Full-stack developers must write JavaScript that executes in the browser, interoperate with multiple database technologies, and deploy server-side code on different server architectures (e.g., serverless). Without the right tools, it is nearly impossible to understand how a user interaction in the browser relates to a 500 error deep in the server stack. Enter: distributed tracing.

(Me, trying to explain the bottlenecks in my 2021 web stack.)

Distributed tracing is a monitoring technique that links operations and requests that occur between multiple services. This allows developers to “trace” the path of an end-to-end request as it moves from one service to another, allowing them to pinpoint errors or performance bottlenecks in individual services that negatively affect the entire system.

In this article, we’ll learn more about the concept of distributed tracing, look at examples of end-to-end tracing in code, and see how trace metadata can be used to add valuable context to your logging and monitoring tools. When you’re done, you’ll not only know the basics of distributed tracing, but also how to apply tracing techniques to debug full-stack Web applications more effectively.

But first, let’s go back to the beginning: What is distributed tracing?

Distributed Tracing basics

Distributed tracing is a method of recording connected operations across multiple services. Typically, these operations are initiated by a request from one service to another, where a “request” can be an actual HTTP request, or it can be a job invoked through a task queue or some other asynchronous means.

Tracing consists of two basic components:

  • Span: describes an operation, or “work”, taking place on a service. A span can describe a broad operation (for example, the operation of a web server responding to an HTTP request) or it can describe the invocation of a single function.
  • Trace: represents the end-to-end journey of one or more connected spans. A trace is considered distributed when it includes spans performed on multiple services.
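As a concrete sketch of these two concepts, a span can be modeled as a plain record, and a trace as the set of spans sharing one trace identifier. The field names below are illustrative assumptions, not a standard schema:

```javascript
// Hypothetical span records; field names are illustrative, not a standard.
const spans = [
  { spanId: "a1", parentId: null, traceId: "t1", op: "submit form", service: "browser" },
  { spanId: "b2", parentId: "a1", traceId: "t1", op: "POST /api/v1/inviteUser", service: "web" },
  { spanId: "c3", parentId: "b2", traceId: "t1", op: "send email", service: "worker" },
];

// A trace is simply every span that shares the same traceId.
const trace = spans.filter((s) => s.traceId === "t1");

// The trace is "distributed" if its spans come from more than one service.
const services = new Set(trace.map((s) => s.service));
const isDistributed = services.size > 1;
console.log(isDistributed); // true
```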

Let’s look at a hypothetical example of distributed tracing.

The figure above illustrates how a trace starts in one service (a React application running in the browser) and continues through a call to the API web server and further into a background task worker. The spans in this figure are the work executed in each service, and each span can be “traced” back to the initial work started by the browser application. Finally, because these operations occur on different services, the trace is considered distributed.

Spans that describe a broad operation (for example, the full lifecycle of a web server responding to an HTTP request) are sometimes referred to as transaction spans, or even just transactions. We’ll talk more about transactions vs. spans in Part 2 of this series.

Trace and span identifiers

So far, we have identified the components to trace, but we have not yet described how the components are linked together.

First, each trace is uniquely identified by a Trace identifier. This is done by creating a unique, randomly generated value (UUID) in the root span — the initial action that starts the entire trace. In our example above, the root span appears in the browser application.

Second, each span must also be uniquely identified. This is done by creating a unique span identifier (or span_id) when the span begins its operation. This span_id creation happens for every span (or operation) that occurs within the trace.

Let’s revisit our hypothetical trace example. In the figure above, you’ll notice that the trace identifier uniquely identifies the trace, and that each span in the trace also has a unique span identifier.

However, generating trace_id and span_id is not enough. To actually connect these services, your application must propagate what is called a trace context when making a request from one service to another.

Trace context

A trace context usually consists of just two values:

  • Trace identifier (or trace_id): a unique identifier generated in the root span that identifies the entire trace. This is the same trace identifier we introduced in the previous section; it is propagated unchanged to every downstream service.
  • Parent identifier (or parent_id): the span_id of the parent span that produced the current operation.

The following figure shows how a request initiated in one service propagates the trace context downstream to the next service. You’ll notice that trace_id remains the same, while parent_id changes from request to request, pointing to the parent span where the latest operation was started.

With these two values, for any given operation, you can identify the originating (root) service and reconstruct, in order, all of the parent/ancestor operations that led to the current one.
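Given span records like the ones above, that reconstruction amounts to following parent_id links until you reach the root span (the one with no parent). This is an illustrative sketch, not a production implementation:

```javascript
// Illustrative: walk parentId links from any span back to the root span.
const spansById = new Map([
  ["a1", { spanId: "a1", parentId: null, op: "browser: submit form" }],
  ["b2", { spanId: "b2", parentId: "a1", op: "web: POST /inviteUser" }],
  ["c3", { spanId: "c3", parentId: "b2", op: "worker: send email" }],
]);

function ancestry(spanId) {
  const chain = [];
  for (let id = spanId; id != null; id = spansById.get(id).parentId) {
    chain.push(id);
  }
  return chain; // current span first, root span last
}

console.log(ancestry("c3")); // [ 'c3', 'b2', 'a1' ]
```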

Working Example (code demo)

Sample source code:

  • Github.com/getsentry/d…

To better understand this, let’s walk through a basic tracing implementation in which a browser application initiates a series of distributed operations connected by a trace context.

First, the browser application renders a form: in this case, an “Invite user” form. The form has a submit event handler that fires when the form is submitted. Let’s consider this submit handler our root span, which means that trace_id and span_id are generated when the handler is invoked.

Next, the handler does some work to collect the user-entered values from the form, and finally issues a fetch request to the /inviteUser API endpoint on our web server. As part of this fetch request, the trace context is passed as two custom HTTP headers: trace-id and parent-id (the span_id of the current span).

// browser app (JavaScript)
import uuid from 'uuid';

const traceId = uuid.v4();
const spanId = uuid.v4();

console.log('Initiate inviteUser POST request', `traceId: ${traceId}`);

fetch('/api/v1/inviteUser?email=' + encodeURIComponent(email), {
   method: 'POST',
   headers: {
       'trace-id': traceId,
       'parent-id': spanId,
   }
}).then((data) => {
   console.log('Success!');
}).catch((err) => {
   console.log('Something bad happened', `traceId: ${traceId}`);
});

Note that these are non-standard HTTP headers, used here for illustrative purposes. As part of the W3C trace context specification, there is an active effort to standardize a tracing HTTP header (traceparent), which is still at the Recommendation stage.

  • www.w3.org/TR/trace-co…
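For reference, the W3C header packs the trace context into a single traceparent value of the form version-traceId-parentId-flags, with each field hex-encoded. A minimal, non-validating parser might look like this (a sketch only; real implementations must validate field lengths and values):

```javascript
// Parse a W3C `traceparent` header: version-traceid-parentid-flags.
// This sketch only splits the fields; it performs no validation.
function parseTraceparent(header) {
  const [version, traceId, parentId, flags] = header.split("-");
  return { version, traceId, parentId, flags };
}

const ctx = parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
console.log(ctx.traceId);  // 4bf92f3577b34da6a3ce929d0e0e4736
console.log(ctx.parentId); // 00f067aa0ba902b7
```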

On the receiving side, the API Web Server processes the request and extracts tracing metadata from the HTTP request. It then queues up a job to send an email to the user and appends the trace context as part of the “meta” field in the job description. Finally, it returns a response with a 200 status code indicating that the method succeeded.

Note that while the server returned a successful response, the actual “work” is not complete until the background task worker picks up the newly queued job and actually sends the email.

At some point, the queue processor begins to process the queued email job. Again, the trace and parent identifiers are extracted, just as they were earlier in the web server.

// API Web Server
const Queue = require('bull');
const emailQueue = new Queue('email');
const uuid = require('uuid');

app.post("/api/v1/inviteUser", (req, res) => {
  const spanId = uuid.v4(),
    traceId = req.headers["trace-id"],
    parentId = req.headers["parent-id"];

  console.log(
    "Adding job to email queue",
    `[traceId: ${traceId}, parentId: ${parentId}, spanId: ${spanId}]`
  );

  emailQueue.add({
    title: "Welcome to our product",
    to: req.query.email,
    meta: {
      traceId: traceId,

      // the downstream span's parent_id is this span's span_id
      parentId: spanId,
    },
  });

  res.status(200).send("ok");
});

// Background Task Worker
emailQueue.process((job, done) => {
  const spanId = uuid.v4();
  const { traceId, parentId } = job.data.meta;

  console.log(
    "Sending email",
    `[traceId: ${traceId}, parentId: ${parentId}, spanId: ${spanId}]`
  );

  // actually send the email
  // ...

  done();
});

Distributed system Logging

You’ll notice that at each stage of our example, a logging call is made with console.log that also emits the current trace, span, and parent identifiers. In a perfectly synchronous world — one where every service could log to the same centralized logging tool — each of these logging statements would appear sequentially:

If an exception or error occurs during these operations, it is relatively simple to use these or additional logging statements to pinpoint the source. But the unfortunate reality is that these are distributed services, which means:

  • Web servers typically handle many concurrent requests. The web server may be performing work (and emitting logging statements) attributed to other requests.
  • Network latency affects the order of operations. Requests from upstream services may not arrive at their destinations in the order in which they were fired.
  • The background worker may have a backlog of jobs. By the time the queued job for this trace arrives, the worker may still have to finish jobs that were queued earlier.

In a more realistic example, our log call might look like this, reflecting multiple operations happening at the same time:

Without tracing metadata, it is impossible to understand the topology of which actions invoked which. But by emitting trace metadata on each logging call, you can quickly filter down to all of the logging calls in a trace by filtering on traceId, and reconstruct the exact order by examining the spanId and parentId relationships.
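As a sketch of that workflow, suppose each log record carries the trace metadata from our example. Filtering by traceId and then following parent/child links recovers the sequence. The record shapes are illustrative, and the sketch assumes a linear chain of spans (each span has at most one child):

```javascript
// Illustrative interleaved log records from several concurrent traces.
const logs = [
  { traceId: "t9", spanId: "x1", parentId: null, msg: "unrelated request" },
  { traceId: "t1", spanId: "a1", parentId: null, msg: "Initiate inviteUser POST request" },
  { traceId: "t1", spanId: "c3", parentId: "b2", msg: "Sending email" },
  { traceId: "t1", spanId: "b2", parentId: "a1", msg: "Adding job to email queue" },
];

// 1) Filter down to a single trace.
const traceLogs = logs.filter((l) => l.traceId === "t1");

// 2) Reconstruct order: start at the root (no parent), then follow children.
//    Assumes a linear chain; a real tool would build a tree here.
const ordered = [];
let parentId = null;
while (ordered.length < traceLogs.length) {
  const next = traceLogs.find((l) => l.parentId === parentId);
  ordered.push(next.msg);
  parentId = next.spanId;
}

console.log(ordered);
// [ 'Initiate inviteUser POST request', 'Adding job to email queue', 'Sending email' ]
```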

This is the power of distributed tracing: by attaching metadata describing the current operation (span id), the parent operation that produced it (parent id), and the overall trace (trace id), we can enrich logging and telemetry data to better understand the exact sequence of events occurring across distributed services.

Distributed tracing in the real world

Throughout this article, we’ve used a somewhat contrived example. In a real distributed tracing environment, you wouldn’t manually generate and pass all of the span and trace identifiers. Nor would you rely on console.log (or other logging) calls to emit trace metadata yourself. Instead, you’d use a proper tracing library to handle the instrumentation and emit the trace data for you.

OpenTelemetry

OpenTelemetry is a collection of open-source tools, APIs, and SDKs for instrumenting, generating, and exporting telemetry data from running software. It provides language-specific implementations for most popular programming languages, including browser JavaScript and Node.js.

  • opentelemetry.io/
  • Github.com/open-teleme…

Sentry

Sentry uses this telemetry in a number of ways. For example, Sentry’s performance monitoring feature set uses trace data to generate waterfall diagrams that illustrate the end-to-end latency of distributed service operations within a trace.

Sentry also uses trace metadata to enhance its error monitoring capabilities to understand how errors triggered in one service (such as the server backend) propagate to errors in another service (such as the front end).