Tempo is a new backend service for distributed tracing that Grafana Labs open sourced at ObservabilityCON 2020. Like Cortex and Loki, Tempo is designed to be highly scalable while keeping costs low. As mentioned earlier, tracing was the last missing piece of Grafana Labs' cloud-native observability stack, so today let's round out the picture by adding distributed tracing to Loki.

About Tempo

Tempo is essentially a trace storage backend: it accepts spans in a number of open source tracing protocols (Jaeger, Zipkin, OpenCensus, and so on), stores them in cheap object storage such as S3, and uses the TraceID to correlate traces with other monitoring systems such as Loki and Prometheus.

You can see that Tempo's architecture is still divided into the Distributor, Ingester, Querier, Tempo-Query, and Compactor. Readers familiar with Loki and Cortex can probably guess what each one does from the name alone; for everyone else, here is a quick rundown of each module (a sample Tempo configuration tying them together follows the list):

  • distributor

Listens on several ports accepting trace data in the Jaeger, Zipkin, and OpenTelemetry protocols, hashes it by TraceID, maps it onto a hash ring, and sends it to the ingesters for storage. The distributor currently supports the following protocols:

Protocol                   Port
OpenTelemetry              55680
Jaeger – Thrift Compact    6831
Jaeger – Thrift Binary     6832
Jaeger – Thrift HTTP       14268
Jaeger – gRPC              14250
Zipkin                     9411
  • ingester

Responsible for cutting trace data into blocks and writing them to backend storage (GCS, S3), for caching (Memcached), and for building the index.

  • querier

Responsible for retrieving trace data from the ingesters and backend storage, and for exposing an API to look up traces.

  • compactor

Compacts blocks in backend storage to reduce the total number of blocks.

  • tempo-query

A visual query interface for Tempo based on Jaeger Query, which lets you look up trace data stored in Tempo.
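
Tempo's configuration file is not shown in this article; purely as an illustration of how the modules above fit together, a minimal single-binary tempo.yaml might look roughly like the sketch below. The receiver list mirrors the port table above, while the bucket, endpoint, and retention values are placeholders rather than the settings used in this demo:

# tempo.yaml - a hypothetical minimal single-binary configuration (values are placeholders)
server:
  http_listen_port: 3100

distributor:
  receivers:                 # the protocols/ports from the table above
    jaeger:
      protocols:
        thrift_compact:      # 6831
        thrift_binary:       # 6832
        thrift_http:         # 14268
        grpc:                # 14250
    zipkin:                  # 9411
    otlp:
      protocols:
        grpc:                # 55680

ingester:
  trace_idle_period: 10s     # flush a trace into a block after it has been idle this long

compactor:
  compaction:
    block_retention: 48h     # how long compacted blocks are kept

storage:
  trace:
    backend: s3              # cheap object storage; gcs and local disk also work
    s3:
      bucket: tempo-traces   # placeholder bucket
      endpoint: s3.amazonaws.com
    wal:
      path: /tmp/tempo/wal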

Loki distributed tracing

The docker-compose files used in this example can be found on GitHub: github.com/CloudXiaoba…

On the Loki side

Before we do that, let’s take a look at what the Loki documentation says:

The tracing_config block configures tracing for Jaeger. Currently limited to disable auto-configuration per environment variables only.

You can see that Loki's tracing support currently targets Jaeger, that it is enabled by default, and that the Jaeger settings can only be read from environment variables. Here is the docker-compose example:

query-frontend:
  image: grafana/loki:1.6.1
  runtime: runc
  scale: 2
  environment:
    - JAEGER_AGENT_HOST=tempo                        # tempo address
    - JAEGER_ENDPOINT=http://tempo:14268/api/traces
    - JAEGER_SAMPLER_TYPE=const                      # sampler type
    - JAEGER_SAMPLER_PARAM=100                       # sample rate 100

API Gateway

The API gateway is not a native Loki component, but in a distributed Loki deployment there needs to be a unified gateway to route requests to the right services. Since stock Nginx does not support OpenTracing, Xiaobai built an Nginx 1.14 image with the Jaeger module baked in, which sits at Loki's entry point to generate traces and collect logs.

gateway:
  image: quay.io/cloudxiaobai/nginx-opentracing:1.14.0
  runtime: runc
  restart: always
  ports:
    - 3100:3100
  volumes:
    - ./nginx.conf:/etc/nginx/nginx.conf
    - ./jaeger-config.json:/etc/jaeger-config.json
    - 'gateway_trace_log:/var/log/nginx/'

For this OpenTracing-enabled Nginx, we modify the nginx.conf configuration file as follows:

...
# load the OpenTracing module
load_module modules/ngx_http_opentracing_module.so;

http {
    # enable OpenTracing
    opentracing on;
    # load the Jaeger tracer plugin with its configuration
    opentracing_load_tracer /usr/local/lib/libjaegertracing_plugin.so /etc/jaeger-config.json;
    # record the trace ID in the log format
    log_format opentracing '"traceID":"$opentracing_context_uber_trace_id"';

    server {
        listen 3100 default_server;

        location /ready {
            opentracing_operation_name $uri;
            opentracing_trace_locations off;
            opentracing_propagate_context;
            proxy_pass http://querier:3100/ready;
        }
    }
}

For the complete nginx.conf, see the docker-compose repository.
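
One detail the excerpt above leaves out is how the trace-annotated access log ends up in the gateway_trace_log volume mounted earlier. Presumably the full nginx.conf ties the opentracing log_format to an access log file along these lines (this is my assumption; the exact path and fields live in the repository):

# inside the http {} block: write access logs, including the traceID field,
# to the volume mounted at /var/log/nginx/ so they can be shipped to Loki
access_log /var/log/nginx/trace.log opentracing;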

In addition, Nginx needs a jaeger-config.json so that trace data is forwarded to the Jaeger agent for processing.

{" service_Name ": "gateway", \\ service Name "diabled": false, "Reporter ": {"logSpans": true, "localAgentHostPort": "Jaeger-agent :6831" \\jaeger-agent address}, "sampler": {"type": "const", "param": "100" \\ sampling rate}}Copy the code

To make the demo easier to follow, Xiaobai configured the sampling rate at 100%.

Finally, we run a jaeger-agent that collects trace information from the API gateway and forwards it to Tempo. It is configured as follows:

jaeger-agent:
  image: jaegertracing/jaeger-agent:1.20
  runtime: runc
  restart: always
  command: ["--reporter.grpc.host-port=tempo:14250"]
  ports:
    - "5775:5775/udp"
    - "6831:6831/udp"
    - "6832:6832/udp"
    - "5778:5778"

Why not have the API gateway send its trace data to Tempo directly instead of going through jaeger-agent? Xiaobai simply finds the agent approach more flexible.
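
For completeness: the Tempo service itself is not shown in the snippets above, even though the gateway and jaeger-agent both point at a host named tempo. Its entry in the same docker-compose file would look roughly like the sketch below; the image tag and config file path are my assumptions, so check the repository for the real definition.

tempo:
  image: grafana/tempo:latest              # assumed tag; the repository pins a concrete version
  runtime: runc
  command: ["-config.file=/etc/tempo.yaml"]
  volumes:
    - ./tempo.yaml:/etc/tempo.yaml         # e.g. the configuration sketched earlier
  # other services reach it over the compose network as tempo:14268 (Jaeger Thrift HTTP)
  # and tempo:14250 (Jaeger gRPC), matching the distributor ports listed above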

At this point the configuration of Loki distributed tracing is complete. Next, run all the services with docker-compose up -d.

On the Grafana side

Once all the containers are up and running, open Grafana and add two data sources:

  • Add the Tempo data source

  • Add the Loki data source and parse the TraceID from the API gateway logs

Here a regular expression extracts the TraceID from the API gateway log lines so it can be matched.
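
If you prefer provisioning over clicking through the Grafana UI, the same derived field can also be declared in a data source provisioning file. The sketch below is only an illustration: the data source names, URLs, and uid are placeholders, and the regex captures the trace id emitted by the Nginx log_format shown earlier.

apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    uid: tempo                          # referenced by the derived field below
    url: http://tempo:3100              # placeholder query endpoint
  - name: Loki
    type: loki
    url: http://gateway:3100            # the API gateway in front of Loki
    jsonData:
      derivedFields:
        - name: TraceID
          # capture the trace id from '"traceID":"<traceid>:<spanid>:<parent>:<flags>"'
          matcherRegex: '"traceID":"(\w+)"'
          url: '$${__value.raw}'        # $$ escapes the literal $ in provisioning files
          datasourceUid: tempo          # turns the field into a link to the Tempo data source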

Experiencing Tempo

In Explore, select the Loki data source and query trace.log to pull up the API gateway logs.

From the parsed fields we can see that Grafana has extracted a 16-character string from the API gateway log as the TraceID and linked it to the Tempo data source; clicking the Tempo button jumps straight to the trace information, which looks like this:

Expanding the trace, we can see that a Loki query request passes through the following components:

gateway -> query-frontend -> querier -> ingester
                                    |-> SeriesStore.GetChunkRefs

It is clear that most of the query time is spent in the ingester: the logs being queried have not yet been flushed to storage, so the querier has to fetch the log data from the ingesters.

Let’s look at another example of Loki receiving logs:

According to the trace, when the log collector pushes logs to Loki, the request passes through the following components:

gateway -> distributor -> ingester

We can also see that the pushed log stream was handled by two ingester instances, with no significant difference in processing time between them.

Conclusion

The integration between logging and tracing is not yet as smooth as what was shown at ObservabilityCON 2020, but according to the conference, tighter links between trace <-> log, metrics <-> trace, and metrics <-> log should be released soon. Grafana looks set to become the king of cloud observability (despite the extra cost of learning its various query languages).


Want a better cloud-native log query experience? Check out our open source project Dagger:

Github.com/CloudmindsR…


Follow the WeChat public account "Cloud Native Xiaobai" and reply [Enter group] to join the Loki learning group.