Spring Cloud Alibaba Series: Sleuth (Distributed Tracing)

I. Foreword

In a large distributed system, the application is broken down into many modules. Each module is responsible for a different function, and together they compose a system that provides rich functionality.

In this architecture, a single request often involves multiple services. Internet applications are built from different sets of software modules, which may be developed by different teams, implemented in different programming languages, and deployed on thousands of servers across multiple data centers. This means the architecture faces some problems:

  • How to locate problems quickly?
  • How to determine the scope of a fault?
  • How to sort out service dependencies and judge whether they are reasonable?
  • How to analyze link performance and plan capacity in real time?

To solve the above problems, industry experts proposed a solution: distributed tracing.

Distributed tracing reconstructs a distributed request into a complete call link, enabling logging, performance monitoring, and a centralized view of all the calls that make up the request.

There are many concrete implementations; here we use Sleuth + Zipkin as packaged by Spring Cloud. Since the Spring Cloud Alibaba stack does not provide a tracing component of its own, we can combine these two components to implement a tracing solution.

II. Sleuth Introduction

2.1 Basic Concepts

Spring Cloud Sleuth borrows heavily from the design of Google's Dapper. Let's first go over the terminology and related concepts in Sleuth:

  • Span: the basic unit of work. To measure the latency of each processing unit, when a request arrives at a service component a unique identifier (SpanId) marks the start, progress, and end of the request there. From the span's start and end timestamps we can calculate how long the call took, and the span also records metadata such as the event name.
  • Trace: a group of spans with the same TraceId, connected together into a tree structure. To trace a request, when it arrives at the entry endpoint of the distributed system the tracing framework only needs to create a unique identifier (TraceId) for it, and it keeps passing that value along as the request flows through the system until the whole request returns. We can then use this unique identifier to string all the spans together into a complete call link.
  • Annotation: records an event at a point in time. The important annotations used internally are:
    cs (Client Send): the client sends a request; this marks the start of a span.
    sr (Server Received): the server receives the request and starts processing it. sr - cs = the network latency of the call.
    ss (Server Send): the server finishes processing and sends the response back. ss - sr = the request processing time on the server.
    cr (Client Received): the client receives the server's response; this marks the end of the span. cr - cs = the total time of the request.
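
For example, suppose that for one span (with made-up relative timestamps) cs = 0 ms, sr = 10 ms, ss = 110 ms, and cr = 130 ms. Then sr - cs = 10 ms of network latency, ss - sr = 100 ms of server-side processing, and cr - cs = 130 ms for the request as a whole.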

2.2 Sleuth in Practice

First, here is the project structure used for the test:

Project name  | Port | Description
trace-test    |      | POM project, the parent project
trace-common  |      | JAR project, a common API module containing the models, Feign clients, and the Sleuth dependency
user-service  | 9001 | User microservice; depends on trace-common, registered with Nacos
order-service | 9002 | Order microservice; depends on trace-common, registered with Nacos
gate-service  | 9090 | Gateway microservice, registered with Nacos

Readers who are not yet familiar with gateways can first read an introduction to the gateway component.

Test flow: requesting the order interface returns the order information plus the user information associated with the order. The call chain is gate-service -> order-service -> user-service.
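
For context, here is a minimal sketch of the gateway route that could map the /order-api prefix onto the order microservice. The article does not show the gateway configuration, so the route id, predicate path, and StripPrefix filter below are assumptions based on the request URL used later:

spring:
  cloud:
    gateway:
      routes:
        - id: order-route            # hypothetical route id
          uri: lb://order-service    # load-balance to order-service instances registered in Nacos
          predicates:
            - Path=/order-api/**     # matches e.g. /order-api/order/getOrderInfo/1
          filters:
            - StripPrefix=1          # remove the /order-api prefix before forwarding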

The order microservice invokes the user microservice using the OpenFeign component. To keep the article brief, only part of the code is shown:

import java.util.HashMap;
import java.util.Map;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/order")
public class OrderController {

    @Autowired
    private UserServiceFeign userServiceFeign;

    private static Map<Integer, Order> orderMap;

    static {
        orderMap = new HashMap<>();
        orderMap.put(1, new Order(1, 1, 10.0));
        orderMap.put(2, new Order(2, 2, 20.0));
        orderMap.put(3, new Order(3, 3, 30.0));
    }

    @RequestMapping("/getOrderInfo/{orderId}")
    public Map<String, Object> getOrderInfo(@PathVariable Integer orderId) {
        Map<String, Object> result = new HashMap<>();
        // Simulate a database query
        Order order = orderMap.get(orderId);
        if (order != null) {
            Integer userId = order.getUserId();
            // Call the user microservice through the Feign client
            User user = this.userServiceFeign.findById(userId);
            // Order information
            result.put("order", order);
            // User information
            result.put("user", user);
        }
        return result;
    }
}
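
The UserServiceFeign client referenced above is not shown in the article; the following is a minimal sketch of what it might look like. The service name user-service matches the table above, but the request path and method mapping are assumptions:

import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

// Hypothetical Feign client for the user microservice; the path is an assumption.
@FeignClient(name = "user-service")
public interface UserServiceFeign {

    @GetMapping("/user/findById/{userId}")
    User findById(@PathVariable("userId") Integer userId);
}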

To collect information about the call chain, we need to add the Sleuth dependency. It is introduced in the trace-common project:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
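
Note that the starter above declares no version: the versions are assumed to be managed by a Spring Cloud BOM imported in the parent trace-test project, roughly like the sketch below (the release train shown is illustrative; it must match your Spring Boot version):

<dependencyManagement>
    <dependencies>
        <!-- Illustrative Spring Cloud BOM import -->
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Hoxton.SR12</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>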

Start the user, order, and gateway services in turn, then request the order interface: http://localhost:9090/order-api/order/getOrderInfo/1.

After the request succeeds, the call-chain log can be seen in the console of the order microservice:

The bracketed fields are, in order: the microservice name, the traceId, the spanId, and whether the trace result is exported to a third-party platform.
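
For illustration, a Sleuth-enhanced log line follows the pattern [application name, traceId, spanId, export flag]; on the order service it might look roughly like this (the ids here are made up):

INFO [order-service,96f95a0dd81fe3ab,1483b62bcf1aa2cd,true] 12331 --- [nio-9002-exec-1] OrderController : ...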

Of course, we could write this information to log files, but reading log files is not a good way to work. As the number of microservices grows, so does the number of log files, which makes troubleshooting harder. At this point we need Zipkin to aggregate the logs, present them visually, and support full-text search.

III. Zipkin Introduction

3.1 Basic Introduction

Zipkin is an open-source project from Twitter, based on Google's Dapper. It aims to collect timing data from services to solve latency problems in microservice architectures, and covers data collection, storage, lookup, and presentation.

We can use it to collect request-chain trace data from each server and query that data through the REST API it provides, giving us a monitoring program for the distributed system, so that rising latency can be detected in time and the root of performance bottlenecks can be located.

3.2 Architecture Introduction

Zipkin's infrastructure consists of four core components:

  1. Collector: the collector component. It processes trace information sent from external systems and converts it into the Span format handled internally by Zipkin for subsequent storage, analysis, and presentation.
  2. Storage: the storage component. It handles the trace information received by the collector. By default this information is kept in memory; we can change the storage policy and use another storage component to persist trace data to a database.
  3. RESTful API: the API component. It provides external access interfaces, for example so that clients can display trace information, or so that external systems can integrate with it for monitoring.
  4. Web UI: the UI component, an upper-layer application built on the API component. Through the UI, users can easily and intuitively query and analyze trace information.
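
To make the API component concrete, here are two of the endpoints the Zipkin v2 REST API exposes once the server is running on its default port (the trace id below is made up):

curl http://127.0.0.1:9411/api/v2/services                 # list the service names that have reported spans
curl http://127.0.0.1:9411/api/v2/trace/96f95a0dd81fe3ab   # fetch all spans belonging to one trace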

Overall, Zipkin has two parts: the Zipkin server and the Zipkin clients (the microservice projects). Each client is configured with the server's URL. When a call occurs between services, it is picked up by the Sleuth listener configured in the microservice, which generates the corresponding Trace and Span information and sends it to the server.
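
Under the hood, Sleuth propagates this trace context between services using the B3 HTTP headers, so a downstream request carries headers roughly like these (the ids are made up):

X-B3-TraceId: 96f95a0dd81fe3ab
X-B3-SpanId: 1483b62bcf1aa2cd
X-B3-ParentSpanId: 96f95a0dd81fe3ab
X-B3-Sampled: 1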

3.3 Zipkin in Practice

  1. Download Zipkin:
https://search.maven.org/remote_content?g=io.zipkin&a=zipkin-server&v=LATEST&c=exec
  2. Start the Zipkin service:
java -jar zipkin-server-2.23.2-exec.jar
  3. Add the Zipkin client dependency to the user, order, and gateway microservices:
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>
  4. Modify the configuration of the user, order, and gateway microservices:
spring:
  zipkin:
    base-url: http://127.0.0.1:9411/ # after a call occurs, trace information is sent to the Zipkin server at this URL
    discovery-client-enabled: false
  sleuth:
    sampler:
      probability: 1.0 # sampling rate: 1.0 means 100% of requests are traced
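
A probability of 1.0 is convenient for this demo because every request is sampled and reported; in production a lower rate (Sleuth's default is 0.1, i.e. 10%) is usually preferred, since reporting every span adds network and storage overhead.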

Start the user, order, and gateway microservices in sequence, then request the order interface: http://localhost:9090/order-api/order/getOrderInfo/1.

We then visit Zipkin's monitoring page at http://127.0.0.1:9411/, as shown below:

The call chain of the interface we just invoked has been recorded; click "SHOW" to see the details.

3.4 Data Persistence

The Zipkin server keeps trace data in memory by default, which is not suitable for production. Fortunately, it supports persisting trace data to a MySQL database or to Elasticsearch.

If you want to learn about other persistence methods, see the references at the end of the article.
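
For example, assuming an Elasticsearch instance is reachable at 127.0.0.1:9200, the server could be started against Elasticsearch instead of MySQL like this (a sketch using Zipkin's standard STORAGE_TYPE and ES_HOSTS settings):

java -jar zipkin-server-2.23.2-exec.jar --STORAGE_TYPE=elasticsearch --ES_HOSTS=http://127.0.0.1:9200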

  1. Create a database named zipkin and execute the following SQL in it:
CREATE TABLE IF NOT EXISTS zipkin_spans (
  `trace_id_high` BIGINT NOT NULL DEFAULT 0 COMMENT 'If non zero, this means the trace uses 128 bit traceIds instead of 64 bit',
  `trace_id` BIGINT NOT NULL,
  `id` BIGINT NOT NULL,
  `name` VARCHAR(255) NOT NULL,
  `remote_service_name` VARCHAR(255),
  `parent_id` BIGINT,
  `debug` BIT(1),
  `start_ts` BIGINT COMMENT 'Span.timestamp(): epoch micros used for endTs query and to implement TTL',
  `duration` BIGINT COMMENT 'Span.duration(): micros used for minDuration and maxDuration query',
  PRIMARY KEY (`trace_id_high`, `trace_id`, `id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED CHARACTER SET=utf8 COLLATE utf8_general_ci;

ALTER TABLE zipkin_spans ADD INDEX(`trace_id_high`, `trace_id`) COMMENT 'for getTracesByIds';
ALTER TABLE zipkin_spans ADD INDEX(`name`) COMMENT 'for getTraces and getSpanNames';
ALTER TABLE zipkin_spans ADD INDEX(`remote_service_name`) COMMENT 'for getTraces and getRemoteServiceNames';
ALTER TABLE zipkin_spans ADD INDEX(`start_ts`) COMMENT 'for getTraces ordering and range';

CREATE TABLE IF NOT EXISTS zipkin_annotations (
  `trace_id_high` BIGINT NOT NULL DEFAULT 0 COMMENT 'If non zero, this means the trace uses 128 bit traceIds instead of 64 bit',
  `trace_id` BIGINT NOT NULL COMMENT 'coincides with zipkin_spans.trace_id',
  `span_id` BIGINT NOT NULL COMMENT 'coincides with zipkin_spans.id',
  `a_key` VARCHAR(255) NOT NULL COMMENT 'BinaryAnnotation.key or Annotation.value if type == -1',
  `a_value` BLOB COMMENT 'BinaryAnnotation.value(), which must be smaller than 64KB',
  `a_type` INT NOT NULL COMMENT 'BinaryAnnotation.type() or -1 if Annotation',
  `a_timestamp` BIGINT COMMENT 'Used to implement TTL; Annotation.timestamp or zipkin_spans.timestamp',
  `endpoint_ipv4` INT COMMENT 'Null when Binary/Annotation.endpoint is null',
  `endpoint_ipv6` BINARY(16) COMMENT 'Null when Binary/Annotation.endpoint is null, or no IPv6 address',
  `endpoint_port` SMALLINT COMMENT 'Null when Binary/Annotation.endpoint is null',
  `endpoint_service_name` VARCHAR(255) COMMENT 'Null when Binary/Annotation.endpoint is null'
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED CHARACTER SET=utf8 COLLATE utf8_general_ci;

ALTER TABLE zipkin_annotations ADD UNIQUE KEY(`trace_id_high`, `trace_id`, `span_id`, `a_key`, `a_timestamp`) COMMENT 'Ignore insert on duplicate';
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id_high`, `trace_id`, `span_id`) COMMENT 'for joining with zipkin_spans';
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id_high`, `trace_id`) COMMENT 'for getTraces/ByIds';
ALTER TABLE zipkin_annotations ADD INDEX(`endpoint_service_name`) COMMENT 'for getTraces and getServiceNames';
ALTER TABLE zipkin_annotations ADD INDEX(`a_type`) COMMENT 'for getTraces and autocomplete values';
ALTER TABLE zipkin_annotations ADD INDEX(`a_key`) COMMENT 'for getTraces and autocomplete values';
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id`, `span_id`, `a_key`) COMMENT 'for dependencies job';

CREATE TABLE IF NOT EXISTS zipkin_dependencies (
  `day` DATE NOT NULL,
  `parent` VARCHAR(255) NOT NULL,
  `child` VARCHAR(255) NOT NULL,
  `call_count` BIGINT,
  `error_count` BIGINT,
  PRIMARY KEY (`day`, `parent`, `child`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED CHARACTER SET=utf8 COLLATE utf8_general_ci;
  2. Restart the Zipkin service, adding the MySQL connection parameters:
java -jar zipkin-server-2.23.2-exec.jar --STORAGE_TYPE=mysql --MYSQL_HOST=127.0.0.1 --MYSQL_TCP_PORT=3306 --MYSQL_DB=zipkin --MYSQL_USER=root --MYSQL_PASS=tiger

Note: adjust the MySQL connection settings to match your own environment.

Request the order interface again, then check the database:

The call-chain information of the request has been recorded in the database.

IV. References

openzipkin/zipkin: https://github.com/openzipkin/zipkin