Introduction: This article introduces the latest Jaeger plug-in backend interface and how to develop against it, so that readers can follow along step by step and build a Jaeger plug-in of their own. SLS has also launched support for Jaeger, and you are welcome to try it out.

With the adoption of cloud native and microservices, service monitoring is becoming more and more important. In a medium-sized microservice deployment, operations engineers cannot reconstruct the call path of a request and the execution of each downstream service from logs alone, let alone locate and analyze the service behind an abnormal response. Development and operations engineers need a monitoring tool that can reconstruct the call path of every request along with the execution time of each service. This is the need that distributed tracing systems were built to meet.

In recent years a number of excellent commercial products have appeared, usually referred to as APM (Application Performance Monitoring). Domestic commercial vendors include AliYun ARMS, Tianyun, Bo Rui, Yunhui, etc., while well-known foreign vendors include AppDynamics, Dynatrace, etc. Their products are mature and adapt to a wide range of scenarios. Open source has excellent solutions as well, such as CNCF Jaeger, Apache SkyWalking, Cat, Pinpoint and so on. Jaeger, as a graduated CNCF project, is usually the preferred monitoring solution for operations engineers in cloud native scenarios.

The Jaeger project was developed by Uber in 2015. In 2017 Jaeger was accepted into the Cloud Native Computing Foundation (CNCF) as an incubating project, and in 2019 it officially graduated. Below is a diagram of the Jaeger architecture. The figure contains two architectural patterns, which are roughly the same except that the second adds Kafka as a buffer to absorb traffic peaks. Jaeger consists of the Client, Agent, Collector, DB, UI and other components. Jaeger also supports a variety of storage backends, including memory, Badger, Cassandra, Elasticsearch and the gRPC plug-in.

Today we will talk about the gRPC plug-in, a powerful feature that is easily overlooked. In a nutshell, the gRPC plug-in provides the ability to export Trace data from the Jaeger system. With this capability, developers can easily connect Trace data to a backend service that stores and analyzes traces. That service can then run secondary analysis and processing on the traces, such as root cause analysis, anomaly detection and alerting, to help operations and development engineers find and locate potential problems in the system.

Jaeger plug-in development process

To better understand Jaeger plug-in development, it helps to first cover how the gRPC plug-in is implemented underneath. The Jaeger gRPC plug-in is built on the HashiCorp go-plugin framework (referred to below as Go Plugin). Next we introduce Go Plugin and the plug-in development process.

Go Plugin is open-sourced by HashiCorp. It follows the open-closed principle: the upper-level business logic is fixed behind interfaces, and the business is extended by swapping in different RPC service implementations behind those interfaces. At present, Go Plugin has two types of plug-ins, a net/rpc plug-in and a gRPC plug-in (GRPCPlugin). The clients of the two types differ in the underlying call: one goes through net/rpc and the other through a gRPC service. Each type provides a server-side method and a client-side method (Server and Client for net/rpc, GRPCServer and GRPCClient for gRPC). The server-side method serves as the stub for the server, which invokes the implementation of the interface when it receives a request. The client-side method acts as a factory that produces the client's implementation object of the interface.
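The stub pattern described above can be sketched with nothing but the Go standard library. The following is a minimal illustration, not Go Plugin's own code: it uses net/rpc over an in-memory pipe, and the names memKV, PutArgs, KVServer, KVClient and connect are hypothetical, chosen only for this example.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
	"sync"
)

// KV is the business interface that the host and the plug-in agree on.
type KV interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, error)
}

// memKV is a hypothetical in-memory implementation living on the server side.
type memKV struct {
	mu   sync.Mutex
	data map[string][]byte
}

func (m *memKV) Put(key string, value []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
	return nil
}

func (m *memKV) Get(key string) ([]byte, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	v, ok := m.data[key]
	if !ok {
		return nil, fmt.Errorf("key %q not found", key)
	}
	return v, nil
}

// PutArgs bundles the arguments of Put, since net/rpc passes one argument value per call.
type PutArgs struct {
	Key   string
	Value []byte
}

// KVServer is the server stub: it receives a request and invokes the implementation.
type KVServer struct{ Impl KV }

func (s *KVServer) Put(args PutArgs, ok *bool) error {
	*ok = true
	return s.Impl.Put(args.Key, args.Value)
}

func (s *KVServer) Get(key string, value *[]byte) (err error) {
	*value, err = s.Impl.Get(key)
	return err
}

// KVClient is the client stub: it implements KV by forwarding every call over RPC.
type KVClient struct{ client *rpc.Client }

func (c *KVClient) Put(key string, value []byte) error {
	var ok bool
	return c.client.Call("KV.Put", PutArgs{Key: key, Value: value}, &ok)
}

func (c *KVClient) Get(key string) (value []byte, err error) {
	err = c.client.Call("KV.Get", key, &value)
	return value, err
}

// connect wires the two stubs together over an in-memory pipe and returns
// the client-side KV. A real plug-in would talk to a child process instead.
func connect(impl KV) (KV, error) {
	server := rpc.NewServer()
	if err := server.RegisterName("KV", &KVServer{Impl: impl}); err != nil {
		return nil, err
	}
	serverConn, clientConn := net.Pipe()
	go server.ServeConn(serverConn)
	return &KVClient{client: rpc.NewClient(clientConn)}, nil
}

func main() {
	kv, err := connect(&memKV{data: map[string][]byte{}})
	if err != nil {
		fmt.Println("connect failed:", err)
		return
	}
	if err := kv.Put("hello", []byte("world")); err != nil {
		fmt.Println("put failed:", err)
		return
	}
	value, err := kv.Get("hello")
	fmt.Println(string(value), err)
}
```

The caller only ever sees the KV interface, so replacing the in-memory pipe with a connection to another process, or net/rpc with gRPC, would not change the business code. That is the open-closed property the paragraph above refers to.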

During startup, Go Plugin launches a child process that hosts the RPC/gRPC service, and the main process reaches the plug-in through the RPC/gRPC interface. Go Plugin supports running multiple versions of a service side by side (described later). It does not provide a high-availability solution for the service; users have to provide that themselves. With that background, the following is a brief walk through the development process of a Go Plugin plug-in.
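To make the startup step more concrete: after the child process starts, it tells the main process where to connect by printing one handshake line on standard output. The sketch below parses such a line, assuming the pipe-separated layout documented by go-plugin (core protocol version, application protocol version, network type, address, protocol). The Handshake type and the parseHandshake function are hypothetical names for this example, and optional trailing fields are ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Handshake holds the fields of the line a plug-in process prints on stdout.
type Handshake struct {
	CoreVersion int    // version of the plug-in framework's own protocol
	AppVersion  int    // version of the application-level plug-in API
	Network     string // network type, for example "tcp" or "unix"
	Address     string // address the plug-in process is listening on
	Protocol    string // "netrpc" or "grpc"
}

// parseHandshake parses a line such as "1|1|tcp|127.0.0.1:1234|grpc".
// Assumption for this sketch: a missing protocol field means net/rpc.
func parseHandshake(line string) (Handshake, error) {
	parts := strings.SplitN(strings.TrimSpace(line), "|", 6)
	if len(parts) < 4 {
		return Handshake{}, fmt.Errorf("unrecognized handshake line: %q", line)
	}
	core, err := strconv.Atoi(parts[0])
	if err != nil {
		return Handshake{}, fmt.Errorf("invalid core protocol version: %w", err)
	}
	app, err := strconv.Atoi(parts[1])
	if err != nil {
		return Handshake{}, fmt.Errorf("invalid app protocol version: %w", err)
	}
	h := Handshake{
		CoreVersion: core,
		AppVersion:  app,
		Network:     parts[2],
		Address:     parts[3],
		Protocol:    "netrpc",
	}
	if len(parts) >= 5 {
		h.Protocol = parts[4]
	}
	return h, nil
}

func main() {
	h, err := parseHandshake("1|1|tcp|127.0.0.1:1234|grpc\n")
	if err != nil {
		fmt.Println("handshake failed:", err)
		return
	}
	fmt.Printf("connect to %s://%s using %s (app protocol v%d)\n",
		h.Network, h.Address, h.Protocol, h.AppVersion)
}
```

Because the application protocol version travels in the handshake, the main process can decide which version of the interface to speak before it makes the first call.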

Plug-in development

The KV example defines two methods, Put and Get. The example contains several protocol versions; this article uses the gRPC version.

Define the service interface

    // KV is the interface defined by the KV plug-in.
    type KV interface {
        Put(key string, value []byte) error
        Get(key string) ([]byte, error)
    }

Implement the interface client

    // GRPCClient is the client-side implementation of the KV interface.
    type GRPCClient struct{ client proto.KVClient }

    func (m *GRPCClient) Put(key string, value []byte) error {
        // Call the gRPC service interface.
        _, err := m.client.Put(context.Background(), &proto.PutRequest{ ... })
        return err
    }

    func (m *GRPCClient) Get(key string) ([]byte, error) {
        // Call the gRPC service interface.
        resp, err := m.client.Get(context.Background(), &proto.GetRequest{ ... })
        ...
        return resp.Value, nil
    }

Implement the interface server

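    // GRPCServer is the gRPC server stub that GRPCClient talks to.
    type GRPCServer struct{ Impl KV }

    func (m *GRPCServer) Put(ctx context.Context, req *proto.PutRequest) (*proto.Empty, error) {
        // After receiving the request, call the implementation of the interface.
        return &proto.Empty{}, m.Impl.Put(req.Key, req.Value)
    }

    func (m *GRPCServer) Get(ctx context.Context, req *proto.GetRequest) (*proto.GetResponse, error) {
        v, err := m.Impl.Get(req.Key)
        return &proto.GetResponse{Value: v}, err
    }

    // KV is the concrete service implementation. It lives in the plug-in's
    // main package, separate from the shared KV interface.
    type KV struct{}

    func (KV) Put(key string, value []byte) error {
        // Specific service implementation
    }

    func (KV) Get(key string) ([]byte, error) {
        // Specific service implementation
    }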

Implement the Go Plugin interface

    // KVGRPCPlugin is the Go Plugin GRPCPlugin implementation for KV.
    type KVGRPCPlugin struct {
        plugin.Plugin
        Impl KV
    }

    func (p KVGRPCPlugin) GRPCClient(ctx context.Context, broker *plugin.GRPCBroker, c *grpc.ClientConn) (interface{}, error) {
        // Note that this returns the client-side implementation of the interface.
        return &GRPCClient{client: proto.NewKVClient(c)}, nil
    }

    func (p KVGRPCPlugin) GRPCServer(broker *plugin.GRPCBroker, s *grpc.Server) error {
        // Register the gRPC service.
        proto.RegisterKVServer(s, &GRPCServer{Impl: p.Impl})
        return nil
    }

Using the plug-in

This part introduces how the plug-in is used. Usage is divided into two parts: the plug-in server and the plug-in client.

Plug-in server

As mentioned in the previous section, Go Plugin launches a local child process when it starts. That child process is the plug-in server, which must be an executable file containing a main function. The following is a brief look at how to build the plug-in server.

  1. Write a main function and register the plug-in implementation with Go Plugin, as follows:

    plugin.Serve(&plugin.ServeConfig{
        // Handshake configuration shared with the plug-in client.
        HandshakeConfig: shared.Handshake,
        Plugins: map[string]plugin.Plugin{
            // Plug-in name
            "kv_grpc": &shared.KVGRPCPlugin{Impl: &KV{}},
        },
        // Serve the plug-in over gRPC.
        GRPCServer: plugin.DefaultGRPCServer,
    })

  2. Compile it into an executable with go build.

Plug-in client

On the plug-in client side, the process consists of creating the plug-in client, starting the plug-in server, obtaining the plug-in's implementation of the interface, and invoking the service interface.

    client := plugin.NewClient(&plugin.ClientConfig{
        // HandshakeConfig contains the version and authentication information.
        HandshakeConfig: shared.Handshake,
        // Plug-ins this client can dispense.
        Plugins: shared.PluginMap,
        // Path of the plug-in executable.
        Cmd: exec.Command("sh", "-c", os.Getenv("KV_PLUGIN")),
        // Protocols the plug-in is allowed to use.
        AllowedProtocols: []plugin.Protocol{plugin.ProtocolGRPC, plugin.ProtocolNetRPC},
    })
    // Get the client of the plug-in. This starts the plug-in server.
    rpcClient, err := client.Client()
    // Obtain the raw object of the interface client.
    raw, err := rpcClient.Dispense("kv_grpc")
    kv := raw.(shared.KV)
    // Call the interface.
    result, err := kv.Get(os.Args[1])
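The last two steps of this flow, obtaining the raw object by name and converting it to the business interface, can be sketched without the framework. The example below is a simplified stand-in, not Go Plugin's implementation: registry, mapKV and dispenseKV are hypothetical names. It uses the two-value form of the type assertion, which returns an error where a bare assertion would panic if the dispensed object does not implement the interface.

```go
package main

import (
	"fmt"
)

// KV is the business interface the caller expects from the plug-in.
type KV interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, error)
}

// mapKV is a hypothetical stand-in for the client object a plug-in would return.
type mapKV map[string][]byte

func (m mapKV) Put(key string, value []byte) error {
	m[key] = value
	return nil
}

func (m mapKV) Get(key string) ([]byte, error) {
	v, ok := m[key]
	if !ok {
		return nil, fmt.Errorf("key %q not found", key)
	}
	return v, nil
}

// registry maps plug-in names to factories that create the client object.
// It plays the role of the plug-in map passed in the client configuration.
type registry map[string]func() (interface{}, error)

// Dispense looks up a plug-in by name and returns its untyped client object.
func (r registry) Dispense(name string) (interface{}, error) {
	factory, ok := r[name]
	if !ok {
		return nil, fmt.Errorf("unknown plug-in: %s", name)
	}
	return factory()
}

// dispenseKV obtains the named plug-in and checks that it really implements KV.
func dispenseKV(r registry, name string) (KV, error) {
	raw, err := r.Dispense(name)
	if err != nil {
		return nil, err
	}
	kv, ok := raw.(KV)
	if !ok {
		return nil, fmt.Errorf("plug-in %s does not implement KV (got %T)", name, raw)
	}
	return kv, nil
}

func main() {
	plugins := registry{
		"kv_grpc": func() (interface{}, error) { return mapKV{}, nil },
	}
	kv, err := dispenseKV(plugins, "kv_grpc")
	if err != nil {
		fmt.Println("dispense failed:", err)
		return
	}
	if err := kv.Put("hello", []byte("world")); err != nil {
		fmt.Println("put failed:", err)
		return
	}
	value, err := kv.Get("hello")
	fmt.Println(string(value), err)
}
```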

Jaeger plug-in interface specification

From the introduction above, we know that Jaeger has already implemented the client side and server side of the plug-in as well as the client side of the interface. We only need to implement the server side of the interface, and the gRPC plug-in is complete. Jaeger reserves two plug-in interfaces in the gRPC plug-in, StoragePlugin and ArchiveStoragePlugin. The difference between them is that StoragePlugin additionally defines the DependencyReader interface, which is used to query dependencies between services. Both plug-in interfaces expose the SpanReader and SpanWriter interfaces for reading and writing Trace/Span data.

SpanReader

    // Query operation names
    func GetOperations(ctx context.Context, query spanstore.OperationQueryParameters) ([]spanstore.Operation, error)
    // Query service names
    func GetServices(ctx context.Context) ([]string, error)
    // Find the traces that match the query conditions
    func FindTraces(ctx context.Context, query *spanstore.TraceQueryParameters) ([]*model.Trace, error)
    // Find the IDs of the traces that match the query conditions
    func FindTraceIDs(ctx context.Context, query *spanstore.TraceQueryParameters) ([]model.TraceID, error)
    // Get a trace by its TraceID
    func GetTrace(ctx context.Context, traceID model.TraceID) (*model.Trace, error)

SpanWriter

    // Write Trace data
    func WriteSpan(ctx context.Context, span *model.Span) error

DependencyReader

    // Read dependencies between applications
    func GetDependencies(ctx context.Context, endTs time.Time, lookback time.Duration) ([]model.DependencyLink, error)
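One possible way to answer a dependency query is to derive the links from stored spans: whenever a span's parent belongs to a different service, that is one call from the parent service to the child service. The sketch below shows this idea under stated assumptions. Span and DependencyLink are simplified local types (the link mirrors the parent, child and call count that a dependency link carries), dependencyLinks is a hypothetical helper, and the time-window filtering implied by endTs and lookback is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a simplified stand-in that keeps only what dependency analysis needs.
type Span struct {
	TraceID      string
	SpanID       string
	ParentSpanID string // empty for a root span
	Service      string
}

// DependencyLink says that Parent called Child CallCount times.
type DependencyLink struct {
	Parent    string
	Child     string
	CallCount uint64
}

// dependencyLinks counts cross-service parent/child span pairs.
func dependencyLinks(spans []Span) []DependencyLink {
	// Span IDs are only unique within a trace, so index by (trace ID, span ID).
	type spanKey struct{ traceID, spanID string }
	serviceOf := make(map[spanKey]string, len(spans))
	for _, s := range spans {
		serviceOf[spanKey{s.TraceID, s.SpanID}] = s.Service
	}

	type edge struct{ parent, child string }
	counts := map[edge]uint64{}
	for _, s := range spans {
		parent, ok := serviceOf[spanKey{s.TraceID, s.ParentSpanID}]
		if !ok || parent == s.Service {
			continue // root span, missing parent, or a call inside one service
		}
		counts[edge{parent, s.Service}]++
	}

	links := make([]DependencyLink, 0, len(counts))
	for e, n := range counts {
		links = append(links, DependencyLink{Parent: e.parent, Child: e.child, CallCount: n})
	}
	// Sort for a stable result, since map iteration order is random.
	sort.Slice(links, func(i, j int) bool {
		if links[i].Parent != links[j].Parent {
			return links[i].Parent < links[j].Parent
		}
		return links[i].Child < links[j].Child
	})
	return links
}

func main() {
	spans := []Span{
		{TraceID: "t1", SpanID: "s1", Service: "gateway"},
		{TraceID: "t1", SpanID: "s2", ParentSpanID: "s1", Service: "orders"},
		{TraceID: "t1", SpanID: "s3", ParentSpanID: "s2", Service: "db"},
	}
	for _, link := range dependencyLinks(spans) {
		fmt.Printf("%s -> %s (%d calls)\n", link.Parent, link.Child, link.CallCount)
	}
}
```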

Develop the SLS Jaeger plug-in

SLS has launched a unified storage and analysis solution for distributed tracing (Trace), which supports Trace data from Jaeger, Apache SkyWalking, OpenTelemetry, Zipkin and other sources. If you are interested, you can check out the Demo.

The code logic of the SLS Jaeger plug-in is not covered here. The plug-in code is now open source at github.com/aliyun/aliy… Stars and criticism are both welcome. The repository also provides a one-click runnable demo, and the usage documentation is available on GitHub. The rest of this article shows the result and discusses the significance behind Jaeger plug-in development.

The thinking behind plug-ins

With the plug-in developed, it is worth thinking about what the plug-in gives us behind the scenes. Collecting Trace data is only the beginning of system monitoring; mining the information hidden in traces is the most important capability of a monitoring system. How to make use of the value carried by Trace data, and how to keep that capability available over time, is therefore the focus of our thinking.

Hayne's law states that behind every serious accident there are 29 minor accidents, 300 near-misses and 1,000 potential hazards. What the law tells us is that every safety incident looks accidental but is in fact the inevitable result of many factors accumulating to a certain point. Business systems produce a large amount of Trace data every day. People cannot understand the running state of a system by reading that data with the naked eye, let alone discover the problems hidden in it. The system therefore needs to provide analysis capabilities for big data scenarios. SLS started out as a log service and processes petabytes of logs every day. It also provides a rich set of log analysis operators and processing tools that help users analyze the story behind their systems. A Trace can be understood as a special kind of log whose records are associated through context (TraceID, parentSpanID, SpanID). We believe SLS can handle Trace logs with ease.
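To illustrate how those three context fields turn flat log records back into a call path, the following sketch rebuilds the span tree of one trace. It is an illustration only: SpanLog, Node, buildTree and render are hypothetical names, and the records are assumed to be already loaded in memory.

```go
package main

import (
	"fmt"
	"strings"
)

// SpanLog is one flat log record carrying the trace context fields.
type SpanLog struct {
	TraceID      string
	SpanID       string
	ParentSpanID string // empty for the root span
	Name         string
}

// Node is a span together with the spans it called.
type Node struct {
	Span     SpanLog
	Children []*Node
}

// buildTree links the records of one trace into a tree and returns its roots.
// A span whose parent is not among the records is treated as a root.
func buildTree(logs []SpanLog, traceID string) []*Node {
	nodes := map[string]*Node{}
	var ordered []*Node
	for _, l := range logs {
		if l.TraceID != traceID {
			continue
		}
		n := &Node{Span: l}
		nodes[l.SpanID] = n
		ordered = append(ordered, n)
	}
	var roots []*Node
	for _, n := range ordered {
		if parent, ok := nodes[n.Span.ParentSpanID]; ok {
			parent.Children = append(parent.Children, n)
		} else {
			roots = append(roots, n)
		}
	}
	return roots
}

// render returns one line per span, indented by its depth in the call tree.
func render(roots []*Node) []string {
	var lines []string
	var walk func(n *Node, depth int)
	walk = func(n *Node, depth int) {
		lines = append(lines, strings.Repeat("  ", depth)+n.Span.Name)
		for _, child := range n.Children {
			walk(child, depth+1)
		}
	}
	for _, root := range roots {
		walk(root, 0)
	}
	return lines
}

func main() {
	logs := []SpanLog{
		{TraceID: "t1", SpanID: "a", Name: "gateway"},
		{TraceID: "t1", SpanID: "b", ParentSpanID: "a", Name: "orders"},
		{TraceID: "t1", SpanID: "c", ParentSpanID: "b", Name: "db"},
		{TraceID: "t1", SpanID: "d", ParentSpanID: "a", Name: "users"},
		{TraceID: "t2", SpanID: "a", Name: "other request"},
	}
	for _, line := range render(buildTree(logs, "t1")) {
		fmt.Println(line)
	}
}
```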

Jaeger is part of the observability and monitoring system, which is an important source of data for discovering and locating problems in business systems, so we need to ensure that the monitoring system outlives the business system. If the monitoring system goes down before the business system does, monitoring loses its meaning. As an open source project, Jaeger provides the solution itself, but it does not provide a way to evaluate deployment scale or to guarantee high availability of the service. How, then, do we provide a highly available, high-performance backend service? Who provides the last layer of safety for the monitoring system? SLS, as a cloud service, offers high performance, elasticity and freedom from operations work, which lets users easily cope with traffic surges or inaccurate capacity estimates. The SLS service itself provides 99.9% availability and 11 nines of data reliability.

Conclusion

To build a complete monitoring system, it is necessary not only to ensure the availability of the monitoring system itself but also to have strong analysis capabilities. Analysis helps operations engineers discover and locate faults quickly and improves system availability. The Jaeger plug-in lets us connect Jaeger to different analysis systems, so that professional analysis teams can provide professional analysis capabilities while operations and development teams focus on the business.

This article is the original content of Aliyun and shall not be reproduced without permission.