Summary: The Getty maintenance team does not chase meaningless benchmark numbers or do flashy optimizations for show; it only makes improvements driven by the needs of its own production environments. As long as the maintenance team is there, Getty's stability and performance will continue to improve.

I have been engaged in research and development of Internet infrastructure systems for more than ten years, and many of my friends, myself included, are wheel builders — people who enjoy reinventing the wheel.

When I worked at a large company in 2011, many of my colleagues who developed in C had their own private SDK libraries, especially for network communications. When I first entered that environment, I felt embarrassed that I could not write an asynchronous network communication library on top of interfaces such as epoll/iocp/kqueue. Looking back now, many colleagues at that time were bold enough to use their hand-rolled communication libraries directly in the production environment. It is said that there were 197 RPC communication libraries in that production environment.

I spent two years of weekends building a personal C-language SDK library: C implementations of most of the C++ STL containers, timers, a TCP/UDP communication library, a logging library with output speeds up to 150 MiB/s, various CAS-based locks, a multi-producer multi-consumer lock-free queue that avoids the ABA problem, and so on. I didn't know PHP at the time, but with a little wrapping it could actually have become a framework like Swoole. Had I kept going, it might have become comparable to the ACL library of my old friend Mr. Zheng Shuxin.

I started to learn the Go language in 2014. After a period of study, I found it shared a trait with C: the standard libraries are sparse enough that you can build wheels all over again. I vividly remember the first wheel I built, xorList, an XOR doubly linked list that needs only one pointer per element [see Ref 1].

In June 2016, while working on an instant messaging project, the original gateway was a Java project built on Netty; when it was rewritten in Go, the interface design of its TCP network library borrowed directly from Netty. When WebSocket support was added in August of the same year, I found the OnOpen/OnClose/OnMessage network interfaces provided by the WebSocket standard extremely convenient, so I aligned the library's network interfaces with OnOpen/OnClose/OnMessage/OnError, pushed all the code to GitHub, and promoted it on a small scale (see Reference 2).

Getty's Layered Design

Getty adheres closely to the principle of layered design. It is divided into a data interaction layer, a business control layer, and a network layer. It also provides easily extensible monitoring interfaces, which are in fact the interfaces exposed by the network library.

1. Data interaction layer

Many network frameworks define the network protocol format, or at least the header format, and only allow upper-layer users to extend below that header, which limits their scope of use. Getty makes no assumptions about the format of the upper-layer protocol: the data interaction layer it provides upward is defined entirely by the user.

The data interaction layer itself does something very simple: it handles the data exchange between client and server and serves as the carrier of the serialization protocol. It is also very simple to use — you only need to implement the ReadWriter interface.

Getty defines a ReadWriter interface and leaves the serialization/deserialization logic to the user to implement. When one end of a network connection reads the byte stream sent by the peer via net.Conn, the Read method is called to deserialize the byte stream. The Writer interface is invoked on the network send path: before a packet is sent, Getty calls the Write method to serialize the outgoing data into a byte stream, which is then written to net.Conn.

The Read method returns three values in order to handle sticky and partial TCP packets:

– If a network stream error occurs, such as a protocol format error, return (nil, 0, error);
– If the stream read is too short even to parse the package header, return (nil, 0, nil);
– If the header can be parsed but the whole package cannot yet be read, return (nil, pkgLen, nil);
– If a whole package can be parsed, return (pkg, pkgLen, nil).
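To make the contract concrete, here is a minimal, self-contained sketch of the Reader/Writer shape together with a hypothetical length-prefixed codec. The interface signatures are modeled on the description above; echoCodec and its 4-byte big-endian header are illustrative assumptions, not Getty's actual protocol.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Session is a stand-in for getty's Session interface.
type Session interface{}

// Reader deserializes a byte stream into a package, returning
// (pkg, pkgLen, err) per the convention described above.
type Reader interface {
	Read(ss Session, data []byte) (interface{}, int, error)
}

// Writer serializes a package into a byte stream.
type Writer interface {
	Write(ss Session, pkg interface{}) ([]byte, error)
}

// ReadWriter is the carrier of the serialization protocol.
type ReadWriter interface {
	Reader
	Writer
}

// echoCodec: hypothetical protocol — a 4-byte big-endian length
// header followed by the payload bytes.
type echoCodec struct{}

func (c echoCodec) Read(ss Session, data []byte) (interface{}, int, error) {
	if len(data) < 4 {
		// too short to parse the header: (nil, 0, nil), wait for more bytes
		return nil, 0, nil
	}
	bodyLen := int(binary.BigEndian.Uint32(data))
	if bodyLen > 1<<20 {
		// protocol error: (nil, 0, error)
		return nil, 0, errors.New("packet too large")
	}
	pkgLen := 4 + bodyLen
	if len(data) < pkgLen {
		// header parsed but body incomplete: (nil, pkgLen, nil)
		return nil, pkgLen, nil
	}
	// a whole package: (pkg, pkgLen, nil)
	return string(data[4:pkgLen]), pkgLen, nil
}

func (c echoCodec) Write(ss Session, pkg interface{}) ([]byte, error) {
	s, ok := pkg.(string)
	if !ok {
		return nil, errors.New("unsupported package type")
	}
	buf := make([]byte, 4+len(s))
	binary.BigEndian.PutUint32(buf, uint32(len(s)))
	copy(buf[4:], s)
	return buf, nil
}

func main() {
	var rw ReadWriter = echoCodec{}
	raw, _ := rw.Write(nil, "hello")
	pkg, n, err := rw.Read(nil, raw)
	fmt.Println(pkg, n, err) // hello 9 <nil>
}
```

Note how a (nil, pkgLen, nil) return lets the framework know how many bytes to wait for before calling Read again.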

2. Business control layer

The business control layer is the essence of Getty’s design and consists of Connection and Session.

– Connection

Manages an established socket connection, including connection state management, connection timeout control, reconnection control, and data packet handling such as compression and packet splicing/reassembly.

– Session

Responsible for managing one connection established by the client: recording the connection's state data, managing the creation and closing of the Connection, and controlling data sending and interface processing.

2.1 Session

Session is arguably the most central interface in Getty. Each Session represents one session connection.

– Downward

Session fully encapsulates Go's built-in network library, including reading and writing the data stream on net.Conn, timeout mechanisms, and so on.

– Upward

Session provides interfaces accessible to business code; users simply implement EventListener to plug Getty into their business logic.

At present, the Session interface is implemented only by the session struct. Session as an interface exists to provide external visibility, following the principle of programming to an interface. When we talk about Session, we are really talking about the session struct.

2.2 Connection

Connection abstracts and encapsulates Go's built-in network library according to the different communication modes. Connection has three implementations:

– gettyTCPConn: The bottom layer is *net.TCPConn

– gettyUDPConn: The bottom layer is *net.UDPConn

– gettyWSConn: The underlying implementation uses a third-party WebSocket library

2.3 Network API: EventListener

As mentioned at the beginning of this article, Getty's network interface names were borrowed from the WebSocket network API. Hao Hongfan, one of Getty's maintainers, likes to call it a "monitoring interface", because the most troublesome part of network programming is not knowing how to troubleshoot when something goes wrong — and through these interfaces you can know the state of every network connection at each stage.

– OnOpen: called when a connection is established. If the current total number of connections exceeds the user-set limit, the user can return a non-nil error, and Getty will close the connection at this initial stage.
– OnError: called when a connection error occurs. Getty closes the connection after executing this interface.
– OnClose: called when a connection is closed. Getty closes the connection after executing this interface.
– OnMessage: after Getty calls the Reader interface and successfully parses a package out of the TCP/UDP/WebSocket stream, the package is handed to the user through this interface.
– OnCron: the timing interface. Timed logic such as heartbeat detection can be executed in this function.

The core of the five interfaces is OnMessage, which takes a parameter of type interface{} through which it receives the already-deserialized data from the peer.
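The five callbacks above can be sketched as a Go interface, with a toy listener that enforces a connection limit in OnOpen. The method set mirrors the description in the text; echoListener and its counter are hypothetical illustrations, not Getty's code.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Session is a stand-in for getty's Session interface.
type Session interface{}

// EventListener mirrors the five callbacks described above
// (signatures are a sketch consistent with the text).
type EventListener interface {
	OnOpen(Session) error           // connection established
	OnError(Session, error)         // connection error; closed afterwards
	OnClose(Session)                // connection closed
	OnMessage(Session, interface{}) // a fully parsed package from the Reader
	OnCron(Session)                 // timing logic, e.g. heartbeat checks
}

// echoListener is a hypothetical implementation that rejects
// connections beyond a configured limit in OnOpen.
type echoListener struct {
	connCount int32
	maxConn   int32
}

func (l *echoListener) OnOpen(s Session) error {
	if atomic.AddInt32(&l.connCount, 1) > l.maxConn {
		atomic.AddInt32(&l.connCount, -1)
		// a non-nil error tells the framework to close the connection early
		return errors.New("too many connections")
	}
	return nil
}

func (l *echoListener) OnError(s Session, err error) { atomic.AddInt32(&l.connCount, -1) }
func (l *echoListener) OnClose(s Session)            { atomic.AddInt32(&l.connCount, -1) }

func (l *echoListener) OnMessage(s Session, pkg interface{}) { fmt.Println("got:", pkg) }
func (l *echoListener) OnCron(s Session)                     {}

func main() {
	var el EventListener = &echoListener{maxConn: 1}
	fmt.Println(el.OnOpen(nil)) // <nil>
	fmt.Println(el.OnOpen(nil)) // too many connections
}
```

A Prometheus integration, as mentioned below, would simply increment counters inside these same callbacks.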

At the bottom, a network connection carries binary data; at the protocol layer we usually read and write the connection as a byte stream.

To let users focus on writing business logic, Getty extracts serialization and deserialization out of EventListener into the Reader/Writer interfaces mentioned earlier. During a session, the byte stream is first read from net.Conn, deserialized through the Reader interface, and the deserialization result is passed to the OnMessage method.

If you want to integrate metrics with Prometheus, you can easily add metrics collection inside these EventListener interfaces.

Getty network side data flow

Below is a class diagram of the Getty core structure, which covers the design of the entire Getty framework.

Note: the gray parts are Go built-in libraries.

The following uses TCP as an example to describe how Getty is used and the role of each interface and object in the class diagram. Server/Client are encapsulated structs provided for users. The client logic largely mirrors the server's, so this section focuses only on the server.

The flow of the Getty server startup code is shown above. In Getty, starting a server service requires only two lines of code:

The first line obviously creates the server, where options is a series of func(*ServerOptions) functions used to add extra capabilities to the server, such as enabling SSL or executing tasks by submitting them to a task queue.

server.RunEventLoop(NewHelloServerSession) starts the server and is the entry point of the entire service: it listens on a port (specified via options) and processes the data sent by clients. The RunEventLoop method takes a parameter of type NewSessionCallback.

This is a callback function invoked when a connection with a client is successfully established. It is where users typically set network parameters, such as the connection's keep-alive settings, buffer sizes, maximum message length, and read/write timeouts. Most importantly, through this function the user must set the Reader, Writer, and EventListener for the session.
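The two-line startup and the role of NewSessionCallback can be mocked in a self-contained sketch. The names echo Getty's API, but Server, ServerOption, and newHelloServerSession here are simplified stand-ins under stated assumptions, not the real library.

```go
package main

import "fmt"

// Session is a stand-in; the real interface exposes many setters
// (reader/writer/listener, timeouts, buffer sizes, ...).
type Session interface{ SetName(string) }

type session struct{ name string }

func (s *session) SetName(n string) { s.name = n }

// NewSessionCallback is invoked for every freshly accepted connection;
// the user configures the session here.
type NewSessionCallback func(Session) error

// ServerOptions / ServerOption mimic the func(*ServerOptions)
// functional-option style described above.
type ServerOptions struct{ Addr string }
type ServerOption func(*ServerOptions)

func WithLocalAddress(addr string) ServerOption {
	return func(o *ServerOptions) { o.Addr = addr }
}

type Server struct{ opts ServerOptions }

func NewTCPServer(opts ...ServerOption) *Server {
	s := &Server{}
	for _, opt := range opts {
		opt(&s.opts)
	}
	return s
}

// RunEventLoop would normally listen and serve; here it just applies
// the callback to one fake session to show where configuration happens.
func (s *Server) RunEventLoop(newSession NewSessionCallback) error {
	ss := &session{}
	return newSession(ss)
}

func newHelloServerSession(ss Session) error {
	// in real code: set Reader/Writer/EventListener, timeouts, etc.
	ss.SetName("hello-session")
	return nil
}

func main() {
	// the "two lines" of server startup
	server := NewTCPServer(WithLocalAddress("127.0.0.1:8090"))
	err := server.RunEventLoop(newHelloServerSession)
	fmt.Println(server.opts.Addr, err) // 127.0.0.1:8090 <nil>
}
```

The functional-option pattern is what lets features like SSL be bolted on without changing the constructor's signature.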

So far, the processing flow of server in Getty is shown as follows:

Another excellent example of using these interfaces, besides the code samples provided by Getty itself, is seata-golang. Interested readers can see the article "Distributed Transaction Framework seata-golang Communication Model" [Ref 6].

Optimization

The rule of thumb for software development is: “Make it work, Make it right, Make it fast.” Premature optimization is the root of all evil.

An early example is Joe Armstrong, the inventor of Erlang, who in his early years spent a great deal of energy improving Erlang's performance. One consequence was that he later realized many of those early optimizations were useless; another was that the early overwork damaged his health. He died in 2019 at the age of 68.

Stretch the timeframe to five or even ten years and you may find that some early optimizations become a drag on later maintenance. Around 2006, many experts were still recommending Java only for ERP development and advising against it for Internet back-end programming, because its performance on the single-core CPU machines of the time was poor compared to C/C++ — blame the nature of a VM-based language and JVM GC. But complaints about its performance have been rare since 2010.

In 2014, I met Zhou Aimin, a former Alipay architect, at a dinner. Zhou joked that if Alipay switched its main business programming language from Java to C++, two-thirds of its servers could be saved.

By analogy, Go, a language much younger than Java, defines a programming paradigm in which programming efficiency is the primary concern. As for program performance, especially network I/O performance, such problems can be left to time — many of today's complaints may no longer be an issue in five years. If your application really hits a network I/O performance bottleneck and your machine budget is tight, consider switching to a lower-level language such as C/C++/Rust.

In 2019, MOSN's underlying network library used Go's native network library, with each TCP connection handled by two goroutines for receiving and sending respectively. After optimization, a single TCP connection used only one goroutine — and the optimization was not achieved by dropping down to raw epoll system calls.

Here’s another example.

Since 2020, ByteDance has published articles on Zhihu promoting the excellent performance of its Go language network framework kitex [see Reference 3], saying that after building on raw epoll, "the performance has far exceeded the official net library". It was not open source at the time, so people simply had to take their word for it. In early 2021 they promoted it again [see Reference 4], claiming that "test data show that the current version (2020.12) has up to 30% higher throughput, 25% lower average latency, and 67% lower TP99 compared with the previous version (2020.05), and its performance is far better than the official net library", and finally open-sourced the code. In early August, the veteran blogger behind the "Bird's Nest" blog ran his own benchmarks and published the conclusions in "Go Ecosystem RPC Framework Benchmark 2021" (see Reference 5 for the link).

Having said all that, to pull the topic back and conclude in one sentence: Getty only considers using Go's native network interfaces, and will look for optimization breakthroughs at its own layer only if it hits a network performance bottleneck.

Getty has a major upgrade every year, and this article gives some of the major upgrades in recent years.

1. Goroutine Pool

The initial version of Getty started two goroutines per network connection: one goroutine received the network byte stream, called the Reader interface to unpack network packages, and called the EventListener.OnMessage() interface for logic processing; the other goroutine was responsible for sending the network byte stream and calling EventListener.OnCron() to execute timing logic.

Later, in order to improve network throughput, Getty made a major optimization: the logic processing step was separated from the first Goroutine task and a Goroutine Pool was added to handle network logic.

That is, network byte stream receiving, logic processing, and network byte stream sending all have separate Goroutine processing.

A Gr pool's members are task queues (M of them), a goroutine array (N of them), and tasks (or messages). Depending on whether N is variable, Gr pools can be classified as scalable or fixed-size. The advantage of a scalable Gr pool is that it can grow or shrink N as the task load changes, saving CPU and memory resources.

1.1 Fixed-Size Gr Pool

According to the ratio of M to N, Gr pools of fixed size can be divided into 1:1, 1:N, and M:N.

An example of the single-queue model is the Gr pool in kafka-connect-elasticsearch: as a consumer it reads data from Kafka and puts it into a message queue, then each worker goroutine takes tasks out of the queue for processing.

In this model, only one chan is created for the entire Gr pool, and all goroutines read from this single chan. Its disadvantages: with all workers competing on one chan, and the Go channel being relatively inefficient (it uses a mutex lock), contention is fierce — and of course the order of network packet processing cannot be guaranteed.

The Gr pool model of the initial Getty version is 1:1, with each goroutine having its own chan. Its read/write model is one writer, one reader, which preserves the order of network packet processing. For example, when reading Kafka messages, a message is dispatched to a task queue according to hash(message key) % N. However, this model has a defect: since each task takes time to process, tasks can back up and block in one goroutine's chan, and even if other goroutines are idle they cannot take over [task "starvation"].

A further improvement on the 1:1 model: each goroutine still has its own chan, but a goroutine that finds its own chan empty steals from another chan, while the sender tries to send to the fastest-consuming goroutine. This scheme resembles the goroutine queues of the MPG scheduling algorithm inside the Go runtime, but the algorithm and implementation would be too complex.

Getty later implemented an M:N model Gr pool, in which each task queue is consumed by N/M goroutines. The advantage of this model is that it balances processing efficiency against lock pressure, achieving balanced task processing at the overall level. Task distribution uses round-robin.

The overall implementation is shown in the figure above. See gr Pool [Ref 7] for the TaskPool implementation.

1.2 Unlimited Gr Pool

In a Gr pool with a fixed amount of resources, throughput and RT cannot be guaranteed when the number of requests increases. In some scenarios, users want to use all resources to ensure throughput and RT.

Later, referring to the goroutine pool in "A Million WebSockets and Go" [Ref 8], an unbounded Gr pool was implemented.

See the taskPoolSimple implementation in gr pool [Ref 7] for details.

1.3 Network Packet Processing Sequence

The advantage of a fixed-size Gr pool is that it caps the CPU/memory usage of the logic-processing stage, whereas an unbounded Gr pool may exhaust the machine's resources and get the container killed by the kernel. However, no matter which form of Gr pool is used, Getty cannot guarantee the order in which network packets are processed.

For example, when the Getty server receives two network packets A and B from the same client, the Gr pool model may cause the server to process packet B before packet A. Likewise, the client may receive the response to packet B before the response to packet A.

If each request from the client is independent with no ordering relationship, then Getty with the Gr pool feature enabled can safely ignore ordering. If upper-layer users care about the order in which requests A and B are processed, they can combine A and B into a single request, or disable the Gr pool.

2. Lazy Reconnect

In Getty, a session represents one network connection, while a client is in fact a connection pool maintaining a certain number of sessions — the number, of course, set by the user. In the initial version of the Getty client [before 2018], each client separately started a goroutine to poll the number of sessions in its pool and, whenever the count fell below the user-set number, initiated new connections to the server.

When the client loses its connection to the server, the server may be offline, may have exited unexpectedly, or may be in a zombie state. If the upper-layer user determines that the peer server no longer exists (for example, after receiving a registry notification that the server is offline), they call the client.Close() interface to close the connection pool. If the upper-layer user never calls this interface, the client assumes the peer address is still valid and keeps initiating reconnections to maintain the connection pool.

To sum up, the reconnection process of the initial version of the Getty Client from closing an old session to creating a new one is:

  • The old session closes its network receive goroutine;
  • The old session's network send goroutine reclaims resources and marks the current session as invalid;
  • The client's polling goroutine detects the invalid session and removes it from the session pool;
  • When the client's polling goroutine detects that the number of valid sessions is less than the number set by the Getty user, and the user has not closed the connection pool via the client.Close() interface, it calls the connect interface to initiate a new connection.

This approach — constantly checking the validity of every session in the client's pool through periodic polling — can be called active reconnection. Its obvious downside is that every client needs its own goroutine. A further optimization, of course, is to start one global goroutine that periodically polls the session pools of all clients, instead of one goroutine per client. But since 2016 I had been mulling over a question: could we change the way the session pool is maintained, drop the periodic polling mechanism, and maintain each client's session pool without using any extra goroutine at all?

In May 2018, during a walk after lunch, I went over the Getty client's reconnection logic again and came up with a different approach: the network send goroutine of step 2 could be "recycled". After that goroutine marks the current session invalid, append the following logic:

  • Check that the maintainer of the current session is a client (the user of a session can also be a server);
  • Check that the number of sessions in the current session pool is less than the number set by the upper-layer user;
  • Check that the upper-layer user has not invalidated the current session pool via client.Close();
  • If all three conditions hold, the network send goroutine performs the reconnection;
  • After the new connection session is successfully established and added to the client's session pool, the network send goroutine exits immediately.

I call this lazy reconnect, and at the end of its life cycle the network send goroutine might better be called the network reconnect goroutine. With lazy reconnect, the logic of steps 3 and 4 is folded into step 2, so the client no longer needs an extra goroutine to maintain its session pool through periodic polling.
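The merged exit path of the send goroutine can be sketched as follows. This is a hypothetical simplification: client, onSessionInvalid, and the string-valued session pool are illustrative stand-ins for the real logic linked in [Ref 13].

```go
package main

import "fmt"

// client is a simplified connection pool: user-configured size,
// live sessions (strings here), and a flag set by client.Close().
type client struct {
	sessionNum int
	pool       []string
	closed     bool
}

func (c *client) reConnect() {
	// stand-in for dialing a new connection and wrapping it in a session
	c.pool = append(c.pool, "new-session")
}

// onSessionInvalid is the tail of the send goroutine's life cycle:
// it folds the old polling steps 3 and 4 into step 2.
func onSessionInvalid(c *client, isClient bool, dead string) {
	// remove the dead session from the pool (old step 3)
	for i, s := range c.pool {
		if s == dead {
			c.pool = append(c.pool[:i], c.pool[i+1:]...)
			break
		}
	}
	// the three conditions described above (old step 4)
	if isClient && // 1. the session is maintained by a client, not a server
		len(c.pool) < c.sessionNum && // 2. pool is below the configured size
		!c.closed { // 3. the user has not called client.Close()
		c.reConnect() // the send goroutine acts as a reconnect goroutine
	}
	// ...and then the goroutine exits immediately
}

func main() {
	c := &client{sessionNum: 2, pool: []string{"s1", "s2"}}
	onSessionInvalid(c, true, "s1")
	fmt.Println(len(c.pool)) // 2
}
```

No timer, no polling goroutine: the pool heals itself exactly at the moment a session dies.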

The overall lazy reconnect flow is shown in the figure above. If you are interested in the code flow, follow the link in [Ref 13] and trace it yourself.

3. Timer

After the Gr pool is introduced, a network connection uses at least three Goroutines:

  • The first goroutine receives the network byte stream and calls the Reader interface to unpack network packages;
  • The second goroutine calls the EventListener.OnMessage() interface for logic processing;
  • The third goroutine sends the network byte stream, calls EventListener.OnCron() to execute timing logic, and performs lazy reconnect.

This model runs stably when connections are few. But once the cluster reaches a certain scale — say, more than 1k connections per server — the network connections alone use at least 3k goroutines, a great waste of CPU and memory resources. Of the three goroutines above, the first cannot be removed, and the second is really part of the Gr pool, so the optimizable target is the third goroutine's tasks.

In late 2020, Getty's maintenance team first moved the network byte stream send task into the second goroutine: the send now happens synchronously right after logic processing. After this improvement, the third goroutine was left with only the EventListener.OnCron() timed task. That timing logic could have been pushed up to Getty's callers, but for user convenience and backward compatibility we took another optimization path: introducing a time wheel to manage timed heartbeat detection.

In September 2017, I implemented a time wheel library for the Go timer (see Reference 10 for the link), compatible with all the native interfaces of the Go timer; its advantage is that all timer tasks run in a single goroutine. When it was introduced into Getty in December 2020, all of Getty's EventListener.OnCron() timed tasks were handed over to the time wheel, and the third goroutine disappeared perfectly. Two months later, we noticed the timer library had been "borrowed" by another RPC project that is good at collecting stars.

That left the third goroutine with one final task: lazy reconnect. With the third goroutine gone, this task was moved into the first goroutine: lazy reconnect is performed as the last step before the receive goroutine exits on a network error.

After optimization, each network connection uses at most two goroutines:

  • One goroutine receives the network byte stream, calls the Reader interface to unpack network packages, and performs lazy reconnect;
  • The second goroutine calls the EventListener.OnMessage() interface for logic processing and sends the network byte stream.

The second goroutine comes from the Gr pool. Since goroutines in the Gr pool are reusable shared resources, a single connection really only occupies the first goroutine exclusively.

4. Getty Stress Test

Hao Hongfan of the Getty maintenance team implemented Getty Benchmark [Ref 11] by referring to the benchmark program of rpcx, and stress-tested the optimized v1.4.3 release.

Stress test environment:

Stress test results:

The server's TPS is 12556, and the network throughput is 12556 × 915 B/s ≈ 11219 KiB/s ≈ 11 MiB/s.

The figure above shows the change in the server machine's CPU/memory usage before and after the stress test. The Getty server used only 6% CPU and a little over 100 MiB of memory, and the resources were returned to the system soon after the test load was removed.

During the test, goroutine usage on the Getty server was as shown below: a single TCP connection uses only one goroutine.

For the complete test results and related code, see the Benchmark Result (see [Ref 12] for the link). The stress test certainly did not push Getty to its limits, but it was enough to meet the needs of Alibaba's main usage scenarios.

Development Timeline

Getty has come a long way since I first wrote it in 2016, and it now has a dedicated open-source team maintaining it.

The main development time nodes of the timeline are as follows:

– Developed the first production-usable version in June 2016, supporting both the TCP and WebSocket communication protocols, and posted it on GoCN (gocn.vip/topics/8229) in October 2016 for promotion;

– Implemented a time wheel for the Go language timer in September 2017 (github.com/AlexStocks/…);

– Added UDP communication support in March 2018;

– Added support for RPC based on Protobuf and JSON in May 2018;

– Added ZooKeeper- and etcd-based service registration and discovery, named micro, in August 2018;

– In May 2019, Getty's underlying TCP communication implementation was separated out and moved to github.com/dubbogo, and later to github.com/apache/dubbo-getty;

– In May 2019, Getty's RPC package was forked by two students from Ctrip [github.com/apache/dubb…], who built the dubbogo RPC layer on top of the Hessian2 protocol;

– Fixed size Goroutine pool added in May 2019;

– At the end of 2019, Liu Xiaomin reported having implemented seata-golang based on Getty;

– In November 2020, network sending was merged into the Gr pool to be handled together with logic processing;

– Completion of timer optimization in May 2021;

Finally, echoing the opening of the optimization section above: Getty's maintenance team does not chase meaningless benchmark numbers or do flashy show-off optimizations; it only makes improvements driven by the needs of its own production environments. As long as the maintenance team is there, Getty's stability and performance will continue to improve.


This article is original content from Alibaba Cloud and may not be reproduced without permission.