The ant group network communication framework SOFABolt function is introduced and analytical | open source framework agreement

SOFA:Channel/, interesting and practical distributed architecture Channel. Review the video and PPT to see the address at the end of the article. Welcome to join the live interactive nail group: 30315793, not to miss every live broadcast.

Hi, I’m cheng Yi, the sharing lecturer for this issue of SOFAChannel. I’m from Ant Group and the open source leader of SOFABolt. Today we will talk about the framework analysis and function introduction of SOFABolt, ant Group’s open source network communication framework. This sharing will cover the following four aspects:

SOFABolt introduction;
Basic communication capability analysis;
Protocol framework analysis;
Private protocol implementation parsing;

What is SOFABolt

SOFABolt produces the background

SOFAStack(Scalable Open Financial Architecture Stack) is a set of middleware for rapidly building Financial level cloud native architectures. It is also a best practice honed in Financial scenarios.

SOFABolt is a network communication framework in SOFAStack, which is a lightweight, easy-to-use, high-performance, easily extensible communication framework based on Netty best practices. His name is Bolt from the Disney animation “Lightning Dog”. How it first came to be within Ant Group is an analogy to how Netty came to be:

In order to let Java programmers can put more energy on the implementation of business logic based on network communication, rather than too much entanglement in the implementation of NIO at the bottom of the network and deal with difficult to debug network problems, Netty came into being;
SOFABolt was born to enable middleware developers to focus more on product features rather than building the wheels of a communication framework over and over again.

Over the years, in the service and message middleware in network communication, the ant group solved many problems, accumulated a lot of experience and continuous optimization and improvement of our solution to the precipitation of summary to SOFABolt this fundamental component and feedback to the open source community, hoping to make more use of network communication scenarios. At present, this component has been used in many products of Ant Group middleware, such as SOFARPC, message center, distributed transaction, distributed switch, and configuration center.

At the same time, several enterprises have already used SOFABolt in the production environment. Thank you for your recognition, and I hope SOFABolt can bring practical value to more enterprises.

The above enterprise information is collected based on the feedback from enterprise users on Github – by 2020.06.

SOFABolt:github.com/sofastack/s…

SOFABolt framework composition

The SOFABolt whole can be divided into three parts:

Basic communication ability (efficient NETWORK IO and thread model based on Netty, connection management, timeout control);
Protocol framework (command and command processor, codec processor);
Private protocol implementation (implementation of private RPC communication protocol);

Below, we describe the specific capabilities of each part of SOFABolt.

Basic communication capability

Basic communication model

As shown in the figure above, SOFABolt has multiple communication models: Oneway, Sync, Future and callback. Below, we take a look at each communication model and their usage scenarios.

Oneway: not paying attention to the result, that is, the client does not pay attention to the result returned by the server after initiating the call. This applies to the scenario where the calling party does not need to get the processing result of the request, or the request or processing result can be lost.
Sync: when a call is made synchronously, the calling thread is blocked until the response is received or timed out. This is the most common method.
Future: asynchronous invocation. The calling thread will not be blocked until the result of the invocation is obtained through the future. It is suitable for concurrent invocation scenarios, such as calling multiple servers and waiting for all results to be returned to execute specific logic.
Callback: asynchronous call, the calling thread will not be blocked, the call result will be processed in the callback thread, suitable for high concurrency scenarios;

The oneway call scenario is very clear. Callers can use this pattern when they don’t need the call results, but when they need to process the call results, should they use sync or asynchronous future and callback? How to choose between future and callback modes when both are asynchronous?

Obviously what synchronization can do asynchronously can also be done, but asynchronous calls involve switching thread contexts, setting up asynchronous thread pools, and so on, which can be complicated. If your scenario is simple, such as the whole process is a single call and the result is processed, then it is recommended to use synchronous processing; If the entire process needs to be performed in several steps, you can split the different steps and execute them asynchronously to allocate more resources to time-consuming operations to improve overall system throughput.

Of the choices between Future and callback, callback is the more thoroughly asynchronous call, and future is suitable for scenarios where multiple asynchronous calls need to be coordinated. For example, when multiple services need to be invoked and logic is executed according to the response results of multiple servers, the future mode can be used to send requests to multiple services, and all futures can be processed in a unified manner to complete collaborative operation.

Timeout control mechanism

In the previous part of the communication model, after Oneway, the other three types (sync, Future, callback) all require timeout control because the user needs to get the results within the expected time. Timeout control simply means that after the user initiates a call, if the result of the response from the server is not received within the expected time, the call will time out. The user needs to be aware of the timeout to avoid blocking the calling thread or never executing the callback.

In communication framework, timeout control must meet the requirements of high efficiency and accuracy, because communication framework is the basic component of distributed system, once the performance of communication framework appears problems, then the performance of the upper system obviously cannot be improved. The accuracy of timeout control is also important, such as when a user expects a call to be executed for a maximum of three seconds, because an inaccurate timeout control results in a thread blocking for four seconds while the user is calling, which is obviously unacceptable.

SOFABolt’s timeout control adopts The HashedWheelTimer in Netty, as shown in the figure above. If a tick is 100 ms, the tick cycle is 800 ms. If a timeout is triggered after 300 ms, the timeout task will be placed in the ‘2’ bucket and triggered when the tick reaches ‘2’. If a timeout task needs to be triggered in 900 milliseconds, it will be placed in a bucket like ‘0’ and mark remainingRounds=1. When the first tick goes to ‘0’ it will find that remainingRounds is not equal to 0. RemainingRounds = remainingRounds = remainingRounds = remainingRounds = remainingRounds = remainingRounds = remainingRounds = remainingRounds = remainingRounds = remainingRounds

If the tick of the time wheel is set to 1 second and the ticksPerWheel is set to 60, it is the second hand of the real clock, and a complete turn represents one minute. If a task needs to be executed in 1 minute and 15 seconds, this is triggered when the second hand points to the 15th square after a round. Hashed and Hierarchical Timing Wheels: Data Structures to Efficiently Implement a Timer Facility is recommended for understanding the principles of the time wheel.

Quick failure mechanism

A timeout control mechanism ensures that a client call will receive a response after an expected amount of time, whether the response is a real response returned by the server or a timeout is triggered. What if, for some reason, the client calls time out, and the server actually returns the response to the client after the timeout?

The response result will be discarded on the client because the corresponding request has been released due to timeout, and the server will discard the response because no corresponding request can be found. Since a response returned to the client after a timeout is discarded, can the server simply return a timeout response to the client without processing the request if it is determined that the request has timed out? This is SOFABolt’s fail-fast mechanism.

The quick failure mechanism can reduce the burden on the server and enable the server to recover services as soon as possible. For example, some external dependency causes the server to block processing a batch of requests, while the client is still sending more requests to the server and storing them in the Buffer for processing. When the external dependency is restored, the server will block the normal subsequent request queue because it has to process requests already in the Buffer that have actually timed out and no business sense to process them. After the fast failure mechanism is added, in this case, the requests in the Buffer can be discarded and the newly added requests that have not timed out can be served, so that the service can be quickly recovered.

The premise of the fast failure mechanism is that a request can be judged to have timed out, which depends on the time, and the time dependence requires a unified time reference. In a distributed system it is impossible to rely on time on different machines because of network delays and machine time deviations. To avoid inconsistencies in reference times (clocks inconsistent between machines), SOFABolt’s fast failure mechanism relies only on the server machine’s own clock (a unified time reference) to determine that the request has timed out:

System.currentTimestamp – request.arriveTimestamp > request.timeout

Request. arriveTimestamp indicates the time when the request reaches the server. Request. timeout indicates the timeout period set for the request. The request must have timed out on the client side and can be safely discarded.

You are recommended to read Time, Clocks, and the Ordering of Events in a Distributed System. In this article, Lamport thoroughly analyzes the Ordering problem in a Distributed System.

Framework agreement

The protocol commands included in SOFABolt are shown in the figure above. Protocol commands in the RPC version contain only two types: RPC request/response and heartbeat request/response. RPC request/response is responsible for carrying the user’s request data and response data, while heartbeat request is used for connection keepalive and only carries a small amount of information (generally only contains the necessary information such as request ID).

Once you have a command, you need a command codec and a command processor to encode and process the command. The processing model of RemotingCommand is as follows:

The core components of the entire request and response process design are shown in the figure above, where:

Client side:
- Connection Encapsulates the operation on the underlying network.
- CommandEncoder is responsible for coding RemotingCommand, encoding RemotingCommand into byte data using proprietary protocols;
- The RpcResponseProcessor handles the response on the server side;
Server side:
- CommandDecoder decoders decode byte data into RemotingCommand objects using proprietary protocols.
- RpcHandler forwards RemotingCommand to the corresponding CommandHandler according to the protocol code;
- CommandHandler forwards RemotingCommand to the corresponding RpcRequestProcessor for processing according to CommandCode.
- The RpcRequestProcessor forwards the request to the user’s UserProcessor to execute the business logic according to the Class of the object carried by RemotingCommand, and returns the result to the client through CommandDecoder encoding.

Private protocol implementation

Built-in private protocol implementation

In addition to providing basic communication capabilities, SOFABolt has a built-in implementation of proprietary protocols that can be used right out of the box. The built-in private protocol implementation is an extensible private protocol implementation polished by practice.

Proto: reserved protocol code field. When a protocol changes greatly, the field can be distinguished by the protocol code.
Ver1: after the protocol is determined, the protocol version is compatible with minor adjustments of future protocols, such as adding fields.
Type: indicates the Command types: oneway, Request, and Response.
Cmdcode: indicates the command code. For example, RpcRequestCommand and HeartbeatCommand need to be distinguished by different command codes.
Ver2: Command version, used to identify different versions of the same Command.
RequestId: requestId that uniquely identifies a request and is used to map requests and responses in asynchronous operations.
Codec: Serialization code that identifies which method is used to serialize business data;
Switch: indicates whether to enable certain protocol-level capabilities, such as CRC verification.
Timeout: specifies the timeout period set when the client makes a request. The timeout period depends on the quick failure mechanism.
ClassLen: The length of the class name of the business request class;
HeaderLen: length of the business request header;
ContentLen: The length of the business request body;
ClassName: the className of the business request class.
Header: service request header.
Content: business request body;
CRC32: indicates the CRC check code.

Implement custom protocols

Implementation in SOFABolt private protocol is the key to realize codec (CommandEncoder/CommandDecoder) and command processor (CommandHandler).

Above is the class you need to write to implement a custom private protocol lock in SOFABolt. SOFABolt binds both codecs and command handlers to Protocol objects, and each Protocol implementation has its own set of codecs and command handlers.

Implement custom private protocols in codecs. It is important to consider the extensibility of the protocol when designing the proprietary protocol so that protocol incompatibilities will not occur in future enhancements.

After the codec is complete, the rest of the work is to implement the processor. The processor is divided into two parts: command processing entry CommandHandler and specific service logic executive RemotingProcessor.

After completing the above work, the development work of using SOFABolt to implement custom private protocol communication is basically completed. However, in the actual writing of this part of the code, there will be many difficulties and limitations, mainly reflected in the following aspects:

Poor scalability: For example, the built-in codec is used by default in RpcClient, and there is no reserved interface for setting. When a custom protocol is used, only RpcClient can be inherited for overwriting.
Framework and protocol coupling: For example, CommandHandler->RemotingProcessor->UserProcessor is provided by default, but this model is heavily coupled to the protocol (depending on CommandCode and RequestCode), CommandHandler->RemotingProcessor->UserProcessor; CommandHandler->RemotingProcessor->UserProcessor;
Protocol restriction: Although custom Encoder and Decoder can be used to customize the protocol, the framework relies on the ProtocolCode during internal organization. As a result, the ProtocolCode needs to be added to the protocol, which limits the user’s freedom to design private protocols.

Overall, SOFABolt currently offers very strong communication capabilities and years of protocol design. If the user needs to adapt to their current running private protocol there is still room for improvement, the fundamental reason is that the design was designed in accordance with the RPC framework (as can be seen from the naming of many codes), so the separation of protocol and framework can be done better.

conclusion

SOFABolt’s basic communication model, timeout control, and fast failure mechanism are introduced. Examples of private protocol implementation are analyzed. To sum up, SOFABolt provides:

Netty based best practices;
Basic communication model and efficient timeout control mechanism, fast failure mechanism;
Built-in proprietary protocol implementation, out of the box;

Welcome to Star SOFABolt: github.com/sofastack/s…

That’s the main content of this share. For a more detailed introduction to SOFABolt, due to limited live time, you can read the series “Anatomy of SOFABolt’s Framework,” produced by the SOFABolt team and the open source community:

“Dissect SOFABolt Framework” analysis: www.sofastack.tech/blog/ Click the tag “Dissect | SOFABolt Framework”

one more thing

SOFABolt currently has some areas where it can be improved. It is relatively difficult to try to implement a fully customized proprietary protocol, requiring some inheritance changes to the code.

In response to this situation, we submitted a SOFABolt project, “Breaking SOFABolt’s Framework and Protocol”, in the “Alibaba Summer of Programming”. We hope that by breaking the framework and protocol first, and then modularizing it, SOFABolt will become a flexible and extensible communication framework best practice!

Everyone is welcome to work together to solve this problem and make SOFABolt better: github.com/sofastack/s…

SOFAStack also welcomes more open source enthusiasts to join the community to become Contributor, Committer ~

SOFACommunity: www.sofastack.tech/community/

This video review and PPT view address

Tech.antfin.com/community/l…

Financial Class Distributed Architecture (Antfin_SOFA)