Abstract: As systems gradually move to microservices, the requirements placed on RPC are changing as well. Today I will focus on Motan, Weibo's RPC framework, and on some of the improvements we made to better accommodate hybrid cloud deployment.

Editor's note: High Availability Architecture shares and disseminates representative articles in the field of architecture. This article was shared by Zhang Lei in the High Availability Architecture group. When reposting, please credit the High Availability Architecture public account "ArchNotes".

Zhang Lei is a technical expert at Sina Weibo and the technical lead of the Motan RPC framework. He joined Sina Weibo in 2013 and, as a core team member, took part in several key projects such as Weibo's RPC servitization and hybrid cloud. He is currently responsible for maintaining and improving the architecture of the Motan RPC framework, and focuses on high-availability architecture and service middleware development.

"Weibo's RPC framework Motan was born in 2013 out of the careful design and hard work of our predecessors (Fu Lin, Fishermen, Maitian, Wang Zhe, and others). We salute all of them, and we also thank Weibo's technical teams for their support and continuous improvements. Motan is now widely used on the Weibo platform, handling nearly 100 billion calls a day across hundreds of services." – Zhang Lei

With the rapid development of Weibo's containerized deployment and hybrid cloud platform, RPC is becoming more and more important in the move to microservices, and the demands placed on RPC are changing too. Today I will focus on Motan, Weibo's RPC framework, and on some of the improvements we made to better accommodate hybrid cloud deployment.

The development and current state of RPC frameworks

Remote Procedure Call (RPC) is a remote call protocol. Simply put, it lets an application call remote procedures or services as if they were local methods. It applies to many scenarios, such as distributed services, distributed computing, and remote service invocation. When it comes to RPC, many excellent open-source frameworks come to mind, such as Dubbo, Thrift, gRPC, and Hprose. Here is a quick look at how RPC compares with other common remote call mechanisms, followed by some excellent open-source RPC frameworks.

RPC compared with other remote call mechanisms

RPC, HTTP, RMI, and Web Services can all make remote calls, but they differ in implementation and focus.

HTTP

HyperText Transfer Protocol (HTTP) is an application-layer communication protocol that uses standard semantics to access specified resources (such as images and interfaces). Intermediate relay servers on the network can understand the protocol's content. HTTP is a resource access protocol: it can carry a remote request and return the result.

HTTP's advantages are that it is simple, easy to use and understand, and language independent; it is widely used for remote service invocation, including at Weibo. Its disadvantages are a heavy protocol header and a long path to the target server, which may involve DNS resolution, Nginx proxies, and so on.

RPC is a protocol specification; HTTP can be regarded as one implementation of RPC, or used as a transport protocol for RPC. RPC services offer a high degree of automation and can support powerful service governance. They integrate more naturally with the programming language and perform very well. Compared with HTTP, RPC's drawbacks are that it is relatively complex and has a somewhat higher learning cost.

RMI

Remote Method Invocation (RMI) is Java's mechanism for invoking methods remotely. Each method in RMI has a method signature, which the RMI client and server use to identify it. RMI is only available in Java and can be thought of as an object-oriented Java RPC.

Web Service

Web Service is a web-based framework for publishing, discovering, and invoking services, with a focus on managing and using services. Web Services typically describe a service with WSDL and invoke it with SOAP over HTTP.

RPC is a remote access protocol, whereas Web Service is an architecture, and a Web Service can itself invoke services through RPC. It is therefore more appropriate to compare Web Service with RPC frameworks. When an RPC framework provides service discovery and management and uses HTTP as its transport protocol, it is effectively a Web Service.

Compared with Web Service, an RPC framework can govern services at a finer grain, including traffic control and SLA management, and has greater advantages for microservices and distributed computing.

Introduction to RPC frameworks

The RPC protocol only defines the point-to-point invocation between a Client and a Server, including the stub, the communication protocol, and RPC message parsing. Real applications must also consider service high availability and load balancing, so "RPC framework" here means a complete solution for making RPC calls: besides implementing the point-to-point RPC protocol, it may include service registration and deregistration, load balancing across the multiple servers that provide a service, service high availability, and other functions. Current RPC frameworks roughly fall into two camps: one focuses on service governance, the other on cross-language invocation.

Service governance RPC framework

Service governance RPC frameworks include Dubbo and DubboX. Dubbo is Alibaba's open-source distributed service framework; it provides high-performance RPC calls together with rich management functions and is an excellent RPC framework. DubboX is built on Dubbo; it adds REST-style remote calls and some other new features.

Frameworks of this kind are feature-rich, offering high-performance remote calls plus service discovery and governance. They are well suited to splitting large services into microservices and managing them, and can be integrated transparently into Java projects. Their drawback is tight coupling to the language, which makes cross-language support difficult.

Cross-language invocation

Cross-language RPC frameworks include Thrift, gRPC, Hessian, and Hprose. They focus on cross-language service invocation and support language-independent calls from most languages, which makes them well suited to providing general-purpose remote services to clients written in different languages. However, they have no service discovery mechanism, so in practice they generally need a proxy layer to forward requests and control the load-balancing policy.

Motan, Weibo's RPC framework, belongs to the service governance type. It is a lightweight, high-performance RPC framework developed in Java. Motan provides practical service governance functions and good extensibility for RPC protocols.

The main features offered by Motan include:

  • Service discovery: service publishing, subscription, and notification
  • High availability policies: Failover, Failfast, and exception isolation (when a Server fails more than a specified number of times, it is marked unavailable and periodically checked by heartbeat)
  • Load balancing: lowest concurrency first, consistent hashing, random, and round-robin
  • Extensibility: SPI (Service Provider Interface) extensions
  • Others: call statistics, access logs, etc.
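The exception-isolation policy in the list above can be pictured as a per-Server failure counter. The sketch below is a hypothetical illustration, not Motan's code: the class `IsolatingServer`, the `maxFailures` threshold, and the `heartbeat` probe hook are invented names, and we assume that only consecutive failures count and that a successful call resets the counter.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of exception isolation: after more than `maxFailures`
// consecutive failed calls the Server is marked unavailable, and it is only
// brought back when a periodic heartbeat probe succeeds.
class IsolatingServer {
    private final String address;
    private final int maxFailures;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final AtomicBoolean available = new AtomicBoolean(true);

    IsolatingServer(String address, int maxFailures) {
        this.address = address;
        this.maxFailures = maxFailures;
    }

    String address() { return address; }

    boolean isAvailable() { return available.get(); }

    // Record the outcome of a real RPC call made to this Server.
    void onCallResult(boolean success) {
        if (success) {
            consecutiveFailures.set(0);
        } else if (consecutiveFailures.incrementAndGet() > maxFailures) {
            available.set(false); // isolate the Server
        }
    }

    // Called periodically (e.g. by a scheduler) while the Server is isolated;
    // `probe` stands in for a heartbeat request and returns true if it succeeds.
    void heartbeat(BooleanSupplier probe) {
        if (!available.get() && probe.getAsBoolean()) {
            consecutiveFailures.set(0);
            available.set(true);
        }
    }
}
```

A Cluster would simply skip Servers whose `isAvailable()` is false; a Failover policy would then retry the call on another available Server, while Failfast would surface the error immediately.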

Motan supports different RPC protocols and transport protocols. It integrates seamlessly with Spring configuration, so RPC services can be provided or consumed through simple, flexible configuration. With Motan, splitting services and deploying distributed services is very convenient.

Compared with Dubbo, a framework of the same type, Motan is less comprehensive in features and offers fewer extensions, but it is small and refined, characterized by simplicity and ease of use. It continues to develop in the direction of practicality and ease of use.

Introduction to Motan RPC framework

Motan interaction flow

There are three roles in Motan: the service provider (RPC Server), the service caller (RPC Client), and the service Registry.

  • The Server registers its services with the Registry and sends heartbeats to the Registry to report its status.
  • The Client subscribes to RPC services from the Registry and, based on the service list the Registry returns, makes RPC calls to specific Servers.
  • When the set of Servers changes, the Registry propagates the change; the Client detects it and updates its local service list accordingly.

The interaction relationship is shown as follows:

Motan supports different registries, such as ZooKeeper (ZK) and Consul. The registry currently used by the Weibo platform is Vintage, a lightweight Redis-based KV storage system that provides namespace services, service registration, service subscription, and other functions.
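The register, subscribe, and notify cycle described above can be sketched with a tiny in-memory registry. This is a hypothetical, single-process simplification: real registries such as ZK, Consul, or Vintage run as separate replicated services, and the class `InMemoryRegistry`, its timeout-based expiry, and the listener callback are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Consumer;

// Hypothetical in-memory registry: Servers register and send heartbeats,
// Clients subscribe to a service and are notified whenever its Server list changes.
class InMemoryRegistry {
    private final long heartbeatTimeoutMillis;
    // service name -> (server address -> time of last heartbeat in millis)
    private final Map<String, Map<String, Long>> services = new HashMap<>();
    private final Map<String, List<Consumer<List<String>>>> listeners = new HashMap<>();

    InMemoryRegistry(long heartbeatTimeoutMillis) {
        this.heartbeatTimeoutMillis = heartbeatTimeoutMillis;
    }

    synchronized void register(String service, String server, long nowMillis) {
        services.computeIfAbsent(service, k -> new HashMap<>()).put(server, nowMillis);
        notifyListeners(service);
    }

    // A heartbeat only refreshes the timestamp; the Server list does not change.
    synchronized void heartbeat(String service, String server, long nowMillis) {
        Map<String, Long> servers = services.get(service);
        if (servers != null && servers.containsKey(server)) {
            servers.put(server, nowMillis);
        }
    }

    // Subscribing delivers the current list immediately, then pushes later changes.
    synchronized void subscribe(String service, Consumer<List<String>> listener) {
        listeners.computeIfAbsent(service, k -> new ArrayList<>()).add(listener);
        listener.accept(currentServers(service));
    }

    // Run periodically: drop Servers whose heartbeats have stopped and notify subscribers.
    synchronized void expire(long nowMillis) {
        for (String service : services.keySet()) {
            boolean removed = services.get(service).values()
                    .removeIf(last -> nowMillis - last > heartbeatTimeoutMillis);
            if (removed) {
                notifyListeners(service);
            }
        }
    }

    // Sorted snapshot of the Servers currently registered for a service.
    private List<String> currentServers(String service) {
        return new ArrayList<>(new TreeSet<>(services.getOrDefault(service, Map.of()).keySet()));
    }

    private void notifyListeners(String service) {
        List<String> snapshot = currentServers(service);
        for (Consumer<List<String>> listener : listeners.getOrDefault(service, List.of())) {
            listener.accept(snapshot);
        }
    }
}
```

Because heartbeats flow only from Server to Registry in this sketch, a crashed Server disappears from Client lists only after the timeout plus the next `expire` run.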

Motan framework

Motan consists of the Register, Transport, Serialize, and Protocol modules, each of which supports SPI extension. The interactions between the modules are shown in the following figure:

Register: interacts with the registry, handling service registration, service subscription, service change notification, service heartbeats, and other functions. During system initialization, the Server registers its services through the Register module, and the Client subscribes through it to the list of Servers that provide a service. The Register module also notifies the Client when the Server list changes.

Protocol: describes RPC services and handles their configuration and management. Filters with different functions can be added at this layer, for example for statistics and concurrency limiting.

Serialize: serializes and deserializes objects such as the parameters and results in RPC requests, that is, converts objects to and from byte streams. By default it uses Hessian2, which is relatively Java-friendly.

Transport: handles remote communication. By default it uses long-lived TCP connections over Netty NIO.

Cluster: a module used by the Client. A Cluster is the logical encapsulation of a group of available Servers that can provide a given RPC service.
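To make the Cluster concrete, here is a hypothetical sketch of one that chooses a Server using the "lowest concurrency first" policy from Motan's load-balancing list. The class name `LeastActiveCluster` and its `acquire`/`release` methods are invented, and we assume "concurrency" means the number of in-flight requests per Server, with ties broken at random.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Cluster: wraps the Servers available for one service and selects
// the one with the fewest in-flight requests ("lowest concurrency first").
class LeastActiveCluster {
    private final Map<String, AtomicInteger> active = new ConcurrentHashMap<>();
    private final Random random;

    LeastActiveCluster(List<String> servers, Random random) {
        for (String server : servers) {
            active.put(server, new AtomicInteger());
        }
        this.random = random;
    }

    // Choose a Server and count the new request as in flight on it.
    String acquire() {
        int min = Integer.MAX_VALUE;
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, AtomicInteger> e : active.entrySet()) {
            int inFlight = e.getValue().get();
            if (inFlight < min) {
                min = inFlight;
                candidates.clear();
            }
            if (inFlight == min) {
                candidates.add(e.getKey());
            }
        }
        String chosen = candidates.get(random.nextInt(candidates.size()));
        active.get(chosen).incrementAndGet();
        return chosen;
    }

    // Call when the response (or an error) for a request to `server` arrives.
    void release(String server) {
        active.get(server).decrementAndGet();
    }
}
```

The read-then-increment in `acquire` is not atomic, so under heavy contention two threads can pick the same Server; for load balancing that small inaccuracy is usually tolerable.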

When making an RPC request, the Client invokes the Cluster module through a proxy. The Cluster selects an available Server according to the configured HA and LoadBalance policies, converts the RPC request into a byte stream through the Serialize module, and sends it to the Server through the Transport module.

On the Server, after the Transport module receives the data, the Serialize module restores it to an RPC request. The Server then finds the implementation class that provides the service according to the parameters configured at the Protocol layer and invokes it via reflection. The result is returned to the Client in the same way, completing the RPC request.
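The client-side proxy and server-side reflection described above can be sketched with the JDK's `java.lang.reflect.Proxy` and `Method.invoke`. This is an in-process illustration only: `RpcRequest`, `ReflectiveServer`, and `RpcClient` are hypothetical names, and the `transport` function stands in for the Cluster, Serialize, and Transport modules, which in a real deployment would pick a Server, turn the request into bytes, and send it over the network.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical request: which interface and method to call, and with what arguments.
final class RpcRequest {
    final String interfaceName;
    final String methodName;
    final Class<?>[] paramTypes;
    final Object[] args;

    RpcRequest(String interfaceName, String methodName, Class<?>[] paramTypes, Object[] args) {
        this.interfaceName = interfaceName;
        this.methodName = methodName;
        this.paramTypes = paramTypes;
        this.args = args;
    }
}

// Server side: look up the exported implementation and invoke it via reflection.
final class ReflectiveServer {
    private final Map<String, Class<?>> interfaces = new HashMap<>();
    private final Map<String, Object> implementations = new HashMap<>();

    <T> void export(Class<T> iface, T impl) {
        interfaces.put(iface.getName(), iface);
        implementations.put(iface.getName(), impl);
    }

    Object handle(RpcRequest req) {
        Class<?> iface = interfaces.get(req.interfaceName);
        if (iface == null) {
            throw new IllegalStateException("no provider for " + req.interfaceName);
        }
        try {
            Method method = iface.getMethod(req.methodName, req.paramTypes);
            return method.invoke(implementations.get(req.interfaceName), req.args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("invocation failed: " + req.methodName, e);
        }
    }
}

// Client side: a dynamic proxy turns each local method call into an RpcRequest.
final class RpcClient {
    @SuppressWarnings("unchecked")
    static <T> T refer(Class<T> iface, Function<RpcRequest, Object> transport) {
        return (T) Proxy.newProxyInstance(
                RpcClient.class.getClassLoader(),
                new Class<?>[] {iface},
                (proxy, method, args) -> transport.apply(new RpcRequest(
                        iface.getName(), method.getName(), method.getParameterTypes(), args)));
    }
}
```

For example, exporting a `java.util.function.IntBinaryOperator` implementation `(a, b) -> a + b` on the server side and calling `RpcClient.refer(IntBinaryOperator.class, server::handle).applyAsInt(2, 3)` on the client side returns 5, even though the caller only ever sees the interface.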

New requirements and challenges for Motan RPC in hybrid clouds

While building the hybrid cloud platform, RPC services had to adapt to cloud deployment, and actual deployments raised some new requirements for RPC. The hybrid cloud platform consists of a private cloud and a public cloud. The private cloud is located in our own machine rooms, and to keep communication between the clouds stable, the private and public clouds are connected by a dedicated line. There are three scenarios for scaling RPC services out to the cloud:

  • Scaling out only the RPC Server
  • Scaling out only the RPC Client
  • Scaling out both Clients and Servers

In actual scale-outs we try to use the third approach, so that calls stay local (nearby access). However, the first or second case is sometimes unavoidable, for example when an RPC service needs to be scaled out on its own, or when resources that an RPC service depends on cannot yet be deployed in the public cloud. In the first and second cases, cross-line calls occur (the blue dotted line in the figure). To control RPC calls flexibly, RPC needs to support cross-room calls and to control their proportion.

In addition, RPC previously relied on group configuration to keep calls within the same machine room, so bandwidth usage needed no special attention. In the hybrid cloud, however, cross-room calls consume dedicated-line bandwidth, so the bandwidth used by RPC should be kept as low as possible.

We made the following improvements:

Traffic compression

At Weibo there are two typical RPC usage scenarios. In the first, request parameters and return values are very small, but request volume and QPS are high; Weibo's unread-count service is an example. In the second, QPS is not high, but each return value is large.

For the first scenario, we compressed the protocol itself. A Motan request consists roughly of four parts: header information, the Service and method description, parameter values, and additional information.

  • Header information contains fixed fields such as the RPC version, message type, and message length; it is 16 bytes long.
  • The Service and method description contains the fully qualified interface name of the requested Service, the method name, and the method's parameter description.
  • Parameter values are the parameter objects serialized with Hessian2.
  • Additional information includes the caller (Application), interface version, Group, and requestId.

The Service and method description plus the additional information typically exceed 200 bytes, so for a request whose only argument is a single long value, the payload makes up only a small share of the message. The compression idea is to replace the method description with a method signature and to cache the additional information, which stays fixed across requests.

An RPC Server exposes a limited set of interfaces and methods; as long as it can identify which method and version to invoke, the request does not need to carry the complete method information. Both Server and Client generate a 16-byte signature from the interface name, method name, parameter description, and version, and cache the mapping. Each request then needs only the 16-byte signature to locate the method to invoke.
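A hypothetical sketch of the signature table: the article only states that signatures are 16 bytes, so deriving them with MD5 (whose digest is exactly 16 bytes), the `#` field separator, and the class `MethodSignatures` are all our assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical: Client and Server compute the same 16-byte signature for each
// method and cache signature -> method description.
final class MethodSignatures {
    private final Map<String, String> bySignature = new HashMap<>();

    static byte[] sign(String iface, String method, String paramDesc, String version) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            // '#' keeps adjacent fields from running together (assuming no field contains '#').
            String text = iface + "#" + method + "#" + paramDesc + "#" + version;
            return md5.digest(text.getBytes(StandardCharsets.UTF_8)); // always 16 bytes
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be supported by every Java platform", e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Server side: register each exported method; a collision is reported so it can be renamed.
    void register(String iface, String method, String paramDesc, String version) {
        String key = hex(sign(iface, method, paramDesc, version));
        String desc = iface + "." + method + "(" + paramDesc + ")@" + version;
        String previous = bySignature.putIfAbsent(key, desc);
        if (previous != null && !previous.equals(desc)) {
            throw new IllegalStateException("signature collision: " + previous + " vs " + desc);
        }
    }

    // Resolve a request that carries only the signature; null if unknown.
    String resolve(byte[] signature) {
        return bySignature.get(hex(signature));
    }
}
```

The interface name `com.example.UserService` in the test is a made-up example.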

Method signatures only need to be unique within a single Server, so global collisions do not matter; only collisions among the methods of the same Server do. Assuming 100,000 methods on a single Server, the collision probability for 16-byte (128-bit) signatures is about 1/2 × 100,000 × (100,000 - 1) × 1/2^128, roughly 1.5 × 10^-29 (about 2^-96), which is negligible. If a newly added method on the Server does collide, renaming the method avoids the collision.
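Written out, this is the usual birthday (union) bound for n distinct methods whose 128-bit signatures are treated as uniformly random, which is an idealizing assumption about the hash:

```latex
P(\text{collision}) \;\le\; \binom{n}{2}\,2^{-128} = \frac{n(n-1)}{2}\cdot 2^{-128},
\qquad
n = 10^{5}:\quad \frac{10^{5}\,(10^{5}-1)}{2} \approx 5\times 10^{9},
\qquad
P \lesssim \frac{5\times 10^{9}}{3.4\times 10^{38}} \approx 1.5\times 10^{-29} \approx 2^{-95.8}.
```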

The additional information in a request is used for call statistics. For a given Client it does not change at runtime, so if the Server caches it, it does not need to be sent with every request.

When the Client calls a service for the first time, it sends the fixed additional information together with its signature. On receiving the request, the Server caches the additional information and signature keyed by the Client's IP address and echoes the signature in the response as an acknowledgment. Once the Client receives this acknowledgment, subsequent requests carry only the signature. If the Server loses the cached signature, for example after a restart, it asks the Client to send the complete additional information in the next request, and the signature is then cached again.
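The handshake for caching additional information can be simulated in a single process. In this hypothetical sketch the classes `AttachmentRequest`, `AttachmentServer`, and `AttachmentClient` are invented, the signature is simply passed in because the article does not say how it is computed, and a boolean return value stands in for the acknowledgment (or the "send full info again" request) carried in the response.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical message: the full additional info is omitted once the Server has cached it.
final class AttachmentRequest {
    final String clientIp;
    final String signature;
    final Map<String, String> fullInfo; // null when only the signature is sent

    AttachmentRequest(String clientIp, String signature, Map<String, String> fullInfo) {
        this.clientIp = clientIp;
        this.signature = signature;
        this.fullInfo = fullInfo;
    }
}

final class AttachmentServer {
    // client IP -> (signature -> cached additional info)
    private final Map<String, Map<String, Map<String, String>>> cache = new HashMap<>();

    // Returns true (acknowledging the signature) if the additional info is known,
    // false if the Client must send the full info with its next request.
    boolean handle(AttachmentRequest req) {
        Map<String, Map<String, String>> perClient =
                cache.computeIfAbsent(req.clientIp, k -> new HashMap<>());
        if (req.fullInfo != null) {
            perClient.put(req.signature, req.fullInfo);
            return true;
        }
        return perClient.containsKey(req.signature);
    }

    // Simulates a restart: all cached signatures are lost.
    void restart() {
        cache.clear();
    }
}

final class AttachmentClient {
    private final String ip;
    private final Map<String, String> info;
    private final String signature;
    private boolean acknowledged = false;

    AttachmentClient(String ip, Map<String, String> info, String signature) {
        this.ip = ip;
        this.info = info;
        this.signature = signature;
    }

    // Send the full info until the Server has acknowledged the signature.
    AttachmentRequest nextRequest() {
        return new AttachmentRequest(ip, signature, acknowledged ? null : info);
    }

    // true: Server echoed the signature; false: Server lost it and wants the full info again.
    void onResponse(boolean serverHasInfo) {
        acknowledged = serverHasInfo;
    }
}
```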

After compressing the protocol information, an RPC request with a single long parameter shrank from 280 bytes to 94 bytes. In the measured scenario, requests were compressed by 60% on average, and upstream bandwidth dropped significantly in high-QPS scenarios.

For the second scenario, with large return values, we tried different serialization protocols, for example Protocol Buffers instead of Hessian2. In tests this reduced size by only about 10% to 15%, which was not satisfactory. Compressing with QuickLZ or GZIP reduced size by 30% to 70%, depending on the original size, at the cost of higher CPU load.

In the end, we added a compression feature. Based on the characteristics of its service, the service side can configure whether compression is enabled and the minimum size at which it kicks in. To minimize third-party dependencies, GZIP is used by default. In actual tests, compression works best for return values of 2 KB to 20 KB, while for return values above 50 KB the average compression time becomes high.
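A minimal sketch of threshold-based compression using the JDK's `java.util.zip` GZIP streams. The class `ResponseCompressor`, its `enabled`/`minBytes` settings, and the one-byte flag that tells the receiver whether the payload is compressed are assumptions for illustration; the article does not describe how Motan marks compressed messages on the wire.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical threshold-based GZIP compression for response bodies.
// A leading flag byte says whether the payload that follows is compressed.
final class ResponseCompressor {
    private final boolean enabled;
    private final int minBytes; // only bodies at least this large are compressed

    ResponseCompressor(boolean enabled, int minBytes) {
        this.enabled = enabled;
        this.minBytes = minBytes;
    }

    byte[] encode(byte[] body) {
        boolean compress = enabled && body.length >= minBytes;
        byte[] payload = compress ? gzip(body) : body;
        byte[] out = new byte[payload.length + 1];
        out[0] = (byte) (compress ? 1 : 0);
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }

    static byte[] decode(byte[] message) {
        byte[] payload = Arrays.copyOfRange(message, 1, message.length);
        return message[0] == 1 ? gunzip(payload) : payload;
    }

    private static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer before we read the bytes.
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    private static byte[] gunzip(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A minimum size is worthwhile because GZIP adds at least 18 bytes of header and trailer and costs CPU, so very small bodies can end up both larger and slower.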

The compression statistics are as follows:

Dynamic traffic adjustment

Services (interfaces) in Motan are organized into groups. A group is generally defined by machine room and service line, and it can contain multiple services. For example, the group yf-user denotes the user RPC service in the Yongfeng machine room. A Client in yf-user subscribes only to Servers in yf-user, so grouping gives nearby access within the same machine room.

Controlling a Client's cross-room calls essentially means allowing it to make cross-group calls at a certain ratio. This could be implemented either in the registry or in the Client itself. So that Motan's traffic control works with any registry, we chose to implement it on the Client side.

We designed an instruction system that makes it easy to add more management functions in the future. Instructions are stored in the registry in JSON format, and when a Client subscribes to an RPC service it also subscribes to the corresponding instructions. Instructions are stored per group, so all Clients in the same group receive the same instructions. When an instruction is issued from the management console, the corresponding Clients receive, parse, and execute it.

An example of a single instruction is as follows:

    {
        "index": 1,
        "version": "1.0",
        "dc": "yf",
        "pattern": "*",
        "mergeGroups": [
            "openapi-tc-test-RPC:12",
            "openapi-yf-test-RPC:1"
        ],
        "routeRules": [],
        "remark": "Switch 50% traffic to another machine room"
    }

The mergeGroups attribute configures cross-room calls: it can list multiple machine rooms (groups) together with the call weight of each. The cross-room call process is shown in the following figure:

Group1 and group2 are in two different machine rooms and provide the same Service. Client1 and Server1 belong to group1; Client2 and Server2 belong to group2. Normally each Client calls the Servers in its own group, as shown by the blue arrows. When the load on group1's Servers suddenly increases, a mergeGroups instruction for group1 is set in the Registry through the management console (Manager):

    "mergeGroups": [
        "group1:5",
        "group2:1"
    ]

Following this instruction, Client1 subscribes to both group1 and group2 and calls Server1 of group1 and Server2 of group2 at a 5:1 ratio, shown as the red dotted line in the figure.

Client2 does not receive this instruction because it does not belong to group1, so it still calls only Server2.

Instructions can be scoped down to individual interface classes through the pattern field, and the group weights set the proportion of traffic that is switched; the minimum proportion is 1%.
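Client-side handling of a mergeGroups instruction can be sketched as weighted random selection over groups. The class `MergeGroupSelector`, the "name:weight" parsing, and the default weight of 1 for entries without a weight are our assumptions; with "group1:5" and "group2:1", about 5/6 of calls go to group1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical: parse "group:weight" entries from a mergeGroups instruction and
// pick a group for each call in proportion to its weight.
final class MergeGroupSelector {
    private final List<String> groups = new ArrayList<>();
    private final List<Integer> weights = new ArrayList<>();
    private final List<Integer> cumulative = new ArrayList<>();
    private int total = 0;

    MergeGroupSelector(List<String> mergeGroups) {
        for (String entry : mergeGroups) {
            String[] parts = entry.split(":");
            String group = parts[0].trim();
            int weight = parts.length > 1 ? Integer.parseInt(parts[1].trim()) : 1; // assumed default
            if (weight <= 0) {
                continue; // a zero weight switches the group off
            }
            total += weight;
            groups.add(group);
            weights.add(weight);
            cumulative.add(total);
        }
        if (total == 0) {
            throw new IllegalArgumentException("no group with a positive weight");
        }
    }

    // Pick the group whose cumulative-weight interval contains a random point in [0, total).
    String select(Random random) {
        int point = random.nextInt(total);
        for (int i = 0; i < groups.size(); i++) {
            if (point < cumulative.get(i)) {
                return groups.get(i);
            }
        }
        throw new AssertionError("unreachable");
    }

    // Expected fraction of calls routed to `group`.
    double expectedShare(String group) {
        int sum = 0;
        for (int i = 0; i < groups.size(); i++) {
            if (groups.get(i).equals(group)) {
                sum += weights.get(i);
            }
        }
        return (double) sum / total;
    }
}
```

Under this reading, a 1% share corresponds to weights such as 99:1.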

The routeRules field of an instruction provides routing, which can be used for precise control of calls or for features such as preview. For example:

    {
        "index": 3,
        "version": "1.0",
        "dc": "yf",
        "pattern": "com.weibo.xxxx.Preview",
        "mergeGroups": [],
        "routeRules": [
            "* to !10.75.0.1"
        ],
        "remark": "Preview a machine and turn off its online traffic"
    }

Each rule in routeRules expresses a "from to to" relationship. The rule "* to !10.75.0.1" forbids every Client from calling 10.75.0.1, taking that machine out of online traffic so it can be used for preview.

Rules support wildcards. For example, "10.75.0.* to 10.75.0.1" means that all Clients in the 10.75.0 segment may call only the 10.75.0.1 Server.
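One hypothetical way to evaluate routeRules on the Client: each rule is parsed as `<from> to <to>`, the `<from>` pattern is matched against the Client's own IP, and `<to>` selects (or, with a leading `!`, excludes) Server IPs, with `*` as a wildcard. The class `RouteRules`, the glob-to-regex translation, and applying rules in order as successive filters are assumptions, not Motan's documented semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical interpretation of routeRules entries of the form "<from> to <to>".
final class RouteRules {
    // Translate a '*' wildcard pattern into a regex; everything else matches literally.
    private static Pattern glob(String pattern) {
        String[] pieces = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(pieces[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Apply each rule whose <from> matches the Client IP, narrowing the Server list in order.
    static List<String> filter(List<String> rules, String clientIp, List<String> servers) {
        List<String> result = new ArrayList<>(servers);
        for (String rule : rules) {
            String[] parts = rule.trim().split("\\s+to\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("bad rule: " + rule);
            }
            if (!glob(parts[0].trim()).matcher(clientIp).matches()) {
                continue; // rule does not apply to this Client
            }
            String to = parts[1].trim();
            boolean exclude = to.startsWith("!");
            Pattern target = glob(exclude ? to.substring(1).trim() : to);
            // exclude: drop matching Servers; otherwise: keep only matching Servers
            result.removeIf(server -> target.matcher(server).matches() == exclude);
        }
        return result;
    }
}
```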

Other optimizations

There are also some small details in RPC calls. For example, when a business exception occurs on the Server side, the exception stack is serialized and passed to the Client, and a stack trace can be 1 to 2 KB. Since in most scenarios the Client only cares about the cause of the exception and not the contents of the stack, we replace the exception stack when a service exception occurs to avoid transmitting unnecessary stack information.
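A minimal sketch of replacing the stack before a business exception is serialized, using the standard `Throwable.setStackTrace`. The class `ExceptionSlimmer` is a hypothetical name; the exception's type and message, which are what most Clients care about, are kept.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical: before a business exception is sent back to the Client,
// keep its type and message but drop the stack traces along the cause chain.
final class ExceptionSlimmer {
    static <T extends Throwable> T stripStack(T e) {
        // Track visited throwables so an unusual cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = e; t != null && seen.add(t); t = t.getCause()) {
            t.setStackTrace(new StackTraceElement[0]);
        }
        return e;
    }
}
```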

We also adjusted the registration and deregistration mechanism for RPC services and added support for Consul as a registry, making RPC service deployment in the hybrid cloud more flexible.

Afterword

The Motan RPC framework plays an important role in the various businesses of the Weibo platform. After nearly three years in production at Weibo, its features keep getting more complete. Motan's design philosophy is simplicity and ease of use; our predecessors worked toward this goal when designing Motan, and we will keep developing it in this direction.

How do you build an easy-to-use, high-performance RPC framework? I think that, besides keeping up with new requirements, it is even more important to consider ease of use down to the details: seamless upgrades for new features, keeping integration cost for business teams as low as possible, adding the necessary switches for flexibility, previewing console instructions to prevent misoperation, and so on. We aim to improve the Motan framework in every detail.

In the near future we plan to overhaul Motan: removing specific dependencies, implementing it in a more general way, and improving the documentation. Open source has been one of Motan's design goals from the beginning, and we want to make Motan an easy-to-use, high-performance open-source RPC framework. You are welcome to join us.

Q&A

1. Is Motan open source? Can I download the code now?

We are tidying up the code and will open-source the core in the near future.

2. Does Motan support cross-language calls, for example from PHP?

Motan does not currently support PHP or other languages. We are working on it and have tested it on a small scale.

3. Is the time in the figure the compression time?

The time in the figure is the average time of a single compression.

4. When implementing an RPC framework from scratch, what generally needs to be considered, and what is the general approach?

A simple RPC call only needs to consider the protocol (method description), serialization (passing parameter values), and transport (TCP, HTTP, etc.).

5. Does Motan support asynchronous invocation, and how? Which Load Balance policies are supported? Are custom policies supported?

Motan's requests are asynchronous at the transport level, but explicit asynchronous calls are not available when a local reference is used. The supported Load Balance policies are lowest concurrency first, consistent hashing, random, and round-robin, and other policies can be added through the SPI mechanism.

6. The serialization protocol, communication protocol, module definitions, architectural layering, and service grouping all look the same as Dubbo's, and command-line configuration seems less convenient than Dubbo's very convenient management interface. In what specific ways is Motan lighter than Dubbo? Dubbo also seems to need only a registry.

Motan's module hierarchy is simpler than Dubbo's: there is no exchange or directory layer. Motan also provides SLA policies such as concurrency limits.

7. When designing monitoring, which metrics are generally involved, and how can service problems be found quickly?

Service problems can be found by monitoring call latency, framework exceptions, and the number of service exceptions in the statistics logs.

RPC calls are counted in memory with Metric and then periodically written to the statistics log in aggregate.
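A hypothetical sketch of in-memory call statistics flushed periodically to a log. It uses only `java.util.concurrent` counters rather than any particular metrics library; the class `CallStats`, the per-method counters, and the log line format are assumptions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Consumer;

// Hypothetical in-memory call statistics: counters are updated on every call and
// periodically flushed (and reset) as one log line per method.
final class CallStats {
    private static final class Counters {
        final LongAdder calls = new LongAdder();
        final LongAdder bizErrors = new LongAdder();
        final LongAdder frameworkErrors = new LongAdder();
        final LongAdder totalMillis = new LongAdder();
    }

    private final Map<String, Counters> byMethod = new ConcurrentHashMap<>();

    void record(String method, long elapsedMillis, boolean bizError, boolean frameworkError) {
        Counters c = byMethod.computeIfAbsent(method, k -> new Counters());
        c.calls.increment();
        c.totalMillis.add(elapsedMillis);
        if (bizError) c.bizErrors.increment();
        if (frameworkError) c.frameworkErrors.increment();
    }

    // Called by a scheduler, e.g. once a minute; `log` stands in for the statistics logger.
    void flush(Consumer<String> log) {
        for (String method : new TreeMap<>(byMethod).keySet()) {
            Counters c = byMethod.remove(method);
            if (c == null) continue;
            long calls = c.calls.sum();
            double avg = calls == 0 ? 0.0 : (double) c.totalMillis.sum() / calls;
            log.accept(String.format(Locale.ROOT, "%s calls=%d avgMs=%.1f bizErr=%d fwErr=%d",
                    method, calls, avg, c.bizErrors.sum(), c.frameworkErrors.sum()));
        }
    }
}
```

A call recorded concurrently with `flush` can be lost at the flush boundary; a production version would need to swap counter sets more carefully.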

8. What was the reasoning behind re-implementing something like Dubbo? Which business needs could Dubbo not meet?

Dubbo is relatively feature-rich, but at the time we wanted a lightweight RPC framework that would make it easy to add modifications and features suited to our own business scenarios, so that internal services could be transformed and migrated smoothly. In that situation, modifying Dubbo might have cost more than writing a new framework, so we decided to develop Motan RPC. More generally, the choice between reuse and in-house development depends on the team's R&D capabilities and resources, and we happened to have suitable engineers who were interested in doing it.

9. Does Motan support feature extensions?

Yes, through SPI extensions.

10. How does the Registry ensure high availability? Is the heartbeat between Server and Registry bidirectional or unidirectional? In other words, how is a Server's online status detected in time?

The Registry itself needs strong disaster recovery (DR) to ensure high availability; ZK and Consul, for example, both provide it. Moreover, even if every Registry node unfortunately goes down, only service publishing and deregistration are affected; normal Client calls are not. The heartbeat between Server and Registry is unidirectional, so there may be a delay, usually a matter of seconds, between a Server going offline and Clients noticing.

12. Can you elaborate on the consistent-hash load-balancing policy?

A hashcode is computed from the request parameters, and it is used so that requests with the same parameters always go to the same server. The consistent-hash policy is mainly used for stateful RPC services, such as session-based IM services.
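A hypothetical consistent-hash ring with virtual nodes, using `TreeMap.ceilingEntry` to find the next Server clockwise. The article only says a hashcode is computed from the request parameters; using MD5, 100 virtual nodes per Server in the example, and a string request key (for instance a session ID) are our assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical consistent-hash ring: each Server occupies several virtual-node positions,
// and a request goes to the first position at or after its key's hash (wrapping around).
final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashRing(List<String> servers, int virtualNodes) {
        for (String server : servers) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }
    }

    // The request key would be derived from the RPC arguments, e.g. a session or user ID.
    String select(String requestKey) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(requestKey));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // First 8 bytes of the MD5 digest packed into a long; spreads keys evenly on the ring.
    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

The benefit over a plain `hash % serverCount` is that removing a Server only remaps the keys that were on it; every other key keeps its Server, which is what a stateful service such as a session-based IM service needs.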

This article was planned by Qingfeng Li and edited by Jie Wang and Tim Yang.