<SOFA:Channel/>, interesting and practical distributed architecture Channel.

<SOFA:Channel/> serves as the host for all SOFA online content, including live/audio/video tutorials, which epitomise the full picture of SOFAStack’s capabilities.


This article is compiled based on the live broadcast content of January 17, 2018.

Welcome to join the live interactive peg group: 23127468 (search the group number to join).


Good evening, everyone. Today is the first live broadcast of SOFAChannel. Thank you for watching.

Let me introduce myself first. My name is Yu Huai, zhang Geng from Ant Financial Middleware. I am currently responsible for the work related to application framework and SOFAStack open source.


The evolution of Ant servitization architecture

People have heard much about ant Financial’s technology concepts, such as “Yu ‘ebao”, “Mutual Treasure”, “Ant Forest”, “Facial payment”, “Scanning for bus and subway”, “Double 11” and so on.

Behind these products is a set of core technologies of Fintech, including “three places and five centers remote Multi-activity architecture”, “Financial level distributed architecture SOFAStack”, “financial level distributed database OceanBase”, “intelligent risk control”, “blockchain” and many other black technologies.

Of course, the micro service architecture system of Ant Financial, like other traditional enterprises, was not so high in the beginning, but also gradually evolved with the development of business. The following is a brief introduction to the evolution process.

This schematic diagram of alipay’s earliest architecture shows that At that time, Alipay was just a payment system in the background of e-commerce. It was a single application, which simply divided into several business modules and connected to a database. However, with the continuous expansion of the business scale, the single-system architecture has been unable to meet the business requirements.

Therefore, Alipay split the large system into multiple independent subsystems, which is a typical SOA-oriented architecture.

At first, the hardware load device of F5 was used to balance the load between the systems, but due to the single point problem of F5 device, a registry component was introduced in the middle. The service provider goes to the registry to register the service, the service consumer goes to the registry to subscribe to the list of services, and the service consumer invokes the service provider directly through the RPC framework in a soft load manner. This may seem like an obvious service-oriented architecture now, but it was a bit ahead of its time back in 2007.

Alipay is doing system split at the same time, the database also according to the subsystem of the vertical split. The split of database will introduce the problem of distributed transaction. Ant Financial middleware provides a distributed transaction component DTX based on TCC.

Services continue to expand, and more and more systems are added. When the number of system nodes reaches a certain number, a single physical equipment room cannot afford it. In addition, considering the problem of same-city disaster, Alipay will expand another machine room in the same city, deploy it as an internal network through special line, and then deploy the application.

The problem of cross-room remote access is introduced in the same room, and this delay loss must be higher than in the same room call. There are two main types of remote access: RPC calls and database access. In order to solve the problem of RPC call across machine rooms, the engineers of Alipay chose to deploy the registry in each machine room and call the service of the same machine room first, which became the deployment mode in the figure. However, the problem of database access across machine rooms is not solved at this stage.

Another problem is the call link chaos and database connection bottleneck calls. For example, in this figure, we fragment the data, and then when the request of User0 in the figure comes in, it is randomly transferred to S2, then to B0, then to C1, and finally the follow-up data allocation falls to database D0. As you can see, the call link is random, and each core layer also needs to establish long connections with all branches.

In order to solve the above problems of cross-room data access, database connection number bottleneck and future horizontal expansion of data, ant engineers designed a set of unitary architecture, this is a schematic diagram of unitary.

Firstly, the requests of ants are all related to users, so we divide the data horizontally according to the dimensions of users. For example, in this diagram, we divide all users into three groups. Then we also deployed our application into three groups of independent luS. The application and data of each LUS are independent, equivalent to each LUS processing 1/3 of the total user data.

At this time, the users of three different terminals, no matter on PC or mobile phone or scanning two-dimensional code, when the request enters the unified access layer, the access layer will forward the user request to the corresponding logical unit according to the grouping rules of the above logical unit. For example, the request of User0 is forwarded to S0. Subsequent calls and data between applications are only in UNIT 0. Unified user1 only goes to LOGICAL unit 1, and user2 only goes to logical unit 2.

We call this logical unit a RegionZone. In actual deployment, the number of PHYSICAL DATA center IDCs and luS is not equal. For example, in the diagram, the physical room can be two and three centers, while the RegionZone is divided into five.

Geo-redundant disaster recovery (GEO-redundant) is a disaster recovery guide for financial institutions. It requires that two data centers be set up in the same city or a nearby area (≤ 200 KB). One data center is responsible for daily production operations. The other is the disaster backup center, which is responsible for running application systems after a disaster. At the same time, establish a remote Dr Center (> 200KM).

With this unitized architecture as the guiding ideology, ant carried out large-scale transformation, including application transformation, basic framework transformation, data center construction.

After the machine room construction was completed, Ant Financial divided its users into several parts, divided several logical units, and deployed them into different physical machine rooms, and completed large-scale data migration at the same time.

From the two regions and three centers to the three regions and five centers with stronger DISASTER recovery capabilities, you only need to build an equipment room in the third city, deploy part of the RegionZone to the third city, and then complete data migration and traffic diversion.

Each RegionZone is backed up in the remote region. When a city-level fault occurs, the unified control center pushes the new logical rule to the unified access layer and the standby RegionZone in the remote region, so that the city-level disaster recovery switchover can be implemented.

After that, ant engineers also built flexible LDC capabilities based on the idea of unitization. For example, when large-scale activities started, we dynamically dispatched related applications to other temporary machine rooms (such as the machine room on the cloud), and then introduced traffic. For example, the example in the figure directs 10% of the traffic to ZoneX. When the event is over, we redirect the traffic back.


SOFAStack Open-source status

As can be seen from the previous servitization evolution, the scenarios facing ant’s microservice architecture have gradually developed from simple remote invocation to complex remote multi-activity scenarios of three places and five centers. In order to support these scenarios, Ant middleware developed a set of middleware SOFAStack.

SOFA in SOFAStack is an acronym for Scalable Open Financial Architecture, a set of middleware for rapidly building finance-level distributed architectures and best practices honed in Financial scenarios.

This is our internal technology stack, and you can see that we have internal systems or components for each function point in the microservices space. It includes RPC framework, service discovery, dynamic configuration, fusing flow limiting, distributed transaction, sub-library sub-table and other middleware.

This is our OpenSource Landscape. At present, only some components are OpenSource, and some components are still in preparation for OpenSource. Although many internal components are not OpenSource, we will open mature components that have been OpenSource in each micro-service field.

For example, for service discovery in microservices, we don’t open source our internal SOFARegistry, but we have connected with mature registry products such as ZooKeeper/ Etcd/Nacos. For distributed tracking, although we open source our own SOFATracer, But at SOFARPC we also offer SkyWalking as our implementation of distributed tracking. By maintaining compatibility with many of the best open source products in the industry, SOFAStack becomes even more possible.

Details of the above part can directly read the ant gold suit micro service practice | source China share year-end celebration activity


SOFARPC of ant microservice system

In microservices, the service framework is basically an essential component. In SOFAStack, this component is called SOFARPC. Like Ant’s entire microservice system, SOFARPC has evolved internally over several generations.

It can be seen that SOFARPC has gone through more than ten years of development from the earliest support for WebService to the current open source version. From the timeline, it can be clearly seen that with the evolution of the overall architecture in front of the RPC framework has also been upgraded for several generations. SOFARPC is now open source on Github.

During SOFARPC’s fifth gen refactoring in 2017, three design principles were outlined.

The first is the core development, which is to open the interface layer, API layer, and core implementation logic, while the internal implementation retains some codes that are compatible with internal systems, compatible with old apis, or heavy historical burden.

The second is the unified model, which should unify the API model and domain model left over from history and reserve extension mechanism.

The third is equal extension. Our model should maintain good extensibility, both internal and external implementations are interface oriented programming, and treat third-party extensions equally.

Based on these principles, we first of SOFARPC system model layer, on the basis of the original more detailed, the left side is some module layer server, the right side is the client module layer, Such as the Filter/Router/Cluster/Loadbalance/Serilization/Protocol/Transport.

In order to ensure scalability, we also designed two extension mechanisms to make it easy for people to expand.

One is the ExtensionLoader shown in the sample code on the left. We can easily extend our capabilities in a manner similar to the JDK SPI mechanism. The extension method is relatively simple. You can see that you only need to implement the interface and add annotations, add configuration items, and finally obtain the extension instance according to the extension alias when using it. Registry, Filter, Router, Loadbalance, Transport, and so on are currently implemented through this extension.

The other is the Eventbus mechanism shown in the sample code on the right, in which different stages of an RPC call generate different events that are sent to the Eventbus. If an extension capability listens for events on the event bus, it can handle those events synchronously or asynchronously, and then extend some capabilities. Currently Metrics/Tracer/Fault tolerance and so on are implemented through this extension.

Registries are also part of the expansion. Currently we have Zookeeper, Consul, Nacos, etc. The community is helping us build other registries such as EtCD Eureka. If our registry SOFARegistry becomes open source, SOFARPC will be the first to support it.

Based on good scalability, the communication protocol and serialization combinations currently supported by SOFARPC are shown in figure 1. The default combination is Bolt+Hessian. Bolt is also ant’s open source NetTY-based high-performance communication framework. Bolt protocol is a self-developed TCP protocol. Hessian is a version of the open source version of Hessian with a few enhancements and deserialization security features.

In addition to this kind of TCP long connection + binary serialization, there is also the traditional Restful+ JSON way. Other Dubbo /gRPC services can be integrated into SOFARPC.

This is the server thread model under bolt protocol. We divide it into three main thread pools:

The boss thread listens for connect /read/write events on the port;

The worker thread pool processes specific data unpacking and packing actions. In addition, the header of the deserialization request will be made in the worker thread. If the corresponding interface of the request has a business custom thread pool, it will be distributed to this thread pool. Otherwise, it is distributed to the default business thread pool.

The body of the request is parsed in the business thread pool, the business logic is executed, and the response is serialized into a byte array.

There is also a callback thread pool that performs some callback processing, event processing.

The call link on the client side is shown here. The invocation of the client mainly goes through Proxy, Router, loadBalance, and Filter. These are extension points, so you can implement a wide variety of capabilities. Of course, we also built in some extension implementations, such as load balancing. We built in random, polling, weight, consistent hash algorithms, etc.

The default implementation of distributed tracking for SOFARPC is SOFATracer, which is also a distributed tracking component of Ant open source.

This SOFATracer is based on the EventBus extension mentioned earlier. As you can see, events are generated at each stage of the request, such as starting the call, sending data, receiving the request, receiving the response, and so on, all of which generate events to the event bus. The SOFATracer module has a subscriber that subscribes to these events and logs them.

In addition to basic invocation and service discovery, SOFARPC has two other unique features that I will briefly introduce here.

One of these is SOFARPC with a built-in stand-alone fault removal capability. This capability mainly applies to nodes that are in sub-health state due to some reasons (such as FullGC for a long time, hardware failure, or BUSY I/O) when the long connection between the Consumer and Provider is still on and the registry is not removed. This type of node is usually the performance of invoking high exception rate actions such as timeout.

We designed a set of models and strategies for this, as you can see in the figure above. We listen for statistical behavior based on an event mechanism (similar to Tracer, but asynchronous) and set up a unified portal to collect the results of calls. Then do the calculation according to the calculation strategy, such as every 1 minute; After calculation, the calculation results are measured. For example, if the error rate of a node exceeds 5 times of the average value, we think it needs to be degraded. For a degraded node, we perform a downgrading strategy, such as reducing the weight to 5%, and the number of requests sent to that node will be 1/20 of the original number (5% is reserved for a small number of requests for recovery). The next time this node is evaluated and measured, if the error rate is back to normal, then execute a recovery strategy, such as a 5x recovery, that is, slowly recover from 5% to 25% to 100%. In this way, the overall client availability will increase in the event of some node failure.

Another feature is link transparent transmission. In fact, SOFATracer built-in a link data, and can be transmitted to each component. However, SOFATracer’s data transparency is one-way. SOFARPC implements bidirectional transparent link transmission based on the implicit parameter transmission capability, which can be used in some authentication and A/B Test scenarios. During service use, API can be used to access data conveniently. In addition, services can be disconnected or blocked in advance on the whole link.

For more features, you can go to our open source website. In recent releases we’ve also integrated with Hystrix, Skywalking, etc.

Document the address is: https://www.sofastack.tech/sofa-rpc/docs/Home

This is the Roadmap for the next step. We will work on data verification capabilities to prevent black swan incidents, encryption and decryption capabilities, and registry integration under our SOFA system. In 6.0.0, we will upgrade to JDK8, integrate with gRPC, support etCD as a registry, service registration and configuration model separation, etc. Welcome to pay attention.

If you have the ability you want, you are also welcome to give us an Issue or PR.

https://github.com/alipay/sofa-rpc


conclusion

In conclusion, today I will briefly introduce the evolution of ant microservice architecture, from single module application to the final elastic architecture of three places and five centers.

Then we introduced our open source situation, welcome everyone to Star or contribute code. Finally, some design concepts, ideas and characteristics of SOFARPC are introduced.

In fact, we now see that more and more enterprises will choose to transform to the direction of micro-service, but the construction of micro-service system cannot be achieved overnight, and the enterprise micro-service architecture will often evolve slowly with the development of business. Microservices are moving from a trend to a best practice that greatly reduces the complexity and overhead of microservices introduction.

This is the end of today’s live sharing, thank you!


A link to the

Video playback is also ready for you:

https://tech.antfin.com/activities/148


Address mentioned:

  • SOFAStack website:

    https://sofastack.tech

  • SOFA Github address:

    https://github.com/alipay

  • SOFA code cloud address:

    https://gitee.com/alipay

  • SOFARPC features:

    https://www.sofastack.tech/sofa-rpc/docs/Home

  • SOFARPC Github

    https://github.com/alipay/sofa-rpc


Lecturer at the view




Long click to follow, not miss every technical live broadcast

Welcome to jointly create SOFAStack https://github.com/alipay