Nacos Consistency protocol

Nacos technical architecture

First, we briefly introduce the technical architecture of Nacos to give an overall picture of the system. As shown in the figure, the Nacos architecture is divided into four layers: the user layer, the application layer, the core layer and the plug-ins. We then analyze in depth how the Nacos consistency protocol evolved and how it is implemented.

Why does Nacos need a consistency protocol?

Nacos is a component that has to store data. On a single machine this is not a problem, because a simple embedded relational database is enough. In cluster mode, however, we must ensure that data stays consistent and synchronized between nodes, and to solve this we have to introduce a consensus algorithm.

Why does Nacos run both CP and AP protocols in a single cluster?

  • From the perspective of service registration and discovery

For services to call one another, each must obtain the other's current instance information from the service registry. This places high availability demands on the registry: in any scenario, it must keep providing registration and discovery as far as possible.

Service A and service B register with the registry, and service A obtains service B's instance information from the registry in order to call service B.

Even if registry data is lost, the client heartbeat mechanism quickly restores it.

  • For non-persistent data (that is, instances whose clients must keep reporting heartbeats to renew them)

A strongly consistent consensus algorithm is not suitable

With a strongly consistent consensus algorithm, the entire cluster can serve requests only while more than half of its nodes are running properly.

Eventual consistency algorithm

An eventual consistency algorithm gives stronger availability guarantees, and the data on all nodes converges after a period of time.
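A quick way to see the availability difference is to count how many node failures each approach can absorb. The Python sketch below is a simplified model, assuming a majority-quorum protocol (as Raft uses) for the strongly consistent case and assuming that, under eventual consistency, any single live node can keep serving; it ignores partitions that split the surviving nodes. The function names are illustrative only.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of an n-node cluster."""
    return n // 2 + 1


def tolerated_failures_cp(n: int) -> int:
    """Failures a majority-quorum (CP) cluster survives while still serving."""
    return n - majority(n)


def tolerated_failures_ap(n: int) -> int:
    """Failures an eventually consistent (AP) cluster survives in this model:
    any single live node can keep serving."""
    return n - 1


for n in (3, 5, 7):
    print(f"{n} nodes: CP needs {majority(n)} alive "
          f"(tolerates {tolerated_failures_cp(n)}), "
          f"AP tolerates {tolerated_failures_ap(n)}")
```

For a 3-node cluster, the strongly consistent side needs 2 live nodes and therefore tolerates a single failure, while the eventually consistent side keeps serving with only one node left.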

  • For persistent data

Strong consistency

Because persistent data is created directly by calling the Nacos server, lost data cannot be rebuilt from client heartbeats, so Nacos must keep it strongly consistent across nodes. For this type of service data, a strongly consistent consensus algorithm is therefore chosen.

Suppose a Nacos cluster consists of three nodes, each located in a different network partition. Service A registers with node 1, and the data is persisted to node 1's local storage. At this point, the persistent data on node 1 must be synchronized to node 2 and node 3.

  • From the perspective of configuration management

Strong consistency

On the Nacos server, configuration data must be replicated to a majority of the nodes in the cluster, which means configuration management requires strong consistency.
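The majority rule can be sketched as a leader that counts acknowledgements before treating a write as committed. This is a deliberately reduced model in Python with hypothetical `Replica` and `replicate` names; real Raft (and JRaft) additionally handles terms, leader election and log matching, all omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A hypothetical replica that may or may not be reachable."""
    name: str
    reachable: bool = True
    log: list = field(default_factory=list)

    def append(self, entry: str) -> bool:
        # An unreachable replica cannot store or acknowledge the entry.
        if not self.reachable:
            return False
        self.log.append(entry)
        return True


def replicate(entry: str, leader: Replica, followers: list) -> bool:
    """Return True if the entry reached a majority of the cluster (leader included),
    i.e. the write can be treated as committed."""
    acks = 1 if leader.append(entry) else 0
    for follower in followers:
        if follower.append(entry):
            acks += 1
    cluster_size = 1 + len(followers)
    return acks >= cluster_size // 2 + 1
```

With one of three nodes down the write still commits (2 of 3 acknowledgements); with two down it does not, which illustrates why a strongly consistent cluster stops accepting writes once it loses its majority.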

Choosing a strong consistency algorithm

Nacos chose JRaft because it supports multiple Raft groups, which opens up the possibility of sharding Nacos data across several groups.

Choosing an eventual consistency algorithm

Well-known eventual consistency algorithms include Gossip and Eureka's data synchronization algorithm. Nacos instead uses Distro, an algorithm developed at Alibaba that combines the strengths of Gossip and of Eureka's synchronization algorithm.

Shortcomings of Gossip

In the original Gossip protocol, the nodes that receive each message are chosen at random, so the same message is inevitably sent to the same node more than once. This increases the load on the network and adds processing overhead on the receiving nodes.
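The redundancy is easy to observe in a small simulation of push-style gossip. In this Python sketch, which is an illustration rather than any particular Gossip implementation, every informed node forwards the message to `fanout` random peers each round, and any delivery to a node that already has the message is counted as redundant.

```python
import random


def simulate_push_gossip(n: int, fanout: int, seed: int = 0):
    """Simulate push gossip from node 0 until all n nodes have the message.

    Returns (rounds, total_messages, redundant_messages), where a message is
    redundant if its target already held the message when it arrived.
    """
    rng = random.Random(seed)
    informed = {0}
    rounds = total = redundant = 0
    while len(informed) < n:
        rounds += 1
        # Snapshot: only nodes informed before this round send in this round.
        for sender in list(informed):
            peers = [p for p in range(n) if p != sender]
            for target in rng.sample(peers, fanout):  # distinct random peers
                total += 1
                if target in informed:
                    redundant += 1
                else:
                    informed.add(target)
    return rounds, total, redundant


rounds, total, redundant = simulate_push_gossip(n=20, fanout=3, seed=42)
print(f"rounds={rounds} messages={total} redundant={redundant}")
```

The exact counts depend on the seed, but the useful deliveries are always `n - 1`; everything above that is overhead. With 20 nodes and a fanout of 3, at least one redundant delivery is unavoidable: a second round in which every message reached a new node would leave 16 informed nodes sending 48 messages to only 4 uninformed ones.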

Distro

Distro introduces the concept of a responsible (authoritative) server: each node is responsible for a subset of the data and synchronizes its own data to the other nodes, which effectively reduces message redundancy.
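A toy model of the responsibility idea in Python, assuming a fixed, hypothetical assignment of data items to owner nodes: only the owner sends an item, and it sends it once to each peer, so each item costs exactly `n - 1` messages and no node receives the same item twice. This covers only the initial push, not Distro's periodic verification traffic.

```python
def owner_push_messages(items_by_owner: dict, nodes: list) -> list:
    """Each node pushes only the items it is responsible for, once, to every peer.

    Returns the list of (sender, receiver, item) messages.
    """
    messages = []
    for owner, items in items_by_owner.items():
        for item in items:
            for peer in nodes:
                if peer != owner:  # the owner already holds its own item
                    messages.append((owner, peer, item))
    return messages


nodes = ["n1", "n2", "n3"]
items_by_owner = {"n1": ["a", "b"], "n2": ["c"], "n3": []}
for message in owner_push_messages(items_by_owner, nodes):
    print(message)
```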

The early Nacos consistency protocol

Architecture of early Nacos versions

In early versions, the consistency protocols for service registration and for configuration management were separate, and they had not been sunk into the Nacos kernel module to evolve as a common capability. The consistency protocol of the service discovery module was tightly coupled to the service registration and discovery logic, and service discovery concepts were mixed into it throughout. This made the Nacos service registration and discovery module complex and hard to maintain. Because the module was also coupled to the data state of the consistency protocol layer, it was difficult to fully separate compute from storage, which limited how far the compute layer could scale out horizontally.

The approach to solving this problem

The Nacos consistency protocol needs to be abstracted and sunk into the Core (kernel) module, so that the service registration and discovery module acts purely as a compute layer. This also lays the architectural foundation for connecting the configuration module to external database storage.

The current Nacos architecture

The new architecture has moved the consistency protocol from the service registration and discovery module down into the kernel module and, as far as possible, abstracted it behind a unified interface. The upper-layer service registration and configuration management modules no longer need to deal with any consistency semantics. With the abstraction layers decoupled, each module can evolve quickly, and performance and availability have improved significantly.

How does Nacos sink the consistency protocol into the kernel?

The two basic methods of the consistency protocol are getData and write.

The consistency protocol has been abstracted into the consistency package, which holds the interfaces Nacos uses for both its AP and CP consistency protocols. Concrete protocol implementations take a pluggable, plug-in form, which further decouples the consistency protocol from the logic of the service registration and discovery module and the configuration management module.
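The shape of this abstraction can be sketched as one interface with pluggable implementations. Nacos defines its interfaces in Java; the Python below only mirrors the idea, and the class names, the simplified `get_data`/`write` signatures and the toy AP plug-in are all hypothetical.

```python
from abc import ABC, abstractmethod


class ConsistencyProtocol(ABC):
    """Hypothetical mirror of the two basic protocol operations."""

    @abstractmethod
    def get_data(self, key: str):
        """Read the value stored under `key`."""

    @abstractmethod
    def write(self, key: str, value) -> None:
        """Write `value` under `key` with this protocol's consistency guarantee."""


class InMemoryAPProtocol(ConsistencyProtocol):
    """Toy AP plug-in: writes land locally and are queued for later sync."""

    def __init__(self):
        self._data = {}
        self.pending_sync = []  # keys a real AP protocol would push to peers

    def get_data(self, key):
        return self._data.get(key)

    def write(self, key, value):
        self._data[key] = value
        self.pending_sync.append(key)


def register(protocol: ConsistencyProtocol, service: str, address: str) -> None:
    """A caller that depends only on the abstract protocol type."""
    protocol.write(service, address)
```

Because `register` depends only on the abstract type, a CP plug-in could replace the AP one without any change to the caller.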

The two compute modules remain coupled to the protocol interface and its state

Abstracting the consistency protocol alone is not enough. Service registration and discovery and configuration management would still depend on the consistency protocol interface, so both compute modules would remain coupled to that interface and its state. And although the protocol is now highly abstracted, the service module and the configuration module would still explicitly handle protocol read and write requests in their own code, and each would have to implement its own bridge between the protocol and storage. That is undesirable. The service discovery and configuration modules should focus on using and computing over data, not on how data is stored or how its consistency is ensured; data storage and multi-node consistency should be guaranteed by the storage layer. To further reduce how much the service registration and discovery and configuration management modules touch the consistency protocol, and to confine awareness of the protocol to the kernel module as far as possible, Nacos took one more step here: data storage abstraction.

Data storage abstraction

If the consistency protocol is used to implement a storage layer, the service module and the configuration module switch from depending on the consistency protocol interface to depending on a storage interface, and a storage interface can have far richer implementations than a consistency protocol. The two modules also no longer have to take on the extra coding work (snapshots, state machine implementation, data synchronization) that depending on the protocol directly requires, so they can focus on their core logic.
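A minimal sketch of the storage abstraction, assuming a hypothetical key-value `KvStorage` interface: the configuration module calls only `put` and `get`, while a storage implementation routes writes through a consistency protocol, represented here by an injected `submit` callback. None of these names come from the Nacos code base.

```python
from abc import ABC, abstractmethod
from typing import Optional


class KvStorage(ABC):
    """Hypothetical storage interface that business modules depend on."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...


class ProtocolBackedStorage(KvStorage):
    """Storage whose writes go through a consistency protocol.

    `submit` stands in for the protocol's write path; snapshots, the state
    machine and replication would live behind it, not in the caller.
    """

    def __init__(self, submit):
        self._submit = submit
        self._state = {}

    def get(self, key):
        return self._state.get(key)

    def put(self, key, value):
        self._submit(key, value)   # replicate through the protocol first
        self._state[key] = value   # then apply to the local state


class ConfigService:
    """A business module that knows only about KvStorage."""

    def __init__(self, storage: KvStorage):
        self._storage = storage

    def publish(self, data_id: str, content: str) -> None:
        self._storage.put(data_id, content.encode("utf-8"))

    def read(self, data_id: str) -> Optional[str]:
        raw = self._storage.get(data_id)
        return None if raw is None else raw.decode("utf-8")
```

Swapping `ProtocolBackedStorage` for any other `KvStorage` implementation would leave `ConfigService` untouched, which is the point of the abstraction.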

Further architectural evolution: the Nacos compute and storage layers are completely separated

The Distro protocol developed by Nacos

The Distro protocol is an AP distributed protocol developed by the Nacos community and designed for ephemeral instances. It keeps the ephemeral-instance processing system working even when some Nacos nodes go down. As a protocol embedded in stateful middleware applications, Distro coordinates and stores massive volumes of registration requests across Nacos nodes.

Design ideas

  • Every Nacos node is equal: any node can process write requests and then synchronizes the new data to the other nodes
  • Each node is responsible for only part of the data and periodically sends checksums of its own data to the other nodes to keep the data consistent
  • Each node handles read requests independently and responds promptly from local data

How the Distro protocol works

Data initialization

A newly started Distro node performs a full data pull: it polls the other Distro nodes, sending each a request for the full data set.

Once the full pull is complete, every Nacos machine holds all currently registered non-persistent instance data.
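The pull step can be sketched as follows. The sketch assumes that peers are tried one after another and that the first successful snapshot ends the loop, which is sufficient because every node already holds all non-persistent instance data; `fetch_snapshot` and `PeerUnavailable` are hypothetical stand-ins for the network call and its failure.

```python
class PeerUnavailable(Exception):
    """Raised by the hypothetical fetch function when a peer cannot serve a snapshot."""


def load_full_snapshot(peers, fetch_snapshot):
    """Try peers in turn and return (peer, snapshot) for the first success.

    One successful pull is enough, because every Distro node holds all
    ephemeral instance data.
    """
    for peer in peers:
        try:
            return peer, fetch_snapshot(peer)
        except PeerUnavailable:
            continue  # poll the next peer
    raise RuntimeError("no peer could provide a full snapshot")
```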

Data validation

Once a Distro cluster has started, its machines periodically exchange heartbeats. A heartbeat carries metadata about all the data on the sending machine (sending metadata rather than the data itself keeps network traffic low). Each machine sends a data verification request to the other machines at a fixed interval.

If, during verification, a machine finds that another machine's data is inconsistent with its local data, it initiates a full pull request to complete its data.
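One way to picture the verification exchange: the sender transmits a checksum per service it is responsible for, and the receiver compares each checksum with a digest of its local copy. This Python sketch uses SHA-256 over the sorted instance list as the metadata; that choice, the function names and the data layout are assumptions. A non-empty result is what would trigger the pull, and deletions are not handled.

```python
import hashlib


def checksum(instances) -> str:
    """Order-independent digest of one service's instance list."""
    joined = "\n".join(sorted(instances))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def build_verify_message(owned: dict) -> dict:
    """Metadata a node sends for the data it is responsible for: key -> checksum."""
    return {key: checksum(instances) for key, instances in owned.items()}


def keys_to_pull(verify_message: dict, local: dict) -> list:
    """Keys whose local copy is missing or differs from the sender's checksum."""
    return sorted(
        key for key, digest in verify_message.items()
        if key not in local or checksum(local[key]) != digest
    )
```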

Write operations

In a Distro cluster that has fully started, when a client sends a write request for a non-persistent instance to a Nacos server, the write is processed as follows.

The process consists of the following steps (from top to bottom):

  • A front Filter intercepts the request, computes the responsible Distro node from the IP and port in the request, and forwards the request to that node
  • The Controller on the responsible node parses and processes the write request
  • The Distro protocol periodically runs Sync tasks that synchronize the instance information to the other nodes
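The Filter's routing decision can be sketched as a deterministic hash of the instance's `ip:port` over the sorted member list. The hashing scheme (CRC32 modulo cluster size) and the function names are assumptions for illustration, not the exact algorithm Nacos uses; what matters is that every node computes the same owner for the same instance.

```python
import zlib


def responsible_node(ip: str, port: int, members: list) -> str:
    """Pick the node responsible for an instance, based on its ip:port."""
    if not members:
        raise ValueError("cluster has no members")
    ordered = sorted(members)                   # same order on every node
    tag = f"{ip}:{port}".encode("utf-8")
    return ordered[zlib.crc32(tag) % len(ordered)]


def route_write(local_node: str, ip: str, port: int, members: list) -> str:
    """What the front filter decides: handle locally or forward to the owner."""
    owner = responsible_node(ip, port, members)
    return "handle locally" if owner == local_node else f"forward to {owner}"
```

Sorting the members and avoiding Python's built-in `hash()`, which is randomized per process for strings, are what make the answer identical on every node.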

Read operations

Because every Distro machine holds the full data set, each read operation is served directly from local data, so responses are fast.

This mechanism lets Distro act as an AP protocol that answers read operations promptly. During a network partition, all read operations still return normally; when the network recovers, Distro nodes merge and restore the data of each data shard.
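A sketch of the merge step, under the assumption (consistent with each node owning part of the data) that for every key the copy held by its responsible node is authoritative. The function and its `owner_of` parameter are hypothetical, and the sketch does not model how Nacos actually transfers the shards between nodes.

```python
def merge_after_partition(fragments_by_node: dict, owner_of) -> dict:
    """Rebuild the full data set after a partition heals.

    `fragments_by_node` maps each node to the data it currently holds
    (key -> value). For every key, the copy held by that key's responsible
    node wins; copies held elsewhere are treated as possibly stale.
    """
    merged = {}
    for node, fragment in fragments_by_node.items():
        for key, value in fragment.items():
            if owner_of(key) == node:
                merged[key] = value
    return merged
```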

Summary

The Distro protocol is the consistency protocol Nacos developed for ephemeral instance data. Data is kept in a cache, a full data synchronization is performed at startup, and data verification is performed periodically.

The Distro protocol is designed so that every Distro node can accept read and write requests. All Distro request scenarios fall into three main categories:

  • When a node receives a write request for an instance it is responsible for, it writes the data directly
  • When a node receives a write request for an instance it is not responsible for, it forwards the request to the responsible node in the cluster, which completes the write
  • When a node receives any read request, it queries its local data and returns the result directly (because every instance is synchronized to every machine)
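The three request scenarios map onto a small dispatch function. In this Python sketch, `owner_of` and `forward` are hypothetical hooks for the responsibility calculation and the inter-node call, and the in-memory dict stands in for the node's local store.

```python
from dataclasses import dataclass


@dataclass
class Request:
    kind: str            # "read" or "write"
    key: str             # e.g. the instance's ip:port tag
    value: object = None


def handle(node: str, request: Request, store: dict, owner_of, forward):
    """Dispatch one request according to the three Distro scenarios."""
    if request.kind == "read":
        # Every node holds all data, so reads are answered locally.
        return store.get(request.key)
    if request.kind != "write":
        raise ValueError(f"unknown request kind: {request.kind}")
    owner = owner_of(request.key)
    if owner == node:
        store[request.key] = request.value  # own instance: write directly
        return "written"
    return forward(owner, request)          # not ours: hand it to the owner
```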

As the ephemeral-instance consistency protocol embedded in Nacos, Distro ensures that, in a distributed environment, changes to the service information on each node are promptly propagated to the other nodes, and it can maintain storage and consistency for hundreds of thousands of service instances.

Note

This article draws on “Nacos Architecture and Principle”.