Introduction

A registry is a core component of a microservices architecture, responsible mainly for service registration and discovery. Have you ever considered how you would design one yourself? This article discusses the core design ideas of a registry from the designer's perspective.

Why a registry is needed

In a microservices system, business services routinely call each other, so the IP address, port, and other routing information of every service has to be managed in a unified way. In cluster and cloud deployments in particular, the routing information of service nodes can change frequently. The key problem is how to keep callers unaware of those changes, so that business calls keep working when a node's routing information changes. Registries were created to solve this problem.

Core functions of the registry

The core functions of a registry mainly include service registration, service discovery, and health checking. The registry acts as the transportation hub of a microservices system: it holds the address information of every business service.

(1) Service registration

Service providers register with the registry and supply the information that consumers need to locate and invoke them: routing information such as IP address, port, and context, plus invocation details such as the serialization protocol and the node weight.
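The registration record described above can be sketched as a small in-memory structure. This is only an illustration under assumptions: the `ServiceInstance` fields, the default values, and the `Registry` class and its method names are hypothetical and not taken from any particular registry product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInstance:
    """Information a provider submits when it registers (hypothetical schema)."""
    service: str             # logical service name, e.g. "ProviderA"
    ip: str
    port: int
    context: str = "/"       # context path (assumed default)
    protocol: str = "json"   # serialization protocol (assumed default)
    weight: int = 100        # relative load-balancing weight (assumed default)

    @property
    def address(self) -> str:
        return f"{self.ip}:{self.port}"


class Registry:
    """In-memory registry: service name -> {address: instance}."""

    def __init__(self) -> None:
        self._services = {}

    def register(self, instance: ServiceInstance) -> None:
        # Registering the same address again overwrites the old record.
        self._services.setdefault(instance.service, {})[instance.address] = instance

    def deregister(self, service: str, address: str) -> None:
        # Removing an unknown service or address is a no-op.
        self._services.get(service, {}).pop(address, None)

    def instances(self, service: str) -> list:
        return list(self._services.get(service, {}).values())


registry = Registry()
registry.register(ServiceInstance("ProviderA", "10.0.0.1", 8080, weight=50))
registry.register(ServiceInstance("ProviderA", "10.0.0.2", 8080))
```

Keying each service's instances by address makes re-registration idempotent, which is convenient when a provider restarts and registers again.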

(2) Service discovery

Service discovery lets service consumers look up the routing information of service providers through the registry.

After a service consumer starts, it pulls the service list from the registry and caches it locally. Why cache? So that if the registry fails, the consumer can still read the providers' invocation information from its local cache.

In addition, the registry can notify service consumers when the nodes of a service change. On notification, the consumer pulls the latest service list and updates its local copy.

But do these two mechanisms guarantee that the client updates its local service list whenever a node changes? No: if network jitter occurs while the registry is notifying a consumer, the notification may be lost. We therefore need a fallback: the consumer periodically pulls the service list from the registry and refreshes its local data.
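The three mechanisms above (initial pull with a local cache, change notification, and periodic pull as a fallback) can be sketched as follows. The `FakeRegistry` and `DiscoveryClient` classes and their method names are hypothetical stand-ins; in a real client, `refresh()` would be driven by a timer, which is omitted here to keep the sketch deterministic.

```python
class FakeRegistry:
    """Stand-in for a real registry; can be marked unavailable to simulate failure."""

    def __init__(self):
        self.services = {}      # service name -> list of "ip:port" strings
        self.available = True

    def pull(self, service):
        if not self.available:
            raise ConnectionError("registry unreachable")
        return list(self.services.get(service, []))


class DiscoveryClient:
    def __init__(self, registry, service):
        self.registry = registry
        self.service = service
        self.cache = []         # local copy of the service list
        self.refresh()          # initial pull at startup

    def refresh(self):
        """Pull the full list; used at startup, on notification, and periodically."""
        try:
            self.cache = self.registry.pull(self.service)
            return True
        except ConnectionError:
            return False        # keep serving the stale cache

    def on_change(self):
        """Callback the registry invokes when nodes change (may never arrive)."""
        self.refresh()

    def providers(self):
        return list(self.cache)


registry = FakeRegistry()
registry.services["ProviderA"] = ["10.0.0.1:8080"]
client = DiscoveryClient(registry, "ProviderA")

# A node is added, but the change notification is lost to network jitter.
registry.services["ProviderA"].append("10.0.0.2:8080")
stale = client.providers()

# The periodic pull repairs the cache.
client.refresh()
fresh = client.providers()

# If the registry goes down, the cached list is still usable.
registry.available = False
client.refresh()
during_outage = client.providers()
```

Here `stale` still holds only the first node, `fresh` holds both, and `during_outage` equals `fresh` because a failed pull leaves the cache untouched.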

(3) Health detection

The purpose of health checking is to ensure that the service nodes registered in the registry can actually be invoked, so that consumers do not waste calls on dead nodes. When the registry detects an abnormal node, it can remove that node promptly.

The client on each service node periodically reports a heartbeat to the registry to indicate the node's status. But is a heartbeat alone enough for health checking? If the service node is deadlocked or hung internally, the heartbeat may still be reported normally, and the registry will wrongly believe that the provider is still alive.

Therefore, when registering with the registry, the service provider should also register an internal test interface, which can be called to verify that the node is really able to serve requests and so keeps health checking accurate.
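A minimal sketch of combining the two signals, heartbeat timeout and an active probe of the registered test interface, might look like this. The `HealthMonitor` class, the 15-second timeout, and the representation of the test interface as a zero-argument callable are all assumptions made for illustration; a real registry would call the interface over the network.

```python
import time


class HealthMonitor:
    def __init__(self, heartbeat_timeout=15.0, clock=time.monotonic):
        self.heartbeat_timeout = heartbeat_timeout
        self.clock = clock      # injectable so the example can use a fake clock
        self.last_beat = {}     # address -> time of last heartbeat
        self.probes = {}        # address -> zero-argument callable returning bool

    def register(self, address, probe):
        self.probes[address] = probe
        self.last_beat[address] = self.clock()

    def heartbeat(self, address):
        if address in self.last_beat:
            self.last_beat[address] = self.clock()

    def sweep(self):
        """Remove and return nodes that missed heartbeats or failed the probe."""
        now = self.clock()
        removed = []
        for address in list(self.last_beat):
            expired = now - self.last_beat[address] > self.heartbeat_timeout
            try:
                healthy = bool(self.probes[address]())
            except Exception:
                healthy = False  # a probe that raises counts as a failure
            if expired or not healthy:
                removed.append(address)
                del self.last_beat[address]
                del self.probes[address]
        return removed


now = [0.0]                     # fake clock, advanced by hand
monitor = HealthMonitor(heartbeat_timeout=15.0, clock=lambda: now[0])
monitor.register("10.0.0.1:8080", probe=lambda: True)
monitor.register("10.0.0.2:8080", probe=lambda: False)  # hung: heartbeats but cannot serve
monitor.register("10.0.0.3:8080", probe=lambda: True)   # will stop sending heartbeats

now[0] = 20.0
monitor.heartbeat("10.0.0.1:8080")
monitor.heartbeat("10.0.0.2:8080")
removed = monitor.sweep()
```

After the sweep, the second node is removed because its probe fails even though its heartbeat is fresh, and the third is removed because its heartbeat expired; only the first node remains.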

(4) Data storage

For distributed middleware, how data is organized has a major effect on overall performance and usability. For data storage we mainly consider three indicators: data reliability, service availability, and data consistency.

In distributed systems, the CAP theorem is a guiding principle and a cornerstone of architecture. However, a system cannot be both AP and CP. Sometimes consistency has to be bought by sacrificing availability, and sometimes availability by sacrificing consistency. Partition tolerance, by contrast, is the foundation of high availability in a distributed system and cannot be given up. So should a registry be AP or CP?

Given its usage scenarios, a registry leans toward the AP model. A registry is foundational middleware, so multi-node cluster deployment is a must for high availability. Suppose a network partition occurs and each partition holds only some of the registry nodes. Under the CP model, the registry can serve requests again only after the partition is resolved and data consistency across the registry nodes is restored; in the meantime, the entire registry is unavailable. That is unacceptable to business consumers, because it is tantamount to halting the business. Under the AP model, only part of the node information is known while the partition lasts, but at least some nodes remain usable and the corresponding service providers can still be queried. The AP model therefore fits the business scenario better.

Two things matter for the data in the registry: how to record service nodes as they register or go offline, and how to find the subscribed consumers to notify when a service node changes. For the first, we can organize data by cluster: for example, the three service nodes of ProviderA are stored under the ProviderA cluster, and a newly added node is appended to that cluster. For the second, when a service node changes, we look up which consumers subscribe to the ProviderA cluster and notify each of them promptly.
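The two lookups described above, cluster to nodes and cluster to subscribers, can be sketched with two maps. The `RegistryStore` class and its method names are hypothetical, and subscribers are modeled as plain callbacks rather than network connections.

```python
from collections import defaultdict


class RegistryStore:
    def __init__(self):
        self.nodes = defaultdict(set)         # cluster name -> set of "ip:port"
        self.subscribers = defaultdict(list)  # cluster name -> list of callbacks

    def subscribe(self, cluster, callback):
        self.subscribers[cluster].append(callback)

    def add_node(self, cluster, address):
        if address not in self.nodes[cluster]:
            self.nodes[cluster].add(address)
            self._notify(cluster)

    def remove_node(self, cluster, address):
        if address in self.nodes[cluster]:
            self.nodes[cluster].discard(address)
            self._notify(cluster)

    def _notify(self, cluster):
        # Send every subscriber of this cluster a fresh snapshot of its nodes.
        snapshot = sorted(self.nodes[cluster])
        for callback in self.subscribers[cluster]:
            callback(cluster, snapshot)


events = []
store = RegistryStore()
store.subscribe("ProviderA", lambda cluster, nodes: events.append((cluster, nodes)))
store.add_node("ProviderA", "10.0.0.1:8080")
store.add_node("ProviderA", "10.0.0.2:8080")
store.remove_node("ProviderA", "10.0.0.1:8080")
store.add_node("ProviderB", "10.0.1.1:8080")   # no subscribers, so no event
```

Sending the full snapshot on every change, rather than a delta, keeps consumers correct even if they missed an earlier notification.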

In a registry cluster, however, different services may have registered with different registry nodes, so when a service node changes, subscribers attached to other registry nodes also need to be notified. We can handle this with the Gossip protocol, which spreads the change message across the registry cluster so that it eventually reaches the subscribed consumers.
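To illustrate how a change message spreads, here is a small simulation of push-style gossip, in which every registry node that already knows the change forwards it to a few random peers each round. This is a simplified model under assumptions: the `gossip` function, the fanout of 2, the round cap, and the node names are hypothetical, and real Gossip implementations add membership, versioning, and anti-entropy that are left out here.

```python
import random


def gossip(nodes, origin, fanout=2, max_rounds=50, seed=0):
    """Simulate push gossip; return (set of informed nodes, rounds used)."""
    rng = random.Random(seed)   # seeded so the simulation is repeatable
    informed = {origin}
    rounds = 0
    while len(informed) < len(nodes) and rounds < max_rounds:
        rounds += 1
        # Each node that knew the change at the start of the round pushes it on.
        for sender in sorted(informed):
            peers = [n for n in nodes if n != sender]
            for peer in rng.sample(peers, min(fanout, len(peers))):
                informed.add(peer)
    return informed, rounds


registry_nodes = [f"registry-{i}" for i in range(8)]
informed, rounds = gossip(registry_nodes, origin="registry-0")
```

Because peers are chosen at random, delivery is eventual rather than immediate, which matches the AP leaning discussed earlier: registry nodes may briefly disagree, but the change converges across the cluster.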

Summary

This article introduced the core functions and design ideas of a registry. Analyzing them helps us understand and use registries better, and even consider where existing registries fall short and how they could be optimized. Next, we'll look at how a simplified version of the registry can be implemented in code.