Preface

Welcome back. The previous articles in this series:

  • Microservice Design Learning (I): About microservices and how to model services
  • Microservice Design Learning (II): About service integration

In today’s world of microservices, services are broken down to a very fine granularity, and the number of services grows rapidly as a result. In the cloud native wave, service governance is increasingly combined with a container scheduling platform to form a one-stop, automated scheduling and governance platform.

Of course, the principles and scope of service governance remain the same whether or not a container-based scheduling system is used; only the means differ.

Service governance includes service discovery, load balancing, rate limiting, circuit breaking, timeouts, retries, and service tracing. What we’re going to talk about today is service discovery.

Chapter overview

This chapter mainly introduces the following contents:

  1. What is service discovery? (what)
  2. Why do you need service discovery in a microservices framework? (why)
  3. How does service discovery work? (how)
  4. The CAP theorem
  5. Several available implementations

Now, let’s begin our journey.

Service Discovery (WHAT)

Service discovery refers to finding network location information for a service provider.

In practice, a registry records information about all services in a distributed system so that other services can quickly find the registered services.

Service discovery is a core module that supports large-scale SOA and microservice architectures and requires key functions such as service registration, service lookup, service health check, and service change notification.
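To make those four functions concrete, here is a minimal in-memory sketch of a registry in Python. All names are hypothetical; a real registry would persist data, run as a cluster, and expose a network API:

```python
import time

class Registry:
    """A minimal in-memory service registry (illustrative only)."""

    def __init__(self, ttl=30):
        self.ttl = ttl          # seconds before an entry is considered stale
        self.services = {}      # service name -> {(host, port): last heartbeat time}

    def register(self, name, host, port):
        """Service registration: record a provider's network location."""
        self.services.setdefault(name, {})[(host, port)] = time.time()

    def heartbeat(self, name, host, port):
        """Service health check: providers renew their entry periodically."""
        self.register(name, host, port)

    def lookup(self, name):
        """Service lookup: return live instances, dropping stale ones."""
        now = time.time()
        instances = self.services.get(name, {})
        return [addr for addr, seen in instances.items() if now - seen < self.ttl]

reg = Registry(ttl=30)
reg.register("biz-service", "10.0.0.1", 8080)
reg.register("biz-service", "10.0.0.2", 8080)
print(reg.lookup("biz-service"))   # both instances are live
```

Change notification is deliberately left out here; it is sketched separately in the pull-versus-push discussion below.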

Why is service discovery needed? (WHY)

Without a service discovery module, each service’s network location has to be written into the configuration of every consumer that calls it, coupling them together and making the system hard to maintain.

Consider this basic question: How do service consumers know the IP and port of a service provider?

In simple architectures, static configuration (DNS, Nginx load-balancing configuration, and so on) works well: each service is deployed in a fixed location, rarely changes, and has no need for elastic scaling. The network address of a traditional monolithic application seldom changes, and when it does, O&M staff can manually update and reload the configuration files.

This is not the case in a microservice architecture. Microservices are updated and released frequently and scale elastically with load, so changes to a service instance’s network address are perfectly normal. The static configuration approach mentioned above is clearly unsuited to such a highly dynamic scenario. We therefore need a mechanism (module) that lets service consumers obtain the latest service information quickly and promptly whenever a provider’s IP address changes.

How does service discovery work? (HOW)

As mentioned earlier, a registry is used to record information about all services in a distributed system so that other services can quickly find those registered services.

Let’s use two examples to illustrate how service discovery works (simplified version):

  1. biz service starts and reports its service information to the service center, which writes it down
  2. admin service starts and requests biz service’s information from the service center
  3. The service center looks up the location information for the requested service and returns it to admin service
  4. Having obtained the actual address, admin service initiates a call to biz service

What does the workflow look like when biz service is clustered and more nodes come online?

  1. The newly started node reports its service information to the service registration and discovery center, which writes it down

  2. admin service initiates a request to refresh its biz service address list

    Here the client actively requests updated information, i.e. the “pull” model; the alternative is for the client to register a callback and wait for the service center to push notifications.
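The difference between the two update styles can be sketched in a few lines (all names hypothetical): in the “pull” model the consumer calls lookup() whenever it wants fresh data, while in the “push” model it registers a callback once and is notified on every change:

```python
class NotifyingRegistry:
    """Sketch of a registry supporting both pull and push updates."""

    def __init__(self):
        self.addresses = {}    # service name -> list of (host, port)
        self.watchers = {}     # service name -> list of callbacks

    def watch(self, name, callback):
        """'Push' model: consumer subscribes once and waits for notifications."""
        self.watchers.setdefault(name, []).append(callback)

    def register(self, name, addr):
        self.addresses.setdefault(name, []).append(addr)
        # push the updated address list to every subscriber
        for cb in self.watchers.get(name, []):
            cb(list(self.addresses[name]))

    def lookup(self, name):
        """'Pull' model: consumer polls whenever it wants fresh data."""
        return list(self.addresses.get(name, []))

# admin service subscribes instead of polling
seen = []
reg = NotifyingRegistry()
reg.watch("biz-service", seen.append)
reg.register("biz-service", ("10.0.0.3", 8080))
print(seen)   # the callback received the new address list
```

Pull is simpler but introduces staleness between polls; push delivers changes promptly at the cost of the registry tracking its subscribers.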

The CAP theorem

Although the overall mechanism of service discovery is easy to understand, in real distributed scenarios it is a core component of a microservice architecture, so we must deploy it as a cluster to ensure high availability. At that point, we need to consider some of the problems a distributed system may encounter.

In a distributed computer system, at most two of the three basic properties of Consistency, Availability, and Partition tolerance can be satisfied simultaneously. This is the famous CAP theorem.

  • Consistency: every node returns the same, most recent copy of the data at the same time;

  • Availability: every request receives a non-error response;

  • Partition tolerance: the system continues to operate even when communication between nodes is lost for some period of time.

Partition tolerance is a must for distributed systems, so a trade-off has to be made between consistency and availability, commonly phrased as “choose AP or CP”.

If you are not familiar with it, you can read up on the CAP theorem first.

For a service discovery and registry cluster, if consistency is chosen at the expense of availability (CP), then to keep data consistent across the service-center nodes, the cluster must stop accepting writes as soon as any node goes down: write availability is sacrificed to guarantee data consistency within the cluster. If availability is chosen at the expense of consistency (AP), then to keep the service uninterrupted when one node goes down, the surviving nodes can write data to their local stores first and return to the client directly, but this leads to data inconsistency between the nodes.

The service discovery and registration systems offered by the industry are essentially either AP or CP systems.

Existing implementations

In a distributed service system, all service providers and consumers depend on the service center. If the service center has problems, symptoms such as delayed or inaccurate perception of service state appear and affect the whole system, so ensuring the availability of the registry is critical. To guarantee that availability, the registry must be deployed across multiple nodes; a large website usually deploys it across multiple data centers so that the registry can keep serving even when a single data center is unavailable. A highly available service center must have the following capabilities:

  1. Supports multi-node deployment
  2. Self-healing and rebalancing under distributed conditions
  3. Node health checks, so that a node whose requests time out can be removed from the cluster and re-added once it recovers

In the following sections, we introduce several common products that can be used directly as registries.

Zookeeper

Zookeeper aims to provide a highly available distributed coordination system with strict sequential access control capabilities. It is a distributed data consistency solution.

ZooKeeper provides complete solutions such as distributed notification and coordination, configuration management, naming service, primary node election, distributed lock, and distributed queue. Distributed notification and coordination are widely used in service discovery. It is by far the oldest and most widely used product in service discovery.

This article does not specifically introduce Zookeeper consistency protocols, data structures and other knowledge. If you are interested, you can read the author’s previous Zookeeper articles

Zookeeper cluster architecture, read/write mechanism, and consistency principle (ZAB protocol)

Zookeeper is a CP system because of its read/write mechanism and consistency protocol.

Advantages and Disadvantages

As the most widely used distributed coordination component, ZooKeeper has many advantages. Its wide adoption is itself one of its greatest strengths, making it an easy pick for architects during technology selection. To be clear, though, ZooKeeper is not the best choice for service discovery: its strengths lie in strongly consistent scenarios such as leader election and distributed locks. When the ZooKeeper leader loses contact with the other nodes due to a network failure and a system-wide election is triggered, the cluster becomes unavailable, so the registration service breaks down for the duration of the election.

Why didn’t Alibaba use ZooKeeper for service discovery?

A service center does not actually demand strong data consistency. Real-time outage awareness is hard to achieve in any case (there will always be some delay); what matters more is self-healing ability. The caching built into Curator, ZooKeeper’s client library, makes ZooKeeper more workable in the service discovery space, but that is neither a native capability of ZooKeeper nor part of its original design intent.

etcd

With projects such as CoreOS and Kubernetes gaining popularity in the open source community, the etcd component used by both is increasingly recognized by developers as a highly available, consistent store for service discovery.

etcd is a ZooKeeper-inspired project with a similar architecture and functionality, implemented in Go on top of the more straightforward Raft protocol. Like ZooKeeper, etcd is a CP system that favors consistency over availability. etcd uses TTL (Time To Live) leases to implement functionality similar to ZooKeeper’s ephemeral nodes: the etcd client must periodically renew a node’s lease, which lets the registry determine whether the service is still running.
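The lease mechanism can be simulated in a few lines. This illustrates the idea only; it is not the real etcd client API, and the names are hypothetical:

```python
class Lease:
    """Simulates an etcd-style TTL lease: a key lives only while its lease is renewed."""

    def __init__(self, ttl):
        self.ttl = ttl            # lease lifetime in seconds
        self.expires_at = 0.0     # absolute expiry time

    def keep_alive(self, now):
        # the client must call this periodically; real etcd clients do it in the background
        self.expires_at = now + self.ttl

    def alive(self, now):
        # the registry treats an expired lease as a dead service instance
        return now < self.expires_at

lease = Lease(ttl=10)
lease.keep_alive(now=0)        # registered at t=0, lease valid until t=10
print(lease.alive(now=5))      # True: the heartbeat is still fresh
print(lease.alive(now=15))     # False: renewal missed, the entry expires
```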

You can read the extended article below.

Read more about ETCD: From application scenarios to implementation principles

Official website: etcd.io/

Another high-quality article interpreting etcd’s implementation principles: High availability distributed storage etcd implementation principles

Compared with ZooKeeper, etcd has the following advantages:

  1. Simplicity. Written in Go, it is easy to deploy; its HTTP interface is simple to use; and it uses the Raft algorithm to guarantee strong consistency, which is easy for users to reason about.
  2. Data persistence. By default, etcd persists data as soon as it is updated.
  3. Security. etcd supports SSL client authentication.

Eureka

Eureka is open-sourced by Netflix and is primarily used to locate middle-tier services in the AWS domain. Eureka has received a lot of attention because it serves as the registry for Spring Cloud. Eureka consists of two components, a server and a client. The Eureka server is typically used as the service registry, while the Eureka client simplifies interaction with the server and provides failover support as a round-robin load balancer.

Eureka is better suited as a registry in a service discovery architecture than a CP system like ZooKeeper. Eureka prioritizes availability by adopting a decentralized design concept where the entire service cluster is composed of peer nodes and does not have to elect a master node as ZooKeeper does. Failed nodes in the cluster do not affect the service registration and query capabilities of normal nodes. The Eureka client has failover capability. If the Eureka client fails to register services with one Eureka server, it will automatically switch to another node. Therefore, as long as one Eureka server node is still functioning, there is no need to worry about the availability of the registry. However, ensuring availability inevitably leads to the loss of data consistency, and the information queried by the client may not be the latest.
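The client-side failover described above can be sketched like this. The server names and the register_fn hook are hypothetical, not Eureka’s real API:

```python
def register_with_failover(servers, register_fn):
    """Try each registry server in turn; succeed if any one accepts the call."""
    for server in servers:
        try:
            register_fn(server)
            return server          # registered successfully, remember this node
        except ConnectionError:
            continue               # this node is down, fail over to the next one
    raise RuntimeError("all registry nodes are unreachable")

# simulate a cluster where the first node is down and the second accepts
def fake_register(server):
    if server == "eureka-1:8761":
        raise ConnectionError(server)

chosen = register_with_failover(["eureka-1:8761", "eureka-2:8761"], fake_register)
print(chosen)   # the client ends up registered with the surviving node
```

As long as one node in the list responds, registration succeeds, which is exactly why a single surviving Eureka server keeps the registry available.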

Eureka 2.x has been announced as no longer maintained, but the current feature set is stable enough for service registration and discovery without an upgrade.

Consul

Consul, a product from HashiCorp, offers service discovery, richer health checks (fine-grained service status detection, such as memory and disk usage), key-value storage, and multi-data-center support (the four main features described on its website). Compared with the other options, it is a “one-stop shop”.

Consul is written in Go, making it naturally portable (with Linux, Windows, and Mac OS X support); the installation package contains only a single executable, so it is easy to deploy and works seamlessly with lightweight containers such as Docker.

Like etcd, Consul is based on the Raft protocol: a write is considered successful only after more than half of the nodes have acknowledged it. If the Leader fails, Consul becomes unavailable while a new election runs; consistency is guaranteed at the expense of availability.

The author has not studied Consul in depth; interested readers can consult the relevant documentation to learn more.

Website: www.consul.io/

Nacos

Nacos is a newer open source project from Alibaba. Its core positioning is “an easier-to-use platform for dynamic service discovery, configuration, and service management, helping you build cloud native applications” (commonly described as registry + configuration center).

Services are first-class citizens in the Nacos world. Nacos supports the discovery, configuration, and management of almost all major types of “services”:

  • Kubernetes Service

  • gRPC & Dubbo RPC Service

  • Spring Cloud RESTful Service

Key features of Nacos include:

  • Service discovery and service health monitoring
  • Dynamic configuration service
  • Dynamic DNS Service
  • Services and their metadata management

For more information, please read the official Nacos documentation:

nacos.io/zh-cn/docs/…

Summary

This chapter introduced the basics of service discovery and the relevant features of the more popular service registries on the market; I hope it inspires you.

See you next time.

References

  1. Service Discovery in a Microservices Architecture
  2. Microservices: Service Registration and Discovery
  3. Service Discovery
  4. nacos.io/zh-cn/
  5. Etcd: A comprehensive interpretation from application scenarios to implementation principles
  6. From Servitization to Cloud Native
  7. Microservice Design

If this article helped you, I hope you will give it a thumbs up; that is the biggest motivation for me.