The ins and outs of service discovery and load balancing

Where the problem comes from

In the monolithic era, most traditional software was built as a single application. Everyone commits code to one repository, which leads to bloated applications that are hard to understand and modify and that cannot be scaled on demand. How does a monolithic architecture support collaboration among many developers? Through modularity: split the code by function, define programming interfaces (APIs) between modules, and let callers care about what a module does rather than how it is implemented.

As the field evolved, stand-alone programs hit a double bottleneck in computing power and storage, and distributed architectures emerged in response. In a distributed system, services (RPC or RESTful APIs) play the role that modules play in a monolith. However, a service name alone is not enough to request a service. The name only identifies the capability (the service type); the caller also needs to know where the service is located on the network. In the cloud, the IP addresses of service instances are allocated dynamically, and scaling out, scaling in, failures, and updates make the problem more complex. Statically configured instance lists cannot keep up with these changes, so finer-grained service governance is needed. Service discovery is abstracted and provided as a basic capability to solve or simplify this problem. It tries to make requesting a network service as simple and transparent as calling a local function.

A service is a function, just one that is closely tied to the network: the term is used only for capabilities exposed over a network. The service provider publishes the service through the network, and the service user requests it through the network. Distributed systems break through the computing and storage limits of a single machine, improve system stability, and make highly concurrent, large-scale services possible. They also increase software complexity and introduce new problems and challenges such as software layering, load balancing, microservices, service discovery and governance, and distributed consistency.

Service discovery

A service has two sides: the Service Provider and the Service Consumer. To provide capacity at scale, a single service instance is obviously not enough, and once there are thousands of services there needs to be a place that records the mapping between each service name and its list of instances. This calls for a new role, the service intermediary, which maintains a Service Registry. The registry can be understood as a dictionary whose key is the service name and whose value is the list of instances providing that service. The service registry is the bridge between service providers and service consumers. It maintains information such as the latest network location of each provider and is the core of service discovery.

When a service instance starts, it registers (put) its information in the service registry. When it terminates, it removes its information from the registry.

When a service consumer needs a service, it first queries (get) the service registry for the list of providers by name, then selects an instance from the list and sends the request to that instance.
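The put and get steps described above can be sketched as a small in-memory dictionary. This is only an illustration: the class name `ServiceRegistry`, its method names, and the addresses are assumptions made for the example, not the API of any particular registry product.

```python
from collections import defaultdict


class ServiceRegistry:
    """Hypothetical in-memory registry: service name -> set of instance addresses."""

    def __init__(self):
        self._instances = defaultdict(set)

    def register(self, name, address):
        # Called by a provider at startup (the "put" step).
        self._instances[name].add(address)

    def deregister(self, name, address):
        # Called by a provider on graceful shutdown.
        self._instances[name].discard(address)

    def lookup(self, name):
        # Called by a consumer (the "get" step); returns a sorted copy.
        return sorted(self._instances.get(name, ()))


# Two providers of the same service register themselves.
registry = ServiceRegistry()
registry.register("orders", "10.0.0.1:8080")
registry.register("orders", "10.0.0.2:8080")
instances = registry.lookup("orders")

# One provider shuts down cleanly and removes itself.
registry.deregister("orders", "10.0.0.1:8080")
remaining = registry.lookup("orders")
```

A real registry would sit behind a network API and persist its data; the dictionary shape is the part that carries over.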

This is the simplest service discovery model and the basic principle of service discovery. At this point, everything seems to be OK, but there are a few issues that remain unclear.

Problems and solutions

First question

If a service instance is not stopped gracefully but is killed by the system, it has no opportunity to ask the service registry to delete its information. The registry is left with an entry pointing to an invalid instance, and the service consumer does not know about it. The solution is simple: keepalive. The service provider periodically sends a keepalive message to the service intermediary (for example, every 10 seconds). On receiving it, the intermediary updates the keepalive timestamp of that instance. The intermediary also checks the timestamps periodically and removes any instance whose timestamp is overdue.
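A minimal sketch of the keepalive bookkeeping follows. The class name `HeartbeatRegistry`, the 30-second expiry window, and the injectable `clock` parameter (used here so the example runs without sleeping) are all assumptions for illustration.

```python
import time


class HeartbeatRegistry:
    """Hypothetical registry that forgets instances whose keepalive is overdue."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen = {}  # (name, address) -> time of last keepalive

    def keepalive(self, name, address):
        # The first keepalive doubles as registration; later ones renew it.
        self._last_seen[(name, address)] = self._clock()

    def evict_expired(self):
        # Run periodically by the intermediary; returns the evicted entries.
        now = self._clock()
        expired = [key for key, seen in self._last_seen.items()
                   if now - seen > self._ttl]
        for key in expired:
            del self._last_seen[key]
        return expired

    def lookup(self, name):
        return sorted(addr for (n, addr) in self._last_seen if n == name)


# Simulated time: instance 2 is killed and stops sending keepalives.
fake_now = [0.0]
hb = HeartbeatRegistry(ttl_seconds=30.0, clock=lambda: fake_now[0])
hb.keepalive("orders", "10.0.0.1:8080")
hb.keepalive("orders", "10.0.0.2:8080")
fake_now[0] = 20.0
hb.keepalive("orders", "10.0.0.1:8080")  # only instance 1 renews
fake_now[0] = 40.0
evicted = hb.evict_expired()
alive = hb.lookup("orders")
```

The expiry window is usually set to a few keepalive intervals so that one lost message does not evict a healthy instance.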

Second question

How are service consumers notified when the list of service instances changes? There are two basic methods: polling and pub-sub. With polling, the consumer periodically asks the service intermediary whether the instance list has changed, and if it has, the intermediary returns the new list. If there are too many consumers, the intermediary comes under pressure from polling requests, and it can even become a bottleneck when there are many service types and large instance lists. With pub-sub, the service intermediary actively notifies consumers of changes. Timeliness is better than with polling, but the disadvantage is that each subscriber occupies a dedicated thread or connection.
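The difference between the two methods can be sketched in a single process. In the hypothetical `WatchableRegistry` below, `lookup` is what a polling consumer would call repeatedly, while `subscribe` registers a callback that the registry invokes on every change. In a real deployment the callback would be a message pushed over a long-lived connection, which is where the resource cost mentioned above comes from.

```python
from collections import defaultdict


class WatchableRegistry:
    """Hypothetical registry supporting both polling and push notifications."""

    def __init__(self):
        self._instances = defaultdict(set)
        self._subscribers = defaultdict(list)  # name -> list of callbacks

    def subscribe(self, name, callback):
        # Pub-sub path: callback(name, addresses) is invoked on every change.
        self._subscribers[name].append(callback)

    def lookup(self, name):
        # Polling path: the consumer asks whenever it wants a fresh list.
        return sorted(self._instances[name])

    def _notify(self, name):
        snapshot = self.lookup(name)
        for callback in self._subscribers[name]:
            callback(name, snapshot)

    def register(self, name, address):
        if address not in self._instances[name]:
            self._instances[name].add(address)
            self._notify(name)

    def deregister(self, name, address):
        if address in self._instances[name]:
            self._instances[name].remove(address)
            self._notify(name)


received = []
watch_reg = WatchableRegistry()
watch_reg.subscribe("orders", lambda name, addrs: received.append(addrs))
watch_reg.register("orders", "a:1")
watch_reg.register("orders", "b:1")
watch_reg.deregister("orders", "a:1")
```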

Third question

What if the service intermediary itself dies? To avoid this single point of failure, the intermediary is usually deployed as a cluster. There are many open source options for the service registry, such as etcd, ZooKeeper, and Consul. They essentially store the registry information in a distributed, consistent data store, which addresses read and write performance and improves system stability and availability.

Fourth question

Would it be inefficient for a service consumer to query the service intermediary for the instance list every time it calls a remote service? And would the load on the intermediary not be considerable? Typically, the client caches the list of service instances so that repeated requests for the same service do not each trigger a query, which reduces latency and relieves load on the intermediary.
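A client-side cache along these lines might look like the following sketch. The name `CachingResolver`, the 5-second maximum age, and the fallback to a stale list when the registry cannot be reached are assumptions chosen for the example; `fetch` stands in for whatever call actually queries the registry.

```python
import time


class CachingResolver:
    """Hypothetical client-side cache in front of a registry query function."""

    def __init__(self, fetch, max_age_seconds=5.0, clock=time.monotonic):
        self._fetch = fetch              # callable: name -> list of addresses
        self._max_age = max_age_seconds
        self._clock = clock
        self._cache = {}                 # name -> (fetched_at, addresses)

    def resolve(self, name):
        now = self._clock()
        entry = self._cache.get(name)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]              # fresh enough: no registry call
        try:
            addresses = list(self._fetch(name))
        except Exception:
            if entry is not None:
                return entry[1]          # registry unreachable: reuse stale list
            raise
        self._cache[name] = (now, addresses)
        return addresses


calls = []
registry_up = [True]


def fake_fetch(name):
    # Stand-in for a network query to the registry.
    if not registry_up[0]:
        raise ConnectionError("registry unreachable")
    calls.append(name)
    return ["10.0.0.1:8080"]


cache_clock = [0.0]
resolver = CachingResolver(fake_fetch, max_age_seconds=5.0,
                           clock=lambda: cache_clock[0])
first = resolver.resolve("orders")    # cache miss: queries the registry
second = resolver.resolve("orders")   # served from the cache
cache_clock[0] = 6.0
third = resolver.resolve("orders")    # entry too old: queries again
registry_up[0] = False
cache_clock[0] = 12.0
stale = resolver.resolve("orders")    # registry down: stale list is reused
```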

Fifth question

Keepalive messages are sent at intervals. If a service instance becomes unavailable within an interval, the service consumer is not aware of it and may still send a request to a machine that can no longer provide the service, and that request will fail. There is no way to eliminate this completely, so the system needs to tolerate this error. Some improvements can be made, such as temporarily masking an instance after a request to it fails, to avoid sending repeated requests to the same invalid instance.
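One way to mask failed instances is a small cooldown table on the client. The sketch below is an assumption about how such masking could be done, not a prescribed design: `FailureMask`, the 10-second cooldown, and the choice to fall back to the full list when every instance is masked are all illustrative.

```python
import time


class FailureMask:
    """Hypothetical client-side filter that skips recently failed instances."""

    def __init__(self, cooldown_seconds=10.0, clock=time.monotonic):
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._masked_until = {}  # address -> time at which it is usable again

    def report_failure(self, address):
        self._masked_until[address] = self._clock() + self._cooldown

    def usable(self, addresses):
        now = self._clock()
        healthy = [a for a in addresses
                   if self._masked_until.get(a, float("-inf")) <= now]
        # If everything is masked, try the full list rather than fail outright.
        return healthy or list(addresses)


mask_clock = [0.0]
mask = FailureMask(cooldown_seconds=10.0, clock=lambda: mask_clock[0])
candidates = ["a:1", "b:1"]
mask.report_failure("a:1")               # a request to a:1 just failed
during_cooldown = mask.usable(candidates)
mask_clock[0] = 10.0
after_cooldown = mask.usable(candidates)
```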

Sixth question

How does a service consumer choose one of several service instances? How do you ensure that multiple requests from the same consumer are routed to the same instance (which is sometimes necessary)? This is the job of load balancing. There are several strategies, such as round robin (RR), priority, weighted random, and consistent hashing.
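Three of these strategies can be sketched in a few lines each. The function and class names, the use of SHA-256 as the hash, and the 100 virtual nodes per instance are assumptions for the example. Consistent hashing is the one that addresses the second question: the same key maps to the same instance as long as that instance stays in the ring.

```python
import bisect
import hashlib
import itertools
import random


def round_robin(instances):
    """Return a zero-argument picker that cycles through instances in order."""
    cycle = itertools.cycle(instances)
    return lambda: next(cycle)


def weighted_random(weights, rng=random):
    """Pick an address with probability proportional to its weight."""
    addresses = list(weights)
    return rng.choices(addresses, weights=[weights[a] for a in addresses], k=1)[0]


class ConsistentHashRing:
    """Hypothetical hash ring with virtual nodes for smoother distribution."""

    def __init__(self, instances, vnodes=100):
        self._ring = sorted(
            (self._hash(f"{addr}#{i}"), addr)
            for addr in instances
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def pick(self, key):
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


next_instance = round_robin(["a:1", "b:1", "c:1"])
rr_order = [next_instance() for _ in range(4)]

ring = ConsistentHashRing(["a:1", "b:1", "c:1"])
sticky_first = ring.pick("user-42")
sticky_again = ring.pick("user-42")
```

Removing an instance from the ring only remaps the keys that were assigned to it, which is why this strategy is often preferred over a plain hash modulo the instance count.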

Service discovery patterns

There are two main modes of service discovery: client-side discovery and server-side discovery.

Client discovery mode

The client is responsible for querying the list of service instances and deciding which instance to send the request to; that is, the load balancing policy is implemented on the client. This pattern has two parts: registration and discovery.

Registration: The service instance calls the registration interface of the service intermediary to register itself and renews the registration through keepalive. The intermediary removes unusable instances through health checks.

Discovery: Before sending a request, the service consumer first queries the service registry for the list of instances. The registry is a database service, so to improve performance and reliability the client usually caches the list (with the cache, the client can continue to work even if the registry goes down). Once it has the list, the client selects an instance according to its load balancing strategy and sends the request.

Advantages

Clients connect directly to service instances and can implement load balancing policies flexibly.
The design is decentralized with no gateway in the request path, which avoids a single-point bottleneck and the loss of reliability that comes with it.
Service discovery is integrated into the client as an SDK. This fits the client language well, performs well, and makes troubleshooting easy.

Disadvantages

The client is coupled to the service registry, and the service discovery logic must be developed for each language and framework used by service clients.
This intrusive integration means that any change to service discovery requires client applications to be recompiled and redeployed, and the tight binding violates the principle of independence.
When a service instance goes offline, callers may be affected, and the service may be temporarily unavailable to them.

Server discovery mode

Discovery: A service consumer sends a service request through a load balancer. The load balancer queries the service registry, selects a service instance, and forwards the request to the service instance.

Registration: Service registration and deregistration can work the same way as in the client discovery mode above, or can be handled by the built-in registration and discovery mechanism of the deployment platform. That is, a container deployment platform (Docker/Kubernetes) can actively discover service instances and register and deregister them on their behalf.

In contrast to the client discovery mode, a client using the server discovery mode does not store the list of service instances locally and does not perform load balancing. The load balancer acts as both the service discovery component and the gateway, so it is often called an API gateway server.
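The request path in this mode can be sketched as follows. The `LoadBalancer` class, its round-robin selection, and the `lookup` and `send` callables are assumptions for illustration; in practice `lookup` would query the service registry and `send` would forward the request over the network.

```python
import itertools


class LoadBalancer:
    """Hypothetical server-side discovery: the client only knows the balancer."""

    def __init__(self, lookup, send):
        self._lookup = lookup   # callable: service name -> list of addresses
        self._send = send       # callable: (address, request) -> response
        self._counter = itertools.count()

    def handle(self, service_name, request):
        instances = self._lookup(service_name)
        if not instances:
            raise LookupError(f"no instances registered for {service_name!r}")
        # Round robin across whatever the registry currently reports.
        target = instances[next(self._counter) % len(instances)]
        return self._send(target, request)


# Stand-ins for the registry and for the network hop to a service instance.
table = {"orders": ["10.0.0.1:8080", "10.0.0.2:8080"]}
lb = LoadBalancer(
    lookup=lambda name: table.get(name, []),
    send=lambda address, request: f"{address} handled {request}",
)
responses = [lb.handle("orders", f"GET /{i}") for i in (1, 2, 3)]
```

The client code never sees an instance address, which is the decoupling this mode is chosen for.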

Because the load balancer is a central component, it must itself be a cluster; a single instance cannot support highly concurrent access. DNS is often used for service discovery and load balancing across the load balancer instances themselves.

HTTP servers and load balancers such as Nginx and Nginx Plus can act as the load balancer in this server-side discovery pattern.

Advantages

Service discovery is transparent to the service consumer. The consumer is decoupled from the registry, and updates to the service discovery function are invisible to the client.
The service consumer only needs to send requests to the load balancer, so there is no need to develop a service discovery SDK for each programming language and framework that consumers use.

Disadvantages

Since all requests are forwarded through the load balancer, the load balancer can become a new performance bottleneck.
Load balancers (service gateways) are centralized, and centralized architectures raise stability concerns.
Because the load balancer forwards requests, response time (RT) is higher than in the client direct-connection mode.

Microservices and service discovery

A Service Mesh is a configurable infrastructure layer for microservice applications, designed to handle the large volume of network-based interprocess communication between services.

A Service Mesh decouples service invocation from communication. Without a mesh, applications need to be aware of protocols and service discovery methods. With a mesh, applications only need to make the call.

Service discovery in a mesh is an upgraded version of the client discovery mode, implemented with sidecars and Pilot. The sidecar (data plane) is responsible for discovering the address list of target service instances and forwarding requests. Pilot (control plane) manages all the service registration information in the service registry.

Service Registration Mode

One option is for service instances to register themselves, which is the self-registration pattern. Another option is for a separate system component to manage the registration of service instances, which is the third-party registration pattern.

The self-registration pattern, described earlier, is simple and requires no third-party components. Its disadvantage is that the registration code must be implemented for each programming language and framework used by the services.

In the third-party registration pattern, service instances do not register themselves. A system component called the Service Registrar polls the deployment environment or subscribes to its events to learn about changes to service instances, and automatically registers and deregisters them.
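The polling variant of such a registrar amounts to a reconciliation loop: compare what the platform says is running with what the registry holds, then apply the difference. The sketch below shows a single pass. `Registrar`, `reconcile`, and the use of a plain set in place of the real registry are assumptions for the example, and `list_running` stands in for a query to the deployment platform.

```python
def reconcile(observed, registered):
    """Return (to_register, to_deregister) so the registry matches the platform."""
    observed, registered = set(observed), set(registered)
    return sorted(observed - registered), sorted(registered - observed)


class Registrar:
    """Hypothetical third-party registrar driven by polling."""

    def __init__(self, list_running, registry):
        self._list_running = list_running  # callable: () -> iterable of (name, address)
        self._registry = registry          # set of (name, address) standing in for the registry

    def sync_once(self):
        # One polling pass; a real registrar would repeat this on a timer.
        to_add, to_remove = reconcile(self._list_running(), self._registry)
        self._registry.update(to_add)
        self._registry.difference_update(to_remove)
        return to_add, to_remove


# The platform reports one running instance; the registry holds a dead one.
running = {("orders", "10.0.0.1:8080")}
registry_state = {("orders", "10.0.0.9:8080")}
registrar = Registrar(lambda: running, registry_state)
added, removed = registrar.sync_once()
```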

The main advantage of the third-party registration pattern is that services are decoupled from the service registry. There is no need to implement registration logic for every language and framework, because registration is handled by a dedicated service. The downside is that, unless it is built into the deployment environment, the registrar is itself a highly available system component that needs to be set up and managed.

Other

If a service has a very large number of instances (at some leading companies, one service name may correspond to tens of thousands of instances), querying and comparing instance lists to detect changes becomes slow, and the amount of I/O is much larger than expected. A version number is usually used to solve this problem.
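The text does not spell out how the version number is used, so the following is one plausible reading: the registry bumps a per-service counter on every change, and a consumer sends the version it already holds so that the full list is transferred only when the versions differ. All names here are hypothetical.

```python
class VersionedRegistry:
    """Hypothetical registry that tags each service's instance list with a version."""

    def __init__(self):
        self._instances = {}  # name -> set of addresses
        self._versions = {}   # name -> int, bumped on every change

    def _bump(self, name):
        self._versions[name] = self._versions.get(name, 0) + 1

    def register(self, name, address):
        members = self._instances.setdefault(name, set())
        if address not in members:
            members.add(address)
            self._bump(name)

    def deregister(self, name, address):
        members = self._instances.get(name, set())
        if address in members:
            members.remove(address)
            self._bump(name)

    def fetch_if_changed(self, name, known_version):
        current = self._versions.get(name, 0)
        if current == known_version:
            return current, None  # unchanged: only an integer is exchanged
        return current, sorted(self._instances.get(name, ()))


vreg = VersionedRegistry()
vreg.register("orders", "a:1")
v1, list1 = vreg.fetch_if_changed("orders", known_version=0)
v2, list2 = vreg.fetch_if_changed("orders", known_version=v1)  # nothing new
vreg.register("orders", "b:1")
v3, list3 = vreg.fetch_if_changed("orders", known_version=v1)
```

Comparing one integer replaces comparing two lists of tens of thousands of entries, which is where the savings in time and I/O would come from.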
