This article briefly introduces, for readers who are new to microservices, the modules needed to build a microservice infrastructure and the reasons each one exists.

The starting point

First, we have to have a “service.” By definition, we can treat each service instance as a black box. The box has well-defined input and output points and is (ideally) only associated with the outside world through those input and output points. Each service instance has its own network address, independent computing resources, and is deployed independently. The client invokes the service API by accessing the address of the service instance. Different services can also call each other.

Configuration manager: Manages configurations in a unified manner

In microservices, each service is deployed and run independently, and each team can add or remove computing resources as needed. A service may run multiple instances, and each instance needs configuration. To make configuration adjustments uniform, we can centralize configuration: each service instance fetches its configuration from the configuration manager, and when the configuration is updated, we can ask the instances to pick up the new values.
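The pull-and-refresh cycle described above can be sketched in a few lines of Python. This is a minimal in-process model, not a real configuration server; the class and service names are invented for illustration.

```python
import threading

class ConfigManager:
    """Toy centralized configuration store: services pull config from here."""

    def __init__(self):
        self._configs = {}          # service name -> config dict
        self._lock = threading.Lock()
        self._version = 0           # bumped on every update

    def put(self, service, config):
        with self._lock:
            self._configs[service] = dict(config)
            self._version += 1

    def get(self, service):
        with self._lock:
            return dict(self._configs.get(service, {})), self._version

class ServiceInstance:
    """Each instance reads its config from the manager instead of a local file."""

    def __init__(self, name, manager):
        self.name = name
        self.manager = manager
        self.config, self.seen_version = manager.get(name)

    def refresh_if_stale(self):
        config, version = self.manager.get(self.name)
        if version != self.seen_version:   # config was updated centrally
            self.config, self.seen_version = config, version
            return True
        return False

manager = ConfigManager()
manager.put("order-service", {"db_url": "postgres://orders", "pool_size": 5})
instance = ServiceInstance("order-service", manager)

# operator adjusts the config centrally; the instance picks it up on refresh
manager.put("order-service", {"db_url": "postgres://orders", "pool_size": 10})
refreshed = instance.refresh_if_stale()
```

A real configuration manager would push change notifications or let instances long-poll, but the shape of the interaction is the same.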

Service registry: decoupling host addresses

This leads to a problem: network addresses (such as IP addresses) change easily with scaling and maintenance, making it difficult for callers to know the available addresses in real time.

For this reason, we can abstract the network address into a concept that does not change easily, such as giving each service a fixed name. The Internet uses DNS to solve this problem; its counterpart in a microservice infrastructure is the service registry.

While running, each service instance sends registration information, including its service ID, access address, and health status, in the form of a heartbeat to the service registry. When a service needs to be accessed, the client first asks the registry for an available instance address and then calls that instance. Beyond locating instance addresses, the registry can temporarily remove instances that go offline for maintenance or upgrades, and add them back once they are healthy again.
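A heartbeat-based registry can be sketched as follows. This is a minimal model with invented names; real registries (Eureka, Consul, etcd) add replication, health checks, and watch APIs. Time is passed in explicitly so the expiry logic is easy to follow.

```python
import time

class ServiceRegistry:
    """Toy registry: instances register via heartbeats; stale ones are dropped."""

    def __init__(self, ttl_seconds=30):
        self.ttl = ttl_seconds
        self._instances = {}   # (service, address) -> last heartbeat timestamp

    def heartbeat(self, service, address, now=None):
        self._instances[(service, address)] = now if now is not None else time.time()

    def deregister(self, service, address):
        self._instances.pop((service, address), None)

    def lookup(self, service, now=None):
        """Return addresses whose heartbeat is fresh enough to be trusted."""
        now = now if now is not None else time.time()
        return [addr for (svc, addr), seen in self._instances.items()
                if svc == service and now - seen <= self.ttl]

registry = ServiceRegistry(ttl_seconds=30)
registry.heartbeat("order-service", "10.0.0.1:8080", now=100)
registry.heartbeat("order-service", "10.0.0.2:8080", now=100)
registry.heartbeat("user-service", "10.0.0.9:8080", now=100)

fresh = registry.lookup("order-service", now=120)   # both heartbeats within TTL
stale = registry.lookup("order-service", now=140)   # both older than 30s -> []
```

An instance that misses its heartbeats simply ages out of lookups, which is exactly the "temporarily removed from the roster" behavior described above.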

The same applies to calls between services: the caller retrieves the target's network address from the registry and then makes the call.

API gateway: entry and routing

Looking up an address in the registry and then calling the service API is a chore that every client repeats. We can abstract these steps away entirely: aggregate all the service APIs into one large central point that encapsulates details such as address resolution and API invocation. Every client then talks only to this central point and no longer accesses individual services directly.

Structurally, this central point divides the entire architecture into two parts: inside are all the services, outside are the clients, and in the middle is the central point. As the only channel between inside and outside, it is appropriately named “API Gateway” and sometimes called “Edge Service.”
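A gateway's core job, stripped to its essentials, is a route table plus instance resolution. The sketch below is illustrative only (a real gateway proxies HTTP rather than returning strings); the route prefixes and service names are invented.

```python
class ApiGateway:
    """Toy gateway: maps URL prefixes to services, resolves an instance, forwards."""

    def __init__(self, registry, routes):
        self.registry = registry     # anything with lookup(service) -> [addresses]
        self.routes = routes         # URL prefix -> service name

    def handle(self, path):
        for prefix, service in self.routes.items():
            if path.startswith(prefix):
                addresses = self.registry.lookup(service)
                if not addresses:
                    return 503, "no healthy instance"
                # a real gateway would proxy the request to addresses[0] here
                return 200, f"forwarded {path} to {service} at {addresses[0]}"
        return 404, "no route"

class StaticRegistry:
    """Stand-in for the service registry from the previous section."""
    def __init__(self, table): self.table = table
    def lookup(self, service): return self.table.get(service, [])

registry = StaticRegistry({"order-service": ["10.0.0.1:8080"]})
gateway = ApiGateway(registry, {"/orders": "order-service",
                                "/users": "user-service"})

ok = gateway.handle("/orders/42")      # routed to the order service
missing = gateway.handle("/payments")  # no route registered -> 404
```

Note that the client only ever sees paths like `/orders/42`; the instance address never leaks past the gateway.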

As the only entrance, the API gateway sits at the very edge of the system, so it often hosts other cross-cutting functions, such as authentication, which we'll discuss in a moment.

Authentication service: Identity and permission issues

As we continue to develop with this architecture, we run into a new problem: inconvenient authentication.

Auth consists of two parts: authentication and authorization. Authentication is about who you are; authorization is about whether you are allowed to do something.

Identity and permissions are highly centralized concepts.

For a system, the identity of users must be uniform. You can't say that a user is Sam when he does one thing and Tom when he does another. Likewise, the authentication status of users should be uniform: a user cannot be logged in when accessing one service and logged out when accessing another. Therefore, there can only be one authenticating party.

Permissions are a little more complicated. Unlike identity, permissions are usually divided into two categories: functional permissions and data permissions. This division mirrors a pattern of authority common in the real world: your role defines your functions, and the scope of those functions is usually limited by attached conditions. For example, if you are a judge, you have the right to decide cases; but if you are a judge in District A, you can only decide cases in District A. Similarly, the manager of a fast-food restaurant has access to staff details, but only for his own staff.
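The two-level check described above, functional permission first and data scope second, can be sketched in a few lines. The role table and field names here are invented for illustration; real systems would load rules from the centralized permission store discussed below.

```python
# role -> functional permissions (what actions the role may perform at all)
ROLE_ACTIONS = {
    "judge": {"decide_case"},
    "manager": {"view_staff_details"},
}

def can(user, action, resource):
    """Check the functional permission, then the data-scope condition attached to it."""
    if action not in ROLE_ACTIONS.get(user["role"], set()):
        return False                          # role lacks the function entirely
    # data permission: the function only applies within the user's own scope
    return resource["district"] == user["district"]

judge_a = {"role": "judge", "district": "A"}
case_in_a = {"district": "A"}
case_in_b = {"district": "B"}

can(judge_a, "decide_case", case_in_a)   # right function, right scope
can(judge_a, "decide_case", case_in_b)   # right function, wrong district
```

The point of the sketch is the ordering: a data-scope rule is only meaningful once the functional permission has already passed.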

Both kinds of permissions are determined by global rules, not by the departments that execute them. Who may decide a case, for example, is determined by the law, not by an individual court. Who can access whose data is determined by regulations, not by the data-storage department.

In real life, an organization may have a dedicated audit department to verify permissions, but for permissions that are not particularly sensitive, it will let each department verify on its own. Either way, whoever performs the verification must work from the same set of rules, and those rules must be formulated and maintained by a central body. In other words, the management of permissions should also be centralized.

Once the centralization of authentication is clear, we can develop a common authentication service that performs authentication and authorization. The next question is: who initiates authentication?

Every service invocation requires knowing the caller's identity, so the earlier authentication happens in the request path, the better. The API gateway, as the single entrance, is the natural place to initiate it. Permission validation is a bit more complicated and deserves an article of its own; for the moment, we assume it is also initiated by the API gateway.
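The division of labor, gateway initiates, auth service decides, can be sketched as below. This toy uses a predictable opaque token purely for illustration; a real auth service would issue random or signed tokens (e.g. JWT) and the gateway would attach the verified identity to the forwarded request.

```python
class AuthService:
    """Toy centralized auth: issues and validates opaque session tokens."""

    def __init__(self):
        self._sessions = {}   # token -> username

    def login(self, username):
        token = f"token-for-{username}"   # illustrative; real tokens are random
        self._sessions[token] = username
        return token

    def authenticate(self, token):
        return self._sessions.get(token)  # username, or None if invalid

def gateway_handle(auth, token, path):
    """The gateway authenticates first; services behind it trust the result."""
    user = auth.authenticate(token)
    if user is None:
        return 401, "unauthenticated"
    # a real gateway would now route the request, carrying the identity along
    return 200, f"{user} -> {path}"

auth = AuthService()
token = auth.login("sam")

allowed = gateway_handle(auth, token, "/orders")
denied = gateway_handle(auth, "bogus-token", "/orders")
```

Because the check lives in the gateway, no individual service needs its own login logic, which is exactly the centralization argument made above.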

Message broker: asynchrony and notification

Development continued and everything went smoothly; there were no technical problems for the time being. There was, however, a business problem to solve.

For example, when we build an online shopping mall, the warehouse must start preparing and shipping goods the moment an order is successfully created. The problem is that the order service and the warehouse service are two different services owned by different teams, and the order service, given its focus, does not care about warehousing concerns, so it cannot be expected to actively notify the warehouse service when creating an order. The warehouse service can only poll the order service periodically to see if there are new orders. Not only is this cumbersome, it is not real-time enough.

If we think about it, we see that this need is very common: the producer of some information does not know (and does not care) who is interested in it. For example, we may have a monitoring service that displays product sales in real time, a BI service that collects customer purchase information for analysis, and so on. Since this is a common requirement, we might as well model it as a mechanism: the information producer sends out a notification, and each receiver decides whether action is needed.

This means we need to introduce another centralized common service: Message Broker. When an event occurs (such as a successful user activation or order creation), the service can send a message to the message queue. Other services can subscribe to these messages and react to them.

For example, the warehouse service can subscribe to the "order created" message. After an order is successfully created, the order service sends this message to the message broker, the broker notifies the warehouse service, the warehouse service asks the order service for the new order's details, and finally starts the outbound process.
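The publish/subscribe flow above can be sketched with an in-process broker. The topic name and payload are invented; a real deployment would use a broker such as RabbitMQ or Kafka, where delivery is asynchronous and durable rather than a direct callback.

```python
from collections import defaultdict

class MessageBroker:
    """Toy in-process broker: services subscribe to topics; publishers stay unaware."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # deliver to every subscriber; the publisher never sees who they are
        for callback in self._subscribers[topic]:
            callback(message)

broker = MessageBroker()
shipments = []

# the warehouse service reacts to order events without the order service knowing it
broker.subscribe("order.created", lambda msg: shipments.append(msg["order_id"]))

# the order service just announces the event and moves on
broker.publish("order.created", {"order_id": 42})
```

Adding the monitoring or BI service later is just another `subscribe` call; the order service's code does not change, which is the decoupling the pattern exists for.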

In addition to broadcasting events, the message broker can support asynchronous calls by converting synchronous calls into asynchronous callbacks. For calls that take a long time and do not require real-time results, this improves both performance and user experience.

Backend for frontend: optimizing front-end development

Up to this point, the system is actually fairly complete. The question now is how to better align the microservices infrastructure with the structures common to R&D teams. This requires us to look at the design of the entire infrastructure from the perspective of Conway's Law.

In software development processes organized around users and value, we often use user journeys and user stories to capture and track the realization of value. A user story usually consists of a business step with clear boundaries, clear acceptance criteria, and clear value.

The problem is that the two sides of a story are out of sync. The front end, driven by business processes and design, wants to deliver sequentially; the back end, driven by business resources and modeling, wants to stay modular.

For example, the front end often adjusts the fields it needs for design reasons, while the back end, from a modeling perspective, has no need for this and no motivation to follow the front end's frequent adjustments. As a result, the front end has to transmit excess information over unstable networks, wasting precious bandwidth.

In addition, when the front end presents a business step, there are two types of information that are not strictly required but often need to be displayed alongside it. One is status information, such as the current login state, user name, and unread message count. The other is vertically related information, for example, showing related articles next to the current one.

This requires the front end to invoke multiple different services at the same time as the main service. Not to mention the possibility of call timeouts and errors, just a bunch of asynchronous requests can make a big dent in front-end efficiency.

In microservices, these problems are even more acute, because it is no longer just the front end and back end that differ: the services are managed by different teams, each with its own demands and agenda, making it difficult to respond as quickly as the front end requires.

These frictions tend to create a "buffer zone", for example, someone on the back end dedicated to meeting front-end needs, or someone on the front end who negotiates with the back end. According to Conway's Law, such a communication structure, over time, easily precipitates into software, forming a dedicated middle layer.

Since the front and back ends cannot be kept perfectly in sync, this middle layer is a natural solution worth keeping. The new questions are: what is its job? Where should it live? Who should maintain it?

On analysis, it has two responsibilities. The first is to decouple the front and back ends and reduce mutual influence: whatever the front end needs can be written into the middle layer, and it doesn't matter if it changes frequently; if the back end is not ready, the front end can stub out fake data at this layer without being blocked. The second is to improve front-end efficiency: the middle layer can aggregate data from multiple services into one response, so the front end does not have to send multiple requests.

It is placed within the API gateway so that it enjoys the benefits and protection the gateway provides.

Finally, there is maintenance. Since the benefits of this middle layer go to the front end, it should, in theory, be maintained by the front-end team.

Thus, a middle tier is defined that primarily serves the front end. Different types of front ends (desktop, mobile) have different needs, and to keep a single middle tier from becoming a tangle of competing demands, we can give each front-end type its own: one for desktop, one for mobile. Each middle tier is then a dedicated backend for one type of front end, hence the name "Backend for Frontend" (BFF).
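A BFF endpoint is essentially fan-out plus trimming: call several services, keep only what the screen needs. The sketch below uses hypothetical service clients passed in as plain callables; the field names are invented for illustration.

```python
def mobile_order_page(order_id, order_svc, user_svc, recommend_svc):
    """Toy BFF endpoint: fans out to several services, returns one trimmed payload.

    The *_svc arguments are hypothetical callables standing in for real clients.
    """
    order = order_svc(order_id)
    user = user_svc(order["user_id"])
    try:
        related = recommend_svc(order["product_id"])
    except Exception:
        related = []                  # non-essential call: degrade gracefully
    # return only what the mobile screen needs, not each service's full record
    return {
        "order_id": order["id"],
        "product": order["product_name"],
        "buyer": user["display_name"],
        "related": related[:3],       # the screen only shows three suggestions
    }

# in-memory stand-ins for the three downstream services
page = mobile_order_page(
    42,
    order_svc=lambda oid: {"id": oid, "user_id": 7,
                           "product_id": 9, "product_name": "Keyboard"},
    user_svc=lambda uid: {"display_name": "Sam", "email": "hidden@example.com"},
    recommend_svc=lambda pid: ["Mouse", "Wrist rest", "USB hub", "Desk mat"],
)
```

The front end makes one request and gets one tailored payload; note that fields it doesn't need (like the user's email) never cross the network at all.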

Circuit breaker: improving fault tolerance

With development now running smoothly, we continue building. There were no problems at first, but once deployed to a pre-production environment, another problem appeared: the system's overall fault tolerance was very low. A small error could easily propagate and amplify, bringing down the entire system.

As we all know, the hardest part of programming is the remote call. A local call mostly results in "success" or "failure", but a remote call can also result in "no response". "No response" may be normal, the other side may return a result later, or it may mean the other side has died and will never respond. The worst outcome is a pile-up: everyone is waiting for your result while you are waiting for someone else's, every resource is tied up, and nothing gets done.

However, remote calls are unavoidable. In microservices, the problem is magnified. This is because the modularity of microservices is based on service units, and each service is independently deployed and operated, making calls between services commonplace.

In such a severe situation, we must improve the fault tolerance of the service system at the architectural level, so that problems in individual services do not affect the whole.

Specifically, a threshold check is added to remote calls. When timeouts or failures exceed the threshold, subsequent calls are not sent at all, and an error is returned immediately. After a period of time, the threshold is reset and calls are attempted again, repeating the process. This mechanism is called circuit breaking, and the component that implements it is a circuit breaker.
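The open/cooldown/retry cycle just described, plus the fallback discussed next, can be sketched as a small state machine. This is a simplified model (production breakers such as resilience4j or Hystrix track failure rates over sliding windows and have an explicit half-open state); time is passed in explicitly to make the transitions visible.

```python
class CircuitBreaker:
    """Toy circuit breaker: fail fast after repeated failures, retry after a cooldown."""

    def __init__(self, threshold=3, cooldown=30, fallback=None):
        self.threshold = threshold    # consecutive failures before opening
        self.cooldown = cooldown      # seconds to stay open before retrying
        self.fallback = fallback      # optional backup result for rejected calls
        self.failures = 0
        self.opened_at = None         # None means the breaker is closed

    def call(self, func, now):
        if self.opened_at is not None:
            if now - self.opened_at < self.cooldown:
                return self._reject()      # open: fail fast, don't even call
            self.opened_at = None          # cooldown over: allow a trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now       # trip the breaker
            return self._reject()
        self.failures = 0                  # success closes the breaker fully
        return result

    def _reject(self):
        if self.fallback is not None:
            return self.fallback()         # backup plan for non-essential calls
        raise RuntimeError("circuit open")

breaker = CircuitBreaker(threshold=2, cooldown=30, fallback=lambda: [])

def flaky():
    raise TimeoutError("no response")

breaker.call(flaky, now=0)   # failure 1: fallback returned
breaker.call(flaky, now=1)   # failure 2: breaker opens
breaker.call(flaky, now=5)   # open: fallback returned without calling at all
```

The `fallback` here returns an empty list, the "no recommendations" degradation from the example below; critical calls would instead omit the fallback and surface the error.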

In addition to isolating failed service instances, circuit breakers have another important function: providing a backup plan. Although we split everything into services, services differ in importance. Some are critical, the process cannot continue if they fail, while others are auxiliary and can fail without serious consequence.

For example, when buying a product, recommendations are often made based on the user's habits and what they are currently buying. If the recommendation service has a problem, no recommendations appear, but the user's normal purchase flow should not be affected. Similarly, if the online ordering service has a problem, we should still allow users to manually select a restaurant and order their food, not a great experience, but at least the normal process goes through. With this in mind, circuit breakers should provide a fallback for non-essential service invocations to keep the core process as smooth as possible.

With circuit breakers in place, the problem of remote call errors is alleviated to some extent. Combined with monitoring of breaker state changes, developers can detect problems and take timely action more easily.

Load balancer: improves service elasticity

Before going live, we must also set up Load Balancing (LB) to enhance the elasticity of the whole system. In theory, there are two ways to perform load balancing:

  • Client-side load balancing: clients decide how to distribute requests.
  • Middle-tier load balancing (mid-tier LB): intermediaries such as DNS and gateways decide how to distribute requests.

The service registry already holds each service's instance addresses, so the simplest form of client-side load balancing is to pull those addresses down and select one sequentially or at random. In the middle tier, there are more places to balance load, from the outermost DNS all the way in to the gateway.
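Both client-side strategies just mentioned, sequential (round-robin) and random, fit in a few lines. The address list here is invented; in practice it would come from a registry lookup and be refreshed periodically.

```python
import random

class RoundRobinBalancer:
    """Toy client-side load balancer over addresses pulled from the registry."""

    def __init__(self, addresses):
        self.addresses = list(addresses)
        self._next = 0

    def pick(self):
        if not self.addresses:
            raise RuntimeError("no instances available")
        # cycle through the list so load spreads evenly across instances
        address = self.addresses[self._next % len(self.addresses)]
        self._next += 1
        return address

addresses = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
balancer = RoundRobinBalancer(addresses)
picks = [balancer.pick() for _ in range(4)]   # wraps back to the first address

def random_pick(addresses):
    """The random strategy is even simpler, at the cost of less even spread."""
    return random.choice(addresses)
```

Round-robin gives an even spread when instances are homogeneous; weighted or least-connections variants handle unevenly sized instances.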

Extend the infrastructure

Now, the microservices infrastructure is almost complete. We can expand this infrastructure if we need to. When doing extensions, the architect should be careful to distinguish between what should be centralized and what should be left to the discretion of the service. For example, among the infrastructure mentioned in this article, the modules that (almost) enforce complete centralization are:

  • Configuration management
  • Service registry
  • Message broker
  • Log collection

Among them, configuration management and the service registry are infrastructure required by every service and must be unified. The message broker and log collection, which serve cross-service operations and tracing, must also be centralized.

Semi-centralized modules are:

  • routing
  • authentication

Both routing and authentication must be unified, as we discussed earlier. However, a system may expose more than one public API, for example, an express-delivery company may have an internal API and a separate open API for third-party logistics. In that case there may be two API gateways, two corresponding API directories, and two authentication setups. That is why these modules are only "semi-centralized".

These are examples of centralized and semi-centralized choices. Each centralization decision can make the architecture more rigid and less flexible, so we should pay special attention to this trade-off when designing and extending the infrastructure.

In addition to the centralization option, another focus of architecture development is to keep the business “black box”.

We extracted the associations between services, as well as the definition and validation of permissions, so each service becomes simple and pure, a "pure business service", equivalent to a black box containing only business rules. This way, it doesn't matter how many services and modules there are, and the business logic stays highly reusable.

In summary, once the necessary infrastructure for microservices has been set up, the rest can be adjusted based on the actual situation and project experience. For example, we may choose to consolidate several functions into a single layer to avoid the performance cost of excessive layering, or fine-tune the details of the infrastructure. As long as the boundaries between "centralized versus self-managed" and "business versus non-business" are well managed, the infrastructure can develop healthily.

Summary of microservice infrastructure

In conclusion, a microservice infrastructure should include the following components (in order of appearance in the request flow):

  • Configuration management: centralized configuration.
  • API gateway: master directory of external APIs; routing; initiating authentication.
  • Service registry: registration and discovery of services.
  • Authentication service: authenticates identity and validates permissions.
  • Backend for frontend (BFF): unpacks requests, invokes services, and aggregates and transforms results according to front-end needs.
  • Message broker: global notification mechanism; asynchronous invocation mechanism.
  • Circuit breaker: isolates faulty services until they recover; provides fallbacks.
  • Load balancing: avoids service overload.

It should be noted that how these components are combined, how they are split, and whether each one is even necessary must be decided according to the actual situation of the project and team. This article is only meant as an introduction.
