Preface

“Microservice architecture” has been a buzzword for quite some time, and it is discussed widely on technical public accounts and at architecture conferences. For most Internet start-ups, a monolithic application is the most appropriate choice in the early days. Only when the business grows rapidly, and system load, business complexity and headcount all rise quickly, does the question arise of how to upgrade the whole software system to a microservice architecture quickly, safely and in an orderly way. Meeting the needs of business growth and reshaping the technical organization are the primary driving forces behind a microservice architecture; without them there is little point in talking about microservices.

Once the decision is made to upgrade the entire application system to a microservice architecture, the business systems, the infrastructure, and the operations and maintenance system all need to be upgraded in an organized and planned way. The somewhat awkward reality is that by the time a business needs microservices, it is usually growing very fast, and that growth and the pressure it brings pose a big challenge for the whole technical team. At every turn you have to choose: a simple solution that supports the business quickly, or something more long-term? Most of these choices concern technical details, and in most cases judging the right “degree” is in the hands of individual engineers.

Moving the application system and the organizational structure into the microservice era quickly and in an orderly way is a serious test of the team’s ability and of its architecture management. Scoring 80 out of 100 already counts as excellent, because there are objective constraints!

The author personally experienced the whole process at a rapidly growing Internet company, from a single application to a microservice architecture based on Spring Cloud. This article discusses, from a technical point of view, how to use Spring Cloud to split a system into microservices, together with some thoughts from the process. The author’s knowledge is limited, so please forgive any shortcomings!

Overview of system architecture evolution

In the company’s start-up period, the main problem is how to turn an idea into working software. The architecture at this stage is not complicated: for the sake of rapid iteration, the whole system follows an “App + back-end service” model, and the back-end service is split into Jar packages only from an engineering point of view. The software system architecture is as follows:

At this time the system’s functions are relatively simple: basic user, order, payment and similar features. Because the business processes are not yet complex, these functions are essentially coupled together. As the App became popular (the author’s company happened to be in an Internet hot spot), downloads skyrocketed in 2017, and so did online registrations.

With the rapid growth of traffic, the load on the back-end service became very heavy. To withstand it, we could only keep adding machines and scaling the back-end service nodes horizontally. The deployment architecture is as follows:

In this way the system withstood a wave of load, but incidents still occurred from time to time. In particular, a performance problem in a single API endpoint could make the entire service unavailable, because all endpoints ran in the same JVM process. And although multiple nodes were deployed, they shared a single underlying database and cache system, so when one part went down, everything went down.

On the other hand, with the rapid development of the business, functions that used to be simple became complicated. Besides the functions visible to users, there are many that users never see. It is like Baidu search: the user may see only a search box, but there may be hundreds or even thousands of services behind it. Examples include growth-strategy features such as red envelopes and sharing, and monetization features such as advertising recommendations.

In addition, growth in traffic and business also means rapid growth in team size. If everyone still develops their own features in a single service code base, it is hard to imagine what happens when a hundred or more people pile features onto the same project. So dividing business boundaries and organizing teams sensibly also becomes urgent!

To solve the problems above and to adapt to business and team growth, the architecture team decided to split the system into microservices. Implementing a microservice architecture requires not only reasonable boundaries between business modules but also a complete set of technical solutions.

As for the technical solution, there are many frameworks for splitting and governing services, from WebService in the early days to the more recent RPC frameworks (such as Dubbo, Thrift and gRPC). Spring Cloud is a complete set of microservice solutions built on Spring Boot. Because its technology stack is relatively new and its component support is very comprehensive, Spring Cloud became our first choice.

After a series of refactorings and extensions, the system finally formed a microservice architecture centered on the App, with the structure as follows:

At this point, the Spring Cloud-based system had completed its initial split into microservices. Core functions such as payment, orders, users and advertising were separated into independent microservices, and the database behind each microservice was split along the same service boundaries.

After the split, what used to be in-process calls between pieces of code became network calls between services, and each microservice has to expose services according to its own function. How a service is discovered and called by other services thus becomes the key part of the whole microservice system. Those who have used the Dubbo framework know that service registration and discovery in Dubbo relies on Zookeeper; in Spring Cloud we implement it with Consul. In addition, a Spring Cloud-based architecture provides a configuration center (ConfigServer) to help microservices manage their configuration files, and the original API service, with its various functions stripped out, gradually evolved into a front gateway service.

The description above mentions several key Spring Cloud components: Consul, ConfigServer and the gateway service. How do these key components support such a large service system?

Key components of Spring Cloud

Consul

Consul is an open source registry service written in Go. It has built-in support for service registration and discovery, a distributed consensus protocol implementation, health checks, Key/Value storage, multiple data centers and more. Eureka can also be used as the registry in the Spring Cloud framework. We chose Consul because it supports heterogeneous services such as gRPC services.
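To make service registration more concrete, here is a minimal sketch in Python (used only for brevity) of what registering a service with a local Consul agent amounts to at the HTTP level. In a Spring Cloud application this step is normally performed for you by the Consul discovery integration, so nothing here is the author’s actual code. The service name, addresses, the ID scheme and the health path /actuator/health are assumptions made for illustration; the sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_registration(name, address, port, health_path="/actuator/health"):
    """Build the JSON body for Consul's agent service-registration endpoint."""
    service_id = f"{name}-{address}-{port}"  # hypothetical ID scheme
    return {
        "ID": service_id,
        "Name": name,
        "Address": address,
        "Port": port,
        # Consul polls this URL periodically to decide whether the node is healthy.
        "Check": {
            "HTTP": f"http://{address}:{port}{health_path}",  # assumed health path
            "Interval": "10s",
        },
    }


def build_request(agent_url, payload):
    """Prepare (but do not send) the HTTP PUT to a Consul agent."""
    return urllib.request.Request(
        url=f"{agent_url}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Example values are made up for illustration.
payload = build_registration("user", "10.0.0.12", 8080)
request = build_request("http://127.0.0.1:8500", payload)
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
```

Passing the prepared request to urllib.request.urlopen would perform the registration against a running agent; other services could then discover the “user” service by name instead of by a hard-coded address.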

In fact, in the later evolution of the architecture, when some service modules were further split into subsystems, gRPC was adopted for calls between the subsystem services. For example, as the payment module kept expanding, the payment service itself was split into a microservice architecture. The payment microservices call each other via gRPC, while service registration and discovery still rely on the same Consul cluster.

The system architecture evolution is as follows:

Module services in the original microservice architecture grow into independent systems once they reach a certain scale or complexity, which makes the overall call chain very long. From Consul’s perspective, however, all services are flat.

As the number of microservices grows, Consul becomes a critical component of the entire system: once Consul fails, all services become unavailable. So what exactly is Consul, and how should its disaster recovery be designed?

To make Consul highly available, it should run as a cluster in production (see the online documentation for installing and configuring a Consul cluster). A Consul cluster has two roles, Server and Client. These roles have nothing to do with the application services registered in Consul; they are purely a classification of Consul nodes. Server nodes are the ones that actually maintain Consul’s state, much like the Zookeeper registry in Dubbo. Server nodes use the Gossip protocol and the Raft consensus algorithm (which can be discussed separately in a later article) to elect a Leader node for the cluster; the Leader handles all queries and transactions and replicates state to the other nodes.

The Client role is relatively stateless: it simply forwards RPC requests to the Server nodes as a proxy. Client nodes exist mainly to share the load on the Server nodes and to act as a buffer layer, because the number of Server nodes should not be large. The more Servers there are, the slower consensus is reached and the more expensive synchronization between nodes becomes. Three to five Server nodes are recommended, whereas thousands or even tens of thousands of Client nodes can be deployed as needed. That is the theory; in real production environments, three to five Server nodes alone are enough for most applications. The Consul cluster at the author’s company consists of five Server nodes, with no additional Client nodes.
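The recommendation of three to five Server nodes follows from how a majority-based consensus protocol such as Raft works: a write needs acknowledgement from a majority (quorum) of servers. The short sketch below, with function names chosen only for illustration, computes the quorum size and the number of Server failures a cluster can survive.

```python
def quorum(servers: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """How many servers may fail while a majority is still reachable."""
    return servers - quorum(servers)


for n in (1, 3, 4, 5, 7):
    print(f"servers={n} quorum={quorum(n)} tolerated_failures={failure_tolerance(n)}")
```

Under this model a five-node cluster keeps working with two Server nodes down, while going from three to four nodes raises the quorum without adding any fault tolerance, which is why odd cluster sizes are usually preferred.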

There is also the concept of an Agent in Consul. Every Server or Client is in fact a Consul Agent: a daemon that runs on every member of the Consul cluster. Its main job is to expose the DNS and HTTP interfaces, and it is also responsible for running health checks and keeping service information in sync. Any node in a Consul cluster, Server or Client, is started with the consul agent command. For example:

consul agent -server -bootstrap -syslog \
  -ui \
  -data-dir=/opt/consul/data \
  -dns-port=53 \
  -recursor=10.211.55.3 \
  -config-dir=/opt/consul/conf \
  -pid-file=/opt/consul/run/consul.pid \
  -client=10.211.55.4 \
  -bind=10.211.55.4 \
  -node=consul-server01 \
  -disable-host-node-id &

In the actual production environment, the deployment structure of Consul cluster is as follows:

In our production setup there are no Client nodes; a cluster of five Consul Server nodes handles service registration and discovery for the entire production environment. There is a detail here. The five Consul Server nodes have different IP addresses, and an application service connects to the Leader node’s IP address when it registers with or queries the Consul cluster. If the Leader node fails, how do the application service nodes connect to the new Leader elected through Raft? Surely not by switching IP addresses manually?

In practice, the Consul Agent on every node in the cluster also runs a DNS service (see the -dns-port and -recursor startup parameters), and the address that application services use when connecting to Consul is the DNS address. The DNS resolves the name and maps it to the IP address of the Leader node. If the Leader fails, the newly elected Leader notifies the DNS service of its IP address, and the DNS updates the mapping. The whole process is transparent to application services.

Consul ensures its own stability and high availability through its cluster design, the Raft election algorithm and the Gossip protocol. If a higher level of disaster recovery (DR) is required, you can deploy two data centers to form a geographically distributed Consul cluster. The cost is higher, though, so it depends on whether that level of disaster recovery is really needed.

ConfigServer (Configuration Center)

The configuration center is a service that manages the configuration of microservice applications, such as database settings and the addresses of external interfaces. ConfigServer is an independent service component in Spring Cloud. Like Consul, it is a key component of the whole microservice system, because every microservice application calls it to obtain the configuration it needs.

As the microservice landscape expands, the access load on the ConfigServer nodes gradually increases, and so does the number of configuration items for each microservice. How should these configuration files and their update policies be managed, so that casual changes to production configuration do not cause online failures? Building a highly available ConfigServer cluster is also important for the stability of the microservice system.

In our production practice, key components such as Consul and ConfigServer are set up as separate clusters and deployed on physical machines rather than in containers. As described in the previous section, Consul runs on five dedicated Server nodes. ConfigServer is essentially an HTTP service for fetching configuration files and involves neither leader election nor consistency synchronization, so its high-availability setup follows the traditional approach. The structure is as follows:

Configuration files can be managed in a dedicated Git repository. Normally, ConfigServer pulls the configuration directly from the Git repository over the network and serves it to clients, so the configuration center knows immediately when the repository is updated. The weakness of this approach is that the Git server is an intranet tool for code management; if it is read directly by real-time online services, it can easily be dragged down. Therefore, in actual operations we use Git only for version control of configuration files. After a merge request (MR) is merged, the new master branch is synchronized manually to a local path on each ConfigServer node. The ConfigServer nodes then read configuration files from their local directory instead of making repeated network calls.

Meanwhile, as the number of microservices increases, so does the number of configuration files in the Git repository. To keep configuration manageable, the configurations of different application types need to be organized in some way. In the early days we did not classify applications at all, so the configuration files of hundreds of microservices sat in a single repository directory. This increased the cost of managing configuration files, and it also hurt ConfigServer performance, because ConfigServer loaded configurations that a given microservice did not need.

Our later practice was therefore to organize configuration hierarchically. The company-wide project configuration is abstracted to the top level and loaded by ConfigServer by default. All other microservices are grouped by application type (using Git project namespaces), with applications of the same type in one group. A Git repository named config is then created under each group to store the configuration files of the microservices in that group. The hierarchy is as follows:

The application loads configuration in the order local configuration, common configuration, group common configuration, project configuration, with later sources overriding earlier ones. For example, suppose a service’s default configuration files in the project (“bootstrap.yml/application.yml”) define parameter A, and the local project configuration “application-production.yml” defines parameter B. Meanwhile, the default configuration files “application.yml/application-production.yml” in ConfigServer’s common repository contain parameters C and D, and there is a group named “pay” whose default configuration files “application.yml/application-production.yml” contain parameters E and F. The specific project pay-api also has a configuration file “pay-api-production.yml” that overrides the values of parameters C and D from the common repository. If the application is started with “spring.profiles.active=production”, the configuration parameters it obtains (accessible via http://{spring.cloud.config.uri}/pay-api-production.yml) are A, B, C, D, E and F, where the values of C and D are those set last, in pay-api-production.yml.
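The layering described above can be pictured as a sequence of dictionaries merged in load order, where a later layer overrides keys from an earlier one. The following is a simplified model of that behavior, not Spring Cloud Config’s actual implementation; the parameter values are invented for illustration.

```python
def merge_layers(*layers):
    """Merge configuration layers in load order; later layers win on conflicts."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical values standing in for the parameters A-F from the example.
local = {"A": "local-a", "B": "local-b"}        # bootstrap.yml / application-production.yml
common = {"C": "common-c", "D": "common-d"}     # common repository
group_pay = {"E": "pay-e", "F": "pay-f"}        # "pay" group defaults
project = {"C": "pay-api-c", "D": "pay-api-d"}  # pay-api-production.yml

effective = merge_layers(local, common, group_pay, project)
print(effective)
```

The merged result contains all six parameters, and C and D carry the values from the project layer because it is applied last.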

ConfigServer needs to match configuration repositories to application types accordingly. For example, if a finance configuration repository also exists, services in the pay group do not need the configuration files in the finance namespace when they access the configuration center, so ConfigServer does not need to load them for those services. This requires some settings in ConfigServer’s own configuration, as follows:

spring:
  application:
    name: @project.artifactId@
    version: @project.version@
    build: @buildNumber@
    branch: @scmBranch@
  cloud:
    inetutils:
      ignoredInterfaces:
        - docker0
    config:
      server:
        health.enabled: false
        git:
          uri: /opt/repos/config
          searchPaths: 'common,{application}'
          cloneOnStart: true
          repos:
            pay:
                pattern: pay-*
                cloneOnStart: true
                uri: /opt/repos/example/config
                searchPaths: 'common,{application}'
            finance:
                pattern: finance-*
                cloneOnStart: true
                uri: /opt/repos/finance/config
                searchPaths: 'common,{application}'

That is, ConfigServer’s repository search patterns are set in its own local application.yml.
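As a rough mental model of the repos section above, the sketch below picks a repository for an application name by wildcard pattern and falls back to the default repository when nothing matches. It is a simplification that matches on the application name only, and the application names pay-api, finance-report and user are examples, not services taken from the author’s system.

```python
from fnmatch import fnmatch

# Values copied from the configuration above.
DEFAULT_REPO = "/opt/repos/config"
REPOS = {
    "pay": {"pattern": "pay-*", "uri": "/opt/repos/example/config"},
    "finance": {"pattern": "finance-*", "uri": "/opt/repos/finance/config"},
}


def select_repo(application: str) -> str:
    """Return the repository URI whose pattern matches the application name."""
    for repo in REPOS.values():
        if fnmatch(application, repo["pattern"]):
            return repo["uri"]
    return DEFAULT_REPO  # no group matched: use the top-level repository


def search_paths(application: str) -> list:
    """Expand the 'common,{application}' search path for one application."""
    return ["common", application]


for app in ("pay-api", "finance-report", "user"):
    print(app, select_repo(app), search_paths(app))
```

With this routing, a request from pay-api never touches the finance repository, which is the effect the grouped configuration is meant to achieve.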

Gateway Services & Circuit Breaking & Monitoring

The two sections above introduced, in some detail, two key service components of a Spring Cloud-based architecture. Many other key issues still need to be addressed in a microservice architecture. For example, when an application service registers multiple nodes in Consul, how do callers balance load across them?

In a traditional architecture, load balancing is implemented with Nginx. In the introduction to Consul above, however, only service registration and discovery and the election mechanism were mentioned, not how service calls are load-balanced. Does that mean a service in a Spring Cloud-based system is served by a single node even when multiple nodes are deployed? No. Load balancing is already in place when a service consumer enables Feign with the @EnableFeignClients annotation and declares a client with the @FeignClient(“user”) annotation. Why? By default, Feign enables Ribbon, a component that implements client-side load balancing: it pulls the list of service nodes from Consul and distributes the client’s requests across them in round-robin fashion. All of this happens in code inside the consumer’s process. Because the load balancing is hosted on the consumer side of the application service, it is somewhat intrusive to the code. This is one of the reasons the “Service Mesh” concept emerged later. I won’t expand on it here, but will discuss it in a later article.
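The idea of client-side round-robin load balancing can be sketched in a few lines. This is a conceptual illustration only, not how Ribbon is implemented: the class name, the fake_registry function standing in for a lookup against Consul, and the addresses are all hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Pick service instances in rotation, inside the consumer's own process."""

    def __init__(self, fetch_instances):
        # fetch_instances is a callable returning the current list of instances,
        # standing in for a query to the service registry.
        self._fetch_instances = fetch_instances
        self._counter = itertools.count()

    def choose(self):
        instances = self._fetch_instances()
        if not instances:
            raise RuntimeError("no instances available")
        return instances[next(self._counter) % len(instances)]


def fake_registry():
    """Hypothetical stand-in for the instance list of the 'user' service."""
    return ["10.0.0.11:8080", "10.0.0.12:8080", "10.0.0.13:8080"]


balancer = RoundRobinBalancer(fake_registry)
picks = [balancer.choose() for _ in range(4)]
print(picks)
```

Because the rotation lives in the caller, no central proxy such as Nginx sits on the request path, at the cost of every consumer having to carry this logic.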

Another key issue is circuit breaking and rate limiting for services. Spring Cloud supports these by integrating Netflix’s Hystrix framework, which, like load balancing, works on the consumer side. For reasons of space I will not expand on it here and will cover it in a later article.
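For readers unfamiliar with the circuit breaker pattern, here is a deliberately simplified sketch of the idea. It trips after a number of consecutive failures, which is an assumption made to keep the example short and is not a description of Hystrix’s internals; all names and thresholds are hypothetical.

```python
import time


class CircuitBreaker:
    """Stop calling a failing dependency for a while and use a fallback instead."""

    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # allow one trial call through
        return "open"

    def call(self, func, fallback):
        if self.state == "open":
            return fallback()  # fail fast without touching the dependency
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def failing_call():
    raise ConnectionError("user service unavailable")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30.0)
results = [breaker.call(failing_call, lambda: "fallback") for _ in range(3)]
print(results, breaker.state)
```

The third call in the usage above never reaches failing_call: once the circuit is open, the caller gets the fallback immediately, which protects both the consumer’s threads and the struggling downstream service.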

In addition, the Zuul component implements the API gateway service, providing routing and filtering. Other auxiliary components include Sleuth for distributed tracing, Bus for the message bus, and Dashboard for monitoring. The Spring Cloud open source community is active and new components are constantly being integrated, so it is worth keeping an eye on!

The operation and maintenance of microservices

Under a microservice architecture, as the number of services increases, the workload of online deployment and maintenance becomes very large, and the original operations model can hardly keep up. The operations team needs to adopt a DevOps strategy: build an automated operations and release platform, connect the product, development, testing and operations processes, and pay attention to R&D efficiency.

It is also necessary to push containerization (Docker, Docker Swarm, Kubernetes) so that service nodes can be scaled quickly, which is an inevitable requirement of a microservice system.

The proliferation of micro services

Another issue that needs attention is how to keep the number of microservices under control once the architecture is in place. Splitting microservices blindly is not sensible, because it makes the service call chain hard to follow, makes problems difficult to troubleshoot, and wastes online resources.

The refactoring problem

In the transformation from a monolithic architecture to microservices, refactoring is a very good approach, and an important means of standardizing services and rationalizing the application architecture of the business system. In general, however, a stage of rapid business growth also means rapid growth in team size. Getting a new team to deliver in a short time is a real test of management: if many people are hired at once and they end up competing excessively with one another, refactoring tends to become opportunistic. The result is incomplete refactoring that tackles the easy parts and avoids the important ones, producing a microservice architecture that looks impressive while the business system underneath is actually poor.

In addition, refactoring is an important decision to be made only after a certain stage has been reached. It is not just a re-split but a reshaping of the business system. It is therefore important to weigh both the target structure of the application software and the cost of implementing it.