Author: Chao Liu, a graduate of Shanghai Jiao Tong University with 15 years of experience in cloud computing R&D and architecture. He has worked at EMC, the CCTV Securities Information Channel, HP, Huawei, and NetEase on cloud computing and big data architecture.

From: Liu Chao’s Popular Cloud Computing (ID: PopSuper1982)

What are the key points of microservices? Let's take a look at the entire Spring Cloud ecosystem.

In the process of implementing microservices, service aggregation and splitting are unavoidable. When back-end services are split and recombined frequently, a mobile App needs a unified entry point that routes requests to the appropriate services, so that however the services are split or aggregated, it remains transparent to the mobile client.

 

With an API gateway, simple data aggregation can be completed at the gateway layer rather than in the mobile App, so the App consumes less power and delivers a better user experience.

 

A unified API gateway can also perform centralized authentication and authorization. Even though the calls between services are complex and the interfaces numerous, the API gateway exposes only the external interfaces and authenticates and authorizes them in one place, so internal services can call each other without performing authentication and authorization again, which is more efficient.
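A minimal sketch of this idea, with purely illustrative names (the token table, routes, and handlers are stand-ins, not a real gateway API): the gateway verifies the caller once at the edge, and the internal handlers never re-check credentials.

```python
# Sketch of authentication done once at the gateway (all names illustrative):
# internal handlers trust requests that the gateway has already verified.

valid_tokens = {"token-abc": "user-1"}          # stand-in for the auth service

routes = {
    "/orders":   lambda user: f"orders for {user}",
    "/payments": lambda user: f"payments for {user}",
}

def gateway(path, token):
    """Single external entry point: authenticate, then route internally."""
    user = valid_tokens.get(token)
    if user is None:
        return (401, "unauthorized")            # rejected at the edge
    return (200, routes[path](user))            # internal call needs no re-auth

print(gateway("/orders", "token-abc"))   # → (200, 'orders for user-1')
print(gateway("/orders", "bad-token"))   # → (401, 'unauthorized')
```

Because rejection happens at the edge, adding a new internal service means adding a route, not re-implementing authentication.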

 

With a unified API gateway, you can also apply policies at this layer: A/B testing, blue-green releases, diverting traffic to a pre-release environment, and so on. API gateways should be stateless and horizontally scalable so that they do not become a performance bottleneck.

 

An important factor affecting application migration and horizontal scaling is application state. A stateless service moves this state outward, storing session data, file data, and structured data in unified back-end storage, so that the application itself contains only business logic.

 

State cannot be avoided entirely: ZooKeeper, the database, the cache, and so on all hold it, and all of these stateful components converge into a highly centralized cluster.

 

The whole business is thus divided into two parts: a stateless part and a stateful part.

 

The stateless part achieves two things: random deployment across machine rooms, i.e. mobility, and elastic scaling, i.e. easy expansion.

 

The stateful parts, such as the database, the cache, and ZooKeeper, have their own high-availability mechanisms, and those mechanisms are used to run the state as a highly available cluster.

 

Even though a service is stateless, the data it is currently processing is still held in memory, and if the process crashes, part of that data is inevitably lost. To handle this, the caller needs a retry mechanism, the interfaces must be idempotent, and the retry should be directed, through the service-discovery mechanism, to another instance of the back-end service.
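The interplay of the three pieces can be sketched as follows. This is an illustrative toy, not a real framework: the "instances" are in-process objects, and the request ID plays the role of an idempotency key so that a retried call is applied exactly once.

```python
# Minimal sketch (hypothetical names): an idempotent back-end service plus a
# client-side retry that fails over to another instance from a service list.

class OrderService:
    """One back-end instance; remembers processed request IDs so a retried
    call is applied exactly once (idempotency)."""
    def __init__(self, healthy):
        self.healthy = healthy
        self.processed = set()
        self.balance = 0

    def charge(self, request_id, amount):
        if not self.healthy:
            raise ConnectionError("instance down")
        if request_id not in self.processed:   # apply only the first time
            self.processed.add(request_id)
            self.balance += amount
        return True

def charge_with_retry(instances, request_id, amount):
    """Try each discovered instance in turn; fail over on error."""
    for svc in instances:
        try:
            return svc.charge(request_id, amount)
        except ConnectionError:
            continue                           # retry against the next instance
    return False

down, up = OrderService(False), OrderService(True)
instances = [down, up]
charge_with_retry(instances, "req-42", 100)    # first instance fails, second succeeds
charge_with_retry(instances, "req-42", 100)    # duplicate retry is a no-op
print(up.balance)                              # → 100: charged exactly once
```

Without the idempotency check, the duplicate retry would double-charge, which is exactly why retries demand idempotent interfaces.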

 

The database is where state is kept; it is the most important component and the one most prone to bottlenecks. With a distributed database, database performance can grow linearly with the number of nodes.

 

At the bottom of the distributed database is RDS in a master/slave setup. With in-house MySQL kernel development capability, master/slave switchover can be achieved with zero data loss, so data that lands in this RDS layer is safe: even if a node goes down, no data is lost after the switchover.

 

On top of that sits NLB load balancing, built with LVS, HAProxy, and Keepalived, and above it a layer of Query Servers. Query Servers can scale horizontally based on monitoring data, and if one fails it can be replaced or repaired at any time without the business layer noticing.

 

There is also dual-machine-room deployment: DDB has developed a data-canal component, NDC, which keeps different DDB instances in different machine rooms synchronized. The data is then distributed not only within one data center; multiple data centers hold something close to an active-active backup, so high availability is well guaranteed.

 

Caching is important in high-concurrency scenarios. There should be hierarchical caching so that data sits as close to the user as possible; the closer the data is to the user, the higher the concurrency it can serve and the shorter the response time.

 

There should also be a layer of cache in the mobile App itself. Not all data needs to be fetched from the back end every time; only important, critical, constantly changing data does.

 

Static data in particular can be fetched once and refreshed only after a period of time, with no need to go back to the data center each time. Such data can be cached on the CDN node nearest to the client and downloaded from nearby.

 

Sometimes there is no CDN, or the request has to go back to the data center to download; this is called back-to-source. At the outermost layer of the data center, which we call the access layer, a layer of cache can be set up to intercept most requests so that they never put pressure on the back-end database.

 

For dynamic data, the application has to be involved: the data is either generated by the application's business logic or read from the database. To reduce the pressure on the database, the application can use a local cache or a distributed cache such as Memcached or Redis, so that most requests can be served from the cache without touching the database.

 

Of course, dynamic data can also be staticized, that is, degraded to static data, to reduce the pressure on the back end.

 

When the system is overwhelmed and the application changes rapidly, it often becomes necessary to split a larger service into a series of smaller ones.

 

The first advantage is independent development. When many people maintain the same code repository, code changes constantly interfere with one another: tests fail for code you never touched, commits conflict frequently, and merges are needed, all of which greatly reduces development efficiency.

 

Another advantage is independent release. If the logistics module is integrating with a new express company, it should not have to go live together with the order module; forcing them to release together is unreasonable.

 

Another is capacity expansion during high-concurrency periods. Usually only the order and payment flows are the core; it is enough to scale the critical transaction path. If many other services are dragged along at the same time, the expansion is uneconomical and also risky.

 

Then there is disaster recovery and degradation. During a big promotion, some peripheral functions may have to be sacrificed, but if all the code is coupled together, it is difficult to degrade just those peripheral functions.

 

Of course, after the split, the relationships between applications become more complex, so a service-discovery mechanism is needed to manage them and to achieve automatic repair, automatic association, automatic load balancing, and automatic fault-tolerant switchover.
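The core of service discovery can be sketched in a few lines. This is a toy, not the API of Eureka or Consul: instances register under a service name, a failed instance is deregistered (as a health check would do), and lookups round-robin across what remains.

```python
# Minimal service-registry sketch (illustrative, not a real registry's API):
# instances register themselves; clients look up live instances and
# load-balance across them round-robin.
import itertools

class Registry:
    def __init__(self):
        self.services = {}            # service name -> list of instance addresses
        self.cursors = {}             # service name -> round-robin iterator

    def register(self, name, address):
        self.services.setdefault(name, []).append(address)
        self.cursors[name] = itertools.cycle(self.services[name])

    def deregister(self, name, address):
        """Remove an instance, e.g. after a failed health check."""
        self.services[name].remove(address)
        self.cursors[name] = itertools.cycle(self.services[name])

    def lookup(self, name):
        """Return the next live instance in round-robin order."""
        return next(self.cursors[name])

reg = Registry()
reg.register("order-service", "10.0.0.1:8080")
reg.register("order-service", "10.0.0.2:8080")
print(reg.lookup("order-service"))    # first instance in round-robin order
reg.deregister("order-service", "10.0.0.1:8080")
print(reg.lookup("order-service"))    # only the surviving instance is returned
```

Deregistering on a failed health check is what gives the "automatic fault-tolerant switchover" described above: callers simply never see the dead instance.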

 

Once services are unbundled there are a great many processes, so service orchestration is needed to manage the dependencies between services and to describe their deployment as code, which is often referred to as infrastructure as code. Services can then be released, updated, rolled back, scaled out, or scaled in simply by modifying the orchestration file, which improves traceability, manageability, and automation.

 

Since orchestration files can themselves be managed in a code repository, updating five services out of a hundred only requires modifying those five services' configuration in the orchestration file. When the orchestration file is committed, the repository automatically triggers the deployment and upgrade script, updating the online environment. If something turns out to be wrong with the new environment, you naturally want to roll those five services back atomically. Without an orchestration file, you would have to record by hand which five services were upgraded this time; with one, you simply revert to the previous version in the repository. Every operation is visible in the code repository.
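As a concrete illustration, an orchestration file for one service might look like the following Kubernetes-style fragment (the service name, registry URL, and version tag are hypothetical). Committing a change to this file is the whole upgrade; reverting the commit is the rollback.

```yaml
# Hypothetical Kubernetes-style orchestration file for one service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3              # scale out or in by editing this number
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: registry.example.com/order-service:v1.4.2   # roll back by reverting this line
```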

 

After services are split, there are a great many of them. If each one keeps all of its configuration in a local configuration file on the application, management becomes very difficult: imagine trying to find out which of hundreds or thousands of processes has a configuration problem. So a unified configuration center is needed to manage all configuration and distribute it centrally.

 

In microservices, configuration usually falls into several classes. The first is configuration that almost never changes, which can be baked directly into the container image. The second is configuration that is determined at startup; this kind is usually handed in through environment variables when the container starts. The third is unified configuration, pushed down by the configuration center as required; for example, during a big promotion you can specify in this configuration which functions can and cannot be degraded.
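The three classes can be sketched in a few lines; the key names, the environment variable, and the push callback are all illustrative, not a real configuration center's API:

```python
# Sketch of the three configuration classes (all names illustrative):
# baked-in defaults, environment-variable overrides at startup, and values
# pushed later by a configuration center.
import os

# Class 1: defaults baked into the container image.
config = {"db_pool_size": 10, "can_degrade_recommendations": False}

# Class 2: startup-time configuration handed in via environment variables.
config["db_pool_size"] = int(os.environ.get("DB_POOL_SIZE", config["db_pool_size"]))

def on_config_push(update):
    """Class 3: called when the configuration center pushes new values at runtime."""
    config.update(update)

# During a big promotion, the config center flips a degradation switch live.
on_config_push({"can_degrade_recommendations": True})
print(config["can_degrade_recommendations"])  # → True
```

The point of the split is lifecycle: class 1 changes with the image, class 2 with the deployment, and class 3 at any moment without a restart.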

 

When there are many processes of the same kind, it is impractical to log in to hundreds or thousands of containers one by one to view logs, so a log center is needed to collect them. To make the collected logs easy to analyze, there must be certain requirements on the log format. Once all services follow a unified log specification, a transaction can be traced through the whole process in the log center: search for the transaction number in the log search engine, and you can see at which step the error or exception occurred.
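One common unified format is a JSON log line that always carries the transaction (trace) ID. A small sketch, with the field names and the in-memory "log center" purely illustrative:

```python
# Sketch of a unified log-line format carrying a transaction (trace) ID, so a
# log center can reconstruct one request's path across services.
import json

log_store = []   # stand-in for the log center's search index

def log(service, trace_id, level, message):
    """Every service emits the same JSON shape, keyed by trace_id."""
    log_store.append(json.dumps(
        {"service": service, "trace_id": trace_id, "level": level, "msg": message}))

# One transaction flows through three services under a single trace ID.
log("gateway", "txn-1001", "INFO", "request received")
log("order",   "txn-1001", "INFO", "order created")
log("payment", "txn-1001", "ERROR", "card declined")
log("order",   "txn-2002", "INFO", "unrelated transaction")

# "Searching the transaction number" is then a simple filter.
trail = [json.loads(line) for line in log_store if '"txn-1001"' in line]
print([entry["service"] for entry in trail])   # → ['gateway', 'order', 'payment']
```

The filtered trail immediately shows that this transaction passed the gateway and order services and failed at payment, which is exactly the cross-service trace described above.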

 

Services should have the ability to break circuits, limit traffic, and degrade. When a service calls another service and the call times out, it should return promptly, with default fallback data if necessary, rather than block there and hold up other users' transactions.

 

When a service finds that the service it calls is too busy, that the thread pool or connection pool is full, or that errors keep occurring, it should disconnect in time to prevent the failure or congestion of the downstream service from making this service abnormal as well and spreading gradually upstream, causing an avalanche across the whole application.
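This "disconnect in time" behavior is the circuit-breaker pattern. A minimal sketch (illustrative, not the API of Hystrix or Resilience4j; a real breaker would also re-close after a cool-down period): after a threshold of consecutive failures the breaker opens, and further calls fail fast with fallback data instead of piling onto the sick downstream service.

```python
# Minimal circuit-breaker sketch: after `failure_threshold` consecutive
# failures the breaker opens and calls return fallback data immediately.

class CircuitBreaker:
    def __init__(self, failure_threshold=3, fallback=None):
        self.failure_threshold = failure_threshold
        self.failures = 0
        self.open = False
        self.fallback = fallback

    def call(self, fn, *args):
        if self.open:
            return self.fallback        # fail fast: downstream not even touched
        try:
            result = fn(*args)
            self.failures = 0           # a success resets the failure count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.open = True        # trip the breaker
            return self.fallback

def flaky_inventory_service():
    raise TimeoutError("downstream too busy")

breaker = CircuitBreaker(failure_threshold=3, fallback={"stock": "unknown"})
for _ in range(5):
    breaker.call(flaky_inventory_service)
print(breaker.open)                     # → True: later calls return the fallback at once
```

Once the breaker is open, the busy downstream service gets breathing room and the failure stops propagating upstream, which is precisely how the avalanche is prevented.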

 

When the whole system is found to be truly overloaded, some functions or some calls can be chosen for degradation, ensuring that the most important transaction flows get through and the most important resources are devoted to the most core processes.

 

Another means is rate limiting. Once the circuit-breaking and degradation strategies are in place, full-link stress testing should reveal the carrying capacity of the whole system; a rate-limiting strategy is then needed to guarantee that the system serves requests within its tested capacity and rejects those beyond it. When you place an order and the system pops up a dialog saying "system busy, please try again later", it does not mean the system has crashed; it means the system is working properly and the rate-limiting policy has kicked in.

 

When the system is very complex, unified monitoring covers two main aspects: whether the system is healthy, and where the performance bottlenecks are. When the system behaves abnormally, the monitoring system can work together with the alerting system to discover, notify, and intervene in time, ensuring smooth operation.

 

Bottlenecks are often encountered during stress testing, and comprehensive monitoring is also needed to locate them while preserving the scene, so that they can be traced, analyzed, and optimized in every respect.