1. Implementing microservices is a complex issue involving IT architecture, application architecture, and organizational structure

Having visited many traditional-industry enterprises implementing microservices, I found that microservices are a very complicated problem, and not even an entirely technical one.

At first I assumed that since microservices are about transforming applications and adding service governance (registration, discovery, circuit breaking, rate limiting, degradation, and so on), the natural starting point was the application development team. Conversations usually began well: from monolithic architecture to SOA to microservices, from Dubbo to Spring Cloud. But they inevitably reached the release and operation of microservices, the DevOps and container layers, which are outside the development team's control. Once the O&M team got involved, acceptance of containers became a problem: how they differ from traditional physical machines and virtual machines, what risks they introduce, and so on. In particular, the fact that a container is definitely not lightweight virtualization is not immediately obvious. Worse, even when that is understood and application containers do go online, the question of who is responsible for the container when something goes wrong blurs the boundary between the application layer and the infrastructure layer, and makes both sides hesitate.


In other enterprises, microservitization is initiated by the operation and maintenance department. Having felt the pain that many inconsistent applications bring to operations, O&M is willing to adopt a container-based operating model. But this raises its own questions: should service discovery be handled by O&M at the container layer, or by the application itself? Should the Dockerfile be written by development or by O&M? If, during containerization, development does not cooperate and O&M pushes it through unilaterally, the effort only adds friction for limited benefit.

The following figure shows the layers involved in the implementation of microservices. For a detailed description, refer to the Cloud Architect Advanced Guide.

[Figure: the layers involved in implementing microservices]

In some relatively advanced enterprises, a middleware group (or architecture group) sits between the operations team and the development teams. It is responsible for promoting the microservice transformation: it must both develop and implement the servitization of the business, and persuade the O&M group to roll out containers. If the architecture group lacks authority, driving this tends to be even harder.

Therefore, promoting microservices, containers, and DevOps is not only a technical issue but also an organizational one. During the rollout you can really feel Conway's Law at work, and the intervention of a higher-level technical director or CIO is needed to push the implementation through.

At the CIO level, however, many companies have not experienced the technical pain points and pay more attention to the business: the business makes money, and as long as it does, the pain of the architecture teams, the middleware teams, and the operations teams is barely felt at the top. Such a CIO cannot really appreciate the technical advantages of microservices and containers, nor the business advantages they bring. Many vendors can recite these benefits, but only superficially, not from conviction.

So the transformation toward microservices and containers is more likely to happen in a flat organization, driven by a CIO who can feel the technical pain at the grassroots level. This is why microservices generally landed first in Internet companies: their organizational structure is flat, and even senior management stays close to the front line and understands its pain.

Traditional industry is not so lucky; there are often more organizational layers. There, adoption only succeeds once the technical pain is severe enough to affect the business: severe enough to affect revenue, severe enough to let competitors pull ahead.

Let's tease out some of the pain points that arise in this process.

2. Stage 1: Single architecture group, multiple development groups, unified operation and maintenance group


2.1. Organizational status of Stage 1

The organizational state is relatively simple.

A unified O&M group manages physical machines, physical networks, and VMware virtualization resources, and is responsible for deployment and bringing systems online.

Each business in the development organization is independent and responsible for writing its own code; there is not much communication between different businesses. Besides developing their own systems, the development teams also maintain systems built by outsourcing companies.

The traditional smokestack (siloed) architecture is shown below.

[Figure: traditional smokestack architecture]

2.2. Operation and maintenance mode of Phase 1

In the traditional architecture, the infrastructure layer is usually deployed on physical machines or virtual machines. To facilitate mutual access between applications, a flat layer-2 data center network is used: the IP addresses of all machines can reach each other, and machines that should not communicate are isolated by firewalls.

Whether physical machines or virtual machines are used, the configuration is relatively complex; even for people who have worked in O&M for years, creating a machine correctly on their own is difficult, and network planning must be done very carefully. All of this needs unified management by the O&M department. Ordinary IT staff and developers lack the expertise and cannot be given permission to operate the platform, so machines are obtained by filing a request with O&M for approval.

2.3. Phase 1 application architecture

The database layer of the traditional architecture is developed independently by outsourcing companies or different development departments. Different businesses use different databases: Oracle, SQL Server, MySQL, MongoDB, and so on.

For the middleware layer of traditional architecture, each team selects the middleware independently:

  • Files: NFS, FTP, Ceph, S3

  • Cache: Redis Cluster, primary/secondary, Sentinel, Memcached

  • Distributed frameworks: Spring Cloud, Dubbo, RESTful or RPC

  • Sharding (sub-database/sub-table): Sharding-JDBC, Mycat

  • Message queues: RabbitMQ, Kafka

  • Registry: ZooKeeper, Eureka, Consul

In the service layer of the traditional architecture, each system is developed either by an outsourcing company or by an independent team.

At the front end of the traditional architecture, each team develops its own front end.

2.4. Are there any problems with stage 1?

There is nothing wrong with phase one, and we can even find ten thousand reasons why this model is good.

The O&M department and the development department are naturally separated; neither wants to meddle in the other's territory, and the bosses on both sides are peers.

Of course only O&M staff may touch the machine room; there are security concerns, professionalism concerns, and the seriousness of production systems to consider. The questions of who is to blame if the system is compromised, and who is to blame if it goes down, can silence any debate about letting a less professional developer do the deployment.

There is nothing wrong with using Oracle, DB2, or SQL Server either, as long as the company has the budget. The performance really is good, and the many stored procedures make application development much easier; a professional vendor helps with operation and maintenance. The database is so critical that if you propose replacing it with MySQL, which may not hold up under load, and as open source has no vendor to maintain it, then when something goes wrong in production, who will take responsibility?

The middleware, service layer, and front end are all handled by outsourcers or vendors and maintained end to end, so any change can be made in one place. Moreover, each system is a complete set, convenient to deploy and operate.

So there really is no problem; pushing containers or microservices at this point would just be asking for trouble.

2.5. Under what circumstances will stage 1 be considered a problem?

The initial pain points appear at the business level. When users begin to demand new features every now and then, or an entirely new system, you will find that the outsourcing company cannot deliver everything: they develop in a waterfall model, and the delivered system is hard to change, or at least hard to change quickly.

So you start to want to hire developers and build systems that you can control, or at least take over from the outsourcing company, so that you can be more flexible in responding to the needs of the business unit.

However, self-development and maintenance bring new problems. It is impossible to recruit DBAs for such a diverse set of databases; people are very expensive, and as the number of systems grows, the licenses for these databases also become very expensive.

The middleware is just as varied, since each team chose its own. There is no unified maintenance or knowledge accumulation, so no uniform SLA can be guaranteed. Once the message queue, cache, or framework in use breaks, no one on the team can fix it, because everyone is busy with business development and no one has time to dig into the underlying principles, common problems, tuning, and so on.

Front-end frameworks have the same problem: inconsistent stacks, inconsistent interface styles, and no way to automate UI testing.

When you maintain multiple systems, you find that they have a lot in common at every level: many capabilities could be reused, much data could be shared. The same logic appears here and there, the same types of data live here and there, but the information is siloed and the data models are not unified, so nothing can be connected.

It is when these questions arise that you consider moving to phase two.

3. Stage 2: Service-oriented organization, SOA-oriented architecture, cloud-oriented infrastructure

             

3.1. Organizational form of Stage 2

How to solve the above problem?

According to Conway's Law, some organizational adjustment is needed. At this point the whole company is simply divided into an O&M group and development groups.

Since the pain points are at the business level, it is the development team that should start adjusting.

An independent front-end group should be established to unify the front-end framework and interface style, so that everyone masters a unified front-end development capability, front-end code accumulates, and new requirements can be developed quickly.

Set up a middleware group, or architecture group, whose members are not tied to business development; their daily task is to study how to use the middleware, how to tune it, and how to debug problems, forming accumulated knowledge. With a unified group focusing on middleware, a limited set of middleware can be selected for concentrated study according to the company's situation, and the business groups may use only these, which ensures consistency of selection. If this group also maintains the middleware, it can provide a reliable SLA to the business side.

Detach a middle-platform group from the business development teams: capabilities and code that can be reused are developed by this group as services for the business groups to use. This unifies the data model, and during business development, teams first look at which ready-made services can be used instead of developing everything from zero, which also improves development efficiency.

3.2. Application architecture of Phase 2

To establish a middle platform that provides services for other businesses to use, you need an SOA architecture: servitize the reusable components and register them in a service registry.

Wealthier enterprises may purchase a commercial ESB bus; others build their own encapsulation on Dubbo together with a service registry.
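Whichever product is chosen, the core bookkeeping of a registry is the same: providers register their instance addresses under a service name, and consumers look them up before calling. A minimal in-memory sketch (class and method names are illustrative, not the API of any real registry such as ZooKeeper or Eureka):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Minimal service registry: providers register instance addresses under a
// service name; consumers discover the current instance list by name.
public class ServiceRegistry {
    private final Map<String, List<String>> instances = new ConcurrentHashMap<>();

    // Called by a provider when it starts up.
    public void register(String serviceName, String address) {
        instances.computeIfAbsent(serviceName, k -> new CopyOnWriteArrayList<>())
                 .add(address);
    }

    // Called by a provider on shutdown, or by a health check on failure.
    public void deregister(String serviceName, String address) {
        instances.getOrDefault(serviceName, new CopyOnWriteArrayList<>())
                 .remove(address);
    }

    // Called by a consumer; a real client would cache this list and
    // load-balance over it.
    public List<String> discover(String serviceName) {
        return instances.getOrDefault(serviceName, List.of());
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        registry.register("order-service", "10.0.0.5:8080");
        registry.register("order-service", "10.0.0.6:8080");
        System.out.println(registry.discover("order-service"));
    }
}
```

Real registries add what this sketch omits: heartbeats so dead instances are evicted automatically, and change notifications so consumers need not poll.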

The first thing to think about is: what should be split out? The next: how should it be split out?

The answers differ between enterprises, and the work divides into two phases. The first phase is a trial: the whole company has no experience splitting services, so of course you cannot start with the core business. You tend to pick a peripheral corner of the business and split it out first. At this point the splitting itself is what matters: splitting for the sake of splitting, ideally in line with domain-driven design. How to split? A few core staff develop behind closed doors for a month or two, splitting and recombining to accumulate experience. Many companies are at this stage right now.

But the splitting done in this phase can only accumulate experience, because the original motive for splitting was to respond quickly to business requests, and this corner module is usually not the most painful core business. The business is marginal to begin with, so splitting it brings little benefit, and it offers no capability worth reusing; what you want to reuse, of course, are core capabilities.

So the most important phase is the second: the real servitization of the business, of course around the core business logic with the most business demand, so as to respond quickly to business requests and reuse capabilities.

For example, Kaola was originally a monolithic application, with only one online business using Oracle at the beginning; the real split was carried out around the core order business logic.


Which parts of the core business logic should be split out? Many enterprises ask us this, but in fact the enterprise's own developers know best.

A common mistake at this point is to split the core business logic itself out of the monolith first, for example turning the order logic into an order service and separating it from the online service.

Of course it should not be done this way. When two armies are in battle and the soldiers are getting tangled up with the cooking squads, should the central army's headquarters be moved out, or the cooking squads? The cooking squads, of course.

Another point: components that can be reused are often not core business logic. This is easy to understand: two different businesses of course have different core business logic (otherwise they would be the same business). Core business logic is often combinational: complex, but rarely reusable. Even orders differ between e-commerce businesses: one introduces some kind of beans, another launches some kind of stamps, another runs some kind of campaign; these frequently changing differences are exactly the core business logic. What can be reused is usually the periphery of the core business: the user center, payment center, warehouse center, inventory center.

Therefore, we should split the peripheral logic of the core business away from the core, so that Online finally keeps only the core ordering path and can become the ordering service. Then when the business side suddenly needs to launch a flash-sale campaign, the peripheral logic can be reused; the flash sale becomes the core logic of another application. The core logic is the frequently changing trunk; the peripheral logic provides data storage and atomic interfaces.

Which peripheral logic should be split out first? Ask your own developers: those who are afraid of breaking the core logic whenever they change their module are motivated to split away from it. This does not need supervision from the technical director or architect; they have their own intrinsic motivation, and it is a natural process.


The intrinsic motivation here is twofold: independent development and independent release. In Kaola's online system, for example, the warehouse group wanted to move out on its own, because it has to integrate with all kinds of warehouse systems; there are so many warehouses in the world, the systems are very traditional, the interfaces all differ, and every new integration takes development time, always with the worry of bringing down the core order logic. In fact the warehouse system can define its own retry and disaster-recovery mechanisms; its failures are not as serious as ordering failures. The logistics group also wanted to become independent, because it integrates with too many logistics companies and releases often, and it did not want to bring ordering down.

You can comb through your own company's business logic the same way; there will be businesses willing to split themselves out, forming middle-platform services.

When the peripheral logic has been split out, some pieces of core logic that might affect each other can also be split, for example ordering and payment: payment connects to multiple payment providers, and to avoid affecting ordering it can also be made independent.

Next, let's look at how to split.

For the prerequisites, timing, methods, and norms, refer to the article on service splitting and service discovery in microservitization.


The first thing to do is to standardize the original project's code, achieving what we often describe as "anyone who takes over any module sees a familiar face."

For example, when you open a Java project, you should see the following packages:

  • API package: all interface definitions are here. For internal calls the interfaces are also implemented locally, so that once the module is split out, a local interface call can become a remote interface call

  • Access-to-external-services package: if this process accesses other processes, the wrappers for external access are here. For unit testing, this part can be mocked so that functional tests do not depend on third parties; when services are split, calls to other services also go here

  • Database DTOs: if you want to access the database, define the atomic data structures here

  • Database access package: the logic for accessing the database is all in this package

  • Services and business logic: the main business logic is implemented here, and this is where the material for splitting comes from

  • External services: the logic for providing services externally is here; for the provider of an interface, the implementation is here
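The key property of the API package above is that callers depend only on the interface, so a local implementation can later be swapped for a remote stub without touching call sites. A minimal sketch (all names are illustrative, not from any real project):

```java
// API package pattern: the interface is the only thing callers depend on.
// While the module is still inside the monolith it is implemented locally;
// after the split, a remote stub (e.g. an RPC client) implements the same
// interface, so call sites do not change.
public class OrderApiExample {

    // Lives in the "API package".
    public interface OrderService {
        String createOrder(String userId);
    }

    // Local implementation inside the monolith
    // (the "services and business logic" package).
    public static class LocalOrderService implements OrderService {
        @Override
        public String createOrder(String userId) {
            // real business logic would live here
            return "order-for-" + userId;
        }
    }

    // After the split, the same interface is backed by a remote call
    // (the "access to external services" package); callers are unchanged.
    public static class RemoteOrderService implements OrderService {
        @Override
        public String createOrder(String userId) {
            // in reality: serialize the request and call the order process over RPC/HTTP
            return "remote-order-for-" + userId;
        }
    }

    public static void main(String[] args) {
        OrderService svc = new LocalOrderService(); // swap for RemoteOrderService after the split
        System.out.println(svc.createOrder("u1"));
    }
}
```

The same shape is what frameworks like Dubbo generate for you: the consumer codes against the interface, and the framework supplies the remote implementation.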

The other is the test folder: each class should have unit tests, unit-test coverage should be reviewed, and integration tests within the module should be implemented via mocks.

Next comes the configuration folder; configuration items fall into several categories:

  • Internal configuration items (do not change after startup; a restart is required to change them)

  • Centralized configuration items (Configuration center, which can be delivered dynamically)

  • External configuration items (external dependencies, environment specific)
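The distinction between the three categories can be expressed as a resolution order. The sketch below is an illustrative assumption, not a real configuration framework: environment-specific external values win, then values pushed from a config center, then built-in defaults fixed at startup.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of resolving the three kinds of configuration items described above.
// The lookup order and method names are illustrative assumptions.
public class ConfigResolver {
    // Fixed at startup; changing these requires a restart.
    private final Map<String, String> internalDefaults = new HashMap<>();
    // Delivered dynamically by a configuration center, no restart needed.
    private final Map<String, String> configCenter = new HashMap<>();
    // External dependencies, specific to the environment (dev/test/prod).
    private final Map<String, String> external = new HashMap<>();

    public String resolve(String key) {
        // environment-specific values win, then dynamic values, then defaults
        if (external.containsKey(key)) return external.get(key);
        if (configCenter.containsKey(key)) return configCenter.get(key);
        return internalDefaults.get(key);
    }

    public void setInternal(String k, String v) { internalDefaults.put(k, v); }
    public void pushFromConfigCenter(String k, String v) { configCenter.put(k, v); }
    public void setExternal(String k, String v) { external.put(k, v); }

    public static void main(String[] args) {
        ConfigResolver cfg = new ConfigResolver();
        cfg.setInternal("pool.size", "10");          // built-in default
        cfg.pushFromConfigCenter("pool.size", "20"); // delivered at runtime
        System.out.println(cfg.resolve("pool.size")); // 20
    }
}
```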

When the structure of the project is standardized, then within the original service, first make functional modules independent with standardized inputs and outputs, forming separation inside the service. Split into a new JAR before splitting into a new process; once a module can be pulled into its own JAR, loose coupling is basically achieved.

Next, create a new project and start a new process, register it with the registry as early as possible, and start providing the service. At this point the new project can contain no real logic at all and simply call the original process's interface.

Why become independent as soon as possible, even before being logically independent? Because splitting a service is a gradual process. New features and new requirements keep arriving, and some of them change the very interfaces being split. If the business logic is only half independent when a new requirement arrives, neither changing the old code nor the new is appropriate: the new process is not yet providing the service independently, and changing the old code means the change must later be moved from the old project to the new one, changing as it moves, which makes the merge harder. If the new process becomes independent as soon as possible, all new requirements go into the new project, all callers switch to the new process when they next update, the old process is called less and less, and eventually the new process fronts all of the old one.

Then the logic in the old project can be migrated to the new project step by step. Since code migration cannot guarantee the logic is completely correct, continuous integration, grayscale release, and a microservice framework that can switch between the old and new interfaces are required.

Finally, when the new project is stable and there are no calls to the old project in the call monitoring, the old project can be taken offline.
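This migration order (register early, delegate to the old process, then move logic over piece by piece) is often called the strangler pattern. A minimal sketch, with all interface and class names being illustrative assumptions:

```java
// Strangler-style split: the new service registers early and delegates to the
// old process; logic moves over gradually while callers are unaffected.
public class StranglerExample {

    public interface InventoryService {
        int stockOf(String sku);
    }

    // The old monolith's code path, still authoritative at first.
    public static class LegacyInventory implements InventoryService {
        @Override
        public int stockOf(String sku) {
            return 42; // legacy logic
        }
    }

    // The new, independently deployed service. Initially it holds no logic
    // and just calls the old process's interface.
    public static class NewInventoryService implements InventoryService {
        private final InventoryService legacy;
        private boolean migrated = false; // flipped once the logic has moved over

        public NewInventoryService(InventoryService legacy) {
            this.legacy = legacy;
        }

        @Override
        public int stockOf(String sku) {
            if (migrated) {
                return ownStockOf(sku); // new implementation, added gradually
            }
            return legacy.stockOf(sku); // delegate until migration completes
        }

        private int ownStockOf(String sku) {
            return 42; // migrated logic must behave identically to the legacy path
        }

        public void markMigrated() {
            this.migrated = true;
        }
    }

    public static void main(String[] args) {
        NewInventoryService svc = new NewInventoryService(new LegacyInventory());
        int before = svc.stockOf("sku-1"); // served by the old process
        svc.markMigrated();
        int after = svc.stockOf("sku-1");  // served by the new code, same answer
        System.out.println(before + " " + after);
    }
}
```

In practice the `migrated` flag would be a grayscale-release switch in the microservice framework, flipped per feature and rolled back if the old and new answers diverge.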

3.3. Operation and maintenance mode of Phase 2

The servitization of the business layer also puts pressure on the operation and maintenance team.

Applications are gradually split and the number of services increases.

One of the best practices for service splitting is that the process requires continuous integration to ensure the old and new code remain functionally consistent.


Continuous integration processes require frequent deployment of test environments.

With the split into services, different business development teams receive different requirements, more features are developed in parallel, and releases become frequent, which makes deployments to the test and production environments more frequent too.

Frequent deployment requires frequent creation and deletion of VMS.

If the approval-based mode above is still used, the O&M department becomes a bottleneck: either it delays the development schedule, or it is exhausted by endless deployments.

This requires a change in the O&M model, namely moving the infrastructure to a cloud model.

What’s the difference between virtualization and cloud?

First, good tenant management is required: a transformation from centralized O&M management to tenant self-service.


In other words, the centralized management mode of manual creation, manual scheduling and manual configuration has become the bottleneck. It should be changed into tenant self-service management, automatic machine scheduling and automatic configuration.

Second, resource control based on Quota and QoS should be implemented.

The control point for tenants creating resources no longer requires O&M to manually manage everything: just assign the customer a tenant, allocate a Quota, and set QoS. Within those limits the tenant can freely create, use, and delete virtual machines without notifying O&M, which accelerates iteration.
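The essence of quota-based self-service is that the platform, not a person, enforces the tenant's allocation. A deliberately minimal sketch (real platforms such as OpenStack also track CPU, memory, volumes, and so on):

```java
// Quota-based self-service: the platform enforces the tenant's allocation
// instead of routing every VM request through manual O&M approval.
public class TenantQuota {
    private final int maxVms; // allocated by O&M once, when the tenant is created
    private int usedVms = 0;

    public TenantQuota(int maxVms) {
        this.maxVms = maxVms;
    }

    // Tenant creates a VM on its own; rejected automatically if over quota.
    public synchronized boolean tryCreateVm() {
        if (usedVms >= maxVms) {
            return false; // over quota: no human needs to intervene
        }
        usedVms++;
        return true;
    }

    // Tenant deletes a VM on its own, freeing quota immediately.
    public synchronized void deleteVm() {
        if (usedVms > 0) usedVms--;
    }

    public synchronized int usedVms() {
        return usedVms;
    }

    public static void main(String[] args) {
        TenantQuota tenant = new TenantQuota(2);
        System.out.println(tenant.tryCreateVm()); // true
        System.out.println(tenant.tryCreateVm()); // true
        System.out.println(tenant.tryCreateVm()); // false, quota exhausted
    }
}
```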

Thirdly, network planning based on virtual network, VPC and SDN should be realized.


The original deployment used the physical network. The problem is that the physical network is shared by all departments and cannot be freely configured and used by a single business department. With virtual networks, each tenant can configure its own subnets, routing tables, and external connections at will; different tenants' network segments may even conflict with each other without harm, and tenants can plan their networks in software according to their own needs.

In addition to making the infrastructure cloud-based, O&M should also automate application deployment.


If the cloud platform ignores applications, then when capacity expansion or automatic deployment is needed, the VMs it creates are still empty, and O&M must deploy into them manually. Therefore the cloud platform must also manage applications.

How does the cloud platform manage applications? We divide them into two types. The first is general-purpose applications: complex applications that everyone uses, such as databases. Almost all applications use a database, and database software is standard; although installation and maintenance are complex, they are the same no matter who does them. Such applications can become standard PaaS-layer offerings in the cloud platform's interface: when users need a database, they click and use it directly.


So the second change to the O&M model is turning general-purpose software into PaaS.

As mentioned earlier, the development department has a middleware group responsible for these general-purpose applications, while O&M deploys them automatically. What is the boundary between the two groups?

As a general practice, the PaaS of the cloud platform is responsible for the stability of the middleware created, ensuring the SLA, and automatically fixing problems when they occur.

The middleware group in the development department mainly studies how to use these PaaS services correctly: which parameters to configure, the correct usage patterns, and so on, which is business-related.


In addition to general-purpose applications there are customized applications, which should be deployed with scripting tools such as Puppet, Chef, Ansible, or SaltStack.

In practice, deploying everything from scratch with scripts on a bare system is not recommended because it is very slow; automatic deployment based on VM images is recommended instead. On a cloud platform, every VM is created from an image, so most of the environment can be baked into the image, leaving only a small amount of customization to the deployment tool.


In addition to creating VMs from images by calling the OpenStack API, the SaltStack master is invoked to push customization commands to the agents in the VMs.


The automatic deployment platform NDP can be built on VM images plus script delivery.


In this way, a complete application deployment and release can be performed based on VM images; this is called orchestration. Based on orchestration, you can do good continuous integration, for example automatically deploying a full environment every night for regression testing to ensure changes are correct.

[Figure: overall state after Stage 2]

After the second stage, the overall state looks like the figure above.

Here the O&M department's functions have changed: besides basic resource creation, it must provide a self-service operation platform, PaaS-based middleware, and automatic deployment based on images and scripts.

The development department's functions have also changed: it is divided into a front-end group, business development groups, a middle-platform group, and a middleware group, of which the middleware group works most closely with the O&M department.

3.4. Are there any problems with stage 2?

In fact, by this stage most enterprises have already solved most of their problems.

Companies that make their architecture SOA-based and their infrastructure cloud-based are already leaders in information technology among traditional industries.

The middle-platform development group can basically solve the problem of capability reuse, and continuous integration is basically running, so the business development teams iterate noticeably faster.

A unified middleware group or architecture group can centrally select, maintain, and study middleware such as message queues and caches.

At this stage, due to the business stability requirements, many companies will still use Oracle commercial database, and there is no problem.

By Stage 2, the enterprise has gained a certain competitive advantage within its industry.

3.5. When will phase 2 be considered problematic?

We find that when a traditional-industry company is no longer satisfied with leading its own industry and hopes to take on Internet business, the above model shows new pain points.

The biggest problem the Internet brings is that the requests and data generated by a huge number of users will be N times the original volume, and no one is sure whether the system can hold up.

For example, one customer launched an Internet wealth-management flash sale, and the original architecture could not bear the instantaneous traffic of nearly 100 times the normal load.

Some customers integrate with Internet payment or even China's largest food-delivery platform, and the original ESB bus, even at its maximum scale (13 nodes), may not survive.

Some customers have implemented servitization with Dubbo, but without circuit breaking, rate limiting, or degradation as service-governance strategies. One slow request can spread widely at peak time, or all requests are let in at once until the system cannot hold up and a whole section goes down.

Some customers want to build an industrial Internet platform, but the volume of data ingested is often at the PB level, and carrying it is a big problem.

Some customers started using open-source caches, message queues, and distributed databases, but when read/write frequency reached a certain level, all kinds of strange problems appeared and they did not know how to tune them.

Some customers realize that at Internet-promotion scale the Oracle database certainly cannot carry the load and they need to migrate from Oracle to the DDB distributed database, but they are not confident about how to migrate and how to make the transition smooth.

After some customers split their services, an originally atomic operation becomes two service invocations. How is atomicity maintained, so that either both succeed or both fail?

Only when these problems arise should you consider moving to the third stage: microservitization.

4. Stage 3: DevOps-oriented organization, microservitization of architecture and containerization of infrastructure


4.1. Application architecture of Phase 3

The transition from SOA to microservices is a critical and complex step that requires caution.

To carry the high concurrency of the Internet, the business often needs to be split at a very fine granularity. How fine? Let's look at the picture below.

[Figure: microservice call graph at fine granularity]

Among well-known Internet companies that use microservices, the calls between microservices have become so dense that the call graph is almost an indistinguishable mesh.

Why break it down to this granularity? The main requirement is high concurrency.

But high concurrency isn't free. What does splitting to this granularity require? You will find that once you do it, none of the following steps can be skipped.

  • How do you keep functionality intact and bug-free? Through continuous integration, the cornerstone of microservices

  • Static resources should be separated and cached at the access layer or on a CDN, so that most traffic is intercepted at edge nodes close to users or by the access-layer cache; refer to the access-layer design of microservitization on separating static and dynamic resources

  • Application state should be separated from the business logic so that the business becomes stateless and can scale horizontally on containers; refer to statelessness and containerization in microservitization

  • Core business and non-core business should be split, to facilitate scaling the core business and degrading the non-core business; refer to service splitting and service discovery in microservitization

  • Only with horizontal scalability does the database avoid becoming a bottleneck when data volumes are large; refer to the database design and read/write separation of microservitization

  • A cache layer is needed so that only a small amount of traffic reaches the database; refer to the cache design of microservitization

  • Message queues should be used to turn chains of previously synchronous, consecutive service calls into asynchronous listeners on the queue, shortening the core logic path

  • Circuit breaking, rate limiting, and degradation strategies should be set between services. A blocked call should fail fast rather than hang; sub-healthy services should be circuit-broken in time so there is no chain reaction; non-core business should be degraded and no longer invoked, leaving resources for the core business; and traffic should be limited to the capacity established by load testing, because handling requests slowly is better than letting everything in at once and bringing the whole system down

  • There are too many services to configure one by one; a unified configuration center is needed to deliver configuration

  • There are too many services to check logs one by one; a unified log center is needed to aggregate logs

  • With so many services, performance bottlenecks are hard to locate; APM full-link application monitoring is needed to discover and fix them in time

  • With so many services, no one knows how much load the system can bear without testing it, so a full-link stress-testing system is needed


The application layer has to deal with all twelve of these issues; not one of them can be missing. Are you ready to implement microservices? Do you really think Spring Cloud alone can do all of that?
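One of the items above is adding cache layers so that only a little traffic reaches the underlying database. A minimal cache-aside read path, with a dict standing in for Redis and another for the real database (all names hypothetical), can be sketched like this:

```python
# Cache-aside read path: check the cache first; on a miss, load from
# the database and populate the cache so repeat reads never hit the DB.
cache = {}                                             # stands in for Redis/Memcached
database = {"sku-1": {"name": "widget", "stock": 42}}  # stands in for MySQL
db_hits = 0                                            # traffic actually reaching the database

def get_product(sku):
    global db_hits
    if sku in cache:                 # cache hit: database untouched
        return cache[sku]
    db_hits += 1                     # cache miss: one database read
    value = database.get(sku)
    if value is not None:
        cache[sku] = value           # populate for subsequent readers
    return value

get_product("sku-1")   # miss -> reaches the database
get_product("sku-1")   # hit  -> served from cache
print(db_hits)         # -> 1
```

However many callers read the same key afterwards, the database sees exactly one read until the entry is evicted or invalidated.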

4.2. Operation and maintenance mode of Phase 3

Once the business is microservice-oriented, the operation and maintenance mode changes as well.


If the business is broken down into such a fine-grained network, the number of services is very large, and each service is released and iterated independently, so each has many versions.

There are so many environments that manual deployment is no longer feasible; deployment must be automated. Fortunately, in the previous phase we had already automated deployment, either script-based or image-based, but both run into problems in the microservices phase.

With script-based deployment, the scripts are written by O&M. But with so many services changing so often, the scripts must be updated constantly, and every company has far more developers than O&M staff, so O&M has no time to keep the deployment scripts current. Can developers write the scripts instead? Generally not: developers have limited knowledge of the runtime environment, there is no standard for the scripts, and O&M cannot control the quality of scripts written by developers.

Virtual-machine-image-based deployment is much better: there is little scripting to do, and most of the application configuration is baked into the image. Delivering VM images also achieves standardized delivery, and if something goes wrong online, you can roll back to a previous image version.

However, VM images are huge, often a hundred gigabytes each. With 100 services each publishing one version per day, that is 10,000 gigabytes of new images every day; no storage budget can stand that.

That’s where containers come in. The image is the container’s fundamental invention: a standard for packaging and running. Namespaces and cgroups had existed for a long time before that.

Originally, development delivered to O&M a WAR package, a pile of configuration files, and a deployment document; because the deployment document was rarely kept up to date, O&M deployments often went wrong. With container images, development delivers an image to O&M, and the runtime environment inside the container is expressed in the Dockerfile, which should be written by the developers.
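A developer-maintained Dockerfile for such a WAR delivery might look like the following minimal sketch; the base image, paths, and file names are illustrative assumptions, not from the original:

```dockerfile
# The runtime environment is declared here by the development team,
# replacing the perpetually out-of-date deployment document.
# Base image pins the JDK and Tomcat versions the application expects.
FROM tomcat:9-jdk8
# The WAR package and its configuration travel inside the image.
COPY app.war /usr/local/tomcat/webapps/ROOT.war
COPY conf/server.xml /usr/local/tomcat/conf/server.xml
EXPOSE 8080
CMD ["catalina.sh", "run"]
```

Everything O&M needs to know about the environment is now versioned alongside the application code.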

From a process perspective, environment configuration shifts left, pushed onto development: once development is complete, developers must think about how the application will be deployed, rather than throwing it over the wall. Since container images are standard, there is no longer any problem of unstandardizable scripts; if a single container doesn’t work, it must be a Dockerfile problem.

O&M only needs to maintain the container platform, while the environment inside each container is maintained by development. The advantage is that although there are many processes and configurations changing frequently, the load on any one module’s team is small: the 5-10 people dedicated to that module maintain its configuration and updates, so errors are rare, because you know exactly what you changed.

If all of this work were handed to a few O&M teams instead, the information handoffs would lead to inconsistent environment configuration, and the deployment workload would be far larger.

One thing containers do, then, is shift environment delivery forward: each developer does perhaps 5% more work to save 200% of the O&M effort, with fewer errors.


Another thing containers enable is immutable infrastructure.

Container images have a property: if you SSH in and change anything, the change is gone after a restart and the container reverts to what the image describes. This eliminates the old failure mode where a tweak here and a hotfix there eventually left the deployed environment broken and unreproducible.

When machines were few, you could still log in to each one and change things by hand, and recover if something went wrong. But with services in so many states, environments this complex, and a footprint this large, an error caused by someone hand-editing configuration on one node is very hard to track down. So practice immutable infrastructure: once deployed, never adjust anything manually; if you want a change, go through the release process from scratch.

There is also the idea of everything as code: a single container’s runtime environment (the Dockerfile) is code, the container orchestration files are code, the configuration files are code. The advantage of code is that Git tracks who changed what, so when some configuration is wrong, you can find who changed it and fix it in one place.

4.3. Organizational form of Phase 3

When you implement microservices and containerization, development ends up doing work that used to belong to operations. Is the development lead willing to take that on? Will veteran developers complain to the O&M boss?

This is not a technical problem; it is DevOps. DevOps does not distinguish between development and operations.


In fact, development and operations become one integrated process. Development helps operations with some things, such as shifting environment delivery forward and writing the Dockerfile.

Operations also helps R&D with some things, such as registration and discovery, governance, and configuration. It is impossible for every business line in the company to maintain its own framework for these; they sink into the O&M group as unified infrastructure under unified management.

After implementing containers, microservices, and DevOps, the overall division of responsibilities looks like this.

[Figure: division of responsibilities across teams after DevOps]

This is the NetEase model. The Hangzhou Research Institute acts as a shared technology-services department: the machine room is managed by the O&M department, and above it the cloud platform group has built an OpenStack-based cloud platform on which tenants operate independently. PaaS components are part of the cloud platform, available at a click with SLA guarantees. The container platform is also part of the cloud platform and provides a continuous-integration and continuous-deployment toolchain on top of containers.

Microservice management and governance is also part of the cloud platform, which business units can use to develop their microservices.

Each business unit’s middleware or architecture group communicates closely with the cloud platform group about how to use the platform’s components in the right way.

The business department itself is divided into a front-end group, a business development group, and a mid-platform development group.

5. How to implement microservices, containerization and DevOps

For microservices, containerization, and DevOps, there are many technologies to choose from.

For containerization, Kubernetes is a good choice. But Kubernetes is not just about containers; it was designed with microservices in mind and touches every aspect of running them.

Why is Kubernetes a natural fit for microservices


However, while Kubernetes handles container runtime lifecycle management well, it is not strong enough on its own for service governance.

Therefore, Dubbo or Spring Cloud is preferred for microservice governance. Compared with Dubbo, Spring Cloud is newer and has richer components, but its components are not usable out of the box and carry a steep learning curve.


Thus, based on Kubernetes and Spring Cloud, we arrive at the following integrated management platform for microservices, containers, and DevOps. It includes a Kubernetes-based container platform, a continuous-integration platform, a test platform, an API gateway, a microservices framework, and APM application performance management.

[Figure: integrated microservices, container, and DevOps platform]

The platform mainly addresses the pain points of moving from stage one to stage two, and from stage two to stage three.

Here are a few scenarios.

Scenario 1: How a regression test suite keeps the function set unchanged when the architecture is split into SOA

As mentioned before, the biggest fear in service splitting is introducing lots of bugs. Reasoning alone cannot guarantee that the function set is unchanged after the split, so a regression test suite is needed: as long as the suite passes, the functionality cannot be too badly broken.

Regression testing is best done against interfaces; UI-based testing is risky, because an interface may exist even though the UI cannot reach it, so a bug in that interface stays hidden for the time being.


Interface tests against the Restful API can then be composed into scenario tests: multiple API calls combined into one scenario, such as placing an order, deducting a coupon, and reducing inventory.

Test sets can then be assembled: a smoke test set, executed when development delivers functionality to testing; a daily test set, run every night to see whether that day’s commits introduced new bugs; and the regression test set, run in full before going live to make sure most of the features are correct.
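The order scenario mentioned above (place an order, deduct a coupon, reduce inventory) can be sketched as a composed interface test. The three API calls here are local stubs standing in for real Restful endpoints; all names are illustrative:

```python
# A scenario test chains several interface calls and asserts on the
# combined outcome; the three functions stub out real HTTP APIs.
def place_order(state, sku):
    state["orders"].append(sku)
    return {"status": 200}

def deduct_coupon(state, coupon):
    if coupon in state["coupons"]:
        state["coupons"].remove(coupon)
        return {"status": 200}
    return {"status": 404}        # coupon missing or already used

def reduce_inventory(state, sku):
    if state["inventory"].get(sku, 0) > 0:
        state["inventory"][sku] -= 1
        return {"status": 200}
    return {"status": 409}        # out of stock

def order_scenario(state, sku, coupon):
    # The scenario passes only if every step returns success.
    steps = [place_order(state, sku),
             deduct_coupon(state, coupon),
             reduce_inventory(state, sku)]
    return all(r["status"] == 200 for r in steps)

state = {"orders": [], "coupons": {"C10"}, "inventory": {"sku-1": 5}}
print(order_scenario(state, "sku-1", "C10"))   # -> True
print(state["inventory"]["sku-1"])             # -> 4
```

A smoke, daily, or regression set is then just a list of such scenarios run in sequence.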

Scenario 2: Centrally managing and providing mid-platform services once the architecture is SOA-based

When a team wants to provide a service, it first registers the service in a place where business groups, while developing business logic, can find the documentation for how to invoke its interface. Once registered, the service can be called.


Beyond the normal registration and discovery of a microservices framework, the platform provides a knowledge-base function: the interface and its documentation are maintained together, so the document stays consistent with the running service and callers can integrate just by reading it.

Besides registration and discovery, there is call-time authentication: not everyone who can see a mid-platform service may call it; authorization from the mid-platform administrator is required.

To deter malicious invocation, an account-audit function records who performed which operations.

Scenario 3: Securing calls to key services when the architecture is SOA-based


Some services are critical. A payment service, for example, touches money: not just anyone may call it, and an illegitimate call has serious consequences.

Service governance therefore includes routing configuration. Besides flexible routing, blacklists and whitelists can be configured by IP address or service name to restrict which applications may call the service, and these can be combined with the cloud platform’s VPC function to constrain callers at the network level.
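A blacklist/whitelist check of this kind can be sketched as follows; the service names, network range, and addresses are illustrative assumptions:

```python
import ipaddress

# Access control for a critical service such as payment: a caller is
# admitted only if its service name is whitelisted and its IP falls in
# an allowed network, and only if it is not explicitly blacklisted.
ALLOWED_SERVICES = {"order-service", "refund-service"}
ALLOWED_NETWORK = ipaddress.ip_network("10.0.1.0/24")
BLACKLIST = {"10.0.1.66"}

def may_call_payment(service_name, caller_ip):
    if caller_ip in BLACKLIST:                    # explicit deny wins
        return False
    if service_name not in ALLOWED_SERVICES:      # whitelist by service name
        return False
    return ipaddress.ip_address(caller_ip) in ALLOWED_NETWORK

print(may_call_payment("order-service", "10.0.1.7"))    # -> True
print(may_call_payment("crawler", "10.0.1.7"))          # -> False
print(may_call_payment("order-service", "10.0.1.66"))   # -> False
```

In a real deployment the same decision would sit in the gateway or service mesh rather than in the service itself.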

Scenario 4: Providing API services to build an open platform after the architecture is SOA-based


Once the architecture is SOA-based, capabilities can be opened not only internally but also to external partners, forming an open platform. Take a logistics company: besides users placing delivery orders on your own pages, e-commerce sites can call your API to send parcels. That requires an API gateway to manage the APIs: a partner only needs to log in to the gateway to see how to call each API, since document management is one of the gateway’s functions.

The API gateway also provides unified interface authentication and timed enabling and disabling of APIs, so that the API life cycle can be controlled flexibly.

Scenario 5: Grayscale publishing and A/B testing in the Internet scenario

Next we switch to the Internet business scenario, where we often do A/B testing, which requires the traffic distribution capability of the API gateway.

For example, when it is unclear whether customers will like a new function, it can be opened first to customers in Shandong: when a request’s HTTP header marks it as coming from Shandong, it is routed to system B, while other customers still reach system A. Then we observe whether the Shandong customers like it; if they do, roll it out nationwide, and if not, take it down.
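The Shandong example reduces to a gateway routing rule on an HTTP header; the header name here is an assumption for illustration:

```python
# A/B routing at the gateway: requests whose header marks the user as
# coming from Shandong go to the new B system; everyone else stays on A.
def pick_backend(headers):
    region = headers.get("X-User-Region", "").lower()  # header name illustrative
    return "B" if region == "shandong" else "A"

print(pick_backend({"X-User-Region": "Shandong"}))  # -> B
print(pick_backend({"X-User-Region": "Beijing"}))   # -> A
print(pick_backend({}))                             # -> A
```

Rolling out nationwide is then just a change to this one routing rule, with no redeployment of A or B.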

Scenario 6: Pre-release testing in the Internet scenario

Pre-release testing is common in Internet scenarios. Even with a test environment, the online situation is more complex, and sometimes real data is needed to test online. In that case a pre-release environment can be deployed alongside the formal production environment, and the API gateway mirrors part of the real traffic into it. If the pre-release environment handles real traffic correctly, going fully live is much safer.
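Traffic mirroring at the gateway can be sketched as follows; the handlers are stubs for the two environments, and the sampling scheme is an illustrative choice:

```python
import random

# Traffic mirroring: every request is answered by production, and a
# sampled copy is replayed against the pre-release environment, whose
# response is discarded so users are never affected.
mirrored = []

def handle_production(request):
    return {"status": 200, "body": "ok"}

def handle_prerelease(request):
    mirrored.append(request)            # response never reaches the user
    return {"status": 200}

def gateway(request, mirror_ratio=0.5, rng=random.random):
    if rng() < mirror_ratio:            # sample a share of real traffic
        handle_prerelease(dict(request))
    return handle_production(request)   # the user only ever sees production

resp = gateway({"path": "/order"}, mirror_ratio=1.0)
print(resp["status"], len(mirrored))    # -> 200 1
```

Because the pre-release copy is fire-and-forget, a crash there shows up in its logs and metrics, not in user-facing latency.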

Scenario 7: Performance pressure testing in the Internet scenario

In Internet scenarios, real performance pressure tests should be run online to find the system’s true bottlenecks. But pressure-test data must not enter the real database; it goes into a shadow database instead, and pressure-test traffic is marked in the HTTP header so that every business system it passes through knows it is test data and keeps it out of the real database.

This special flag is added at the API gateway; but since different pressure-test systems have different requirements, the gateway needs custom routing plug-ins that can add their own fields to the HTTP header and cooperate with the pressure-test system.
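Routing writes to a shadow database based on an HTTP header flag can be sketched like this; the header name and stores are illustrative:

```python
# Shadow-database routing: writes carrying the pressure-test marker in
# the HTTP header land in the shadow store, never the real database.
real_db, shadow_db = [], []
STRESS_HEADER = "X-Stress-Test"        # header name is illustrative

def write_order(order, headers):
    target = shadow_db if headers.get(STRESS_HEADER) == "1" else real_db
    target.append(order)

write_order({"id": 1}, {STRESS_HEADER: "1"})   # pressure-test traffic
write_order({"id": 2}, {})                     # real user traffic
print(len(real_db), len(shadow_db))            # -> 1 1
```

Every business system on the call path applies the same check, so the marker must be propagated on all downstream calls.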

Scenario 8: Circuit breaking, rate limiting, and degradation in microservice scenarios

In microservice scenarios, circuit breaking, rate limiting, and degradation are needed during big promotions. This can be done at the API gateway: traffic beyond the pressure-tested capacity is rate-limited and kept out of the system, so that as much of the admitted traffic as possible results in successful orders.

Between services, the microservice framework can also apply circuit breaking, rate limiting, and degradation. Dubbo’s control is at the interface level, while Spring Cloud’s is at the instance level; customers choose based on the granularity they need. Dubbo can be too fine-grained and Spring Cloud too coarse, so flexible configuration is required.
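A minimal circuit breaker of the kind both frameworks provide can be sketched as follows; the threshold and cool-off scheme are illustrative choices, not Dubbo’s or Spring Cloud’s actual implementation:

```python
import time

class CircuitBreaker:
    """Fail fast after repeated downstream failures; retry after a cool-off."""

    def __init__(self, threshold=3, reset_after=30.0, clock=time.monotonic):
        self.threshold, self.reset_after, self.clock = threshold, reset_after, clock
        self.failures, self.opened_at = 0, None

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at, self.failures = None, 0   # half-open: allow one retry
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()          # trip the breaker
            raise
        self.failures = 0                              # success resets the count
        return result
```

While the breaker is open, callers get an immediate error instead of hanging on a sub-healthy service, which is exactly the fail-fast behavior that prevents chain reactions.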


Scenario 9: Fine-grained traffic management in microservice scenarios


In Internet scenarios, traffic needs fine-grained management. For example, traffic can be diverted based on parameters in the HTTP header: VIP users access one set of services while non-VIP users access another.
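Header-based diversion of this kind can be sketched as a routing function; the header names, pool names, and the sticky canary split are illustrative assumptions:

```python
import hashlib

def choose_pool(headers, canary_percent=10):
    # VIP users (marked in the header) always get the dedicated pool.
    if headers.get("X-User-Tier") == "vip":            # header name illustrative
        return "vip-pool"
    # Everyone else is split deterministically by user id, so each user
    # lands in the same pool on every request (sticky canary).
    uid = headers.get("X-User-Id", "")
    bucket = int(hashlib.md5(uid.encode()).hexdigest(), 16) % 100
    return "canary-pool" if bucket < canary_percent else "standard-pool"

print(choose_pool({"X-User-Tier": "vip"}))             # -> vip-pool
```

The same function, with the rules pulled from the configuration center, is how a gateway or service mesh would express such diversion policies.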