This article is published by NetEase Cloud Community with the authorization of its author, Liu Chao.

Welcome to visit NetEase Cloud Community to learn more about NetEase's technical and product operation experience.


1. Implementing microservices is a complex issue involving IT architecture, application architecture, and organizational architecture


After visiting many enterprises in traditional industries that were implementing microservices, I found that microservices are a very complicated undertaking, and not even an entirely technical one.

At first I assumed that since microservices are about transforming applications and adding governance capabilities such as registration, discovery, circuit breaking, rate limiting, and degradation, the work should naturally start with the application development team. Conversations usually begin pleasantly: from monolithic architecture to SOA to microservices, from Dubbo to Spring Cloud. Inevitably, however, the discussion turns to releasing and operating microservices, to DevOps, and to the container layer, none of which is under the development team's control. Once the O&M team gets involved, accepting containers becomes a problem in itself: how containers differ from traditional physical machines and virtual machines, what risks they introduce, and so on. In particular, the fact that a container is definitely not lightweight virtualization is not immediately obvious. Worse, even when containers do make it into production, whenever something goes wrong, the question of who is responsible for the container blurs the boundary between the application layer and the infrastructure layer, and both sides hesitate to take ownership.




In other enterprises, the microservices initiative is started by the operations team. Operations has felt the pain of maintaining wildly inconsistent applications and is willing to adopt container-based operations. But this raises new questions: should service discovery for containers be handled by operations at the container layer, or by the applications themselves? Who writes the Dockerfile, development or operations? If development does not cooperate and operations pushes containerization unilaterally, the effort only adds friction for limited benefit.

The following figure shows the layers involved in implementing microservices. For a detailed description, refer to the Cloud Architect Advanced Guide.




In some more advanced enterprises, a middleware group, or architecture group, sits between the operations team and the development teams and is responsible for driving the microservices transformation. This group has to persuade development to servitize the business and persuade operations to roll out containers. If the architecture group lacks authority, driving the change from the middle tends to be even harder.

Therefore, promoting microservices, containers, and DevOps is not only a technical issue but also an organizational one. During the rollout you can truly feel Conway's Law at work, and the intervention of a higher-level technical director or CIO is often needed to push the implementation through.

At the CIO level, however, many companies have never experienced these technical pain points. CIOs focus on the business level: as long as the business makes money, the pain of architecture, middleware, and operations is barely felt at the top. Nor can they really grasp the technical advantages of microservices and containers; as for the business advantages that many vendors recite, they can repeat them on the surface but not internalize them.

So the transformation to microservices and containers is more likely to happen in a flat organization, driven by a CIO who can feel the technical pain at the grassroots level. This is why microservices usually land first in Internet companies: their organizational structure is flat, and even senior leaders stay close to the front line and understand its pain.

Traditional industries are not so lucky: they often have many more layers. There, the transformation succeeds only when the technical pain is severe enough to hurt the business, hurt revenue, and risk falling behind competitors.

We’re going to tease out some of the pain that’s going on in this process.


2. Stage 1: Monolithic architecture, multiple development teams, a unified O&M team




2.1. Organizational status of Stage 1

The organizational state is relatively simple.

A unified O&M group manages physical machines, physical networks, and VMware virtualization resources, and is responsible for deployment and release.

Each development team is responsible for its own business and writes its own code; different businesses communicate little. Besides developing their own systems, the teams also maintain systems built by outsourcing companies.

The traditional smokestack architecture is shown below




2.2. Operation and maintenance mode of Phase I


In the traditional architecture, the infrastructure layer is usually deployed on physical machines or virtual machines. To make mutual access between applications easy, the data center network is a flat layer-2 network: every machine's IP address can reach every other, and machines that should not reach each other are isolated by firewalls.

Whether physical machines or virtual machines are used, configuration is relatively complex; only people with years of O&M experience can create a machine correctly, and network planning must be done very carefully. All of this has to be centrally managed by the O&M department. Ordinary IT staff and developers lack the expertise and cannot be given operating permissions, so to get a machine they must file a request and wait for O&M to handle it.


2.3. Phase 1 application architecture


The database layer of the traditional architecture is developed independently by outsourcing companies or by different development departments. Different businesses use different databases, including Oracle, SQL Server, MySQL, and MongoDB.

For the middleware layer of traditional architecture, each team selects the middleware independently:

  • Files: NFS, FTP, Ceph, S3

  • Cache: Redis Cluster, Redis primary/secondary, Redis Sentinel, Memcached

  • Distributed frameworks: Spring Cloud, Dubbo, RESTful or RPC

  • Database sharding: Sharding-JDBC, MyCat

  • Message queues: RabbitMQ, Kafka

  • Registries: ZooKeeper, Eureka, Consul

At the service layer of the traditional architecture, each system is developed either by an outsourcing company or by an independent team.

At the front end of the traditional architecture, every team develops its own front end.


2.4. Are there any problems with stage 1?

There is nothing wrong with phase one, and we can even find ten thousand reasons why this model is good.

The O&M department and the development department are naturally separated. Neither side wants to interfere with the other, and the heads of the two departments are peers.

Of course only O&M staff may touch the machine room: there are security concerns, professionalism concerns, and the seriousness of production systems to consider. If a less professional developer were allowed to deploy, the question of who takes the blame when the system is compromised or goes down would silence any debate.

Using Oracle, DB2, or SQL Server is fine as long as the company has the budget. The performance really is good, and heavy use of stored procedures makes application development much easier, with a professional vendor (Party B) handling operations. The database is so critical that if you replaced it with MySQL and it went down, or the open-source version turned out to have nobody maintaining it, who would take responsibility for a production incident?

The middleware, service layer, and front end are all handled end to end by outsourcers or Party B, so any change can be requested from a single party. Moreover, each system is a complete, self-contained set, which makes deployment and operations convenient.

At this point there really is no problem, and pushing containers or microservices now would just be asking for trouble.


2.5. Under what circumstances will stage 1 be considered a problem?

The initial pain points appear at the business level. When users start demanding a variety of new features every now and then, or a brand-new system, you discover that the outsourcing company cannot deliver everything: they develop in a waterfall model, and the systems they deliver are hard to change, or at least hard to change quickly.

So you start wanting to hire your own developers and build systems you control, or at least take the systems over from the outsourcing company, so that you can respond more flexibly to the business units' needs.

However, developing and maintaining systems yourself brings new problems. You cannot recruit DBAs for every kind of database; such people are very expensive, and as the number of systems grows, so do the license fees for all these databases.

There are all kinds of middleware, with each team choosing its own, so there is no unified maintenance or accumulated knowledge, and no uniform SLA can be guaranteed. Once a message queue, cache, or framework breaks, nobody on the team can fix it: everyone is busy with business development, and nobody has time to dig into the underlying principles, common failure modes, or tuning.

Front-end frameworks have the same problem: inconsistent stacks, inconsistent interface styles, and no way to automate UI testing.

When you maintain several systems, you find that they have a lot in common at every level: many capabilities could be reused, much data could be shared. The same logic lives here and there, the same kind of data sits here and there, but the information is siloed and the data models are inconsistent, so nothing can be connected.

It is when these questions arise that you consider moving to phase two.


3. Stage 2: Service-oriented organization, SOA architecture, cloud-based infrastructure



3.1. Organizational form of Stage 2


How to solve the above problem?

According to Conway's Law, some organizational adjustment is needed. The whole company is still divided into an O&M group and a development group.

Since the pain points are at the business level, it is the development team that should start adjusting.

An independent front-end group should be established to unify the front-end framework and interfaces, so that everyone masters a single front-end development stack, front-end code accumulates, and new requirements can be developed quickly.

Set up a middleware group, or architecture group, that does not work on business development; its daily task is to study how to use the middleware, how to tune it, how to debug problems, and to accumulate knowledge. With a single group focusing on middleware, a limited set of middleware can be selected for concentrated study based on the company's situation, and business groups are restricted to that set, ensuring consistent selection. If this group also maintains the middleware, it can offer the business side a reliable SLA.

Split some people out of the business development teams to establish a middle-platform group, whose job is to turn reusable capabilities and code into services for the business groups to use. This unifies the data model, and business developers first look for an existing service to use instead of developing everything from scratch, which improves development efficiency.


3.2. Application architecture of Phase 2

To build a middle platform whose capabilities are exposed as services to other businesses, you need an SOA architecture: servitize the reusable components and register them in a service registry.

Enterprises with deep pockets may purchase a commercial ESB; others build their own servitization and registration layer on top of Dubbo.
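Whichever product is chosen, the core contract of a registry is the same. A minimal conceptual sketch (this is neither Dubbo nor an ESB; all names are hypothetical): providers register an address under a service name, and consumers look those addresses up.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Minimal in-memory sketch of what a service registry does.
public class SimpleRegistry {
    private final Map<String, List<String>> services = new ConcurrentHashMap<>();

    // A provider announces "service X is available at this address".
    public void register(String name, String address) {
        services.computeIfAbsent(name, k -> new CopyOnWriteArrayList<>()).add(address);
    }

    // A consumer asks "where can I reach service X?".
    public List<String> discover(String name) {
        return services.getOrDefault(name, List.of());
    }

    public static void main(String[] args) {
        SimpleRegistry registry = new SimpleRegistry();
        registry.register("order-service", "10.0.0.5:8080");
        System.out.println(registry.discover("order-service"));
    }
}
```

Real registries such as ZooKeeper, Eureka, or Consul add what this sketch omits: heartbeats, expiry of dead providers, and change notifications to consumers.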

The next question to think about is: what should be split out? Only after that comes the question of how to split it out.

The answers differ from enterprise to enterprise, and the process falls into two phases. The first is a trial phase: the company has no experience splitting services, so of course it cannot experiment on the core business. It usually picks a peripheral corner of the business to split out first. At this point the split itself is what matters; the how is idealized, ideally following domain-driven design. How in practice? A few core staff spend a month or two behind closed doors, splitting and recombining to accumulate experience. Many companies are at this stage right now.

But the splitting done in this phase only accumulates experience. The original motive for splitting was to respond quickly to business requests, and this corner module is usually not the painful core business. Because the business was marginal to begin with, splitting it brings little benefit, and it offers no capability worth reusing; what you want to reuse are of course the core capabilities.

So the most important phase is the second one: truly servitizing the business, and of course the core business logic under the heaviest demand, so that the team can respond quickly to business requests and reuse capabilities.

For example, Koala began as a monolith: at first there was only the Online application, running on Oracle, and the real split was carried out around the core order business logic.




Which parts of the business logic should be split out? Many enterprises ask us this, but in fact the enterprise's own developers know best.

A common mistake at this point is to split the core business logic itself out of the monolith first: for example, forming the order logic into an order service and splitting it out of the Online application.

Of course that is not how it should be done. When two armies are at war and the smoke from the cooking squads is choking the soldiers, should the central command tent be moved out, or the cooking squads? The cooking squads, of course.

Another point: the components that can be reused are usually not the core business logic. This is easy to understand: two different businesses naturally differ precisely in their core business logic (otherwise they would be the same business). Core business logic is often combinational and complex, but rarely reusable. Even orders differ across e-commerce businesses: one introduces bean rewards, another launches coupons, a third runs some promotion; these frequently changing differences are the core business logic. What can be reused is usually the peripheral logic around the core business: the user center, payment center, warehouse center, inventory center, and so on.

Therefore, we should split the peripheral logic of the core business away from the core, until Online retains only the core ordering path, which can then become the ordering service. When the business side suddenly needs a flash-sale campaign, the peripheral logic can be reused; the flash sale becomes the core logic of another application. The core logic is the orchestration of the flow; the peripheral logic provides data storage and atomic interfaces.

Which peripheral logic should be split out first? Ask your own developers. Those who are afraid of breaking the core logic whenever they make a change are the ones motivated to split away from it. This needs no supervision from the technical director or architect; they have their own intrinsic motivation, and it is a natural process.




The intrinsic motives here are, first, independent development, and second, independent release. In Koala's Online system, for example, the warehouse group wanted to split out on its own: they integrate with all kinds of warehouse systems, of which the world has a great many, most very traditional, each with a different interface. With every new integration and every development cycle, they worried about bringing down the core ordering logic. In fact the warehouse system can define its own retry and disaster-recovery mechanisms; its requirements are not as stringent as ordering's. The logistics group also wanted independence: they integrate with too many logistics companies and release frequently, and they did not want to take ordering down with them.

You can also comb through your own company's business logic; some teams will be willing to split out on their own, forming middle-platform services.

Once the peripheral logic has been split out, pieces of core logic that risk affecting each other can be split too: for example ordering and payment. Payment connects to multiple payment providers and should not affect ordering, so it can also become independent.

And then we’re going to look at, how do we break it up?

For the prerequisites, timing, methods, and standards, refer to the article on service splitting and service discovery in microservices.





The first thing to do is standardize the original project code, which we often describe as "anyone who takes over any module sees a familiar face."

For example, opening a Java project should reveal the following packages:

  • API package: all interface definitions live here. For internal calls, the interface is also implemented locally, so that once the module is split out, a local interface call can become a remote interface call.

  • External-service access package: if this process accesses other processes, the wrappers for external access live here. For unit testing, mocking this part allows functional tests to run without depending on third parties. After a service split, calls to other services also live here.

  • Database DTOs: if you access the database, define the atomic data structures here.

  • Database access package: the logic for accessing the database all lives in this package.

  • Services and business logic: the main business logic is implemented here, and this is where splits are taken from.

  • External services: the logic for providing services to the outside lives here; for an interface provider, the implementation lives here.
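The api-package idea can be sketched in a few lines of Java (all names here are hypothetical, for illustration only): callers depend only on the interface, so swapping the local implementation for a remote client after the split does not change them.

```java
// The api package holds the interface; before the split a local class
// implements it, after the split a remote client implements the same
// interface, so caller code is untouched.
interface OrderApi {
    String createOrder(String itemId);
}

// Pre-split: the interface is implemented inside the same process.
class LocalOrderService implements OrderApi {
    public String createOrder(String itemId) {
        return "order-for-" + itemId;
    }
}

// Post-split: same interface, but the body would make a remote call
// (stubbed out here so the sketch runs standalone).
class RemoteOrderClient implements OrderApi {
    public String createOrder(String itemId) {
        // e.g. an HTTP/RPC call to the new order process would go here
        return "order-for-" + itemId;
    }
}

public class ApiPackageDemo {
    public static void main(String[] args) {
        OrderApi api = new LocalOrderService();
        System.out.println(api.createOrder("42"));  // caller code is identical...
        api = new RemoteOrderClient();
        System.out.println(api.createOrder("42"));  // ...before and after the split
    }
}
```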

Then there is the test folder: every class should have unit tests, unit-test coverage should be reviewed, and integration tests within the module should be implemented via mocks.
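This is why the external-access wrappers live behind their own package: a unit test can substitute a fake and run without the real third party. A minimal sketch, with hypothetical names:

```java
// CheckoutService depends on the PaymentGateway interface rather than a
// concrete client, so tests can inject a test double.
interface PaymentGateway {
    boolean charge(String orderId, int cents);
}

class CheckoutService {
    private final PaymentGateway gateway;

    CheckoutService(PaymentGateway gateway) {
        this.gateway = gateway;
    }

    String checkout(String orderId) {
        return gateway.charge(orderId, 999) ? "PAID" : "FAILED";
    }
}

// Test double: always approves, never touches the network.
class AlwaysApproveGateway implements PaymentGateway {
    public boolean charge(String orderId, int cents) {
        return true;
    }
}

public class MockDemo {
    public static void main(String[] args) {
        CheckoutService service = new CheckoutService(new AlwaysApproveGateway());
        System.out.println(service.checkout("order-1"));  // PAID
    }
}
```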

Next come the configuration folder and configuration files, which fall into several categories:

  • Internal configuration items (fixed after startup; changing them requires a restart)

  • Centralized configuration items (Configuration center, which can be delivered dynamically)

  • External configuration items (external dependencies, environment specific)

When a project's structure is standardized this way, you can first make functional modules independent within the original service, standardizing their inputs and outputs to form an in-service split. Split out a new JAR before splitting out a new process; once a module can be extracted into its own JAR, loose coupling is essentially achieved.

Next, create a new project and start a new process, register it with the registry as early as possible, and start providing the service. At this point the new project can contain no logic of its own and simply call the original process's interface.

Why become independent as early as possible, even before the logic is independent? Because splitting a service is a gradual process, and new features and new requirements keep arriving while it happens; the original interfaces will receive new requirements too. If the business logic is only half independent when a new requirement comes in, neither changing the old code nor changing the new is comfortable: the new project is not yet serving independently, and changing the old means migrating code from the old project to the new while it changes underneath you, which makes merging harder. If the new project becomes independent early, all new requirements go into it, callers switch to the new process as they update, the old process is called less and less, and eventually the new process fronts all of the old one.
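The "go independent early, delegate first" approach described above is essentially the strangler pattern, and can be sketched as follows (names are hypothetical; the legacy call is stubbed in-process, whereas in reality it would be a remote call to the old process):

```java
// The new service registers and serves immediately, but initially forwards
// every call to the legacy process; logic then moves over piece by piece.
interface InventoryApi {
    int stockOf(String sku);
}

// Stands in for the old monolith's interface.
class LegacyInventory implements InventoryApi {
    public int stockOf(String sku) {
        return 100;  // legacy logic
    }
}

class NewInventoryService implements InventoryApi {
    private final InventoryApi legacy;
    private boolean migrated = false;  // flipped once the logic has moved over

    NewInventoryService(InventoryApi legacy) {
        this.legacy = legacy;
    }

    void finishMigration() {
        migrated = true;
    }

    public int stockOf(String sku) {
        if (migrated) {
            return 100;  // re-implemented logic; must match the legacy result
        }
        return legacy.stockOf(sku);  // delegate until migration completes
    }
}

public class StranglerDemo {
    public static void main(String[] args) {
        NewInventoryService svc = new NewInventoryService(new LegacyInventory());
        System.out.println(svc.stockOf("sku-1"));  // served by delegation
        svc.finishMigration();
        System.out.println(svc.stockOf("sku-1"));  // served by the new logic
    }
}
```

Callers see the same interface throughout, which is what makes the grayscale switch between old and new implementations in the next step possible.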

Then the logic in the old project can be migrated to the new one step by step. Since code migration cannot guarantee the logic is carried over perfectly, you need continuous integration, grayscale release, and a microservice framework that can switch between the old and new interfaces.

Finally, when the new project is stable and call monitoring shows no remaining calls to the old project, the old project can be taken offline.


3.3. Operation and maintenance mode of Phase 2

The servitization of the business layer also puts pressure on the operation and maintenance team.

Applications are gradually split and the number of services increases.

One of the best practices for service splitting is to run continuous integration throughout the process, to ensure the functionality stays consistent.




Continuous integration processes require frequent deployment of test environments.

As services split, different business development teams receive different requirements, more features are developed in parallel, and releases become frequent, which makes deployments to the test and production environments more frequent too.

Frequent deployment requires frequent creation and deletion of VMS.

If the request-and-approve model described above is kept, the O&M department becomes a bottleneck: either it delays the development schedule, or it is exhausted by endless deployments.

This requires a change in the O&M model, namely moving the infrastructure layer to the cloud.

What’s the difference between virtualization and cloud?

First, good tenant management is required: a shift from centralized O&M management to tenant self-service.




In other words, the centralized management mode of manual creation, manual scheduling and manual configuration has become the bottleneck. It should be changed into tenant self-service management, automatic machine scheduling and automatic configuration.

Second, resource control based on Quota and QoS should be implemented.

The control is exercised when tenants create resources: O&M no longer has to manage everything manually. Once a tenant is assigned a quota and QoS settings, it can freely create, use, and delete virtual machines within those limits, without notifying O&M, which speeds up iteration.


Thirdly, network planning based on virtual network, VPC and SDN should be realized.





The original setup uses the physical network, and the problem is that the physical network is shared by all departments: no single business department can configure and use it freely. With virtual networks, each tenant can freely configure its own subnets, routing tables, and connections to external networks; different tenants' network segments can overlap without conflict; and tenants can plan their networks in software according to their own needs.

In addition to a cloud-based infrastructure, O&M should also automate application deployment.



If the cloud handles only infrastructure and ignores applications, then whenever capacity expansion or automatic deployment is needed, the VMs created by the cloud platform are still empty and O&M must deploy into them manually. Therefore the cloud platform must also manage applications.

How does cloud computing manage applications? We divide applications into two kinds. The first are general-purpose applications: complex but universally used software such as databases. Almost every application uses a database, and database software is standard; installation and maintenance are complex, but they are the same no matter who does them. Such applications can become standard PaaS offerings on the cloud platform's interface: when users need a database, they click and use it directly.




So the second change to the O&M model is turning general-purpose software into PaaS.

As mentioned earlier, there are middleware groups in the development department that are responsible for these common applications, and operations automatically deploy these applications. What are the boundaries between the two groups?

As a general practice, the PaaS of the cloud platform is responsible for the stability of the middleware created, ensuring the SLA, and automatically fixing problems when they occur.

The development department's middleware group mainly studies how to use these PaaS services correctly: which parameters to configure, the right usage patterns, and other business-related concerns.


In addition to general-purpose applications, there are customized applications, which should be deployed with scripting tools such as Puppet, Chef, Ansible, or SaltStack.

As a practical rule, deployment on bare metal is not recommended because it is very slow; automatic deployment based on virtual machine images is recommended instead. On the cloud platform, every VM is created from an image, so most of the environment can be baked into the image, leaving only a small amount of customization for the deployment tool.




In addition to creating VMS based on images by calling OpenStack API, the master of SaltStack is also called to send customized commands to agents in VMS.




An automatic deployment platform, NDP, can be built on VM images and script delivery.




In this way, a complete application deployment and release can be performed from VM images; this is called orchestration. With orchestration you can do solid continuous integration, for example automatically deploying a full environment every night for regression testing, to ensure changes are correct.




After the second stage, the state looks like the one above.

Here the O&M department's responsibilities have changed somewhat: beyond basic resource creation, it must provide a self-service operation platform, middleware as PaaS, and image- and script-based automatic deployment.

The development department's responsibilities have also changed somewhat: it is now divided into a front-end group, business development groups, a middle-platform group, and a middleware group, of which the middleware group works most closely with the O&M department.

