This article is based on the author Zeng Xiaobo's sharing on GitChat.

Background

With the rapid growth of the company's business volume, the challenges the platform faces go far beyond the business itself: demand keeps increasing, the number of technical personnel grows, and complexity rises sharply. Against this background, the platform's technical architecture has completed the evolution from a traditional single application to microservices.

The evolution of the system architecture

Single Application Architecture (First-generation Architecture)

This was the situation at the beginning of the platform. At that time traffic was small; to save costs, all applications were packaged into one application, and the architecture was .NET + SQL Server:

The presentation layer sits in the outermost (top) layer, closest to the user. It displays data and receives user input, providing an interactive interface. The platform's presentation layer was based on .NET Web Forms.

The business logic layer is undoubtedly where the core value of the system architecture lies. Its focus is the formulation of business rules, the implementation of business processes, and other design work related to business requirements, that is, the domain logic the system deals with. For this reason, the business logic layer is also often called the domain layer.

The position of the business logic layer in the architecture is critical: it sits between the data access layer and the presentation layer and connects the two in the data exchange. Because the layers form a weakly coupled structure, dependencies between layers point downward; the bottom layer is "ignorant" of the layers above it, and changing the design of an upper layer has no effect on the lower layers it calls.

If the layers are designed following interface-oriented design, this downward dependency should also be a weak one. To the data access layer, the business logic layer is the caller; to the presentation layer, it is the callee.

The data access layer is sometimes called the persistence layer. It is mainly responsible for accessing database systems, binary files, text documents, or XML documents. In this phase the platform used hibernate.net + SQL Server.

The first-generation architecture looks very simple, but it supported the platform's early business development and met the needs of tens of thousands of website users. However, when the number of visits increased dramatically, problems became apparent:

  • Increasing maintenance costs: when a fault occurs, the number of possible root causes is large, so diagnosing, repairing, and analyzing failures all become more expensive, and the mean time to repair grows accordingly; a failure in any one module can affect all the others. Fixing one bug often introduces new bugs when no one fully understands the global functionality, leading to a vicious cycle of "the more you fix, the more bugs appear."

  • Poor scalability: with all of the application's functional code running on the same server, horizontal scaling is very difficult; only vertical scaling is possible.

  • Longer lead times: any minor change or code commit triggers, for the entire application, code compilation, unit tests, code review, building the deployment package, validation, and so on, which lengthens the build cycle and reduces efficiency per unit of time.

  • Longer induction cycles: As applications become more functional and code becomes more complex, seemingly simple tasks like learning the business background, familiarizing yourself with the application, and configuring the local development environment can take longer for new team members.

Vertical Application Architecture (Second-generation Architecture)

To solve the problems faced by the first-generation architecture, the team adopted the following strategies and formed the second-generation application architecture (vertical application architecture):

  • Applications are separated into independent application modules.

  • Each application module is deployed independently, and horizontal scaling of application modules is handled through session stickiness in load balancing.

    Sticky is a cookie-based load-balancing solution: the session between a client and a back-end server is maintained through a cookie, so that under certain conditions the same client always reaches the same back-end server. When a request comes in, the server sends back a cookie that effectively says, "Next time, bring this and come to me!" In this project we used the Session_sticky module from Taobao's open source Tengine.

  • The database is divided into different databases and accessed by corresponding applications.

  • Domain name splitting.

  • Separation of static and dynamic content.

It can be seen that the second-generation architecture solved horizontal scaling at the application level. After optimization, the architecture supported the access requirements of hundreds of thousands of users. At this stage, some applications had been rewritten in Java using an MVC architecture. Of course, problems remained:

  • Coupling between applications is high, and they depend heavily on one another.

  • The interaction between application modules is complex; sometimes they directly access each other's databases.

  • The database has too many join queries and slow queries, making optimization difficult.

  • The database is a serious single point, with no way to recover from failure.

  • The data replication problem is serious, resulting in large amounts of inconsistent data.

We tried SQL Server AlwaysOn to solve the scaling problem, but experiments showed at least a 10-second delay in replication, so we abandoned this solution.

  • System expansion is difficult.

  • Each development team is isolated and inefficient.

  • The test workload is huge and the release is difficult.

Microservice Architecture (Platform status: third-generation architecture)

In order to solve the problems existing in the first-generation and second-generation architectures, we sorted out and optimized the platform. Based on the business needs of the platform and the summary of the first-generation and second-generation architectures, we have determined the core requirements of the third-generation architectures:

  • Core business is extracted and served externally as an independent service.

  • Service modules are continuously deployed independently, reducing release lead times.

  • Databases and tables are split by service.

  • Use caching extensively to improve access speed.

  • The lightweight REST protocol is used for system interaction instead of an RPC protocol.

  • Move off .NET, using Java as the development language.

On this basis, the platform's third-generation architecture was rebuilt.

Look at the components of the third-generation architecture, which are mainly divided into eight parts:

  • CDN: the CDN system is responsible for redirecting users' requests in real time to the nearest service node, based on comprehensive information such as network traffic, each node's connections and load, distance to the user, and response time. Its purpose is to let users obtain the content they need from nearby, alleviating Internet congestion and improving response speed when users visit the site.

    When selecting a CDN vendor, the platform needed to consider length of operation, expandable bandwidth resources, flexible traffic and bandwidth options, node stability, and cost-effectiveness. Based on these factors, the platform adopted Qiniu's CDN service.

  • LB layer: the platform consists of multiple service domains, and different service domains have different clusters. The Load Balancer (LB) layer distributes traffic across multiple service servers, expanding the application system's external service capability and eliminating single points of failure to improve availability.

    When choosing a load balancer, various factors need to be considered (whether it meets high-concurrency and performance requirements, how session persistence is resolved, the load-balancing algorithm, compression support, and the memory consumption of caching). Solutions fall into the following two types:

    LVS: a layer-4 Linux load balancer with high performance, high concurrency, scalability, and reliability. It supports multiple forwarding modes (NAT, DR, and IP Tunneling); IP Tunneling mode supports load balancing across a WAN. It supports hot standby (Keepalived or Heartbeat), but it depends heavily on the network environment.

    Nginx: an event-driven, asynchronous, non-blocking, multi-process, high-concurrency load balancer / reverse proxy that works at layer 7. HTTP traffic can be split by domain name, directory structure, and regular expressions.

    Back-end server faults are detected through the port, for example via the status code returned for a processed page or via timeouts, and failed requests are resubmitted to another node; the disadvantage is that health checks via URL are not supported.

    For sticky sessions, we implement them through the cookie-based extension nginx-sticky-module. This is what the platform currently does.

  • Business layer: represents the services provided by the businesses in a certain field of the platform. For the platform, there are commodity, membership, live broadcast, order, finance, forum and other systems. Different systems provide services in different fields.

  • Gateway and Registry: provides a unified entry point and registration management for the underlying microservice APIs. It encapsulates the internal system architecture and provides REST APIs to each client, while taking on responsibilities such as monitoring, load balancing, caching, service degradation, and rate limiting. At present the platform uses Nginx + Consul for this.

  • Service layer: this layer consists of small, autonomous services that work together. The platform defines service boundaries based on business boundaries, with each service focusing on its own boundary. This layer is built on Spring Cloud.

  • Infrastructure layer: This layer provides infrastructure services for upper-layer services, including the following types:

    Redis cluster: provides caching services for the upper layers, with fast responses and in-memory operations.

    MongoDB cluster: because MongoDB has a flexible document model, highly available replica sets, scalable sharded clusters, and other features, the platform uses it to provide upper-layer storage for articles, posts, link logs, and so on. The MongoDB cluster adopts a replica set + sharding architecture to address availability and scalability.

    MySQL cluster: Stores membership, goods, orders, and other transactional data.

    Kafka: supports all of the platform's messaging services.

    ES (Elasticsearch): provides search services for products, members, orders, logs, and more.

  • Integration layer: this is the biggest highlight of the whole platform. It covers the practice of continuous integration (CI), continuous delivery (CD), and DevOps culture, letting everyone participate in delivery and completing the automated deployment and release of a service under standard processes and standard deliverables, thus improving the overall efficiency of the version delivery pipeline.

  • Monitoring layer: Splitting the system into smaller, fine-grained microservices brings many benefits to the platform, but it also increases the operational complexity of the platform system.

    Every task serving end users is completed by a large number of cooperating microservices; a single initial call eventually triggers multiple downstream service calls. How can the request flow be reconstructed so that problems can be reproduced and diagnosed?

    For this, the platform provides application-level monitoring with the open source Open-Falcon, application log analysis with ELK, link log tracing with a self-built service, and a unified configuration service based on Spring Config Server.

How microservices teams work

Conway’s Law: When any organization designs a system, it delivers a design that is structurally consistent with the organization’s communication structure.

Way of working

In implementing the third-generation architecture, we made several changes to the team organization:

  • Teams are divided according to business boundaries, with the full stack inside each team so the team can be autonomous. In this way, communication costs are kept within the system, each subsystem becomes more cohesive, mutual dependence and coupling are weakened, and cross-system communication costs fall.

  • An architecture department was set up to implement the third-generation architecture. A reasonable structure for an architecture team usually includes five roles: system architect, application architect, operations and maintenance, DBA, and agile expert. So how do we control the output of the architecture team to ensure the smooth implementation of the architecture work?

    • First: building a self-organizing culture of continuous improvement is a key cornerstone for implementing microservices. Only through continuous improvement, continuous learning and feedback, and continuously building such a culture and team can the microservices architecture keep evolving, stay vital, and achieve our original intention.

    • Second: the deliverables of the architecture group must go through strict procedures, because the architecture group promotes universal solutions. To ensure the quality of the solutions, we have a strict closed loop from project investigation through review to implementation.

Moving on to the delivery process and development model for the entire team: it is hard to get real value out of a microservice architecture if these are not defined up front, so let's look at the delivery process of a microservice architecture.

With a microservices architecture, we actually design, develop, test, and deploy each microservice separately, because the services do not depend on one another; the delivery process is roughly as shown above.

Design stage:

The architecture group divides product functionality into several microservices and designs API interfaces (such as REST APIs) for each microservice. API documents must be provided, including the API name, version, request parameters, response results, error codes, and other information.

During the development phase, development engineers implement the API interfaces, including unit tests for the APIs, while front-end engineers develop the Web UI in parallel, building mock data from the API documentation. Front-end engineers do not have to wait for the back-end APIs to be fully developed before starting their work, enabling parallel front-end and back-end development.

Test phase:

This stage is fully automated. Developers commit code to the code server, which triggers a continuous integration build and test. If the tests pass, the build is automatically pushed to the simulation environment by Ansible scripts.
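As a rough illustration of such a pipeline step (assuming a Maven build; the playbook and inventory names are assumptions for the sketch, not our actual scripts):

    # Build and unit-test, then push the build to the simulation environment
    mvn clean verify
    ansible-playbook -i staging deploy.yml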

In practice, going live first passes through a review process before the push to the production environment. This improves work efficiency while controlling some of the online instability that might result from inadequate testing.

Development mode

In the above delivery process, the development, test, and deployment phases all involve controls on the code, so a development model needs to be defined to ensure that multiple people work well together.

  • Practice the "strangler pattern":

Due to the large span of the third-generation architecture and the fact that the legacy .NET system could not be modified, we adopted the strangler pattern: adding new proxy microservices outside the legacy system and controlling the upstream in the LB, gradually replacing the old system rather than modifying it directly.

  • Development specifications

Experience shows that we need to make good use of the code version control system. I once worked with a development team where merging the code for even a small release took hours, because there was no branch specification and the developers themselves didn't know which branch to merge into.

GitLab, for example, has good support for multi-branch code versioning, which we need to take advantage of to improve development efficiency. The above is our current branch management specification.

The most stable code is on the master branch. We do not commit code directly to the master branch. We can only merge code on the master branch, such as merging code from other branches into the master branch.

Our daily development code needs to pull a Develop branch from the Master branch, which everyone can access, but we don’t commit code directly to the Develop branch. Code is merged into the Develop branch from other branches.

When we need to develop a feature, we need to pull a feature branch from the Develop branch, such as feature-1 and feature-2, and develop the feature on those branches in parallel.

After a feature is developed and we decide that we need to release it, we need to pull a release branch from the Develop branch, such as release-1.0.0, and merge the features from the relevant feature branch into the Release branch. The release branch is then pushed to the test environment, where the test engineer does functional testing and the development engineer fixes bugs.

When the test engineers can no longer find any bugs, we can deploy the release branch to the pre-release environment. After verifying again that there are no bugs, we can deploy the release branch to the production environment.

Once live, merge the code on the Release branch into both the Develop and Master branches, and tag the master branch with a tag such as v1.0.0.

When a bug is found in production, we need to pull a hotfix branch (such as hotfix-1.0.1) from the corresponding tag (such as v1.0.0) and fix the bug on that branch. Once the bug is fully fixed, merge the code from the Hotfix branch into both the Develop and Master branches.

We also have requirements for the version number, which follows the format X.Y.Z: X is bumped only for a major refactoring, Y only when a new feature is released, and Z only when a bug is fixed. Every microservice must strictly follow this development model.
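As a quick hedged walk-through of the branch model above, using the example branch and tag names from the text:

    # develop branches off master; features branch off develop
    git checkout -b develop master
    git checkout -b feature-1 develop

    # cut a release from develop, then test and fix bugs on it
    git checkout -b release-1.0.0 develop

    # after go-live: merge back and tag master
    git checkout master
    git merge release-1.0.0
    git tag v1.0.0

    # production bug: hotfix from the tag, then merge into develop and master
    git checkout -b hotfix-1.0.1 v1.0.0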

Microservice development system

We have described the architecture, delivery process, and development model of the microservices team. Now let's discuss and summarize the microservice development system.

What is a microservice architecture

Martin Fowler’s definition:

In short, the microservice architectural style [1] is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API.

These services are built around business capabilities and independently deployable by fully automated deployment machinery. 

There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.

To put it simply, microservices is a design style for software system architecture. It advocates dividing a previously independent system into multiple small services that run in separate processes and communicate and collaborate through lightweight, HTTP-based RESTful APIs.

Each split-out microservice is built around one or more highly cohesive business capabilities in the system, and each service maintains its own data storage, business development, automated test cases, and independent deployment mechanism. Thanks to the lightweight communication, these microservices can be written in different languages.

Split granularity of microservices

As for the granularity to which microservices should be split, it is most important to find a balance between granularity and the team. The smaller the microservices, the more benefits their independence brings; but managing a large number of microservices also becomes more complicated. Splitting basically needs to follow these principles:

  • The single responsibility principle: “bring together things that change for the same reason and separate things that change for different reasons.” Use this principle to determine microservice boundaries.

  • The team autonomy principle: the larger the team, the higher the cost of communication and coordination. In practice, a team does not exceed 8 people, and it is a full-function, full-stack team.

  • Database first, service second: whether the data model can be completely separated determines whether the boundary functions of microservices can be completely separated. In practice, we first discuss the boundaries of the data model, which map the business boundaries, and then complete the service separation from the bottom up.

How to build a microservice architecture

To build a good microservice architecture, technology selection is a crucial stage: only by choosing the right "actors" can this drama be performed well.

We use Spring Cloud as our microservice development framework. Spring Boot's embedded Tomcat lets us run a JAR package directly to publish a microservice, and the stack also provides a series of "out of the box" plug-ins.

For example: configuration center, service registration and discovery, circuit breakers, routing, proxying, control bus, one-time tokens, global locks, leader election, distributed sessions, and cluster state, all of which can greatly improve our development efficiency.

Function                        Spring Cloud component
Routing and load balancing      Ribbon
Registry                        Eureka
Gateway                         Zuul
Circuit breaker                 Hystrix
Distributed configuration       Config
Service invocation tracing      Sleuth
Log output                      ELK
Authentication integration      OAuth2
Message bus                     Bus
Batch tasks                     Task
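To make the table concrete, here is a minimal sketch, assuming a registry is configured, of how a few of these pieces cooperate; the service names and endpoints are illustrative assumptions, not the platform's actual services:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.loadbalancer.LoadBalanced;
import org.springframework.context.annotation.Bean;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@SpringBootApplication
public class OrderApplication {

    public static void main(String[] args) {
        SpringApplication.run(OrderApplication.class, args);
    }

    @Bean
    @LoadBalanced // client-side load balancing via Ribbon
    RestTemplate restTemplate() {
        return new RestTemplate();
    }
}

@RestController
class OrderController {

    private final RestTemplate restTemplate;

    OrderController(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @GetMapping("/orders/{id}/member")
    String memberForOrder(@PathVariable("id") String id) {
        // "member-service" is a logical name; the registry + Ribbon
        // resolve it to a concrete, healthy instance at call time
        return restTemplate.getForObject(
                "http://member-service/api/v1/members/" + id, String.class);
    }
}

Because the RestTemplate is @LoadBalanced, the logical hostname is resolved at call time, so instances can be added or removed without reconfiguring callers.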

Engineering structure specification

The figure above is the project composition structure that each service should have in our practice.

Among them:

  1. Microservice name + service:

    Provides service invocation for other internal microservices. The "service name + API" module is the interface contract defined between services, using Swagger + REST interface definitions (see the sketch after this list). The "service name + server" module contains the application and configuration that can directly start the service.

  2. Microservice name + Web:

    An entry point for requests from upper-layer Web applications; it typically invokes the underlying microservices to complete a request.
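As referenced in item 1 above, a hedged sketch of what an interface in a "service name + API" module might look like with Swagger + REST definitions; the service, path, and DTO are hypothetical, not the platform's actual contract:

import io.swagger.annotations.Api;
import io.swagger.annotations.ApiOperation;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

// Interface contract shared between the server module (which implements it)
// and clients (which call it); names here are illustrative only.
@Api(tags = "member-service API, v1")
public interface MemberApi {

    @ApiOperation(value = "Query a member by id",
                  notes = "Version, parameters, and error codes belong in the API document")
    @GetMapping("/api/v1/members/{id}")
    MemberDto getMember(@PathVariable("id") Long id);

    // Minimal response DTO for the sketch
    class MemberDto {
        public Long id;
        public String name;
    }
}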

API Gateway Practices

As the access point for all back-end microservices and APIs, the API gateway performs auditing, flow control, monitoring, and billing for them. Common API gateway solutions include:

  • Application layer scheme

    The most famous is of course Netflix's Zuul, but that doesn't mean it's the best solution for you. For example, Netflix built Zuul at the application layer because it runs on AWS and has limited control over its infrastructure; the application-layer scheme is not necessarily the most suitable one.

    However, if your team has limited control over the overall technical infrastructure and the team structure is not yet complete, an application-layer solution may be the best one for you.

  • Nginx + Lua

    OpenResty and Kong are relatively mature alternatives, but Kong uses Postgres or Cassandra, and probably not many domestic companies would choose those two products. Kong's HTTP API design, however, is quite good.

  • Our plan

    We use the Nginx + Lua + Consul combination. Although most of our team are Java developers and ZooKeeper would have been a more natural choice, as an innovative team we analyzed the stress test results and finally chose Consul.

    Consul's good HTTP API support for dynamically managing upstreams means that service registration and discovery can be implemented seamlessly through a publishing platform or glue system, transparently to consumers of the service.

In the scheme above:

Consul acts as the state store or configuration center (mainly using Consul's KV storage); Nginx acts as the API gateway, dynamically distributing traffic to the upstream nodes configured in Consul.

Nginx connects to the Consul cluster based on its configuration items.

A newly launched API or microservice instance registers itself (writes its instance information) into Consul manually, via the command line, or through the publishing/deployment platform;

When Nginx receives the corresponding upstream update, it dynamically changes its internal upstream configuration so that traffic is routed and distributed to the corresponding API and microservice instance nodes.

When the above registration and discovery logic is solidified through scripts or a unified publishing/deployment platform, transparent service access and scaling can be achieved.
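To make the registration step concrete, here is a minimal hedged sketch in Java that registers an instance through the Consul agent's HTTP API; the service name, address, and port are illustrative assumptions:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConsulRegistration {

    public static void main(String[] args) throws Exception {
        // Illustrative instance data; a real deployment platform would fill these in
        String payload = "{"
                + "\"ID\": \"member-service-1\","
                + "\"Name\": \"member-service\","
                + "\"Address\": \"10.0.0.5\","
                + "\"Port\": 8080"
                + "}";

        HttpRequest request = HttpRequest.newBuilder()
                // the local Consul agent listens on port 8500 by default
                .uri(URI.create("http://127.0.0.1:8500/v1/agent/service/register"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Consul register status: " + response.statusCode());
    }
}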

Link Monitoring Practices

We found that log monitoring was very simple under a single application, but under a microservice architecture it becomes a big problem: if we cannot trace the business flow, we cannot locate problems, and we spend a great deal of time finding and localizing issues. Amid complex microservice interactions we would be very passive. Distributed link monitoring arose for exactly this situation, and its core is the invocation chain.

Through a global ID, the occurrences of the same request distributed across service nodes are connected in series, restoring the original call relationships so we can track system problems, analyze call data, and compute system metrics.
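A minimal sketch of the global-ID idea, assuming a servlet-based service; the header name "X-Trace-Id" and this filter are illustrative, not our actual tracing component:

import java.io.IOException;
import java.util.UUID;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import org.slf4j.MDC;

// Requires Servlet 4.0+, where Filter.init/destroy have default implementations.
public class TraceIdFilter implements Filter {

    private static final String TRACE_HEADER = "X-Trace-Id";

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String traceId = ((HttpServletRequest) req).getHeader(TRACE_HEADER);
        if (traceId == null || traceId.isEmpty()) {
            // entry point of the call chain: mint the global id
            traceId = UUID.randomUUID().toString();
        }
        // keep the id in the logging context so every log line carries it;
        // downstream HTTP calls must forward the same header
        MDC.put("traceId", traceId);
        try {
            chain.doFilter(req, res);
        } finally {
            MDC.remove("traceId");
        }
    }
}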

Distributed link tracing was first described in the paper "Dapper", published by Google in 2010.

So what is a call chain? A call chain is essentially a distributed request restored into its call links: explicitly viewing how a distributed request behaves in the back end, such as the time spent on each node, which machine each request was directed to, and the status of each service node.

It can show how many services a request passes through and at what depth (for example, if system A calls B and B calls C, the depth of the request is 3). If some requests have depths greater than 10, the services probably need optimization. Common solutions include:

  • Pinpoint

    Pinpoint (from Naver): an open source APM (Application Performance Management) tool for large-scale distributed systems written in Java.

    Friends interested in APM should take a look at this open source project, developed by a South Korean team. It uses the Java agent mechanism for bytecode instrumentation (probes) to attach a TraceID and capture performance data. Tools such as New Relic and OneAPM analyze performance on the Java platform in a similar way.

  • Zipkin

    OpenZipkin · A Distributed Tracing System (the zipkin project on GitHub).

    It is open-sourced by Twitter and also built with reference to Dapper.

    Zipkin's Java applications use a component called Brave to collect performance analysis data within the application.

    Brave GitHub address: https://github.com/openzipkin/brave

This component implements a series of Java interceptors to track HTTP/servlet request invocations and database access. Performance data can then be collected from Java applications by adding these interceptors to configuration such as Spring's.

  • CAT

    GitHub – dianping/cat: Central Application Tracking

    This is open-sourced by Dianping; its functionality is quite rich, and some domestic companies also use it. But CAT implements tracing by hard-coding "buried points" (instrumentation) in the code, which is intrusive.

    This has both advantages and disadvantages: the advantage is that you can add instrumentation points exactly where you need them, making it more targeted; the downside is that you have to change existing systems, which many development teams are reluctant to do.

    Among the first three tools, if you don't want to reinvent the wheel, I recommend them in the order Pinpoint, then Zipkin, then CAT. The reason is simply that these three tools are progressively more intrusive into program source code and configuration files.

  • Our solution

    For microservices, we extend the microservice architecture based on Spring Cloud and designed our own distributed tracing system following the concepts of Google's Dapper.

As shown in the figure above, we can query response logs by parameters such as service name, time, log type, method name, exception level, and interface duration. With the TraceID obtained, the entire link log of the request can be queried, which makes reproducing problems and analyzing logs much easier.

Circuit breaker practice

In a microservice architecture, we split the system into individual microservices, so invocation failures or delays may occur due to network problems or faults in dependent services, and these problems directly degrade the caller's own external service.

If the caller's requests keep increasing, it ends up with a backlog of tasks waiting for responses from the failed dependency, eventually bringing down its own service as well. The circuit breaker pattern was created to solve this problem.

In practice we use Hystrix to implement the circuit breaker function. Hystrix is part of Netflix's open source microservices suite; it aims to provide greater tolerance of latency and failure by controlling the nodes that access remote systems, services, and third-party libraries.

Hystrix features thread and semaphore isolation with fallback mechanisms and circuit breaker functionality, request caching and request collapsing, and monitoring and configuration.

The process of using circuit breakers is as follows:

Enabling circuit breaker

@SpringBootApplication
@EnableCircuitBreaker
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}

Using fallbacks

@Component
public class StoreIntegration {

    @HystrixCommand(fallbackMethod = "defaultStores")
    public Object getStores(Map<String, Object> parameters) {
        // do stuff that might fail
    }

    public Object defaultStores(Map<String, Object> parameters) {
        return /* something useful */;
    }
}

The configuration file
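As an illustrative sketch of what goes in the configuration file (the keys are standard Hystrix properties; the values are example numbers, not our actual settings):

    # Illustrative Hystrix tuning; values are assumptions, not the platform's
    hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=2000
    hystrix.command.default.circuitBreaker.requestVolumeThreshold=20
    hystrix.command.default.circuitBreaker.errorThresholdPercentage=50
    hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds=5000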

Resource Control practices

When it comes to resource control, many readers will probably think of Docker. Docker is indeed a good solution for resource control. In our preliminary investigation we also evaluated whether to use Docker, but we finally chose to give it up and use Linux libcgroup scripts instead. The reasons are as follows:

  • Docker is better suited to resource control and containerization on large-memory machines, but our online servers generally have about 32 GB of memory, so using Docker would waste resources.

  • Using Docker would complicate operations and maintenance, and given the pressure from the business, that burden would be heavy.

Why cgroup?

In Linux systems there is often a requirement to limit the resource allocation of one or more processes. The idea is a set of containers, each allocated a specific share of CPU time, I/O bandwidth, available memory, and so on.

Thus came the concept of the cgroup. A cgroup (control group) was initially proposed by Google engineers and later integrated into the Linux kernel; Docker is also based on it.

The libcgroup usage process:

Installation

    yum install libcgroup

Start the service

    service cgconfig start

Configuration file template (memory as an example):

    cat /etc/cgconfig.conf
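A minimal illustrative template for memory limiting (the group name matches the test group created below; the limit value is an assumption, not our actual setting):

    group test {
        memory {
            memory.limit_in_bytes = 536870912;  # 512 MB, illustrative value
        }
    }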

The memory subsystem is mounted under /sys/fs/cgroup/memory. Go to this directory and create a folder to create a control group.

    cd /sys/fs/cgroup/memory
    mkdir test
    echo "server process id" >> test/tasks

This adds the specified server process to the memory-limited cgroup.
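To confirm the setup, the effective limit and the member processes can be read back (an illustrative check):

    cat /sys/fs/cgroup/memory/test/memory.limit_in_bytes   # effective limit in bytes
    cat /sys/fs/cgroup/memory/test/tasks                   # PIDs governed by the group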

Conclusion

To sum up, this article started from the background of our microservice practice and introduced the working methods, technology selection, and related technologies of that practice.

These include the API gateway, registry, circuit breaker, and more. I believe these techniques will give you some new ideas in your own practice.

Of course, the whole path of microservice practice contains far more content than one article can cover. If you are interested, you can raise it in the Chat.
