Hello everyone, I am Phoenix Old Bear, and I'm very glad to have the opportunity to talk with you about building microservice systems. Microservices have been a hot topic lately; you see them all over the development technology sites, so let me join in the fun today. I'm going to share a project I've completed and one I'm currently working on. Both are microservice transformations of existing systems, and I will focus on the concrete technical changes so they can serve as a reference for those who come after. For the loftier theory of microservices, such as what microservices mean, why to adopt them, or the theoretical basis and implementation frameworks for rate limiting and circuit breaking, please refer to the earlier experts' shares.

First, the technical transformation of the first project. This was a data warehouse project. Reading and writing of business data is often scattered across multiple projects, which makes front-end presentation and data analysis difficult. Take video data as an example: metadata, such as author and actor information, is typically produced by the content management system; stream attributes, such as encoding and duration, are generated by the encoding system; view counts, clicks, and so on are provided by the statistics system. When a page is displayed, all of this content must be shown together. This calls for a data warehouse to act as an intermediary, gathering all the data and making it available to the front end. The data warehouse essentially interconnects with all of the company's business systems, providing data read, write, and analysis support.

When I took over this project in 2014, it already had a relatively stable system in operation. The actual technical transformation began in early 2015 and was completed by the end of that year. The main achievements include:

  • Data read and write performance improved dramatically. Peak read QPS rose from 50 requests per minute to 20,000 requests per second, and daily page views grew from around 20,000 to nearly 2 billion, with easy horizontal scaling. Write capacity grew from 30 TPS to 3,000 TPS.

  • From serving out of a single data center, the system was extended to serve from three geographically separate data centers simultaneously. Most business requests are handled entirely within one data center, and cross-data-center synchronization normally completes within 1s.

  • The effort to develop a new interface dropped from 1-2 days to about 2 hours.

While upgrading the original system's technology, we also introduced a microservice framework. As a technical overhaul, our guiding principle was: take small, fast steps, and let small wins add up to big wins. The original system had to be improved step by step while the online system kept running stably. It is a core system; if it breaks, the whole business is affected, so technical transformation is high-risk work. It is difficult to move forward unless management explicitly requires the work and agrees to use it as a metric to motivate the team, and you also need the support of your teammates. We spent six months preparing before starting, and a large part of that was winning everyone's support. Finally, the people doing the system improvement work must be given recognition, to ensure the whole effort progresses according to the agreed design.

This project went relatively smoothly because the foundation of the original system was fairly good. The old system consisted of three major projects handling data reading, writing, and synchronization. It exposed an external RPC interface, using Apache Thrift as the RPC framework and server. Data was written to MongoDB and HBase simultaneously: MongoDB served reads, while HBase handled writes.

The main problems are:

  • Although the original design intended to use MongoDB and HBase for read-write separation, the implementation never achieved that goal. Data was written to MongoDB and HBase simultaneously. This often resulted in inconsistent data, and due to the characteristics of MongoDB, write failures were common during write peaks.

  • The project was large and hard to maintain, with several core classes each exceeding 2,000 lines. New hires did not dare touch the core code until they had spent half a year getting familiar with it.

  • Development was slow. Because this project provides backend data functions, it is easily overlooked in general product planning; by the time the need for its support is recognized, little development time remains.

We needed to adjust both the architecture and the process. Specifically, on the microservice side, we did the following work:

1. Establish a service gateway

This was an important task: with the gateway in place, we could shift traffic between the old and new systems as needed. We implemented it with ZooKeeper plus a gateway service. All services register with ZooKeeper, and the gateway routes user requests to specific worker machines in proportion to their ZooKeeper registrations. This adds about 1ms of overhead compared with accessing the workers directly. On the gateway we use Netflix Hystrix for circuit breaking and rate limiting. A minimal sketch of the registration side follows.
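To make the registration side concrete, here is a minimal sketch using the Apache Curator API in the way just described: each worker registers itself as an ephemeral ZooKeeper node so the gateway can route to live instances. The ensemble address, service path, and weight payload are illustrative, not taken from the actual project.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class ServiceRegistrar {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble, retrying on transient failures.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181",            // hypothetical ensemble
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Register this worker as an EPHEMERAL node: if the process dies,
        // ZooKeeper drops the entry and the gateway stops routing to it.
        String path = "/services/video-read/192.168.1.10:9090"; // illustrative path
        client.create()
              .creatingParentsIfNeeded()
              .withMode(CreateMode.EPHEMERAL)
              .forPath(path, "weight=100".getBytes());   // e.g. a routing weight

        // ... serve RPC traffic; the registration disappears when we exit.
        Thread.currentThread().join();
    }
}
```

The ephemeral mode is what makes proportional routing safe: a crashed worker vanishes from ZooKeeper without any cleanup step.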

2. Subdivide services and separate reading and writing

When it comes to read-write separation, many people immediately think of a master-slave setup, but the implementation really has to be analyzed against the business. In this project we analyzed the data-writing scenarios in detail and split the original read and write interfaces by scenario. On the write side, we split it into the following interfaces:

  • High-speed writes, mainly used for storing offline analysis results. The approach: analysis jobs built on frameworks such as MapReduce and Spark write their results into Kafka, and we consume the Kafka data and pour it into HBase (see the sketch after this list). The data changes by the tens of millions of records every day, and writing them takes about an hour. These changes do not require real-time notification to the business side.

  • Medium-speed writes, mainly used to support online data storage. Data is written directly to HBase through RPC service interfaces, at up to 3,000 TPS.

  • Low-speed reliable writes, mainly used to support manually produced data entering the warehouse. Data is likewise written to HBase through RPC interfaces, and after a write completes, business parties are notified of the change through the messaging mechanism.
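As a rough sketch of the high-speed write path above (offline jobs → Kafka → HBase), the following consumer drains a Kafka topic into HBase in batches. The broker address, topic, table, and column names are all hypothetical, and error handling is reduced to the minimum.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaToHBaseLoader {
    public static void main(String[] args) throws Exception {
        // Kafka consumer configuration (hypothetical broker and group).
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka1:9092");
        props.put("group.id", "warehouse-loader");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("analysis-results")); // illustrative topic

            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 BufferedMutator mutator =
                         conn.getBufferedMutator(TableName.valueOf("video_stats"))) { // illustrative table
                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> r : records) {
                        // Row key = message key; one column holds the serialized result.
                        Put put = new Put(Bytes.toBytes(r.key()));
                        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("v"),
                                Bytes.toBytes(r.value()));
                        mutator.mutate(put); // buffered for batch throughput
                    }
                    mutator.flush();       // push the batch to HBase
                    consumer.commitSync(); // commit offsets only after HBase has the data
                }
            }
        }
    }
}
```

Committing offsets only after the flush means a crash replays a batch rather than losing it, which suits write paths where writes are idempotent per row key.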

On the read side, we likewise split the work into two types:

  • Reliable reads, mainly used to support data production. The same piece of data is written by different business parties in a workflow, and a later writer needs the previous writer's data to decide what to write. This requires that reliable data can be read at any moment, so we direct these requests to the persistence store (HBase).

  • High-speed reads, mainly used to support online data viewing. These operations generally have loose real-time requirements, so we use Couchbase as a cache to support online reads, and the early MongoDB deployment was retired. A single Couchbase node easily sustains tens of thousands of QPS online. (A read-path sketch follows this list.)
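The read split might look like the following sketch: reliable reads always go to the persistence store (HBase), while high-speed reads try the cache first and repopulate it on a miss. The Couchbase client is hidden behind a tiny `Cache` interface here, and the table, family, and qualifier names are made up.

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/** Read-side facade: reliable reads hit HBase, fast reads try the cache first. */
public class VideoReader {
    /** Minimal cache abstraction standing in for the Couchbase client. */
    public interface Cache {
        String get(String key);
        void put(String key, String value);
    }

    private final Connection hbase;
    private final Cache cache;

    public VideoReader(Connection hbase, Cache cache) {
        this.hbase = hbase;
        this.cache = cache;
    }

    /** Reliable read for data-production workflows: always from HBase. */
    public String reliableRead(String id) throws Exception {
        try (Table table = hbase.getTable(TableName.valueOf("video_meta"))) { // illustrative
            Result r = table.get(new Get(Bytes.toBytes(id)));
            byte[] v = r.getValue(Bytes.toBytes("d"), Bytes.toBytes("v"));
            return v == null ? null : Bytes.toString(v);
        }
    }

    /** High-speed read for online pages: cache first, persistence store on a miss. */
    public String fastRead(String id) throws Exception {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        String fresh = reliableRead(id);   // miss: fall back to HBase
        if (fresh != null) {
            cache.put(id, fresh);          // repopulate the hot-data cache
        }
        return fresh;
    }
}
```

One nice property of keeping the cache behind an interface is that swapping implementations, as the project did when retiring MongoDB in favor of Couchbase, stays a local change.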

This is where data synchronization comes in:

  • Data synchronization between the read and write stores is done over MQ (Apache ActiveMQ, with broker bridging). When data is written to HBase, a message is sent; on receiving it, the read side updates its own data.

  • Cross-data-center synchronization also goes over MQ. Compared with a database's internal synchronization mechanism, MQ is less sensitive to the network: when a network problem occurs, MQ recovers as quickly as possible, and it is also easy to monitor.

One small detail here: cross-data-center synchronization of Couchbase data also goes over MQ, with each data center updating its Couchbase after receiving the message. Our Couchbase is used purely as a cache and holds only hot data. In practice we found that people in different regions care about different data, which leads to quite a bit of variation in Couchbase content between sites. A listener sketch follows.
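Here is a sketch of what such a listener could look like with ActiveMQ's JMS API: the write path publishes a change message after the HBase write, and each data center runs a listener against its local (bridged) broker that refreshes the local read store. The broker URL, topic name, and message property are assumptions for illustration.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.Topic;
import org.apache.activemq.ActiveMQConnectionFactory;

public class ChangeSyncListener {
    public static void main(String[] args) throws Exception {
        // Each data center runs one of these, pointed at its local broker;
        // brokers are bridged across data centers (hypothetical URL).
        ConnectionFactory factory =
                new ActiveMQConnectionFactory("failover:(tcp://mq-local:61616)");
        Connection conn = factory.createConnection();
        conn.start();

        Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("warehouse.data.changed"); // illustrative topic
        MessageConsumer consumer = session.createConsumer(topic);

        consumer.setMessageListener(message -> {
            try {
                String id = message.getStringProperty("id"); // hypothetical property
                // Re-read the authoritative record and refresh the local
                // Couchbase cache / read store (details omitted).
                System.out.println("refresh read store for " + id);
            } catch (JMSException e) {
                e.printStackTrace();
            }
        });
    }
}
```

The failover transport is what gives the "recovers as quickly as possible" behavior: the client reconnects automatically once the network is back.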

3. Interface splitting and microservices

With the gateway in place, we could split the huge implementation classes of the original project. After settling on the goals and architecture of the split, we set up one project per service. Each project implements no more than five interfaces, all highly cohesive: the same function with slightly different parameters. After the split, each project has few implementation classes, no more than 10, and each class is small, under 300 lines of code. New employees can hit the ground running when they come on board.

4. Improve infrastructure

In the microservice environment, we use Git for version control, GitLab for code review, and Jenkins for automated release and deployment. We are trying Spring Cloud on a small scale with good results, and will continue to roll it out.

That was the first project. The second is the one I'm currently working on: a payment system facing a far more complex situation than the first, and a more typical case. The original system is built on the SSH framework (Struts, Spring, Hibernate); it is hard to imagine a payment system adopting that stack, but since most existing web systems use it, this transformation should feel familiar to many in the field. The old systems were huge, with more than 1,000 classes per project, and the largest project had more than 3,000 classes. As the project is still ongoing, there is not much to share yet; after it is completed, I will look for the chance to talk with you again. The main points I can share so far are:

  • In this project, we adopted Spring Cloud as the microservice framework.

  • Excluding the front end, we split the system into two broad layers (see the sketch after this list). The first layer is the external Web service layer, exposed over HTTP/JSON. The second layer is the business logic layer, which is invoked by the Web service layer via RPC.

  • The Web layer is built on Spring Boot.

  • The RPC layer is implemented using Apache Thrift.

  • The service gateway is implemented with Nginx + Lua, handling load balancing, rate limiting, and automatic service discovery.

  • For infrastructure, it is still Git + GitLab + Jenkins.
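A rough sketch of the two-layer split described above: a Spring Boot controller in the Web layer speaks HTTP/JSON outward and delegates all business logic to the RPC layer. The `OrderRpc` interface stands in for a Thrift-generated client stub; the endpoint and method names are hypothetical.

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class OrderController {

    /** Stand-in for a Thrift-generated client stub (hypothetical service). */
    public interface OrderRpc {
        String getOrderJson(String id) throws Exception;
    }

    private final OrderRpc rpc;

    public OrderController(OrderRpc rpc) {
        // Injected; in practice this would wrap a Thrift TSocket/TBinaryProtocol
        // client whose host and port come from service discovery.
        this.rpc = rpc;
    }

    /** Web layer: exposes HTTP/JSON and delegates all business logic to RPC. */
    @GetMapping("/orders/{id}")
    public String getOrder(@PathVariable String id) throws Exception {
        return rpc.getOrderJson(id);
    }
}
```

Keeping the Web layer this thin is the point of the split: it does protocol translation only, and the business rules live entirely behind the RPC boundary.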

For the progress of this project, as well as ideas and designs that come up during development, you can follow my WeChat public account "Phoenix Old Bear" or visit my personal blog (http://blog.lixf.cn/). Thank you.

Q&A

Q: Have Nginx + Lua solutions been considered for service gateways?

A: Yes, I’m working on it for my second project.

Q: I have a question. How are distributed transactions handled after microservices?

A: Very good question. The first project did not have this problem. In the second, the payment project, transaction handling is a prominent issue. There is a lot of material online about using MQ for distributed transactions, but we took a very simple, even crude approach: eliminate distributed transactions altogether and pursue eventual consistency. In practice, this suffices for most scenarios.
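The answer does not spell out the mechanics, but one common way to realize "no distributed transactions, eventual consistency" is the transactional-outbox pattern sketched below: the business change and an outbox message row are committed in the same local ACID transaction, and a separate relay later publishes the outbox rows to MQ. All table, column, and connection details are hypothetical; this illustrates the general pattern, not the project's actual code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class OutboxWriter {
    public static void main(String[] args) throws Exception {
        try (Connection db = DriverManager.getConnection(
                "jdbc:mysql://paydb/pay", "user", "pass")) { // hypothetical DSN
            db.setAutoCommit(false);
            try (PreparedStatement pay = db.prepareStatement(
                         "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement outbox = db.prepareStatement(
                         "INSERT INTO outbox(topic, payload) VALUES (?, ?)")) {
                pay.setLong(1, 100);
                pay.setString(2, "acct-1");
                pay.executeUpdate();

                outbox.setString(1, "pay.debited"); // for downstream systems
                outbox.setString(2, "{\"acct\":\"acct-1\",\"amt\":100}");
                outbox.executeUpdate();

                db.commit(); // both rows or neither: one local ACID transaction
            } catch (Exception e) {
                db.rollback();
                throw e;
            }
        }
        // A separate relay reads unsent outbox rows, publishes them to MQ,
        // and marks them sent; consumers apply the changes idempotently.
    }
}
```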

Q: Can you explain more about automatic service registration and discovery with ZooKeeper/Curator?

A: Apache Curator is a wrapper over the ZooKeeper API; it supports event handling and retries, and integrates well with the Spring Framework. For service registration, the service provider writes to ZooKeeper through the Curator API, and the service consumer discovers the services it needs from ZooKeeper.
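To complement that answer, here is a minimal consumer-side sketch with Curator: providers register ephemeral children under a service path (as sketched earlier for the gateway), and a consumer lists those children to find live instances. The ensemble address and path are illustrative.

```java
import java.util.List;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ServiceDiscovery {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181", new ExponentialBackoffRetry(1000, 3)); // hypothetical ensemble
        client.start();

        // Each provider registered an ephemeral child like "host:port" under
        // the service path; consumers list the children to find live instances.
        List<String> instances =
                client.getChildren().forPath("/services/video-read"); // illustrative path
        System.out.println("live providers: " + instances);

        // A watcher (or Curator's cache recipes) would keep this list current
        // as providers come and go; omitted here for brevity.
    }
}
```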

Q: Each project has 1-5 cohesive interfaces. Can you elaborate on that?

A: Cohesive interfaces are interfaces with the same function but mainly different input parameters. For example, a search interface may search by keyword or by author + time; such interfaces can live in the same project. But an interface that reads by ID and a search interface are different things and belong in different projects. This keeps each project from growing too large and keeps it easy to maintain.
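A hypothetical example of that grouping rule: the search variants, which do the same job with different parameters, live together in one project, while the read-by-ID interface belongs to a separate project. All type and method names here are made up.

```java
import java.util.List;

/** Hypothetical data type for the example. */
class Video {}

/** One project: cohesive search interfaces, same function, different parameters. */
interface VideoSearchService {
    List<Video> searchByKeyword(String keyword, int limit);
    List<Video> searchByAuthorAndTime(String author, long fromTs, long toTs, int limit);
}

/** A separate project: reading by ID is a different kind of interface. */
interface VideoReadService {
    Video getById(String id);
}
```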

Q: I have a question. Your project must have many JAR packages of microservices. How are jar packages distributed on the server?

A: For Web-interface microservices, it works just like normal projects: each project manages its own JAR and the JARs it depends on. Some people bundle everything into one big JAR; that makes it easy for resource files to overwrite each other and hard to update, so it is not recommended.

Q: Why doesn’t RPC use domestic Dubbo?

A: That’s A good question. We get it A lot. Dubbo is a good framework, but it’s still too heavy for microservices. And, in terms of performance, it’s still a lot worse than Thrift. So we just went to Thrift.

Q: Does the payment need to be connected to multiple banks?

A: Yes, we’ve connected with almost ten banks now, and more are on the way. In addition to banks, third-party payment, wild card, etc., are also receiving.

Q: What are the key points of the service gateway?

A: We actually have two gateways. One is the external Web service gateway, which focuses on load balancing and reliability; in particular, when a service restarts, it must be deregistered from the gateway in advance, and after the restart its registration should be delayed. There are quite a few mature components for this; you don't necessarily have to build your own as we did. The RPC gateway likewise prioritizes stability and performance, and a gateway must never be coupled to business logic.
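A sketch of the restart lifecycle described in that answer, reusing the ZooKeeper registration idea from earlier: deregister in a shutdown hook so the gateway drains traffic first, and delay registration after startup until the service is warm. Paths, addresses, and delays are illustrative.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class GracefulLifecycle {
    private static final String PATH = "/services/pay-web/10.0.0.5:8080"; // illustrative

    public static void main(String[] args) throws Exception {
        CuratorFramework zk = CuratorFrameworkFactory.newClient(
                "zk1:2181", new ExponentialBackoffRetry(1000, 3)); // hypothetical ensemble
        zk.start();

        // Deregister first on shutdown so the gateway stops sending traffic
        // before the process actually exits.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                zk.delete().forPath(PATH);
                Thread.sleep(3000); // let in-flight requests finish
            } catch (Exception ignored) {
            }
        }));

        // ... start the service and warm it up (caches, pools, JIT) ...
        Thread.sleep(5000); // hypothetical warm-up delay before taking traffic

        // Only now advertise ourselves to the gateway.
        zk.create().creatingParentsIfNeeded()
          .withMode(CreateMode.EPHEMERAL)
          .forPath(PATH);

        Thread.currentThread().join(); // keep serving
    }
}
```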

Q: Microservices are split by table. Is a service connected to a database or is there a unified data layer?

A: The RPC-layer microservices are basically split by table, each connecting to its own database. There are actually two layers there, a data access layer and a business logic layer; if the business logic is thin, they are merged into one.

Q: Using Thrift to do RPC, is service discovery also done by itself?

A: Yes. With Apache Curator this is fairly easy to do, so we do it ourselves.

Q: Can you tell us more about HBase and MongoDB? What are the advantages over MongoDB replica sets?

A: This is really a question of how to use each database; each has its own strengths. For example, HBase has excellent write performance and can handle tens of thousands of TPS, but its read performance is weaker, only a few thousand QPS. Couchbase reads very well, but writes are mediocre. Redis reads and writes well, but it does not suit data with very large granularity and has capacity limits. MongoDB can seem awkward: full of features, but both reads and writes are mediocre.

Q: How does Couchbase implement cross-room synchronization over MQ?

A: In the main flow, when data is modified, a message is sent; listeners receive it and update the data in Couchbase. We adopted MQ because it is less sensitive to network fluctuation than direct database synchronization.