If you browse microservices material often, you’ll come across microservices.io, which catalogs design patterns for every aspect of microservices. The site was created by Chris Richardson.

The lessons on the site were collected into a book, Microservices Patterns, published in 2018 and introduced to China in 2019. I bought it as soon as it came out, but was too lazy to finish it. Recently, while preparing to share its contents, I went back and read it cover to cover. It feels like the best microservices book available right now. Another book, Building Microservices, is also good, just a bit thinner.

The book provides us with a macro view of the entire ecosystem of microservices, such as:

Of course, in 2018 things like Service Mesh weren’t yet popular, so there is an updated version of the map on the website:

Personally, I really like this kind of big map. Whatever the field, I just follow the map and fill in the gaps bit by bit. Without such a map, I always feel like I’m wandering through a forest of technology with no end in sight, unable to get my bearings.

Here is my summary of the book. I have omitted the saga and testing parts.

The monolith dilemma

In the monolith era, everyone developed in a single repository. Resolving code conflicts was a constant headache, and the release CI/CD pipeline was one long queue.

After the split, at least everyone has their own codebase, their own release process, and their own online service. Deployments no longer collide, and after release each team can run its own gray-release (canary) process, so teams generally don’t affect one another.

Service splitting

Splitting is the goal, but it still has to be done methodically.

The book offers two approaches: one splits by business capability, the other by DDD subdomains.

  • Supplier management
    • Deliveryman information management
    • Restaurant information management: manages restaurant menus, opening hours, and locations
  • Consumer management
    • Consumer information management
  • Order taking and fulfillment
    • Consumer order management: allows consumers to create and manage orders
    • Restaurant order management: allows restaurants to manage the order-preparation process
  • Logistics
    • Deliveryman availability management: manages the real-time status of deliverymen able to take orders
    • Delivery management: tracks the delivery progress of orders
  • Accounting
    • Consumer accounting: manages consumers’ order billing records
    • Restaurant accounting: manages restaurants’ payment records
    • Deliveryman accounting: manages deliverymen’s income records

Splitting by business capability, these are the services that will likely emerge in the end.

Using DDD for the analysis, we actually arrive at similar results:

When splitting, we should also apply the Single Responsibility Principle (SRP) from SOLID, along with the Common Closure Principle (CCP).

After the split, we should also watch for the new problems microservices bring:

  • Network latency
  • Synchronous communication between services reduces availability
  • Maintaining data consistency across services
  • Getting a consistent view of the data
  • God classes that obstruct the split

Service integration

Distributed service communication can be roughly divided into one-to-one and one-to-many:

RPC is easy to understand: synchronous request/response. For asynchronous communication, one style is one-to-one asynchronous request/response; the other is one-to-many pub/sub.

When it comes to RPC, multiple protocols and frameworks can be used:

However, API updates should follow the semver specification. Many gRPC releases in the community did not comply with semver, causing plenty of trouble for downstream dependents. Those interested can search for the related incidents.

It must be said that Google’s programmers are not always reliable.

When integrating services over RPC, you must avoid being dragged down by the slow responses of unstable services: set timeouts and add circuit breakers.
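To make the idea concrete, here is a minimal sketch of a client-side circuit breaker (the thresholds and names are my own, not from the book): after a few consecutive failures it fails fast instead of piling more calls onto the unstable service, and after a cooldown it lets a trial call through.

```python
import time

class CircuitBreaker:
    """Fail fast after repeated failures instead of waiting on a slow,
    unstable downstream service."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds before a half-open retry
        self.failures = 0
        self.opened_at = None             # None => circuit closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None         # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                 # a success closes the circuit again
        return result
```

In practice the wrapped call would itself carry a deadline (the RPC timeout), so a hung service counts as a failure instead of blocking forever.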

There are two ways for services to find each other. One is service discovery based on a service registry.

The other is DNS-based service discovery; there shouldn’t be many purely DNS-based setups these days.
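To illustrate the registry-based style, here is a toy in-memory registry (the service names and TTL are hypothetical): instances register and heartbeat, and lookups filter out instances whose TTL has expired, which is roughly what client-side discovery against a registry like Eureka or Consul does.

```python
import time

class ServiceRegistry:
    """Toy in-memory service registry: instances heartbeat to stay
    registered; lookups drop instances whose TTL has expired."""

    def __init__(self, ttl=10.0):
        self.ttl = ttl
        self._instances = {}  # service name -> {address: last-heartbeat time}

    def register(self, service, address, now=None):
        now = time.monotonic() if now is None else now
        self._instances.setdefault(service, {})[address] = now

    heartbeat = register      # re-registering simply refreshes the TTL

    def lookup(self, service, now=None):
        now = time.monotonic() if now is None else now
        live = {a: t for a, t in self._instances.get(service, {}).items()
                if now - t <= self.ttl}
        self._instances[service] = live
        return sorted(live)   # the client can load-balance over these
```

A real registry also pushes change notifications to clients instead of relying purely on polling, but the TTL-plus-heartbeat core is the same.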

In addition to RPC, messages can also be used for integration between services.

You can also emulate RPC-style request/response over MQ, but this makes your services heavily dependent on the MQ: if the MQ fails, the whole system goes down with it.
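The request/response emulation usually works by attaching a correlation id and a reply queue to each request. A rough sketch, using in-process queues in place of a real broker:

```python
import queue
import threading
import uuid

def serve(request_q, handler):
    """Server side: consume requests from the 'broker', reply on the queue
    named in each message, echoing its correlation id."""
    def worker():
        while True:
            msg = request_q.get()
            if msg is None:        # shutdown sentinel
                break
            msg["reply_to"].put({"correlation_id": msg["correlation_id"],
                                 "body": handler(msg["body"])})
    threading.Thread(target=worker, daemon=True).start()

def call(request_q, body, timeout=5.0):
    """Client side: publish a request, then block waiting for the matching
    reply -- this blocking wait is where the hard dependency on the MQ
    comes from."""
    reply_q = queue.Queue()
    corr_id = str(uuid.uuid4())
    request_q.put({"correlation_id": corr_id, "reply_to": reply_q, "body": body})
    reply = reply_q.get(timeout=timeout)
    assert reply["correlation_id"] == corr_id
    return reply["body"]
```

With a real broker the reply queue is a per-client queue name carried in message headers, but the correlation-id handshake is the same.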

In general we use broker-based MQ for communication, but asynchronous communication can also be brokerless. The book gives ZeroMQ as an example; I didn’t know much about it before, so I need to look into it further.

Event Sourcing

Event Sourcing is a special design pattern: instead of recording an entity’s final state, it records every event that changed that state. The entity’s current state is then computed from the events.

However, if too many events accumulate, performance suffers. You can therefore fold part of the history into an intermediate snapshot, and apply the subsequent events on top of the snapshot.
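A tiny sketch of the idea, using an account balance as the entity (the event names are mine, not from the book): state is a fold over the events, and a snapshot records the folded state up to some event index so later rebuilds only replay the tail.

```python
def apply_event(balance, event):
    """Reducer: current state + one event -> next state."""
    kind, amount = event
    return balance + amount if kind == "deposit" else balance - amount

def current_state(events, snapshot=(0, 0)):
    """Rebuild state from a snapshot (state, number of events already
    folded in) plus only the events after it, instead of replaying the
    full history every time."""
    state, upto = snapshot
    for event in events[upto:]:
        state = apply_event(state, event)
    return state
```

Taking a snapshot is just `(current_state(events[:n]), n)`; subsequent rebuilds start from there.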

It looks like an elegant solution, and we’ve actually used this pattern in some downstream computation logic, but it has drawbacks:

  • When the structure of the events themselves changes, keeping old and new versions compatible is hard
  • If your code has to handle both old and new versions of the data, it becomes very hard to maintain after a few upgrades
  • Because everything is traceable, deleting data becomes very troublesome; GDPR requires that a user’s historical data be erased when they close their account, which is a huge challenge for Event Sourcing

We also run into some very practical problems when using asynchronous messages for decoupling:

  • “I need this piece of data, could you pass it along in the message?”
  • “Why did you delete this field during your refactor? I still need it.”
  • “Your state machine’s events changed; there used to be three, why are there only two now?”
  • “When your API call failed, why did the messages arrive out of order?”

This calls for a system to validate upstream domain events; we can borrow from Google’s Schema Validation work, which I described earlier in “MQ is Becoming a Gutter”. I won’t repeat it here.

Query patterns

Much query logic is simply API composition, which involves an API composer that assembles the data, and the data providers:

  • API composer: implements the query by calling the data provider services
  • Data provider: provides the data
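A minimal sketch of an API composer (the provider names are hypothetical): it fans out to the provider services in parallel and merges their answers into a single view for the client.

```python
from concurrent.futures import ThreadPoolExecutor

def compose_order_view(order_id, providers):
    """Query each data-provider service in parallel and merge the
    responses into one composed result."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fetch, order_id)
                   for name, fetch in providers.items()}
        return {name: future.result() for name, future in futures.items()}
```

In real code each provider would be an RPC or HTTP client; a failure or timeout in any one of them degrades the whole composed response, which is exactly the availability cost this pattern carries.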

As simple as it seems, several problems are hard to deal with when you actually write the code:

  • Who is responsible for assembling the data? Sometimes it’s the application, sometimes the external API Gateway; it’s hard to establish a single standard, and it’s a constant source of bickering inside the company
  • Extra overhead – one request fans out into queries against many interfaces
  • Reduced availability – at 99.5% availability per service, composing five services yields roughly 0.995^5 ≈ 97.5%
  • Transactional data consistency is not guaranteed – you need a distributed transaction framework, or transactional messages plus idempotent consumption

CQRS

Business developers often jokingly call themselves CRUD engineers; in architectural design, the R in CRUD can be separated out, as follows.

What’s the benefit of separating it out? Most Internet services are read-heavy and write-light. With the concerns separated, the read service and write service can use heterogeneous storage.

For example, writes can go to MySQL, while reads come from any of the NoSQL stores that scale horizontally with ease; if you need full-text search, reads can come from Elasticsearch.

The read service can subscribe to the write service’s domain events, or to MySQL’s binlog.

When consuming upstream data, you still have to decide, according to business logic, how to handle its state machines. The coupling actually lives in the data itself; putting MQ and domain events in between doesn’t decouple it completely.
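A sketch of what the read side can look like (event names and fields are hypothetical): it consumes the write side’s domain events and maintains a denormalized view, deduplicating by event id since MQ delivery is typically at-least-once.

```python
class OrderReadModel:
    """CQRS read side: a denormalized view kept up to date by consuming
    the write side's domain events. Consumption must be idempotent, so
    we remember which event ids have already been applied."""

    def __init__(self):
        self.orders = {}      # order_id -> denormalized row for queries
        self.applied = set()  # event ids already applied (idempotence)

    def handle(self, event):
        if event["event_id"] in self.applied:
            return            # duplicate delivery from MQ: ignore
        self.applied.add(event["event_id"])
        order = self.orders.setdefault(event["order_id"], {})
        if event["type"] == "OrderCreated":
            order.update(status="CREATED", items=event["items"])
        elif event["type"] == "OrderDelivered":
            order["status"] = "DELIVERED"
```

A real projection would persist both the view and the applied-id watermark in the same transaction, so a consumer restart can’t double-apply events.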

The disadvantages of CQRS are also obvious:

  • Complex architecture
  • Data replication lag
  • Query consistency problems
  • Handling concurrent updates
  • Idempotence must be handled

External API pattern

Nowadays, Internet companies generally have multiple client ends: Web, mobile, and open APIs for third parties.

If we directly exposed the internal APIs produced by the split, it would become very, very difficult to upgrade those internal APIs later.

In the monolith era, a client only needed one call over the weak public network. After microservitization, without any optimization, it would need multiple calls over a slow network like the Internet.

This is why we need an API Gateway in the middle.

With a Gateway, there is still just one call over the Internet, and the multiple internal calls are far less costly on the strong network inside the IDC.

The API Gateway raises the question of who maintains each end’s APIs in it. Ideally, the mobile team maintains its APIs in the Gateway (or perhaps in a separate Gateway), the Web team maintains the Web Gateway, and the Open API team maintains the Gateway APIs for third-party applications.

The Gateway infrastructure team is responsible for providing the infrastructure required by all three parties.

When developing the API Gateway, we have several options:

  • Use an open-source product directly
    • Kong
    • APISIX
    • Traefik
  • Build our own
    • Based on Zuul
    • Based on Spring Cloud Gateway
    • Write a RESTful gateway ourselves
    • Write a GraphQL gateway ourselves

Most open-source API Gateways don’t support API data composition, so companies sometimes run two layers of gateways: one layer, such as Nginx, handles simple routing and authentication, and behind it a business BFF assembles the data each end needs.

If everything is self-developed, we can implement all the functionality an API Gateway needs in one module. A question often discussed here is whether to use REST or GraphQL-style graph queries.

Netflix engineers published an article on this in 2012:

“Why REST Keeps Me Up At Night”. Netflix had hoped to serve all devices with a single unified REST API, but later found that this uniformity meant giving up device-specific optimizations: some devices have little memory, some have small screens, and many of the fields you return are useless to them, wasting network bandwidth. Some devices also perform better with a streaming response than with a full one, which is another optimization point worth considering.

So Netflix built a solution called Falcor, which turned out to be very similar to Facebook’s GraphQL: it uses a JSON Graph to describe the data the internal APIs provide, and then uses JS to write custom queries against that graph.

Now most people are familiar with GraphQL:

GraphQL is a great thing, but I’ve personally been skeptical about adopting it, mainly because:

  • The gateway is easily broken by client-side query changes
  • Stability is hard to guarantee
  • Most people talking about GraphQL on the Chinese Internet barely mention rate limiting, if at all, which is not very responsible

Until recently, that is, when I found that a foreign company had published their GraphQL rate-limiting scheme. It’s very interesting, and I’ll share it in the next article.