In distributed systems, achieving strong consistency is hard. Even with two-phase commit (2PC) and three-phase commit (3PC), absolute consistency is not guaranteed.

We should not let overall performance deteriorate, scalability suffer, and the architecture become overly complex just to guard against a small probability of inconsistency. Therefore, rather than using 2PC/3PC commits at scale, eventual consistency is the better solution and is widely adopted in the industry.

1. Retry mechanism

As shown in the figure below, the Service Consumer calls both Service A and Service B. Suppose the call to Service A succeeds but the call to Service B fails: the simplest way to ensure eventual consistency is to retry the call to Service B.

[Figure: Service Consumer invoking Service A and Service B, retrying the failed call to Service B]

When retrying, set a timeout on the Service Consumer side to avoid long waits or resource exhaustion.

When the Service Consumer retries, pay attention to the following:

  • Timeout;
  • Maximum number of retries;
  • Retry interval;
  • Decay (backoff) of the retry interval.

For details, see “An Elegant Retry Scheme Based on Spring Retry”.
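
A minimal sketch of these four knobs in plain Java (no retry library; the ServiceB interface and the specific numbers are hypothetical):

```java
import java.util.concurrent.*;

public class RetryingCaller {
    // Hypothetical remote call; stands in for the real Service B client.
    interface ServiceB { String call() throws Exception; }

    static String callWithRetry(ServiceB serviceB) throws Exception {
        final int maxRetries = 3;     // maximum number of retries
        final long timeoutMs = 500;   // per-attempt timeout
        long intervalMs = 100;        // initial retry interval
        final double decay = 2.0;     // backoff factor applied after each failure

        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            for (int attempt = 0; attempt <= maxRetries; attempt++) {
                Future<String> future = pool.submit(serviceB::call);
                try {
                    return future.get(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (Exception e) {
                    future.cancel(true);
                    if (attempt == maxRetries) throw e; // retries exhausted
                    Thread.sleep(intervalMs);           // wait before retrying
                    intervalMs = (long) (intervalMs * decay);
                }
            }
            throw new IllegalStateException("unreachable");
        } finally {
            pool.shutdownNow();
        }
    }
}
```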

2. Record logs locally

Record logs locally, collect them into a distributed monitoring system or another back-end system, and run a periodic checking tool over them. Depending on the situation, anomalies can then be handled automatically or passed on for manual processing.

Log format: tranID-a-b-detail

  • tranID is the transaction ID and can be a randomly generated sequence number;
  • a and b record the outcome of the calls to Service A and Service B;
  • detail is the detailed content of the data;
  • If the call to A succeeds, an “A success” entry is logged;
  • If the call to B fails, or no record exists at all (i.e., “B success” never appears in the log), B is called again;
  • The logs can be checked and processed periodically (a sketch follows this list).
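
A sketch of the periodic check in Java (assuming, per the format above, that the second and third fields hold the status of the calls to A and B, and that tranID itself contains no “-”; file layout and names are hypothetical):

```java
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

public class ConsistencyChecker {
    // Scans a local log in the "tranID-a-b-detail" format and returns the
    // transaction IDs where A succeeded but no "B success" entry exists,
    // i.e. the calls to Service B that should be retried or compensated.
    static Set<String> findMissingB(Path logFile) throws Exception {
        Set<String> aSuccess = new HashSet<>();
        Set<String> bSuccess = new HashSet<>();
        try (Stream<String> lines = Files.lines(logFile)) {
            lines.forEach(line -> {
                String[] parts = line.split("-", 4); // tranID, a, b, detail
                if (parts.length < 3) return;        // skip malformed lines
                if ("success".equals(parts[1])) aSuccess.add(parts[0]);
                if ("success".equals(parts[2])) bSuccess.add(parts[0]);
            });
        }
        aSuccess.removeAll(bSuccess); // A succeeded, B did not
        return aSuccess;
    }
}
```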

A design diagram for log collection and checking is shown below.

[Figure: local logs collected into a monitoring/back-end system, with a periodic checking tool]



3. Reliable message pattern

In real business scenarios, the probability of failure is low.

Suppose the Service Consumer fails to invoke Service B and retries. If a certain number of retries all fail, the message is sent to a Message Queue instead and processing switches to asynchronous compensation.

You can use an MQ with strong distributed capabilities, such as Kafka, RocketMQ, or other open-source distributed messaging systems, for the asynchronous processing.
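
With Kafka's Java client, for instance, handing the failed call off to a topic might look like this (the broker address, topic name, and payload format are assumptions):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.*;

public class CompensationSender {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Retries against Service B are exhausted: hand the work off to
            // the queue. Topic name and payload are hypothetical.
            producer.send(new ProducerRecord<>("service-b-compensation",
                    "tranID-123", "{\"op\":\"callServiceB\",\"detail\":\"...\"}"));
            producer.flush(); // block until the hand-off reaches the broker
        }
    }
}
```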

  • Service B can integrate an error-handling component that continuously consumes compensation messages from MQ;
  • or a standalone error-handling component can process the MQ compensation messages independently, including exceptions from other Service components.

[Figure: failed invocations routed to MQ and consumed by an error-handling component]
This scheme still carries a risk of message loss: the Service Consumer may crash before the message is sent, although that is a low-probability event.

There is another alternative, the reliable message pattern, shown in the figure below: the Service Consumer sends a message to a Message Queue broker such as RocketMQ or Kafka, and Service A and Service B consume the messages.

The MQ can be a distributed, persistent queue, so that messages passing through it are not lost and the MQ itself can be considered reliable.
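
With Kafka, for example, this durability is largely a matter of configuration; a minimal sketch of producer settings (broker address assumed; the topic should additionally use a replication factor greater than one):

```java
import java.util.Properties;

public class ReliableProducerConfig {
    static Properties durableProducerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        // Wait until all in-sync replicas have acknowledged each write,
        // so an accepted message survives the loss of a single broker.
        props.put("acks", "all");
        // Retry transient send failures without producing duplicates.
        props.put("enable.idempotence", "true");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}
```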



[Figure: reliable message pattern: Service Consumer publishes to the MQ broker; Service A and Service B consume]

Advantages of the reliable message pattern:

  • Improved throughput;
  • Reduced response time in some scenarios;

Existing problems:

  • An inconsistent time window (business data has entered MQ but not yet the DB, so some reads cannot see the business data);
  • Increased architectural complexity;
  • The consumers (Service A/B) need to be idempotent (a sketch follows this list).
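
A common way to make a consumer idempotent is to record each processed message ID under a unique constraint, in the same local transaction as the business write; redeliveries then hit the constraint and are skipped. A sketch (table and column names are hypothetical; the duplicate-key exception shown is MySQL-style, and other drivers may signal it differently):

```java
import java.sql.*;

public class IdempotentConsumer {
    static void handle(Connection conn, String messageId, String payload) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement dedupe = conn.prepareStatement(
                "INSERT INTO processed_message (message_id) VALUES (?)")) {
            dedupe.setString(1, messageId);
            dedupe.executeUpdate();  // throws on a duplicate message_id
            applyBusinessLogic(conn, payload);
            conn.commit();           // dedupe record + business write land together
        } catch (SQLIntegrityConstraintViolationException dup) {
            conn.rollback();         // already processed: safe to ignore
        } catch (SQLException e) {
            conn.rollback();
            throw e;                 // let MQ redeliver the message
        }
    }

    static void applyBusinessLogic(Connection conn, String payload) {
        // The consumer's actual business write goes here.
    }
}
```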

In view of the inconsistent time window problem above, a further optimization can be made:

  • Split the services into core business services and dependent business services;
  • Core business services: invoked directly;
  • Dependent business services: consume messages from MQ.



[Figure: the order service (core) is called directly and lands data in the DB; dependent services consume from MQ]

The order service (core service) is called directly and lands the business order data in the DB; at the same time, a message is sent to MQ.

Note that the Service Consumer (creating the order) can crash after calling the order service but before sending the message to MQ. Ideally, calling the order service and sending the message would happen in one transaction, but making that a distributed transaction is cumbersome and hurts performance.

Therefore, an additional table is introduced: an event table in the same database as the order table. With transaction protection over both tables, the distributed transaction becomes a single-database transaction.

The whole process is as follows:

(1) Create order – persist the business order data and insert an event record into the event table. Note that both writes happen in one local transaction to ensure consistency: if the transaction fails, both roll back together, so there is no separate business rollback to worry about; if it succeeds, processing continues.
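
A sketch of step (1) with plain JDBC (table and column names are hypothetical):

```java
import java.sql.*;
import java.util.UUID;

public class CreateOrder {
    // Write the order and the event in one local transaction,
    // so that either both records exist or neither does.
    static void createOrder(Connection conn, String orderJson) throws SQLException {
        String orderId = UUID.randomUUID().toString();
        conn.setAutoCommit(false);
        try (PreparedStatement insertOrder = conn.prepareStatement(
                 "INSERT INTO orders (order_id, detail) VALUES (?, ?)");
             PreparedStatement insertEvent = conn.prepareStatement(
                 "INSERT INTO event (event_id, order_id, status) VALUES (?, ?, 'NEW')")) {
            insertOrder.setString(1, orderId);
            insertOrder.setString(2, orderJson);
            insertOrder.executeUpdate();

            insertEvent.setString(1, UUID.randomUUID().toString());
            insertEvent.setString(2, orderId);
            insertEvent.executeUpdate();

            conn.commit();   // both writes land atomically
        } catch (SQLException e) {
            conn.rollback(); // neither write lands
            throw e;
        }
    }
}
```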

(2) Send message – Send order message to message queue.

  • If the message fails to be sent, it is retried; if the process crashes before a retry succeeds, the compensation service resends the message (a low-probability event).
  • The compensation service continuously polls the event table for events that have not completed and sends compensation messages for them; events already marked successful are ignored.
  • Once the message has been sent successfully, whether by the normal path or by the compensation service, the event record in the event table can be deleted (a logical delete). A sketch of the polling loop follows this list.
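
A sketch of the compensation loop (the SQL, status values, and sendToMq helper are hypothetical):

```java
import java.sql.*;

public class CompensationService {
    // Poll the event table for events still marked NEW, republish them
    // to MQ, and logically delete the events that were handled.
    static void pollOnce(Connection conn) throws SQLException {
        try (PreparedStatement query = conn.prepareStatement(
                 "SELECT event_id, order_id FROM event WHERE status = 'NEW'");
             ResultSet rs = query.executeQuery()) {
            while (rs.next()) {
                String eventId = rs.getString("event_id");
                String orderId = rs.getString("order_id");
                if (sendToMq(orderId)) {            // resend the order message
                    try (PreparedStatement done = conn.prepareStatement(
                             "UPDATE event SET status = 'DONE' WHERE event_id = ?")) {
                        done.setString(1, eventId); // logical delete
                        done.executeUpdate();
                    }
                }
            }
        }
    }

    static boolean sendToMq(String orderId) { /* publish to MQ */ return true; }
}
```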

(3) Consuming messages – The dependent business services consume the order messages in MQ and run their own business logic.

In the above design scheme, there are three points that need to be explained:

(1) The order service (core business) is called directly so that the business order data lands as early as possible, avoiding the inconsistent time window and ensuring consistency once the write completes.

(2) The order-creation flow sends messages to MQ directly in order to improve timeliness; the compensation service comes into play only in abnormal cases. If the real-time requirement is not high, dropping the direct-send logic and relying on the compensation service alone can also be considered.

(3) An additional event table is introduced to turn the distributed transaction into a single-database transaction. To some extent, this also increases the pressure on the database.


The above is my own understanding of the topic; corrections are welcome. Feel free to follow, and if you have ideas, leave a comment or message me.

Author: Software Architecture

Source: Toutiao (Today's Headline)