When using MQ, how can I ensure that 100% of messages are not lost?

When an interviewer sees that a candidate has used MQ technology (such as Kafka, RabbitMQ, or RocketMQ) in a project, one question usually follows: how do you ensure that no messages are lost when using MQ?

This question comes up often in practice. It tests a candidate’s knowledge of MQ middleware and also helps distinguish levels of competence.

Starting from this question, we discuss the basic knowledge and answering approach you should master, as well as the follow-up topics an interviewer may extend to.

Case background

Take the JD.com (Jingdong) system as an example. When users purchase goods, they often use Jingdou (JD’s reward points) to offset part of the amount. In this process, the transaction service and the Jingdou service communicate through an MQ message queue. When an order is placed, the transaction service sends a “deduct 100 Jingdou from account X” message to the queue, and the Jingdou service consumes this command on the consumer side and performs the actual deduction.


What problems do you encounter in this process?

Case analysis

Keep in mind that in interviews at Internet companies, the most direct purposes of introducing MQ middleware are system decoupling and flow control, which address the high availability and high performance requirements of Internet systems.

System decoupling: an MQ message queue isolates upstream and downstream systems from each other’s changes and instabilities. For example, however the Jingdou service changes, the transaction service needs no changes. Even when the Jingdou service fails, the main transaction flow can degrade the Jingdou service and carry on. The transaction service and the Jingdou service are thus decoupled, which keeps the system highly available.
Flow control: in scenarios with sudden traffic spikes, such as flash sales, MQ can also be used for “peak shaving and valley filling”, so that the flow is regulated according to the downstream processing capacity.

But introducing MQ, while enabling system decoupling and flow control, also brings other problems.

Introducing MQ middleware for system decoupling affects the consistency of data transmitted between systems. In a distributed system, whenever data is synchronized between two nodes, data consistency becomes an issue. Likewise, in this case you need to solve the problem of message consistency between the message producer and the message consumer (that is, how to ensure that messages are not lost).

Introducing MQ middleware for flow control, in turn, can lead to a backlog of messages when the consumer side lacks processing capacity, which is also a problem you need to address.

As you can see, the questions are often linked to one another, and the interviewer is checking how coherently you solve problems and whether you have a systematic body of knowledge.

So how do you answer the question “How do you ensure messages are not lost when using MQ message queues?” First, break it down into several sub-questions.

How do I know if a message is lost?
What links may lose messages?
How do I ensure that messages are not lost?

When answering, let the interviewer see your analytical thinking before you present a solution. Network data transmission is unreliable, so to solve the problem of message loss you first need to know which links may lose messages, then how to tell whether a message has been lost, and only then the solution (rather than stating your solution straight away). “Architecture” represents the architect’s thought process, and “design” is the final solution. Both are indispensable.

Case answer

Let’s first look at where messages can be lost. The journey of a message from production to consumption can be divided into three stages: message production, message storage, and message consumption.


Message production stage: a message is produced and then submitted to MQ. As long as the ACK response from the MQ Broker is received normally, the message has been sent successfully, so no message is lost at this stage as long as return values and exceptions are handled properly.
Message storage stage: this stage is generally handled by the MQ middleware itself, but you need to understand how it works. For example, the Broker replicates messages so that a message is synchronized to at least two nodes before an ACK is returned.
Message consumption stage: the consumer pulls messages from the Broker. As long as the consumer does not send the consumption acknowledgment to the Broker as soon as it receives a message, but waits until the business logic has finished executing, the message will not be lost.
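
The acknowledgment rule on the consumption side can be illustrated with a small sketch. `ToyBroker` and `consume_once` are hypothetical names invented for this example; the class is an in-memory stand-in, not the API of Kafka, RabbitMQ, or RocketMQ. It only shows that a message which is acknowledged after the business logic runs can still be redelivered if the consumer fails midway.

```python
from collections import deque


class ToyBroker:
    """In-memory stand-in for a broker that redelivers unacknowledged messages."""

    def __init__(self):
        self._ready = deque()   # messages waiting for delivery
        self._unacked = {}      # delivery tag -> message handed out but not yet acked
        self._next_tag = 0

    def publish(self, message):
        self._ready.append(message)
        return True             # plays the role of the Broker's ACK to the producer

    def pull(self):
        if not self._ready:
            return None
        self._next_tag += 1
        message = self._ready.popleft()
        self._unacked[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        self._unacked.pop(tag, None)

    def requeue_unacked(self):
        """Simulate the broker noticing a dead consumer and redelivering."""
        for tag in sorted(self._unacked):
            self._ready.append(self._unacked[tag])
        self._unacked.clear()


def consume_once(broker, handler):
    """Pull one message and acknowledge it only after the handler succeeds."""
    delivery = broker.pull()
    if delivery is None:
        return False
    tag, message = delivery
    handler(message)            # business logic first; an exception skips the ack
    broker.ack(tag)
    return True
```

If `handler` raises, the message stays in the unacknowledged set and comes back after `requeue_unacked()`. Acknowledging before calling `handler` would lose it instead.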

However, in distributed systems, failures are inevitable. As a message producer, you cannot guarantee that MQ will not lose your messages, or that consumers will actually consume them. Therefore, following the “Design for Failure” principle, you still need a mechanism to check whether messages have been lost.

Then, you can explain to the interviewer how to do message detection. The overall solution is as follows: On the message production end, assign a global unique ID to each sent message, or attach a continuously increasing version number, and then perform version verification on the consuming end.

How to implement it? You can use the interceptor mechanism. The message version number is injected into the message by the interceptor before the message is sent by the production side (the version number can be generated using either a continuously increasing ID or a distributed globally unique ID). Then, after the consumer receives the message, it detects the continuity or consumption status of the version number through the interceptor. The advantage of this implementation is that the message detection code will not invade the business code, and the lost message can be located through a separate task for further investigation.

Note here: if there are multiple message producers and message consumers at the same time, it is difficult to implement the method of incrementing the version number, because the uniqueness of the version number cannot be guaranteed. In this case, the message detection can only be carried out by using the globally unique ID scheme. The specific implementation principle is the same as the method of incrementing the version number.
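
The detection idea above can be sketched as a pair of interceptors. All names here (`ProducerInterceptor`, `ConsumerInterceptor`, `reconcile`, the message field names) are hypothetical and are not taken from any MQ client library. The sequence check assumes a single producer, and `reconcile` stands in for the separate task that compares sent IDs with consumed IDs when globally unique IDs are used.

```python
import itertools
import uuid


class ProducerInterceptor:
    """Stamps each outgoing message with a globally unique ID and a sequence number."""

    def __init__(self):
        self._seq = itertools.count(1)

    def on_send(self, payload):
        return {"msg_id": str(uuid.uuid4()), "seq": next(self._seq), "payload": payload}


class ConsumerInterceptor:
    """Records what was consumed so that gaps can be detected outside business code."""

    def __init__(self):
        self.consumed_ids = []
        self._seen_seqs = set()

    def on_consume(self, message):
        self.consumed_ids.append(message["msg_id"])
        self._seen_seqs.add(message["seq"])
        return message["payload"]   # business code only ever sees the payload

    def missing_seqs(self):
        """Sequence numbers below the highest one seen that never arrived."""
        if not self._seen_seqs:
            return []
        highest = max(self._seen_seqs)
        return [n for n in range(1, highest + 1) if n not in self._seen_seqs]


def reconcile(sent_ids, consumed_ids):
    """Offline check for the unique-ID scheme: IDs that were sent but never consumed."""
    return sorted(set(sent_ids) - set(consumed_ids))
```

Note that the continuity check can only find gaps below the highest sequence number received, so a lost trailing message shows up only through the ID reconciliation.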

Now that you know what might go wrong (the message storage phase, the message consumption phase) and have a plan for how to detect message loss, you need to come up with a design plan for how to prevent message loss.

After you answer “How do I ensure that messages are not lost?”, the interviewer will often follow up with “How do you solve the problem of messages being consumed repeatedly?”

For example, in the process of message consumption, if there is a failure, the sender will retry through the compensation mechanism, and the retry process may produce repeated messages. How to solve this problem?

In other words, the problem is how to make the consumption side idempotent (idempotency means that executing a command multiple times has the same effect as executing it once). Once the consumption side is idempotent, the problem of repeated consumption of messages is solved.

Again, let’s look at the Jingdou deduction example: deduct 100 Jingdou from account X. In this example, we can make the business logic idempotent by restructuring it.


The simplest implementation is to create a message log table in the database with two fields: message ID and message execution status. Our message consumption logic then becomes: add a message record to the message log table, and then asynchronously update the user’s Jingdou balance based on that record.

Because each insertion first checks whether the message already exists, the same message is never executed more than once, which makes the operation idempotent. This idea is not limited to relational databases: Redis can take the place of the database to implement the unique constraint.
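
A minimal sketch of the message log table idea, using SQLite from the Python standard library. The table names, column names, and the `handle_deduction` function are assumptions made for this example. As a simplification, the balance update runs in the same transaction as the log insert rather than asynchronously, and the primary key on the message ID plays the role of the existence check.

```python
import sqlite3


def init_db(conn):
    """Create the (hypothetical) message log and account tables."""
    conn.execute("CREATE TABLE message_log (msg_id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, jingdou INTEGER NOT NULL)")
    conn.execute("INSERT INTO account (name, jingdou) VALUES ('X', 500)")
    conn.commit()


def handle_deduction(conn, msg_id, account, amount):
    """Apply the deduction at most once per msg_id. Returns True if it was applied."""
    try:
        with conn:  # one transaction: commit on success, roll back on any error
            conn.execute(
                "INSERT INTO message_log (msg_id, status) VALUES (?, 'DONE')", (msg_id,)
            )
            conn.execute(
                "UPDATE account SET jingdou = jingdou - ? WHERE name = ?", (amount, account)
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejected the insert: this message was already processed.
        return False
```

Delivering the same message twice then changes the balance only once, because the second insert violates the primary key and the whole transaction is rolled back.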

Let me add here that one of the prerequisites for solving the problem of “message loss” and “message repeated consumption” is to implement a technical solution for globally unique ID generation. This is one of the questions interviewers like to ask, and one you need to master.

In distributed systems, globally unique IDs can be generated with database auto-increment primary keys, UUIDs, Redis, or Twitter’s Snowflake algorithm. I summarized the characteristics of these schemes for your reference.

[Figure: characteristics of the globally unique ID generation schemes]

Note that whichever scheme you choose, you cannot have simplicity, high availability, and high performance all at once. There is always a trade-off, so you need to ground your explanation in the actual business and say how you balanced these factors in your selection. Personally, I prefer the Snowflake algorithm, and I have modified it in projects, mainly to make the ID generation rules fit the business characteristics better and to handle problems such as clock rollback.
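
For reference, here is a minimal sketch of a Snowflake-style generator. It is not the author’s modified version. The bit layout (millisecond timestamp, 10-bit worker ID, 12-bit sequence) follows the commonly described Snowflake layout, while the custom epoch, the class name, and the injectable `clock` parameter are assumptions for illustration. Clock rollback is handled in the simplest possible way, by refusing to generate IDs.

```python
import threading
import time


class SnowflakeGenerator:
    EPOCH_MS = 1_577_836_800_000   # 2020-01-01T00:00:00Z, an arbitrary custom epoch
    WORKER_BITS = 10
    SEQ_BITS = 12
    MAX_SEQ = (1 << SEQ_BITS) - 1

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock          # injectable so that rollback can be simulated
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Simplest rollback policy: fail loudly rather than risk duplicates.
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & self.MAX_SEQ
                if self._seq == 0:
                    # Sequence exhausted in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            timestamp_part = (now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQ_BITS)
            return timestamp_part | (self.worker_id << self.SEQ_BITS) | self._seq
```

IDs from one generator are unique and increase over time, and IDs from different workers differ in the worker bits. A production version would need a more forgiving rollback strategy and a reliable way to assign worker IDs.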

Besides “How do you solve the problem of messages being consumed repeatedly?”, the interviewer will also ask about message backlog. The reason is that a message backlog reflects a performance problem, and resolving it shows that the candidate can handle consumption in high-concurrency scenarios.

If there is a backlog, it must be a performance problem. In order to solve the performance problem from production to consumption, you need to know where the backlog is likely to occur, and then consider how to solve it.

Because a backlog forms after messages have been sent, it has nothing to do with the producer side. And because most message queues can handle tens of thousands of messages per second on a single node, the storage layer of the middleware is rarely the bottleneck compared with the business logic. The problem therefore almost certainly lies in the consumption stage. So how do you answer from the consumer’s perspective?

If the problem appears suddenly in production, temporarily scale out by adding consumer instances and, at the same time, degrade some non-core services. Absorbing the traffic through scale-out and degradation demonstrates your ability to handle emergencies.

Secondly, it is necessary to troubleshoot and solve abnormal problems, such as analyzing whether the business logic code of the consumer side has problems through monitoring, logging and other means, and optimizing the business processing logic of the consumer side.

Finally, if the consumer side simply lacks processing capacity, you can scale it out horizontally to increase concurrent processing capacity. One point commonly tested here deserves special attention: when you increase the number of consumer instances, you must also increase the number of partitions of the Topic so that the number of consumer instances matches the number of partitions. If the number of consumer instances exceeds the number of partitions, the extra instances have no effect, because each partition is consumed by a single thread.

For example, in Kafka a Topic can be configured with multiple Partitions, and data is written across them. However, Kafka stipulates that, within a consumer group, a Partition can be consumed by only one consumer. Consumer processing capacity can therefore be increased by adding Partitions together with consumers.
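
The effect of the one-consumer-per-partition rule can be seen in a toy assignment function. `assign_partitions` is a hypothetical helper that uses plain round-robin; it is not Kafka’s actual assignor, and it only illustrates why consumers beyond the partition count sit idle.

```python
def assign_partitions(num_partitions, consumers):
    """Give each partition to exactly one consumer in the group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def idle_consumers(assignment):
    """Consumers that received no partition and therefore do no work."""
    return [consumer for consumer, parts in assignment.items() if not parts]
```

With 3 partitions and 5 consumers, two consumers receive nothing. Raising the partition count to 5 puts every consumer to work.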

Conclusion

So far, we’ve covered solutions to the hot issues of MQ message queuing. Whether you’re a beginner or an advanced developer, this article should give you what you need to hold a solid conversation with your interviewer. Let me summarize today’s highlights.

How do I ensure that messages are not lost? You need to know, for every stage of a message’s journey from sending to consumption, whether messages can be lost there, how to monitor for lost messages, and finally how to solve the problem, following the approach of “reliable message delivery in MQ”.
How do I ensure that messages are not consumed repeatedly? Message compensation inevitably produces duplicate messages, so the focus of this question is how to make the consumer side idempotent.
How do I deal with message backlog? To achieve real high performance with MQ, first resolve the online incident, then use monitoring and logs to troubleshoot and optimize the business logic, and finally scale out the number of consumers and partitions.

When answering questions, it’s important to give the interviewer a sense of your thought process. This kind of problem-solving ability is much more valuable to the interviewer than your direct answer to an interview question.

In addition, if you are applying for a position in the infrastructure department, you will need to master other knowledge systems of message-oriented middleware, such as:

How do I choose messaging middleware?
What is the difference between the queue model and the publish-subscribe model in messaging middleware?
Why can message queues achieve high throughput?
Serialization, transport protocols, and memory management
…

Source: blog.csdn.net/gu131007416553article/details/120934738

