Preface

How can we ensure that Kafka messages are not consumed twice? To make a program robust, we usually configure a retry count when using Kafka. However, because of network problems, retries can cause some messages to be sent more than once (duplicates can of course have other causes as well). So how do we solve the problem of duplicate messages?

Here are three solutions to this problem for your reference.

Solutions

Plan 1 / Save and query

Assign a unique key to each message and record the key when the message is consumed. Each time a new message arrives, query the recorded keys to check whether the current message has already been consumed. (This approach is sound in principle, but not easy to implement.)
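The save-and-query idea can be sketched as follows. This is a minimal illustration, not a production design: the in-memory SQLite database stands in for whatever durable store the consumers share, and the names `make_store`, `consume_once`, and the `consumed` table are hypothetical.

```python
import sqlite3


def make_store():
    # Assumption: an in-memory SQLite database stands in for a durable,
    # shared store of consumed message keys.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE consumed (msg_key TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def consume_once(conn, msg_key, handler):
    """Run handler only if msg_key has not been recorded as consumed."""
    seen = conn.execute(
        "SELECT 1 FROM consumed WHERE msg_key = ?", (msg_key,)
    ).fetchone()
    if seen:
        return False  # duplicate message: skip it
    handler()
    # Record the key after the message has been handled.
    conn.execute("INSERT INTO consumed (msg_key) VALUES (?)", (msg_key,))
    conn.commit()
    return True
```

Note that in this sketch the query, the handler, and the insert are three separate steps. If two consumers receive the same message at the same moment, or the process crashes between the handler and the insert, the message can still be handled twice. Closing that gap is one reason this plan is harder to implement than it looks.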

Plan 2 / Use idempotence

In mathematics, a function f is idempotent if f(f(x)) = f(x).

The concept has been extended to computing to describe an operation, method, or service. An idempotent operation has the same effect whether it is executed once or any number of times. An idempotent method, called multiple times with the same arguments, affects the system exactly as a single call would. For idempotent methods, therefore, there is no need to worry about repeated execution changing the system.

Let’s make this concrete. Ignoring concurrency, take the operation “set teacher X’s account balance to 1 million yuan”. After one execution, teacher X’s account balance is 1 million yuan. As long as the parameter of 1 million yuan stays the same, the balance remains 1 million yuan no matter how many times the operation is executed. This operation is idempotent.

Now take the operation “add 1 million yuan to teacher X’s account balance”. This operation is not idempotent: each time it is executed, the account balance increases by another 1 million yuan, so executing it several times affects the system (the account balance) differently from executing it once.
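The contrast between setting a balance and adding to a balance can be sketched in a few lines. The function names and the in-memory `accounts` dictionary are hypothetical and exist only for illustration.

```python
def set_balance(accounts, account_id, amount):
    """Idempotent: repeating the call leaves the same balance."""
    accounts[account_id] = amount


def add_to_balance(accounts, account_id, amount):
    """Not idempotent: every call increases the balance again."""
    accounts[account_id] += amount


def replay(operation, times):
    """Apply operation to a fresh account the given number of times."""
    accounts = {"teacher_x": 0}
    for _ in range(times):
        operation(accounts, "teacher_x", 1_000_000)
    return accounts["teacher_x"]
```

Here `replay(set_balance, 3)` returns the same balance as `replay(set_balance, 1)`, while `replay(add_to_balance, 3)` returns three times the balance of `replay(add_to_balance, 1)`.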

From these two examples we can see that if the business logic that consumes messages is idempotent, there is no need to worry about duplicate messages, because consuming the same message once has exactly the same effect on the system as consuming it many times. In other words, consuming a message several times is equivalent to consuming it once.

So how do you implement idempotent operations? The best approach is to make the consumption logic idempotent from the start, at the business logic design stage. However, not every business operation is naturally idempotent, and some methods and tricks are needed to make it so.

A common approach is to implement idempotence using a database’s unique constraint.

Take the non-idempotent transfer example mentioned earlier: add 1 million yuan to teacher X’s account balance. We can transform this business logic to make it idempotent.

First, we can limit each transfer bill to changing each account only once. In a distributed system, this limit can be implemented in many ways. The simplest is to create a transfer record table in the database with three fields: transfer bill ID, account ID, and change amount. Then create a unique constraint on the combination of transfer bill ID and account ID, so that the table can hold at most one record for the same transfer bill ID and account ID.

Our logic for consuming a message then becomes: “Add a transfer record to the transfer record table, and then asynchronously update the user’s balance based on that record.” Because the unique constraint on transfer bill ID and account ID is defined on the table in advance, only one record can be inserted per account for the same transfer bill. Subsequent duplicate inserts fail, which makes the operation idempotent.
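The unique-constraint approach can be sketched with SQLite as follows. The table and column names (`transfer_record`, `bill_id`, `account_id`, `amount`) and the function `apply_transfer` are hypothetical. As a simplification, this sketch updates the balance synchronously in the same transaction as the insert, rather than asynchronously as described above.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # At most one record per (transfer bill ID, account ID) pair.
    conn.execute(
        """CREATE TABLE transfer_record (
               bill_id    TEXT NOT NULL,
               account_id TEXT NOT NULL,
               amount     INTEGER NOT NULL,
               UNIQUE (bill_id, account_id))"""
    )
    conn.execute(
        "CREATE TABLE account (account_id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO account VALUES ('teacher_x', 0)")
    conn.commit()
    return conn


def apply_transfer(conn, bill_id, account_id, amount):
    """Apply a transfer once; a duplicate message is rejected by the constraint."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO transfer_record VALUES (?, ?, ?)",
                (bill_id, account_id, amount),
            )
            # Simplification: the balance is updated in the same transaction.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the same bill was already applied to this account
```

Calling `apply_transfer(conn, "bill-1", "teacher_x", 1_000_000)` twice changes the balance only once: the second insert violates the unique constraint, the transaction is rolled back, and the function returns `False`.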

Plan 3 / Set preconditions

Another way to implement idempotent changes is to set a precondition for the data change. If the precondition is met, the data is updated; otherwise the update is skipped. The update must also change the data that the precondition checks.

That way, the first update changes the data that the precondition checks, so when the same message is consumed again the precondition is no longer met and the update is not repeated.

For example, we said earlier that the operation “increase the balance of teacher X’s account by 1 million yuan” is not idempotent. If we add a precondition and change it to “if the current balance of teacher X’s account is 5 million yuan, increase the balance by 1 million yuan”, the operation becomes idempotent.

With a message queue, the producer can include the current balance in the message body when sending the message. When consuming it, the consumer checks whether the current balance in the database equals the balance in the message, and performs the change only if the two are equal.
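A minimal sketch of this precondition check, using SQLite: the comparison is placed in the `WHERE` clause so that the check and the update happen in a single statement. The `account` table and the function name `add_if_balance_is` are hypothetical.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (account_id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO account VALUES ('teacher_x', 5000000)")
    conn.commit()
    return conn


def add_if_balance_is(conn, account_id, expected_balance, amount):
    """Add amount only if the stored balance equals the balance in the message."""
    with conn:
        cur = conn.execute(
            "UPDATE account SET balance = balance + ? "
            "WHERE account_id = ? AND balance = ?",
            (amount, account_id, expected_balance),
        )
    # rowcount is 1 if the precondition held and the row was updated, else 0.
    return cur.rowcount == 1
```

The first call with an expected balance of 5 million yuan succeeds and raises the balance to 6 million yuan. A duplicate of the same message then fails the precondition, because the balance is no longer 5 million yuan, and the balance stays unchanged.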

But what if the data we want to update is not a number, or the update is more complex? What should the precondition be? A more general method is to add a version number attribute to the data. Before each update, compare the version number of the current data with the version number in the message. If they differ, refuse the update; if they match, update the data and increment the version number, so that a duplicate of the same message no longer matches.
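The version-number check can be sketched in the same way. The `version` column and the function `add_if_version_matches` are hypothetical names. Incrementing the version inside the same `UPDATE` is an assumption of this sketch, following the rule that the update must change the data that the precondition checks.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE account (
               account_id TEXT PRIMARY KEY,
               balance    INTEGER NOT NULL,
               version    INTEGER NOT NULL)"""
    )
    conn.execute("INSERT INTO account VALUES ('teacher_x', 5000000, 1)")
    conn.commit()
    return conn


def add_if_version_matches(conn, account_id, amount, msg_version):
    """Apply the change only if the stored version equals the message's version."""
    with conn:
        cur = conn.execute(
            # The version is bumped in the same statement as the data change,
            # so a duplicate message carrying the old version no longer matches.
            "UPDATE account SET balance = balance + ?, version = version + 1 "
            "WHERE account_id = ? AND version = ?",
            (amount, account_id, msg_version),
        )
    return cur.rowcount == 1
```

A message carrying version 1 is applied once and moves the row to version 2. Replaying that message is refused, while a later message carrying version 2 is accepted.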

Conclusion

Today we presented several solutions to the problem of duplicate messages, drawing on ideas from the “message queue master class”. If you have a good solution of your own, you are welcome to discuss it!