Interviewer: Today I'd like to ask: can Kafka lose messages?

Candidate: Well, when using Kafka, it is possible to lose messages in the following scenarios

Candidate: For example, messages can be lost while the Producer is sending them to the Broker

Candidate: If you don’t want to lose messages, use the send API with a callback

Candidate: This basically means the callback tells you whether the send succeeded. If it failed, you can retry in your business logic after receiving the callback.
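A minimal sketch of what that looks like with the Java client (the topic name, key/value, and configs here are illustrative assumptions, not the candidate's actual code):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Send with a callback: the callback reports success or failure for each record.
public class CallbackProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");   // wait for all in-sync replicas to acknowledge
        props.put("retries", 3);    // retry transient send failures automatically

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("order-topic", "order-123", "paid"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            // send failed: hand off to business-level retry/compensation
                            System.err.println("send failed: " + exception.getMessage());
                        } else {
                            System.out.printf("sent to %s-%d@%d%n",
                                    metadata.topic(), metadata.partition(), metadata.offset());
                        }
                    });
        }
    }
}
```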

Candidate: Messages can also be lost after they reach the Broker

Candidate: In a clustered production environment, a Broker can crash right after a message is sent to it. If the data hasn’t yet been replicated to the other Brokers, it is lost

Candidate: Even reaching the Broker doesn’t guarantee the data is safe: the Broker writes data to the operating system’s page cache before persisting it to disk

Candidate: This is the asynchronous flush, and it can lose data if the machine goes down before the flush completes

Candidate: So I’ve covered three scenarios: Producer -> Broker, Broker -> Broker replication, and Broker -> disk

Candidate: Solving the problems above is relatively simple; there isn’t much to say here…

Candidate: If you don’t want to lose data, use the API with a callback and set parameters such as acks, retries, and the replication factor to ensure that messages sent by the Producer are not lost.
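The producer-side acks and retries settings appear in the sketch above. On the topic/Broker side, a hedged sketch of the replication-related settings might look like this (topic name, partition count, and the concrete values are assumptions):

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

// Illustrative topic setup: replication factor 3 with min.insync.replicas=2 means
// acks=all waits for at least 2 replicas, so a single Broker crash loses nothing.
public class TopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("order-topic", 3, (short) 3)   // 3 partitions, RF 3
                    .configs(Map.of(
                            "min.insync.replicas", "2",                  // writes need 2 in-sync replicas
                            "unclean.leader.election.enable", "false")); // never elect a stale leader
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```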

Interviewer: Hmm…

Candidate: Generally speaking, message loss is more common on the consumer side, when clients consume from the Broker

Interviewer: How do you ensure the reliability of your data when you consume it?

Candidate: First, for the client to consume data without losing it, you can’t use autoCommit; offsets must be committed manually.

Candidate: Here’s how it works on our side (a code sketch follows the list):

1. Pull messages from Kafka (batch pull, e.g. 500 at a time, depending on configuration)

2. Assign an incremental msgId to each pulled message

3. Store the msgId in an in-memory sorted set (sortSet)

4. Use a Map to store the mapping between msgId and msg (which carries the offset)

5. After a message has been processed, delete its msgId from the sortSet

6. If the current msgId <= the first (smallest) msgId in the sortSet, commit the current offset

7. Even if the system crashes, on restart consumption resumes from the offset of the message at the head of the sortSet, giving at-least-once processing semantics

8. There will be a small amount of message duplication, but as long as the downstream is idempotent, that’s fine
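A minimal, single-threaded sketch of this scheme, assuming a ConcurrentSkipListSet as the sortSet and illustrative topic/group names (in the real setup, processing is presumably concurrent across threads, which is what makes the head-of-sortSet check meaningful):

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// Single-threaded sketch of the msgId/sortSet scheme described above.
public class SortSetConsumer {
    private static final AtomicLong ID_GEN = new AtomicLong();                                   // step 2
    private static final ConcurrentSkipListSet<Long> SORT_SET = new ConcurrentSkipListSet<>();   // step 3
    private static final Map<Long, ConsumerRecord<String, String>> MSG_MAP = new ConcurrentHashMap<>(); // step 4

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-group");       // illustrative group
        props.put("enable.auto.commit", "false");   // no autoCommit: commit manually
        props.put("max.poll.records", "500");       // step 1: batch pull 500
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("order-topic"));   // illustrative topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    long msgId = ID_GEN.incrementAndGet();
                    SORT_SET.add(msgId);
                    MSG_MAP.put(msgId, record);
                    process(record);                      // must be idempotent (step 8)
                    SORT_SET.remove(msgId);               // step 5: done, drop from sortSet
                    ConsumerRecord<String, String> done = MSG_MAP.remove(msgId);
                    // step 6: only commit if nothing older is still in flight
                    if (SORT_SET.isEmpty() || msgId <= SORT_SET.first()) {
                        consumer.commitSync(Map.of(
                                new TopicPartition(done.topic(), done.partition()),
                                new OffsetAndMetadata(done.offset() + 1)));
                    }
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // business logic goes here
    }
}
```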

Interviewer: Well, you mentioned idempotence. How does your business achieve idempotence?

Candidate: Well, let’s take processing order messages for example.

Candidate: Our idempotency Key consists of the order number + order status (each order status is processed only once)

Candidate: Before processing, we first check Redis to see whether the Key exists. If it does, we’ve already processed this message, so we just discard it

Candidate: If the Key isn’t in Redis, we proceed: the final step of the logic inserts the processed data into the business DB and, last of all, writes the idempotency Key into Redis

Candidate: Obviously, Redis alone can’t guarantee idempotency (:

Candidate: So Redis is really just a “pre-check”, and the final idempotency is guaranteed by the database’s unique key (which is, again, order number + status)

Candidate: In general, idempotency is achieved through a Redis pre-check, with the DB’s unique index as the final guarantee
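A hedged sketch of that flow (the table name, columns, Redis key format, and TTL are all assumptions for illustration):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;
import redis.clients.jedis.Jedis;

// Redis pre-check + DB unique index as the final idempotency guarantee.
public class IdempotentOrderHandler {
    // Returns true if the message was processed, false if it was a duplicate.
    public boolean handle(Connection conn, Jedis jedis, String orderNo, String orderStatus)
            throws SQLException {
        String key = "order:" + orderNo + ":" + orderStatus; // idempotency Key: orderNo + status
        if (jedis.exists(key)) {
            return false;                                    // already processed: discard
        }
        String sql = "INSERT INTO order_status_log (order_no, order_status) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, orderNo);
            ps.setString(2, orderStatus);
            ps.executeUpdate();   // unique index on (order_no, order_status): final guarantee
        } catch (SQLIntegrityConstraintViolationException dup) {
            return false;         // concurrent duplicate stopped by the unique index
        }
        jedis.setex(key, 86_400, "1"); // mark as processed in Redis (1-day TTL, illustrative)
        return true;
    }
}
```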

Interviewer: Have you run into any ordered-consumption problems?

Candidate: Well, yes. Let me give you an example

Candidate: An order has statuses such as paid, receipt confirmed, completed, and so on, plus billing and refund messages under that order

Candidate: In theory, the payment message must come before the refund message, but the order in which they are actually processed isn’t guaranteed

Candidate: So there is also a consumption order issue here

Candidate: But our ad scenario doesn’t need “strong ordering”; it’s enough to guarantee eventual consistency.

Candidate: So our implementation for handling out-of-order messages looks like this:

Candidate: 1. Wide table: split each order status into one or more independent fields, and just update the corresponding field when a message arrives. There may be a transient state inconsistency, but the state is eventually consistent (see the sketch after the next point)

Candidate: 2. Message compensation mechanism: a separate consumer consumes the same topic, persists the messages to disk, and processes them with a delay. Each message is compared against the DB, and if inconsistent data is found, the message is re-sent to the main flow for processing
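As referenced in point 1, a minimal sketch of the wide-table update (table and column names are illustrative assumptions):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

// Wide-table update: each status writes only its own column, so arrival order
// doesn't matter; the row converges once all messages have landed.
public class WideTableUpdater {
    public void onRefundMessage(Connection conn, String orderNo, Timestamp refundTime)
            throws SQLException {
        String sql = "UPDATE order_wide SET refund_time = ? WHERE order_no = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setTimestamp(1, refundTime);
            ps.setString(2, orderNo);
            ps.executeUpdate();  // a late payment message later fills pay_time the same way
        }
    }
}
```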

Candidate: There are also scenarios where we just need to send messages with the same userId/orderId to the same partition (since a partition is consumed by only one Consumer), which solves most consumption-ordering problems.
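A sketch of that routing: Kafka's default partitioner hashes the record key, so using orderId as the key keeps all of an order's messages on one partition (names here are illustrative):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Keying by orderId: same key -> same partition -> per-order ordering preserved.
public class KeyedSend {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            String orderId = "order-123";  // illustrative
            producer.send(new ProducerRecord<>("order-topic", orderId, "refund"));
        }
    }
}
```

The trade-off is that a hot key concentrates load on a single partition, and changing the partition count changes the key-to-partition mapping.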

Interviewer: Hmm… understood.

Welcome to follow my WeChat official account [Java3y] to talk about Java interviews

The [Online Interviewer – mobile] series: two new posts every week!

The [Online Interviewer – desktop] series: two new posts every week!

Original content isn’t easy to write!! Please like, share, and follow!!