Consumption

Messages are produced by producers and consumed by consumers. Kafka uses a traditional model for message delivery:

Producers push messages to brokers (Kafka instances), and consumers pull messages from brokers.

Consumers pull messages from the Broker rather than having the Broker push them, because push would make the Broker responsible for the delivery rate: a Broker may have to push to many consumers, and if messages are produced faster than they can be consumed, consumers fall behind and cannot keep up. Push also forces the Broker to choose when to push: pushing each message as soon as it arrives from the producer sends many tiny batches and is inefficient, while buffering messages and pushing them in batches adds latency. By having consumers pull messages from the Broker, Kafka leaves the consumption rate entirely up to the consumer. The trade-off is that a consumer may pull when there are no messages; in that case the consumer blocks until new messages are produced and can be consumed.
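A minimal sketch of the pull model from the consumer side, assuming a topic named `demo-topic`, a group `demo-group`, and a local broker at `localhost:9092` (all illustrative names); `poll()` simply waits up to the given timeout when no messages are available:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PullConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "demo-group");              // assumed consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                // The consumer decides when and how much to pull; poll() blocks for up to
                // 1 second if no records are currently available on the broker.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```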

Message delivery

By default, Kafka provides at-least-once delivery. At-most-once can be achieved by disabling producer retries and committing the offset as soon as messages are received, before they are processed. Exactly-once, however, requires a transactional producer/consumer. In addition, when interacting with an external system (a database, etc.), an exactly-once effect can be achieved by writing the data and the offset together atomically: either both the data and the offset are written, or neither is.
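As a hedged illustration of the exactly-once path, here is a sketch of a transactional producer; the topic name, broker address, and `transactional.id` are assumptions, and in a real consume-transform-produce pipeline the consumed offsets would also be committed inside the transaction via `sendOffsetsToTransaction()` so data and offsets commit together:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;

public class TransactionalProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");       // assumed broker address
        props.put("transactional.id", "demo-transactional-id");  // enables transactions (and idempotence)
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
            // In a consume-transform-produce loop, consumed offsets would be added here with
            // producer.sendOffsetsToTransaction(...), so data and offsets commit atomically.
            producer.commitTransaction();
        } catch (KafkaException e) {
            producer.abortTransaction(); // either everything in the transaction becomes visible, or nothing does
        } finally {
            producer.close();
        }
    }
}
```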

How does Kafka ensure that messages are not lost

Message loss can occur when the producer pushes messages to the Broker, within Kafka itself, or on the consumer side. Here is an overview of how to avoid message loss in each phase.

  1. The producer pushes the message to the Broker (Server) phase

When a producer pushes a message to the Broker, the producer may, for example because of a network failure, never receive a response indicating whether the message was sent successfully. In this case the producer can resend the message (retry) to ensure that it is eventually pushed to the Broker, as in the sketch below.
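A sketch of the producer side, with the topic name and broker address assumed as before: retries let the producer resend on transient errors, enabling idempotence keeps those resends from creating duplicates, and the send callback surfaces any failure left after the retries are exhausted:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReliableProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("retries", Integer.MAX_VALUE);          // resend on transient failures
        props.put("enable.idempotence", "true");          // retries do not create duplicates
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"), (metadata, exception) -> {
                if (exception != null) {
                    // Retries exhausted: log, alert, or persist the record for later replay.
                    exception.printStackTrace();
                } else {
                    System.out.printf("acked at partition=%d offset=%d%n", metadata.partition(), metadata.offset());
                }
            });
        }
    }
}
```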

  2. Kafka itself

If only one Broker exists, messages are lost if that Broker fails. Therefore, the replication.factor setting controls the number of replicas, so that follower replicas continuously copy messages from the leader.

We all know that Kafka's topics are divided into multiple partitions, and that each partition is replicated to a configurable number of Brokers; one replica acts as the leader of the partition and the rest act as followers, and only the leader accepts producer pushes and consumer pulls. At any moment, the leader may therefore hold messages that the followers have not yet replicated, and if the leader fails, those unreplicated messages are lost. To address this, we can set acks=all. With the default acks=1, a message is considered successful as soon as the leader has written it; with acks=all, it is considered successful only after all in-sync replicas (ISR) have written it. This mode costs throughput because of the extra replication work.

By setting the min.insync.replicas parameter, we can require a message to be replicated by at least N replicas before it is considered successfully committed, avoiding the situation where a write to the leader alone counts as success. This parameter only takes effect when acks is set to -1 (all). Note that if min.insync.replicas is greater than the number of in-sync replicas, Kafka rejects the write.
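A sketch of creating a topic with these durability settings via the AdminClient; the topic name, the choice of 3 replicas, and min.insync.replicas=2 are illustrative, and they are meant to be combined with acks=all on the producer:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class DurableTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 replicas per partition; a write only counts as committed once at least
            // 2 in-sync replicas (leader + 1 follower) have it, together with acks=all.
            NewTopic topic = new NewTopic("demo-topic", 3, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```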

In the event that all in-sync replicas fail, Kafka needs to elect a new leader, and there are two options. The first is to wait for an ISR replica to recover; such a replica very likely holds all committed data, but the partition remains unavailable until it comes back. The second is to allow a replica that was not in the ISR to become the new leader; that replica may be missing data that was never replicated to it, so data can be lost. This behavior is controlled by the unclean.leader.election.enable property (enabled by default in older Kafka versions, disabled by default in newer versions, favoring consistency over availability).
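A sketch of setting this policy per topic through the AdminClient; the topic name is assumed, and the same property also exists as a broker-level default:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class UncleanElectionConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "demo-topic");
            // false = wait for an ISR replica (consistency); true = allow an out-of-sync
            // replica to become leader (availability, at the risk of data loss).
            AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("unclean.leader.election.enable", "false"),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Collections.singletonMap(topic, Collections.singleton(op)))
                 .all().get();
        }
    }
}
```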

  3. Consumers Pull messages from the Broker phase

By default, the consumer automatically commits the offset after pulling messages and before it has finished consuming them. If the consumer fails after committing the offset but before processing the messages, the new consumer instance (or the recovered consumer) will resume from the committed offset, and the messages that were pulled but never processed are lost. Therefore, the offset should be committed manually, and only after the messages have been successfully consumed, which ensures that no message is lost. A sketch of this pattern follows.
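A sketch of the manual-commit pattern, reusing the assumed topic, group, and broker names from above: auto-commit is disabled and the offset is committed only after the pulled records have actually been processed:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualCommitConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "demo-group");              // assumed consumer group
        props.put("enable.auto.commit", "false");         // do not commit offsets automatically
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // business logic; if this throws, the offset is never committed
                }
                // Commit only after the whole batch was processed successfully, so a crash
                // before this point means the batch is re-delivered rather than lost.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("processing offset=%d value=%s%n", record.offset(), record.value());
    }
}
```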