I received feedback from a colleague in one of our business teams. The problem they described was as follows:

One Kafka consumer group in the project consumes messages particularly slowly, and the Kafka-Manager console sometimes shows that some consumers have been kicked out of the consumer group.

The following information is displayed in the server log:

The consumer group was rebalanced over 600 times in a short period of time.

According to CAT, processing each message involves four database interactions. After some discussion with the team, we found that processing each message takes more than 200 ms.

Kafka rebalances a consumer group in the following situations:

  1. The membership of the consumer group changes: consumers join, leave, or crash;
  2. The number of topics subscribed to by the consumer group changes;
  3. The number of partitions subscribed to by the consumer group changes.

Since neither situation 2 nor situation 3 applied in this case, the rebalancing must have been caused by changes in the consumer group membership.

[Figure: Kafka client log]

The log tells us that the consumer was kicked out because too much time passed between its calls to poll(). It mentions two parameters, max.poll.interval.ms and max.poll.records, and the timeout also caused an offset commit to fail.

max.poll.interval.ms is the maximum time a consumer may spend processing messages between two calls to poll(). Some services need a long time to process messages, for example one minute; in that case, set this parameter to a value greater than one minute. The default value is 300000 ms (5 minutes).

max.poll.records is the maximum number of messages returned by a single call to poll(). The default value is 500.
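As a small illustration, the two settings and their default values can be written down as a plain mapping. This is only a sketch: the variable names are made up for the example, and how the settings are actually passed depends on the client library (the Java client accepts these key names in its Properties, other clients may spell them differently).

```python
# The two consumer settings discussed above, with the default values
# quoted in the text. A plain dict is used here purely for illustration.
consumer_config = {
    "max.poll.interval.ms": 300_000,  # max gap allowed between two poll() calls
    "max.poll.records": 500,          # max messages returned by one poll()
}

# Convert the interval to minutes to make the default easier to read.
interval_minutes = consumer_config["max.poll.interval.ms"] / 60_000
print(f"poll() must be called again within {interval_minutes:g} minutes")
```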

Let’s calculate:

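200 ms * 500 = 100000 ms < max.poll.interval.ms = 300000 ms

As I mentioned earlier, 200 ms is only a lower bound: the processing time of each message is very likely to exceed it. Once the average reaches 600 ms per message, a full batch of 500 messages takes 300000 ms and hits the limit.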
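The calculation above can be sketched in a few lines of Python. The function names are hypothetical, the model assumes messages are processed one by one with no other overhead, and the 700 ms case is an invented example of a slow consumer, not a measured value.

```python
# Back-of-the-envelope check: can one poll() batch be processed within
# max.poll.interval.ms? All times are in milliseconds.

MAX_POLL_INTERVAL_MS = 300_000  # default quoted in the text
MAX_POLL_RECORDS = 500          # default quoted in the text


def batch_time_ms(per_message_ms: float, records: int = MAX_POLL_RECORDS) -> float:
    """Time to process one full batch when messages are handled sequentially."""
    return per_message_ms * records


def break_even_per_message_ms(
    interval_ms: int = MAX_POLL_INTERVAL_MS, records: int = MAX_POLL_RECORDS
) -> float:
    """Per-message latency at which a full batch takes exactly the interval."""
    return interval_ms / records


if __name__ == "__main__":
    # 200 ms is the figure from the text; 700 ms is a hypothetical slow case.
    for latency in (200, 700):
        total = batch_time_ms(latency)
        verdict = "ok" if total < MAX_POLL_INTERVAL_MS else "exceeds interval"
        print(f"{latency} ms/message -> {total:.0f} ms per batch ({verdict})")
    print(f"break-even latency: {break_even_per_message_ms():.0f} ms/message")
```

With the defaults, the break-even point is 600 ms per message, so a consumer that averages only three times the measured 200 ms already risks being kicked out.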

Conclusion:

The root cause this time is that the client's message consumption logic takes too long. When producers send a lot of messages, the consumer pulls up to 500 messages per poll(), so processing one batch can easily take too long. If it takes longer than max.poll.interval.ms, the consumer is kicked out of the group and a rebalance is triggered. While the rebalance is in progress, the consumer group cannot consume messages, a state similar to a stop-the-world pause, and the evicted consumer cannot commit its offsets. The uncommitted messages are then consumed again, which slows the consumer group down further and causes messages to accumulate.

Solutions:

Tune max.poll.records and max.poll.interval.ms against each other according to the business logic, so that consumers are not repeatedly kicked out of the consumer group and the resulting rebalances are avoided.
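One possible way to pick a starting value is to derive max.poll.records from the worst per-message latency you have measured. The helper below is a minimal sketch under stated assumptions: the function name and the safety_factor parameter are hypothetical, and it assumes messages are processed sequentially.

```python
import math


def safe_max_poll_records(
    worst_case_ms_per_message: float,
    max_poll_interval_ms: int = 300_000,
    safety_factor: float = 0.5,
) -> int:
    """Largest batch size whose worst-case processing time stays within
    safety_factor * max_poll_interval_ms. Always returns at least 1."""
    if worst_case_ms_per_message <= 0:
        raise ValueError("worst_case_ms_per_message must be positive")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    # Only part of the interval is budgeted, leaving headroom for spikes.
    budget_ms = max_poll_interval_ms * safety_factor
    return max(1, math.floor(budget_ms / worst_case_ms_per_message))
```

For example, with a hypothetical worst case of 1000 ms per message and the default interval, this suggests lowering max.poll.records from 500 to 150. If a smaller batch is not acceptable, the other option is to raise max.poll.interval.ms instead.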


Feel free to share. If you reprint this article, please credit the source.