This is the 19th day of my participation in the Gwen Challenge in November. Check out the details: The last Gwen Challenge in 2021

Related: Consumer groups in Kafka

Consumer displacement

The previous article introduced Consumer groups for Kafka, where each consumer instance is assigned to a number of topic partitions and is responsible for consuming messages from those partitions. During consumption, the message “which one is consumed so far” in each partition needs to be recorded, which is the consumer shift in Kafka. Where is this displacement information stored?

Storage of consumer displacement

In the original version, this information was stored in ZooKeeper. When submitting a consumer shift (either manually or automatically), this information was submitted to ZooKeeper for storage. If the consumer restarted, it would automatically read the saved shift information from ZooKeeper. So we know where we went last time.

However, the consumer commits a shift every time a message is successfully consumed, and ZooKeeper is not suited for this kind of high frequency write operation. In later versions, Kafka updated the shift management mechanism. Kafka stores the shift information as plain Kafka messages in a particular Topic called __consumer_offsets, or shift topics, so that high frequency reads and writes are not a problem.

Note that while there is nothing special about this Topic compared to other topics, it is strongly recommended that you do not use this Topic and simply use Kafka’s API to commit shifts.

This theme is created when the first consumer starts. Kafka automatically creates this theme. By default, this theme is a 50 partition 3 copy of the theme. You can use the Broker the parameter offsets. Topic. Num. Partitions and offsets. The topic. The replication. The factor parameters to modify the two default values. Alternatively, you can manually create the theme and specify the number of partitions and copies using the API provided by Kafka before the first consumer program starts.

Shift the content of the topic

In a shift topic, the format of the message is defined by Kafka. Therefore, this is why it is strongly recommended not to manually manipulate the topic or send messages to it. If you send a message that does not conform to Kafka’s specified format, Kafka will not be able to parse it, resulting in a Broker crash.

The message in the displacement Topic can be simply understood as “what is the consumption displacement of a certain consumer Group in a certain Topic Partition”, or as a key-value pair, in which the Key is Group ID + Topic + Partition, and the Value is the displacement. The details are not discussed here.

Compact compaction strategy

You might consider a problem, a consumer group in a theme of a partition displacement data, with the consumption change constantly, then there will be save the displacement of message being sent to the displacement of the theme, but in fact, the same displacement of consumer groups in the same theme of the same partition, only keep the latest one is ok, that is to say, Just keep the latest results, there’s no need to keep them constantly changing, which saves a lot of storage.

This is what Kafka does. It uses the Compact policy to keep only the latest message for the same Key. (If you know Redis, this is very similar to Redis’s AOF rewriting mechanism).

This is done by a special background thread, the Log Cleaner, which periodically inspects the topic to be Compact to see if there is deletable data that meets the criteria.