References:

  • Simple _kafka
  • Kafka complete theory + practice
  • Kafka portal

Ps: The reference materials are already very clear, but to avoid the "looks easy when reading, falls apart when doing" trap, I am still writing my own notes.

Knowledge preview:

Basic concepts

Producer, consumer

This is easy to explain: whoever sends a message to a queue is the producer, and whoever reads a message from the queue is the consumer.

topic

Message-oriented middleware hosts a number of queues. In Kafka, each such queue is called a topic.

Multiple producers can write to the same topic, and multiple consumers can read from it.

partition

To improve the throughput of a queue, Kafka splits each topic into multiple partitions. A topic is therefore a logical concept; the messages themselves are stored in its partitions.
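To make the topic/partition relationship concrete, here is a minimal sketch of how a keyed message could be routed to a partition. The function name choose_partition and the in-memory topic list are made up for illustration, and zlib.crc32 is only a stand-in hash (Kafka's default partitioner for keyed messages uses murmur2).

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index in [0, num_partitions)."""
    # crc32 is a stand-in hash chosen for this sketch, not Kafka's own hash.
    return zlib.crc32(key) % num_partitions


# A topic modeled as a list of partitions; each partition is an append-only list.
topic = [[] for _ in range(3)]

messages = [(b"user-1", "login"), (b"user-2", "login"), (b"user-1", "logout")]
for key, value in messages:
    topic[choose_partition(key, len(topic))].append((key, value))
```

Because the partition depends only on the key, all messages with the same key land in the same partition and keep their relative order there.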

broker

A machine running Kafka is a broker, and a Kafka cluster is made up of multiple brokers.

Consumer groups

A consumer group consists of multiple consumers that together consume the partitions of the same topic. Each consumer in the group is responsible for one or more partitions, and within a group a partition is read by only one consumer. The main purpose is to improve system throughput.
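The way partitions are divided among the consumers of one group can be sketched as follows. This is a simplified round-robin assignment written for these notes, not Kafka's own code (Kafka ships several assignment strategies, such as range, round-robin and sticky); assign_partitions and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Spread the partitions of a topic over the consumers of one group."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        # Each partition gets exactly one owner inside the group.
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Four partitions shared by two consumers: each one reads two partitions.
two_consumers = assign_partitions([0, 1, 2, 3], ["c1", "c2"])

# More consumers than partitions: the extra consumer stays idle.
three_consumers = assign_partitions([0, 1], ["c1", "c2", "c3"])
```

The second call shows why adding consumers beyond the number of partitions does not raise throughput any further.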

offset

Because a topic may be subscribed to by multiple consumer groups, Kafka records an offset per consumer group for each partition, so that it knows how far each group has read.
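A small sketch of per-group offsets, assuming a toy in-memory Partition class invented for these notes. Keeping the offsets inside the partition object is a simplification; Kafka itself stores committed offsets in an internal topic (__consumer_offsets).

```python
class Partition:
    """Toy partition: an append-only log plus one read position per group."""

    def __init__(self):
        self.log = []        # messages in append order
        self.committed = {}  # group id -> offset of the next message to read

    def append(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset assigned to the new message

    def poll(self, group, max_records=10):
        """Return (start_offset, messages) from the group's committed position."""
        start = self.committed.get(group, 0)
        return start, self.log[start:start + max_records]

    def commit(self, group, next_offset):
        self.committed[group] = next_offset


partition = Partition()
for message in ["a", "b", "c"]:
    partition.append(message)

# Group "billing" reads two messages and commits its progress.
start, batch = partition.poll("billing", max_records=2)
partition.commit("billing", start + len(batch))

billing_next = partition.poll("billing")  # continues after the committed offset
audit_next = partition.poll("audit")      # a different group starts from 0
```

Each group advances independently, which is why the same messages can be consumed by several groups.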

Partition distribution

Partitions of the same topic can be distributed across different brokers. Each partition has a primary copy and, when replication is enabled, one or more backup copies. By default, producers and consumers interact only with the primary copy; the backups just replicate its data and do not serve reads or writes. To ensure high availability, a broker stores not only the partitions for which it holds the primary copy, but also backup copies of partitions whose primary lives on other brokers. When every partition is backed up on every other broker, as in the example below, each broker holds the messages of all partitions of the topic.

For example, suppose a topic has three partitions deployed on three machines: machine A hosts partition 1, machine B hosts partition 2, and machine C hosts partition 3. Each machine holds the primary copy of its own partition. To ensure high availability, machine A also holds backup copies of partitions 2 and 3, machine B holds backup copies of partitions 1 and 3, and machine C holds backup copies of partitions 1 and 2.

Ps: Strictly speaking, the primary and backup partitions should be called the leader replica and the follower replicas.
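The three-machine example above can be reproduced with a short sketch. The rotation rule in place_replicas is an assumption made for illustration and is not Kafka's actual placement algorithm; both function names are hypothetical.

```python
def place_replicas(brokers, num_partitions, replication_factor):
    """Return {partition: [leader, follower, ...]} using a simple rotation."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    layout = {}
    for p in range(1, num_partitions + 1):  # partitions numbered 1..N as in the text
        layout[p] = [
            brokers[(p - 1 + r) % len(brokers)] for r in range(replication_factor)
        ]
    return layout


def partitions_on(layout, broker):
    """All partitions (leader or follower copies) stored on one broker."""
    return sorted(p for p, replicas in layout.items() if broker in replicas)


# Replication factor 3 on three brokers: the layout from the example.
full = place_replicas(["A", "B", "C"], num_partitions=3, replication_factor=3)
leaders = {p: replicas[0] for p, replicas in full.items()}

# Replication factor 2: a broker no longer stores every partition.
partial = place_replicas(["A", "B", "C"], num_partitions=3, replication_factor=2)
```

With a replication factor of 3, partitions_on(full, "A") covers all three partitions; with 2, broker A stores only two of them, so "every broker holds the whole topic" depends on the replication factor.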

persistence

After a producer sends a message to a partition, the message must be persisted to disk to prevent data loss. Kafka avoids slow random I/O by appending to the log sequentially. Messages are first written to a cache and flushed to disk only after a certain amount of time has passed or a certain number of messages has accumulated.
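The "append sequentially, flush later" idea can be sketched with an ordinary file. SegmentWriter, its parameters, and the length-prefixed record format are all invented for this note; Kafka's real log format and flush handling are more involved.

```python
import os
import time


class SegmentWriter:
    """Append-only file writer that flushes by message count or elapsed time."""

    def __init__(self, path, flush_every_n=100, flush_every_s=1.0):
        self.file = open(path, "ab")  # append mode: every write goes to the end
        self.flush_every_n = flush_every_n
        self.flush_every_s = flush_every_s
        self.unflushed = 0
        self.last_flush = time.monotonic()

    def append(self, payload: bytes):
        # Length-prefixed record; it sits in a buffer until the next flush.
        self.file.write(len(payload).to_bytes(4, "big") + payload)
        self.unflushed += 1
        too_many = self.unflushed >= self.flush_every_n
        too_old = time.monotonic() - self.last_flush >= self.flush_every_s
        if too_many or too_old:
            self.flush()

    def flush(self):
        self.file.flush()             # hand Python's buffer to the OS
        os.fsync(self.file.fileno())  # ask the OS to write it to disk
        self.unflushed = 0
        self.last_flush = time.monotonic()

    def close(self):
        self.flush()
        self.file.close()
```

A writer created with flush_every_n=2 would buffer the first append and force both records to disk on the second. Larger thresholds mean fewer disk syncs but more messages at risk if the machine crashes before the flush.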

Note: Messages are not deleted after they are consumed. They are removed only once they exceed the configured retention time or the log exceeds the configured size limit.
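A rough sketch of that retention rule, assuming the log is split into segments described by hypothetical (last_write_time, size_bytes) tuples, oldest first. Note that the function never looks at what consumers have read. The details (for example, always keeping the newest segment) are simplifications, not Kafka's exact behavior.

```python
def apply_retention(segments, now, retention_s=None, retention_bytes=None):
    """Return the segments that survive time-based and size-based retention."""
    if not segments:
        return []
    *closed, active = segments  # the newest segment is still being written
    if retention_s is not None:
        # Drop closed segments whose last write is older than the time limit.
        closed = [s for s in closed if now - s[0] <= retention_s]
    if retention_bytes is not None:
        # Drop the oldest closed segments until the log fits the size limit.
        while closed and sum(size for _, size in closed) + active[1] > retention_bytes:
            closed.pop(0)
    return closed + [active]


segments = [(0, 100), (50, 100), (100, 100)]
by_time = apply_retention(segments, now=120, retention_s=80)
by_size = apply_retention(segments, now=120, retention_bytes=150)
unlimited = apply_retention(segments, now=120)
```

With no limits configured in this sketch, nothing is ever removed, regardless of how many times the messages have been consumed.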