This is the 13th day of my participation in the November Gwen Challenge. Check out the event details: The last Gwen Challenge 2021

Kafka looks like this:

The main character

  • Producer: Indicates the producers of messages
  • Consumer: There can be more than one Consumer of a message
  • Broker: Machine nodes that store Kafka

Topic & Partition

For all message subscriptions, you need to specify a topic. A topic can correspond to multiple partitions, but a partition can belong to only one topic.

A partition can be thought of as an append-only log file. Partitions are distributed across multiple brokers for high performance.

Each partition has an offset bit to ensure orderliness within the partition. So there is order within each partition, but there is no guarantee of order between partitions. (I.e. Kafka guarantees partition order rather than theme order).

Once the topic is created, you can continue to modify the number of partitions to scale out the data horizontally. If a topic corresponds to only one file, the machine I/O of that file becomes a performance bottleneck for that topic, and partitioning solves this problem. (Partition increased throughput)

A copy of the

To improve disaster recovery, replicas are added for partitions. Complete synchronization between replicas cannot be guaranteed at the same time. In a replica structure, the leader processes read/write requests, while the follower only synchronizes data with the leader. The replicas must be in different brokers. If the leader fails, the followers elect a new leader to troubleshoot the failure.

Some concepts for replica synchronization:

  • ISR

    • In-sync-replicas that are synchronized with the leader
  • OSR

    • Combine with out-of-sync-replica where the leader is too far behind
  • HW

    • High Watermark High Watermark. Only the data before HW can be consumed, the window behind it cannot be consumed.
  • LEO

    • Log End Offset: represents the Offset to be written to in the Log file of the current partition. This is for each partition. Each partition and copy maintains a LEO equal to offset+1 in the last entry of the current log file.

When the follower copy falls too far behind or fails, the leader copy removes it from the ISR set. If a follower copy in the OSR set catches up with the leader copy, the leader copy moves it from the OSR set to the ISR set. Generally speaking: When the leader fails, only the ISR copy can be selected as the leader, and the OSR copy has no chance.

The principle of HW is similar to the barrel theory in that the highest water level is determined by the shortest board in the barrel of these partitioned copies. HW is the shortest height.

Kafka’s replication mechanism is neither fully synchronous nor purely asynchronous.

  • Synchronous replication: The message is submitted successfully only after all replicas receive the message. However, the message deteriorates performance
  • Asynchronous replication: As long as it is written by the leader replica, the commit is considered successful

Kafka cannot easily calculate one of these two. The ISR approach effectively balances the relationship between data reliability and performance. Kafka pursues high performance through asynchronous writes, where replicas actively pull changes but do not guarantee strong consistency. However, the HW location in the public state is maintained to ensure that the message is reliable.