Kafka component description

1. Broker

Each Kafka server is called a broker, and multiple brokers form a Kafka cluster.

One or more brokers can be deployed on a single machine; the brokers that connect to the same ZooKeeper ensemble form a Kafka cluster.

2. Topic

Kafka is a publish-subscribe messaging system with the following logical structure:

A topic is a named category of messages; each topic usually holds one class of messages. Each topic has one or more subscribers, the consumers of its messages.

Producers push messages to a topic, and the consumers subscribed to that topic pull messages from it.
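The push/pull relationship above can be sketched with a toy in-memory topic (the class and method names here are invented for illustration and are not the Kafka API): producers append messages, and each subscriber pulls at its own pace from its own position.

```python
from collections import defaultdict

class Topic:
    """A toy in-memory topic: producers append messages, and each
    subscriber pulls from its own read position at its own pace.
    Illustration of the publish-subscribe model, not the Kafka API."""
    def __init__(self, name):
        self.name = name
        self.log = []                      # append-only message log
        self.positions = defaultdict(int)  # read position per subscriber

    def publish(self, message):
        self.log.append(message)

    def pull(self, subscriber):
        """Return all messages this subscriber has not seen yet."""
        pos = self.positions[subscriber]
        batch = self.log[pos:]
        self.positions[subscriber] = len(self.log)
        return batch

t = Topic("orders")
t.publish("order-1")
t.publish("order-2")
print(t.pull("billing"))    # ['order-1', 'order-2']
t.publish("order-3")
print(t.pull("billing"))    # ['order-3']
print(t.pull("shipping"))   # ['order-1', 'order-2', 'order-3']
```

Note that pulling for "billing" does not remove messages for "shipping": each subscriber reads the same log independently.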

3. Topics and Brokers

One or more topics can be created on a single broker, and the same topic can be distributed across multiple brokers in the same cluster.

4. Partition log

Kafka maintains multiple partitions for each topic, and each partition maps to a logical log file:

  • Whenever a message is published to a partition of a topic, the broker appends the message to the last segment of the partition's logical log file. Segments are flushed to disk after a configurable amount of time or number of messages.

  • Each partition is an ordered, immutable commit log: a structured sequence of records. Each log entry in a partition is assigned a sequential id, usually called the offset, which is unique within that partition. The partition's logical log file is divided into file segments of equal size.

  • The broker cluster retains all published messages, whether or not they have been consumed, for a configurable retention period. For example, if the retention policy is set to 2 days, each message is retained for two days after it is published and can be consumed within that window; once the period expires, the message is no longer retained.
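The mechanics in the bullets above (sequential offsets, fixed-size segments, time-based retention) can be sketched as a toy model. The class name, segment size, and retention value here are invented for illustration; real Kafka persists segments as files on disk.

```python
import time

class PartitionLog:
    """Toy partition log: appended records get sequential offsets, the log is
    split into fixed-size segments, and old segments expire after a retention
    period. Illustrative sketch only, not how Kafka stores data."""
    def __init__(self, segment_size=3, retention_secs=2 * 24 * 3600):
        self.segment_size = segment_size
        self.retention_secs = retention_secs
        self.segments = [[]]          # list of segments; the last one is active
        self.next_offset = 0

    def append(self, message, now=None):
        now = now if now is not None else time.time()
        offset = self.next_offset
        self.next_offset += 1
        if len(self.segments[-1]) >= self.segment_size:
            self.segments.append([])  # roll over to a new segment
        self.segments[-1].append((offset, now, message))
        return offset

    def expire(self, now=None):
        """Drop whole segments whose newest record is past retention."""
        now = now if now is not None else time.time()
        self.segments = [s for s in self.segments
                         if s and now - s[-1][1] < self.retention_secs] or [[]]

log = PartitionLog(segment_size=2, retention_secs=10)
for i, m in enumerate(["a", "b", "c"]):
    log.append(m, now=i)   # assigned offsets 0, 1, 2; "c" starts a new segment
log.expire(now=100)        # every record is older than 10s, so all are dropped
```

Expiring whole segments at a time (rather than individual records) mirrors why Kafka's retention is cheap: deleting old data is just dropping old segment files.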

5. Partition distribution

Log partitions are distributed across multiple brokers in a Kafka cluster, and multiple replicas of each partition are kept on different brokers for fault tolerance. Both the number of replicas and the brokers they are placed on are configurable. Under such a replication strategy, each broker hosts one or more partitions of each topic. As shown in the figure:

For a given partition, a broker can play one of two roles: leader or follower. In the example above, the red one represents the leader.

Topic1 partitions:

  • Part1: leader Broker1, followers Broker2 and Broker3.
  • Part2: leader Broker2, followers Broker1 and Broker4.
  • Part3: leader Broker3, followers Broker1 and Broker3.
  • Part4: leader Broker4, followers Broker2 and Broker3.

Topic2 partitions:

  • Part1: leader Broker1, follower Broker2.
  • Part2: leader Broker2, follower Broker3.
  • Part3: leader Broker3, follower Broker4.

Topic3 partitions:

  • Part1: leader Broker4, followers Broker1, Broker2, and Broker3.
  • Part2: leader Broker2, followers Broker1, Broker3, and Broker4.
  • Part3: leader Broker3, followers Broker1, Broker2, and Broker4.
  • Part4: leader Broker1, followers Broker2, Broker3, and Broker4.
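Placements like the ones above come from a replica assignment rule. A simplified round-robin sketch is shown below: the first replica (the preferred leader) of partition i lands on broker i mod n, and its followers go on the next brokers in order. This is an illustration of the idea only; Kafka's actual default assignment also randomizes the starting broker and shifts followers to spread load.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Round-robin replica placement: partition i's replicas start at
    broker i mod n and continue on the following brokers. Simplified
    sketch of Kafka-style assignment, not the exact algorithm."""
    assignment = {}
    n = len(brokers)
    for p in range(num_partitions):
        replicas = [brokers[(p + r) % n] for r in range(replication_factor)]
        assignment[p] = replicas  # replicas[0] is the preferred leader
    return assignment

print(assign_replicas(4, [1, 2, 3, 4], replication_factor=2))
# partition 0 -> [1, 2], partition 1 -> [2, 3],
# partition 2 -> [3, 4], partition 3 -> [4, 1]
```

Because each partition starts on a different broker, leaders end up spread evenly across the cluster.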

Here’s a real example:

The leader of Partition 0 is Broker 2, and the partition has three replicas, on brokers 2, 1, and 3.

In-sync Replicas (ISR): the set of replicas that are currently in sync with the leader. The ISR of partition 0 is 2,1,3, meaning all three replicas are in a normal state. If a broker goes down, it no longer appears in the ISR.

After Broker 1 is shut down:

The leader of each partition handles the read and write requests for that partition, while its followers asynchronously replicate data from the leader.

Kafka dynamically maintains the set of in-sync replicas (ISR) that are consistent with the leader, and persists the latest ISR set to ZooKeeper. If the leader fails, one of the partition's followers is elected as the new leader.

So, in a Kafka cluster, each broker usually plays two roles: leader for some partitions and follower for others. The leader is the busiest role, as it handles all read and write requests. Distributing leaders evenly among the brokers keeps the load balanced.
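The failover behaviour described above (a dead broker drops out of the ISR; if it was the leader, a surviving in-sync replica takes over) can be sketched as a toy model. The class and method names are invented for illustration; real Kafka coordinates this through the controller and ZooKeeper.

```python
class Partition:
    """Toy model of a replicated partition: a leader handles reads/writes,
    and when the leader's broker fails, a new leader is elected from the
    in-sync replica (ISR) set. Illustrative sketch only."""
    def __init__(self, replicas):
        self.replicas = list(replicas)   # broker ids hosting a copy
        self.leader = self.replicas[0]   # first replica is the leader
        self.isr = set(self.replicas)    # replicas currently in sync

    def broker_down(self, broker):
        self.isr.discard(broker)         # a dead broker leaves the ISR
        if broker == self.leader:
            # elect a new leader from the surviving in-sync replicas
            self.leader = next(r for r in self.replicas if r in self.isr)

p = Partition(replicas=[2, 1, 3])  # leader is broker 2, ISR = {2, 1, 3}
p.broker_down(1)                   # a follower fails: ISR shrinks to {2, 3}
p.broker_down(2)                   # the leader fails: broker 3 takes over
print(p.leader, p.isr)             # 3 {3}
```

This mirrors the Partition 0 example above: with replicas 2, 1, 3, shutting down Broker 1 only shrinks the ISR, while losing Broker 2 triggers a leader election.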

6. Producer

As the message producer, the Producer sends each message it produces to a specified destination (a partition of a topic). The Producer can choose which partition to publish to using a partition selection algorithm or at random.
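Partition selection can be sketched as follows: messages with a key hash to a stable partition, and messages without a key are spread round-robin. This is an illustration of the idea only (the real Kafka producer uses murmur2 hashing in its default partitioner, and the function names here are invented).

```python
import zlib
import itertools

def make_partitioner(num_partitions):
    """Toy partition selector: keyed messages hash to a stable partition,
    unkeyed messages are spread round-robin. Sketch only; not the
    actual Kafka DefaultPartitioner."""
    counter = itertools.count()
    def select(key=None):
        if key is not None:
            return zlib.crc32(key.encode()) % num_partitions  # stable per key
        return next(counter) % num_partitions                 # round-robin
    return select

select = make_partitioner(4)
assert select("user-42") == select("user-42")  # same key -> same partition
print([select() for _ in range(6)])            # [0, 1, 2, 3, 0, 1]
```

Hashing by key guarantees that all messages with the same key land in the same partition, which is what preserves per-key ordering.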

7. Consumer

  • In Kafka there is also the concept of a consumer group, which logically organizes consumers into groups. Because each Kafka consumer is a process, the consumers in a consumer group are likely to be separate processes distributed across different machines.

  • Each message in a topic can be consumed by multiple consumer groups, but only one consumer within each group can consume it. So if you want a message to be consumed by multiple consumers, those consumers must belong to different consumer groups.

  • Each consumer can subscribe to multiple topics.

  • Each consumer keeps track of the offset it has read up to in each partition, and it persists this offset through ZooKeeper.
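The rules in the bullets above can be sketched as a toy dispatcher: every group sees every message, but within a group each message goes to exactly one consumer, and each group commits its own offset per partition. The class and method names are invented for illustration; real Kafka assigns whole partitions to consumers rather than dispatching per message.

```python
from collections import defaultdict

class ConsumerGroups:
    """Toy consumer-group dispatch: each group reads the whole log from its
    own committed offset, and within a group each message is delivered to
    exactly one consumer. Illustrative sketch only."""
    def __init__(self, partition_log):
        self.log = partition_log             # a plain list of messages
        self.offsets = defaultdict(int)      # committed offset per group

    def poll(self, group, consumers):
        """Deliver unread messages, assigning each to one group member."""
        start = self.offsets[group]
        delivered = [(consumers[(start + i) % len(consumers)], msg)
                     for i, msg in enumerate(self.log[start:])]
        self.offsets[group] = len(self.log)  # commit the new offset
        return delivered

log = ["m0", "m1", "m2", "m3"]
groups = ConsumerGroups(log)
print(groups.poll("billing", ["c1", "c2"]))  # each message to one of c1/c2
print(groups.poll("audit", ["a1"]))          # the audit group still sees all four
```

Because offsets are tracked per group, the "audit" group's reads are unaffected by how far "billing" has consumed.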

Architecture diagram

With the above components introduced, it should now be easy to understand Kafka’s architecture diagram:

After version 0.8, consumers no longer communicate directly with ZooKeeper, so the architecture diagram needs to be adjusted accordingly: