
Introduction

Kafka is a distributed streaming platform that is widely used in the field of big data. What characteristics allow it to support such massive volumes of data? With that question in mind, I will explore Kafka with you in this and subsequent articles.

Message types

Point-to-point (P2P)

Take WeChat as an example. The point-to-point type is like a private chat between Xiao Ming and Lao Wang: only Lao Wang receives what Xiao Ming says, and nobody else can receive the message.

Publish/Subscribe (Pub/Sub)

Let's take WeChat as an example again. Suppose Xiao Ming posts a message in a group chat. The "subscribers" in the publish/subscribe type are like the other members of the group, and the "publisher" is like Xiao Ming: a message he posts in the group chat is received by all the "subscribers".
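The difference between the two delivery types can be sketched in a few lines of Python. This is a toy in-memory model, not Kafka code: the class names `PrivateChat` and `GroupChat` and the group members "Xiao Hong" and "Xiao Li" are made up for illustration.

```python
from collections import defaultdict


class PrivateChat:
    """Toy one-to-one channel: each message has exactly one recipient."""

    def __init__(self):
        self.inboxes = defaultdict(list)

    def send(self, recipient, text):
        # Only the named recipient's inbox receives the message.
        self.inboxes[recipient].append(text)


class GroupChat:
    """Toy publish/subscribe channel: every subscriber gets every message."""

    def __init__(self):
        self.inboxes = {}

    def subscribe(self, name):
        self.inboxes[name] = []

    def publish(self, text):
        # Fan the message out to all current subscribers.
        for inbox in self.inboxes.values():
            inbox.append(text)


private = PrivateChat()
private.send("Lao Wang", "Hi, this is Xiao Ming")

group = GroupChat()
for member in ("Lao Wang", "Xiao Hong", "Xiao Li"):  # hypothetical members
    group.subscribe(member)
group.publish("Hello everyone, this is Xiao Ming")
```

In the private chat only Lao Wang's inbox holds the message, while in the group chat every subscriber's inbox holds a copy.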

Terminology

Topic

Kafka supports the pub/sub message type through its topic mechanism. In everyday applications, we publish different kinds of business messages to different topics, for example user_Update for user information changes and Order for order messages. The relevant downstream business services can then consume the topics they care about, each as its own consumer group.
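To make the relationship between topics and consumer groups concrete, here is a minimal in-memory sketch. It is not Kafka's client API: `ToyBroker`, its methods, and the group names `email-service` and `search-service` are hypothetical, and the only behaviour modelled is that each consumer group keeps its own read position per topic.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for topics and consumer groups (not Kafka's API)."""

    def __init__(self):
        self.topics = defaultdict(list)    # topic name -> list of messages
        self.positions = defaultdict(int)  # (group, topic) -> next index to read

    def produce(self, topic, message):
        self.topics[topic].append(message)

    def poll(self, group, topic):
        """Return the messages this group has not seen yet and advance its position."""
        start = self.positions[(group, topic)]
        messages = self.topics[topic][start:]
        self.positions[(group, topic)] = len(self.topics[topic])
        return messages


broker = ToyBroker()
broker.produce("user_Update", {"user_id": 1, "name": "Xiao Ming"})
broker.produce("Order", {"order_id": 100})

# Two hypothetical downstream services, each with its own consumer group.
email_batch = broker.poll("email-service", "user_Update")
search_batch = broker.poll("search-service", "user_Update")
second_email_batch = broker.poll("email-service", "user_Update")
```

Both groups receive the same user_Update message because their positions are tracked independently, and the second poll from `email-service` returns nothing new. Neither group sees the Order message, since neither consumes that topic.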

Producer/Consumer

  • Producer

The producer of messages. Producers send messages to Kafka, so from Kafka's point of view a producer is a client. A single service can act as both a producer and a consumer, but it is best to avoid this: the responsibilities of a microservice should be as single and clear as possible.

  • Consumer

The consumer of messages. Consumers read messages from Kafka, so from Kafka's point of view a consumer is also a client.

Broker

A complete Kafka cluster usually consists of multiple broker instances, which receive and handle client requests (producers producing messages, consumers consuming messages). Several broker instances can be deployed on the same node, but above I have placed them on different nodes. The reason is simple: if all brokers share one node, a failure of that node takes every broker down with it. Kafka's brokers are designed to be spread across machines for exactly this reason, so deploying them on multiple nodes is highly recommended in a production environment.

Replication

Replication is how Kafka achieves high availability (a property that many distributed applications seem to advertise these days): it keeps multiple copies (replicas) of the same data and distributes them across different nodes. Replicas have two roles: leader and follower. As shown in the figure above, the leader serves the clients (producers and consumers) directly, while followers continuously fetch the latest messages from the leader and store them.
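The leader/follower relationship can be sketched roughly as follows. This is a deliberately simplified toy, not how Kafka implements replication: `Replica`, `ReplicatedPartition`, the broker ids, and the "promote the first follower" rule are all assumptions made for illustration.

```python
class Replica:
    """One copy of the data, held by a single broker."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []


class ReplicatedPartition:
    """Toy leader/follower model; real Kafka replication is far more involved."""

    def __init__(self, broker_ids):
        replicas = [Replica(broker_id) for broker_id in broker_ids]
        self.leader = replicas[0]
        self.followers = replicas[1:]

    def append(self, message):
        # Clients write to the leader only.
        self.leader.log.append(message)

    def sync_followers(self):
        # Each follower pulls whatever it is still missing from the leader.
        for follower in self.followers:
            missing = self.leader.log[len(follower.log):]
            follower.log.extend(missing)

    def fail_over(self):
        # Simplification: if the leader is lost, promote the first follower.
        self.leader = self.followers.pop(0)


partition = ReplicatedPartition(broker_ids=[1, 2, 3])
partition.append("m0")
partition.append("m1")
partition.sync_followers()
partition.fail_over()  # pretend broker 1 went down
```

After the failover, the replica on broker 2 is the leader and still holds both messages, which is the point of storing the copies on different nodes.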

Partition

Partitioning is how Kafka achieves scalability. The replication mechanism described above only solves the problem of data redundancy: it helps ensure that data is not lost, but it does not let a topic scale horizontally. Kafka addresses this by introducing partitions. A topic can have multiple partitions, and each message a producer sends is written to exactly one of them. Each partition has exactly one leader and can have multiple followers. The position of a message within a partition is marked by its offset. This description can be a bit abstract, so let's look at an actual business scenario:

Suppose a user updates their information, and the update message needs to be delivered to the downstream services. You create a topic named user_Update and give it three partitions, numbered 0, 1, and 2. When your business service acts as the producer, each message it sends goes to only one of these partitions. Suppose we now write five messages to a brand-new partition 1. The offsets of the five messages are 0, 1, 2, 3, and 4.
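The scenario above can be reproduced with a small in-memory model. Again, this is only a sketch: `ToyTopic` is a made-up class, and the `choose_partition` rule (CRC32 of a key modulo the partition count) is a stand-in chosen for simplicity, not Kafka's actual partitioner.

```python
import zlib


class ToyTopic:
    """A topic as a list of append-only partition logs (illustration only)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def choose_partition(self, key):
        # Assumed rule: the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, message, partition):
        # A message lands in exactly one partition; its offset is its
        # position in that partition's log, starting from 0.
        log = self.partitions[partition]
        offset = len(log)
        log.append(message)
        return offset


topic = ToyTopic("user_Update", num_partitions=3)
offsets = [topic.send(f"update-{i}", partition=1) for i in range(5)]
print(offsets)  # [0, 1, 2, 3, 4]
```

Partitions 0 and 2 stay empty here, and each keeps its own independent offset sequence, so a message is identified by its partition number together with its offset.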

Conclusion

In this article, we covered some of the most important concepts in Kafka, which I hope helps you sort out Kafka's overall structure.

  1. We talked about topics. A topic can have multiple partitions, and each partition can have multiple replicas
  2. We talked about partitions. Each partition has exactly one leader and can have multiple followers, and a producer writes each message to only one partition of the topic
  3. Finally, we briefly mentioned the offset, which marks the position of a message within its partition
