"Acquaintances fill the world, yet how few truly know one's heart." -- Feng Menglong (Ming dynasty)


This article comes from the author's CSDN blog (blog.csdn.net/zhanshenzhi…). It is an original post; because Juejin (Nuggets) removed its repost feature, it has been uploaded here again, and the images below keep their original watermarks (such as the CSDN watermark). The original will be published on multiple platforms in its newly created state.

Preface

As it happens, today is 1024 — Programmer's Day, once jokingly called "Diaosi Day" back when that self-deprecating slang was in fashion. With a moderately prosperous society now in sight, and perhaps to keep pace with Double 11, programmers in the IT industry have been promoted from "diaosi" to "Imashi". The above is pure banter! Now on to some real content…

Kafka architecture

As the figure above shows, Kafka involves the concepts of producer, broker, consumer group, consumer, and ZooKeeper.

Producer

Note: producer. A producer pushes a ProducerRecord to the Kafka broker; the record carries a topic, a partition (optional), a key (optional), the message body (value), and a timestamp (optional). The source code is as follows:
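The Java client's ProducerRecord class is not reproduced here, but its fields can be mirrored in a minimal Python sketch (the field names follow the Java class; the class below is an illustration, not actual client code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProducerRecord:
    """Sketch of the fields a Kafka ProducerRecord carries."""
    topic: str                       # required: which topic to send to
    value: bytes                     # required: the message body
    partition: Optional[int] = None  # optional: force a specific partition
    key: Optional[bytes] = None      # optional: used to pick a partition when set
    timestamp: Optional[int] = None  # optional: epoch millis; broker assigns one if absent

# Only topic and value are mandatory; everything else may be left unset.
record = ProducerRecord(topic="orders", value=b"order-42", key=b"user-7")
```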

Topic

Note: topic.

Kafka publishes messages by topic; a topic can be consumed by multiple consumers. The message bodies in a topic are stored across multiple Partitions.

Partition

Note: partition.

  • If the ProducerRecord specifies a partition, the message body is stored in that partition.
  • If the ProducerRecord does not specify a partition, the default partitioner, DefaultPartitioner, decides, computing from the ProducerRecord's key.
  • If the key is also empty, messages are sent to the topic's available partitions in round-robin order (note: only the available partitions).
  • If the key is not empty, the partition number is computed by hashing the key (with the MurmurHash2 algorithm, chosen for its high performance and low collision rate); the same key is always assigned to the same partition.
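The selection rules above can be sketched as follows. This is a simplified stand-in, not the real client: CRC32 replaces MurmurHash2 (only the branch order matters here), and `choose_partition` and its arguments are hypothetical names.

```python
import zlib
from itertools import count

# Round-robin counter shared across keyless sends.
_round_robin = count()

def choose_partition(partition, key, available_partitions, all_partitions):
    # 1. An explicitly specified partition always wins.
    if partition is not None:
        return partition
    # 2. No key: cycle round-robin over the *available* partitions only.
    if key is None:
        return available_partitions[next(_round_robin) % len(available_partitions)]
    # 3. With a key: hash mod the total partition count, so the same key
    #    always lands in the same partition. (CRC32 stands in for MurmurHash2.)
    return (zlib.crc32(key) & 0x7FFFFFFF) % len(all_partitions)
```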

Offset

Note: offset. Each partition maintains its own offsets: the producer offset, which is the largest offset in the partition, and of course the consumer offsets as well. More on those in the next installment.

Segment

A partition consists of multiple equal-sized Segments; the number of messages in each Segment file can differ.

  • Kafka Broker appends each new message to the Partition's last Segment. When a Segment's message count reaches a threshold, its messages expire, or its size exceeds a limit, the Segment is flushed to disk and a new Segment is created.

  • Segment files

    1. Index file (suffix .index): the Segment's index file; its entries are relative to the Segment's first message.
    2. Data file (suffix .log): the Segment's data file; the two files always appear in pairs.
  • Segment naming rules

    1. The Partition's first Segment starts from 0.
    2. Each subsequent Segment is named after the maximum global-Partition Offset of the previous Segment.
    3. The value is a 64-bit long (at most 19 decimal digits), left-padded with zeros to a fixed 20-character width.
  • Each Segment stores many messages, and a message's ID is determined by its logical position. That is, the message ID locates the message's storage position directly, avoiding an extra ID-to-location mapping.

  • Lookup process, e.g. finding the message at offset=1005:

    1. Locate the Segment file. 00000000000000000000.index belongs to the Partition's first Segment; the second Segment's files are named 00000000000000001001.index/.log, so their first message has offset 1001 + 1 = 1002. A binary search over the file names (the same applies to any other offset) places offset=1005 in Segment 00000000000000001001.
    2. Open the corresponding data file, 00000000000000001001.log.
    3. Use 00000000000000001001.index to find the message's physical Position — the index file is mapped directly into memory — then read sequentially from that position in the .log file.
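The lookup steps above amount to a binary search over the Segment base names. A sketch under the naming rule just described (the base-offset list is made up for illustration; real Kafka additionally consults the .index file to find the physical position inside the chosen .log):

```python
import bisect

def segment_file_name(base_offset):
    """Zero-pad a base offset to the fixed 20-character file-name width."""
    return f"{base_offset:020d}"

def locate_segment(base_offsets, offset):
    """Binary-search sorted Segment base names for the one holding `offset`.

    Under the naming rule above, a Segment named N holds offsets N+1 .. M,
    where M is the next Segment's name (the first Segment also holds offset 0).
    """
    i = bisect.bisect_left(base_offsets, offset) - 1
    return base_offsets[max(i, 0)]
```

For example, with Segments named 0, 1001, and 2500, offset 1005 falls in Segment 1001, while offset 1001 itself is still the last message of Segment 0.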

Consumer Group

Note: Consumer group

Multiple consumers in the same group form a consumer group. A Partition's messages can be consumed by only one consumer within a given consumer group, but can additionally be consumed by one consumer in each of the other consumer groups.

Consumer

Note: Consumers

Consumer group : consumer = 1:1

Partition allocation: the messages of all Partitions are consumed by the single consumer.

Consumer group : consumer = 1:N, with more Partitions than consumers

P0 -> C1; P1 -> C2; P2 -> C1

Consumer group : consumer = 1:N, with as many Partitions as consumers

P0 -> C1; P1 -> C2; P2 -> C3

Consumer group : consumer = 1:N, with fewer Partitions than consumers

P0 -> C1; P1 -> C2; P2 -> C3; C4 consumes nothing, because within the same consumer group a Partition can be consumed by only one of its consumers. Why? Easy to see: it avoids duplicate consumption.

Consumer group : consumer = N:N

Group1: P0 -> C1; P1 -> C2; P2 -> C1. Group2: P0 -> C1; P1 -> C2; P2 -> C3. Group3: P0 -> C1; P1 -> C2; P2 -> C3; C4 consumes nothing.
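Kafka actually ships several partition assignors (range, round-robin, sticky); a simple round-robin sketch reproduces the pairings listed above (`assign` is a hypothetical helper, not a Kafka API):

```python
def assign(partitions, consumers):
    """Round-robin a topic's Partitions over a group's consumers.

    Each Partition goes to exactly one consumer in the group; when there are
    more consumers than Partitions, the extras sit idle (like C4 above).
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment
```

With three Partitions and two consumers this yields P0 -> C1, P1 -> C2, P2 -> C1; with four consumers, C4 is left with nothing to consume.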

Follow-up

The next installment will take a deeper look at Kafka Offsets.

There is no end to learning.