Let's start with a few questions about Kafka's architecture:

  • 1. How are Kafka topics and partitions stored internally?

  • 2. What are the advantages of Kafka’s consumption model over traditional messaging systems?

  • 3. How does Kafka implement distributed data storage and data reading?

Kafka architecture diagram

1. Kafka

A Kafka deployment has multiple producers, brokers, and consumers. Each producer can send to multiple topics, and each consumer belongs to exactly one consumer group.

The whole cluster relies on a ZooKeeper (ZK) ensemble, which manages the cluster configuration, elects leaders, and triggers rebalances when consumer group membership changes.

Terminology

Broker

A message-processing node. Each Kafka server is a broker, and one or more brokers form a Kafka cluster.

Topic

Kafka categorizes messages by topic; every message published to a Kafka cluster must be assigned to a topic.

Producer

The message producer; the client that sends messages to the broker.

Consumer

The message consumer; the client that reads messages from the broker.

ConsumerGroup

Each consumer belongs to a specific consumer group. A message can be delivered to multiple consumer groups, but within a single group only one consumer will consume it.

Partition

Physically, a topic is divided into one or more partitions, and each partition is internally ordered.

2. Topics and Partitions

Every message in Kafka belongs to a topic. Generally speaking, we can assign different topics to the different kinds of data our application produces. A topic typically has multiple subscribers: when a producer publishes a message to a topic, every consumer subscribed to that topic can receive it.

Kafka maintains a distributed, partitioned log for each topic. Each partition is an append-only log file at Kafka's storage layer. Any message published to a partition is appended to the end of its log file, and each message in the partition is assigned a monotonically increasing sequence number called the offset, a long value. The offset uniquely identifies a message within its partition. Ordering is guaranteed within a partition, but not across a whole topic.

As shown in the figure above, the producer decides which partition each message is sent to:

  • If the message has no key, partitions are chosen round-robin.

  • If the message has a key, the key is hashed and taken modulo the number of partitions, which guarantees that messages with the same key are routed to the same partition. To get strict, queue-like ordering across all messages, you can give every message the same key. (A code sketch follows this list.)
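For illustration, here is a minimal sketch using the Java client (the broker address, topic name "orders", and keys are placeholders; the Java client's default partitioner hashes the serialized key with murmur2, modulo the partition count):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class KeyedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key => same partition, so these two records stay ordered.
            producer.send(new ProducerRecord<>("orders", "order-42", "created"));
            producer.send(new ProducerRecord<>("orders", "order-42", "paid"));
            // No key => the record is spread across partitions by the client.
            producer.send(new ProducerRecord<>("orders", null, "heartbeat"));
        }
    }
}
```

Because the two "order-42" records hash to the same partition, they are consumed in the order they were sent, while the keyless record is distributed across partitions by the client.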

3. Consumption model

Messages that producers send to the Kafka cluster are consumed by consumers. Broadly, there are two consumption models: the push model and the pull model.

In a push-based messaging system, the message broker records consumption state. When the broker pushes a message to a consumer, it marks the message as consumed, but this approach gives weak processing guarantees. For example, if the broker marks a message as consumed as soon as it is pushed, the message is permanently lost when the consumer crashes while processing it, or never receives it because of a network failure. If instead the broker waits for the consumer to acknowledge receipt, it must track per-message state, which is undesirable. Moreover, with push the consumption rate is controlled entirely by the broker, and problems arise when a consumer cannot keep up and blocks.

Kafka instead uses a pull model (poll), which lets the consumer control the rate and progress of consumption. Consumers can consume from any offset: for example, re-reading already-consumed messages to reprocess them, or jumping ahead to the most recent messages. A sketch follows.
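A minimal sketch of pull-based consumption with the Java client (the broker address, topic, group id, and the offset 42 are placeholder values); seek() is what lets a consumer restart from any offset:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class PullFromOffsetExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "demo-group");              // placeholder group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("orders", 0);
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, 42L); // rewind to offset 42 to reprocess old data
            while (true) {
                // The consumer pulls at its own pace; nothing is pushed to it.
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("offset=%d value=%s%n", r.offset(), r.value());
                }
            }
        }
    }
}
```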

4. Network model

4.1 KafkaClient — single-threaded Selector

The single-threaded mode suits a small number of concurrent connections, simple logic, and small data volumes.

In Kafka, both the consumer and producer clients use this single-threaded mode (sketched below). It does not work well on the Kafka server, where request processing is complex: a blocked thread would keep subsequent requests from being processed, leading to an avalanche of request timeouts. On the server side, multithreading should be used to its fullest for the processing logic.
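For reference, a generic single-threaded Selector loop looks roughly like this (a plain Java NIO sketch, not Kafka's actual client code; the port is a placeholder). One thread accepts connections, reads, and handles everything, which is exactly why a slow handler stalls the whole loop:

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SingleThreadSelector {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9092)); // placeholder port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select(); // block until at least one channel is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    // Read and handle on this same thread: fine for light
                    // workloads, a bottleneck when handling is slow.
                    SocketChannel ch = (SocketChannel) key.channel();
                    ch.read(ByteBuffer.allocate(1024));
                }
            }
        }
    }
}
```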

4.2 Kafka Server — Multi-threaded Selector

On the server side, Kafka uses a multi-threaded Selector model. An Acceptor runs in its own thread, and each thread in the Selector pool registers for read events and handles the server's read requests. After a request is read successfully, it is placed into a shared request queue. Threads in the handler (writer) pool then take requests off the queue and run the processing logic; even if one handler thread blocks, the remaining threads keep pulling requests from the queue and processing them. When a handler finishes, it registers an OP_WRITE event so the response can be sent back to the client. A simplified sketch of this hand-off follows.
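The sketch below shows only the hand-off between reader threads and handler threads (the thread count and queue size are arbitrary illustration values; the NIO selector plumbing on the network side is omitted):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RequestChannelSketch {
    // Shared queue between network (reader) threads and handler threads.
    static final BlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(500);

    public static void main(String[] args) {
        ExecutorService handlers = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 8; i++) {
            handlers.submit(() -> {
                try {
                    while (true) {
                        String request = requestQueue.take(); // wait for work
                        handle(request); // a slow handler only stalls itself
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // A network thread would call requestQueue.put(request) after each
        // successful read, then register OP_WRITE once a response is ready.
    }

    static void handle(String request) {
        // business logic for one request
    }
}
```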

5. Highly reliable distributed storage model

The key to Kafka's high-reliability model is its replication mechanism, which prevents data loss even when a machine goes down.

5.1 High-performance Log Storage

In Kafka, all messages under a topic are spread across multiple nodes as partitions. On each Kafka machine, a partition's log is stored as a sequence of LogSegments. Each LogSegment consists of an .index file and a .log file, holding the segment's index and message data respectively. The naming rule for the pair: each segment file is named after the offset of the first message it contains (equivalently, the previous segment's last offset plus one), rendered as the 64-bit offset value in 20 digits, left-padded with zeros. Suppose each LogSegment holds 100 messages; the following shows the index and log files for offsets 900 to 1000:
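Applying that naming rule (a reconstructed example, since the original figure is not reproduced here), the segment starting at offset 900 consists of the pair:

```
00000000000000000900.index
00000000000000000900.log
```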

Because Kafka's message volume is huge, indexing every message would cost too much space and time, so Kafka uses a sparse index: only some of the messages are indexed, which keeps the index small enough to load directly into memory and speeds up lookups.

Now a little about how data is read. Suppose we want to read the message at offset 911. The first step is to find which segment it belongs to: a binary search over the segment file names locates 00000000000000000900.index and 00000000000000000900.log. Next, in the index file, we look up the relative offset 911 - 900 = 11, or the nearest index entry not greater than 11; binary search finds the entry [10, 1367]. Finally, starting from physical position 1367 in the log file, we scan forward until we reach the message at offset 911. A sketch of the index lookup follows.
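Here is that lookup in Java (the index entries and positions are the illustrative values from the walkthrough above, not read from a real .index file):

```java
public class SparseIndexLookup {
    // Each entry: { relativeOffset, filePosition }. Values are illustrative.
    // Find the last index entry whose offset is <= the target (binary search).
    static long[] floorEntry(long[][] index, long targetRelativeOffset) {
        int lo = 0, hi = index.length - 1, best = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index[mid][0] <= targetRelativeOffset) {
                best = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return index[best];
    }

    public static void main(String[] args) {
        long segmentBaseOffset = 900L; // taken from the segment file name
        long[][] index = { {0, 0}, {5, 640}, {10, 1367} };
        long target = 911L;
        long[] entry = floorEntry(index, target - segmentBaseOffset);
        // Scan the .log file forward from entry[1] until offset 911 is found.
        System.out.println("start scanning at byte position " + entry[1]);
    }
}
```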

The above describes how to locate a specific offset, but most of the time we don't need to: we just read sequentially. For sequential reads, the operating system puts a page cache between memory and disk and performs the familiar read-ahead, so sequential reads are very fast. Kafka's weakness, however, is that with too many partitions the log is split across too many files, and even batched writes degrade into random writes. Random I/O hurts performance badly, so in general a Kafka cluster should not have too many partitions. RocketMQ addresses this by writing all logs into a single file, keeping writes sequential and, with some optimization, keeping reads close to sequential.

Think about it:

  • 1. Why partition at all? In other words, couldn't a topic have just a single partition?

  • 2. Why do logs need to be segmented?

5.2 Replication Mechanism

Kafka's replication mechanism has multiple server nodes copy the logs of other nodes' topic partitions. When a node in the cluster fails, requests to the failed node are transferred to other healthy nodes (a process usually called rebalancing). Each partition of each topic in Kafka has one leader replica and zero or more follower replicas; the followers keep their data in sync with the leader and stand ready to replace it when it fails.

In Kafka, not every replica is eligible to replace the leader, so the leader maintains an ISR (In-Sync Replicas) set. Membership in this set requires two conditions:

  • Nodes must remain connected to ZK

  • The replica must not lag too far behind the leader during synchronization

In addition, AR (Assigned Replicas) denotes the complete set of replicas, and OSR (Out-of-Sync Replicas) denotes the replicas removed for lagging too far behind. So: ISR = leader + followers that are not too far behind; AR = ISR + OSR.

The HW (high watermark) is the highest position in a partition visible to consumers, and the LEO (log end offset) is the position of the last message in each replica's log. The HW guarantees that if the broker hosting the leader fails, messages can still be fetched from the newly elected leader without loss.
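For example (illustrative numbers): if the leader's log ends at offset 10 but the slowest ISR follower has only replicated up to offset 8, then HW = 8, and consumers can read only up to offset 8; offsets 9 and 10 become visible once the followers catch up.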

When the producer sends data to the leader, the request.required.acks parameter sets the reliability level (a configuration sketch follows this list):

  • 1 (default): the producer sends the next message once the leader in the ISR has successfully received the data and acknowledged it. If the leader goes down before followers have replicated the data, the data is lost.

  • 0: This means that the producer does not wait for confirmation from the broker before sending the next batch of messages. In this case, the data transfer efficiency is the highest, but the data reliability is the lowest.

  • -1: the producer waits for every follower in the ISR to acknowledge the data before sending the next message. This gives the highest reliability, yet it still cannot guarantee that no data is lost: if the ISR contains only the leader (all other replicas have disconnected from ZK or cannot catch up with the leader), the situation degenerates to acks=1.
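A sketch of setting this with the modern Java client, where the parameter is simply acks and "all" is equivalent to -1 (the broker address and topic are placeholders):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class AcksAllProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // "all" == -1: wait until every replica in the ISR has the record.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "key", "value"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            // e.g. not enough in-sync replicas
                            exception.printStackTrace();
                        }
                    });
            producer.flush(); // block until the send completes
        }
    }
}
```

Pairing acks=all with the broker/topic setting min.insync.replicas >= 2 avoids the degenerate case above, because sends fail outright instead of silently falling back to leader-only acknowledgment.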
