Working with Kafka in Python

1. What is Kafka

Kafka is a distributed streaming system in which clients publish and subscribe to messages, much like a message queue. Because it is distributed, it provides fault tolerance and concurrent processing of messages.

2. Basic concepts of Kafka

Kafka runs on a cluster of one or more servers and stores messages in topics. Each message contains a key, a value, and a timestamp.

Kafka has the following basic concepts:

Producer – The message producer, a client that sends messages to a Kafka broker.

Consumer – The message consumer, a client that reads (consumes) messages from the Kafka server.

Topic – A topic is defined and configured on the Kafka server and establishes the subscription relationship between producers and consumers. A producer sends messages to a specified topic, and a consumer consumes messages from that topic.

Partition – A message partition. A topic can be divided into multiple partitions.

A partition is an ordered queue. Each message in the partition is assigned an ordered ID (offset).

Broker – A Kafka server is a Broker. A cluster consists of multiple brokers. A broker can hold multiple topics.

Consumer Group – A group of related consumers. Each consumer belongs to a specific consumer group. Multiple consumers can jointly consume messages from a topic, with each consumer handling part of the messages. These consumers share the same group name and together form a consumer group, sometimes called a consumer cluster.

Offset – The position of a message within a partition. Each message has a unique offset in its partition, which a consumer can specify to choose the message to start consuming from.
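To make the partition and offset concepts concrete, here is a toy in-memory model in Python. It is only an illustration, not how Kafka actually stores data; the names `Message`, `PartitionLog`, `append`, and `read_from` are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    """A message as described above: a key, a value, and a timestamp."""
    key: Optional[bytes]
    value: bytes
    timestamp: float = field(default_factory=time.time)


class PartitionLog:
    """Toy append-only log: a message's offset is its position in the list."""

    def __init__(self) -> None:
        self._messages: List[Message] = []

    def append(self, message: Message) -> int:
        # The new message gets the next free position, so offsets are
        # unique and strictly increasing within this partition.
        self._messages.append(message)
        return len(self._messages) - 1

    def read_from(self, offset: int) -> List[Message]:
        # A consumer picks an offset and reads everything from there on.
        return self._messages[offset:]


log = PartitionLog()
first = log.append(Message(key=b"user-1", value=b"hello"))
second = log.append(Message(key=b"user-1", value=b"world"))
print(first, second)                             # 0 1
print([m.value for m in log.read_from(second)])  # [b'world']
```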

3. Kafka distributed architecture

As shown in the figure above, Kafka stores the messages of a topic in different partitions. By default, if a message has a key, the key determines the partition, so messages with the same key are stored on the same partition. If a message has no key, messages are distributed across the partitions using a round-robin mechanism.
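The partitioning rule can be sketched in a few lines of Python. This is a simplified illustration, not Kafka's actual partitioner: `SimplePartitioner` is a hypothetical name, and `zlib.crc32` stands in for the hash function a real client would use (the Java client, for example, uses murmur2).

```python
import itertools
import zlib
from typing import Optional


class SimplePartitioner:
    """Toy partitioner: hash keyed messages, round-robin keyless ones."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition_for(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: hand out partitions in turn.
            return next(self._round_robin)
        # Same key -> same hash -> same partition.
        # crc32 is a stand-in for the client's real hash function.
        return zlib.crc32(key) % self.num_partitions


partitioner = SimplePartitioner(num_partitions=3)
print([partitioner.partition_for(None) for _ in range(4)])  # [0, 1, 2, 0]
same = partitioner.partition_for(b"user-1") == partitioner.partition_for(b"user-1")
print(same)  # True
```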

Messages within a partition form an ordered sequence, and Kafka uses the offset to specify a message's position in the partition. Within a consumer group, a partition of a topic can be consumed by only one consumer; multiple consumers in the same group cannot consume data from the same partition. However, one consumer can consume data from multiple partitions.
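The one-consumer-per-partition rule can be illustrated with a small assignment function. This is a sketch only: Kafka's real assignment strategies are more involved, and `assign_partitions` and the consumer names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one consumer in the group."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment: Dict[str, List[int]] = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        # Deal partitions out in turn; a consumer may receive several,
        # but no partition is ever given to two consumers.
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


print(assign_partitions([0, 1, 2], ["consumer-a", "consumer-b"]))
# {'consumer-a': [0, 2], 'consumer-b': [1]}
print(assign_partitions([0, 1], ["consumer-a"]))
# {'consumer-a': [0, 1]}
```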

Kafka backs up partition data by replicating it to different brokers. Each partition has one broker as its leader and several brokers as followers. By default, all reads and writes go through the leader, and the followers replicate the leader's data.

In the figure above, for partition 0, broker 1 is the leader, and broker 2 and broker 3 are followers. For partition 1, broker 2 is the leader, and broker 1 and broker 3 are followers.

In the figure above, when a client (producer) writes data to partition 0, it writes to the leader, broker 1, which then copies the data to the followers, broker 2 and broker 3.

In the figure above, when the client writes to partition 1, it writes to broker 2, which is the leader of partition 1. Broker 2 then copies the data to the followers, broker 1 and broker 3.

The topic in the figure above has three partitions, and the reads and writes to each partition are handled by different brokers, thus improving the overall throughput.
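The leader and follower layout described above can be mimicked with a toy in-memory model. It is a deliberate simplification (the copy to followers happens immediately inside `write`, which is not how a real cluster behaves), and the `Broker` and `ReplicatedPartition` classes are hypothetical names used only for illustration.

```python
from typing import Dict, List


class Broker:
    def __init__(self, broker_id: int) -> None:
        self.broker_id = broker_id
        # partition number -> messages stored on this broker
        self.logs: Dict[int, List[bytes]] = {}


class ReplicatedPartition:
    def __init__(self, number: int, leader: Broker, followers: List[Broker]) -> None:
        self.number = number
        self.leader = leader
        self.followers = followers
        for broker in [leader, *followers]:
            broker.logs[number] = []

    def write(self, message: bytes) -> int:
        # Clients write to the leader only ...
        leader_log = self.leader.logs[self.number]
        leader_log.append(message)
        # ... and each follower ends up with a copy of the leader's log.
        for follower in self.followers:
            follower.logs[self.number] = list(leader_log)
        return len(leader_log) - 1


b1, b2, b3 = Broker(1), Broker(2), Broker(3)
p0 = ReplicatedPartition(0, leader=b1, followers=[b2, b3])
p1 = ReplicatedPartition(1, leader=b2, followers=[b1, b3])
p0.write(b"to partition 0")
p1.write(b"to partition 1")
print(b3.logs)  # {0: [b'to partition 0'], 1: [b'to partition 1']}
```

Because different brokers lead different partitions, writes to partition 0 and partition 1 land on different machines first, which is where the throughput gain comes from.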

4. Implementing a producer and a consumer with kafka-python

kafka-python is a Python client for Kafka that can be used to send messages to, and consume messages from, a Kafka topic.

The experiment will implement a producer and a consumer, with the producer sending messages to Kafka and the consumer consuming messages from the topic. The structure is as follows:

Producer code
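A minimal sketch of what such a producer could look like with kafka-python. The broker address `localhost:9092`, the message format, and the helper name `send_messages` are assumptions for illustration, not taken from the original listing.

```python
import time


def send_messages(producer, topic: str, count: int, delay: float = 0.0) -> None:
    """Send `count` numbered messages to `topic` via a KafkaProducer-like object."""
    for i in range(count):
        # kafka-python expects bytes unless a value_serializer is configured.
        producer.send(topic, value=f"message {i}".encode("utf-8"))
        if delay:
            time.sleep(delay)
    # send() is asynchronous; flush() blocks until buffered messages are sent.
    producer.flush()


def main() -> None:
    # Imported here so the helper above can be exercised without a broker.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")  # assumed address
    try:
        send_messages(producer, "test", count=100, delay=0.1)
    finally:
        producer.close()
```

With a broker running, calling `main()` sends 100 messages to the test topic.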

Consumer code
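A minimal sketch of a matching consumer with kafka-python. The broker address, the `auto_offset_reset` setting, and the helper name `print_messages` are assumptions for illustration.

```python
def print_messages(consumer, limit=None) -> int:
    """Print records from a KafkaConsumer-like iterable; return how many were seen."""
    seen = 0
    for record in consumer:
        print(f"partition={record.partition} offset={record.offset} value={record.value!r}")
        seen += 1
        if limit is not None and seen >= limit:
            break
    return seen


def main() -> None:
    # Imported here so the helper above can be exercised without a broker.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "test",
        bootstrap_servers="localhost:9092",  # assumed address
        auto_offset_reset="earliest",        # read from the oldest message if no offset is stored
    )
    try:
        print_messages(consumer)  # blocks and prints messages as they arrive
    finally:
        consumer.close()
```

With a broker running, calling `main()` prints every message in the test topic and then waits for new ones.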

Next, create the test topic.
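One way to create the topic from Python is kafka-python's admin client; this is a sketch, and the original may well have used Kafka's command-line tools instead. The broker address, the replication factor of 1, and the helper name `create_topic` are assumptions.

```python
def create_topic(admin, new_topic_factory, name: str,
                 num_partitions: int = 1, replication_factor: int = 1):
    """Create one topic through a KafkaAdminClient-like object."""
    topic = new_topic_factory(
        name=name,
        num_partitions=num_partitions,
        replication_factor=replication_factor,
    )
    admin.create_topics(new_topics=[topic])
    return topic


def main() -> None:
    # Imported here so the helper above can be exercised without a broker.
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers="localhost:9092")  # assumed address
    try:
        create_topic(admin, NewTopic, "test")
    finally:
        admin.close()
```

The `num_partitions` argument controls how many partitions the new topic gets.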

With two terminal windows open, run the producer in window 1, as follows:

Run the consumer in window 2, as follows:

You can see that the consumer in window 2 successfully reads the data written by the producer.

5. Fault tolerance with consumer groups

This experiment demonstrates the fault-tolerant behavior of a consumer group. We will create a topic with two partitions and two consumers, both consuming data from the same topic. The structure is shown below.

The producer code is the same as in experiment 1, so it is not repeated here. Each consumer needs to specify the consumer group it belongs to. The code is as follows:
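A sketch of a consumer that joins a consumer group with kafka-python. The group name `my-group`, the broker address, and the helper name `consume` are hypothetical; the key point is the `group_id` argument, which must be the same for both consumers.

```python
def consume(consumer, name: str, limit=None) -> set:
    """Print which partition each record came from; return the partitions seen."""
    partitions_seen = set()
    for count, record in enumerate(consumer, start=1):
        print(f"[{name}] partition={record.partition} "
              f"offset={record.offset} value={record.value!r}")
        partitions_seen.add(record.partition)
        if limit is not None and count >= limit:
            break
    return partitions_seen


def main(name: str = "consumer-1") -> None:
    # Imported here so the helper above can be exercised without a broker.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "test",
        bootstrap_servers="localhost:9092",  # assumed address
        group_id="my-group",                 # hypothetical name, shared by both consumers
    )
    try:
        consume(consumer, name)
    finally:
        consumer.close()
```

Running `main("consumer-1")` and `main("consumer-2")` in separate windows starts two members of the same group.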

Next, create a topic named test and set the number of partitions to 2.

Open three windows: one for the producer and two for the consumers.

The output of the two windows running the consumers is as follows:

You can see that when two consumers are running at the same time, they consume data from different partitions. The consumer in window 1 consumes data from partition 0, and the consumer in window 2 consumes data from partition 1.

Now stop the consumer in window 1; you can see the following results:

Initially, the consumer in window 2 only consumes the data in partition 1. When the consumer in window 1 exits, Kafka reassigns partition 0 to the remaining member of the group, so the consumer in window 2 also consumes the data in partition 0.

6. Offset management

Kafka allows a consumer to commit the offset of the message it has just consumed back to Kafka, so that if the consumer exits because of an exception, it resumes from the last committed offset the next time it starts.

The structure of this experiment is the same as that of experiment 1. One producer and one consumer are used, and the number of partitions in the test topic is set to 1.

The code for Producer is the same as in experiment 1, so I won’t repeat it here. The code for the Consumer is slightly modified to print the offset for the next message to be consumed. The consumer code is as follows
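A sketch of a consumer that prints the next offset to be consumed, using kafka-python's `position()` method. The group name, the broker address, disabling auto-commit, and the helper name `report_next_offsets` are assumptions chosen to match the behavior described in this experiment.

```python
def report_next_offsets(consumer, topic_partition, limit=None) -> list:
    """Print each record together with the offset of the next record to fetch."""
    next_offsets = []
    for count, record in enumerate(consumer, start=1):
        # position() returns the offset of the next record that will be fetched.
        next_offset = consumer.position(topic_partition)
        print(f"offset={record.offset} value={record.value!r} next={next_offset}")
        next_offsets.append(next_offset)
        if limit is not None and count >= limit:
            break
    return next_offsets


def main() -> None:
    # Imported here so the helper above can be exercised without a broker.
    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(
        "test",
        bootstrap_servers="localhost:9092",  # assumed address
        group_id="offset-demo",              # hypothetical group name
        enable_auto_commit=False,            # assumption: this version never commits offsets
    )
    try:
        # The test topic has a single partition, so partition 0 is the only one.
        report_next_offsets(consumer, TopicPartition("test", 0))
    finally:
        consumer.close()
```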

Start the producer in one window and the consumer in the other. The output of the consumer is as follows:

You can try quitting the consumer and starting it again. Because no new offset has been committed, on each restart the consumer starts consuming from the message at offset=98.

Change the consumer code so that it commits the offset back to Kafka after consuming each message:
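A sketch of such a consumer with kafka-python, calling `commit()` after every message. The group name, the broker address, and the helper name `consume_and_commit` are assumptions. Committing once per message keeps the example simple; a busy consumer would more likely commit in batches.

```python
def consume_and_commit(consumer, limit=None) -> int:
    """Commit the consumed offset after every record; return the number of commits."""
    commits = 0
    for record in consumer:
        print(f"offset={record.offset} value={record.value!r}")
        # With no arguments, commit() commits the current consumed offsets
        # and blocks until the broker has acknowledged them.
        consumer.commit()
        commits += 1
        if limit is not None and commits >= limit:
            break
    return commits


def main() -> None:
    # Imported here so the helper above can be exercised without a broker.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "test",
        bootstrap_servers="localhost:9092",  # assumed address
        group_id="offset-demo",              # hypothetical group name
        enable_auto_commit=False,            # offsets are committed explicitly above
    )
    try:
        consume_and_commit(consumer)
    finally:
        consumer.close()
```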

Start the consumer

As you can see, the Consumer starts consuming from the message offset=98. At offset=829, we press Ctrl+C to exit the Consumer.

Let’s start the consumer again.

You can see that after the restart, the consumer continues from the offset it committed last time. Each time the consumer restarts, it picks up where it left off.
