I’m Little Xia Lufei. Learning shapes our lives. Technology changes the world.

Table of contents

  • What is Kafka
    • The application of Kafka
    • Kafka as a messaging system
    • Core API
  • The basic concept of Kafka
    • Messages And Batches
    • Topics And Partitions
    • Producers And Consumers
      • Producers
      • Consumers
    • Brokers And Clusters
  • Summary
  • Follow me, and you won’t get lost

What is Kafka

Kafka is a distributed streaming platform with three key capabilities:

  • Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system
  • Store streams of records in a fault-tolerant way
  • Process streams of records as they occur

The application of Kafka

  1. As a messaging system
  2. As a storage system
  3. As a stream processor

Kafka can be used to build real-time streaming data pipelines that reliably move data between systems or applications, and to build streaming applications that transform or react to streams of data.

Kafka as a messaging system

As a messaging system, Kafka has three basic components:

  • Producer: the client that publishes messages
  • Broker: the server that receives and stores messages from producers
  • Consumer: the client that reads messages from the broker

In a large system, many subsystems need to interact and exchange messages. Each exchange has a source system (the message sender) and a destination system (the message receiver), and moving data between them requires the right data pipelines.



All these point-to-point connections can become confusing; with a messaging system in between, the architecture becomes simpler and cleaner.

  • Kafka runs as a cluster on servers in one or more data centers
  • The Kafka cluster stores streams of message records in categories called topics
  • Each message record consists of three elements: a key, a value, and a timestamp
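
As a minimal sketch (not Kafka’s actual wire format or client classes), a message record can be modeled as a simple key/value/timestamp triple:

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Record:
    """Toy model of a Kafka message record: a key, a value, a timestamp."""
    key: Optional[bytes]   # may be None; used for partition assignment
    value: bytes           # the message payload
    timestamp: float = field(default_factory=time.time)

# Example: a record keyed by user id, carrying a JSON payload.
r = Record(key=b"user-42", value=b'{"event": "login"}')
```

The key is optional in real Kafka too; when present, it typically determines which partition the record lands in.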

Core API

Kafka has four core APIs:

  • The Producer API, which allows an application to publish a stream of records to one or more topics
  • The Consumer API, which allows an application to subscribe to one or more topics and process the stream of records produced to them
  • The Streams API, which allows an application to act as a stream processor, consuming input streams from one or more topics and producing output streams to one or more topics, effectively transforming input streams into output streams
  • The Connector API, which allows you to build and run reusable producers and consumers that connect Kafka topics to existing applications or data systems. For example, a connector for a relational database might capture every change to a table

The basic concept of Kafka

Messages And Batches

Kafka’s basic unit of data is called a message. To reduce network overhead and improve efficiency, multiple messages are written into the same batch.
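
The batching idea can be sketched in a few lines. This is a toy illustration of the trade-off (one network round-trip per batch instead of per message), not the producer’s real accumulator, and the size threshold is an arbitrary example value:

```python
def batch_messages(messages, max_batch_size=3):
    """Group messages into batches of up to max_batch_size,
    mimicking how a producer amortizes network round-trips."""
    batches, current = [], []
    for msg in messages:
        current.append(msg)
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:  # flush the final, possibly partial, batch
        batches.append(current)
    return batches

batches = batch_messages(["m1", "m2", "m3", "m4", "m5"])
# → [["m1", "m2", "m3"], ["m4", "m5"]]
```

Real producers flush on size or elapsed time (whichever comes first), so latency stays bounded even when traffic is light.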

Topics And Partitions

Kafka messages are categorized into topics, and a topic can be divided into partitions; each partition is a commit log. Messages are appended to a partition and read in first-in, first-out order. Partitions give Kafka data redundancy and scalability: because partitions can be distributed across different servers, a topic can span multiple servers and provide greater performance than a single server could.

Because a topic contains multiple partitions, message order cannot be guaranteed across the entire topic, but it is guaranteed within a single partition.
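
A toy model makes this ordering property concrete. The sketch below (hypothetical class, not a Kafka API) spreads unkeyed messages round-robin across partitions; each partition individually preserves append order, but the topic as a whole does not:

```python
class TopicSketch:
    """Toy topic: each partition is an append-only commit log."""
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self._next = 0

    def append(self, value):
        # Round-robin across partitions when no key is given.
        p = self._next % len(self.partitions)
        self._next += 1
        self.partitions[p].append(value)
        return p

    def read(self, partition):
        # First-in, first-out order within one partition.
        return list(self.partitions[partition])

topic = TopicSketch(2)
for msg in ["m0", "m1", "m2", "m3"]:
    topic.append(msg)
# Partition 0 holds ["m0", "m2"], partition 1 holds ["m1", "m3"]:
# order is preserved inside each partition, not across the whole topic.
```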

Producers And Consumers

Producers

The producer is responsible for creating messages. In general, the producer distributes messages evenly across all partitions of a topic and does not care which partition a given message is written to. If we want to write messages to a specific partition, we can do so with a custom partitioner.
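
A key-based partitioner can be sketched as a hash of the key modulo the partition count. This is an illustrative function (using CRC32 for determinism), not Kafka’s actual default partitioner, which uses a murmur2 hash:

```python
from zlib import crc32

def key_partitioner(key: bytes, num_partitions: int) -> int:
    """Toy key-based partitioner: the same key always maps to the
    same partition, so per-key message order is preserved."""
    return crc32(key) % num_partitions

# All messages for one key land in the same partition.
p_first = key_partitioner(b"order-123", 6)
p_second = key_partitioner(b"order-123", 6)
# p_first == p_second
```

Because the result depends on the partition count, changing the number of partitions of an existing topic changes where keys map, which is one reason partition counts are chosen carefully up front.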

Consumers

Consumers belong to consumer groups and are responsible for consuming messages. A consumer can subscribe to one or more topics and reads messages in the order they were produced. Consumers keep track of which messages they have already read by checking message offsets. The offset is a monotonically increasing value that Kafka assigns to each message as it is written; within a given partition, each message’s offset is unique. The consumer stores the last-read offset of each partition in ZooKeeper or in Kafka itself, so if the consumer shuts down or restarts, it can recover that offset and lose no read progress.

Within a single consumer group, a partition can be read by only one consumer, but consumers from different consumer groups can each read the same partition. When multiple consumer groups read the same topic, they do not affect one another.
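
The “one partition per consumer within a group, independent across groups” rule can be sketched as a simple assignment function. This is a toy round-robin assignment for illustration, not Kafka’s actual rebalance protocol or its range/sticky assignors:

```python
def assign_partitions(partitions, consumers):
    """Toy assignment: each partition goes to exactly one consumer in
    the group; a consumer may own several partitions."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2, 3]
groups = {
    "group-A": ["c1", "c2"],  # two consumers share the partitions
    "group-B": ["c3"],        # a lone consumer gets everything
}
# Each group gets its own independent assignment over the same partitions.
result = {g: assign_partitions(partitions, cs) for g, cs in groups.items()}
# group-A: {"c1": [0, 2], "c2": [1, 3]}; group-B: {"c3": [0, 1, 2, 3]}
```

Note the consequence: adding more consumers than partitions to a group leaves the extras idle, since no partition can be split between two consumers of the same group.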

Brokers And Clusters

A standalone Kafka server is called a broker. The broker receives messages from producers, assigns offsets to them, and commits the messages to disk. The broker also serves consumers, responding to requests to read from partitions by returning messages that have been committed to disk.
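
The broker’s two duties for a single partition — assign an offset on write, serve reads from a requested offset — can be sketched as follows. This is a toy in-memory stand-in for the on-disk commit log, not the broker’s real storage engine:

```python
class BrokerSketch:
    """Toy broker for one partition: assigns offsets on produce,
    serves fetches from a given offset."""
    def __init__(self):
        self.log = []  # stands in for the on-disk commit log

    def produce(self, message):
        offset = len(self.log)  # next monotonically increasing offset
        self.log.append(message)
        return offset

    def fetch(self, from_offset):
        # Return every committed message at or after the offset.
        return self.log[from_offset:]

b = BrokerSketch()
b.produce("a")   # assigned offset 0
b.produce("b")   # assigned offset 1
b.fetch(1)       # → ["b"]: resume reading from offset 1
```

A consumer that remembers “last read offset = 1” can call `fetch(1)` after a restart and pick up exactly where it left off, which is the read-state guarantee described above.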

A broker is part of a cluster. Each cluster elects one broker as the cluster controller, which handles administrative work such as assigning partitions to brokers and monitoring brokers.

Within a cluster, each partition is owned by a single broker, called the leader of the partition. A partition can also be assigned to multiple brokers, in which case the partition is replicated. This replication mechanism provides redundancy for the partition’s messages, so if one broker fails, another broker can take over leadership.
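
Failover can be sketched as picking the first live broker from a partition’s replica list. This is a deliberately simplified illustration; real Kafka elects leaders via the controller and restricts the choice to the in-sync replica set:

```python
def elect_leader(replicas, failed_brokers):
    """Toy leader election: pick the first replica whose broker
    is still alive."""
    for broker in replicas:
        if broker not in failed_brokers:
            return broker
    raise RuntimeError("no live replica for this partition")

replicas = ["broker-1", "broker-2", "broker-3"]  # broker-1 leads initially
leader = elect_leader(replicas, failed_brokers=set())       # "broker-1"
# broker-1 fails; leadership moves to the next live replica.
new_leader = elect_leader(replicas, failed_brokers={"broker-1"})  # "broker-2"
```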

Summary

Kafka’s overall workflow can be summarized in a single diagram:

Follow me, and you won’t get lost