The interviewer said I don't know enough, so I just have to learn

What can Kafka do

  • Messaging system: A messaging system transfers data from one application to another, so applications only need to focus on the data, not on how it is moved between them. Distributed messaging relies on reliable message queues that deliver messages asynchronously between client applications and the messaging system. There are two main messaging patterns: point-to-point and publish-subscribe. Most messaging systems use publish-subscribe, and Kafka is one of them.
  • Storage system: Kafka persists messages to disk.
  • Streaming platform: Kafka not only serves as a reliable data source for popular stream-processing frameworks, but also ships its own stream-processing library (Kafka Streams) with operations such as windowing, joins, transformations, and aggregations.
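
The difference between the two messaging patterns mentioned above can be shown with a small in-memory sketch. This is not Kafka code: `PointToPointQueue` and `PubSubTopic` are hypothetical classes made up for illustration, and the only assumption is the textbook behavior of each pattern (one receiver per message versus every subscriber gets a copy).

```python
from collections import deque


class PointToPointQueue:
    """Point-to-point: each message is delivered to exactly one receiver."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # The message is removed, so no other receiver can see it.
        return self._messages.popleft() if self._messages else None


class PubSubTopic:
    """Publish-subscribe: each message is delivered to every subscriber."""

    def __init__(self):
        self._inboxes = {}

    def subscribe(self, name):
        # Every subscriber gets its own inbox.
        self._inboxes[name] = []
        return self._inboxes[name]

    def publish(self, message):
        for inbox in self._inboxes.values():
            inbox.append(message)


queue = PointToPointQueue()
queue.send("order-1")
first = queue.receive()    # the first receiver gets the message
second = queue.receive()   # nothing is left for a second receiver

orders = PubSubTopic()
billing = orders.subscribe("billing")
shipping = orders.subscribe("shipping")
orders.publish("order-1")  # both subscribers receive their own copy
```

In this sketch the publisher pushes into each inbox. Kafka consumers instead pull from the broker, but the visible effect is the same: every subscribing group sees every message.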

What is Kafka

A typical Kafka architecture consists of producers, brokers, consumers, and a ZooKeeper cluster.

  • ZooKeeper provides a reliable metadata store for Kafka: topic and partition metadata, broker data, ACL information, and more. ZooKeeper also acts as the coordinator that propagates topology changes in the cluster: based on notifications from ZooKeeper, new brokers and broker failures anywhere in the Kafka cluster are detected. Most operations tasks, such as capacity expansion and partition migration, also interact with ZooKeeper.

  • The Producer is an application that publishes new messages to a topic.

  • The Broker is responsible for persisting received messages to disk.

  • The Consumer is an application that subscribes to a topic and reads new messages from it.

    • Consumer offset: records how far a consumer has progressed in reading; each consumer has its own offset.
    • Consumer group: a group of consumer instances that consume multiple partitions in parallel to achieve high throughput.
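
How a consumer group splits partitions and tracks offsets can be sketched as follows. This is a toy model under stated assumptions, not Kafka's implementation: `assign_partitions` uses a plain round-robin rule (Kafka's real assignors are more elaborate), and `GroupOffsets` merges "read position" and "committed offset" into one number, which the real consumer keeps separate. All names here are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


class GroupOffsets:
    """Tracks the next offset to read, per partition, for one consumer group."""

    def __init__(self):
        self._next_offset = {}

    def poll(self, partition, log, max_records=2):
        start = self._next_offset.get(partition, 0)
        records = log[start:start + max_records]
        # Remember where the next poll should resume.
        self._next_offset[partition] = start + len(records)
        return records

    def position(self, partition):
        return self._next_offset.get(partition, 0)


# Four partitions shared by two consumers: each one owns two partitions.
assignment = assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"])

# Partition 0 is modeled as a plain list; the list index plays the role of the offset.
log_p0 = ["m0", "m1", "m2"]
offsets = GroupOffsets()
batch_1 = offsets.poll(0, log_p0)  # reads from offset 0
batch_2 = offsets.poll(0, log_p0)  # resumes where batch_1 stopped
```

Because each partition has a single owner inside the group, adding consumers (up to the number of partitions) raises parallelism without two members of the same group reading the same message.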

Correction: newer Kafka versions no longer depend on ZooKeeper (they can run in KRaft mode instead). The reasons for dropping it:

  1. From a cluster operations perspective, Kafka is itself a distributed system, yet it depends on another open-source distributed system, ZooKeeper, which sits at the core of Kafka. Cluster developers and operators therefore have to understand two open-source systems at once, and ZooKeeper adds to the cost of operations.
  2. From a cluster-size perspective, a core metric that limits the size of a Kafka cluster is the number of partitions it can host. The partition count affects the cluster in two ways: the volume of metadata stored in ZooKeeper and the efficiency of controller changes. Under the ZooKeeper-based architecture, a single Kafka cluster cannot support millions of partitions.

Topic vs. Partition

Messages in Kafka are grouped by topic. A topic is a logical concept that can be subdivided into multiple partitions; each partition belongs to exactly one topic and is often referred to as a topic-partition. Different partitions of the same topic contain different messages. At the storage level, a partition can be regarded as an append-only log file, and each message is assigned a specific offset when it is appended to that log.
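
The relationship between topic, partition, and offset can be sketched in a few lines. This is an in-memory illustration, not Kafka's storage code: `Partition` and `Topic` are hypothetical classes, and the partitioner is an assumption (CRC32 of the key modulo the partition count, standing in for Kafka's own key hashing).

```python
import zlib


class Partition:
    """An append-only log; the offset is the record's position in the log."""

    def __init__(self):
        self._log = []

    def append(self, message):
        offset = len(self._log)
        self._log.append(message)
        return offset

    def read(self, offset):
        return self._log[offset]


class Topic:
    """A logical name that groups several partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def send(self, key, message):
        # Hypothetical partitioner: the same key always maps to the same partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        offset = self.partitions[index].append(message)
        return index, offset


orders = Topic("orders", num_partitions=3)
p1, o1 = orders.send("user-42", "created")
p2, o2 = orders.send("user-42", "paid")
```

Both messages share a key, so they land in the same partition with consecutive offsets. Offsets are only meaningful inside one partition, which is why ordering holds per partition rather than across the whole topic.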
