Preface

Kafka is one of the most popular messaging systems in the world, and Kafka skills come with strong job opportunities and career prospects. Moreover, in the Internet age, Kafka knowledge is a fast-growing career path in its own right. This article therefore collects some common Kafka interview questions and provides detailed answers. I hope it helps you!

Common Kafka interview questions

Question 1: What is Apache Kafka?

A: Apache Kafka is an open-source publish-subscribe message broker application. It is written in Scala. The project was originally developed at LinkedIn and later became part of the Apache Software Foundation. Kafka's design is based primarily on the transaction log.

Question 2: What are the components in Kafka?

A: The most important components of Kafka are:

  • Topics: A Kafka topic is a category or named stream of messages.
  • Producers: In Kafka, producers publish messages to Kafka topics.
  • Consumers: A Kafka consumer subscribes to a topic, then reads and processes messages from it.
  • Brokers: Kafka brokers are the servers that manage the storage of messages in topics.

Question 3: Explain the role of offset.

A: The messages in a partition are each given a sequential ID number, which we call an offset. We use these offsets to uniquely identify each message within the partition.

Question 4: What is a consumer group?

A: The concept of consumer groups is unique to Apache Kafka. Basically, each Kafka consumer group consists of one or more consumers who collectively consume a set of subscribed topics.

Question 5: What is the role of ZooKeeper in Kafka?

A: Apache Kafka is a distributed system built using ZooKeeper. The main purpose of ZooKeeper is to coordinate the different nodes in the cluster. ZooKeeper is also used to recover from previously committed offsets if a node fails, because offsets are committed to it periodically.

Question 6: Can Kafka be used without ZooKeeper?

A: No. It is not possible to bypass ZooKeeper and connect directly to the Kafka server. If ZooKeeper is down for any reason, Kafka cannot serve any client requests.

Question 8: Why is Kafka technology important?

A: Kafka has some advantages that make it important to use:

  • High throughput: Kafka does not require any large-scale hardware, because it can handle high-velocity, high-volume data. It can support a throughput of thousands of messages per second.
  • Low latency: Kafka can process messages with extremely low latency, in the range of milliseconds, which most new use cases demand.
  • Fault tolerance: Kafka is resilient to node/machine failures within a cluster.
  • Durability: Because Kafka supports message replication, messages are never lost. This is one of the reasons behind its durability.
  • Scalability: Kafka can be scaled out by adding additional nodes, without incurring any downtime.

Question 9: What are the main APIs of Kafka?

A: Apache Kafka has four main APIs:

  • Producer API
  • Consumer API
  • Streams API
  • Connector API

Question 10: What is a consumer?

A: Kafka consumers subscribe to a topic and read and process messages from that topic. In addition, consumers label themselves with a consumer group name; each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Note that consumer instances can live in separate processes or on separate machines.

Question 11: Explain the concept of leader and follower.

A: For each partition in Kafka, there is one server that acts as the leader and zero or more servers that act as followers.

Question 12: What ensures load balancing of servers in Kafka?

A: The primary role of the leader is to handle all read and write requests for the partition, while followers passively replicate the leader. When the leader fails, one of the followers takes over the leader role. This whole process keeps the load on the servers balanced.

Question 13: What are the roles of replicas and the ISR?

A: Replicas are essentially the list of nodes that replicate the log for a particular partition, irrespective of whether any of them currently plays the role of leader.

ISR stands for in-sync replicas: the set of replicas whose messages are synchronized with the leader.

Question 14: Why is replication important in Kafka?

A: Replication ensures that published messages are not lost and can still be consumed in the event of machine errors, program errors, or frequent software upgrades.

Question 15: What does it mean if a replica stays out of the ISR for a long time?

A: Simply put, it means the follower cannot fetch data as quickly as the leader accumulates it.

Question 16: What is the process for starting the Kafka server?

A: Initializing the ZooKeeper server is an important first step, because Kafka depends on ZooKeeper. The process for starting the Kafka server is:

  • First, start the ZooKeeper server:

    bin/zookeeper-server-start.sh config/zookeeper.properties

  • Next, start the Kafka server:

    bin/kafka-server-start.sh config/server.properties
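
Once both are running, you can verify the setup by creating a test topic. The command below is a sketch for ZooKeeper-based Kafka versions; the host, port, and topic name are illustrative assumptions:

    bin/kafka-topics.sh --create --zookeeper localhost:2181 \
      --replication-factor 1 --partitions 1 --topic test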

Question 17: When does a QueueFullException occur in a producer?

A: A QueueFullException typically occurs when a Kafka producer attempts to send messages at a pace the broker cannot handle at that moment. Because the producer does not block, users need to add enough brokers to collaboratively handle the increased load.
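
Note that QueueFullException comes from the legacy Scala producer; in the modern Java producer, the equivalent back-pressure is governed by the buffer settings sketched below (the values shown are illustrative, not recommendations):

    # producer configuration (illustrative values)
    buffer.memory=33554432   # total bytes the producer may buffer locally (32 MB)
    max.block.ms=60000       # how long send() may block when the buffer is full
    # if the buffer stays full past max.block.ms, send() fails with a TimeoutException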

Question 18: Explain the Kafka Producer API.

A: The API that allows an application to publish a stream of records to one or more Kafka topics is known as the Producer API.
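
A minimal sketch in Java, assuming a broker at localhost:9092 and a topic named "test" (both illustrative):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SimpleProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // publish one record to the "test" topic; the key influences partition assignment
                producer.send(new ProducerRecord<>("test", "key-1", "hello kafka"));
            }
        }
    }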

Question 19: What are the main differences between Kafka and Flume?

A: The main differences between Kafka and Flume are:

  • Tool type

    • Apache Kafka – Kafka is a general-purpose tool for multiple producers and consumers.
    • Apache Flume – Flume is considered a special-purpose tool for specific applications.
  • Replication

    • Apache Kafka – Kafka can replicate events.
    • Apache Flume – Flume does not replicate events.

Question 20: Is Apache Kafka a distributed streaming platform? If so, what can you do with it?

A: Kafka is definitely a streaming platform. It can help:

  • Push records easily
  • Store a large number of records without any storage problems
  • Process records as they arrive

Question 21: What can you do with Kafka?

A: Kafka can be used in several ways, for example:

  • To transfer data between two systems, we can use it to build real-time streaming data pipelines.
  • In addition, Kafka can be used to build a real-time streaming platform that can react quickly to data.

Question 22: What is the purpose of the retention period in a Kafka cluster?

A: The retention period retains all published records within the Kafka cluster, regardless of whether they have been consumed. Records can be discarded by adjusting the retention-period configuration settings, which frees up space.
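
Retention can also be set per topic. The command below is a sketch for ZooKeeper-based Kafka versions; the topic name and the 7-day value are illustrative assumptions:

    bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics \
      --entity-name my-topic --alter --add-config retention.ms=604800000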

Question 23: What is the maximum size of a message that Kafka can receive?

A: By default, the maximum message size that Kafka can receive is approximately 1,000,000 bytes (1 MB).
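
This limit is controlled by broker, producer, and consumer settings that should be raised together. A sketch of the relevant properties, with illustrative values:

    # broker (server.properties)
    message.max.bytes=1048576          # largest record batch the broker will accept

    # producer
    max.request.size=1048576           # largest request the producer will send

    # consumer
    max.partition.fetch.bytes=1048576  # largest batch fetched per partition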

Question 24: What are the types of traditional messaging methods?

A: Basically, there are two traditional messaging methods:

  • Queuing: A pool of consumers reads messages from the server, and each message is delivered to one of them.
  • Publish-subscribe: Messages are broadcast to all consumers.

Question 25: What does ISR stand for in a Kafka environment?

A: ISR stands for in-sync replicas: the set of message replicas that are synchronized with the leader.

Question 26: What is geo-replication in Kafka?

A: Kafka MirrorMaker provides geo-replication for clusters. With MirrorMaker, messages are replicated across multiple data centers or cloud regions. It can be used for backup and restore in active/passive scenarios, to place data closer to users, or to support data locality requirements.
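
A typical invocation of the classic MirrorMaker tool looks like the sketch below; the two .properties files (pointing at the source and target clusters) and the topic pattern are illustrative assumptions:

    bin/kafka-mirror-maker.sh --consumer.config source-cluster.properties \
      --producer.config target-cluster.properties --whitelist ".*"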

Question 27: Explain what multi-tenancy is.

A: Kafka can easily be deployed as a multi-tenant solution. Multi-tenancy is enabled by configuring which topics can produce or consume data. Kafka also provides operational support for quotas.

Question 28: What does the Consumer API do?

A: The API that allows applications to subscribe to one or more topics and process the stream of records produced to them is called the Consumer API.
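
A minimal Java sketch, using the same illustrative broker and topic as the producer example above; the group.id setting makes this instance part of a consumer group:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class SimpleConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "demo-group");              // consumer group label
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("test"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records)
                        // the offset uniquely identifies a record within its partition
                        System.out.printf("offset=%d key=%s value=%s%n",
                                record.offset(), record.key(), record.value());
                }
            }
        }
    }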

Question 29: What does the Streams API do?

A: The Streams API allows an application to act as a stream processor: it consumes input streams from one or more topics and produces an output stream to one or more output topics, effectively transforming input streams into output streams.
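
A minimal Kafka Streams sketch that upper-cases each value; the application id and the topic names "input-topic" and "output-topic" are illustrative assumptions:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class UppercaseStream {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo");    // assumed app id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> input = builder.stream("input-topic");
            // transform the input stream and write the result to the output topic
            input.mapValues(value -> value.toUpperCase()).to("output-topic");

            new KafkaStreams(builder.build(), props).start();
        }
    }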

Question 30: What does the Connector API do?

A: The API that allows you to build and run reusable producers or consumers that connect Kafka topics to existing applications or data systems is what we call the Connector API.
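
As an illustration, the file source connector that ships with Kafka Connect can be configured with a properties file like the one below (the file name and topic are assumptions) and run in standalone mode with bin/connect-standalone.sh:

    # connect-file-source.properties
    name=local-file-source
    connector.class=FileStreamSource
    tasks.max=1
    file=test.txt        # local file to stream lines from
    topic=connect-test   # topic the lines are published to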

Question 31: Explain what a producer is.

A: The primary role of a producer is to publish data to topics of its choice. Basically, its job is to choose which record to assign to which partition within a topic.

Question 32: Compare RabbitMQ with Apache Kafka

A: RabbitMQ is a popular alternative to Apache Kafka. So, let's compare the two:

  • Features

    • Apache Kafka – Kafka is distributed, durable, and highly available, with data sharing and replication.
    • RabbitMQ – RabbitMQ has no such features.
  • Performance

    • Apache Kafka – up to 100,000 messages per second.
    • RabbitMQ – around 20,000 messages per second.

Question 33: Compare traditional queue systems with Apache Kafka

A: Let’s compare the functionality of traditional queue systems with that of Apache Kafka:

  • Message retention

    • Traditional queue systems – messages are usually removed from the end of the queue once processing is complete.
    • Apache Kafka – messages persist even after being processed; they are not deleted when a consumer receives them.
  • Logic-based processing

    • Traditional queue systems – do not allow processing logic based on similar messages or events.
    • Apache Kafka – allows processing logic based on similar messages or events.

Question 34: Why use an Apache Kafka cluster?

A: To overcome the challenges of collecting and analyzing large volumes of data, we need a message queuing system. Hence, Apache Kafka. Its benefits are:

  • You can track Web activity simply by storing/sending events for real-time processing.
  • From this data, we can raise alerts and report operational metrics.
  • We can also transform data into a standard format.
  • It allows continuous processing of a topic’s stream data.
  • Due to its widespread use, it outcompetes rival products such as ActiveMQ and RabbitMQ.

Question 35: Explain the term “Log Anatomy”

A: We treat logs as partitions. Basically, a data source writes messages to the log. One of the advantages is that, at any one time, one or more consumers can read from the log of their choice, each reading at a different offset while the data source continues writing.

Question 36: What is a data log in Kafka?

A: In Kafka, messages are retained for a considerable amount of time, and consumers can read them at their own convenience. However, if Kafka is configured to keep messages for only 24 hours and a consumer is down for more than 24 hours, the consumer will lose those messages. If the downtime is shorter, say 60 minutes, the messages can still be read from the last known offset. Note that Kafka does not keep track of what consumers have read from a topic.

Question 37: Explain how to tune Kafka for best performance.

A: The way to tune Apache Kafka is to tune several of its components; a few commonly adjusted settings are sketched after this list:

  • Tuning Kafka producers
  • Tuning Kafka brokers
  • Tuning Kafka consumers
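
An illustrative starting point; these property names are real Kafka settings, but the values are assumptions rather than recommendations:

    # producer: trade a little latency for throughput
    batch.size=65536       # larger batches improve throughput
    linger.ms=10           # wait briefly so batches can fill
    compression.type=lz4   # compress batches on the wire

    # broker (server.properties)
    num.io.threads=8       # threads handling disk I/O

    # consumer
    fetch.min.bytes=1024   # batch fetches rather than fetching byte-by-byte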

Question 38: What are the limitations of Apache Kafka?

A: The limitations of Kafka are:

  • No complete set of monitoring tools
  • Issues with message tweaking
  • No support for wildcard topic selection
  • Speed issues in certain scenarios

Question 39: List the Apache Kafka operations.

A: Apache Kafka operations include:

  • Adding and removing Kafka topics
  • Modifying Kafka topics
  • Graceful shutdown
  • Mirroring data between Kafka clusters
  • Finding the position of a consumer
  • Expanding a Kafka cluster
  • Automatic data migration
  • Decommissioning brokers
  • Managing datacenters

Question 40: Explain some Apache Kafka use cases.

A: Apache Kafka has many use cases, such as:

  • Kafka metrics: Kafka can be used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.
  • Kafka log aggregation: collecting logs from multiple services across an organization.
  • Stream processing: Kafka’s strong durability is very useful for stream processing.

Question 41: Name some of the most notable users of Kafka.

A: Netflix, Mozilla, and Oracle.

Question 42: What are the features of Kafka Streams?

A: Some of the best features of Kafka Streams are:

  • Kafka Streams is highly scalable and fault tolerant.
  • It can be deployed to containers, VMs, bare-metal machines, and the cloud.
  • Kafka Streams is equally viable for small, medium, and large use cases.
  • It is fully integrated with Kafka security.
  • You write standard Java applications.
  • It offers exactly-once processing semantics.
  • No separate processing cluster is required.

Question 43: What does stream processing mean in Kafka?

A: Continuous, real-time, concurrent, record-by-record processing of data is what we call stream processing in Kafka.

Question 44: What are the types of system tools?

A: There are three types of system tools:

  • Kafka Migration Tool: helps migrate a broker from one version to another.
  • Mirror Maker: helps mirror one Kafka cluster to another.
  • Consumer Offset Checker: displays the topic, partitions, and owner for the specified topic set and consumer group.

Question 45: What are replication tools and their types?

A: Replication tools are provided for increased durability and availability. The types are:

  • Create Topic tool
  • List Topic tool
  • Add Partition tool

Question 46: What is the importance of Java in Apache Kafka?

A: To meet the high processing rates that are standard for Kafka, we can use the Java language. Java also provides good community support for Kafka consumer clients. So, we can say that implementing Kafka clients in Java is the right choice.

Question 47: Describe an optimal feature of Kafka.

A: Kafka’s best feature is “a wide variety of use cases.”

This means that Kafka can manage a wide variety of use cases that are common to data lakes. Examples include log aggregation, Web activity tracking, and so on.

Question 48: Explain the term “topic replication factor”.

A: The topic replication factor is the number of copies of each partition that are maintained across brokers. It is important to consider topic replication when designing a Kafka system, since replication is what allows the cluster to survive broker failures.

Question 50: What guarantees does Kafka provide?

A: Kafka provides the following guarantees:

  • Messages sent by a producer to a particular topic partition are appended in the order they are sent.
  • A consumer instance sees records in the order they are stored in the log.
  • For a topic with replication factor N, the cluster can tolerate up to N-1 server failures without losing any records committed to the log.
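
These guarantees depend on configuration. A hedged sketch of settings that strengthen them (the values and topic name are illustrative assumptions):

    # producer
    acks=all      # wait for all in-sync replicas to acknowledge a write
    retries=5     # retry transient send failures

    # create a topic with replication factor 3 (ZooKeeper-based versions)
    bin/kafka-topics.sh --create --zookeeper localhost:2181 \
      --replication-factor 3 --partitions 3 --topic critical-topic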

Final words

For interviews at top-tier Internet companies, the knowledge required goes well beyond Kafka, and every additional technical topic you have covered works in your favor. For Java developers, the author has therefore compiled a complete set of Internet-company interview topics covering Kafka, MySQL, Tomcat, Docker, Spring, MyBatis, Nginx, Netty, Dubbo, Redis, Spring Cloud, distributed systems, high concurrency, performance tuning, microservices, and other architecture technologies.
