Kafka is a high-throughput, distributed publish-subscribe messaging system designed to handle all the activity stream data of a consumer-scale website. This activity data (page views, searches, and other user actions) is a key ingredient in many features of the modern web. Because of the throughput involved, such data is usually handled through log processing and log aggregation. That approach works well for logging data and offline analysis systems like Hadoop, but it falls short when real-time processing is required. Kafka aims to unify online and offline message processing by supporting parallel loading into Hadoop while also partitioning real-time consumption across a cluster of machines.

Big data has exploded in recent years, and with its high reliability, high throughput, high availability, and scalability, Kafka has become the data pipeline of choice, as well as a favorite topic of interviewers.

A few days ago, some friends told me about their job interviews: they were stumped by the Kafka questions and, unfortunately, didn't make it through. It's a little scary to think that I might face the same bombardment from an interviewer someday. So over the past few days I have collected the Kafka questions asked most often in job interviews, divided them into basic, intermediate, and advanced sections, and worked through Kafka one step at a time.

01 Kafka Interview Bombardment: 44 Questions

1.1 Kafka Basic Interview Questions

  • 1. What are the uses of Kafka? What are the usage scenarios?

  • 2. What are ISR and AR in Kafka? What does expansion and shrinkage of the ISR refer to?

  • 3. What do HW, LEO, LSO and LW in Kafka stand for respectively?

  • 4. How is message ordering guaranteed in Kafka?

  • 5. What are the partitioner, serializer, and interceptor in Kafka? In what order are they applied?

  • 6. What is the structure of the Kafka producer client?

  • 7. How many threads does the Kafka producer client use for processing? What are they?

  • 8. What were the design flaws of the old Scala version of Kafka’s consumer client?

  • 9. Is the statement “if the number of consumers in a consumer group exceeds the number of topic partitions, then some consumers will not be able to consume data” correct? If so, is there any workaround?

  • 10. What are the circumstances that lead to repeated consumption?

  • 11. What are the scenarios that cause messages to be missed (consumed but lost)?

  • 12. KafkaConsumer is not thread-safe, so how do you implement multithreaded consumption? (See the sketch after this list.)

  • 13. Describe the relationship between consumers and consumer groups

  • 14. What logic does Kafka execute behind the scenes when you create (or delete) a topic with kafka-topics.sh?

  • 15. Can the number of partitions of a topic be increased? If so, how? If not, why not?

  • 16. Can the number of partitions of a topic be reduced? If so, how? If not, why not?

  • 17. How to choose an appropriate number of partitions when creating a topic?
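
As a taste of what an answer can look like, here is a minimal sketch for question 12 above. Because KafkaConsumer is not thread-safe, one common pattern is to give each thread its own consumer instance, all joined to the same consumer group. The broker address, topic name, and group id below are placeholders of my own, not anything prescribed by Kafka.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ConsumerPerThreadDemo {

    // Each worker owns a private KafkaConsumer; instances are never shared across threads.
    static class ConsumerWorker implements Runnable {
        private final Properties props;

        ConsumerWorker(Properties props) {
            this.props = props;
        }

        @Override
        public void run() {
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("demo-topic")); // placeholder topic
                while (!Thread.currentThread().isInterrupted()) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("%s partition=%d offset=%d value=%s%n",
                                Thread.currentThread().getName(), record.partition(),
                                record.offset(), record.value());
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");              // placeholder group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // One consumer thread per partition is the natural ceiling; extra threads sit idle,
        // which is exactly the point of question 9 above.
        int threadCount = 3;
        for (int i = 0; i < threadCount; i++) {
            new Thread(new ConsumerWorker(props), "consumer-thread-" + i).start();
        }
    }
}
```

The usual alternative answer, a single polling thread that hands records off to a worker pool, scales processing further but makes offset commits and ordering harder to reason about, which is worth mentioning in an interview.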

1.2 Kafka Intermediate Interview Questions

  • 1. What internal topics does Kafka currently have? What are their characteristics, and what is the role of each?

  • 2. What is a preferred replica? What special role does it play?

  • 3. In which places does Kafka have the concept of partition assignment? Briefly describe the general process and principles

  • 4. Describe the log directory structure of Kafka

  • 5. What index files are in Kafka?

  • 6. If I specify an offset, how does Kafka find the corresponding message?

  • 7. If I specify a timestamp, how does Kafka find the corresponding message?

  • 8. Talk about your understanding of Kafka’s Log Retention

  • 9. Talk about your understanding of Kafka’s Log Compaction

  • 10. Talk about your understanding of Kafka’s underlying storage

  • 11. Talk about how Kafka implements delayed operations

  • 12. Talk about what the Kafka controller does

  • 13. What were the design flaws of the old Scala version of Kafka’s consumer client?

  • 14. What is the principle of consumer rebalancing? (Hint: the consumer coordinator and the group coordinator)

  • 15. How is idempotence implemented in Kafka? (A producer sketch follows this list.)
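
For question 15, most of the mechanism lives on the broker side (producer IDs plus per-partition sequence numbers used to discard duplicates on retry), but from the client’s point of view idempotence is a single configuration switch. A minimal sketch, with a placeholder broker address and topic name:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class IdempotentProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // The switch that matters: the broker assigns this producer a PID and tracks
        // per-partition sequence numbers, so a retried batch is not written twice.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        // Idempotence also requires acks=all; incompatible settings (e.g. retries=0 or
        // too many in-flight requests) are rejected when the producer is constructed.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value")); // placeholder topic
        }
    }
}
```

Keep in mind that idempotence only deduplicates within a single producer session and partition; cross-partition, cross-session atomicity is what transactions (question 1 in the next section) add on top.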

1.3 Kafka Advanced Interview Questions

  • 1. How are transactions implemented in Kafka? (A minimal transactional-producer sketch follows this list.)

  • 2. What is a failed replica? What measures are there to deal with it?

  • 3. How do HW and LEO evolve in a multi-replica scenario?

  • 4. What improvements has Kafka made for reliability? (Hint: HW and leader epoch)

  • 5. Why does Kafka not support read/write separation?

  • 6. How to implement a delay queue in Kafka?

  • 7. How to implement a dead letter queue and a retry queue in Kafka?

  • 8. How to audit messages in Kafka?

  • 9. How do message traces work in Kafka?

  • 10. How do you calculate Lag? (Note the difference between read_uncommitted and read_committed)

  • 11. Which Kafka monitoring metrics deserve attention?

  • 12. How is Kafka designed to achieve such high performance?
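
To ground question 1 (and the read_committed half of question 10), here is a minimal transactional-producer sketch. The transactional.id, topic names, and broker address are placeholders of my own, and the error handling is deliberately simplified.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class TransactionalProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // A transactional.id implies idempotence and lets the transaction coordinator
        // fence off "zombie" instances of the same logical producer after a restart.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-txn-id"); // placeholder id

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("topic-a", "k1", "v1")); // placeholder topics
            producer.send(new ProducerRecord<>("topic-b", "k2", "v2"));
            // Both writes become visible to read_committed consumers atomically, or not at all.
            producer.commitTransaction();
        } catch (KafkaException e) {
            // Simplified: real code should close (not abort) on fencing/authorization errors.
            producer.abortTransaction();
        } finally {
            producer.close();
        }
    }
}
```

A consumer running with isolation.level=read_committed only sees these records once the transaction commits (it reads no further than the LSO), which is also why committed and uncommitted Lag in question 10 can differ.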

02 Summary: Drawing a Kafka Framework Outline Mind Map (XMind)

In fact, there are far too many Kafka questions one could ask. After grinding through them for several days, I finally narrowed the list down to 44: 17 basic, 15 intermediate, and 12 advanced, every one of them aimed straight at the pain points. Without peeking at the answers, how many can you handle?

If you can’t keep Kafka straight in your head, take a look at my hand-drawn summary mind map (the XMind file can’t be uploaded, so the article uses the image version) to comb through the overall structure first, and then tackle our 44 questions (rest easy, the answers have already been compiled to help you get up to speed with Kafka quickly!).