Preface

In view of the knowledge points commonly asked in Internet company interviews, I have recently summarized the interview questions and answers that Java programmers encounter most often and am sharing them here. I hope they help you review before an interview and land a good job, and also save you the time of searching for material online.

Content covers: Java, MyBatis, ZooKeeper, Dubbo, Elasticsearch, Memcached, Redis, MySQL, Spring, SpringBoot, SpringCloud, RabbitMQ, Kafka, Linux and other technology stacks.


1. How do you get the list of topics?

bin/kafka-topics.sh --list --zookeeper localhost:2181

2. What are the producer and consumer command lines?

The producer publishes messages to a topic:

bin/kafka-console-producer.sh --broker-list 192.168.43.49:9092 --topic Hello-Kafka

Each new line you enter is then sent as a new message.

The consumer receives the messages:

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic Hello-Kafka --from-beginning

3. Does the consumer use push or pull?

The question Kafka initially considered was whether consumers should pull messages from brokers or brokers should push messages to consumers, i.e. pull versus push. Here Kafka follows a traditional design shared by most messaging systems: the producer pushes messages to the broker, and the consumer pulls messages from the broker. Some messaging systems, such as Scribe and Apache Flume, use a push model to push messages downstream to consumers. This has both advantages and disadvantages: the broker determines the rate at which messages are pushed, which makes it hard to handle consumers with different consumption rates. Messaging systems aim to let consumers consume messages as quickly as possible, but unfortunately, in push mode, when the broker pushes messages much faster than the consumer can process them, the consumer tends to crash. In the end Kafka chose the traditional pull model.

Another benefit of the pull model is that the consumer can decide for itself whether to pull data from the broker in bulk.

The push model, by contrast, must decide whether to push each message immediately or buffer it and push in batches, without knowing the processing capacity or strategy of downstream consumers. If it pushes at a lower rate to avoid crashing a consumer, it may wastefully push only a few messages at a time.

In the pull model, consumers can decide on these strategies according to their own processing capacity. A drawback of pull is that if the broker has no messages available for consumption, consumers will poll in a loop until new messages arrive. To avoid this, Kafka provides parameters that let the consumer's fetch request block until a new message arrives (or until enough data has accumulated to be sent in a batch).
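Concretely, the consumer-side knobs that control this blocking behavior are fetch.min.bytes and fetch.max.wait.ms. The snippet below is only a minimal sketch with the Java client; the broker address, group id, topic name and values are placeholders, not configuration from this article:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
props.put("group.id", "demo-group");                // placeholder consumer group
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// Ask the broker to hold the fetch until at least 1 KB of data is available,
// but never wait longer than 500 ms, so the consumer does not busy-poll an empty partition.
props.put("fetch.min.bytes", "1024");
props.put("fetch.max.wait.ms", "500");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("Hello-Kafka"));
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));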

4. Talk about how Kafka tracks consumption status

Most messaging systems keep a record on the broker side of how messages are consumed: the broker either marks a message as consumed immediately after sending it to a consumer, or waits for an acknowledgement from the consumer before marking it. This also allows messages to be deleted immediately after consumption to reduce space usage.

But is there anything wrong with that? If a message is marked as consumed immediately after it is sent, the message is lost whenever the consumer fails to process it (for example, because of a program crash). To solve this, many messaging systems add extra functionality: a message is only marked as sent after it has been delivered, and only marked as consumed after the broker has been notified that the consumer processed it successfully. This solves the message-loss problem, but it creates new ones. First, if the consumer processes the message successfully but fails to send the acknowledgement to the broker, the message will be consumed twice. Second, the broker must now maintain the state of every message, locking the message, changing its state and releasing the lock on each transition, not to mention keeping large amounts of state data around; for example, a message that was sent but never acknowledged stays locked indefinitely. Kafka takes a different approach. A topic is divided into partitions, and each partition is consumed by only one consumer (within a group) at a time. This means the consumption position within each partition is just a single integer: the offset. Marking the consumption status of a partition is therefore just one integer, which makes tracking consumption status easy. It also brings another benefit: the consumer can reset the offset to an older value and re-consume old messages. This may seem strange for a traditional messaging system, but it is very useful; who says a message can only be consumed once?

5. Talk about master/slave synchronization


6. Why do you need a messaging system? Can’t MySQL meet the requirements?

(1) Decoupling:

It allows you to extend or modify the processing on either side independently, as long as both sides adhere to the same interface constraints.

(2) Redundancy:

Message queues persist data until it has been fully processed, avoiding the risk of data loss. In the insert-retrieve-delete paradigm used by many message queues, your processing system must explicitly indicate that a message has been processed before it is removed from the queue, ensuring that the data is stored safely until you are done with it.

(3) Scalability:

Because message queues decouple your processing, it is easy to scale up the rate at which messages are enqueued and processed: you simply add more processing capacity.

(4) Flexibility & peak processing capacity:

Applications need to continue to function when traffic surges, but such bursts of traffic are rare. It would be a huge waste to invest resources in being able to handle these spikes. Using message queues enables key components to withstand sudden access pressures without completely collapsing under sudden overload of requests.

(5) Recoverability:

The failure of a component does not affect the entire system. Message queuing reduces coupling between processes, so that even if a process that processes messages dies, messages that are queued can still be processed after the system recovers.

(6) Order guarantee:

In most usage scenarios, the order in which data is processed is important. Most message queues are inherently sorted and ensure that the data will be processed in a particular order. (Kafka guarantees the order of messages within a Partition)

(7) Buffer:

It helps control and optimize the speed at which data flows through the system, smoothing out the mismatch between the speed of producing and consuming messages.

(8) Asynchronous communication:

Many times, users do not want or need to process messages immediately. Message queues provide asynchronous processing, allowing users to queue a message without processing it immediately. Put as many messages on the queue as you want, and then process them as needed.

7. What is the function of Zookeeper for Kafka?

ZooKeeper is an open-source, high-performance coordination service for distributed applications such as Kafka.

ZooKeeper is mainly used for communication between different nodes in the cluster. In Kafka it is used to commit offsets, so that if a node fails for any reason, consumption can resume from the previously committed offset. In addition, it supports other activities such as leader detection, distributed synchronization, configuration management, detecting when nodes join or leave, cluster membership, real-time node status, and more.

8. What are the three transaction definitions for data transfer?

The same three transaction definitions as MQTT.

(1) At most once: the message will not be sent repeatedly; it is transmitted at most once, but it may not be transmitted at all.

(2) At least once: the message will not be lost and is transmitted at least once, but it may also be transmitted repeatedly.

(3) Exactly once: every message is transmitted once and only once, which is exactly what is expected.
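For reference, the modern Java producer can be configured toward these guarantees; the sketch below only shows the idempotence flag, which prevents producer retries from creating duplicates (full end-to-end exactly-once additionally needs the transactional API). The broker address, topic, key and value are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// With idempotence enabled the broker deduplicates producer retries,
// so a retried send does not turn "at least once" into duplicate writes.
props.put("enable.idempotence", "true");
props.put("acks", "all");
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("Hello-Kafka", "key", "value"));
}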

9. What two conditions does Kafka have to determine if a node is alive?

(1) The node must be able to maintain the connection with ZooKeeper. ZooKeeper checks the connection of each node through the heartbeat mechanism.

(2) If the node is a follower, it must be able to replicate the leader’s writes in a timely manner, without falling too far behind.

10. There are three key differences between Kafka and traditional MQ messaging systems

(1) Kafka persists its logs; they can be read repeatedly and retained indefinitely.

(2) Kafka is a distributed system: it runs as a cluster, scales flexibly, and replicates data internally to improve fault tolerance and availability.

(3) Kafka supports real-time stream processing.

11. Talk about the three ack mechanisms of Kafka

request.required.acks has three values: 0, 1 and -1 (all).

0: the producer does not wait for the broker’s ack. This has the lowest latency but the weakest storage guarantee.

1: the server waits for the leader replica to confirm receipt of the message before sending the ack. However, if the leader fails before it can confirm that the followers have finished replicating, data may be lost.

-1 (all): the server waits until all follower replicas have received the data before the leader sends the ack. In this way data is not lost.
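In the current Java producer this setting is simply called acks (request.required.acks was the older producer client’s name for it). A minimal illustrative fragment:

import java.util.Properties;

Properties props = new Properties();
// "0"   -> the producer does not wait for any ack: lowest latency, weakest guarantee
// "1"   -> wait for the leader replica only: data can be lost if the leader fails before followers replicate
// "all" -> (equivalent to -1) wait for all in-sync replicas: no data loss as long as one replica survives
props.put("acks", "all");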

12. How does a consumer avoid auto-committing offsets and commit them from the application instead?

Set enable.auto.commit to false and, after processing a batch of messages, call commitSync() to commit synchronously or commitAsync() to commit asynchronously.
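A minimal sketch of this pattern with the Java consumer (broker, group, topic and the process() handler are all placeholders):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
props.put("group.id", "demo-group");                // placeholder consumer group
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("enable.auto.commit", "false");            // turn off automatic offset commits
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("Hello-Kafka"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (ConsumerRecord<String, String> record : records) {
        process(record);                              // hypothetical business handler
    }
    consumer.commitSync();                            // commit only after the whole batch is processed
}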

13. How do you solve the problem of consumer failure and livelock?

A “livelock” occurs when a consumer keeps sending heartbeats but does not actually process messages. To prevent a consumer from holding on to its partitions indefinitely in this situation, Kafka provides an active detection mechanism based on max.poll.interval.ms: if the interval between your poll calls exceeds this maximum, the client proactively leaves the group so that other consumers can take over its partitions. When this happens you will see an offset commit failure (commitSync() throws a CommitFailedException). This is a safety mechanism that ensures only active members of the group can commit offsets, so to stay in the group you must keep calling poll.

The consumer provides two configuration settings to control the poll loop:

max.poll.interval.ms: increasing the poll interval gives the consumer more time to process the messages returned by a poll (calling poll(long) usually returns a batch of messages). The drawback is that a larger value delays group rebalancing.

max.poll.records: this setting limits the number of records returned per poll call, which makes it easier to predict the maximum amount of work that has to be done within each poll interval. Lowering this value shortens the effective poll interval and reduces the impact of group rebalancing. If these options are not enough, the recommended way to handle such situations is to move message processing to another thread and let the consumer keep calling poll. However, care must be taken to ensure that committed offsets never get ahead of the actual processing position; you must also disable auto-commit and commit offsets manually only after the thread has finished processing the records. Note as well that you need to pause the partition so poll returns no new records while the thread finishes processing the previously returned messages; otherwise, if your processing is slower than the rate at which you pull messages, spawning new threads will eventually run your machine out of memory.
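As a small illustration, a fragment showing how the two settings might be tuned together (the values are made up and would be added to the same consumer Properties shown earlier):

import java.util.Properties;

Properties props = new Properties();
// Allow up to 5 minutes between poll() calls before the consumer is considered dead (illustrative value)
props.put("max.poll.interval.ms", "300000");
// Return at most 100 records per poll() so each batch fits comfortably inside that window (illustrative value)
props.put("max.poll.records", "100");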

14. How to control the consumption position

Kafka uses seek(TopicPartition, long) to specify a new consumption position. Special methods are also available for seeking to the earliest and latest offsets retained by the server: seekToBeginning(Collection) and seekToEnd(Collection).
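For illustration, a sketch of resetting the position on one partition, assuming consumer is an already-configured KafkaConsumer<String, String> (the topic name and offset are made up):

import java.util.Collections;
import org.apache.kafka.common.TopicPartition;

TopicPartition tp = new TopicPartition("Hello-Kafka", 0);      // placeholder topic and partition
consumer.assign(Collections.singletonList(tp));                 // take manual control of this partition
consumer.seek(tp, 42L);                                         // jump to an arbitrary offset (illustrative)
consumer.seekToBeginning(Collections.singletonList(tp));        // or rewind to the earliest retained offset
consumer.seekToEnd(Collections.singletonList(tp));              // or skip ahead to the latest offset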

15. If Kafka is distributed (not stand-alone), how can messages be consumed sequentially?

The unit of distribution in Kafka is the partition. A single partition is organized as a write-ahead log, so FIFO order is guaranteed within it; order across different partitions cannot be guaranteed. However, most applications can get the ordering they need by defining a message key, because messages with the same key are guaranteed to be sent to the same partition.

When sending a message in Kafka, you can specify topic, partition, and key parameters.

The partition and key are optional. If you specify a partition, all messages go to that same partition and are therefore ordered; and on the consumer side, Kafka guarantees that a partition is consumed by only one consumer. Alternatively, if you specify a key (such as an order ID), all messages with the same key are sent to the same partition.
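A small sketch of this, assuming producer is an already-configured KafkaProducer<String, String>; the topic name and order ID are made up:

import org.apache.kafka.clients.producer.ProducerRecord;

// All records with the same key ("order-1001") are hashed to the same partition,
// so the consumer sees them in the order they were produced.
producer.send(new ProducerRecord<>("orders", "order-1001", "created"));
producer.send(new ProducerRecord<>("orders", "order-1001", "paid"));
producer.send(new ProducerRecord<>("orders", "order-1001", "shipped"));
// A partition can also be fixed explicitly: new ProducerRecord<>("orders", 0, "order-1001", "created")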

16. What is kafka’s high availability mechanism?

This is a systematic question: answer it by covering the characteristics of the Kafka system as a whole, the relationship between leaders and followers, and the order in which messages are read and written.

17. How does Kafka reduce data loss?

18. How do you make sure Kafka does not consume duplicate data? For example, with account deductions we cannot deduct twice.

In practice this has to be thought through in combination with the business; here are a few ideas:

For example, if you are writing data to a database, first look the record up by its primary key; if it already exists, do an update instead of an insert.

If you are writing to Redis, there is no problem: every write is a SET, which is naturally idempotent.

If you are in neither of the two scenarios above, it is a little more complicated. Have the producer attach a globally unique ID, such as an order ID, to every message it sends. When you consume a message, first check this ID in Redis: if it has not been consumed yet, process the message and then write the ID to Redis; if it has already been consumed, skip it. This ensures the same message is never processed twice (see the sketch after the next idea).

Or rely on a database unique key to ensure that duplicate data is not inserted more than once: because of the unique key constraint, duplicate inserts simply raise an error and do not leave dirty data in the database.
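A sketch of the unique-ID idea above, using the Jedis client as an assumed Redis library; the key prefix, TTL, the records variable (a batch returned by poll) and the handle() method are all hypothetical:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import redis.clients.jedis.Jedis;

try (Jedis jedis = new Jedis("localhost", 6379)) {              // placeholder Redis address
    for (ConsumerRecord<String, String> record : records) {     // records: a batch returned by poll()
        String dedupKey = "consumed:" + record.key();            // the message key carries the global unique ID
        // SETNX succeeds only for the first writer, so a replayed message is skipped
        if (jedis.setnx(dedupKey, "1") == 1) {
            jedis.expire(dedupKey, 7 * 24 * 3600);               // keep the marker for a week (illustrative)
            handle(record);                                      // hypothetical business handler
        }
    }
}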