preface

Hello, everyone. I am a little boy picking up snails. Gold three silver four is coming, sorted out ten very classic message queue interview questions, read the interview must be helpful, everyone refuelling together ha

  1. What is a message queue
  2. Application scenarios of message queues
  3. How does message queuing solve message loss
  4. How message queues ensure that messages are sequenced.
  5. Is it possible for message queues to repeat consumption? How do you idempotent?
  6. How to deal with message backlog in message queue
  7. Message queue technology selection, Kafka or RocketMQ or RabbitMQ
  8. How can messaging middleware be highly available?
  9. How is data consistency guaranteed and transaction messages implemented
  10. If you were to write a message queue, how would you architecture it?
  • Public number: a boy picking up snails

1. What is message queue

You can think of a message queue as a component that uses queues to communicate. Its essence is a repeater, including the process of sending messages, storing messages and consuming messages. The simplest message queue model is as follows:

Message Queue (MQ) refers to Message middleware. Popular open source Message middleware include RabbitMQ, RocketMQ and Kafka.

2. What are the usage scenarios of message queues?

Sometimes interviewers will ask you the other way around why you use a message queue. You can answer the following points:

  1. The application of decoupling
  2. Traffic peak clipping
  3. Asynchronous processing
  4. Message communication
  5. The remote invocation

2.1 Application decoupling

For example, a common business scenario is to place an order to deduct inventory. After the user places an order, the order system notifies the inventory system of deduction. Traditionally, the order system calls the inventory system directly:

  • If the inventory system is not accessible, the order will fail, and there is a coupling between the order and the inventory system
  • If the business adds a marketing points service, the order system downstream will need to be expanded, and if more and more downstream systems are added in the future, the order system code will need to be changed frequently

How to solve this problem? Message queues can be introduced

  1. Order system: after the user places an order, the message is written to the message queue and the order is returned successfully
  2. Inventory system: subscribe to order message, obtain order information, inventory operation.

2.2 Flow peak cutting

Traffic peak clipping is also a common scenario for message queuing. When we do a split-kill implementation, we need to avoid the risk that traffic will explode and overwhelm the application. You can queue messages before the application.

Suppose the seckill system can process at most 2k requests per second, but 5K requests come in every second, it can queue up messages, and the seckill system can pull 2k requests from the message queue every second.

Some partners are worried about backlogs,

  • First of all, there will not be so many requests coming in every moment. After the peak period, the backlog of requests can be processed slowly.
  • Second, if the message queue length exceeds the maximum number, the user request can be discarded or the error page can be redirected.

2.3 Asynchronous Processing

We often encounter business scenarios like this: after the user registers successfully, they send it a text message and an email.

If it takes 30ms for the registration information to enter into the database, 30ms for sending SMS and email, it will be time-consuming to execute the three actions in serial, and the response will be 90ms:

If you execute in parallel, you can reduce the response time. After the registration information is stored, SMS and email are sent asynchronously at the same time. How to achieve asynchronous, with the message queue, that is, after the registration information is successfully entered into the database, written to the message queue (this is generally relatively fast, such as only need 3ms), and then asynchronously read send email and SMS.

2.4 Message Communication

Message queue has an efficient communication mechanism built in, which can be used for message communication. Such as the implementation of point-to-point message queues, chat rooms, etc.

2.5 Remote Invocation

Our company developed a remote invocation framework based on MQ.

3. How does message queue solve the problem of message loss?

From producer to consumer, a message mainly goes through the following three processes:

So how to ensure that MQ does not lose messages can be illustrated in three stages:

  • The producer guarantees that no messages are lost
  • The storage device does not lose messages
  • Consumers don’t lose messages

3.1 The producer guarantees not to lose messages

How does the production end keep messages from being lost? Ensure that the produced messages reach the storage.

In RocketMQ messaging middleware, Producer provides three ways to send messages:

  • The synchronous
  • Asynchronous send
  • One way to send

If a producer wants to send a message without losing it, it can:

  • If the message is sent synchronously and the send method returns a success status, the message has reached the storage Broker.
  • If the SEND message is abnormal or fails, try again.
  • Transaction messages can be used, and RocketMQ’s transaction messaging mechanism is designed to ensure zero loss

3.2 The storage device does not lose messages

How do you keep messages from getting lost on the storage side? Ensure that messages persist to disk. It’s easy to think of the swipe mechanism.

The brushing mechanism is divided into synchronous brushing and asynchronous brushing:

  • When a producer message is sent, RocketMQ’s story-side Broker returns a successful ACK response only if it is persisted to disk. This is synchronous flush. It guarantees message loss, but affects performance.
  • Asynchronous flush returns a successful ACK response as soon as the message is written to the PageCache cache. This improves MQ performance, but messages are lost if the machine is powered down at this point.

Brokers are typically clustered, with master and slave nodes. A successful ACK is reported to the producer only after the primary and secondary nodes have written successfully to the Broker. This is synchronous replication, which ensures that messages are not lost but reduces the throughput of the system. Asynchronous replication, on the other hand, returns a successful ACK as long as the message is successfully written to the master node, which is fast but has performance problems.

3.3 No messages are lost during consumption

After executing the business logic, the consumer reports back to the Broker that the purchase was successful. This ensures that no messages are lost during the purchase phase.

4. How does the message queue ensure the order of messages?

The ordering of messages means that they can be consumed in the order in which they are sent. Some businesses require the order of messages, such as placing an order, then paying for it, completing the order, and so on. Suppose the producer produces two messages successively, namely the order message (M1) and the payment message (M2). M1 is generated before M2. How can we ensure that M1 is consumed before M2?

To ensure the order of messages, M1 and M2 can be sent to the same Server. After M1 sends an ACK, M2 will send the ack again. As shown in figure:

This can still be problematic because there can be network latency from the MQ server to the server, and although M1 is sent first, it arrives later than M2.

So what else can you do to ensure that messages are sequential? Send M1 and M2 to the same consumer, and send M1, wait until the consumer ACK success, send M2.

So that’s the whole idea of message queues being sequential. Globally ordered messages in Kafka, for example, reflect this idea: when a producer sends a message, a Topic corresponds to only one Partition, a Consumer, and an internal single-threaded consumption.

But this throughput is too low, generally ensure that the message local order can be. When a message is sent, the Partition Key is specified. Kafka hashes the Partition Key and determines which Partition to add. In this way, messages with the same Partition Key will be placed on the same Partition. The multiple consumers then consume the specified Partition in a single thread.

5. Repeated consumption may occur in message queues. How to avoid it and how to achieve idempotency?

Message queues can be re-consumed.

  • To ensure message reliability, the production side may repeatedly send messages to the MQ server until it receives a successful ACK.
  • Then there is the consumer side, the consumer side consumption message is generally this process: pull message, business logic processing, commit consumption shift. Suppose the business logic is finished, the transaction is committed, but when it needs to update the consumption shift, the consumer hangs, and another consumer pulls the duplicate message.

How do you idempotent repeat messages?

I wrote an article about idempotent design before, if you are interested, you can read it: Talk about idempotent design

Idempotent handling of duplicate messages is simply a local table with a unique business token, using a primary key or unique index, and checking each time a business is processed. Or use Redis to cache the business token and check each time to see if it has been processed.

6. How to deal with message backlog in message queue

The news backlog is because producers are producing faster than consumers are consuming. When we encounter message backlog problem, we need to check whether there is a bug.

If it is not a bug, we can optimize the logic of consuming messages, for example, we can check whether we can optimize the processing of messages in batches. If it is still slow, we can consider horizontal expansion, increase the number of Topic queues and the number of machines in the consumer group, and improve the overall consumption capacity.

If a bug causes millions of messages to accumulate for hours. How to deal with it? Need to solve the bug, temporary emergency expansion, the general idea is as follows:

  1. Fix the problem of the consumer first to ensure that the consumer speed is restored, and then stop all existing consumers.
  2. Create a new topic with 10 times as many partitions and temporarily create 10 times as many queues.
  3. Then write a temporary consumer program that distributes data. This program is deployed to consume the backlog of data. After consumption, it does not do time-consuming processing, but directly polls and writes the 10 times as many queues as the temporary ones.
  4. Then 10 times as many machines are temporarily enlisted to deploy the consumers, with each batch consuming a temporary queue of data. This is equivalent to temporarily expanding the Queue and consumer resources by 10 times, consuming data at 10 times the normal rate.
  5. Once the backlog of data is quickly consumed, the deployed architecture needs to be restored to consume messages with the original consumer machine.

7. Selection of message queue technology, Kafka or RocketMQ or RabbitMQ

First, we can compare their advantages and disadvantages:

Kafka RocketMQ RabbitMQ
Single machine throughput 17.3 w/s 11.6 w/s 2.6 W /s (Message persistence)
Development of language Scala/Java Java Erlang
Primary maintainer Apache Alibaba Mozilla/Spring
The subscription form Publish and subscribe pattern based on topic and regular matching according to topic Publish and subscribe mode based on topic/messageTag, regular matching according to message type and attribute Four are offered: Direct, Topic,Headers and FANout. Fanout is broadcast mode
persistence Support mass accumulation Support mass accumulation Support for small amounts of stacking
The order message support support Does not support
The cluster approach Natural leader-slave, stateless cluster, each server is both Master and Slave In the open source version, you need to manually switch the Slave to Master Simple cluster, ‘copy’ mode is supported, advanced cluster mode is not well supported.
Performance stability poor general good
  • RabbitMQ is open source with stable support and high levels of activity, but is not developed in the Java language.
  • A lot of companies use RocketMQ. It’s made by Ali.
  • Kafka is standard for real-time computing and log collection in big data.

8. How can messaging middleware be highly available

How does messaging middleware ensure high availability? There is no single high availability. High availability is for clusters. Take a look at Kafka’s high availability.

Kafka’s basic cluster architecture consists of multiple brokers, each of which is a node. When you create a topic, it can be divided into partitions, and each partition holds a portion of the data stored on a different broker. In other words, the data of a topic is distributed among multiple machines, and each machine stores a portion of the data.

Some partners may wonder if each partition has a portion of data that is lost if the corresponding broker fails. What about high availability?

Since Kafka 0.8, replicas have been introduced to ensure high availability, where data on each partition is synchronized to other machines to form multiple replicas. Then all copies elect a leader to deal with production and consumers, while the other copies are followers. When writing data, the leader is responsible for synchronizing the data to all followers. When reading messages,

Simply read the data on the leader. How to ensure high availability? If a broker is down, partitions on this broker have copies on other machines. What if the leader broker is hung? The other followers choose a new leader.

9. How to ensure data consistency and how to implement transaction messages

A normal MQ message flows from generation to consumption as follows:

  1. The producer generates the message and sends it to the MQ server
  2. When MQ receives the message, it persists it to the storage system.
  3. The MQ server returns an ACk to the producer.
  4. MQ servers push messages to consumers
  5. After consuming the message, the consumer responds to the ACK
  6. The MQ server receives an ACK that the message was successfully consumed and deletes the message from the store.

Let’s take an example of placing an order. After the order system creates an order, it sends a message to the downstream system. If the order is created successfully and the message is not sent successfully, the downstream system will not be aware of this event, resulting in inconsistent data.

How do you ensure data consistency? Transaction messages can be used. Let’s take a look at how transactional messages are implemented.

  1. The producer generates a message and sends a semi-transactional message to the MQ server
  2. When MQ receives a message, it persists the message to the storage system, and the state of the message is to be sent.
  3. The MQ server returns an ACK acknowledgement to the producer, at which point MQ does not trigger a message push event
  4. Producers perform local transactions
  5. If the local transaction is executed successfully, commit the result to the MQ server. If the execution fails, send rollback.
  6. If it is a normal COMMIT, the MQ server updates the message status to deliverable. If it is rollback, delete the message.
  7. If the message status is updated to be deliverable, the MQ server pushes the message to the consumer. When the consumer finishes the purchase, he replies with ACK.
  8. If the MQ server does not receive a COMMIT or rollback from the producer for a long time, it backchecks the producer and performs the final status based on the query result.

10. Let you write a message queue, how should the architecture design?

The interviewer is looking at three main aspects of this question:

  • Are you familiar with the architecture of message queues
  • Examine your personal design skills
  • Examine programming ideas such as high availability, extensibility, exponentiation, and so on.

Encountered this kind of design problem, most people will be very confused, because usually did not think about similar problems. Most people are too busy adding, deleting, and changing things to think about the principles behind the framework. There are many similar problems, such as asked you to design a Dubbo framework, or asked you to design a MyBatis framework, what would you think?

To answer these questions, you do not need to have studied the source code of the technology, you know the basic structure of the technical framework, how it works. To design a message queue, we can think of the following aspects:

  1. The first is the overall process of the message queue. The producer sends the message to the broker, which stores the message, which then sends it to the consumer for consumption, and the consumer replies to the consumer for confirmation.
  2. Producer sends messages to the broker, and the broker sends messages to the consumer. Therefore, there are two RPCS required. How to design RPC? Consider Dubbo, the open source framework, and you can talk about service discovery, serialization protocols, and so on
  3. The broker considers how to persist, whether to put in a file system or database, whether to accumulate messages, and how to handle message accumulation.
  4. How to preserve the consumption relationship? Peer-to-peer or broadcast? How to maintain broadcasting relations? Zk or Config Server
  5. How can message reliability be guaranteed? If the message is repeated, how is it idempotent?
  6. How is the high availability of message queues designed? Consider Kafka’s high availability guarantee mechanism. Multiple copies -> leader & follower -> Broker hangs and re-elects the leader to serve.
  7. Message transaction feature, and local business in the same transaction, local message database; The message is deleted locally after being delivered to the server. The scheduled task scans the local message library and compensates sending.
  8. MQ scalability and extensibility, how can it support rapid expansion and throughput when messages are backlogged or resources are insufficient? Broker -> topic -> partition. Each partition is a machine that stores a portion of the data. If resources are not enough now, it is easy to add partition to topic, and then do data migration, add machine, not can store more data, provide higher throughput.

Reference and thanks

  • How does Ali RocketMQ solve the two major problems of message order & repetition?
  • Message middleware interview question: How to solve message queue delay and expiration problem?
  • Message queue design essentials
  • MQ messaging ultimate consistency solution