
1. What is RabbitMQ

RabbitMQ is a message queue that implements AMQP (Advanced Message Queuing Protocol). Its defining characteristic is that consumers do not depend on producers being present, which gives a high degree of decoupling between services.

2. Why RabbitMQ?

In a distributed system it provides asynchronous processing, peak shaving, load balancing and a range of other advanced features.

It has a persistence mechanism, so in-flight messages and queue information can be saved.

It decouples consumers from producers.

In high-concurrency scenarios, a message queue turns concurrent access into serial access, which acts as a form of rate limiting and eases the load on the database.
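The rate-limiting effect can be illustrated with a small in-memory model. This is only a sketch: the `drain` function and its parameters are made up for illustration and do not talk to a real broker. A burst of requests is buffered in a queue, and a single worker takes a fixed number per tick, so the database never sees more than that number at once.

```python
from collections import deque


def drain(burst_size, per_tick):
    """Buffer a burst in a queue and let one worker drain it at a fixed rate."""
    queue = deque(range(burst_size))   # all requests arrive at the same moment
    load_per_tick = []                 # what the database sees on each tick
    while queue:
        batch = [queue.popleft() for _ in range(min(per_tick, len(queue)))]
        load_per_tick.append(len(batch))
    return load_per_tick


load = drain(burst_size=25, per_tick=10)
print(load)  # [10, 10, 5]
```

The burst is not rejected, only spread over time: the total work is unchanged, but the peak load on the database is capped at `per_tick`.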

A message queue can also be used for asynchronous order placement: requests wait in the queue while the order logic runs in the background.

3. RabbitMQ scenarios

Asynchronous communication between services

Ordered consumption

Scheduled tasks

Peak shaving of requests

4. How do I ensure that messages are sent to RabbitMQ correctly? How do I ensure that message recipients consume messages?

Sender confirmation mode

Putting a channel into confirm mode (publisher confirms) assigns a unique ID to every message published on that channel.

Once a message has been delivered to its destination queues, or written to disk in the case of persistent messages, the broker sends the producer an acknowledgement containing the message's unique ID.

If an internal RabbitMQ error causes the message to be lost, the broker sends a nack (negative acknowledgement) instead. Confirm mode is asynchronous: the producer can keep publishing while it waits for confirmations. When a confirmation arrives, the producer's callback is invoked to handle it.
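The producer-side bookkeeping for confirm mode can be sketched as follows. This is an in-memory model, not client library code: the `ConfirmTracker` class and its method names are hypothetical. It assumes, as RabbitMQ's confirm mode does, that messages on a channel are numbered from 1 and that one ack or nack may settle several messages at once via a `multiple` flag.

```python
class ConfirmTracker:
    """Producer-side bookkeeping for publisher confirms (in-memory model)."""

    def __init__(self):
        self.next_seq = 1        # messages are numbered per channel, starting at 1
        self.outstanding = {}    # seq -> body still waiting for a confirmation
        self.to_retry = []       # bodies the broker nacked

    def publish(self, body):
        seq = self.next_seq
        self.outstanding[seq] = body
        self.next_seq += 1
        return seq               # the producer does not block here

    def _settle(self, seq, multiple):
        # multiple=True settles every outstanding message up to and including seq
        seqs = [s for s in self.outstanding if s <= seq] if multiple else [seq]
        return [self.outstanding.pop(s) for s in sorted(seqs) if s in self.outstanding]

    def on_ack(self, seq, multiple=False):
        self._settle(seq, multiple)

    def on_nack(self, seq, multiple=False):
        self.to_retry.extend(self._settle(seq, multiple))


tracker = ConfirmTracker()
for body in ["a", "b", "c", "d"]:
    tracker.publish(body)
tracker.on_ack(2, multiple=True)   # confirms "a" and "b"
tracker.on_nack(3)                 # the broker could not handle "c"
print(tracker.outstanding)         # {4: 'd'}
print(tracker.to_retry)            # ['c']
```

In a real producer the two `on_*` methods would be wired to the client library's confirm callbacks, and anything in `to_retry` would be republished.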

Consumer acknowledgement mechanism

Every message a consumer receives must be acknowledged (receiving a message and acknowledging it are two separate operations). RabbitMQ can safely delete a message from the queue only after the consumer has acknowledged it. No processing timeout is involved in this model: RabbitMQ decides that a message needs to be redelivered only when the consumer's connection is broken (recent RabbitMQ versions additionally enforce a configurable acknowledgement timeout). In other words, as long as the connection stays up, the consumer gets enough time to process the message, which ensures eventual consistency of the data.

Some special cases:

If a consumer receives a message and then disconnects or cancels its subscription before acknowledging it, RabbitMQ treats the message as undelivered and redelivers it to the next subscribed consumer.

If a consumer receives a message and does not acknowledge it while the connection stays open, RabbitMQ treats the consumer as busy and stops sending it more messages (in practice this depends on the channel's prefetch setting).
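The acknowledgement rules above can be modelled in a few lines. This is a toy sketch with hypothetical names (`AckQueue`, `deliver`, `disconnect`), not broker code: a message leaves the queue for good only on ack, and anything still unacknowledged when the consumer disconnects goes back for redelivery.

```python
from collections import deque


class AckQueue:
    """Toy queue that forgets a message only after the consumer acks it."""

    def __init__(self, messages):
        self.ready = deque(messages)   # waiting to be delivered
        self.unacked = {}              # consumer -> delivered but unconfirmed messages

    def deliver(self, consumer):
        msg = self.ready.popleft()
        self.unacked.setdefault(consumer, []).append(msg)
        return msg

    def ack(self, consumer, msg):
        self.unacked[consumer].remove(msg)   # only now is the message gone

    def disconnect(self, consumer):
        # Unconfirmed messages return to the front of the queue, in order.
        for msg in reversed(self.unacked.pop(consumer, [])):
            self.ready.appendleft(msg)


q = AckQueue(["m1", "m2", "m3"])
q.deliver("c1")
q.ack("c1", "m1")        # m1 processed and confirmed
q.deliver("c1")          # m2 received but never confirmed
q.disconnect("c1")       # the connection drops
print(q.deliver("c2"))   # m2
```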

5. How to avoid repeated message delivery or consumption?

On the producing side, the MQ internally generates an inner-msg-id for each message the producer sends and uses it as the basis for deduplication (for example when a failed delivery is retransmitted), so that duplicate messages do not enter the queue.

On the consuming side, the message body must carry a bizId (globally unique within the same business, such as a payment ID, order ID or post ID) as the basis for deduplication, so that the same message is not consumed twice.
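Consumer-side deduplication on bizId might look like the sketch below. The message layout (`bizId`, `amount`) and the in-memory `seen` set are assumptions for illustration; in production the set would typically be a unique database index or a shared store so that it survives restarts.

```python
class IdempotentConsumer:
    """Applies each business ID at most once, however often it is delivered."""

    def __init__(self):
        self.seen = set()    # stand-in for a unique index or shared store
        self.balance = 0

    def handle(self, message):
        biz_id = message["bizId"]
        if biz_id in self.seen:
            return False     # duplicate delivery: skip the side effect
        self.seen.add(biz_id)
        self.balance += message["amount"]
        return True


consumer = IdempotentConsumer()
payment = {"bizId": "pay-1001", "amount": 50}
print(consumer.handle(payment))   # True
print(consumer.handle(payment))   # False, redelivery is ignored
print(consumer.balance)           # 50
```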

6. What transport is the message based on?

Messages travel over TCP. Creating and destroying TCP connections is expensive, and the number of concurrent connections is limited by system resources, which would become a performance bottleneck. RabbitMQ therefore transmits data over channels. A channel is a virtual connection established inside a real TCP connection, and a single TCP connection can carry a large number of channels (the upper bound is negotiated between client and broker).

7. How are messages distributed?

If at least one consumer is subscribed to a queue, messages are sent to the consumers in round-robin fashion. Each message is delivered to only one subscribed consumer (provided that consumer processes and acknowledges it normally).

Delivering the same message to multiple consumers is achieved through routing.
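Round-robin distribution over the consumers of one queue can be sketched like this. The `dispatch` helper is hypothetical and ignores acknowledgements and prefetch; it only shows that each message goes to exactly one consumer, in turn.

```python
from itertools import cycle


def dispatch(messages, consumers):
    """Hand each message to exactly one consumer, taking consumers in turn."""
    assignment = {c: [] for c in consumers}
    turn = cycle(consumers)
    for msg in messages:
        assignment[next(turn)].append(msg)
    return assignment


result = dispatch(["m1", "m2", "m3", "m4", "m5"], ["c1", "c2"])
print(result)  # {'c1': ['m1', 'm3', 'm5'], 'c2': ['m2', 'm4']}
```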

8. How to route messages?

Message producer -> exchange (routing) -> one or more queues

When a message is published to an exchange, it carries a routing key, which is set when the message is created.

Queues are bound to exchanges with a binding key.

When the message arrives at the exchange, RabbitMQ matches the message's routing key against the binding keys of the queues (different exchange types apply different matching rules).

The three most commonly used exchange types are:

Fanout: the exchange broadcasts every message it receives to all bound queues.

Direct: if the routing key matches the binding key exactly, the message is delivered to the corresponding queue.

Topic: lets messages from different sources reach the same queue. Binding keys on a topic exchange may contain wildcards.
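A minimal model of how an exchange picks queues from its bindings is sketched below for the fanout and direct types. The `Exchange` class is a hypothetical illustration, not a client API, and topic wildcard matching is left out to keep it short.

```python
class Exchange:
    """Toy exchange: a list of bindings plus a type-specific matching rule."""

    def __init__(self, kind):
        self.kind = kind        # "fanout" or "direct"
        self.bindings = []      # (binding_key, queue_name) pairs

    def bind(self, queue_name, binding_key=""):
        self.bindings.append((binding_key, queue_name))

    def route(self, routing_key):
        if self.kind == "fanout":
            # The routing key is ignored: every bound queue gets a copy.
            return [queue for _, queue in self.bindings]
        if self.kind == "direct":
            return [queue for key, queue in self.bindings if key == routing_key]
        raise ValueError(f"unsupported exchange type: {self.kind}")


direct = Exchange("direct")
direct.bind("orders", "order.created")
direct.bind("audit", "order.created")
direct.bind("refunds", "order.refunded")
print(direct.route("order.created"))   # ['orders', 'audit']

fanout = Exchange("fanout")
fanout.bind("q1")
fanout.bind("q2")
print(fanout.route("anything"))        # ['q1', 'q2']
```

Two queues bound with the same key both receive the message, which is how one message reaches several consumers.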

9. How to ensure that messages are not lost?

Use message persistence, which of course requires that the queue itself be durable.

RabbitMQ ensures that persistent messages survive a server restart by writing them to a persistent log file on disk. When a persistent message is published to a durable exchange, RabbitMQ sends its response only after the message has been committed to the log file.

Once a consumer has consumed a persistent message from a durable queue, RabbitMQ marks the message in the persistence log as awaiting garbage collection. If RabbitMQ restarts before persistent messages have been consumed, it automatically rebuilds the exchanges and queues (and bindings) and republishes the messages from the persistence log to the appropriate queues.

10. What are the benefits of RabbitMQ?

Services are highly decoupled

High asynchronous communication performance

Traffic peak clipping

11. RabbitMQ clustering

Mirrored cluster mode

With mirrored cluster mode, a queue you create, both its metadata and its messages, lives on multiple instances, and every message written to the queue is automatically synchronized to the copies of the queue on those instances.

The advantage is that if one machine goes down, the others can still be used. The disadvantages are, first, that the performance overhead is high: every message is synchronized to all machines, which puts heavy pressure on network bandwidth. Second, it does not scale: if you add machines for a heavily loaded queue, each new machine also holds all the data in that queue, so there is no way to scale the queue linearly.

12. Disadvantages of MQ

Reduced system availability

The more external dependencies a system has, the more likely it is to fail. Originally system A simply called the interfaces of systems B, C and D, and the four systems worked fine together. Now you add MQ in the middle, so what happens if MQ fails? If MQ goes down, the whole system goes down with it.

Increased system complexity

Once MQ is added, how do you ensure that messages are not consumed twice? How do you handle message loss? How do you guarantee message ordering? Each of these is a problem in its own right.

Consistency problem

System A finishes its own processing and returns success directly, so the caller believes the request succeeded. But what if, of systems B, C and D, B and D write to their databases successfully while C fails? The data is now inconsistent.

A message queue is therefore a fairly complex piece of architecture. Introducing it brings many benefits, but you also need a range of additional technical solutions and architecture to offset its drawbacks, and in the end the system may be an order of magnitude more complex, perhaps ten times so. When it matters, though, you still have to use it.

13. What is the difference between Kafka, ActiveMQ, RabbitMQ, RocketMQ?

Throughput: Kafka and RocketMQ support high throughput, while ActiveMQ and RabbitMQ are an order of magnitude lower. RabbitMQ has the lowest latency.

Community activity

According to currently available data on RabbitMQ, ActiveMQ and ZeroMQ, RabbitMQ is, taken overall, the preferred choice.

Persistent messages

ActiveMQ and RabbitMQ both support them. Persistent messages are the mechanism that keeps messages from being lost when a machine goes down, whether through force majeure or otherwise.

Overall feature set

Reliability, flexible routing, clustering, transactions, highly available queues, message ordering, problem tracking, visual management tools, plug-in systems, and more.

RabbitMQ and Kafka are the best, ActiveMQ comes next and ZeroMQ is the weakest. ZeroMQ can do these things too, but you have to write the code yourself, and it is not a small amount of code. This is especially true for reliability: persistence, delivery confirmation, publisher confirmation and high availability.

High concurrency

RabbitMQ is the strongest here, because it is implemented in Erlang, a language built for high concurrency and high availability.

A more interesting comparison: RabbitMQ and Kafka

RabbitMQ is more mature than Kafka in terms of availability, stability and reliability (in theory). Kafka, moreover, is positioned mainly at logs: it was designed for log processing and can be regarded as an important component of a log (message) system, so RabbitMQ is recommended for business scenarios. On the other hand, Kafka's performance (throughput, TPS) is much higher than RabbitMQ's.

14. How to ensure high availability?

RabbitMQ is the typical example of high availability based on a master-slave (non-distributed) approach, so we use it to explain how this first kind of MQ high availability is implemented. RabbitMQ has three modes: single-machine mode, normal cluster mode and mirrored cluster mode.

Single-machine mode is demo level: something you start locally to experiment with. Nobody runs single-machine mode in production.

Normal cluster mode means starting RabbitMQ instances on multiple machines, one per machine. A queue you create lives on only one RabbitMQ instance, but every instance synchronizes the queue's metadata (metadata can be thought of as the queue's configuration, which is enough to locate the instance that holds the queue). When you consume while connected to a different instance, that instance pulls the data from the instance where the queue lives. The main purpose of this setup is throughput: multiple nodes in the cluster serve reads and writes for a queue.

Mirrored cluster mode: this is RabbitMQ's high availability mode. Unlike normal cluster mode, a queue you create in a mirrored cluster, both metadata and messages, exists on multiple instances. That is, each RabbitMQ node holds a full mirror of the queue, meaning all of the queue's data. Every message written to the queue is automatically synchronized to the queue's copies on the other instances. To enable it, add a policy in RabbitMQ's management console. The policy selects mirrored cluster mode and can specify that data be synchronized to all nodes or to a given number of nodes. When a queue is then created with this policy applied, its data is automatically synchronized to the other nodes. The advantage is that if any one machine goes down it does not matter: the other machines (nodes) still hold the complete data of the queue, and consumers can consume from them. The disadvantages are, first, that the performance overhead is high, since messages must be synchronized to all machines, which puts heavy pressure on network bandwidth. Second, it does not scale: in RabbitMQ a queue's data is stored on a single node, and in a mirrored cluster every node likewise holds the queue's complete data.

The basic architecture of Kafka is that it consists of multiple brokers, each of which is a node. A topic you create can be divided into multiple partitions, each partition can live on a different broker, and each partition holds part of the data. This makes Kafka a natively distributed message queue: the data of one topic is spread across multiple machines, each holding a portion. Since Kafka 0.8 an HA mechanism has been provided, namely the replica mechanism. The data of each partition is synchronized to other machines to form multiple replicas. The replicas elect a leader; production and consumption both go through the leader, and the other replicas are followers. On a write, the leader synchronizes the data to all followers. On a read, data is read directly from the leader.

Why can you only read and write through the leader? Simply because if you could read and write any follower at will, you would have to deal with data consistency, and the system would become so complex that problems could easily occur.

Kafka distributes the replicas of a partition evenly across different machines to improve fault tolerance. If a broker goes down, that is fine: the partitions on that broker have replicas on other machines. If the failed broker held the leader of a partition, a new leader is elected from the followers. This is what is meant by high availability. On a write, the producer writes to the leader, the leader writes the data to its local disk, and the followers actively pull the data from the leader. Once a follower has synchronized the data it sends an ACK to the leader, and after receiving ACKs from all followers the leader returns a write-success response to the producer. (This is only one of the modes and can be adjusted.) Consumers read only from the leader, and a message becomes visible to consumers only after all followers have acknowledged it.
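The write path described above can be modelled with a leader log and follower logs. This is a simplified sketch with hypothetical names (`Partition`, `follower_fetch`, `high_watermark`): it covers only the "wait for all followers" mode from the text and leaves out leader election, disks and networking.

```python
class Partition:
    """Leader log plus follower logs; a record is visible once all have it."""

    def __init__(self, follower_names):
        self.leader_log = []
        self.follower_logs = {name: [] for name in follower_names}

    def produce(self, record):
        self.leader_log.append(record)     # the leader writes locally first
        return len(self.leader_log) - 1    # offset of the new record

    def follower_fetch(self, name):
        # A follower pulls whatever it is missing from the leader.
        log = self.follower_logs[name]
        log.extend(self.leader_log[len(log):])

    def high_watermark(self):
        # Number of records present on the leader and on every follower.
        sizes = [len(self.leader_log)]
        sizes += [len(log) for log in self.follower_logs.values()]
        return min(sizes)

    def is_committed(self, offset):
        return offset < self.high_watermark()

    def consume(self):
        # Consumers read from the leader, but only fully replicated records.
        return self.leader_log[:self.high_watermark()]


p = Partition(["f1", "f2"])
offset = p.produce("order-1")
p.follower_fetch("f1")
print(p.is_committed(offset), p.consume())   # False []
p.follower_fetch("f2")
print(p.is_committed(offset), p.consume())   # True ['order-1']
```

The producer's write-success response corresponds to `is_committed` turning true, and the consumer's view is bounded by the same mark.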

15. How to ensure the reliable transmission of messages? What if the message gets lost

Data can be lost at the producer, inside the MQ, or at the consumer.

Producer loss: when a producer sends data to RabbitMQ, the data can be lost on the way because of network problems and the like. One option is RabbitMQ transactions: call channel.txSelect before sending a message; if RabbitMQ does not receive the message successfully, the producer gets an exception and can roll back with channel.txRollback and retry the send; if the message is received, commit with channel.txCommit. The catch is that throughput drops, because transactions are expensive. So in general, to make sure that messages written to RabbitMQ are not lost, enable confirm mode instead. Once the producer has set confirm mode, every message it writes is assigned a unique ID. If the message is written to RabbitMQ, RabbitMQ sends back an ACK saying the message is OK. If RabbitMQ fails to process the message, it invokes your nack callback to tell you that the message was not received, and you can retry. On top of this mechanism you can keep the status of each message ID in memory yourself, and resend a message if no callback has arrived for it after a certain amount of time. The biggest difference between the transaction and confirm mechanisms is that transactions are synchronous, so committing a transaction blocks, whereas confirms are asynchronous: you send one message and can immediately send the next, and RabbitMQ calls you back asynchronously once it has received the message. For this reason producers generally use the confirm mechanism to avoid data loss.
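Keeping per-message status in memory and resending after a timeout could be sketched as below. The `PendingConfirms` class is hypothetical, and the clock is passed in explicitly (`now`) so the logic is easy to follow; a real producer would use a monotonic clock and call `confirmed` from the client library's ack callback.

```python
class PendingConfirms:
    """Tracks sent messages and reports those with no callback in time."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.pending = {}    # message id -> (body, time sent)

    def sent(self, msg_id, body, now):
        self.pending[msg_id] = (body, now)

    def confirmed(self, msg_id):
        self.pending.pop(msg_id, None)   # the ack callback arrived

    def overdue(self, now):
        # No callback within the timeout: these are candidates for resending.
        return [(msg_id, body)
                for msg_id, (body, sent_at) in self.pending.items()
                if now - sent_at >= self.timeout]


pending = PendingConfirms(timeout_seconds=5.0)
pending.sent(1, "a", now=0.0)
pending.sent(2, "b", now=1.0)
pending.confirmed(1)
print(pending.overdue(now=4.0))   # []
print(pending.overdue(now=6.0))   # [(2, 'b')]
```

Resending on timeout can deliver the same message twice if the original confirmation was merely slow, so the consumer still needs to deduplicate.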

Lost in MQ: to guard against RabbitMQ itself losing data, you must enable RabbitMQ persistence, meaning messages are persisted to disk after being written. Even if RabbitMQ itself goes down, it reads the stored data back automatically after recovery, so in general nothing is lost. Setting up persistence takes two steps. First, declare the queue as durable when you create it, so that RabbitMQ persists the queue's metadata (but not the data in it). Second, set the message's deliveryMode to 2 when sending, which marks the message as persistent so that RabbitMQ writes it to disk. Both must be set together; then RabbitMQ restores the queue and its data from disk even after it crashes and restarts.

Persistence can be combined with the producer-side confirm mechanism: the producer is sent the ACK only after the message has been persisted to disk. So even if RabbitMQ dies before the message reaches disk and the data is lost, the producer never receives the ACK and can resend. Note that persistence on its own still leaves a small window: a message can be written to RabbitMQ, and RabbitMQ can go down before the message is persisted to disk, losing a small amount of data held in memory.
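Why both settings are needed can be seen in a toy model. The `Broker` class below is a hypothetical stand-in that writes its state to a JSON file; it is not how RabbitMQ stores data, and only the rule is taken from the text: a message survives a restart only if its queue is durable and its delivery mode is 2.

```python
import json
import os
import tempfile


class Broker:
    """Toy broker that persists durable queues and persistent messages only."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.queues = {}    # name -> {"durable": bool, "messages": [...]}

    def declare_queue(self, name, durable):
        self.queues[name] = {"durable": durable, "messages": []}

    def publish(self, queue, body, delivery_mode=1):
        message = {"body": body, "delivery_mode": delivery_mode}
        self.queues[queue]["messages"].append(message)

    def flush(self):
        # Non-durable queues and transient messages are never written out.
        state = {
            name: [m for m in q["messages"] if m["delivery_mode"] == 2]
            for name, q in self.queues.items() if q["durable"]
        }
        with open(self.log_path, "w") as f:
            json.dump(state, f)

    @classmethod
    def restart(cls, log_path):
        broker = cls(log_path)
        with open(log_path) as f:
            for name, messages in json.load(f).items():
                broker.queues[name] = {"durable": True, "messages": messages}
        return broker


path = os.path.join(tempfile.mkdtemp(), "broker.json")
broker = Broker(path)
broker.declare_queue("orders", durable=True)
broker.declare_queue("scratch", durable=False)
broker.publish("orders", "keep me", delivery_mode=2)
broker.publish("orders", "transient", delivery_mode=1)
broker.publish("scratch", "gone", delivery_mode=2)
broker.flush()

recovered = Broker.restart(path)
print(sorted(recovered.queues))                                   # ['orders']
print([m["body"] for m in recovered.queues["orders"]["messages"]])  # ['keep me']
```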

Consumer loss: if your process dies just after it has taken a message, for example because of a restart, RabbitMQ assumes you have consumed the message and the data is lost. To prevent this, use RabbitMQ's ack mechanism. Put simply, turn off RabbitMQ's automatic ack and acknowledge through the API from your own code, each time only once you are sure processing is finished. Then, if processing has not finished, no ACK is sent, RabbitMQ assumes the message has not been fully processed, and it hands the message to another consumer, so the message is not lost.

16. How to ensure the sequential nature of messages

RabbitMQ: with one queue and multiple consumers, message order is obviously not preserved.

Solution:

17. How do you deal with delayed and expired messages in a message queue? What happens when the queue is full? What do you do when millions of messages have been backlogged for hours?

Handling a message backlog: temporary emergency scale-out

First fix the consumer's problem so that its consumption speed recovers, then stop all existing consumers. Create a new topic with 10 times as many partitions, that is, temporarily create 10 times the original number of queues. Then write a temporary distributor consumer and deploy it to consume the backlog. It does no time-consuming processing on what it consumes; it simply writes the messages round-robin into the 10 times as many temporary queues.
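The temporary distributor and the effect of the scale-out can be sketched with plain lists. Everything here is illustrative: `redistribute` and `drain_seconds` are hypothetical helpers, and the backlog size and per-consumer rate are made-up numbers, not measurements.

```python
def redistribute(backlog, temp_queue_count):
    """Forward backlog messages round-robin, with no business processing."""
    temp_queues = [[] for _ in range(temp_queue_count)]
    for i, msg in enumerate(backlog):
        temp_queues[i % temp_queue_count].append(msg)
    return temp_queues


def drain_seconds(backlog_size, consumers, rate_per_consumer):
    """Time to clear the backlog if every consumer works at the same rate."""
    return backlog_size / (consumers * rate_per_consumer)


queues = redistribute(list(range(1000)), 10)
print([len(q) for q in queues])   # ten queues of 100 messages each

# Assumed figures: 10 million backlogged messages, 500 messages/s per consumer.
before = drain_seconds(10_000_000, consumers=3, rate_per_consumer=500)
after = drain_seconds(10_000_000, consumers=30, rate_per_consumer=500)
print(round(before / 3600, 2), "hours vs", round(after / 60, 1), "minutes")
```

With these assumed figures the drain time falls from roughly two hours to roughly eleven minutes, which is the tenfold speed-up the scale-out is meant to buy.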

Next, temporarily requisition 10 times as many machines to deploy consumers, with each batch of consumers consuming one temporary queue. This amounts to temporarily expanding queue and consumer resources tenfold and consuming data at 10 times the normal rate. Once the backlog has been consumed, restore the original deployment and go back to consuming with the original consumer machines.

MQ message expiry: suppose you are using RabbitMQ. RabbitMQ lets you set an expiry time, that is, a TTL. Messages that sit in a queue longer than this are cleared by RabbitMQ and the data is gone. That is the second pitfall: the data does not pile up in MQ, it is simply lost. The solution we can adopt is batch re-import, which we have done in similar situations in production. When there is a huge backlog, the expired data is simply discarded, and then after the peak, for example after midnight when the team is staying up on coffee and the users are asleep, we write a temporary program that finds the lost batch of data bit by bit and feeds it back into MQ, making up for what was lost during the day. There is no other way. Suppose 10,000 orders are backlogged unprocessed in MQ and 1,000 of them are lost: you have to write a program by hand to find those 1,000 orders and send them to MQ again.

MQ filling up: what if messages are backlogged in MQ, you do not deal with them for a long time, and MQ is nearly full? Is there any other way? No. Your first plan was too slow, so write an ad hoc program that consumes the data and discards each message as it is consumed, draining all messages quickly. Then fall back on the second plan and re-import the data later at night.

18. Ideas for designing an MQ

For a message queue system like this, consider the following angles:

First, the MQ has to be scalable, meaning it can be expanded quickly when needed to increase throughput and capacity. Use a broker -> topic -> partition layout, where each partition lives on one machine and stores part of the data. If resources run short, simply add partitions to the topic, migrate data and add machines, and you can store more data and provide higher throughput.

Second, consider whether the MQ's data should be persisted to disk. It must be: writing to disk ensures that data is not lost when the process dies. And how should it be written? Sequentially, so that there is no seek overhead from random disk reads and writes. Sequential disk I/O performs very well, and this is Kafka's approach.

Third, consider the availability of your MQ. For this, refer to Kafka's high availability mechanism described in the availability section: multiple replicas -> leader & followers -> when a broker goes down, a new leader is elected and service continues. Can it support zero data loss? Yes, refer to the Kafka zero-data-loss scheme discussed earlier.

19. What is Message?

A message is anonymous and consists of a header and a body. The body is opaque, while the header consists of a set of optional attributes, including routing-key, priority (relative to other messages), delivery-mode (indicating that the message may need persistent storage), and so on.

20. What is Publisher?

The producer of messages: a client application that publishes messages to an exchange.

21. What is Exchange (routes messages to queues)?

An exchange receives messages sent by producers and routes them to queues on the server.

22. What is Binding (the association between message queues and exchanges)?

A binding associates a message queue with an exchange. It is a routing rule that connects an exchange to a message queue based on a routing key, so an exchange can be thought of as a routing table made up of bindings.

23. What is Queue?

A message queue holds messages until they are sent to consumers. It is both the container and the destination of a message. A message can be placed in one or more queues, and it stays in the queue until a consumer connects to the queue and takes it.

24. What is Connection?

A network connection, such as a TCP connection.

25. What is Channel?

Channel: an independent bidirectional data stream within a multiplexed connection. A channel is a virtual connection built on top of a real TCP connection. AMQP commands are sent over channels: publishing messages, subscribing to queues and receiving messages are all done through a channel. Because establishing and tearing down TCP connections is expensive for the operating system, the channel concept was introduced to reuse a single TCP connection.

26. What is a Consumer?

A message consumer: a client application that retrieves messages from a message queue.

27. What is a Virtual Host?

A virtual host represents a group of exchanges, message queues and related objects. It is an independent server domain that shares the same authentication and encryption environment.

28. What is a Broker?

The message queue server entity.

29. Exchange types?

An exchange distributes messages according to its type, and different types use different distribution policies. There are currently four types: direct, fanout, topic and headers. A headers exchange matches on the headers of an AMQP message rather than on the routing key. It otherwise behaves exactly like a direct exchange but performs much worse, and is now almost never used.

30. Direct (key-based distribution)?

Direct: if the routing key of the message equals the binding key of a binding, the exchange sends the message to the corresponding queue. It is an exact-match, unicast pattern.

31. Fanout (broadcast distribution)?

Fanout: every message sent to a fanout exchange is delivered to all bound queues. It works much like a subnet broadcast, where every host on the subnet gets a copy of the message.

Fanout is the fastest type at forwarding messages.

32. Topic exchange (pattern matching)?

Topic exchange: a topic exchange distributes messages by pattern-matching their routing keys against the patterns that queues are bound with. It splits routing keys and binding keys into words separated by dots. It recognizes two wildcards, "#" and "*": "#" matches zero or more words, and "*" matches exactly one word.
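The wildcard rules can be written as a short recursive matcher. This is an illustrative sketch, not RabbitMQ's implementation (the function name `topic_matches` is made up); it only encodes the two rules stated above.

```python
def topic_matches(pattern, routing_key):
    """Return True if a dot-separated binding pattern matches a routing key."""
    p = pattern.split(".")
    k = routing_key.split(".") if routing_key else []

    def match(i, j):
        if i == len(p):
            return j == len(k)             # pattern used up: key must be too
        if p[i] == "#":
            # "#" may absorb zero or more words
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j < len(k) and p[i] in ("*", k[j]):
            return match(i + 1, j + 1)     # "*" or a literal consumes one word
        return False

    return match(0, 0)


print(topic_matches("stock.#", "stock"))            # True, "#" matches zero words
print(topic_matches("stock.#", "stock.usd.nyse"))   # True
print(topic_matches("stock.*", "stock.usd"))        # True
print(topic_matches("stock.*", "stock.usd.nyse"))   # False, "*" is one word only
```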

Here are some interview questions