Why use MQ? The advantages of MQ

Short answer

  • Asynchronous processing – improves system throughput compared with traditional serial and parallel processing.
  • Application decoupling – systems communicate through messages and do not need to care how other systems process them.
  • Traffic clipping – request volume can be capped by the length of the message queue, which relieves bursts of highly concurrent requests over a short period.
  • Log processing – handles the transfer of large volumes of logs.
  • Message communication – message queues generally have efficient communication mechanisms built in, so they can also be used for pure messaging, for example to implement point-to-point message queues or chat rooms.

Detailed answer

The three main benefits are decoupling, asynchronous processing, and peak clipping.

Decoupling: System A sends data to three systems, B, C, and D, through interface calls. What if system E also wants this data? What if system C no longer needs it? The owner of system A is driven to despair, because system A is tightly coupled to a tangle of other systems. System A produces a fairly critical piece of data, and many systems need system A to send it to them. With MQ, system A produces a piece of data and sends it to MQ, and whichever system needs the data consumes it from MQ. If a new system needs the data, it consumes it directly from MQ; if a system no longer needs the data, it simply stops consuming the message. System A then no longer has to consider whom to send data to, maintain that calling code, or handle whether each call succeeded, failed, or timed out.

In short, when one system or module calls many other systems or modules, the calls between them become complex and troublesome to maintain. If those calls do not actually need to be synchronous interface calls, MQ can decouple them asynchronously.
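The decoupling scenario above can be sketched with a toy in-memory broker. This is only an illustration, not a real MQ client: the MiniBroker class, the topic name "a.data", and the received dictionary are all hypothetical names introduced here.

```python
from collections import defaultdict


class MiniBroker:
    """Toy in-memory broker: subscribers register callbacks per topic."""

    def __init__(self):
        self._subscribers = defaultdict(dict)  # topic -> {name: callback}

    def subscribe(self, topic, name, callback):
        self._subscribers[topic][name] = callback

    def unsubscribe(self, topic, name):
        self._subscribers[topic].pop(name, None)

    def publish(self, topic, message):
        # The producer never learns who receives the message.
        for callback in list(self._subscribers[topic].values()):
            callback(message)


received = defaultdict(list)  # system name -> messages it consumed
broker = MiniBroker()
for system in ("B", "C", "D"):
    broker.subscribe("a.data", system, received[system].append)

broker.publish("a.data", {"id": 1})

# System E starts consuming and system C stops; system A's code is unchanged.
broker.subscribe("a.data", "E", received["E"].append)
broker.unsubscribe("a.data", "C")
broker.publish("a.data", {"id": 2})
```

System A only ever calls publish; adding E or removing C touches the subscriber side alone.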

Asynchronous: When system A receives a request, it needs to write data to its own local database and also to the databases of systems B, C, and D. The local write takes 3 ms, and the writes to B, C, and D take 300 ms, 450 ms, and 200 ms respectively. The total latency of the request is 3 + 300 + 450 + 200 = 953 ms, close to 1 s, so a user who sent the request from a browser feels that the system is very slow. With MQ, system A instead sends three messages to the MQ queue one after another. If that takes 5 ms, the total time from system A receiving the request to returning a response to the user is 3 + 5 = 8 ms.
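The latency arithmetic can be checked in a few lines. The constant names are made up for this sketch; the numbers are the ones given in the text.

```python
LOCAL_WRITE_MS = 3
DOWNSTREAM_WRITE_MS = {"B": 300, "C": 450, "D": 200}
ENQUEUE_MS = 5  # time to send the three messages to MQ, as assumed in the text

# Synchronous: the user waits for the local write plus every downstream write.
sync_total = LOCAL_WRITE_MS + sum(DOWNSTREAM_WRITE_MS.values())

# Asynchronous: the user waits only for the local write plus the enqueue.
async_total = LOCAL_WRITE_MS + ENQUEUE_MS

print(sync_total, async_total)  # 953 8
```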

Peak clipping: Reduces strain on the server during peak times. Requests are buffered in the queue and consumed at a rate the server can sustain.

What are the advantages and disadvantages of message queues? What are the advantages and disadvantages of RabbitMQ?

The advantages are the ones described above, each tied to a particular scenario: decoupling, asynchronous processing, and peak clipping.

The disadvantages are as follows:

System availability decreases

The system was running fine; once you add a message queue, your system goes down whenever the message queue goes down. System availability is therefore reduced.

System complexity increases

After adding a message queue, many new problems have to be considered, such as consistency, how to ensure that messages are not consumed repeatedly, and how to ensure reliable message delivery. There is more to think about, and the system becomes more complex.

Consistency problem

System A finishes its own processing and returns success directly, so the caller believes the request succeeded. But what if, of the three systems B, C, and D, systems B and D write to their databases successfully while system C fails? The data is then inconsistent.

So a message queue is actually a very complex piece of architecture. Introducing it brings many benefits, but you also need all kinds of additional technical solutions and architecture to avoid the drawbacks it brings. Once you have done that, you find that the system’s complexity has gone up by an order of magnitude, perhaps 10 times. At critical moments, though, you still have to use it.

What kind of message-oriented middleware is used in your production environment?

Start by saying which messaging middleware your company uses, for example RabbitMQ, and then give a basic analysis of the different MQ middleware technologies.

For example, ActiveMQ is an established message middleware with powerful features that many companies in China have used widely.

The problem is that ActiveMQ has not been proven to support the high-concurrency, high-load, high-throughput scenarios of Internet companies, so large-scale deployments at Internet companies in China are rare. It is mostly traditional enterprises that use ActiveMQ for asynchronous calls and system decoupling.

Then you can talk about RabbitMQ. Its benefits are support for high concurrency, high throughput, and high performance, plus a very complete management interface.

In addition, it supports clustering, highly available deployment architectures, and highly reliable messaging, so its functionality is relatively complete.

Investigation also shows that major Internet companies in China have deployed large-scale RabbitMQ clusters to support their own businesses, and many small and medium-sized Internet companies use RabbitMQ in practice as well.

RabbitMQ also has an active open source community with frequent iterations, bug fixes, and optimizations, which is why the company adopted RabbitMQ.

RabbitMQ does have one drawback, however. It is written in Erlang, which makes it difficult to analyse the source code or customise it in depth; doing so requires a solid Erlang background.

Next is RocketMQ, which is open source and has been proven in Alibaba’s production environment under high concurrency and high throughput, and which also supports special scenarios such as distributed transactions.

RocketMQ is written in Java, which makes it easy to read the source code in depth, to solve production problems at the source level, and to do secondary development and modify the source.

The last is Kafka. Kafka provides noticeably fewer messaging-middleware features than the MQ products above.

Kafka’s advantage is that it is designed for ultra-high-throughput scenarios such as real-time log collection, real-time data synchronization, and real-time data computation.

So Kafka is used in the big data field together with real-time computing technologies such as Spark Streaming, Storm, and Flink, but it is rarely used in traditional MQ middleware scenarios.

What are the advantages and disadvantages of Kafka, ActiveMQ, RabbitMQ, and RocketMQ?

Comparison of ActiveMQ, RabbitMQ, RocketMQ, Kafka, and ZeroMQ:

Single-machine throughput
  • ActiveMQ: lower than RabbitMQ
  • RabbitMQ: 26,000 messages/s (with message persistence)
  • RocketMQ: 116,000 messages/s
  • Kafka: 173,000 messages/s
  • ZeroMQ: 290,000 messages/s

Development language
  • ActiveMQ: Java
  • RabbitMQ: Erlang
  • RocketMQ: Java
  • Kafka: Scala/Java
  • ZeroMQ: C

Primary maintainer
  • ActiveMQ: Apache
  • RabbitMQ: Mozilla/Spring
  • RocketMQ: Alibaba
  • Kafka: Apache
  • ZeroMQ: iMatix (its founder has since died)

Maturity
  • ActiveMQ: mature
  • RabbitMQ: mature
  • RocketMQ: the open source version is not mature enough
  • Kafka: fairly mature
  • ZeroMQ: only the C, PHP, and some other versions are mature

Subscription model
  • ActiveMQ: point-to-point (P2P) and broadcast (publish-subscribe)
  • RabbitMQ: four exchange types are offered: direct, topic, headers, and fanout; fanout is the broadcast mode
  • RocketMQ: publish-subscribe based on topic/messageTag, with regular-expression matching by message type and attribute
  • Kafka: publish-subscribe based on topic, with regular-expression matching by topic
  • ZeroMQ: point-to-point (P2P)

Persistence
  • ActiveMQ: supports a small backlog
  • RabbitMQ: supports a small backlog
  • RocketMQ: supports a large backlog
  • Kafka: supports a large backlog
  • ZeroMQ: not supported

Ordered messages
  • ActiveMQ: not supported
  • RabbitMQ: not supported
  • RocketMQ: supported
  • Kafka: supported
  • ZeroMQ: not supported

Performance stability
  • ActiveMQ: good
  • RabbitMQ: good
  • RocketMQ: average
  • Kafka: poor
  • ZeroMQ: very good

Clustering
  • ActiveMQ: simple cluster modes such as active-standby are supported; advanced cluster modes are not well supported
  • RabbitMQ: simple clustering with a replication mode is supported; advanced cluster modes are not well supported
  • RocketMQ: in the open source version, a Slave has to be switched to Master manually
  • Kafka: a natural leader-slave stateless cluster in which every server is both Master and Slave
  • ZeroMQ: not supported

Management interface
  • ActiveMQ: average
  • RabbitMQ: good
  • RocketMQ: average
  • Kafka: none
  • ZeroMQ: none

To sum up, after various comparisons, the following suggestions are made:

When ordinary business systems first introduced MQ, everyone used ActiveMQ. Today few people use it, it has not been verified in large-scale throughput scenarios, and its community is not very active, so I personally do not recommend it.

Later people moved to RabbitMQ. Erlang does stop many Java engineers from studying and controlling it in depth, so for a company it is almost a black box, but it is open source, fairly stable, and has a very active community.

More and more companies now use RocketMQ, which is solid and comes from Alibaba, but there is a risk that the community fades away (RocketMQ has been donated to Apache, but it is not very active on GitHub). RocketMQ is recommended if your company has strong technical capabilities; otherwise fall back to RabbitMQ, whose active open source community means you are unlikely to go wrong.

So for small and medium-sized companies with average technical strength and no particularly demanding technical challenges, RabbitMQ is a good choice. For large companies with strong infrastructure development capabilities, RocketMQ is a good choice.

For real-time computing, log collection, and similar big data scenarios, Kafka is the industry standard and a safe choice. Its community is very active.

What are the common problems with MQ? How to solve these problems?

Common problems with MQ are:

  1. The order of messages
  2. Duplication of messages

The order of messages

Message ordering means that messages can be consumed in the order in which they are sent.

Suppose the producer produces two messages, M1 and M2, and that M1 is sent to S1 and M2 is sent to S2. How do we ensure that M1 is consumed before M2?

Solution:

(1) Ensure that the producer, MQ server, and consumer are in a one-to-one-to-one relationship.

Defects:

  • Parallelism becomes the bottleneck of the messaging system (insufficient throughput).
  • More exception handling is needed. For example, whenever the consumer side has a problem, the whole processing flow is blocked, and more effort has to go into resolving the blockage.

(2) Avoid the problem through reasonable design or by decomposing it.

  • Applications that do not care about message order are in fact very common.
  • An unordered queue does not mean unordered messages, so it makes more sense to guarantee message order at the business level than to rely on the messaging system alone.

Duplication of messages

The root cause of message duplication is that the network is unreachable.

So the way to solve this problem is to get around it. The question then becomes: what if the consumer receives two identical messages?

The business logic that processes messages on the consumer side should be kept idempotent. As long as it is idempotent, the final result is the same no matter how many duplicate messages arrive. Alternatively, give every message a unique ID and record each successfully processed message in a de-duplication log table. The log table stores the IDs of successfully processed messages; if the ID of a newly arrived message is already in the table, the message is not processed again.
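A minimal sketch of the log-table approach, using an in-memory SQLite database so it runs anywhere. The table names (processed_messages, orders) and the message fields are assumptions made for this example, not part of any MQ API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE orders (order_id TEXT, amount INTEGER)")


def handle(message):
    """Apply a message at most once; return True if it was applied."""
    try:
        with conn:  # one transaction: log the ID and apply the effect together
            conn.execute(
                "INSERT INTO processed_messages VALUES (?)", (message["id"],)
            )
            conn.execute(
                "INSERT INTO orders VALUES (?, ?)",
                (message["order_id"], message["amount"]),
            )
        return True
    except sqlite3.IntegrityError:
        # The ID is already in the log table: this is a duplicate, skip it.
        return False


msg = {"id": "m-1", "order_id": "o-42", "amount": 10}
results = [handle(msg), handle(msg)]  # the second call simulates a redelivery
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Writing the ID and the business row in the same transaction matters: if they were committed separately, a crash between the two could either lose the message or apply it twice.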

What is RabbitMQ?

RabbitMQ is open source messaging middleware written in Erlang and based on AMQP.

RabbitMQ usage scenarios

(1) Asynchronous communication between services

(2) Sequential consumption

(3) Scheduled tasks

(4) Request peak cutting

RabbitMQ basic concepts

  • Broker: simply put, the message queue server entity.
  • Exchange: the message exchange, which specifies the rules by which messages are routed to queues.
  • Queue: the message carrier; each message is put into one or more queues.
  • Binding: binds an exchange to a queue according to routing rules.
  • Routing key: the key the exchange uses to decide where to deliver a message.
  • VHost: a virtual broker, that is, a mini RabbitMQ server. Each vhost has its own queues, exchanges, and bindings and, most importantly, its own permission system, so users can be controlled at vhost scope. From RabbitMQ’s global perspective, vhosts are a means of isolating permissions (a typical example is running different applications in different vhosts).
  • Producer: a program that publishes messages.
  • Consumer: a program that receives messages.
  • Channel: the message channel. Multiple channels can be opened within each client connection, and each channel represents one session.

Exchange, Queue, and RoutingKey determine a unique route from Exchange to Queue.

The working mode of RabbitMQ

I. Simple mode (the simplest send-and-receive mode)

1. The message producer generates a message and puts it into a queue.

2. The message consumer listens on the queue and consumes a message whenever one is available. With automatic acknowledgement, the message is deleted from the queue as soon as it is taken. (Hidden danger: the message may disappear from the queue before the consumer has handled it properly, so the message is lost. You can switch to manual ack, but then the consumer must send the ack promptly after processing; otherwise unacknowledged messages pile up and memory may overflow.)

II. Work mode (competing for resources)

There can be multiple consumers. Consumer 1 and consumer 2 listen on the same queue at the same time. C1 and C2 compete for the messages currently in the queue, and whoever gets a message first is responsible for consuming it. (Hidden danger: under high concurrency a message may end up being handled by more than one consumer, for example when it is redelivered. Use a synchronization switch or idempotent handling so that each message takes effect only once.)

III. Publish/subscribe mode

1. Each consumer listens on its own queue.

2. The producer sends the message to the exchange on the broker, and the exchange forwards it to every queue bound to it, so each bound queue receives the message.

IV. Routing mode

1. The message producer sends the message to the exchange, which decides where it goes based on the routing key. The routing key is a string (for example, info) carried by the message being produced (set through a method on the message object).

2. Define the routing strings according to business function.

3. The code logic obtains the routing string for the relevant function and sends the message to the corresponding queue.

4. Typical scenarios: error notification and customer notification. By routing on a key, exceptions in a program can be wrapped as messages and sent to the message queue, and developers can write custom consumers that receive errors in real time.

V. Topic mode (a kind of routing mode)

1. The asterisk (*) and the pound sign (#) are wildcards.

2. An asterisk stands for exactly one word, and a pound sign stands for zero or more words.

3. This adds fuzzy matching to routing.

4. The message producer generates a message and delivers it to the exchange.

5. The exchange fuzzy-matches the routing key against the bindings and delivers the message to the matching queues, whose listening consumers receive and consume it.

(As I understand it, this is fuzzy matching on the routing key, similar to a fuzzy query in SQL.)
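To make the wildcard rules concrete, here is a small matcher written for illustration only; it is not RabbitMQ’s implementation. It assumes the AMQP topic convention: routing keys are dot-separated words, * matches exactly one word, and # matches zero or more words. The function name topic_matches is hypothetical.

```python
def topic_matches(pattern, routing_key):
    """Return True if a topic binding pattern matches a routing key."""

    def match(p, k):
        if not p:
            return not k  # pattern exhausted: match only if the key is too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may swallow anywhere from 0 to len(k) words.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        # '*' matches any single word; otherwise the words must be equal.
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))
```

For example, topic_matches("order.*", "order.created") is true, while topic_matches("order.*", "order.created.eu") is false because * cannot cover two words; "order.#" matches both, and also matches plain "order".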

How can RabbitMQ messages be ordered?

Split the work across multiple queues with one consumer per queue; this just means a few more queues, which is admittedly more trouble. Or keep one queue with a single consumer, and have that consumer put messages into in-memory queues and then distribute them to different underlying workers for processing.
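The second option (one consumer feeding in-memory queues) might look like the sketch below. It assumes that ordering only matters per business key, here a hypothetical order ID, so messages with the same key are always hashed to the same in-memory queue and worker. All names are illustrative.

```python
import queue
import threading
import zlib

NUM_WORKERS = 3
worker_queues = [queue.Queue() for _ in range(NUM_WORKERS)]
processed = []  # (order_id, step) tuples in the order they were handled
processed_lock = threading.Lock()


def worker(q):
    while True:
        item = q.get()
        if item is None:  # sentinel: stop this worker
            break
        with processed_lock:
            processed.append(item)


def dispatch(order_id, step):
    # Same key -> same in-memory queue -> same worker, so per-key order is kept.
    index = zlib.crc32(order_id.encode()) % NUM_WORKERS
    worker_queues[index].put((order_id, step))


threads = [threading.Thread(target=worker, args=(q,)) for q in worker_queues]
for t in threads:
    t.start()

# The single consumer receives messages in queue order and dispatches them.
for step in ("create", "pay", "ship"):
    for order_id in ("o-1", "o-2", "o-3", "o-4"):
        dispatch(order_id, step)

for q in worker_queues:
    q.put(None)
for t in threads:
    t.join()

steps_by_order = {}
for order_id, step in processed:
    steps_by_order.setdefault(order_id, []).append(step)
```

Different orders may interleave across workers, but each order’s steps are always handled in the sequence create, pay, ship.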

How are messages distributed?

If a queue has at least one subscribed consumer, messages are sent to the consumers in round-robin fashion. Each message is delivered to only one subscribed consumer (provided that consumer can process and acknowledge it properly). Delivering a message to multiple consumers can be achieved through routing.

How are messages routed?

Message provider -> routing -> one or more queues. When a message is published to an exchange, it carries a routing key that is set when the message is created. Queues are bound to exchanges through binding keys. When the message arrives at the exchange, RabbitMQ matches the message’s routing key against the queues’ binding keys (different exchange types apply different matching rules).

Commonly used exchanges fall into the following three types:

Fanout: when the exchange receives a message, it broadcasts it to all bound queues.

Direct: if the routing key matches exactly, the message is delivered to the corresponding queue.

Topic: lets messages from different sources reach the same queue. With a topic exchange, you can use wildcards in the binding key.
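The difference between fanout and direct routing can be shown with a toy routing function. This is a simplified model for illustration: the route function and the (queue, binding key) pair representation are assumptions of this sketch, and topic matching is left out.

```python
def route(exchange_type, bindings, routing_key):
    """Return the names of the queues a message would be delivered to.

    bindings is a list of (queue_name, binding_key) pairs for one exchange.
    """
    if exchange_type == "fanout":
        # Fanout ignores the routing key and copies to every bound queue.
        return sorted({queue_name for queue_name, _ in bindings})
    if exchange_type == "direct":
        # Direct delivers only where the binding key equals the routing key.
        return sorted(
            {queue_name for queue_name, key in bindings if key == routing_key}
        )
    raise ValueError(f"unsupported exchange type: {exchange_type}")


bindings = [("email", "error"), ("audit", "error"), ("audit", "info")]
fanout_targets = route("fanout", bindings, "anything")
info_targets = route("direct", bindings, "info")
debug_targets = route("direct", bindings, "debug")
```

With these bindings, a fanout exchange delivers to both queues regardless of key, a direct exchange delivers "info" only to audit, and "debug" matches no binding, so the message is routed nowhere.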

What transport is the message based on?

Messages are carried over TCP. Because creating and destroying TCP connections is expensive and the number of concurrent connections is limited by system resources, opening a connection per task would become a performance bottleneck. RabbitMQ therefore transmits data over channels. A channel is a virtual connection established inside a real TCP connection, and one TCP connection can carry many channels.

How to ensure that messages are not re-consumed? In other words, how can messages be idempotent when consumed?

Under normal circumstances, a consumer sends an acknowledgement to the message queue after consuming a message. The message queue then knows the message has been consumed and deletes it from the queue.

However, if a network failure prevents the acknowledgement from reaching the message queue, the queue does not know the message has been consumed and delivers it again to another consumer.

To deal with this, ensure that each message is uniquely identifiable, so that even if it is delivered more than once, repeated consumption has no effect. In other words, make message consumption idempotent.

For example, attach a unique identifier to the data written to the message queue, and when consuming a message use that identifier to check whether it has already been consumed.

Suppose you have a system that inserts one row into the database for each message it consumes. If the same message arrives twice, you insert two rows and the data is wrong. But if, on the second arrival, the consumer checks whether the message has already been consumed and simply discards it if so, only one row is kept and the data stays correct.

How do I ensure that messages are sent to RabbitMQ correctly? How do I ensure that message recipients consume messages?

Sender confirmation mode

Setting a channel to confirm mode assigns a unique ID to every message published on that channel.

Once a message has been delivered to its destination queue, or written to disk in the case of a persistent message, the channel sends an acknowledgement (containing the message’s unique ID) to the producer.

If an internal RabbitMQ error causes the message to be lost, a nack (not acknowledged) is sent instead.

Sender confirmation is asynchronous: the producer application can keep sending messages while waiting for confirmations. When a confirmation arrives, the producer application’s callback method is triggered to handle it.
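On the producer side, the asynchronous confirm flow amounts to bookkeeping over message IDs. The class below is a hypothetical sketch of that bookkeeping, not a client library API: publish records the message as outstanding, and the on_ack and on_nack callbacks settle it later.

```python
class ConfirmTracker:
    """Tracks unconfirmed messages the way a confirm-mode producer might."""

    def __init__(self):
        self.next_seq = 1      # IDs start at 1 on each channel
        self.outstanding = {}  # seq -> message still awaiting ack/nack
        self.to_retry = []     # messages the broker nacked

    def publish(self, message):
        seq = self.next_seq
        self.outstanding[seq] = message
        self.next_seq += 1
        return seq  # returns immediately; the producer does not block

    def on_ack(self, seq):
        # The broker took responsibility for the message: forget it.
        self.outstanding.pop(seq, None)

    def on_nack(self, seq):
        # The broker could not handle the message: queue it for a retry.
        message = self.outstanding.pop(seq, None)
        if message is not None:
            self.to_retry.append(message)


tracker = ConfirmTracker()
seqs = [tracker.publish(m) for m in ("m1", "m2", "m3")]
tracker.on_ack(1)   # m1 confirmed
tracker.on_nack(2)  # m2 rejected, will be retried
```

After these calls, m3 is still outstanding, m2 is waiting to be resent, and the producer never paused between publishes.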

Recipient confirmation mechanism

Each message received by the consumer must be acknowledged (message receipt and message acknowledgement are two different operations). RabbitMQ can safely remove a message from the queue only if the consumer confirms it.

No timeout mechanism is used here: RabbitMQ decides that a message needs to be redelivered only when the consumer’s connection is broken. In other words, as long as the connection stays open, RabbitMQ gives the consumer enough time to process the message. (Recent RabbitMQ versions additionally enforce a configurable acknowledgement timeout.) This ensures eventual consistency of the data.

Some special cases are listed below

  • If a consumer receives a message and then disconnects or unsubscribes before acknowledging it, RabbitMQ considers the message undelivered and redistributes it to the next subscribed consumer. (This can cause duplicate consumption, so the consumer needs de-duplication.)
  • If a consumer receives a message and does not acknowledge it while the connection stays open, RabbitMQ considers the consumer busy and does not distribute more messages to it.

How to ensure reliable transmission of RabbitMQ messages?

Unreliable delivery can be caused by message loss, hijacking, and other problems.

Loss can be divided into loss by the producer, loss by the message queue, and loss by the consumer.

Producer loses messages: RabbitMQ provides transaction and confirm modes to ensure that the producer does not lose messages.

Transaction mechanism: before sending a message, open a transaction (channel.txSelect()), then send the message. If any exception occurs during sending, roll the transaction back (channel.txRollback()); if sending succeeds, commit the transaction (channel.txCommit()). The drawback of this approach is that throughput drops.
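The select, publish, commit-or-rollback sequence can be sketched in Python. The method names tx_select, tx_commit, tx_rollback, and basic_publish follow the snake_case naming used by the pika client for the Java calls above; treat that mapping as an assumption of this sketch. FakeChannel is a stand-in invented here so the example runs without a broker.

```python
def publish_in_transaction(channel, exchange, routing_key, bodies):
    """Publish all bodies in one transaction; roll back if any publish fails."""
    channel.tx_select()
    try:
        for body in bodies:
            channel.basic_publish(
                exchange=exchange, routing_key=routing_key, body=body
            )
        channel.tx_commit()
        return True
    except Exception:
        channel.tx_rollback()
        return False


class FakeChannel:
    """Stand-in for a real channel so the sketch runs without a broker."""

    def __init__(self, fail_on=None):
        self.fail_on = fail_on
        self.pending, self.committed = [], []

    def tx_select(self):
        self.pending = []

    def basic_publish(self, exchange, routing_key, body):
        if body == self.fail_on:
            raise ConnectionError("simulated publish failure")
        self.pending.append(body)

    def tx_commit(self):
        self.committed.extend(self.pending)
        self.pending = []

    def tx_rollback(self):
        self.pending = []


ok_channel = FakeChannel()
bad_channel = FakeChannel(fail_on=b"m2")
ok_result = publish_in_transaction(ok_channel, "", "orders", [b"m1", b"m2"])
bad_result = publish_in_transaction(bad_channel, "", "orders", [b"m1", b"m2"])
```

The throughput cost mentioned above comes from the commit: each transaction waits for the broker’s reply before the producer can continue.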

Confirm mechanism: once a channel enters confirm mode, every message published on it is assigned a unique ID (starting from 1). Once the message has been delivered to all matching queues, RabbitMQ sends an ack (containing the message’s unique ID) to the producer, which tells the producer that the message has reached its destination queues.

If RabbitMQ fails to process the message, it sends a nack, and you can retry.

Message queue loses data: use message persistence.

To guard against the message queue losing data, persistence to disk is generally enabled.

Persistence can be combined with the confirm mechanism, so that the ack is sent to the producer only after the message has been persisted to disk.

That way, if RabbitMQ dies before the message is persisted, the producer never receives the ack and knows to resend the message.

So how do you persist?

It takes just two steps:

  1. Set the queue’s durable flag to true, which marks the queue as durable.
  2. Set deliveryMode=2 when sending a message.

With these settings, even if RabbitMQ goes down, the data is restored after it restarts.
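Why both settings are needed can be seen in a toy model of a broker restart. ToyBroker is invented for this illustration and greatly simplifies real behavior; it only encodes the rule that a message survives a restart when its queue is durable and the message itself was sent with delivery mode 2.

```python
class ToyBroker:
    """Models which queues and messages survive a broker restart."""

    def __init__(self):
        # name -> {"durable": bool, "messages": [(body, delivery_mode), ...]}
        self.queues = {}

    def queue_declare(self, name, durable=False):
        self.queues.setdefault(name, {"durable": durable, "messages": []})

    def publish(self, name, body, delivery_mode=1):
        self.queues[name]["messages"].append((body, delivery_mode))

    def restart(self):
        survivors = {}
        for name, q in self.queues.items():
            if q["durable"]:  # non-durable queues vanish entirely
                # Within a durable queue, only persistent messages are kept.
                kept = [m for m in q["messages"] if m[1] == 2]
                survivors[name] = {"durable": True, "messages": kept}
        self.queues = survivors


broker = ToyBroker()
broker.queue_declare("durable_q", durable=True)
broker.queue_declare("transient_q", durable=False)
broker.publish("durable_q", "persistent", delivery_mode=2)
broker.publish("durable_q", "transient", delivery_mode=1)
broker.publish("transient_q", "persistent", delivery_mode=2)
broker.restart()
```

After the restart only the persistent message in the durable queue remains; a persistent message in a non-durable queue is lost along with its queue.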

Consumer loses messages: this usually happens because automatic acknowledgement is in use; switching to manual acknowledgement fixes it.

In automatic mode, the consumer tells RabbitMQ the message has been received before it has actually processed it.

If processing then fails, the message is lost.

Solution: send the acknowledgement manually, after the message has been processed successfully.
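A manual-ack consumer callback could be sketched as follows. The callback signature (channel, method, properties, body) and the basic_ack and basic_nack calls follow the pika client’s conventions, which is an assumption of this sketch; RecordingChannel and process are stand-ins invented so the example runs without a broker.

```python
from types import SimpleNamespace


def make_callback(process):
    """Build a consumer callback that acks only after successful processing."""

    def on_message(channel, method, properties, body):
        try:
            process(body)
        except Exception:
            # Processing failed: ask the broker to requeue the message
            # instead of acknowledging (and thereby losing) it.
            channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
        else:
            channel.basic_ack(delivery_tag=method.delivery_tag)

    return on_message


class RecordingChannel:
    """Stand-in channel that records ack/nack calls."""

    def __init__(self):
        self.calls = []

    def basic_ack(self, delivery_tag):
        self.calls.append(("ack", delivery_tag))

    def basic_nack(self, delivery_tag, requeue):
        self.calls.append(("nack", delivery_tag, requeue))


def process(body):
    if body == b"bad":
        raise ValueError("cannot process")


channel = RecordingChannel()
callback = make_callback(process)
callback(channel, SimpleNamespace(delivery_tag=1), None, b"good")
callback(channel, SimpleNamespace(delivery_tag=2), None, b"bad")
```

Requeueing a message that can never be processed would make it loop forever, so a real consumer would typically cap retries or route such messages elsewhere.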

Why shouldn’t persistence be used for all messages?

First, performance inevitably drops, because writing to disk is much slower than writing to RAM; message throughput can differ by a factor of 10.

Second, message persistence does not work well with RabbitMQ’s built-in cluster scheme. The conflict is this: if a message is persistent but its queue is not durable, then when the queue’s owner node fails, messages sent to the queue are blackholed until the queue is rebuilt. If the message is persistent and the queue is durable, then when the owner node fails and cannot be restarted, the queue cannot be rebuilt on other nodes; it can be used again only after the owner node restarts, and messages sent to it in the meantime are blackholed.

So whether to persist messages has to be decided by weighing performance requirements against these potential problems. To reach a throughput of more than 100,000 messages per second on a single RabbitMQ server, either use other means to ensure reliable delivery or use a very fast storage system (such as SSDs) to support full persistence. Another principle is to persist only critical messages (judged by business importance) and to make sure the volume of critical messages does not become a performance bottleneck.

How to ensure high availability? The RabbitMQ cluster

RabbitMQ is a typical example of MQ high availability based on master-slave (non-distributed) replication, so we use it to explain how this first kind of MQ high availability is implemented. RabbitMQ has three modes: single-machine mode, normal cluster mode, and mirrored cluster mode.

Single-machine mode is demo level: you start it locally to try things out. Nobody uses single-machine mode in production.

Normal cluster mode means starting multiple RabbitMQ instances on multiple machines, one per machine. A queue you create lives on only one RabbitMQ instance, but every instance synchronizes the queue’s metadata (which can be thought of as the queue’s configuration information, and which lets any instance locate the instance that holds the queue). When you consume while connected to a different instance, that instance pulls the data from the instance where the queue lives. The main purpose of this mode is to improve throughput by letting multiple nodes in the cluster serve reads and writes for a queue.

Mirrored cluster mode is RabbitMQ’s high availability mode. Unlike normal cluster mode, a queue you create, both its metadata and its messages, exists on multiple instances; that is, each of those RabbitMQ nodes holds a full mirror of the queue, meaning all of the queue’s data. Every time you write a message to the queue, the message is automatically synchronized to the queue on those instances. To enable it, add a policy in RabbitMQ’s management console. The policy selects mirrored cluster mode and can specify that data be synchronized to all nodes or to a given number of nodes. Queues created afterwards with this policy applied have their data synchronized to the other nodes automatically. The advantage is that if any one machine goes down, the other nodes still hold the queue’s complete data and consumers can switch to them. The downside is the high performance overhead: every message has to be synchronized to all mirrors, which puts heavy pressure on network bandwidth. To summarize the difference: in a normal cluster, a queue’s data is stored on a single node; in a mirrored cluster, every node holds the queue’s complete data.

How to solve message queue delay and expiration problem? What happens when the message queue is full? There are millions of messages waiting for hours. How do you fix them?

Handling a message backlog: temporary emergency scale-out

  1. Fix the consumer problem first so that consumption speed is restored, then stop all existing consumers.
  2. Create a new topic with 10 times as many partitions, or temporarily create 10 times as many queues.
  3. Write a temporary dispatcher consumer and deploy it to consume the backlog. It does no time-consuming processing; it simply writes each message, round-robin, into the 10 times as many temporary queues.
  4. Temporarily requisition 10 times as many machines to deploy consumers, each batch consuming one temporary queue. This is equivalent to temporarily scaling queue and consumer resources 10 times, so data is consumed at 10 times the normal rate.
  5. Once the backlog has been consumed, restore the original deployment and go back to consuming with the original consumer machines.

Expired MQ messages: Suppose you are using RabbitMQ. RabbitMQ lets you set an expiration time, i.e. a TTL. If messages sit in a queue longer than that time, RabbitMQ clears them and the data is gone. That is the second pitfall: the data does not pile up in MQ, it is simply lost. The solution we can adopt is a batch re-import, which we have done in similar situations in production. When there is a huge backlog, we just let the messages be discarded; then, after the peak has passed, say after midnight when users are asleep, we write a temporary program that finds the lost batch of data bit by bit and pushes it back into MQ, making up for what was lost during the day. There is no other way. Suppose 10,000 orders are backlogged in MQ unprocessed and 1,000 of them are lost: you have to write a program by hand to find those 1,000 orders and send them to MQ again.

MQ almost full: What if messages back up in MQ, you do not clear them for a long time, and MQ is about to fill up? Is there any other way? No; this means your first plan ran too slowly. Write an ad hoc program that connects and consumes the data, discarding each message as it is consumed, so that all messages are drained quickly. Then follow the second plan and re-import the data later at night.

Designing MQ

Take a message queue system as an example and think about it from the following perspectives:

First, the MQ has to be scalable, meaning it can be scaled out quickly when needed to increase throughput and capacity. One layout is broker -> topic -> partition, where each partition is placed on a machine and stores part of the data. If resources run short, add partitions to the topic, migrate data, and add machines, and the system can store more data and provide higher throughput.

Second, consider whether MQ data should be written to disk. It must be: writing to disk ensures that data is not lost when the process dies. How should it be written? Sequentially, so that there is no seek overhead from random disk reads and writes; sequential disk reads and writes perform very well. This is Kafka’s approach.
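The sequential-write idea can be illustrated with a tiny append-only log. This is a sketch under simplifying assumptions (a single file, length-prefixed records, byte offsets as message positions) and not how Kafka actually lays out its segments; the AppendOnlyLog class and the file name are hypothetical.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Length-prefixed records appended to one file; offsets are byte positions."""

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()  # create the file if it does not exist

    def append(self, payload: bytes) -> int:
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # records only ever go at the end
            f.write(struct.pack(">I", len(payload)) + payload)
            f.flush()
            os.fsync(f.fileno())  # force the record to disk before returning
        return offset

    def read_from(self, offset: int):
        with open(self.path, "rb") as f:
            f.seek(offset)
            while header := f.read(4):  # 4-byte big-endian length prefix
                (length,) = struct.unpack(">I", header)
                yield f.read(length)


log_path = os.path.join(tempfile.mkdtemp(), "queue.log")
log = AppendOnlyLog(log_path)
offsets = [log.append(m) for m in (b"m1", b"m22", b"m333")]
replayed = list(log.read_from(offsets[1]))
```

Because writes only ever go to the end of the file, the disk never has to seek between records, and a consumer that remembers its offset can resume reading from exactly where it stopped.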

Next, consider the availability of your MQ. For this, refer to Kafka’s high availability mechanism: multiple replicas -> leader and follower -> when a broker goes down, a new leader is elected to keep serving.

Can it support zero data loss? Yes; refer to the Kafka zero-data-loss scheme we talked about earlier.

Source: author: ThinkWon thinkwon.blog.csdn.net/article/det…