How can message queues be highly available

1 The interview question

How do you ensure the high availability of message queues?

High availability is a must-ask topic, because MQ has numerous drawbacks, one of which is that it reduces system availability. So as long as you use MQ, the follow-up questions are bound to revolve around how you deal with those drawbacks. If you simply use an MQ without ever thinking about these problems, the interviewer will be left with the impression that you only know how to use technologies superficially, without any real thought. Such a candidate might be acceptable as an ordinary junior engineer on a salary under 20K. Hired as a senior engineer on more than 20K, it would be a disaster: ask that person to design a system and it will be full of pitfalls, and when an accident happens the company takes the loss and the whole team takes the blame.

3 Detailed explanation of the interview question

Phrasing the question this way is good, because an interviewer should not ask "How does Kafka ensure high availability?" or "How do you ensure the high availability of ActiveMQ?" Asking that looks unskilled: the candidate may have used only RabbitMQ and never Kafka, so a Kafka-specific question just makes things deliberately difficult. A skilled interviewer instead asks how the high availability of MQ in general can be guaranteed. Then, whichever MQ you have used, you explain what you understand about the high availability of that MQ.

3.1 RabbitMQ High availability

RabbitMQ is a representative case because its high availability is master-slave based, so we use it as the example of the first way to implement MQ high availability. RabbitMQ has three modes: single-machine deployment mode, common cluster deployment mode, and mirrored cluster deployment mode.

3.1.2 Common Cluster Mode

Common cluster mode means starting multiple RabbitMQ instances on multiple machines, one per machine. However, a queue you create is placed on only one RabbitMQ instance, while every instance synchronizes the queue's metadata. When you consume, if you are connected to a different instance, that instance pulls the data from the instance where the queue actually lives.

This approach is troublesome and not very good: it is not distributed, just an ordinary cluster. Either the consumer connects to a random instance each time and pays the overhead of pulling data between instances, or it always connects to the instance that holds the queue, and that single instance becomes the performance bottleneck.

Moreover, if the instance holding the queue goes down, the other instances can no longer pull from it. If you enable message persistence so that RabbitMQ stores messages on disk, the messages are not necessarily lost, but you must wait for that instance to recover before you can continue pulling data from the queue.

So this mode is rather awkward: it provides no real high availability. It is mainly about throughput, letting multiple nodes in the cluster serve reads and writes for a queue.

(Figure: architecture diagram of the common cluster mode.)
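To make the common cluster behaviour concrete, here is a minimal toy model in Python. It is only a sketch of the idea described above, not RabbitMQ code: the `Node` class, `declare_queue` helper, and the node and queue names are all hypothetical. It shows that metadata is known to every node, that message bodies live only on the owning node, and that the queue becomes unavailable when the owner is down.

```python
class Node:
    """One RabbitMQ instance in the toy model (hypothetical, not a RabbitMQ API)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.queue_owner = {}  # metadata: queue name -> owning Node, known to every node
        self.messages = {}     # message bodies, stored only on the owning node

    def consume(self, queue):
        owner = self.queue_owner[queue]  # every node knows where the queue lives
        if not owner.up:
            raise RuntimeError(f"queue {queue!r} unavailable: owner {owner.name} is down")
        # If self is not the owner, this models the extra pull from the owning node.
        return owner.messages[queue].pop(0)


def declare_queue(cluster, owner, queue):
    """Create the queue on one node and replicate only its metadata."""
    owner.messages[queue] = []
    for node in cluster:
        node.queue_owner[queue] = owner


a, b = Node("rabbit-a"), Node("rabbit-b")
cluster = [a, b]
declare_queue(cluster, owner=a, queue="orders")
a.messages["orders"].extend(["m1", "m2"])

first = b.consume("orders")  # the consumer talks to b, but the data is pulled from a

a.up = False                 # the instance that holds the queue goes down
try:
    b.consume("orders")
    outcome = "consumed"
except RuntimeError:
    outcome = "unavailable"
```

In this model the remaining message is still sitting on `rabbit-a`, which mirrors the persistence case: nothing is lost, but nothing can be consumed until that node comes back.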

3.1.3 Mirrored Cluster Mode

This is RabbitMQ's actual high availability mode. Unlike the common cluster mode, both the metadata and the messages of a queue you create are stored on multiple instances, and every message written to the queue is automatically synchronized to the copies of the queue on those instances.

Benefits: if any machine goes down, it doesn't matter; the other machines can still be used.

Disadvantages: first, the performance overhead is too high. Messages are synchronized to all machines, which puts heavy pressure on network bandwidth. Second, it is not scalable: if a queue is heavily loaded and you add machines, the new machines also hold all the data of that queue, so there is no way to scale the queue linearly.

So how do you turn on mirrored cluster mode? RabbitMQ has a nice admin console, where you add a policy. This policy is a mirrored cluster policy, which can specify that data be synchronized to all nodes or to a specified number of nodes. When you then create a queue, you apply this policy, and the data is automatically synchronized to the other nodes.

(Figure: architecture diagram of the mirrored cluster mode.)
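The policy can also be expressed as data rather than clicked together in the console. The sketch below only builds the request path and JSON body that one would send with a PUT to the management HTTP API's policy endpoint; it makes no network call. It assumes the classic mirrored queue policy keys `ha-mode` and `ha-params`, whose availability depends on your RabbitMQ version. The function name, policy names, and queue name patterns are hypothetical.

```python
import json
from urllib.parse import quote


def mirror_policy_request(name, pattern, mode="all", count=None, vhost="/"):
    """Build (path, body) for a mirrored-queue policy; nothing is sent."""
    definition = {"ha-mode": mode}
    if mode == "exactly":
        if count is None:
            raise ValueError("mode 'exactly' needs a replica count")
        definition["ha-params"] = count  # mirror to this many nodes
    # The vhost is percent-encoded in the path, so "/" becomes "%2F".
    path = f"/api/policies/{quote(vhost, safe='')}/{quote(name, safe='')}"
    body = {"pattern": pattern, "definition": definition, "apply-to": "queues"}
    return path, json.dumps(body, sort_keys=True)


# Mirror every queue whose name starts with "ha." to all nodes.
path_all, body_all = mirror_policy_request("ha-all", r"^ha\.")

# Mirror queues whose name starts with "orders." to exactly two nodes.
path_two, body_two = mirror_policy_request("ha-two", r"^orders\.", mode="exactly", count=2)
```

The two calls correspond to the two options mentioned above: synchronizing to all nodes, or to a specified number of nodes.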

3.2 High availability of Kafka

One of Kafka's most basic architectural ideas is that a cluster consists of multiple brokers, each of which is a node. A topic you create can be divided into multiple partitions; each partition can reside on a different broker and holds a portion of the topic's data.

This makes Kafka a naturally distributed message queue: the data of one topic is spread across multiple machines, each holding part of it. RabbitMQ and the like are in fact not distributed message queues; they are traditional message queues that add some clustering and HA mechanisms. However you set RabbitMQ up, the data of a queue is stored on one node, and in a mirrored cluster every node still holds the complete data of the queue.

Prior to Kafka 0.8 there was no HA mechanism. When any broker went down, the partitions on that broker could be neither written nor read, so there was no high availability to speak of.

Starting with Kafka 0.8, an HA mechanism is provided: the replica mechanism. The data of each partition is synchronized to other machines, forming multiple replicas. The replicas then elect a leader; producers and consumers deal only with this leader, and the other replicas are followers. On a write, the leader synchronizes the data to all followers; on a read, data is read directly from the leader.

Why can you read and write only through the leader? If you could read and write any follower at will, you would have to deal with data consistency, the system would become too complex, and problems would easily arise. Kafka distributes the replicas of a partition evenly across different machines to improve fault tolerance.

This gives you what is called high availability: if a broker goes down, that's fine, because the partitions on that broker have replicas on other machines. If the failed broker hosted the leader of a partition, a new leader is elected from the followers, and reads and writes continue against the new leader.
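As a rough illustration of replicas and leader failover, here is a small Python sketch. It is a simplified model under stated assumptions, not Kafka's actual placement or election algorithm: replicas are placed round-robin, and the leader is simply the first live replica in the list. The function names and broker names are hypothetical.

```python
def assign_replicas(brokers, partitions, replication_factor):
    """Round-robin placement: replicas of one partition land on distinct brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(partitions)
    }


def elect_leaders(assignment, alive):
    """Pick the first live replica of each partition as leader (None if all are down)."""
    return {
        p: next((b for b in replicas if b in alive), None)
        for p, replicas in assignment.items()
    }


brokers = ["broker-0", "broker-1", "broker-2"]
assignment = assign_replicas(brokers, partitions=3, replication_factor=2)

before = elect_leaders(assignment, alive=set(brokers))
after = elect_leaders(assignment, alive={"broker-1", "broker-2"})  # broker-0 crashed
```

With three brokers and two replicas per partition, losing `broker-0` only moves the leadership of partition 0 to its follower on `broker-1`; every partition still has a leader, so reads and writes can continue.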
That is what high availability means here.

When writing data, the producer writes to the leader, which writes the data to its local disk; the followers then actively pull the data from the leader. Once a follower has synchronized the data, it sends an ACK to the leader, and after receiving ACKs from all followers the leader returns a write-success message to the producer. (Of course, this is only one of the modes, and the behavior can be adjusted.)

When consuming, data is read only from the leader, and a message becomes readable by consumers only once it has been successfully acknowledged by all followers.

This mechanism can be explored in much greater depth, but I'll come back to the theme and orientation of this course, which is interviews. At the very least you should have a sense of how Kafka ensures high availability rather than knowing nothing at all, and you should be able to draw the picture for the interviewer. If you meet an interviewer who really is a Kafka expert and digs deeper, you can only apologize and say that you have not studied it that deeply.

It's important to understand the trade-off: you want a quick pass over Kafka, not a deep dive, which you don't have time for. You may not have known any of this before, but now that you do, you can say something when asked. Many other candidates, perhaps weaker than you, haven't seen this material and will be unable to answer at all, whereas you can say something, and that is the point.
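The write and read rules above can be sketched as a toy model. The `Partition` class below is hypothetical and is not a Kafka client API; it models only the strictest mode described here, where a write counts as successful once every follower has the message. In real Kafka the producer's `acks` setting selects between this and weaker modes.

```python
class Partition:
    """Toy leader/follower log for one partition (hypothetical model)."""

    def __init__(self, follower_names):
        self.leader_log = []
        self.follower_logs = {name: [] for name in follower_names}

    def append(self, message):
        """Producer write: the leader appends to its own log first."""
        self.leader_log.append(message)

    def follower_fetch(self, name):
        """A follower actively pulls whatever it is missing from the leader."""
        log = self.follower_logs[name]
        log.extend(self.leader_log[len(log):])

    def replicated_count(self):
        """Number of messages present on the leader and on every follower."""
        lengths = [len(self.leader_log)]
        lengths += [len(log) for log in self.follower_logs.values()]
        return min(lengths)

    def is_acked(self, offset):
        """The producer sees success only once all followers hold the message."""
        return offset < self.replicated_count()

    def consume(self):
        """Consumers read from the leader, but only fully replicated messages."""
        return self.leader_log[:self.replicated_count()]


p = Partition(["f1", "f2"])
p.append("m0")
p.follower_fetch("f1")

visible_early = p.consume()  # f2 has not caught up yet
acked_early = p.is_acked(0)

p.follower_fetch("f2")
visible_late = p.consume()
acked_late = p.is_acked(0)
```

Until the slowest follower has fetched `m0`, the producer gets no success response and consumers cannot see the message; once both followers have it, both conditions flip together.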