I. Introduction


As an experienced microservices architect, I am often asked, “Should I choose RabbitMQ or Kafka?” For some reason, many developers treat these two technologies as interchangeable. It is true that in some scenarios it makes no difference which one you choose, but the two differ substantially in their underlying implementations.


Different scenarios require different solutions, and choosing the wrong one can seriously affect your ability to design, develop, and maintain software.


This article first covers the basic asynchronous messaging patterns, then introduces RabbitMQ and Kafka and their internal structure. Part 2 (not yet finished) focuses on the main differences between the two technologies and their respective strengths and weaknesses, and finally explains how to choose between them.


II. Asynchronous messaging patterns


Asynchronous messaging decouples the production of messages from their processing. When we think of messaging systems, we usually think of two main messaging patterns: message queues and publish/subscribe.


Message queues


Message queues can be used to decouple producers from consumers. Multiple producers can send messages to the same message queue. However, once a consumer starts processing a message, that message is locked or removed from the queue and is no longer available to other consumers. That is, a given message is consumed by only one consumer.


It is also important to note that if a consumer fails to process a message, the messaging system will typically put the message back into the queue so that another consumer can process it. In addition to decoupling, message queues let producers and consumers scale independently and provide a degree of fault tolerance when processing fails.
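
The behaviour described above can be sketched with a small in-memory model. This is only an illustration, not how any particular broker is implemented: the `MessageQueue` class, the worker functions, and the "alternate between workers" scheduling are all assumptions made for the example.

```python
from collections import deque


class MessageQueue:
    """Toy in-memory queue: each message is handed to exactly one consumer."""

    def __init__(self):
        self._pending = deque()

    def send(self, message):
        self._pending.append(message)

    def process_next(self, handler):
        """Give the next message to one consumer; requeue it if the handler fails."""
        if not self._pending:
            return False
        message = self._pending.popleft()  # removed: no other consumer sees it
        try:
            handler(message)
        except Exception:
            self._pending.append(message)  # put back so another consumer can try
        return True


def run_demo():
    queue = MessageQueue()
    for i in range(4):
        queue.send(f"order-{i}")

    handled = {"worker-a": [], "worker-b": []}

    def worker_a(message):
        if message == "order-2":
            raise RuntimeError("worker-a cannot process order-2")
        handled["worker-a"].append(message)

    def worker_b(message):
        handled["worker-b"].append(message)

    # Two competing consumers take turns until the queue is empty.
    workers = [worker_a, worker_b]
    turn = 0
    while queue.process_next(workers[turn % 2]):
        turn += 1
    return handled
```

In this sketch every message ends up handled by exactly one worker, and the message that `worker_a` rejects is eventually picked up by `worker_b` after being requeued.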



Publish/subscribe


In the publish/subscribe (pub/sub) pattern, a single message can be received and processed by multiple subscribers concurrently.




For example, a publisher can use this pattern to notify all subscribers of events generated in a system. Many queuing systems use the term "topic" to refer to the publish/subscribe pattern. In RabbitMQ, a topic is one specific implementation of the publish/subscribe pattern (more precisely, a type of exchange), but in this article I will treat topics as equivalent to publish/subscribe.


In general, there are two types of subscriptions:


  1. Ephemeral subscriptions, which exist only while the consumer is up and running. Once the consumer shuts down, the subscription and any unprocessed messages are lost.

  2. Durable subscriptions, which persist until they are explicitly deleted. After the consumer exits, the messaging system keeps the subscription, and the consumer can resume processing messages later.
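
To make the difference between the two subscription types concrete, here is a minimal in-memory sketch. The `Topic` class and its `subscribe`, `disconnect`, and `publish` methods are hypothetical names invented for this example; real messaging systems expose this behaviour through their own APIs.

```python
class Topic:
    """Toy publish/subscribe topic with ephemeral and durable subscriptions."""

    def __init__(self):
        self._subscriptions = {}  # name -> {"durable": bool, "inbox": list}

    def subscribe(self, name, durable):
        """Create the subscription, or reattach to a surviving durable one."""
        sub = self._subscriptions.setdefault(name, {"durable": durable, "inbox": []})
        return sub["inbox"]

    def disconnect(self, name):
        if not self._subscriptions[name]["durable"]:
            # Ephemeral: the subscription and its backlog are lost.
            del self._subscriptions[name]
        # Durable: the subscription stays and keeps accumulating messages.

    def publish(self, message):
        # Every existing subscription receives its own copy of the message.
        for sub in self._subscriptions.values():
            sub["inbox"].append(message)


def run_demo():
    topic = Topic()
    audit = topic.subscribe("audit", durable=True)
    dashboard = topic.subscribe("dashboard", durable=False)

    topic.publish("user-created")
    audit.clear()      # both consumers process the first event
    dashboard.clear()

    topic.disconnect("audit")
    topic.disconnect("dashboard")
    topic.publish("user-deleted")  # published while both consumers are down

    audit = topic.subscribe("audit", durable=True)
    dashboard = topic.subscribe("dashboard", durable=False)
    return audit, dashboard
```

After the restart, the durable `audit` subscriber still finds the event it missed, while the ephemeral `dashboard` subscriber starts with an empty backlog.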


III. RabbitMQ


RabbitMQ is an implementation of messaging middleware and is often used as a service bus. It natively supports both of the messaging patterns described above. Other popular implementations of message-oriented middleware include ActiveMQ, ZeroMQ, Azure Service Bus, and Amazon Simple Queue Service (SQS). These implementations have a lot in common, and many of the concepts in this article apply to most of them.


Queues


RabbitMQ supports a typical message queue out of the box. A developer can define a named queue to which a publisher can then send messages. Finally, the consumer can retrieve the message to be processed through this named queue.


Message exchanges


RabbitMQ uses message exchanges to implement the publish/subscribe pattern. Publishers can publish messages to an exchange without knowing who the subscribers of those messages are.


Each consumer that subscribes to an exchange creates a queue, and the exchange then places the published messages on these queues for the consumers to consume. An exchange can also filter messages for individual subscribers based on various routing rules.
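
The routing step can be modelled in a few lines. The sketch below is a toy model, not RabbitMQ's client API: the `Exchange` class and its methods are made up for illustration, and only two routing rules are modelled, "fanout" (copy to every bound queue) and "direct" (copy only where the binding key equals the routing key), named after the RabbitMQ exchange types they imitate.

```python
class Exchange:
    """Toy exchange: copies each published message into the matching bound queues."""

    def __init__(self, kind):
        if kind not in ("fanout", "direct"):
            raise ValueError(f"unsupported exchange kind: {kind}")
        self.kind = kind
        self._bindings = []  # (queue name, binding key) pairs
        self.queues = {}     # queue name -> messages waiting for that subscriber

    def bind(self, queue_name, binding_key=""):
        """Create the subscriber's queue if needed and bind it to this exchange."""
        self.queues.setdefault(queue_name, [])
        self._bindings.append((queue_name, binding_key))

    def publish(self, message, routing_key=""):
        for queue_name, binding_key in self._bindings:
            # fanout ignores keys; direct requires an exact key match
            if self.kind == "fanout" or binding_key == routing_key:
                self.queues[queue_name].append(message)


def run_demo():
    events = Exchange("fanout")
    events.bind("billing")
    events.bind("email")
    events.publish("order-placed")

    logs = Exchange("direct")
    logs.bind("pager", binding_key="error")
    logs.bind("archive", binding_key="error")
    logs.bind("archive", binding_key="info")
    logs.publish("disk full", routing_key="error")
    logs.publish("job started", routing_key="info")
    return events.queues, logs.queues
```

Note that the publisher only ever talks to the exchange; which queues receive a copy is decided entirely by the bindings.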




It is important to note that RabbitMQ supports both ephemeral and durable subscriptions. Consumers choose the type of subscription they want through RabbitMQ’s API.


Thanks to RabbitMQ’s architecture, we can also build a hybrid approach: several consumers share one subscriber queue and compete for the messages on it. Such a set of competing consumers is called a consumer group. In this way, we implement the publish/subscribe pattern while still being able to scale out each subscriber to handle the incoming messages.
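
A rough sketch of this hybrid, under simplifying assumptions: the subscriber queues are plain Python deques, `fan_out` stands in for the exchange, and `drain` hands messages to the workers of one group in strict rotation. All of these names are hypothetical.

```python
from collections import deque
from itertools import cycle


def fan_out(message, subscriber_queues):
    """Publish/subscribe step: every subscriber queue gets its own copy."""
    for queue in subscriber_queues.values():
        queue.append(message)


def drain(queue, workers):
    """Competing-consumers step: each message on a queue goes to exactly one worker."""
    handled = {worker: [] for worker in workers}
    for worker in cycle(workers):
        if not queue:
            break
        handled[worker].append(queue.popleft())
    return handled


def run_demo():
    queues = {"billing": deque(), "email": deque()}
    for message in ("order-1", "order-2", "order-3"):
        fan_out(message, queues)
    return {
        "billing": drain(queues["billing"], ["billing-1", "billing-2"]),
        "email": drain(queues["email"], ["email-1"]),
    }
```

Both subscribers see all three orders, but within the billing group the orders are split between the two workers, so that group can be scaled without affecting the email subscriber.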



IV. Apache Kafka


Apache Kafka is not an implementation of messaging middleware. Instead, it’s a distributed streaming system.


Unlike RabbitMQ, which is built on queues and exchanges, Kafka’s storage layer is implemented as a partitioned transaction log. Kafka also provides a Streams API for processing streams in real time and a Connector API for easier integration with various data sources; these, however, are beyond the scope of this article.


Cloud vendors offer alternatives to Kafka’s storage layer, such as Azure Event Hubs and AWS Kinesis Data Streams. There are also cloud-specific and open-source alternatives to Kafka’s stream-processing capabilities, but, again, they are beyond the scope of this article.




Kafka does not implement queues. Instead, it stores collections of records in categories called topics.


For each topic, Kafka maintains a partitioned log of messages. Each partition is an ordered, immutable sequence of records to which new messages are continually appended.


When messages arrive, Kafka appends them to the end of a partition. By default, it uses a round-robin partitioner to spread messages evenly across the partitions.


Producers can change this behavior to create logical streams of messages. For example, in a multi-tenant application, we can create logical message streams based on the tenant ID in each message. In IoT scenarios, we can consistently map each producer’s identity to a specific partition. Ensuring that all messages from the same logical stream map to the same partition guarantees that they are delivered to consumers in order.
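
The two partitioning strategies can be illustrated as follows. This is a sketch under stated assumptions rather than Kafka's actual partitioner: the `Partitioner` class is hypothetical, and `zlib.crc32` is used merely as a convenient stable hash, not the hash function Kafka itself uses.

```python
import zlib
from itertools import count


class Partitioner:
    """Toy partitioner: round-robin without a key, stable hash with a key."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = count()

    def partition_for(self, key=None):
        if key is None:
            # No key: spread messages evenly, one partition after another.
            return next(self._round_robin) % self.num_partitions
        # Same key -> same partition, so one logical stream stays in order.
        # crc32 is a stand-in stable hash chosen for this example.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


def run_demo():
    partitioner = Partitioner(num_partitions=3)
    partitions = [[] for _ in range(3)]
    events = [("tenant-a", "a1"), ("tenant-b", "b1"), ("tenant-a", "a2"),
              ("tenant-b", "b2"), ("tenant-a", "a3")]
    for tenant_id, payload in events:
        partitions[partitioner.partition_for(tenant_id)].append(payload)
    return partitions
```

Whichever partition a tenant's key hashes to, all of that tenant's messages land in it in publication order, which is what makes per-tenant ordering possible.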




Consumers consume messages by maintaining an offset (or index) into each partition and reading sequentially from it.
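
Offset-based reading can be pictured with a list standing in for one partition. The `PartitionLog` and `Consumer` classes below are hypothetical and leave out everything a real client does (networking, batching, offset commits); they only show that the log is append-only and that the read position belongs to the consumer.

```python
class PartitionLog:
    """Toy append-only log: a record's offset is its position in the list."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset, max_records=10):
        # Reading never removes anything from the log.
        return self._records[offset:offset + max_records]


class Consumer:
    """Reads sequentially and remembers how far it has got."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_records=10):
        records = self.log.read(self.offset, max_records)
        self.offset += len(records)  # advance past what was just read
        return records


def run_demo():
    log = PartitionLog()
    for record in ("r0", "r1", "r2", "r3"):
        log.append(record)
    consumer = Consumer(log)
    first = consumer.poll(max_records=3)
    second = consumer.poll(max_records=3)
    return first, second, consumer.offset
```

Because reading does not modify the log, a second consumer created with `offset=0` would see the same four records again.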


A single consumer can consume multiple different topics, and the number of consumers can scale up to the maximum number of partitions available.


So when creating a topic, we should carefully consider the message throughput we expect on it. A group of consumers that work together to consume a topic is called a consumer group. Kafka’s API handles the balancing of partitions among the consumers in a consumer group and the storage of the consumers’ current partition offsets.
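
Why the partition count caps the useful number of consumers becomes clear from a simplified assignment function. The round-robin rule below is an assumption chosen for brevity; Kafka's real assignment strategies are more elaborate, and `assign_partitions` is a hypothetical helper, not part of any Kafka client.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partitions over a group's consumers, one owner per partition."""
    assignment = {consumer: [] for consumer in consumers}
    if not consumers:
        return assignment
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def run_demo():
    balanced = assign_partitions(4, ["c1", "c2"])
    oversized = assign_partitions(2, ["c1", "c2", "c3"])
    return balanced, oversized
```

With four partitions and two consumers, each consumer owns two partitions. With two partitions and three consumers, the third consumer receives nothing, which is why the partition count should be chosen with the expected throughput in mind.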






Producers send messages to a specific topic, and multiple consumer groups can then consume the same messages. Each consumer group can scale independently to handle its own load. Since consumers maintain their own partition offsets, they can choose between durable subscriptions, which keep their offsets across restarts, and ephemeral subscriptions, which discard their offsets and start reading from the latest record in the partition after every restart.
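
The following sketch models this with a single partition shared by two groups. It is an illustration only: the `SharedTopic` class, the `committed` dictionary, and the `durable` flag are invented for the example and do not correspond to Kafka client settings.

```python
class SharedTopic:
    """Toy single-partition topic read independently by several consumer groups."""

    def __init__(self):
        self.records = []
        self.committed = {}  # group name -> next offset to read

    def publish(self, record):
        self.records.append(record)

    def start(self, group, durable):
        """Return the offset a starting (or restarting) group reads from."""
        if durable and group in self.committed:
            return self.committed[group]  # resume where the group left off
        return len(self.records)          # otherwise begin at the latest record

    def poll(self, group, offset):
        records = self.records[offset:]
        self.committed[group] = offset + len(records)
        return records


def run_demo():
    topic = SharedTopic()
    billing_at = topic.start("billing", durable=True)
    dashboard_at = topic.start("dashboard", durable=False)

    topic.publish("e1")
    topic.publish("e2")
    billing_first = topic.poll("billing", billing_at)
    dashboard_first = topic.poll("dashboard", dashboard_at)

    topic.publish("e3")  # arrives while both groups are restarting
    billing_after = topic.poll("billing", topic.start("billing", durable=True))
    dashboard_after = topic.poll("dashboard", topic.start("dashboard", durable=False))
    return billing_first, dashboard_first, billing_after, dashboard_after
```

Both groups receive `e1` and `e2`. After the restart, the durable group picks up `e3` from its stored offset, while the ephemeral group skips it and only sees records published from then on.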


However, this is not quite the same as the typical message queue pattern. We can, of course, create a topic associated with a single consumer group to emulate a typical message queue. However, this has several drawbacks, which we will discuss in detail in Part 2.


It’s worth noting that Kafka retains messages in a partition for a preconfigured period, regardless of whether consumers have consumed them. This retention allows consumers to freely reread earlier messages. In addition, developers can use Kafka’s storage layer to implement features such as event sourcing and audit logs.
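
Time-based retention can be sketched as below. The `RetainedLog` class is hypothetical and deliberately simplified: timestamps are passed in explicitly to keep the example deterministic, and records are expired one by one, which is a simplification of how a real log would clean up its storage.

```python
class RetainedLog:
    """Toy log that keeps records for a fixed time, whether or not they were read."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._records = []  # (timestamp, record) pairs

    def append(self, record, now):
        self._records.append((now, record))

    def expire(self, now):
        """Drop records older than the retention period."""
        cutoff = now - self.retention_seconds
        self._records = [(ts, r) for ts, r in self._records if ts >= cutoff]

    def read_all(self):
        # Reading has no effect on what is retained.
        return [record for _, record in self._records]


def run_demo():
    log = RetainedLog(retention_seconds=60)
    log.append("a", now=0)
    log.append("b", now=30)
    log.append("c", now=50)

    first_read = log.read_all()
    second_read = log.read_all()  # rereading returns the same records
    log.expire(now=70)            # only "a" is older than 60 seconds
    return first_read, second_read, log.read_all()
```

The two reads return identical results, and a record disappears only because it has aged out, not because somebody consumed it.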


V. Concluding remarks


Although RabbitMQ and Kafka are sometimes treated as equivalent, their implementations are quite different. We therefore cannot treat them as the same kind of tool: one is messaging middleware, and the other is a distributed streaming system.


As solution architects, we need to recognize these differences and actively consider which type of solution to use in a given scenario.


Blog: www.liangsonghua.com
