This is the 12th day of my participation in the November Gwen Challenge. Check out the event details: The last Gwen Challenge 2021

If the background

Pulsar has also started to shine. Today we are going to talk about RocketMQ and Kafka, which I use most often, and we already know the differences between them. The selection of the two may be confused, so today I will lead you to analyze the difference between the two, as well as the selection standard!

Architecture contrast

RocketMQ architecture

RocketMQ consists of Nameservers, brokers, consumers, and producers. Nameservers do not communicate with each other. Brokers register with all Nameservers and determine whether the Broker is alive by heartbeat. Producers and consumers use nameserver to know what topics are on the broker.

Kafka’s architecture

Kafka metadata is stored in Zookeeper. The new version of Kafka is already stored inside Kafka, consisting of brokers, Zookeeper, producers, and consumers.

The Broker contrast

Master-slave architecture model differences:

Dimension is different
  • Kafka’s master/slave is partition based, whereas RocketMQ is Broker based.

    • Kafka master/slave can be switched (mainly due to Zookeeper’s master/slave switchover mechanism)
    • RocketMQ does not automatically switch. When RocketMQ’s Master goes down, reads can be routed to slave, but writes are routed to other brokers in this topic.
Brush plate mechanism

RocketMQ supports synchronous flush, where each message is flushed to disk before being returned, ensuring that messages are not lost, but with a slight throughput impact. In the master-slave structure, the asynchronous dual-write policy is a reliable choice.

Information query

RocketMQ supports message query and custom keys in addition to the offset of the queue. RocketMQ indexes both offset and key as separate index files.

Consumption failed retry and delayed consumption

RocketMQ defines a delay queue for each topic. When a message fails to be consumed, it is sent back to the Broker to be placed in a delay queue. Each consumer subscribs to the delay queue by default at startup, so that a failed message can be consumed again after a certain period of time.

  • The delay time corresponds to the delay level. The delay time increases with the number of failures. The last delay interval is 2 hours.

  • Of course, you can also specify the latency level for sending messages, so that you can actively set the latency consumption, which is useful in some specific scenarios.

Data read and write speed
  • Kafka has a unique directory for each partition, and each partition has its own. Log data file, which is equivalent to multiple log files for a topic.

  • RocketMQ is a commitlog file shared per topic,

Kafka’s topics tend to have multiple partitions, so Kafka can write data an order of magnitude faster than RocketMQ.

However, if the number of Kafka partitions exceeds a certain number of files, the sequential write becomes random and the performance deteriorates dramatically. Therefore, the number of Kafka partitions is limited.

Contrast random and sequential reads and writes

  • Continuous/random I/O (in the lower disk dimension)

    • Continuous I/O: Indicates that the address of the initial sector of the current I/O and the address of the end sector of the last I/O are completely consecutive or not far apart. Otherwise, if the difference is large, it counts as a random I/O.
  • Random I/ OS may occur because disk space is discontinuous due to disk fragmentation or the current block space is smaller than the file size.

Continuous I/O is more efficient than random I/O
  • Continuous I/O, the head hardly needs to change lanes, or change lanes for a short time;
  • Random I/ OS, if too many, can cause the head to change lanes constantly, resulting in a significant loss of efficiency.
Random and sequential speed comparison

IOPS and throughput: Why focus randomly on IOPS and sequentially on throughput?

  • The random ADDRESSING time and rotation delay of each I/O operation cannot be ignored, which limits the IOPS.

  • Sequential reading and writing can ignore addressing time and rotation delay, and mainly spend on data transmission time.

IOPS When you measure the performance of an I/O system, you need to specify the read/write mode and the size of a single I/O. The read/write mode is affected by rotation time and seek time, while the single I/O is affected by data transmission time.

Service governance
  • Kafka uses Zookeeper for service discovery and governance. Brokers and consumers register their information with Zookeeper and subscribe to ZNodes so that they can detect when a broker or consumer is down and adjust accordingly.

  • RocketMQ uses a custom nameServer for service discovery and governance. For example, if the broker is down, producers and consumers will not know it in real time and will have to wait until the next update of the broker cluster (up to 30 seconds) to adjust accordingly. The service has a window of unavailability, but data is not lost and consistency is guaranteed.

    • However, when one consumer is down, the broker will report back to other consumers in real time, triggering load balancing immediately, which can guarantee the real-time consumption of messages to some extent.

Producer differences

Way to send
  • Kafka uses asynchronous sending by default. A memory buffer temporarily stores messages and simultaneously sends multiple messages into a single packet. This improves throughput, but affects the effectiveness of messages.

  • RocketMQ can choose to send synchronously or asynchronously.

Send a response

Kafka send Ack supports three Settings:

  • Memory buffer returns (0);

  • Wait until the leader receives the message to return (1)

  • Wait until both the leader and isR followers receive the message and return (-1)

As mentioned above, Kafka is asynchronous flush

RocketMQ waits for the response of the broker to be confirmed. There are synchronous flush, asynchronous flush, synchronous double-write, and asynchronous double-write strategies. Compared to Kafka, there is a synchronous flush.

Consumer differences

The message filter
  • RocketMQ queues correspond to Kafka partitions, but RocketMQ topics can be further segmented, allowing messages to be tagged and messages to be further filtered by subscribing to specific tags.
Order news
  • RocketMQ supports both global and local ordering

  • Kafka also supports ordered messages, but this is not guaranteed if a broker goes down.

Consumer to confirm

RocketMQ only supports manual confirmation, where ack+1 periodically synchronizes consumption progress to the broker after a message is consumed, or offset is attached to the next pull.

Kafka supports periodic confirmation, automatic confirmation and manual confirmation. Offset is stored in ZooKeeper.

Consumption parallelism

Kafka is single-threaded by default. A Consumer can subscribe to one or more partitions, and a Partition can only be consumed by one Consumer at a time.

If the number of partitions is 10, a maximum of 10 machines can consume in parallel (each machine can only open one thread), or one machine can consume in parallel (10 threads). That is, the parallelism of consumption is consistent with the number of partitions.

RocketMQ consumption parallelism is divided into two types: ordered consumption mode and concurrent consumption mode.

  • In ordered mode, there is only one thread consumption per consumer, and the parallelism is exactly the same as Kafka.

  • The news of the concurrent mode, every time I pull by consumeMessageBatchMaxSize (default 1) assigned to the consumer thread pool, after the consumer thread pool min = 20, Max = 64. That is, if the concurrency of each queue is between 20 and 64, multiply multiple queues in a topic. So RocketMQ has an order of magnitude more concurrency than Kafka.

The degree of parallelism depends on the number of Consumer threads. For example, if Topic is configured with 10 queues, 10 machines consuming, and each machine has 100 threads, then the degree of parallelism is 1000.

Transaction message

RocketMQ specifies a certain level of transaction messages. The current open source version has removed the transaction message lookup function, making the transaction mechanism slightly less reliable, but Ali Cloud’s RocketMQ supports reliable transaction messages; Kafka does not support distributed transaction messages.

What is the difference between Topic and Tag?

Whether services are associated

  • Non-directly related news: Taobao transaction news and JINGdong logistics news are distinguished by different topics.

  • Transaction messages, such as electrical orders, women’s orders, and cosmetics orders, can be distinguished by tags.

Whether the message priority is consistent: For the same logistics message, hema must be delivered within hours, Tmall supermarket within 24 hours, and Taobao logistics is relatively slower. Messages with different priorities are distinguished by different topics.

Whether the message magnitude is equal: Although some business messages are small in quantity but have high real-time requirements, if they use the same Topic with some messages of trillions of magnitude, they may starve to death due to long waiting time. In this case, messages of different magnitudes need to be split and different topics need to be used.

Tag and Topic selection

For message classification, you can choose to create multiple topics, or create multiple tags under the same Topic.

The messages between different topics are not necessarily related.

Tag is used to distinguish related messages under the same Topic, such as the relationship between the complete set and subset, and the relationship between the sequence of processes.

Rational use of Topic and Tag can make the business structure clear and improve efficiency.

How does Tag implement message filtering

Different from other MQ middleware, the RocketMQ distributed message queue filters messages when the Consumer subscribs to them.

RocketMQ does this because the Producer side writes messages and the Consumer side subscribersmessages are stored separately. The Consumer side subscribersmessages need to get an index from the ConsumeQueue, the logical queue that consumes the messages. The actual message entity content is then read from the CommitLog, so there’s no way around its storage structure after all.

The storage structure of ConsumeQueue: You can see that there are eight bytes of the Message Tag hash, and tag-based Message filtering is based on this field value.

Tag Filtering mode
  • Consumer end when the subscribe message in addition to the specified Topic can also specify the Tag, if there are multiple Tag a message, can use the | | space.

  • The Consumer constructs the subscription request as a SubscriptionData and sends a Pull message request to the Broker.

  • Before the Broker reads data from RocketMQ’s file storage layer, Store, it builds a MessageFilter with the data and passes it to the Store.

  • When a Store reads a record from ConsumeQueue, it uses the tag hash value of the message to filter it, since the server is only judging by hashCode.

The original tag string cannot be accurately filtered. Therefore, after the message consuming end pulls the message, it needs to compare the original tag string of the message. If the original tag string is different, the message is discarded and no message is consumed.

Message Body filtering

Upload Java code to the server to do any form of filtering on messages, even Message Body filtering

The ability to stack data messages

Theoretically Kafka is more stackable than RocketMQ, but RocketMQ can also support hundreds of millions of messages on a single machine, which we believe is sufficient for the business.

Message data backtracking

  • Kafka can theoretically trace messages back to Offset

  • RocketMQ support according to the time to back in the news, millisecond precision, such as from the day before a certain time points a seconds began to consume news, a typical business scenarios such as consumer do order analysis, but due to program logic or dependent system failure and other reasons, led to today’s consumer message is invalid, all need to start from zero yesterday consumption, Then the time-based message replay function is very helpful to the business.

The performance comparison

  • Kafka single-machine write TPS is about 1m/SEC and message size is 10 bytes

  • A RocketMQ single machine can write about 70,000 TPS single instances per second, with three brokers deployed on a single machine, running up to 120,000 TPS single instances per second, and message sizes of 10 bytes.

Data consistency and real-time

Real-time message delivery

  • Kafka uses short polling, and the real time depends on the polling interval

  • RocketMQ uses long polling, which is as real-time as Push, with delivery delays of messages typically within milliseconds.

Consumption failure retry

  • Kafka consumption failure does not support retry

  • RocketMQ consumption failure Periodic retry is supported. The retry interval is extended

Message order

  • Kafka supports message ordering, but when a Broker goes down, messages are out of order

  • RocketMQ supports strict message ordering. In a sequential message scenario, when a Broker goes down, messages fail to be sent, but not out of order

Mysql Binlog distribution requires a strict message order

(Aside) RocketMQ’s unique tag mechanism, which Kafka doesn’t have

Ordinary messages, transaction messages, timed (delayed) messages, sequential messages, and different message types use different topics and cannot be distinguished by Tag.

conclusion

  • RocketMQ is targeted at non-logging reliable message delivery (logging scenarios are OK). Currently, RocketMQ is widely used in Alibaba Group in order, transaction, recharge, stream computing, message push, log stream processing, binglog distribution and other scenarios.

  • RocketMQ’s synchronous flush is more reliable on a single machine than Kafka’s and does not cause data loss due to OS crashes.

  • Synchronous Replication is also more reliable than Kafka asynchronous Replication with no single point of data at all.

  • In addition, Kafka Replication is based on topic, and can be switched over automatically when the host is down. However, there is a problem here. Because it is asynchronous Replication, data will be lost after the switchover, and if the Leader restarts, data conflicts will occur with the Leader.

  • For example, the recharge application calls the operator gateway at the current moment, but the recharge fails. It may be because the other party is under too much pressure, and the call will be successful later. Retries here require reliable retries, meaning that failed retries are not lost due to a Consumer outage.