Open-source messaging engine systems

You’ve probably heard the terms “message queue” and “message-oriented middleware.” To be honest, though, I prefer the term messaging engine system: “message queue” carries a rather misleading implication, as if Kafka were built on queues, while “message-oriented middleware” overemphasizes the word “middleware” and leaves it unclear what that middleware actually does.

Systems like Kafka are known abroad as messaging systems, which much of the domestic literature translates literally as “message systems.” Personally, I don’t think that rendering is quite right: it puts all the emphasis on the message itself while ignoring the message-passing capability such systems pride themselves on. Like an engine, this kind of system has the ability to transfer a certain kind of energy. That is why I find “message engine” the more fitting translation.

What does this kind of system do? I’ll start with the official, serious version of the answer. According to Wikipedia, a messaging engine system is a set of specifications. Enterprises use this set of specifications to deliver semantically accurate messages between different systems, enabling loosely-coupled asynchronous data transfer. That’s the official definition. If that’s hard to follow, try my folk version below:

System A sends a message to the message engine, and system B reads from the message engine the message that system A sent.

The most basic messaging engine does just that! Regardless of which version, they all point to two important facts:

  • What a messaging engine transmits is messages;
  • How those messages are transmitted is part of the messaging engine’s design.

Since messaging engines are used to transfer messages between different systems, designing the format of the messages to be transferred has always been a priority. How can a message convey business semantics without ambiguity while maximizing reusability and generality? Pause for a few seconds to think about how you would design your message encoding.

An easy option is to adopt an existing, mature format such as CSV, XML, or JSON; or perhaps you’re familiar with open-source serialization frameworks from foreign companies, such as Google’s Protocol Buffers or Facebook’s Thrift. These are all sound ideas. So now let me tell you Kafka’s choice: it uses a pure binary byte sequence. Messages are still structured, of course, but they are converted to binary byte sequences before being transmitted.
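To make “structured, but transmitted as binary” concrete, here is a toy length-prefixed encoding in Python. This is purely illustrative: it is not Kafka’s actual wire format, just a minimal sketch of how a structured key/value message can be flattened into a byte sequence and recovered from it.

```python
import struct

def encode_message(key: bytes, value: bytes) -> bytes:
    """Length-prefix the key and value, then pack them into one byte sequence."""
    return (struct.pack(">I", len(key)) + key +
            struct.pack(">I", len(value)) + value)

def decode_message(buf: bytes) -> tuple[bytes, bytes]:
    """Reverse of encode_message: read each length prefix, then slice out the payload."""
    key_len = struct.unpack_from(">I", buf, 0)[0]
    key = buf[4:4 + key_len]
    val_off = 4 + key_len
    val_len = struct.unpack_from(">I", buf, val_off)[0]
    value = buf[val_off + 4:val_off + 4 + val_len]
    return key, value

raw = encode_message(b"order-42", b'{"amount": 99}')
assert isinstance(raw, bytes)  # the engine only ever sees opaque bytes
assert decode_message(raw) == (b"order-42", b'{"amount": 99}')
```

The point of the sketch is the shape of the idea: senders and receivers agree on the structure, but the engine in the middle only ever handles opaque bytes, which keeps it general and fast.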

Settling on a message format is not enough, though: the messaging engine must also specify its transport model, that is, how messages get from sender to receiver. There are two common models:

  • Point-to-point model: Also known as the message queue model. Using the folk definition above, a message sent by system A can be received only by system B; no other system can read it. Telephone customer service is an everyday example: a given incoming call is handled by exactly one agent, and a second agent cannot serve the same customer on that call.
  • Publish/subscribe model: Unlike the above, this model has the concept of a topic, which you can think of as a container for messages with similar logical semantics. It also has senders and receivers, but with different names: the sender is called a publisher and the receiver a subscriber. Multiple publishers may send messages to the same topic, and multiple subscribers may all receive messages from that topic. A daily newspaper subscription is a typical publish/subscribe example.
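The two models above can be sketched with a few lines of in-memory Python. This is a conceptual toy, not how Kafka (or any real broker) implements them: it only shows the delivery semantics that distinguish the two.

```python
from collections import deque

class PointToPointQueue:
    """Point-to-point: each message is delivered to exactly one receiver."""
    def __init__(self):
        self._messages = deque()
    def send(self, msg):
        self._messages.append(msg)
    def receive(self):
        return self._messages.popleft() if self._messages else None

class Topic:
    """Publish/subscribe: every subscriber gets its own copy of each message."""
    def __init__(self):
        self._subscribers = []
    def subscribe(self):
        inbox = deque()
        self._subscribers.append(inbox)
        return inbox
    def publish(self, msg):
        for inbox in self._subscribers:
            inbox.append(msg)

# Point-to-point: the first receiver consumes the call, the second sees nothing.
q = PointToPointQueue()
q.send("call-1001")
assert q.receive() == "call-1001"
assert q.receive() is None

# Pub/sub: both subscribers receive the same newspaper issue.
news = Topic()
a, b = news.subscribe(), news.subscribe()
news.publish("issue-1")
assert list(a) == ["issue-1"] and list(b) == ["issue-1"]
```

The assertions capture the essential difference: in point-to-point a message is consumed once, while in publish/subscribe the same message fans out to every subscriber of the topic.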

What’s cool is that Kafka supports both messaging engine models. I’ll share how Kafka does this later.

When it comes to messaging engine systems, you might ask how JMS relates to them. JMS, the Java Message Service, also supports both of the models above. Strictly speaking it is not a transport protocol but a set of APIs. JMS is so well known, however, that many major messaging engine systems support the JMS specification, such as ActiveMQ, RabbitMQ, IBM’s WebSphere MQ, and Apache Kafka. Kafka, of course, does not follow the JMS specification in full; instead, it takes its own, rather unique path.

Ok, so far we have seen what a messaging engine system does and how it does it, but an important question remains: why use one at all? Going back to the example above, why can’t system A simply send messages directly to system B, instead of putting a message engine in between? The answer lies in four words: “peak shaving and valley filling,” a phrase arguably more famous than messaging engines themselves.

I have read a lot of the literature, and this phrase comes up more than any other. “Peak shaving and valley filling” means buffering instantaneous bursts of traffic between upstream and downstream systems to smooth them out. This matters especially when the upstream system can send far faster than the downstream can absorb: without the protection of a message engine, a “fragile” downstream system may be overwhelmed outright, triggering a full-link service “avalanche.” With a message engine in place, the upstream traffic surge is absorbed effectively: the upstream “peak” fills the “valley,” avoiding the traffic shock. Another major benefit of message engine systems is loose coupling between sender and receiver, which simplifies application development and reduces unnecessary interaction between systems.

Having said all this, you may still not have an intuitive feel for “peak shaving,” so let me give an example of how Kafka shaves peak traffic. Think back to how you purchased this course on Geek Time. If I remember correctly, Geek Time shows a subscribe button for each course, and clicking it takes you to the payment page. This seemingly simple flow may involve multiple sub-services: clicking the subscribe button calls the order system to generate an order, and processing that order in turn calls several downstream sub-services, such as the Alipay and WeChat Pay interfaces, a lookup of your login information, and validation of the course details. Clearly, the upstream order operation is relatively simple, so its TPS is much higher than that of the downstream order-processing services. If the upstream and downstream systems were connected directly, the downstream services would inevitably fail to keep up, and orders would pile up. During a flash sale (“seckill”) in particular, upstream order traffic spikes instantly, and the result could be downstream subsystems crushed outright.

A common solution is to rate-limit the upstream system, but that is clearly unfair to the upstream: after all, the problem does not lie there. So the more common approach is to introduce a messaging engine system like Kafka to absorb the TPS mismatch between upstream and downstream and to smooth out instantaneous traffic peaks.

Now look at the same flow once Kafka is introduced. The upstream order service no longer interacts directly with the downstream services. When a new order is generated, it simply sends an order message to a Kafka broker. The downstream sub-services, for their part, subscribe to the corresponding Kafka topic and fetch order messages in real time from their respective partitions of that topic, decoupling the upstream order service from the downstream order-processing services. When a flash sale occurs, Kafka stores the instantaneous surge of orders as messages in the topic without hurting the upstream service’s TPS, while giving the downstream services enough time to consume them. This is where messaging engines like Kafka deliver the most value.
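The buffering effect can be simulated with a toy stand-in for a topic. Again a sketch only: a plain in-memory deque replaces the Kafka broker, and the numbers (a 1000-order burst, a consumer draining 50 at a time) are made up for illustration.

```python
from collections import deque

# Toy stand-in for a Kafka topic: an unbounded buffer between the
# fast order service and a slower downstream processor.
topic = deque()

def order_service(n_orders):
    # Upstream: a burst of orders is appended instantly, never blocked.
    for i in range(n_orders):
        topic.append(f"order-{i}")

def order_processor(batch_size):
    # Downstream: drains at its own pace, one small batch per cycle.
    processed = []
    while topic and len(processed) < batch_size:
        processed.append(topic.popleft())
    return processed

order_service(1000)          # flash-sale burst: 1000 orders arrive at once
assert len(topic) == 1000    # the buffer absorbs the whole spike
first = order_processor(50)  # downstream handles only 50 per cycle
assert first[0] == "order-0" and len(topic) == 950
```

The upstream’s send rate is untouched by the downstream’s slowness: the spike sits in the buffer (the “peak” filling the “valley”) until the processor works through it.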

Don’t worry if you’re not yet familiar with Kafka brokers, topics, and partitions; I’ll devote time later in this column to introducing common Kafka concepts and terms.

Digression

RocketMQ claims to specialize in financial business scenarios, and I personally believe it. Kafka’s roots are more in big data.

Kafka started out as a messaging engine and has since evolved into a stream processing platform. No offense, but I don’t consider a messaging engine itself a form of stream processing. Stream processing is concerned with processing unbounded data sets; they are different fields 🙂

The difference between MQ and RPC is mostly a matter of dataflow mode. There are three common ways data flows between systems: 1. through databases; 2. through service invocation (REST/RPC); 3. through asynchronous messaging (message engines such as Kafka). RPC resembles MQ in that a remote service call can also be viewed as an event, but the differences are:

  1. MQ has its own buffer, which copes with the receiver being overloaded or unavailable
  2. MQ supports retries
  3. MQ supports publish/subscribe

There are other differences, of course. You could say that as a dataflow pattern, RPC sits somewhere between the database and MQ approaches.
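The first two differences in the list above can be shown with a small contrast in Python. This sketch is hypothetical: `FlakyService`, `rpc_call`, and `mq_deliver` are invented names standing in for a temporarily failing downstream, a direct call, and a broker that buffers and retries.

```python
class FlakyService:
    """A downstream service that fails its first few calls, then recovers."""
    def __init__(self, failures):
        self.failures = failures
        self.handled = []
    def handle(self, msg):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("service unavailable")
        self.handled.append(msg)

def rpc_call(service, msg):
    # Direct invocation: if the callee is down, the caller sees the error.
    service.handle(msg)

def mq_deliver(service, queue, max_retries=5):
    # A broker buffers each message and retries a few times before moving on.
    while queue:
        msg = queue[0]
        for _ in range(max_retries):
            try:
                service.handle(msg)
                break
            except ConnectionError:
                continue
        queue.pop(0)

svc = FlakyService(failures=2)
try:
    rpc_call(svc, "event-1")
    rpc_ok = True
except ConnectionError:
    rpc_ok = False
assert not rpc_ok  # the RPC caller must cope with the failure itself

svc2 = FlakyService(failures=2)
mq_deliver(svc2, ["event-1"])
assert svc2.handled == ["event-1"]  # the queue's buffer and retries absorbed the failures
```

With the direct call, the failure lands on the caller; with the queued delivery, the buffer plus retry policy hides the transient outage from the sender entirely.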
