Author: Lin Qingshan (alias: Longji)

Operating systems, databases, and middleware form the troika of foundational software, and the message queue is one of the most classic kinds of middleware, with a history of more than 30 years. Its development went through the following stages:

The first stage, before 2000. In the 1980s, the first message queue was The Information Bus, which introduced the publish-subscribe model for the first time to solve communication between software systems. The 1990s were the era of the international commercial-software giants: IBM, Oracle, and Microsoft each launched their own MQ, the most representative being IBM MQ. These products were expensive and aimed at high-end enterprises such as large financial and telecom companies; they were generally delivered as integrated hardware-software appliances on high-end hardware, and the MQ itself had a single-machine architecture.

The second stage, 2000–2007. In the 2000s the first generation of open-source message queues emerged, along with two standards, JMS and AMQP, and their corresponding implementations, ActiveMQ and RabbitMQ. Open source greatly popularized message queues and lowered the barrier to adoption, and they gradually became a standard part of enterprise architecture. Compared with today's systems, these MQs mainly served traditional enterprise applications and low-traffic scenarios, with weak horizontal-scaling capability.

The third stage, 2007–2018. The PC Internet and mobile Internet grew explosively. Because traditional message queues could not carry the access traffic and massive data of hundreds of millions of users, Internet-scale message middleware was born. Its core trait is a fully distributed architecture with strong horizontal scalability. Typical open-source representatives are Kafka, RocketMQ, and Taobao's Notify. The birth of Kafka also extended message middleware from the Messaging domain into the Streaming domain: from asynchronous decoupling of distributed applications to streaming storage and stream computing for big data.

The fourth stage, 2014 to the present. IoT, cloud computing, and cloud native are leading new technology trends. In IoT scenarios, message queues have extended from communication between cloud-side server applications out to edge data centers and IoT terminal devices, and support for MQTT and other IoT standard protocols has become standard in every major message queue.

With the spread of cloud computing, the concept of cloud native has taken hold, and its representative technologies keep emerging: containers, microservices, Serverless, Service Mesh, event-driven architecture, and more. The core question of cloud native is how to redesign applications so as to fully unlock the technical dividends of cloud computing and take the shortest path to business success.

Message queues themselves are one of the PaaS services of cloud computing, and a key evolution in helping businesses build further "decoupled", modern applications is Eventing: elevating messages to "events" and providing standard CloudEvents-based orchestration, filtering, publishing, and subscription capabilities to build a broader scope of decoupling, spanning cloud-service events and business applications, cross-organization SaaS business events, legacy applications and modern applications, and so on. At the same time, event-driven architecture is the natural paradigm of Serverless function computing in the cloud and a catalyst for applications' Serverless evolution.

For message middleware, cloud native also means the evolution of the message queue's own architecture: how to fully exploit the elastic compute, storage, and networking of the cloud to achieve stronger technical metrics and Serverless elasticity.

What are the technological advances and breakthroughs in message-oriented middleware?

Alibaba Cloud MQ is a one-stop messaging service built on RocketMQ. RocketMQ serves as a unified kernel that implements industry-standard and mainstream messaging protocols, including MQTT, Kafka, RabbitMQ/AMQP, CloudEvents, and HTTP, to meet customer demands across diverse scenarios. To improve ease of use, we have productized the different protocols as independent services (such as Alibaba Cloud RabbitMQ and Alibaba Cloud Kafka) that are out-of-the-box and O&M-free, with a complete observability system, helping open-source users migrate to the cloud seamlessly.

After continuous refinement across tens of thousands of diverse enterprise-customer scenarios, and several years of super-large-scale cloud production practice, our kernel RocketMQ has gradually evolved toward an integrated architecture and a cloud-native architecture.

1. Integrated architecture

As microservices, big data and real-time computing, IoT, and event-driven technologies surged, the boundaries of the messaging business kept expanding, and the industry produced different message queues for different scenarios: RabbitMQ serves microservice scenarios, Kafka focuses on big-data and event-streaming scenarios, and EMQ targets the IoT vertical. However, as digital transformation deepens, customers' businesses often span overlapping scenarios at the same time, such as real-time computation over messages from IoT devices or over business messages produced by microservice systems. Introducing multiple systems brings extra machine, operations, and learning costs.

In the past, splitting by scenario was often a compromise forced by technical implementation; today, "integration" is the real demand of users. RocketMQ 5.0 extends diversified indexes on top of a unified CommitLog, including time indexes, million-queue indexes, transaction indexes, KV indexes, batch indexes, logical queues, and more. It also supports RabbitMQ, Kafka, MQTT, and edge lightweight-computing product capabilities, truly realizing an integrated "message, event, stream" and "cloud-edge" architecture.
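The idea behind "diversified indexes over a unified CommitLog" can be illustrated with a toy sketch (not RocketMQ's actual code): every message is appended once to a single shared log, and each index, whether a per-queue consume index or a KV index, is just a set of lightweight pointers into that log.

```python
# Toy model of one append-only commit log shared by all topics, with two
# kinds of indexes layered on top. Class and method names are illustrative.
class CommitLogSketch:
    def __init__(self):
        self.log = []            # the single append-only commit log
        self.queue_index = {}    # (topic, queue_id) -> list of log offsets
        self.kv_index = {}       # message key -> log offset

    def append(self, topic, queue_id, key, payload):
        offset = len(self.log)
        self.log.append(payload)                  # data is written once
        self.queue_index.setdefault((topic, queue_id), []).append(offset)
        self.kv_index[key] = offset               # extra index, no extra copy
        return offset

    def read_queue(self, topic, queue_id, start=0):
        offsets = self.queue_index.get((topic, queue_id), [])
        return [self.log[o] for o in offsets[start:]]

    def read_by_key(self, key):
        return self.log[self.kv_index[key]]
```

Because indexes only store offsets, adding a new index type (time, transaction, batch, ...) does not duplicate message data, which is what makes the "one kernel, many product shapes" approach economical.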

2. Cloud native architecture

A cloud-native architecture is one that is native to the cloud. Cloud computing is the "source of power" of cloud native; talking about cloud native without cloud computing is an armchair exercise. Over the past few years, building on Alibaba Cloud's super-large-scale cloud production practice and helping tens of thousands of enterprises through digital transformation, RocketMQ has evolved from Internet message middleware into cloud-native message middleware. This is one of the biggest differences between RocketMQ and other message middleware: RocketMQ's cloud-native architecture has been proven in practice. Let's look at the key technology evolution of RocketMQ's cloud-native architecture.

RocketMQ was born in 2011 out of Taobao's core e-commerce system. It was originally positioned to serve the group's businesses, designed for a single super-large Internet enterprise. The original architecture could not serve cloud-computing scenarios well and had many pain points: a heavy SDK with complicated client logic, high cost of developing multi-language SDKs, and slow iteration of commercial features; poor elasticity, with storage and compute coupled, clients coupled to physical queues, and queue counts that could not scale to millions or tens of millions. Other mainstream open-source messaging projects had not completed a cloud-native transformation either: RabbitMQ's single-queue capability cannot scale horizontally, and Kafka's elastic scaling requires large-scale data copying and rebalancing, neither of which suits providing elastic service to large numbers of customers on the public cloud.

To this end, RocketMQ 5.0 was redesigned for cloud scenarios to solve these fundamental problems at the architectural level, with comprehensive upgrades to clients, brokers, and the storage engine:

Lightweight clients. The RocketMQ 5.0 SDK pushes much of the logic down to the server side, cutting client code size by two-thirds and significantly reducing the cost of developing and maintaining multi-language SDKs. A lightweight SDK is also easier to integrate with representative cloud-native technologies such as Service Mesh and Dapr.

Separable storage-compute architecture. Depending on the scenario, users can run storage and compute in the same process or deploy them separately. Separately deployed compute nodes can be stateless, a single access point can proxy all traffic, and kernel-bypass techniques on new cloud hardware can reduce the performance and latency penalty of separate deployment. The integrated storage-compute architecture has the advantages of data-local computation and higher performance; in cloud scenarios with multi-tenancy, complex multi-VPC networks, and multi-protocol access, the storage-compute separation mode avoids exposing back-end storage services directly to clients, and facilitates traffic control, isolation, scheduling, permission management, and protocol conversion.

Every choice has trade-offs. Separating storage and compute also brings problems: longer data paths, higher latency, higher machine cost, and operations that are not actually simplified, since beyond the stateful storage nodes, more stateless compute nodes must also be operated. In most simple send/receive scenarios, the data path is essentially log writes and log reads with no complex computing logic (trivial compared with a database); in such cases the integrated storage-compute architecture is preferable, being simple, high-performance, and low-latency. Especially in big-data transport scenarios, storage-compute integration can greatly reduce machine and traffic costs, as the evolution of Kafka's architecture has shown. In short, do not separate storage and compute for its own sake; return to the essence of the customer's business scenario.

Elastic storage engine. For massive IoT devices and large numbers of small customers on the cloud, we introduced an LSM-based KV index to support massive queues on a single machine, so the queue count can scale without limit. To further unlock the capacity of cloud storage, we implemented tiered storage: message retention grows from 3 days to months or years, storage space can expand without limit, and by separating hot and cold data, cold-data storage cost is reduced by 80%.
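The tiered-storage idea can be sketched as a two-level read path: recent offsets are served from local hot storage, while older offsets fall through to a cheap cold tier such as object storage. This is a hypothetical illustration; the class, the eviction policy, and the capacity parameter are all made up for the example.

```python
# Hypothetical two-tier store: hot tier holds the newest `hot_capacity`
# messages; older messages are evicted to a cheap cold tier.
class TieredStore:
    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = {}        # offset -> message (fast local disk)
        self.cold = {}       # offset -> message (cheap remote storage)
        self.next_offset = 0

    def append(self, msg):
        offset = self.next_offset
        self.hot[offset] = msg
        self.next_offset += 1
        # Move the oldest hot entries to the cold tier once over capacity.
        while len(self.hot) > self.hot_capacity:
            oldest = min(self.hot)
            self.cold[oldest] = self.hot.pop(oldest)
        return offset

    def read(self, offset):
        if offset in self.hot:
            return ("hot", self.hot[offset])
        return ("cold", self.cold[offset])
```

Because readers address messages by offset regardless of tier, retention can grow from days to years by enlarging only the cheap cold tier, without changing the client-visible model.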

Serverless. In the old architecture, clients were aware of physical queues, which were bound to fixed storage nodes and strongly stateful. Scaling of brokers, clients, and physical queues was mutually coupled, and load balancing worked at queue granularity, which is unfriendly to Serverless evolution. RocketMQ 5.0 further decouples logical from physical resources to achieve extreme Serverless elasticity.

In the Messaging (unordered-message) scenario, the client specifies a topic for sending and receiving unordered messages. The new architecture hides the queue concept from the client and exposes only the logical resource, the topic. Load-balancing granularity moves from the queue level to the message level, making clients stateless and elastically decoupling clients from servers.
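Message-level balancing can be sketched as a pop-style protocol: any stateless consumer asks the broker for the next available message, and the broker tracks an invisibility deadline and redelivers on timeout, so no consumer owns a queue. The class and method names below are illustrative, not RocketMQ's actual API.

```python
import time

# Illustrative broker that balances at message granularity: any consumer
# may pop the next free message; unacked messages become visible again
# after an invisibility window.
class PopBroker:
    def __init__(self, invisible_seconds=30):
        self.invisible_seconds = invisible_seconds
        self.messages = {}   # msg_id -> payload (undelivered or unacked)
        self.inflight = {}   # msg_id -> redelivery deadline
        self.next_id = 0

    def send(self, payload):
        self.messages[self.next_id] = payload
        self.next_id += 1

    def pop(self, now=None):
        now = time.time() if now is None else now
        for msg_id, payload in sorted(self.messages.items()):
            deadline = self.inflight.get(msg_id)
            if deadline is None or deadline <= now:   # free, or timed out
                self.inflight[msg_id] = now + self.invisible_seconds
                return msg_id, payload
        return None

    def ack(self, msg_id):
        self.messages.pop(msg_id, None)
        self.inflight.pop(msg_id, None)
```

Because the broker, not the client, decides which message goes where, consumers can be added or removed at any moment without a rebalance step, which is the property that makes the client side Serverless-friendly.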

In the Streaming (ordered-message) scenario, the client must specify a queue (also called a partition) under a topic for ordered sending and receiving. The new architecture hides physical queues from clients and introduces the concept of logical queues. A logical queue is sharded horizontally and vertically across different physical storage nodes. Horizontal sharding solves high availability: multiple shards of the same logical queue can be written in any order, with ordering preserved by the happens-before principle. Vertical segmentation solves logical-queue scale-out: through multi-level queue mapping, capacity can be expanded in seconds with zero data migration, decoupling the elastic scaling of logical resources from physical resources.
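The "vertical segmentation, zero data migration" idea can be sketched as follows: a logical queue is an ordered list of segments, each living on some physical node, and scale-out simply seals the current segment and opens a new one on another node, copying nothing. This is a conceptual sketch under that assumption, not RocketMQ's implementation.

```python
# Illustrative logical queue made of sealed segments on physical nodes.
class LogicalQueue:
    def __init__(self, first_node):
        # Each segment: [node, start_offset, messages]
        self.segments = [[first_node, 0, []]]

    def append(self, msg):
        node, start, msgs = self.segments[-1]
        msgs.append(msg)
        return start + len(msgs) - 1          # contiguous logical offset

    def scale_out(self, new_node):
        node, start, msgs = self.segments[-1]
        # Seal the current segment and open a new one on the new node;
        # existing data stays where it is (zero migration).
        self.segments.append([new_node, start + len(msgs), []])

    def read(self, offset):
        for node, start, msgs in reversed(self.segments):
            if offset >= start:
                return node, msgs[offset - start]
        raise IndexError(offset)
```

Readers still see one contiguous offset space, so the mapping from logical offsets to physical segments is the only thing that changes during scale-out, which is why it can complete in seconds.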

How do you see the players in the messaging ecosystem?

Driven by the trends of cloud, IoT, and big data, messaging has become a hard requirement of modern application architecture, with a wide range of uses: asynchronous decoupling of microservices, event-driven architecture, upstream and downstream data for IoT devices, streaming storage for big data, and lightweight stream computing. Customer demand is strong and the market is active, attracting many vendors into the competition.

On the positive side, full competition among vendors will further spur innovation, cultivate more users, and grow the messaging market together, and users appear to have more choices.

On the negative side, message queues that lose the competition will stagnate and eventually be retired, and migrating the applications built on them will carry significant rework and stability risk. We therefore suggest that users, while meeting their business needs, choose standard interfaces and protocols wherever possible, or directly use the de facto standard message queues.

What are the future trends of messaging middleware?

With the continued development of IoT and 5G networks, data volume is growing at about 28% per year, and by 2025 the number of connected IoT devices is expected to reach 40 billion, ushering in the era of the Internet of Everything. In that era, message storage and computation will explode, and messaging systems will face huge cost pressure. In the future, messaging systems will need to mine the dividends of new hardware, such as persistent memory and DPUs, through deep hardware-software co-optimization to further reduce the cost of message storage and computation.

Another important trend of the IoT era is edge computing. Gartner estimates that by 2025, 75% of data will be processed outside traditional data centers or cloud environments, so messaging systems will need to become even lighter and consume fewer resources to fit edge environments. This also means the integrated architecture of message middleware needs a good plug-in design, capable of multi-modal output according to the characteristics of each scenario. On the public cloud, it can integrate deeply with cloud infrastructure: making full use of cloud disks and object storage to strengthen storage, and integrating with services such as log services and application monitoring to improve observability. In the edge-computing form, it outputs the core storage and lightweight-computing capabilities at minimal resource cost, staying simple enough.

In recent years, cloud computing has developed rapidly, benefiting from the digital transformation of a large number of enterprises around the world, which strengthen their competitiveness by moving business online, datafying their business, and making their data intelligent. This data transformation is accompanied by a transformation of business thinking: more and more enterprises adopt the "event-driven" model to build their business logic and digital systems.

Gartner predicts that over 60% of new digital-business solutions will adopt the "event-driven" model. From a business perspective, the event-driven model helps enterprises respond to customers in real time, capture more business opportunities, and create incremental value. From a technical perspective, event-driven architecture can link heterogeneous systems across organizations and environments in a dynamic, flexible, decoupled way, making it a natural fit for building large cross-organization digital business ecosystems.

In response to this trend, Messaging is evolving toward Eventing, giving rise to the EventBridge (EventBroker) product form. In EventBridge, the "event" becomes a first-class citizen, and event publishers and subscribers are not coupled to any specific message-queue SDK or implementation. EventBroker builds a more generalized publish-subscribe pattern around the standard CloudEvents specification, linking heterogeneous event sources and event-processing targets across organizations and environments.
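A concrete feel for this decoupling: below is a minimal CloudEvents-style event plus a small filter in the spirit of event-bus routing rules. The four attributes `specversion`, `id`, `source`, and `type` are the required context attributes of the CloudEvents specification; the `matches` function and the rule shape (attribute name mapped to a list of acceptable values) are an illustrative sketch, not any product's actual API.

```python
# A minimal CloudEvents 1.0 event as a plain dict.
event = {
    "specversion": "1.0",                      # required CloudEvents attributes
    "id": "order-20240101-0001",
    "source": "/ecommerce/order-service",
    "type": "com.example.order.created",
    "datacontenttype": "application/json",     # optional attributes below
    "data": {"orderId": "0001", "amount": 99.0},
}

def matches(rule, evt):
    # A rule maps context-attribute names to lists of acceptable values;
    # the event matches if every listed attribute has an allowed value.
    return all(evt.get(attr) in allowed for attr, allowed in rule.items())

rule = {
    "source": ["/ecommerce/order-service"],
    "type": ["com.example.order.created", "com.example.order.paid"],
}
```

Because routing decisions inspect only standard context attributes, a subscriber never needs to know which SDK or broker produced the event, which is exactly the decoupling the text describes.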

EventBridge is still in the early days of the event-driven digital business ecosystem. In the future, EventBridge will build more powerful capabilities around the event abstraction, such as full-link event observability, event analysis and computation, and low-code development, to help enterprises fully realize event-driven architecture in the cloud era.

About the author:

Lin Qingshan (alias: Longji), senior technical expert at Alibaba Cloud and head of the Alibaba Cloud messaging product line. An expert in the messaging field, he is dedicated to research and exploration of messaging, real-time computing, and event-driven architecture, and drives the evolution of RocketMQ's cloud-native and hyper-converged architecture.