There is a dizzying number of messaging middleware options out there: RabbitMQ, Kafka, RocketMQ, Pulsar, Redis, and so on. What are their similarities and differences, and which one should you choose? In this article, we take a closer look at the architecture and principles of Redis, Kafka, and Pulsar to help you understand and learn more about message queues. Author: Liu Deen, research and development engineer at Tencent IEG.

1. The most basic queue

The most basic message queue is really just a double-ended queue, which can be implemented with a doubly linked list, as shown in the figure below:

push_front: add an element at the head of the queue; pop_tail: take an element from the tail of the queue.

With such a data structure, we can build an in-memory message queue, where some tasks continuously add messages to the queue while other tasks continuously take messages off the tail in order. The tasks that add messages are called producers, and the tasks that take out and use messages are called consumers.
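As a minimal sketch of this abstraction (pure Python, standard library only; push_front and pop_tail mirror the operations above and are names chosen for illustration):

```python
# A toy in-memory message queue built on collections.deque,
# which behaves like a doubly linked list: producers append at
# the head, consumers pop from the tail, giving FIFO semantics.
from collections import deque

queue = deque()

def push_front(msg):               # producer side
    queue.appendleft(msg)

def pop_tail():                    # consumer side
    return queue.pop() if queue else None

push_front("order-1")
push_front("order-2")
print(pop_tail())                  # -> "order-1", first in, first out
```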

Implementing such an in-memory message queue is not difficult; in fact, it is easy. What takes real work is making it hold up under massive concurrent reads and writes.

2. The Redis queue

Redis provides exactly the data structure described above: the list. A Redis list supports:

LPUSH: insert data at the left end of the queue; RPOP: fetch data from the right end of the queue.

These correspond to the push_front and pop_tail of our queue abstraction, so we can use a Redis list directly as a message queue. Redis itself is well optimized for high concurrency, and its internal data structures are carefully designed and tuned. So in a sense, using a Redis list is a much better bet than implementing a new list yourself.
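As a hedged sketch of this usage with the redis-py client (it assumes a Redis instance on localhost and a made-up list name, task_queue):

```python
# Producer pushes with LPUSH, consumer pops with (blocking) BRPOP,
# so the list as a whole behaves as a FIFO queue.
import redis

r = redis.Redis(host="localhost", port=6379)

# Producer: push messages onto the left end of the list.
r.lpush("task_queue", "msg-1")
r.lpush("task_queue", "msg-2")

# Consumer: blocking pop from the right end of the list.
# BRPOP returns a (key, value) tuple, or None on timeout.
item = r.brpop("task_queue", timeout=5)
if item:
    _key, value = item
    print(value)                   # b"msg-1"
```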

On the other hand, using a Redis list as a message queue has some disadvantages, such as:

  • Message persistence: Redis is an in-memory database. Although it has two persistence mechanisms, AOF and RDB, these are only auxiliary measures and neither is fully reliable. When the Redis server goes down, some data will be lost, which is unacceptable for many businesses.
  • Hot key performance issues: whether you use Codis or Twemproxy as the clustering solution, all reads and writes for a given queue end up on the same Redis instance, and this cannot be fixed by scaling out. If the concurrent reads and writes on one list are very high, you get an unresolvable hot key, which may bring down the whole system.
  • No acknowledgement mechanism: every time an RPOP consumes a message, that message is permanently removed from the list. If the consumer fails, the message cannot be recovered. You might say the consumer could push the message back onto the queue on failure, but that is wishful thinking: in the extreme case, what if the consumer process simply crashes, e.g. kill -9, panic, coredump…
  • No support for multiple subscribers: a message can be consumed by only one consumer and is gone after RPOP. If the queue stores application logs, then for the same message the monitoring system needs to consume it to raise possible alerts, the BI system needs to consume it to build reports, the tracing system needs to consume it to draw call relationships… the Redis list cannot support this scenario.
  • No support for re-consumption: once a message is RPOPed, it is gone. If a consumer program runs halfway and then a bug is found in the code, there is no way to fix the bug and consume the messages again.

Of the shortcomings above, only the first (persistence) has a reasonably mature remedy so far. Many companies have teams that do secondary development on top of RocksDB or LevelDB to implement KV storage that speaks the Redis protocol. The storage engine is no longer Redis, but it behaves almost exactly like Redis. It keeps the data persistent, but it does little about the other drawbacks listed above.

Redis 5.0 introduced the stream data type, which is designed specifically as a message-queue data structure. It borrows a lot from Kafka's design, but quite a few problems remain… so why not head straight into the world of Kafka?

3. Kafka

As you can see from the above, true messaging middleware is more than just a queue. Especially when it carries a large amount of business traffic, there are very demanding requirements on its functional completeness, throughput, stability, and scalability. Kafka is a system designed from the ground up to be messaging middleware.

The Redis list has many shortcomings, but if you think about it, they can be summed up in two points:

The hot key problem cannot be solved, that is, the performance problem cannot be fixed by adding machines; data gets deleted: after RPOP a message is gone, so you cannot support multiple subscribers, re-consumption, or an ACK mechanism.

Solving these two issues is at the heart of Kafka's design.

The root cause of the hot key problem is that the data is concentrated on one instance, so the natural fix is to spread it across multiple machines. To this end, Kafka proposes the concept of the partition. A queue (a list in Redis) corresponds to a topic in Kafka. Kafka divides a topic into multiple partitions, and each partition can live on a different machine, thus spreading the load of a single topic across multiple machines. A topic is therefore more of a logical concept in Kafka; the actual storage units are the partitions.

A Redis list could do the same, but it requires extra logic in the business code. For example, you could create n lists, key1, key2… The producer pushes to a different key each time, and consumers RPOP data from lists key1 through keyn concurrently. This recreates the effect of Kafka's multiple partitions. So, as you can see, a partition is a very plain and simple idea for spreading requests across multiple machines.
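A rough sketch of this do-it-yourself partitioning on top of Redis lists (again using redis-py; the list names task_queue_{i} and the CRC32-based routing are illustrative assumptions, not anything Redis provides):

```python
# The producer hashes a message key to pick one of n lists; each
# consumer instance drains its own subset of lists. This imitates
# the effect of Kafka partitions in business code.
import zlib
import redis

r = redis.Redis()
NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Stable hash so the same key always maps to the same list.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def produce(key: str, msg: str) -> None:
    r.lpush(f"task_queue_{partition_for(key)}", msg)

def consume(partition: int):
    item = r.brpop(f"task_queue_{partition}", timeout=1)
    return item[1] if item else None

produce("user-42", "order created")
print(consume(partition_for("user-42")))
```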

Another big problem with the Redis list is that RPOP deletes data. Kafka's answer is simple: just don't delete it. Kafka solves this with cursors.

Unlike a Redis list, Kafka's topic (really each partition) stores data as a one-way, append-only queue, with new data always appended to the tail. It also maintains a cursor that starts at the beginning and always points to the index of the next message to consume; each time a message is consumed, the cursor advances by one. This way Kafka achieves the same first-in, first-out semantics as a Redis list, but it only updates the cursor and never deletes data.
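A conceptual sketch of the cursor idea (plain Python; the group names are made up):

```python
# The partition log is append-only and never shrinks when consumed;
# each consumer group just keeps its own cursor (offset) into it.
log = []                                 # one partition's message log
cursors = {"monitor": 0, "bi": 0}        # one cursor per consumer group

def produce(msg):
    log.append(msg)                      # always append at the tail

def consume(group):
    pos = cursors[group]
    if pos >= len(log):
        return None                      # nothing new for this group
    msg = log[pos]
    cursors[group] = pos + 1             # advance only this group's cursor
    return msg                           # the data itself is never deleted

produce("a"); produce("b")
print(consume("monitor"))                # "a"
print(consume("bi"))                     # "a": independent cursor, same data
```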

This design has many benefits, especially in terms of performance: sequential writes have always been the way to squeeze the most bandwidth out of a disk. But here we will focus on the cursor, because this design also brings functional advantages.

First, it supports an ACK mechanism for messages. Since messages are not deleted, the cursor is only advanced after the consumer explicitly tells Kafka that the message was consumed successfully. As long as Kafka persists the cursor's position, even if a consumer process crashes mid-consumption, it can resume consuming when it recovers.

Second, it can support group consumption:

Here we need to introduce the concept of the consumer group, which is very simple: a consumer group is essentially a set of cursors. For the same topic, different consumer groups have their own cursors: the monitoring group's cursor points to the 2nd message, the BI group's to the 4th, the tracing group's to the 10000th… Each group's cursor is isolated from the others and they do not affect each other.

By introducing consumer groups, it becomes very easy for multiple businesses to consume the same topic simultaneously, the so-called 1-to-N "broadcast", where one message is broadcast to N subscribers.

Finally, re-consumption is also easy with cursors. Because the cursor merely records how far consumption has progressed, re-consuming is simply a matter of changing the cursor's value. You can reset the cursor to any position you want, for example reset it to 0 to start consuming from the beginning again, or reset it to the very end, which skips all existing data.

As you can see, Kafka's data structure is a huge leap forward compared to Redis' doubly linked list, in terms of both performance and functionality.

Consider a simple architecture diagram for Kafka:

From the diagram, we can see that topic is a logical concept, not an entity. A topic contains multiple partitions that are distributed across multiple machines. This machine, in Kafka, is called a broker. One broker in the Kafka cluster corresponds to one instance in the Redis cluster. For a topic, there can be multiple different consumer groups consuming at the same time. Multiple instances of consumers within a consumer group can consume at the same time, which increases the rate of consumption.

However, it is important to note that a partition can only be consumed by one consumer instance in the consumer group. In other words, if there are multiple consumers in a consumer group, two consumers cannot consume a partition at the same time.

Why is that? Kafka provides ordered-consumption semantics at the partition level. If multiple consumers consumed one partition, then even though Kafka hands out the data in order, network latency and other factors mean there is no guarantee the consumers would receive it in that same order, so ordered delivery could not be guaranteed. Therefore, a partition can only be consumed by one consumer. The cursors of Kafka's consumer groups can be pictured as a data structure like this:

{
  "topic-foo": {
    "groupA": {
      "partition-0": 0,
      "partition-1": 123,
      "partition-2": 78
    },
    "groupB": {
      "partition-0": 85,
      "partition-1": 9991,
      "partition-2": 772
    }
  }
}

Looking at Kafka's macro architecture, you may be wondering: if consumption only moves the cursor and never deletes data, won't the data eventually fill up the disk? This is where Kafka's retention mechanism comes in; it is similar to expire in Redis.

The difference is that Redis expires data by key: if you set a 1-minute TTL on a Redis list, after 1 minute Redis deletes the entire list. Kafka's expiration is per message: it does not delete the whole topic (partition), only the expired messages within a partition. The good news is that a Kafka partition is a one-way queue in which messages are produced in order, so whenever messages expire they can simply be deleted from the head.

This may sound simple, but if you think about it there are plenty of problems. Suppose all the messages of topica-partition-0 are written to a single file, say topica-partition-0.log. To simplify, assume every message the producer produces is one line in topica-partition-0.log; the file will soon grow to 200 GB. Now you are told that the first x lines of this file have expired; how do you delete them? It is very hard, just like being asked to delete the first n elements of an array: you have to shift the remaining elements forward, which costs a huge amount of copying. Even for a 10 MB file such a deletion is expensive, let alone 200 GB.

Therefore, when actually storing a partition, Kafka splits it further. The data of topica-partition-0 is not written to a single file but to multiple segment files. If the segment file size is set to 100 MB, a new segment file is created whenever the current one reaches 100 MB, and subsequent messages are written to the newly created segment file, much like log-file rotation in our business systems. The advantage is that when all the messages in a segment have expired, the whole file can be deleted easily; and since messages in a segment are ordered, you only need to check whether the last one has expired.

  1. Data lookup in Kafka

A partition for a topic is a logical array of segments, as shown in the figure below:

So the question is, if I reset the cursor to an arbitrary location, say message 2897, how do I read the data?

Based on the above file structure, you can see that we need to be sure of two things before we can read the corresponding data:

Which segment file message 2897 is in;

Where message 2897 is in the segment file.

To solve these two problems, Kafka uses a very clever design: each segment file is named after the offset of the first message it contains. The first segment file is 0.log; when it is full and the next message's offset is, say, 18234, a new segment file named 18234.log is created, and so on:

  • /kafka/topic/order_create/partition-0
    • 0.log
    • 18234.log #segment file
    • 39712.log
    • 54101.log

When we want to find out which segment a given message offset x lives in, we only need a binary search over the file names. For example, message offset 2897 (the 2898th message) is obviously in segment file 0.log.
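A small sketch of that lookup using Python's bisect, with the segment base offsets taken from the listing above:

```python
# Segment files are named after the offset of their first message,
# so a binary search over the sorted base offsets finds the file.
import bisect

segment_base_offsets = [0, 18234, 39712, 54101]    # from the file names

def segment_for(offset: int) -> str:
    # Rightmost base offset that is <= the requested offset.
    idx = bisect.bisect_right(segment_base_offsets, offset) - 1
    return f"{segment_base_offsets[idx]}.log"

print(segment_for(2897))     # "0.log"
print(segment_for(39712))    # "39712.log"
```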

Once the segment file is located, the remaining problem is finding the message's position inside the file. Scanning from the beginning of the file would take an unacceptable amount of time. Kafka's solution is index files.

Kafka creates an index file for each segment file, much like MySQL's index files. Each record is a KV pair: the key is the message's offset, and the value is the message's position within the segment file:

| offset | position |
| ------ | -------- |
| 0 | 0 |
| 1 | 124 |
| 2 | 336 |

Each segment file corresponds to an index file:

  • /kafka/topic/order_create/partition-0
    • 0.log
    • 0.index
    • 18234.log #segment file
    • 18234.index #index file
    • 39712.log
    • 39712.index
    • 54101.log
    • 54101.index

With an index file, we can get the exact location of a message and read it directly. Let’s go through the process again:

To query the message at offset x:

use binary search over the segment file names to determine that the message is in y.log;

read the y.index file to find the position of message x within y.log;

read y.log at that position to get the message data.

With this file organization, any message in Kafka can be located and read very quickly. But it creates another problem: if the number of messages is huge, adding an index record for every message wastes a lot of space.

A quick calculation: if an index record is 16 bytes (8 for the offset + 8 for the position), 100 million messages take 16 * 10^8 bytes = 1.6 GB. For a slightly larger company, Kafka would collect far more than 100 million log messages a day, perhaps tens or hundreds of times that, so the index files would take up a great deal of storage. Kafka therefore chose sparse indexes as a trade-off.

A sparse index means that not every message gets a record in the index file; instead, an entry is written only every so many messages, for example one offset-position pair for every 10 messages:

| offset | position |
| ------ | -------- |
| 0 | 0 |
| 10 | 1852 |
| 20 | 4518 |
| 30 | 6006 |
| 40 | 8756 |
| 50 | 10844 |

With a sparse index, when we query for message offset x we may not find its exact position, but binary search quickly gives us the position of the nearest preceding indexed message, and we then read forward a little to reach the message we want.

For example, to find message offset 33, binary search over the table above lands on offset 30; we then go to the corresponding position in the log file and read forward from there, and the fourth message read is offset 33. This is a trade-off between performance and storage space, and many systems face a similar choice: trade time for space, or space for time.
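A sketch of that sparse-index lookup (again with bisect; the in-memory list of pairs stands in for the on-disk index file):

```python
# Find the nearest indexed offset at or below the target with binary
# search, seek to its file position, then scan forward the remaining
# few messages to reach the exact offset.
import bisect

# (offset, position-in-log-file) pairs, one entry every 10 messages.
sparse_index = [(0, 0), (10, 1852), (20, 4518), (30, 6006), (40, 8756), (50, 10844)]

def locate(target_offset: int):
    keys = [off for off, _pos in sparse_index]
    idx = bisect.bisect_right(keys, target_offset) - 1
    base_offset, file_pos = sparse_index[idx]
    # Seek to file_pos in the segment file, then read forward
    # (target_offset - base_offset) messages to reach the target.
    return base_offset, file_pos, target_offset - base_offset

print(locate(33))    # (30, 6006, 3): start at offset 30, skip 3 messages
```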

At this point we should have a fairly clear picture of Kafka's overall architecture. In the analysis above, however, I deliberately left out another very important aspect of Kafka: its high-availability design. It is relatively obscure and drags in a lot of distributed-systems complexity that would get in the way of understanding Kafka's basic model. The following sections focus on this topic.

  2. Kafka high availability

High availability (HA) is critical to an enterprise's core systems. As a service grows, cluster sizes increase, and in large clusters faults are the norm: hardware and networks are unstable. When some nodes cannot work properly for whatever reason, a system that can tolerate the failure and keep providing service normally is said to be highly available. For a stateful service, tolerating partial failure essentially means tolerating lost data (not necessarily permanently, but at least being unable to read it for a period of time).

The simplest, and essentially only, way for a system to tolerate data loss is to keep backups, copying the same data to multiple machines, so-called redundancy, or multiple replicas. To this end Kafka introduces the leader-follower concept. Each partition of a topic has a leader, and all reads and writes for that partition go through the broker hosting the leader. The partition's data is also replicated to other brokers, whose replicas of the partition are called followers:

When producing messages, the producer directly sends the messages to the partition leader. The partition leader writes the messages into its own log, and then waits for the followers to fetch the data for synchronization. The specific interaction is as follows:

The timing of the ack on the producer shown above is critical and directly related to the availability and reliability of the Kafka cluster.

If the ack is sent as soon as the producer's data reaches the leader and is successfully written to the leader's log:

Advantages: there is no need to wait for replication to finish, so it is fast, throughput is high, and availability is high. Disadvantages: if the leader fails before the data has been replicated, the data is lost, so reliability is low.

If the ack is sent only after all the followers have finished replicating the data: Advantage: when the leader fails, the followers also hold the complete data, so reliability is high. Disadvantages: waiting for all followers to synchronize is slow, performance suffers, the producer may time out, and availability is lower.

When exactly to ack is configurable in Kafka according to the actual application scenario.
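With the kafka-python client, for example, this trade-off is exposed through the producer's acks setting (a hedged sketch; the broker address and topic name are assumptions):

```python
# acks=0: fire and forget; acks=1: ack once the leader has written it;
# acks='all': ack only after all in-sync replicas have it (most reliable).
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    acks="all",
)
future = producer.send("order_create", b"order-123 created")
record_metadata = future.get(timeout=10)   # raises if the send failed
producer.flush()
```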

In reality Kafka's data-synchronization process is very complex. This article focuses on Kafka's core principles; replication involves many technical details, such as the high watermark (HW) and leader epoch, which we will not expand on here one by one. Finally, here is a full picture of Kafka:

Finally, a brief summary of Kafka. By introducing partitions, Kafka spreads a topic across multiple brokers to increase throughput; the cost is that there is no global ordering at the topic level, and scenarios that need it must use a single partition. Internally, each partition is stored as multiple segment files; new messages are appended to the newest segment log file, and a sparse index records each message's position in the log file for fast reads. When data expires, the expired segment files can simply be deleted. For high availability, each partition has multiple replicas, one leader and several followers, spread across different brokers. All reads and writes for the partition go through the broker hosting the leader; the followers just pull data from the leader periodically to stay in sync. When the leader dies, a follower that has kept up with the leader is elected as the new leader and continues to serve, which greatly improves availability. On the consumer side, Kafka introduces consumer groups: each group consumes a topic independently of the others, but a partition can only be consumed by a single consumer within a group. By recording cursors, consumer groups implement features such as the ACK mechanism and re-consumption. Apart from the actual messages, which are recorded in the segments, almost all metadata is stored globally in ZooKeeper.

  3. Advantages and disadvantages

(1) Advantages. Kafka has many advantages:

High performance: up to 1,000,000 TPS in single-machine tests;

Low latency: production and consumption latency is very low, and end-to-end latency is also very low in a healthy cluster;

High availability: guaranteed by replication + ISR + the election mechanism;

Mature tool chain: complete monitoring and operations/maintenance solutions;

Mature ecosystem: indispensable in big-data scenarios together with Kafka Streams.

(2) Shortcomings

Inelastic scaling: all reads and writes for a partition go to the broker hosting the partition leader. If that broker is overloaded, adding a new broker does not solve the problem.

High cost of expansion: new brokers in the cluster only handle new topics. To share the load of existing topic partitions, partitions have to be migrated manually, which consumes a large amount of cluster bandwidth.

Rebalancing: consumers joining or leaving triggers a rebalance of the whole consumer group, which hurts consumption speed and increases latency.

Too many partitions significantly degrade performance: ZooKeeper comes under pressure, and with too many partitions on a broker, sequential disk writes degrade into nearly random writes.

Now that you understand Kafka's architecture, you can think about why scaling Kafka is so costly. It is essentially the same problem as scaling a Redis cluster: when a Redis cluster has a hot key and one instance cannot keep up, adding machines does not help, because the hot key still lives on the old instance and the new instance cannot take any of its traffic. Kafka scales in two ways: adding machines (adding brokers) and adding partitions to a topic.

Adding partitions to a topic is similar to adding sub-tables to a sharded table in MySQL. Say a user-order table has grown huge and is split by user ID into 1024 sub-tables, user_order_{0.. 1023}; if that is not enough later on, increasing the number of sub-tables is troublesome, because as the number of sub-tables changes the hash of user_id maps differently and the old data can no longer be found. So you have to stop the service, migrate the data, and then come back online.

Kafka has the same problem when adding partitions to a topic. If a message carries a key, the producer must guarantee that messages with the same key go to the same partition. Once the partition count grows, hash(key) % partition_num produces a different result, so the same key no longer maps to the same partition. You can implement a custom partitioner on the producer to guarantee that a given key always lands on the same partition no matter how many partitions are added, but then the newly added partitions receive no data for those keys.
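A small illustration of the key-to-partition problem described above (CRC32 stands in for whatever hash the producer uses; the pinned mapping is a hypothetical custom partitioner, not a Kafka API):

```python
# The usual choice is hash(key) % num_partitions, so the same key can
# land on a different partition once num_partitions changes.
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions

key = b"user-42"
print(partition_for(key, 8))     # some partition with 8 partitions
print(partition_for(key, 12))    # quite possibly a different one with 12

# A custom partitioner could pin known keys to fixed partitions so that
# expansion does not move them, at the cost of leaving the newly added
# partitions with no data for those old keys.
PINNED = {b"user-42": 3}

def sticky_partition_for(key: bytes, num_partitions: int) -> int:
    return PINNED.get(key, zlib.crc32(key) % num_partitions)
```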

One thing you can see is that most of Kafka's core complexity lies in the storage layer: how to shard data, how to store it efficiently, how to read it efficiently, how to guarantee consistency, how to recover from failures, how to expand and rebalance…

All of these shortcomings can be summed up in one word: scalability. A system whose problems can be solved simply by adding machines is the ultimate goal. Pulsar claims to be a distributed messaging and streaming platform for the cloud-native era, so let's take a look at what Pulsar is like.

4. Pulsar

Kafka's core complexity lies in its storage. Distributed storage that is high-performance, highly available, low-latency, and quick to scale is not just something Kafka needs; it is a common pursuit of all modern systems. Under the Apache umbrella there happens to be a system built specifically for log storage: BookKeeper!

With a dedicated storage component, all that remains to build a messaging system is to implement the messaging features on top of that storage. Pulsar is exactly such a "compute-storage separated" messaging system:

Pulsar uses BookKeeper as its storage service; everything else is the compute layer. This is actually a very popular architecture and a current trend; many new storage systems adopt this storage-compute separation. TiDB, for example: the underlying storage is TiKV, a KV store, and TiDB is the compute layer above it that implements the SQL-related functionality itself. Another example is the many "persistent" Redis products, most of which rely on RocksDB underneath for KV storage and then implement the various Redis data structures on top of that KV layer.

A broker in Pulsar, like a broker in Kafka, is a running instance of the system. Unlike Kafka, however, Pulsar's broker is a stateless service. It is an "API layer" that handles large numbers of user requests: when a user's message arrives it calls the BookKeeper interface to write the data, and when a user wants to read a message it looks the data up in BookKeeper. Of course, the broker itself also does plenty of caching. The broker likewise relies on ZooKeeper to hold a lot of metadata.

Since brokers are stateless, this layer can be scaled very easily, especially in a K8s environment, practically with a click of the mouse. Message persistence, high availability, fault tolerance, and storage expansion are all left to BookKeeper.

But just as with the law of conservation of energy, a system's complexity is conserved. The technical complexity needed for storage that is both high-performance and reliable does not simply disappear; it just moves from one place to another. It is like writing business logic: when the product manager comes up with 20 different business scenarios, there will be at least 20 if-elses, and no design pattern or architecture will eliminate them; they merely move from file to file and from object to object. So that complexity inevitably lives inside BookKeeper, and it is arguably even more complex than Kafka's storage implementation.

One benefit of Pulsar's compute-storage separation, however, is that we can study Pulsar along a relatively clear boundary, so-called separation of concerns. As long as you understand the semantics of the API that BookKeeper exposes to the upper-layer broker, you can understand Pulsar without understanding BookKeeper's internal implementation.

The next question to ask yourself is: since Pulsar's broker layer is stateless, can a producer send data for a topic to any broker at will?

This may seem convenient, but the answer is no. Why? If a producer could produce three messages a, b, and c through any broker, how could we ensure the messages are written to BookKeeper in the order a, b, c? The write order can only be guaranteed if a, b, and c are all sent to the same broker.

But then it seems we are back to the same problem as Kafka: what if a topic has so much traffic that one broker cannot handle it? So Pulsar, like Kafka, also has the concept of partitions. A topic can be divided into multiple partitions and, so that messages stay ordered within each partition, each partition is served by only one broker at a time.

This may look the same as Kafka, one broker per partition, but it is quite different. To guarantee sequential writes to a partition, both Kafka and Pulsar require writes to go to the broker responsible for that partition, which enforces the ordering. The difference is that Kafka stores the messages on that broker, while Pulsar stores them in BookKeeper. If one of Pulsar's brokers fails, the partition can be switched to another broker immediately, as long as only one broker globally holds the write permission for topic-partition-x; it is essentially a transfer of ownership, with no data migration at all.

When a write request for a partition reaches its broker, the broker calls the BookKeeper interface to store the message. Like Kafka, Pulsar has the concept of segments, and just as in Kafka, storage happens segment by segment.

To make this clear, we need to introduce a BookKeeper concept called the ledger. A ledger can be likened to a file on a file system; in Kafka, for instance, data is written to a file such as xxx.log. Pulsar stores each segment as a ledger in BookKeeper.

Each node in a BookKeeper cluster is called a bookie (why is a cluster instance called a broker in Kafka but a bookie in BookKeeper? It doesn't really matter, it's just a name; the authors wrote plenty of code, let them have their fun with naming). When instantiating a BookKeeper writer, we need to provide three parameters:

Node count n: the number of bookies in the BookKeeper cluster;

Replica count m: each piece of ledger data is written to m of the n bookies, i.e. m copies;

Ack count t: each write to the ledger (written concurrently to the m bookies) is returned as successful only after t acks are received.

Based on these three parameters, BookKeeper handles the complex data replication for us; we do not need to worry about replicas or consistency ourselves. We simply call the append interface that BookKeeper provides and let it do the rest.
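Purely as a conceptual sketch (plain Python, not real BookKeeper client code), the three parameters play roles like this:

```python
# Each ledger entry is sent to m of the n bookies, and the write is
# reported as successful once t of those m acknowledge it.
import random

N = 4   # n: bookies in the cluster
M = 3   # m: each entry is replicated to 3 of them
T = 2   # t: success once 2 of those 3 acknowledge the write

def append_entry(entry: bytes) -> bool:
    targets = random.sample(range(N), M)    # pick m bookies for this entry
    acks = 0
    for bookie in targets:
        ok = random.random() > 0.05         # pretend 5% of writes fail
        if ok:
            acks += 1
    return acks >= T                        # durable enough to report success

print(append_entry(b"segment-entry-1"))
```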

As shown in the figure above, the partition is divided into multiple segments, and each segment is written to 3 of the 4 bookies. For example, segment 1 is written to bookies 1, 2, and 4, and segment 2 to bookies 1, 3, and 4…

This is quite different from Kafka, where all the segments of a partition sit on the same broker; here a partition's segments are spread evenly across multiple storage nodes. What is the benefit? In Kafka, a partition keeps writing to the same broker's file system, and when the disk runs out you have to expand and migrate data. In Pulsar, since different segments of a partition can be stored on different bookies, when the existing bookies' disks fill up under heavy writes we can simply add machines quickly: new segments just pick the most suitable bookie to write to (the one with the most free disk, the lowest load, and so on), and all we need to remember is the mapping between segments and bookies.

Since a partition is distributed across the BookKeeper cluster as segments, expanding storage capacity becomes very, very easy. This is exactly the advantage Pulsar has long claimed:

Brokers are stateless and can be expanded at will;

Partitions are distributed across the entire BookKeeper cluster by segment, with no single point, and capacity can be expanded easily.

When a bookie fails, thanks to the multiple replicas, one of the remaining replicas can be selected to read the data from, so service continues uninterrupted and high availability is achieved.

If you look at Pulsar after understanding Kafka's architecture, you will find that the core of Pulsar is the use of BookKeeper plus some metadata storage. And it is precisely this clean separation of storage and compute that lets us separate concerns and learn it quickly.

Consumption model

Another design in which Pulsar is more advanced than Kafka is its abstraction of the consumption model, the subscription. Through this layer of abstraction, a wide variety of consumption scenarios can be supported. Compare this with Kafka, where there is only one consumption pattern: one consumer takes one or more partitions, and one partition being shared by multiple consumers is impossible. Through subscriptions, Pulsar currently supports four consumption modes:


We can think of a Pulsar subscription as a Kafka consumer group, but a subscription goes further: it also sets the consumption type of the "consumer group":

Exclusive: one and only one consumer in the group may consume; the others cannot even connect to Pulsar;

Failover: every consumer in the group can connect to the broker of each partition, but only one consumer actually consumes the data; when that consumer crashes, another consumer is chosen to take over;

Shared: all consumers in the group consume all partitions of the topic, and messages are distributed among them in a round-robin manner;

Key-Shared: all consumers in the group consume all partitions of the topic, but messages with the same key are guaranteed to be delivered to the same consumer.

These consumption modes cover a wide range of business scenarios, and users can choose according to their actual situation. With this layer of abstraction, Pulsar supports both the queue consumption model and the stream consumption model, and could support countless others (as long as someone submits a PR), which is what Pulsar calls its unified consumption model.
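With the Python pulsar-client library, for instance, the mode is chosen when subscribing (a hedged sketch; the broker URL, topic, and subscription name are assumptions):

```python
import pulsar

client = pulsar.Client("pulsar://localhost:6650")

# Shared: messages are distributed round-robin across every consumer
# attached to the "bi-report" subscription; other options include
# Exclusive, Failover, and KeyShared.
consumer = client.subscribe(
    "persistent://public/default/app-log",
    subscription_name="bi-report",
    consumer_type=pulsar.ConsumerType.Shared,
)

msg = consumer.receive()
consumer.acknowledge(msg)
client.close()
```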

Underneath the consumption-model abstraction lies different cursor-management logic: how to ack, how to move the cursor, how to quickly find the next message that needs retrying… These are all technical details, but this layer of abstraction hides them so that users can focus more on their applications.

5. The storage-compute separation architecture

In fact, technology always develops in a spiral, and quite often you find that the newest direction has circled back to the technology of 20 years ago.

Twenty years ago, constrained by the hardware of ordinary computers, large amounts of data were stored on centralized storage such as Network Attached Storage (NAS). But this approach had many limitations: it required dedicated hardware, and the biggest problem was that it was hard to scale up to accommodate massive data.

Databases, likewise, were mostly Oracle-on-minicomputer setups. But as the Internet grew and data volumes kept increasing, Google later introduced a distributed storage approach built from ordinary computers: any machine can serve as a storage node, and these nodes cooperate to form a larger storage system. This is the approach of GFS and its open-source counterpart HDFS.

The mobile Internet keeps pushing data volumes higher, and the spread of 4G and 5G has made users sensitive to latency. Reliable, fast, and scalable storage has become a hard requirement for enterprises. And as time goes on, Internet traffic will become ever more concentrated, and large enterprises' demand for this kind of infrastructure will only grow stronger.

Therefore, reliable distributed storage keeps improving as a kind of infrastructure. The goal is always the same: to let you use it like a file system, with high performance, high reliability, automatic fault recovery, and similar features. One constraint we must face, however, is the CAP theorem: of linear consistency (C), availability (A), and partition tolerance (P), only two can be fully satisfied at once. So there is no perfect storage system; there are always some "shortcomings". What we need to do is choose appropriate storage facilities for our business scenarios and build the upper-layer applications on top of them. That is the logic of Pulsar, the logic of NewSQL systems such as TiDB, and the basic logic of the large distributed systems of the future, so-called "cloud native".