Fantai Geek guide:

IT peers in the financial industry will be familiar with Kafka, but may not have heard much about Pulsar, a top-level open source project of the Apache Software Foundation. One of its core technologies, BookKeeper, has existed since 2011 as a subproject of ZooKeeper; it was developed at Yahoo, with contributions from the Yahoo Beijing Research team. Pulsar's lead core engineer was also a teammate of ours at Yahoo Beijing Research, so we know each other.

In my opinion, Pulsar is very valuable for financial business scenarios. (1) Real-time, reliable, persistent data replication across domains: real-time synchronization across network segments and across data centers has always been a critical requirement in the securities industry. These capabilities, which are among the hardest for trading applications to implement, may now have a more reliable mechanism behind them. (2) Cloud-native friendliness: Kafka was born before the concept of "cloud native" came into being, and Pulsar, as the later arrival, has the advantage in this area. Its friendliness to containerization and container orchestration is also attractive to us.

At Fantai Geek, our middleware currently uses NATS and Kafka, but we are actively looking for scenarios that suit Pulsar. We believe we will find interesting applications for it in the securities industry and will recommend them to our peers.

— Liang Qihong, co-founder of Fantai Geek


Here are 7 reasons to choose Pulsar over Kafka.

  1. A combination of streaming and queuing

Pulsar is like two products in one. Not only does it handle high-rate, real-time scenarios like Kafka, it also supports standard message queuing patterns such as competing consumers, failover subscriptions, and message fan-out. Pulsar automatically tracks each client's read position in the topic and stores this information in its high-performance distributed ledger, BookKeeper.

Unlike Kafka, Pulsar has the functionality of a traditional message queue (such as RabbitMQ), so you only need to run one system, Pulsar, to handle both real-time streaming and message queuing.
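The read-position tracking described above can be illustrated with a small in-memory sketch. It is a toy model, not Pulsar's client API: the `Topic` class, its `receive` method, and the subscription names `audit` and `risk` are all hypothetical, and the cursor here advances on read, whereas a real system would advance it on acknowledgement.

```python
from collections import defaultdict


class Topic:
    """Toy in-memory topic: an append-only log plus one cursor per subscription."""

    def __init__(self):
        self.log = []                    # messages in publish order
        self.cursors = defaultdict(int)  # subscription name -> next index to read

    def publish(self, message):
        self.log.append(message)

    def receive(self, subscription):
        """Return the next unread message for this subscription, or None."""
        position = self.cursors[subscription]
        if position >= len(self.log):
            return None
        # The topic side, not the client, remembers how far each subscription has read.
        self.cursors[subscription] = position + 1
        return self.log[position]


topic = Topic()
for order in ["buy-1", "sell-2", "buy-3"]:
    topic.publish(order)

# Fan-out: two independent subscriptions each see every message.
audit = [topic.receive("audit") for _ in range(3)]
risk_first = topic.receive("risk")
# A "risk" consumer that reconnects later resumes from the stored position.
risk_next = topic.receive("risk")
```

Because the position is stored per subscription rather than per client, one topic can serve a stream reader and a work queue at the same time.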

  2. Partitioning is supported, but not required

If you have ever used Kafka, you know how partitioning works. All topics in Kafka are partitioned, and partitioning matters because it increases throughput. By spreading a topic's partitions across different brokers, the rate at which a single topic can be processed goes up considerably. But what about topics that do not need that much throughput? In such cases, wouldn't it be nice to forget about partitioning and avoid the API and administration effort that comes with it?

Pulsar can do that. If you just need a topic, you can have a topic, with no partitions. If you need multiple consumer instances to keep up with the message rate, you still do not need partitions; Pulsar's shared subscription handles this.

Pulsar can also support partitioning if it is needed to further improve performance.
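As a rough illustration of how a shared subscription lets several consumers work through one unpartitioned topic, here is a minimal sketch that hands messages out round-robin. The function `dispatch_shared` and the worker names are hypothetical; this models only the distribution idea and is not how a broker is implemented.

```python
from itertools import cycle


def dispatch_shared(messages, consumers):
    """Hand out messages from ONE unpartitioned topic round-robin to consumers."""
    assignment = {name: [] for name in consumers}
    for message, name in zip(messages, cycle(consumers)):
        assignment[name].append(message)
    return assignment


messages = [f"msg-{i}" for i in range(6)]
result = dispatch_shared(messages, ["worker-a", "worker-b", "worker-c"])
# Each worker receives a third of the messages; no partitions were declared.
```

The trade-off in this style of dispatch is that ordering across consumers is no longer guaranteed, which is why partitioning remains available when it is needed.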

  3. Logs are good, but ledgers are better

The Kafka development team recognized the importance of the log for a real-time data exchange system. Data is written to the log by appending, which is fast. Data in the log is sequential, so it can be read back quickly in the order in which it was written. Sequential reads and writes are faster than random reads and writes. Interaction with persistent storage is a bottleneck for any system that provides data guarantees, and the log abstraction makes this interaction as efficient as possible.

Logs are great, but they cause problems when the volume of data gets large, and keeping an entire log on a single server becomes a challenge. What happens when the log fills the server's storage? How do you expand capacity? What if the server storing the log goes down and you need to build a new server from a replica? Copying a log from one server to another takes a long time, especially while you are trying to keep the system's data real-time.

Pulsar avoids copying large logs by breaking the log into segments. Using BookKeeper, it distributes those segments across multiple servers. The log is therefore not stored on any single server, and no single server is a bottleneck for the entire system. This simplifies failure handling and capacity expansion: you only need to add new servers, with no rebalancing.
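The "add servers without rebalancing" point can be sketched with a toy segmented log. Everything here is an assumption made for illustration: the `SegmentedLog` class, the segment size, the least-loaded placement rule, and the `bookie-*` names. A real BookKeeper deployment also replicates entries across several storage nodes, which this sketch leaves out.

```python
class SegmentedLog:
    """Toy model: a log split into fixed-size segments spread over storage nodes."""

    def __init__(self, nodes, segment_size=3):
        self.nodes = list(nodes)
        self.segment_size = segment_size
        self.segments = []  # list of (node, [entries]) in log order

    def _least_loaded_node(self):
        counts = {node: 0 for node in self.nodes}
        for node, _ in self.segments:
            counts[node] += 1
        return min(self.nodes, key=lambda n: counts[n])

    def append(self, entry):
        # Start a new segment when there is none yet or the current one is full.
        if not self.segments or len(self.segments[-1][1]) >= self.segment_size:
            self.segments.append((self._least_loaded_node(), []))
        self.segments[-1][1].append(entry)

    def add_node(self, node):
        # Existing segments stay where they are; only new segments use the new node.
        self.nodes.append(node)

    def placement(self):
        return [node for node, _ in self.segments]


log = SegmentedLog(["bookie-1", "bookie-2"])
for i in range(6):
    log.append(i)
before = log.placement()

log.add_node("bookie-3")
for i in range(6, 9):
    log.append(i)
after = log.placement()
```

In this model the segments written before the new node joined never move, so expanding capacity involves no bulk copy of old data.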

  4. Stateless

Cloud-native application developers love statelessness. Stateless components start quickly, are interchangeable, and scale seamlessly. Wouldn't it be nice if message-oriented middleware were stateless too?

Kafka is not stateless. Each broker holds the complete log for each of its partitions. If a broker fails, not just any broker can take over for it. If the load gets too high, you cannot simply add a new broker to share it, because brokers must first synchronize state from the brokers that hold replicas of their partitions.

In the Pulsar architecture, the broker is stateless. A completely stateless system could not persist messages, however, so Pulsar does not rely on the broker for message persistence. Instead, message delivery and message storage are decoupled: the broker receives data from producers and sends it to consumers, but the data itself is stored in BookKeeper.

Because Pulsar's brokers are stateless, when the load is high you can simply add new brokers, and they quickly take on a share of the work.
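To make the separation of delivery and storage concrete, here is a minimal sketch in which brokers hold no message data of their own. The `Storage` and `Broker` classes and the topic name `trades` are hypothetical stand-ins; `Storage` plays the role that BookKeeper plays in Pulsar, without any of its replication or durability.

```python
class Storage:
    """Stands in for the separate storage layer (BookKeeper in Pulsar)."""

    def __init__(self):
        self.topics = {}

    def append(self, topic, message):
        self.topics.setdefault(topic, []).append(message)

    def read(self, topic):
        return list(self.topics.get(topic, []))


class Broker:
    """Holds no message data itself, only a reference to the shared storage."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage

    def publish(self, topic, message):
        self.storage.append(topic, message)

    def consume_all(self, topic):
        return self.storage.read(topic)


storage = Storage()
broker_a = Broker("broker-a", storage)
broker_a.publish("trades", "t1")
broker_a.publish("trades", "t2")

del broker_a  # simulate a broker failure; nothing is lost with it

# Any new broker can serve the topic immediately, with no state to copy.
broker_b = Broker("broker-b", storage)
broker_b.publish("trades", "t3")
recovered = broker_b.consume_all("trades")
```

In this model the replacement broker needs nothing from the failed one, which is the property that lets brokers be added or swapped freely.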

  5. Simple cross-domain replication

Cross-domain replication is Pulsar's specialty. The feature was part of Pulsar's design from the start and is easy to configure. It can be used both for globally distributed applications and for disaster recovery solutions.

  6. Steady performance

Benchmarks (openmessaging.cloud/docs/benchm…) show that Pulsar can provide high throughput while maintaining low latency.

  7. Fully open source

Pulsar provides many features similar to Kafka's, such as cross-domain replication, streaming message processing (Pulsar Functions), connectors (Pulsar IO), SQL-based topic queries (Pulsar SQL), and a schema registry. It also has some features that Kafka does not, such as tiered storage and multi-tenancy. Even better, all of these features are open source.

Conclusion

These are the main reasons we chose Pulsar to build our messaging infrastructure service. Beyond them, Pulsar brings many other benefits, such as multi-tenancy, namespaces, authentication and authorization, good documentation, friendly support for Kubernetes, and more.

Original English article:

Kafkaesque. IO / 7 – having – w…

This article was created by the WeChat official account "AI Front" (ID: AI-front) and may not be reproduced without authorization.