Authors: Ben Lorica and Jesse Anderson. Translator: StreamNative-Sijia

Multi-tier architecture, scalability, multi-tenancy, and persistence are just some of the reasons why many companies choose Pulsar.

To learn more about Apache Kafka, Apache Pulsar, Apache Spark, and other data technologies, consider attending the Strata Data Conference in New York City, September 23-26, 2019, where they are covered in the “Data Engineering & Architecture” sessions.

As enterprises generate data across a growing number of systems and devices, messaging and event streaming solutions (especially Apache Kafka) have become widely adopted. Over the past year, we have been tracking the progress of Apache Pulsar (Pulsar). Although Pulsar is a latecomer, it is a powerful solution. Developed and open sourced by Yahoo, Pulsar is designed to process, analyze, and deliver data from an ever-expanding array of services and applications, which makes it a good fit for modern data platforms. Pulsar is also designed to reduce the operational and maintenance burden associated with complex distributed systems.

Who else is interested in Pulsar? Karthik Ramasamy, CEO of Streamlio, shared a geographic breakdown of recent visitors to Pulsar’s home page:

Of the several thousand visitors, 33 percent came from the Americas, 36 percent from the Asia-Pacific region and 27 percent from Europe, the Middle East and Africa.

Although Apache Kafka is by far the most popular publish/subscribe solution, over the past year we have seen a number of companies adopt Pulsar. Several Pulsar features have proven valuable to these companies, including:

  • Multi-tier architecture: a serving layer (brokers that coordinate message receipt, storage, processing, and delivery), a storage layer (Apache BookKeeper nodes that persist messages), and a processing layer (Pulsar Functions or Pulsar SQL).
  • High performance and scalability: Pulsar has been in use at Yahoo for years, handling 100 billion messages per day on more than 2 million topics. It can support millions of topics while maintaining high throughput and low latency.
  • Easy to add storage or serving capacity without rebalancing the entire cluster: the multi-tier architecture lets storage scale independently and lets the serving and storage layers expand without downtime.
  • Support for common messaging models, including publish/subscribe messaging and message queuing.
  • Multi-tenancy: a single Pulsar cluster can serve an entire enterprise, with each team getting its own namespace and capacity.
  • Durability (no data loss): data is replicated and synced to disk.
  • Geo-replication: native support for applications distributed across geographic regions. Pulsar supports multiple modes for replicating data between clusters.
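To make the messaging-model point concrete, here is a toy in-memory sketch of how one topic can serve both models. Every subscription receives every message (publish/subscribe), while consumers attached to the same subscription split the messages between them (queuing). The Topic and Subscription classes, the round-robin policy, and the tenant and namespace names are illustrative assumptions. This is not the Pulsar client API, and it ignores persistence and acknowledgements.

```python
class Subscription:
    """Toy subscription: each message goes to exactly one attached consumer."""

    def __init__(self, name):
        self.name = name
        self.inboxes = []   # one list per attached consumer
        self._cursor = 0    # round-robin position (an assumed dispatch policy)

    def attach(self):
        inbox = []
        self.inboxes.append(inbox)
        return inbox

    def deliver(self, message):
        if not self.inboxes:
            return  # the toy drops the message; a real broker would retain it
        self.inboxes[self._cursor % len(self.inboxes)].append(message)
        self._cursor += 1


class Topic:
    """Toy topic: fans each published message out to every subscription."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = {}

    def subscribe(self, subscription_name):
        if subscription_name not in self.subscriptions:
            self.subscriptions[subscription_name] = Subscription(subscription_name)
        return self.subscriptions[subscription_name].attach()

    def publish(self, message):
        for subscription in self.subscriptions.values():
            subscription.deliver(message)


# Hypothetical tenant "acme" and namespace "billing", written in the
# persistent://tenant/namespace/topic form that Pulsar topic names use.
topic = Topic("persistent://acme/billing/invoices")
audit = topic.subscribe("audit")        # own subscription: sees every message
worker_a = topic.subscribe("workers")   # two consumers share one subscription
worker_b = topic.subscribe("workers")

for i in range(4):
    topic.publish(f"invoice-{i}")

print(audit)     # all four messages
print(worker_a)  # half of the messages
print(worker_b)  # the other half
```

The same topic therefore behaves like a pub/sub channel for the audit subscription and like a work queue for the workers subscription, so each team can choose the model that suits it without a second system.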

While the previous generation of messaging systems focused on moving data, emerging frameworks (such as Pulsar) add data-processing capabilities that are critical for feeding analytics and AI applications. The rise of connected devices, the advent of 5G, and the growing importance of machine learning and AI all require companies to build infrastructure for capturing, processing, and moving data streams. Enterprises will also increasingly need to perform these tasks in (near) real time. The good news is that key components for data management, processing, transmission, and scheduling continue to improve, and automation should reduce operational burdens.

Want to keep up to date with Pulsar’s development, user stories, and hot topics? Follow the Apache Pulsar and StreamNative WeChat accounts to be among the first to hear Pulsar news.

Original article: www.oreilly.com/ideas/one-s…