Pub/sub messaging: Apache Kafka vs. Apache Pulsar

Apache Kafka set the bar for large-scale distributed messaging, but Apache Pulsar has some neat tricks of its own

These days, massively scalable pub/sub messaging is virtually synonymous with Apache Kafka. Apache Kafka continues to be the rock-solid, open-source, go-to choice for distributed streaming applications, whether you’re adding something like Apache Storm or Apache Spark for processing or using the processing tools provided by Apache Kafka itself. But Kafka isn’t the only game in town.

Developed by Yahoo and now an Apache Software Foundation project, Apache Pulsar is going for the crown of messaging that Apache Kafka has worn for many years. Apache Pulsar offers the potential of faster throughput and lower latency than Apache Kafka in many situations, along with a compatible API that allows developers to switch from Kafka to Pulsar with relative ease. 

How should one choose between the venerable stalwart Apache Kafka and the upstart Apache Pulsar? Let’s look at their core open source offerings and what the core maintainers’ enterprise editions bring to the table.

Apache Kafka

Developed by LinkedIn and released as open source back in 2011, Apache Kafka has spread far and wide, pretty much becoming the default choice for many when thinking about adding a service bus or pub/sub system to an architecture. Since Apache Kafka’s debut, the Kafka ecosystem has grown considerably, adding the Scheme Registry to enforce schemas in Apache Kafka messaging, Kafka Connect for easy streaming from other data sources such as databases to Kafka, Kafka Streams for distributed stream processing, and most recently KSQL for performing SQL-like querying over Kafka topics. (A topic in Kafka is the name for a particular channel.)

To continue reading this article register now