What is Apache Kafka? The Software Explained, Simply

What is Apache Kafka? In short, it is a way of moving data between systems: for example, between applications and servers.

It is often used as an intermediary between multiple data producers and consumers, letting many systems talk to each other smoothly.

Despite sitting at the centre of these data flows, it does not depend on a single centralised process.

Instead, it typically runs as a cluster of one or more servers (known as brokers), which can span multiple datacenters.

What is Apache Kafka Used for?

Kafka was originally used to underpin feeds of website activity (e.g. page views, searches, or other actions).

It was designed to handle high-volume activity tracking.

It is also used for operational monitoring data and for log aggregation: collecting log files from servers and putting them in a central place.

Many businesses deploy it as an external commit log for a distributed system, where it acts as a re-syncing mechanism that lets failed nodes restore their data.
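As a rough illustration of the re-syncing idea, the Python sketch below models a commit log as an in-memory, append-only list of key/value changes and rebuilds a node's state by replaying it from a chosen offset. `CommitLog` and `restore_state` are hypothetical names for this example, not part of any Kafka client; treating `None` as a deletion loosely mirrors the "tombstone" records Kafka uses in compacted topics.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Hypothetical in-memory stand-in for a Kafka topic used as a commit log."""
    records: list = field(default_factory=list)

    def append(self, key, value):
        # Records are only ever appended; the offset is the record's position.
        self.records.append((key, value))
        return len(self.records) - 1

    def read_from(self, offset):
        # Yield (offset, key, value) for every record at or after `offset`.
        for i in range(offset, len(self.records)):
            key, value = self.records[i]
            yield i, key, value


def restore_state(log, from_offset=0):
    """Rebuild a node's key/value state by replaying the log; later writes win."""
    state = {}
    for _, key, value in log.read_from(from_offset):
        if value is None:
            state.pop(key, None)  # None acts as a deletion marker in this sketch
        else:
            state[key] = value
    return state


# A replacement node replays the full log to catch up with the others.
log = CommitLog()
log.append("user:1", "Alice")
log.append("user:2", "Bob")
log.append("user:1", "Alicia")
log.append("user:2", None)
print(restore_state(log))  # {'user:1': 'Alicia'}
```

Because the log is never rewritten in place, any node can reach the same state simply by reading it again from the start, or from the last offset it is known to have applied.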

Who Created Apache Kafka?

Kafka was originally developed at LinkedIn.

It was open-sourced in 2011 and graduated as a top-level Apache project in October 2012.

Seven years later, it remains one of the Apache Software Foundation’s top five projects, alongside Hadoop, Lucene, POI and ZooKeeper.

Thousands of companies build heavily on Kafka, including Netflix, Airbnb and LinkedIn itself.

In the UK, it underpins real-time analytics and predictive maintenance for British Gas.

The Kafka micro-site lists six main distributions:

The Confluent Platform
The Cloudera Kafka distribution
The Stratio Kafka source for Ubuntu and for RHEL
IBM Event Streams, built on Apache Kafka
The Strimzi distribution
TIBCO Messaging

As Confluent puts it: “At its heart lies the humble, immutable commit log, and from there you can subscribe to it, and publish data to any number of systems or real-time applications.

“Unlike messaging queues, Kafka is a highly scalable, fault tolerant distributed system, allowing it to be deployed for applications like managing passenger and driver matching at Uber.”

It has four key APIs:

The Producer API allows an application to publish a stream of records to one or more Kafka topics.
The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
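To show how producers and consumers share a topic, here is a toy, single-process Python model rather than real Kafka client code: a hypothetical `ToyBroker` keeps each topic as an append-only list, and each hypothetical `ToyConsumer` tracks its own offset, so independent subscribers can read the same records at their own pace. The real Java clients expose this through classes such as `KafkaProducer` (`send`) and `KafkaConsumer` (`poll`), which talk to a running cluster.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory broker: each topic is an append-only list of records."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, record):
        # Producer side: append the record and return its offset within the topic.
        self.topics[topic].append(record)
        return len(self.topics[topic]) - 1


class ToyConsumer:
    """Hypothetical consumer that remembers its own read position per topic."""

    def __init__(self, broker, topics):
        self.broker = broker
        self.offsets = {t: 0 for t in topics}  # next offset to read in each topic

    def poll(self):
        # Return (topic, record) pairs added since the last poll, then advance offsets.
        batch = []
        for topic, offset in list(self.offsets.items()):
            new_records = self.broker.topics[topic][offset:]
            batch.extend((topic, r) for r in new_records)
            self.offsets[topic] = offset + len(new_records)
        return batch


broker = ToyBroker()
analytics = ToyConsumer(broker, ["page-views"])
archiver = ToyConsumer(broker, ["page-views"])

broker.publish("page-views", {"user": 1, "page": "/home"})
broker.publish("page-views", {"user": 2, "page": "/pricing"})
print(len(analytics.poll()))  # 2
broker.publish("page-views", {"user": 1, "page": "/docs"})
print(len(analytics.poll()))  # 1: only the record it has not yet seen
print(len(archiver.poll()))   # 3: this consumer reads the full log independently
```

Because a consumer only remembers a position in the log, adding another subscriber does not affect existing ones; real Kafka layers partitions and consumer groups on top of this basic idea to spread the work.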

What is Apache Kafka? And What is It Used For?

One of its most popular uses today is stream processing: querying continuous data streams to detect changing conditions.

Many users of Kafka process data in pipelines consisting of multiple stages, where raw input data is consumed and then aggregated, enriched, or otherwise transformed into new “topics” for further follow-up processing.
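A multi-stage pipeline of that kind can be sketched in plain Python, with lists standing in for topics. This is only an illustrative toy: `raw_page_views`, `user_countries`, `enrich` and `count_by` are hypothetical, and in practice such stages would typically be built with the Streams API or another stream processor reading from and writing to real topics.

```python
from collections import Counter

# Hypothetical "topics" modelled as plain Python lists; no Kafka client is involved.
raw_page_views = [
    {"user_id": 1, "page": "/home"},
    {"user_id": 2, "page": "/pricing"},
    {"user_id": 1, "page": "/pricing"},
    {"user_id": 3, "page": "/home"},
]

# Hypothetical reference data used to enrich each event.
user_countries = {1: "UK", 2: "US"}


def enrich(events, countries):
    """Stage 1: copy each event and add a 'country' field ('unknown' if not found)."""
    return [{**e, "country": countries.get(e["user_id"], "unknown")} for e in events]


def count_by(events, field):
    """Stage 2: aggregate events into view counts per value of `field`, sorted by value."""
    counts = Counter(e[field] for e in events)
    return [{field: value, "views": n} for value, n in sorted(counts.items())]


# Each stage's output would be written to a new topic for follow-up processing.
enriched_page_views = enrich(raw_page_views, user_countries)
views_by_country = count_by(enriched_page_views, "country")
print(views_by_country)
# [{'country': 'UK', 'views': 2}, {'country': 'US', 'views': 1}, {'country': 'unknown', 'views': 1}]
```

Keeping each stage as a separate step that reads one stream and writes another is what lets later stages, or entirely different teams, consume the intermediate results.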
