May 10, 2025

What Is Apache Kafka?

A beginner-friendly introduction to Apache Kafka — distributed event streaming, producers, consumers, topics, and partitions explained with diagrams.

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Core Concepts

Topics & Partitions

A topic is a category or feed name to which records are published. Topics are split into partitions for parallelism, and each partition can be replicated across brokers for fault tolerance. Each partition is an ordered, immutable sequence of records.

Concept	Description
Topic	Named channel for a stream of records
Partition	Ordered log within a topic; unit of parallelism
Offset	Unique sequential ID for each record within a partition
Replication	Copies of partitions across brokers for fault tolerance
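
To make the relationship between topics, partitions, and offsets concrete, here is a minimal in-memory sketch. It is not Kafka's implementation: the `Topic` and `Partition` classes and the `orders` topic are hypothetical names used only for illustration, and replication is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """An append-only log; a record's offset is its position in the list."""
    records: list = field(default_factory=list)

    def append(self, value):
        # The next offset is the current length, so offsets are sequential.
        offset = len(self.records)
        self.records.append(value)
        return offset

    def read(self, offset, max_records=10):
        # Reading never removes records; readers only track a position.
        return self.records[offset:offset + max_records]


@dataclass
class Topic:
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [Partition() for _ in range(self.num_partitions)]


orders = Topic("orders", num_partitions=3)
first = orders.partitions[0].append("order-created")
second = orders.partitions[0].append("order-paid")
other = orders.partitions[1].append("order-created")
print(first, second, other)  # prints "0 1 0"
```

Offsets restart at 0 in each partition, which is why a record is identified by the combination of topic, partition, and offset rather than by offset alone.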

Producers & Consumers

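Producers publish records to topics. For each record, the producer chooses the target partition: by hashing the record key (so records with the same key land in the same partition), by spreading keyless records across partitions (for example, round-robin), or with a custom partitioner.
Consumers read records from topics. They belong to consumer groups: within a group, each partition is assigned to exactly one consumer, while a single consumer may own several partitions.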
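
The partition choice described above can be sketched in a few lines. This is an illustration only: `make_partitioner` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen because it is in the standard library. Real Kafka clients use their own hash function and may handle keyless records differently.

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function that picks a partition for a record key."""
    round_robin = itertools.cycle(range(num_partitions))

    def choose(key=None):
        if key is None:
            # No key: spread records evenly across partitions.
            return next(round_robin)
        # Keyed: a stable hash maps equal keys to the same partition.
        # crc32 is an assumption for this sketch, not Kafka's hash.
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose


choose = make_partitioner(3)
keyless = [choose() for _ in range(4)]
same_key = {choose("customer-42") for _ in range(5)}
print(keyless)        # prints "[0, 1, 2, 0]"
print(len(same_key))  # prints "1"
```

Because every record with the key `customer-42` goes to one partition, those records keep their relative order, which is the usual reason to choose key-based partitioning.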

How Kafka Works — Message Flow

[Diagram: message flow from producers, through topic partitions on brokers, to consumers]

Consumer Groups & Partition Assignment

When multiple consumers form a group, Kafka balances partitions across them:

[Diagram: three partitions of one topic shared between Consumer 1 and Consumer 2 in a single consumer group]

If Consumer 1 fails, Kafka rebalances — Consumer 2 takes over all three partitions until a replacement joins.
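
The rebalance described above can be simulated with a simple assignment function. This is a sketch under assumptions: `assign` is a hypothetical helper that deals partitions out in turn, whereas Kafka supports several assignment strategies and the exact split it produces may differ.

```python
def assign(partitions, consumers):
    """Deal partitions out to consumers in turn (a simple round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    if not consumers:
        return assignment
    ordered = sorted(consumers)
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered[index % len(ordered)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]
before = assign(partitions, ["consumer-1", "consumer-2"])
# consumer-1 fails, so the group rebalances over the survivors.
after = assign(partitions, ["consumer-2"])
print(before)  # prints "{'consumer-1': [0, 2], 'consumer-2': [1]}"
print(after)   # prints "{'consumer-2': [0, 1, 2]}"
```

In both assignments every partition has exactly one owner, which is the invariant a consumer group maintains.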

Key Properties

Durability — records are persisted to disk and replicated across brokers.
Ordering — guaranteed within a partition (not across partitions).
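At-least-once delivery — consumers may see duplicates after a crash or rebalance, because records processed after the last committed offset are delivered again. Make consumer processing idempotent to get effectively-once results, or use Kafka's idempotent producer and transactions where exactly-once semantics are required.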
Horizontal scalability — add brokers and partitions to increase throughput.
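
The at-least-once property is easier to see in a small simulation. The sketch below is an assumption-laden illustration, not Kafka client code: `IdempotentHandler` is a hypothetical class, the log is a plain list, and the set of seen offsets lives in memory, whereas a real consumer would need to store it durably alongside its results.

```python
class IdempotentHandler:
    """Applies each offset's record at most once, even if redelivered."""

    def __init__(self):
        self.seen = set()
        self.total = 0

    def handle(self, offset, amount):
        if offset in self.seen:
            return False  # duplicate delivery: skip the side effect
        self.total += amount
        self.seen.add(offset)
        return True


log = [10, 5, 7]  # one partition's records, indexed by offset
handler = IdempotentHandler()

# First run: offsets 0 and 1 are processed, but the consumer crashes
# before committing its progress past offset 0.
for offset in (0, 1):
    handler.handle(offset, log[offset])
committed = 1  # next offset to read, according to the last commit

# Restart: reading resumes at the committed offset, so offset 1 is
# delivered a second time.
applied = [handler.handle(o, log[o]) for o in range(committed, len(log))]
print(applied)        # prints "[False, True]"
print(handler.total)  # prints "22"
```

Without the duplicate check, the redelivered record at offset 1 would be counted twice and the total would be 27 instead of 22.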

When to Use Kafka

Event sourcing — store every state change as an immutable event.
Stream processing — real-time analytics with Kafka Streams or Flink.
Data integration — bridge between microservices, databases, and data lakes.
Log aggregation — centralise logs from distributed services.
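
As an illustration of the event sourcing use case above, current state can be rebuilt by replaying the stored events in order. This is a minimal sketch: the account events and the `apply` function are hypothetical, and the events are held in a list rather than read from a Kafka topic.

```python
from functools import reduce

# Every state change is kept as an immutable event, in order.
events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]


def apply(balance, event):
    """Return the new balance after one event."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    return balance  # ignore unknown event types


# Replaying all events gives the current state; replaying a prefix
# gives the state at an earlier point in time.
balance = reduce(apply, events, 0)
balance_after_two = reduce(apply, events[:2], 0)
print(balance, balance_after_two)  # prints "75 70"
```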
