What Is Apache Kafka?

A beginner-friendly introduction to Apache Kafka — distributed event streaming, producers, consumers, topics, and partitions explained with diagrams.



Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.


Core Concepts

Topics & Partitions

A topic is a category or feed name to which records are published. Topics are split into partitions for parallelism and fault tolerance. Each partition is an ordered, immutable sequence of records.

Concept      Description
Topic        Named channel for a stream of records
Partition    Ordered log within a topic; unit of parallelism
Offset       Unique sequential ID for each record within a partition
Replication  Copies of partitions across brokers for fault tolerance
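
To make the relationship between topics, partitions, and offsets concrete, here is a minimal in-memory sketch. It is not Kafka's implementation or client API: the `Topic` and `Partition` classes and the `orders` topic are hypothetical names invented for illustration, and replication is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Append-only log; a record's offset is its position in the log."""
    records: list = field(default_factory=list)

    def append(self, value):
        offset = len(self.records)  # next sequential offset in this partition
        self.records.append(value)
        return offset

    def read(self, offset):
        # Return every record from the given offset onwards.
        return self.records[offset:]


@dataclass
class Topic:
    """Hypothetical stand-in for a topic: a name plus a fixed set of partitions."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [Partition() for _ in range(self.num_partitions)]


orders = Topic("orders", num_partitions=3)
first = orders.partitions[0].append("order-created")
second = orders.partitions[0].append("order-paid")
other = orders.partitions[1].append("order-shipped")
```

Offsets are counted per partition, so `first` and `other` are both 0 even though they belong to the same topic. This is why an offset only identifies a record together with its partition.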

Producers & Consumers

  • Producers publish records to topics. They choose which partition to write to (round-robin, key-based hashing, or custom).
  • Consumers read records from topics. They belong to consumer groups — each partition is consumed by exactly one consumer in a group.
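
The partition choice described above can be sketched in a few lines. This is an illustration under stated assumptions, not the logic of any real Kafka client: `make_partitioner` is a hypothetical helper, and CRC32 is used only as a stand-in for a deterministic hash (real clients use their own hash function and partitioning strategies).

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function that picks a partition for a record (illustrative only)."""
    counter = itertools.count()

    def choose(key=None):
        if key is None:
            # No key: spread records evenly, round-robin style.
            return next(counter) % num_partitions
        # Keyed: a deterministic hash sends equal keys to the same partition.
        # CRC32 is an assumption made for this sketch.
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose


choose = make_partitioner(3)
keyed = [choose("user-42") for _ in range(3)]  # always the same partition
unkeyed = [choose() for _ in range(4)]         # cycles through partitions
```

Because every record with the key `user-42` lands in one partition, those records keep their relative order, which is the usual reason to choose key-based partitioning.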

How Kafka Works — Message Flow

[Diagram: Kafka message flow]

Consumer Groups & Partition Assignment

When multiple consumers form a group, Kafka balances partitions across them:

[Diagram: partition assignment within a consumer group]

For example, suppose two consumers share three partitions. If Consumer 1 fails, Kafka rebalances: Consumer 2 takes over all three partitions until a replacement joins.
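
The rebalance described here can be simulated with a simple round-robin assignment. This is a sketch only: `assign` is a hypothetical function, the consumer names are made up, and Kafka's real assignment strategies are configurable and more involved. The exact split between the two consumers before the failure is also an assumption.

```python
def assign(partitions, consumers):
    """Distribute partitions across the live consumers of one group, round-robin."""
    if not consumers:
        raise ValueError("a group needs at least one live consumer")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]

# Two live consumers: each partition has exactly one owner in the group.
before = assign(partitions, ["consumer-1", "consumer-2"])

# consumer-1 fails, so the group rebalances over the remaining member.
after = assign(partitions, ["consumer-2"])
```

After the simulated failure, `after` maps `consumer-2` to all three partitions, matching the scenario in the text.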


Key Properties

  • Durability — records are persisted to disk and replicated across brokers.
  • Ordering — guaranteed within a partition (not across partitions).
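  • At-least-once delivery — consumers may see duplicates after a crash or rebalance; make consumers idempotent so that reprocessing a record has no extra effect, or use Kafka's idempotent producer and transactions where exactly-once semantics are required.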
  • Horizontal scalability — add brokers and partitions to increase throughput.
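
One common way to cope with the duplicates that at-least-once delivery allows is to make the consumer idempotent. The sketch below assumes each record can be identified by its (partition, offset) pair; `IdempotentConsumer` and the sample deliveries are hypothetical, and no Kafka client is involved.

```python
class IdempotentConsumer:
    """Applies each record at most once, even if it is delivered again."""

    def __init__(self):
        self.processed = set()  # (partition, offset) pairs already handled
        self.total = 0

    def handle(self, partition, offset, amount):
        record_id = (partition, offset)
        if record_id in self.processed:
            return False  # duplicate delivery: skip the side effect
        self.total += amount
        self.processed.add(record_id)
        return True


consumer = IdempotentConsumer()
# The record at partition 0, offset 1 is delivered twice, as can happen
# when a consumer crashes before its progress is recorded.
deliveries = [(0, 0, 10), (0, 1, 5), (0, 1, 5), (1, 0, 7)]
results = [consumer.handle(*delivery) for delivery in deliveries]
```

In a real service, the set of processed IDs would need to be stored durably and updated together with the side effect; keeping it in memory, as here, only survives for the life of the process.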

When to Use Kafka

  • Event sourcing — store every state change as an immutable event.
  • Stream processing — real-time analytics with Kafka Streams or Flink.
  • Data integration — bridge between microservices, databases, and data lakes.
  • Log aggregation — centralise logs from distributed services.
