Posted on: June 24, 2025 Posted by: rahulgite

What is Apache Kafka?

Apache Kafka is an open-source, distributed event streaming platform. It was originally developed at LinkedIn, open-sourced in 2011, and is now maintained by the Apache Software Foundation. It is primarily used for building real-time data pipelines and streaming applications. Kafka follows a publish-subscribe model built on a durable, replicated log, which makes it highly scalable and fault-tolerant.

Key Characteristics:

  • Asynchronous Communication: Multiple producers can send data to a topic without waiting for consumers, while multiple consumers subscribe and consume data independently and at their own pace.
  • Durability: Messages are persisted to disk and kept for a configurable retention period (time-based or size-based), whether or not they have been consumed, allowing multiple consumers to process them independently.
  • Ordering and Fault Tolerance: Kafka preserves the order of messages within a partition (not across partitions) and replicates each partition across brokers.

Real-World Example:

In an e-commerce application:

  • A payment service produces transaction records.
  • A fraud detection service and a notification service consume the same topic to take appropriate actions (e.g., alerting or flagging transactions).
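
The fan-out in this example can be sketched with a small in-memory model. This is a toy stand-in, not a Kafka client: the ToyTopic class, the subscriber names, and the 5000 threshold are all assumptions made up for illustration. It only shows that each subscriber keeps its own read position, so one service reading a record does not remove it for the other.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a single-partition topic (not a Kafka client)."""

    def __init__(self):
        self.log = []                      # append-only list of records
        self.positions = defaultdict(int)  # next index to read, per subscriber

    def publish(self, record):
        self.log.append(record)

    def poll(self, subscriber):
        """Return records this subscriber has not seen yet and advance its position."""
        start = self.positions[subscriber]
        batch = self.log[start:]
        self.positions[subscriber] = len(self.log)
        return batch


payments = ToyTopic()
payments.publish({"txn_id": 1, "amount": 25.0})
payments.publish({"txn_id": 2, "amount": 9800.0})

# Both services read the full stream; neither removes records for the other.
fraud_batch = payments.poll("fraud-detection")
notify_batch = payments.poll("notification")

# Hypothetical fraud rule: flag anything above 5000.
flagged = [r["txn_id"] for r in fraud_batch if r["amount"] > 5000]
print("flagged:", flagged)
print("notified:", [r["txn_id"] for r in notify_batch])
```

In a real deployment the two services would be separate consumer groups reading the same topic, and the log would live on the brokers rather than in process memory.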

Kafka Architecture Overview

Kafka’s architecture revolves around a clustered environment composed of the following components:

1. Kafka Cluster

  • A Kafka cluster consists of one or more Kafka brokers (servers) that share cluster metadata and are coordinated by a controller.

2. Kafka Broker

  • A broker is responsible for storing data, handling producer and consumer requests, and replicating messages.
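  • For each partition, one broker holding a replica acts as the leader and serves writes, while the brokers holding the other replicas act as followers and copy data from the leader.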

3. Topics

  • A topic is a logical channel to which records are published.
  • Topics are split into multiple partitions to enable scalability.

4. Partitions

  • Each partition is an ordered, immutable, append-only sequence of records.
  • Kafka replicates partitions to ensure data availability.

Diagram: Kafka Topic Partitioning

              +-------------+     +-------------+
Producer -->  | Partition 0 | <-- |   Consumer  |
              +-------------+     +-------------+
Producer -->  | Partition 1 | <-- |   Consumer  |
              +-------------+     +-------------+
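
How a record ends up in a particular partition can be sketched as "hash the key, take it modulo the partition count". The snippet below is a simplified illustration under assumptions: choose_partition is a hypothetical helper, and CRC32 is used only as a convenient stand-in hash (Kafka's own default partitioner uses a murmur2 hash of the key bytes). The point it demonstrates is that records with the same key always land in the same partition, which is what preserves their relative order.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition. CRC32 is a stand-in hash for illustration only."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 2
partitions = {p: [] for p in range(NUM_PARTITIONS)}

events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "add_to_cart"),
    ("user-1", "checkout"),
]

# Append each record to the partition selected by its key.
for key, value in events:
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, value))

for p, records in partitions.items():
    print(f"partition {p}: {records}")
```

Because ordering is only guaranteed within a partition, choosing a key that groups related events (here, a user id) is the usual way to keep those events in sequence.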

5. Offset

  • An offset is a sequential integer that uniquely identifies each message within a partition (offsets are not unique across partitions).
  • Kafka uses offsets to track how much of the log a consumer has read.
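
A minimal sketch of offset tracking, assuming a toy in-memory partition (ToyPartition is a hypothetical class, not part of any Kafka library). It shows a consumer processing two records, remembering the next offset to read, and resuming from that point later.

```python
class ToyPartition:
    """Append-only log; a record's offset is its position in the list."""

    def __init__(self):
        self.records = []

    def append(self, value) -> int:
        self.records.append(value)
        return len(self.records) - 1  # offset assigned to the new record

    def read_from(self, offset: int):
        """Return (offset, value) pairs starting at the given offset."""
        return list(enumerate(self.records))[offset:]


partition = ToyPartition()
for value in ["a", "b", "c", "d"]:
    partition.append(value)

# The committed position is the NEXT offset to read, not the last one processed.
committed = 0
for offset, value in partition.read_from(committed)[:2]:
    committed = offset + 1  # "commit" after processing each record

# After a restart, the consumer resumes from the committed position.
remaining = partition.read_from(committed)
print("committed:", committed)
print("remaining:", remaining)
```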

6. Producers

  • Producers create a ProducerRecord specifying:
    • Topic (mandatory)
    • Message content (mandatory)
    • Partition, Key, Headers (optional)
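
The mandatory and optional fields listed above can be modelled as a small data class. ToyProducerRecord is a hypothetical illustration of the shape of a record, not the ProducerRecord class from the Kafka client library, and its validation rule is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyProducerRecord:
    """Hypothetical model of the fields listed above; not the Kafka client class."""

    topic: str                       # mandatory: destination topic
    value: bytes                     # mandatory: message content
    key: Optional[bytes] = None      # optional: used to pick a partition
    partition: Optional[int] = None  # optional: explicit partition override
    headers: tuple = ()              # optional: (name, value) pairs

    def __post_init__(self):
        if not self.topic:
            raise ValueError("topic is required")


# Only topic and value are needed for a minimal record.
minimal = ToyProducerRecord(topic="payments", value=b'{"txn_id": 1}')

# A fully specified record sets the optional fields as well.
full = ToyProducerRecord(
    topic="payments",
    value=b'{"txn_id": 2}',
    key=b"user-1",
    partition=0,
    headers=(("source", b"checkout"),),
)
print(minimal)
print(full)
```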

7. Consumers

  • Consumers subscribe to topics, process messages, and commit offsets.
  • They may belong to Consumer Groups to distribute load: within a group, each partition is read by only one consumer at a time.
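
Load distribution inside a consumer group can be sketched as dividing partitions among the group's members. The function below is a simplified round-robin scheme written for illustration; assign_round_robin and the consumer names are assumptions, and Kafka's real assignment strategies are more elaborate.

```python
def assign_round_robin(partitions, members):
    """Spread partitions across group members, one owner per partition."""
    if not members:
        raise ValueError("a consumer group needs at least one member")
    ordered = sorted(members)
    assignment = {m: [] for m in ordered}
    for i, p in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(p)
    return assignment


partitions = [0, 1, 2, 3, 4, 5]

# Two consumers share six partitions.
before = assign_round_robin(partitions, ["consumer-a", "consumer-b"])

# A third consumer joins, so the partitions are redistributed.
after = assign_round_robin(partitions, ["consumer-a", "consumer-b", "consumer-c"])

print("before:", before)
print("after: ", after)
```

Since a partition has at most one owner per group, adding more consumers than there are partitions leaves the extra consumers idle.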

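8. ZooKeeper

  • Coordinates Kafka brokers in ZooKeeper-based clusters (Kafka 3.x and earlier)
  • Manages:
    • Controller election (the controller broker then assigns partition leaders)
    • Metadata (topics, partitions)
    • Broker registration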

Why Use Kafka?

1. Scalability

Kafka scales horizontally via more brokers and partitions.

2. Durability

Messages remain on disk after consumption (until the retention period expires), so multiple subscribers can read the same data independently.

3. Real-Time Processing

Kafka enables low-latency streaming through sequential, append-only disk I/O, heavy use of the operating system page cache, and cheap offset-based reads.

4. High Throughput

A well-sized Kafka cluster can handle millions of messages per second, making it suitable for high-traffic scenarios.

5. Retention Policy

Kafka retains data for a configurable period, ensuring availability even during consumer outages.
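
Time-based retention can be sketched as "drop everything older than the retention window, whether or not it was read". The helper below is a toy model: apply_retention and the timestamps are assumptions for illustration, and real Kafka removes whole log segments rather than individual records.

```python
def apply_retention(log, now_ms, retention_ms):
    """Keep only records whose timestamp falls inside the retention window."""
    cutoff = now_ms - retention_ms
    return [(ts, value) for ts, value in log if ts >= cutoff]


# Each record is (timestamp in ms, value).
log = [(1_000, "a"), (5_000, "b"), (9_000, "c")]

# With a 6-second window at t = 10 s, anything written before t = 4 s is dropped.
kept = apply_retention(log, now_ms=10_000, retention_ms=6_000)
print(kept)
```

A consumer that was offline while a record was still inside the window can catch up when it returns; one that stays offline longer than the window will miss the expired records.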

6. Dynamic Configuration

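Topic-level settings (such as retention) can be changed at runtime without restarting brokers, and the number of partitions in a topic can be increased (but not decreased).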

7. Open Source

Kafka is an Apache Software Foundation project with strong community support and wide adoption by companies like LinkedIn, Netflix, Uber, etc.


Role of ZooKeeper in Kafka

In ZooKeeper-based deployments (Kafka 3.x and earlier), ZooKeeper acts as a central coordinator for the Kafka cluster.

Responsibilities:

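  • Leader Election: Elects the controller broker, which in turn chooses a new partition leader when the current leader fails.
  • Broker Registration: Maintains the list of active brokers.
  • Metadata Management: Maintains topic-partition mappings.
  • Consumer Group Management: Older consumer clients stored committed offsets in ZooKeeper. Modern clients store them in the internal __consumer_offsets topic, and group membership is handled by a broker acting as group coordinator.
  • Health Monitoring: Detects failed brokers through expired sessions, which triggers new elections when needed.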
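
The failover decision behind leader election can be sketched as "pick a replica that is both alive and fully caught up". This is a simplified model under assumptions: elect_leader and the broker ids are hypothetical, and the real election involves more state than is shown here.

```python
def elect_leader(replicas, in_sync, live_brokers):
    """Pick the first replica that is both in sync and alive; None if there is none."""
    for broker in replicas:
        if broker in in_sync and broker in live_brokers:
            return broker
    return None


replicas = [1, 2, 3]  # broker ids holding a copy of the partition, in preference order
in_sync = {1, 3}      # replicas that are fully caught up with the leader
live = {1, 2, 3}      # brokers currently registered as healthy

leader = elect_leader(replicas, in_sync, live)

# Broker 1 fails: broker 2 is alive but lagging, so broker 3 takes over.
live.discard(1)
new_leader = elect_leader(replicas, in_sync, live)

print("leader:", leader)
print("new leader:", new_leader)
```

Restricting the choice to in-sync replicas is what keeps a failover from silently losing acknowledged messages.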

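Kafka Without ZooKeeper

Kafka Evolution:

  • Prior to Kafka 2.8.0, ZooKeeper was mandatory.
  • Kafka 2.8.0 introduced KRaft mode (the KIP-500 initiative) as an early-access way to run Kafka without ZooKeeper. It was not yet recommended for production at that point.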

Benefits:

  • Simplified architecture
  • Fewer moving parts
  • Faster controller failover and support for more partitions per cluster

Current Status:

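  • KRaft has been considered production-ready for new clusters since Kafka 3.3.
  • Kafka 4.0, released in March 2025, removed ZooKeeper support entirely and runs only in KRaft mode. Clusters still on ZooKeeper must migrate to KRaft on a 3.x release before upgrading to 4.0.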

Diagram: Kafka Evolution Without ZooKeeper

[Old Architecture]         [New Architecture]
Kafka <--> ZooKeeper   =>   Kafka (self-managed metadata via KRaft)

Summary Table

Component  | Description
-----------+--------------------------------------------------
Producer   | Publishes messages to topics
Consumer   | Subscribes to topics and processes messages
Broker     | Kafka server handling requests and data storage
Topic      | Logical channel for messages
Partition  | Unit of parallelism and ordering within a topic
Offset     | Sequential ID per message within a partition
ZooKeeper  | Cluster coordinator (required before 2.8, optional from 2.8 through 3.x, removed in 4.0)
