Posted on: June 24, 2025 Posted by: rahulgite

Kafka Architecture Overview

Main Components:

  • Producer: Generates and sends messages to Kafka topics.
  • Consumer: Retrieves messages from Kafka topics.
  • Cluster: Composed of multiple brokers for scalability and fault tolerance.
  • Zookeeper: Coordinates Kafka brokers, stores cluster metadata, and supports leader election (replaced by KRaft in newer Kafka versions).

Hierarchical Structure:

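Kafka Cluster
  └── Brokers
        └── Topics
              └── Partitions
                    └── Offsets (each identifies one message)

Note: This is a logical view. A topic's partitions are distributed across brokers rather than contained in a single one.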

Key Terminologies

1. Topic

  • Logical feed name or category to which messages are sent.
  • Analogy: Like a folder where each message is a file.
  • Supports multiple producers and consumers.
  • Messages persist for a configurable retention period (time- or size-based) and are not deleted after consumption.
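
The point that consumption does not delete messages can be sketched with a toy in-memory model. This is an illustration under stated assumptions, not Kafka's implementation: the class `ToyTopic` and its `retention_seconds` parameter are hypothetical names, and in a real cluster time-based retention is controlled by topic settings such as `retention.ms`.

```python
class ToyTopic:
    """Toy model (hypothetical): reading never deletes; only expiry does."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.records = []  # list of (timestamp, value) pairs

    def append(self, timestamp, value):
        self.records.append((timestamp, value))

    def read_all(self):
        # Reading returns the values but leaves the stored records untouched.
        return [value for _, value in self.records]

    def expire(self, now):
        # Drop records that are older than the retention window.
        cutoff = now - self.retention_seconds
        self.records = [(ts, v) for ts, v in self.records if ts >= cutoff]


topic = ToyTopic(retention_seconds=60)
topic.append(0, "order-created")
topic.append(50, "order-paid")

first_read = topic.read_all()
second_read = topic.read_all()  # same data again: reading did not delete it
topic.expire(now=100)           # the record written at t=0 is now too old
after_expiry = topic.read_all()
```

Here both reads return the same two messages, and only the expiry step removes the older one.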

2. Partition

  • Topics are split into partitions (like subfolders).
  • Messages are appended and stored in strict order within a partition; there is no ordering guarantee across partitions.
  • Each message has an offset (sequential ID) that is unique within its partition.
  • Parallel Consumption: Different consumers can read from different partitions.
  • Replication:
    • Implemented at partition level.
    • Each partition has:
      • Leader: Handles all writes and, by default, all reads.
      • Followers: Replicate data from the leader.
    • If the leader fails, an in-sync follower is elected as the new leader.
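
The failover rule above can be sketched as a small toy model. The class `ToyPartitionReplicas` and the broker names are hypothetical, and picking the first in-sync follower is an assumption of this sketch; in a real cluster the election is coordinated by the controller.

```python
class ToyPartitionReplicas:
    """Toy model (hypothetical) of one partition's leader and followers."""

    def __init__(self, leader, followers):
        self.leader = leader
        # Followers that are fully caught up with the leader.
        self.in_sync = list(followers)

    def fail_leader(self):
        # Promote an in-sync follower; without one, the partition is unavailable.
        if not self.in_sync:
            raise RuntimeError("no in-sync replica available")
        failed = self.leader
        self.leader = self.in_sync.pop(0)  # assumption: first in-sync follower wins
        return failed


replicas = ToyPartitionReplicas(leader="broker-1",
                                followers=["broker-2", "broker-3"])
failed = replicas.fail_leader()
new_leader = replicas.leader
```

After the simulated failure, `broker-2` leads the partition and `broker-3` remains as the only follower.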

3. Offset

  • Unique identifier for each message within a partition.
  • Messages are read in sequential order by offset.
  • Helps consumers track what they have read.
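
As a rough illustration of how offsets let a consumer track its progress, the sketch below models a partition as an append-only list. `ToyPartitionLog` and `read_from` are hypothetical names for this example, not part of any Kafka client API.

```python
class ToyPartitionLog:
    """Toy append-only log (hypothetical): the list index acts as the offset."""

    def __init__(self):
        self._records = []

    def append(self, value):
        # The next offset is simply the current length of the log.
        offset = len(self._records)
        self._records.append(value)
        return offset

    def read_from(self, offset, max_records=10):
        # Return (offset, value) pairs in order, starting at the given offset.
        end = min(offset + max_records, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]


log = ToyPartitionLog()
offsets = [log.append(v) for v in ["a", "b", "c"]]

position = 0                                    # the consumer's current offset
batch = log.read_from(position, max_records=2)
position = batch[-1][0] + 1                     # resume after the last record read
rest = log.read_from(position)
```

The consumer only needs to remember `position` to continue where it left off, or to set it back to an earlier offset to re-read messages.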

4. Broker (Kafka Server/Node)

  • Kafka server that handles read/write operations.
  • Stores data to disk.
  • Enables load balancing and fault tolerance through clustering.
  • A single-broker setup cannot replicate data, so it offers no fault tolerance.

5. Kafka Cluster

  • Group of brokers that together host topics and their partitions.
  • Provides scalability, redundancy, and fault tolerance.

Producer Internals

  • Sends data to Kafka topics.
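  • Leader Discovery: Fetches cluster metadata from a broker to identify each partition's leader before sending.
  • Partition Assignment:
    • When a message has a key, the key is hashed to pick the partition, so messages with the same key land in the same partition.
    • When there is no key, the producer spreads messages across partitions.
    • The broker appends each message to the end of the partition and assigns it the next offset.
  • Tip: Avoid using the same key for all messages to prevent partition imbalance.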
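
Key-based partition selection can be sketched as a hash taken modulo the partition count. Note the assumptions: `choose_partition` is a hypothetical helper, and CRC32 is used here only as a stand-in hash, whereas the default partitioner in Kafka's Java client uses murmur2 on the serialized key.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in hash (CRC32); Kafka's Java client uses murmur2 instead.
    return zlib.crc32(key) % num_partitions


num_partitions = 3

# Many distinct keys: each key maps to some partition in range.
keys = [f"user-{i}".encode() for i in range(100)]
spread = {choose_partition(k, num_partitions) for k in keys}

# One key for every message: everything lands in a single partition.
hot = {choose_partition(b"same-key", num_partitions) for _ in range(100)}
```

Because the mapping is deterministic, a single shared key sends every message to one partition, which is the imbalance the tip above warns about.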

Consumer Internals

  • Pulls messages from Kafka topics.
  • Offset Management:
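    • Tracks the read offset per consumer group and partition.
    • Committing offsets after processing helps avoid duplicates or data loss after a restart.
  • Consumer Group:
    • Consumers with the same group ID form a group.
    • Each partition is assigned to exactly one consumer in the group; one consumer may own several partitions, and consumers beyond the partition count stay idle.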
  • Parallel Consumption: Multiple consumers can read different partitions.
  • Pull Model: Consumers actively pull data (Kafka doesn’t push).
  • Resilience: Consumers can reset their offsets to reprocess messages.
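
How partitions are shared inside a consumer group can be sketched with a simple round-robin assignment. `assign_partitions` is a hypothetical helper written for this illustration; Kafka's real assignors (range, round-robin, sticky, cooperative sticky) also handle rebalancing and multiple topics.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin style."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Fewer consumers than partitions: each consumer owns several partitions.
two_consumers = assign_partitions([0, 1, 2, 3], ["c1", "c2"])

# More consumers than partitions: the extra consumers get nothing to read.
five_consumers = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4", "c5"])
```

In the second case `c4` and `c5` end up with empty assignments, which is why adding consumers beyond the partition count does not increase parallelism.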

Zookeeper in Kafka

  • Manages broker metadata and coordinates the Kafka cluster.
  • Functions:
    • Broker registration and de-registration.
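    • Controller election; the elected controller broker then chooses partition leaders.
    • Tracking broker liveness so that failures are detected; producers and consumers learn about changes through broker metadata, not from Zookeeper directly.
  • Requirement: Older Kafka versions cannot function without Zookeeper; KRaft mode removes this dependency, and Kafka 4.0 dropped Zookeeper entirely.
  • Deployment: Should run with an odd number (e.g., 3) of Zookeeper nodes so that a majority quorum survives node failures.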
  • Note: End users don’t interact directly with Zookeeper.
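
The odd-number recommendation follows from majority-quorum arithmetic, which can be worked through in a few lines. The function names below are hypothetical helpers for this calculation.

```python
def quorum_size(nodes: int) -> int:
    # A majority of the ensemble must agree.
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    # Nodes that can fail while a majority is still reachable.
    return nodes - quorum_size(nodes)


table = {n: (quorum_size(n), tolerated_failures(n)) for n in (3, 4, 5)}
# table == {3: (2, 1), 4: (3, 1), 5: (3, 2)}
```

Going from 3 to 4 nodes still tolerates only one failure, so the fourth node adds cost without adding resilience; the next gain comes at 5.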

Illustrative Diagram of Kafka Architecture

           +------------------+
           |     Zookeeper    |
           +------------------+
                    ↑
        +-----------+-----------+
        |           |           |
  +------------+ +------------+ +------------+
  |  Broker 1  | |  Broker 2  | |  Broker 3  |
  +------------+ +------------+ +------------+
       ↑               ↑              ↑
    Topic A        Topic A        Topic B
   Partition 0    Partition 1    Partition 0

Producer ---> Broker (writes to topic partition)
Consumer ---> Broker (reads from topic partition)

Analogies for Understanding

  • Topic as Folder → Messages as Files
  • Partition as Subfolder → Stores files (messages) in order
  • Offset as Line Number → Helps reader (consumer) track progress
  • Broker as Post Office → Receives and stores mail (messages)
  • Producer as Sender → Sends letters (data)
  • Consumer as Receiver → Picks up letters from mailbox (broker)
  • Zookeeper as City Coordinator → Keeps post offices running smoothly

Interview Tips

  • Always describe data flow: Producer → Broker → Consumer.
  • Emphasize offset management for data reliability.
  • Highlight partitioning for scalability.
  • Understand and explain consumer groups and replication clearly.
  • Know the role of Zookeeper and of KRaft mode, which replaces it in modern Kafka versions.

These concepts form the backbone of any Kafka-based distributed messaging architecture and are crucial for interviews in backend or data engineering roles.
