Apache Kafka Architecture and Key Terminologies – Interview Notes

Posted on: June 24, 2025. Posted by: rahulgite

Kafka Architecture Overview

Main Components:

Producer: Generates and sends messages to Kafka topics.
Consumer: Retrieves messages from Kafka topics.
Cluster: Composed of multiple brokers for scalability and fault tolerance.
Zookeeper: In Zookeeper-based deployments, coordinates Kafka brokers, stores cluster metadata, and elects the controller broker (KRaft replaces it in newer versions).

Hierarchical Structure:

Kafka Cluster
  ├── Brokers
        ├── Topics
              ├── Partitions
                    ├── Offsets (each identifies one message)

Key Terminologies

1. Topic

Logical feed name or category to which messages are sent.
Analogy: Like a folder where each message is a file.
Supports multiple producers and consumers.
Messages are retained for a configurable period (e.g., via retention.ms) or up to a size limit; consuming a message does not delete it.
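
The retention behaviour can be sketched in plain Python. This is a toy model under stated assumptions, not Kafka's implementation: `TopicLog`, `retention_ms`, and `purge_expired` are hypothetical names, and a real broker deletes whole log segments rather than individual messages.

```python
from dataclasses import dataclass, field


@dataclass
class TopicLog:
    """Toy model of time-based retention (hypothetical, not a Kafka API)."""
    retention_ms: int
    records: list = field(default_factory=list)  # (timestamp_ms, value) pairs

    def append(self, timestamp_ms, value):
        self.records.append((timestamp_ms, value))

    def read_all(self):
        # Reading never removes anything, so any consumer can re-read.
        return [value for _, value in self.records]

    def purge_expired(self, now_ms):
        # Only age decides what is deleted, not whether anyone consumed it.
        self.records = [(ts, v) for ts, v in self.records
                        if now_ms - ts <= self.retention_ms]


log = TopicLog(retention_ms=1000)
log.append(0, "old")
log.append(900, "new")
first_read = log.read_all()     # both messages are returned
second_read = log.read_all()    # reading again returns the same messages
log.purge_expired(now_ms=1500)  # "old" is 1500 ms old, "new" is 600 ms old
after_purge = log.read_all()    # only "new" is left
```

The point of the sketch is that deletion is driven by `purge_expired`, never by `read_all`.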

2. Partition

Topics are split into partitions (like subfolders).
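Within a partition, messages are appended and stored in strict order; ordering is not guaranteed across the partitions of a topic.
Each message has an offset (sequential ID) that is unique within its partition.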
Parallel Consumption: Different consumers can read from different partitions.
Replication:
- Implemented at partition level.
- Each partition has:
  - Leader: Handles all writes and, by default, all reads.
  - Followers: Replicate data from the leader.
- If the leader fails, an in-sync follower is elected as the new leader.
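
A minimal sketch of leader/follower replication and failover, assuming a toy in-memory model. `ReplicatedPartition`, `in_sync`, and the "lowest broker id wins" rule are illustrative assumptions; in real Kafka, followers fetch asynchronously and the cluster controller picks the new leader.

```python
class ReplicatedPartition:
    """Toy leader/follower model (hypothetical names, not a Kafka API)."""

    def __init__(self, broker_ids):
        self.logs = {b: [] for b in broker_ids}  # one log copy per broker
        self.leader = broker_ids[0]
        self.in_sync = set(broker_ids)           # replicas caught up with leader

    def write(self, message):
        # Producers write to the leader only.
        self.logs[self.leader].append(message)
        # Followers copy the leader's log (simplified to an immediate copy).
        for broker in self.in_sync - {self.leader}:
            self.logs[broker] = list(self.logs[self.leader])

    def fail(self, broker):
        self.in_sync.discard(broker)
        if broker == self.leader:
            if not self.in_sync:
                raise RuntimeError("no in-sync replica left to take over")
            # Promote a surviving in-sync follower (lowest id, for determinism).
            self.leader = min(self.in_sync)


p = ReplicatedPartition([1, 2, 3])
p.write("m0")
p.write("m1")
p.fail(1)                  # the leader's broker goes down
new_leader = p.leader      # a follower has taken over
p.write("m2")              # writes continue on the new leader
leader_log = p.logs[new_leader]
```

Because the promoted follower already held a full copy of the log, no acknowledged message is lost in this model.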

3. Offset

Unique identifier for each message within a partition.
Messages are read in sequential order by offset.
Helps consumers track what they have read.
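
Offsets can be pictured as indexes into an append-only list. The sketch below is a simplified illustration; `PartitionLog` and `PositionTracker` are hypothetical names, not part of any Kafka client.

```python
class PartitionLog:
    """Toy append-only log; the list index plays the role of the offset."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1   # offset assigned to this message

    def read_from(self, offset, max_records=10):
        return self.messages[offset:offset + max_records]


class PositionTracker:
    """Hypothetical consumer-side bookkeeping: the next offset to read."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self, max_records=10):
        batch = self.log.read_from(self.position, max_records)
        self.position += len(batch)     # advance past what was read
        return batch

    def seek(self, offset):
        self.position = offset          # rewind to reprocess, or skip ahead


log = PartitionLog()
offsets = [log.append(m) for m in ["a", "b", "c"]]
reader = PositionTracker(log)
first = reader.poll(max_records=2)   # reads offsets 0 and 1
second = reader.poll(max_records=2)  # reads offset 2
reader.seek(0)                       # reset to the beginning
replayed = reader.poll()             # reads everything again
```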

4. Broker (Kafka Server/Node)

Kafka server that handles read/write operations.
Stores data to disk.
Enables load balancing and fault tolerance through clustering.
A single broker cannot provide replication or fault tolerance, because each replica of a partition must live on a different broker.
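
Why one broker is not enough can be shown with a toy replica placement. `assign_replicas` is a hypothetical helper using simple round-robin placement, which is an assumption for illustration and not Kafka's actual placement algorithm.

```python
def assign_replicas(num_partitions, replication_factor, broker_ids):
    """Toy round-robin replica placement (not Kafka's actual algorithm)."""
    if replication_factor > len(broker_ids):
        # Each replica of a partition must live on a different broker.
        raise ValueError("replication factor exceeds number of brokers")
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [broker_ids[(p + r) % len(broker_ids)]
                         for r in range(replication_factor)]
    return assignment


# Three brokers: every partition gets two replicas on distinct brokers.
three_brokers = assign_replicas(3, 2, [1, 2, 3])

# One broker: a replication factor of 2 cannot be satisfied.
try:
    assign_replicas(3, 2, [1])
    single_broker_ok = True
except ValueError:
    single_broker_ok = False
```

The first replica in each list stands in for the preferred leader, so leadership is also spread across brokers in this model.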

5. Kafka Cluster

Group of brokers that together host the topics and their partitions.
Provides scalability, redundancy, and fault tolerance.

Producer Internals

Sends data to Kafka topics.
Leader Discovery: Fetches cluster metadata from a broker to find each partition's leader, then sends directly to that leader.
Partition Assignment:
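- Hashes the message key to pick a partition when a key is set; messages without a key are spread across partitions.
- The partition leader appends each message to the end of the log and assigns it the next offset.
Tip: Avoid using the same key for all messages; they would all hash to one partition and cause an imbalance.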
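
Key-based partitioning and the hot-key problem can be sketched as follows. `choose_partition` is a hypothetical helper, and `zlib.crc32` is only a stand-in hash chosen because it is in the standard library; Kafka's Java client uses murmur2.

```python
import zlib
from collections import Counter


def choose_partition(key, num_partitions):
    """Toy keyed partitioner: stable hash of the key modulo partition count.

    zlib.crc32 is a stand-in; Kafka's Java client uses murmur2 instead.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


num_partitions = 4

# The same key always lands on the same partition, preserving per-key order.
same_key = {choose_partition("user-42", num_partitions) for _ in range(5)}

# One key for everything: a single partition receives all the traffic.
hot = Counter(choose_partition("constant", num_partitions) for _ in range(100))

# Many distinct keys: traffic is spread over several partitions.
spread = Counter(choose_partition(f"user-{i}", num_partitions) for i in range(100))
```

Comparing `hot` with `spread` shows why key choice matters: the hash is deterministic, so key variety is the only source of balance.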

Consumer Internals

Pulls messages from Kafka topics.
Offset Management:
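- Tracks the next offset to read for each partition; committed offsets are stored per consumer group.
- Lets a restarted consumer resume where it left off, which limits duplicate processing and skipped messages.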
Consumer Group:
- Consumers with same group ID form a group.
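- Each partition is assigned to exactly one consumer within a group; a consumer may own several partitions, and consumers beyond the partition count stay idle.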
Parallel Consumption: Multiple consumers can read different partitions.
Pull Model: Consumers actively pull data (Kafka doesn’t push).
Resilience: Consumers can reset offset to reprocess messages.
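
How partitions are shared inside a consumer group can be sketched with a simplified round-robin strategy. `assign_partitions` is a hypothetical helper; Kafka ships several assignors whose details differ from this toy version.

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin assignment inside one consumer group.

    Each partition goes to exactly one consumer; a consumer may own several.
    """
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Fewer consumers than partitions: each consumer owns more than one partition.
two_consumers = assign_partitions(partitions, ["c1", "c2"])

# More consumers than partitions: the extras get nothing and sit idle.
six_consumers = assign_partitions(
    partitions, ["c1", "c2", "c3", "c4", "c5", "c6"])
```

This is why the partition count caps the useful parallelism of a single group.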

Zookeeper in Kafka

Manages broker metadata and coordinates the Kafka cluster.
Functions:
- Broker registration and de-registration.
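- Controller election; the controller broker then elects partition leaders.
- Failure notifications to the controller and brokers (clients learn of changes through broker metadata).
Requirement: Zookeeper-based Kafka deployments cannot function without Zookeeper; KRaft mode (production-ready since 3.3, and the only mode from 4.0) removes the dependency.
Deployment: Should run with an odd number (e.g., 3) of Zookeeper nodes so that a majority quorum survives node failures.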
Note: End users don’t interact directly with Zookeeper.
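
The preference for an odd ensemble size follows from majority-quorum arithmetic, sketched below. The function names are illustrative.

```python
def quorum_size(nodes):
    """Smallest strict majority of an ensemble."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """Nodes that can fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)


# Maps ensemble size to (quorum size, failures tolerated).
table = {n: (quorum_size(n), tolerated_failures(n)) for n in range(1, 6)}
```

Three and four nodes both tolerate a single failure, so the fourth node adds cost without adding resilience; the next real improvement comes at five.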

Illustrative Diagram of Kafka Architecture

           +------------------+
           |     Zookeeper    |
           +------------------+
                    ↑
        +-----------+-----------+
        |           |           |
  +------------+ +------------+ +------------+
  |  Broker 1  | |  Broker 2  | |  Broker 3  |
  +------------+ +------------+ +------------+
       ↑               ↑              ↑
    Topic A        Topic A        Topic B
   Partition 0    Partition 1    Partition 0

Producer ---> Broker (writes to topic partition)
Consumer ---> Broker (reads from topic partition)

Analogies for Understanding

Topic as Folder → Messages as Files
Partition as Subfolder → Stores files (messages) in order
Offset as Line Number → Helps reader (consumer) track progress
Broker as Post Office → Receives and stores mail (messages)
Producer as Sender → Sends letters (data)
Consumer as Receiver → Picks up letters from mailbox (broker)
Zookeeper as City Coordinator → Keeps post offices running smoothly

Interview Tips

Always describe data flow: Producer → Broker → Consumer.
Emphasize offset management for data reliability.
Highlight partitioning for scalability.
Understand and explain consumer groups and replication clearly.
Know the role of Zookeeper and of KRaft mode, which replaces it in modern Kafka versions.

These concepts form the backbone of any Kafka-based distributed messaging architecture and are crucial for interviews in backend or data engineering roles.
