1. Kafka Fault Tolerance: Why is Kafka Fault-Tolerant?
Kafka is fault-tolerant due to:
- Replication: Each topic partition can be configured with multiple replicas spread across brokers. If the broker hosting the leader replica fails, an in-sync follower is elected as the new leader.
- Acknowledgements: With acks=all, the producer treats a message as written only after all in-sync replicas have received it. The topic setting min.insync.replicas controls how many replicas must be in sync for the write to be accepted.
- Log Segmentation and Disk-Based Storage: Kafka writes messages to segmented log files on disk, allowing recovery after crashes or restarts.
- ZooKeeper Coordination (in older versions): Manages broker metadata and leader election.
Example: In a 3-broker cluster with replication factor 3, if Broker 1 goes down, a replica on Broker 2 or 3 becomes the leader for Broker 1's partitions, so messages stay readable and writable.
Diagram:
Producer → Topic Partition (Leader) → Broker 1
↘ Replica → Broker 2
↘ Replica → Broker 3
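The failover in the 3-broker example can be sketched as a toy model. This is not the Kafka client API: the `Partition` and `Replica` classes, the `min_insync_replicas` value of 2, and the rule that every live replica counts as in sync are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    broker_id: int
    alive: bool = True
    log: list = field(default_factory=list)


class Partition:
    """Toy model of one replicated partition (hypothetical, not Kafka's API)."""

    def __init__(self, broker_ids, min_insync_replicas=2):
        self.replicas = [Replica(b) for b in broker_ids]
        self.leader = self.replicas[0]
        self.min_insync_replicas = min_insync_replicas

    def in_sync(self):
        # Simplification: every live replica counts as in sync.
        return [r for r in self.replicas if r.alive]

    def produce_acks_all(self, message):
        """Append to every in-sync replica, or reject if too few remain."""
        isr = self.in_sync()
        if len(isr) < self.min_insync_replicas:
            raise RuntimeError("not enough in-sync replicas")
        for replica in isr:
            replica.log.append(message)
        return len(self.leader.log) - 1  # offset of the new message

    def fail_broker(self, broker_id):
        """Mark a broker as down and pick a new leader if it held leadership."""
        for replica in self.replicas:
            if replica.broker_id == broker_id:
                replica.alive = False
        if not self.leader.alive:
            survivors = self.in_sync()
            if not survivors:
                raise RuntimeError("partition offline")
            self.leader = survivors[0]


partition = Partition(broker_ids=[1, 2, 3])
partition.produce_acks_all("order-1")
partition.fail_broker(1)                # Broker 1 goes down
partition.produce_acks_all("order-2")   # still accepted: two replicas remain in sync
print(partition.leader.broker_id, partition.leader.log)
# 2 ['order-1', 'order-2']
```

If a second broker fails in this model, the write is rejected instead of being stored on a single replica, which is the trade-off that a minimum in-sync replica count expresses.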
2. Kafka vs. Other Messaging Systems (ActiveMQ, RabbitMQ)
| Feature | Kafka | ActiveMQ | RabbitMQ |
|---|---|---|---|
| Creator | LinkedIn (Apache) | LogicBlaze (Apache) | Pivotal Software |
| Language | Java, Scala | Java | Erlang |
| Messaging Type | Pull (dumb broker, smart consumer) | Push | Push |
| Throughput | Very High (~1M msg/sec) | Moderate | Moderate |
| Message Retention | Configurable retention (e.g., 7 days) | Deleted after consumption | Deleted after consumption |
| Order Guarantee | Within partition | Within a queue (single consumer) | Within a queue (single consumer) |
| Scalability | High (horizontal scaling) | Limited | Limited |
| Durability | High (disk storage) | Medium | Medium |
| Use Case | Logging, analytics, stream processing | Lightweight queues | Job queues, RPC |
Interview Tip: Emphasize Kafka’s scalability, retention, and message replay features.
Example: Kafka is used at Netflix and LinkedIn for massive-scale real-time analytics, while RabbitMQ is often used to decouple lightweight microservices.
3. Kafka Message Flow and Debugging
When something breaks, how do you track the message?
- Use Consumer Offsets: Check committed offsets to see if a consumer has processed the message.
- Monitor Lag: Kafka consumer lag monitoring tools (like Burrow, Kafka Exporter) help determine how far behind a consumer is.
- Dead Letter Queue (DLQ): Capture failed messages for further analysis.
- Logging with Correlation IDs: Helps trace a message across systems.
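The dead letter queue and correlation ID ideas can be combined in a small sketch. It uses plain Python lists rather than real topics, and `process_with_dlq`, `charge`, the message shape, and the three-attempt limit are hypothetical names and values chosen for illustration.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("consumer")


def process_with_dlq(messages, handler, dead_letters, max_attempts=3):
    """Run handler on each message; park repeated failures in dead_letters."""
    for message in messages:
        correlation_id = message["correlation_id"]
        last_error = None
        for attempt in range(1, max_attempts + 1):
            try:
                handler(message)
                log.info("processed correlation_id=%s", correlation_id)
                break
            except Exception as exc:
                last_error = str(exc)
                log.warning("attempt %d failed correlation_id=%s: %s",
                            attempt, correlation_id, last_error)
        else:
            # Every attempt failed: keep the message and its last error for analysis.
            dead_letters.append({"message": message, "error": last_error})


def charge(message):
    """Hypothetical handler that rejects negative amounts."""
    if message["payload"] < 0:
        raise ValueError("negative amount")


messages = [
    {"correlation_id": "req-1", "payload": 10},
    {"correlation_id": "req-2", "payload": -5},
]
dead_letters = []
process_with_dlq(messages, charge, dead_letters)
print([d["message"]["correlation_id"] for d in dead_letters])  # ['req-2']
```

Because every log line carries the correlation ID, the failed message can be followed from the warning entries to its dead-letter record.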
Example:
If a consumer isn’t receiving data:
- Compare the consumer group's committed offset with the partition's log end offset; the difference is the lag.
- Validate topic-partition assignment.
- Review logs with correlation IDs.
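The offset comparison in the first check amounts to simple arithmetic per partition. The sketch below assumes the offsets have already been fetched (for example, from a monitoring tool); `consumer_lag`, the sample offsets, and the threshold of 500 are illustrative, and treating a partition with no commit as fully unread is a simplification.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Return per-partition lag: log end offset minus committed offset."""
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        # Simplification: a partition with no committed offset counts as fully unread.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end_offset - committed, 0)
    return lag


log_end = {0: 1500, 1: 1480, 2: 1510}
committed = {0: 1500, 1: 900}           # partition 2 has no commit yet
lag = consumer_lag(log_end, committed)
print(lag)                               # {0: 0, 1: 580, 2: 1510}

stalled = [p for p, n in lag.items() if n > 500]
print(stalled)                           # [1, 2]
```

A lag that keeps growing on one partition while others stay at zero usually points to a stuck or missing consumer for that partition rather than a producer problem.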
4. Kafka Message Visibility and Management
With multiple subscribers, how do you track messages?
- Kafka does not delete messages after consumption – consumers manage their own offsets.
- Logging & Tracing: Include a unique message ID or correlation ID in each message.
- Consumer Group IDs: Each group gets its own offset tracking.
- Kafka Monitoring Tools:
  - Confluent Control Center
  - Burrow
  - Kafka Tool
Analogy: Think of Kafka like a recorded TV show: the recording (messages) is available for everyone (consumers) to watch at their own pace.
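The recorded-show analogy can be made concrete with a toy single-partition log in which each consumer group keeps its own offset. `TopicLog` and the group names are hypothetical; this models the idea, not the Kafka API.

```python
class TopicLog:
    """Toy single-partition log with per-group offsets (not the Kafka API)."""

    def __init__(self):
        self.messages = []
        self.group_offsets = {}   # group_id -> next offset to read

    def append(self, message):
        self.messages.append(message)

    def poll(self, group_id, max_records=10):
        """Return the next records for a group and advance only that group's offset."""
        start = self.group_offsets.get(group_id, 0)
        records = self.messages[start:start + max_records]
        self.group_offsets[group_id] = start + len(records)
        return records


topic = TopicLog()
for i in range(5):
    topic.append(f"event-{i}")

print(topic.poll("billing", max_records=2))     # ['event-0', 'event-1']
print(len(topic.poll("analytics", max_records=5)))  # 5
print(topic.poll("billing", max_records=2))     # ['event-2', 'event-3']
print(len(topic.messages))                      # 5: reading deletes nothing
```

Each group progresses independently, so a slow group never blocks a fast one, and a new group can start from the beginning of the retained log.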
5. Kafka Evolution and Capabilities
- Originally developed at LinkedIn and open-sourced in 2011 as a log-based publish-subscribe messaging system.
- Later donated to the Apache Software Foundation; it has since evolved into a full-fledged streaming platform.
- Supports:
  - High-throughput ingestion
  - Long-term message retention
  - Stream processing using Kafka Streams API and ksqlDB
Kafka is not just a queue – it’s a durable distributed log.
6. Kafka Architecture Overview (Diagram)
Key Components:
- Producer: Sends data to Kafka topics
- Broker: Kafka server that stores data
- Topic: Logical channel for messages
- Partition: Subdivision of topics (for parallelism)
- Consumer: Reads data from topics
- Consumer Group: Set of consumers sharing a topic’s partitions
- ZooKeeper (for older versions): Coordinates metadata and leader election
Producer → Topic (Partition 0, 1, 2) → Broker Cluster
↘ Consumer Group A
↘ Consumer Group B
7. Additional Interview Questions & Answers
Q1. How does Kafka achieve scalability?
A: Through partitioning of topics and distributing them across multiple brokers. Producers and consumers can work in parallel over partitions.
Q2. How is message order maintained in Kafka?
A: Kafka guarantees order only within a partition. Messages with the same key are routed to the same partition, so give related messages the same key to keep them in order.
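A short sketch shows why a shared key preserves order. Kafka's default partitioner hashes the key with murmur2; `zlib.crc32` is used here only as a stable stand-in, and `partition_for` and the sample events are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (crc32 as a stand-in for murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("user-42", "login"),
    ("user-7", "login"),
    ("user-42", "add_to_cart"),
    ("user-42", "checkout"),
]

partitions = {p: [] for p in range(3)}
for key, value in events:
    partitions[partition_for(key, 3)].append((key, value))

# All user-42 events land in one partition, so their relative order is kept.
user_42 = [v for records in partitions.values() for k, v in records if k == "user-42"]
print(user_42)  # ['login', 'add_to_cart', 'checkout']
```

Note that the mapping depends on the partition count, so adding partitions to a topic can send a key's future messages to a different partition.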
Q3. What happens if a Kafka consumer fails?
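A: The group rebalances and Kafka reassigns the failed consumer's partitions to the remaining consumers in the group. The new owner resumes from the last committed offset, so messages processed after that commit may be processed again.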
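The reassignment can be sketched with a simple round-robin strategy. This is one possible assignment scheme, not necessarily the one a given cluster uses; `assign`, the consumer names, and the committed offsets are made up for illustration.

```python
def assign(partitions, consumers):
    """Spread partitions over consumers round-robin (one simple strategy)."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
committed = {0: 120, 1: 98, 2: 143, 3: 77}   # last committed offset per partition

before = assign(partitions, ["c1", "c2"])
print(before)        # {'c1': [0, 2], 'c2': [1, 3]}

# c2 fails: the group rebalances and c1 takes over its partitions.
after = assign(partitions, ["c1"])
resume_from = {p: committed[p] for p in after["c1"]}
print(resume_from)   # {0: 120, 1: 98, 2: 143, 3: 77}
```

The surviving consumer starts each inherited partition at its committed offset, which is why commit frequency determines how much work may be repeated after a failure.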
Q4. What is a retention policy in Kafka?
A: Kafka allows messages to be retained for a configurable duration (e.g., 7 days) or by size. This enables message replay.
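A simplified sketch of time-based and size-based retention follows. Real brokers delete whole log segments and the exact rules differ in detail; `apply_retention`, the `(timestamp, size)` segment tuples, and the sample numbers are assumptions for illustration.

```python
DAY = 24 * 60 * 60


def apply_retention(segments, now, retention_seconds=None, retention_bytes=None):
    """Drop the oldest segments that exceed the time or size limit.

    Each segment is (last_timestamp_seconds, size_bytes), ordered oldest first.
    """
    kept = list(segments)
    if retention_seconds is not None:
        kept = [s for s in kept if now - s[0] <= retention_seconds]
    if retention_bytes is not None:
        # Simplified policy: trim from the oldest end until under the size limit.
        while kept and sum(size for _, size in kept) > retention_bytes:
            kept.pop(0)
    return kept


segments = [(1 * DAY, 100), (5 * DAY, 100), (9 * DAY, 100)]
now = 10 * DAY

by_time = apply_retention(segments, now, retention_seconds=7 * DAY)
print(len(by_time))   # 2: the 9-day-old segment is dropped

by_size = apply_retention(segments, now, retention_bytes=150)
print(len(by_size))   # 1: only the newest segment fits
```

Anything still retained can be replayed by resetting a consumer group's offset to an earlier position.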
Q5. What is idempotency in Kafka producer?
A: Setting enable.idempotence=true on the producer lets the broker detect and discard duplicates caused by retries, so a retried send is written to the partition only once.
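The deduplication idea can be sketched with sequence numbers. This is a toy model: the real broker tracks sequences per producer ID, epoch, and partition, whereas the hypothetical `BrokerLog` below keeps one counter per producer.

```python
class BrokerLog:
    """Toy broker that ignores retried sends by (producer_id, sequence)."""

    def __init__(self):
        self.records = []
        self.last_sequence = {}   # producer_id -> highest sequence appended

    def append(self, producer_id, sequence, value):
        """Append once; a repeated sequence number is acknowledged but not rewritten."""
        if sequence <= self.last_sequence.get(producer_id, -1):
            return "duplicate"
        self.records.append(value)
        self.last_sequence[producer_id] = sequence
        return "appended"


broker = BrokerLog()
print(broker.append("producer-1", 0, "payment-100"))  # appended
# The acknowledgement is lost, so the producer retries the same sequence number.
print(broker.append("producer-1", 0, "payment-100"))  # duplicate
print(broker.append("producer-1", 1, "payment-101"))  # appended
print(broker.records)  # ['payment-100', 'payment-101']
```

Without the sequence check, the retried send would appear twice in the log even though the producer intended to write it once.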
Q6. What is a Kafka topic?
A: A logical channel to which producers send data and consumers subscribe to read from.
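Q7. What’s the role of ZooKeeper in Kafka?
A: In ZooKeeper-based deployments, ZooKeeper manages cluster metadata and leader election. Kafka 2.8 introduced KRaft (Kafka’s built-in Raft-based consensus mechanism) as an early-access replacement, and Kafka 4.0 removed ZooKeeper support entirely.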
8. Summary Table for Quick Recap
| Feature | Kafka | ActiveMQ | RabbitMQ |
|---|---|---|---|
| Message Retention | Configurable | Deleted after consume | Deleted after consume |
| Pull/Push | Pull | Push | Push |
| Message Order | Guaranteed in partition | Per queue (single consumer) | Per queue (single consumer) |
| Durability | High | Medium | Medium |
| Scalability | High | Low | Low |
| Language | Java/Scala | Java | Erlang |
| Throughput | Very High | Moderate | Moderate |