1. Basic Kafka Concepts
What is Apache Kafka?
Apache Kafka is an open-source distributed event streaming platform used to build real-time data pipelines and streaming applications. It’s designed for high-throughput, fault-tolerant, and scalable messaging.
Example Use Case:
- E-commerce: Track real-time user activity and update recommendation systems.
- Banking: Stream transaction logs for fraud detection.
Kafka Topics
A topic is a category/feed name to which records are published. Topics are always multi-subscriber.
Example:
- A topic named
user-signup-eventscould store messages whenever a user signs up.
Diagram:
Producer -> [Kafka Topic: user-signup-events] -> Consumer
Kafka Partition
Each topic is split into partitions, which are ordered, immutable sequences of records.
Key Points:
- Each partition is an append-only log.
- Partitions enable parallelism.
Diagram:
Topic: user-activity └── Partition 0: [msg1, msg2, msg3] └── Partition 1: [msg4, msg5, msg6]
Offset
Each message in a partition is assigned a unique offset, which acts as a pointer.
Example:
Partition 0: [0: Login, 1: Logout, 2: Purchase]
- Consumers use offsets to keep track of read messages.
Key Components of Kafka
- Producer: Sends records to Kafka topics.
- Consumer: Reads records from topics.
- Topic: Logical channel to organize data.
- Partition: Enables load distribution.
- Broker: Kafka server that stores data.
- ZooKeeper: Coordinates brokers in the cluster.
Kafka Broker
A Kafka broker is a server that receives messages from producers, stores them on disk, and serves them to consumers.
Example:
Cluster with 3 brokers can support distributed topics for higher availability.
Consumer Group
A consumer group is a group of consumers that together read messages from a topic in parallel.
Example:
Consumer Group A: ├── Consumer 1 -> Partition 0 └── Consumer 2 -> Partition 1
2. Kafka Architecture
Kafka Architecture Overview
Kafka uses a publish-subscribe model where producers publish messages to topics, and consumers subscribe to those topics.
Diagram:
[Producer] -> [Kafka Broker] -> [Partitioned Topics] -> [Consumers]
Components Recap:
- Topics
- Producers
- Consumers
- Brokers
- ZooKeeper
Kafka Communication
Kafka uses TCP for communication.
- Producers push data to brokers.
- Consumers pull data from brokers.
Serialization: Avro, JSON, Protobuf for messages.
Kafka Cluster Benefits
- Horizontal scalability
- High availability
- Load balancing
- Fault tolerance through partition replication
3. Kafka Functionality
Fault Tolerance in Kafka
- Achieved via replication.
- Each partition has multiple replicas across brokers.
High Availability
- Replication
- Leader election
- Brokers can be added without downtime
Diagram:
Partition 0 (Leader: Broker 1, Replica: Broker 2)
Exactly-Once Semantics
Kafka 0.11+ provides exactly-once delivery using:
- Idempotent producer
- Transactions API
Message Ordering
- Kafka guarantees ordering within a partition.
Example:
Partition 0: [1: Start, 2: Process, 3: End]
Delivery Semantics
- At Most Once: Message may be lost
- At Least Once: Message may be duplicated
- Exactly Once: No loss or duplication
Message Timestamps
- CreateTime: Time at which producer created the message.
- LogAppendTime: Time at which message was appended by the broker.
4. Kafka and ZooKeeper
Role of ZooKeeper
- Manages broker metadata
- Handles leader election
- Watches broker availability
Diagram:
[ZooKeeper Cluster] <-> [Kafka Brokers]
Kafka Without ZooKeeper
- Until Kafka 2.8, ZooKeeper is mandatory.
- Kafka 2.8+ introduces KRaft mode, removing ZooKeeper dependency.
5. Kafka with Spring Boot
Integration Steps
- Add Kafka dependencies
<dependency> <groupId>org.springframework.kafka</groupId> <artifactId>spring-kafka</artifactId> </dependency>
- Configuration (application.yml):
spring:
kafka:
bootstrap-servers: localhost:9092
consumer:
group-id: myGroup
- Producer Example:
@Autowired
private KafkaTemplate<String, String> kafkaTemplate;
kafkaTemplate.send("topic-name", "Hello Kafka");
- Consumer Example:
@KafkaListener(topics = "topic-name")
public void listen(String message) {
System.out.println("Received: " + message);
}