Apache Kafka is a distributed event-streaming platform designed for high-throughput, fault-tolerant, real-time data streams. It enables communication between producers (data publishers) and consumers (data subscribers).
Kafka Architecture
The Kafka architecture comprises the following key components:
– Producers
- Producers are applications or systems that publish messages to Kafka topics.
- They write data to specific topics and can configure data partitioning based on keys.
– Topics
- Topics are logical channels where data is organized and stored.
- Each topic can have multiple partitions, which are distributed across brokers for scalability.
– Partitions
- Each topic is divided into partitions, which store subsets of messages.
- Messages within a partition are ordered, but there is no ordering guarantee across partitions.
– Brokers
- Brokers are Kafka servers that store data and handle client requests.
- Kafka clusters consist of multiple brokers, each identified by a unique ID.
– Consumers
- Consumers subscribe to topics and read data from partitions.
- Kafka ensures that messages are delivered to consumers in the order they are written within a partition.
– Consumer Groups
- Consumers belong to consumer groups to enable parallel data processing.
- Each message is delivered to one consumer per group, ensuring balanced load distribution.
– ZooKeeper (Deprecated in modern Kafka setups)
- ZooKeeper manages metadata for the Kafka cluster, including broker information and topic configurations.
- Modern Kafka uses the Kafka Raft protocol (KRaft) to remove the dependency on ZooKeeper.
– Replication
- Kafka replicates partitions across brokers for fault tolerance.
- One replica acts as the leader, and others are followers.
- The leader handles all read and write requests by default; followers replicate its data and can take over if the leader fails.
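To make the producer/partition relationship above concrete, here is a simplified sketch of keyed partitioning: messages with the same key always land in the same partition, which is what preserves per-key ordering. Kafka's default partitioner actually uses a murmur2 hash of the serialized key; the `hashCode`-based function and class name here are illustrative stand-ins.

```java
// Simplified illustration of keyed partitioning (not Kafka's real partitioner,
// which hashes the serialized key with murmur2).
public class PartitionSketch {

    // Same key -> same partition, so ordering per key is preserved.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int p1 = partitionFor("order-42", 3);
        int p2 = partitionFor("order-42", 3);
        System.out.println("order-42 -> partition " + p1 + " (stable: " + (p1 == p2) + ")");
    }
}
```

Because the mapping is deterministic, all events for one entity (e.g., one order ID) stay in one partition and are consumed in the order they were written.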
Kafka Offsets and Internal Working
Offsets
- Kafka uses offsets to track the position of messages in a partition.
- Each message within a partition has a unique offset.
- Consumers use offsets to track which messages have been read.
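A minimal way to picture offsets is to model a partition as an append-only list, where a message's offset is simply its position, assigned at append time and never reused. The class below is an illustrative model, not Kafka code.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model of a partition as an append-only log: the offset of a message
// is its position in the log. (Illustrative model, not Kafka's implementation.)
public class PartitionLog {
    private final List<String> messages = new ArrayList<>();

    // Appends a message and returns the offset it was assigned.
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Reads the message stored at a given offset.
    String read(long offset) {
        return messages.get((int) offset);
    }
}
```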
How Offsets Work
- When a consumer reads a message, it commits the offset to indicate it has processed the message.
- Offsets can be committed manually or automatically using consumer configurations.
- Stored offsets allow consumers to resume reading from where they left off in case of a failure.
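The read-then-commit cycle described above can be sketched with a small in-memory model (no broker involved; class and method names are illustrative): a consumer processes records starting at the last committed offset and advances the commit as it goes, so a restarted consumer resumes exactly where the previous one left off.

```java
import java.util.List;

// Simplified model of offset commits: read from the committed offset, process,
// commit the new position. A real consumer would call commitSync()/commitAsync().
public class OffsetModel {
    private long committedOffset = 0; // stands in for the broker-side committed offset

    // Processes up to maxRecords from the partition log starting at the
    // committed offset, then commits the new position. Returns the count processed.
    public int pollAndCommit(List<String> partitionLog, int maxRecords) {
        int processed = 0;
        while (processed < maxRecords && committedOffset < partitionLog.size()) {
            partitionLog.get((int) committedOffset); // "process" the record
            committedOffset++;
            processed++;
        }
        return processed;
    }

    public long committedOffset() {
        return committedOffset;
    }
}
```

A new `OffsetModel` given the same committed offset would continue at the next unread record, which mirrors how stored offsets let consumers resume after a failure.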
Key Offset Management Concepts
- Auto-Offset Reset: Configures where a consumer starts reading when there are no committed offsets (e.g., earliest, latest).
- Offset Commit: Determines whether offsets are committed automatically or explicitly.
- Rebalancing: When consumers join or leave a group, partitions are reassigned, and offsets ensure no data is lost.
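Rebalancing can be pictured as rerunning a partition-assignment function whenever the group membership changes. The sketch below spreads partitions round-robin across consumers (similar in spirit to Kafka's RoundRobinAssignor; the class and method names are illustrative): with fewer consumers, each one simply owns more partitions.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of spreading partitions across a consumer group.
// Rerunning assign() with a new group size is the essence of a rebalance.
public class AssignmentSketch {

    // Returns, for each consumer index, the list of partition ids it owns.
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> assignment = new ArrayList<>();
        for (int c = 0; c < numConsumers; c++) {
            assignment.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(p % numConsumers).add(p); // round-robin spread
        }
        return assignment;
    }
}
```

After a rebalance, each consumer seeks to the committed offset of its newly assigned partitions, which is why committed offsets prevent data loss across membership changes.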
Internal Working of Kafka
- Producer Workflow:
Producers send data to a topic. Data is partitioned using a key or a round-robin algorithm. Kafka stores the data in partitions on brokers.
- Broker Workflow:
Brokers handle incoming messages and store them in partitions. Replication ensures fault tolerance by storing copies of data across brokers.
- Consumer Workflow:
Consumers poll messages from brokers based on offsets. Consumer groups enable load balancing across partitions.
- Retention Policy:
Kafka retains messages for a configured period, regardless of whether they are consumed. This supports replaying data for analytics or debugging.
- Data Durability:
Kafka writes messages to disk and replicates them across brokers. Follower replicas synchronize data from the leader.
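The retention policy above can be sketched as a time-bounded log: the broker keeps every message for a configured period and discards older ones regardless of whether they were consumed. Real brokers delete whole log segments rather than individual records; the names below are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified sketch of time-based retention. (Real Kafka deletes whole log
// segments, not single records; this is a per-record illustration.)
public class RetentionSketch {
    record Entry(long timestampMs, String value) {}

    private final Deque<Entry> log = new ArrayDeque<>();
    private final long retentionMs;

    RetentionSketch(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    void append(long nowMs, String value) {
        log.addLast(new Entry(nowMs, value));
    }

    // Drops everything older than the retention window; returns remaining count.
    int enforceRetention(long nowMs) {
        while (!log.isEmpty() && nowMs - log.peekFirst().timestampMs > retentionMs) {
            log.removeFirst();
        }
        return log.size();
    }
}
```

Because retention is time-based rather than consumption-based, a consumer can rewind its offset and replay any message still inside the window.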
Kafka Integration with Spring Boot
Spring Boot simplifies Kafka integration by providing libraries for producer and consumer configuration, message serialization, and topic management.
Steps to Integrate Kafka in Spring Boot
- Add Kafka Dependencies
Add the Spring for Apache Kafka dependency to your pom.xml or build.gradle (the artifact is spring-kafka, under the org.springframework.kafka group):
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
- Configure Kafka Properties
Add Kafka configurations to application.yml or application.properties:
spring:
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
    consumer:
      group-id: my-group
      auto-offset-reset: earliest
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
- Create Kafka Producer Define a producer to send messages:
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class KafkaProducer {

    private final KafkaTemplate<String, String> kafkaTemplate;

    @Autowired
    public KafkaProducer(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Publishes a message to the given topic; Kafka chooses the partition.
    public void sendMessage(String topic, String message) {
        kafkaTemplate.send(topic, message);
    }
}
- Create Kafka Consumer Define a consumer to receive messages:
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class KafkaConsumer {

    // Invoked for each message on "my-topic" consumed by the "my-group" group.
    @KafkaListener(topics = "my-topic", groupId = "my-group")
    public void listen(String message) {
        System.out.println("Received message: " + message);
    }
}
- Run and Test
- Start a Kafka cluster.
- Run the Spring Boot application.
- Use the producer to send messages and observe the consumer processing them.
Real-World Use Cases of Kafka with Spring Boot
- Event-Driven Microservices
Use Kafka for inter-service communication. Example: order management systems where services publish and consume order events.
- Real-Time Analytics
Stream data from Kafka to analytics engines. Example: monitoring website activity in real time.
- Log Aggregation
Collect and centralize application logs for monitoring and debugging.
- Data Pipelines
Transfer data between systems (e.g., from databases to data lakes).
Advantages of Using Kafka with Spring Boot
- Scalability: Handles high throughput with distributed architecture.
- Fault Tolerance: Ensures data durability with replication.
- Flexibility: Supports multiple consumers and real-time streaming.
- Integration: Spring Boot provides a seamless way to work with Kafka.
This comprehensive explanation provides an in-depth understanding of Kafka’s architecture, internal working, offsets, and integration with Spring Boot.