Apache Kafka is a distributed event-streaming platform designed for high-throughput, fault-tolerant, real-time data streams. It enables communication between producers (data publishers) and consumers (data subscribers).
Kafka Architecture
The Kafka architecture comprises the following key components:
– Producers
- Producers are applications or systems that publish messages to Kafka topics.
- They write data to specific topics and can configure data partitioning based on keys.
– Topics
- Topics are logical channels where data is organized and stored.
- Each topic can have multiple partitions, which are distributed across brokers for scalability.
– Partitions
- Each topic is divided into partitions, which store subsets of messages.
- Messages within a partition are ordered, but there is no ordering guarantee across partitions.
– Brokers
- Brokers are Kafka servers that store data and handle client requests.
- Kafka clusters consist of multiple brokers, each identified by a unique ID.
– Consumers
- Consumers subscribe to topics and read data from partitions.
- Kafka ensures that messages are delivered to consumers in the order they are written within a partition.
– Consumer Groups
- Consumers belong to consumer groups to enable parallel data processing.
- Each message is delivered to one consumer per group, ensuring balanced load distribution.
– ZooKeeper (Deprecated in modern Kafka setups)
- ZooKeeper manages metadata for the Kafka cluster, including broker information and topic configurations.
- Modern Kafka uses the Kafka Raft protocol (KRaft) to remove the dependency on ZooKeeper.
– Replication
- Kafka replicates partitions across brokers for fault tolerance.
- One replica acts as the leader, and others are followers.
- The leader handles all read and write requests by default; followers replicate its data and can take over if the leader fails.
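To make the producer/partition relationship above concrete, here is a simplified sketch of keyed partitioning: messages with the same key always land in the same partition, which is what preserves per-key ordering. Kafka's default partitioner actually uses a murmur2 hash of the serialized key; the `hashCode`-based function and class name here are illustrative stand-ins.

```java
// Simplified illustration of keyed partitioning (not Kafka's real partitioner,
// which hashes the serialized key with murmur2).
public class PartitionSketch {

    // Same key -> same partition, so ordering per key is preserved.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int p1 = partitionFor("order-42", 3);
        int p2 = partitionFor("order-42", 3);
        System.out.println("order-42 -> partition " + p1 + " (stable: " + (p1 == p2) + ")");
    }
}
```

Because the mapping is deterministic, all events for one entity (e.g., one order ID) stay in one partition and are consumed in the order they were written.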
Kafka Offsets and Internal Working
Offsets
- Kafka uses offsets to track the position of messages in a partition.
- Each message within a partition has a unique offset.
- Consumers use offsets to track which messages have been read.
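A minimal way to picture offsets is to model a partition as an append-only list, where a message's offset is simply its position, assigned at append time and never reused. The class below is an illustrative model, not Kafka code.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model of a partition as an append-only log: the offset of a message
// is its position in the log. (Illustrative model, not Kafka's implementation.)
public class PartitionLog {
    private final List<String> messages = new ArrayList<>();

    // Appends a message and returns the offset it was assigned.
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Reads the message stored at a given offset.
    String read(long offset) {
        return messages.get((int) offset);
    }
}
```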
How Offsets Work
- When a consumer reads a message, it commits the offset to indicate it has processed the message.
- Offsets can be committed manually or automatically using consumer configurations.
- Stored offsets allow consumers to resume reading from where they left off in case of a failure.
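The read-then-commit cycle described above can be sketched with a small in-memory model (no broker involved; class and method names are illustrative): a consumer processes records starting at the last committed offset and advances the commit as it goes, so a restarted consumer resumes exactly where the previous one left off.

```java
import java.util.List;

// Simplified model of offset commits: read from the committed offset, process,
// commit the new position. A real consumer would call commitSync()/commitAsync().
public class OffsetModel {
    private long committedOffset = 0; // stands in for the broker-side committed offset

    // Processes up to maxRecords from the partition log starting at the
    // committed offset, then commits the new position. Returns the count processed.
    public int pollAndCommit(List<String> partitionLog, int maxRecords) {
        int processed = 0;
        while (processed < maxRecords && committedOffset < partitionLog.size()) {
            partitionLog.get((int) committedOffset); // "process" the record
            committedOffset++;
            processed++;
        }
        return processed;
    }

    public long committedOffset() {
        return committedOffset;
    }
}
```

A new `OffsetModel` given the same committed offset would continue at the next unread record, which mirrors how stored offsets let consumers resume after a failure.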
Key Offset Management Concepts
- Auto-Offset Reset: Configures where a consumer starts reading when there are no committed offsets (e.g., earliest, latest).
- Offset Commit: Determines whether offsets are committed automatically or explicitly.
- Rebalancing: When consumers join or leave a group, partitions are reassigned, and offsets ensure no data is lost.
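Rebalancing can be pictured as rerunning a partition-assignment function whenever the group membership changes. The sketch below spreads partitions round-robin across consumers (similar in spirit to Kafka's RoundRobinAssignor; the class and method names are illustrative): with fewer consumers, each one simply owns more partitions.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of spreading partitions across a consumer group.
// Rerunning assign() with a new group size is the essence of a rebalance.
public class AssignmentSketch {

    // Returns, for each consumer index, the list of partition ids it owns.
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> assignment = new ArrayList<>();
        for (int c = 0; c < numConsumers; c++) {
            assignment.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(p % numConsumers).add(p); // round-robin spread
        }
        return assignment;
    }
}
```

After a rebalance, each consumer seeks to the committed offset of its newly assigned partitions, which is why committed offsets prevent data loss across membership changes.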
Internal Working of Kafka
- Producer Workflow:
Producers send data to a topic. Data is partitioned using a key or a round-robin algorithm. Kafka stores the data in partitions on brokers.
- Broker Workflow:
Brokers handle incoming messages and store them in partitions. Replication ensures fault tolerance by storing copies of data across brokers.
- Consumer Workflow:
Consumers poll messages from brokers based on offsets. Consumer groups enable load balancing across partitions.
- Retention Policy:
Kafka retains messages for a configured period, regardless of whether they are consumed. This supports replaying data for analytics or debugging.
- Data Durability:
Kafka writes messages to disk and replicates them across brokers. Follower replicas synchronize data from the leader.
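The retention policy above can be sketched as a time-bounded log: the broker keeps every message for a configured period and discards older ones regardless of whether they were consumed. Real brokers delete whole log segments rather than individual records; the names below are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified sketch of time-based retention. (Real Kafka deletes whole log
// segments, not single records; this is a per-record illustration.)
public class RetentionSketch {
    record Entry(long timestampMs, String value) {}

    private final Deque<Entry> log = new ArrayDeque<>();
    private final long retentionMs;

    RetentionSketch(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    void append(long nowMs, String value) {
        log.addLast(new Entry(nowMs, value));
    }

    // Drops everything older than the retention window; returns remaining count.
    int enforceRetention(long nowMs) {
        while (!log.isEmpty() && nowMs - log.peekFirst().timestampMs > retentionMs) {
            log.removeFirst();
        }
        return log.size();
    }
}
```

Because retention is time-based rather than consumption-based, a consumer can rewind its offset and replay any message still inside the window.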
Kafka Integration with Spring Boot
Spring Boot simplifies Kafka integration by providing libraries for producer and consumer configuration, message serialization, and topic management.
Steps to Integrate Kafka in Spring Boot
- Add Kafka Dependencies
Add the Spring for Apache Kafka dependency to your pom.xml or build.gradle (the artifact is spring-kafka, under the org.springframework.kafka group):
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
- Configure Kafka Properties
Add Kafka configurations to application.yml or application.properties:
spring:
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
    consumer:
      group-id: my-group
      auto-offset-reset: earliest
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
- Create Kafka Producer Define a producer to send messages:
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class KafkaProducer {

    private final KafkaTemplate<String, String> kafkaTemplate;

    @Autowired
    public KafkaProducer(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Publishes a message to the given topic; Kafka chooses the partition.
    public void sendMessage(String topic, String message) {
        kafkaTemplate.send(topic, message);
    }
}
- Create Kafka Consumer Define a consumer to receive messages:
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class KafkaConsumer {

    // Invoked for each message on "my-topic" consumed by the "my-group" group.
    @KafkaListener(topics = "my-topic", groupId = "my-group")
    public void listen(String message) {
        System.out.println("Received message: " + message);
    }
}
- Run and Test
- Start a Kafka cluster.
- Run the Spring Boot application.
- Use the producer to send messages and observe the consumer processing them.
Real-World Use Cases of Kafka with Spring Boot
- Event-Driven Microservices
Use Kafka for inter-service communication. Example: order management systems where services publish and consume order events.
- Real-Time Analytics
Stream data from Kafka to analytics engines. Example: monitoring website activity in real time.
- Log Aggregation
Collect and centralize application logs for monitoring and debugging.
- Data Pipelines
Transfer data between systems (e.g., from databases to data lakes).
Advantages of Using Kafka with Spring Boot
- Scalability: Handles high throughput with distributed architecture.
- Fault Tolerance: Ensures data durability with replication.
- Flexibility: Supports multiple consumers and real-time streaming.
- Integration: Spring Boot provides a seamless way to work with Kafka.
This comprehensive explanation provides an in-depth understanding of Kafka’s architecture, internal working, offsets, and integration with Spring Boot.