Posted on: September 26, 2025 | Posted by: rahulgite

🔹 1. Kafka Data Replication

  • Topic partitions are replicated across brokers.
  • Each partition has:
    • Leader replica → handles all reads/writes.
    • Follower replicas → copy data from the leader.
  • Replication Factor (RF): number of replicas for a partition (e.g., RF=3 → 1 leader + 2 followers).

👉 If the leader fails, one of the in-sync followers (ISR) is promoted as the new leader.
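As a toy illustration of that failover rule, here is a minimal Python sketch (broker IDs and the election order are invented; the real controller logic is considerably more involved):

```python
# Toy model of Kafka leader failover: when the leader replica's broker dies,
# a new leader is promoted from the in-sync replica set (ISR).

def elect_leader(leader, isr, failed_broker):
    """Return (new_leader, new_isr) after `failed_broker` crashes."""
    isr = [b for b in isr if b != failed_broker]
    if leader != failed_broker:
        return leader, isr            # leader unaffected; ISR just shrinks
    if not isr:
        raise RuntimeError("no in-sync replica left (partition offline)")
    return isr[0], isr                # promote a surviving ISR member

# Partition with RF=3: leader on broker 1, followers on brokers 2 and 3.
leader, isr = elect_leader(1, [1, 2, 3], failed_broker=1)
print(leader, isr)  # broker 2 takes over; ISR is now [2, 3]
```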


🔹 2. Consumer Offsets Replication

  • Consumer progress (offsets) is stored in an internal topic __consumer_offsets, which is also replicated.
  • This ensures that even if a broker storing offsets fails, the offsets are safe.
  • Keeps consumer group rebalancing consistent.
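For the curious, the broker decides which __consumer_offsets partition holds a group's offsets by hashing the group id. A small sketch, reimplementing Java's `String.hashCode` in Python and assuming the default of 50 partitions for the offsets topic:

```python
# Which __consumer_offsets partition stores a group's offsets?
# Kafka uses: abs(groupId.hashCode()) % numPartitions (default 50).

def java_string_hash(s: str) -> int:
    """Python port of Java's String.hashCode (signed 32-bit)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def offsets_partition(group_id: str, num_partitions: int = 50) -> int:
    return (java_string_hash(group_id) & 0x7FFFFFFF) % num_partitions

print(offsets_partition("my-group"))  # stable partition in [0, 50)
```

Because the mapping is deterministic, every broker agrees on where a group's offsets live, and the replicas of that one partition protect the group's progress.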

🔹 3. Transactional State Replication

  • For exactly-once semantics (EOS), Kafka uses the __transaction_state internal topic.
  • Stores transaction metadata and is replicated across brokers.
  • Ensures committed/uncommitted messages can be recovered after broker failure.
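A toy model of what a read_committed consumer does with that transaction state: control markers record each transaction's fate, and records from aborted transactions are filtered out. The record format below is invented purely for illustration:

```python
# Toy model: data records are (txn_id, message); control markers are
# ("MARKER", (txn_id, "COMMIT" | "ABORT")).
log = [
    ("t1", "order-1"),
    ("t2", "order-2"),
    ("MARKER", ("t1", "COMMIT")),
    ("MARKER", ("t2", "ABORT")),
]

def read_committed(records):
    # Learn each transaction's fate from the control markers...
    fate = {txn: decision
            for kind, payload in records if kind == "MARKER"
            for txn, decision in [payload]}
    # ...then deliver only records whose transaction committed.
    return [msg for txn, msg in records
            if txn != "MARKER" and fate.get(txn) == "COMMIT"]

print(read_committed(log))  # ['order-1']
```

Because the markers (and the transaction metadata behind them) are replicated, a new broker can make the same commit/abort decisions after a failure.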

🔹 4. Kafka Streams State Replication

  • Kafka Streams applications store intermediate results in state stores (backed by RocksDB).
  • These are checkpointed into changelog topics, which are replicated.
  • If a stream task fails, another instance can rebuild state from the replicated changelog.
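Restoration from a changelog can be sketched as a simple replay: the latest value per key wins, and a null value acts as a tombstone, mirroring compacted-topic semantics (a simplified model, not the actual RocksDB restore path):

```python
# Rebuild a Kafka Streams state store by replaying its changelog topic.
changelog = [
    ("user-1", 5),
    ("user-2", 3),
    ("user-1", 7),      # newer value overwrites the old one
    ("user-2", None),   # tombstone: key is deleted
]

def restore_state(changelog):
    state = {}
    for key, value in changelog:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state

print(restore_state(changelog))  # {'user-1': 7}
```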

🔹 5. Connect Configs and Status Replication

  • Kafka Connect (in distributed mode) uses internal topics:
    • connect-configs → connector configs.
    • connect-status → connector/task status.
    • connect-offsets → source offsets.
  • All are replicated topics (the names above are the conventional defaults, configurable per worker), so connector jobs can recover after failure.
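For reference, a minimal distributed-worker config fragment using the conventional default topic names (the `*.storage.topic` and `*.storage.replication.factor` keys are the actual Kafka Connect worker settings):

```properties
# Internal topics backing connector configs, source offsets, and status.
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
# Replicate the internal topics so workers survive a broker failure.
config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
```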

🔹 6. Metadata Replication (ZooKeeper vs KRaft)

  • In ZooKeeper mode: metadata (brokers, topics, ACLs, configs) stored in ZooKeeper, replicated across ZK quorum.
  • In KRaft mode: metadata stored in an internal topic __cluster_metadata, replicated using Raft protocol.
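The Raft commit rule at the heart of KRaft can be sketched in a few lines (a deliberate simplification: real KRaft also handles epochs, leader election, and snapshots):

```python
# Raft commit rule: a metadata record is committed once a majority of the
# controller quorum has replicated it.

def committed(match_index, entry_index):
    """match_index maps each voter to the highest log offset it has replicated."""
    acks = sum(1 for idx in match_index.values() if idx >= entry_index)
    return acks > len(match_index) // 2

# 3-controller quorum: leader and one follower at offset 10, one lagging at 4.
match_index = {"ctrl-1": 10, "ctrl-2": 10, "ctrl-3": 4}
print(committed(match_index, 10))  # True: 2 of 3 is a majority
```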

🔹 7. Log Segments (File System Level)

  • On each broker, partitions are written as log segments on disk.
  • These are not duplicated on the same broker, but replicated across brokers at the partition level.
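Each segment file is named after the base offset of its first record, zero-padded to 20 digits. A small sketch of that naming convention:

```python
# Kafka log segment naming: the file name is the base offset of the first
# record in the segment, zero-padded to 20 digits.

def segment_files(base_offset: int):
    name = f"{base_offset:020d}"
    return [f"{name}.log", f"{name}.index", f"{name}.timeindex"]

print(segment_files(368769))
# ['00000000000000368769.log', '00000000000000368769.index',
#  '00000000000000368769.timeindex']
```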

🔹 8. Producer Reliability (Duplicates vs Idempotence)

  • Retries: the producer may resend messages on failure → potential duplicates.
  • Idempotent Producer (enable.idempotence=true): ensures duplicates are avoided even when retries occur.
  • Transactional Producer: duplicates avoided across multiple partitions/operations with EOS.
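A toy model of how broker-side idempotence deduplicates retries (the real broker also tracks producer epochs and a window of recent batches; this sketch keeps just one sequence number per producer):

```python
# Broker-side idempotence, simplified: track the last accepted sequence
# number per producer and silently drop retried duplicates.

class PartitionLog:
    def __init__(self):
        self.records = []
        self.last_seq = {}          # producer_id -> last accepted sequence

    def append(self, producer_id, seq, msg):
        if self.last_seq.get(producer_id, -1) >= seq:
            return False            # duplicate retry: discarded
        self.last_seq[producer_id] = seq
        self.records.append(msg)
        return True

log = PartitionLog()
log.append("p1", 0, "a")
log.append("p1", 1, "b")
log.append("p1", 1, "b")            # network retry of seq 1 -> deduplicated
print(log.records)  # ['a', 'b']
```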

🔹 Summary Table

| Component | What is Replicated / Duplicated | Why (Failure Handling) |
|---|---|---|
| Partitions (data) | Replicated across brokers (RF ≥ 2) | Data durability & availability |
| Consumer Offsets | Internal topic __consumer_offsets (replicated) | Resume consumer progress after failure |
| Transaction State | Internal topic __transaction_state (replicated) | Ensure exactly-once delivery |
| Streams State Stores | Changelog topics (replicated) | Rebuild state after task crash |
| Connect Configs/Offsets | Internal topics connect-configs, connect-status, connect-offsets | Connector/task recovery |
| Cluster Metadata | ZooKeeper quorum or __cluster_metadata topic (KRaft) | Broker/topic/ACL/config recovery |
| Producer Messages | Retries (duplicates possible); idempotence removes dups | Ensure reliability |

✅ In short: Kafka replicates partitions, metadata, offsets, transactions, and internal system topics to make sure failures (broker crash, consumer crash, or connector failure) do not lose data or state.
