Kafka:
Top 70 Kafka Interview Questions and Answers for 2025 - GeeksforGeeks
1. What is Apache Kafka?
Apache Kafka is a distributed event-streaming platform designed for high-throughput, scalable, fault-tolerant data. It is widely used for event-driven architectures in modern platforms.
2. What are the key components of Kafka?
Producer, consumer, consumer group, broker (bootstrap server), topic, partition, and ZooKeeper.
Bootstrap server / broker: the Kafka server where messages are actually stored; the bootstrap servers are simply the brokers a client contacts first to discover the cluster.
ZooKeeper: coordinates Kafka brokers, keeping track of topics, partitions, and cluster metadata (replaced by KRaft in newer Kafka versions).
Topic: a named category of messages.
Partition: an ordered sequence of records within a topic, addressed by offsets.
Broker: a Kafka server running in the Kafka cluster; it receives messages, assigns offsets, and commits the messages to the partition log on disk.
Consumer and consumer group:
Consumer: consumes messages from one or more topics.
Consumer group: a set of consumers that work together to consume a topic's partitions in parallel; each partition is read by exactly one consumer in the group.
Offset: a unique, sequential identifier of a record within a partition.
Kafka retention: how long or how much data is kept, handled through configuration (log.retention.hours, default 7 days, and log.retention.bytes, unlimited by default).
Kafka message size limit: 1 MB per message by default, configurable via message.max.bytes on the broker and max.request.size on the producer.
Idempotent producer in Kafka: ensures a message is written only once even with multiple retries (enable.idempotence=true).
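A minimal sketch of an idempotent producer with the plain Java client; broker address, topic, and values are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdempotentProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // The broker deduplicates retries using the producer id plus
        // per-partition sequence numbers, so a retried send is written at most once.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        // Idempotence requires acks=all (the default in recent client versions).
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my_topic", "key", "value"));
        }
    }
}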
Serialization and deserialization: the producer serializes keys and values before sending, and the consumer deserializes them before processing.
Kafka message versioning: Kafka can't handle versions directly; it allows users to handle multiple versions by adding a message version number to the message schema or payload (or by using a schema registry).
Message ordering in Kafka: Kafka guarantees that messages within a partition are in ordered sequence.
How does Kafka handle message ordering across multiple partitions? Kafka only guarantees message ordering within a single partition. Across multiple partitions, there is no guarantee of message ordering.
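Because all messages with the same key land in the same partition, per-key ordering can be preserved by keying on an entity id. A small fragment (topic and key are made up), reusing a producer configured as in the idempotent-producer sketch above; events for different keys may still interleave across partitions:

// Same key -> same partition -> strict ordering for this order id.
producer.send(new ProducerRecord<>("orders", "order-42", "CREATED"));
producer.send(new ProducerRecord<>("orders", "order-42", "PAID"));
producer.send(new ProducerRecord<>("orders", "order-42", "SHIPPED"));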
How does Kafka handle consumer lag?
Consumer lag in Kafka refers to the difference between the offset of the last produced message and the offset of the last consumed message. Kafka provides tools and APIs to monitor consumer lag, such as the Kafka Consumer Groups command-line tool and the AdminClient API. High consumer lag can indicate performance issues or insufficient consumer capacity. Kafka doesn't automatically handle lag, but it provides the information needed for applications to make scaling or performance optimization decisions.
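Besides the kafka-consumer-groups.sh CLI, lag can be computed with the AdminClient. A sketch (group id and broker address are placeholders), where lag = latest end offset minus last committed offset:

import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ConsumerLagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Offsets the group has committed so far.
            Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets("group_id")
                         .partitionsToOffsetAndMetadata().get();
            // Latest (end) offsets for the same partitions.
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> end =
                    admin.listOffsets(committed.keySet().stream()
                            .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest())))
                         .all().get();
            // Lag per partition = last produced offset - last committed offset.
            committed.forEach((tp, meta) ->
                    System.out.println(tp + " lag=" + (end.get(tp).offset() - meta.offset())));
        }
    }
}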
Append-only writes, sequential reads: the log structure that keeps Kafka's disk I/O fast.
Describe how leaders and followers work in Kafka replication.
Each partition elects a single leader; the leader handles all client reads/writes for that partition. Followers replicate the leader’s log asynchronously (or near-synchronously depending on settings). Followers stay in-sync by pulling fetches; if the leader fails, a follower in the ISR (in-sync replica set) is elected leader to maintain availability.
Can you explain ISR (In-Sync Replica) and its importance?
ISR is the set of replicas that have fully caught up with the leader (within configured lag thresholds). Only members of ISR are eligible for leader election (by default), ensuring that promoted leaders are up-to-date and minimizing data loss. ISR size and lag thresholds control trade-offs between availability and consistency.
Describe the lifecycle of a Kafka message from producer to consumer.
Producer serializes and sends message to a partition, leader appends to its log and (depending on acks) waits for replication, broker acknowledges. Consumer polls broker, fetches messages (commonly in batches), deserializes, processes, and commits offsets (to Kafka or external store). Offsets track consumption progress for fault-tolerant consumption.
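A fragment sketching the acknowledgement step of this lifecycle with the plain Java client (producer as in the sketches above); the callback runs once the broker acknowledges according to the acks setting:

producer.send(new ProducerRecord<>("my_topic", "key", "value"),
        (metadata, exception) -> {
            if (exception != null) {
                exception.printStackTrace();  // send failed even after retries
            } else {
                // The broker assigned this partition/offset on append.
                System.out.printf("acked: partition=%d offset=%d%n",
                        metadata.partition(), metadata.offset());
            }
        });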
What is the difference between acks=0, acks=1, and acks=all?
acks=0: producer does not wait for broker ack — lowest latency, highest risk of data loss.
acks=1: leader acknowledges after persisting to its log — moderate durability; risk if leader fails before followers replicate.
acks=all (or -1): leader waits for all ISR replicas to acknowledge — strongest durability (with replication factor >1), higher latency.
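Where this is set, sketched as alternatives in the producer Properties from the earlier example (only one line would be active in practice):

// props.put(ProducerConfig.ACKS_CONFIG, "0");  // fire-and-forget: lowest latency, possible loss
// props.put(ProducerConfig.ACKS_CONFIG, "1");  // leader-only ack: moderate durability
props.put(ProducerConfig.ACKS_CONFIG, "all");   // wait for all ISR replicas: strongest durability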
Explain the role of offsets and where Kafka stores them.
Offsets mark a consumer’s position in a partition. Modern Kafka stores offsets in an internal topic (__consumer_offsets), allowing fault-tolerance and visibility. Consumers commit offsets either automatically or explicitly; these commits are used to resume consumption after failures.
What strategies do you use to commit offsets safely?
Use manual commits after successful processing (commitSync/commitAsync) and commit only after side effects (DB writes, downstream emits) succeed. For exactly-once, use Kafka transactions to atomically write output and commit offsets. Also, handle idempotency on downstream systems to guard against duplicates.
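A sketch of the commit-after-processing pattern with the plain Java consumer (topic and group names are placeholders); for exactly-once, Kafka transactions would replace the plain commit:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SafeCommitLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "group_id");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // Disable auto-commit so offsets are committed only after processing succeeds.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my_topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // side effects (DB write, downstream emit) happen first
                }
                consumer.commitSync(); // commit only after the whole batch succeeded
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value());
    }
}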
How do you avoid consumer lag?
Monitor lag metrics; scale consumers (add parallelism) or increase partition count. Optimize consumer processing (batching, async IO), tune fetch size, increase broker throughput, and reduce GC pauses and long blocking operations.
What factors influence partition count decisions?
Desired parallelism, throughput per partition, consumer count, key cardinality, retention and storage, and operational complexity. More partitions increase parallelism and throughput but add controller and metadata overhead, longer rebalances, and higher disk and network load.
How would you scale Kafka to millions of messages per second?
Increase partitions across many brokers, use large SSDs and fast network (10/25/100GbE), tune batching/compression, partition keys to spread load, scale producers/consumers horizontally, use separate clusters or topics for isolation, and monitor broker metrics. Consider tuning OS/network (page cache, sockets) and using KRaft/optimized metadata settings.
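A few of the producer-side batching/compression knobs mentioned above, as a fragment of the producer Properties; the values are illustrative starting points, not recommendations:

props.put(ProducerConfig.LINGER_MS_CONFIG, 20);            // wait up to 20 ms so batches fill up
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);    // larger batches, fewer requests (bytes)
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");  // compress whole batches on the wire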
How does Kafka handle leader election?
The controller coordinates leader elections. On broker failure, the controller picks an ISR replica (or an out-of-sync replica if unclean election is allowed) and updates metadata; producers and consumers refresh metadata and redirect to the new leader. In KRaft mode, the quorum-based metadata log handles elections natively; in ZooKeeper-based clusters, the controller relies on ZooKeeper for cluster state.
What mechanisms prevent message loss?
Replication, acks=all, durable writes to disk (fsync behavior), ISR checks, proper leader election settings, and consumer offset management. Operational practices (monitoring, backups, multiple data centers) and not enabling unclean leader election further reduce loss risk.
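A sketch of creating a topic with durability-oriented settings via the AdminClient (topic name, partition count, and values are illustrative; replication factor 3 assumes at least 3 brokers):

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class DurableTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("orders", 6, (short) 3)
                    .configs(Map.of(
                            // With acks=all, a write needs at least 2 in-sync replicas.
                            "min.insync.replicas", "2",
                            // Never elect an out-of-sync replica as leader.
                            "unclean.leader.election.enable", "false"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}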
Kafka Consumer Configuration
Create the KafkaConsumerConfig class; this configuration class sets up the Kafka consumer of the Spring Boot application.
Use the ConsumerFactory and ConcurrentKafkaListenerContainerFactory classes (ProducerFactory and KafkaTemplate are their producer-side counterparts).
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

@EnableKafka
@Configuration
public class KafkaConsumerConfig {

    // Builds consumers with bootstrap servers, group id, and deserializers.
    @Bean
    public ConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> configProps = new HashMap<>();
        configProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        configProps.put(ConsumerConfig.GROUP_ID_CONFIG, "group_id");
        configProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        configProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        return new DefaultKafkaConsumerFactory<>(configProps);
    }

    // Container factory used by @KafkaListener methods.
    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        return factory;
    }
}
Create the KafkaProducerConfig class; this configuration class sets up the Kafka producer of the Spring Boot application using ProducerFactory and KafkaTemplate.
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class KafkaProducerConfig {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> configProps = new HashMap<>();
        configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // String values, matching the consumer's StringDeserializer.
        configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new DefaultKafkaProducerFactory<>(configProps);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class KafkaProducerService {
    private static final String TOPIC = "my_topic";
    private final KafkaTemplate<String, String> kafkaTemplate;

    public KafkaProducerService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Publishes a message to the topic via KafkaTemplate.
    public void sendMessage(String message) {
        kafkaTemplate.send(TOPIC, message);
        System.out.println("Message sent: " + message);
    }
}

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class KafkaConsumerService {
    // Invoked for each record received on my_topic by the listener container.
    @KafkaListener(topics = "my_topic", groupId = "group_id")
    public void consume(String message) {
        System.out.println("Message received: " + message);
    }
}

Maven dependency for Spring Kafka:
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>

Offsets: there are two offsets to think about. The producer offset is where the next message will be written; the consumer offset is where the next message will be read.
Producer: append-only writes. Consumer: sequential reads.
A consumer consumes messages from all of its assigned partitions at the same time.
What is partition.assignment.strategy? The consumer config that controls how partitions are assigned to consumers in a group; built-in assignors include RangeAssignor (the classic default), RoundRobinAssignor, StickyAssignor, and CooperativeStickyAssignor.
Offset Commit, Rebalancing & Consumer Group ~ Key Concepts in Kafka (Part 2)
Consumer rebalancing? When a consumer goes down (or a new one joins), the group's partitions are rebalanced and redistributed evenly across the remaining consumers.
When there are more partitions than consumers in a Kafka consumer group:
Each consumer will get multiple partitions
All consumers will be active
Some partitions will be bundled together on the same consumer
Kafka always tries to balance partitions across consumers in a group.
📌 Example
#Partitions  #Consumers  What happens
6            3           Each consumer gets 2 partitions
6            2           Each consumer gets 3 partitions
6            1           One consumer processes all 6
When there are more consumers than partitions in the same Kafka consumer group:
Each partition can only be assigned to one consumer in a group
Extra consumers will become idle and receive no messages
There is no parallelism benefit from consumers exceeding partitions.
📌 Example
Partitions  Consumers  Result
3           2          All consumers active
3           3          Balanced: 1 partition each
3           4          1 idle consumer (the idle consumer takes over if another consumer goes down)
3           10         7 idle consumers
Rate limiter algorithm
✅ WebSocket
What is it? A full-duplex (two-way) persistent connection between client and server.
How it works: the client opens a WebSocket connection; after that, both client and server can send messages at any time.
Use cases: real-time chat/messaging, multiplayer games, live dashboards and trading apps, collaborative apps (Google Docs style).
Pros: true real-time bi-directional communication, low latency, efficient over time.
Cons: more complex (needs a WebSocket server); not ideal for broadcast-only scenarios.

WebHook
What is it? A server-to-server push callback mechanism: instead of the client polling, the server notifies a target URL when an event happens.
Use cases: payment confirmation (PayPal/Stripe), GitHub push notifications, Slack notifications, IoT alerts.
Pros: efficient (event push); no client waiting or polling.
Cons: requires a public endpoint; reliability concerns if the target is down.
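For the WebSocket flow above, a minimal echo-endpoint sketch using the Jakarta WebSocket API (JSR 356); the /chat path is made up, and a servlet container such as Tomcat is assumed:

import jakarta.websocket.OnMessage;
import jakarta.websocket.OnOpen;
import jakarta.websocket.Session;
import jakarta.websocket.server.ServerEndpoint;

// One persistent connection per client; after the handshake, either side
// can send frames at any time over the same socket.
@ServerEndpoint("/chat")
public class ChatEndpoint {

    @OnOpen
    public void onOpen(Session session) {
        System.out.println("Client connected: " + session.getId());
    }

    // Called for each text frame from the client; the return value is
    // pushed back to the same client over the open connection.
    @OnMessage
    public String onMessage(String message) {
        return "echo: " + message;
    }
}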
Final Comparison: HTTP Polling vs SSE vs WebSocket vs WebHooks
Feature     WebSocket      WebHook                     HTTP Polling              SSE
Direction   2-way          Server → Server callback    Client → Server           Server → Client
Connection  Persistent     Event-driven HTTP push      Repeated HTTP requests    Persistent HTTP stream
Latency     Very low       Event-driven                Higher                    Low
Complexity  Medium         Medium                      Very low                  Low
Best for    Chats, games   Server events               Simple periodic updates   Live notifications, streams
👨💼 Interview-ready Summary
WebSocket is a full-duplex persistent connection for real-time bi-directional communication.
WebHook is server-to-server push for event notifications.
HTTP Polling repeatedly checks for updates and is simple but inefficient.
SSE streams server → client updates over a persistent HTTP connection.