Kafka:

Top 70 Kafka Interview Questions and Answers for 2025 - GeeksforGeeks

1. What is Apache Kafka?
Apache Kafka is a distributed event streaming (messaging) platform built for high throughput, scalability, and fault tolerance. It is widely used for event-driven architectures in modern platforms.


2. What are the key components of Kafka?
  Producer, consumer, consumer group, broker (bootstrap server), topic, partition, and ZooKeeper.

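Broker (bootstrap server): where Kafka actually stores the messages. The bootstrap servers are the brokers a client contacts first to discover the rest of the cluster.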

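ZooKeeper: coordinates Kafka brokers and keeps cluster metadata such as broker membership, topics, and partitions. In modern Kafka, consumer offsets are stored in Kafka itself (the __consumer_offsets topic), and newer clusters can run in KRaft mode without ZooKeeper.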

Topic: a named category (stream) of messages.

Partition: an ordered, append-only sequence of records; each record's position is identified by its offset.

Broker: a Kafka server that runs in a Kafka cluster. The broker receives a message, assigns it an offset, and appends it to the partition's log on disk.

Consumer and consumer group:

Consumer: reads messages from one or more topics.

Consumer group: a set of consumers that work together to read from one or more topics; each partition is assigned to only one consumer in the group.

Offset: the unique, sequential identifier of a record within a partition.

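Kafka retention: data is retained by time and/or size (for example, up to 1 GB per partition or up to 1 month), handled through configuration (retention.ms and retention.bytes at the topic level). By default, brokers keep data for 7 days with no size limit.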

Kafka message size limit: about 1 MB per message by default; configurable (message.max.bytes on the broker, max.message.bytes per topic).
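Retention and message-size limits are plain configuration values. The sketch below builds a set of topic-level overrides as key/value strings; the class name and the chosen values are illustrative assumptions, while retention.ms, retention.bytes, and max.message.bytes are the topic-level configuration names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects topic-level overrides as plain strings.
final class TopicRetentionSettings {

    static Map<String, String> build() {
        Map<String, String> config = new LinkedHashMap<>();
        // Delete log segments older than 7 days (value in milliseconds).
        config.put("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000));
        // Also cap each partition's log at 1 GiB; whichever limit is reached first applies.
        config.put("retention.bytes", String.valueOf(1024L * 1024 * 1024));
        // Largest record batch the topic accepts: 1 MiB.
        config.put("max.message.bytes", String.valueOf(1024 * 1024));
        return config;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

A map like this could be supplied when creating or altering a topic; how it is applied (CLI or admin API) is left out of the sketch.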

Idempotent producer in Kafka: ensures a message is written to the log only once, even if the producer retries the send multiple times.

Serialization and deserialization: the producer serializes keys and values to bytes before sending, and the consumer deserializes them after fetching, before processing.
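Brokers only see byte arrays, so both sides must agree on the encoding. A minimal sketch of a string round trip, assuming UTF-8; the class StringCodec is hypothetical and only mirrors in spirit what a string serializer/deserializer pair does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical codec: what the producer does before sending and
// what the consumer does after fetching, reduced to plain Java.
final class StringCodec {

    // Producer side: object -> bytes.
    static byte[] serialize(String value) {
        return value == null ? null : value.getBytes(StandardCharsets.UTF_8);
    }

    // Consumer side: bytes -> object.
    static String deserialize(byte[] data) {
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }
}
```

If the producer and consumer use different formats (for example JSON on one side and plain strings on the other), the consumer receives bytes it interprets differently from what was intended.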

Kafka message versioning: Kafka does not handle message versions itself; users can support multiple versions by adding a version number to the message schema or payload.

Message ordering in Kafka: Kafka guarantees that messages within a partition are kept in order.

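How does Kafka handle message ordering across multiple partitions?
Kafka only guarantees message ordering within a single partition. Across multiple partitions, there is no guarantee of message ordering. With the default partitioner, messages that share a key go to the same partition, so ordering per key is preserved.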
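Because ordering only holds inside one partition, records that must stay in order are usually given the same key so they land in the same partition. The sketch below shows the idea of hashing a key to a partition. It is a simplification: Kafka's default partitioner uses a murmur2 hash of the key bytes, whereas this hypothetical KeyPartitionSketch uses Arrays.hashCode as a stand-in.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified model of key-based partitioning (not Kafka's actual hash function).
final class KeyPartitionSketch {

    // Maps a record key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"customer-1", "customer-2", "customer-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, 6));
        }
    }
}
```

The same key always maps to the same partition as long as the partition count does not change, which is why adding partitions can break per-key ordering for existing keys.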

How does Kafka handle consumer lag?

Consumer lag in Kafka refers to the difference between the offset of the last produced message and the offset of the last consumed message. Kafka provides tools and APIs to monitor consumer lag, such as the Kafka Consumer Groups command-line tool and the AdminClient API. High consumer lag can indicate performance issues or insufficient consumer capacity. Kafka doesn't automatically handle lag, but it provides the information needed for applications to make scaling or performance optimization decisions.
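The lag definition above is simple arithmetic per partition. A minimal sketch, assuming the end offsets and committed offsets have already been fetched (for example through the tools mentioned above); the class name is hypothetical, and treating a partition with no committed offset as "committed at 0" is an assumption of this example.

```java
import java.util.HashMap;
import java.util.Map;

// Computes lag = log end offset - committed offset for each partition.
final class ConsumerLagSketch {

    static Map<Integer, Long> lagPerPartition(Map<Integer, Long> endOffsets,
                                              Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new HashMap<>();
        for (Map.Entry<Integer, Long> entry : endOffsets.entrySet()) {
            // Assumption: no committed offset yet means nothing has been consumed.
            long committed = committedOffsets.getOrDefault(entry.getKey(), 0L);
            lag.put(entry.getKey(), Math.max(0L, entry.getValue() - committed));
        }
        return lag;
    }

    // Total lag across all partitions, a common alerting metric.
    static long totalLag(Map<Integer, Long> lagPerPartition) {
        return lagPerPartition.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

A steadily growing total suggests the consumers cannot keep up; a large lag on a single partition more often points at a hot key or a stuck consumer.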





Kafka writes to the log append-only and reads from it sequentially.

Describe how leaders and followers work in Kafka replication.
Each partition elects a single leader; the leader handles all client reads/writes for that partition. Followers replicate the leader’s log asynchronously (or near-synchronously depending on settings). Followers stay in-sync by pulling fetches; if the leader fails, a follower in the ISR (in-sync replica set) is elected leader to maintain availability.

Can you explain ISR (In-Sync Replica) and its importance?
ISR is the set of replicas that have fully caught up with the leader (within configured lag thresholds). Only members of ISR are eligible for leader election (by default), ensuring that promoted leaders are up-to-date and minimizing data loss. ISR size and lag thresholds control trade-offs between availability and consistency.

Describe the lifecycle of a Kafka message from producer to consumer.
Producer serializes and sends message to a partition, leader appends to its log and (depending on acks) waits for replication, broker acknowledges. Consumer polls broker, fetches messages (commonly in batches), deserializes, processes, and commits offsets (to Kafka or external store). Offsets track consumption progress for fault-tolerant consumption.

What is the difference between acks=0, acks=1, and acks=all?
acks=0: producer does not wait for broker ack — lowest latency, highest risk of data loss.
acks=1: leader acknowledges after persisting to its log — moderate durability; risk if leader fails before followers replicate.
acks=all (or -1): leader waits for all ISR replicas to acknowledge — strongest durability (with replication factor >1), higher latency.
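As a rough illustration of the strongest setting, the sketch below builds producer properties with acks=all and idempotence enabled. The property keys are the standard producer configuration names; the class name and the broker address passed in are assumptions. The resulting Properties object is the kind of input a KafkaProducer constructor accepts, but no Kafka classes are needed to build it.

```java
import java.util.Properties;

// Hypothetical helper: producer settings that favor durability over latency.
final class DurableProducerSettings {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Wait until all in-sync replicas have the record before acknowledging.
        props.setProperty("acks", "all");
        // Let the broker discard duplicates caused by producer retries.
        props.setProperty("enable.idempotence", "true");
        return props;
    }
}
```

Switching "acks" to "1" or "0" in the same map trades durability for lower latency, as described in the list above.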

Explain the role of offsets and where Kafka stores them.
Offsets mark a consumer’s position in a partition. Modern Kafka stores offsets in an internal topic (__consumer_offsets), allowing fault-tolerance and visibility. Consumers commit offsets either automatically or explicitly; these commits are used to resume consumption after failures.

What strategies do you use to commit offsets safely?
Use manual commits after successful processing (commitSync/commitAsync) and commit only after side effects (DB writes, downstream emits) succeed. For exactly-once, use Kafka transactions to atomically write output and commit offsets. Also, handle idempotency on downstream systems to guard against duplicates.
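The "commit only after processing succeeds" rule can be modeled without a broker. The sketch below is an in-memory simulation, not the Kafka consumer API: the list stands in for one partition, and the class and method names are hypothetical. It shows why this strategy gives at-least-once delivery: a record whose handler fails is not committed and is therefore seen again on the next poll.

```java
import java.util.List;
import java.util.function.Consumer;

// In-memory model of manual offset commits (at-least-once processing).
final class CommitAfterProcessingSketch {

    private final List<String> log;   // stands in for one partition
    private int committedOffset = 0;  // offset of the next record to read after a restart

    CommitAfterProcessingSketch(List<String> log) {
        this.log = log;
    }

    // Processes records starting at the committed offset.
    // Returns false if the handler threw; that record stays uncommitted.
    boolean pollAndProcess(Consumer<String> handler) {
        for (int offset = committedOffset; offset < log.size(); offset++) {
            try {
                handler.accept(log.get(offset));
            } catch (RuntimeException e) {
                return false; // no commit: the record is redelivered on the next poll
            }
            committedOffset = offset + 1; // commit only after the side effect succeeded
        }
        return true;
    }

    int committedOffset() {
        return committedOffset;
    }
}
```

Because a record can be handed to the handler more than once, the handler (or the downstream system) should be idempotent, which is the point made at the end of the answer above.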

How do you avoid consumer lag?
Monitor lag metrics; scale consumers (add parallelism) or increase partition count. Optimize consumer processing (batching, async IO), tune fetch size, increase broker throughput, and reduce GC pauses and long blocking operations.

  • What factors influence partition count decisions?
    Desired parallelism, throughput per partition, consumer count, key cardinality, retention and storage, and operational complexity. More partitions increase parallelism and throughput but add controller and metadata overhead, longer rebalances, and higher disk and network load.

  • How would you scale Kafka to millions of messages per second?
    Increase partitions across many brokers, use large SSDs and fast network (10/25/100GbE), tune batching/compression, partition keys to spread load, scale producers/consumers horizontally, use separate clusters or topics for isolation, and monitor broker metrics. Consider tuning OS/network (page cache, sockets) and using KRaft/optimized metadata settings.

  • How does Kafka handle leader election?
    Controller coordinates leader elections. On broker failure, controller picks an ISR replica (or out-of-sync if allowed) and updates metadata; producers/consumers refresh metadata and redirect to new leader. In KRaft, the Quorum-based metadata log handles elections more natively; in ZooKeeper-based clusters, controller reads ZooKeeper.

  • What mechanisms prevent message loss?
    Replication, acks=all, durable writes to disk (fsync behavior), ISR checks, proper leader election settings, and consumer offset management. Operational practices (monitoring, backups, multiple data centers) and not enabling unclean leader election further reduce loss risk.
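The leader election rule described above (prefer an in-sync replica, fall back to an out-of-sync one only if unclean election is allowed) can be sketched as a small function. This is a simplified model with hypothetical names, not the controller's actual code; broker ids are plain integers and the replica list is assumed to be in assignment order.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Simplified model of choosing a new partition leader after a broker failure.
final class LeaderElectionSketch {

    static Optional<Integer> electLeader(List<Integer> replicas,
                                         Set<Integer> isr,
                                         Set<Integer> liveBrokers,
                                         boolean allowUnclean) {
        // Clean election: first live replica that is also in the ISR.
        for (int broker : replicas) {
            if (liveBrokers.contains(broker) && isr.contains(broker)) {
                return Optional.of(broker);
            }
        }
        // Unclean election: any live replica, accepting possible data loss.
        if (allowUnclean) {
            for (int broker : replicas) {
                if (liveBrokers.contains(broker)) {
                    return Optional.of(broker);
                }
            }
        }
        // No eligible replica: the partition stays offline.
        return Optional.empty();
    }
}
```

With unclean election disabled, the function returns no leader when every in-sync replica is down, which models the availability cost of the no-data-loss setting.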


  • Kafka Consumer Configuration

    Create a KafkaConsumerConfig class; this configuration class sets up the Kafka consumer for the Spring Boot application.
    It uses the ConsumerFactory and ConcurrentKafkaListenerContainerFactory classes. (The producer side uses ProducerFactory and KafkaTemplate.)

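    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.kafka.annotation.EnableKafka;
    import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
    import org.springframework.kafka.core.ConsumerFactory;
    import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

    // @EnableKafka turns on detection of @KafkaListener methods.
    @EnableKafka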
    @Configuration
    public class KafkaConsumerConfig {
    
        @Bean
        public ConsumerFactory<String, String> consumerFactory() {
            Map<String, Object> configProps = new HashMap<>();
            configProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            configProps.put(ConsumerConfig.GROUP_ID_CONFIG, "group_id");
            configProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            configProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            return new DefaultKafkaConsumerFactory<>(configProps);
        }
    
        @Bean
        public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
            ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
            factory.setConsumerFactory(consumerFactory());
            return factory;
        }
    }
    
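    Create a KafkaProducerConfig class; this configuration class sets up the Kafka producer for the Spring Boot application.
    It uses the ProducerFactory and KafkaTemplate classes.

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.kafka.core.DefaultKafkaProducerFactory;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.kafka.core.ProducerFactory;

    @Configuration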
    public class KafkaProducerConfig {
    
        @Bean
        public ProducerFactory<String, String> producerFactory() {
            Map<String, Object> configProps = new HashMap<>();
            configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            // StringSerializer matches KafkaTemplate<String, String> and the consumer's StringDeserializer.
            configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            return new DefaultKafkaProducerFactory<>(configProps);
        }
    
        @Bean
        public KafkaTemplate<String, String> kafkaTemplate() {
            return new KafkaTemplate<>(producerFactory());
        }
    }

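    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Service;

    @Service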
    public class KafkaProducerService {
    
        private static final String TOPIC = "my_topic";
    
        private final KafkaTemplate<String, String> kafkaTemplate;
    
        public KafkaProducerService(KafkaTemplate<String, String> kafkaTemplate) {
            this.kafkaTemplate = kafkaTemplate;
        }
    
        public void sendMessage(String message) {
            // send() is asynchronous: the record is handed to the producer and delivered in the background.
            kafkaTemplate.send(TOPIC, message);
            System.out.println("Message sent: " + message);
        }
    }
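
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Service;

    @Service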
    public class KafkaConsumerService {
    
        @KafkaListener(topics = "my_topic", groupId = "group_id")
        public void consume(String message) {
            System.out.println("Message received: " + message);
        }
    }

    Maven dependency (pom.xml):
    <dependency>
        <groupId>org.springframework.kafka</groupId>
        <artifactId>spring-kafka</artifactId>
    </dependency>

    Offsets: there are two offsets to keep in mind. The producer-side offset (log end offset) is where the next message will be written; the consumer offset is where the next message will be read.
    Producer: append-only writes. Consumer: sequential reads.
    A single consumer in a group reads from all partitions of the topic; with several consumers in the group, the partitions are divided among them.
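The two offsets can be seen in a toy model of one partition. This is an in-memory sketch with hypothetical names, not Kafka's storage code: appends always go to the end and return the assigned offset, and reads move forward from whatever offset the consumer asks for.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a single partition: append-only writes, sequential reads.
final class PartitionLogSketch {

    private final List<String> records = new ArrayList<>();

    // Appends to the end of the log and returns the offset given to the record.
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Offset the next appended record will receive (the producer-side offset).
    long logEndOffset() {
        return records.size();
    }

    // Reads up to maxRecords sequentially, starting at the consumer's offset.
    List<String> read(long fromOffset, int maxRecords) {
        int start = (int) Math.max(0, Math.min(fromOffset, records.size()));
        int end = start + Math.min(maxRecords, records.size() - start);
        return new ArrayList<>(records.subList(start, end));
    }
}
```

The consumer offset is just a number the consumer keeps (or commits); the log itself never changes when records are read, which is why several consumer groups can read the same partition independently.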


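    What is partition.assignment.strategy?
    A consumer configuration that selects the assignor (for example range, round-robin, sticky, or cooperative sticky) used to distribute partitions among the consumers in a group.
    Offset Commit, Rebalancing & Consumer Group ~ Key Concepts in Kafka(Part2)
    Consumer rebalancing: when a consumer joins or leaves the group (for example, when it goes down), the partitions are redistributed as evenly as possible among the remaining consumers.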
    When there are more partitions than consumers in a Kafka consumer group:
    • Each consumer will get multiple partitions

    • All consumers will be active

    • Some partitions will be bundled together on the same consumer

    Kafka always tries to balance partitions across consumers in a group.


    📌 Example

    | #Partitions | #Consumers | What Happens |
    |---|---|---|
    | 6 | 3 | Each consumer gets 2 partitions |
    | 6 | 2 | Each consumer gets 3 partitions |
    | 6 | 1 | One consumer processes all 6 |

    When there are more consumers than partitions in the same Kafka consumer group:

    • Each partition can only be assigned to one consumer in a group

    • Extra consumers will become idle and receive no messages

    There is no parallelism benefit from consumers exceeding partitions.

    📌 Example

    | Partitions | Consumers | Result |
    |---|---|---|
    | 3 | 2 | All consumers active |
    | 3 | 3 | Balanced: 1 partition each |
    | 3 | 4 | 1 idle consumer (it takes over if another consumer goes down) |
    | 3 | 10 | 7 idle consumers |

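The two example tables can be reproduced with a small simulation. The sketch below deals partitions out to consumers in round-robin order; it is a simplified model with a hypothetical class name, and the real assignors (range, round-robin, sticky) differ in which partitions each consumer gets, though not in the idle-consumer outcome.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified round-robin assignment of partitions to the consumers of one group.
final class PartitionAssignmentSketch {

    // Returns one list of partition ids per consumer; index i belongs to consumer i.
    // Assumes consumers >= 1.
    static List<List<Integer>> assign(int partitions, int consumers) {
        List<List<Integer>> assignment = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            assignment.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            // Each partition goes to exactly one consumer in the group.
            assignment.get(p % consumers).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println("6 partitions, 3 consumers: " + assign(6, 3));
        System.out.println("3 partitions, 4 consumers: " + assign(3, 4));
    }
}
```

An empty list in the result corresponds to an idle consumer: it receives nothing until a rebalance hands it a partition.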
    Rate limiter algorithm
    




    ✅ WebSocket
    What is it?

    A full-duplex (two-way) persistent connection between client & server.

    How it works

    Client opens a WebSocket connection

    Both client & server can send messages anytime

    Use cases

    Real-time chat/messaging

    Multiplayer games

    Live dashboards, trading apps

    Collaborative apps (Google Docs style)

    Pros

    True real-time bi-directional communication

    Low latency

    Efficient over time

    Cons

    More complex; needs WebSocket server

    Not ideal for broadcast-only scenarios

     WebHook
    What is it?
    
    Server-to-server push callback mechanism.
    
    Instead of client polling, server notifies a target URL when event happens.
    
    Use cases
    
    Payment confirmation (PayPal / Stripe)
    
    GitHub push notifications
    
    Slack notifications
    
    IoT alerts
    
    Pros
    
    Efficient (event push)
    
    No client waiting or polling
    
    Cons
    
    Requires public endpoint
    
    Reliability concerns if target is down
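On the receiving side, a webhook is just an HTTP endpoint that the sending server POSTs to. A minimal sketch using the JDK's built-in HttpServer; the class name, the /webhook path, and binding to 127.0.0.1 on a random port are assumptions for local experimentation, and a real receiver would also need a public URL and request authentication.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Minimal webhook target: accepts POSTed event payloads and records them.
final class WebhookReceiverSketch {

    final List<String> receivedEvents = new CopyOnWriteArrayList<>();
    private final HttpServer server;

    WebhookReceiverSketch() throws IOException {
        // Port 0 asks the OS for any free port.
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/webhook", exchange -> {
            try {
                if (!"POST".equals(exchange.getRequestMethod())) {
                    exchange.sendResponseHeaders(405, -1); // only POST is accepted
                    return;
                }
                byte[] body = exchange.getRequestBody().readAllBytes();
                receivedEvents.add(new String(body, StandardCharsets.UTF_8));
                exchange.sendResponseHeaders(204, -1); // acknowledge with no body
            } finally {
                exchange.close();
            }
        });
        server.start();
    }

    int port() {
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }
}
```

The sender treats a non-2xx response or a timeout as a failed delivery, which is where the reliability concern listed above comes from: if the target is down, the event is lost unless the sender retries.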

    Final Comparison

    HTTP Polling vs SSE vs WebSocket vs WebHooks
    | Feature | WebSocket | WebHook | HTTP Polling | SSE |
    |---|---|---|---|---|
    | Direction | 2-way | Server → Server callback | Client → Server | Server → Client |
    | Connection | Persistent | Event-driven HTTP push | Repeated HTTP requests | Persistent HTTP stream |
    | Latency | Very Low | Event-driven | Higher | Low |
    | Complexity | Medium | Medium | Very Low | Low |
    | Best For | Chats, Games | Server events | Simple periodic updates | Live notifications, Streams |

    👨‍💼 Interview-ready Summary

    WebSocket is a full-duplex persistent connection for real-time bi-directional communication.
    WebHook is server-to-server push for event notifications.
    HTTP Polling repeatedly checks for updates and is simple but inefficient.
    SSE streams server → client updates over a persistent HTTP connection.


