KAFKA

Top 70 Kafka Interview Questions and Answers for 2025 - GeeksforGeeks

Kafka:

<dependency> <groupId>org.springframework.kafka</groupId>

<artifactId>spring-kafka</artifactId>

<version>3.3.1</version> </dependency>

 

Apache Kafka is a distributed log-based messaging system that guarantees ordering within individual partitions rather than across the entire topic. Unlike queue-based systems, Kafka retains messages in a durable, append-only log, allowing multiple consumers to read at different offsets. Kafka uses manual offset management, giving consumers control over retries and failure handling. If a consumer fails to process a message, it can delay committing the offset, preventing further progress in that partition while other partitions remain unaffected. This partition-based design enables fault isolation and parallel processing while allowing ordering to be maintained within partitions, depending on consumer handling.

Why is Apache Kafka Needed?

With businesses collecting massive volumes of data in real time, there is a need for tools that can handle this data efficiently. Kafka solves several key problems:

  1. Real-Time Processing: Kafka is optimized for handling real-time data streams, allowing businesses to process and act on data as it happens.
  1. Fault-Tolerant: Kafka ensures that even if parts of the system fail, data won’t be lost, making it a highly reliable messaging system.
  1. Scalable: Kafka scales horizontally by adding more brokers, allowing it to handle growing data loads and increasing numbers of producers and consumers.
  1. Event-Driven Architecture: Kafka powers event-driven architectures, enabling systems to respond to events in real-time without having to constantly poll for changes.



Core Components of Apache Kafka

1. Kafka Broker/ Bootstrap server.

Kafka broker is a server that runs Kafka and stores data. Typically, a Kafka cluster consists of multiple brokers that work together to provide scalability, fault tolerance, and high availability. Each broker is responsible for storing and serving data related to topics.

2. Producers

A producer is an application or service that sends messages to a Kafka topic. These processes push data into the Kafka system. Producers decide which topic the message should go to, and Kafka efficiently handles it based on the partitioning strategy.

 

3. Kafka Topic

A topic in Kafka is a category or feed name to which messages are published. Kafka messages are always associated with topics, and when you want to send a message, you send it to a specific topic. Topics are divided into partitions, which allow Kafka to scale horizontally and handle large volumes of data.

4. Consumers and Consumer Groups

Consumer is an application that reads messages from Kafka topics. Kafka allows consumer groups, where multiple consumers can read from the same topic, but Kafka ensures that each message is processed by only one consumer in the group. This helps with load balancing and allows consumers to read messages starting from any offset.

Partitions allow you to parallelize a topic by splitting the data in a particular topic across multiple brokers.

5. Zookeeper

Kafka uses Apache ZooKeeper to manage metadata, control access to Kafka resources, and handle leader election and broker coordination. ZooKeeper ensures high availability by making sure the Kafka cluster remains functional even if a broker fails.

 

 

A screenshot of a computer

AI-generated content may be incorrect.

To create messages, we first need to configure a ProducerFactory. This sets the strategy for creating Kafka Producer instances.

Then, we need a KafkaTemplate, which wraps a Producer instance and provides convenience methods for sending messages to Kafka topics.

A screen shot of a computer

AI-generated content may be incorrect.

A computer screen shot of a code

AI-generated content may be incorrect.A computer code with text

AI-generated content may be incorrect.

Publishing Messages

We can send messages using the KafkaTemplate class:

@Autowired

private KafkaTemplate<String, String> kafkaTemplate;

 

public void sendMessage(String msg) {

    kafkaTemplate.send(topicName, msg);

}

A screenshot of a computer

AI-generated content may be incorrect.

FuseBuilder

A screen shot of a computer code

AI-generated content may be incorrect.

A screenshot of a computer program

AI-generated content may be incorrect.

Consumer Configuration

To consume messages, we need to configure a ConsumerFactory and a KafkaListenerContainerFactory. Once these beans are available in the Spring Bean factory, POJO-based consumers can be configured using @KafkaListener annotation.

@EnableKafka annotation is required on the configuration class to enable the detection of @KafkaListener annotation on spring-managed beans:

A screenshot of a computer program

AI-generated content may be incorrect.

In your project, thread concurrency is primarily managed in the following ways: 

Kafka Listener Container Factory: 

The ConcurrentKafkaListenerContainerFactory is used in the SPEKafkaConsumerConfig class. This factory allows concurrent processing of Kafka messages by configuring multiple threads for Kafka consumers.

The setConsumerFactory method sets the consumer factory, and the setAckMode is set to MANUAL, which gives control over message acknowledgment.

Awaitility: 

In the test class KafkaProcessSecurityPriceServiceImplTest, Awaitility is used to handle asynchronous operations and ensure proper waiting for certain conditions to be met. This helps in managing thread synchronization during tests.

Atomic Variables: 

Atomic classes like AtomicInteger are used in the test methods to handle thread-safe operations, such as counting retry attempts or timeout attempts, ensuring no race conditions occur.

Resource Locking: 

The @ResourceLock annotation is used in test methods to prevent concurrent access to shared resources like embeddedKafkaBroker, ensuring thread safety during test execution.

These approaches collectively ensure thread-safe operations and concurrency management in your project.

Top 70 Kafka Interview Questions and Answers for 2025 - GeeksforGeeks

1. what is Apache kafka
Distributed messaging platform that can be used for high throughput, scalable , fault tolerance and scalable data. Can be used for event driven architecute and highly used in modern platforms.


2. What are the key component of Kafka
  Producer, Consumer, Consumer group, bootstrap server(broker), partition and topic and zookeeper.

Bootstrap or broker where kafka actually store the messages.

Zookeeper: Coordinates Kafka brokers, keeping track of topics, partitions, and message offsets.

topic: category of the message

partition: ordered sequences of record , managed by offset.

broker: is a kafka server runs in kafka cluster, broker receives the message, assignes the offset and commits the message to the disk and assings the log as well.

Consumer and consumer group:

 consumers; consumes the message from certain topic

consumer group: is a set of consumers that works together to get the message from multiple topics

offset: unique identifier of the records or message.

kafka retention: data retention, either upto 1 gb or 1 month, handles through configuration.

kafka message size limit: 1 mb per message.

Idempotent producer in kafka: this ensures message is inserted only once even with multiple retries.

Serialization and deserialization: Producer has to do serialization before sending and consumer has to do deserialization before consuming it.

kafka message version: Kafka can't handle the version directly whereas it allows user to handle mutiple version by adding message version no in the message schema or payload.

Message ordering in kafka: kafka guarentees that message within the partition is in ordered sequence.

How does Kafka handle message ordering across multiple partitions:Kafka only guarantees message ordering within a single partition. Across multiple partitions, there is no guarantee of message ordering.

How does Kafka handle consumer lag?

Consumer lag in Kafka refers to the difference between the offset of the last produced message and the offset of the last consumed message. Kafka provides tools and APIs to monitor consumer lag, such as the Kafka Consumer Groups command-line tool and the AdminClient API. High consumer lag can indicate performance issues or insufficient consumer capacity. Kafka doesn't automatically handle lag, but it provides the information needed for applications to make scaling or performance optimization decisions.





Apend only writing,/sequential read.

Describe how leaders and followers work in Kafka replication.
Each partition elects a single leader; the leader handles all client reads/writes for that partition. Followers replicate the leader’s log asynchronously (or near-synchronously depending on settings). Followers stay in-sync by pulling fetches; if the leader fails, a follower in the ISR (in-sync replica set) is elected leader to maintain availability.

Can you explain ISR (In-Sync Replica) and its importance?
ISR is the set of replicas that have fully caught up with the leader (within configured lag thresholds). Only members of ISR are eligible for leader election (by default), ensuring that promoted leaders are up-to-date and minimizing data loss. ISR size and lag thresholds control trade-offs between availability and consistency.

Describe the lifecycle of a Kafka message from producer to consumer.
Producer serializes and sends message to a partition, leader appends to its log and (depending on acks) waits for replication, broker acknowledges. Consumer polls broker, fetches messages (commonly in batches), deserializes, processes, and commits offsets (to Kafka or external store). Offsets track consumption progress for fault-tolerant consumption.

What is the difference between acks=0, acks=1, and acks=all?
acks=0: producer does not wait for broker ack — lowest latency, highest risk of data loss.
acks=1: leader acknowledges after persisting to its log — moderate durability; risk if leader fails before followers replicate.
acks=all (or -1): leader waits for all ISR replicas to acknowledge — strongest durability (with replication factor >1), higher latency.

Explain the role of offsets and where Kafka stores them.
Offsets mark a consumer’s position in a partition. Modern Kafka stores offsets in an internal topic (__consumer_offsets), allowing fault-tolerance and visibility. Consumers commit offsets either automatically or explicitly; these commits are used to resume consumption after failures.

What strategies do you use to commit offsets safely?
Use manual commits after successful processing (commitSync/commitAsync) and commit only after side effects (DB writes, downstream emits) succeed. For exactly-once, use Kafka transactions to atomically write output and commit offsets. Also, handle idempotency on downstream systems to guard against duplicates.

How do you avoid consumer lag?
Monitor lag metrics; scale consumers (add parallelism) or increase partition count. Optimize consumer processing (batching, async IO), tune fetch size, increase broker throughput, and reduce GC pauses and long blocking operations.What factors influence partition count decisions?

  • Desired parallelism, throughput per partition, consumer count, key cardinality, retention and storage, and operational complexity. More partitions increase parallelism and throughput but add controller and metadata overhead, longer rebalances, and higher disk and network load.

  • How would you scale Kafka to millions of messages per second?
    Increase partitions across many brokers, use large SSDs and fast network (10/25/100GbE), tune batching/compression, partition keys to spread load, scale producers/consumers horizontally, use separate clusters or topics for isolation, and monitor broker metrics. Consider tuning OS/network (page cache, sockets) and using KRaft/optimized metadata settings.

  • How does Kafka handle leader election?
    Controller coordinates leader elections. On broker failure, controller picks an ISR replica (or out-of-sync if allowed) and updates metadata; producers/consumers refresh metadata and redirect to new leader. In KRaft, the Quorum-based metadata log handles elections more natively; in ZooKeeper-based clusters, controller reads ZooKeeper.

  • What mechanisms prevent message loss?
    Replication, acks=all, durable writes to disk (fsync behavior), ISR checks, proper leader election settings, and consumer offset management. Operational practices (monitoring, backups, multiple data centers) and not enabling unclean leader election further reduce loss risk.


  • Kafka Consumer Configuration

    Create the KafkaConsumerConfig class and this configuration class sets up the Kafka consumer of the Spring Boot application.
    use: ConsumerFactory and ConcurrentKafkaListenerContainerFactory classes.

    ProducerFactory KafkaTemplate for producer.
    @EnableKafka
    @Configuration
    public class KafkaConsumerConfig {
    
        @Bean
        public ConsumerFactory<String, String> consumerFactory() {
            Map<String, Object> configProps = new HashMap<>();
            configProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            configProps.put(ConsumerConfig.GROUP_ID_CONFIG, "group_id");
            configProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            configProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            return new DefaultKafkaConsumerFactory<>(configProps);
        }
    
        @Bean
        public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
            ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
            factory.setConsumerFactory(consumerFactory());
            return factory;
        }
    }
    
    Create the KafkaProducerConfig class and this configuration class sets up the Kafka producer of the Spring Boot application
    ProducerFactory KafkaTemplate
    @Configuration
    public class KafkaProducerConfig {
    
        @Bean
        public ProducerFactory<String, String> producerFactory() {
            Map<String, Object> configProps = new HashMap<>();
            configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
            return new DefaultKafkaProducerFactory<>(configProps);
        }
    
        @Bean
        public KafkaTemplate<String, String> kafkaTemplate() {
            return new KafkaTemplate<>(producerFactory());
        }
    }

    @Service
    public class KafkaProducerService {
    
        private static final String TOPIC = "my_topic";
    
        private final KafkaTemplate<String, String> kafkaTemplate;
    
        public KafkaProducerService(KafkaTemplate<String, String> kafkaTemplate) {
            this.kafkaTemplate = kafkaTemplate;
        }
    
        public void sendMessage(String message) {
            kafkaTemplate.send(TOPIC, message);
            System.out.println("Message sent: " + message);
        }
    }
    @Service
    public class KafkaConsumerService {
    
        @KafkaListener(topics = "my_topic", groupId = "group_id")
        public void consume(String message) {
            System.out.println("Message received: " + message);
        }
    }
    <dependency>
                <groupId>org.springframework.kafka</groupId>
                <artifactId>spring-kafka</artifactId>
            </dependency>

    offset: there will be two offsets, producer offset : where the next message will be writtern and consumer offset: where the next message will be read.
    producer : Append only, consumer : read sequential
    A consumer consumes the messages from all partition at the same time.


    What is partition.assignment.strategy?
    Offset Commit, Rebalancing & Consumer Group ~ Key Concepts in Kafka(Part2)
    Consumer rebalancing? when consumer goes down then consumers are rebalanced and distribute equally.
    When there are more partitions than consumers in a Kafka consumer group:
    • Each consumer will get multiple partitions

    • All consumers will be active

    • Some partitions will be bundled together on the same consumer

    Kafka always tries to balance partitions across consumers in a group.


    📌 Example

    #Partitions#ConsumersWhat Happens
    63Each consumer gets 2 partitions
    62Each consumer gets 3 partitions
    61One consumer processes all 6

    When there are more consumers than partitions in the same Kafka consumer group:

    • Each partition can only be assigned to one consumer in a group

    • Extra consumers will become idle and receive no messages

    There is no parallelism benefit from consumers exceeding partitions.

    📌 Example

    PartitionsConsumersResult
    32All consumers active
    33Balanced — 1 partition each
    341 idle consumer ( the idle consumer will be used if another consumer down)
    3107 idle consumers
    Rate limiter algorithm
    




    ✅ WebSocket What is it? A full-duplex (two-way) persistent connection between client & server. How it works Client opens a WebSocket connection Both client & server can send messages anytime Use cases Real-time chat/messaging Multiplayer games Live dashboards, trading apps Collaborative apps (Google Docs style) Pros True real-time bi-directional communication Low latency Efficient over time Cons More complex; needs WebSocket server Not ideal for broadcast-only scenarios

     WebHook
    What is it?
    
    Server-to-server push callback mechanism.
    
    Instead of client polling, server notifies a target URL when event happens.
    
    Use cases
    
    Payment confirmation (PayPal / Stripe)
    
    GitHub push notifications
    
    Slack notifications
    
    IoT alerts
    
    Pros
    
    Efficient (event push)
    
    No client waiting or polling
    
    Cons
    
    Requires public endpoint
    
    Reliability concerns if target is down

    Final Comparison

    HTTP Polling vs SSE vs WebSocket vs WebHooks
    FeatureWebSocketWebHookHTTP PollingSSE
    Direction2-wayServer → Server callbackClient → ServerServer → Client
    ConnectionPersistentEvent-driven HTTP pushRepeated HTTP requestsPersistent HTTP stream
    LatencyVery LowEvent-drivenHigherLow
    ComplexityMediumMediumVery LowLow
    Best ForChats, GamesServer eventsSimple periodic updatesLive notifications, Streams

    👨‍💼 Interview-ready Summary

    WebSocket is a full-duplex persistent connection for real-time bi-directional communication.
    WebHook is server-to-server push for event notifications.
    HTTP Polling repeatedly checks for updates and is simple but inefficient.
    SSE streams server → client updates over a persistent HTTP connection.



    Comments

    Popular posts from this blog

    Archunit test

    Hexagonal Architecture

    visitor design pattern