KAFKA
Top 70 Kafka Interview Questions and Answers for 2025 - GeeksforGeeks
Kafka:
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
    <version>3.3.1</version>
</dependency>
Apache Kafka is a distributed log-based messaging system
that guarantees ordering within individual partitions rather than across the
entire topic. Unlike queue-based systems, Kafka retains messages in a
durable, append-only log, allowing multiple consumers to read at
different offsets. Consumers track their position through offsets and can commit them
manually, which gives them control over retries and failure handling. If a consumer fails to process a
message, it can delay committing the offset and stop reading further in
that partition while other partitions remain unaffected. This partition-based
design enables fault isolation and parallel processing, and ordering is
preserved within a partition as long as the consumer processes its records in order.
Why is Apache Kafka Needed?
With businesses collecting massive volumes of data in
real time, there is a need for tools that can handle this data efficiently.
Kafka solves several key problems:
- Real-Time Processing: Kafka is optimized for handling real-time data streams,
allowing businesses to process and act on data as it happens.
- Fault-Tolerant: Kafka replicates data across brokers, so acknowledged messages
survive the failure of individual brokers, making it a highly reliable messaging system.
- Scalable: Kafka scales horizontally by adding more brokers, allowing it to handle
growing data loads and increasing numbers of producers and consumers.
- Event-Driven Architecture: Kafka powers event-driven architectures,
enabling systems to respond to events in real time without having to
constantly poll for changes.
Core Components of Apache Kafka
1. Kafka Broker / Bootstrap Server
A Kafka broker is a server that runs Kafka and stores
data. Typically, a Kafka cluster consists of multiple brokers that work
together to provide scalability, fault tolerance, and high availability. Each
broker is responsible for storing and serving data related to topics.
A bootstrap server is simply a broker address that clients contact first to
discover the rest of the cluster.
2. Producers
A producer is an application or service that sends messages
to a Kafka topic. These processes push data into the Kafka
system. Producers decide which topic the message should go to, and Kafka
efficiently handles it based on the partitioning strategy.
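The partitioning strategy mentioned above can be sketched in plain Java. This is a simplified model, not Kafka's client code: the class KeyPartitioner and its method are hypothetical, and it uses String.hashCode() where Kafka's default partitioner hashes the serialized key bytes with murmur2. Keyless records are spread round-robin here purely for simplicity. The property it illustrates is that the same key always maps to the same partition, which is what preserves per-key ordering.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for a producer-side partitioner.
final class KeyPartitioner {
    private final int numPartitions;
    private final AtomicInteger nextKeyless = new AtomicInteger();

    KeyPartitioner(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    /** Returns a partition index in [0, numPartitions). */
    int partitionFor(String key) {
        if (key == null) {
            // Assumption: keyless records are simply handed out in turn.
            return Math.floorMod(nextKeyless.getAndIncrement(), numPartitions);
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

Calling partitionFor("order-42") twice on the same instance returns the same index, so all events for that key land in one partition and stay in order.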
3. Kafka Topic
A topic in Kafka is a category or feed name to which
messages are published. Kafka messages are always associated with topics, and
when you want to send a message, you send it to a specific topic. Topics are
divided into partitions, which allow Kafka to scale horizontally and handle
large volumes of data.
4. Consumers and Consumer Groups
A consumer is an application that reads messages from Kafka topics. Kafka supports consumer
groups, where multiple consumers read from the same topic and Kafka
assigns each partition to exactly one consumer in the group, so each message is
processed by only one member. This helps with load balancing. Consumers can also
start reading from any offset.
Partitions allow you to parallelize a topic by splitting
the data in a particular topic across multiple brokers.
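5. ZooKeeper
In ZooKeeper-based clusters, Kafka uses Apache ZooKeeper to store cluster
metadata and access control lists and to handle controller election and broker
coordination. ZooKeeper tracks which brokers are alive, so partition leadership
can move to another broker when one fails. Newer Kafka versions can run in
KRaft mode, which replaces ZooKeeper with a built-in metadata quorum.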
To create messages, we first need to configure a ProducerFactory. This sets the strategy for creating
Kafka Producer instances.
Then, we need a KafkaTemplate, which wraps a Producer instance
and provides convenience methods for sending messages to Kafka topics.
Publishing Messages
We can send messages using the KafkaTemplate class:
@Autowired
private KafkaTemplate<String, String> kafkaTemplate;

public void sendMessage(String msg) {
    // topicName is assumed to be a String field defined elsewhere in this class
    kafkaTemplate.send(topicName, msg);
}
Consumer Configuration
To consume messages, we need to configure a ConsumerFactory and
a KafkaListenerContainerFactory.
Once these beans are available in the Spring bean factory, POJO-based consumers
can be configured using the @KafkaListener annotation.
The @EnableKafka annotation
is required on the configuration class to enable the detection of @KafkaListener annotations
on Spring-managed beans.
In your project, thread concurrency is primarily managed
in the following ways:
Kafka Listener Container Factory:
The ConcurrentKafkaListenerContainerFactory is used
in the SPEKafkaConsumerConfig class. This factory allows concurrent processing
of Kafka messages by configuring multiple threads for Kafka consumers.
The setConsumerFactory method sets the consumer factory, and
the ack mode is set to MANUAL (through the container properties), which gives the
listener control over when a message is acknowledged.
Awaitility:
In the test class KafkaProcessSecurityPriceServiceImplTest,
Awaitility is used to handle asynchronous operations and ensure proper waiting
for certain conditions to be met. This helps in managing thread synchronization
during tests.
Atomic Variables:
Atomic classes like AtomicInteger are used in the
test methods to handle thread-safe operations, such as counting retry attempts
or timeout attempts, ensuring no race conditions occur.
Resource Locking:
The @ResourceLock annotation is used in test methods
to prevent concurrent access to shared resources like embeddedKafkaBroker,
ensuring thread safety during test execution.
These approaches collectively ensure thread-safe operations
and concurrency management in your project.
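As a small illustration of the "Atomic Variables" point above, here is a generic sketch of counting attempts from several threads with AtomicInteger. It is not taken from the project's tests: the class RetryCounterDemo and its method are hypothetical names, and the worker threads simply stand in for concurrent listener or retry threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: several threads record attempts in one shared counter.
final class RetryCounterDemo {
    static int countAttempts(int threads, int attemptsPerThread) {
        AtomicInteger attempts = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < attemptsPerThread; i++) {
                    attempts.incrementAndGet(); // atomic read-modify-write, no lock needed
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return attempts.get();
    }
}
```

With a plain int field the increments from different threads could overwrite each other; with AtomicInteger the final count equals threads * attemptsPerThread.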
1. What is Apache Kafka?
A distributed messaging platform built for high-throughput, scalable, fault-tolerant data streaming. It is used for event-driven architecture and is widely used in modern platforms.
2. What are the key components of Kafka?
Producer, consumer, consumer group, broker (bootstrap server), topic, partition, and ZooKeeper.
Broker (bootstrap server): where Kafka actually stores the messages.
ZooKeeper: coordinates Kafka brokers and keeps track of cluster metadata such as brokers, topics, and partitions. Consumer offsets were kept in ZooKeeper only by very old clients; modern Kafka stores them in Kafka itself.
Topic: a category of messages.
Partition: an ordered sequence of records, each identified by an offset.
Broker: a Kafka server that runs in a Kafka cluster. The broker receives each message, assigns it an offset, and commits it to the log on disk.
Consumer and consumer group:
Consumer: consumes messages from one or more topics.
Consumer group: a set of consumers that work together to consume one or more topics, with each partition read by only one member of the group.
Offset: the unique identifier of a record within a partition.
Kafka retention: data is retained up to a size or time limit (for example, 1 GB or 1 month), set through configuration such as retention.bytes and retention.ms. By default brokers keep data for 7 days with no size limit.
Kafka message size limit: about 1 MB per message by default; the limit is configurable.
Idempotent producer in Kafka: ensures a message is written only once, even when the producer retries the send.
Serialization and deserialization: the producer serializes a message before sending it, and the consumer deserializes it before processing it.
Kafka message version: Kafka does not version messages itself; users can handle multiple versions by adding a version number to the message schema or payload.
Message ordering in Kafka: Kafka guarantees that messages within a partition are kept in order.
How does Kafka handle message ordering across multiple partitions: Kafka only guarantees message ordering within a single partition. Across multiple partitions, there is no guarantee of message ordering.
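The idempotent producer entry in the list above can be illustrated with a toy model. Kafka's brokers deduplicate retries using a producer ID and a per-partition sequence number; the sketch below imitates that idea for a single partition. The class DedupLog and its methods are hypothetical, and real brokers track more state than this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of one partition log that ignores duplicate retries.
final class DedupLog {
    private final List<String> log = new ArrayList<>();
    private final Map<Long, Integer> lastSeqByProducer = new HashMap<>();

    /** Returns true if the record was appended, false if it was a duplicate retry. */
    boolean append(long producerId, int sequence, String value) {
        int last = lastSeqByProducer.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return false; // already in the log: the retry is not written again
        }
        if (sequence != last + 1) {
            throw new IllegalStateException("out-of-order sequence " + sequence);
        }
        log.add(value);
        lastSeqByProducer.put(producerId, sequence);
        return true;
    }

    /** Returns a copy of the records appended so far. */
    List<String> records() {
        return new ArrayList<>(log);
    }
}
```

If a producer sends sequence 0, never sees the acknowledgement, and sends sequence 0 again, the second call returns false and the log still holds the record once.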
How does Kafka handle consumer lag?
Consumer lag in Kafka refers to the difference between the offset of the last produced message and the offset of the last consumed message. Kafka provides tools and APIs to monitor consumer lag, such as the Kafka Consumer Groups command-line tool and the AdminClient API. High consumer lag can indicate performance issues or insufficient consumer capacity. Kafka doesn't automatically handle lag, but it provides the information needed for applications to make scaling or performance optimization decisions.
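The lag definition above is simple arithmetic per partition. A minimal sketch, assuming the log end offsets and committed offsets have already been fetched (for example through the AdminClient API mentioned above); LagCalculator and its methods are hypothetical names, and a partition with no committed offset is treated as committed at 0.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: lag = log end offset minus committed offset, per partition.
final class LagCalculator {
    static Map<Integer, Long> lagPerPartition(Map<Integer, Long> logEndOffsets,
                                              Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new HashMap<>();
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            // Assumption: no commit yet means the group starts from offset 0.
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            lag.put(e.getKey(), Math.max(0L, e.getValue() - committed));
        }
        return lag;
    }

    static long totalLag(Map<Integer, Long> logEndOffsets,
                         Map<Integer, Long> committedOffsets) {
        long total = 0;
        for (long partitionLag : lagPerPartition(logEndOffsets, committedOffsets).values()) {
            total += partitionLag;
        }
        return total;
    }
}
```

A total that keeps growing over time is the signal to scale consumers or speed up processing; a stable non-zero value usually just reflects batching.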
Kafka's log is written append-only and read sequentially.
Describe how leaders and followers work in Kafka replication.
Each partition elects a single leader; the leader handles all client reads/writes for that partition. Followers replicate the leader’s log asynchronously (or near-synchronously depending on settings). Followers stay in-sync by pulling fetches; if the leader fails, a follower in the ISR (in-sync replica set) is elected leader to maintain availability.
Can you explain ISR (In-Sync Replica) and its importance?
ISR is the set of replicas that have fully caught up with the leader (within configured lag thresholds). Only members of ISR are eligible for leader election (by default), ensuring that promoted leaders are up-to-date and minimizing data loss. ISR size and lag thresholds control trade-offs between availability and consistency.
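The "configured lag thresholds" idea can be sketched as a time-based check, in the spirit of the broker setting replica.lag.time.max.ms. This is a simplified model rather than broker code: IsrTracker is a hypothetical class, and it assumes we know the last time each replica was fully caught up with the leader (the leader itself is passed in with the current time).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: a replica is in sync if it caught up recently enough.
final class IsrTracker {
    static List<Integer> inSyncReplicas(Map<Integer, Long> lastCaughtUpMs,
                                        long nowMs, long maxLagMs) {
        // Sort by broker id so the result order is deterministic.
        Map<Integer, Long> sorted = new TreeMap<>(lastCaughtUpMs);
        List<Integer> isr = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : sorted.entrySet()) {
            if (nowMs - e.getValue() <= maxLagMs) {
                isr.add(e.getKey());
            }
        }
        return isr;
    }
}
```

A smaller maxLagMs removes slow followers from the ISR sooner, which favors consistency; a larger value keeps more replicas eligible, which favors availability.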
Describe the lifecycle of a Kafka message from producer to consumer.
Producer serializes and sends message to a partition, leader appends to its log and (depending on acks) waits for replication, broker acknowledges. Consumer polls broker, fetches messages (commonly in batches), deserializes, processes, and commits offsets (to Kafka or external store). Offsets track consumption progress for fault-tolerant consumption.
What is the difference between acks=0, acks=1, and acks=all?
acks=0: producer does not wait for broker ack — lowest latency, highest risk of data loss.
acks=1: leader acknowledges after persisting to its log — moderate durability; risk if leader fails before followers replicate.
acks=all (or -1): leader waits for all ISR replicas to acknowledge — strongest durability (with replication factor >1), higher latency.
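The three settings above differ only in what the producer waits for. The sketch below models that decision as a pure function. AckPolicy and its parameters are hypothetical, and the model ignores related settings such as the minimum ISR size.

```java
// Hypothetical model of when a send counts as acknowledged under each acks value.
final class AckPolicy {
    /**
     * @param acks          "0", "1", or "all" ("-1" is treated as "all")
     * @param leaderWritten whether the leader has appended the record to its log
     * @param isrSize       number of replicas currently in the ISR, leader included
     * @param isrAcked      number of ISR replicas, leader included, that have the record
     */
    static boolean canAcknowledge(String acks, boolean leaderWritten,
                                  int isrSize, int isrAcked) {
        switch (acks) {
            case "0":
                return true; // fire and forget: the producer never waits
            case "1":
                return leaderWritten; // leader only, followers may still be behind
            case "all":
            case "-1":
                return leaderWritten && isrAcked >= isrSize; // every ISR replica has it
            default:
                throw new IllegalArgumentException("unknown acks value: " + acks);
        }
    }
}
```

For example, with three ISR replicas of which two have the record, acks=1 is already satisfied while acks=all is not.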
Explain the role of offsets and where Kafka stores them.
Offsets mark a consumer’s position in a partition. Modern Kafka stores offsets in an internal topic (__consumer_offsets), allowing fault-tolerance and visibility. Consumers commit offsets either automatically or explicitly; these commits are used to resume consumption after failures.
What strategies do you use to commit offsets safely?
Use manual commits after successful processing (commitSync/commitAsync) and commit only after side effects (DB writes, downstream emits) succeed. For exactly-once, use Kafka transactions to atomically write output and commit offsets. Also, handle idempotency on downstream systems to guard against duplicates.
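The "commit only after processing succeeds" rule can be shown with an in-memory simulation. This is not the Kafka consumer API: AtLeastOnceConsumer is a hypothetical class, the partition is a plain list, and the handler returns false to signal a failed side effect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// In-memory sketch of at-least-once consumption on a single partition.
final class AtLeastOnceConsumer {
    private long committedOffset = 0; // next offset to read after a restart
    private final List<String> processed = new ArrayList<>();

    /** Processes records from the committed offset and stops at the first failure. */
    void poll(List<String> partitionLog, Predicate<String> handler) {
        for (long offset = committedOffset; offset < partitionLog.size(); offset++) {
            String record = partitionLog.get((int) offset);
            if (!handler.test(record)) {
                return; // no commit: this record is retried on the next poll
            }
            processed.add(record);
            committedOffset = offset + 1; // commit only after the side effect succeeded
        }
    }

    long committedOffset() {
        return committedOffset;
    }

    List<String> processed() {
        return new ArrayList<>(processed);
    }
}
```

Because the offset advances only after the handler succeeds, a crash between processing and committing leads to a repeat rather than a loss, which is why downstream idempotency matters.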
How do you avoid consumer lag?
Monitor lag metrics; scale consumers (add parallelism) or increase partition count. Optimize consumer processing (batching, async IO), tune fetch size, increase broker throughput, and reduce GC pauses and long blocking operations.
What factors influence partition count decisions?
Desired parallelism, throughput per partition, consumer count, key cardinality, retention and storage, and operational complexity. More partitions increase parallelism and throughput but add controller and metadata overhead, longer rebalances, and higher disk and network load.
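A commonly cited rule of thumb turns the first few factors into a starting number: take the target throughput, divide it by what one partition can sustain on the producer side and on the consumer side, and use the larger result, but never fewer partitions than consumers you plan to run. The sketch below encodes that rule only. PartitionEstimator is a hypothetical helper, the per-partition throughput figures must come from your own measurements, and the result says nothing about the overhead factors listed above.

```java
// Hypothetical sizing helper based on a throughput rule of thumb.
final class PartitionEstimator {
    static int estimate(double targetMBps,
                        double producerMBpsPerPartition,
                        double consumerMBpsPerPartition,
                        int plannedConsumers) {
        if (targetMBps <= 0 || producerMBpsPerPartition <= 0 || consumerMBpsPerPartition <= 0) {
            throw new IllegalArgumentException("throughput values must be positive");
        }
        int forProducers = (int) Math.ceil(targetMBps / producerMBpsPerPartition);
        int forConsumers = (int) Math.ceil(targetMBps / consumerMBpsPerPartition);
        // Each consumer in a group needs at least one partition to be useful.
        return Math.max(Math.max(forProducers, forConsumers), plannedConsumers);
    }
}
```

For a 100 MB/s target with 10 MB/s per partition on the producer side and 20 MB/s on the consumer side, estimate(100, 10, 20, 4) gives 10.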
How would you scale Kafka to millions of messages per second?
Increase partitions across many brokers, use large SSDs and fast network (10/25/100GbE), tune batching/compression, partition keys to spread load, scale producers/consumers horizontally, use separate clusters or topics for isolation, and monitor broker metrics. Consider tuning OS/network (page cache, sockets) and using KRaft/optimized metadata settings.
How does Kafka handle leader election?
Controller coordinates leader elections. On broker failure, controller picks an ISR replica (or out-of-sync if allowed) and updates metadata; producers/consumers refresh metadata and redirect to new leader. In KRaft, the Quorum-based metadata log handles elections more natively; in ZooKeeper-based clusters, controller reads ZooKeeper.
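The controller's choice described above can be modeled as a small function: prefer a live replica that is in the ISR, and fall back to any live replica only when unclean election is allowed. LeaderElector is a hypothetical class, and the real controller also updates metadata and epochs, which this sketch leaves out.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of picking a new partition leader after a broker failure.
final class LeaderElector {
    static Optional<Integer> electLeader(List<Integer> replicas, Set<Integer> isr,
                                         Set<Integer> liveBrokers, boolean allowUnclean) {
        // Clean election: first live replica in assignment order that is in the ISR.
        for (int replica : replicas) {
            if (isr.contains(replica) && liveBrokers.contains(replica)) {
                return Optional.of(replica);
            }
        }
        // Unclean election: any live replica, at the risk of losing recent records.
        if (allowUnclean) {
            for (int replica : replicas) {
                if (liveBrokers.contains(replica)) {
                    return Optional.of(replica);
                }
            }
        }
        return Optional.empty(); // partition stays offline until an ISR replica returns
    }
}
```

An empty result corresponds to an offline partition: availability is given up to avoid promoting a replica that is missing data.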
What mechanisms prevent message loss?
Replication, acks=all, durable writes to disk (fsync behavior), ISR checks, proper leader election settings, and consumer offset management. Operational practices (monitoring, backups, multiple data centers) and not enabling unclean leader election further reduce loss risk.
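Several of these mechanisms are plain configuration. The sketch below collects durability-oriented settings as java.util.Properties. The keys are standard Kafka configuration names, but the class DurableSettings, the split into two methods, and the value 2 for min.insync.replicas (a setting not discussed above) are illustrative assumptions, not recommendations for every cluster.

```java
import java.util.Properties;

// Illustrative settings only; tune the values for your own cluster.
final class DurableSettings {
    /** Producer-side settings that favor durability over latency. */
    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("acks", "all");                // wait for all in-sync replicas
        props.setProperty("enable.idempotence", "true"); // retries do not create duplicates
        return props;
    }

    /** Topic-level settings that work together with acks=all. */
    static Properties topicProps() {
        Properties props = new Properties();
        props.setProperty("min.insync.replicas", "2");                // assumed value
        props.setProperty("unclean.leader.election.enable", "false"); // never elect a stale replica
        return props;
    }
}
```

These objects can be passed to a producer or used when creating a topic; on their own they only document the intended settings.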
Kafka Consumer Configuration
Create the KafkaConsumerConfig class. This configuration class sets up the Kafka consumer of the Spring Boot application.
For the consumer, use the ConsumerFactory and ConcurrentKafkaListenerContainerFactory classes.
For the producer, use the ProducerFactory and KafkaTemplate classes.
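import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

@EnableKafka
@Configuration
public class KafkaConsumerConfig {

    // Creates consumers that read String keys and String values.
    @Bean
    public ConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> configProps = new HashMap<>();
        configProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        configProps.put(ConsumerConfig.GROUP_ID_CONFIG, "group_id");
        configProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        configProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        return new DefaultKafkaConsumerFactory<>(configProps);
    }

    // Container factory used by @KafkaListener methods.
    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        return factory;
    }
}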
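Create the KafkaProducerConfig class. This configuration class sets up the Kafka producer of the Spring Boot application using ProducerFactory and KafkaTemplate.

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class KafkaProducerConfig {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> configProps = new HashMap<>();
        configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // StringSerializer (instead of JsonSerializer) so the value matches the
        // StringDeserializer configured on the consumer side.
        configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new DefaultKafkaProducerFactory<>(configProps);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class KafkaProducerService {

    private static final String TOPIC = "my_topic";

    private final KafkaTemplate<String, String> kafkaTemplate;

    public KafkaProducerService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void sendMessage(String message) {
        // send() is asynchronous; this line only confirms the record was handed to the template.
        kafkaTemplate.send(TOPIC, message);
        System.out.println("Message sent: " + message);
    }
}

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class KafkaConsumerService {

    @KafkaListener(topics = "my_topic", groupId = "group_id")
    public void consume(String message) {
        System.out.println("Message received: " + message);
    }
}

<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>

Offset: there are two positions to keep in mind. The producer-side offset is the end of the log, where the next message will be written. The consumer offset is where the next message will be read.
Producer: append only. Consumer: sequential read.
A consumer that is alone in its group reads from all partitions of the topic.
What is partition.assignment.strategy?
It is the consumer setting that names the assignor class or classes used to distribute partitions among the members of a group, for example RangeAssignor, RoundRobinAssignor, or CooperativeStickyAssignor.
Offset Commit, Rebalancing & Consumer Group ~ Key Concepts in Kafka (Part 2)
Consumer rebalancing: when a consumer joins or leaves the group (for example, because it goes down), the group's partitions are redistributed among the remaining consumers as evenly as possible.
When there are more partitions than consumers in a Kafka consumer group: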
Each consumer will get multiple partitions
All consumers will be active
Some partitions will be bundled together on the same consumer
Kafka always tries to balance partitions across consumers in a group.
📌 Example
#Partitions | #Consumers | What Happens
6           | 3          | Each consumer gets 2 partitions
6           | 2          | Each consumer gets 3 partitions
6           | 1          | One consumer processes all 6
When there are more consumers than partitions in the same Kafka consumer group:
Each partition can only be assigned to one consumer in a group
Extra consumers will become idle and receive no messages
There is no parallelism benefit from consumers exceeding partitions.
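Both cases, more partitions than consumers and more consumers than partitions, fall out of a simple round-robin distribution. The sketch below is a toy model and not one of Kafka's assignor classes. RoundRobinAssignment is a hypothetical name, and consumer ids are assumed to be unique.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model: deal partitions out to consumers one at a time, in order.
final class RoundRobinAssignment {
    static Map<String, List<Integer>> assign(int partitions, List<String> consumers) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>()); // idle consumers keep an empty list
        }
        if (consumers.isEmpty()) {
            return result;
        }
        for (int p = 0; p < partitions; p++) {
            String owner = consumers.get(p % consumers.size());
            result.get(owner).add(p); // each partition has exactly one owner
        }
        return result;
    }
}
```

With 6 partitions and 3 consumers every consumer ends up with 2 partitions; with 3 partitions and 4 consumers the fourth consumer's list stays empty, which is the idle consumer case.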
📌 Example
Partitions | Consumers | Result
3          | 2         | All consumers active
3          | 3         | Balanced: 1 partition each
3          | 4         | 1 idle consumer (the idle consumer takes over if another consumer goes down)
3          | 10        | 7 idle consumers
Rate limiter algorithm
✅ WebSocket
What is it?
A full-duplex (two-way) persistent connection between client and server.
How it works
- Client opens a WebSocket connection
- Both client and server can send messages anytime
Use cases
- Real-time chat/messaging
- Multiplayer games
- Live dashboards, trading apps
- Collaborative apps (Google Docs style)
Pros
- True real-time bi-directional communication
- Low latency
- Efficient over time
Cons
- More complex; needs a WebSocket server
- Not ideal for broadcast-only scenarios
WebHook
What is it?
A server-to-server push callback mechanism. Instead of the client polling, the server notifies a target URL when an event happens.
Use cases
- Payment confirmation (PayPal / Stripe)
- GitHub push notifications
- Slack notifications
- IoT alerts
Pros
- Efficient (event push)
- No client waiting or polling
Cons
- Requires a public endpoint
- Reliability concerns if the target is down
Final Comparison
HTTP Polling vs SSE vs WebSocket vs WebHooks
Feature    | WebSocket    | WebHook                  | HTTP Polling            | SSE
Direction  | 2-way        | Server → Server callback | Client → Server         | Server → Client
Connection | Persistent   | Event-driven HTTP push   | Repeated HTTP requests  | Persistent HTTP stream
Latency    | Very Low     | Event-driven             | Higher                  | Low
Complexity | Medium       | Medium                   | Very Low                | Low
Best For   | Chats, Games | Server events            | Simple periodic updates | Live notifications, Streams
👨💼 Interview-ready Summary
WebSocket is a full-duplex persistent connection for real-time bi-directional communication.
WebHook is server-to-server push for event notifications.
HTTP Polling repeatedly checks for updates and is simple but inefficient.
SSE streams server → client updates over a persistent HTTP connection.