
8. Kafka Questions
This section focuses on Kafka fundamentals, including topics, partitions, consumer groups, message delivery guarantees, event-driven architecture, serialization, failure handling, and how Kafka integrates with Spring Boot in distributed systems.
1. What is Kafka?
Apache Kafka is a distributed event streaming platform used for high-throughput, fault-tolerant, real-time data processing.
Kafka is commonly used for:
Event-driven systems
Messaging
Log aggregation
Real-time analytics
Microservice communication
Data streaming pipelines
Kafka was originally developed at LinkedIn and later became a top-level project of the Apache Software Foundation.
Kafka stores streams of records called events or messages.
Example event:
{
  "orderId": 1001,
  "status": "CREATED"
}
Kafka is designed for:
Horizontal scalability
Distributed systems
High durability
Massive throughput
Unlike traditional queues, Kafka persists messages on disk and allows consumers to replay events.
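The replay idea can be illustrated with a minimal in-memory sketch of an append-only log. The `AppendOnlyLog` class is hypothetical and only models the concept: real brokers store each partition as segment files on disk and delete old data according to retention settings, never because a consumer has read it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of an append-only log, NOT Kafka's storage engine.
// Records are appended at the end and are never removed on read, so any reader
// can start again from an earlier offset and replay history.
class AppendOnlyLog {
    private final List<String> records = new ArrayList<>();

    // Appends a record and returns the offset it was stored at.
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Returns all records from the given offset to the end, without deleting them.
    List<String> readFrom(long offset) {
        return new ArrayList<>(records.subList((int) offset, records.size()));
    }

    public static void main(String[] args) {
        AppendOnlyLog log = new AppendOnlyLog();
        log.append("{\"orderId\": 1001, \"status\": \"CREATED\"}");
        log.append("{\"orderId\": 1001, \"status\": \"PAID\"}");
        // A first reader consumes everything...
        System.out.println(log.readFrom(0).size()); // 2
        // ...and a second reader can still replay the same events from offset 0.
        System.out.println(log.readFrom(0).get(0));
    }
}
```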
2. What is a topic?
A topic is a logical category or stream of messages in Kafka.
Example topics:
orders
payments
notifications
user-events
Producers send messages to topics.
Consumers read messages from topics.
Example:
Producer → orders topic → Consumer
Topics are central to Kafka architecture.
3. What is a partition?
A partition is a subdivision of a topic.
Kafka topics are split into partitions for:
Scalability
Parallel processing
Distributed storage
Example:
orders topic
├── Partition 0
├── Partition 1
└── Partition 2
Each partition is an ordered sequence of messages.
Kafka guarantees ordering only inside a partition.
Partitions allow Kafka to scale horizontally across multiple brokers.
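A minimal sketch of per-partition ordering, assuming a hypothetical `PartitionedTopic` class in which each partition is simply its own list. Offsets restart at 0 in every partition, and order is only defined inside one list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a topic split into partitions. Each partition is an
// independent ordered list, so ordering only exists within a single partition.
class PartitionedTopic {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedTopic(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Appends to an explicitly chosen partition and returns the offset inside it.
    int append(int partition, String message) {
        List<String> p = partitions.get(partition);
        p.add(message);
        return p.size() - 1;
    }

    List<String> partition(int partition) {
        return partitions.get(partition);
    }

    public static void main(String[] args) {
        PartitionedTopic orders = new PartitionedTopic(3);
        orders.append(0, "order-1 CREATED");
        orders.append(1, "order-2 CREATED");
        orders.append(0, "order-1 PAID");
        System.out.println(orders.partition(0)); // [order-1 CREATED, order-1 PAID]
        System.out.println(orders.partition(1)); // [order-2 CREATED]
    }
}
```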
4. What is a consumer group?
A consumer group is a set of consumers working together to process messages from a topic.
Example:
Consumer Group A
├── Consumer 1
├── Consumer 2
└── Consumer 3
Kafka distributes partitions among consumers in the same group.
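Important rule:
One partition → at most one consumer within a group
A single consumer can own several partitions, and consumers beyond the partition count stay idle.
Benefits: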
Parallel processing
Scalability
Load balancing
Fault tolerance
Different consumer groups can independently consume the same topic.
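The sketch below shows a simplified round-robin assignment for one topic and one group, similar in spirit to Kafka's `RoundRobinAssignor` but not its actual implementation (Kafka also ships range and sticky assignors). `GroupAssignment` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified round-robin assignment of one topic's partitions to the members of a
// single consumer group. Each partition goes to exactly one consumer; consumers
// beyond the partition count receive no partitions.
class GroupAssignment {
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : consumers) {
            assignment.put(c, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(List.of("consumer-1", "consumer-2", "consumer-3"), 3));
        // {consumer-1=[0], consumer-2=[1], consumer-3=[2]}
        System.out.println(assign(List.of("c1", "c2", "c3", "c4"), 3));
        // {c1=[0], c2=[1], c3=[2], c4=[]}  (c4 is idle)
    }
}
```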
5. What is an offset?
An offset is the position of a message inside a partition.
Example:
Partition 0
Offset 0
Offset 1
Offset 2
Kafka uses offsets to track consumption progress.
Consumers commit offsets after processing messages; the committed offset is the position of the next message to read.
Offsets are extremely important for:
Retry handling
Recovery
Delivery guarantees
Replay functionality
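A hypothetical `OffsetTracker` sketch of committing and resuming. It follows Kafka's convention that the committed offset is the offset of the next record to read; in a real cluster, committed offsets are stored in the internal `__consumer_offsets` topic rather than an in-memory map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of offset commits for one partition, keyed by consumer group.
// The committed value is the NEXT offset to read, so after processing offset N
// the sketch commits N + 1.
class OffsetTracker {
    private final Map<String, Long> committed = new HashMap<>(); // groupId -> next offset

    long position(String groupId) {
        return committed.getOrDefault(groupId, 0L);
    }

    void commit(String groupId, long nextOffset) {
        committed.put(groupId, nextOffset);
    }

    // Processes up to maxRecords starting at the committed position, committing after each one.
    int poll(String groupId, List<String> partition, int maxRecords) {
        int processed = 0;
        long offset = position(groupId);
        while (offset < partition.size() && processed < maxRecords) {
            // business logic for partition.get((int) offset) would run here
            offset++;
            commit(groupId, offset);
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> partition0 = List.of("e0", "e1", "e2");
        OffsetTracker tracker = new OffsetTracker();
        tracker.poll("billing", partition0, 2);          // processes e0, e1
        System.out.println(tracker.position("billing")); // 2: a restarted consumer resumes at e2
        tracker.commit("billing", 0);                    // resetting the offset enables replay
        System.out.println(tracker.poll("billing", partition0, 10)); // 3
    }
}
```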
6. What is a broker?
A broker is a Kafka server.
Kafka clusters usually contain multiple brokers.
Example:
Broker 1
Broker 2
Broker 3
Brokers are responsible for:
Storing partitions
Serving producers
Serving consumers
Replication
Fault tolerance
Kafka distributes partitions across brokers.
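A simplified sketch of spreading partition replicas across brokers. It assumes broker ids 0 to 2 and a ring-style placement; Kafka's real assignment randomizes the starting broker and can be rack-aware. As in Kafka, the first broker in each replica list is treated as the preferred leader.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified replica placement: partition p gets its leader on broker (p % brokers)
// and its followers on the following brokers in ring order. This only illustrates
// the idea of spreading leaders and copies; it is not Kafka's exact algorithm.
class ReplicaPlacement {
    static List<List<Integer>> place(int partitions, int brokers, int replicationFactor) {
        if (replicationFactor > brokers) {
            throw new IllegalArgumentException("replication factor cannot exceed broker count");
        }
        List<List<Integer>> result = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                replicas.add((p + r) % brokers); // first entry acts as the leader
            }
            result.add(replicas);
        }
        return result;
    }

    public static void main(String[] args) {
        // 3 partitions, 3 brokers (ids 0..2), replication factor 2
        System.out.println(place(3, 3, 2)); // [[0, 1], [1, 2], [2, 0]]
    }
}
```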
7. What is a producer?
A producer is an application that sends messages to Kafka topics.
Example:
kafkaTemplate.send("orders", order);
Producer responsibilities:
Serialize messages
Choose partitions
Send events
Handle retries
Example use cases:
Order service publishing events
Payment system sending updates
Logging systems sending logs
8. What is a consumer?
A consumer reads messages from Kafka topics.
Example:
@KafkaListener(topics = "orders")
public void consume(OrderEvent event) {
    // business logic for each received event
}
Consumers process events asynchronously.
Responsibilities:
Read messages
Deserialize data
Process business logic
Commit offsets
9. Why is Kafka used?
Kafka is used because it handles massive real-time event streams efficiently.
Advantages:
| Feature | Benefit |
|---|---|
| High throughput | Millions of messages per second on a well-sized cluster |
| Scalability | Distributed partitions |
| Durability | Persistent storage |
| Fault tolerance | Replication |
| Replay support | Re-read old events |
| Decoupling | Independent services |
Kafka is heavily used in:
Microservices
Financial systems
E-commerce
Real-time analytics
Streaming systems
10. Difference between Kafka and RabbitMQ?
This is a very common interview question.
Kafka
Designed for:
Event streaming
High throughput
Distributed logs
Replayable events
Messages are persisted for a configurable retention period (by time or size), whether or not they have been consumed.
RabbitMQ
Designed for:
Traditional message queues
Complex routing
Short-lived messages
Messages are usually removed after consumption.
Main differences:
| Kafka | RabbitMQ |
|---|---|
| Distributed log | Traditional message broker |
| Pull-based | Push-based |
| Very high throughput | Lower throughput |
| Replay support | Limited replay |
| Persistent event stream | Queue processing |
| Better for streaming | Better for task queues |
11. What is event-driven architecture?
Event-driven architecture is a system design where services communicate through events.
Example:
Order Created Event
→ Payment Service
→ Inventory Service
→ Notification Service
Instead of direct synchronous calls such as:
Service A → Service B
services publish events asynchronously, and other services react to them.
Benefits:
Loose coupling
Scalability
Better resilience
Independent services
Kafka is one of the most popular platforms for event-driven architecture.
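To show the publish/subscribe decoupling, here is a toy in-process `EventBus` (hypothetical, not a Kafka API). For brevity it calls subscribers synchronously in memory; with Kafka, the event is persisted in a topic and each service, using its own consumer group, reads it independently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy in-process event bus used only to illustrate decoupling: the publisher
// knows the topic name, not which services are listening.
class EventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new LinkedHashMap<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    // Delivers the event to every subscriber of the topic (synchronously, for simplicity).
    void publish(String topic, String event) {
        for (Consumer<String> handler : subscribers.getOrDefault(topic, List.of())) {
            handler.accept(event);
        }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        List<String> log = new ArrayList<>();
        bus.subscribe("orders", e -> log.add("payment: " + e));
        bus.subscribe("orders", e -> log.add("inventory: " + e));
        bus.subscribe("orders", e -> log.add("notification: " + e));
        bus.publish("orders", "OrderCreated(1001)");
        log.forEach(System.out::println);
    }
}
```

Adding a new service means adding a subscriber; the publishing code does not change.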
12. What is at-least-once delivery?
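At-least-once delivery guarantees that messages are not lost: each message is processed at least once.
However:
Duplicates may occur
If a producer does not receive an acknowledgment, it resends the message; if a consumer fails before committing its offset, it reads the message again.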
This is the most common Kafka delivery mode.
13. What is at-most-once delivery?
At-most-once delivery guarantees that no message is processed more than once.
However:
Messages may be lost
If a consumer commits the offset before processing succeeds and then fails, that message is skipped and its data is lost.
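The difference between the two modes is only the order of "process" and "commit". This hypothetical `DeliverySemantics` simulation crashes the consumer between the two steps to show where loss and duplication come from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the two commit orderings for a single message.
// crashAfterFirstStep models the consumer dying between the two steps.
class DeliverySemantics {
    enum Mode { AT_MOST_ONCE, AT_LEAST_ONCE }

    final List<String> sideEffects = new ArrayList<>(); // e.g. rows written to a database
    long committedOffset = 0;                           // next offset to read

    void handle(Mode mode, List<String> partition, boolean crashAfterFirstStep) {
        String msg = partition.get((int) committedOffset);
        if (mode == Mode.AT_MOST_ONCE) {
            committedOffset++;               // 1. commit first
            if (crashAfterFirstStep) return; //    crash: never processed, so it is lost
            sideEffects.add(msg);            // 2. then process
        } else {
            sideEffects.add(msg);            // 1. process first
            if (crashAfterFirstStep) return; //    crash: offset not committed, so it is redelivered
            committedOffset++;               // 2. then commit
        }
    }

    public static void main(String[] args) {
        List<String> partition = List.of("payment-42");

        DeliverySemantics atMost = new DeliverySemantics();
        atMost.handle(Mode.AT_MOST_ONCE, partition, true);
        System.out.println(atMost.sideEffects + " committed=" + atMost.committedOffset);
        // [] committed=1

        DeliverySemantics atLeast = new DeliverySemantics();
        atLeast.handle(Mode.AT_LEAST_ONCE, partition, true);  // crash before commit
        atLeast.handle(Mode.AT_LEAST_ONCE, partition, false); // restart re-reads offset 0
        System.out.println(atLeast.sideEffects + " committed=" + atLeast.committedOffset);
        // [payment-42, payment-42] committed=1
    }
}
```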
14. What is exactly-once delivery?
Exactly-once delivery guarantees that the effect of each message is applied only once.
Kafka supports exactly-once semantics using:
Idempotent producers
Transactions
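Committing consumer offsets inside the same transaction
Exactly-once is more complex and has performance trade-offs. It covers read-process-write flows within Kafka; writes to external systems such as a database still need idempotent handling.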
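A conceptual model of the transactional read-process-write loop, assuming a hypothetical `TransactionalPipeline` class; it is not the Kafka transactions API. In real Kafka, a transactional producer sends the consumer offsets with `sendOffsetsToTransaction`, and records from aborted transactions stay in the log but are hidden from consumers that use `isolation.level=read_committed`.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual model: output records and the input offset become visible together,
// or not at all. Mirrors the idea behind Kafka transactions, not their API.
class TransactionalPipeline {
    final List<String> outputTopic = new ArrayList<>(); // what downstream readers see
    long committedInputOffset = 0;

    // State of the transaction currently in progress.
    private final List<String> pendingOutput = new ArrayList<>();
    private long pendingOffset;

    void process(List<String> inputPartition, boolean crashBeforeCommit) {
        // begin transaction
        pendingOutput.clear();
        pendingOffset = committedInputOffset;
        String in = inputPartition.get((int) pendingOffset);
        pendingOutput.add(in.toUpperCase()); // the "transform" step
        pendingOffset++;
        if (crashBeforeCommit) {
            abort();
            return;
        }
        // commit: output and offset are applied in one step
        outputTopic.addAll(pendingOutput);
        committedInputOffset = pendingOffset;
        pendingOutput.clear();
    }

    private void abort() {
        pendingOutput.clear(); // nothing reaches outputTopic and the offset is unchanged
    }

    public static void main(String[] args) {
        TransactionalPipeline pipeline = new TransactionalPipeline();
        List<String> input = List.of("order-1001");
        pipeline.process(input, true);  // crash: transaction aborted
        pipeline.process(input, false); // retry after restart
        System.out.println(pipeline.outputTopic + " offset=" + pipeline.committedInputOffset);
        // [ORDER-1001] offset=1
    }
}
```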
15. What is an idempotent consumer?
An idempotent consumer can safely process the same message multiple times without producing incorrect results.
Example:
Processing the same payment event twice should not charge the customer twice.
Idempotency is critical because duplicates can occur in distributed systems.
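A minimal sketch of an idempotent payment consumer that deduplicates by event ID. The class name and the in-memory set are assumptions for the example; in production, the record of processed IDs usually lives in the same database transaction as the business update, for example behind a unique constraint.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of an idempotent consumer: it remembers which event IDs it has applied
// and ignores repeats. The in-memory set is an assumption to keep this self-contained.
class IdempotentPaymentConsumer {
    private final Set<String> processedEventIds = new HashSet<>();
    private long totalCharged = 0;

    // Returns true if the event was applied, false if it was a duplicate.
    boolean onPaymentEvent(String eventId, long amountCents) {
        if (!processedEventIds.add(eventId)) {
            return false; // already processed: do not charge again
        }
        totalCharged += amountCents;
        return true;
    }

    long totalCharged() {
        return totalCharged;
    }

    public static void main(String[] args) {
        IdempotentPaymentConsumer consumer = new IdempotentPaymentConsumer();
        consumer.onPaymentEvent("pay-1001", 5_000);
        consumer.onPaymentEvent("pay-1001", 5_000); // redelivered duplicate
        System.out.println(consumer.totalCharged()); // 5000
    }
}
```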
16. Why should consumers be idempotent?
Because duplicate delivery is possible.
Example failure scenario:
Consumer processes message
Database update succeeds
Consumer crashes before offset commit
Kafka redelivers message
Without idempotency, duplicate business operations may occur.
Examples:
Double payments
Duplicate emails
Incorrect inventory updates
17. What happens if a consumer fails after processing but before committing the offset?
Kafka treats the message as unprocessed, because the committed offset has not moved past it.
Result:
The message is delivered again and reprocessed.
This is why duplicates may happen in at-least-once delivery systems.
Consumers should therefore be idempotent.
18. What is a dead letter topic?
A dead letter topic (DLT) stores messages that repeatedly fail processing.
Example:
orders-dlt
Used for:
Failed messages
Debugging
Manual investigation
Preventing endless retries
Dead letter topics are very important in production systems.
19. What is a retry topic?
Retry topics temporarily store failed messages before retrying later.
Example flow:
Main Topic
→ Retry Topic
→ Dead Letter Topic
Useful for transient failures:
Temporary database outage
Network issues
External API failure
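A hypothetical sketch of the main → retry → DLT flow with a bounded number of attempts. The `-retry` and `-dlt` suffixes are illustrative, and real setups usually wait between attempts; Spring Kafka's `@RetryableTopic` annotation automates this pattern with its own topic naming.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical router: a message starts on the main topic, moves to a retry topic
// after each failed attempt, and ends in the DLT once attempts are exhausted.
class RetryRouter {
    private final String topic;
    private final int maxAttempts;

    RetryRouter(String topic, int maxAttempts) {
        this.topic = topic;
        this.maxAttempts = maxAttempts;
    }

    // Returns the topics the message passes through. attemptSucceeds receives the
    // 1-based attempt number and simulates processing (e.g. a flaky external API).
    List<String> route(IntPredicate attemptSucceeds) {
        List<String> trail = new ArrayList<>();
        trail.add(topic);
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (attemptSucceeds.test(attempt)) {
                return trail;                // processed successfully
            }
            if (attempt < maxAttempts) {
                trail.add(topic + "-retry"); // park it and try again later
            }
        }
        trail.add(topic + "-dlt");           // retries exhausted: stop retrying
        return trail;
    }

    public static void main(String[] args) {
        RetryRouter router = new RetryRouter("orders", 3);
        // Transient failure: the database is back on the second attempt.
        System.out.println(router.route(attempt -> attempt >= 2)); // [orders, orders-retry]
        // Permanent failure: every attempt fails.
        System.out.println(router.route(attempt -> false));
        // [orders, orders-retry, orders-retry, orders-dlt]
    }
}
```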
20. How do you handle poison messages?
Poison messages are messages that always fail processing.
Handling strategies:
| Strategy | Explanation |
|---|---|
| Dead Letter Topic | Move failed messages aside |
| Retry limit | Prevent infinite retries |
| Validation | Reject invalid data early |
| Monitoring | Alert on failures |
Without proper handling, a poison message can block its partition: the consumer keeps retrying it and never reaches the messages behind it.
21. What is consumer lag?
Consumer lag is the difference between the latest offset in a partition and the offset the consumer group has committed:
Lag = Latest offset - Committed consumer offset
High lag means consumers cannot keep up with message production.
Lag monitoring is critical in production Kafka systems.
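Lag is usually tracked per partition and summed per group. A small sketch with illustrative offsets; the `kafka-consumer-groups.sh --describe` tool reports the same per-partition figures in its CURRENT-OFFSET, LOG-END-OFFSET and LAG columns.

```java
import java.util.Map;

// Lag per partition = log-end offset - committed offset; group lag is the sum.
class ConsumerLag {
    static long partitionLag(long logEndOffset, long committedOffset) {
        return Math.max(0L, logEndOffset - committedOffset);
    }

    static long totalLag(Map<Integer, Long> logEndOffsets, Map<Integer, Long> committedOffsets) {
        long total = 0;
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            // Assumption: a partition with no committed offset is treated as starting at 0.
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            total += partitionLag(e.getValue(), committed);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Long> logEnd = Map.of(0, 1500L, 1, 1200L, 2, 900L);
        Map<Integer, Long> committed = Map.of(0, 1450L, 1, 1200L, 2, 600L);
        System.out.println(totalLag(logEnd, committed)); // 50 + 0 + 300 = 350
    }
}
```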
22. How do you monitor Kafka?
Common monitoring metrics:
Consumer lag
Broker health
Throughput
Partition distribution
Error rates
Disk usage
Common tools:
Prometheus
Grafana
Kafka Exporter
Confluent Control Center
Monitoring is extremely important in distributed event systems.
23. What is a message key?
A message key helps Kafka determine partition assignment.
Example:
kafkaTemplate.send("orders", orderId, order);
With the default partitioner, the same key maps to the same partition.
Benefits:
Ordering for messages with the same key
Related message grouping
24. How does Kafka choose partition?
Kafka partition selection depends on:
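| Situation | Behavior |
|---|---|
| Key exists | Hash of the key (murmur2 in the Java client) modulo the partition count determines the partition |
| No key | Spread across partitions: round-robin in older clients, sticky partitioning (fill a batch for one partition, then switch) since Kafka 2.4 |
With the default partitioner, the same key always maps to the same partition unless the partition count changes.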
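A sketch of key-based partition selection. It uses `String.hashCode()` as a stand-in for the murmur2 hash of the serialized key that Kafka's Java client uses, so the numbers will not match a real cluster; the structure (non-negative hash modulo partition count) is the point.

```java
// Keyed partition selection: non-negative hash of the key, modulo the partition count.
// String.hashCode() is a stand-in for Kafka's murmur2, used only for illustration.
class KeyPartitioner {
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so negative hash codes still map to a valid partition.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int p1 = partitionFor("order-1001", 6);
        int p2 = partitionFor("order-1001", 6);
        System.out.println(p1 == p2); // true: same key, same partition
        // Changing the partition count changes the modulus, so existing keys may move.
        System.out.println(partitionFor("order-1001", 6) + " vs " + partitionFor("order-1001", 8));
    }
}
```

This is why adding partitions to a keyed topic can break per-key ordering for events produced around the change.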
25. How do you guarantee ordering in Kafka?
Kafka guarantees ordering only inside a partition.
To maintain ordering:
Use the same message key for related events.
Example:
order-1001
All events for the same order go to the same partition.
26. Can Kafka guarantee global ordering?
No.
Kafka cannot guarantee ordering across multiple partitions.
Only partition-level ordering exists.
Global ordering is only possible with a single-partition topic, which severely limits scalability.
This is a very important interview point.
27. What is a schema registry?
Schema Registry manages message schemas centrally.
Commonly used with:
Avro
Protobuf
Benefits:
Schema validation
Version compatibility
Producer-consumer consistency
Without schema management, evolving message structures becomes risky.
28. What is Avro?
Avro is a binary serialization format commonly used with Kafka.
Advantages:
Compact size
Fast serialization
Strong schema support
Version compatibility
Avro works very well with Schema Registry.
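Part of Avro's compactness comes from how it writes numbers: `int` and `long` values use zig-zag encoding followed by a variable-length encoding, and field names are not written at all because the schema carries them. The sketch below implements only that number encoding (`AvroLongEncoding` is a hypothetical name); use the Apache Avro library for real serialization.

```java
import java.io.ByteArrayOutputStream;

// Zig-zag + variable-length encoding of a long, as described in the Avro spec.
// Only the number encoding is shown, not a full Avro serializer.
class AvroLongEncoding {
    static byte[] encodeLong(long value) {
        // Zig-zag maps small magnitudes (positive or negative) to small unsigned numbers.
        long n = (value << 1) ^ (value >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80)); // 7 payload bits + continuation bit
            n >>>= 7;
        }
        out.write((int) n); // last byte has no continuation bit
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // orderId 1001 takes 2 bytes here, versus 4 characters for the JSON text "1001"
        System.out.println(encodeLong(1001).length); // 2
    }
}
```

For example, 1001 is written as the two bytes 0xD2 0x0F.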
29. What is JSON serialization?
JSON serialization converts objects into JSON text format.
Example:
{
  "id": 1,
  "name": "John"
}
Advantages:
Human-readable
Easy debugging
Simple integration
Disadvantages:
Larger payload
Slower than binary formats like Avro
30. How do you integrate Kafka with Spring Boot?
Integration usually uses the spring-kafka dependency.
Producer example:
@Autowired
private KafkaTemplate<String, Object> kafkaTemplate;
public void publishOrder(OrderEvent order) {
    kafkaTemplate.send("orders", order);
}
Consumer example:
@KafkaListener(topics = "orders")
public void consume(OrderEvent event) {
    // handle the event
}
From application configuration, Spring Boot auto-configures:
KafkaTemplate
ConsumerFactory
ProducerFactory
Listener containers
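Example:
spring.kafka.bootstrap-servers=localhost:9092
spring.kafka.consumer.group-id=order-service
Here order-service is an example group id; a @KafkaListener needs a group id, either from this property or from the annotation's groupId attribute. Sending objects such as order also requires a value serializer other than the default string serializer, for example Spring Kafka's JsonSerializer set through spring.kafka.producer.value-serializer.
Spring Kafka greatly simplifies Kafka integration in enterprise applications.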
