8. Kafka Questions
Tuan Nguyen

This section focuses on Kafka fundamentals, including topics, partitions, consumer groups, message delivery guarantees, event-driven architecture, serialization, failure handling, and how Kafka integrates with Spring Boot in distributed systems.

1. What is Kafka?

Apache Kafka is a distributed event streaming platform used for high-throughput, fault-tolerant, real-time data processing.

Kafka is commonly used for:

  • Event-driven systems

  • Messaging

  • Log aggregation

  • Real-time analytics

  • Microservice communication

  • Data streaming pipelines

Kafka was originally developed by LinkedIn and later became an Apache project under the Apache Software Foundation.

Kafka stores streams of records called events or messages.

Example event:

{
  "orderId": 1001,
  "status": "CREATED"
}

Kafka is designed for:

  • Horizontal scalability

  • Distributed systems

  • High durability

  • Massive throughput

Unlike traditional queues, which usually delete a message once it has been acknowledged, Kafka keeps messages on disk for a configurable retention period, so consumers can replay events.


2. What is a topic?

A topic is a logical category or stream of messages in Kafka.

Example topics:

orders
payments
notifications
user-events

Producers send messages to topics.

Consumers read messages from topics.

Example:

Producer → orders topic → Consumer

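Topics are the basic unit of organization in Kafka: producers write to a topic, consumers subscribe to it, and settings such as partition count and retention are configured per topic.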


3. What is a partition?

A partition is a subdivision of a topic.

Kafka topics are split into partitions for:

  • Scalability

  • Parallel processing

  • Distributed storage

Example:

orders topic
 ├── Partition 0
 ├── Partition 1
 └── Partition 2

Each partition is an ordered, append-only sequence of messages.

Kafka guarantees ordering only inside a partition.

Partitions allow Kafka to scale horizontally across multiple brokers.


4. What is a consumer group?

A consumer group is a set of consumers working together to process messages from a topic.

Example:

Consumer Group A
 ├── Consumer 1
 ├── Consumer 2
 └── Consumer 3

Kafka distributes partitions among consumers in the same group.

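Important rule:

One partition → One consumer within a group

Each partition is assigned to exactly one consumer in the group at a time, while one consumer may own several partitions. Consumers beyond the number of partitions sit idle.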

Benefits:

  • Parallel processing

  • Scalability

  • Load balancing

  • Fault tolerance

Different consumer groups can independently consume the same topic.
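A minimal, stdlib-only Java sketch of the one-partition-one-consumer rule. The class and method names are hypothetical, and it uses a simplified round-robin assignment (partition p goes to consumer p mod n); Kafka's real assignors are pluggable and may produce a different mapping, but each of them gives every partition a single owner within the group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupAssignmentSketch {

    // Simplified round-robin assignment: partition p goes to consumer (p % consumerCount).
    // Every partition gets exactly one owner; a consumer may own several partitions,
    // and consumers beyond the partition count receive nothing (they sit idle).
    static Map<String, List<Integer>> assign(int partitionCount, List<String> consumers) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 3 partitions, 2 consumers: one consumer owns two partitions.
        System.out.println(assign(3, List.of("consumer-1", "consumer-2")));
        // prints {consumer-1=[0, 2], consumer-2=[1]}

        // 3 partitions, 4 consumers: consumer-4 gets no partition.
        System.out.println(assign(3, List.of("consumer-1", "consumer-2", "consumer-3", "consumer-4")));
        // prints {consumer-1=[0], consumer-2=[1], consumer-3=[2], consumer-4=[]}
    }
}
```

With four consumers and three partitions the fourth consumer stays idle, which is why adding consumers beyond the partition count does not add parallelism.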


5. What is an offset?

An offset is the position of a message inside a partition.

Example:

Partition 0
Offset 0
Offset 1
Offset 2

Kafka uses offsets to track consumption progress.

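Consumers commit offsets to record their progress. The committed offset is the offset of the next message to read, and Kafka stores it in the internal __consumer_offsets topic.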

Offsets are extremely important for:

  • Retry handling

  • Recovery

  • Delivery guarantees

  • Replay functionality
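The sketch below models one partition as an append-only list to show how a committed offset lets a consumer resume and replay. It is a hypothetical toy, not the Kafka client API: it folds the consumer's position and its committed offset into a single number, which real Kafka tracks separately, and seek() here only imitates what KafkaConsumer.seek does to the position.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetSketch {

    // One partition as an append-only list: a message's offset is its index.
    private final List<String> partition = new ArrayList<>();
    // Offset of the NEXT message to read (Kafka commits "next offset", not "last read").
    private int committed = 0;

    int append(String message) {
        partition.add(message);
        return partition.size() - 1; // offset assigned to the new message
    }

    // Read everything from the committed offset onward, then commit the new position.
    // Messages are not removed: they stay in the partition for other readers and replays.
    List<String> pollAndCommit() {
        List<String> batch = new ArrayList<>(partition.subList(committed, partition.size()));
        committed = partition.size();
        return batch;
    }

    // Move back (or forward) to re-read from a given offset.
    void seek(int offset) {
        committed = offset;
    }

    int committedOffset() {
        return committed;
    }

    public static void main(String[] args) {
        OffsetSketch p0 = new OffsetSketch();
        p0.append("order-created"); // offset 0
        p0.append("order-paid");    // offset 1
        System.out.println(p0.pollAndCommit()); // [order-created, order-paid]
        p0.append("order-shipped"); // offset 2
        System.out.println(p0.pollAndCommit()); // [order-shipped], resumed from offset 2
        p0.seek(0);
        System.out.println(p0.pollAndCommit()); // [order-created, order-paid, order-shipped]
    }
}
```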


6. What is a broker?

A broker is a Kafka server.

Kafka clusters usually contain multiple brokers.

Example:

Broker 1
Broker 2
Broker 3

Brokers are responsible for:

  • Storing partitions

  • Serving producers

  • Serving consumers

  • Replication

  • Fault tolerance

Kafka distributes partitions across brokers.


7. What is a producer?

A producer is an application that sends messages to Kafka topics.

Example:

kafkaTemplate.send("orders", order);

Producer responsibilities:

  • Serialize messages

  • Choose partitions

  • Send events

  • Handle retries

Example use cases:

  • Order service publishing events

  • Payment system sending updates

  • Logging systems sending logs


8. What is a consumer?

A consumer reads messages from Kafka topics.

Example:

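@KafkaListener(topics = "orders", groupId = "order-service") // groupId value is illustrative
public void consume(OrderEvent event) {
    // business logic for each OrderEvent goes here
}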

Consumers process events asynchronously.

Responsibilities:

  • Read messages

  • Deserialize data

  • Process business logic

  • Commit offsets


9. Why is Kafka used?

Kafka is used because it handles massive real-time event streams efficiently.

Advantages:

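Feature          | Benefit
---------------- | -------------------------------------------------------
High throughput  | Millions of messages per second on a well-sized cluster
Scalability      | Partitions spread load across brokers and consumers
Durability       | Messages are persisted to disk
Fault tolerance  | Partitions are replicated across brokers
Replay support   | Consumers can re-read old events
Decoupling       | Producers and consumers evolve independently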

Kafka is heavily used in:

  • Microservices

  • Financial systems

  • E-commerce

  • Real-time analytics

  • Streaming systems


10. Difference between Kafka and RabbitMQ?

This is a very common interview question.


Kafka

Designed for:

  • Event streaming

  • High throughput

  • Distributed logs

  • Replayable events

Messages are retained for a configurable period (by time or size), whether or not they have been consumed.


RabbitMQ

Designed for:

  • Traditional message queues

  • Complex routing

  • Short-lived messages

Messages are usually removed once a consumer acknowledges them.


Main differences:

Kafka                    | RabbitMQ
------------------------ | --------------------------
Distributed log          | Traditional message broker
Pull-based               | Push-based
Very high throughput     | Lower throughput
Replay support           | Limited replay
Persistent event stream  | Queue processing
Better for streaming     | Better for task queues
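A stdlib-only Java sketch of the core behavioral difference, under simplifying assumptions: SimpleQueue stands in for a classic queue that deletes a message once it is consumed, and SimpleLog stands in for a Kafka partition that retains messages and keeps a separate offset per consumer group. Both classes are hypothetical toy models, not client APIs for either system.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class QueueVsLogSketch {

    // Queue-style: consuming a message removes it, so only one consumer ever sees it.
    static class SimpleQueue {
        private final Queue<String> messages = new ArrayDeque<>();
        void publish(String m) { messages.add(m); }
        String consume() { return messages.poll(); } // null once the queue is empty
    }

    // Log-style: messages are retained; each consumer group tracks its own offset.
    static class SimpleLog {
        private final List<String> messages = new ArrayList<>();
        private final Map<String, Integer> groupOffsets = new HashMap<>();
        void publish(String m) { messages.add(m); }
        List<String> poll(String groupId) {
            int from = groupOffsets.getOrDefault(groupId, 0);
            groupOffsets.put(groupId, messages.size());
            return new ArrayList<>(messages.subList(from, messages.size()));
        }
    }

    public static void main(String[] args) {
        SimpleQueue queue = new SimpleQueue();
        queue.publish("order-1001");
        System.out.println(queue.consume()); // order-1001
        System.out.println(queue.consume()); // null: the message is gone

        SimpleLog log = new SimpleLog();
        log.publish("order-1001");
        System.out.println(log.poll("payment-service"));   // [order-1001]
        System.out.println(log.poll("inventory-service")); // [order-1001]: independent group
        System.out.println(log.poll("payment-service"));   // []: already consumed by this group
    }
}
```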


11. What is event-driven architecture?

Event-driven architecture is a system design where services communicate through events.

Example:

Order Created Event
→ Payment Service
→ Inventory Service
→ Notification Service

Instead of direct synchronous calls:

Service A → Service B

services publish events asynchronously.

Benefits:

  • Loose coupling

  • Scalability

  • Better resilience

  • Independent services

Kafka is one of the most popular platforms for event-driven architecture.


12. What is at-least-once delivery?

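At-least-once delivery guarantees that no message is lost: every message is processed one or more times.

However:

Duplicates may occur

If the producer does not receive an acknowledgment, it resends the message; if a consumer crashes after processing a message but before committing its offset, it reads the message again. Either way, the same message can be processed twice.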

This is the most common Kafka delivery mode.


13. What is at-most-once delivery?

At-most-once delivery guarantees that no message is processed more than once.

However:

Messages may be lost

If the consumer commits the offset before processing and then crashes, it resumes after that offset on restart, so the message is never processed.
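Both delivery modes come down to the order of two steps, process and commit. This hypothetical Java simulation consumes the same three-message partition with a single crash at offset 1, once committing before processing and once committing after; it is a toy model of the failure scenario, not Kafka client code.

```java
import java.util.ArrayList;
import java.util.List;

class CommitOrderSketch {

    // One partition with three messages; a message's offset is its list index.
    static final List<String> PARTITION = List.of("m0", "m1", "m2");

    // Consumes the partition, crashing once while handling offset crashAt, then
    // "restarting" from the committed offset. Returns every message that was processed.
    // commitFirst = true  -> commit, then process (at-most-once)
    // commitFirst = false -> process, then commit (at-least-once)
    static List<String> run(boolean commitFirst, int crashAt) {
        List<String> processed = new ArrayList<>();
        int committed = 0;
        boolean crashed = false;
        while (committed < PARTITION.size()) {
            int offset = committed; // after a restart, resume from the committed offset
            boolean crashNow = offset == crashAt && !crashed;
            if (commitFirst) {
                committed = offset + 1;
                if (crashNow) { crashed = true; continue; } // crash before processing: message lost
                processed.add(PARTITION.get(offset));
            } else {
                processed.add(PARTITION.get(offset));
                if (crashNow) { crashed = true; continue; } // crash before commit: message redelivered
                committed = offset + 1;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(true, 1));  // [m0, m2]
        System.out.println(run(false, 1)); // [m0, m1, m1, m2]
    }
}
```

Committing first loses m1; processing first handles m1 twice, which is exactly the duplicate that idempotent consumers must absorb.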


14. What is exactly-once delivery?

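Exactly-once semantics guarantee that the effect of each message is applied only once, even when failures and retries occur.

Kafka supports exactly-once semantics using:

  • Idempotent producers (enable.idempotence=true), so producer retries do not write duplicates

  • Transactions (a producer transactional.id), so writes to several partitions commit or abort together

  • Offset coordination: consumer offsets are committed inside the same transaction (sendOffsetsToTransaction), so output and progress are saved atomically

Downstream consumers must read with isolation.level=read_committed so they skip aborted writes.

The guarantee covers read-process-write pipelines that stay inside Kafka. Side effects in external systems, such as database writes or emails, are outside the transaction, so those consumers still need to be idempotent.

Exactly-once is more complex and has performance trade-offs.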


15. What is an idempotent consumer?

An idempotent consumer can safely process the same message multiple times without producing incorrect results.

Example:

Processing the same payment event twice should not charge the customer twice.

Idempotency is critical because duplicates can occur in distributed systems.
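One common way to make a consumer idempotent is to record the ID of every event it has applied and skip IDs it has already seen. The sketch below assumes each event carries a unique ID (the eventId parameter and class name are hypothetical) and keeps the IDs in memory only to stay self-contained; a real consumer would store them durably, ideally in the same database transaction as the business update, so the check and the side effect cannot drift apart.

```java
import java.util.HashSet;
import java.util.Set;

class IdempotentPaymentConsumer {

    // IDs of events already applied. In production this would be durable storage
    // (for example a table with a unique constraint on the event ID), not memory.
    private final Set<String> processedEventIds = new HashSet<>();
    private long totalChargedCents = 0;

    // Returns true if the event was applied, false if it was a duplicate and skipped.
    boolean handle(String eventId, long amountCents) {
        if (!processedEventIds.add(eventId)) {
            return false; // already seen: do not charge again
        }
        totalChargedCents += amountCents;
        return true;
    }

    long totalChargedCents() {
        return totalChargedCents;
    }

    public static void main(String[] args) {
        IdempotentPaymentConsumer consumer = new IdempotentPaymentConsumer();
        consumer.handle("payment-42", 5_000);
        consumer.handle("payment-42", 5_000); // redelivered duplicate
        System.out.println(consumer.totalChargedCents()); // 5000
    }
}
```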


16. Why should consumers be idempotent?

Because duplicate delivery is possible.

Example failure scenario:

  1. Consumer processes message

  2. Database update succeeds

  3. Consumer crashes before offset commit

  4. Kafka redelivers message

Without idempotency, duplicate business operations may occur, for example:

  • Double payments

  • Duplicate emails

  • Incorrect inventory updates


17. What happens if a consumer fails after processing but before committing the offset?

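Kafka has no record that the message was processed. The group's committed offset still points at that message, so when the consumer restarts (or its partition is reassigned after a rebalance), consumption resumes from the committed offset.

Result:

The message is reprocessed.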

This is why duplicates may happen in at-least-once delivery systems.

Consumers should therefore be idempotent.


18. What is a dead letter topic?

Dead Letter Topic (DLT) stores messages that repeatedly fail processing.

Example:

orders-dlt

Used for:

  • Failed messages

  • Debugging

  • Manual investigation

  • Preventing endless retries

Very important in production systems.


19. What is a retry topic?

Retry topics hold failed messages so they can be retried later, after a delay, without blocking new messages on the main topic.

Example flow:

Main Topic
→ Retry Topic
→ Dead Letter Topic

Useful for transient failures:

  • Temporary database outage

  • Network issues

  • External API failure
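The flow above can be simulated in a few lines of Java. This is a hypothetical sketch: the topic names are assumptions (orders-dlt matches the example in the dead letter topic question), the handler is any Predicate that returns true on success, and real retry topics also add a back-off delay and a separate consumer per retry topic, which the loop leaves out. In Spring Kafka, the @RetryableTopic annotation implements this pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class RetryRoutingSketch {

    // Hypothetical topic names.
    static final String MAIN_TOPIC = "orders";
    static final String RETRY_TOPIC = "orders-retry";
    static final String DEAD_LETTER_TOPIC = "orders-dlt";

    // Tries the handler once from the main topic, then up to maxRetries more times
    // from the retry topic. Returns the topics the message passed through.
    static List<String> route(String message, Predicate<String> handler, int maxRetries) {
        List<String> path = new ArrayList<>();
        path.add(MAIN_TOPIC);
        if (handler.test(message)) return path;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            path.add(RETRY_TOPIC); // real systems wait a back-off delay here
            if (handler.test(message)) return path;
        }
        path.add(DEAD_LETTER_TOPIC); // retries exhausted: park for manual investigation
        return path;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Predicate<String> flaky = m -> ++calls[0] > 2; // fails twice, then succeeds
        System.out.println(route("order-1001", flaky, 3));
        // [orders, orders-retry, orders-retry]
        System.out.println(route("bad-payload", m -> false, 3));
        // [orders, orders-retry, orders-retry, orders-retry, orders-dlt]
    }
}
```

The flaky handler models a transient failure and recovers on the retry topic; the handler that always fails is a poison message and ends up in the dead letter topic once the retry limit is reached.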


20. How do you handle poison messages?

Poison messages are messages that always fail processing.

Handling strategies:

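Strategy           | Explanation
------------------ | -----------------------------------------
Dead Letter Topic  | Move failed messages out of the main flow
Retry limit        | Prevent infinite retries
Validation         | Reject invalid data early
Monitoring         | Alert on failures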

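Without proper handling, a poison message can block its partition: a consumer that keeps retrying it cannot move past its offset, so every later message in that partition waits behind it.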


21. What is consumer lag?

Consumer lag is measured per partition as the difference between:

Log-end offset - Committed consumer offset

Lag that keeps growing means consumers cannot keep up with the rate of message production.

Lag monitoring is critical in production Kafka systems.
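Lag is simple arithmetic once the two offsets are known. In this hypothetical sketch the offsets are made-up example numbers passed in as maps; in practice they come from the cluster, for example from the CURRENT-OFFSET and LOG-END-OFFSET columns printed by kafka-consumer-groups.sh --bootstrap-server <broker> --describe --group <group>.

```java
import java.util.Map;
import java.util.TreeMap;

class ConsumerLagSketch {

    // Lag per partition = log-end offset - committed offset of the consumer group.
    // Simplification: a partition with no committed offset is treated as committed at 0.
    static Map<Integer, Long> lagPerPartition(Map<Integer, Long> endOffsets, Map<Integer, Long> committed) {
        Map<Integer, Long> lag = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            long committedOffset = committed.getOrDefault(e.getKey(), 0L);
            lag.put(e.getKey(), e.getValue() - committedOffset);
        }
        return lag;
    }

    // Group lag is the sum over all partitions.
    static long totalLag(Map<Integer, Long> lag) {
        return lag.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        Map<Integer, Long> endOffsets = Map.of(0, 1_500L, 1, 1_200L, 2, 900L);
        Map<Integer, Long> committed = Map.of(0, 1_450L, 1, 1_200L, 2, 300L);
        Map<Integer, Long> lag = lagPerPartition(endOffsets, committed);
        System.out.println(lag);           // {0=50, 1=0, 2=600}
        System.out.println(totalLag(lag)); // 650
    }
}
```

A per-partition view matters because a single slow or stuck partition (partition 2 here) can hide behind a modest total.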


22. How do you monitor Kafka?

Common monitoring metrics:

  • Consumer lag

  • Broker health

  • Throughput

  • Partition distribution

  • Error rates

  • Disk usage

Common tools:

  • Prometheus

  • Grafana

  • Kafka Exporter

  • Confluent Control Center

Monitoring is extremely important in distributed event systems.


23. What is a message key?

The producer's partitioner uses the message key to choose the partition a record is written to.

Example:

kafkaTemplate.send("orders", orderId, order);

Records with the same key go to the same partition, as long as the partition count does not change.

Benefits:

  • Ordering guarantee for events with the same key

  • Related message grouping


24. How does Kafka choose partition?

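The producer's default partitioner chooses the partition as follows:

Situation                 | Behavior
------------------------- | --------------------------------------------------------------
Partition set explicitly  | The record goes to that partition
Key exists                | murmur2 hash of the serialized key, modulo the partition count
No key                    | Sticky partitioning since Kafka 2.4: fill a batch for one partition, then switch; older clients used round-robin

The same key always maps to the same partition unless the partition count changes.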
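A simplified Java sketch of key-based partitioning. It substitutes String.hashCode for Kafka's murmur2 hash, so the partition numbers it produces will not match a real cluster; what carries over is the property that matters: a given key always maps to the same partition for a fixed partition count, and events with that key keep their relative order inside it. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class KeyPartitionerSketch {

    // Simplified: non-negative hash of the key modulo the partition count.
    // (Kafka hashes the serialized key bytes with murmur2 instead of String.hashCode.)
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Appends each (key, event) pair to its partition, preserving send order per partition.
    static Map<Integer, List<String>> distribute(List<Map.Entry<String, String>> events, int numPartitions) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (Map.Entry<String, String> event : events) {
            int p = partitionFor(event.getKey(), numPartitions);
            partitions.computeIfAbsent(p, k -> new ArrayList<>()).add(event.getKey() + ":" + event.getValue());
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> events = List.of(
                Map.entry("order-1001", "CREATED"),
                Map.entry("order-2002", "CREATED"),
                Map.entry("order-1001", "PAID"),
                Map.entry("order-1001", "SHIPPED"));
        // All order-1001 events land in one partition, in the order they were sent.
        distribute(events, 3).forEach((p, msgs) -> System.out.println("Partition " + p + ": " + msgs));
    }
}
```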


25. How do you guarantee ordering in Kafka?

Kafka guarantees ordering only inside a partition.

To maintain ordering:

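Use the same message key for all related events.

Example:

order-1001

All events for the same order go to the same partition, so they are consumed in the order they were written.

Producer retries must not reorder them either: with enable.idempotence=true (the default since Kafka 3.0), the producer preserves order with up to 5 in-flight requests per connection.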


26. Can Kafka guarantee global ordering?

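No.

Kafka cannot guarantee ordering across multiple partitions; only partition-level ordering exists.

The only way to get a total order is a single-partition topic, which limits each consumer group to one active consumer.

Global ordering would therefore severely reduce scalability.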

This is a very important interview point.


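27. What is a schema registry?

A schema registry (for example, Confluent Schema Registry) manages message schemas centrally. Producers register the schema they write with, each message carries a small schema ID instead of the full schema, and consumers look up the schema by that ID to deserialize.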

Commonly used with:

  • Avro

  • Protobuf

Benefits:

  • Schema validation

  • Version compatibility

  • Producer-consumer consistency

Without schema management, evolving message structures becomes risky.
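To make version compatibility concrete, here is a deliberately simplified model of a backward-compatibility check, where a consumer using the new schema must be able to read data written with the old one. The hypothetical sketch represents a schema as a map from field name to whether the field declares a default, and checks only one Avro rule: a field added in the new schema needs a default. It ignores type changes and the other rules a real schema registry enforces; the currency field is an invented example.

```java
import java.util.Map;

class BackwardCompatibilitySketch {

    // Schema model: field name -> whether the field declares a default value.
    // Simplified BACKWARD rule: every field that exists only in the new schema must have a default,
    // because old records carry no value for it. Fields removed in the new schema are fine:
    // the reader simply ignores them.
    static boolean isBackwardCompatible(Map<String, Boolean> oldSchema, Map<String, Boolean> newSchema) {
        for (Map.Entry<String, Boolean> field : newSchema.entrySet()) {
            boolean addedField = !oldSchema.containsKey(field.getKey());
            if (addedField && !field.getValue()) {
                return false; // no value in old data and no default to fall back on
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Boolean> v1 = Map.of("orderId", false, "status", false);
        Map<String, Boolean> v2WithDefault = Map.of("orderId", false, "status", false, "currency", true);
        Map<String, Boolean> v2NoDefault = Map.of("orderId", false, "status", false, "currency", false);
        System.out.println(isBackwardCompatible(v1, v2WithDefault)); // true
        System.out.println(isBackwardCompatible(v1, v2NoDefault));   // false
    }
}
```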


28. What is Avro?

Avro is a binary serialization format commonly used with Kafka.

Advantages:

  • Compact size

  • Fast serialization

  • Strong schema support

  • Version compatibility

Avro works very well with Schema Registry.


29. What is JSON serialization?

JSON serialization converts objects into JSON text format.

Example:

{
  "id": 1,
  "name": "John"
}

Advantages:

  • Human-readable

  • Easy debugging

  • Simple integration

Disadvantages:

  • Larger payload

  • Usually slower to serialize and parse than binary formats like Avro
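A rough illustration of the payload-size difference, using the example record above. The binary side is a hand-rolled fixed layout written with DataOutputStream, not Avro's wire format; it only shows why a schema-based binary encoding is smaller: field names are not repeated in every message, because the reader already knows the layout. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class PayloadSizeSketch {

    // JSON text: field names travel with every message.
    // No escaping: assumes name contains no quotes or control characters.
    static byte[] toJson(int id, String name) {
        String json = "{\"id\":" + id + ",\"name\":\"" + name + "\"}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Fixed-layout binary: only values are written; the field order acts as the schema.
    static byte[] toBinary(int id, String name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);   // 4 bytes
            out.writeUTF(name); // 2-byte length prefix + encoded characters
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(toJson(1, "John").length);   // 22
        System.out.println(toBinary(1, "John").length); // 10
    }
}
```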


30. How do you integrate Kafka with Spring Boot?

Usually with the spring-kafka dependency.

Producer example:

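@Autowired
private KafkaTemplate<String, Object> kafkaTemplate;

public void publishOrder(OrderEvent order) {
    kafkaTemplate.send("orders", order); // asynchronous; returns a future with the send result
}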

Consumer example:

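@KafkaListener(topics = "orders", groupId = "order-service") // groupId value is illustrative
public void consume(OrderEvent event) {
    // business logic for each OrderEvent goes here
}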

Spring Boot auto-configures:

  • KafkaTemplate

  • ConsumerFactory

  • ProducerFactory

  • Listener containers

using application configuration.

Example:

spring.kafka.bootstrap-servers=localhost:9092

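Sending objects such as OrderEvent also requires a matching value serializer and deserializer (spring.kafka.producer.value-serializer and spring.kafka.consumer.value-deserializer, for example Spring Kafka's JsonSerializer and JsonDeserializer), because Spring Boot defaults to StringSerializer and StringDeserializer.

Spring Kafka greatly simplifies Kafka integration in enterprise applications.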
