
Kafka Interview Questions

Created on: Dec 4, 2024

  1. Tell me about some of the use cases where Kafka is not suitable.

    • Kafka is designed to handle large volumes of data. If only a small number of messages needs to be processed per day, a traditional messaging system is more appropriate.
    • Although Kafka ships with a streams API, it is not well suited to heavy data transformations on its own; dedicated ETL (extract, transform, load) tools are usually a better fit.
    • For scenarios where a simple task queue is required, there are better options, such as RabbitMQ.
    • Kafka is not a good choice when long-term storage is the primary requirement: data is kept only for the configured retention period (or size limit) and is then discarded.
  2. What are the different strategies for retrying message processing in Kafka?

    1. Immediate Retries: Retry the message right away, with no delay between attempts.
      • Increases the risk of consumer lag.
      • May block other messages in the partition.
    2. Retry with Backoff: Introduce a delay between retries using mechanisms like exponential backoff or fixed delay.
    3. Retry Topics: Use separate Kafka topics to handle retries. The consumer sends failed messages to a retry topic. Another consumer reads from the retry topic after a configured delay.
      • Cons: Requires careful topic and offset management, and increases system complexity.
    4. Dead-Letter Queue (DLQ): Move messages to a dead-letter queue after a maximum number of retries.
      • Configure a separate Kafka topic as a DLQ.
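    The backoff schedule in strategy 2 can be sketched in plain Java. The initial delay, multiplier, and cap below are hypothetical example values, not Kafka defaults:

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class BackoffSchedule {

        // Compute the delay (ms) before each retry attempt using exponential
        // backoff: initialDelay * multiplier^attempt, capped at maxDelay.
        static List<Long> delays(long initialDelayMs, double multiplier,
                                 long maxDelayMs, int attempts) {
            List<Long> result = new ArrayList<>();
            double delay = initialDelayMs;
            for (int attempt = 0; attempt < attempts; attempt++) {
                result.add((long) Math.min(delay, maxDelayMs));
                delay *= multiplier;
            }
            return result;
        }

        public static void main(String[] args) {
            // Hypothetical settings: start at 1 s, triple each time, cap at 30 s.
            System.out.println(delays(1000L, 3.0, 30000L, 4));
            // → [1000, 3000, 9000, 27000]
        }
    }
    ```

    A fixed-delay strategy is the special case multiplier = 1.0; frameworks such as Spring Kafka apply the same arithmetic internally when you configure a backoff.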
  3. How do you implement a dead-letter queue (DLQ) in Kafka?

    Add the spring-kafka dependency:

    <dependency>
        <groupId>org.springframework.kafka</groupId>
        <artifactId>spring-kafka</artifactId>
    </dependency>
    

    Create ConsumerFactory and ConcurrentKafkaListenerContainerFactory beans

    @Bean
    public ConsumerFactory<String, Payment> consumerFactory() {
        Map<String, Object> config = new HashMap<>();
        config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        return new DefaultKafkaConsumerFactory<>(
                config, new StringDeserializer(), new JsonDeserializer<>(Payment.class));
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, Payment> containerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, Payment> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        return factory;
    }

    Next, annotate the listener method with @RetryableTopic to configure the retry logic and the dead-letter topic (DLT), and add a @DltHandler method to consume messages that exhaust their retries:

    @KafkaListener(topics = "orders-placed", groupId = "order-consumer-group")
    @RetryableTopic(
            backoff = @Backoff(delay = 600000L, multiplier = 3.0, maxDelay = 5400000L),
            attempts = "4",
            autoCreateTopics = "false",
            include = {SocketException.class, RuntimeException.class},
            dltStrategy = DltStrategy.ALWAYS_RETRY_ON_ERROR)
    public void consumeOrder(OrderPlacedEvent orderPlacedEvent) {
        // ...
    }

    @DltHandler
    public void processMessage(OrderPlacedEvent orderPlacedEvent,
            @Header(KafkaHeaders.RECEIVED_TOPIC) String topic) {
        // ...
    }
  4. Where are offsets stored?

    • Committed offsets are stored in a separate internal topic, __consumer_offsets.
    • While the consumer is actively running, it holds the last-pulled offsets in memory. This is not persisted unless explicitly committed.
    • If you disable auto-commit (enable.auto.commit = false), you can manually manage offsets.
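    As a sketch, disabling auto-commit is just a consumer property; the commit itself then happens in application code. Everything below is plain java.util.Properties, and the commitSync() call mentioned in the comment is the kafka-clients API:

    ```java
    import java.util.Properties;

    public class ManualCommitConfig {

        // Build consumer properties with auto-commit disabled, so offsets are
        // only written to __consumer_offsets when the application commits.
        static Properties consumerProps(String bootstrapServers, String groupId) {
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", bootstrapServers);
            props.setProperty("group.id", groupId);
            props.setProperty("enable.auto.commit", "false");
            return props;
        }

        public static void main(String[] args) {
            // "localhost:9092" is a placeholder broker address.
            Properties props = consumerProps("localhost:9092", "order-consumer-group");
            System.out.println(props.getProperty("enable.auto.commit"));
            // With kafka-clients on the classpath, the poll loop would then call
            // consumer.commitSync() after each successfully processed batch.
        }
    }
    ```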
  5. What is the role of ZooKeeper in Apache Kafka?

    1. ZooKeeper keeps a list of active brokers and manages them. Each broker registers itself with ZooKeeper and provides metadata such as its broker ID and address.
    2. ZooKeeper manages the leader election process. If the leader broker fails, ZooKeeper triggers a new election to assign a new leader.
    3. It stores configuration for Kafka topics, partitions, and brokers.
    4. ZooKeeper detects broker failures and notifies the Kafka controller (a special broker).
  6. What is the difference between the earliest and latest values of the consumer's auto.offset.reset property?

    | Property | earliest | latest |
    | --- | --- | --- |
    | Definition | Start consuming messages from the beginning of the topic if no offset is found. | Start consuming messages from the end of the topic if no offset is found. |
    | When to Use | When you want to process all existing messages from the topic. | When you are only interested in new messages from the topic. |
    | Offset Behavior | Reads from offset 0 if no committed offset exists. | Reads from the latest offset (end of the log) if no committed offset exists. |
    | Scenario Example | Suitable for batch processing or systems that need to process all messages from the beginning. | Suitable for real-time monitoring or systems that only care about new events. |
    | Message Processing | Processes all historical messages and continues with new messages. | Skips historical messages and starts processing new messages. |
    | Use Case | Data processing systems that need full data (e.g., analytics, ETL pipelines). | Real-time applications like log monitoring or notifications. |
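    Both behaviours are selected with the auto.offset.reset consumer property. A minimal sketch using plain java.util.Properties, with a placeholder broker address and hypothetical group IDs:

    ```java
    import java.util.Properties;

    public class OffsetResetConfig {

        // auto.offset.reset only applies when the group has no committed offset
        // (e.g. a brand-new consumer group, or the committed offset was removed
        // by retention); otherwise the consumer resumes from the committed offset.
        static Properties consumerProps(String groupId, String offsetReset) {
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
            props.setProperty("group.id", groupId);
            props.setProperty("auto.offset.reset", offsetReset); // "earliest" or "latest"
            return props;
        }

        public static void main(String[] args) {
            // Batch/ETL-style consumer: replay everything from offset 0.
            System.out.println(
                    consumerProps("etl-group", "earliest").getProperty("auto.offset.reset"));
            // Real-time monitoring consumer: only new events.
            System.out.println(
                    consumerProps("monitor-group", "latest").getProperty("auto.offset.reset"));
        }
    }
    ```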