
Kafka Interview Questions

Created on: Dec 4, 2024

  1. Tell me about some of the use cases where Kafka is not suitable.

    • Kafka is designed to handle large volumes of data. If only a small number of messages needs to be processed per day, a traditional messaging system is more appropriate.
    • Although Kafka ships with a streams API, it is not well suited to heavy data transformations on its own; dedicated ETL (extract, transform, load) tools are usually a better fit.
    • For scenarios where a simple task queue is required, there are better options, such as RabbitMQ.
    • Kafka is not a good choice when long-term storage is the primary requirement: data is kept only for the configured retention period (or size limit) and is then discarded.
  2. What are the different strategies for retrying message processing in Kafka?

    1. Immediate Retries: Retry the message right away, with no delay between attempts.
      • Increases the risk of consumer lag.
      • May block other messages in the partition.
    2. Retry with Backoff: Introduce a delay between retries using mechanisms like exponential backoff or fixed delay.
    3. Retry Topics: Use separate Kafka topics to handle retries. The consumer sends failed messages to a retry topic. Another consumer reads from the retry topic after a configured delay.
      • Cons: Requires careful topic and offset management, and increases system complexity.
    4. Dead-Letter Queue (DLQ): Move messages to a dead-letter queue after a maximum number of retries.
      • Configure a separate Kafka topic as a DLQ.
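    The backoff schedule in strategy 2 can be sketched in plain Java. The initial delay, multiplier, and cap below are hypothetical example values, not Kafka defaults:

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class BackoffSchedule {

        // Compute the delay (ms) before each retry attempt using exponential
        // backoff: initialDelay * multiplier^attempt, capped at maxDelay.
        static List<Long> delays(long initialDelayMs, double multiplier,
                                 long maxDelayMs, int attempts) {
            List<Long> result = new ArrayList<>();
            double delay = initialDelayMs;
            for (int attempt = 0; attempt < attempts; attempt++) {
                result.add((long) Math.min(delay, maxDelayMs));
                delay *= multiplier;
            }
            return result;
        }

        public static void main(String[] args) {
            // Hypothetical settings: start at 1 s, triple each time, cap at 30 s.
            System.out.println(delays(1000L, 3.0, 30000L, 4));
            // → [1000, 3000, 9000, 27000]
        }
    }
    ```

    A fixed-delay strategy is the special case multiplier = 1.0; frameworks such as Spring Kafka apply the same arithmetic internally when you configure a backoff.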
  3. How do you implement a dead-letter queue (DLQ) in Kafka?

    Add the spring-kafka dependency:

    <dependency>
        <groupId>org.springframework.kafka</groupId>
        <artifactId>spring-kafka</artifactId>
    </dependency>
    

    Create ConsumerFactory and ConcurrentKafkaListenerContainerFactory beans

    @Bean
    public ConsumerFactory<String, Payment> consumerFactory() {
        Map<String, Object> config = new HashMap<>();
        config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        return new DefaultKafkaConsumerFactory<>(
                config, new StringDeserializer(), new JsonDeserializer<>(Payment.class));
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, Payment> containerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, Payment> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        return factory;
    }

    Next, annotate the listener method with @RetryableTopic to configure the retry logic and the dead-letter topic (DLT), and add a @DltHandler method to consume messages that exhaust their retries:

    @KafkaListener(topics = "orders-placed", groupId = "order-consumer-group")
    @RetryableTopic(
            backoff = @Backoff(delay = 600000L, multiplier = 3.0, maxDelay = 5400000L),
            attempts = "4",
            autoCreateTopics = "false",
            include = {SocketException.class, RuntimeException.class},
            dltStrategy = DltStrategy.ALWAYS_RETRY_ON_ERROR)
    public void consumeOrder(OrderPlacedEvent orderPlacedEvent) {
        // ...
    }

    @DltHandler
    public void processMessage(OrderPlacedEvent orderPlacedEvent,
            @Header(KafkaHeaders.RECEIVED_TOPIC) String topic) {
        // ...
    }
  4. Where are offsets stored?

    • Committed offsets are stored in a separate internal topic, __consumer_offsets.
    • While the consumer is actively running, it holds the last-pulled offsets in memory. This is not persisted unless explicitly committed.
    • If you disable auto-commit (enable.auto.commit = false), you can manually manage offsets.
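    As a sketch, disabling auto-commit is just a consumer property; the commit itself then happens in application code. Everything below is plain java.util.Properties, and the commitSync() call mentioned in the comment is the kafka-clients API:

    ```java
    import java.util.Properties;

    public class ManualCommitConfig {

        // Build consumer properties with auto-commit disabled, so offsets are
        // only written to __consumer_offsets when the application commits.
        static Properties consumerProps(String bootstrapServers, String groupId) {
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", bootstrapServers);
            props.setProperty("group.id", groupId);
            props.setProperty("enable.auto.commit", "false");
            return props;
        }

        public static void main(String[] args) {
            // "localhost:9092" is a placeholder broker address.
            Properties props = consumerProps("localhost:9092", "order-consumer-group");
            System.out.println(props.getProperty("enable.auto.commit"));
            // With kafka-clients on the classpath, the poll loop would then call
            // consumer.commitSync() after each successfully processed batch.
        }
    }
    ```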
  5. What is the role of ZooKeeper in Apache Kafka?

    1. ZooKeeper keeps a list of active brokers and manages them. Each broker registers itself with ZooKeeper and provides metadata such as its broker ID and address.
    2. ZooKeeper manages the leader election process. If the leader broker fails, ZooKeeper triggers a new election to assign a new leader.
    3. It stores configuration for Kafka topics, partitions, and brokers.
    4. ZooKeeper detects broker failures and notifies the Kafka controller (a special broker).
  6. What is the difference between the earliest and latest values of the consumer's auto.offset.reset property?

    | Property | earliest | latest |
    | --- | --- | --- |
    | Definition | Start consuming messages from the beginning of the topic if no offset is found. | Start consuming messages from the end of the topic if no offset is found. |
    | When to Use | When you want to process all existing messages from the topic. | When you are only interested in new messages from the topic. |
    | Offset Behavior | Reads from offset 0 if no committed offset exists. | Reads from the latest offset (end of the log) if no committed offset exists. |
    | Scenario Example | Suitable for batch processing or systems that need to process all messages from the beginning. | Suitable for real-time monitoring or systems that only care about new events. |
    | Message Processing | Processes all historical messages and continues with new messages. | Skips historical messages and starts processing new messages. |
    | Use Case | Data processing systems that need full data (e.g., analytics, ETL pipelines). | Real-time applications like log monitoring or notifications. |
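    Both behaviours are selected with the auto.offset.reset consumer property. A minimal sketch using plain java.util.Properties, with a placeholder broker address and hypothetical group IDs:

    ```java
    import java.util.Properties;

    public class OffsetResetConfig {

        // auto.offset.reset only applies when the group has no committed offset
        // (e.g. a brand-new consumer group, or the committed offset was removed
        // by retention); otherwise the consumer resumes from the committed offset.
        static Properties consumerProps(String groupId, String offsetReset) {
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
            props.setProperty("group.id", groupId);
            props.setProperty("auto.offset.reset", offsetReset); // "earliest" or "latest"
            return props;
        }

        public static void main(String[] args) {
            // Batch/ETL-style consumer: replay everything from offset 0.
            System.out.println(
                    consumerProps("etl-group", "earliest").getProperty("auto.offset.reset"));
            // Real-time monitoring consumer: only new events.
            System.out.println(
                    consumerProps("monitor-group", "latest").getProperty("auto.offset.reset"));
        }
    }
    ```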