Retry with backoff time per consumed message that fails in Kafka

In Kafka, is it possible to set a backoff time per message if processing of that message fails? So that I can process other messages, and try again later with the message that failed? If I just put it back onto the topic, it will reappear very quickly. I am using Kafka with Spring Boot.

Kafka does not have any built-in capabilities for a backoff time when consuming data as far as I know.
Once you process other messages successfully and commit their offsets, it becomes difficult to re-read only the ones whose processing failed. Kafka topics are built to be consumed in sequence, guaranteeing the order of messages per TopicPartition.
What we usually do in such a scenario is to catch the exception during the processing of the message, send the message to a separate topic (together with an error code/hint), and continue processing the messages that come in afterwards. That way you can analyse the data later and, if necessary, move the messages from that other topic back into your original topic.
Moving the problematic messages from the separate topic back into your original input topic could be done through a simple batch job that you run from time to time, or even with the command-line tools shipped with Kafka.
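A minimal sketch of this pattern with Spring Kafka (the topic names, header key, and process() method are made up for illustration):

import java.nio.charset.StandardCharsets;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
public class FailureForwardingListener {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public FailureForwardingListener(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @KafkaListener(topics = "orders")
    public void listen(ConsumerRecord<String, String> record) {
        try {
            process(record.value());
        } catch (Exception e) {
            // Park the failed record on a separate topic, together with an
            // error hint in a header, and keep consuming the main topic.
            ProducerRecord<String, String> failed =
                    new ProducerRecord<>("orders.errors", record.key(), record.value());
            failed.headers().add("x-error-message",
                    String.valueOf(e.getMessage()).getBytes(StandardCharsets.UTF_8));
            kafkaTemplate.send(failed);
        }
    }

    private void process(String value) {
        // Application-specific processing that may throw.
    }
}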

Related

Retry logic blocks the main consumer while it's waiting for the retry in Spring

I am referring to:
https://medium.com/trendyol-tech/how-to-implement-retry-logic-with-spring-kafka-710b51501ce2
It says that if we use the following:
factory.setErrorHandler(new SeekToCurrentErrorHandler(new DeadLetterPublishingRecoverer(kafkaTemplate), 3));
It will block the main consumer while it's waiting for the retry. (https://medium.com/trendyol-tech/how-to-implement-retry-logic-with-spring-kafka-710b51501ce2#:~:text=Also%20it%20blocks%20the%20main%20consumer%20while%20its%20waiting%20for%20the%20retry)
So, my question is: do we really need to retry on the main topic, or can we move the failed messages to a retry topic and process them there, so that our main topic is non-blocking?
Can we achieve non-blocking retry using STCH (SeekToCurrentErrorHandler)?
Non-blocking retries were added in the recent 2.7 release.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#retry-topic
Achieving non-blocking retry / DLT functionality with Kafka usually requires setting up extra topics and creating and configuring the corresponding listeners. Since 2.7, Spring for Apache Kafka offers support for that via the @RetryableTopic annotation and the RetryTopicConfiguration class, to simplify that bootstrapping.
If message processing fails, the message is forwarded to a retry topic with a backoff timestamp. The retry topic consumer then checks the timestamp and, if it is not due yet, pauses consumption for that topic's partition. When it is due, consumption of the partition is resumed and the message is consumed again. If processing fails again, the message is forwarded to the next retry topic, and the pattern repeats until processing succeeds or the attempts are exhausted, at which point the message is sent to the dead-letter topic (if configured).
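A short sketch of what the annotation-based configuration can look like (the topic name, attempt count, and backoff values here are illustrative, not defaults to copy):

import org.springframework.kafka.annotation.DltHandler;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.annotation.RetryableTopic;
import org.springframework.retry.annotation.Backoff;
import org.springframework.stereotype.Component;

@Component
public class RetryableListener {

    // Spring for Apache Kafka 2.7+ creates the retry topics and the DLT for
    // this listener automatically: three attempts with exponential backoff
    // starting at 1 s, then the record lands on the dead-letter topic.
    @RetryableTopic(attempts = "3", backoff = @Backoff(delay = 1000, multiplier = 2.0))
    @KafkaListener(topics = "orders")
    public void listen(String message) {
        process(message); // throwing here triggers the retry-topic flow
    }

    @DltHandler
    public void handleDlt(String message) {
        // The record exhausted all retries; inspect or archive it here.
    }

    private void process(String message) {
        // Application-specific processing that may throw.
    }
}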

(Spring) Kafka appears to consume newly produced messages out of order

Situation:
We have a Spring Boot / Spring Kafka application that is reading from a Kafka topic with a single partition. There is a single instance of the application running and it has a single-threaded KafkaMessageListenerContainer (not Concurrent). We have a single consumer group.
We want to manage offsets ourselves based on committing to a transactional database. At startup, we read initial offsets from our database and seek to that offset and begin reading older messages. (For example with an empty database, we would start at offset 0.) We do this via implementing ConsumerRebalanceListener and seek()ing in that callback. We pause() the KafkaMessageListenerContainer prior to starting it so that we don't read any messages prior to the ConsumerRebalanceListener being invoked (then we resume() the container inside the ConsumerRebalanceListener.onPartitionsAssigned() callback). We acknowledge messages manually as they are consumed.
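A condensed sketch of the setup described above (OffsetStore is a hypothetical stand-in for the transactional database; the container is pause()d before start() and resume()d once the seeks are done):

import java.util.Collection;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;
import org.springframework.kafka.listener.ConsumerAwareRebalanceListener;
import org.springframework.kafka.listener.KafkaMessageListenerContainer;

public class DbOffsetRebalanceListener implements ConsumerAwareRebalanceListener {

    private final KafkaMessageListenerContainer<String, String> container;
    private final OffsetStore offsetStore; // hypothetical DB-backed offset store

    public DbOffsetRebalanceListener(KafkaMessageListenerContainer<String, String> container,
                                     OffsetStore offsetStore) {
        this.container = container;
        this.offsetStore = offsetStore;
    }

    @Override
    public void onPartitionsAssigned(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
        // Seek every assigned partition to the offset recorded in the database,
        // then let the (previously paused) container start delivering records.
        for (TopicPartition tp : partitions) {
            consumer.seek(tp, offsetStore.lastCommittedOffset(tp));
        }
        container.resume();
    }

    public interface OffsetStore {
        long lastCommittedOffset(TopicPartition tp);
    }
}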
Issue:
While in the middle of reading these older messages (1000s of messages and 10s of seconds/minutes into the reading), a separate application produces messages into the same topic and partition we're reading from.
We observe that these newly produced messages are consumed immediately, intermingled with the older messages we're in the process of reading. So we observe message offsets that jump in this single consumer thread: from the basically sequential offsets of the older messages to ones that are from the new messages that were just produced, and then back to the older, sequential ones.
We don't see any errors in reading messages or anything that would trigger retries or anything like that. The reads of newer messages happen in the main thread as do the reads of older messages, so I don't believe there's another listener container running.
How could this happen? Doesn't this seem contrary to the ordering guarantees Kafka is supposed to provide? How can we prevent this behavior?
Details:
We have the following settings (some in properties, some in code, please excuse the mix):
properties.consumer.isolationLevel = KafkaProperties.IsolationLevel.READ_COMMITTED
properties.consumer.maxPollRecords = 500
containerProps.ackMode = ContainerProperties.AckMode.MANUAL
containerProps.eosMode = ContainerProperties.EOSMode.BETA
spring.kafka.consumer.auto-offset-reset=none
spring.kafka.enable-auto-commit=false
Versions:
Spring Kafka 2.5.5.RELEASE
Kafka 2.5.1
(we could definitely try upgrading if there was a reason to believe this was the result of a bug that was fixed since then.)
I can share some code snippets for any of the above if it's interesting.

When message process fails, can consumer put back message to same topic?

Suppose one of my programs is consuming messages from a Kafka topic. While processing a message, the consumer accesses a database, and that DB access fails for some reason. We don't want to abandon the message; we need to park it for later processing. In JMS, when message processing fails, the application container puts the message back on the queue, so it is not lost. In Kafka, once a message is received the offset advances and the next message arrives. How do I handle this?
There are two approaches to achieve this.
Set the Kafka acknowledgement mode to manual and, in case of an error, terminate the consumer thread without committing the offset (if group management is enabled, a new consumer will take over after rebalancing is triggered and will poll the same batch).
The second approach is simpler: have a dedicated error topic and publish messages to it on any error, so that you can consume them later or keep track of them. A sketch of the first approach follows below.
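A rough sketch of the first approach with the plain Java consumer (broker address, topic, and group id are made up); offsets are committed only after the whole batch was processed successfully, so an uncommitted batch is redelivered after the consumer is restarted or rebalanced:

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ManualCommitConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-processor");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // manual commits only
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // throwing here aborts before the commit below
                }
                // Commit only after the whole batch succeeded; if process() threw,
                // the uncommitted records are redelivered to the group.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // Application-specific processing that may throw.
    }
}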

Is consumer offset commited even when failing to post to output topic in Kafka Streams?

If I have a Kafka Streams application that fails to post to a topic (because the topic does not exist), does it commit the consumer offset and continue, or will it loop on the same message until it can resolve the output topic? The application merely prints an error and otherwise runs fine from what I can observe.
An example of the error when trying to post to topic:
Error while fetching metadata with correlation id 80 : {super.cool.test.topic=UNKNOWN_TOPIC_OR_PARTITION}
My assumption was that it would just spin on the same message until the issue is resolved, in order not to lose data. I could not find a clear answer on what the default behavior is. We haven't turned off auto-commit or anything like that; most of the settings are at their defaults.
I am asking as we don't want to end up in a situation where the health check is fine (application is running while printing errors to log) and we are just throwing away tons of Kafka messages.
Kafka Streams will not commit the offsets in this case, as it provides at-least-once processing guarantees (in fact, it's not even possible to configure Kafka Streams with weaker guarantees -- only the stronger exactly-once guarantee is available). Also, Kafka Streams always disables auto-commit on the consumer (and does not allow you to enable it), as Kafka Streams manages committing offsets itself.
If you run with default settings, the producer should actually throw an exception and the corresponding thread should die -- you can get a callback when a thread dies by registering a handler via KafkaStreams#setUncaughtExceptionHandler().
You can also observe KafkaStreams#state() (or register a callback via KafkaStreams#setStateListener()). The state will go to DEAD if all threads are dead (note, there was a bug in older versions for which the state was still RUNNING in this case: https://issues.apache.org/jira/browse/KAFKA-5372).
Hence, the application should not be in a healthy state; Kafka Streams will not retry the input message but will stop processing, and you would need to restart the client. On restart, it would re-read the failed input message and retry writing to the output topic.
If you want Kafka Streams to retry, you need to increase the producer config retries so that the producer retries the write internally instead of throwing an exception. This may eventually "block" further processing if the producer's write buffer becomes full.
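A small sketch wiring those callbacks together (topic names and the retries value are illustrative; setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler) is the pre-2.8 API matching the versions discussed here):

import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsMonitoringExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Let the embedded producer retry writes internally instead of
        // throwing and killing the stream thread.
        props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRIES_CONFIG), Integer.MAX_VALUE);

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);

        // Get notified when a stream thread dies...
        streams.setUncaughtExceptionHandler((thread, throwable) ->
                System.err.println("Stream thread " + thread.getName() + " died: " + throwable));
        // ...and when the instance transitions between states (e.g. to DEAD),
        // so a health check can flag the application as unhealthy.
        streams.setStateListener((newState, oldState) ->
                System.out.println("State changed from " + oldState + " to " + newState));

        streams.start();
    }
}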

Simple-Kafka-consumer message delivery duplication

I am trying to implement a simple Producer-->Kafka-->Consumer application in Java. I am able to produce as well as consume messages successfully, but the problem occurs when I restart the consumer: some of the already consumed messages are picked up again by the consumer from Kafka (not all messages, but a few of the last consumed ones).
I have set autooffset.reset=largest in my consumer and my autocommit.interval.ms property is set to 1000 milliseconds.
Is this 'redelivery of some already consumed messages' a known problem, or are there any other settings that I am missing here?
Basically, is there a way to ensure none of the previously consumed messages are getting picked up/consumed by the consumer?
Kafka uses Zookeeper to store consumer offsets. Since Zookeeper operations are pretty slow, it's not advisable to commit the offset after every consumed message.
It's possible to add a shutdown hook to the consumer that manually commits the topic offset before exit. However, this won't help in certain situations (like a JVM crash or kill -9). To guard against those situations, I'd advise implementing custom commit logic that commits the offset locally after processing each message (to a file or a local database), and also commits the offset to Zookeeper every 1000 ms. Upon consumer startup, both of these locations should be queried, and the maximum of the two values should be used as the consumption offset.
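This answer predates the modern consumer API (offsets in Zookeeper), but the dual-commit bookkeeping itself is simple. A sketch of just that logic, with the Zookeeper commit abstracted behind a hypothetical Runnable (for the old high-level consumer this would wrap something like ConsumerConnector#commitOffsets()):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DualOffsetTracker {

    private final Path localStore;          // local file that survives process crashes
    private final Runnable zookeeperCommit; // hypothetical hook for the consumer's commit call
    private final long commitIntervalMs;
    private long lastZkCommit;

    public DualOffsetTracker(Path localStore, Runnable zookeeperCommit, long commitIntervalMs) {
        this.localStore = localStore;
        this.zookeeperCommit = zookeeperCommit;
        this.commitIntervalMs = commitIntervalMs;
    }

    // Called after each processed message: persist locally every time,
    // push to Zookeeper only every commitIntervalMs (e.g. 1000 ms).
    public void messageProcessed(long offset) throws IOException {
        Files.writeString(localStore, Long.toString(offset));
        long now = System.currentTimeMillis();
        if (now - lastZkCommit >= commitIntervalMs) {
            zookeeperCommit.run();
            lastZkCommit = now;
        }
    }

    // On startup: resume from the maximum of the two recorded offsets.
    public long startOffset(long zookeeperOffset) throws IOException {
        long local = Files.exists(localStore)
                ? Long.parseLong(Files.readString(localStore, StandardCharsets.UTF_8).trim())
                : -1L;
        return Math.max(local, zookeeperOffset);
    }
}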