Suppose one of my programs consumes messages from a Kafka topic. While processing a message, the consumer accesses a database, and the database access fails for some reason. We do not want to abandon the message; we need to park it for later processing. In JMS, when message processing fails, the application container puts the message back on the queue, so it is not lost. In Kafka, once a message is received, the offset advances and the next message arrives. How do I handle this?
There are two approaches to achieve this.
Set the Kafka acknowledgment mode to manual and, in case of an error, terminate the consumer thread without committing the offset. (If group management is enabled, this triggers a rebalance, after which a new consumer is assigned and polls the same batch.)
The second approach is simpler: have one error topic and publish messages to it whenever an error occurs, so that you can consume them later or keep track of them.
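A minimal sketch of the second approach, not taken from the answer itself. It assumes a hypothetical publish(topic, payload) function that wraps whatever producer you use, a hypothetical processMessage handler, and a made-up error topic name (orders.error):

```javascript
// Hypothetical topic name, used only for illustration.
const ERROR_TOPIC = 'orders.error';

// Wraps a message handler so that a failure parks the message on the
// error topic instead of losing it. `processMessage` and `publish` are
// supplied by the caller; `publish(topic, payload)` stands in for a real
// producer send and is an assumption of this sketch.
function withErrorTopic(processMessage, publish) {
  return async function handle(message) {
    try {
      await processMessage(message);
      return 'processed';
    } catch (err) {
      // Keep the original payload plus enough context to retry later.
      await publish(ERROR_TOPIC, {
        value: message.value,
        sourceTopic: message.topic,
        partition: message.partition,
        offset: message.offset,
        error: String(err && err.message ? err.message : err),
      });
      return 'parked';
    }
  };
}
```

The consumer then commits the offset in both cases, because a parked message has been handed over to the error topic. If the publish itself fails, the returned promise rejects and the caller should not commit.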
Related
I have a use case where I want to consume from a Kafka topic and, depending on some logic, if I am not able to process the message right now, I want to enqueue it back onto the same topic it was read from.
Something like this
Topic1 ---> Consumer ---> Can't process now
  ^                              |
  |__________Re-enqueues_________|
Is it possible?
Yes, this is possible.
However, be aware that depending on your retention settings the re-ingested message might exist in the topic multiple times. Also, the consumer will consume all messages as long as it is running which could lead to the case that it has consumed all valid messages but keeps on re-ingesting the other messages over and over again.
The typical pattern to deal with messages that should be re-ingested into your pipeline is to send them to a dedicated Kafka topic. Once your consumer is fixed to be able to process those messages you can then have your consumer read that dedicated topic just once.
Most places I look recommend that to prevent data loss you should create a "retry" topic. If a consumer fails it should send a message to the "retry" topic which would wait a set period of time and then send a message back to the "main" topic.
Isn't this an anti-pattern since when it goes back to the "main" topic all the services subscribed to the "main" topic would reprocess the failed message even though only one of the services failed to process it initially?
Is there a conventional way of solving this such as putting the clientId in the headers for messages that are the result of a retry? Am I missing something?
Dead-letter queues (DLQ), in themselves, are not an anti-pattern. Cycling messages back through the main topic might be, but that is subjective.
The alternative would be to "stop the world" and update the consumer code to resolve the errors before the topic retention deletes the messages you care about. Or, make your downstream consumers also read from the DLQ topic(s), but handle them differently from the main topic.
"Is there a conventional way of solving this such as putting the clientId in the headers"
Maybe if you wanted to track lineage somehow, but re-introducing those previously bad messages would introduce lag and ordering issues that interfere with the "good" messages.
Related question
What is the best practice to retry messages from Dead letter Queue for Kafka
In Kafka, is it possible to set a backoff time per message if processing of that message fails, so that I can process other messages and try the failed message again later? If I just put it back on the topic, it will reappear very quickly. I am using Kafka with Spring Boot.
Kafka does not have any built-in capabilities for a backoff time when consuming data as far as I know.
As soon as you process other messages successfully and also commit them it will be difficult to re-read only those where the processing failed. Kafka topics are built to be consumed in sequence while guaranteeing the order of messages per TopicPartition.
What we usually do in such a scenario is to catch the exception during the processing of the message, send the message to a separate topic (together with an error code or hint), and continue processing later incoming messages. That way you can analyse the data later and, if necessary, move the messages from that other topic into your original topic again.
Moving the problematic messages from the separate topic back into your original input topic could be done through a simple batch job that you run from time to time, or even with the command line tools provided by Kafka.
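Since Kafka itself has no per-message backoff, the delay has to live in the data you write to the separate topic. Here is a rough sketch of how the batch job could decide which parked messages are due. The field names (errorHint, attempts, failedAt) and the two constants are made up for this example and are not part of any Kafka API:

```javascript
// Hypothetical settings for illustration only.
const BASE_DELAY_MS = 1000;
const MAX_ATTEMPTS = 5;

// Builds the record written to the separate topic when processing fails.
// `attempts` is the number of failures recorded before this one.
function toParkedRecord(message, err, now, attempts = 0) {
  return {
    value: message.value,
    errorHint: String(err.message),
    attempts: attempts + 1,
    failedAt: now,
  };
}

// The delay doubles with every failed attempt: 1s, 2s, 4s, ...
function retryDelayMs(attempts) {
  return BASE_DELAY_MS * 2 ** (attempts - 1);
}

// Splits parked records into those ready to go back to the input topic,
// those that must wait longer, and those that have used up their attempts.
function planReplay(records, now) {
  const plan = { replay: [], wait: [], giveUp: [] };
  for (const record of records) {
    if (record.attempts >= MAX_ATTEMPTS) {
      plan.giveUp.push(record);
    } else if (now - record.failedAt >= retryDelayMs(record.attempts)) {
      plan.replay.push(record);
    } else {
      plan.wait.push(record);
    }
  }
  return plan;
}
```

The batch job would read the separate topic, call planReplay with the current time, and produce only the replay group back to the input topic. What to do with the wait and giveUp groups is up to you; one option is to leave them where they are for the next run.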
If I have a Kafka stream application that fails to post to a topic (because the topic does not exist) does it commit the consumer offset and continue, or will it loop on the same message until it can resolve the output topic? The application merely prints an error and runs fine otherwise from what I can observe.
An example of the error when trying to post to topic:
Error while fetching metadata with correlation id 80 : {super.cool.test.topic=UNKNOWN_TOPIC_OR_PARTITION}
In my mind it would just spin on the same message until the issue is resolved in order to not lose data? I could not find a clear answer on what the default behavior is. We haven't set autocommit to off or anything like that, most of the settings are set to the default.
I am asking as we don't want to end up in a situation where the health check is fine (application is running while printing errors to log) and we are just throwing away tons of Kafka messages.
Kafka Streams will not commit the offsets for this case, as it provides at-least-once processing guarantees (in fact, it's not even possible to reconfigure Kafka Streams differently -- only stronger exactly-once guarantees are possible). Also, Kafka Streams disables auto-commit on the consumer always (and does not allow you to enable it), as Kafka Streams manages committing offset itself.
If you run with the default settings, the producer should actually throw an exception and the corresponding thread should die. You can get a callback when a thread dies by registering a handler via KafkaStreams#setUncaughtExceptionHandler().
You can also observe KafkaStreams#state() (or register a callback via KafkaStreams#setStateListener()). The state will go to DEAD if all threads are dead (note, there was a bug in older versions for which the state was still RUNNING in this case: https://issues.apache.org/jira/browse/KAFKA-5372)
Hence, the application should not be in a healthy state. Kafka Streams will not retry the input message but will stop processing, and you would need to restart the client. On restart, it would re-read the failed input message and retry the write to the output topic.
If you want Kafka Streams to retry, you need to increase the producer config retries, so that the producer retries the write internally instead of throwing an exception. This may eventually "block" further processing if the producer's write buffer becomes full.
I have run into an annoying offset-committing case with my Kafka consumers.
I use 'kafka-node' for my project.
I created a topic.
Created 2 consumers within a consumer-group over 2 servers.
Auto-commit set to false.
For every message my consumers receive, they start an async process that can take between 1 and 20 seconds; when the process is done, the consumer commits the offset.
My problem is:
There is a scenario in which:
Consumer 1 gets a message that takes 20 seconds to process.
In the middle of that processing, it gets another message, which takes 1 second to process.
It finishes processing the second message, commits the offset, then crashes right away.
This causes the processing of the first message to fail.
If I rerun the consumer, it does not read the first message again, because the second message has already committed an offset that is greater than the first.
How can I avoid this?
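Kafkaconsumer.on('message', async (message) => {
    // Placeholder for the async work, which takes 1 to 20 seconds.
    await processMessage(message);
    // Commit as soon as this message is done.
    Kafkaconsumer.commit(() => {});
});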
You essentially want to throttle messages and handle concurrency by utilizing async.queue.
Create an async.queue with a message processor and a concurrency of one (the message processor itself is wrapped with setImmediate so it will not freeze up the event loop)
Set the queue.drain to resume the consumer
The handler for consumer's message event pauses the consumer and pushes the message to the queue.
The kafka-node README details this here.
An example implementation, similar to your problem, can be found here.
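To show the idea without the async library, here is a rough sketch that swaps async.queue for a plain in-memory array and a loop. It is not the README's code. It assumes the consumer exposes on('message', ...), pause(), resume() and commit(callback), as kafka-node consumers do; processMessage and onError are hypothetical callbacks you supply. How commit maps to individual message offsets depends on the client, and this sketch does not cover that.

```javascript
// Attaches a one-at-a-time processing pipeline to `consumer`.
// `processMessage` is the caller's async handler; `onError` is called if
// processing or committing fails (both names are assumptions).
function processSequentially(consumer, processMessage, onError) {
  const pending = [];   // messages received but not yet processed
  let running = false;  // true while the drain loop is active

  async function drain() {
    running = true;
    while (pending.length > 0) {
      const message = pending.shift();
      await processMessage(message);
      // Commit only after this message is done. Later messages have not
      // been started yet, so no commit runs ahead of unfinished work.
      await new Promise((resolve, reject) => {
        consumer.commit((err) => (err ? reject(err) : resolve()));
      });
    }
    running = false;
    consumer.resume(); // queue is empty: start fetching again
  }

  consumer.on('message', (message) => {
    consumer.pause(); // stop fetching while work is queued
    pending.push(message);
    if (!running) {
      // On failure the loop stops, nothing more is committed, and the
      // consumer stays paused until the caller decides what to do.
      drain().catch(onError);
    }
  });
}
```

With this in place, the 1-second message from the scenario above is only started after the 20-second message has finished and been committed, so a crash in between leaves the first message uncommitted and it is read again on restart.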