Retry logic in Kafka consumer - apache-kafka

I have a use case where I consume certain logs from a queue and call some third-party APIs with some info from each log. If the third-party system is not responding properly, I want to implement retry logic for that particular log.
I can add a time field and re-push the message to the same queue. The message will then be consumed again: if its time field is valid, i.e. less than the current time, it is processed; if not, it is pushed into the queue again.
But this logic will add the same log again and again until the retry time is reached, and the queue will grow unnecessarily.
Is there a better way to implement retry logic in Kafka?

You can create several retry topics and push failed tasks there. For instance, you can create 3 topics with different delays and rotate a single failed task through them until the maximum attempt limit is reached:
‘retry_5m_topic’: for retry in 5 minutes
‘retry_30m_topic’: for retry in 30 minutes
‘retry_1h_topic’: for retry in 1 hour
For more details, see: https://blog.pragmatists.com/retrying-consumer-architecture-in-the-apache-kafka-939ac4cb851a
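A minimal sketch of the routing side of this design, in plain Java so it runs without a Kafka client. It assumes the retry count travels with the message (for example in a record header) and a hypothetical limit of 5 retries; the topic names are the ones above. A consumer of a retry topic would compare the record timestamp with its tier's delay and, if the record is not yet due, wait (for example by pausing that partition) rather than re-publishing it, which avoids the queue growth described in the question.

```java
import java.time.Duration;
import java.util.List;
import java.util.Optional;

/** Sketch of tiered retry-topic routing; MAX_RETRIES is an assumed value. */
final class RetryTopicRouter {
    record Tier(String topic, Duration delay) {}

    static final List<Tier> TIERS = List.of(
            new Tier("retry_5m_topic", Duration.ofMinutes(5)),
            new Tier("retry_30m_topic", Duration.ofMinutes(30)),
            new Tier("retry_1h_topic", Duration.ofHours(1)));

    // Hypothetical limit: the answer leaves the exact number open.
    static final int MAX_RETRIES = 5;

    /** Tier for the next retry, given how many retries were already made (0 on first failure). */
    static Optional<Tier> nextTier(int retriesSoFar) {
        if (retriesSoFar < 0 || retriesSoFar >= MAX_RETRIES) {
            return Optional.empty(); // give up: log it or park it in a dead-letter topic
        }
        // Escalate through the tiers, then stay on the longest delay.
        int index = Math.min(retriesSoFar, TIERS.size() - 1);
        return Optional.of(TIERS.get(index));
    }

    /** A retry-topic consumer should only process a record once its delay has elapsed. */
    static boolean isDue(long recordTimestampMillis, Duration delay, long nowMillis) {
        return nowMillis >= recordTimestampMillis + delay.toMillis();
    }
}
```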

In the consumer, if processing throws an exception, produce another message with attempt number 1, so that the next time it is consumed it carries attempt number 1. Handle it in the producer so that, once a message has been attempted more than your retry count, you stop producing it.
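A small sketch of the attempt-counter idea. RetryableMessage, the Outcome enum and the processing predicate are hypothetical names; in Kafka the attempt number could be carried in a record header. The caller re-publishes nextAttempt(msg) when the outcome is RETRY and stops when it is GAVE_UP.

```java
import java.util.function.Predicate;

/** Sketch of retrying by re-publishing a copy with an incremented attempt number. */
final class AttemptCounterRetry {
    record RetryableMessage(String payload, int attempt) {}

    enum Outcome { SUCCEEDED, RETRY, GAVE_UP }

    /** Processes once; an exception or a false result counts as a failed attempt. */
    static Outcome handle(RetryableMessage msg, Predicate<String> process, int maxRetries) {
        boolean ok;
        try {
            ok = process.test(msg.payload());
        } catch (RuntimeException e) {
            ok = false;
        }
        if (ok) {
            return Outcome.SUCCEEDED;
        }
        // The original message has attempt 0, so this allows exactly maxRetries re-publishes.
        return msg.attempt() < maxRetries ? Outcome.RETRY : Outcome.GAVE_UP;
    }

    /** The copy to produce again when handle(...) returns RETRY. */
    static RetryableMessage nextAttempt(RetryableMessage msg) {
        return new RetryableMessage(msg.payload(), msg.attempt() + 1);
    }
}
```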

Yes, this is one straightforward solution that I also thought of. But with it we will end up creating many topics, since message processing may fail again.
I solved this problem by mapping the use case to RabbitMQ. In RabbitMQ we have the concept of a retry exchange: if processing a message from an exchange fails, you can send it to a retry exchange with a TTL. Once the TTL expires, the message moves back to the main exchange and is ready to be processed again.
I can post some examples explaining how to implement exponential-backoff message processing using RabbitMQ.
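A minimal sketch of how the exponential backoff for the retry TTL could be computed, assuming a base of 1 second that doubles per retry and a cap of 10 minutes (both values are assumptions). In RabbitMQ the per-message TTL goes in the message's expiration property as a string of milliseconds, and the retry queue would be declared with an x-dead-letter-exchange argument pointing back at the main exchange; the client calls are left out so the block runs on its own.

```java
import java.time.Duration;

/** Exponential backoff for the TTL of a message parked in a retry exchange/queue. */
final class BackoffTtl {
    static final Duration BASE = Duration.ofSeconds(1);  // assumed
    static final Duration MAX = Duration.ofMinutes(10);  // assumed

    /** TTL for retry number n (0-based): BASE * 2^n, capped at MAX. */
    static Duration ttlFor(int retry) {
        if (retry < 0) {
            throw new IllegalArgumentException("retry must be >= 0");
        }
        long millis = BASE.toMillis();
        // Stop doubling once the cap is reached so the value cannot overflow.
        for (int i = 0; i < retry && millis < MAX.toMillis(); i++) {
            millis *= 2;
        }
        return Duration.ofMillis(Math.min(millis, MAX.toMillis()));
    }

    /** Value for the message's "expiration" property: milliseconds as a string. */
    static String expirationProperty(int retry) {
        return Long.toString(ttlFor(retry).toMillis());
    }
}
```

One design caveat worth checking against the RabbitMQ TTL documentation: expired messages are only dead-lettered when they reach the head of the queue, so mixing very different TTLs in one retry queue can hold short delays behind long ones; one retry queue per delay level sidesteps that.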

Related

How to cancel scheduled messages in ActiveMQ Artemis

I have set ActiveMQ Artemis' configuration to redeliver an unsuccessful message after a delay, like this:
Attempt 1: unsuccessful delivery, wait 5 seconds
Attempt 2: unsuccessful delivery, wait 10 seconds
...
Attempt N: unsuccessful delivery, wait 5 hours
The problem is that I don't see the scheduled messages on the queue, and I don't know how to cancel the 5-hour waiting period and redeliver the message right now.
My questions:
Why can't I see that message on the queue when I execute the browse() function in the Artemis GUI console? I can only see it when I execute listScheduledMessages(). Had I not tried listScheduledMessages(), I would be wondering why I had lost a message.
Is there any way to redeliver a message without waiting for the next 5 hours?
You can't see scheduled messages when you use the browse() management method because technically scheduled messages are not on the queue. If they were on the queue they would be delivered to consumers.
There is currently no way to repeat a message without waiting for the scheduled time to arrive. However, you could get the message's ID using listScheduledMessages() and then pass that ID to removeMessage(long) to delete the message and then resend it with a different (or no) schedule.
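A sketch of the remove-and-resend workaround. The QueueAdmin interface is hypothetical: its method names echo the management operations mentioned in the answer, but the real Artemis operations have different signatures (for example, listScheduledMessages() returns message metadata rather than a simple ID-to-body map), so treat this as the shape of the workflow, not as client code. An in-memory fake is included so it runs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical admin view of a queue; NOT the Artemis management API. */
interface QueueAdmin {
    Map<Long, String> listScheduledMessages(); // message ID -> body (simplified)
    boolean removeMessage(long messageId);
    void send(String body);                     // send with no delivery delay
}

final class RedeliverNow {
    /** Cancels one scheduled message and resends its body for immediate delivery. */
    static boolean redeliverNow(QueueAdmin admin, long messageId) {
        String body = admin.listScheduledMessages().get(messageId);
        if (body == null) {
            return false; // not (or no longer) scheduled
        }
        // Remove first: if removal fails nothing is sent, so no duplicate is created.
        if (!admin.removeMessage(messageId)) {
            return false;
        }
        admin.send(body);
        return true;
    }
}

/** In-memory fake used to exercise the workflow. */
final class InMemoryQueueAdmin implements QueueAdmin {
    final Map<Long, String> scheduled = new LinkedHashMap<>();
    final List<String> delivered = new ArrayList<>();

    public Map<Long, String> listScheduledMessages() { return Map.copyOf(scheduled); }
    public boolean removeMessage(long id) { return scheduled.remove(id) != null; }
    public void send(String body) { delivered.add(body); }
}
```

The flip side of removing before resending is that a failed send after a successful removal loses the message, so a real version would keep a copy until the send is confirmed.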

How can a Kafka consumer doing infinite retries recover from a bad incoming message?

I am a Kafka newbie, and while reading the docs I had this design question about the Kafka consumer.
A Kafka consumer reads messages from the Kafka stream, which is made up of one or more partitions on one or more servers.
Let's say one of the incoming messages is corrupt and, as a result, the consumer fails to process it. But when processing event logs you don't want to drop any events, so you do infinite retries to get past transient errors during processing. With infinite retries, how can the consumer move forward? Is there a way to blacklist this message for the next retry?
I think it needs manual intervention: we log some message metadata (I don't know exactly what yet) to see which message is failing, and have logic in place where each consumer checks Redis (or somewhere else?) after n retries to see whether this message needs to be skipped. The blacklist doesn't have to be stored in Redis forever either, only until the consumer can skip it. Here's pseudocode for what I just described:
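boolean errorState = true; // true until the message is processed or skipped
while (errorState) {
    if (blacklist.contains(msg)) {
        // an operator blacklisted this message: skip it and move on
        commitOffset();
        break; // without this the loop would never exit
    } else {
        errorState = processMessage(msg); // returns true if processing failed
        if (!errorState) {
            commitOffset();
        } else {
            // log this msg so that we can add it to the blacklist
            logger.info(msg);
        }
    }
}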
I'd like to hear from more experienced folks to see if there are better ways to do this.
We had a requirement in our project where processing an incoming message that updates a record depended on the record being present. Due to a race condition, the update sometimes arrived before the insert. For such cases, we implemented a couple of approaches.
A. Manual retry with a predefined delay. The code checks whether the insert has arrived. If so, processing continues as normal. Otherwise, it sleeps for 500 ms and then tries again, up to 10 times. If at the end the message is still not processed, the code logs the message, commits the offset and moves on. Each message is always processed in a thread from a pool, so it doesn't block the main thread either. However, in the worst case each message takes 5 seconds of application time.
B. Recently, we refined the above solution to use a message scheduler based on Kafka. Now, if the insert has not arrived before the update, the system sends the update to a separate scheduler that operates on Kafka. This scheduler replays the message after some time. After 3 retries, we again log the message and stop scheduling or retrying. This gives us the benefit of not blocking application threads and of controlling when we replay the message.
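A minimal sketch of approach A. The 10 tries and 500 ms delay come from the answer; the method names and the BooleanSupplier standing in for "has the insert arrived?" are hypothetical. The check runs on a pool thread so the consumer's polling thread is not blocked.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.BooleanSupplier;

/** Sketch of approach A: poll a precondition with a fixed delay, give up after N tries. */
final class DelayedRetry {
    /** Returns true if the condition became true within maxTries checks. */
    static boolean awaitCondition(BooleanSupplier insertArrived, int maxTries, long delayMs) {
        for (int i = 0; i < maxTries; i++) {
            if (insertArrived.getAsBoolean()) {
                return true;
            }
            if (i < maxTries - 1) { // no point sleeping after the last check
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the pool
                    return false;
                }
            }
        }
        return false; // caller logs the message, commits the offset and moves on
    }

    /** Runs the check on a pool thread with the answer's values (10 tries, 500 ms). */
    static Future<Boolean> submit(ExecutorService pool, BooleanSupplier insertArrived) {
        return pool.submit(() -> awaitCondition(insertArrived, 10, 500));
    }
}
```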
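The Kafka-based scheduler itself isn't shown in the answer, so the following is only an in-memory stand-in that models its timing and the 3-retry limit; the class names and the fixed delay are assumptions. Time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** In-memory stand-in for approach B's replay scheduler (the real one stores messages in Kafka). */
final class ReplayScheduler {
    record Scheduled(String message, int retry, long dueAtMillis) {}

    static final int MAX_RETRIES = 3;
    private final long delayMillis;
    private final PriorityQueue<Scheduled> pending =
            new PriorityQueue<>((a, b) -> Long.compare(a.dueAtMillis(), b.dueAtMillis()));
    final List<String> givenUp = new ArrayList<>();

    ReplayScheduler(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /** Called when an update arrived before its insert; retriesSoFar is 0 on the first failure. */
    void scheduleRetry(String message, int retriesSoFar, long nowMillis) {
        if (retriesSoFar >= MAX_RETRIES) {
            givenUp.add(message); // log it and stop scheduling
            return;
        }
        pending.add(new Scheduled(message, retriesSoFar + 1, nowMillis + delayMillis));
    }

    /** Messages whose replay time has come, earliest first; the caller reprocesses them. */
    List<Scheduled> due(long nowMillis) {
        List<Scheduled> ready = new ArrayList<>();
        while (!pending.isEmpty() && pending.peek().dueAtMillis() <= nowMillis) {
            ready.add(pending.poll());
        }
        return ready;
    }
}
```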

JMS messages moving to DLQ

JMS messages are sometimes moving to the DLQ without any exception being thrown.
The JBoss server instance used is 4.3.0.GA_CP04_EAP.
We are using an MDB that listens for incoming messages on queue A; when it receives a message, it updates the database and sends an email in one transaction. The transaction is container-managed (CMT).
What is happening is that sometimes messages are not picked up by the consumer and end up in the DLQ, even though the message count in the JMX console shows that the message did arrive at queue A before going to the DLQ.
This happens intermittently and no exceptions appear in the logs either.
What seems to work most of the time is restarting the servers. I have no idea what happens behind the scenes, though.
Update: after 29 days, the same problem has returned.
It follows a pattern, but the pattern varies with every restart.
There are 2 clustered servers, P1 and P2, which also do load balancing.
The first two email messages go to and are processed by P1: email sent.
The next email message request goes to P2: email sent.
The next two email messages go to and are processed by P1: email sent.
The next email message request goes to P2: email NOT SENT.
And the cycle repeats.
I have found a workaround for this nagging problem thanks to the helpful info at http://leakfromjavaheap.blogspot.in/2013/05/when-dead-letter-queue-becomes-zombie.html
A DLQ listener is set up to listen for incoming messages; any message found on the DLQ is put back on its intended destination.
Also, because a message could travel from the DLQ to the queue and back to the DLQ in an endless loop, a counter tracks how many times the message has been on the DLQ; if it exceeds the limit, the message is put on a permanent DLQ (a DLQ for the DLQ).
Application has been running smoothly ever since.
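The DLQ listener's routing decision can be sketched as a pure function. The property name "dlqCount", the limit of 3 and the queue name "permanentDLQ" are hypothetical; only the idea of counting DLQ visits and parking repeat offenders comes from the answer. A real listener would read and write these values as JMS message properties.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of where a DLQ listener should send a message it picks up. */
final class DlqRedrive {
    static final String COUNT_PROPERTY = "dlqCount";    // hypothetical property name
    static final int MAX_DLQ_VISITS = 3;                // hypothetical limit
    static final String PERMANENT_DLQ = "permanentDLQ"; // hypothetical queue name

    record Routing(String destination, Map<String, Integer> properties) {}

    /** Increments the visit counter and picks the next destination. */
    static Routing route(String originalDestination, Map<String, Integer> properties) {
        int visits = properties.getOrDefault(COUNT_PROPERTY, 0) + 1;
        Map<String, Integer> updated = new HashMap<>(properties);
        updated.put(COUNT_PROPERTY, visits);
        if (visits > MAX_DLQ_VISITS) {
            // Breaks the endless DLQ -> queue -> DLQ loop.
            return new Routing(PERMANENT_DLQ, updated);
        }
        return new Routing(originalDestination, updated);
    }
}
```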
If you can provide the log details from when the message goes to the DLQ, it would be easier to dig into this issue.
The logs did not contain any useful info; not even an exception to give a hint.
Finally, I changed the local-tx data source to an XA data source and it worked. I am still wondering whether there is a reason behind it.

Oracle Service Bus Proxy Service Scheduler

I need to create a proxy service scheduler that receives messages from the queue every 5 minutes: the queue may get a single message or multiple messages, but the proxy should receive them at an interval of every 5 minutes. How can I achieve this using only Oracle Service Bus ...
Kindly help me with this.
OSB does not provide scheduler capabilities out of the box. You can do either of the following:
For the JMS queue, configure infinite retries by not setting a retry limit, and set the retry interval to 5 minutes.
Create a scheduler. Check this post for details: http://blogs.oracle.com/jamesbayer/entry/weblogic_scheduling_a_polling
I am leaving this answer for reference only: messages shouldn't be subject to complex computed selections like this; selectors are meant for simple value comparisons and pattern matching only.
To fetch only messages that are old enough from the queue, while
not modifying the queue or messages,
not introducing any new brokers between the queue and the consumer,
not prematurely consuming messages,
use the Message Selector field of the OSB proxy on the JMS Transport tab to set a boolean expression (SQL 92) that checks that the message's JMSTimestamp header is at least 5 minutes older than the current time.
However, I wasn't able to quickly produce a valid message selector from either the timestamp or the JMSMessageID (which contains the time in milliseconds: 'ID:<465788.1372152510324.0>').
I guess somebody could still use it in some specific case.
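For reference, the condition the selector was meant to express is easy to write in code. As far as I know, the JMS selector syntax has no notion of "current time", so a selector can only compare JMSTimestamp against a literal cutoff that would have to be rebuilt on every poll, which a static proxy setting cannot do. The class and method names here are hypothetical.

```java
import java.time.Duration;

/** The "at least 5 minutes old" check, evaluated in code rather than in a selector. */
final class MessageAge {
    static final Duration MIN_AGE = Duration.ofMinutes(5);

    /** True when the message (JMSTimestamp, epoch millis) is at least MIN_AGE old. */
    static boolean oldEnough(long jmsTimestampMillis, long nowMillis) {
        return nowMillis - jmsTimestampMillis >= MIN_AGE.toMillis();
    }

    /** Selector text with a literal cutoff computed from "now"; stale as soon as time moves on. */
    static String selectorFor(long nowMillis) {
        return "JMSTimestamp <= " + (nowMillis - MIN_AGE.toMillis());
    }
}
```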
You can use Quartz scheduler APIs to create schedulers across domains.
Regards,
Sajeev
I don't know whether this works for you, but it's working well for me. Maybe you can use it for what you need.
Go to the Transport Details of your proxy service and, under the Advanced Options tab, set the following fields:
Polling Frequency (your frequency, e.g. 300 sec (5 min))
Physical Directory (here you may need to give your queue path)

MSMQ Adding a delay on Messages

I have a Microsoft Message Queue that gets populated with messages. If there is a problem processing a message, I would like to retry it, but not immediately.
Is there a way to add a delay to a message in MSMQ so that it isn't available for a certain amount of time?
The other alternative is to have another queue (a retry queue) and read that queue every 15 minutes, but I would rather not do this.
What you are looking for is "Poison Message Handling" (even if it's not the message's fault but a temporary environment problem).
There are lots of articles on that. Here are some:
Poison Message Handling in MSMQ 3.0
Poison Message Handling in MSMQ 4.0
Surviving poison messages in MSMQ
In short: you have to move them to a retry queue.
I've seen some code recently that handles this in the exception logic: the code has a built-in retry step that tries again after a delay. It fails, waits for a specific amount of time, then tries again.
Essentially it recursively tries a set number of times, lengthening the delay each time. Fairly neat, and there's no need for another queue. It uses a lot of generics and delegates to execute the methods. I don't know whether something like this would work in your case, and I suspect you would still want to handle messages that can never be delivered with another queue.
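The code described above was presumably .NET (delegates), so this is only a Java analogue of the same idea under assumed names: a generic Supplier stands in for the delegate, and each failed call waits, doubles the delay and recurses until the attempts run out. When the last attempt fails, the exception propagates so the caller can move the message to a retry or poison queue.

```java
import java.util.function.Supplier;

/** Recursive retry with a lengthening (doubling) delay; names and factor are assumptions. */
final class RecursiveRetry {
    static <T> T retry(Supplier<T> action, int attemptsLeft, long delayMs) {
        try {
            return action.get();
        } catch (RuntimeException e) {
            if (attemptsLeft <= 1) {
                throw e; // out of attempts: let the caller move the message elsewhere
            }
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // stop retrying if interrupted
                throw e;
            }
            return retry(action, attemptsLeft - 1, delayMs * 2); // lengthen the delay
        }
    }
}
```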