JMS messages moving to DLQ - JBoss

JMS messages are sometimes moved to the DLQ without any exception being thrown.
The JBoss server instance is 4.3.0.GA_CP04_EAP.
We are using an MDB that listens for incoming messages on queue A. When it receives a message, it updates the database and sends an email in one transaction. The transaction is container-managed (CMT).
What happens is that sometimes messages are not picked up by the consumer and end up in the DLQ, even though the message count in the JMX console shows that the message did arrive on queue A before moving to the DLQ.
This happens intermittently, and no exceptions appear in the logs either.
What seems to work most of the time is restarting the servers, though I have no idea what happens behind the scenes.
And after 29 days, the same problem has returned.
It follows a pattern, but the pattern varies with every restart.
There are 2 clustered servers, P1 and P2, which also do load balancing:
The first two email messages go to P1 and are processed there: email sent
The next email message request goes to P2: email sent
The next two email messages go to P1 and are processed there: email sent
The next email message request goes to P2: email NOT SENT
and the cycle repeats.
I have found a workaround for this nagging problem thanks to the helpful info at http://leakfromjavaheap.blogspot.in/2013/05/when-dead-letter-queue-becomes-zombie.html
A DLQ listener is set up to listen for incoming messages on the DLQ and put any message it finds back on its intended destination.
Also, to cover the case where a message travels from the DLQ to the queue and back to the DLQ in an endless loop, a counter records how many times the message has been to the DLQ; if it exceeds the limit, the message is put on a permanent DLQ (a DLQ for the DLQ).
The application has been running smoothly ever since.
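The routing decision in that workaround can be sketched in plain Java as follows. This is a minimal model under stated assumptions, not JMS code: DeadLetter, DlqRedriver, the "dlqCount" property name, the "queue/PermanentDLQ" destination and the limit of 3 are all hypothetical. In a real DLQ listener the counter would live in a JMS message property, and since properties of a received message are read-only, the listener would typically copy body and counter into a new message before sending it on.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a message found on the DLQ (not a JMS type).
final class DeadLetter {
    final String body;
    final String originalDestination;               // where the message was first sent
    final Map<String, Integer> properties = new HashMap<>();

    DeadLetter(String body, String originalDestination) {
        this.body = body;
        this.originalDestination = originalDestination;
    }
}

final class DlqRedriver {
    static final String DLQ_COUNT = "dlqCount";               // hypothetical property name
    static final String PERMANENT_DLQ = "queue/PermanentDLQ"; // hypothetical "DLQ for the DLQ"
    static final int MAX_DLQ_VISITS = 3;                       // assumed limit

    // Increments the DLQ visit counter and returns the destination the message should go to next:
    // its original queue while under the limit, the permanent DLQ once the limit is exceeded.
    static String route(DeadLetter msg) {
        int visits = msg.properties.getOrDefault(DLQ_COUNT, 0) + 1;
        msg.properties.put(DLQ_COUNT, visits);
        return visits > MAX_DLQ_VISITS ? PERMANENT_DLQ : msg.originalDestination;
    }
}
```

With a limit of 3, a message that keeps bouncing is sent back to queue A three times and parked on the permanent DLQ on its fourth arrival, which breaks the endless DLQ-to-queue loop.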

If you can provide the log details from when a message goes to the DLQ, it would be easier to dig into this issue.

The logs did not contain any useful info, not even an exception to give a hint.
Finally, I changed the local-tx data source to an XA data source, and that worked. I am still wondering whether there is a reason behind it.

Related

What does "poison message" mean in MSMQ?

I'm pretty new to this queue service and I don't really know what a poisoned message means.
I read that it is a message you can't consume, but does that mean you can Peek() it and see its details but not Receive() it?
From my point of view, a poisoned message is a message at the top of the queue that cannot be consumed because of its format (or corrupted format): the business logic in charge of handling it can't process it and may throw an exception, which in a transactional scenario is caught and handled with a rollback, so the message stays at the top forever.
What do you think? Am I totally wrong?
I've had to deal with poison MSMQ messages before, ugh! I'd say your definition is close.
A poison message is basically a message that is repeatedly read from a queue when the service reading it cannot process it because of an exception or some other issue and aborts the transaction under which the message was read. In such cases, the message remains in the queue and is retried on the next read from the queue. This can theoretically go on forever if there is a problem with the message.
For example, the message might contain data that violates a database constraint. I sometimes created an error queue and had the service processing the messages move the "poison" message into it if an exception occurred during processing. This at least removed the message from the queue and gave me an opportunity to view it later without affecting the main production queues.
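The loop described above, and the error-queue escape from it, can be modelled in a few lines of Java. This is an in-memory sketch, not the MSMQ API: a Deque stands in for the queue, "rollback" simply means leaving the message at the head, and PoisonAwareReader and maxAttempts are hypothetical names. With maxAttempts = 1 it behaves like the answer's approach of moving the message to the error queue on the first exception; larger values tolerate a few transient failures first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical transactional reader that parks poison messages in an error queue.
final class PoisonAwareReader {
    private final Deque<String> queue;
    final List<String> errorQueue = new ArrayList<>();
    private final int maxAttempts;
    private int attemptsOnHead = 0;   // consecutive failures for the message currently at the head

    PoisonAwareReader(Deque<String> queue, int maxAttempts) {
        this.queue = queue;
        this.maxAttempts = maxAttempts;
    }

    // Reads the head message inside a simulated transaction.
    void readOnce(Consumer<String> handler) {
        String msg = queue.peekFirst();
        if (msg == null) return;                   // nothing to read
        try {
            handler.accept(msg);
            queue.pollFirst();                     // commit: the message leaves the queue
            attemptsOnHead = 0;
        } catch (RuntimeException e) {
            attemptsOnHead++;                      // rollback: the message stays at the head
            if (attemptsOnHead >= maxAttempts) {
                errorQueue.add(queue.pollFirst()); // park it so later messages can flow
                attemptsOnHead = 0;
            }
        }
    }
}
```

Without the error queue, the "bad" message below would be read and rolled back forever and "ok-2" would never be processed.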
Here is some advice and information on poison message handling.

Retry logic in Kafka consumer

I have a use case where I consume certain logs from a queue and call some third-party APIs with information from each log. If the third-party system is not responding properly, I want to implement retry logic for that particular log.
I could add a time field and re-push the message to the same queue; the message would be consumed again and processed if its time field is valid, i.e. less than the current time, and otherwise pushed back into the queue.
But this logic adds the same log again and again until the retry time is reached, and the queue grows unnecessarily.
Is there a better way to implement retry logic in Kafka?
You can create several retry topics and push failed tasks there. For instance, you can create 3 topics with different delays in minutes and rotate a failed task through them until the maximum attempt limit is reached.
‘retry_5m_topic’ — for retry in 5 minutes
‘retry_30m_topic’ — for retry in 30 minutes
‘retry_1h_topic’ — for retry in 1 hour
See more for details: https://blog.pragmatists.com/retrying-consumer-architecture-in-the-apache-kafka-939ac4cb851a
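One reading of the retry-topic scheme, sketched in Java without any Kafka client code: the attempt count (which would travel with the record, for example in a header) selects the next topic, and a consumer of a delayed topic waits until the record's timestamp plus the topic's delay has passed. The three topic names come from the answer; "dead_letter_topic", RetryRouter and both method names are hypothetical.

```java
import java.util.List;

final class RetryRouter {
    // Topic names from the answer, in escalation order.
    static final List<String> RETRY_TOPICS =
            List.of("retry_5m_topic", "retry_30m_topic", "retry_1h_topic");
    // Hypothetical final destination once every retry topic has been used.
    static final String DEAD_LETTER_TOPIC = "dead_letter_topic";

    // attempt = number of failed attempts so far (1 after the first failure).
    static String nextTopic(int attempt) {
        if (attempt < 1) throw new IllegalArgumentException("attempt must be >= 1");
        return attempt <= RETRY_TOPICS.size() ? RETRY_TOPICS.get(attempt - 1) : DEAD_LETTER_TOPIC;
    }

    // How much longer a retry-topic consumer should wait before processing a record.
    static long remainingDelayMillis(long recordTimestampMillis, long topicDelayMillis, long nowMillis) {
        return Math.max(0, recordTimestampMillis + topicDelayMillis - nowMillis);
    }
}
```

Because each failed record moves forward to a different topic instead of being re-pushed to the main queue, the main topic does not grow with retries, and the number of topics stays fixed no matter how often a message fails.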
In the consumer, if processing throws an exception, produce another message with attempt number 1, so the next time it is consumed it carries attempt number 1. In the producer, stop re-producing the message once its attempt number exceeds your retry count.
Yes, this is a straightforward solution that I also thought of. But with it, we would end up creating many topics, since message processing may fail again.
I solved this problem by mapping this use case to RabbitMQ. In RabbitMQ there is the concept of a retry exchange: if processing of a message from an exchange fails, you can send it to a retry exchange with a TTL. Once the TTL expires, the message moves back to the main exchange and is ready to be processed again.
I can post some examples explaining how to implement exponential backoff message processing using RabbitMQ.
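The exponential-backoff part of the RabbitMQ approach comes down to choosing a TTL per attempt. A common way to wire the retry exchange is a retry queue with no consumers whose x-dead-letter-exchange argument points back at the main exchange, with the TTL set in the message's expiration property (a string of milliseconds); that wiring is an assumption here, not necessarily what the answerer used. The TTL calculation itself is plain Java; Backoff and its parameters are hypothetical names.

```java
final class Backoff {
    // Exponential backoff: baseMillis * 2^(attempt - 1), capped at maxMillis.
    // Doubling stops as soon as the cap is reached, so the value cannot overflow
    // for any realistic cap.
    static long ttlMillis(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 1) throw new IllegalArgumentException("attempt must be >= 1");
        long ttl = baseMillis;
        for (int i = 1; i < attempt && ttl < maxMillis; i++) {
            ttl *= 2;
        }
        return Math.min(ttl, maxMillis);
    }
}
```

With a 1 s base and a 60 s cap, attempts 1, 2, 3 wait 1 s, 2 s and 4 s, and from attempt 7 on every retry waits the capped 60 s; the value would be published as, for example, String.valueOf(Backoff.ttlMillis(attempt, 1_000, 60_000)).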

How can a Kafka consumer doing infinite retries recover from a bad incoming message?

I am a Kafka newbie, and while reading the docs I came up with this design question about the Kafka consumer.
A Kafka consumer reads messages from a Kafka stream, which is made up of one or more partitions on one or more servers.
Let's say one of the incoming messages is corrupt, and as a result the consumer fails to process it. But when processing event logs you don't want to drop any events, so you retry indefinitely to ride out transient errors during processing. With infinite retries, how can the consumer move forward? Is there a way to blacklist this message for the next retry?
I think it needs manual intervention: we log some message metadata (I don't know exactly what yet) to identify which message is failing, and have logic in place where each consumer checks Redis (or somewhere else?) after n retries to see whether this message needs to be skipped. The blacklist doesn't have to be stored in Redis forever either, only until the consumer can skip it. Here's pseudocode for what I just described:
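boolean errorState = true;
while (errorState) {
    if (blacklist.contains(msg)) {
        // skip: commit the offset so the consumer moves past this message
        commitOffset();
        errorState = false; // leave the retry loop
    } else {
        errorState = processMessage(msg); // returns true if processing failed
        if (!errorState) {
            commitOffset();
        } else {
            // log this msg so that it can be added to the blacklist
            logger.info(msg);
        }
    }
}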
I'd like to hear from more experienced folks to see if there are better ways to do this.
We had a requirement in our project where processing an incoming message that updates a record depended on the record being present. Due to a race condition, the update sometimes arrived before the insert. For such cases, we implemented a couple of approaches.
A. Manual retry with a predefined delay. The code checks whether the insert has arrived. If so, processing goes on as normal. Otherwise, it sleeps for 500 ms, then tries again, up to 10 times. At the end, if the message is still not processed, the code logs the message, commits the offset and moves forward. Messages are always processed in a thread from a pool, so this doesn't block the main thread either. However, in the worst case each message takes 5 seconds of application time.
B. Recently, we refined the above solution to use a message scheduler based on Kafka. Now, if the insert has not arrived before the update, the system sends the message to a separate scheduler that operates on Kafka. This scheduler replays the message after some time. After 3 retries, we log the message and stop scheduling or retrying. This has the benefit of not blocking the application threads and lets us control when the message is replayed.
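The control flow of approach B can be sketched in-process. The answer's scheduler runs on Kafka; here java.util.concurrent.DelayQueue stands in for it purely to show the logic: a failed message is parked with a delay instead of blocking a thread, replayed when due, and logged and dropped after 3 replays. ScheduledMessage, ReplayScheduler and the delay parameter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// A message parked in the scheduler until its replay time.
final class ScheduledMessage implements Delayed {
    final String payload;
    final int replays;                // how many times this message has been replayed so far
    private final long dueAtNanos;

    ScheduledMessage(String payload, int replays, long delayMillis) {
        this.payload = payload;
        this.replays = replays;
        this.dueAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
    }

    @Override public long getDelay(TimeUnit unit) {
        return unit.convert(dueAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}

final class ReplayScheduler {
    static final int MAX_REPLAYS = 3;                  // "after 3 retries" in the answer
    private final DelayQueue<ScheduledMessage> pending = new DelayQueue<>();
    private final long replayDelayMillis;
    final List<String> abandoned = new ArrayList<>();  // logged and no longer retried

    ReplayScheduler(long replayDelayMillis) { this.replayDelayMillis = replayDelayMillis; }

    // Called by the consumer; the handler returns true when the message was processed.
    void onMessage(String payload, Predicate<String> handler) {
        attempt(payload, 0, handler);
    }

    // Replays due messages until nothing is pending. A real scheduler would run this
    // on its own thread (or service) rather than on the consumer's thread.
    void runUntilIdle(Predicate<String> handler) {
        while (!pending.isEmpty()) {
            try {
                ScheduledMessage due = pending.take();   // blocks until the delay has elapsed
                attempt(due.payload, due.replays, handler);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    private void attempt(String payload, int replays, Predicate<String> handler) {
        if (handler.test(payload)) return;
        if (replays < MAX_REPLAYS) {
            pending.put(new ScheduledMessage(payload, replays + 1, replayDelayMillis));
        } else {
            System.err.println("Giving up on " + payload + " after " + MAX_REPLAYS + " replays");
            abandoned.add(payload);
        }
    }
}
```

A message whose insert never arrives is therefore tried 4 times in total (the original attempt plus 3 replays) before being logged, and no application thread sleeps while it waits.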

Why are resent messages discarded in QuickFIX?

I have a QuickFIX/J application running as an acceptor. ResetOnLogon is N in the configuration.
When the initiator logs on, the sequence numbers differ, so the initiator app resends the messages, and I see them in the FIX log file. The first of those messages is passed to the application layer, but the others are not; they are all discarded.
What can be the reason that the messages are received but not passed to the application level?
The most likely reason is that the messages contain PossDupFlag <43> with a value of 'Y' and a MsgSeqNum <34> that the engine does in fact recognize as a duplicate. In that case you won't receive them as application-level messages.
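A simplified sketch of the session-level sequence check the answer refers to, following the general FIX session rules rather than QuickFIX/J's actual source (SeqAction and FixSeqCheck are hypothetical names). A message whose MsgSeqNum <34> is lower than the expected number is dropped at session level when PossDupFlag <43> is 'Y', so it never reaches the application callback. A real engine also queues messages that arrive ahead of a gap and checks OrigSendingTime; both are omitted here.

```java
// What the session layer does with an incoming message (hypothetical enum).
enum SeqAction { DELIVER, RESEND_REQUEST, IGNORE_DUPLICATE, LOGOUT }

final class FixSeqCheck {
    static SeqAction classify(int msgSeqNum, int expectedSeqNum, boolean possDupFlag) {
        if (msgSeqNum == expectedSeqNum) return SeqAction.DELIVER;        // in sequence
        if (msgSeqNum > expectedSeqNum) return SeqAction.RESEND_REQUEST;  // gap: ask for the missing range
        // msgSeqNum < expected: this sequence number was already received
        return possDupFlag
                ? SeqAction.IGNORE_DUPLICATE  // PossDupFlag=Y: silently dropped as a duplicate
                : SeqAction.LOGOUT;           // too low without PossDupFlag: serious error
    }
}
```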

MSMQ: adding a delay to messages

I have a Microsoft Message Queue that gets populated with messages. If there is a problem processing a message, I would like to retry it, but not immediately.
Is there a way to add a delay to a message in MSMQ so that it isn't available for a certain amount of time?
The alternative is to have another queue (a retry queue) and read it every 15 minutes, but I would rather not do this.
What you are looking for is "poison message handling" (even if it's not the message's fault but a temporary environment problem).
There are lots of articles on that. Here are some:
Poison Message Handling in MSMQ 3.0
Poison Message Handling in MSMQ 4.0
Surviving poison messages in MSMQ
In short: you have to move them to a retry queue.
I've seen some code recently that handles this in the exception logic: the code has a built-in retry step that tries again after a delay. It fails, waits for a specific amount of time, then tries again.
Essentially it recursively tries a set number of times, lengthening the delay each time. Fairly neat, and no need for another queue. It uses a lot of generics and delegates to execute the methods. I don't know whether something like this could be done in your case. I suspect you would still want to handle messages that can't be processed by using another queue, though.
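A Java sketch of that in-process retry, with Callable playing the role of the delegate and a generic return type. It is a hypothetical helper (RetryExecutor, retry and onGiveUp are illustrative names), written as a loop rather than recursion to keep the stack flat; the delay doubles after each failure, and when all attempts fail the onGiveUp callback is where a message could be handed to a separate retry or error queue, as the last sentence suggests.

```java
import java.util.concurrent.Callable;
import java.util.function.Consumer;

final class RetryExecutor {
    // Runs action up to maxAttempts times, doubling the wait after each failure.
    // Returns the action's result on success; if every attempt fails (or the thread is
    // interrupted while waiting), passes the last exception to onGiveUp and returns null.
    static <T> T retry(Callable<T> action, int maxAttempts, long initialDelayMillis,
                       Consumer<Exception> onGiveUp) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) break;   // no point waiting after the final attempt
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
                delay *= 2;                          // lengthen the delay each time
            }
        }
        onGiveUp.accept(last);
        return null;
    }
}
```

This keeps transient failures away from any extra queue, while messages that keep failing still end up somewhere they can be inspected.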