Transactional Dead-letter queue is filling up


Our Transactional Dead-letter queue is filling up in MSMQ. I can't find documentation on particular.net that explains why this could be happening.
It looks like every single message that is processed (successfully) ends up in that queue.
What is the reason items are sent to the Transactional Dead Letter queue?

If you open the dead-letter queue in Computer Management, each message shows the reason it ended up there, in the "Class" column. That should point you where to look. For example, one reason could be "The time-to-be-received has elapsed.", which means the message wasn't received within the time specified in its "TimeToBeReceived" property.

Related

MSMQ poison message means what?

I'm pretty new to this queue service and I don't know what a poisoned message really means.
I read that it is a message you can't consume, but does that mean you can Peek() and see the details but not Receive(), or what?
From my point of view, I would say a poisoned message is a message at the top of the queue that cannot be consumed because of its format (or a corrupted format): the business logic in charge of handling it can't do so and may throw an exception, which in a transactional scenario is caught and handled with a rollback, so the message stays at the top forever.
What do you think? Am I totally wrong?

I've had to deal with poison MSMQ messages before, ugh! I'd say your definition is close.
A poison message is basically a message that is repeatedly read from a queue because the service reading it cannot process it, due to an exception or some other issue, and terminates the transaction under which the message is read. In such cases, the message remains in the queue and is retried on the next read from the queue. This can theoretically go on forever if there is a problem with the message.
For example, the message might contain data that would violate a database constraint. I would sometimes create an error queue and have the service processing the messages throw the "poison" message into that if an exception occurred during processing. This would at least remove the message from the queue and give me an opportunity to view it later without affecting the main production queues.
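The error-queue idea above can be sketched as follows. This is only an in-memory illustration in Java, not MSMQ code: the class name PoisonMessageDemo, the two queues and the handler are all hypothetical stand-ins for a real queue reader.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/** In-memory stand-in for a queue reader that diverts poison messages. */
class PoisonMessageDemo {
    // Hypothetical queues; a real service would read from MSMQ instead.
    final Deque<String> mainQueue = new ArrayDeque<>();
    final Deque<String> errorQueue = new ArrayDeque<>();

    /** Hands each message to the handler once; failures go to the error queue. */
    void drain(Consumer<String> handler) {
        String message;
        while ((message = mainQueue.poll()) != null) {
            try {
                handler.accept(message);
            } catch (RuntimeException e) {
                // Do not put the message back on the main queue:
                // park it so it can be inspected later.
                errorQueue.add(message);
            }
        }
    }

    public static void main(String[] args) {
        PoisonMessageDemo demo = new PoisonMessageDemo();
        demo.mainQueue.add("order-1");
        demo.mainQueue.add("bad-order");
        demo.mainQueue.add("order-2");
        demo.drain(m -> {
            if (m.startsWith("bad")) {
                throw new IllegalStateException("would violate a constraint");
            }
        });
        System.out.println("error queue: " + demo.errorQueue);
    }
}
```

The main queue keeps flowing because the failing message is never returned to it; whether to retry a few times before parking the message is a separate decision.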
Here is some advice and information on poison message handling.

Using many consumers in SQS Queue

I know that it is possible to consume an SQS queue using multiple threads. I would like to guarantee that each message will be consumed once. I know that it is possible to change the visibility timeout of a message, e.g., to equal my processing time. If my process spends more time than the visibility timeout (e.g. because of a slow connection), another thread can consume the same message.
What is the best approach to guarantee that a message will be processed once?

What is the best approach to guarantee that a message will be processed once?
You're asking for a guarantee - you won't get one. You can reduce probability of a message being processed more than once to a very small amount, but you won't get a guarantee.
I'll explain why, along with strategies for reducing duplication.
Where does duplication come from
When you put a message in SQS, SQS might actually receive that message more than once
For example: a minor network hiccup while sending the message caused a transient error that was automatically retried - from the message sender's perspective, it failed once, and successfully sent once, but SQS received both messages.
SQS can internally generate duplicates
Similar to the first example - there are a lot of computers handling messages under the covers, and SQS needs to make sure nothing gets lost - messages are stored on multiple servers, and this can result in duplication.
For the most part, by taking advantage of SQS message visibility timeout, the chances of duplication from these sources are already pretty small - like fraction of a percent small.
If processing duplicates really isn't that bad (strive to make your message consumption idempotent!), I'd consider this good enough - reducing chances of duplication further is complicated and potentially expensive...
What can your application do to reduce duplication further?
Ok, here we go down the rabbit hole... at a high level, you will want to assign unique ids to your messages, and check against an atomic cache of ids that are in progress or completed before starting processing:
Make sure your messages have unique identifiers provided at insertion time
Without this, you'll have no way of telling duplicates apart.
Handle duplication at the 'end of the line' for messages.
If your message receiver needs to send messages off-box for further processing, then it can be another source of duplication (for similar reasons to above)
You'll need somewhere to atomically store and check these unique ids (and flush them after some timeout). There are two important states: "InProgress" and "Completed"
InProgress entries should have a timeout based on how fast you need to recover in case of processing failure.
Completed entries should have a timeout based on how long you want your deduplication window to be.
The simplest is probably a Guava cache, but would only be good for a single processing app. If you have a lot of messages or distributed consumption, consider a database for this job (with a background process to sweep for expired entries)
Before processing the message, attempt to store the messageId in "InProgress". If it's already there, stop - you just handled a duplicate.
Check if the message is "Completed" (and stop if it's there)
Your thread now has an exclusive lock on that messageId - Process your message
Mark the messageId as "Completed" - As long as this messageId stays here, you won't process any duplicates for that messageId.
You likely can't afford infinite storage though.
Remove the messageId from "InProgress" (or just let it expire from here)
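The steps above can be sketched for a single process as follows. This is a minimal illustration under stated assumptions, not the answer's actual code: the class name MessageDeduplicator, the method processOnce and the two timeout parameters are hypothetical, plain ConcurrentHashMaps stand in for the cache, and expired entries are swept on every call rather than by a background process.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Single-process sketch of the InProgress/Completed check. */
class MessageDeduplicator {
    // messageId -> expiry timestamp in milliseconds
    private final ConcurrentMap<String, Long> inProgress = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, Long> completed = new ConcurrentHashMap<>();
    private final long inProgressTtlMillis;
    private final long completedTtlMillis;

    MessageDeduplicator(long inProgressTtlMillis, long completedTtlMillis) {
        this.inProgressTtlMillis = inProgressTtlMillis;
        this.completedTtlMillis = completedTtlMillis;
    }

    /** Returns true if the handler ran, false if the id was treated as a duplicate. */
    boolean processOnce(String messageId, Runnable handler) {
        long now = System.currentTimeMillis();
        // Sweep expired entries (a real system would do this in the background).
        inProgress.entrySet().removeIf(e -> e.getValue() <= now);
        completed.entrySet().removeIf(e -> e.getValue() <= now);

        // Atomically claim the id; a non-null result means someone else holds it.
        if (inProgress.putIfAbsent(messageId, now + inProgressTtlMillis) != null) {
            return false;
        }
        try {
            if (completed.containsKey(messageId)) {
                return false; // already handled within the deduplication window
            }
            handler.run();
            completed.put(messageId, now + completedTtlMillis);
            return true;
        } finally {
            // Release the claim whether processing succeeded or failed.
            inProgress.remove(messageId);
        }
    }

    public static void main(String[] args) {
        MessageDeduplicator dedup = new MessageDeduplicator(60_000, 600_000);
        System.out.println(dedup.processOnce("m1", () -> System.out.println("processing m1")));
        System.out.println(dedup.processOnce("m1", () -> System.out.println("processing m1 again")));
    }
}
```

If the handler throws, the id is not marked "Completed", so a later delivery of the same message can be processed again. For distributed consumers the two maps would have to live in a shared store with an equivalent atomic insert.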
Some notes
Keep in mind that chances of duplicate without all of that is already pretty low. Depending on how much time and money deduplication of messages is worth to you, feel free to skip or modify any of the steps
For example, you could leave out "InProgress", but that opens up the small chance of two threads working on a duplicated message at the same time (the second one starting before the first has "Completed" it)
Your deduplication window is as long as you can keep messageIds in "Completed". Since you likely can't afford infinite storage, make this last at least 2x your SQS message visibility timeout; the chances of duplication are reduced after that (on top of the already very low chances, but still not guaranteed).
Even with all this, there is still a chance of duplication - all the precautions and SQS message visibility timeouts help reduce this chance to very small, but the chance is still there:
Your app can crash/hang/do a very long GC right after processing the message, but before the messageId is "Completed" (maybe you're using a database for this storage and the connection to it is down)
In this case, the "InProgress" entry will eventually expire, and another thread could process this message (either after the SQS visibility timeout also expires or because SQS had a duplicate in it).

When you receive the message, store it, or a reference to it, in a database table with a unique constraint on the message ID. If the ID already exists in the table, you've already received the message, and the unique constraint will stop the database from inserting it again.

The AWS SQS API doesn't automatically "consume" a message when you read it through the API; the developer has to make the call to delete the message themselves.
SQS does have a feature called "redrive policy" as part of the "Dead Letter Queue Settings". You just set the maximum receive count to 1. If the consuming process crashes, a subsequent read of the same message will move it into the dead-letter queue.
The SQS queue visibility timeout can be set to at most 12 hours. If you have a special need beyond that, you will need to implement a process that stores the message handle in a database so it can be inspected.

You can use setVisibilityTimeout() for both single messages and batches, in order to extend the visibility time until the thread has finished processing the message.
This can be done with a ScheduledExecutorService that schedules a runnable after half the initial visibility time. The code snippet below creates a VisibilityTimeExtender and runs it for the first time after half of visibilityTime, then repeatedly with a period of half the visibility time. (Each run should extend the visibility by at least visibilityTime/2, so that the message stays hidden for as long as it is being processed.)
private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
ScheduledFuture<?> futureEvent = scheduler.scheduleAtFixedRate(new VisibilityTimeExtender(..), visibilityTime/2, visibilityTime/2, TimeUnit.SECONDS);
VisibilityTimeExtender must implement Runnable, and is where you update the new visibility time.
When the thread is done processing the message, you can delete it from the queue, and call futureEvent.cancel(true) to stop the scheduled event.
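The answer does not show VisibilityTimeExtender itself. A minimal sketch of what it could look like follows. It makes no AWS SDK calls: the constructor parameters and the changeVisibility callback are assumptions made for illustration, and in real code the callback would issue the SQS ChangeMessageVisibility request for the given receipt handle.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Periodically re-extends the visibility timeout of one in-flight message. */
class VisibilityTimeExtender implements Runnable {
    private final String receiptHandle;
    private final int visibilityTimeSeconds;
    // Hypothetical hook: (receiptHandle, newTimeoutSeconds) -> call to SQS.
    private final BiConsumer<String, Integer> changeVisibility;

    VisibilityTimeExtender(String receiptHandle, int visibilityTimeSeconds,
                           BiConsumer<String, Integer> changeVisibility) {
        this.receiptHandle = receiptHandle;
        this.visibilityTimeSeconds = visibilityTimeSeconds;
        this.changeVisibility = changeVisibility;
    }

    @Override
    public void run() {
        // Reset the timeout to the full visibility time, counted from now.
        changeVisibility.accept(receiptHandle, visibilityTimeSeconds);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        int visibilityTime = 2; // seconds; kept tiny for the demo
        ScheduledFuture<?> futureEvent = scheduler.scheduleAtFixedRate(
                new VisibilityTimeExtender("handle-123", visibilityTime,
                        (handle, seconds) -> System.out.println(
                                "extend " + handle + " by " + seconds + "s")),
                visibilityTime / 2, visibilityTime / 2, TimeUnit.SECONDS);

        Thread.sleep(2500);       // simulated message processing
        futureEvent.cancel(true); // processing done: stop extending
        scheduler.shutdown();
    }
}
```

Passing the SQS call in as a callback keeps the sketch runnable without credentials and makes the extender easy to test. Note that visibilityTime / 2 is integer division, so a visibility time of 1 second would produce a period of 0, which scheduleAtFixedRate rejects.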

JMS messages moving to DLQ

JMS messages are sometimes moved to the DLQ without any exception being thrown.
The JBoss server instance used is 4.3.0.GA_CP04_EAP.
We are using an MDB that listens for incoming messages on a queue A. When it receives a message, it updates the database and sends an email in one transaction. The transaction is CMT.
Now, what is happening is that sometimes messages are not picked up by the consumer and end up in the DLQ. From the message count in the JMX console I could see that the message did arrive on queue A once, but it then went to the DLQ.
This happens intermittently and does not produce any exceptions in the logs either.
What seems to work most of the time is restarting the servers. I have no idea what happens behind the scenes, though.
And after 29 days, the same problem has returned.
It follows a pattern, but the pattern varies with every restart.
There are 2 clustered servers, P1 and P2, which also do load balancing.
The first two email messages go to and are processed by P1: email sent
The next email message request goes to P2: email sent
The next two email messages go to and are processed by P1: email sent
The next email message request goes to P2: email NOT SENT
and the cycle repeats.
I have found a workaround to this nagging problem thanks to the helpful info found at http://leakfromjavaheap.blogspot.in/2013/05/when-dead-letter-queue-becomes-zombie.html
A DLQ listener is set up to listen for incoming messages, and it puts any message found on the DLQ back on its intended destination.
Also, to handle the situation where a message travels from the DLQ to the queue and back to the DLQ in an endless loop, a counter tracks how many times the message has been to the DLQ before; if it exceeds the limit, the message is put on a permanent DLQ (a DLQ for the DLQ).
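The workaround described above can be sketched roughly as follows. This is an in-memory illustration only, with no JMS API calls: the class name DlqResubmitter, the limit of 3 and the per-id counter map are assumptions. In a real JMS setup the counter would more likely travel with the message, for example as a message property.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Sketch of a DLQ listener that resubmits messages a limited number of times. */
class DlqResubmitter {
    static final int MAX_DLQ_VISITS = 3; // assumed limit, tune as needed

    // Hypothetical stand-ins for the original destination and the permanent DLQ.
    final Queue<String> destination = new ArrayDeque<>();
    final Queue<String> permanentDlq = new ArrayDeque<>();
    private final Map<String, Integer> dlqVisits = new HashMap<>();

    /** Called for each message id that shows up on the DLQ. */
    void onDeadLetter(String messageId) {
        int visits = dlqVisits.merge(messageId, 1, Integer::sum);
        if (visits > MAX_DLQ_VISITS) {
            // Looping between queue and DLQ: give up and park it for good.
            permanentDlq.add(messageId);
            dlqVisits.remove(messageId);
        } else {
            // Put the message back on its intended destination.
            destination.add(messageId);
        }
    }

    public static void main(String[] args) {
        DlqResubmitter resubmitter = new DlqResubmitter();
        for (int i = 0; i < 4; i++) {
            resubmitter.onDeadLetter("email-42");
        }
        System.out.println("resubmitted: " + resubmitter.destination.size());
        System.out.println("permanent DLQ: " + resubmitter.permanentDlq);
    }
}
```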
Application has been running smoothly ever since.

If you can provide the log details from the moment a message goes to the DLQ, it would be easier to dig into this issue.

The logs did not contain any useful info; not even an exception to give a hint.
Finally, I changed the local-tx data source to an XA data source and that fixed it. I'm still wondering what the reason behind it is.

MaxConcurrentListeners and Remote Transactional Reads from MSMQ

Could it be that MaxConcurrentListeners on a DistributedTxMessageListenerContainer isn't of much use? I have the impression that only one thread at a time can handle a message from the queue. Maybe that's logical, since the message will only be removed from the queue once the transaction is successful. Or am I wrong here?

Yes, only one thread can receive a particular message from a queue.
Multiple threads can be receiving messages from a queue at any one time, though.
When a message is transactionally received from a queue, it becomes invisible to all other threads until the transaction aborts or commits.
If it aborts then the message reappears in the queue (made visible again); if it commits then the message is physically deleted from the queue.
Cheers
John Breakwell

MSMQ Adding a delay on Messages

I have a Microsoft Message Queue that gets populated with messages. If there is a problem with the processing of a message, I would like to retry it, but I do not want to retry it immediately.
Is there a way to add a delay to a message in MSMQ so that it is unavailable for a certain amount of time?
The alternative is to have another queue (a retry queue) and read that queue every 15 minutes, but I would rather not do this.

What you are looking for is "Poison Message Handling" (even if it's not the message's fault, but a temporary environment problem).
There are lots of articles on that. Here are some:
Poison Message Handling in MSMQ 3.0
Poison Message Handling in MSMQ 4.0
Surviving poison messages in MSMQ
In short: you have to move them to a retry queue.

I've seen some code recently that handles this in the exception logic: the code has a built-in retry step that tries again after a delay. It fails, waits for a specific amount of time, then tries again.
Essentially it recursively retries a set number of times (lengthening the delay each time). Fairly neat, and no need for another queue. A lot of generics and delegates are used to execute the methods. I don't know whether something like this could be done in your case. I suspect you would still want another queue for the case where the message cannot be delivered at all, though.
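A rough sketch of that retry-with-growing-delay idea is shown below. It is not the code referred to above: it is written in Java rather than with .NET delegates, it uses a loop instead of recursion, and the class name DelayedRetry, the retry method and the choice to double the delay are assumptions.

```java
import java.util.function.Supplier;

/** Retries an action a fixed number of times, doubling the wait each time. */
class DelayedRetry {

    static <T> T retry(Supplier<T> action, int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    // Out of attempts: let the caller decide, e.g. move the
                    // message to an error queue.
                    throw e;
                }
                sleep(delay);
                delay *= 2; // lengthen the delay before the next attempt
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("temporary failure");
            }
            return "processed";
        }, 5, 100);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Keep in mind that the thread is blocked while it waits, so long delays in a queue reader hold up every message behind the failing one. That is one reason a separate retry queue is still worth considering.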
