NServiceBus MSMQ messages intermittently get stuck on the Outgoing Queue - msmq

We have a Pub / Sub system based on NServiceBus, where we have intermittent issues with messages getting stuck on the Publishers outgoing queue indefinitely, rather than being transmitted to the Subscribers input queues.
Points to note:
When we restart the Publisher Service and Subscriber services, message flow resumes normally for a while.
The problem seems to occur more often if a sustained period of time between messages occurs.
The publisher service resides on the LAN, the subscribers on the otherside of a firewall.
Some messages get through! As mentioned after service restarts, things go fine for a while.
Using QueueExplorer, I can see the messages on the Outgoing queue have a state of WAITING.
Annoyingly our development environment does not exhibit this behaviour, but then again the publisher and subscribers all reside on the same LAN in this environment.

MSMQ messages being stuck in an outgoing queue is purely an MSMQ issue. Restarting the Publisher and Subscriber services should make no difference as they are not directly involved in message delivery. If you can fix the problem by ONLY restarting the Pub/Sub services and NOT the Message Queuing services then it looks like a resources/memory leak problem.
I imagine something like this happening:
Messages flow to destination, which uses up kernel memory in storing them
For some reason, kernel memory runs out (too many messages, memory leak, whetever)
Destination now rejects new messages as they cannot be loaded into memory from the wire
Connection is reset and not re-connected until WaitTime value reached; Queue is "waiting" at this point
System loops through (3) and (4) until ...
Pub/Sub services are restarted and now there is sufficient resources for messages to be delivered
Goto (2)
Occasional messages get through when just enough kernel memory is temporarily freed up by one of the many services and device drivers that use it.
Item 4 of this blog post is the most likely culprit:
http://blogs.msdn.com/b/johnbreakwell/archive/2006/09/18/insufficient-resources-run-away-run-away.aspx
Cheers
John Breakwell

We had a similar scenario in production, it turned out we migrated one of our subscriber endpoints to a new physical host and forgot to unsubscribe before shutting down the old endpoint. Our publisher was trying to deliver messages to both the old and new endpoints but could only reach the new one. Eventually the publishers outbound queue grew so large that it started affecting all outgoing messages.

I have run into this issue as well, I know it is not Item 4, as I don't send anything to it before it gets stuck in the outgoing queue. If I let both publisher and subscriber sit for about 10 minutes before sending a message, it never leaves the outgoing queue. If I send a message before that amount of time, it flows fine. Also, if I restart the subscriber the message will then flow. This is reproducible every time I let them sit idle for 10 minutes.
I think I found the answer here, at least this fixed the issue I was having:
http://support.microsoft.com/kb/2554746
Also, in my case it had nothing to do with restarting, so don't let that throw you off, I did exhibit the symptoms in the netstat and messages would initially go through when the client was first started up.

Just to throw my 2p in:
We had an issue where the message queuing service had some kind of memory leak and would consume large amounts of memory which is did not release.
This lead to messages getting stuck for long periods of time - although they would eventually be delivered (sometimes after 3 days).
We have not bothered fixing this yet as it only happens when the service is under heavy load which does not happen often.

Related

How to handle application failure after reading event from source in Spring Cloud Stream with rabbit MQ

I am using Spring Cloud Stream over RabbitMQ for my project. I have a processor that reads from a source, process the message and publish it to the sink.
Is my understanding correct that if my application picks up an event from the stream and fails (e.g. app sudden death):
unless I ack the message or
I save the message after reading it from the queue
then my event would be lost? What other option would I have to make sure not to lose the event in such case?
DIgging through the Rabbit-MQ documentation I found this very useful example page for the different types of queues and message deliveries for RabbitMQ, and most of them can be used with AMPQ.
In particular looking at the work queue example for java, I found exactly the answer that I was looking for:
Message acknowledgment
Doing a task can take a few seconds. You may wonder what happens if
one of the consumers starts a long task and dies with it only partly
done. With our current code, once RabbitMQ delivers a message to the
consumer it immediately marks it for deletion. In this case, if you
kill a worker we will lose the message it was just processing. We'll
also lose all the messages that were dispatched to this particular
worker but were not yet handled. But we don't want to lose any tasks.
If a worker dies, we'd like the task to be delivered to another
worker.
In order to make sure a message is never lost, RabbitMQ supports
message acknowledgments. An ack(nowledgement) is sent back by the
consumer to tell RabbitMQ that a particular message has been received,
processed and that RabbitMQ is free to delete it.
If a consumer dies (its channel is closed, connection is closed, or
TCP connection is lost) without sending an ack, RabbitMQ will
understand that a message wasn't processed fully and will re-queue it.
If there are other consumers online at the same time, it will then
quickly redeliver it to another consumer. That way you can be sure
that no message is lost, even if the workers occasionally die.
There aren't any message timeouts; RabbitMQ will redeliver the message
when the consumer dies. It's fine even if processing a message takes a
very, very long time.
Manual message acknowledgments are turned on by default. In previous
examples we explicitly turned them off via the autoAck=true flag. It's
time to set this flag to false and send a proper acknowledgment from
the worker, once we're done with a task.
Thinking about it, using the ACK seems to be the logic thing to do. The reason why I didn't think about it before, is because I thought of a ACK just under the perspective of the publisher and not of the broker. The piece of documentation above was very useful to me.

MSMQ - Send copies of messages received

Is there any way to configure an MSMQ queue to send copies of all messages it receives to another MSMQ queue? I have a memory leak on a production application that services a queue. I have a test version (that hopefully fixes the memory leak) on a test server, that services a test queue. I want to deluge the test version with the production stream of messages, to ensure that the memory leak has been fixed. After I am done testing, I would like to shut off this "message forwarding"
I had the same problem on my application, I was faced with 2 solutions, the easiest one I would recommend you to do is to make a very simple application that Peeks every message in a Queue via a transaction, and send a copy of the Message object to another queue, and you're done, just Abort() the transaction, that way you can be sure it'll be restored and wait for the production app to process the messages.
The other alternative would be to have the Message Queue apps just send the messages to yet another message queue, that way you don't have to mess around with peeks in Production and you'll have full access to your own queue in a test environment.
No. MSMQ is a protocol for delivering messages. You would need an application to read the delivered messages and send new copies to a different queue.

MSMQ: How do you send a msg from transactional dead letter queue to a private queue on remote machine

Windows Server 2012
MSMQ 6 Workgroup Mode
We've had issues trying to recover MSMQ messages that were sent to the transaction dead letter queue. We've tried moving them to the outbound queue, the message seems to send fine (even the Event Log says so) however it never gets to the destination queue.
After trial and error we've figured out how to get them to another queue on the same server but not to the destination queue on a remote server. We don't want to lose anymore messages. Does anyone have any suggestion on how we can deliver these messages?
Thank you,
David
As I understood your question, it's a one time problem with some number of messages you already have in MSMQ, and not general connectivity issue between machines? If so, you should be able to solve it with some MSMQ management tool. Disclaimer: I'm the author of one such tool - QueueExplorer. I don't know what other tools can do, but with QueueExplorer you can copy/paste or drag/drop messages to another machine opened in separate tab/window. In order to do that QueueExplorer has to perform MSMQ Send operation, so messages will have to pass through MSMQ between these two machines.
So if there's still that issue that prevented original delivery you'll still be stuck. In that case you can save all messages to a file, transfer it to another machine through file system and load it there to whichever queue they should go. This is obviusly just a manual workaround for one time situation. Btw. this could be done in QueueExplore's trial mode.
If however problem is with connectivity and messages always end up in dead letter queue, it's better to check them from Computer Management. It's one area where it's better than our tool - you can turn on "Class" column and see reason why messages couldn't be delivered. For instance if you see "The time-to-be-received has elapsed" you'll know what's the problem.

MSMQ console showing message count but no messages for private queue

I have a transactional private message queue (among other message queues on which I have not seen this problem) on a Windows Server 2008 R2 server.
This particular queue has a recurring problem happening every few weeks where the console shows a nonzero count of messages in the queue, but it does not have any messages in the queue itself or any subqueue. Queue Explorer shows the same thing. Performance counters indicate there are messages like the count in the built-in msmq console and queue explorer.
I cannot find any messages. I understand that I could see a situation like this for outgoing queues with dead letter tracking such that it may have been delivered to a remote machine but not yet processed. This is not an outgoing queue, though. Messages are sourced from remote machines and have landed here on this machine.
Also, I am certain that the count I'm seeing are not journal messages or subqueues.
Does this make any sense? Is there a logical explanation for this and under some circumstance this is expected? If so, what is it?
EDIT: Removed info about purging queue removing the count - that was incorrect. Purging actually does nothing and leaves me in the same state as before with a count reflected, but no messages showing.
As you noted, you can see a message count on an outgoing queue if source journaling is in use. The invisible messages are there in case they need to be moved to the DLQ.I would expect your problem to be similar - there should be a visible message in the outgoing queue and an invisible message in the destination queue because delivery hasn't completed. I assume a handshaking or storage acknowledgement has been lost along the way. Or maybe the message has been processed and removed from the queue but MSMQ couldn't update the sender of the fact. Check the outgoing queues on the remote machines sending TO this queue.

MSMQ: What can cause a "Insufficient resources to perform operation" error when receiving from a queue?

MSMQ: What can cause a "Insufficient resources to perform operation" error when receiving from a queue?
At the time the queue only held 2,000 messages with each message being about 5KB in size.
I had the same error message and the solution was simple.
There were a lot of messages sitting on various queues, and the storage limits had been reached. I went to:
Server Manager -> Features
Right clicked on Message Queuing
Selected properties
In the General tab un-ticked the storage limits
I was informed that services using MSMQ would be re-started, and then the error went away.
From John Breakwell's Blog there are eleven possibilities:
The thread pool for the remote read is exhausted (MSMQ 2.0 only).
The number of local callback threads is exceeded
The volume of messages has exceeded
what the system can handle (MSMQ 2.0
only).
Paged-pool kernel memory is
exhausted.
Mismatched binaries.
The message size is too large.
The machine quota has been exceeded.
Routing problems when opening a
transactional foreign queue (MSMQ
3.0 only)
Lack of disk space.
Storage problems on mobile devices
Clustering too many MSMQ resources
Too many open connections
Computer name was longer than 15 characters
Too many messages in the dead letter queue
http://blogs.msdn.com/johnbreakwell/archive/2006/09/18/761035.aspx
I would check the version of your queue and the amount of connections (to and from) your queue open at the time of error. Any of those "could have" caused your error.
I had too many failed messages in my outgoing queue.
Check System Queues -> Dead-letter messages. I cleared this queue out and it worked fine again.
If journaling is enabled, you will be storing copies of all messages removed from the queue, so you might also be hitting the MSMQ journal limit. Short term fix might be to purge the journals for the queue, longer term - disable journaling.
I encountered the same error, after checking the things mentioned above it turned out that it was the computer name that was causing the issue! It was longer than 15 characters, after I changed it to a shorter one the issue was gone.
For me, the problem was not the machine that hosted the queue. It was the machine that was sending the message to the queue. I noticed that the "Outgoing Queues" on the source machine showed large numbers of messages, which led me to MSMQ Messages Are Stuck In The Outgoing Queue. Reinstalling MSMQ on the source machine is what fixed it for me.