Clustered MSMQ - Invalid queue path name when sending

We have a two node cluster running on Windows 2008 R2. I've installed MSMQ with the Message Queue Server and Directory Service Integration options on both nodes. I've created a clustered MSMQ resource named TESTV0Msmq (we use transactional queues so a DTC resource had been created previously).
The virtual resource resolves correctly when I ping it.
I created a small console executable in C# using the MessageQueue constructor to allow us to send basic messages (to both transactional and non-transactional queues).
From the active node these paths work:
.\private$\clustertest
{machinename}\private$\clustertest
but TESTV0Msmq\private$\clustertest returns "Invalid queue path name".
According to this article:
http://technet.microsoft.com/en-us/library/cc776600(WS.10).aspx
I should be able to do this?
In particular, queues can be created on a virtual server, and
messages can be sent to them. Such queues are addressed using the
VirtualServerName\QueueName syntax.
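For context, here is a minimal sketch of sending by path name versus by direct format name (System.Messaging driven from PowerShell; the direct format name in the last two lines is an assumption worth testing, since it bypasses path-name resolution):
Add-Type -AssemblyName System.Messaging
# Path-name syntax, as used above; works from the active node.
# Single works for transactional queues; use plain Send('hello') for non-transactional ones.
$q = New-Object System.Messaging.MessageQueue '.\private$\clustertest'
$q.Send('hello', [System.Messaging.MessageQueueTransactionType]::Single)
# Direct format name, which skips path-name resolution entirely;
# worth trying when a virtual server name is rejected:
$q2 = New-Object System.Messaging.MessageQueue 'FormatName:DIRECT=OS:TESTV0Msmq\private$\clustertest'
$q2.Send('hello', [System.Messaging.MessageQueueTransactionType]::Single)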

Sounds like a classic MSMQ clustering problem:
Clustering MSMQ applications - rule #1
If you can access ".\private$\clustertest" or "{machinename}\private$\clustertest" from the active node, that means there is a queue called clustertest hosted by the LOCAL MSMQ queue manager. It doesn't work on the passive node because there is no queue called clustertest there yet, and if you fail over the resource it will still fail, because a local queue does not move with the cluster.
You need to create the queue in the clustered resource instead. TESTV0Msmq\private$\clustertest fails because the queue was created on the local machine and not on the virtual server.
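To create the queue on the clustered instance from code, the usual trick (an assumption here, but it is the documented mechanism for addressing a clustered MSMQ from an application) is to set _CLUSTER_NETWORK_NAME_ before the first MSMQ operation, so the API talks to the virtual server instead of the local queue manager:
# Point the MSMQ runtime at the clustered instance before touching any queue.
# _CLUSTER_NETWORK_NAME_ must be set before the first MSMQ call in the process.
$env:_CLUSTER_NETWORK_NAME_ = 'TESTV0Msmq'
Add-Type -AssemblyName System.Messaging
# With the variable set, '.' now refers to the clustered queue manager.
[System.Messaging.MessageQueue]::Create('.\private$\clustertest', $true)  # $true = transactional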
Cheers
John Breakwell

Related

Clustered, HA Distributed Transaction Manager

I'm looking for specific product/technology or any proposed solution for the following problem:
I need a JTA-compliant transaction manager, that can enlist XAResources via resource-adapters and perform two-phase commit
It should be transparently available in JBoss AS/WildFly
It should be clustered with high-availability for
Transaction manager itself
Application server (JBoss) with applications as clients for TM deployed at AS
As "clustered" I mean not TM clustering, but client clustering sharing the same transaction: e.g. transaction begins on one JBoss server, then continues on second and is committed/rolled back on third. So the underlying resource (Database, enterprise bus, messaging) see all the requests from several app-servers as ONE transaction
As "high-availability" I mean that any component involved in transaction work execution could have a standby/hot-active instance that could complete/rollback work in case of main instance out of order. This include:
Transaction manager itself (it should not rely on one instance running, all transaction info should be replicated on-line within cluster)
Transaction clients (application running on JBoss instance which is processing transactonal call should fail-over on other JBoss instance in case of server outage)
I can't get the JTS catch in terms of work with XA resources (not in terms of work with saved transactional objects) and have not yet achieved any success in setting up JTS in cluster/HA. May be there is an issue that transaction could be managed by only one instance of TM and if it fails the transaction is buried until server restarted.
I don't know whether what I'm looking for is an utopia or whether I an not on the right way at all :)

Clustered MSMQ Issue

I am having an issue with MSMQ in a clustered environment. I have the following setup:
2 nodes set up in a Windows failover cluster; let's call them "Node A" and "Node B".
I have then set up a clustered instance of MSMQ; let's call it "MSMQ Instance".
I have also set up a clustered instance of the DTC; let's call it "DTC Instance".
Within the DTC instance, I have allowed access both locally and through the clustered instance; basically, I have turned all authentication off for testing.
I have also created a clustered instance of our in-house application; let's call it "Application Instance". Within this Application Instance I have added other resources: the other services the application uses and also the Net.MSMQ adapter.
The Issue.......
When I cluster the Application Instance, it always sets the owner to the opposite node from the one I am working on: if I create the clustered instance on Node A, it sets the current owner to Node B. However, that is not the issue.
The issue I have is that as long as the Application Instance is running on Node B, MSMQ seems to work.
The outbound queues are created locally, receive messages and are then processed through the MSMQ Cluster.
If I then fail over to Node A, MSMQ refuses to work: the outbound queues are not created and therefore no messages are processed.
I get an error in Event Viewer:
"The version check failed with the error: 'Unrecognized error -1072824309 (0xc00e000b)'. The version of MSMQ cannot be detected All operations that are on the queued channel will fail. Ensure that MSMQ is installed and is available"
If I then fail over back to Node B, it works.
The application has been set up to use the MSMQ instance and all the permissions are correct.
Do I need to have a clustered instance of DTC, or can I just configure it as a resource within the MSMQ instance?
Can anybody shed any light on this as I am at a brick wall with this?
Yes, you will need to have a clustered DTC setup.
For your clustered MSMQ instance you will then need to configure the clustered DTC as a "dependency": right-click on MSMQ -> Properties -> Dependencies.
I do not know if this is mandatory in all cases, but on our cluster we also have a file share configured as a dependency for MSMQ. To my understanding, this ensures that temporary files needed by MSMQ are still available after a node switch.
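If you prefer scripting over the GUI, the failover-clustering cmdlets express the same dependency; the resource names below are the ones from the question, so adjust them to your own:
Import-Module FailoverClusters
# Make the clustered MSMQ resource depend on the clustered DTC resource.
Add-ClusterResourceDependency -Resource 'MSMQ Instance' -Provider 'DTC Instance'
# Verify the resulting dependency expression.
Get-ClusterResourceDependency -Resource 'MSMQ Instance'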
Additionally, here are two articles that I found very helpful in setting up the cluster nodes. They might be helpful in confirming step-by-step that your configurations are correct:
"Building MSMQ cluster". You will find several other links in that article that will guide you further.
Microsoft also has a detailed document: "Deploying Message Queuing (MSMQ) 3.0 in a Server Cluster".

NServiceBus distributor cannot create queues on clustered MSMQ

I'm trying to set up an NServiceBus distributor on a Windows failover cluster. I've successfully followed the "official" guides and most things seem to work nicely, except for actually starting the distributor on the cluster. When it starts, it tries to create its queues on the clustered MSMQ, but is denied permission:
Unhandled Exception: Magnum.StateMachine.StateMachineException: Exception occurred in Topshelf.Internal.ServiceController`1[[NServiceBus.Hosting.Windows.WindowsHost, NServiceBus.Host, Version=3.2.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c]] during state Initial while handling OnStart ---> System.Exception: Exception when starting endpoint, error has been logged. Reason: The queue does not exist or you do not have sufficient permissions to perform the operation. ---> System.Messaging.MessageQueueException: The queue does not exist or you do not have sufficient permissions to perform the operation.
I'm able to create queues when opening the clustered MSMQ manager, but even if I run the distributor using my own account it gets this error.
Something that might be related is that I cannot change properties on the Message Queuing object in the clustered MSMQ manager. For instance, when I try to change the message storage limit, I get this error:
The properties of TEST-CLU-MSMQ cannot be set
Error: This operation is not supported for Message Queuing installed in workgroup mode
I can however change this setting on the node's MSMQ settings, and those are also installed in workgroup mode.
Any ideas? I've tried reinstalling the cluster and services and just about everything, to no avail. The environment is Windows Server 2008 R2.

NServicebus failing while sending messages to a msmq cluster queue in a load balanced environment

We are having MSMQ issues in a load balanced, high volume environment using NServiceBus.
Our environment looks as follows: 1 F5 distributing web traffic via round robin to 6 application servers. Each of these 6 servers uses a Bus.Send to 1 queue on a remote machine that resides on a cluster.
The event throughput during normal usage is approximately 5-10 per second, per server. So 30-60 events per second in the entire environment, depending on load.
The issue we're seeing is that 1 of the application boxes is able to send messages to the cluster queue, but the other 5 are not. Looking at the 5 boxes experiencing failure, the outgoing queue to the cluster is inactive.
There are also a high number of events in the transactional dead-letter queue. When we purge that queue, the outgoing queue connects to the cluster; however, messages accumulate unacknowledged in the outgoing queue. This continues until they move into the transactional dead-letter queue again and the outgoing queue changes state to inactive.
Interestingly, when we perform this purge operation, a different box will become the 'good box'. So we're pretty sure that the issue is not one bad box, it's that only 1 box at a time can reliably maintain a connection to the cluster queue.
Has anybody come across this before?
We have, and it was because of the issue described here: http://blogs.msdn.com/b/johnbreakwell/archive/2007/02/06/msmq-prefers-to-be-unique.aspx
Short version: every MSMQ installation has a unique ID assigned to it when you install MSMQ. It is called the QMId and is located in the registry under
HKLM\Software\Microsoft\MSMQ\Parameters\MachineCache\QMId
It is used as an identifier when sending to a remote receiver, which in turn uses it to send ACKs back to the correct sender. The receiver, in your case the cluster, maintains a cache that maps QMIds to IPs. Our problem was that several of our workers had the SAME QMId. This meant the cluster sent all ACKs for all messages from all the machines to the first machine that sent a message. At some point, or after operations like an MSMQ Windows service restart, the cache expires and ANOTHER machine magically "works".
So check your 6 servers and make sure none of them has the same QMId. Ours had the same value because they were all ghosted from a Windows image that was taken after MSMQ was installed.
The fix is easy, just reinstall the MSMQ feature on each machine to generate a new unique QMId.
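Before reinstalling, it is worth comparing the value across all six servers; a quick sketch that reads the registry value mentioned above (QMId is stored as a binary GUID):
$path = 'Registry::HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSMQ\Parameters\MachineCache'
$qmId = (Get-ItemProperty -Path $path -Name QMId).QMId
# Print the binary GUID as hex so values from different machines are easy to diff.
($qmId | ForEach-Object { $_.ToString('x2') }) -join ''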
If your machines are created from the same image, you probably have non-unique QMIds. You can fix this by running the following PowerShell script on each machine.
This can be done before the image is created, or on each machine after it is launched.
Remove-ItemProperty -Path 'Registry::HKEY_LOCAL_MACHINE\Software\Microsoft\MSMQ\Parameters\MachineCache' -name 'QMId'
Set-ItemProperty -Path 'Registry::HKEY_LOCAL_MACHINE\Software\Microsoft\MSMQ\Parameters' -Name SysPrep -Value 1
Restart-Service -Name 'MSMQ'

MSMQ redundancy

I'm looking into WCF/MSMQ.
Does anyone know how one handles redundancy with MSMQ? It is my understanding that the queue sits on the server; if the server goes down and is not recoverable, how does one prevent the messages from being lost?
Any good articles on this topic?
There is a good article on using MSMQ in the enterprise here.
Tip 8 is the one you should read.
"Using Microsoft's Windows Clustering tool, queues will failover from one machine to another if one of the queue server machines stops functioning normally. The failover process moves the queue and its contents from the failed machine to the backup machine. Microsoft's clustering works, but in my experience, it is difficult to configure correctly and malfunctions often. In addition, to run Microsoft's Cluster Server you must also run Windows Server Enterprise Edition—a costly operating system to license. Together, these problems warrant searching for a replacement.
One alternative to using Microsoft's Cluster Server is to use a third-party IP load-balancing solution, of which several are commercially available. These devices attach to your network like a standard network switch, and once configured, load balance IP sessions among the configured devices. To load-balance MSMQ, you simply need to setup a virtual IP address on the load-balancing device and configure it to load balance port 1801. To connect to an MSMQ queue, sending applications specify the virtual IP address hosted by the load-balancing device, which then distributes the load efficiently across the configured machines hosting the receiving applications. Not only does this increase the capacity of the messages you can process (by letting you just add more machines to the server farm) but it also protects you from downtime events caused by failed servers.
To use a hardware load balancer, you need to create identical queues on each of the servers configured to be used in load balancing, letting the load balancer connect the sending application to any one of the machines in the group. To add an additional layer of robustness, you can also configure all of the receiving applications to monitor the queues of all the other machines in the group, which helps prevent problems when one or more machines is unavailable. The cost for such queue-monitoring on remote machines is high (it's almost always more efficient to read messages from a local queue) but the additional level of availability may be worth the cost."
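As a rough sketch of that last point, monitoring a queue on another machine just means opening it by direct format name; the host and queue names below are placeholders:
Add-Type -AssemblyName System.Messaging
# 'otherhost' and 'workqueue' are hypothetical names for illustration.
$remote = New-Object System.Messaging.MessageQueue 'FormatName:DIRECT=OS:otherhost\private$\workqueue'
# Crude depth check; as the article notes, remote reads cost more than local ones.
$remote.GetAllMessages().Count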
Not to be snide, but you kind of answered your own question. If the server is unrecoverable, then you can't recover the messages.
That being said, you might want to back up the message folder regularly. This TechNet article will tell you how to do it:
http://technet.microsoft.com/en-us/library/cc773213.aspx
Also, it will not back up express messages, so that is something you have to be aware of.
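If I remember the tooling correctly, the utility that article describes is mqbkup.exe; a sketch of its use, with switches quoted from memory, so verify them against the article:
mqbkup /b C:\msmq-backup   # back up MSMQ message files and registry settings
mqbkup /r C:\msmq-backup   # restore from that backup later
# Note: mqbkup stops the Message Queuing service while it runs.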
If you prefer, you might want to store the actual messages for processing in a database upon receipt, and have the service be the consumer in a producer/consumer pattern.
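A skeletal version of that hand-off, with the persistence step left as a placeholder and a made-up queue name:
Add-Type -AssemblyName System.Messaging
$queue = New-Object System.Messaging.MessageQueue '.\private$\orders'  # placeholder queue name
# Receive inside a transaction so the message only leaves the queue
# once the database write has succeeded.
$tx = New-Object System.Messaging.MessageQueueTransaction
$tx.Begin()
$msg = $queue.Receive($tx)
$body = (New-Object System.IO.StreamReader($msg.BodyStream)).ReadToEnd()
# Placeholder: insert $body into your database here (the producer side of the pattern).
$tx.Commit()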