Clustered MSMQ Issue

I am having an issue with MSMQ in a clustered environment. I have the following setup:
Two nodes set up in a Windows failover cluster; let's call them "Node A" and "Node B".
I have then set up a clustered instance of MSMQ; let's call it "MSMQ Instance".
I have also set up a clustered instance of the DTC; let's call it "DTC Instance".
Within the DTC instance I have allowed access both locally and through the clustered instance; basically, I have turned all authentication off for testing.
I have also created a clustered instance of our in-house application; let's call it "Application Instance". Within this Application Instance I have added other resources: the other services the application uses and the Net.MSMQ adapter.
The issue:
When I cluster the Application Instance, it always sets the current owner to the opposite node from the one I am working on: if I create the clustered instance on Node A, the owner is set to Node B. However, that is not the issue.
The issue is that as long as the Application Instance is running on Node B, MSMQ works.
The outbound queues are created locally, receive messages, and the messages are then processed through the MSMQ cluster.
If I then fail over to Node A, MSMQ refuses to work: the outbound queues are not created, so no messages are processed.
I get an error in Event Viewer:
"The version check failed with the error: 'Unrecognized error -1072824309 (0xc00e000b)'. The version of MSMQ cannot be detected All operations that are on the queued channel will fail. Ensure that MSMQ is installed and is available"
If I then fail over back to Node B, it works.
The application has been set up to use the MSMQ instance, and all the permissions are correct.
Do I need a clustered instance of the DTC, or can I just configure it as a resource within the MSMQ instance?
Can anybody shed any light on this? I have hit a brick wall.

Yes, you will need a clustered DTC setup.
For your clustered MSMQ instance you will then need to configure the clustered DTC as a dependency: right-click the MSMQ resource -> Properties -> Dependencies.
I do not know if this is mandatory in all cases, but on our cluster we also have a file share configured as a dependency for MSMQ. To my understanding, this ensures that temporary files needed by MSMQ are still available after a node switch.
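If it helps to see where the DTC comes in from the application side, here is a minimal C# sketch of a transactional send that enlists in an ambient transaction; the "MSMQInstance" format name and "orders" queue are made-up placeholders, not taken from your setup. This is the kind of operation that needs the clustered DTC to be reachable alongside the clustered MSMQ resource:

    using System.Messaging;
    using System.Transactions;

    // Minimal sketch, assuming a transactional queue hosted by the clustered MSMQ instance.
    // "MSMQInstance" and "orders" are hypothetical names used only for illustration.
    using (var scope = new TransactionScope())
    using (var queue = new MessageQueue(@"FormatName:DIRECT=OS:MSMQInstance\private$\orders"))
    {
        // Automatic enlists the send in the ambient System.Transactions transaction,
        // which is what brings the DTC into play.
        queue.Send("test message", MessageQueueTransactionType.Automatic);
        scope.Complete();
    }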
Additionally, here are two articles that I found very useful when setting up the cluster nodes. They might help you confirm step by step that your configuration is correct:
"Building MSMQ cluster". You will find several other links in that article that will guide you further.
Microsoft also has a detailed document: "Deploying Message Queuing (MSMQ) 3.0 in a Server Cluster".

Related

Clustered, HA Distributed Transaction Manager

I'm looking for specific product/technology or any proposed solution for the following problem:
I need a JTA-compliant transaction manager that can enlist XAResources via resource adapters and perform two-phase commit.
It should be transparently available in JBoss AS/WildFly
It should be clustered with high availability for:
The transaction manager itself
The application server (JBoss), with the applications deployed on the AS acting as clients of the TM
As "clustered" I mean not TM clustering, but client clustering sharing the same transaction: e.g. transaction begins on one JBoss server, then continues on second and is committed/rolled back on third. So the underlying resource (Database, enterprise bus, messaging) see all the requests from several app-servers as ONE transaction
As "high-availability" I mean that any component involved in transaction work execution could have a standby/hot-active instance that could complete/rollback work in case of main instance out of order. This include:
Transaction manager itself (it should not rely on one instance running, all transaction info should be replicated on-line within cluster)
Transaction clients (application running on JBoss instance which is processing transactonal call should fail-over on other JBoss instance in case of server outage)
I can't get the JTS catch in terms of work with XA resources (not in terms of work with saved transactional objects) and have not yet achieved any success in setting up JTS in cluster/HA. May be there is an issue that transaction could be managed by only one instance of TM and if it fails the transaction is buried until server restarted.
I don't know whether what I'm looking for is an utopia or whether I an not on the right way at all :)

NServiceBus distributor cannot create queues on clustered MSMQ

I'm trying to set up an NServiceBus distributor on a Windows failover cluster. I've followed the "official" guides and most things seem to work nicely, except for actually starting the distributor on the cluster. When it starts, it tries to create its queues on the clustered MSMQ but is denied permission:
Unhandled Exception: Magnum.StateMachine.StateMachineException: Exception occurred in Topshelf.Internal.ServiceController`1[[NServiceBus.Hosting.Windows.WindowsHost, NServiceBus.Host, Version=3.2.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c]] during state Initial while handling OnStart ---> System.Exception: Exception when starting endpoint, error has been logged. Reason: The queue does not exist or you do not have sufficient permissions to perform the operation. ---> System.Messaging.MessageQueueException: The queue does not exist or you do not have sufficient permissions to perform the operation.
I'm able to create queues from the clustered MSMQ manager, but even if I run the distributor under my own account it gets this error.
Something that might be related: I cannot change properties on the Message Queuing object in the clustered MSMQ manager. For instance, if I try to change the message storage limit, I get this error:
The properties of TEST-CLU-MSMQ cannot be set
Error: This operation is not supported for Message Queuing installed in workgroup mode
I can, however, change this setting in the nodes' own MSMQ settings, and those are also installed in workgroup mode.
Any ideas? I've tried reinstalling the cluster and the services and just about everything, to no avail. The environment is Windows Server 2008 R2.

Questions Concerning Using Celery with Multiple Load-Balanced Django Application Servers

I'm interested in using Celery for an app I'm working on. It all seems pretty straightforward, but I'm a little confused about what I need to do if I have multiple load-balanced application servers. All of the documentation assumes that the broker will be on the same server as the application. Currently, all of my application servers sit behind an Amazon ELB, and tasks need to be able to come from any one of them.
This is what I assume I need to do:
Run a broker server on a separate instance
Configure each application instance to connect to that broker server
Each application instance will also be a Celery worker (running celeryd)?
My only beef with that is: what happens if my broker instance dies? Can I run two broker instances somehow so I'm safe if one goes down?
Any tips or information on what to do in a setup like mine would be greatly appreciated. I'm sure I'm missing something or not understanding something.
For future reference, for those who do prefer to stick with RabbitMQ...
You can create a RabbitMQ cluster from 2 or more instances. Add those instances to your ELB and point your celeryd workers at the ELB. Just make sure you connect the right ports and you should be all set. Don't forget to allow your RabbitMQ machines to talk among themselves to run the cluster. This works very well for me in production.
One exception here: if you need to schedule tasks, you need a celerybeat process. For some reason, I wasn't able to connect celerybeat to the ELB and had to connect it to one of the instances directly. I opened an issue about it and it is supposed to be resolved (I haven't tested it yet). Keep in mind that only one celerybeat instance can run at a time, so that is already a single point of failure.
You are correct on all points.
To make the broker reliable, set up a clustered RabbitMQ installation, as described here:
http://www.rabbitmq.com/clustering.html
celerybeat also doesn't have to be a single point of failure if you run it on every worker node using:
https://github.com/ybrs/single-beat

Clustered MSMQ - Invalid queue path name when sending

We have a two-node cluster running on Windows 2008 R2. I've installed MSMQ with the Message Queue Server and Directory Service Integration options on both nodes. I've created a clustered MSMQ resource named TESTV0Msmq (we use transactional queues, so a DTC resource had been created previously).
The virtual resource resolves correctly when I ping it.
I created a small console executable in C# using the MessageQueue constructor to allow us to send basic messages (to both transactional and non-transactional queues).
From the active node these paths work:
.\private$\clustertest
{machinename}\private$\clustertest
but TESTV0Msmq\private$\clustertest returns "Invalid queue path name".
According to this article:
http://technet.microsoft.com/en-us/library/cc776600(WS.10).aspx
I should be able to do this?
In particular, queues can be created on a virtual server, and
messages can be sent to them. Such queues are addressed using the
VirtualServerName\QueueName syntax.
Sounds like the classic clustered MSMQ problem:
Clustering MSMQ applications - rule #1
If you can access ".\private$\clustertest" or "{machinename}\private$\clustertest" from the active node then that means there is a queue called clustertest hosted by the LOCAL MSMQ queue manager. It doesn't work on the passive node because there is no queue called clustertest there yet. If you fail over the resource, it should fail.
You need to create the queue on the clustered resource instead. TESTV0Msmq\private$\clustertest fails because the queue was created on the local machine and not on the virtual server.
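As a rough illustration using the queue name from your question: the create has to happen in the context of the clustered resource (or via the clustered MSMQ management console), and sending can then use a direct format name so the virtual server is addressed explicitly:

    using System.Messaging;

    // Sketch only. Run the create from code executing inside the clustered resource group
    // (or create the queue in the clustered MSMQ management console); in that context "."
    // refers to the virtual queue manager rather than the local node.
    MessageQueue.Create(@".\private$\clustertest", true);   // true = transactional

    // Send via a direct format name so the clustered (virtual) instance resolves the queue.
    // Use MessageQueueTransactionType.Single only if the queue is transactional.
    var queue = new MessageQueue(@"FormatName:DIRECT=OS:TESTV0Msmq\private$\clustertest");
    queue.Send("hello from clustertest", MessageQueueTransactionType.Single);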
Cheers
John Breakwell

MSMQ redundancy

I'm looking into WCF/MSMQ.
Does anyone know how one handles redundancy with MSMQ? It is my understanding that the queue sits on the server, but if the server goes down and is not recoverable, how does one prevent the messages from being lost?
Any good articles on this topic?
There is a good article on using MSMQ in the enterprise here.
Tip 8 is the one you should read.
"Using Microsoft's Windows Clustering tool, queues will failover from one machine to another if one of the queue server machines stops functioning normally. The failover process moves the queue and its contents from the failed machine to the backup machine. Microsoft's clustering works, but in my experience, it is difficult to configure correctly and malfunctions often. In addition, to run Microsoft's Cluster Server you must also run Windows Server Enterprise Edition—a costly operating system to license. Together, these problems warrant searching for a replacement.
One alternative to using Microsoft's Cluster Server is to use a third-party IP load-balancing solution, of which several are commercially available. These devices attach to your network like a standard network switch, and once configured, load balance IP sessions among the configured devices. To load-balance MSMQ, you simply need to setup a virtual IP address on the load-balancing device and configure it to load balance port 1801. To connect to an MSMQ queue, sending applications specify the virtual IP address hosted by the load-balancing device, which then distributes the load efficiently across the configured machines hosting the receiving applications. Not only does this increase the capacity of the messages you can process (by letting you just add more machines to the server farm) but it also protects you from downtime events caused by failed servers.
To use a hardware load balancer, you need to create identical queues on each of the servers configured to be used in load balancing, letting the load balancer connect the sending application to any one of the machines in the group. To add an additional layer of robustness, you can also configure all of the receiving applications to monitor the queues of all the other machines in the group, which helps prevent problems when one or more machines is unavailable. The cost for such queue-monitoring on remote machines is high (it's almost always more efficient to read messages from a local queue) but the additional level of availability may be worth the cost."
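To make the quoted approach concrete, here is a hedged C# sketch of a sender pointed at the balancer's virtual IP; the address and queue name are placeholders, and the identically named queue must already exist on every machine behind the balancer:

    using System.Messaging;

    // Sketch: a DIRECT=TCP format name addresses the queue by IP address, here the
    // virtual IP exposed by the load balancer. 10.0.0.50 and "work" are placeholders.
    var queue = new MessageQueue(@"FormatName:DIRECT=TCP:10.0.0.50\private$\work");
    queue.Send("work item");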
Not to be snide, but you kind of answered your own question. If the server is unrecoverable, then you can't recover the messages.
That being said, you might want to back up the message folder regularly. This TechNet article will tell you how to do it:
http://technet.microsoft.com/en-us/library/cc773213.aspx
Also, it will not back up express messages, so that is something you have to be aware of.
If you prefer, you might want to store the actual messages for processing in a database upon receipt, and have the service be the consumer in a producer/consumer pattern.
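For what it's worth, a rough sketch of that consumer shape, assuming a local transactional queue (the path and the persistence call are placeholders, not a real API):

    using System;
    using System.Messaging;

    class QueueConsumer
    {
        static void Main()
        {
            // ".\private$\incoming" is a placeholder path for a local transactional queue.
            var queue = new MessageQueue(@".\private$\incoming")
            {
                Formatter = new XmlMessageFormatter(new[] { typeof(string) })
            };

            while (true)
            {
                using (var tx = new MessageQueueTransaction())
                {
                    tx.Begin();
                    Message message = queue.Receive(tx);   // blocks until a message arrives
                    SaveToDatabase((string)message.Body);  // placeholder for your own data access
                    tx.Commit();                           // message leaves the queue only once it is stored
                }
            }
        }

        static void SaveToDatabase(string body)
        {
            // Placeholder: insert the message into your processing table here.
        }
    }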