NServiceBus and remote input queues with an MSMQ cluster

We would like to use the Publish / Subscribe abilities of NServiceBus with an MSMQ cluster. Let me explain in detail:
We have an SQL Server cluster that also hosts the MSMQ cluster. Besides SQL Server and MSMQ, we cannot host any other applications on this cluster. This means our subscriber is not allowed to run on the cluster.
We have multiple application servers hosting different types of applications (ranging from ASP.NET MVC to SharePoint Server 2010). The goal is to do pub/sub between all these applications.
All messages going through the pub/sub are critical and of significant value to the business. That's why we don't want local queues on the application servers; we want to use MSMQ on the cluster (if we lose one of the application servers, we don't risk losing the messages, since they are safe on the cluster).
Now I was assuming I could simply do the following at the subscriber side:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <configSections>
    <section name="MsmqTransportConfig" type="NServiceBus.Config.MsmqTransportConfig, NServiceBus.Core" />
    ....
  </configSections>
  <MsmqTransportConfig InputQueue="myqueue#server" ErrorQueue="myerrorqueue"
                       NumberOfWorkerThreads="1" MaxRetries="5" />
  ....
</configuration>
I'm assuming this used to be supported, given the documentation: http://docs.particular.net/nservicebus/messaging/publish-subscribe/
But this actually throws an exception:
Exception when starting endpoint, error has been logged. Reason:
'InputQueue' entry in 'MsmqTransportConfig' section is obsolete. By
default the queue name is taken from the class namespace where the
configuration is declared. To override it, use .DefineEndpointName()
with either a string parameter as queue name or Func<string> parameter
that returns queue name. In this instance, 'myqueue#server' is defined
as queue name.
Now, the exception clearly states I should use the DefineEndpointName method:
Configure.With()
.DefaultBuilder()
.DefineEndpointName("myqueue#server")
But this throws another exception, which is documented (input queues should be on the same machine):
Exception when starting endpoint, error has been logged. Reason: Input
queue must be on the same machine as this process.
How can I make sure that my messages are safe if I can't use MSMQ on my cluster?
Dispatcher!
Now I've also been looking into the dispatcher for a bit, and it doesn't seem to solve my issue either. I'm assuming the dispatcher also wouldn't be able to get messages from a remote input queue? And besides that, if the dispatcher dispatches messages to the workers and the workers go down, my messages are lost (even though they were not processed)?
Questions?
To summarize, these are the things I'm wondering with my scenario in NServiceBus:
I want my messages to be safe on the MSMQ cluster and use a remote input queue. Is this something I should or shouldn't do? Is it possible with NServiceBus?
Should I use a dispatcher in this case? Can it read from a remote input queue? (I cannot run the dispatcher on the cluster)
What if the dispatcher dispatches messages to the workers and one of the workers goes down? Do I lose the message(s) that were being processed?

Phill's comment is correct.
The thing is that you would get the type of fault tolerance you require practically by default if you set up a virtualized environment. In that case, the C drive backing the local queue of your processes is actually sitting on the VM image on your SAN.

You will need a MessageEndpointMappings section that you will use to point to the Publisher's input queue. This queue is used by your Subscriber to drop off subscription messages. This will need to be QueueName#ClusterServerName. Be sure to use the cluster name and not a node name. The Subscriber's input queue will be used to receive messages from the Publisher and that will be local, so you don't need the #servername.
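As a rough sketch, and assuming the standard UnicastBusConfig mapping section (all names below are placeholders, and the exact queue#machine / queue@machine address format depends on your NServiceBus version), the Subscriber's app.config could contain something like:

<configSections>
  <section name="UnicastBusConfig" type="NServiceBus.Config.UnicastBusConfig, NServiceBus.Core" />
</configSections>
<UnicastBusConfig>
  <MessageEndpointMappings>
    <!-- send subscription messages for these event types to the Publisher's
         input queue on the cluster (cluster name, not a node name) -->
    <add Messages="MyMessages" Endpoint="publisherInputQueue@MyClusterName" />
  </MessageEndpointMappings>
</UnicastBusConfig>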
There are two levels of failure: one is that the transport is down (say MSMQ), and the other is that the endpoint is down (the Windows Service). In the event that the endpoint is down, the transport will handle persisting the messages to disk. A redundant network storage device may be in order.
In the event that the transport is down, assuming it is MSMQ, the messages will back up on the Publisher side of things. Therefore you have to account for the size and number of messages to calculate how long you can afford to let messages back up (for example, 1,000 messages per second at 10 KB each over a two-hour outage is roughly 70 GB of queue storage). Since the Publisher is clustered, you can be assured that the messages will arrive eventually, assuming you planned your disk appropriately.

Related

Messages are stuck in ActiveMQ Artemis cluster queues

We have a problem with Apache ActiveMQ Artemis cluster queues. Sometimes messages begin to pile up in particular cluster queues. It usually happens 1-4 times per day, mostly on production (it has happened only once in the last 90 days on one of the test environments).
These messages are not delivered to consumers on other cluster brokers until we restart cluster connector (or entire broker).
The problem looks related to ARTEMIS-3809.
Our setup is: 6 servers in one environment (3 pairs of master/backup servers). Operating system is Linux (Red Hat).
We have tried to:
upgrade from 2.22.0 to 2.23.1
increase minLargeMessageSize on the cluster connectors to 1024000
The messages are still getting stuck in the cluster queues.
Another problem: I tried to configure min-large-message-size in the cluster-connection as described in the documentation, but it caused errors at startup (broker.xml did not pass XSD validation), so the only option was to specify minLargeMessageSize in the URL parameters of the connector for each cluster broker. I don't know whether this setting has any effect.
So we had to write a script that checks whether messages are stuck in the cluster queues and restarts the cluster connector.
How can we debug this situation?
When the messages are stuck, nothing wrong is written to the log (no errors, no stacktraces etc.).
For which classes should we enable DEBUG or TRACE logging to find out what is happening with the cluster connectors?
I believe you can remedy the situation by setting this on your cluster-connection:
<producer-window-size>-1</producer-window-size>
See ARTEMIS-3805 for more details.
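For reference, a sketch of where that element sits in broker.xml (connector names are placeholders, and element order has to follow the cluster-connection schema):

<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <!-- disable producer flow control on the cluster connection -->
      <producer-window-size>-1</producer-window-size>
      <static-connectors>
         <connector-ref>other-cluster-broker</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>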
Generally speaking, moving messages around the cluster via the cluster-connection, while convenient, isn't terribly efficient (much less so for "large" messages). Ideally you would have a sufficient number of clients on each node to consume the messages that were originally produced there. If you don't have that many clients, then you may want to re-evaluate the size of your cluster, as it may actually decrease overall message throughput rather than increase it.
If you're just using 3 HA pairs in order to establish a quorum for replication, then you should investigate the recently added pluggable quorum voting, which allows integration with a 3rd-party component (e.g. ZooKeeper) for leader election, eliminating the need for a quorum of brokers.

Client Local Queue in Red Hat AMQ

We have a network of Red Hat AMQ 7.2 brokers with master/slave configuration. The client applications publish/subscribe to topics on the broker cluster.
How do we handle the situation wherein the network connectivity between the client application and the broker cluster goes down? Does Red Hat AMQ have a native solution, like a client-side local queue and a JMS-to-JMS bridge between the local queue and the remote broker, so that a network connectivity failure will not result in loss of messages?
It would be possible for you to craft a solution where your clients use a local broker and that local broker bridges messages to the remote broker. The local broker will, of course, never lose network connectivity with the local clients since everything is local. However, if the local broker loses connectivity with the remote broker it will act as a buffer and store messages until connectivity with the remote broker is restored. Once connectivity is restored then the local broker will forward the stored messages to the remote broker. This will allow the producers to keep working as if nothing has actually failed. However, you would need to configure all this manually.
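For illustration only, a core bridge on the local broker could be sketched roughly like this in broker.xml (host, queue and address names are placeholders): the bridge consumes from a local queue and forwards to an address on the remote broker, buffering locally whenever the remote side is unreachable.

<connectors>
   <connector name="remote-broker">tcp://remote-host:61616</connector>
</connectors>
<bridges>
   <bridge name="to-remote">
      <queue-name>local.outbound</queue-name>
      <forwarding-address>orders</forwarding-address>
      <!-- keep retrying forever while the remote broker is unreachable -->
      <reconnect-attempts>-1</reconnect-attempts>
      <static-connectors>
         <connector-ref>remote-broker</connector-ref>
      </static-connectors>
   </bridge>
</bridges>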
That said, even if you don't implement such a solution there is absolutely no need for any message loss even when clients encounter a loss of network connectivity. If you send durable (i.e. persistent) messages then by default the client will wait for a response from the broker telling the client that the broker successfully received and persisted the message to disk. More complex interactions might require local JMS transactions and even more complex interactions may require XA transactions. In any event, there are ways to eliminate the possibility of message loss without implementing some kind of local broker solution.
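As a minimal sketch of that default behaviour, using the JMS 2.0 API with the Artemis/AMQ 7 JMS client (broker URL and queue name are placeholders):

import javax.jms.ConnectionFactory;
import javax.jms.DeliveryMode;
import javax.jms.JMSContext;
import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class DurableSendExample {
    public static void main(String[] args) {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://broker-host:61616");
        try (JMSContext context = factory.createContext(JMSContext.AUTO_ACKNOWLEDGE)) {
            context.createProducer()
                   // PERSISTENT is the default; the send() call does not return
                   // until the broker has received and persisted the message.
                   .setDeliveryMode(DeliveryMode.PERSISTENT)
                   .send(context.createQueue("orders"), "important payload");
        }
    }
}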

How to add health check for topics in KafkaStreams api

I have a critical Kafka application that needs to be up and running all the time. The source topics are created by the Debezium Kafka Connect connector for the MySQL binlog. Unfortunately, many things can go wrong with this setup. A lot of the time the Debezium connectors fail and need to be restarted, and so do my apps (without throwing any exception, they just hang and stop consuming). My manual way of testing and discovering the failure is checking the Kibana logs and then consuming the suspicious topic through the terminal. I could mimic this in code, but that is obviously not best practice. I wonder if there is an ability in the Kafka Streams API that allows me to do such a health check, and to check other parts of the Kafka cluster?
Another point that bothers me is whether I can keep the stream alive and rejoin the topics when the connectors are up again.
You can check the Kafka Streams state to see if it is rebalancing/running, which would indicate healthy operation. Although, if no data is getting into the topology, I would assume there would be no errors happening, so you then need to look up the health of your upstream dependencies.
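For illustration, a small helper along these lines could expose that state as a health probe (class and method names are made up for the sketch; streams is your already-built KafkaStreams instance):

import org.apache.kafka.streams.KafkaStreams;

public class StreamsHealthCheck {

    // Poll-style check: RUNNING is healthy, REBALANCING is a normal transient phase.
    public static boolean isHealthy(KafkaStreams streams) {
        KafkaStreams.State state = streams.state();
        return state == KafkaStreams.State.RUNNING || state == KafkaStreams.State.REBALANCING;
    }

    // Push-style check: register this before streams.start() and react to bad transitions.
    public static void registerAlerting(KafkaStreams streams) {
        streams.setStateListener((newState, oldState) -> {
            if (newState == KafkaStreams.State.ERROR || newState == KafkaStreams.State.NOT_RUNNING) {
                // hook alerting / restart logic in here
            }
        });
    }
}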
Overall, it sounds like you might want to invest some time into using monitoring tools like Consul or Sensu, which can run local service health checks and send out alerts when services go down. Or, at the very least, Elasticsearch alerting.
As far as Kafka health checking goes, you can do that in several ways (an AdminClient sketch follows the list):
Are the broker and ZooKeeper processes running? (SSH to the node, check processes)
Are the broker and ZooKeeper ports open? (use a socket connection)
Are there important JMX metrics you can track? (Metricbeat)
Can you find an active controller broker? (use AdminClient#describeCluster)
Do the required minimum number of brokers respond as part of the controller metadata (which can be obtained from AdminClient)?
Do the topics that you use have the proper configuration (retention, min-isr, replication-factor, partition count, etc.)? (again, use AdminClient)
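A hedged sketch of the AdminClient checks above (bootstrap address and topic name are placeholders):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;
import org.apache.kafka.clients.admin.TopicDescription;

public class KafkaClusterHealthCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-host:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Active controller and number of responding brokers
            DescribeClusterResult cluster = admin.describeCluster();
            System.out.println("controller = " + cluster.controller().get());
            System.out.println("brokers    = " + cluster.nodes().get().size());

            // Topic configuration: partition count and replication factor
            TopicDescription topic = admin.describeTopics(Collections.singletonList("my-topic"))
                                          .all().get().get("my-topic");
            System.out.println("partitions  = " + topic.partitions().size());
            System.out.println("replication = " + topic.partitions().get(0).replicas().size());
        }
    }
}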

Storm cluster not working in Production mode

I have a Storm topology running on two nodes. One is the nimbus and the other is the supervisor.
A proxy which is not part of storm accepts an HTTP request from a client and passes it to the storm topology.
The topology is like this:
1. The proxy passes data to a storm spout.
2. The spout passes data to multiple bolts.
3. The result is passed back to the proxy by the last bolt.
I am running the proxy and passing data to Storm. I am able to connect a socket to the listener on the topology side. The data emitted by the spout is shown as 0 in the UI. The same topology works fine in local mode.
I thought it was a problem with the supervisor, but the supervisor seems to be running fine because I am able to see the supervisor description and the individual spouts and bolts. But none of them emit anything.
Now I am confused about whether the problem is the data being passed to the wrong machine or something else. In order to communicate with the spout, I'm creating the socket from the proxy as follows:
InetAddress stormInetAddr = InetAddress.getByName("198.18.17.16");
int stormPort = 4321;
Socket stormSocket = new Socket(stormInetAddr, stormPort);
Here 198.18.17.16 is the nimbus IP. And 4321 is the port where data is being expected.
I tried giving the supervisor IP here, and it didn't connect. However, this one does.
Now the proxy waits for the output on a specific port.
On the other side, after processing, data is read from the last bolt, yet there seems to be no activity from the cluster. However, I am getting a response, which is basically the same request I sent with some jumbled-up data. This response is supposed to be sent by the last bolt to a specific port I defined. So I do get data back, but the cluster shows no activity. I know this is very vague, but does anyone have any idea what's happening?
It sounds like Storm is working fine, but your proxy/network settings are not. If it were a Storm error, you should see exceptions in the Nimbus UI and/or in the Storm supervisor logs.
Consider temporarily shutting down Storm and using nc -l 4321 on the supervisor machines to verify your proxy is working as expected.
However...
You may have a fundamental flaw in your model. Storm's spouts are pull-based, so it seems odd to have incoming requests pushed to them. This is possible, of course, if you have your spouts start listening when they spin up and simply queue the requests (see the sketch below). However, this presents another challenge for your model: you will likely have multiple spouts running on a single machine, and they cannot share the same port (4321).
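As a very rough sketch of that listen-and-queue idea (class name, field and port handling are made up; packages assume Apache Storm 2.x, older releases use backtype.storm and a raw Map in open()):

import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

// Push-to-pull adapter: a listener thread enqueues incoming requests,
// and Storm drains the queue by calling nextTuple() repeatedly.
public class QueueingRequestSpout extends BaseRichSpout {
    private transient LinkedBlockingQueue<String> pending;
    private transient SpoutOutputCollector collector;

    @Override
    public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.pending = new LinkedBlockingQueue<>();
        // Start a listener thread here (e.g. a ServerSocket accept loop)
        // that calls pending.offer(request) for every incoming request.
    }

    @Override
    public void nextTuple() {
        String request = pending.poll(); // non-blocking, as required by the spout contract
        if (request != null) {
            collector.emit(new Values(request));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("request"));
    }
}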
If you want to meld these two worlds of push and pull, then consider using a Kafka spout.

JBOSS messaging replicated queue

I am using JBOSS messaging in the following way:
1) Two JBOSS instances using the 'all' config (i.e. clustered config)
2) One replicated queue created on each JBOSS instance with the same JNDI name (clustered = true)
3) One producer attached locally to the queue on each instance (i.e. the producers on both nodes keep adding messages to this replicated queue)
4) One JBOSS instance is marked as the "consumer node", and a queue message consumer is started only on this node (i.e. messages will be consumed on only one node). There is logic that decides which JBOSS instance is marked as the "consumer node"
5) The PostOffice used is clustered
6) The server peer is configured not to enforce message sequencing
7) Produced messages are non-persistent (deliveryMode = NON_PERSISTENT)
But I am facing a problem with this. Messages produced on the "non-consumer node" do not get replicated to the queue on the "consumer node" and hence are not available for consumption.
I enabled logging and saw that the post office finds two queues but only delivers to the local queue, as it discovers that the remote queue is recoverable.
Any idea how to get this working?
FYI: I believe a message can be delivered to only one queue (local or remote). So I want only one distributed queue, but I am currently getting two different distributed queues (although their JNDI name is the same). Is this a problem? If yes, how do I solve it? WebLogic provides the option of creating a queue on the admin server, so a shared queue is possible there. What is the similar mechanism in JBOSS messaging? Or should I approach this as two queues that need to be synchronized? If so, how do I achieve synchronization between them?
Thanks for taking out some time to help me!
Regards