RabbitMQ - design re-try mechanism - mongodb

Whenever I'm not able to process events/messages that are in RabbitMQ, I am storing into MongoDB for automated and/or manual re-try. Question is how to enable automated re-try mechanism from MongoDB. How effectively can I listen to MongoDB? Is it good to store failed events/messages in MongoDB upfront? Or should I create error queue where I can listen to failed events/messages push the messages to MongoDB for manual re-try whenever automated re-try fails? Any other suggestions?
My intention is to design effective re-try mechanism for failed RabbitMQ events/messages

This is a typical use case for dead-letter exchanges.
Any queue can be associated with a dead-letter exchange. A consumer which is unable to process a message for whatever reason it can reject the given message. The message will be routed to the dead-letter exchange which works as a regular one. You can therefore apply any routing policy on the dead-lettered message.

Related

How to handle kafka consumer failures

I am trying understand how to handle failed consumer records. How to
we know there is record failure. What I am seeing is when the record
processing failed in the consumer with runtime exception consumer is
keep on retrying. But when the next record is available to process it
is commiting offset of the latest record, which is expected. My
question how to we know about failed record. In older messaging
systems failed messages are rolled back to queues and processing stops
there. Then we know the queue is down and we can take action.
I can record the failed record into some db table,but what happens if this recording fails?
I can move failures to error/ dead letter queues, again what happens if this moving fails?
I am using kafka 2.6 with spring boot 2.3.4. Any help would be appreciated
Sounds like you would need to disable auto commits and manually commit the offsets yourself when your scope of "sucessfully processed" is achieved. If you include external processes like a database, then you will also need to increase Kafka client timeouts so it doesnt think the consumer is dead while waiting on error logging/handling.

Ressource announcement - Message queing registry on Kubernetes

I have an application which makes use of RabbitMQ messages - it sends messages. Other applications can react on these messages but they need to know which messages are available on the system and what they semantically mean.
My message queuing system is RabbitMQ and RabbitMQ as well as the applications are hosted and administered using Kubernetes.
I am aware of
https://kubemq.io/: That seems to be an alternative to RabbitMQ?
https://knative.dev/docs/eventing/event-registry/ also an alternative to RabbitMQ? but with a meta-layer approach to integrate existing event sources? The documentation is not clear for me.
Is there a general-purpose "MQ-interface service availabe, a solution, where I can register which messages are sent by an application, how the payload is technically and semantically set up, which serialization format is used and under what circumstances errors will be sent?
Can I do this in Kubernetes YAML-files?
RabbitMQ does not have this in any kind of generic fashion so you would have to write it yourself. Rabbit messages are just a series of bytes, there is no schema registry.

Synchronization mode of ha replication

Version : ActiveMQ Artemis 2.10.1
When we use ha-policy and replication, is the synchronization mode between the master and the slave full synchronization? Can we choose full synchronization or asynchronization?
I'm not 100% certain what you mean by "full synchronization" so I'll just explain how the brokers behave...
When a master broker receives a durable (i.e. persistent) message it will write the message to disk and send the message to the slave in parallel. The broker will then wait for the local disk write operation to complete as well as receive a response from the slave that it accepted the message before it responds to the client who originally sent the message.
This behavior is not configurable.

How to add health check for topics in KafkaStreams api

I have a critical Kafka application that needs to be up and running all the time. The source topics are created by debezium kafka connect for mysql binlog. Unfortunately, many things can go wrong with this setup. A lot of times debezium connectors fail and need to be restarted, so does my apps then (because without throwing any exception it just hangs up and stops consuming). My manual way of testing and discovering the failure is checking kibana log, then consume the suspicious topic through terminal. I can mimic this in code but obviously no way the best practice. I wonder if there is the ability in KafkaStream api that allows me to do such health check, and check other parts of kafka cluster?
Another point that bothers me is if I can keep the stream alive and rejoin the topics when connectors are up again.
You can check the Kafka Streams State to see if it is rebalancing/running, which would indicate healthy operations. Although, if no data is getting into the Topology, I would assume there would be no errors happening, so you need to then lookup the health of your upstream dependencies.
Overall, sounds like you might want to invest some time into using monitoring tools like Consul or Sensu which can run local service health checks and send out alerts when services go down. Or at the very least Elasticseach alerting
As far as Kafka health checking goes, you can do that in several ways
Is the broker and zookeeper process running? (SSH to the node, check processes)
Is the broker and zookeeper ports open? (use Socket connection)
Are there important JMX metrics you can track? (Metricbeat)
Can you find an active Controller broker (use AdminClient#describeCluster)
Are there a required minimum number of brokers you would like to respond as part of the Controller metadata (which can be obtained from AdminClient)
Are the topics that you use having the proper configuration? (retention, min-isr, replication-factor, partition count, etc)? (again, use AdminClient)

NServiceBus and remote input queues with an MSMQ cluster

We would like to use the Publish / Subscribe abilities of NServiceBus with an MSMQ cluster. Let me explain in detail:
We have an SQL Server cluster that also hosts the MSMQ cluster. Besides SQL Server and MSMQ we cannot host any other application on this cluster. This means our subscriber is not allowed to run on the clsuter.
We have multiple application servers hosting different types of applications (going from ASP.NET MVC to SharePoint Server 2010). The goal is to do a pub/sub between all these applications.
All messages going through the pub/sub are critical and have an important value to the business. That's why we don't want local queues on the application server, but we want to use MSMQ on the cluster (in case we lose one of the application servers, we don't risk losing the messages since they are safe on the cluster).
Now I was assuming I could simply do the following at the subscriber side:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<configSections>
<section name="MsmqTransportConfig" type="NServiceBus.Config.MsmqTransportConfig, NServiceBus.Core" />
....
</configSections>
<MsmqTransportConfig InputQueue="myqueue#server" ErrorQueue="myerrorqueue"
NumberOfWorkerThreads="1" MaxRetries="5" />
....
</configuration>
I'm assuming this used to be supported seen the documentation: http://docs.particular.net/nservicebus/messaging/publish-subscribe/
But this actually throws an exception:
Exception when starting endpoint, error has been logged. Reason:
'InputQueue' entry in 'MsmqTransportConfig' section is obsolete. By
default the queue name is taken from the class namespace where the
configuration is declared. To override it, use .DefineEndpointName()
with either a string parameter as queue name or Func parameter
that returns queue name. In this instance, 'myqueue#server' is defined
as queue name.
Now, the exception clearly states I should use the DefineEndpointName method:
Configure.With()
.DefaultBuilder()
.DefineEndpointName("myqueue#server")
But this throws an other exception which is documented (input queues should be on the same machine):
Exception when starting endpoint, error has been logged. Reason: Input
queue must be on the same machine as this process.
How can I make sure that my messages are safe if I can't use MSMQ on my cluster?
Dispatcher!
Now I've also been looking into the dispatcher for a bit and this doesn't seem to solve my issue either. I'm assuming also the dispatcher wouldn't be able to get messages from a remote input queue? And besides that, if the dispatcher dispatches messages to the workers, and the workers go down, my messages are lost (even though they were not processed)?
Questions?
To summarize, these are the things I'm wondering with my scenario in NServiceBus:
I want my messages to be safe on the MSMQ cluster and use a remote input queue. Is this something is should or shouldn't do? Is it possible with NServiceBus?
Should I use a dispatcher in this case? Can it read from a remote input queue? (I cannot run the dispatcher on the cluster)
What if the dispatcher dispatchers messages to the workers and one of the workers goes down? Do I lose the message(s) that were being processed?
Phill's comment is correct.
The thing is that you would get the type of fault tolerance you require practically by default if you set up a virtualized environment. In that case, the C drive backing the local queue of your processes is actually sitting on the VM image on your SAN.
You will need a MessageEndpointMappings section that you will use to point to the Publisher's input queue. This queue is used by your Subscriber to drop off subscription messages. This will need to be QueueName#ClusterServerName. Be sure to use the cluster name and not a node name. The Subscriber's input queue will be used to receive messages from the Publisher and that will be local, so you don't need the #servername.
There are 2 levels of failure, one is that the transport is down(say MSMQ) and the other is that the endpoint is down(Windows Service). In the event that the endpoint is down, the transport will handle persisting the messages to disk. A redundant network storage device may be in order.
In the event that the transport is down, assuming it is MSMQ the messages will backup on the Publisher side of things. Therefore you have to account for the size and number of messages to calculate how long you want to messages to backup for. Since the Publisher is clustered, you can be assured that the messages will arrive eventually assuming you planned your disk appropriately.