ActiveMQ Artemis: full disk policy - activemq-artemis

I'm using ActiveMQ Artemis 2.17. I've set up the address full message policy to PAGING and ran some crash tests. When the disk is full, I'm getting the message "AMQ222212: Disk Full! Blocking message production on address...". Is there a setting to raise an error to the producers instead of blocking them?
Regards
Nicolas

When the disk is full, the only option is to block. However, if the protocol doesn't support flow control, an exception will be thrown instead. This is noted in the documentation.
For what it's worth, individual addresses can be configured to behave differently when they reach max-size-bytes. One of those options is to return a failure to producers, e.g.:
<address-full-policy>FAIL</address-full-policy>
See the documentation for more details.
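For context, here is a minimal broker.xml sketch of such an address-setting. The address match and the size limit are illustrative placeholders, not values from the original post:

```xml
<address-settings>
   <!-- "#" matches all addresses; narrow the match as needed -->
   <address-setting match="#">
      <!-- once roughly 100 MiB of messages accumulate for an address... -->
      <max-size-bytes>104857600</max-size-bytes>
      <!-- ...fail incoming messages instead of paging or blocking producers -->
      <address-full-policy>FAIL</address-full-policy>
   </address-setting>
</address-settings>
```

Note this applies per address once max-size-bytes is reached; it does not change the global disk-full behavior described above.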

Related

Messages are stuck in ActiveMQ Artemis cluster queues

We have a problem with Apache ActiveMQ Artemis cluster queues. Sometimes messages begin to pile up in particular cluster queues. This usually happens 1-4 times per day, mostly on production (it has happened only once in the last 90 days on one of the test environments).
These messages are not delivered to consumers on other cluster brokers until we restart cluster connector (or entire broker).
The problem looks related to ARTEMIS-3809.
Our setup is: 6 servers in one environment (3 pairs of master/backup servers). Operating system is Linux (Red Hat).
We have tried to:
upgrade from 2.22.0 to 2.23.1
increase minLargeMessageSize on the cluster connectors to 1024000
The messages are still being stuck in the cluster queues.
Another problem: I tried to configure min-large-message-size as written in the documentation (in cluster-connection), but it caused errors at startup (broker.xml did not pass XSD validation), so the only option was to specify minLargeMessageSize in the URL parameters of the connector for each cluster broker. I don't know if this setting has any effect.
So we had to write a script that checks whether messages are stuck in the cluster queues and restarts the cluster connector.
How can we debug this situation?
When the messages are stuck, nothing wrong is written to the log (no errors, no stacktraces etc.).
For which classes should we enable debug or trace logging to find out what is happening with the cluster connectors?
I believe you can remedy the situation by setting this on your cluster-connection:
<producer-window-size>-1</producer-window-size>
See ARTEMIS-3805 for more details.
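As a sketch of where this setting lives in broker.xml (the cluster, connector, and discovery-group names here are placeholders for your own configuration):

```xml
<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <!-- -1 disables producer flow control on the cluster bridge,
           which works around the large-message hang tracked in ARTEMIS-3805 -->
      <producer-window-size>-1</producer-window-size>
      <discovery-group-ref discovery-group-name="my-discovery-group"/>
   </cluster-connection>
</cluster-connections>
```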
Generally speaking, moving messages around the cluster via the cluster-connection, while convenient, isn't terribly efficient (much less so for "large" messages). Ideally you would have a sufficient number of clients on each node to consume the messages that were originally produced there. If you don't have that many clients, then you may want to re-evaluate the size of your cluster, as it may actually decrease overall message throughput rather than increase it.
If you're just using 3 HA pairs in order to establish a quorum for replication, then you should investigate the recently added pluggable quorum voting, which allows integration with a 3rd-party component (e.g. ZooKeeper) for leader election, eliminating the need for a quorum of brokers.

Kafka - Error on specific consumer -Broker not available

We have deployed multiple Kafka consumers in container clusters. All are working properly except for one, which is throwing the warning "Connection to node 0 could not be established. Broker may not be available". However, this warning appears in only one of the containers, and this consumer is running on the same network and server as the others, so I have ruled out issues with the Kafka server configuration.
I tried changing the group id of the consumer and got it working for some minutes, but now the warning is appearing again. I can consume all the topics used by this consumer from a bash shell.
Taking the above context into account, I think it could be due to bad practices in the consumer code, or possibly to corrupted offsets. How could I identify problems of this kind using the Kafka logs?
You can exec into the container and netcat the broker's advertised addresses to verify connectivity.
You can also use the Kafka shell scripts to verify consuming functionality, as always.
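A rough sketch of the connectivity check, using bash's built-in /dev/tcp instead of netcat in case nc isn't installed in the container. The host and port below are placeholders for the broker's advertised address:

```shell
# Probe TCP reachability of a Kafka broker's advertised listener.
check_broker() {
  host="$1"; port="$2"
  # timeout guards against a silently dropped SYN (e.g. a firewall)
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "broker ${host}:${port} reachable"
  else
    echo "broker ${host}:${port} unreachable"
  fi
}

check_broker localhost 9092
```

If TCP is reachable, running kafka-console-consumer.sh from inside the same container (not just from the host) confirms the consume path end-to-end.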
Corrupted offsets would prevent any consumer from reading, not only one. Bad code practices wouldn't show up in logs.
If the container is running "on the same server as the others", I'd suggest using affinity rules and constraints to spread your applications across multiple servers rather than placing them on the same machine.

Consuming # subscription via MQTT causes huge queue on lost connection

I am using Artemis 2.14 and Java 14.0.2 on two Ubuntu 18.04 VMs with 4 cores and 16 GB RAM. My producers send approximately 2,000 messages per second to approximately 5,500 different topics.
When I connect via the MQTT.FX client with certificate-based authorization and subscribe to #, the MQTT.FX client dies after some time, and in the web console I see a queue under # with my client ID that won't be cleared by Artemis. This queue seems to grow until RAM is 100% used, and after some time my Artemis broker restarts itself.
Is this behaviour of Artemis normal? How can I tell Artemis to clean up "zombie" queues after some time?
I have already tried using these configuration parameters in different ways, but nothing works:
confirmationWindowSize=0
clientFailureCheckPeriod=30000
consumerWindowSize=0
Auto-created queues are automatically removed by default when:
consumer-count is 0
message-count is 0
This is done so that no messages are inadvertently deleted.
However, you can change the corresponding auto-delete-queues-message-count address-setting in broker.xml to -1 to skip the message-count check. You can also adjust auto-delete-queues-delay to configure a delay if needed.
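As a sketch in broker.xml (the address match and the delay value are illustrative placeholders):

```xml
<address-settings>
   <address-setting match="#">
      <!-- -1: delete auto-created queues even if they still hold messages -->
      <auto-delete-queues-message-count>-1</auto-delete-queues-message-count>
      <!-- wait 5 minutes (300,000 ms) after the queue becomes eligible
           for deletion before actually removing it -->
      <auto-delete-queues-delay>300000</auto-delete-queues-delay>
   </address-setting>
</address-settings>
```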
It's worth noting that if you create a subscription like # (which is fairly dangerous) you need to be prepared to consume the messages as quickly as they are produced to avoid accumulation of messages in the queue. If accumulation is unavoidable then you should configure the max-size-bytes and address-full-policy according to your needs so the broker doesn't get overwhelmed. See the documentation for more details on that.

Diverts deleted when restarting ActiveMQ Artemis

I have searched and haven't found anything about this online nor in the manual.
I have set up addresses and use both multicast to several queues and anycast to a single queue (all durable queues). To these I have connected diverts created via the management API at runtime.
The diverts work great when sending messages. BUT when I restart the ActiveMQ Artemis instance, the diverts are deleted. Everything else is in place; just the diverts are deleted.
Any ideas on how to keep diverts after a restart?
Diverts created via the management API at runtime are volatile. If you wish to have a divert which survives a broker restart, you should add the desired divert configuration to broker.xml.
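For example, a sketch of a divert in broker.xml (the divert name and addresses here are placeholders for your own):

```xml
<diverts>
   <divert name="my-divert">
      <!-- messages arriving on this address... -->
      <address>source.address</address>
      <!-- ...are forwarded to this one -->
      <forwarding-address>target.address</forwarding-address>
      <!-- false: non-exclusive, so the original message is still
           routed to the source address as well -->
      <exclusive>false</exclusive>
   </divert>
</diverts>
```

Diverts defined this way are part of the broker's static configuration, so they survive restarts.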
Of course, the current behavior may not work for your use-case. If that's true then I would encourage you to open a "Feature Request" JIRA at the Artemis JIRA project. Furthermore, if you're really committed to seeing the behavior change, you can download the code, make the necessary changes, and submit a pull request (or attach a patch to the JIRA). Check out the Artemis Hacking Guide for help getting started.

NServiceBus and remote input queues with an MSMQ cluster

We would like to use the Publish / Subscribe abilities of NServiceBus with an MSMQ cluster. Let me explain in detail:
We have an SQL Server cluster that also hosts the MSMQ cluster. Besides SQL Server and MSMQ we cannot host any other application on this cluster. This means our subscriber is not allowed to run on the cluster.
We have multiple application servers hosting different types of applications (going from ASP.NET MVC to SharePoint Server 2010). The goal is to do a pub/sub between all these applications.
All messages going through the pub/sub are critical and have an important value to the business. That's why we don't want local queues on the application server, but we want to use MSMQ on the cluster (in case we lose one of the application servers, we don't risk losing the messages since they are safe on the cluster).
Now I was assuming I could simply do the following at the subscriber side:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<configSections>
<section name="MsmqTransportConfig" type="NServiceBus.Config.MsmqTransportConfig, NServiceBus.Core" />
....
</configSections>
<MsmqTransportConfig InputQueue="myqueue#server" ErrorQueue="myerrorqueue"
NumberOfWorkerThreads="1" MaxRetries="5" />
....
</configuration>
I'm assuming this used to be supported, given the documentation: http://docs.particular.net/nservicebus/messaging/publish-subscribe/
But this actually throws an exception:
Exception when starting endpoint, error has been logged. Reason:
'InputQueue' entry in 'MsmqTransportConfig' section is obsolete. By
default the queue name is taken from the class namespace where the
configuration is declared. To override it, use .DefineEndpointName()
with either a string parameter as queue name or Func parameter
that returns queue name. In this instance, 'myqueue#server' is defined
as queue name.
Now, the exception clearly states I should use the DefineEndpointName method:
Configure.With()
.DefaultBuilder()
.DefineEndpointName("myqueue#server")
But this throws another exception, which is documented (input queues should be on the same machine):
Exception when starting endpoint, error has been logged. Reason: Input
queue must be on the same machine as this process.
How can I make sure that my messages are safe if I can't use MSMQ on my cluster?
Dispatcher!
Now I've also been looking into the dispatcher for a bit, and this doesn't seem to solve my issue either. I'm assuming the dispatcher also wouldn't be able to get messages from a remote input queue? And besides that, if the dispatcher dispatches messages to the workers and the workers go down, my messages are lost (even though they were not processed)?
Questions?
To summarize, these are the things I'm wondering with my scenario in NServiceBus:
I want my messages to be safe on the MSMQ cluster and use a remote input queue. Is this something I should or shouldn't do? Is it possible with NServiceBus?
Should I use a dispatcher in this case? Can it read from a remote input queue? (I cannot run the dispatcher on the cluster)
What if the dispatcher dispatches messages to the workers and one of the workers goes down? Do I lose the message(s) that were being processed?
Phill's comment is correct.
The thing is that you would get the type of fault tolerance you require practically by default if you set up a virtualized environment. In that case, the C drive backing the local queue of your processes is actually sitting on the VM image on your SAN.
You will need a MessageEndpointMappings section that you will use to point to the Publisher's input queue. This queue is used by your Subscriber to drop off subscription messages. This will need to be QueueName#ClusterServerName. Be sure to use the cluster name and not a node name. The Subscriber's input queue will be used to receive messages from the Publisher and that will be local, so you don't need the #servername.
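A sketch of what that mapping might look like in the Subscriber's app.config; the assembly name, queue name, and cluster name below are placeholders:

```xml
<UnicastBusConfig>
  <MessageEndpointMappings>
    <!-- Route subscription messages for these message types to the
         Publisher's input queue on the cluster (cluster name, not a node name) -->
    <add Messages="MyCompany.Messages" Endpoint="PublisherInputQueue#ClusterServerName" />
  </MessageEndpointMappings>
</UnicastBusConfig>
```

The Subscriber's own input queue needs no #servername suffix, since it is local to the machine the Subscriber runs on.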
There are 2 levels of failure: one is that the transport is down (say MSMQ), and the other is that the endpoint is down (Windows Service). In the event that the endpoint is down, the transport will handle persisting the messages to disk. A redundant network storage device may be in order.
In the event that the transport is down (assuming it is MSMQ), the messages will back up on the Publisher side of things. Therefore you have to account for the size and number of messages to calculate how long you can allow messages to back up for. Since the Publisher is clustered, you can be assured that the messages will arrive eventually, assuming you planned your disk capacity appropriately.