Artemis HA configuration for core bridges

Artemis HA configuration for core bridges - activemq-artemis

We have 4 servers in two shared disk HA pairs, with core bridges between them. The core bridge configuration and the connectors they use (sms and sms1b) are identical on all 4 servers. The only differences being master vs slave ha, and the host names in the other fields (acceptor, artemis and node0 connector, name)
In testing we found the bridge works perfectly when the two lives are up, but sometimes when shutting down a live server, the backup never opens a consumer for the bridge.
Is this the intended way to configure a pair of HA servers with a core bridge, or is the backup server configured wrong?
<configuration xmlns="urn:activemq"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xi="http://www.w3.org/2001/XInclude"
xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:activemq:core ">
<name>ba-artms3.example.com</name>
<security-enabled>false</security-enabled>
<persistence-enabled>true</persistence-enabled>
<paging-directory>/data/ba_artemis/msg-sms1/paging</paging-directory>
<bindings-directory>/data/ba_artemis/msg-sms1/bindings</bindings-directory>
<journal-directory>/data/ba_artemis/msg-sms1/journal</journal-directory>
<large-messages-directory>/data/ba_artemis/msg-sms1/large-messages</large-messages-directory>
<journal-datasync>true</journal-datasync>
<journal-min-files>2</journal-min-files>
<journal-pool-files>10</journal-pool-files>
<journal-device-block-size>4096</journal-device-block-size>
<journal-file-size>10M</journal-file-size>
<journal-buffer-timeout>132000</journal-buffer-timeout>
<journal-max-io>4096</journal-max-io>
<connectors>
<connector name="artemis">tcp://ba-artms3.example.com:2539</connector>
<connector name = "node0">tcp://ba-artms4.example.com:2539</connector>
<connector name="sms1">(tcp://ba-artms3.example.com:61616,tcp://ba-artms4.example.com:61616)</connector>
<connector name="sms1b">(tcp://ba-artms9.example.com:61616,tcp://ba-artms10.example.com:61616)</connector>
</connectors>
<disk-scan-period>5000</disk-scan-period>
<max-disk-usage>90</max-disk-usage>
<critical-analyzer>true</critical-analyzer>
<critical-analyzer-timeout>120000</critical-analyzer-timeout>
<critical-analyzer-check-period>60000</critical-analyzer-check-period>
<critical-analyzer-policy>HALT</critical-analyzer-policy>
<page-sync-timeout>620000</page-sync-timeout>
<acceptors>
<acceptor name="artemis">tcp://ba-artms3.example.com:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true;</acceptor>
<acceptor name="cluster">tcp://ba-artms3.example.com:2539?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;protocols=CORE;useEpoll=true</acceptor>
<acceptor name="amqp">tcp://ba-artms3.example.com:5672?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpMinLargeMessageSize=102400;amqpDuplicateDetection=true</acceptor>
</acceptors>
<cluster-user>msg-sms1-cluster</cluster-user>
<cluster-password>redacted</cluster-password>
<cluster-connections>
<cluster-connection name="msg-sms1">
<connector-ref>artemis</connector-ref>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>0</max-hops>
<static-connectors>
<connector-ref>node0</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
<ha-policy>
<shared-store>
<master>
<failover-on-shutdown>true</failover-on-shutdown>
</master>
</shared-store>
</ha-policy>
<security-settings>
<security-setting match="#">
<permission type="createNonDurableQueue" roles="amq"/>
<permission type="deleteNonDurableQueue" roles="amq"/>
<permission type="createDurableQueue" roles="amq"/>
<permission type="deleteDurableQueue" roles="amq"/>
<permission type="createAddress" roles="amq"/>
<permission type="deleteAddress" roles="amq"/>
<permission type="consume" roles="amq"/>
<permission type="browse" roles="amq"/>
<permission type="send" roles="amq"/>
<!-- we need this otherwise ./artemis data imp wouldn't work -->
<permission type="manage" roles="amq"/>
</security-setting>
</security-settings>
<address-settings>
<address-setting match="activemq.management#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
</address-settings>
<xi:include href="${configDir}/addresses.xml"/>
<bridges>
<bridge name="sms1_forwarder">
<queue-name>UpdateOutboundForward_0</queue-name>
<forwarding-address>UpdateOutbound</forwarding-address>
<ha>true</ha>
<failover-on-server-shutdown>true</failover-on-server-shutdown>
<user>rave</user>
<password>redacted</password>
<static-connectors>
<connector-ref>sms1</connector-ref>
</static-connectors>
</bridge>
<bridge name="sms1b_forwarder">
<queue-name>UpdateOutboundForward_1</queue-name>
<forwarding-address>UpdateOutbound</forwarding-address>
<ha>true</ha>
<failover-on-server-shutdown>true</failover-on-server-shutdown>
<user>rave</user>
<password>redacted</password>
<static-connectors>
<connector-ref>sms1b</connector-ref>
</static-connectors>
</bridge>
</bridges>
</core>
</configuration>
Keep in mind that the acceptor on port 2539 is specifically used for clustering. There are 4 servers total: ba-artms3 (live), ba-artms4 (slave) & ba-artms9 (live), ba-artms10 (slave).

Your configuration looks essentially correct, but it's hard to tell with so many moving pieces - especially the extra acceptor for clustering. I've seen folks do that before, but it's not something I've ever tested or recommended so I'm not sure how it would function in practice. Theoretically it would be fine, but there are always complicating factors many of which are subtle.
It would be worth simplifying your configuration to only what's absolutely necessary to reproduce the problem. For example, configure just 1 bridge on 1 server connecting to a live/backup pair with all the brokers on your local machine on unique ports (i.e. no docker). Once you have that working you can keeping adding complexity and testing as you go to see where things break down (assuming they do).

The solution that worked for me was using one connector per server instead of the bracket and comma syntax:
<connectors>
<connector name="artemis">tcp://ba-artms3.example.com:2539</connector>
<connector name = "node0">tcp://ba-artms4.example.com:2539</connector>
<connector name="sms1_1">tcp://ba-artms3.example.com:2539</connector>
<connector name="sms1_2">tcp://ba-artms4.example.com:2539</connector>
<connector name="sms1b_1">tcp://ba-artms9.example.com:2539</connector>
<connector name="sms1b_2">tcp://ba-artms10.example.com:2539</connector>
</connectors>
And then listing both of them in the list:
<bridge name="sms1b_forwarder">
<queue-name>UpdateOutboundForward_1</queue-name>
<forwarding-address>UpdateOutbound</forwarding-address>
<ha>true</ha>
<failover-on-server-shutdown>true</failover-on-server-shutdown>
<user>rave</user>
<password>redacted</password>
<static-connectors>
<connector-ref>sms1b_1</connector-ref>
<connector-ref>sms1b_2</connector-ref>
</static-connectors>
</bridge>
After that in the log instead of seeing:
2021-07-20 09:13:51,809 WARN [org.apache.activemq.artemis.core.server] AMQ224091: Bridge BridgeImpl#5886a172 [name=sms1b_forwarder, queue=QueueImpl[name=UpdateOutboundForward_1, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=13be52e3-cb0f-11eb-a851-000c29d5fa03], temp=false]#56414412 targetConnector=ServerLocatorImpl (identity=Bridge sms1b_forwarder) [initialConnectors=[TransportConfiguration(name=sms1b, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=ba-artms9-example-com], discoveryGroupConfiguration=null]] is unable to connect to destination. Retrying
2021-07-20 09:13:51,880 INFO [org.apache.activemq.artemis.core.server] AMQ221027: Bridge BridgeImpl#983e222 [name=sms1_forwarder, queue=QueueImpl[name=UpdateOutboundForward_0, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=13be52e3-cb0f-11eb-a851-000c29d5fa03], temp=false]#7440d62 targetConnector=ServerLocatorImpl (identity=Bridge sms1_forwarder) [initialConnectors=[TransportConfiguration(name=sms1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=ba-artms3-example-com], discoveryGroupConfiguration=null]] is connected
I was seeing:
2021-07-20 09:30:40,333 INFO [org.apache.activemq.artemis.core.server] AMQ221027: Bridge BridgeImpl#339a572d [name=sms1b_forwarder, queue=QueueImpl[name=UpdateOutboundForward_1, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=13be52e3-cb0f-11eb-a851-000c29d5fa03], temp=false]#3d5daf2e targetConnector=ServerLocatorImpl (identity=Bridge sms1b_forwarder) [initialConnectors=[TransportConfiguration(name=sms1b_1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=ba-artms9-example-com, TransportConfiguration(name=sms1b_2, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=61616&host=ba-artms10-example-com], discoveryGroupConfiguration=null]] is connected
2021-07-20 09:30:42,206 INFO [org.apache.activemq.artemis.core.server] AMQ221027: Bridge BridgeImpl#47c22520 [name=sms1_forwarder, queue=QueueImpl[name=UpdateOutboundForward_0, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=13be52e3-cb0f-11eb-a851-000c29d5fa03], temp=false]#4905c3a8 targetConnector=ServerLocatorImpl (identity=Bridge sms1_forwarder) [initialConnectors=[TransportConfiguration(name=sms1_1, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=2539&host=ba-artms3-example-com, TransportConfiguration(name=sms1_2, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=2539&host=ba-artms4-example-com], discoveryGroupConfiguration=null]] is connected
Note both servers now showing up in the initialConnectors.

Related

Red Hat AMQ broker errors in HA cluster configuration

I have Red Hat AMQ broker 7.4.1 HA cluster pair configured using shared storage over NFS with static discovery cluster.
Master's broker.xml:
<?xml version='1.0'?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<configuration xmlns="urn:activemq"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xi="http://www.w3.org/2001/XInclude"
xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:activemq:core ">
<name>0.0.0.0</name>
<persistence-enabled>true</persistence-enabled>
<journal-type>ASYNCIO</journal-type>
<paging-directory>/Share/JBossMQ7.4/data/paging</paging-directory>
<bindings-directory>/Share/JBossMQ7.4/data/bindings</bindings-directory>
<journal-directory>/Share/JBossMQ7.4/data/journal</journal-directory>
<large-messages-directory>/Share/JBossMQ7.4/data/large-messages</large-messages-directory>
<journal-datasync>true</journal-datasync>
<journal-min-files>100</journal-min-files>
<journal-pool-files>250</journal-pool-files>
<journal-file-size>10M</journal-file-size>
<!--
This value was determined through a calculation.
Your system could perform 62.5 writes per millisecond
on the current journal configuration.
That translates as a sync write every 16000 nanoseconds.
Note: If you specify 0 the system will perform writes directly to the disk.
We recommend this to be 0 if you are using journalType=MAPPED and journal-datasync=false.
-->
<journal-buffer-timeout>16000</journal-buffer-timeout>
<!--
When using ASYNCIO, this will determine the writing queue depth for libaio.
-->
<journal-max-io>4096</journal-max-io>
<!--
You can verify the network health of a particular NIC by specifying the <network-check-NIC> element.
<network-check-NIC>theNicName</network-check-NIC>
-->
<!--
Use this to use an HTTP server to validate the network
<network-check-URL-list>http://www.apache.org</network-check-URL-list> -->
<!-- <network-check-period>10000</network-check-period> -->
<!-- <network-check-timeout>1000</network-check-timeout> -->
<!-- this is a comma separated list, no spaces, just DNS or IPs
it should accept IPV6
Warning: Make sure you understand your network topology as this is meant to validate if your network is valid.
Using IPs that could eventually disappear or be partially visible may defeat the purpose.
You can use a list of multiple IPs, and if any successful ping will make the server OK to continue running -->
<!-- <network-check-list>10.0.0.1</network-check-list> -->
<!-- use this to customize the ping used for ipv4 addresses -->
<!-- <network-check-ping-command>ping -c 1 -t %d %s</network-check-ping-command> -->
<!-- use this to customize the ping used for ipv6 addresses -->
<!-- <network-check-ping6-command>ping6 -c 1 %2$s</network-check-ping6-command> -->
<!-- how often we are looking for how many bytes are being used on the disk in ms -->
<disk-scan-period>5000</disk-scan-period>
<!-- once the disk hits this limit the system will block, or close the connection in certain protocols
that won't support flow control. -->
<max-disk-usage>95</max-disk-usage>
<!-- should the broker detect dead locks and other issues -->
<critical-analyzer>true</critical-analyzer>
<critical-analyzer-timeout>240000</critical-analyzer-timeout>
<critical-analyzer-check-period>60000</critical-analyzer-check-period>
<critical-analyzer-policy>HALT</critical-analyzer-policy>
<!-- the system will enter into page mode once you hit this limit.
This is an estimate in bytes of how much the messages are using in memory
The system will use half of the available memory (-Xmx) by default for the global-max-size.
You may specify a different value here if you need to customize it to your needs.
<global-max-size>100Mb</global-max-size>
-->
<ha-policy>
<shared-store>
<master>
<failover-on-shutdown>true</failover-on-shutdown>
</master>
</shared-store>
</ha-policy>
<connectors>
<connector name="master-connector">tcp://server24:61616</connector>
<connector name="slave-connector">tcp://server25:61616</connector>
</connectors>
<acceptors>
<!-- useEpoll means: it will use Netty epoll if you are on a system (Linux) that supports it -->
<!-- amqpCredits: The number of credits sent to AMQP producers -->
<!-- amqpLowCredits: The server will send the # credits specified at amqpCredits at this low mark -->
<!-- Note: If an acceptor needs to be compatible with HornetQ and/or Artemis 1.x clients add
"anycastPrefix=jms.queue.;multicastPrefix=jms.topic." to the acceptor url.
See https://issues.apache.org/jira/browse/ARTEMIS-1644 for more information. -->
<acceptor name="netty-acceptor">tcp://server24:61616</acceptor>
</acceptors>
<cluster-user>admin</cluster-user>
<cluster-password>admin</cluster-password>
<cluster-connections>
<cluster-connection name="static-cluster">
<connector-ref>master-connector</connector-ref>
<static-connectors>
<connector-ref>slave-connector</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
<security-settings>
<security-setting match="#">
<permission type="createNonDurableQueue" roles="amq"/>
<permission type="deleteNonDurableQueue" roles="amq"/>
<permission type="createDurableQueue" roles="amq"/>
<permission type="deleteDurableQueue" roles="amq"/>
<permission type="createAddress" roles="amq"/>
<permission type="deleteAddress" roles="amq"/>
<permission type="consume" roles="amq"/>
<permission type="browse" roles="amq"/>
<permission type="send" roles="amq"/>
<!-- we need this otherwise ./artemis data imp wouldn't work -->
<permission type="manage" roles="amq"/>
</security-setting>
</security-settings>
<address-settings>
<!-- if you define auto-create on certain queues, management has to be auto-create -->
<address-setting match="activemq.management#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
<address-setting match="Claim">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<max-delivery-attempts>-1</max-delivery-attempts>
<redelivery-delay>1000</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
<!--default for catch all-->
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
</address-settings>
<addresses>
<address name="DLQ">
<anycast>
<queue name="DLQ" />
</anycast>
</address>
<address name="ExpiryQueue">
<anycast>
<queue name="ExpiryQueue" />
</anycast>
</address>
</addresses>
<!-- Uncomment the following if you want to use the Standard LoggingActiveMQServerPlugin pluging to log in events
<broker-plugins>
<broker-plugin class-name="org.apache.activemq.artemis.core.server.plugin.impl.LoggingActiveMQServerPlugin">
<property key="LOG_ALL_EVENTS" value="true"/>
<property key="LOG_CONNECTION_EVENTS" value="true"/>
<property key="LOG_SESSION_EVENTS" value="true"/>
<property key="LOG_CONSUMER_EVENTS" value="true"/>
<property key="LOG_DELIVERING_EVENTS" value="true"/>
<property key="LOG_SENDING_EVENTS" value="true"/>
<property key="LOG_INTERNAL_EVENTS" value="true"/>
</broker-plugin>
</broker-plugins>
-->
</core>
</configuration>
Slave's broker.xml:
<?xml version='1.0'?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<configuration xmlns="urn:activemq"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xi="http://www.w3.org/2001/XInclude"
xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:activemq:core ">
<name>0.0.0.0</name>
<persistence-enabled>true</persistence-enabled>
<journal-type>ASYNCIO</journal-type>
<paging-directory>/Share/JBossMQ7.4/data/paging</paging-directory>
<bindings-directory>/Share/JBossMQ7.4/data/bindings</bindings-directory>
<journal-directory>/Share/JBossMQ7.4/data/journal</journal-directory>
<large-messages-directory>/Share/JBossMQ7.4/data/large-messages</large-messages-directory>
<journal-datasync>true</journal-datasync>
<journal-min-files>100</journal-min-files>
<journal-pool-files>250</journal-pool-files>
<journal-file-size>10M</journal-file-size>
<!--
This value was determined through a calculation.
Your system could perform 62.5 writes per millisecond
on the current journal configuration.
That translates as a sync write every 16000 nanoseconds.
Note: If you specify 0 the system will perform writes directly to the disk.
We recommend this to be 0 if you are using journalType=MAPPED and journal-datasync=false.
-->
<journal-buffer-timeout>16000</journal-buffer-timeout>
<!--
When using ASYNCIO, this will determine the writing queue depth for libaio.
-->
<journal-max-io>4096</journal-max-io>
<!--
You can verify the network health of a particular NIC by specifying the <network-check-NIC> element.
<network-check-NIC>theNicName</network-check-NIC>
-->
<!--
Use this to use an HTTP server to validate the network
<network-check-URL-list>http://www.apache.org</network-check-URL-list> -->
<!-- <network-check-period>10000</network-check-period> -->
<!-- <network-check-timeout>1000</network-check-timeout> -->
<!-- this is a comma separated list, no spaces, just DNS or IPs
it should accept IPV6
Warning: Make sure you understand your network topology as this is meant to validate if your network is valid.
Using IPs that could eventually disappear or be partially visible may defeat the purpose.
You can use a list of multiple IPs, and if any successful ping will make the server OK to continue running -->
<!-- <network-check-list>10.0.0.1</network-check-list> -->
<!-- use this to customize the ping used for ipv4 addresses -->
<!-- <network-check-ping-command>ping -c 1 -t %d %s</network-check-ping-command> -->
<!-- use this to customize the ping used for ipv6 addresses -->
<!-- <network-check-ping6-command>ping6 -c 1 %2$s</network-check-ping6-command> -->
<!-- how often we are looking for how many bytes are being used on the disk in ms -->
<disk-scan-period>5000</disk-scan-period>
<!-- once the disk hits this limit the system will block, or close the connection in certain protocols
that won't support flow control. -->
<max-disk-usage>95</max-disk-usage>
<!-- should the broker detect dead locks and other issues -->
<critical-analyzer>true</critical-analyzer>
<critical-analyzer-timeout>240000</critical-analyzer-timeout>
<critical-analyzer-check-period>60000</critical-analyzer-check-period>
<critical-analyzer-policy>HALT</critical-analyzer-policy>
<!-- the system will enter into page mode once you hit this limit.
This is an estimate in bytes of how much the messages are using in memory
The system will use half of the available memory (-Xmx) by default for the global-max-size.
You may specify a different value here if you need to customize it to your needs.
<global-max-size>100Mb</global-max-size>
-->
<ha-policy>
<shared-store>
<slave>
<allow-failback>false</allow-failback>
<failover-on-shutdown>true</failover-on-shutdown>
</slave>
</shared-store>
</ha-policy>
<connectors>
<connector name="master-connector">tcp://server24:61616</connector>
<connector name="slave-connector">tcp://server25:61616</connector>
</connectors>
<acceptors>
<acceptor name="netty-acceptor">tcp://server25:61616</acceptor>
</acceptors>
<cluster-user>admin</cluster-user>
<cluster-password>admin</cluster-password>
<cluster-connections>
<cluster-connection name="static-cluster">
<connector-ref>slave-connector</connector-ref>
<static-connectors>
<connector-ref>master-connector</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
<security-settings>
<security-setting match="#">
<permission type="createNonDurableQueue" roles="amq"/>
<permission type="deleteNonDurableQueue" roles="amq"/>
<permission type="createDurableQueue" roles="amq"/>
<permission type="deleteDurableQueue" roles="amq"/>
<permission type="createAddress" roles="amq"/>
<permission type="deleteAddress" roles="amq"/>
<permission type="consume" roles="amq"/>
<permission type="browse" roles="amq"/>
<permission type="send" roles="amq"/>
<!-- we need this otherwise ./artemis data imp wouldn't work -->
<permission type="manage" roles="amq"/>
</security-setting>
</security-settings>
<address-settings>
<!-- if you define auto-create on certain queues, management has to be auto-create -->
<address-setting match="activemq.management#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
<address-setting match="Claim">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<max-delivery-attempts>-1</max-delivery-attempts>
<redelivery-delay>1000</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
<!--default for catch all-->
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
</address-settings>
<addresses>
<address name="DLQ">
<anycast>
<queue name="DLQ" />
</anycast>
</address>
<address name="ExpiryQueue">
<anycast>
<queue name="ExpiryQueue" />
</anycast>
</address>
</addresses>
<!-- Uncomment the following if you want to use the Standard LoggingActiveMQServerPlugin pluging to log in events
<broker-plugins>
<broker-plugin class-name="org.apache.activemq.artemis.core.server.plugin.impl.LoggingActiveMQServerPlugin">
<property key="LOG_ALL_EVENTS" value="true"/>
<property key="LOG_CONNECTION_EVENTS" value="true"/>
<property key="LOG_SESSION_EVENTS" value="true"/>
<property key="LOG_CONSUMER_EVENTS" value="true"/>
<property key="LOG_DELIVERING_EVENTS" value="true"/>
<property key="LOG_SENDING_EVENTS" value="true"/>
<property key="LOG_INTERNAL_EVENTS" value="true"/>
</broker-plugin>
</broker-plugins>
-->
</core>
</configuration>
Although this is working but getting some errors/warnings regularly.
Normally we keep slave as live and master as backup. But after errors will start accumulating we will stop broker on slave and master will be live. Then we will again power up broker on slave and stop on master. This will take us to starting state.
I have configured <allow-failback>false</allow-failback> since it was not able to failback effectively.
Errors before and after fail over:
2022-08-30 08:47:51,838 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ229014: Did not receive data from /10.000.000.42:29260 within the 30,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
2022-08-30 16:24:48,501 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: syscall:write(..) failed: Broken pipe [code=GENERIC_EXCEPTION]
2022-08-30 16:24:48,501 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: syscall:write(..) failed: Broken pipe [code=GENERIC_EXCEPTION]
2022-08-30 16:24:48,504 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: syscall:write(..) failed: Broken pipe [code=GENERIC_EXCEPTION]
2022-08-30 16:24:48,522 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ229014: Did not receive data from /10.111.225.41:28018 within the 30,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
2022-08-30 16:24:48,535 WARN [org.apache.activemq.artemis.journal] "Reference Handler" daemon prio=10 Id=2 RUNNABLE
at java.base#11.0.6/java.lang.ref.Reference.waitForReferencePendingList(Native Method)
at java.base#11.0.6/java.lang.ref.Reference.processPendingReferences(Reference.java:241)
at java.base#11.0.6/java.lang.ref.Reference$ReferenceHandler.run(Reference.java:213)
2022-08-30 16:24:48,541 WARN [org.apache.activemq.artemis.journal] "Finalizer" daemon prio=8 Id=3 WAITING on java.lang.ref.ReferenceQueue$Lock#443d3450
at java.base#11.0.6/java.lang.Object.wait(Native Method)
2022-08-30 16:24:52,070 WARN [org.apache.activemq.artemis.core.server] AMQ222154: Error checking DLQ: ActiveMQShutdownException[errorType=SHUTDOWN_ERROR message=Journal must be in state=LOADED, was [STOPPED]]
2022-08-30 16:42:26,345 WARN [org.apache.activemq.artemis.core.server] Errors occurred during the buffering operation : javax.jms.IllegalStateException: Consumer does not exist
Is there some issue with the configuration?
I am using NFSv3. Will it make impact?

NFSv3 doesn't support the locking semantics required by ActiveMQ Artemis to be used as a shared storage device for high availability. You need to use NFSv4.
For what it's worth, the NFSv4 protocol was published 20 years ago. There's few (if any) reasons to still be using NFSv3.

Artemis 2.18.0 replica set doesn't work properly

I have running Artemis version 2.17.0 replica set with master and two slaves. It works fine when I check "Broker diagram" view in the web console I see connection between master and slave (the other slave as a backup) as on the picture
I upgraded now Artemis to 2.18.0 version and after restarting all artemis brokers when I check "Broker diagram" I see only master node and there is no link to the slave like on the picture above. The other two nodes are running as a slaves so there is only one master.
As I said on Artemis-2.17.0 it works.
Someone knows why is that ? here is for example the broker.xml for master node
<?xml version='1.0'?>
<configuration xmlns="urn:activemq"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xi="http://www.w3.org/2001/XInclude"
xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:activemq:core ">
<name>0.0.0.0</name>
<persistence-enabled>true</persistence-enabled>
<journal-type>ASYNCIO</journal-type>
<paging-directory>data/paging</paging-directory>
<bindings-directory>data/bindings</bindings-directory>
<journal-directory>data/journal</journal-directory>
<large-messages-directory>data/large-messages</large-messages-directory>
<journal-datasync>true</journal-datasync>
<journal-min-files>2</journal-min-files>
<journal-pool-files>10</journal-pool-files>
<journal-device-block-size>4096</journal-device-block-size>
<journal-file-size>10M</journal-file-size>
<journal-buffer-timeout>28000</journal-buffer-timeout>
<journal-max-io>4096</journal-max-io>
<!-- how often we are looking for how many bytes are being used on the disk in ms -->
<disk-scan-period>5000</disk-scan-period>
<!-- once the disk hits this limit the system will block, or close the connection in certain protocols
that won't support flow control. -->
<max-disk-usage>100</max-disk-usage>
<!-- should the broker detect dead locks and other issues -->
<critical-analyzer>true</critical-analyzer>
<critical-analyzer-timeout>150000</critical-analyzer-timeout>
<critical-analyzer-check-period>60000</critical-analyzer-check-period>
<critical-analyzer-policy>HALT</critical-analyzer-policy>
<page-sync-timeout>1628000</page-sync-timeout>
<global-max-size>204Mb</global-max-size>
<!-- Connectors -->
<connectors>
<connector name="netty-connector">tcp://artemis01:61616?sslEnabled=true;trustStorePath=/client_ts.p12;trustStorePassword=12345</connector>
</connectors>
<acceptors>
<acceptor name="netty-acceptor">tcp://artemis01:61616?sslEnabled=true;keyStorePath=/broker_ks.p12;keyStorePassword=123456</acceptor>
</acceptors>
<cluster-connections>
<cluster-connection name="my-cluster">
<connector-ref>netty-connector</connector-ref>
<retry-interval>1000</retry-interval>
<retry-interval-multiplier>3</retry-interval-multiplier>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>STRICT</message-load-balancing>
<discovery-group-ref discovery-group-name="my-discovery-group"/>
</cluster-connection>
</cluster-connections>
<broadcast-groups>
<broadcast-group name="my-broadcast-group">
<local-bind-address>artemis01</local-bind-address>
<local-bind-port>9876</local-bind-port>
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<broadcast-period>2000</broadcast-period>
<connector-ref>netty-connector</connector-ref>
</broadcast-group>
</broadcast-groups>
<discovery-groups>
<discovery-group name="my-discovery-group">
<local-bind-address>artemis01</local-bind-address>
<local-bind-port>9876</local-bind-port>
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<network-check-list>artemis02,artemis03</network-check-list>
<network-check-period>5000</network-check-period>
<network-check-timeout>2000</network-check-timeout>
<network-check-ping-command>ping -c 1 -t %d %s</network-check-ping-command>
<network-check-ping6-command>ping6 -c 1 %2$s</network-check-ping6-command>
<!-- Other config -->
<ha-policy>
<replication>
<master>
<check-for-live-server>true</check-for-live-server>
</master>
</replication>
</ha-policy>
<security-settings>
<security-setting match="#">
<permission type="createNonDurableQueue" roles="amq"/>
<permission type="deleteNonDurableQueue" roles="amq"/>
<permission type="createDurableQueue" roles="amq"/>
<permission type="deleteDurableQueue" roles="amq"/>
<permission type="createAddress" roles="amq"/>
<permission type="deleteAddress" roles="amq"/>
<permission type="consume" roles="amq"/>
<permission type="browse" roles="amq"/>
<permission type="send" roles="amq"/>
<!-- we need this otherwise ./artemis data imp wouldn't work -->
<permission type="manage" roles="amq"/>
</security-setting>
</security-settings>
<addresses>
<address name="exampleQueue">
<anycast>
<queue name="exampleQueue"/>
</anycast>
</address>
<address name="DLQ">
</address>
<address name="ExpiryQueue">
<anycast>
<queue name="ExpiryQueue" />
</anycast>
</address>
</addresses>
<address-settings>
<!-- if you define auto-create on certain queues, management has to be auto-create -->
<address-setting match="activemq.management#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
<!--default for catch all-->
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<auto-create-dead-letter-resources>true</auto-create-dead-letter-resources>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
<address-setting match="exampleQueue">
<dead-letter-address>DLQ</dead-letter-address>
<redelivery-delay>1000</redelivery-delay>
<max-delivery-attempts>3</max-delivery-attempts>
<max-size-bytes>-1</max-size-bytes>
<page-size-bytes>1048576</page-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
</address-setting>
</address-settings>
<!-- Uncomment the following if you want to use the Standard LoggingActiveMQServerPlugin pluging to log in events
<broker-plugins>
<broker-plugin class-name="org.apache.activemq.artemis.core.server.plugin.impl.LoggingActiveMQServerPlugin">
<property key="LOG_ALL_EVENTS" value="true"/>
<property key="LOG_CONNECTION_EVENTS" value="true"/>
<property key="LOG_SESSION_EVENTS" value="true"/>
<property key="LOG_CONSUMER_EVENTS" value="true"/>
<property key="LOG_DELIVERING_EVENTS" value="true"/>
<property key="LOG_SENDING_EVENTS" value="true"/>
<property key="LOG_INTERNAL_EVENTS" value="true"/>
</broker-plugin>
</broker-plugins>
-->
<diverts>
</diverts>
</core>
</configuration>
Log from Master - artemis01
AMQ222208: SSL handshake failed for client from /195.10.125.225:58790: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown.
Logs from Slave - artemis02
2021-08-24 00:07:06,544 ERROR [org.apache.activemq.artemis.core.client] AMQ214016: Failed to create netty connection: javax.net.ssl.SSLHandshakeException: No name matching artemis01.mydomain.com found
at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131) [java.base:]
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:325) [java.base:]
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:268) [java.base:]
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:263) [java.base:]
at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1340) [java.base:]
at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1215) [java.base:]
at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1158) [java.base:]
at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396) [java.base:]
at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:445) [java.base:]
at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1260) [java.base:]
at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1247) [java.base:]
at java.base/java.security.AccessController.doPrivileged(AccessController.java:691) [java.base:]
at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:1192) [java.base:]
at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1550) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1396) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1237) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1286) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.18.0.jar:2.18.0]
Caused by: java.security.cert.CertificateException: No name matching artemis01.mydomain.com found
at java.base/sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:229) [java.base:]
at java.base/sun.security.util.HostnameChecker.match(HostnameChecker.java:102) [java.base:]
at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:452) [java.base:]
at java.base/sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:412) [java.base:]
at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:292) [java.base:]
at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:144) [java.base:]
at java.base/sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1318) [java.base:]

I set up a simple pair of brokers using replication and the "Broker Diagram" displays both the master and the slave as expected:
Ensure you have the "Show Backup Brokers" checkbox selected.
Regarding the SSL issue, see the upgrade instructions for 2.18.0 in the documentation:
...core clients will now expect the CN or Subject Alternative Name values of the broker's SSL certificate to match the hostname in the client's URL.
...
To deal with this you can do one of the following:
Update your SSL certificates to use a hostname which matches the hostname in the client's URL. This is the recommended option with regard to security.
Update any connector using sslEnabled=true to also use verifyHost=false. Using this option means that you won't get the extra security of hostname verification, but no certificates will need to change. This essentially restores the previous default behavior.

Apache ActiveMQ Artemis Slow performance after paging starts

We have an ActiveMQ Artemis test deployment and we noticed very slow performance after broker having a large number of messages. This is when paging starts. I hope this is normal. To mitigate this after testing we doubled the xmx for the broker. Now the paging (and performance drop) is delayed. My question is are there any other parameters beside memory which can address this.
My broker.xml is:
<configuration xmlns="urn:activemq" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<core xmlns="urn:activemq:core">
<ha-policy>
<replication>
<master>
<group-name>master</group-name>
<check-for-live-server>true</check-for-live-server>
</master>
</replication>
</ha-policy>
<global-max-size>-1</global-max-size>
<bindings-directory>/opt/broker/broker-data/bindings</bindings-directory>
<journal-directory>/opt/broker/broker-data/journal</journal-directory>
<large-messages-directory>/opt/broker/broker-data/largemessages</large-messages-directory>
<paging-directory>/opt/broker-data/paging</paging-directory>
<journal-min-files>25</journal-min-files>
<journal-type>ASYNCIO</journal-type>
<journal-max-io>5000</journal-max-io>
<journal-sync-transactional>false</journal-sync-transactional>
<journal-sync-non-transactional>false</journal-sync-non-transactional>
<journal-buffer-timeout>750000</journal-buffer-timeout>
<connectors>
<connector name="netty-connector">tcp://node1:61616?tcpSendBufferSize=307200;tcpReceiveBufferSize=307200;writeBufferHighWaterMark=1228800;useEpoll=true;useNio=true</connector>
</connectors>
<acceptors>
<acceptor name="netty-acceptor">tcp://node1:61616?tcpSendBufferSize=307200;tcpReceiveBufferSize=307200;writeBufferHighWaterMark=1228800;useEpoll=true;useNio=true</acceptor>
</acceptors>
<broadcast-groups>
<broadcast-group name="my-broadcast-group">
<group-address>${udp-address:231.7.7.7}</group-address>
<group-port>9875</group-port>
<broadcast-period>100</broadcast-period>
<connector-ref>netty-connector</connector-ref>
</broadcast-group>
</broadcast-groups>
<discovery-groups>
<discovery-group name="my-discovery-group">
<group-address>${udp-address:231.7.7.7}</group-address>
<group-port>9875</group-port>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<cluster-connections>
<cluster-connection name="my-cluster">
<connector-ref>netty-connector</connector-ref>
<connection-ttl>130000</connection-ttl>
<call-timeout>120000</call-timeout>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<discovery-group-ref discovery-group-name="my-discovery-group"/>
</cluster-connection>
</cluster-connections>
<security-settings>
<security-setting match="#">
<permission type="createNonDurableQueue" roles="amq"/>
<permission type="deleteNonDurableQueue" roles="amq"/>
<permission type="createDurableQueue" roles="amq"/>
<permission type="deleteDurableQueue" roles="amq"/>
<permission type="createAddress" roles="amq"/>
<permission type="deleteAddress" roles="amq"/>
<permission type="consume" roles="amq"/>
<permission type="browse" roles="amq"/>
<permission type="send" roles="amq"/>
<permission type="manage" roles="amq"/>
</security-setting>
</security-settings>
<address-settings>
<address-setting match="activemq.management#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
<auto-delete-addresses>false</auto-delete-addresses>
</address-setting>
</address-settings>
<!-- address section -->
</core>
</configuration>
EDIT:
Most critical issue is once paging starts broker won't recover to original performance even-though majority of messages are consumed.

Consider that paged messages need to be synchronized to disk similarly to durable ones and the parameter to be set to control the frequency of flushes is page-sync-timeout. If no value is set, the default one is used (see the documentation for an explanation about what that setting is for).
By looking at your journal-buffer-timeout (and assuming this is correctly set) your disk seems quite slow so it's expected that paged messages won't perform great as the disk doesn't have enough IOPS.
I would first check what's the expected IOPS for random writes for your disk and will set page-sync-timeout accordingly (1/IOPS in nanoseconds), but don't expect any improvement if the disk isn't fast enough.
Additional note: If you don't care about power failure durability you can still disable journal-datasync and it should let any disk write to be able to survive just to process failures (i.e. no power failure guarantees). It should be ok if you are using shared-nothing replication, given that a backup is able to take the role in case of failure.

Artemis master node not starting after failover to slave

I have an Artemis 2.11.0 HA pair configured using shared storage over NFS (but I don't know the mount options). Here's the master's broker.xml:
<?xml version='1.0'?>
<configuration xmlns="urn:activemq"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xi="http://www.w3.org/2001/XInclude"
xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:activemq:core ">
<name>0.0.0.0</name>
<persistence-enabled>true</persistence-enabled>
<journal-type>NIO</journal-type>
<paging-directory>\\test\data\paging</paging-directory>
<bindings-directory>\\test\data\bindings</bindings-directory>
<journal-directory>\\test\data\journal</journal-directory>
<large-messages-directory>\\test\data\large-messages</large-messages-directory>
<journal-datasync>true</journal-datasync>
<journal-min-files>2</journal-min-files>
<journal-pool-files>10</journal-pool-files>
<journal-device-block-size>4096</journal-device-block-size>
<journal-file-size>10M</journal-file-size>
<journal-buffer-timeout>752000</journal-buffer-timeout>
<journal-max-io>1</journal-max-io>
<disk-scan-period>5000</disk-scan-period>
<max-disk-usage>90</max-disk-usage>
<!-- should the broker detect dead locks and other issues -->
<critical-analyzer>false</critical-analyzer>
<critical-analyzer-timeout>120000</critical-analyzer-timeout>
<critical-analyzer-check-period>60000</critical-analyzer-check-period>
<critical-analyzer-policy>HALT</critical-analyzer-policy>
<page-sync-timeout>1028000</page-sync-timeout>
<global-max-size>4096Mb</global-max-size>
<ha-policy>
<shared-store>
<master>
<failover-on-shutdown>true</failover-on-shutdown>
</master>
</shared-store>
</ha-policy>
<connectors>
<connector name="netty">tcp://ip-address:61617</connector>
</connectors>
<broadcast-groups>
<broadcast-group name="teamsMQ-broadcast-group">
<local-bind-address>ip-address</local-bind-address>
<local-bind-port>9877</local-bind-port>
<group-address>224.0.0.1</group-address>
<group-port>9876</group-port>
<broadcast-period>2000</broadcast-period>
<connector-ref>netty</connector-ref>
</broadcast-group>
</broadcast-groups>
<discovery-groups>
<discovery-group name="teamsMQ-discovery-group">
<local-bind-address>ip-address</local-bind-address>
<group-address>224.0.0.1</group-address>
<group-port>9876</group-port>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<acceptors>
<acceptor name="netty">tcp://ip-address:61617</acceptor>
<!-- Acceptor for every supported protocol -->
<acceptor name="artemis">tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true</acceptor>
<!-- AMQP Acceptor. Listens on default AMQP port for AMQP traffic.-->
<acceptor name="amqp">tcp://0.0.0.0:5672?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true</acceptor>
<!-- STOMP Acceptor. -->
<acceptor name="stomp">tcp://0.0.0.0:61613?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=STOMP;useEpoll=true</acceptor>
<!-- HornetQ Compatibility Acceptor. Enables HornetQ Core and STOMP for legacy HornetQ clients. -->
<acceptor name="hornetq">tcp://0.0.0.0:5445?anycastPrefix=jms.queue.;multicastPrefix=jms.topic.;protocols=HORNETQ,STOMP;useEpoll=true</acceptor>
<!-- MQTT Acceptor -->
<acceptor name="mqtt">tcp://0.0.0.0:1883?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=MQTT;useEpoll=true</acceptor>
</acceptors>
<security-settings>
<security-setting match="#">
<permission type="createNonDurableQueue" roles="amq"/>
<permission type="deleteNonDurableQueue" roles="amq"/>
<permission type="createDurableQueue" roles="amq"/>
<permission type="deleteDurableQueue" roles="amq"/>
<permission type="createAddress" roles="amq"/>
<permission type="deleteAddress" roles="amq"/>
<permission type="consume" roles="amq"/>
<permission type="browse" roles="amq"/>
<permission type="send" roles="amq"/>
<!-- we need this otherwise ./artemis data imp wouldn't work -->
<permission type="manage" roles="amq"/>
</security-setting>
</security-settings>
<address-settings>
<!-- if you define auto-create on certain queues, management has to be auto-create -->
<address-setting match="activemq.management#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
<!--default for catch all-->
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
</address-settings>
<addresses>
<address name="DLQ">
<anycast>
<queue name="DLQ" />
</anycast>
</address>
<address name="ExpiryQueue">
<anycast>
<queue name="ExpiryQueue" />
</anycast>
</address>
</addresses>
</core>
</configuration>
And the slave's:
<?xml version='1.0'?>
<configuration xmlns="urn:activemq"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xi="http://www.w3.org/2001/XInclude"
xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:activemq:core ">
<name>0.0.0.0</name>
<persistence-enabled>true</persistence-enabled>
<journal-type>NIO</journal-type>
<paging-directory>\\test\data\paging</paging-directory>
<bindings-directory>\\test\data\bindings</bindings-directory>
<journal-directory>\\test\data\journal</journal-directory>
<large-messages-directory>\\test\data\large-messages</large-messages-directory>
<journal-datasync>true</journal-datasync>
<journal-min-files>2</journal-min-files>
<journal-pool-files>10</journal-pool-files>
<journal-device-block-size>4096</journal-device-block-size>
<journal-file-size>10M</journal-file-size>
<journal-buffer-timeout>688000</journal-buffer-timeout>
<journal-max-io>1</journal-max-io>
<disk-scan-period>5000</disk-scan-period>
<max-disk-usage>90</max-disk-usage>
<!-- should the broker detect dead locks and other issues -->
<critical-analyzer>false</critical-analyzer>
<critical-analyzer-timeout>120000</critical-analyzer-timeout>
<critical-analyzer-check-period>60000</critical-analyzer-check-period>
<critical-analyzer-policy>HALT</critical-analyzer-policy>
<page-sync-timeout>1028000</page-sync-timeout>
<global-max-size>4096Mb</global-max-size>
<ha-policy>
<shared-store>
<slave>
<failover-on-shutdown>true</failover-on-shutdown>
<allow-failback>true</allow-failback>
</slave>
</shared-store>
</ha-policy>
<connectors>
<connector name="netty">tcp://ip-address:61617</connector>
</connectors>
<broadcast-groups>
<broadcast-group name="teamsMQ-broadcast-group">
<local-bind-address>ip-address</local-bind-address>
<local-bind-port>9877</local-bind-port>
<group-address>224.0.0.1</group-address>
<group-port>9876</group-port>
<broadcast-period>2000</broadcast-period>
<connector-ref>netty</connector-ref>
</broadcast-group>
</broadcast-groups>
<discovery-groups>
<discovery-group name="teamsMQ-discovery-group">
<local-bind-address>ip-address</local-bind-address>
<group-address>224.0.0.1</group-address>
<group-port>9876</group-port>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<acceptors>
<acceptor name="netty">tcp://ip-address:61617</acceptor>
<!-- Acceptor for every supported protocol -->
<acceptor name="artemis">tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true</acceptor>
<!-- AMQP Acceptor. Listens on default AMQP port for AMQP traffic.-->
<acceptor name="amqp">tcp://0.0.0.0:5672?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true</acceptor>
<!-- STOMP Acceptor. -->
<acceptor name="stomp">tcp://0.0.0.0:61613?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=STOMP;useEpoll=true</acceptor>
<!-- HornetQ Compatibility Acceptor. Enables HornetQ Core and STOMP for legacy HornetQ clients. -->
<acceptor name="hornetq">tcp://0.0.0.0:5445?anycastPrefix=jms.queue.;multicastPrefix=jms.topic.;protocols=HORNETQ,STOMP;useEpoll=true</acceptor>
<!-- MQTT Acceptor -->
<acceptor name="mqtt">tcp://0.0.0.0:1883?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=MQTT;useEpoll=true</acceptor>
</acceptors>
<security-settings>
<security-setting match="#">
<permission type="createNonDurableQueue" roles="amq"/>
<permission type="deleteNonDurableQueue" roles="amq"/>
<permission type="createDurableQueue" roles="amq"/>
<permission type="deleteDurableQueue" roles="amq"/>
<permission type="createAddress" roles="amq"/>
<permission type="deleteAddress" roles="amq"/>
<permission type="consume" roles="amq"/>
<permission type="browse" roles="amq"/>
<permission type="send" roles="amq"/>
<!-- we need this otherwise ./artemis data imp wouldn't work -->
<permission type="manage" roles="amq"/>
</security-setting>
</security-settings>
<address-settings>
<!-- if you define auto-create on certain queues, management has to be auto-create -->
<address-setting match="activemq.management#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
<!--default for catch all-->
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
</address-settings>
<addresses>
<address name="DLQ">
<anycast>
<queue name="DLQ" />
</anycast>
</address>
<address name="ExpiryQueue">
<anycast>
<queue name="ExpiryQueue" />
</anycast>
</address>
</addresses>
</core>
</configuration>
The master node goes down for an unknown reason. The below log gets printed continuously:
AMQ222154: Error checking DLQ: ActiveMQShutdownException[errorType=SHUTDOWN_ERROR message=Journal must be in state=LOADED, was [STOPPED]]
2021-01-15 23:02:05,414 WARN [org.apache.activemq.artemis.core.server] AMQ222154: Error checking DLQ: ActiveMQShutdownException[errorType=SHUTDOWN_ERROR message=Journal must be in state=LOADED, was [STOPPED]]
at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkJournalIsLoaded(JournalImpl.java:1087) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendUpdateRecord(JournalImpl.java:886) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.journal.Journal.appendUpdateRecord(Journal.java:98) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.updateDeliveryCount(AbstractJournalStorageManager.java:756) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.server.impl.QueueImpl.checkRedelivery(QueueImpl.java:3052) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.server.impl.RefsOperation.rollbackRedelivery(RefsOperation.java:166) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.server.impl.RefsOperation.afterRollback(RefsOperation.java:113) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.afterRollback(TransactionImpl.java:589) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.access$200(TransactionImpl.java:40) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl$4.done(TransactionImpl.java:442) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1.run(OperationContextImpl.java:244) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.11.0.jar:2.11.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_275]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_275]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.11.0.jar:2.11.0]
The Slave comes up as expected, but throws an NPE:
2021-01-15 23:02:27,529 INFO [org.apache.activemq.artemis.core.server] AMQ221010: Backup Server is now live
2021-01-15 23:02:27,545 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: java.lang.NullPointerException
at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation$FailbackChecker.<init>(SharedStoreBackupActivation.java:193) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.startFailbackChecker(SharedStoreBackupActivation.java:185) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.run(SharedStoreBackupActivation.java:118) [artemis-server-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:3863) [artemis-server-2.11.0.jar:2.11.0]
The master attempts to start, but it doesn't progress beyond AMQ221034: Waiting indefinitely to obtain live lock. The logs are stuck at this point even after multiple restarts.
2021-01-15 23:03:56,238 INFO [org.apache.activemq.artemis.core.server] AMQ221006: Waiting to obtain live lock
2021-01-15 23:03:56,300 INFO [org.apache.activemq.artemis.core.server] AMQ221013: Using NIO Journal
2021-01-15 23:03:56,581 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
2021-01-15 23:03:56,581 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
2021-01-15 23:03:56,581 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
2021-01-15 23:03:56,581 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
2021-01-15 23:03:56,581 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
2021-01-15 23:03:56,581 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
2021-01-15 23:03:56,644 WARN [org.apache.activemq.artemis.core.server] AMQ222035: Directory \\test\data\paging\cd776bae-1a55-11eb-985d-0050569136c8 did not have an identification file address.txt
2021-01-15 23:03:56,644 WARN [org.apache.activemq.artemis.core.server] AMQ222035: Directory \\test\data\paging\a84f1e4f-1f1a-11eb-a37f-0050569136c8 did not have an identification file address.txt
2021-01-15 23:03:56,644 WARN [org.apache.activemq.artemis.core.server] AMQ222035: Directory \\test\data\paging\a87edff5-1f1a-11eb-a37f-0050569136c8 did not have an identification file address.txt
2021-01-15 23:03:56,988 INFO [org.apache.activemq.artemis.core.server] AMQ221034: Waiting indefinitely to obtain live lock
Can you please advise on the issue here and the steps to recover?
Does the NPE in the slave start-up have any effects on the queue/functioning?
Do I need to stop the Slave manually to get the master to start successfully?

If the master broker encounters any "critical IO" problems it will automatically shut itself down. When it shuts itself down it will relinquish the lock it has on the shared journal. When the shared lock is relinquished the slave will automatically activate. When the master is restarted it will attempt to acquire the lock on the shared journal but it won't be able to since the slave has it.
The slave failed to set up the FailbackChecker thread due to a NullPointerException because there is no <cluster-connection> configured in either broker.xml. This is an invalid configuration. You must configure a <cluster-connection>, e.g.:
<cluster-connections>
<cluster-connection name="my-cluster">
<connector-ref>netty</connector-ref>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<discovery-group-ref discovery-group-name="teamsMQ-discovery-group"/>
</cluster-connection>
</cluster-connections>
Since the FailbackChecker thread isn't running the slave will not know that the master has restarted and initiated a fail-back. Therefore, you will need to stop the slave so it can relinquish its lock on the shared journal. At this point the master broker will start. Keep in mind that any clients connected to the slave will be disconnected and will have to reconnect to the master.

Large message lost for divert

I have following queue setup in cluster:
server1 running on 61616
server2 running on 61617
I start two consumer with two different available master server
java -jar AmqJmsConsume.jar -duration 5 -queue IT-InputQueue -stats -log
/tmp/artemis/4 -verify -commitdelay 300 -url 'tcp://localhost:61616'
java -jar AmqJmsConsume.jar -duration 5 -queue IT-InputQueue -stats -log
/tmp/artemis/4 -verify -commitdelay 300 -url 'tcp://localhost:61617'
and run producer to server1 as below
java -jar AmqJmsProducer.jar -topic RFTopic -stats -log /tmp/artemis/4 -id
-count 500 -n 500 -ttl 3600000 -url 'tcp://localhost:61616' -outliers 100
-outliersize 500k
both consumer gets few message but after below exception server1 clears
queue with almost 50% of messages but rest of messages are stuck with
internal.sf.cluster queue and not received by any consumer.
java.lang.IllegalStateException: no queueIDs defined
at org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionBridge.beforeForward(ClusterConnectionBridge.java:182) [artemis-server-2.9.0.jar:2.9.0]
at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.handle(BridgeImpl.java:653) [artemis-server-2.9.0.jar:2.9.0]
at org.apache.activemq.artemis.core.server.impl.QueueImpl.handle(QueueImpl.java:3346) [artemis-server-2.9.0.jar:2.9.0]
at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:2606) [artemis-server-2.9.0.jar:2.9.0]
at org.apache.activemq.artemis.core.server.impl.QueueImpl.access$2300(QueueImpl.java:117) [artemis-server-2.9.0.jar:2.9.0]
at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:3613) [artemis-server-2.9.0.jar:2.9.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.9.0.jar:2.9.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.9.0.jar:2.9.0]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.9.0.jar:2.9.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_211]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.9.0.jar:2.9.0]
2019-07-29 10:21:38,503 WARN [org.apache.activemq.artemis.core.server.impl.QueueImpl] null: java.util.NoSuchElementException
at org.apache.activemq.artemis.utils.collections.PriorityLinkedListImpl$PriorityLinkedListIterator.repeat(PriorityLinkedListImpl.java:172) [artemis-commons-2.9.0.jar:2.9.0]
at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:2627) [artemis-server-2.9.0.jar:2.9.0]
at org.apache.activemq.artemis.core.server.impl.QueueImpl.access$2300(QueueImpl.java:117) [artemis-server-2.9.0.jar:2.9.0]
at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:3613) [artemis-server-2.9.0.jar:2.9.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.9.0.jar:2.9.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.9.0.jar:2.9.0]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.9.0.jar:2.9.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_211]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.9.0.jar:2.9.0]
server1-master broker
<?xml version='1.0'?>
<configuration xmlns="urn:activemq"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xi="http://www.w3.org/2001/XInclude"
xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<core xmlns="urn:activemq:core"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:activemq:core ">
<persistence-enabled>true</persistence-enabled>
<thread-pool-max-size>400</thread-pool-max-size>
<journal-type>NIO</journal-type>
<paging-directory>${data.dir}/paging</paging-directory>
<bindings-directory>${data.dir}/bindings</bindings-directory>
<journal-directory>${data.dir}/journal</journal-directory>
<large-messages-directory>${data.dir}/large-messages
</large-messages-directory>
<journal-datasync>true</journal-datasync>
<journal-min-files>20</journal-min-files>
<journal-pool-files>20</journal-pool-files>
<journal-file-size>10M</journal-file-size>
<journal-compact-min-files>30</journal-compact-min-files>
<journal-buffer-timeout>23480000</journal-buffer-timeout>
<!-- When using ASYNCIO, this will determine the writing queue depth for
libaio. -->
<journal-max-io>1</journal-max-io>
<!-- You can verify the network health of a particular NIC by specifying
the <network-check-NIC> element. <network-check-NIC>theNicName</network-check-NIC> -->
<!-- Use this to use an HTTP server to validate the network <network-check-URL-list>http://www.apache.org</network-check-URL-list> -->
<!-- <network-check-period>10000</network-check-period> -->
<!-- <network-check-timeout>1000</network-check-timeout> -->
<!-- this is a comma separated list, no spaces, just DNS or IPs it should
accept IPV6 Warning: Make sure you understand your network topology as this
is meant to validate if your network is valid. Using IPs that could eventually
disappear or be partially visible may defeat the purpose. You can use a list
of multiple IPs, and if any successful ping will make the server OK to continue
running -->
<!-- <network-check-list>10.0.0.1</network-check-list> -->
<!-- use this to customize the ping used for ipv4 addresses -->
<!-- <network-check-ping-command>ping -c 1 -t %d %s</network-check-ping-command> -->
<!-- use this to customize the ping used for ipv6 addresses -->
<!-- <network-check-ping6-command>ping6 -c 1 %2$s</network-check-ping6-command> -->
<!-- how often we are looking for how many bytes are being used on the
disk in ms -->
<disk-scan-period>5000</disk-scan-period>
<!-- once the disk hits this limit the system will block, or close the
connection in certain protocols that won't support flow control. -->
<max-disk-usage>90</max-disk-usage>
<!-- should the broker detect dead locks and other issues -->
<critical-analyzer>true</critical-analyzer>
<critical-analyzer-timeout>120000</critical-analyzer-timeout>
<critical-analyzer-check-period>60000</critical-analyzer-check-period>
<critical-analyzer-policy>LOG</critical-analyzer-policy>
<transaction-timeout>1800000</transaction-timeout>
<!-- the system will enter into page mode once you hit this limit. This
is an estimate in bytes of how much the messages are using in memory The
system will use half of the available memory (-Xmx) by default for the global-max-size.
You may specify a different value here if you need to customize it to your
needs. <global-max-size>100Mb</global-max-size> -->
<connectors>
<connector name="netty-connector">tcp://localhost:61616</connector>
</connectors>
<!-- Acceptors -->
<acceptors>
<acceptor name="netty-acceptor">tcp://localhost:61616?anycastPrefix=jms.queue.;multicastPrefix=jms.topic.</acceptor>
</acceptors>
<ha-policy>
<shared-store>
<master>
<failover-on-shutdown>true</failover-on-shutdown>
</master>
</shared-store>
</ha-policy>
<broadcast-groups>
<broadcast-group name="my-broadcast-group">
<broadcast-period>5000</broadcast-period>
<jgroups-file>idsk-jgroups.xml</jgroups-file>
<jgroups-channel>persistence-fs</jgroups-channel>
<connector-ref>netty-connector</connector-ref>
</broadcast-group>
</broadcast-groups>
<discovery-groups>
<discovery-group name="my-discovery-group">
<jgroups-file>idsk-jgroups.xml</jgroups-file>
<jgroups-channel>persistence-fs</jgroups-channel>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<cluster-connections>
<cluster-connection name="my-cluster">
<connector-ref>netty-connector</connector-ref>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<discovery-group-ref discovery-group-name="my-discovery-group" />
</cluster-connection>
</cluster-connections>
<cluster-user>admin</cluster-user>
<cluster-password>admin</cluster-password>
<diverts>
<divert name="RF-Transform">
<routing-name>RFeeds-Transform</routing-name>
<address>RFTopic</address>
<forwarding-address>IT-InputQueue</forwarding-address>
<exclusive>false</exclusive>
</divert>
<divert name="RF-Output">
<routing-name>RFeeds-Output</routing-name>
<address>RFTopic</address>
<forwarding-address>T1-InputQueue</forwarding-address>
<exclusive>false</exclusive>
</divert>
</diverts>
<security-settings>
<security-setting match="#">
<permission type="createNonDurableQueue" roles="amq" />
<permission type="deleteNonDurableQueue" roles="amq" />
<permission type="createDurableQueue" roles="amq" />
<permission type="deleteDurableQueue" roles="amq" />
<permission type="createAddress" roles="amq" />
<permission type="deleteAddress" roles="amq" />
<permission type="consume" roles="amq" />
<permission type="browse" roles="amq" />
<permission type="send" roles="amq" />
<!-- we need this otherwise ./artemis data imp wouldn't work -->
<permission type="manage" roles="amq" />
</security-setting>
</security-settings>
<address-settings>
<!-- if you define auto-create on certain queues, management has to be
auto-create -->
<address-setting match="activemq.management#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>60000</redelivery-delay>
<max-delivery-attempts>5</max-delivery-attempts>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>50485760</max-size-bytes>
<page-size-bytes>10485760</page-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>false</auto-create-addresses>
</address-setting>
<!--default for catch all -->
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>60000</redelivery-delay>
<max-delivery-attempts>5</max-delivery-attempts>
<redistribution-delay>10000</redistribution-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>50485760</max-size-bytes>
<page-size-bytes>10485760</page-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>false</auto-create-addresses>
</address-setting>
</address-settings>
<addresses>
<address name="DLQ">
<anycast>
<queue name="DLQ" />
</anycast>
</address>
<address name="ExpiryQueue">
<anycast>
<queue name="ExpiryQueue" />
</anycast>
</address>
<address name="IT-InputQueue">
<anycast>
<queue name="IT-InputQueue" />
</anycast>
</address>
<address name="T1-InputQueue">
<anycast>
<queue name="T1-InputQueue" />
</anycast>
</address>
<address name="RFTopic">
<multicast />
</address>
</addresses>
</core>
</configuration>
server-master broker
<?xml version='1.0'?>
<configuration xmlns="urn:activemq"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xi="http://www.w3.org/2001/XInclude"
xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<core xmlns="urn:activemq:core"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:activemq:core ">
<persistence-enabled>true</persistence-enabled>
<thread-pool-max-size>400</thread-pool-max-size>
<journal-type>NIO</journal-type>
<paging-directory>${data.dir}/paging</paging-directory>
<bindings-directory>${data.dir}/bindings</bindings-directory>
<journal-directory>${data.dir}/journal</journal-directory>
<large-messages-directory>${data.dir}/large-messages
</large-messages-directory>
<journal-datasync>true</journal-datasync>
<journal-min-files>20</journal-min-files>
<journal-pool-files>20</journal-pool-files>
<journal-file-size>10M</journal-file-size>
<journal-compact-min-files>30</journal-compact-min-files>
<journal-buffer-timeout>23480000</journal-buffer-timeout>
<!-- When using ASYNCIO, this will determine the writing queue depth for
libaio. -->
<journal-max-io>1</journal-max-io>
<!-- You can verify the network health of a particular NIC by specifying
the <network-check-NIC> element. <network-check-NIC>theNicName</network-check-NIC> -->
<!-- Use this to use an HTTP server to validate the network <network-check-URL-list>http://www.apache.org</network-check-URL-list> -->
<!-- <network-check-period>10000</network-check-period> -->
<!-- <network-check-timeout>1000</network-check-timeout> -->
<!-- this is a comma separated list, no spaces, just DNS or IPs it should
accept IPV6 Warning: Make sure you understand your network topology as this
is meant to validate if your network is valid. Using IPs that could eventually
disappear or be partially visible may defeat the purpose. You can use a list
of multiple IPs, and if any successful ping will make the server OK to continue
running -->
<!-- <network-check-list>10.0.0.1</network-check-list> -->
<!-- use this to customize the ping used for ipv4 addresses -->
<!-- <network-check-ping-command>ping -c 1 -t %d %s</network-check-ping-command> -->
<!-- use this to customize the ping used for ipv6 addresses -->
<!-- <network-check-ping6-command>ping6 -c 1 %2$s</network-check-ping6-command> -->
<!-- how often we are looking for how many bytes are being used on the
disk in ms -->
<disk-scan-period>5000</disk-scan-period>
<!-- once the disk hits this limit the system will block, or close the
connection in certain protocols that won't support flow control. -->
<max-disk-usage>90</max-disk-usage>
<!-- should the broker detect dead locks and other issues -->
<critical-analyzer>true</critical-analyzer>
<critical-analyzer-timeout>120000</critical-analyzer-timeout>
<critical-analyzer-check-period>60000</critical-analyzer-check-period>
<critical-analyzer-policy>LOG</critical-analyzer-policy>
<transaction-timeout>1800000</transaction-timeout>
<!-- the system will enter into page mode once you hit this limit. This
is an estimate in bytes of how much the messages are using in memory The
system will use half of the available memory (-Xmx) by default for the global-max-size.
You may specify a different value here if you need to customize it to your
needs. <global-max-size>100Mb</global-max-size> -->
<connectors>
<connector name="netty-connector">tcp://localhost:61617</connector>
</connectors>
<!-- Acceptors -->
<acceptors>
<acceptor name="netty-acceptor">tcp://localhost:61617?anycastPrefix=jms.queue.;multicastPrefix=jms.topic.</acceptor>
</acceptors>
<ha-policy>
<shared-store>
<master>
<failover-on-shutdown>true</failover-on-shutdown>
</master>
</shared-store>
</ha-policy>
<broadcast-groups>
<broadcast-group name="my-broadcast-group">
<broadcast-period>5000</broadcast-period>
<jgroups-file>idsk-jgroups.xml</jgroups-file>
<jgroups-channel>persistence-fs</jgroups-channel>
<connector-ref>netty-connector</connector-ref>
</broadcast-group>
</broadcast-groups>
<discovery-groups>
<discovery-group name="my-discovery-group">
<jgroups-file>idsk-jgroups.xml</jgroups-file>
<jgroups-channel>persistence-fs</jgroups-channel>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<cluster-connections>
<cluster-connection name="my-cluster">
<connector-ref>netty-connector</connector-ref>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<discovery-group-ref discovery-group-name="my-discovery-group" />
</cluster-connection>
</cluster-connections>
<cluster-user>admin</cluster-user>
<cluster-password>admin</cluster-password>
<diverts>
<divert name="RF-Transform">
<routing-name>RFeeds-Transform</routing-name>
<address>RFTopic</address>
<forwarding-address>IT-InputQueue</forwarding-address>
<exclusive>false</exclusive>
</divert>
<divert name="RF-Output">
<routing-name>RFeeds-Output</routing-name>
<address>RFTopic</address>
<forwarding-address>T1-InputQueue</forwarding-address>
<exclusive>false</exclusive>
</divert>
</diverts>
<security-settings>
<security-setting match="#">
<permission type="createNonDurableQueue" roles="amq" />
<permission type="deleteNonDurableQueue" roles="amq" />
<permission type="createDurableQueue" roles="amq" />
<permission type="deleteDurableQueue" roles="amq" />
<permission type="createAddress" roles="amq" />
<permission type="deleteAddress" roles="amq" />
<permission type="consume" roles="amq" />
<permission type="browse" roles="amq" />
<permission type="send" roles="amq" />
<!-- we need this otherwise ./artemis data imp wouldn't work -->
<permission type="manage" roles="amq" />
</security-setting>
</security-settings>
<address-settings>
<!-- if you define auto-create on certain queues, management has to be
auto-create -->
<address-setting match="activemq.management#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>60000</redelivery-delay>
<max-delivery-attempts>5</max-delivery-attempts>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>50485760</max-size-bytes>
<page-size-bytes>10485760</page-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>false</auto-create-addresses>
</address-setting>
<!--default for catch all -->
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>60000</redelivery-delay>
<max-delivery-attempts>5</max-delivery-attempts>
<redistribution-delay>10000</redistribution-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>50485760</max-size-bytes>
<page-size-bytes>10485760</page-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>false</auto-create-addresses>
</address-setting>
</address-settings>
<addresses>
<address name="DLQ">
<anycast>
<queue name="DLQ" />
</anycast>
</address>
<address name="ExpiryQueue">
<anycast>
<queue name="ExpiryQueue" />
</anycast>
</address>
<address name="IT-InputQueue">
<anycast>
<queue name="IT-InputQueue" />
</anycast>
</address>
<address name="T1-InputQueue">
<anycast>
<queue name="T1-InputQueue" />
</anycast>
</address>
<address name="RFTopic">
<multicast />
</address>
</addresses>
</core>
</configuration>
My Analysis:
I have gone thru the code of artemis-server module and below are my findings.
The org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl. route(message,context,direct,rejectDuplicates,bindingMove) method clean the internal properties from the Message. Is there any reason to cleanup those properties. As I am getting above error because the properties "_AMQ_ROUTE_TO$.artemis.internal.sf.my-cluster.68d83229-be54-11e9-824e-f8b156cb82da" is not present in message properties
We are cleaning up internal properties based on following condition.
package org.apache.activemq.artemis.core.message.impl
public class CoreMessage extends RefCountMessage implements ICoreMessage {
private static final Predicate<SimpleString> INTERNAL_PROPERTY_NAMES_PREDICATE =
name -> (name.startsWith(Message.HDR_ROUTE_TO_IDS) && !name.equals(Message.HDR_ROUTE_TO_IDS)) ||
(name.startsWith(Message.HDR_ROUTE_TO_ACK_IDS) && !name.equals(Message.HDR_ROUTE_TO_ACK_IDS));

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Artemis HA configuration for core bridges - activemq-artemis

Related

Red Hat AMQ broker errors in HA cluster configuration

Artemis 2.18.0 replica set doesn't work properly

Apache ActiveMQ Artemis Slow performance after paging starts

Artemis master node not starting after failover to slave

Large message lost for divert

Categories

Resources