Artemis crashed and messages/queues lost - activemq-artemis

Any ideas what happened here? Can the messages and queues be restored, and how can the ActiveMQ Artemis data be used to do that? All queues and messages are lost; only DLQ and ExpiryQueue remain.
We had to make a configuration change, so we started by updating the Slave.
To fail over to the Slave so that we could fix the Master, we ran ./artemis-service stop on the Master at 16:32.
There were some warnings on the Master but not much else.
On the Slave we can see some timeouts and connection failures. We are not sure whether the Slave took over the queues and worked correctly.
At 16:46:27 we ran ./artemis-service start on the Master.
Artemis then seems to have started moving and removing bindings and journals.
The Master appears to fail to start because of connection issues (16:46:33).
The Slave then tries to restart and take control (16:46:37).
The Slave cleans out its journals and bindings.
Now all queues are gone. Only DLQ and ExpiryQueue exist on the Slave, and the Master is down.
Can we restore from oldreplica, or might it have been wiped already?
We have not yet tried to restore from the replicas. Is there a manual or knowledge base article for this?
Master:
A few of these:
16:32:15,847 WARN [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session ID:xxx.yyy.zzz-30305-1552451287008-17:1:-1
16:32:15,847 WARN [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session ID:xxx.yyy.zzz-30305-1552451287008-17:1:-1
...
16:32:16,088 INFO [org.apache.activemq.hawtio.plugin.PluginContextListener] Destroyed artemis-plugin plugin
16:32:16,093 INFO [org.apache.activemq.hawtio.branding.PluginContextListener] Destroyed activemq-branding plugin
16:32:16,104 INFO [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.4.0 [06926557-2906-11e8-a15f-005056926b6e] stopped, uptime 4 days 3 hours
Meanwhile on the Slave:
16:32:15,867 INFO [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: LiveFailoverQuorumVote
16:32:15,868 INFO [org.apache.activemq.artemis.core.server] AMQ221067: Waiting 30 seconds for quorum vote results.
16:32:15,869 INFO [org.apache.activemq.artemis.core.server] AMQ221068: Received all quorum votes.
16:32:15,869 INFO [org.apache.activemq.artemis.core.server] AMQ221071: Failing over based on quorum vote results.
16:32:15,889 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
16:32:15,944 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
16:32:15,999 INFO [org.apache.activemq.artemis.core.server] AMQ221037: ActiveMQServerImpl::serverUUID=06926557-2906-11e8-a15f-005056926b6e to become 'live'
16:32:16,450 WARN [org.apache.activemq.artemis.core.client] AMQ212004: Failed to connect to server.
16:32:34,825 INFO [org.apache.activemq.artemis.core.server] AMQ221003: Deploying queue DLQ on address DLQ
16:32:34,825 INFO [org.apache.activemq.artemis.core.server] AMQ221003: Deploying queue ExpiryQueue on address ExpiryQueue
16:32:35,156 INFO [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
16:33:51,161 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /aaa.bbb.ccc.ddd:53781 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
16:34:10,061 INFO [io.hawt.web.LoginServlet] hawtio login is using 1800 sec. HttpSession timeout
16:34:11,891 ERROR [org.apache.activemq.artemis.core.server] AMQ224088: Timeout (10 seconds) while handshaking has occurred.
16:34:52,912 ERROR [org.apache.activemq.artemis.core.server] AMQ224088: Timeout (10 seconds) while handshaking has occurred.
16:35:01,183 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /aaa.bbb.ccc.ddd:47171 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
16:36:01,191 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /aaa.bbb.ccc.ddd:54700 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
16:36:12,705 ERROR [org.apache.activemq.artemis.core.server] AMQ224088: Timeout (10 seconds) while handshaking has occurred.
16:36:53,104 ERROR [org.apache.activemq.artemis.core.server] AMQ224088: Timeout (10 seconds) while handshaking has occurred.
16:37:11,203 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /aaa.bbb.ccc.ddd:16751 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
16:38:11,209 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /aaa.bbb.ccc.ddd:20634 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
16:38:12,182 ERROR [org.apache.activemq.artemis.core.server] AMQ224088: Timeout (10 seconds) while handshaking has occurred.
16:39:21,216 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /aaa.bbb.ccc.ddd:61541 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
16:40:21,225 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /aaa.bbb.ccc.ddd:49708 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
16:41:18,720 ERROR [org.apache.activemq.artemis.core.server] AMQ224088: Timeout (10 seconds) while handshaking has occurred.
16:45:51,271 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /aaa.bbb.ccc.ddd:20748 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
16:46:32,151 INFO [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending AIOSequentialFile:/opt/brokers/ActiveMQServer2/./data/journal/activemq-data-167.amq (size=10,485,760) to replica.
16:46:33,100 INFO [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending AIOSequentialFile:/opt/brokers/ActiveMQServer2/./data/journal/activemq-data-168.amq (size=10,485,760) to replica.
16:46:33,182 WARN [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session ID:xxx.yyy.zzz-30305-1552451287008-19:1:-1
16:46:33,188 WARN [org.apache.activemq.artemis.core.server] AMQ222107: Cleared up resources for session ID:xxx.yyy.zzz-30305-1552451287008-19:1:-1
16:46:33,199 WARN [org.apache.activemq.artemis.core.server] AMQ222061: Client connection failed, clearing up resources for session 380d4fc6-4f13-11e9-b0fb-00505692a0af
16:46:33,288 INFO [io.hawt.web.AuthenticationFilter] Destroying hawtio authentication filter
16:46:33,290 INFO [io.hawt.HawtioContextListener] Destroying hawtio services
16:46:33,391 INFO [org.apache.activemq.hawtio.plugin.PluginContextListener] Destroyed artemis-plugin plugin
16:46:33,397 INFO [org.apache.activemq.hawtio.branding.PluginContextListener] Destroyed activemq-branding plugin
16:46:33,408 INFO [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.4.0 [06926557-2906-11e8-a15f-005056926b6e] stopped, uptime 4 days 3 hours
16:46:37,163 INFO [org.apache.activemq.artemis.integration.bootstrap] AMQ101000: Starting ActiveMQ Artemis Server
16:46:37,266 INFO [org.apache.activemq.artemis.core.server] AMQ221000: backup Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=./data/journal,bindingsDirectory=./data/bindings,largeMessagesDirectory=./data/large-messages,pagingDirectory=./data/paging)
16:46:37,288 INFO [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /opt/brokers/ActiveMQServer2/./data/bindings/oldreplica.45
16:46:37,292 INFO [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /opt/brokers/ActiveMQServer2/./data/bindings to /opt/brokers/ActiveMQServer2/./data/bindings/oldreplica.47
16:46:37,295 INFO [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /opt/brokers/ActiveMQServer2/./data/journal/oldreplica.49
16:46:37,597 INFO [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /opt/brokers/ActiveMQServer2/./data/journal to /opt/brokers/ActiveMQServer2/./data/journal/oldreplica.51
16:46:37,628 INFO [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /opt/brokers/ActiveMQServer2/./data/paging/oldreplica.45
16:46:37,852 INFO [org.apache.activemq.hawtio.branding.PluginContextListener] Initialized activemq-branding plugin
16:46:38,056 INFO [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /opt/brokers/ActiveMQServer2/./data/paging to /opt/brokers/ActiveMQServer2/./data/paging/oldreplica.47
16:46:38,091 INFO [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
16:46:38,157 INFO [org.apache.activemq.artemis.core.server] AMQ221057: Global Max Size is being adjusted to 1/2 of the JVM max size (-Xmx). being defined as 2,147,483,648
16:46:38,252 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
16:46:38,252 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
16:46:38,253 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
16:46:38,253 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
16:46:38,253 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
16:46:38,254 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
16:46:38,328 INFO [org.apache.activemq.hawtio.plugin.PluginContextListener] Initialized artemis-plugin plugin
16:46:42,502 INFO [io.hawt.HawtioContextListener] Initialising hawtio services
16:46:42,572 INFO [io.hawt.web.JolokiaConfiguredAgentServlet] Jolokia overridden property: [key=policyLocation, value=file:/opt/brokers/ActiveMQServer2//etc/jolokia-access.xml]
16:46:42,617 INFO [io.hawt.web.RBACMBeanInvoker] Using MBean [hawtio:type=security,area=jmx,rank=0,name=HawtioDummyJMXSecurity] for role based access control
16:46:42,788 INFO [io.hawt.system.ProxyWhitelist] Initial proxy whitelist: [localhost, 127.0.0.1, aaa.bbb.ccc.50, nodep02.domain.local]
16:46:43,202 INFO [org.apache.activemq.artemis] AMQ241001: HTTP Server started at http://nodep02:8161
16:46:43,202 INFO [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at http://nodep02:8161/console/jolokia
Back on the Master
16:46:27,913 INFO [org.apache.activemq.artemis.integration.bootstrap] AMQ101000: Starting ActiveMQ Artemis Server
16:46:27,999 INFO [org.apache.activemq.artemis.core.server] AMQ221000: live Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=./data/journal,bindingsDirectory=./data/bindings,largeMessagesDirectory=./data/large-messages,pagingDirectory=./data/paging)
16:46:28,440 INFO [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /opt/brokers/ActiveMQServer1/./data/bindings/oldreplica.18
16:46:28,447 INFO [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /opt/brokers/ActiveMQServer1/./data/bindings to /opt/brokers/ActiveMQServer1/./data/bindings/oldreplica.20
16:46:28,450 INFO [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /opt/brokers/ActiveMQServer1/./data/journal/oldreplica.18
16:46:28,478 INFO [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /opt/brokers/ActiveMQServer1/./data/journal to /opt/brokers/ActiveMQServer1/./data/journal/oldreplica.20
16:46:28,482 INFO [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /opt/brokers/ActiveMQServer1/./data/paging/oldreplica.18
16:46:28,517 INFO [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /opt/brokers/ActiveMQServer1/./data/paging to /opt/brokers/ActiveMQServer1/./data/paging/oldreplica.20
16:46:28,580 INFO [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
16:46:33,209 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
16:46:33,266 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
16:46:33,273 WARN [org.apache.activemq.artemis.journal] File not opened, file=null: java.lang.NullPointerException: File not opened, file=null
at org.apache.activemq.artemis.core.io.aio.AIOSequentialFile.checkOpened(AIOSequentialFile.java:328) [artemis-journal-2.4.0.jar:2.4.0]
at org.apache.activemq.artemis.core.io.aio.AIOSequentialFile.writeDirect(AIOSequentialFile.java:242) [artemis-journal-2.4.0.jar:2.4.0]
at org.apache.activemq.artemis.core.io.AbstractSequentialFile$LocalBufferObserver.flushBuffer(AbstractSequentialFile.java:306) [artemis-journal-2.4.0.jar:2.4.0]
at org.apache.activemq.artemis.core.io.buffer.TimedBuffer.flushBatch(TimedBuffer.java:310) [artemis-journal-2.4.0.jar:2.4.0]
at org.apache.activemq.artemis.core.io.buffer.TimedBuffer.flush(TimedBuffer.java:281) [artemis-journal-2.4.0.jar:2.4.0]
at org.apache.activemq.artemis.core.io.AbstractSequentialFileFactory.flush(AbstractSequentialFileFactory.java:195) [artemis-journal-2.4.0.jar:2.4.0]
at org.apache.activemq.artemis.core.journal.impl.JournalImpl.flush(JournalImpl.java:2194) [artemis-journal-2.4.0.jar:2.4.0]
at org.apache.activemq.artemis.core.journal.impl.JournalImpl.stop(JournalImpl.java:2356) [artemis-journal-2.4.0.jar:2.4.0]
at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.stop(JournalStorageManager.java:266) [artemis-server-2.4.0.jar:2.4.0]
at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.stop(JournalStorageManager.java:203) [artemis-server-2.4.0.jar:2.4.0]
at org.apache.activemq.artemis.core.replication.ReplicationEndpoint.stop(ReplicationEndpoint.java:327) [artemis-server-2.4.0.jar:2.4.0]
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stopComponent(ActiveMQServerImpl.java:1256) [artemis-server-2.4.0.jar:2.4.0]
at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:259) [artemis-server-2.4.0.jar:2.4.0]
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:2951) [artemis-server-2.4.0.jar:2.4.0]
16:46:33,318 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ119026: Backup Server was not yet in sync with live]
at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:314) [artemis-server-2.4.0.jar:2.4.0]
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:2951) [artemis-server-2.4.0.jar:2.4.0]
16:46:33,995 INFO [io.hawt.web.keycloak.KeycloakServlet] Keycloak integration is disabled
16:46:40,486 WARN [org.apache.activemq.artemis.core.server] AMQ222040: Server is stopped
16:46:50,033 WARN [org.apache.activemq.artemis.core.server] AMQ222040: Server is stopped
16:47:00,034 WARN [org.apache.activemq.artemis.core.server] AMQ222040: Server is stopped
Broker.xml (Master)
<?xml version="1.0"?>
<configuration xmlns="urn:activemq" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:activemq:core ">
<name>ActiveMQ1</name>
<persistence-enabled>true</persistence-enabled>
<journal-type>ASYNCIO</journal-type>
<paging-directory>./data/paging</paging-directory>
<bindings-directory>./data/bindings</bindings-directory>
<journal-directory>./data/journal</journal-directory>
<large-messages-directory>./data/large-messages</large-messages-directory>
<journal-datasync>true</journal-datasync>
<journal-min-files>2</journal-min-files>
<journal-pool-files>-1</journal-pool-files>
<journal-file-size>10M</journal-file-size>
<journal-buffer-size>33554432</journal-buffer-size>
<!-- Size in bytes -->
<journal-buffer-timeout>128000</journal-buffer-timeout>
<journal-max-io>4096</journal-max-io>
<connectors>
<connector name="artemis">tcp://nodep01:61616</connector>
<connector name="ActiveMQ2-Connector">tcp://nodep02:61616</connector>
</connectors>
<disk-scan-period>5000</disk-scan-period>
<max-disk-usage>100</max-disk-usage>
<critical-analyzer>true</critical-analyzer>
<critical-analyzer-timeout>120000</critical-analyzer-timeout>
<critical-analyzer-check-period>60000</critical-analyzer-check-period>
<critical-analyzer-policy>HALT</critical-analyzer-policy>
<acceptors>
<acceptor name="artemis">tcp://nodep01:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300</acceptor>
<acceptor name="amqp">tcp://nodep01:5672?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpMinCredits=300</acceptor>
<acceptor name="stomp">tcp://nodep01:61613?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=STOMP;useEpoll=true</acceptor>
<acceptor name="hornetq">tcp://nodep01:5445?protocols=HORNETQ,STOMP;useEpoll=true</acceptor>
<acceptor name="artemis+ssl">tcp://nodep01:61443?sslEnabled=true;keyStorePath=/opt/brokers/ActiveMQServer1/activemq.keystore;keyStorePassword=CCCCC;tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300</acceptor>
</acceptors>
<cluster-user>XXXXXXXX</cluster-user>
<cluster-password>YYYYYYY</cluster-password>
<cluster-connections>
<cluster-connection name="ActiveMQClusterPROD">
<connector-ref>artemis</connector-ref>
<min-large-message-size>33554432</min-large-message-size>
<!-- Size should be equivalent to journal-buffer-size -->
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<static-connectors>
<connector-ref>ActiveMQ2-Connector</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
<ha-policy>
<replication>
<master>
<check-for-live-server>true</check-for-live-server>
</master>
</replication>
</ha-policy>
<security-settings>
<security-setting match="#">
<permission type="createNonDurableQueue" roles="amq" />
<permission type="deleteNonDurableQueue" roles="amq" />
<permission type="createDurableQueue" roles="amq" />
<permission type="deleteDurableQueue" roles="amq" />
<permission type="createAddress" roles="amq" />
<permission type="deleteAddress" roles="amq" />
<permission type="consume" roles="amq" />
<permission type="browse" roles="amq" />
<permission type="send" roles="amq" />
<!-- we need this otherwise ./artemis data imp wouldn't work-->
<permission type="manage" roles="amq" />
</security-setting>
</security-settings>
<address-settings>
<!-- if you define auto-create on certain queues, management has to be auto-create -->
<address-setting match="activemq.management#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
<!--default for catch all-->
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
</address-settings>
<addresses>
<address name="DLQ">
<anycast>
<queue name="DLQ" />
</anycast>
</address>
<address name="ExpiryQueue">
<anycast>
<queue name="ExpiryQueue" />
</anycast>
</address>
</addresses>
</core>
</configuration>
Broker.xml (Slave)
<?xml version="1.0"?>
<configuration xmlns="urn:activemq" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:activemq:core ">
<name>ActiveMQ2</name>
<persistence-enabled>true</persistence-enabled>
<journal-type>ASYNCIO</journal-type>
<paging-directory>./data/paging</paging-directory>
<bindings-directory>./data/bindings</bindings-directory>
<journal-directory>./data/journal</journal-directory>
<large-messages-directory>./data/large-messages</large-messages-directory>
<journal-datasync>true</journal-datasync>
<journal-min-files>2</journal-min-files>
<journal-pool-files>-1</journal-pool-files>
<journal-file-size>10M</journal-file-size>
<journal-buffer-size>33554432</journal-buffer-size>
<!-- size in bytes -->
<journal-buffer-timeout>104000</journal-buffer-timeout>
<journal-max-io>4096</journal-max-io>
<connectors>
<!-- Connector used to be announced through cluster connections and notifications -->
<connector name="artemis">tcp://nodep02:61616</connector>
<connector name="ActiveMQ1-Connector">tcp://nodep01:61616</connector>
</connectors>
<disk-scan-period>5000</disk-scan-period>
<max-disk-usage>100</max-disk-usage>
<critical-analyzer>true</critical-analyzer>
<critical-analyzer-timeout>120000</critical-analyzer-timeout>
<critical-analyzer-check-period>60000</critical-analyzer-check-period>
<critical-analyzer-policy>HALT</critical-analyzer-policy>
<acceptors>
<acceptor name="artemis">tcp://nodep02:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300</acceptor>
<acceptor name="amqp">tcp://nodep02:5672?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpMinCredits=300</acceptor>
<acceptor name="stomp">tcp://nodep02:61613?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=STOMP;useEpoll=true</acceptor>
<acceptor name="hornetq">tcp://nodep02:5445?protocols=HORNETQ,STOMP;useEpoll=true</acceptor>
<acceptor name="mqtt">tcp://nodep02:1883?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=MQTT;useEpoll=true</acceptor>
<acceptor name="artemis+ssl">tcp://nodep02:61443?sslEnabled=true;keyStorePath=/opt/brokers/ActiveMQServer2/activemq.keystore;keyStorePassword=CCCCC;tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300</acceptor>
</acceptors>
<cluster-user>XXXXXXXX</cluster-user>
<cluster-password>YYYYYYY</cluster-password>
<cluster-connections>
<cluster-connection name="ActiveMQClusterPROD">
<connector-ref>artemis</connector-ref>
<min-large-message-size>33554432</min-large-message-size>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<static-connectors>
<connector-ref>ActiveMQ1-Connector</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
<ha-policy>
<replication>
<slave>
<allow-failback>true</allow-failback>
</slave>
</replication>
</ha-policy>
<security-settings>
<security-setting match="#">
<permission type="createNonDurableQueue" roles="amq" />
<permission type="deleteNonDurableQueue" roles="amq" />
<permission type="createDurableQueue" roles="amq" />
<permission type="deleteDurableQueue" roles="amq" />
<permission type="createAddress" roles="amq" />
<permission type="deleteAddress" roles="amq" />
<permission type="consume" roles="amq" />
<permission type="browse" roles="amq" />
<permission type="send" roles="amq" />
<!-- we need this otherwise ./artemis data imp wouldn't work-->
<permission type="manage" roles="amq" />
</security-setting>
</security-settings>
<address-settings>
<!-- if you define auto-create on certain queues, management has to be auto-create -->
<address-setting match="activemq.management#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
<!--default for catch all-->
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<!-- with -1 only the global-max-size is in use for limiting -->
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-create-jms-queues>true</auto-create-jms-queues>
<auto-create-jms-topics>true</auto-create-jms-topics>
</address-setting>
</address-settings>
<addresses>
<address name="DLQ">
<anycast>
<queue name="DLQ" />
</anycast>
</address>
<address name="ExpiryQueue">
<anycast>
<queue name="ExpiryQueue" />
</anycast>
</address>
</addresses>
</core>
</configuration>

You can restore the full journal from the "oldreplica" backups. That's what they're there for. You simply need to copy the files from the backup location to the original location. As always, be careful when copying data so that you don't overwrite something you might need later.
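As a minimal sketch of that copy step (not an official Artemis tool), the Python below lists the oldreplica.N folders under one data directory and copies the files of a chosen folder into a new, empty staging directory, so they can be inspected before anything is moved back. It assumes the broker is stopped and that the backups sit in oldreplica.N subfolders, as the AMQ222162 lines in the question's logs suggest. The function names and the staging directory are illustrative.

```python
import re
import shutil
from pathlib import Path

# Folder names such as "oldreplica.51", as seen in the AMQ222162 log lines.
OLDREPLICA = re.compile(r"^oldreplica\.(\d+)$")


def list_old_replicas(data_dir):
    """Return (number, path) pairs for oldreplica.N subfolders, highest number first."""
    found = []
    for child in Path(data_dir).iterdir():
        match = OLDREPLICA.match(child.name)
        if match and child.is_dir():
            found.append((int(match.group(1)), child))
    return sorted(found, reverse=True)


def stage_replica(replica_dir, staging_dir):
    """Copy the regular files of one oldreplica folder into a NEW staging directory.

    Raises FileExistsError if the staging directory already exists, so an
    earlier copy is never silently overwritten.
    """
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=False)
    copied = []
    for source in sorted(Path(replica_dir).iterdir()):
        if source.is_file():
            shutil.copy2(source, staging / source.name)  # keeps timestamps
            copied.append(source.name)
    return copied
```

For example, `list_old_replicas("/opt/brokers/ActiveMQServer2/data/journal")` would show which journal backups still exist on the Slave. The same would need to be repeated for the bindings and paging directories. The highest-numbered folder is not necessarily the one holding the data you want, so check the file timestamps in each candidate before copying anything back to the original location.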

Related

ActiveMQ Artemis rolling upgrade fails with AMQ214013

I've got two EC2 instances running Artemis in a master-slave replication setup, and I always perform a rolling upgrade by shutting down the slave, upgrading it, and starting it again. Then I do the same with the master.
This no longer works for me when upgrading from 2.17 to 2.18. After upgrading one of the nodes, I always get this message on that node, no matter what I do:
AMQ214013: Failed to decode packet: java.lang.IndexOutOfBoundsException: readerIndex(57) + length(1) exceeds writerIndex(57): PooledUnsafeDirectByteBuf(ridx: 57, widx: 57, cap: 57)
I even tried setting up a fresh cluster, adding only the replication-related bits of configuration:
Master
<connectors>
  <connector name="broker-master">tcp://10.35.4.16:61616</connector>
  <connector name="broker-slave">tcp://10.35.4.211:61616</connector>
</connectors>
<cluster-connections>
  <cluster-connection name="cluster">
    <connector-ref>broker-master</connector-ref>
    <check-period>5000</check-period>
    <static-connectors>
      <connector-ref>broker-slave</connector-ref>
    </static-connectors>
  </cluster-connection>
</cluster-connections>
<ha-policy>
  <replication>
    <master>
      <cluster-name>cluster</cluster-name>
      <group-name>rs1</group-name>
      <check-for-live-server>true</check-for-live-server>
    </master>
  </replication>
</ha-policy>
Slave
<connectors>
  <connector name="broker-master">tcp://10.35.4.16:61616</connector>
  <connector name="broker-slave">tcp://10.35.4.211:61616</connector>
</connectors>
<cluster-connections>
  <cluster-connection name="cluster">
    <connector-ref>broker-slave</connector-ref>
    <check-period>5000</check-period>
    <static-connectors>
      <connector-ref>broker-master</connector-ref>
    </static-connectors>
  </cluster-connection>
</cluster-connections>
<ha-policy>
  <replication>
    <slave>
      <cluster-name>cluster</cluster-name>
      <group-name>rs1</group-name>
      <allow-failback>true</allow-failback>
    </slave>
  </replication>
</ha-policy>
Is there anything wrong with my configuration, or is it just not possible to upgrade from 2.17 to 2.18 like this?
Full log from master
...
2022-04-07 10:00:17,610 INFO [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
...
2022-04-07 10:00:18,726 INFO [org.apache.activemq.artemis.core.server] AMQ221080: Deploying address DLQ supporting [ANYCAST]
2022-04-07 10:00:18,728 INFO [org.apache.activemq.artemis.core.server] AMQ221003: Deploying ANYCAST queue DLQ on address DLQ
2022-04-07 10:00:18,759 INFO [org.apache.activemq.artemis.core.server] AMQ221080: Deploying address ExpiryQueue supporting [ANYCAST]
2022-04-07 10:00:18,760 INFO [org.apache.activemq.artemis.core.server] AMQ221003: Deploying ANYCAST queue ExpiryQueue on address ExpiryQueue
2022-04-07 10:00:19,072 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:61616 for protocols [CORE,MQTT,AMQP,STOMP,HORNETQ,OPENWIRE]
2022-04-07 10:00:19,075 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:5445 for protocols [HORNETQ,STOMP]
2022-04-07 10:00:19,079 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:5672 for protocols [AMQP]
2022-04-07 10:00:19,082 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:1883 for protocols [MQTT]
2022-04-07 10:00:19,085 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:61613 for protocols [STOMP]
2022-04-07 10:00:19,086 INFO [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
2022-04-07 10:00:19,086 INFO [org.apache.activemq.artemis.core.server] AMQ221001: Apache ActiveMQ Artemis Message Broker version 2.17.0 [broker-master, nodeID=eb08fe05-b640-11ec-b6e2-06e66cf4b718]
2022-04-07 10:00:19,470 INFO [org.apache.activemq.hawtio.branding.PluginContextListener] Initialized activemq-branding plugin
2022-04-07 10:00:19,590 INFO [org.apache.activemq.hawtio.plugin.PluginContextListener] Initialized artemis-plugin plugin
2022-04-07 10:00:20,032 INFO [io.hawt.HawtioContextListener] Initialising hawtio services
2022-04-07 10:00:20,123 INFO [io.hawt.system.ConfigManager] Configuration will be discovered via system properties
2022-04-07 10:00:20,126 INFO [io.hawt.jmx.JmxTreeWatcher] Welcome to Hawtio 2.11.0
2022-04-07 10:00:20,137 INFO [io.hawt.web.auth.AuthenticationConfiguration] Starting hawtio authentication filter, JAAS realm: "activemq" authorized role(s): "amq" role principal classes: "org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal"
2022-04-07 10:00:20,171 INFO [io.hawt.web.proxy.ProxyServlet] Proxy servlet is disabled
2022-04-07 10:00:20,182 INFO [io.hawt.web.servlets.JolokiaConfiguredAgentServlet] Jolokia overridden property: [key=policyLocation, value=file:/opt/artemis/broker-master/etc/jolokia-access.xml]
2022-04-07 10:00:20,342 INFO [org.apache.activemq.artemis] AMQ241001: HTTP Server started at http://10.35.4.16:8161
2022-04-07 10:00:20,343 INFO [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at http://10.35.4.16:8161/console/jolokia
2022-04-07 10:00:20,343 INFO [org.apache.activemq.artemis] AMQ241004: Artemis Console available at http://10.35.4.16:8161/console
2022-04-07 10:00:52,429 INFO [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending AIOSequentialFile{activemq-data-49.amq, opened=false, pendingClose=false, pendingCallbacks=org.apache.activemq.artemis.utils.AutomaticLatch#2907ccd6} (size=10,485,760) to replica.
2022-04-07 10:00:52,931 INFO [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending NIOSequentialFile /opt/artemis/broker-master/data/bindings/activemq-bindings-34.bindings (size=1,048,576) to replica.
2022-04-07 10:00:52,963 INFO [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending NIOSequentialFile /opt/artemis/broker-master/data/bindings/activemq-bindings-35.bindings (size=1,048,576) to replica.
2022-04-07 10:00:53,011 INFO [org.apache.activemq.artemis.core.server] AMQ221025: Replication: sending NIOSequentialFile /opt/artemis/broker-master/data/bindings/activemq-bindings-1.bindings (size=1,048,576) to replica.
2022-04-07 10:00:53,073 WARN [org.apache.activemq.artemis.core.server] AMQ222092: Connection to the backup node failed, removing replication now: ActiveMQRemoteDisconnectException[errorType=REMOTE_DISCONNECT message=null]
at org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl.connectionDestroyed(RemotingServiceImpl.java:582) [artemis-server-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor$Listener.connectionDestroyed(NettyAcceptor.java:942) [artemis-server-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.remoting.impl.netty.ActiveMQChannelHandler.lambda$channelInactive$0(ActiveMQChannelHandler.java:89) [artemis-core-client-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) [artemis-commons-2.17.0.jar:2.17.0]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [java.base:]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [java.base:]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.17.0.jar:2.17.0]
2022-04-07 10:01:23,032 WARN [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=NULL, message=AMQ229114: Replication synchronization process timed out after waiting 30,000 milliseconds: ActiveMQReplicationTimeooutException[errorType=REPLICATION_TIMEOUT_ERROR message=AMQ229114: Replication synchronization process timed out after waiting 30,000 milliseconds]
at org.apache.activemq.artemis.core.replication.ReplicationManager.sendSynchronizationDone(ReplicationManager.java:660) [artemis-server-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:717) [artemis-server-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:180) [artemis-server-2.17.0.jar:2.17.0]
at java.base/java.lang.Thread.run(Thread.java:829) [java.base:]
2022-04-07 10:01:23,035 WARN [org.apache.activemq.artemis.core.server] AMQ222251: Unable to start replication: ActiveMQReplicationTimeooutException[errorType=REPLICATION_TIMEOUT_ERROR message=AMQ229114: Replication synchronization process timed out after waiting 30,000 milliseconds]
at org.apache.activemq.artemis.core.replication.ReplicationManager.sendSynchronizationDone(ReplicationManager.java:660) [artemis-server-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:717) [artemis-server-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:180) [artemis-server-2.17.0.jar:2.17.0]
at java.base/java.lang.Thread.run(Thread.java:829) [java.base:]
2022-04-07 10:01:23,045 WARN [org.apache.activemq.artemis.core.server] AMQ222013: Error when trying to start replication: ActiveMQReplicationTimeooutException[errorType=REPLICATION_TIMEOUT_ERROR message=AMQ229114: Replication synchronization process timed out after waiting 30,000 milliseconds]
at org.apache.activemq.artemis.core.replication.ReplicationManager.sendSynchronizationDone(ReplicationManager.java:660) [artemis-server-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:717) [artemis-server-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:180) [artemis-server-2.17.0.jar:2.17.0]
at java.base/java.lang.Thread.run(Thread.java:829) [java.base:]
2022-04-07 10:01:23,058 WARN [org.apache.activemq.artemis.utils.actors.OrderedExecutor] Server locator is closed (maybe it was garbage collected): java.lang.IllegalStateException: Server locator is closed (maybe it was garbage collected)
at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.assertOpen(ServerLocatorImpl.java:1848) [artemis-core-client-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:648) [artemis-core-client-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:549) [artemis-core-client-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:528) [artemis-core-client-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.server.cluster.ClusterController$ConnectRunnable.run(ClusterController.java:433) [artemis-server-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) [artemis-commons-2.17.0.jar:2.17.0]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [java.base:]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [java.base:]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.17.0.jar:2.17.0]
2022-04-07 10:01:23,297 INFO [io.hawt.web.auth.AuthenticationFilter] Destroying hawtio authentication filter
2022-04-07 10:01:23,303 INFO [io.hawt.HawtioContextListener] Destroying hawtio services
2022-04-07 10:01:23,325 INFO [org.apache.activemq.hawtio.plugin.PluginContextListener] Destroyed artemis-plugin plugin
2022-04-07 10:01:23,336 INFO [org.apache.activemq.hawtio.branding.PluginContextListener] Destroyed activemq-branding plugin
2022-04-07 10:01:23,384 INFO [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.17.0 [eb08fe05-b640-11ec-b6e2-06e66cf4b718] stopped, uptime 1 minute
Full log from slave
2022-04-07 10:00:50,645 INFO [org.apache.activemq.artemis.integration.bootstrap] AMQ101000: Starting ActiveMQ Artemis Server
2022-04-07 10:00:50,703 INFO [org.apache.activemq.artemis.core.server] AMQ221000: backup Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=data/journal,bindingsDirectory=data/bindings,largeMessagesDirectory=data/large-messages,pagingDirectory=data/paging)
2022-04-07 10:00:50,747 INFO [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /opt/artemis/broker-slave/data/bindings/oldreplica.7
2022-04-07 10:00:50,750 INFO [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /opt/artemis/broker-slave/data/bindings to /opt/artemis/broker-slave/data/bindings/oldreplica.9
2022-04-07 10:00:50,753 INFO [org.apache.activemq.artemis.core.server] AMQ221055: There were too many old replicated folders upon startup, removing /opt/artemis/broker-slave/data/journal/oldreplica.8
2022-04-07 10:00:50,755 INFO [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /opt/artemis/broker-slave/data/journal to /opt/artemis/broker-slave/data/journal/oldreplica.10
2022-04-07 10:00:50,888 INFO [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
2022-04-07 10:00:51,031 WARN [org.apache.activemq.artemis.core.server] AMQ222007: Security risk! Apache ActiveMQ Artemis is running with the default cluster admin user and default password. Please see the cluster chapter in the ActiveMQ Artemis User Guide for instructions on how to change this.
2022-04-07 10:00:51,124 INFO [org.apache.activemq.artemis.core.server] AMQ221057: Global Max Size is being adjusted to 1/2 of the JVM max size (-Xmx). being defined as 1,073,741,824
2022-04-07 10:00:51,323 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
2022-04-07 10:00:51,326 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
2022-04-07 10:00:51,328 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
2022-04-07 10:00:51,329 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
2022-04-07 10:00:51,330 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
2022-04-07 10:00:51,330 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
2022-04-07 10:00:51,538 INFO [org.apache.activemq.hawtio.branding.PluginContextListener] Initialized activemq-branding plugin
2022-04-07 10:00:51,685 INFO [org.apache.activemq.hawtio.plugin.PluginContextListener] Initialized artemis-plugin plugin
2022-04-07 10:00:52,052 INFO [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 2.18.0 [null] started, waiting live to fail before it gets active
2022-04-07 10:00:52,224 INFO [io.hawt.HawtioContextListener] Initialising hawtio services
2022-04-07 10:00:52,337 INFO [io.hawt.system.ConfigManager] Configuration will be discovered via system properties
2022-04-07 10:00:52,341 INFO [io.hawt.jmx.JmxTreeWatcher] Welcome to Hawtio 2.13.5
2022-04-07 10:00:52,351 INFO [io.hawt.web.auth.AuthenticationConfiguration] Starting hawtio authentication filter, JAAS realm: "activemq" authorized role(s): "amq" role principal classes: "org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal"
2022-04-07 10:00:52,383 INFO [io.hawt.web.proxy.ProxyServlet] Proxy servlet is disabled
2022-04-07 10:00:52,392 INFO [io.hawt.web.servlets.JolokiaConfiguredAgentServlet] Jolokia overridden property: [key=policyLocation, value=file:/opt/artemis/broker-slave/etc/jolokia-access.xml]
2022-04-07 10:00:52,570 INFO [org.apache.activemq.artemis] AMQ241001: HTTP Server started at http://localhost:8161
2022-04-07 10:00:52,571 INFO [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at http://localhost:8161/console/jolokia
2022-04-07 10:00:52,571 INFO [org.apache.activemq.artemis] AMQ241004: Artemis Console available at http://localhost:8161/console
2022-04-07 10:00:53,029 ERROR [org.apache.activemq.artemis.core.client] AMQ214013: Failed to decode packet: java.lang.IndexOutOfBoundsException: readerIndex(57) + length(1) exceeds writerIndex(57): PooledUnsafeDirectByteBuf(ridx: 57, widx: 57, cap: 57)
at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1442) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.buffer.AbstractByteBuf.readByte(AbstractByteBuf.java:730) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.buffer.WrappedByteBuf.readByte(WrappedByteBuf.java:529) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at org.apache.activemq.artemis.core.buffers.impl.ChannelBufferWrapper.readByte(ChannelBufferWrapper.java:300) [artemis-commons-2.18.0.jar:2.18.0]
at org.apache.activemq.artemis.core.protocol.core.impl.wireformat.ReplicationStartSyncMessage.decodeRest(ReplicationStartSyncMessage.java:158) [artemis-server-2.18.0.jar:2.18.0]
at org.apache.activemq.artemis.core.protocol.core.impl.PacketImpl.decode(PacketImpl.java:371) [artemis-core-client-2.18.0.jar:2.18.0]
at org.apache.activemq.artemis.core.protocol.ServerPacketDecoder.slowPathDecode(ServerPacketDecoder.java:277) [artemis-server-2.18.0.jar:2.18.0]
at org.apache.activemq.artemis.core.protocol.ServerPacketDecoder.decode(ServerPacketDecoder.java:149) [artemis-server-2.18.0.jar:2.18.0]
at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:388) [artemis-core-client-2.18.0.jar:2.18.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$DelegatingBufferHandler.bufferReceived(ClientSessionFactoryImpl.java:1263) [artemis-core-client-2.18.0.jar:2.18.0]
at org.apache.activemq.artemis.core.remoting.impl.netty.ActiveMQChannelHandler.channelRead(ActiveMQChannelHandler.java:73) [artemis-core-client-2.18.0.jar:2.18.0]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.18.0.jar:2.18.0]
2022-04-07 10:00:53,031 ERROR [org.apache.activemq.artemis.core.client] AMQ214031: Failed to decode buffer, disconnect immediately.: java.lang.IllegalStateException: java.lang.IndexOutOfBoundsException: readerIndex(57) + length(1) exceeds writerIndex(57): PooledUnsafeDirectByteBuf(ridx: 57, widx: 57, cap: 57)
at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:401) [artemis-core-client-2.18.0.jar:2.18.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$DelegatingBufferHandler.bufferReceived(ClientSessionFactoryImpl.java:1263) [artemis-core-client-2.18.0.jar:2.18.0]
at org.apache.activemq.artemis.core.remoting.impl.netty.ActiveMQChannelHandler.channelRead(ActiveMQChannelHandler.java:73) [artemis-core-client-2.18.0.jar:2.18.0]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.18.0.jar:2.18.0]
Caused by: java.lang.IndexOutOfBoundsException: readerIndex(57) + length(1) exceeds writerIndex(57): PooledUnsafeDirectByteBuf(ridx: 57, widx: 57, cap: 57)
at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1442) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.buffer.AbstractByteBuf.readByte(AbstractByteBuf.java:730) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at io.netty.buffer.WrappedByteBuf.readByte(WrappedByteBuf.java:529) [netty-all-4.1.66.Final.jar:4.1.66.Final]
at org.apache.activemq.artemis.core.buffers.impl.ChannelBufferWrapper.readByte(ChannelBufferWrapper.java:300) [artemis-commons-2.18.0.jar:2.18.0]
at org.apache.activemq.artemis.core.protocol.core.impl.wireformat.ReplicationStartSyncMessage.decodeRest(ReplicationStartSyncMessage.java:158) [artemis-server-2.18.0.jar:2.18.0]
at org.apache.activemq.artemis.core.protocol.core.impl.PacketImpl.decode(PacketImpl.java:371) [artemis-core-client-2.18.0.jar:2.18.0]
at org.apache.activemq.artemis.core.protocol.ServerPacketDecoder.slowPathDecode(ServerPacketDecoder.java:277) [artemis-server-2.18.0.jar:2.18.0]
at org.apache.activemq.artemis.core.protocol.ServerPacketDecoder.decode(ServerPacketDecoder.java:149) [artemis-server-2.18.0.jar:2.18.0]
at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:388) [artemis-core-client-2.18.0.jar:2.18.0]
... 20 more
2022-04-07 10:00:53,052 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to 10.35.4.16/10.35.4.16:61616 has been detected: java.lang.IndexOutOfBoundsException: readerIndex(57) + length(1) exceeds writerIndex(57): PooledUnsafeDirectByteBuf(ridx: 57, widx: 57, cap: 57) [code=GENERIC_EXCEPTION]
2022-04-07 10:00:53,092 INFO [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: LiveFailoverQuorumVote
2022-04-07 10:00:53,093 INFO [org.apache.activemq.artemis.core.server] AMQ221084: Requested 0 quorum votes
2022-04-07 10:00:53,093 INFO [org.apache.activemq.artemis.core.server] AMQ221083: ignoring quorum vote as max cluster size is 1.
2022-04-07 10:00:53,094 INFO [org.apache.activemq.artemis.core.server] AMQ221071: Failing over based on quorum vote results.
2022-04-07 10:00:53,113 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ229026: Backup Server was not yet in sync with live]
at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:302) [artemis-server-2.18.0.jar:2.18.0]
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:4271) [artemis-server-2.18.0.jar:2.18.0]
2022-04-07 10:01:23,105 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to 10.35.4.16/10.35.4.16:61616 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
2022-04-07 10:01:23,107 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to 10.35.4.16/10.35.4.16:61616 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
2022-04-07 10:01:42,243 WARN [org.apache.activemq.artemis.core.client] AMQ212004: Failed to connect to server.
2022-04-07 10:01:42,274 INFO [io.hawt.web.auth.AuthenticationFilter] Destroying hawtio authentication filter
2022-04-07 10:01:42,278 INFO [io.hawt.HawtioContextListener] Destroying hawtio services
2022-04-07 10:01:42,296 INFO [org.apache.activemq.hawtio.plugin.PluginContextListener] Destroyed artemis-plugin plugin
2022-04-07 10:01:42,304 INFO [org.apache.activemq.hawtio.branding.PluginContextListener] Destroyed activemq-branding plugin
This bug was introduced via ARTEMIS-3340. It should be fixed in the next release (i.e. 2.22.0). Until then, unfortunately, there is no workaround.
You may consider creating a completely separate new live/backup pair and then using a DNS update to direct clients from the old pair to the new pair.

Log4j2 Kafka appender failover is not working, with the error "Broker may not be available"

I have this log4j2.xml file, and I don't understand why the Failover appender is not taking over when the Kafka appender fails.
My Kafka appender is:
<Kafka name="kafka" topic="myTopic" ignoreExceptions="false" >
<JsonLayout />
<Property name="bootstrap.servers">127.0.0.1:9092</Property>
</Kafka>
My Failover appender is:
<FailOver name="fail-over" primary="kafka" ignoreExceptions="false">
<Failovers>
<AppenderRef ref="randomFile" />
</Failovers>
</FailOver>
My root level is:
<Root level="info" includeLocation="false">
<AppenderRef ref="fail-over" />
</Root>
But for some reason I keep getting a connection refused error. The exact output is:
16:32:05.669 [kafka-producer-network-thread | producer-1] DEBUG org.apache.kafka.common.network.Selector - [Producer clientId=producer-1] Connection with /127.0.0.1 disconnected
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.kafka.common.network.PlaintextTransportLayer.finishConnect(PlaintextTransportLayer.java:50)
at org.apache.kafka.common.network.KafkaChannel.finishConnect(KafkaChannel.java:224)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:526)
at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:561)
at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:327)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:242)
at java.lang.Thread.run(Thread.java:748)
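
A minimal sketch of one thing that may be worth checking, not a confirmed diagnosis. It assumes the trace above is the Kafka client's own logging, and that the producer is still retrying in the background at the point where the Failover appender would be expected to take over. The fragment reuses the names from the question (kafka, fail-over, randomFile, myTopic); the `max.block.ms` value of 2000 and the `warn` level are illustrative assumptions. `max.block.ms` is a standard Kafka producer setting (default 60000 ms) that bounds how long a send may block, and `retryIntervalSeconds` is the Failover appender's delay before the primary is retried (default 60). The separate `org.apache.kafka` logger follows the Log4j2 manual's advice to keep the Kafka client's logging out of the Kafka appender, which would otherwise log recursively.

```xml
<Configuration>
  <Appenders>
    <!-- The randomFile appender from the original log4j2.xml goes here, unchanged. -->
    <Kafka name="kafka" topic="myTopic" ignoreExceptions="false">
      <JsonLayout />
      <Property name="bootstrap.servers">127.0.0.1:9092</Property>
      <!-- Assumption: fail fast when no broker is reachable (producer default is 60000 ms). -->
      <Property name="max.block.ms">2000</Property>
    </Kafka>
    <!-- retryIntervalSeconds: how long to wait before trying the primary again. -->
    <Failover name="fail-over" primary="kafka" retryIntervalSeconds="60">
      <Failovers>
        <AppenderRef ref="randomFile" />
      </Failovers>
    </Failover>
  </Appenders>
  <Loggers>
    <!-- Route the Kafka client's own log events to the file only, to avoid recursive logging. -->
    <Logger name="org.apache.kafka" level="warn" additivity="false">
      <AppenderRef ref="randomFile" />
    </Logger>
    <Root level="info" includeLocation="false">
      <AppenderRef ref="fail-over" />
    </Root>
  </Loggers>
</Configuration>
```

If events reach randomFile within a few seconds of the broker going away with this in place, the original behaviour was probably the producer blocking on its default timeout and not a fault in the Failover appender itself.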

Artemis broker is not announcing itself as backup on RHEL machine

I have 5 Red Hat Enterprise Linux (RHEL) 7.7 systems, each with an ActiveMQ Artemis broker installed. I have configured 1 as the master and the rest as slaves. The relevant broker.xml snippet for the master broker is as follows:
<connectors>
<connector name="mainBrokerConnector">tcp://10.1.144.200:61616</connector>
</connectors>
<discovery-groups>
<discovery-group name="my-discovery-group">
<local-bind-address>10.1.144.200</local-bind-address>
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<cluster-connections>
<cluster-connection name="my-cluster">
<connector-ref>mainBrokerConnector</connector-ref>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>STRICT</message-load-balancing>
<max-hops>1</max-hops>
<discovery-group-ref discovery-group-name="my-discovery-group"/>
</cluster-connection>
</cluster-connections>
<broadcast-groups>
<broadcast-group name="my-broadcast-group">
<local-bind-address>10.1.144.200</local-bind-address>
<local-bind-port>5432</local-bind-port>
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<broadcast-period>2000</broadcast-period>
<connector-ref>mainBrokerConnector</connector-ref>
</broadcast-group>
</broadcast-groups>
<ha-policy>
<replication>
<master>
<check-for-live-server>true</check-for-live-server>
</master>
</replication>
</ha-policy>
When started (./artemis run), three of the four backup brokers announce themselves as backups. In the logs of those backup brokers I see something like this (which is how I know they are in sync):
2020-10-07 14:19:43,480 INFO [org.apache.activemq.artemis.core.server] AMQ221024: Backup server ActiveMQServerImpl::serverUUID=e2fbfd28-04ec-11eb-b49a-005056aded6a is synchronized with live-server.
2020-10-07 14:19:43,768 INFO [org.apache.activemq.artemis.core.server] AMQ221031: backup announced
However, the fourth broker starts but does not log anything about syncing. The broker.xml snippet of the fourth broker is as follows:
<connectors>
<connector name="brokerTwoConnector">tcp://10.1.144.202:61616</connector>
</connectors>
<discovery-groups>
<discovery-group name="my-discovery-group">
<local-bind-address>10.1.144.202</local-bind-address>
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<cluster-connections>
<cluster-connection name="my-cluster">
<connector-ref>brokerTwoConnector</connector-ref>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>STRICT</message-load-balancing>
<max-hops>1</max-hops>
<discovery-group-ref discovery-group-name="my-discovery-group"/>
</cluster-connection>
</cluster-connections>
<broadcast-groups>
<broadcast-group name="my-broadcast-group">
<local-bind-address>10.1.144.202</local-bind-address>
<local-bind-port>5432</local-bind-port>
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<broadcast-period>2000</broadcast-period>
<connector-ref>brokerTwoConnector</connector-ref>
</broadcast-group>
</broadcast-groups>
<ha-policy>
<replication>
<slave>
<allow-failback>true</allow-failback>
</slave>
</replication>
</ha-policy>
The rest of the slave brokers have the same configuration apart from their host IP addresses and connector names. So what is misconfigured that prevents this broker from syncing with the master broker?
EDIT
When I start the culprit broker, the following logs appear, which show that it is part of a cluster:
2020-10-07 16:10:15,525 INFO [org.apache.activemq.artemis.integration.bootstrap] AMQ101000: Starting ActiveMQ Artemis Server
2020-10-07 16:10:15,572 INFO [org.apache.activemq.artemis.core.server] AMQ221000: backup Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=data/journal,bindingsDirectory=data/bindings,largeMessagesDirectory=data/large-messages,pagingDirectory=data/paging)
2020-10-07 16:10:15,598 INFO [org.apache.activemq.artemis.core.server] AMQ222162: Moving data directory /opt/brokerTwo/brokerTwo/data/journal to /opt/brokerTwo/brokerTwo/data/journal/oldreplica.2
2020-10-07 16:10:15,631 INFO [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
2020-10-07 16:10:15,740 WARN [org.apache.activemq.artemis.core.server] AMQ222007: Security risk! Apache ActiveMQ Artemis is running with the default cluster admin user and default password. Please see the cluster chapter in the ActiveMQ Artemis User Guide for instructions on how to change this.
2020-10-07 16:10:15,749 INFO [org.apache.activemq.artemis.core.server] AMQ221057: Global Max Size is being adjusted to 1/2 of the JVM max size (-Xmx). being defined as 1,073,741,824
2020-10-07 16:10:15,964 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
2020-10-07 16:10:15,966 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
2020-10-07 16:10:15,967 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
2020-10-07 16:10:15,968 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
2020-10-07 16:10:15,968 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
2020-10-07 16:10:15,969 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
2020-10-07 16:10:16,203 INFO [org.apache.activemq.hawtio.branding.PluginContextListener] Initialized activemq-branding plugin
2020-10-07 16:10:16,279 INFO [org.apache.activemq.hawtio.plugin.PluginContextListener] Initialized artemis-plugin plugin
2020-10-07 16:10:17,173 INFO [io.hawt.HawtioContextListener] Initialising hawtio services
2020-10-07 16:10:17,201 INFO [io.hawt.system.ConfigManager] Configuration will be discovered via system properties
2020-10-07 16:10:17,205 INFO [io.hawt.jmx.JmxTreeWatcher] Welcome to hawtio 1.5.5 : http://hawt.io/ : Don't cha wish your console was hawt like me? ;-)
2020-10-07 16:10:17,210 INFO [io.hawt.jmx.UploadManager] Using file upload directory: /opt/brokerTwo/brokerTwo/tmp/uploads
2020-10-07 16:10:17,239 INFO [io.hawt.web.AuthenticationFilter] Starting hawtio authentication filter, JAAS realm: "activemq" authorized role(s): "amq" role principal classes: "org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal"
2020-10-07 16:10:17,293 INFO [io.hawt.web.JolokiaConfiguredAgentServlet] Jolokia overridden property: [key=policyLocation, value=file:/opt/brokerTwo/brokerTwo/etc/jolokia-access.xml]
2020-10-07 16:10:17,355 INFO [io.hawt.web.RBACMBeanInvoker] Using MBean [hawtio:type=security,area=jmx,rank=0,name=HawtioDummyJMXSecurity] for role based access control

Artemis Master-Slave failover and sync error message in logs

After struggling with Artemis 2.11 and an older Java version I decided to update my whole system to the "latest greatest software" that is currently available. So I am using Artemis 2.14 and Java 14.0.2 on two Ubuntu 18.04 VMs with 4 cores and 16 GB RAM.
I configured the master-slave ha-policy like this:
MASTER:
<ha-policy>
<replication>
<master>
<check-for-live-server>true</check-for-live-server>
</master>
</replication>
</ha-policy>
SLAVE
<ha-policy>
<replication>
<slave>
<allow-failback>true</allow-failback>
</slave>
</replication>
</ha-policy>
And I am using the cluster-connection like this:
MASTER
<cluster-connections>
<cluster-connection name="artemis-cluster">
<connector-ref>Artemis-Node-A-Sync</connector-ref>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<static-connectors>
<connector-ref>Node-B-Sync</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
SLAVE
<cluster-connections>
<cluster-connection name="artemis-cluster">
<connector-ref>Node-B-Sync</connector-ref>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<static-connectors>
<connector-ref>Node-A-Sync</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
My problem is that I get this ERROR message from the SLAVE...
2020-08-07 12:45:37,548 INFO [io.hawt.system.ConfigManager] Configuration will be discovered via system properties
2020-08-07 12:45:37,550 INFO [io.hawt.jmx.JmxTreeWatcher] Welcome to hawtio 1.5.12 : http://hawt.io/ : Don't cha wish your console was hawt like me? ;-)
2020-08-07 12:45:37,552 INFO [io.hawt.jmx.UploadManager] Using file upload directory: /opt/mybroker-broker/tmp/uploads
2020-08-07 12:45:37,565 INFO [io.hawt.web.AuthenticationFilter] Starting hawtio authentication filter, JAAS realm: "activemq" authorized role(s): "amq" role principal classes: "org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal"
2020-08-07 12:45:37,585 INFO [io.hawt.web.JolokiaConfiguredAgentServlet] Jolokia overridden property: [key=policyLocation, value=file:/opt/mybroker-broker/etc/jolokia-access.xml]
2020-08-07 12:45:37,605 INFO [io.hawt.web.RBACMBeanInvoker] Using MBean [hawtio:type=security,area=jmx,rank=0,name=HawtioDummyJMXSecurity] for role based access control
2020-08-07 12:45:37,703 INFO [io.hawt.system.ProxyWhitelist] Initial proxy whitelist: [localhost, 127.0.0.1, *.*.*.*, *.*.*.*, localhost.localdomain]
2020-08-07 12:45:37,966 INFO [org.apache.activemq.artemis] AMQ241001: HTTP Server started at http://0.0.0.0:8161
2020-08-07 12:45:37,967 INFO [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at http://0.0.0.0:8161/console/jolokia
2020-08-07 12:45:37,967 INFO [org.apache.activemq.artemis] AMQ241004: Artemis Console available at http://0.0.0.0:8161/console
2020-08-07 12:45:46,905 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to 192.168.144.75/192.168.144.75:22522 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
2020-08-07 12:45:50,678 INFO [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: LiveFailoverQuorumVote
2020-08-07 12:45:50,678 INFO [org.apache.activemq.artemis.core.server] AMQ221084: Requested 0 quorum votes
2020-08-07 12:45:50,679 INFO [org.apache.activemq.artemis.core.server] AMQ221083: ignoring quorum vote as max cluster size is 1.
2020-08-07 12:45:50,679 INFO [org.apache.activemq.artemis.core.server] AMQ221071: Failing over based on quorum vote results.
2020-08-07 12:45:50,720 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ229026: Backup Server was not yet in sync with live]
at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:310) [artemis-server-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:3946) [artemis-server-2.14.0.jar:2.14.0]
The logfile at master is:
2020-08-07 12:44:58,292 INFO [org.apache.activemq.artemis.core.server] AMQ221014: 21% loaded
2020-08-07 12:44:58,540 INFO [org.apache.activemq.artemis.core.server] AMQ221014: 42% loaded
2020-08-07 12:44:59,020 INFO [org.apache.activemq.artemis.core.server] AMQ221014: 64% loaded
2020-08-07 12:44:59,416 INFO [org.apache.activemq.artemis.core.server] AMQ221014: 85% loaded
2020-08-07 12:45:12,143 INFO [org.apache.activemq.artemis.core.server] AMQ221080: Deploying address DLQ supporting [ANYCAST]
2020-08-07 12:45:12,145 INFO [org.apache.activemq.artemis.core.server] AMQ221003: Deploying ANYCAST queue DLQ on address DLQ
2020-08-07 12:45:12,151 INFO [org.apache.activemq.artemis.core.server] AMQ221080: Deploying address ExpiryQueue supporting [ANYCAST]
2020-08-07 12:45:12,152 INFO [org.apache.activemq.artemis.core.server] AMQ221003: Deploying ANYCAST queue ExpiryQueue on address ExpiryQueue
2020-08-07 12:45:12,382 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:1883 for protocols [MQTT]
2020-08-07 12:45:12,385 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:22522 for protocols [CORE,MQTT,AMQP,STOMP,HORNETQ,OPENWIRE]
2020-08-07 12:45:12,387 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:8883 for protocols [MQTT]
2020-08-07 12:45:12,388 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:18884 for protocols [MQTT]
2020-08-07 12:45:12,388 INFO [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
2020-08-07 12:45:12,389 INFO [org.apache.activemq.artemis.core.server] AMQ221001: Apache ActiveMQ Artemis Message Broker version 2.14.0 [0.0.0.0, nodeID=95f808d9-d641-11ea-9c48-005056073c33]
2020-08-07 12:45:12,650 INFO [org.apache.activemq.hawtio.branding.PluginContextListener] Initialized activemq-branding plugin
2020-08-07 12:45:12,686 INFO [org.apache.activemq.hawtio.plugin.PluginContextListener] Initialized artemis-plugin plugin
2020-08-07 12:45:13,281 INFO [io.hawt.HawtioContextListener] Initialising hawtio services
2020-08-07 12:45:13,304 INFO [io.hawt.system.ConfigManager] Configuration will be discovered via system properties
2020-08-07 12:45:13,306 INFO [io.hawt.jmx.JmxTreeWatcher] Welcome to hawtio 1.5.12 : http://hawt.io/ : Don't cha wish your console was hawt like me? ;-)
2020-08-07 12:45:13,311 INFO [io.hawt.jmx.UploadManager] Using file upload directory: /opt/mybroker-broker/tmp/uploads
2020-08-07 12:45:13,325 INFO [io.hawt.web.AuthenticationFilter] Starting hawtio authentication filter, JAAS realm: "activemq" authorized role(s): "amq" role principal classes: "org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal"
2020-08-07 12:45:13,344 INFO [io.hawt.web.JolokiaConfiguredAgentServlet] Jolokia overridden property: [key=policyLocation, value=file:/opt/mybroker-broker/etc/jolokia-access.xml]
2020-08-07 12:45:13,373 INFO [io.hawt.web.RBACMBeanInvoker] Using MBean [hawtio:type=security,area=jmx,rank=0,name=HawtioDummyJMXSecurity] for role based access control
2020-08-07 12:45:13,472 INFO [io.hawt.system.ProxyWhitelist] Initial proxy whitelist: [localhost, 127.0.0.1, *.*.*.*, 192.168.144.75, localhost.localdomain]
2020-08-07 12:45:13,680 INFO [org.apache.activemq.artemis] AMQ241001: HTTP Server started at http://0.0.0.0:8161
2020-08-07 12:45:13,681 INFO [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at http://0.0.0.0:8161/console/jolokia
2020-08-07 12:45:13,681 INFO [org.apache.activemq.artemis] AMQ241004: Artemis Console available at http://0.0.0.0:8161/console
2020-08-07 12:45:46,608 WARN [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=NIOSequentialFile /opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page, message=/opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files): ActiveMQIOErrorException[errorType=IO_ERROR message=/opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files)]
at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.open(NIOSequentialFile.java:156) [artemis-journal-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.open(NIOSequentialFile.java:98) [artemis-journal-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.paging.impl.Page.open(Page.java:483) [artemis-server-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.openNewPage(PagingStoreImpl.java:1136) [artemis-server-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.forceAnotherPage(PagingStoreImpl.java:606) [artemis-server-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.getPageInformationForSync(JournalStorageManager.java:738) [artemis-server-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:665) [artemis-server-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:179) [artemis-server-2.14.0.jar:2.14.0]
at java.base/java.lang.Thread.run(Thread.java:832) [java.base:]
Caused by: java.io.FileNotFoundException: /opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files)
at java.base/java.io.RandomAccessFile.open0(Native Method) [java.base:]
at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:347) [java.base:]
at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:261) [java.base:]
at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:216) [java.base:]
at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.open(NIOSequentialFile.java:143) [artemis-journal-2.14.0.jar:2.14.0]
... 8 more
2020-08-07 12:45:46,615 WARN [org.apache.activemq.artemis.core.server] AMQ222251: Unable to start replication: java.io.FileNotFoundException: /opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files)
at java.base/java.io.RandomAccessFile.open0(Native Method) [java.base:]
at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:347) [java.base:]
at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:261) [java.base:]
at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:216) [java.base:]
at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.open(NIOSequentialFile.java:143) [artemis-journal-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.open(NIOSequentialFile.java:98) [artemis-journal-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.paging.impl.Page.open(Page.java:483) [artemis-server-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.openNewPage(PagingStoreImpl.java:1136) [artemis-server-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.forceAnotherPage(PagingStoreImpl.java:606) [artemis-server-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.getPageInformationForSync(JournalStorageManager.java:738) [artemis-server-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:665) [artemis-server-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:179) [artemis-server-2.14.0.jar:2.14.0]
at java.base/java.lang.Thread.run(Thread.java:832) [java.base:]
2020-08-07 12:45:46,693 WARN [org.apache.activemq.artemis.utils.actors.OrderedExecutor] Server locator is closed (maybe it was garbage collected): java.lang.IllegalStateException: Server locator is closed (maybe it was garbage collected)
at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.assertOpen(ServerLocatorImpl.java:1844) [artemis-core-client-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:644) [artemis-core-client-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:545) [artemis-core-client-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:524) [artemis-core-client-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.core.server.cluster.ClusterController$ConnectRunnable.run(ClusterController.java:433) [artemis-server-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.14.0.jar:2.14.0]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) [artemis-commons-2.14.0.jar:2.14.0]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [java.base:]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [java.base:]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.14.0.jar:2.14.0]
2020-08-07 12:45:47,407 INFO [io.hawt.HawtioContextListener] Destroying hawtio services
2020-08-07 12:45:47,409 INFO [io.hawt.web.AuthenticationFilter] Destroying hawtio authentication filter
2020-08-07 12:45:47,442 INFO [org.apache.activemq.hawtio.plugin.PluginContextListener] Destroyed artemis-plugin plugin
2020-08-07 12:45:47,444 INFO [org.apache.activemq.hawtio.branding.PluginContextListener] Destroyed activemq-branding plugin
2020-08-07 12:45:47,476 INFO [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.14.0 [95f808d9-d641-11ea-9c48-005056073c33] stopped, uptime 1 minute
tail: /opt/mybroker-broker/log/artemis.log: file truncated
2020-08-07 12:46:07,212 INFO [org.apache.activemq.artemis.integration.bootstrap] AMQ101000: Starting ActiveMQ Artemis Server
2020-08-07 12:46:07,242 INFO [org.apache.activemq.artemis.core.server] AMQ221000: live Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=data/journal,bindingsDirectory=data/bindings,largeMessagesDirectory=data/large-messages,pagingDirectory=data/paging)
2020-08-07 12:46:09,516 INFO [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
2020-08-07 12:46:09,542 INFO [org.apache.activemq.artemis.core.server] AMQ221057: Global Max Size is being adjusted to 1/2 of the JVM max size (-Xmx). being defined as 5,368,709,120
2020-08-07 12:46:09,589 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
2020-08-07 12:46:09,590 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
2020-08-07 12:46:09,591 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
2020-08-07 12:46:09,592 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
2020-08-07 12:46:09,592 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
2020-08-07 12:46:09,593 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
2020-08-07 12:46:09,776 WARN [org.apache.activemq.artemis.core.server] AMQ222035: Directory /opt/mybroker-broker/data/paging/9fd77264-d6db-11ea-bb8e-005056073c33 did not have an identification file address.txt
2020-08-07 12:46:13,695 WARN [org.apache.activemq.artemis.core.server] AMQ222035: Directory /opt/mybroker-broker/data/paging/9fd77264-d6db-11ea-bb8e-005056073c33 did not have an identification file address.txt
...when I do the following:
Stop the SLAVE
Restart the MASTER
Start the SLAVE
Using Jolokia I see that the SLAVE is connected to the MASTER after some time.
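For reference, a check like this can be scripted against the Jolokia REST endpoint that appears in the logs (/console/jolokia on port 8161). The Python sketch below only builds the read URL. It assumes the broker's server MBean is named org.apache.activemq.artemis:broker="0.0.0.0" (0.0.0.0 is the broker name printed in the AMQ221001 line) and that it exposes the attributes ReplicaSync and Backup. The helper name and the localhost base URL are made up for illustration, so verify the MBean and attribute names in the console's JMX tree before relying on them.

```python
from urllib.parse import quote


def jolokia_read_url(base_url: str, broker_name: str, attribute: str) -> str:
    """Build a Jolokia 'read' URL for one attribute of the broker's server MBean."""
    # Assumed MBean name pattern; check it in the console's JMX tree.
    mbean = f'org.apache.activemq.artemis:broker="{broker_name}"'
    # Keep ':' and '=' readable, percent-encode the double quotes.
    return f"{base_url.rstrip('/')}/read/{quote(mbean, safe=':=')}/{attribute}"


if __name__ == "__main__":
    base = "http://localhost:8161/console/jolokia"  # hypothetical host
    for attr in ("ReplicaSync", "Backup"):
        print(jolokia_read_url(base, "0.0.0.0", attr))
```

The printed URLs can then be fetched with any HTTP client using the console credentials.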
Questions:
What does the ERROR message mean?
Is there a CLI command to get some information about synchronization?
Is there a CLI command to get some system/cluster status information?
The fundamental problem is noted in the log from the master broker:
2020-08-07 12:45:46,615 WARN [org.apache.activemq.artemis.core.server] AMQ222251: Unable to start replication: java.io.FileNotFoundException: /opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files)
This is an environmental error indicating that the operating system will not allow the process (i.e. the broker) to open any more files.
Since the broker can't open all the files it needs, it can't properly synchronize its data with the backup. Furthermore, the inability to open additional files is treated as a "critical" I/O error (AMQ222010 in the master log), which forces the broker to shut itself down. When the master broker shuts down, the slave activates, but since the master was not able to complete the initial replication with the slave, the slave states:
2020-08-07 12:45:50,720 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ229026: Backup Server was not yet in sync with live]
The slave is simply logging the fact that it didn't complete the initial synchronization/replication process with the master before the master was shut down.
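To pick entries like these out of a long artemis.log, the WARN and ERROR lines can be filtered by their AMQ code. This is a minimal Python sketch, assuming the log layout seen in this thread (timestamp, level, logger in brackets, optional AMQ code). The regular expression and the function name are illustrative and not part of Artemis; the sample lines are copied from the logs above.

```python
import re

# Layout assumed from the log lines in this thread.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\] "
    r"(?:(?P<code>AMQ\d{6}): )?(?P<msg>.*)$"
)


def problems(lines):
    """Yield (timestamp, level, code, message) for WARN and ERROR entries."""
    for line in lines:
        m = LOG_LINE.match(line)
        # Stack-trace lines do not match the pattern and are skipped.
        if m and m.group("level") in ("WARN", "ERROR"):
            yield m.group("ts"), m.group("level"), m.group("code"), m.group("msg")


sample = [
    "2020-08-07 12:45:12,388 INFO [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live",
    "2020-08-07 12:45:46,615 WARN [org.apache.activemq.artemis.core.server] AMQ222251: Unable to start replication: java.io.FileNotFoundException: /opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files)",
    "at java.base/java.io.RandomAccessFile.open0(Native Method) [java.base:]",
    "2020-08-07 12:45:50,720 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ229026: Backup Server was not yet in sync with live]",
]

if __name__ == "__main__":
    for ts, level, code, msg in problems(sample):
        print(ts, level, code, msg[:60])
```

Reading the master and slave logs side by side this way makes the order of events easier to follow: the master's replication failure comes first, the slave's "not yet in sync" error a few seconds later.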
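Ultimately you need to increase the open-file limit for the user running the broker. On Linux this is the limit reported by ulimit -n; how to raise it permanently depends on how the broker is started, so a quick web search should help you with the specifics of that task for your particular operating system.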
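Before and after changing the limit it helps to see what the process is actually allowed. The sketch below is a small Python check, assuming a Linux host with /proc mounted. It reports the limits of the process that runs it, so it should be started as the same user (and ideally the same way) as the broker. The function names are illustrative.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open file descriptors via /proc (Linux only); None if unavailable."""
    fd_dir = f"/proc/{pid}/fd"
    if not os.path.isdir(fd_dir):
        return None
    return len(os.listdir(fd_dir))


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"open-file limit: soft={soft} hard={hard}")
    print(f"descriptors in use by this process: {open_fd_count()}")
```

Passing the broker's process id to open_fd_count (with sufficient permissions) shows how close the running broker is to the soft limit.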

HornetQ (Live & Backup Server setup)

Thanks in advance. Please help me solve this error:
HornetQ Ver 2.2.14 Final
The Live Server starts without any problem.
I am able to switch from the Live Server to the Backup Server and back
by taking the live server down and bringing it up properly, even though the error below keeps appearing.
The Backup Server starts normally, but after 5 secs...
* [Thread-0 (HornetQ-server-HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12-19420919)] 27-Mar 15:8:6,218 WARNING [ClusterConnectionImpl] Unable to announce backup, retrying
HornetQException[errorCode=3 message=Timed out waiting to receive initial broadcast from cluster]
at org.hornetq.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:716)
at org.hornetq.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:593)
at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$2.run(ClusterConnectionImpl.java:485)
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
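Since the warning says the backup timed out waiting for the initial broadcast, one thing worth checking is whether UDP multicast traffic for the configured group reaches the host at all. The Python sketch below is an assumption-laden diagnostic and not part of HornetQ: it joins the group 231.7.7.7 on port 9876 (the values from the discovery-group in this post) and waits for a single datagram. The function names and the 5 second timeout are made up for illustration.

```python
import socket

GROUP = "231.7.7.7"  # group-address from the posted discovery-group
PORT = 9876          # group-port from the posted discovery-group


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack the ip_mreq structure: multicast group followed by local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def wait_for_broadcast(group: str = GROUP, port: int = PORT, timeout: float = 5.0):
    """Return (sender, size) of the first datagram seen, or None on timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)
        try:
            data, sender = sock.recvfrom(65535)
        except socket.timeout:
            return None
        return sender, len(data)
    finally:
        sock.close()


if __name__ == "__main__":
    print(wait_for_broadcast() or "no broadcast received within timeout")
```

With the live server running and a broadcast-period of 1000 ms, a datagram would be expected within the timeout. If none arrives, the network or firewall setup for multicast is a likely place to look.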
**Live Server Conf Files**
---------------------
1.*hornetq-configuration.xml*
-----------------------------
<configuration xmlns="urn:hornetq"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:hornetq /schema/hornetq-configuration.xsd">
<clustered>true</clustered>
<backup>false</backup>
<shared-store>true</shared-store>
<allow-failback>false</allow-failback>
<failover-on-shutdown>true</failover-on-shutdown>
<bindings-directory>d:/temp/data/bindings</bindings-directory>
<large-messages-directory>d:/temp/data/largemessages</large-messages-directory>
<paging-directory>d:/temp/data/paging</paging-directory>
<journal-directory>d:/temp/data/journal</journal-directory>
<journal-min-files>10</journal-min-files>
<connectors>
<connector name="netty">
<factory-class>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</factory-class>
<param key="host" value="192.100.101.42"/>
<param key="port" value="5445"/>
</connector>
</connectors>
<acceptors>
<acceptor name="netty">
<factory-class>org.hornetq.core.remoting.impl.netty.NettyAcceptorFactory</factory-class>
<param key="host" value="192.100.101.42"/>
<param key="port" value="5445"/>
</acceptor>
</acceptors>
<broadcast-groups>
<broadcast-group name="vsbg-group1">
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<broadcast-period>1000</broadcast-period>
<connector-ref>netty</connector-ref>
</broadcast-group>
</broadcast-groups>
<discovery-groups>
<discovery-group name="vsdg-group1">
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<cluster-connections>
<cluster-connection name="vs-cluster">
<address>jms</address>
<connector-ref>netty</connector-ref>
<discovery-group-ref discovery-group-name="vsdg-group1"/>
</cluster-connection>
</cluster-connections>
<security-settings>
<security-setting match="#">
<permission type="createNonDurableQueue" roles="guest"/>
<permission type="deleteNonDurableQueue" roles="guest"/>
<permission type="consume" roles="guest"/>
<permission type="send" roles="guest"/>
</security-setting>
</security-settings>
<address-settings>
<!--default for catch all-->
<address-setting match="#">
<dead-letter-address>jms.queue.DLQ</dead-letter-address>
<expiry-address>jms.queue.ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<max-size-bytes>10485760</max-size-bytes>
<address-full-policy>BLOCK</address-full-policy>
</address-setting>
</address-settings>
</configuration>
2.*hornetq-jms.xml*
-----------------
<configuration xmlns="urn:hornetq"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:hornetq /schema/hornetq-jms.xsd">
<connection-factory name="ConnectionFactory">
<xa>false</xa>
<connectors>
<connector-ref connector-name="netty"/>
</connectors>
<entries>
<entry name="ConnectionFactory"/>
</entries>
<ha>true</ha>
<retry-interval>100</retry-interval>
<retry-interval-multiplier>1.0</retry-interval-multiplier>
</connection-factory>
<queue name="LOAD_TEST">
<entry name="/queue/LOAD_TEST"/>
</queue>
<queue name="DLQ">
<entry name="/queue/DLQ"/>
</queue>
<queue name="ExpiryQueue">
<entry name="/queue/ExpiryQueue"/>
</queue>
</configuration>
3.*Hornetq.log*
--------------
* [main] 27-Mar 15:7:52,453 INFO [HornetQBootstrapServer] Starting HornetQ Server
* [main] 27-Mar 15:7:53,265 WARNING [FileConfigurationParser] AIO wasn't located on this platform, it will fall back to using pure Java NIO. If your platform is Linux, install LibAIO to enable the AIO journal
* [main] 27-Mar 15:7:53,312 INFO [HornetQServerImpl] live server is starting with configuration HornetQ Configuration (clustered=true,backup=false,sharedStore=true,journalDirectory=d:/temp/data/journal,bindingsDirectory=d:/temp/data/bindings,largeMessagesDirectory=d:/temp/data/largemessages,pagingDirectory=d:/temp/data/paging)
* [main] 27-Mar 15:7:53,312 INFO [HornetQServerImpl] Waiting to obtain live lock
* [main] 27-Mar 15:7:53,343 INFO [JournalStorageManager] Using NIO Journal
* [main] 27-Mar 15:7:53,359 WARNING [HornetQServerImpl] Security risk! It has been detected that the cluster admin user and password have not been changed from the installation default. Please see the HornetQ user guide, cluster chapter, for instructions on how to do this.
* [main] 27-Mar 15:7:53,484 INFO [FileLockNodeManager] Waiting to obtain live lock
* [main] 27-Mar 15:7:53,484 INFO [FileLockNodeManager] Live Server Obtained live lock
* [main] 27-Mar 15:7:55,0 INFO [HornetQServerImpl] trying to deploy queue jms.queue.LOAD_TEST
* [main] 27-Mar 15:7:55,0 INFO [HornetQServerImpl] trying to deploy queue jms.queue.DLQ
* [main] 27-Mar 15:7:55,15 INFO [HornetQServerImpl] trying to deploy queue jms.queue.ExpiryQueue
* [main] 27-Mar 15:7:55,109 INFO [NettyAcceptor] Started Netty Acceptor version 3.2.5.Final-a96d88c 192.100.101.42:5445 for CORE protocol
* [main] 27-Mar 15:7:55,109 INFO [HornetQServerImpl] Server is now live
* [main] 27-Mar 15:7:55,109 INFO [HornetQServerImpl] HornetQ Server version 2.2.14.Final (HQ_2_2_14_FINAL, 122) [80d285b6-96ba-11e2-9528-817a37231a12]) started
**Backup Server Config Files**
-------------------------
1.*hornetq-configuration.xml*
---------------------------
<configuration xmlns="urn:hornetq"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:hornetq /schema/hornetq-configuration.xsd">
<clustered>true</clustered>
<backup>true</backup>
<shared-store>true</shared-store>
<allow-failback>true</allow-failback>
<failover-on-shutdown>false</failover-on-shutdown>
<bindings-directory>d:/temp/data/bindings</bindings-directory>
<large-messages-directory>d:/temp/data/largemessages</large-messages-directory>
<paging-directory>d:/temp/data/paging</paging-directory>
<journal-directory>d:/temp/data/journal</journal-directory>
<journal-min-files>10</journal-min-files>
<connectors>
<connector name="netty">
<factory-class>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</factory-class>
<param key="host" value="192.100.101.42"/>
<param key="port" value="5446"/>
</connector>
</connectors>
<acceptors>
<acceptor name="netty">
<factory-class>org.hornetq.core.remoting.impl.netty.NettyAcceptorFactory</factory-class>
<param key="host" value="192.100.101.42"/>
<param key="port" value="5446"/>
</acceptor>
</acceptors>
<broadcast-groups>
<broadcast-group name="vsbg-group1">
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<broadcast-period>1000</broadcast-period>
<connector-ref>netty</connector-ref>
</broadcast-group>
</broadcast-groups>
<discovery-groups>
<discovery-group name="vsdg-group1">
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<cluster-connections>
<cluster-connection name="vs-cluster">
<address>jms</address>
<connector-ref>netty</connector-ref>
<discovery-group-ref discovery-group-name="vsdg-group1"/>
</cluster-connection>
</cluster-connections>
<security-settings>
<security-setting match="#">
<permission type="createNonDurableQueue" roles="guest"/>
<permission type="deleteNonDurableQueue" roles="guest"/>
<permission type="consume" roles="guest"/>
<permission type="send" roles="guest"/>
</security-setting>
</security-settings>
<address-settings>
<!--default for catch all-->
<address-setting match="#">
<dead-letter-address>jms.queue.DLQ</dead-letter-address>
<expiry-address>jms.queue.ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<max-size-bytes>10485760</max-size-bytes>
<address-full-policy>BLOCK</address-full-policy>
</address-setting>
</address-settings>
</configuration>
2. *hornetq-jms.xml (same as live server)*
---------------------------------------
3. *hornetq.log*
---------------
* [main] 27-Mar 15:7:55,203 INFO [HornetQBootstrapServer] Starting HornetQ Server
* [main] 27-Mar 15:7:56,15 WARNING [FileConfigurationParser] AIO wasn't located on this platform, it will fall back to using pure Java NIO. If your platform is Linux, install LibAIO to enable the AIO journal
* [main] 27-Mar 15:7:56,62 INFO [HornetQServerImpl] backup server is starting with configuration HornetQ Configuration (clustered=true,backup=true,sharedStore=true,journalDirectory=d:/temp/data/journal,bindingsDirectory=d:/temp/data/bindings,largeMessagesDirectory=d:/temp/data/largemessages,pagingDirectory=d:/temp/data/paging)
* [Activation for server HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12] 27-Mar 15:7:56,62 INFO [FileLockNodeManager] Waiting to become backup node
* [Activation for server HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12] 27-Mar 15:7:56,62 INFO [FileLockNodeManager] ** got backup lock
* [Activation for server HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12] 27-Mar 15:7:56,93 INFO [JournalStorageManager] Using NIO Journal
* [Activation for server HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12] 27-Mar 15:7:56,109 WARNING [HornetQServerImpl] Security risk! It has been detected that the cluster admin user and password have not been changed from the installation default. Please see the HornetQ user guide, cluster chapter, for instructions on how to do this.
* [Activation for server HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12] 27-Mar 15:7:56,218 INFO [HornetQServerImpl] HornetQ Backup Server version 2.2.14.Final (HQ_2_2_14_FINAL, 122) [80d285b6-96ba-11e2-9528-817a37231a12] started, waiting live to fail before it gets active
* [Thread-0 (HornetQ-server-HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12-19420919)] 27-Mar 15:8:6,218 WARNING [ClusterConnectionImpl] Unable to announce backup, retrying
HornetQException[errorCode=3 message=Timed out waiting to receive initial broadcast from cluster]
at org.hornetq.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:716)
at org.hornetq.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:593)
at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$2.run(ClusterConnectionImpl.java:485)
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
* [Thread-1 (HornetQ-server-HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12-19420919)] 27-Mar 15:8:16,734 WARNING [ClusterConnectionImpl] Unable to announce backup, retrying
HornetQException[errorCode=3 message=Timed out waiting to receive initial broadcast from cluster]
at org.hornetq.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:716)
at org.hornetq.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:593)
at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$2.run(ClusterConnectionImpl.java:485)
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
* [Thread-2 (HornetQ-server-HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12-19420919)] 27-Mar 15:8:27,250 WARNING [ClusterConnectionImpl] Unable to announce backup, retrying
HornetQException[errorCode=3 message=Timed out waiting to receive initial broadcast from cluster]
at org.hornetq.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:716)
at org.hornetq.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:593)
at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$2.run(ClusterConnectionImpl.java:485)
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
* [Activation for server HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12] 27-Mar 15:8:29,187 INFO [HornetQServerImpl] trying to deploy queue jms.queue.LOAD_TEST
* [Activation for server HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12] 27-Mar 15:8:29,187 INFO [HornetQServerImpl] trying to deploy queue jms.queue.DLQ
* [Activation for server HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12] 27-Mar 15:8:29,203 INFO [HornetQServerImpl] trying to deploy queue jms.queue.ExpiryQueue
* [Activation for server HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12] 27-Mar 15:8:29,265 INFO [NettyAcceptor] Started Netty Acceptor version 3.2.5.Final-a96d88c 192.100.101.42:5446 for CORE protocol
* [Thread-3 (HornetQ-server-HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12-19420919)] 27-Mar 15:8:29,281 WARNING [ClusterConnectionImpl] Unable to announce backup, retrying
HornetQException[errorCode=3 message=Timed out waiting to receive initial broadcast from cluster]
at org.hornetq.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:716)
at org.hornetq.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:593)
at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$2.run(ClusterConnectionImpl.java:485)
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
* [Activation for server HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12] 27-Mar 15:8:29,296 INFO [HornetQServerImpl] Backup Server is now live
* [Thread-8 (HornetQ-server-HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12-19420919)] 27-Mar 15:8:29,812 WARNING [ClusterConnectionImpl] Unable to announce backup, retrying
java.lang.NullPointerException
at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$2.run(ClusterConnectionImpl.java:485)
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
* [Thread-9 (HornetQ-server-HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12-19420919)] 27-Mar 15:8:30,328 WARNING [ClusterConnectionImpl] Unable to announce backup, retrying
java.lang.NullPointerException
at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$2.run(ClusterConnectionImpl.java:485)
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
* [Thread-10 (HornetQ-server-HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12-19420919)] 27-Mar 15:8:30,843 WARNING [ClusterConnectionImpl] Unable to announce backup, retrying
java.lang.NullPointerException
at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$2.run(ClusterConnectionImpl.java:485)
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
* [Thread-11 (HornetQ-server-HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12-19420919)] 27-Mar 15:8:31,359 WARNING [ClusterConnectionImpl] Unable to announce backup, retrying
java.lang.NullPointerException
at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$2.run(ClusterConnectionImpl.java:485)
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
* [Thread-12 (HornetQ-server-HornetQServerImpl::serverUUID=80d285b6-96ba-11e2-9528-817a37231a12-19420919)] 27-Mar 15:8:32,31 WARNING [ClusterConnectionImpl] Unable to announce backup, retrying
java.lang.NullPointerException
at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$2.run(ClusterConnectionImpl.java:485)
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
* [hornetq-shutdown-thread] 27-Mar 15:8:32,109 INFO [HornetQBootstrapServer] Stopping HornetQ Server...
I think that if both servers are on the same machine, you should change the host of the connectors and acceptors to "localhost". (That said, I think your configs are OK and this is a network (UDP) problem.)
In your hornetq-jms.xml, you should add the following child element (it is an element, not an attribute) to the connection-factory element:
<discovery-group-ref discovery-group-name="my-discovery-group"/>
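For context, here is a rough sketch of where that element would sit. Only the discovery group name comes from the line above. The factory name, the JNDI entry, and the multicast address/port are placeholders I am assuming, so substitute whatever your own files use. The discovery group referenced from hornetq-jms.xml also has to be defined under the same name in hornetq-configuration.xml, and its address and port should match the broadcast group on the other server.

```xml
<!-- hornetq-jms.xml (sketch; factory name and JNDI entry are assumed placeholders) -->
<connection-factory name="ConnectionFactory">
   <!-- Find the servers through UDP discovery rather than a fixed connector -->
   <discovery-group-ref discovery-group-name="my-discovery-group"/>
   <entries>
      <entry name="/ConnectionFactory"/>
   </entries>
</connection-factory>

<!-- hornetq-configuration.xml (sketch; address and port are placeholder values) -->
<discovery-groups>
   <discovery-group name="my-discovery-group">
      <group-address>231.7.7.7</group-address>
      <group-port>9876</group-port>
   </discovery-group>
</discovery-groups>
```

If the "Timed out waiting to receive initial broadcast from cluster" warnings continue after this change, that would point back at multicast traffic not reaching the backup rather than at the configuration.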