We trying to use ActiveMQ Artemis within our Docker container, but one scenario I am not able to get working. This is probably due to some bad configuration. Any help is appreciated (e.g. example configuration).
Installation:
Docker instance containing an embedded ActiveMQ Artemis broker and a web application
The broker has clustering, HA and share store defined
We start 3 docker instances
Scenario:
Add 200 messages to the queue in one of the web application
I can see in the logging that all docker instance are handling the messages (this is as expected)
Kill one of the docker instances
Outcome of scenario:
Not all messages are being processed (every message on the queue should result to item in the database)
When restarting the killed docker instance will not result in every message being processed.
Expected outcome:
When a node is down that a other node is picking up the messages
When a node comes online again that is help picking up the messages
Questions:
HA scale down probably does not work because I kill the server.
Does this only work with persistence on the file system or should this also work in an RDBMS?
Configuration:
Below is the configuration which is in every Docker instance, only the host name (project-two) and the HA settings (master/slave) differ per docker instance. It could be a typo in below because I removed the customer specific names in the configuration.
<configuration
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="urn:activemq"
xsi:schemaLocation="urn:activemq /schema/artemis-server.xsd">
<core xmlns="urn:activemq:core">
<security-enabled>true</security-enabled>
<jmx-management-enabled>true</jmx-management-enabled>
<management-address>activemq.management</management-address>
<persistence-enabled>true</persistence-enabled>
<store>
<database-store>
<jdbc-driver-class-name>${artemis.databaseDriverClass}</jdbc-driver-class-name>
<jdbc-connection-url>${artemis.databaseConnectionUrl}</jdbc-connection-url>
<jdbc-user>${artemis.databaseUsername}</jdbc-user>
<jdbc-password>${artemis.databasePassword}</jdbc-password>
<bindings-table-name>ARTEMIS_BINDINGS</bindings-table-name>
<message-table-name>ARTEMIS_MESSAGE</message-table-name>
<page-store-table-name>ARTEMIS_PS</page-store-table-name>
<large-message-table-name>ARTEMIS_LARGE_MESSAGES</large-message-table-name>
<node-manager-store-table-name>ARTEMIS_NODE_MANAGER</node-manager-store-table-name>
</database-store>
</store>
<connectors>
<connector name="netty-connector">tcp://project-two:61617</connector>
</connectors>
<acceptors>
<acceptor name="netty-acceptor">tcp://project-two:61617</acceptor>
</acceptors>
<!-- cluster information -->
<broadcast-groups>
<broadcast-group name="my-broadcast-group">
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<broadcast-period>2000</broadcast-period>
<connector-ref>netty-connector</connector-ref>
</broadcast-group>
</broadcast-groups>
<discovery-groups>
<discovery-group name="my-discovery-group">
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<cluster-connections>
<cluster-connection name="my-cluster">
<connector-ref>netty-connector</connector-ref>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<discovery-group-ref discovery-group-name="my-discovery-group"/>
</cluster-connection>
</cluster-connections>
<security-settings>
</security-settings>
<!-- Settings for the redelivery -->
<address-settings>
<address-setting match="#">
<redelivery-delay>5000</redelivery-delay>
<max-delivery-attempts>2</max-delivery-attempts>
</address-setting>
</address-settings>
<addresses>
</addresses>
<ha-policy>
<shared-store>
<slave/>
</shared-store>
</ha-policy>
</core>
</configuration>
Docker makes deploying microservice applications very easy but it has some limitations for a production environment. I would take a look to the ArtemisCloud.io operator that provide a way to deploy the Apache ActiveMQ Artemis Broker on Kubernetes.
The ArtemisCloud.io operator also supports message migration for brokers scale down, see https://artemiscloud.io/docs/tutorials/scaleup_and_scaledown/
Related
I have a master slave setup with 1 master and 2 slaves. When I kill the master, one of the slave tries to become master but fails with following exception:
2022/03/08 16:13:28.746 | mb | ERROR | 1-156 | o.a.a.a.c.server | | AMQ224000: Failure in initialisation: java.lang.IndexOutOfBoundsException: length(32634) exceeds src.readableBytes(32500) where src is: UnpooledHeapByteBuf(ridx: 78, widx: 32578, cap: 32578/32578)
at io.netty.buffer.AbstractByteBuf.checkReadableBounds(AbstractByteBuf.java:643)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1095)
at org.apache.activemq.artemis.core.message.impl.CoreMessage.reloadPersistence(CoreMessage.java:1207)
at org.apache.activemq.artemis.core.message.impl.CoreMessagePersister.decode(CoreMessagePersister.java:85)
at org.apache.activemq.artemis.core.message.impl.CoreMessagePersister.decode(CoreMessagePersister.java:28)
at org.apache.activemq.artemis.spi.core.protocol.MessagePersister.decode(MessagePersister.java:120)
at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.decodeMessage(AbstractJournalStorageManager.java:1336)
at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.lambda$loadMessageJournal$1(AbstractJournalStorageManager.java:1035)
at org.apache.activemq.artemis.utils.collections.SparseArrayLinkedList$SparseArray.clear(SparseArrayLinkedList.java:114)
at org.apache.activemq.artemis.utils.collections.SparseArrayLinkedList.clearSparseArrayList(SparseArrayLinkedList.java:173)
at org.apache.activemq.artemis.utils.collections.SparseArrayLinkedList.clear(SparseArrayLinkedList.java:227)
at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.loadMessageJournal(AbstractJournalStorageManager.java:990)
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:3484)
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:3149)
at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:325)
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:4170)
I'm also observing a lot of messages like this one:
2022/03/08 16:13:28.745 | AMQ224009: Cannot find message 36,887,402,768
2022/03/08 16:13:28.745 | AMQ224009: Cannot find message 36,887,402,768
Master setup:
<ha-policy>
<replication>
<master>
<check-for-live-server>true</check-for-live-server>
</master>
</replication>
</ha-policy>
<connectors>
<connector name="connector-server-0">tcp://172.16.134.51:62616</connector>
<connector name="connector-server-1">tcp://172.16.134.52:62616</connector>
<connector name="connector-server-2">tcp://172.16.134.28:62616</connector>
</connectors>
<acceptors>
<acceptor name="netty-acceptor">tcp://172.16.134.51:62616</acceptor>
<acceptor name="invm">vm://0</acceptor>
</acceptors>
<cluster-connections>
<cluster-connection name="my-cluster">
<connector-ref>connector-server-0</connector-ref>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<static-connectors>
<connector-ref>connector-server-1</connector-ref>
<connector-ref>connector-server-2</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
Slave 1 setup:
<ha-policy>
<replication>
<slave>
<allow-failback>true</allow-failback>
</slave>
</replication>
</ha-policy>
<connectors>
<connector name="connector-server-0">tcp://172.16.134.51:62616</connector>
<connector name="connector-server-1">tcp://172.16.134.52:62616</connector>
<connector name="connector-server-2">tcp://172.16.134.28:62616</connector>
</connectors>
<acceptors>
<acceptor name="netty-acceptor">tcp://172.16.134.52:62616</acceptor>
<acceptor name="invm">vm://0</acceptor>
</acceptors>
<cluster-connections>
<cluster-connection name="cluster">
<connector-ref>connector-server-1</connector-ref>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<static-connectors>
<connector-ref>connector-server-0</connector-ref>
<connector-ref>connector-server-2</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
Slave 2
<ha-policy>
<replication>
<slave>
<allow-failback>true</allow-failback>
</slave>
</replication>
</ha-policy>
<connectors>
<connector name="connector-server-0">tcp://172.16.134.51:62616</connector>
<connector name="connector-server-1">tcp://172.16.134.52:62616</connector>
<connector name="connector-server-2">tcp://172.16.134.28:62616</connector>
</connectors>
<acceptors>
<acceptor name="netty-acceptor">tcp://172.16.134.28:62616</acceptor>
<acceptor name="invm">vm://0</acceptor>
</acceptors>
<cluster-connections>
<cluster-connection name="cluster">
<connector-ref>connector-server-2</connector-ref>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<static-connectors>
<connector-ref>connector-server-0</connector-ref>
<connector-ref>connector-server-1</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
Could you please tell me what is not correct in my setup?
I'm using activemq-artemis version 2.17.0
I recommend you upgrade to the latest release and retry.
Also, I recommend simplifying your configuration to use just a single live/backup pair. The broker will only ever replicate data to one other broker. The second backup will be completely idle until either the master or current backup fails.
Lastly, using a single live/backup pair with the replication ha-policy is very dangerous due to the possibility of split-brain. I strongly recommend that you use shared-storage or once you move to the latest release you configure pluggable quorum voting with ZooKeeper to mitigate the risk of split-brain.
I prepared java application with embedded Artemis broker. It is single jar that contains all dependencies.
I use org.apache.activemq:artemis-server:2.10.1 and I start broker in following way:
new EmbeddedActiveMQ().setConfigResourcePath( config ).start();
where config is a URL to configuration file.
I use one config file for master and one config file for slave.
master:
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration xmlns="urn:activemq" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:activemq /schema/artemis-server.xsd">
<core xmlns="urn:activemq:core">
<persistence-enabled>true</persistence-enabled>
<security-enabled>false</security-enabled>
<acceptors>
<acceptor name="netty-acceptor">tcp://127.0.0.1:61617</acceptor>
</acceptors>
<bindings-directory>artemis/bindings</bindings-directory>
<journal-directory>artemis/journal</journal-directory>
<address-settings>
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
</address-setting>
</address-settings>
<addresses>
<address name="DLQ">
<anycast>
<queue name="DLQ"/>
</anycast>
</address>
</addresses>
<connectors>
<!-- Connector used to be announced through cluster connections and notifications -->
<connector name="artemis">tcp://127.0.0.1:61617</connector>
</connectors>
<cluster-user>artemis-cluster</cluster-user>
<cluster-password>artemis-cluster</cluster-password>
<broadcast-groups>
<broadcast-group name="bg-group1">
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<broadcast-period>5000</broadcast-period>
<connector-ref>artemis</connector-ref>
</broadcast-group>
</broadcast-groups>
<discovery-groups>
<discovery-group name="dg-group1">
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<cluster-connections>
<cluster-connection name="my-cluster">
<address></address>
<connector-ref>artemis</connector-ref>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<discovery-group-ref discovery-group-name="dg-group1"/>
</cluster-connection>
</cluster-connections>
<ha-policy>
<replication>
<master/>
</replication>
</ha-policy>
</core>
</configuration>
Slave:
<core xmlns="urn:activemq:core">
<persistence-enabled>true</persistence-enabled>
<security-enabled>false</security-enabled>
<acceptors>
<acceptor name="netty-acceptor">tcp://127.0.0.1:61618</acceptor>
</acceptors>
<bindings-directory>artemis/bindings</bindings-directory>
<journal-directory>artemis/journal</journal-directory>
<address-settings>
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
</address-setting>
</address-settings>
<addresses>
<address name="DLQ">
<anycast>
<queue name="DLQ"/>
</anycast>
</address>
</addresses>
<connectors>
<!-- Connector used to be announced through cluster connections and notifications -->
<connector name="artemis">tcp://127.0.0.1:61618</connector>
</connectors>
<cluster-user>artemis-cluster</cluster-user>
<cluster-password>artemis-cluster</cluster-password>
<broadcast-groups>
<broadcast-group name="bg-group1">
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<broadcast-period>5000</broadcast-period>
<connector-ref>artemis</connector-ref>
</broadcast-group>
</broadcast-groups>
<discovery-groups>
<discovery-group name="dg-group1">
<group-address>231.7.7.7</group-address>
<group-port>9876</group-port>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<cluster-connections>
<cluster-connection name="my-cluster">
<address></address>
<connector-ref>artemis</connector-ref>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<discovery-group-ref discovery-group-name="dg-group1"/>
</cluster-connection>
</cluster-connections>
<ha-policy>
<replication>
<slave/>
</replication>
</ha-policy>
</core>
It works properly on Windows 10 and Centos 7. It does not work on Oracle Linux 7. All machines use the same Java version.
java -version
openjdk version "11.0.9" 2020-10-20
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.9+11)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.9+11, mixed mode)
There is no any exception on Windows and Centos and I see following logs:
11:14:30.867 [Thread-0] INFO org.apache.activemq.artemis.core.server - AMQ221025: Replication: sending NIOSequentialFile...
and
11:14:33.911 [Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6#2a3c96e3)] INFO org.apache.activemq.artemis.core.server - AMQ221031: backup announced
Exactly the same jar and exaclty the same configuration file throws exception on Oracle Linux:
11:31:56.074 [Thread-0 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6#63b1d4fa)] WARN org.apache.activemq.artemis.core.client - AMQ212025: did not connect the cluster connection to other nodes
java.lang.IllegalStateException: java.lang.ClassNotFoundException: org.hornetq.core.remoting.impl.netty.NettyConnectorFactory
at org.apache.activemq.artemis.utils.ClassloadingUtil.newInstanceFromClassLoader(ClassloadingUtil.java:59) ~[broker.jar:?]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$2.run(ClientSessionFactoryImpl.java:996) ~[broker.jar:?]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$2.run(ClientSessionFactoryImpl.java:993) ~[broker.jar:?]
at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.instantiateConnectorFactory(ClientSessionFactoryImpl.java:993) ~[broker.jar:?]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.<init>(ClientSessionFactoryImpl.java:181) ~[broker.jar:?]
at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:798) ~[broker.jar:?]
at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:655) ~[broker.jar:?]
at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:637) ~[broker.jar:?]
at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl$4.run(ServerLocatorImpl.java:595) ~[broker.jar:?]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) ~[broker.jar:?]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) ~[broker.jar:?]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) ~[broker.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [broker.jar:?]
Caused by: java.lang.ClassNotFoundException: org.hornetq.core.remoting.impl.netty.NettyConnectorFactory
at jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) ~[?:?]
at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) ~[?:?]
at java.lang.ClassLoader.loadClass(ClassLoader.java:522) ~[?:?]
at org.apache.activemq.artemis.utils.ClassloadingUtil.newInstanceFromClassLoader(ClassloadingUtil.java:55) ~[broker.jar:?]
... 15 more
Master and slave both throw the same exception. Why Artemis wants to use org.hornetq.core.remoting.impl.netty.NettyConnectorFactory on Oracle Linux? I debugged my application on Windows and org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnectorFactory is used there. Where it can be configured? Maybe there is some configuration file in user directory or sth like that? Probably I can solve the issue by adding hornetq to dependencies but I prefer to use NettyConnectorFactory from Artemis. I do not understand why one machine uses different NettyConnectorFactory than the other two machines.
This is almost certainly being caused by a HornetQ broker on your network which is using the same group-address and group-port (i.e. 231.7.7.7:9876) to broadcast its connector information. You're using the default values so this isn't terribly surprising. You should either find this HornetQ broker and stop it or change your Artemis' cluster configuration to use a different group-address and/or group-port.
I want to setup two clusters for Dev and QA (or prod and staging) in same network. Each cluster consists of several nodes. Nodes will be reachable on network and I want to eliminate QA node clustering with Dev node. For this what should I changed, following is my config.
<connectors>
<connector name="my-con">tcp://ip:61617</connector>
</connectors>
<acceptors>
<acceptor name="my-acc">tcp://ip:61617</acceptor>
</acceptors>
<broadcast-groups>
<broadcast-group name="bg-stage">
<group-address>${udp-address:231.7.7.7}</group-address>
<group-port>9877</group-port>
<broadcast-period>100</broadcast-period>
<connector-ref>my-con</connector-ref>
</broadcast-group>
</broadcast-groups>
<discovery-groups>
<discovery-group name="dg-stage">
<group-address>${udp-address:231.7.7.7}</group-address>
<group-port>9877</group-port>
<refresh-timeout>10000</refresh-timeout>
</discovery-group>
</discovery-groups>
<cluster-connections>
<cluster-connection name="my-cluster">
<connector-ref>my-con</connector-ref>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<discovery-group-ref discovery-group-name="dg-stage"/>
</cluster-connection>
</cluster-connections>
Do I need to change different cluster-connection name, broadcast group and discovery groups?
Changing the group-port on the broadcast-group and discovery-group from the default 9876 to 9877 should be sufficient to isolate the clusters from each other. Changing the names in the configuration won't really impact anything other than your own ability to differentiate the configuration between the environments.
I am deploying a simple master/slave replication configuration within Kubernetes environment. They use static connectors. When I delete the master pod the slave successfully takes over duties, but when the master pod comes back up the slave pod does not terminate as live so I end up with two live servers. When this happens I notice they also form an internal bridge. I've ran the exact same configurations locally, outside of Kubernetes, and the slave successfully terminates and goes back to being a slave when the master comes back up. Any ideas as to why this is happening? I am using Artemis version 2.6.4.
master broker.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration xmlns="urn:activemq" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<jms xmlns="urn:activemq:jms">
<queue name="jms.queue.acmadapter_to_acm_design">
<durable>true</durable>
</queue>
</jms>
<core xmlns="urn:activemq:core" xsi:schemaLocation="urn:activemq:core ">
<acceptors>
<acceptor name="netty-acceptor">tcp://0.0.0.0:61618</acceptor>
</acceptors>
<connectors>
<connector name="netty-connector-master">tcp://artemis-service-0.artemis-service.falconx.svc.cluster.local:61618</connector>
<connector name="netty-connector-backup">tcp://artemis-service2-0.artemis-service.falconx.svc.cluster.local:61618</connector>
</connectors>
<ha-policy>
<replication>
<master>
<!--we need this for auto failback-->
<check-for-live-server>true</check-for-live-server>
</master>
</replication>
</ha-policy>
<cluster-connections>
<cluster-connection name="my-cluster">
<connector-ref>netty-connector-master</connector-ref>
<static-connectors>
<connector-ref>netty-connector-master</connector-ref>
<connector-ref>netty-connector-backup</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
</core>
</configuration>
slave broker.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration xmlns="urn:activemq" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<core xmlns="urn:activemq:core" xsi:schemaLocation="urn:activemq:core ">
<acceptors>
<acceptor name="netty-acceptor">tcp://0.0.0.0:61618</acceptor>
</acceptors>
<connectors>
<connector name="netty-connector-backup">tcp://artemis-service2-0.artemis-service.test.svc.cluster.local:61618</connector>
<connector name="netty-connector-master">tcp://artemis-service-0.artemis-service.test.svc.cluster.local:61618</connector>
</connectors>
<ha-policy>
<replication>
<slave>
<allow-failback>true</allow-failback>
<!-- not needed but tells the backup not to restart after failback as there will be > 0 backups saved -->
<max-saved-replicated-journals-size>0</max-saved-replicated-journals-size>
</slave>
</replication>
</ha-policy>
<cluster-connections>
<cluster-connection name="my-cluster">
<connector-ref>netty-connector-backup</connector-ref>
<static-connectors>
<connector-ref>netty-connector-master</connector-ref>
<connector-ref>netty-connector-backup</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
</core>
</configuration>
It's likely that the master's journal is being lost when it is stopped. The journal (specifically the server.lock file) holds the node's unique identifier (which is shared by the replicated slave). If the journal is lost when the node is dropped then it has no way to pair with its slave when it comes back up which would explain the behavior you're observing. Make sure the journal is on a persistent volume claim.
Also, it's worth noting that a single master/slave pair is not recommended due to the risk of split brain. In general, it's recommended that you have 3 master/slave pairs to establish a proper quorum.
I was using HorentQ in clustered mode in JBoss AS 7.1; However I wanted to see if I can cluster specific topics and queues only. I understood from this link that it is possible by configuring the address. However I am not able to find the address which works. Here is a snapshot of the doamin.xml; Where clustering is NOT working
<cluster-connections>
<cluster-connection name="my-cluster">
<address>mro</address>
<connector-ref>netty</connector-ref>
<discovery-group-ref discovery-group-name="dg-group1"/>
</cluster-connection>
</cluster-connections>
Here is how the Queue and topic is defined. Changing the address to jms makes everything clustered and it is working, but that is not what I want
<jms-queue name="MROQueue">
<entry name="mro/MROQueue"/>
<entry name="java:jboss/exported/mro/MROQueue"/>
</jms-queue>
<jms-topic name="MROTopic">
<entry name="mro/MROTopic"/>
<entry name="java:jboss/exported/mro/MROTopic"/>
</jms-topic>
I tried various wildcards in the address but nothing was working. So in the end got this working
<cluster-connections>
<cluster-connection name="my-cluster">
<address>jms.queue.cluster</address>
<connector-ref>netty</connector-ref>
<discovery-group-ref discovery-group-name="dg-group1"/>
</cluster-connection>
</cluster-connections>
and queues
<jms-queue name="cluster.MROQueue">
<entry name="cluster.MROQueue"/>
<entry name="java:jboss/exported/cluster.MROQueue"/>
</jms-queue>
<jms-topic name="cluster.MROTopic">
<entry name="cluster.MROTopic"/>
<entry name="java:jboss/exported/cluster.MROTopic"/>
</jms-topic>
The above made both my Queues and Topics to be clustered. To test I changed to
<cluster-connections>
<cluster-connection name="my-cluster">
<address>jms.queue.cluster3</address>
<connector-ref>netty</connector-ref>
<discovery-group-ref discovery-group-name="dg-group1"/>
</cluster-connection>
</cluster-connections>
And changed queues to
<jms-queue name="cluster2.MROQueue">
<entry name="cluster2.MROQueue"/>
<entry name="java:jboss/exported/cluster2.MROQueue"/>
</jms-queue>
and it did not cluster; So that seems to be the way at least in this version for specific clustering
From the official documentation:
address. Each cluster connection only applies to messages sent to an address that starts with this value. Note: this does not use wild-card matching.
https://docs.jboss.org/hornetq/2.3.0.Final/docs/user-manual/html/clusters.html#clusters.cluster-connections
What is not in the documentation is how the address is formed, it does say
All JMS queues and topic subscriptions are bound to addresses that
start with "jms."