Fail back from slave to master in Artemis colocated HA config is not working - activemq-artemis

I have a 4-node Artemis 2.10 cluster on Linux configured for async IO journal, replication, and colocated HA servers. I have been testing failover and failback, but failback is not working. When I shut down one server (A) in an HA pair, the colocated backup on the second server (B) activates and correctly processes messages intended for server A. I verified this with a modified version of the ColocatedFailoverExample from the Artemis examples. The problem is that when I bring server A back up, it starts, becomes live, registers its acceptors and addresses, and joins the cluster, but a number of things are wrong:
- In the Artemis console for server A there is no longer a colocated_backup_1 entry showing that it is providing a colocated backup for server B.
- Server A coming back up causes server B, the server that was failed over to, to go completely offline and function only as a backup. The master it was hosting stops and no longer displays addresses or acceptors in the UI.
- Although it reports that it is running as a backup, server B no longer shows the colocated_backup_1 object in its console either.
- Server B still appears to be part of the cluster, but the UI no longer shows a green master node for it - just a red slave node circle. Client connections to server B fail, most likely because the colocated master that was running on it was shut down.
- In the Artemis UI for server B, under node-name > cluster-connections > cluster-name, the cluster attributes show an empty node array and the wrong node ID: it is now the same as the ID of the master broker on server A. It is almost as if the information for the colocated_backup_1 broker that was running on server B before failover has replaced server B's live server, and there is now only one broker on server B - the colocated backup.
- All of this happens the moment I bring up server A. The UI for server B immediately refreshes: the colocated_backup_1 entry disappears, the acceptors and addresses links under what was server B's master broker name vanish, and on the cluster attributes page the 3 nodes previously listed in the "nodes" attribute disappear, leaving "nodes" empty.
If I instead take down server B and bring it back up, the roles in the pair are swapped: server B becomes live again and is shown as a master node in the topology (but still with no colocated_backup_1 in the UI), while the server A master broker goes offline and server A reconfigures itself as a backup/slave node. Whichever of server A or B is in this offline, backup-only state, the value of the Node property in the cluster attributes shown in the UI is the same for both. Before the failover test they had different node IDs, which makes sense, although the colocated_backup_1 on each did share the node ID of the node it was backing up.
To summarize what I think is happening: the master that is coming back up after failover seems to trigger its partner backup node not only to come up as a backup but also to stop being a master itself. From that point the colocation between the pair stops, and there is only ever one live master between the two servers instead of one on each. The failback feature seems to be not only failing the original master back but also shutting down the colocated master on the backup server, almost as if the topology between the two, although configured as colocated, were being treated as the standard two-node HA configuration where one server is the master and the other is the slave.
The only way to fix the pair is to stop both servers and remove everything under the broker "data" directory on both boxes before starting them again. Just removing the colocated backup files on each machine isn't enough - everything under "data" has to go. After doing this they come up correctly: both are live masters and they pair up as colocated HA backups for each other again.
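As a rough sketch, the reset that works looks like this (the artemis-service script and the data location under the instance directory are assumptions; adjust to your instance layout):

# on BOTH servers, with the brokers stopped:
$ARTEMIS_INSTANCE/bin/artemis-service stop
rm -rf "$ARTEMIS_INSTANCE"/data/*    # journal, bindings, paging, and colocated backup dirs
$ARTEMIS_INSTANCE/bin/artemis-service start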
Here is the ha-policy section of broker.xml, which is the same on all 4 servers:
<ha-policy>
   <replication>
      <colocated>
         <backup-request-retries>5</backup-request-retries>
         <backup-request-retry-interval>5000</backup-request-retry-interval>
         <max-backups>1</max-backups>
         <request-backup>true</request-backup>
         <backup-port-offset>20</backup-port-offset>
         <master>
            <check-for-live-server>true</check-for-live-server>
            <vote-on-replication-failure>true</vote-on-replication-failure>
         </master>
         <slave>
            <max-saved-replicated-journals-size>-1</max-saved-replicated-journals-size>
            <allow-failback>true</allow-failback>
            <vote-on-replication-failure>true</vote-on-replication-failure>
            <restart-backup>false</restart-backup>
         </slave>
      </colocated>
   </replication>
</ha-policy>

Related

Redis master wipes out Redis slave data on restart

Sorry, this is my first time working with Redis. I have a Redis master deployment and a Redis slave deployment (via K8s). Replication from master to slave works as expected. However, when I kill the master altogether and bring it back up again, the sync wipes out the slave's data as well.
I have tried enabling appendonly on either and on both, but had no luck.
Question # 1: How can I preserve the data on the slave when the master node comes back to life?
Question # 2: Is it standard practice to sync data back from the slave into the master?
Yes, the correct practice would be to promote the slave to master and then slave the restarted node to it to sync the state. If you bring up an empty node that is declared as the master, the slave will faithfully replicate whatever is - or isn't - on it.
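As a rough redis-cli sketch (host names are placeholders; in K8s you would run these against the respective pods):

# on the surviving slave: stop replicating and become master
redis-cli -h redis-slave SLAVEOF NO ONE
# on the restarted old node: replicate from the newly promoted master
redis-cli -h redis-old-master SLAVEOF redis-slave 6379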
You can configure periodic saving to disk, so that you can restart a master node and have it load the state as of the last save to disk. You can also manually cause a save to disk via the SAVE command. See the persistence chapter in the manual. If you SAVE to disk, then immediately restart the master node, the state as saved to disk will be loaded back up. Any writes that occur between the last SAVE and node shutdown will be lost.
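For example (SAVE, BGSAVE, and these config directives are standard Redis; the snapshot thresholds are just placeholders):

# force a synchronous snapshot right now
redis-cli SAVE
# or take one in the background without blocking clients
redis-cli BGSAVE
# periodic snapshots and AOF persistence are configured in redis.conf:
#   save 60 1000      <- snapshot if at least 1000 keys changed within 60 seconds
#   appendonly yes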
Along these lines, Redis HA is often done with Redis Sentinel, which manages auto-promotion and discovery of master nodes within a replicated cluster, so that the cluster can survive and auto-heal from the loss of the current master. This lets slaves replicate from the active master, and on the loss of the master (or a network partition that causes a quorum of sentinels to lose visibility to the master), the Sentinel quorum will elect a new master and coordinate the re-slaving of the other nodes to it to ensure uptime. This is an AP system, as Redis replication is eventually consistent, and therefore does have the potential to lose writes which are not replicated to a slave or flushed to disk before node shutdown.
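A minimal sentinel.conf for such a setup might look like this (the master name, address, and timeouts are placeholders):

port 26379
sentinel monitor mymaster 10.0.0.1 6379 2    # 2 sentinels must agree the master is down
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1           # re-slave one replica at a time after failover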

Avoid split brain in PostgreSQL without 3rd party tool

I have 2 PostgreSQL servers and have set up streaming replication between them.
I have built a shell script that pings the master server every minute and promotes the slave to master when the master doesn't respond. I am using rh-postgresql95, and 3rd party tools don't work with this version.
My JDBC connection string has comma-separated nodes with targetServerType=master, like below:
jdbc:postgresql://node1,node2/accounting?targetServerType=master
I just want to know how I can avoid a split brain scenario if the slave is promoted to master and the old master also comes up somehow.
or
Is there any way to ensure that the old master never comes up automatically?
EDIT
node1 is the master and node2 is the slave in my JDBC connection string.
I stopped the postgres service on the master and promoted the slave to new master. In this case, the service was pointing to the new master.
Then I restarted the postgres service on the old master, and the service started pointing to the old master (node1 is the old master's IP and it comes first in the JDBC connection string).
So I didn't get a split brain issue as such, but this scenario will lead to data inconsistency.
As an idea, your ping script could check if both servers think they are masters:
select pg_is_in_recovery();
A server that is not in recovery is a master. Then you could check the last received WAL position:
select pg_last_wal_receive_lsn();
(On 9.5 this function is still named pg_last_xlog_receive_lsn(); it was renamed in PostgreSQL 10.)
The server with the highest LSN is the one that was promoted last. You could then shut down the other server.
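A rough sketch of how the ping script could combine the two checks (node names and the database name are taken from the JDBC URL above; the 9.5 function name is used):

#!/bin/sh
for host in node1 node2; do
  in_recovery=$(psql -h "$host" -d accounting -Atc "select pg_is_in_recovery();")
  lsn=$(psql -h "$host" -d accounting -Atc "select pg_last_xlog_receive_lsn();")
  echo "$host in_recovery=$in_recovery last_received_lsn=$lsn"
done
# If both hosts report in_recovery = f you have two masters; fence the one
# with the lower LSN, e.g. by running pg_ctl stop -m fast on that box.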
If you change your mind about third party options, have a look at the PostgreSQL wiki.

What to do after one node in a ZooKeeper cluster fails?

According to https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkMulitServerSetup
Cross Machine Requirements: For the ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with each other. To create a deployment that can tolerate the failure of F machines, you should count on deploying 2xF+1 machines. Thus, a deployment that consists of three machines can handle one failure, and a deployment of five machines can handle two failures. Note that a deployment of six machines can only handle two failures since three machines is not a majority. For this reason, ZooKeeper deployments are usually made up of an odd number of machines.
To achieve the highest probability of tolerating a failure you should try to make machine failures independent. For example, if most of the machines share the same switch, failure of that switch could cause a correlated failure and bring down the service. The same holds true of shared power circuits, cooling systems, etc.
My question is:
What should we do after we identify a node failure within the ZooKeeper cluster to make the cluster 2F+1 again? Do we need to restart all the ZooKeeper nodes? The clients also connect to the ZooKeeper cluster; suppose we use DNS names and the recovered node gets the same DNS name.
For example:
10.51.22.89 zookeeper1
10.51.22.126 zookeeper2
10.51.23.216 zookeeper3
If 10.51.22.89 dies and we bring up 10.51.22.90 as zookeeper1, will all the nodes identify this change?
If you connect 10.51.22.90 as zookeeper1 (with the same myid file and configuration that 10.51.22.89 had before) and the data dir is empty, the process will connect to the current leader (zookeeper2 or zookeeper3) and copy a snapshot of the data. After successful initialization the node will inform the rest of the cluster nodes and you have 2F+1 again.
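Concretely, assuming the default ports and /var/lib/zookeeper as the dataDir (both assumptions), the replacement setup would be:

# zoo.cfg stays the same on every node, since the replacement reuses the zookeeper1 name:
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888

# on the replacement box, recreate the myid file and start with an empty data dir:
echo 1 > /var/lib/zookeeper/myid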
Try this yourself while running tail -f on the log files. It won't hurt the cluster and you will learn a lot about ZooKeeper internals ;-)

ZooKeeper Failover Strategies

We are a young team building an application using Storm and Kafka.
We have a common ZooKeeper ensemble of 3 nodes which is used by both Storm and Kafka.
I wrote a test case to test ZooKeeper failovers:
1) Check that all three nodes are running and confirm one is elected as leader.
2) Using the ZooKeeper unix client, create a znode and set a value. Verify the value is reflected on the other nodes (see the sketch after this list).
3) Modify the znode: set the value on one node and verify the other nodes have the change reflected.
4) Kill one of the worker nodes and make sure the master/leader is notified about the crash.
5) Kill the leader node. Verify that one of the other two nodes is elected as leader.
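Steps 2 and 3 can be scripted with the stock CLI, roughly like this (hostnames and the znode path are placeholders):

bin/zkCli.sh -server zk1:2181 create /failover-test "v1"
bin/zkCli.sh -server zk2:2181 get /failover-test      # should print v1
bin/zkCli.sh -server zk1:2181 set /failover-test "v2"
bin/zkCli.sh -server zk3:2181 get /failover-test      # should print v2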
Do I need to add any more test cases? Any additional ideas/suggestions/pointers?
From the documentation (this excerpt is from the Hadoop HDFS High Availability guide, which uses ZooKeeper for automatic NameNode failover):
Verifying automatic failover
Once automatic failover has been set up, you should test its operation. To do so, first locate the active NameNode. You can tell which node is active by visiting the NameNode web interfaces -- each node reports its HA state at the top of the page.
Once you have located your active NameNode, you may cause a failure on that node. For example, you can use kill -9 to simulate a JVM crash. Or, you could power cycle the machine or unplug its network interface to simulate a different kind of outage. After triggering the outage you wish to test, the other NameNode should automatically become active within several seconds. The amount of time required to detect a failure and trigger a fail-over depends on the configuration of ha.zookeeper.session-timeout.ms, but defaults to 5 seconds.
If the test does not succeed, you may have a misconfiguration. Check the logs for the zkfc daemons as well as the NameNode daemons in order to further diagnose the issue.
more on setting up automatic failover
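For context, automatic failover in that guide is enabled with two settings (the property names are from the same Hadoop HA documentation; the ZooKeeper hosts are placeholders):

<!-- hdfs-site.xml -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<!-- core-site.xml -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>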

Going from a single zookeeper server to a clustered configuration

I have a single server that I now want to replicate, going for higher availability. One of the elements in my software stack is ZooKeeper, so it seems natural to move it to a clustered configuration.
However, I have data on my single server, and I couldn't find any guide on moving to a clustered setup. I tried setting up two independent instances and then joining them into a clustered configuration, but only the data present on the elected master was preserved.
So, how can I safely go from a single-server setup to a clustered setup without losing data?
If you go from 1 server straight to 3 servers, you may lose data, as the 2 new servers are sufficient to form a quorum and may elect one of themselves as leader, ignoring the old server and losing all the data on that machine.
If you grow your cluster from 1 to 2, then when the two servers start up, a quorum can't form without the old server being involved, and data will not be lost. When the cluster finishes starting, all data will be synced to both servers.
Then you can grow your cluster from 2 to 3; again a quorum can't form without at least 1 server that has a copy of the database, and again when the cluster finishes starting all data will be synced to all three servers.
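As a sketch of the two steps (hostnames are placeholders; 2888/3888 are the default peer/election ports, and each server also needs a matching myid file in its dataDir):

# step 1: grow from 1 to 2 - zoo.cfg on both servers
server.1=oldserver:2888:3888
server.2=newserver1:2888:3888

# step 2: grow from 2 to 3 - add the third entry on all three servers and restart
server.1=oldserver:2888:3888
server.2=newserver1:2888:3888
server.3=newserver2:2888:3888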