SolrCloud role assignment - apache-zookeeper

I have a SolrCloud with 3 shards, 3 replicas and a Zookeeper ensemble with 5 members.
Replica 2 is being retired; its root device is EBS-backed and it has an attached EBS volume. I'm assuming on restart it will migrate to new hardware with new public and private IPs.
I'm also assuming I'll have to restart all the shards and replicas. What's the best way to do this to assign the new replica to the same slot as the old replica? Aren't the shard / replica roles assigned to each host on the very first SolrCloud startup and aren't those assignments stored in Zookeeper?

replica2 restarted with new public and private IPs as expected. I stopped Tomcat on all SOLR hosts and restarted them in the normal order:
shard1
shard2
shard3
replica1
replica2
replica3
This did not work, as replica2 assigned itself to shard1 on repeated SolrCloud restarts. The shard and replica assignments are (as I thought) maintained in the binary files under the version-2 directory on every Zookeeper host. The following was successful:
stop Tomcat on all SOLR hosts
stop all Zookeeper hosts
delete the version-2 directory on all Zookeeper hosts
start all Zookeeper hosts
re-upload the SOLR conf directory using the CLI tool (see the sketch after these steps)
start all SOLR hosts in the above order
This produced the correct assignments.
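
For reference, the conf re-upload step can be done with the zkcli.sh script that ships with Solr. A rough sketch, assuming a Solr 4.x-era layout (the script's location varies by Solr version) and placeholder ZooKeeper hostnames:

    # upload the collection's conf directory into ZooKeeper under a named configset
    /opt/solr/example/cloud-scripts/zkcli.sh \
        -zkhost zk1:2181,zk2:2181,zk3:2181 \
        -cmd upconfig \
        -confdir /opt/solr/example/solr/collection1/conf \
        -confname myconf

The collections then pick up the re-uploaded configset when the SOLR hosts are started again.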

Related

mongosync fails while synchronising data between source and destination sharded cluster

I am trying to synchronize data from the source cluster to the destination cluster using mongosync; however, I am hitting the error below.
{"level":"fatal","serverID":"891fbc43","mongosyncID":"myShard_0","error":"(InvalidOptions) The 'clusteredIndex' option is not supported for namespace mongosync_reserved_for_internal_use.lastWriteStates.footballdb","time":"2022-12-16T02:21:15.209841065-08:00","message":"Error during replication"}
Source cluster details:
3 shards, where each shard is a single-node replica set
Dataset: one database with one collection sharded across the shards
Destination cluster details:
3 shards, where each shard is a single-node replica set
No user data
All the mongosync prerequisites, including RBAC, were verified successfully.
I am unable to diagnose the error here - The 'clusteredIndex' option is not supported for namespace mongosync_reserved_for_internal_use.lastWriteStates.footballdb
I tried the same use case with the same dataset for an N-node replica set source (without sharding) and destination cluster, and the synchronisation worked fine.
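
For context, a minimal sketch of the kind of invocation involved, assuming mongosync is pointed at a mongos of each sharded cluster and the migration is started through its local HTTP API (hosts, ports and connection strings are placeholders):

    # start mongosync against the source (cluster0) and destination (cluster1) mongos
    mongosync \
        --cluster0 "mongodb://source-mongos.internal:27017" \
        --cluster1 "mongodb://dest-mongos.internal:27017"

    # kick off the migration through mongosync's API (default port 27182)
    curl localhost:27182/api/v1/start -X POST \
        --data '{"source": "cluster0", "destination": "cluster1"}'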

Can etcd detect problems and elect leaders for other clusters?

To my knowledge, etcd uses Raft as a consensus and leader selection algorithm to maintain a leader that is in charge of keeping the ensemble of etcd nodes in sync with data changes within the etcd cluster. Among other things, this allows etcd to recover from node failures in the cluster where etcd runs.
But what about etcd managing other clusters, i.e. clusters other than the one where etcd runs?
For example, say we have an etcd cluster and separately, a DB (e.g. MySQL or Redis) cluster comprised of master (read and write) node/s and (read-only) replicas. Can etcd manage node roles for this other cluster?
More specifically:
Can etcd elect a leader for clusters other than the one running etcd and make that information available to other clusters and nodes?
To make this more concrete, using the example above, say a master node in the MySQL DB cluster goes down. Note again that the master and replicas for the MySQL DB are running on a different cluster from the nodes that run etcd and host its data.
Does etcd provide capabilities to detect this type of node failure automatically on clusters other than etcd's own? If yes, how is this done in practice? (e.g. a MySQL DB or any other cluster where nodes can take on different roles).
After detecting such failure, can etcd re-arrange node roles in this separate cluster (i.e. designate new master and replicas), and would it use the Raft leader selection algorithm for this as well?
Once it has done so, can etcd also notify the client (application) nodes that depend on this DB so they can adjust their configuration accordingly?
Finally, does any of the above require Kubernetes? Or can etcd manage external clusters all by its own?
In case it helps, here's a similar question for Zookeeper.
etcd's master election is strictly for electing a leader for etcd itself.
Other clusters, however, can use a distributed strongly-consistent key-value store (such as etcd) to implement their own failure detection and leader election, and to allow clients of that cluster to respond.
Etcd doesn't manage clusters other than its own. It's not magic awesome sauce.
If you want to use etcd to manage a mysql cluster, you will need a component which manages the mysql nodes and stores cluster state in etcd. Clients can watch for changes in that cluster state and adjust.
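As a concrete sketch of that pattern, etcd's own CLI already exposes election and watch primitives that such a component could build on. The key names and values below are made up for illustration:

    # a manager/sidecar next to each MySQL node campaigns under a shared election key;
    # whichever wins publishes its node address as the proposal value
    etcdctl elect mysql-primary "mysql-node-a.internal:3306"

    # other components observe who the current leader is
    etcdctl elect -l mysql-primary

    # clients can also watch published cluster state and reconfigure on change
    etcdctl watch --prefix /clusters/mysql/

The failure detection itself still comes from that manager process: the election is tied to a lease, so when the campaigning process dies its claim expires and another candidate can win.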

Restarting NiFi Node Joins Cluster as New Node

I am currently running Apache NiFi as a StatefulSet on Kubernetes. I'm testing to see how the cluster recovers if I kill a pod but am experiencing a problem when the pod (NiFi node) rejoins the cluster.
The node will rejoin as an additional node instead of reappearing under its original identity. For example, if I have a 3-node NiFi cluster and kill and restart one pod/NiFi node, I will end up with a 4-node cluster with one node disconnected.
I believe that the NiFi node is identified somehow in a config file which isn't persisting when it is killed. So far I am using persistent volumes to persist the following config files:
state-management.xml
authorizers.xml
I haven't persisted nifi.properties (it is dynamically generated on startup and I can't see anything in there that could uniquely identify the node).
So I guess, the question is how is the node uniquely identified to the server and where is it stored?
EDIT: I'm using an external Zookeeper.
Thank you in advance,
Harry
Each node stores the state of the cluster in the local state manager which by default would be written to a write-ahead-log in nifi-home/state/local. Most likely you are losing the state/local directory on the node being restarted.
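So in addition to the config files listed in the question, that directory needs to live on a persistent volume. A quick way to check, assuming a stock image layout under /opt/nifi/nifi-current (adjust to your NiFi home):

    # the local state provider's write-ahead log lives here by default;
    # if this is empty after a pod restart, the node rejoins with a new identity
    ls -l /opt/nifi/nifi-current/state/local

    # the exact directory is configured by the local-provider entry in state-management.xml
    grep -A 3 "local-provider" /opt/nifi/nifi-current/conf/state-management.xml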

Zookeeper always fails when restarting

I have a 3-node master setup for Marathon and Mesos masters; everything is fine until I reboot one of the nodes or try restarting ZooKeeper.
The issue somehow correlates with the internal DB ZooKeeper is using: when I delete /var/lib/zookeeper/version-2/* then ZooKeeper comes up fine and re-syncs with the other nodes.
I am using the current Marathon and Mesos packages with ZooKeeper from the Mesosphere repo on RHEL 7.
Does anybody know how to fix this? The underlying filesystem does not change anything; I tried both xfs and ext4.
Another question is how to back up the Marathon apps somehow - if I reboot, say, all nodes in the quorum and delete the ZooKeeper DB, everything is lost.
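For reference, the workaround described above amounts to something like this on the affected node (service name and data directory depend on the packaging, so treat it as a sketch):

    # stop ZooKeeper, wipe its snapshot/transaction-log dir, and let it re-sync from the quorum
    systemctl stop zookeeper
    rm -rf /var/lib/zookeeper/version-2/*
    systemctl start zookeeper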

Mongos + AutoScaling

We're currently running a cluster of application servers that are under an autoscaling group in AWS. Each of these application servers has its own instance of mongos running, so the application just connects to localhost to gain access to the MongoDB cluster.
I read in the documentation that the balancer is a process running under mongos. What happens if the server is scaled down and the balancer is running on that server? Would it be possible to say that only this mongos instance at this server IP will run the balancer?
Thanks
Yes, the documentation explicitly states that every mongos has a balancer process associated with it, which is responsible for distributing data (evenly) in a sharded collection across the different shards. By default the 'balancer' process is enabled; optionally it can be disabled.
Hence:
If a server is scaled down, a 'balancer' will still be running on every remaining server with a mongos.
Only servers that run a mongos instance will have the 'balancer' running.
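
If you want to check or control balancing yourself, the balancer can be queried and toggled from a mongo shell connected to any mongos; a small sketch with placeholder hosts:

    # check whether the balancer is currently enabled
    mongo --host localhost:27017 --eval "sh.getBalancerState()"

    # disable or re-enable it cluster-wide
    mongo --host localhost:27017 --eval "sh.stopBalancer()"
    mongo --host localhost:27017 --eval "sh.startBalancer()"

These settings are stored cluster-wide, so they apply regardless of which mongos you connect to.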