Moving a ZooKeeper server to a new physical box - apache-zookeeper

I have a ZooKeeper ensemble of three servers, with the ensemble configured as
server.1=zoo01:2888:3888
server.2=zoo02:2888:3888
server.3=zoo03:2888:3888
Note that zoo01 is an entry in the /etc/hosts file.
I have to move zoo01 to a different physical box, but I want to retain its identity (myid).
Is this possible? If so, how can I achieve it? I can take a very short downtime if required.

Related

What to do after one node in zookeeper cluster fails?

According to https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkMulitServerSetup
Cross Machine Requirements For the ZooKeeper service to be active,
there must be a majority of non-failing machines that can communicate
with each other. To create a deployment that can tolerate the failure
of F machines, you should count on deploying 2xF+1 machines. Thus, a
deployment that consists of three machines can handle one failure, and
a deployment of five machines can handle two failures. Note that a
deployment of six machines can only handle two failures since three
machines is not a majority. For this reason, ZooKeeper deployments are
usually made up of an odd number of machines.
To achieve the highest probability of tolerating a failure you should
try to make machine failures independent. For example, if most of the
machines share the same switch, failure of that switch could cause a
correlated failure and bring down the service. The same holds true of
shared power circuits, cooling systems, etc.
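The 2xF+1 rule quoted above can be sketched as a small helper (a minimal illustration, assuming the standard majority-quorum model):

```python
def tolerated_failures(n: int) -> int:
    """A majority (n // 2 + 1) of an n-server ensemble must be up,
    so the ensemble tolerates (n - 1) // 2 failures."""
    return (n - 1) // 2

assert tolerated_failures(3) == 1  # three machines handle one failure
assert tolerated_failures(5) == 2  # five machines handle two failures
assert tolerated_failures(6) == 2  # six machines still only handle two
```

This is why even-sized ensembles waste a machine: going from five servers to six adds no extra fault tolerance.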
My question is:
What should we do after we have identified a node failure within a ZooKeeper cluster to make the cluster 2F+1 again? Do we need to restart all the ZooKeeper nodes? Also, clients connect to the ZooKeeper cluster; suppose we use DNS names and the recovered node reuses the same DNS name.
For example:
10.51.22.89 zookeeper1
10.51.22.126 zookeeper2
10.51.23.216 zookeeper3
if 10.51.22.89 dies and we bring up 10.51.22.90 as zookeeper1, can all the nodes identify this change?
If you connect 10.51.22.90 as zookeeper1 (with the same myid file and configuration that 10.51.22.89 had before) and the data dir is empty, the process will connect to the current leader (zookeeper2 or zookeeper3) and copy a snapshot of the data. After successful initialization the node will inform the rest of the cluster nodes and you have 2F+1 again.
Try this yourself while running tail -f on the log files. It won't hurt the cluster and you will learn a lot about ZooKeeper internals ;-)
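The replacement procedure described above can be sketched as follows (the directory layout and server id are assumptions for illustration; adapt them to your deployment):

```python
import os
import tempfile

def prepare_replacement_node(data_dir: str, myid: int) -> None:
    """Prepare the data directory for a replacement ZooKeeper node.

    The replacement keeps the old server's myid; with an otherwise
    empty dataDir it will pull a snapshot from the current leader
    when it starts and rejoins the ensemble."""
    os.makedirs(data_dir, exist_ok=True)
    leftover = [f for f in os.listdir(data_dir) if f != "myid"]
    if leftover:
        raise RuntimeError(f"dataDir not empty: {leftover}")
    with open(os.path.join(data_dir, "myid"), "w") as f:
        f.write(f"{myid}\n")

# Example: prepare the new box to take over as server.1
demo_dir = os.path.join(tempfile.mkdtemp(), "zookeeper-data")
prepare_replacement_node(demo_dir, 1)
```

After this, start the ZooKeeper process on the new box with the same zoo.cfg as the rest of the ensemble and watch the logs for the snapshot transfer from the leader.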

specification of list of servers in zookeeper cluster

Is the second component of the server list just a sequence number, or does it have to correspond to the myid of the instance? For instance, if I set up a new node with myid=4 and deprovision the existing instance with myid=3, would my config have to look like the following:
tickTime=2000
dataDir=/usr/src/zookeeper
clientPort=2181
initLimit=10
syncLimit=5
server.1=192.168.1.2:2888:3888
server.2=192.168.1.3:2888:3888
server.4=192.168.1.5:2888:3888
It corresponds to the myid of the instance.
Every machine that is part of the ZooKeeper ensemble should know about
every other machine in the ensemble. You accomplish this with the
series of lines of the form server.id=host:port:port. The parameters
host and port are straightforward. You attribute the server id to each
machine by creating a file named myid, one for each server, which
resides in that server's data directory, specified by the
configuration file parameter dataDir.
Source: http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup
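As a sketch, the correspondence between the server.&lt;id&gt; lines and each node's myid file can be checked mechanically (a minimal parser, assuming the standard zoo.cfg format):

```python
def parse_server_ids(cfg_text: str) -> dict[int, str]:
    """Map each server id in a zoo.cfg to its host:port:port address."""
    servers = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        if line.startswith("server."):
            key, _, addr = line.partition("=")
            servers[int(key.split(".", 1)[1])] = addr
    return servers

cfg = """\
tickTime=2000
server.1=192.168.1.2:2888:3888
server.2=192.168.1.3:2888:3888
server.4=192.168.1.5:2888:3888
"""
# Ids need not be consecutive, but each must match that node's myid file:
assert sorted(parse_server_ids(cfg)) == [1, 2, 4]
```

So the config in the question is valid: ids 1, 2, 4 with no id 3, as long as the node at 192.168.1.5 has 4 in its myid file.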

ZooKeeper Failover Strategies

We are a young team building an application using Storm and Kafka.
We have a common ZooKeeper ensemble of 3 nodes which is used by both Storm and Kafka.
I wrote test cases to test ZooKeeper failover:
1) Check that all three nodes are running and confirm that one is elected as leader.
2) Using the ZooKeeper command-line client, create a znode and set a value. Verify that the value is reflected on the other nodes.
3) Modify the znode: set a value on one node and verify that the other nodes have the change reflected.
4) Kill one of the follower nodes and make sure the leader is notified about the crash.
5) Kill the leader node. Verify that one of the other two nodes is elected as the new leader.
Do I need to add any more test cases? Any additional ideas/suggestions/pointers?
From the Hadoop HDFS HA documentation:
Verifying automatic failover
Once automatic failover has been set up, you should test its operation. To do so, first locate the active NameNode. You can tell which node is active by visiting the NameNode web interfaces -- each node reports its HA state at the top of the page.
Once you have located your active NameNode, you may cause a failure on that node. For example, you can use kill -9 to simulate a JVM crash. Or, you could power cycle the machine or unplug its network interface to simulate a different kind of outage. After triggering the outage you wish to test, the other NameNode should automatically become active within several seconds. The amount of time required to detect a failure and trigger a fail-over depends on the configuration of ha.zookeeper.session-timeout.ms, but defaults to 5 seconds.
If the test does not succeed, you may have a misconfiguration. Check the logs for the zkfc daemons as well as the NameNode daemons in order to further diagnose the issue.
more on setting up automatic failover
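For the ZooKeeper side of such a test, the leader can be located with the stat/srvr four-letter admin commands on the client port. A minimal sketch (the captured sample output below is illustrative; four-letter words must be whitelisted on recent ZooKeeper versions):

```python
import socket

def four_letter_word(host: str, port: int, cmd: str = "srvr") -> str:
    """Send a four-letter admin command to a ZooKeeper server and return the reply."""
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(cmd.encode())
        chunks = []
        while data := s.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode()

def server_mode(srvr_output: str) -> str:
    """Extract 'leader', 'follower', or 'standalone' from srvr output."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no Mode line found")

# Example against captured output:
sample = "Zookeeper version: 3.4.6\nLatency min/avg/max: 0/0/0\nMode: leader\n"
assert server_mode(sample) == "leader"
```

In a failover test, poll each node with this before and after killing the leader and assert that exactly one surviving node reports Mode: leader within the election timeout.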

Should Zookeeper cluster be assigned to only one SolrCloud cluster

I wonder about the best strategy with regard to ZooKeeper and SolrCloud clusters. Should one ZooKeeper cluster be dedicated to each SolrCloud cluster, or can multiple SolrCloud clusters share one ZooKeeper cluster? I guess the former must be a very safe approach, but I am wondering if the second option is fine as well.
As far as I know, SolrCloud uses ZooKeeper to share cluster state (up/down nodes) and to load shared core configurations (solrconfig.xml, schema.xml, etc.) on boot. If you have clients based on SolrJ's CloudSolrServer implementation, then they will mostly perform reads of the cluster state.
In this respect, I think it should be fine to share the same ZK ensemble. Many reads and few writes, this is exactly what ZK is designed for.
SolrCloud puts very little load on a ZooKeeper cluster, so if it's purely a performance consideration then there's no problem. It would probably be a waste of resources to have one ZK cluster per SolrCloud if they're all on a local network. Just make sure each SolrCloud's configuration lives under a separate ZooKeeper path (chroot). For example, using -zkHost host:port/path1 for one SolrCloud, and replacing "path1" with "path2" for the second one, will put the Solr files in separate paths within ZooKeeper to ensure they don't conflict.
Note that the ZK cluster should be well-configured and robust, because if it goes down then none of the SolrClouds will be able to respond to changes in node availability or state (e.g. if a SolrCloud leader is lost or unreachable, or if a node enters the recovering state).
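A sketch of how the two zkHost connect strings would differ only in their chroot path (the host names and path names here are assumptions for illustration):

```python
def zk_host_string(hosts: list[str], chroot: str) -> str:
    """Build a SolrCloud zkHost string: the comma-separated ensemble plus a
    chroot path that keeps each SolrCloud's files isolated within ZooKeeper."""
    return ",".join(hosts) + chroot

ensemble = ["zk1:2181", "zk2:2181", "zk3:2181"]
# Two SolrCloud clusters sharing one ensemble, isolated by chroot:
assert zk_host_string(ensemble, "/path1") == "zk1:2181,zk2:2181,zk3:2181/path1"
assert zk_host_string(ensemble, "/path2") == "zk1:2181,zk2:2181,zk3:2181/path2"
```

The chroot is appended once after the last host, not per host; both SolrClouds then see only their own subtree of the shared ensemble.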

Going from a single zookeeper server to a clustered configuration

I have a single server that I now want to replicate to achieve higher availability. One of the elements in my software stack is ZooKeeper, so it seems natural to move it to a clustered configuration.
However, I have data on my single server, and I couldn't find any guide on going to a clustered setup. I tried setting up two independent instances and then going to a clustered configuration, but only data present on the elected master was preserved.
So, how can I safely go from a single server setup to a clustered setup without losing data?
If you go from 1 server straight to 3 servers, you may lose data, as the 2 new servers are sufficient to form a quorum, and elect one of themselves as a leader, ignoring the old server, and losing all data on that machine.
If you instead grow your cluster from 1 to 2, then when the two servers start up, a quorum can't form without the old server being involved, and data will not be lost. When the cluster finishes starting, all data will be synced to both servers.
Then you can grow your cluster from 2 to 3, and again a quorum can't form without at least 1 server that has a copy of the database, and again when the cluster finishes starting all data will be synced to all three servers.
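The safety argument for growing 1 → 2 → 3 instead of jumping straight to 3 can be checked step by step (a sketch, assuming simple majority quorum):

```python
def quorum(n: int) -> int:
    """Minimum number of servers that must agree in an n-server ensemble."""
    return n // 2 + 1

# 1 -> 3 directly: the 2 new (empty) servers alone reach quorum(3) == 2,
# so they can elect a leader without the old data-bearing server.
assert quorum(3) == 2

# 1 -> 2: quorum(2) == 2, so the old server must participate
# in any quorum and its data survives the transition.
assert quorum(2) == 2

# 2 -> 3: quorum(3) == 2, and any 2 of the 3 servers now
# include at least one that holds a synced copy of the data.
assert quorum(3) == 2
```

The key invariant at each step is that no quorum can form consisting only of servers that have never held the data.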