Hierarchical quorums in Zookeeper - apache-zookeeper

I am trying to understand hierarchical quorums in Zookeeper. The documentation here gives an example, but I am still not quite sure I understand it. My question is: if I have a two-node Zookeeper cluster (I know it is not recommended, but let's consider it for the sake of this example),
server.1 and
server.2,
can I have hierarchical quorums as follows:
group.1=1:2
weight.1=2
weight.2=2
With the above configuration:
Even if one node goes down, do I still have enough votes to maintain a quorum? Is this a correct statement?
What is the Zookeeper quorum value here (2, for two nodes, or 3, for 4 votes)?
In a second example, say I have:
group.1=1:2
weight.1=2
weight.2=1
In this case, if server.2 goes down, should I still have sufficient votes (2) to maintain a quorum?

As far as I understand from the documentation, when we assign weights to the nodes, the majority is no longer counted in number of nodes but in weight: the servers that are up must hold strictly more than half of the total weight. For example, if there are 10 nodes and 3 of them together hold 70 percent of the weight, then it is enough to have just those three nodes active. Hence,
In the first example you don't have a majority once one node goes down, since both nodes have an equal weight of 2: the surviving node holds only 50 percent of the total weight, which is not strictly more than half, so quorum is not achieved.
The total weight is 4, so a quorum requires more than 2 of the 4 votes, i.e. at least 3. Since each node carries only 2 votes, both nodes need to be active to meet the quorum (so the quorum value here is expressed in votes, not nodes).
In the second example the total weight is 3, so a quorum requires more than 1.5 votes. If server.2 (weight 1) goes down, server.1 alone still holds 2 of the 3 votes, which is more than half, so the quorum is still reached with the single node that has weight 2.
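To make the arithmetic concrete, here is a minimal Python sketch of the weighted-majority check described above, applied to both configurations from the question (the function is illustrative, not ZooKeeper's actual code):

# A group reaches quorum when the live servers hold strictly more than
# half of the group's total weight.
def has_weighted_quorum(weights, alive):
    """weights: {server_id: weight}; alive: set of server ids that are up."""
    total = sum(weights.values())
    alive_weight = sum(w for sid, w in weights.items() if sid in alive)
    return alive_weight * 2 > total  # strictly more than half

# First example: weight.1=2, weight.2=2
print(has_weighted_quorum({1: 2, 2: 2}, {1}))     # False: one node down -> no quorum
print(has_weighted_quorum({1: 2, 2: 2}, {1, 2}))  # True: both nodes needed

# Second example: weight.1=2, weight.2=1
print(has_weighted_quorum({1: 2, 2: 1}, {1}))     # True: server.1 alone is enough
print(has_weighted_quorum({1: 2, 2: 1}, {2}))     # False: server.2 alone is not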

Related

Legal Hierarchical Quorums in Zookeeper

I am trying to understand hierarchical quorums in Zookeeper. I may not understand the example shown in the documentation (here). Are votes [from at least two servers from each of two different groups] enough to form a legal quorum?
In my opinion, the example here does not gain the majority of all the weight; it only gains 4 ballots. A legal quorum should earn at least 5 ballots (9/2 + 1).
I also read the source code. The algorithm implementation is shown from line 352 to line 371. Zookeeper only checks whether each group has a weighted majority of its own servers, and whether the number of groups that do is larger than half of the total number of groups.
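That two-level rule (a weighted majority inside a group, and a majority of groups reaching it) could be sketched in Python roughly as follows; this is a paraphrase of the logic described above, not ZooKeeper's actual implementation:

def is_hierarchical_quorum(groups, weights, alive):
    """groups: {group_id: [server ids]}; weights: {server_id: weight};
    alive: set of server ids whose votes were received."""
    groups_with_majority = 0
    for members in groups.values():
        total = sum(weights[s] for s in members)
        got = sum(weights[s] for s in members if s in alive)
        if got * 2 > total:  # weighted majority inside this group
            groups_with_majority += 1
    return groups_with_majority * 2 > len(groups)  # majority of groups

# 9 servers in 3 groups, weight 1 each: 2 servers from each of 2 groups suffice.
groups = {1: [1, 2, 3], 2: [4, 5, 6], 3: [7, 8, 9]}
weights = {s: 1 for s in range(1, 10)}
print(is_hierarchical_quorum(groups, weights, {1, 2, 4, 5}))  # True: 4 of 9 servers
print(is_hierarchical_quorum(groups, weights, {1, 2, 3, 4}))  # False: only one group has a majority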
Maybe I have found the answer.
A different construction that uses weights and is useful in wide-area deployments (co-locations) is a hierarchical one. With this construction, we split the servers into disjoint groups and assign weights to processes. To form a quorum, we have to get a hold of enough servers from a majority of groups G, such that for each group g in G, the sum of votes from g is larger than half of the sum of weights in g. Interestingly, this construction enables smaller quorums. If we have, for example, 9 servers, we split them into 3 groups, and assign a weight of 1 to each server, then we are able to form quorums of size 4.
Note that two subsets of processes composed each of a majority of servers from each of a majority of groups necessarily have a non-empty intersection. It is reasonable to expect that a majority of co-locations will have a majority of servers available with high probability.
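Expressed in the group/weight notation used in the first question above, that 9-server example would look roughly like this (server ids are illustrative):

group.1=1:2:3
group.2=4:5:6
group.3=7:8:9
weight.1=1
weight.2=1
weight.3=1
weight.4=1
weight.5=1
weight.6=1
weight.7=1
weight.8=1
weight.9=1

With this configuration, any two servers from each of any two groups (4 servers in total) hold a weighted majority in a majority of groups, so they form a legal quorum even though 4 is less than half of 9: what matters is the per-group weighted majority plus a majority of groups, not a simple majority of all ballots.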

Opensearch: Data node costs

I don't understand the costs of having 1 data node vs having 2 or more data nodes.
Will I have the same cost regardless of the number of nodes?
If I have 2 data nodes, does that mean I will have double the cost of the instances?
Thanks
Depends on the instance size: i3.2xlarge would be ~2x more expensive than i3.xlarge.
If you use one instance size then yes, 2 nodes would be 2x more expensive than 1 node, but you'll get more resilience (if one node goes down, your cluster can still accept updates and serve data) and the ability to do rolling restarts.
Note, though, that OpenSearch needs a quorum of master-eligible nodes for master election to work reliably, so 3 smaller nodes might be better than 2 larger ones.

Is it advisable to add more brokers to a Kafka cluster although the load is still low

Although we do not have any performance issues yet and the nodes are pretty much idle, is it advisable to increase the number of Kafka brokers (and Zookeepers) from 3 to 5 immediately, to improve cluster high availability? The intention is then, of course, to increase the replication factor from 3 to 5 as a default config for critical topics.
If a high level of data replication is essential for your business, it is advisable to increase the number of brokers. Keep in mind that, on top of the extra nodes, you are also taking on extra network load. Obviously, if you increase the number of brokers in the cluster, you are decreasing the risk of losing high availability.
It depends on your needs. If you do not have to ensure very high availability (for example, a bank), increasing the replication factor in your cluster will reduce overall performance, because when you write a message to a topic/partition, that message will be replicated to 5 nodes instead of 3. You can increase the number of nodes for high availability and distribute fewer partitions to every node, but without increasing the replication factor.
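As a sketch of the per-topic approach (keeping the cluster-wide default at 3 and raising the replication factor only for critical topics), here is an example using the confluent-kafka Python client; the topic name, partition count, and bootstrap address are placeholders:

from confluent_kafka.admin import AdminClient, NewTopic

# Connect to the cluster (placeholder bootstrap address).
admin = AdminClient({"bootstrap.servers": "broker1:9092"})

# A critical topic gets replication factor 5; other topics can keep the
# cluster default (e.g. default.replication.factor=3 in server.properties).
critical = NewTopic("payments-events", num_partitions=6, replication_factor=5)

# create_topics() returns a dict of topic name -> future.
for topic, future in admin.create_topics([critical]).items():
    try:
        future.result()  # raises if creation failed, e.g. fewer than 5 brokers
        print(f"Created topic {topic}")
    except Exception as err:
        print(f"Failed to create topic {topic}: {err}")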

Consensus algorithm: what will happen if an odd cluster becomes even because of a node failure?

Consensus algorithms (e.g. Raft) require the cluster to contain an odd number of nodes to avoid the split-brain problem.
Say I have a cluster of 5 nodes: what will happen if only one node fails? The cluster has 4 nodes now, which breaks the odd-number rule; will the cluster continue to behave correctly?
One solution is to drop one more node so the cluster contains only 3 nodes, but what if the previously failed node comes back? Then the cluster has 4 nodes again, and we have to bring the dropped node back in order to keep the cluster odd.
Do implementations of the consensus algorithm handle this problem automatically, or do I have to do it in my application code (for example, drop a node)?
Yes, the cluster will continue to work normally. A cluster of N nodes, where N is odd (N = 2k + 1), can handle k node failures. As long as a majority of nodes is alive, it can work normally. If one node fails and we still have the majority, everything is fine. Only when you lose the majority of nodes do you have a problem.
There is no reason to force the cluster to have an odd number of nodes, and implementations don't consider this a problem and thus don't handle it (by dropping nodes).
You can run a consensus algorithm on an even number of nodes, but it usually makes more sense to have it odd.
A 3-node cluster can handle 1 node failure (the majority is 2 nodes).
A 4-node cluster can handle 1 node failure (the majority is 3 nodes).
A 5-node cluster can handle 2 node failures (the majority is 3 nodes).
A 6-node cluster can handle 2 node failures (the majority is 4 nodes).
I hope this makes it clearer why it makes more sense for the cluster size to be an odd number: it can handle the same number of node failures with fewer nodes in the cluster.
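The pattern in the list above is just floor(N/2) + 1; a trivial Python check (not part of any consensus library) reproduces it:

# Majority quorum is floor(N/2) + 1; tolerated failures are N - majority.
for n in range(3, 8):
    majority = n // 2 + 1
    tolerated = n - majority
    print(f"{n}-node cluster: majority = {majority}, tolerates {tolerated} failure(s)")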

What's the maximum size of the Zookeeper ensemble

How many nodes at most can be part of a Zookeeper ensemble? Is it 255? If you want to go beyond that, should there be multiple ensembles?
Here is a similar question: Maximum servers in a ZooKeeper ensemble cluster?
I'm not sure about the actual limits in the ZK code, but any cluster larger than, say, 13 nodes would be really strange. At some point write performance would start to suffer significantly.
Proper scaling would be to have multiple clusters for different use cases, or alternatively to use Observers, which don't vote and therefore don't affect write speed.
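For reference, an Observer is declared in zoo.cfg roughly like this (hostnames are placeholders); Observers receive updates but do not vote, so they add read capacity without slowing down the write quorum:

# In every server's zoo.cfg, mark server 4 as an observer:
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
server.4=zk4.example.com:2888:3888:observer

# Additionally, in server 4's own zoo.cfg:
peerType=observer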