I understand that Kafka mirroring replicates data across different data centers. I have some questions, as below:
1) Do these different data centers need separate ZooKeeper ensembles?
2) Is the Kafka mirroring concept a good option if we want to replicate data across 2 racks that are part of the same data center?
Thanks
1) If you have a separate Kafka cluster in each data center, then yes, you should also have a separate ZooKeeper ensemble in each data center.
2) Replication between racks generally involves lower latency than replication between data centers. Depending on the consistency, availability and network-resiliency trade-offs for your use case, you can either mirror between racks or run a single Kafka cluster / ZooKeeper ensemble that stretches across the racks and let Kafka replication do the job (possibly with rack-aware replica assignment [1]). With only 2 racks, however, you would not be able to retain a quorum if either rack goes down, so unless you can go to 3 racks, mirroring feels like the safer first choice.
[1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment
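If you do go with a single stretched cluster, the rack-aware assignment from [1] only requires setting broker.rack on each broker. A minimal sketch (broker IDs and rack names below are just placeholders):

# server.properties on a broker in the first rack
broker.id=1
broker.rack=rack-1

# server.properties on a broker in the second rack
broker.id=2
broker.rack=rack-2

With broker.rack set, replicas of newly created topics are spread across racks where possible.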
I don't understand the costs of having 1 data node vs having 2 or more data nodes.
Will I have the same cost regardless of the number of nodes?
If I have 2 data nodes, does that mean I will have double the cost of the instances?
Thanks
It depends on the instance size: an i3.2xlarge would be roughly 2x more expensive than an i3.xlarge.
If you use a single instance size then yes, 2 nodes would be twice as expensive as 1 node, but you gain resilience (if one node goes down your cluster can still accept updates and serve data) and the ability to do rolling restarts.
That said, OpenSearch needs an odd number of master-eligible nodes for master election to work reliably, so 3 smaller nodes might be better than 2 larger ones.
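If you run OpenSearch yourself on those instances (rather than a managed service) and go with 3 smaller nodes, the cluster-formation settings are the main thing to get right. A minimal opensearch.yml sketch, assuming hostnames node-1/node-2/node-3 (exact setting names vary slightly between versions, so check the docs for yours):

cluster.name: my-cluster
node.name: node-1
discovery.seed_hosts: ["node-1", "node-2", "node-3"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

The initial master list only matters when the cluster is bootstrapped for the first time.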
Although we do not have any performance issues yet and the nodes are pretty much idle, is it advisable to increase the number of Kafka brokers (and ZooKeeper nodes) from 3 to 5 immediately to improve cluster high availability? The intention is then, of course, to increase the replication factor from 3 to 5 as the default config for critical topics.
If a high level of data replication is essential for your business, increasing the broker count is advisable. Be aware that, on top of the extra nodes, the higher replication factor also adds network load. That said, increasing the number of brokers in the cluster does reduce the risk of losing high availability.
It depends on your needs. If you do not have to guarantee very high availability (e.g. a bank), increasing the replication factor will reduce overall performance, because every message written to a topic/partition will be replicated to 5 nodes instead of 3. You can still add nodes for availability and spread fewer partitions per node, without increasing the replication factor.
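To make that concrete, the replication factor is a per-topic setting, so you can keep 3 as the cluster default and raise it only for critical topics. A sketch with a placeholder broker address and topic name:

# broker-side default for auto-created topics (server.properties)
default.replication.factor=3

# create a critical topic with 5 replicas explicitly
kafka-topics.sh --create \
  --bootstrap-server broker-1:9092 \
  --topic critical-events \
  --partitions 12 \
  --replication-factor 5

If durability is the goal, raising min.insync.replicas on those topics (together with acks=all on the producers) matters as much as the replica count.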
I have an IoT application which is architected in the following way:
There are plants, each with its own set of devices.
The entire pipeline is deployed in Kubernetes and consists of the following units:
A job which wakes up every x seconds, reads data from all plants and pushes the data to an MQTT broker.
An MQTT broker.
A subscriber service which receives data from all plants and pushes it to a timeseries database.
Jobs running at intervals of 5 min, 15 min, 1 hr, 4 hr and 1 day which downsample the data of all the plants and push it to separate downsampled tables.
Jobs running every day which check whether there are holes/gaps in the data and try to fill them where possible.
This works fine for a few plants, but when the number of plants increases it becomes difficult to perform the data retrieval, push and downsampling with a single service/job: it becomes too memory intensive and chokes in multiple places. As a temporary fix, scaling vertically helps to some extent, but then I need to put all the pods on the single machine that I am scaling up, and scaling multiple nodes vertically is quite expensive.
Hence I am planning to break the system down so that I can scale horizontally, and I am looking for suggestions on possible ways to achieve this.
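For concreteness, the interval jobs are essentially Kubernetes CronJobs along these lines (the image name, schedule and arguments below are illustrative placeholders, not my real manifests):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: downsample-15min
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: downsampler
            # one container currently downsamples data for all plants in a single run
            image: registry.example.com/downsampler:latest
            args: ["--interval", "15m", "--all-plants"]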
How many nodes, at most, can be part of a ZooKeeper ensemble? Is it 255? If you want to go beyond that, should there be multiple ensembles?
Here is a similar question: Maximum servers in a ZooKeeper ensemble cluster?
I'm not sure about the actual limit in the ZK code, but any ensemble larger than, say, 13 nodes would be really unusual; at some point write performance starts to suffer significantly.
The proper way to scale is to run multiple clusters for different use cases, or alternatively to use Observers, which don't affect write speed.
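For reference, an Observer is configured in zoo.cfg roughly like this (hostnames are assumptions): the :observer suffix goes on the server line in every node's config, and the observer itself additionally sets peerType.

# on every server in the ensemble
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
server.4=zk4:2888:3888:observer

# additionally, on zk4 only
peerType=observer

Observers receive committed updates but do not vote, so they add read capacity without enlarging the write quorum.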
I am trying to understand hierarchical quorums in ZooKeeper. The documentation here gives an example, but I am still not quite sure I understand it. My question is: if I have a two-node ZooKeeper cluster (I know it is not recommended, but let's consider it for the sake of this example) with server.1 and server.2, can I have hierarchical quorums as follows:
group.1=1:2
weight.1=2
weight.2=2
With the above configuration:
Even if one node goes down, do I still have enough votes to maintain a quorum? Is this a correct statement?
What is the ZooKeeper quorum value here (2, for two nodes, or 3, for 4 total votes)?
In a second example, say I have:
group.1=1:2
weight.1=2
weight.2=1
In this case, if server.2 goes down, should I still have sufficient votes (2) to maintain a quorum?
As far as I understand from the documentation, when weights are assigned, the majority is computed over the total weight rather than over the number of nodes: within a group, a quorum needs strictly more than half of that group's total weight. For example, if there are 10 nodes and 3 of them hold 70 percent of the total weight, those three nodes alone are enough to keep the ensemble alive. Hence:
In your first example, the total weight is 4, so a quorum needs more than 2. Both nodes have weight 2, so if one node goes down only 50 percent of the weight remains, which is not a strict majority, and the quorum is lost. In other words, both nodes need to be active to meet the quorum.
In the second example, the total weight is 3, so a quorum needs more than 1.5. server.1 alone has weight 2, which is enough, so the quorum survives server.2 going down; server.2 on its own (weight 1) would not be sufficient.
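A complete zoo.cfg sketch for the second example, with assumed hostnames, and the strict-majority-of-weight arithmetic spelled out in comments:

# zoo.cfg (same on both servers; hostnames are placeholders)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
group.1=1:2
weight.1=2
weight.2=1
# total group weight = 2 + 1 = 3; a quorum needs weight > 1.5
# server.1 alone (weight 2) forms a quorum; server.2 alone (weight 1) does not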