Zookeeper instances in Kafka - apache-kafka

I have 3 nodes where Kafka is installed. All 3 nodes have their own Zookeeper instances. Are 3 Zookeeper instances required, or does 1 Zookeeper instance suffice? Should we have multiple Zookeeper instances for fault tolerance, and in such a scenario would one of the instances act as primary and the others as replicas?

I'm not sure what you mean by "All these 3 nodes have their own zookeeper instances". Basically, you should have a single ensemble of one, three, or five Zookeeper instances, and all Kafka brokers should use that same ensemble. You don't need more than one Zookeeper instance, but I'd highly recommend using three or five instances for availability. We use three Zookeeper instances to run our Kafka cluster.
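For illustration, a minimal sketch of the relevant part of each broker's server.properties, assuming the three Zookeeper hosts are named zk1, zk2 and zk3 (hypothetical hostnames):

# identical on every broker, so all brokers join the same Zookeeper ensemble
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
# only broker.id differs from node to node
broker.id=0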

Related

multiple kafka clusters on single zookeeper ensemble

I currently have a 3 node Kafka cluster which connects to the base chroot path in my Zookeeper ensemble.
zookeeper.connect=172.12.32.123:2181,172.11.43.211:2181,172.18.32.131:2181
Now, I want to add a new 5 node Kafka cluster which will connect to some other chroot path in the same zookeeper ensemble.
zookeeper.connect=172.12.32.123:2181,172.11.43.211:2181,172.18.32.131:2181/cluster/2
Will these configurations work, i.e. will the two chroot paths keep the clusters' znodes separate? I understand that the original Kafka cluster should have been connected to some path other than the base chroot path for better isolation.
Also, is it good to have the same Zookeeper ensemble across Kafka clusters? The documentation says that it is generally better to have isolated Zookeeper ensembles for different clusters.
If you're limited to a single Zookeeper cluster, then it should work out fine as long as each Kafka cluster uses a unique chroot that doesn't collide with the other cluster's znodes.
It is not "good" to share, no, because Zookeeper losing quorum would take both Kafka clusters down at once; but again, if you're limited on hardware, it'll still work.
Note: You can only afford to lose one ZK server with 3 nodes in the cluster, which is why a cluster of 5 is recommended.
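One practical point worth sketching, assuming the /cluster/2 chroot from the question: any tool that takes a Zookeeper connection string for the new cluster needs the chroot appended as well, for example:

# listing topics of the new cluster; without /cluster/2 you'd be looking at the old cluster's znodes
bin/kafka-topics.sh --list --zookeeper 172.12.32.123:2181/cluster/2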

How to scale Zookeeper with kafka

I am working on scaling the Kafka cluster in prod. Confluent provides an easy way to add Kafka brokers. However, how do I know how to scale Zookeeper along with Kafka? What should the ratio be? Right now we have 5 Zookeeper nodes for 5 Kafka brokers. If I have 10 Kafka brokers, how many Zookeeper nodes should there be?
Zookeeper works as a coordination service for Apache Kafka and stores the metadata of the Kafka cluster. A Zookeeper cluster is called an ensemble.
The number of servers in a Zookeeper ensemble should be an odd number (3, 5, etc.). That number determines how fault tolerant your cluster is: with a three node ensemble, you can keep running with one node missing.
With a five node ensemble, you can run with two nodes missing and your cluster will still be available.
You can add as many Zookeeper servers as you need for the level of fault tolerance you want; however, an ensemble of more than 7 nodes is not recommended because of the latency and communication overhead between those nodes.
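As a quick reference for the fault-tolerance rule above: an ensemble of N servers needs a majority of floor(N/2)+1 servers up to keep quorum, so it tolerates floor((N-1)/2) failures.

3 servers -> quorum of 2 -> tolerates 1 failure
5 servers -> quorum of 3 -> tolerates 2 failures
7 servers -> quorum of 4 -> tolerates 3 failures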

How to span the kafka partitions across multiple VM's?

I am familiar with a basic Kafka system. I want to span a single Kafka instance across 2 VMs such that some partitions are on one VM and some on the other. Please tell me how to configure this kind of system.
What do you mean by "to span a Kafka instance across 2 VMs"? What you can do is have two different Kafka brokers running on the 2 VMs. They should be configured to connect to the same Zookeeper cluster. When you create a new topic with a specific number of partitions, Kafka will spread those partitions over the 2 VMs, as the example below shows.
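A minimal sketch, assuming the brokers on both VMs point at the same Zookeeper reachable at zk1:2181 (hypothetical host) and that my-topic is a placeholder name; with two partitions and replication factor 2, Kafka places partition replicas on both brokers:

# run from either VM with an older, Zookeeper-based kafka-topics.sh
bin/kafka-topics.sh --create --zookeeper zk1:2181 --topic my-topic --partitions 2 --replication-factor 2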

zookeeper failover for kafka cluster

I am wondering whether there is any way to set up Zookeeper failover for a Kafka cluster.
For example: I want to set up 2 Zookeeper instances for my Kafka cluster. In case one Zookeeper fails, the Kafka servers should still be able to read topic metadata from the second Zookeeper.
Any advice is highly appreciated.
Zookeeper works as a so-called quorum – a cluster of nodes that forms a consensus based on simple majority votes.
For production, you should use 3 or 5 Zookeeper instances in a quorum.
If you're using 3, your cluster can survive losing one server (because the remaining two form a simple majority). With 5, you can lose two servers because 3 is a majority of 5.
2 is a bad idea: if either node goes down, the single remaining node is not a majority of two, so the ensemble stops working, making two nodes no more fault tolerant than one.
Please check this question
$KAFKA_HOME/config/server.properties
Here you can set multiple Zookeeper servers:
zookeeper.connect=<server1>:2181,<server2>:2181,<server3>:2181
Maintain the 2n+1 (quorum) rule for Zookeeper.
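To actually form that 2n+1 ensemble, every Zookeeper server also needs the full member list in its own configuration. A minimal sketch of zoo.cfg, assuming three hosts named zk1, zk2 and zk3 (hypothetical names) and default ports:

# conf/zoo.cfg, identical on every Zookeeper node
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888

Each node additionally needs a myid file in dataDir containing its own id (1, 2 or 3).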

Installing kafka cluster

I want to install a 2-node Kafka cluster on Amazon EC2.
I follow the steps from this link: https://www.digitalocean.com/community/tutorials/how-to-install-apache-kafka-on-ubuntu-14-04
Also, I want to have Zookeeper on both nodes, because if I have it only on one node and that node dies, my Kafka cluster dies.
In step 9 (Installing multi-node cluster), they say that I need to modify zookeeper.connect in the Kafka server properties so that it has a comma-separated list of ip:port pairs for each node where Zookeeper is installed.
On the other hand, when I want to create a topic, I only specify 1 Zookeeper in the script!
1) Will the other zookeeper node know that the topic has been created?
2) In case that 1 zookeeper node fails, will the other one takeover?
3) When the failed node comes up again, will it get the topic information again from the node that stayed alive?
Regards,
Srdjan
You should create a cluster with no fewer than three nodes. Like Serejja mentioned, it should be odd-numbered for fault tolerance:
3, 5, 7, 9, etc.
For Kafka, you should specify a --replication-factor when creating the topic. In a three node cluster, it's recommended to set it to two or three.
In this scenario, if one of the brokers goes down, the data is still available on the replicas hosted by the remaining brokers, and once the unavailable node comes back online, it will catch up on the data it missed.
The Kafka Documentation is fantastic, and I recommend further reading of the Replication topic.
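For example, a hedged sketch of creating such a topic on a three-broker cluster, assuming Zookeeper is reachable at zk1:2181 (hypothetical host) and my-topic is a placeholder name:

# a single Zookeeper address is enough here; the ensemble replicates the metadata to the other Zookeeper nodes
bin/kafka-topics.sh --create --zookeeper zk1:2181 --topic my-topic --partitions 3 --replication-factor 3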