Zookeeper on multiple networks - apache-zookeeper

I am setting up a Zookeeper ensemble. However, I would like to use multiple networks (each machine would have 2 IP addresses) on each node for redundancy purposes.
Can I achieve this through Zookeeper configuration?
tickTime=2000
dataDir=/var/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=ip1-A:2888:3888;ip2-A:2888:3888   <= is something like this possible?
server.2=ip1-B:2888:3888;ip2-B:2888:3888
server.3=ip1-C:2888:3888;ip2-C:2888:3888
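For reference, ZooKeeper 3.6+ added multi-address support (ZOOKEEPER-3188), which targets exactly this kind of redundancy. A minimal sketch, assuming that feature, its | separator, and a multiAddress.enabled switch (double-check the exact property name against the admin guide for your version):
tickTime=2000
dataDir=/var/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
# assumed switch that enables the multi-address feature (ZooKeeper 3.6+)
multiAddress.enabled=true
# each server lists both of its addresses, separated by |
server.1=ip1-A:2888:3888|ip2-A:2888:3888
server.2=ip1-B:2888:3888|ip2-B:2888:3888
server.3=ip1-C:2888:3888|ip2-C:2888:3888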

Related

Kafka Static IP and Service Discovery

I have a three node Kafka cluster that also has a three node Zookeeper ensemble managing it. My configuration for this cluster looks like this:
Node 1
IP - 192.168.1.11
Kafka Port - 9092
Zookeeper Port - 2181
Node 2
IP - 192.168.1.12
Kafka Port - 9092
Zookeeper Port - 2181
Node 3
IP - 192.168.1.13
Kafka Port - 9092
Zookeeper Port - 2181
For each of these nodes I have both the Zookeeper and Kafka configuration files. My sample Zookeeper config file looks like this:
# Zookeeper server config
dataDir=/tmp/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.1.11:2889:3889
server.2=192.168.1.12:2889:3889
server.3=192.168.1.13:2889:3889
since each Zookeeper instance needs to know about every other Zookeeper instance, and generally, from what I have seen, even when managing massive Kafka clusters, there are usually fewer than 10 Zookeeper nodes. So here we would only need to keep track of at most 10 IPs. Also, from my understanding, these IPs are not very volatile and rarely, if ever, change.
For my Kafka configuration file I have the following on each node
# Kafka server properties file
broker.id=<ID for this node>
log.dirs=/tmp/kafka-logs
zookeeper.connect=192.168.1.11:2181,192.168.1.12:2181,192.168.1.13:2181
zookeeper.connection.timeout.ms=36000
listeners=PLAINTEXT://<IP of this node>:9092
Now it makes sense to me that each Kafka node we introduce into our cluster has to be aware of all the Zookeeper nodes so it can be managed. But the issue for me is that as we scale the Kafka nodes up or down, we are less certain about their IPs. For example, if I wanted to create a new Kafka topic, I would use the kafka-topics.sh shell script that they provide and type something like:
kafka-topics.sh --create --topic MyTopic --bootstrap-server <IP of one of the Kafka nodes>
# Could also use the broker-list option instead of bootstrap-server to allow multiple IPs
The problem for me is that we never know which Kafka IPs are up and running, so passing IPs to --bootstrap-server feels like a guessing game, or I have to manually check a working node for its IP.
So for Kafka, how do I configure a static IP (maybe virtual IP?) so that other services that use my Kafka cluster can always connect to it? How do I perform service discovery for a cluster with changing IPs?
there are usually fewer than 10 Zookeeper nodes
According to Kafka: The Definitive Guide, 7 is generally the maximum size of a Zookeeper cluster, even for large Kafka clusters. Personally, I've not seen more than 5 on a Kafka cluster serving millions of events a day...
You could make a DNS record that resolves to the healthy instances
However, if IPs aren't static, then clients in general will have issues, because partition leaders are resolved by broker ID and that broker's advertised address. If an ID moves to a new IP, or an IP no longer resolves to a (healthy) Kafka broker, your clients will start experiencing errors.
Note: both --bootstrap-server and --broker-list accept multiple addresses, but only the console producer uses the --broker-list param.
There are also other ways to create topics, such as Terraform, where you could store the Kafka addresses statically as a variable in source code and rarely, if ever, change them. In particular, you don't need to list every IP each time you use a Kafka client, only a handful.
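As a rough illustration of both suggestions (the DNS name kafka-bootstrap.example.internal is just a placeholder for whatever record you create):
# point clients at a DNS record that resolves to healthy brokers
kafka-topics.sh --create --topic MyTopic --bootstrap-server kafka-bootstrap.example.internal:9092
# or list a handful of known broker addresses; only one of them needs to respond
kafka-topics.sh --create --topic MyTopic --bootstrap-server 192.168.1.11:9092,192.168.1.12:9092,192.168.1.13:9092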

How to configure server.properties for clustering in Kafka

I've been following Kafka Quickstart for "Setting up a multi-broker cluster" on a single machine. (Just for testing purposes).
Running Kafka with three properties files worked well. (I ran them on a single machine for testing.)
server.properties :
broker.id=0
listeners=PLAINTEXT://:9092
server-1.properties :
broker.id=1
listeners=PLAINTEXT://:9093
server-2.properties :
broker.id=2
listeners=PLAINTEXT://:9094
Now, I want to create a cluster with three machines.
1) Do I run three Zookeeper instances, one on each machine? With the same port (2181)? Or do I run just one Zookeeper on one machine?
2) When I run Kafka with server.properties, I know that I should have a different broker.id for each machine. How about the listeners part? Do I use the same port?
listeners=PLAINTEXT://192.168.0.5:9092 (machine 1)
listeners=PLAINTEXT://192.168.0.6:9092 (machine 2)
listeners=PLAINTEXT://192.168.0.7:9092 (machine 3)
The number of Zookeeper machines affects service availability and reliability. For testing purposes, one is enough. If you use three machines, using the same port or different ports both work, because there is a setting in server.properties:
zookeeper.connect=localhost:2181
# if using three zookeeper machines and different ports, modify it to following
# zookeeper.connect=192.168.0.5:2181,192.168.0.6:2182,192.168.0.7:2183
Using the same port on every machine is fine and recommended. Also make sure to set advertised.listeners to an address that is resolvable by each of the machines in the cluster, as well as from wherever your clients will run.
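A minimal sketch for machine 1, reusing the addresses from the question (machines 2 and 3 would differ only in broker.id and the advertised IP):
broker.id=0
# bind on all interfaces, but advertise the address the other machines and clients can resolve
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://192.168.0.5:9092
# all three Zookeeper machines, same port on each
zookeeper.connect=192.168.0.5:2181,192.168.0.6:2181,192.168.0.7:2181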

multiple kafka clusters on single zookeeper ensemble

I currently have a 3 node Kafka cluster which connects to base chroot path in my zookeeper ensemble.
zookeeper.connect=172.12.32.123:2181,172.11.43.211:2181,172.18.32.131:2181
Now, I want to add a new 5 node Kafka cluster which will connect to some other chroot path in the same zookeeper ensemble.
zookeeper.connect=172.12.32.123:2181,172.11.43.211:2181,172.18.32.131:2181/cluster/2
Will these configurations work, i.e. will the two chroot paths keep the clusters isolated from each other? I understand that the original Kafka cluster should have been connected to some path other than the base chroot path for better isolation.
Also, is it good to have the same zookeeper ensemble across Kafka clusters? The documentation says that it is generally better to have isolated zookeeper ensembles for different clusters.
If you're only limited to a single Zookeeper cluster, then it should work out fine with a unique chroot that doesn't collide with the other cluster's znodes.
It is not "good" to share, no, because Zookeeper losing quorum takes both clusters down, but again, if you're limited on hardware, it will still work.
Note: You can only afford to lose one ZK server with 3 nodes in the cluster, which is why a cluster of 5 is recommended
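A quick way to sanity-check the isolation, sketched with the stock ZooKeeper CLI (the paths are illustrative):
bin/zkCli.sh -server 172.12.32.123:2181
ls /            # znodes of the original cluster (brokers, controller, config, ...)
ls /cluster/2   # znodes of the new cluster, under its own chroot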

Connection to node -1 (x.x.x.x/:9092) could not be established. Broker may not be available

My problem is that I have set up 2 Kafka brokers and 2 Zookeeper nodes in different Docker containers (Ubuntu).
Here is my server1.properties configuration file:
broker.id=1
############################# Socket Server Settings
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://ipaddress_server1:9092
My zookeeper.properties configuration file:
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=200
tickTime=2000
initLimit=20
syncLimit=10
These are the properties for both the Kafka and Zookeeper servers. I have given each server a unique broker id and have also created the myid file inside the /tmp/zookeeper dir.
Now, when I test the Kafka cluster by producing messages through only one IP address like this: ./bin/kafka-console-producer.sh --broker-list 172.171.0.3:9092 --topic demo, it works fine. When I shut down the container that is the leader, I still get messages from the topic. But when I run the consumer script again, it shows me WARN messages:
Connection to node -1 (/172.171.0.3:9092) could not be established.
Broker may not be available
Now I am not able to get the messages. What should I do?
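For reference, the console clients accept a comma-separated broker list, so a sketch like the following (ipaddress_server2 is a placeholder for whatever advertised.listeners points to on server 2) lets the client fall back to whichever broker is still up:
./bin/kafka-console-consumer.sh --bootstrap-server 172.171.0.3:9092,ipaddress_server2:9092 --topic demo --from-beginning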

storm dies when certain zookeeper nodes fail

I have a nimbus server and 3 zookeeper nodes.
My storm.yaml file looks like this:
storm.zookeeper.servers:
- "server1"
- "server2"
- "server3"
nimbus.host: "nimbus-server"
storm.local.dir: "/var/storm"
My zoo.cfg files all look like this:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.3=server1:2888:3888
server.4=server2:2888:3888
server.5=server3:2888:3888
When all three zookeeper nodes are running, everything is fine according to the storm_ui. If I shut down one of these three nodes, the nimbus server complains that it can't connect to the zookeeper cluster and it dies. I can't find anywhere why this might be happening. The documentation says that if I have three zookeeper nodes, it should tolerate one of them dying. Is there something that has to be set in one of these for this to work?
This turned out to be iptables. There never was a quorum between the Zookeeper servers, so in effect, once the one I stopped was out, the ensemble behaved just as it should have. I opened ports 2181, 2888, and 3888 on the one server that didn't have them open, and now I can kill one of them with Storm staying alive.
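A rough sketch of the kind of rules involved (the exact chains and how they are persisted depend on how the host's firewall is managed):
iptables -A INPUT -p tcp --dport 2181 -j ACCEPT   # client connections
iptables -A INPUT -p tcp --dport 2888 -j ACCEPT   # follower-to-leader traffic
iptables -A INPUT -p tcp --dport 3888 -j ACCEPT   # leader election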