Dynamic addition of broker to a Kafka Cluster - apache-kafka

Can we add more brokers dynamically to a Kafka cluster and achieve broker fail-over? Won't we run into issues with metadata updates?

Yes, brokers can be added to increase the size of a Kafka cluster. However, Kafka does not rebalance partitions onto the new brokers automatically, so this needs to be done manually or via an external tool.
There's a section in the docs that details the steps to expand a cluster: http://kafka.apache.org/documentation/#basic_ops_cluster_expansion
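As a minimal sketch (the broker id, hostnames, and paths below are placeholders, not real values), adding a broker boils down to giving it a unique broker.id, pointing it at the same ZooKeeper ensemble, and starting it:
# config/server.properties on the new machine
broker.id=3
listeners=PLAINTEXT://new-broker-host:9092
log.dirs=/var/lib/kafka/logs
zookeeper.connect=zk_host:port
Then start it with: bin/kafka-server-start.sh config/server.properties. Partitions of existing topics will only move onto it once a reassignment is run, per the docs link above.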

Related

Kafka - is it possible to alter cluster size while keeping the change transparent to Producers and Consumers?

I am investigating Kafka to assess its suitability for our use case. Can you please help me understand how flexible Kafka is with altering the size of an existing cluster, specifically adding brokers without tearing down the cluster? Is there anything to be taken care of when doing this?
Adding servers to a Kafka cluster is easy: just assign them a unique broker id and start up Kafka on your new servers. However, these new servers will not automatically be assigned any data partitions, so unless partitions are moved to them they won't be doing any work until new topics are created. So usually when you add machines to your cluster you will want to migrate some existing data to these machines.
Refer to the cluster expansion documentation linked above (a sketch of the reassignment commands follows the list below).
Kafka supports:
Expanding your cluster
Automatically migrating data to new machines
Custom partition assignment and migration
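As a rough sketch of that migration (the topic name, broker ids, and file names here are made up), the partition reassignment tool can move existing partitions onto the new broker. First describe which topics to move in a JSON file, e.g. topics-to-move.json:
{"version": 1, "topics": [{"topic": "testtopic1"}]}
Then generate a proposed assignment that includes the new broker (id 3 in this example):
bin/kafka-reassign-partitions.sh --zookeeper zk_host:port --topics-to-move-json-file topics-to-move.json --broker-list "0,1,2,3" --generate
Save the proposed assignment it prints to a file (e.g. reassignment.json) and apply it:
bin/kafka-reassign-partitions.sh --zookeeper zk_host:port --reassignment-json-file reassignment.json --execute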
Yes, you can increase the number of partitions of a topic using the command-line tools or the AdminClient API, without restarting the cluster processes.
Example:
bin/kafka-topics.sh --zookeeper zk_host:port --alter --topic testtopic1 --partitions 20
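On newer Kafka versions, where the --zookeeper flag has been deprecated in favour of --bootstrap-server, the equivalent would be (the broker address is a placeholder):
bin/kafka-topics.sh --bootstrap-server broker_host:9092 --alter --topic testtopic1 --partitions 20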
Please be aware that adding partitions does not move existing data; messages already written stay in the partitions they were originally assigned to.
Kafka only allows you to increase the number of partitions; you cannot decrease them. If you need to reduce the partition count, you have to delete and recreate the topic.
Regarding the question of how producers/consumers behave with newly added partitions:
Kafka has a property metadata.max.age.ms for producers and consumers, which defaults to 300000 (5 minutes).
metadata.max.age.ms : The period of time in milliseconds after which we force a refresh of metadata even if we haven't seen any partition leadership changes to proactively discover any new brokers or partitions.
Once this interval elapses, the metadata is refreshed and any newly added partitions are detected by the producers/consumers.
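As a small illustration (the broker address is a placeholder), the refresh interval can be lowered in the client configuration so that new partitions are picked up sooner:
# producer (or consumer) configuration properties
bootstrap.servers=broker_host:9092
metadata.max.age.ms=60000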

What if a Kafka broker cannot connect to zookeeper?

Say I have 3 partitions with a replication factor of 3. As I understand it, all the brokers have to connect to the same ZooKeeper ensemble. What if they can't, due to network issues? Will replication continue when the network is available again?
If ZK is down, your Kafka cluster will have limited functionality. For details, see How does Kafka depend on Zookeeper?
Kafka requires ZooKeeper (ZK). If ZK is down, then the entire Kafka cluster will be "down" (meaning: almost unusable). ZK is used for a number of things, such as controller election and storing cluster and topic metadata.
Once ZK becomes available to the Kafka cluster again, the cluster will be operational again.

Can a Kafka broker keep id but change advertised listener?

I have a cluster of 3 Kafka brokers. Most of the topics have replication factor of 2, while the consumer offsets all have a replication factor of 3.
I need to change where the individual brokers are listening, i.e. the IPs/hostnames on which they are listening. Is it possible to change the advertised listeners for a given broker ID? Or do I have to create a new broker with a different ID, repartition topics, and remove the old broker?
Assuming it does work, does the official Java Kafka client realize that the listener has changed and re-request the list of brokers for the topic(s)?
For those interested, I am running Kafka in Kubernetes. Originally, I needed access from both inside and outside the cluster, so I had services with nodePort (hostPort did not work with CNI prior to Kubernetes 1.7).
It worked, but was complex. I no longer need access from outside Kubernetes, so would like to keep it simple and have three brokers that advertise their hostname.
Can I bring down a broker and restart it with a different advertised listener? Or must I add a new broker, rebalance, and remove the old one?
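For reference, the per-broker setting in question is advertised.listeners in server.properties; the hostnames and ports below are purely illustrative:
# config/server.properties for one broker
broker.id=0
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://kafka-0.kafka.svc.cluster.local:9092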

How to auto-scale Apache ZooKeeper

Is anyone using auto scaling to scale their ZooKeeper cluster? If ZooKeeper scales, how do clients know it has been scaled up or down? Especially with something like Kafka, where the ZooKeeper list is added to a config file: what happens when ZooKeeper is scaled, and how does Kafka know it has been scaled?
Short answer: ZooKeeper clients do not essentially need to know/track whether new nodes have been added to the ZooKeeper cluster. They just need at least one healthy ZK node available to them.
Longer answer (with Kafka as an example client of ZK):
1. If you're only adding new nodes to the ZooKeeper cluster, it's not essential for the Kafka brokers to know about this, because the zookeeper.connect configuration still contains healthy ZK nodes (see the example configuration after this answer).
2. If, however, you're replacing/removing some of the ZooKeeper nodes, and these are the only nodes present in the zookeeper.connect configuration, then a rolling restart of the Kafka brokers will be required after updating the zookeeper.connect configuration.
For case 1 above, it's best to add the new ZK nodes to the Kafka configuration at the next opportunity to restart the Kafka cluster.
The same applies to other technologies that depend on ZK (e.g. Apache Storm).
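As a minimal example (the hostnames are placeholders), zookeeper.connect in each broker's server.properties is just a comma-separated list of ZK nodes, so new ZK nodes only take effect for Kafka once they are added here and the broker is restarted:
# config/server.properties
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181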

Kafka - How to automatically use the second cluster when the first cluster is down?

I am trying to replicate data from one Kafka cluster to another using MirrorMaker. If the primary cluster goes down, is it possible to automatically send the Kafka messages to the second cluster? And is it also possible to automatically synchronise cluster 1 with cluster 2 once cluster 1 is up again, with little manual intervention?
Any help is highly appreciated.
I think you meant to ask how to maintain copies between Kafka brokers, which together are considered a Kafka cluster.
If that's the case, it's pretty simple: all you have to do is configure a Kafka cluster and create a topic with a replication factor equal to the number of nodes in the cluster.
For example:
Let's say we want 3 brokers in our Kafka cluster. You'll need to prepare a separate configuration file for each broker, start them up as a cluster, and then create a topic with a replication factor of 3.
Kafka will then be responsible for maintaining fault tolerance.
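For illustration (the topic name, partition count, and ZooKeeper address are arbitrary), creating such a topic could look like this:
bin/kafka-topics.sh --zookeeper zk_host:port --create --topic replicated-topic --replication-factor 3 --partitions 3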
For further info on actually doing the configuration, watch these videos on YouTube:
https://www.youtube.com/channel/UCDLPjuuYHxPbHdN8RXxrGdw