How to add new Kafka broker machines dynamically to a cluster - apache-kafka

We have a Confluent Kafka cluster that includes 3 Kafka brokers.
Version details:
Kafka machines are installed on RHEL version 7.2
Kafka confluent version is 0.1x
Zookeeper version: 3.4.10
schema-registry version: 4.0.0
Each Kafka broker machine runs the following services:
Kafka broker
Zookeeper server
Schema registry
Now we want to add 3 more Kafka broker machines to the current Kafka cluster (the additional machines are kafka04/05/06, with the same Kafka version, 0.1x).
The final cluster should have:
6 Kafka broker machines - kafka01, kafka02, kafka03, kafka04, kafka05, kafka06
3 Zookeeper servers - kafka01, kafka02, kafka03
3 Schema Registry services - kafka01, kafka02, kafka03
To connect the 3 new Kafka brokers to the existing cluster, we need to change the configuration on all Kafka machines (both the old and the new ones).
We are not sure exactly which configuration files on the Kafka brokers should be changed, but from my understanding we should change the Kafka and Zookeeper settings as follows.
I would be happy to get remarks/notes about the following procedure:
Edit the server.properties file on the new Kafka brokers (kafka04/05/06) and change the broker.id parameter as follows:
On kafka04 - broker.id=4
On kafka05 - broker.id=5
On kafka06 - broker.id=6
Edit server.properties on all Kafka machines (kafka01/02/03/04/05/06)
and change the following parameters to the total number of nodes in the cluster:
offsets.topic.replication.factor=6
transaction.state.log.replication.factor=6
On the new Kafka machines (kafka04/05/06), edit server.properties and set the zookeeper.connect parameter to the IPs of the Zookeeper servers located on kafka01, kafka02, kafka03.
Example:
zookeeper.connect=10.10.10.1:2181,10.10.10.2:2181,10.10.10.3:2181
On the Kafka machines kafka04/05/06, edit the zookeeper.properties file as follows:
server.1=10.10.10.1:2888:3888
server.2=10.10.10.2:2888:3888
server.3=10.10.10.3:2888:3888
Edit the myid file on kafka04/05/06 and change the broker.id parameter as follows:
on kafka04 set:
broker.id=4
on kafka05 set:
broker.id=5
on kafka06 set:
broker.id=6
After applying the settings above, restart the Kafka broker services on kafka01/02/03/04/05/06 and restart the Zookeeper servers on kafka01/02/03.
Then verify that all Kafka and Zookeeper services started successfully.
Reference info - https://www2.microstrategy.com/producthelp/current/PlatformAnalytics/en-us/Content/Add_kafka_node_to_kafka_cluster.htm

When adding brokers, you don't need to change the configuration of existing brokers, nor restart them. The same applies to ZooKeeper, if you are not adding new ZooKeeper servers.
On the new brokers, you just need to set a different broker.id value.
I don't recommend increasing the replication factor of the topics (including internal) beyond 3.
Once your new brokers are started, you may want to rebalance your existing data. There are many tools to do that, including the kafka-reassign-partitions.sh tool. The Kafka docs have a section detailing the process: https://kafka.apache.org/documentation/#basic_ops_cluster_expansion
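As a rough illustration of the rebalancing step, the sketch below builds a reassignment plan in the JSON layout that the Kafka docs show for kafka-reassign-partitions.sh (a version field plus a list of topic/partition/replicas entries). The topic name, the partition count, and the assumption that the existing brokers use IDs 1 to 3 are made up for the example. It spreads replicas round-robin and ignores rack awareness and current data placement, so treat it as a starting point rather than a production planner.

```python
import json
from typing import Dict, List


def spread_partitions(
    topic: str,
    partition_count: int,
    broker_ids: List[int],
    replication_factor: int = 3,
) -> Dict:
    """Build a reassignment plan that spreads replicas round-robin over broker_ids."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    partitions = []
    for partition in range(partition_count):
        # Start each partition on a different broker, then take the next
        # (replication_factor - 1) brokers in order, wrapping around.
        replicas = [
            broker_ids[(partition + offset) % len(broker_ids)]
            for offset in range(replication_factor)
        ]
        partitions.append(
            {"topic": topic, "partition": partition, "replicas": replicas}
        )
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    # "my-topic" and broker IDs 1-3 for the old brokers are hypothetical.
    plan = spread_partitions("my-topic", partition_count=6, broker_ids=[1, 2, 3, 4, 5, 6])
    print(json.dumps(plan, indent=2))
```

The printed JSON could be saved to a file and handed to the tool with --reassignment-json-file and --execute; the replication factor stays at 3, in line with the advice above.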

Related

Kafka broker setup

To connect to a Kafka cluster I've been provided with a set of bootstrap servers with name and port:
s1:9092
s2:9092
s3:9092
Kafka and Zookeeper are running on the instance s4. According to https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-properties-bootstrap-servers.html:
bootstrap server is a comma-separated list of host and port pairs that
are the addresses of the Kafka brokers in a "bootstrap" Kafka cluster
that a Kafka client connects to initially to bootstrap itself.
I reference the above bootstrap server definition as I'm trying to understand the relationship between the kafka brokers s1,s2,s3 and kafka,zookeeper running on s4.
To connect to the Kafka cluster, I set the broker list to the CSV 's1,s2,s3'. After sending messages to these brokers, I verify that they were added to the topic by SSHing onto the s4 box and viewing the messages on the topic.
What is the link between the Kafka brokers s1,s2,s3 and s4? I cannot ssh onto any of the brokers s1,s2,s3, as they do not seem to be accessible using ssh. Should s1,s2,s3 be accessible?
The individual responsible for the setup of the Kafka box is no longer available, and I'm confused as to how this configuration works. I've searched for config references of the brokers s1,s2,s3 on s4 but there does not appear to be any configuration.
When Kafka is being set up and configured what allows the linking between the brokers (in this case s1,s2,s3) and s4?
I start Kafka and Zookeeper on the same server, s4.
Should Kafka and Zookeeper also be running on s1,s2,s3?
What is the link between the Kafka brokers s1,s2,s3 and s4?
As per the Kafka documentation about adding nodes to a cluster, each server must share the same zookeeper.connect string and have a unique broker.id to be part of the cluster.
You may check which nodes are in the cluster via zookeeper-shell with ls /brokers/ids, via the Kafka AdminClient API, or with kafkacat -L.
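To make the membership rule concrete, here is a small sketch that checks a set of server.properties files against it: one shared zookeeper.connect string and a unique broker.id per host. The host names and property values are placeholders, and the check assumes every broker sets broker.id explicitly (it does not account for auto-generated IDs).

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_cluster_membership(configs: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the configs look like one cluster.

    configs maps a host name to the text of its server.properties file.
    """
    problems = []
    parsed = {host: parse_properties(text) for host, text in configs.items()}

    # Every broker must point at the same Zookeeper ensemble.
    connect_strings = {props.get("zookeeper.connect") for props in parsed.values()}
    if len(connect_strings) != 1:
        problems.append("zookeeper.connect differs between hosts")

    # Every broker must have its own broker.id.
    seen: Dict[str, str] = {}
    for host, props in parsed.items():
        broker_id = props.get("broker.id")
        if broker_id is None:
            problems.append(f"{host}: broker.id is not set")
        elif broker_id in seen:
            problems.append(f"{host}: broker.id {broker_id} already used by {seen[broker_id]}")
        else:
            seen[broker_id] = host
    return problems
```

An empty result only says the files are consistent with each other; whether the brokers actually registered still has to be confirmed with one of the tools mentioned above.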
should s1,s2,s3 be accessible?
Via SSH? They don't have to be.
They should respond to TCP connections from your Kafka client machines on their Kafka server ports, though.
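A quick way to test that from a client machine is a plain TCP connect, sketched below with the Python standard library. The helper names are my own, and a successful connect only shows that something is listening on the port, not that it is a healthy Kafka broker.

```python
import socket
from typing import List, Tuple


def parse_bootstrap_servers(servers: str) -> List[Tuple[str, int]]:
    """Split a comma-separated host:port list and validate each port."""
    pairs = []
    for entry in servers.split(","):
        host, _, port = entry.strip().rpartition(":")
        port_number = int(port)  # raises ValueError if the port is not numeric
        if not host or not 0 < port_number < 65536:
            raise ValueError(f"invalid bootstrap server: {entry!r}")
        pairs.append((host, port_number))
    return pairs


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the servers in the question this would be called as is_reachable(host, port) for each pair returned by parse_bootstrap_servers("s1:9092,s2:9092,s3:9092").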
Should Kafka and Zookeeper also be running on s1,s2,s3?
You should not have 4 Zookeeper servers in a cluster (odd numbers only).
Otherwise, you've at least been given some ports for Kafka on those machines, therefore Kafka should be running on them.

How to remove Kafka broker machines from a Kafka cluster

We have a Kafka cluster that includes 7 Kafka brokers.
Version details:
Kafka machines are installed on RHEL version 7.2
Kafka version is 0.1x
Zookeeper version: 3.4.10
schema-registry version: 4.0.0
Each Kafka broker machine runs the following services:
Kafka broker
Zookeeper server
Schema registry
Now we want to remove 2 Kafka broker machines from the current Kafka cluster (the machines we want to remove are kafka06/07, Kafka version 0.1x).
The final cluster should have:
5 Kafka broker machines - kafka01, kafka02, kafka03, kafka04, kafka05
3 Zookeeper servers/services installed on kafka01, kafka02, kafka03
5 Schema Registry services installed on kafka01, kafka02, kafka03, kafka04, kafka05
We are not sure exactly which configuration files on the Kafka brokers should be changed when we remove the brokers kafka06/07 from the cluster.
I would be happy to get remarks/notes about the Kafka broker removal procedure.
From my understanding, we need to do the following to remove the Kafka machines kafka06/07:
On kafka06/07,
stop the following services:
Kafka broker
Zookeeper server
Schema registry
then shut down the kafka06/07 machines with
init 0
then restart the following services on kafka01/02/03/04/05:
Kafka broker
Zookeeper server
Schema registry
To remove a Kafka broker, first identify the topic partitions hosted on the broker you want to decommission (including those it leads) and reassign them to the remaining brokers with the kafka-reassign-partitions.sh script, then shut down the Kafka broker service.
Then remove the host from the bootstrap servers list in producers and consumers.
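A minimal sketch of the planning part, assuming you already have the current replica assignment as a list of topic/partition/replicas entries (the same shape kafka-reassign-partitions.sh uses in its JSON files). The function name and the sample data are hypothetical. It replaces each replica on a departing broker with one of the remaining brokers, keeps the replica's position in the list, and leaves untouched partitions out of the plan.

```python
import json
from typing import Dict, List, Set


def drain_brokers(partitions: List[Dict], remove: Set[int], remaining: List[int]) -> Dict:
    """Build a reassignment plan that moves replicas off the brokers in `remove`."""
    plan = []
    cursor = 0  # rotates through `remaining` so replacements are spread out
    for entry in partitions:
        replicas = entry["replicas"]
        if not any(broker in remove for broker in replicas):
            continue  # nothing to move for this partition
        kept = [broker for broker in replicas if broker not in remove]
        new_replicas: List[int] = []
        for broker in replicas:
            if broker in remove:
                # Pick the next remaining broker that does not already hold a replica.
                for _ in range(len(remaining)):
                    candidate = remaining[cursor % len(remaining)]
                    cursor += 1
                    if candidate not in kept and candidate not in new_replicas:
                        broker = candidate
                        break
                else:
                    raise ValueError("not enough remaining brokers for this partition")
            new_replicas.append(broker)
        plan.append(
            {"topic": entry["topic"], "partition": entry["partition"], "replicas": new_replicas}
        )
    return {"version": 1, "partitions": plan}


if __name__ == "__main__":
    # Hypothetical current assignment; broker IDs 6 and 7 are being removed.
    current = [
        {"topic": "my-topic", "partition": 0, "replicas": [6, 1, 2]},
        {"topic": "my-topic", "partition": 1, "replicas": [1, 2, 3]},
        {"topic": "my-topic", "partition": 2, "replicas": [7, 6, 3]},
    ]
    print(json.dumps(drain_brokers(current, {6, 7}, [1, 2, 3, 4, 5]), indent=2))
```

Only after the reassignment has completed and the departing brokers hold no replicas would you stop their broker services.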

2 clusters of Zookeeper servers in a Hadoop + Kafka cluster - is it possible?

We have a Kafka cluster with the following details:
3 Kafka machines
3 Zookeeper servers
We also have a Hadoop cluster that includes datanode machines.
All applications use the Zookeeper servers, including the Kafka machines.
Now we want to make the following changes:
We want to add 3 additional Zookeeper servers that will form a separate cluster,
and only the Kafka machines will use these additional Zookeeper servers.
Is it possible?
Editing the ha.zookeeper.quorum in Hadoop configurations to be separate from zookeeper.connect in Kafka configurations, such that you have two individual Zookeeper clusters, can be achieved, yes.
However, I don't think Ambari or Cloudera Manager, for example, allow you to view or configure more than one Zookeeper cluster at a time.
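If you do split them, one thing worth checking is that the two settings really point at disjoint ensembles. The sketch below compares a Kafka zookeeper.connect value (which may end in a /chroot path) with a Hadoop ha.zookeeper.quorum value; the host names are invented for the example, and the comparison is purely textual, so a host listed once by name and once by IP would not be detected.

```python
from typing import Set


def zookeeper_servers(connect: str) -> Set[str]:
    """Extract the host:port entries from a connect string, dropping any /chroot suffix."""
    servers, _, _chroot = connect.partition("/")
    return {entry.strip() for entry in servers.split(",") if entry.strip()}


def shared_servers(kafka_connect: str, hadoop_quorum: str) -> Set[str]:
    """Return the servers that appear in both settings (empty set means fully separate)."""
    return zookeeper_servers(kafka_connect) & zookeeper_servers(hadoop_quorum)


if __name__ == "__main__":
    # Hypothetical values: zk1-3 serve Hadoop, zk4-6 serve only Kafka.
    kafka_connect = "zk4:2181,zk5:2181,zk6:2181/kafka"
    hadoop_quorum = "zk1:2181,zk2:2181,zk3:2181"
    print(shared_servers(kafka_connect, hadoop_quorum) or "no overlap")
```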
Yes, that's possible. Kafka uses Zookeeper to perform various distributed coordination tasks, such as deciding which Kafka broker is responsible for allocating partition leaders, and storing metadata on topics in the broker.
After stopping Kafka, the data in the original Zookeeper cluster can be copied to the new cluster with a tool such as zkcopy, a Zookeeper data transfer utility.
But if your Kafka cluster cannot be stopped, you need to think about how to transfer the Zookeeper data to the additional Zookeeper servers while Kafka keeps running.

kafka machines in the cluster and kafka communications

We have a Kafka cluster with 3 Kafka broker nodes and 3 Zookeeper servers.
Kafka version - 10.1 (Hortonworks)
From my understanding, all metadata is located on the Zookeeper servers, and the Kafka brokers use this data (Kafka talks to the Zookeeper servers via port 2181).
I am wondering whether each Kafka machine talks to the other Kafka machines in the cluster, or whether Kafka only gets/puts data from/on the Zookeeper servers.
So does the Kafka service need to communicate with the other Kafka brokers in the cluster?
Or do the Kafka machines get all they need from the Zookeeper servers alone?
Kafka brokers certainly need to communicate with each other, most importantly to replicate data. Data produced to Kafka is replicated across brokers for fault-tolerance and data durability. Partition followers send FetchRequests to partition leaders in order to replicate the data.
Additionally, the Controller broker sends a LeaderAndIsr request to brokers whenever a partition leader/follower is changed - that's how it informs brokers to start leading a partition or replicating it.
I would recommend these two introductory articles of mine in order to help you get more context:
https://hackernoon.com/thorough-introduction-to-apache-kafka-6fbf2989bbc1
https://hackernoon.com/apache-kafkas-distributed-system-firefighter-the-controller-broker-1afca1eae302

How does Zookeeper talk to Kafka to know that Kafka is up

We have 3 Kafka machines and 3 Zookeeper servers.
The Kafka machines are not co-hosted with the Zookeeper servers (Kafka runs on different machines; the OS is Red Hat version 7.x).
To get the broker IDs, we do the following on the Zookeeper servers:
cd /usr/hdp/current/zookeeper-server/bin
./zkCli.sh
ls /brokers/ids
The result should be the three broker IDs:
1011 1012 1013
My question is: how does Zookeeper know that a broker is up?
Or, to be more specific,
which CLI command does Zookeeper execute to identify that a Kafka broker is up?
Zookeeper is basically a distributed key-value store. Upon startup, a Kafka broker connects to Zookeeper (using the zookeeper.connect setting) and creates a znode (a key-value pair) with its own broker.id under /brokers/ids. Kafka brokers then stay connected to Zookeeper while they are running.
The znode is created as "Ephemeral" (this is a feature of Zookeeper). It means that Zookeeper will delete it when the broker's session ends, for example after the broker disconnects and its session times out.
This way, Zookeeper knows at any time which brokers are alive (it does not necessarily mean the broker is healthy!). This is used by brokers to discover the other brokers in a cluster.
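To illustrate the mechanism, here is a toy in-memory model of session-bound ephemeral nodes. It is not the Zookeeper API or client library, just a simplified stand-in written for this answer; real Zookeeper also involves heartbeats, session timeouts and watches, none of which are modelled here.

```python
from typing import Dict, List


class ToyZookeeper:
    """In-memory stand-in for ephemeral znodes; NOT the real Zookeeper API."""

    def __init__(self) -> None:
        self._ephemeral_owner: Dict[str, int] = {}  # znode path -> owning session id
        self._next_session = 1

    def connect(self) -> int:
        """Open a session and return its id (a broker does this on startup)."""
        session_id = self._next_session
        self._next_session += 1
        return session_id

    def create_ephemeral(self, session_id: int, path: str) -> None:
        """Create a znode that lives only as long as the given session."""
        if path in self._ephemeral_owner:
            raise ValueError(f"{path} already exists")
        self._ephemeral_owner[path] = session_id

    def close_session(self, session_id: int) -> None:
        """End a session; every ephemeral znode it created disappears."""
        self._ephemeral_owner = {
            path: owner
            for path, owner in self._ephemeral_owner.items()
            if owner != session_id
        }

    def children(self, parent: str) -> List[str]:
        """List direct children of `parent`, like `ls /brokers/ids`."""
        prefix = parent.rstrip("/") + "/"
        return sorted(
            path[len(prefix):]
            for path in self._ephemeral_owner
            if path.startswith(prefix) and "/" not in path[len(prefix):]
        )


if __name__ == "__main__":
    zk = ToyZookeeper()
    sessions = {}
    for broker_id in ("1011", "1012", "1013"):  # broker IDs from the question
        sessions[broker_id] = zk.connect()
        zk.create_ephemeral(sessions[broker_id], f"/brokers/ids/{broker_id}")
    print(zk.children("/brokers/ids"))
    zk.close_session(sessions["1012"])  # broker 1012 goes away
    print(zk.children("/brokers/ids"))
```

So Zookeeper does not run any command against the brokers: it only notices that a broker's session is gone and removes the znode that session owned.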