How to delete Kafka broker machines from a Kafka cluster - apache-kafka

We have a Kafka cluster that includes 7 Kafka brokers.
Version details:
Kafka machines are installed on RHEL version 7.2
Kafka version is 0.10.x
ZooKeeper version: 3.4.10
Schema Registry version: 4.0.0
Each Kafka broker machine includes the following services:
Kafka broker
ZooKeeper server
Schema Registry
Now we want to delete 2 Kafka broker machines from the current Kafka cluster (the machines that we want to delete are kafka06/kafka07, Kafka version 0.10.x).
So the cluster should finally have:
5 Kafka broker machines - kafka01, kafka02, kafka03, kafka04, kafka05
3 ZooKeeper servers/services installed on kafka01, kafka02, kafka03
5 Schema Registry services installed on kafka01, kafka02, kafka03, kafka04, kafka05
We are not sure exactly which configuration files on the Kafka brokers should be changed when we delete the brokers kafka06/kafka07 from the cluster.
I will be happy to get remarks/notes about the Kafka broker deletion procedure.
From my understanding, we need to do the following in order to remove the machines kafka06/kafka07 (a sketch of the stop commands follows the list):
On kafka06/07:
Stop the services:
Kafka broker
ZooKeeper server
Schema Registry
Then shut down the kafka06/07 machines with:
init 0
Then restart the following services on kafka01/02/03/04/05:
Kafka broker
ZooKeeper server
Schema Registry
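For reference, a sketch of the stop commands (the /opt/confluent path is an assumption based on the Confluent 4.0.x layout; adjust to your installation):

# on kafka06 and kafka07, stop the co-hosted services
/opt/confluent/bin/schema-registry-stop
/opt/confluent/bin/kafka-server-stop
/opt/confluent/bin/zookeeper-server-stop
# then power the machine off
init 0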

To remove a Kafka broker, first identify and reassign the topic partition replicas hosted on the broker that you want to decommission, by using the kafka-reassign-partitions.sh script, and then shut down the Kafka broker service.
Also remove the host from the bootstrap.servers list in your producers and consumers.
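A minimal sketch of that reassignment, assuming the brokers carry the ids 1-7 matching their hostnames and that a topic named topic1 has replicas on brokers 6/7 (both are assumptions, not values from the question):

# topics-to-move.json - list every topic that has replicas on the brokers being removed
{"topics": [{"topic": "topic1"}], "version": 1}

# generate a plan that places all replicas on the remaining brokers only
./kafka-reassign-partitions.sh --zookeeper kafka01:2181 \
  --topics-to-move-json-file topics-to-move.json \
  --broker-list "1,2,3,4,5" --generate

# save the proposed assignment as reassignment.json, then apply and check it
./kafka-reassign-partitions.sh --zookeeper kafka01:2181 \
  --reassignment-json-file reassignment.json --execute
./kafka-reassign-partitions.sh --zookeeper kafka01:2181 \
  --reassignment-json-file reassignment.json --verify

Only after --verify reports that every reassignment completed is it safe to stop the brokers.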

Related

How to add new Kafka broker machines dynamically to a cluster

We have a Confluent Kafka cluster that includes 3 Kafka brokers.
Version details:
Kafka machines are installed on RHEL version 7.2
Confluent Kafka version is 0.10.x
ZooKeeper version: 3.4.10
Schema Registry version: 4.0.0
Each Kafka broker machine includes the following services:
Kafka broker
ZooKeeper server
Schema Registry
Now we want to add 3 additional new Kafka broker machines to the current Kafka cluster (the additional machines are kafka04/05/06, with the same Kafka version 0.10.x).
So the cluster should finally have:
6 Kafka broker machines - kafka01, kafka02, kafka03, kafka04, kafka05, kafka06
3 ZooKeeper servers - kafka01, kafka02, kafka03
3 Schema Registry services - kafka01, kafka02, kafka03
In order to connect the 3 new Kafka brokers to the existing Kafka cluster, we think we need to change the configuration on all Kafka machines (the old machines and the new machines).
We are not sure exactly which configuration files on the Kafka brokers should be changed, but from my understanding we should change the Kafka and ZooKeeper settings as follows.
I will be happy to get remarks/notes about the following procedure.
Edit the server.properties file on the new Kafka brokers - kafka04/05/06 - and set the broker.id parameter as follows:
On kafka04 - broker.id=4
On kafka05 - broker.id=5
On kafka06 - broker.id=6
Edit server.properties on all Kafka machines - kafka01/02/03/04/05/06 -
and change the following parameters to the total number of nodes in the cluster:
offsets.topic.replication.factor=6
transaction.state.log.replication.factor=6
On the new Kafka machines - kafka04/05/06 - edit server.properties and update the zookeeper.connect parameter with the IPs of the ZooKeeper servers located on kafka01, kafka02, kafka03.
Example:
zookeeper.connect=10.10.10.1:2181,10.10.10.2:2181,10.10.10.3:2181
On the Kafka machines kafka04/05/06, edit the zookeeper.properties file as follows:
server.1=10.10.10.1:2888:3888
server.2=10.10.10.2:2888:3888
server.3=10.10.10.3:2888:3888
Edit the myid file on kafka04/05/06, and change the broker.id parameter as follows:
on kafka04 set:
broker.id=4
on kafka05 set:
broker.id=5
on kafka06 set:
broker.id=6
After the settings above, restart all Kafka broker services on kafka01/02/03/04/05/06, restart the ZooKeeper servers on kafka01/02/03,
and verify that all Kafka and ZooKeeper services started successfully.
Reference info - https://www2.microstrategy.com/producthelp/current/PlatformAnalytics/en-us/Content/Add_kafka_node_to_kafka_cluster.htm
When adding brokers, you don't need to change the configuration of existing brokers, nor restart them. The same applies to ZooKeeper, as long as you are not adding new ZooKeeper servers.
On the new brokers, you just need to set a different broker.id value.
I don't recommend increasing the replication factor of the topics (including the internal ones) beyond 3.
Once your new brokers are started, you may want to rebalance your existing data onto them. There are many tools to do that, including the kafka-reassign-partitions.sh tool. The Kafka docs have a section detailing the process: https://kafka.apache.org/documentation/#basic_ops_cluster_expansion
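For illustration, here is a minimal sketch of the per-broker setting and the rebalance step (the topics-to-move.json file and the broker ids 1-6 are assumptions tied to the hostnames above):

# server.properties on kafka04 - the only value that must differ per broker
broker.id=4
zookeeper.connect=10.10.10.1:2181,10.10.10.2:2181,10.10.10.3:2181

# after the new brokers have joined, propose spreading partitions over all six brokers
./kafka-reassign-partitions.sh --zookeeper 10.10.10.1:2181 \
  --topics-to-move-json-file topics-to-move.json \
  --broker-list "1,2,3,4,5,6" --generate

Feed the generated plan back with --execute, as described in the cluster expansion section linked above.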

kafka + what could be the reasons why a kafka broker isn't the leader for a topic partition

We have an HDP cluster - 2.6.4 with Ambari version 2.6.1.
We have 3 Kafka brokers with version 0.10.1, and 3 ZooKeeper servers.
We saw many error messages in /var/log/kafka/server.log.
In this example we have 6601 error lines about:
This server is not the leader for that topic-partition
Example:
[2019-01-06 14:56:53,312] ERROR [ReplicaFetcherThread-0-1011], Error for partition [topic1-example,34] to broker 1011:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
We checked the connectivity between the Kafka brokers, and it seems to be OK (we checked /var/log/messages and dmesg on the Linux Kafka machines).
We also suspect the connections between the ZooKeeper client on the Kafka brokers and the ZooKeeper servers,
but we don't know how to check the relationship between the client on a Kafka broker and the ZooKeeper servers (a sketch of the kind of check we have in mind is below).
We also know that Kafka sends heartbeats to the ZooKeeper servers (I think the heartbeat interval is 2 seconds), but we are not sure if this is the right direction to search for what causes the leader to disappear.
Any ideas what the reasons could be that a Kafka broker isn't the leader for a topic partition?
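A sketch of such a check, assuming the ZooKeeper servers listen on the default port 2181 and a broker id of 1011:

# ZooKeeper four-letter-word commands: confirm the server answers and list its client connections
echo stat | nc zookeeper01 2181
echo cons | nc zookeeper01 2181

# from zkCli.sh: a live broker keeps an ephemeral znode registered under /brokers/ids
ls /brokers/ids
get /brokers/ids/1011

The broker's session with ZooKeeper is governed by zookeeper.session.timeout.ms (6000 ms by default in this Kafka version); if ZooKeeper receives no heartbeat within that window it expires the session, the broker's znode disappears, and leadership for its partitions moves elsewhere.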
other related links
https://jar-download.com/artifacts/org.apache.kafka/kafka-clients/0.10.2.0/source-code/org/apache/kafka/common/protocol/Errors.java
kafka : one broker keeping print INFO log : "NOT_LEADER_FOR_PARTITION"
https://github.com/SOHU-Co/kafka-node/issues/297

kafka machines in the cluster and kafka communications

We have a Kafka cluster with 3 Kafka broker nodes and 3 ZooKeeper servers.
Kafka version - 0.10.1 (Hortonworks)
From my understanding, since all metadata is located on the ZooKeeper servers and the Kafka brokers use this data (Kafka talks with ZooKeeper via port 2181),
I am just wondering whether each Kafka machine talks with the other Kafka machines in the cluster, or whether Kafka gets/puts the data only from/to the ZooKeeper servers.
So does the Kafka service need to communicate with the other Kafka brokers in the cluster?
Or do the Kafka machines get everything they need only from the ZooKeeper servers?
Kafka brokers certainly need to communicate with each other, most importantly to replicate data. Data produced to Kafka is replicated across brokers for fault tolerance and data durability. Partition followers send FetchRequests to partition leaders in order to replicate the data.
Additionally, the Controller broker sends a LeaderAndIsr request to brokers whenever a partition leader/follower is changed - that's how it informs brokers to start leading a partition or replicating it.
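You can see this broker-to-broker replication reflected in the topic metadata (a sketch; topic1 and the broker ids are placeholders):

# each partition shows its leader broker and which follower brokers are in sync
./kafka-topics.sh --describe --zookeeper zookeeper01:2181 --topic topic1

# example output: followers 1012/1013 stay in the ISR only by fetching from leader 1011
# Topic: topic1  Partition: 0  Leader: 1011  Replicas: 1011,1012,1013  Isr: 1011,1012,1013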
I would recommend these two introductory articles of mine in order to help you get more context:
https://hackernoon.com/thorough-introduction-to-apache-kafka-6fbf2989bbc1
https://hackernoon.com/apache-kafkas-distributed-system-firefighter-the-controller-broker-1afca1eae302

how does ZooKeeper talk with Kafka to know Kafka is up

We have 3 Kafka machines and 3 ZooKeeper servers,
and the Kafka machines are not co-hosted with the ZooKeeper servers (Kafka is on different machines; the OS is Red Hat version 7.x).
In order to get the broker ids, we do the following on the ZooKeeper servers:
cd /usr/hdp/current/zookeeper-server/bin
./zkCli.sh
ls /brokers/ids
The results should be the three broker ids, as in:
[1011, 1012, 1013]
My question is - in which way does ZooKeeper know that a broker is up?
Or to be more specific:
Which CLI command does ZooKeeper execute in order to identify that a Kafka broker is up?
ZooKeeper is basically a distributed key-value store. Upon startup, a Kafka broker connects to ZooKeeper (using the zookeeper.connect setting) and creates a znode (a key-value pair) named after its own broker.id under /brokers/ids. Kafka brokers then stay connected to ZooKeeper while they are running.
The znode is created as "ephemeral" (this is a feature of ZooKeeper). It means that ZooKeeper will delete it if the broker disconnects.
This way, ZooKeeper knows at any time which brokers are alive (it does not necessarily mean the broker is healthy!). This is also used by brokers to discover the other brokers in a cluster.
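You can see the ephemeral flag for yourself (a sketch, assuming a broker with id 1011 is registered):

# inside zkCli.sh: the stat section printed by get shows a non-zero ephemeralOwner,
# meaning the znode lives and dies with that broker's ZooKeeper session
get /brokers/ids/1011
# ...
# ephemeralOwner = 0x2611e17c4ba0001   (non-zero => ephemeral znode)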

Kafka cluster and Zookeeper

Suppose I have 3 Kafka servers:
server1 zookeeper1
server2 zookeeper2
server3 zookeeper3
In a cluster config, what happens to the ZooKeepers? Are they maintained individually for each server, or will their data sync up in the cluster configuration?
ZooKeepers need to be configured to form a cluster (an "ensemble") [1], and then they will indeed sync up their data; a configuration sketch is shown below. Each Kafka broker in a Kafka cluster will be talking to the ZooKeeper cluster, and this way the Kafka cluster will function correctly.
On the other hand, if the ZooKeepers haven't been configured for replication and each Kafka broker talks to its individual ZooKeeper, then they will not constitute a healthy Kafka cluster.
[1] https://zookeeper.apache.org/doc/r3.4.10/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
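A minimal sketch of the replicated setup described in [1] (hostnames and dataDir are placeholders): every server gets the same zoo.cfg listing the whole ensemble, plus its own myid file:

# zoo.cfg - identical on server1/2/3
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=server1:2888:3888
server.2=server2:2888:3888
server.3=server3:2888:3888

# on each server, <dataDir>/myid contains only that server's number, e.g. on server1:
echo 1 > /var/lib/zookeeper/myid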