Kafka: how to remove broker from Replica Set

Kafka: how to remove broker from Replica Set - apache-kafka

I have a partition with the following Replicas:
Topic: topicname Partition: 10 Leader: 1 Replicas: 1,2,4,3 Isr: 1,2,3
Where Replica 4 is a non-existent broker. I accidentally added this broker into the replica set as a typo.
I want to remove 4 from the Replica set. but after running kafka-reassign-partitions.sh, the reassignment to remove Replica #4 never finishes.
kafka-reassign-partitions.sh --zookeeper myzookeeperhost:2181 --reassignment-json-file remove4.txt --execute
Where remove4.txt looks like
{ "partitions": [
{ "topic": "topicname", "partition": 2, "replicas": [1,2,3] }
], "version": 1 }
The reassignment is stuck:
kafka-reassign-partitions.sh --zookeeper myzookeeperhost:2181 --reassignment-json-file remove4.txt --verify
Status of partition reassignment:
Reassignment of partition [topicname,10] is still in progress
I checked the controller log, it looks like the reassignment command was picked up, but nothing happens afterwards:
[2017-08-01 06:46:07,653] DEBUG [PartitionsReassignedListener on 101 (the controller broker)]: Partitions reassigned listener fired for path /admin/reassign_partitions. Record partitions to be reassigned {"version":1,"partitions":[{"topic":"topicname","partition":10,"replicas":[1,2,3]}]} (kafka.controller.PartitionsReassignedListener)
Any ideas on what I'm doing wrong? How to I remove broker #4 from the replica set?
update: I'm running kafka 10

I was able to solve this issue by spinning up a new broker with a broker-id matching the one that was added (in your case 4).
The Kafka Quickstart guide shows you how to spin up a broker with a specific id. Once you have your node up with id 4, run:
./bin/kafka-topics.sh --zookeeper localhost:2181 --topic badbrokertest --describe
You should see that all the replicas are in the isr column like so:
Topic:badbrokertest PartitionCount:3 ReplicationFactor:3 Configs:
Topic: badbrokertest Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: badbrokertest Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: badbrokertest Partition: 2 Leader: 1 Replicas: 1,2,3,4 Isr: 1,2,3,4
Now you can reassign your partitions!
./bin/kafka-reassign-partitions.sh --reassignment-json-file badbroker2.json --zookeeper localhost:2181
Where badbroker2.json looks like:
{
"version":1,
"partitions":[
{"topic":"badbrokertest","partition":0,"replicas":[1,2,3]},
{"topic":"badbrokertest","partition":1,"replicas":[1,2,3]},
{"topic":"badbrokertest","partition":2,"replicas":[1,2,3]}
]
}
So in short, once you've sync'd all your replicas by adding the missing broker you can remove the unneeded broker.
If you're working with several servers, be sure to set the listeners field in the config to make your temporary broker available to the others brokers. The Quickstart guide doesn't consider that case.
listeners=PLAINTEXT://10.9.1.42:9093

Related

Kafka configuration min.insync.replicas not working

Its my early days in learning kafka. And I am checking out every kafka property/concept in my local machine.
So I came across this property min.insync.replicas and here is my understanding. Please correct me if I've misunderstood anything.
Once a message is sent to a topic, the message must be written to at least min.insync.replicas number of followers.
min.insync.replicas also includes the leader.
If number of available live brokers( indirectly, in sync replicas ) are less than the specified min.insync.replicas , then producer will raise an exception failing to publish the message.
Following are the steps I followed to create the above scenario
Started 3 brokers in local with broker Ids 0, 1 and 2
created the topic insync and set min.insync.replicas to 2
using the following command
sudo ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic insync --config min.insync.replicas=2
Describe the topic resulted in the following
Topic:insync PartitionCount:1 ReplicationFactor:3 Configs:min.insync.replicas=2
Topic: insync Partition: 0 Leader: 2 Replicas: 2,0,1 Isr: 1,2,0
At this point, I made sure the property I've provided is picked by kafka
I started sending messages and consuming them from terminal using following command
Producer: ./kafka-console-producer.sh --broker-list localhost:9092 --topic insync --producer.config ../config/producer.properties
Consumer: ./kafka-console-consumer.sh --zookeeper localhost:2181 --topic insync
At this point, I was able to send and receive messages successfully.
Bought down 2 brokers (0 and 2) and described the topic and resulted in following
Topic:insync PartitionCount:1 ReplicationFactor:3 Configs:min.insync.replicas=2
Topic: insync Partition: 0 Leader: 1 Replicas: 2,0,1 Isr: 1
At this point, the In Sync Replicas are just 1(Isr: 1)
Then I tried to produce the message and it worked. I was able to send messages from console-producer and I could see those messages in console consumer.
My Kafka version: kafka_2.10-0.10.0.0
following are the producer properties:
bootstrap.servers=localhost:9092
compression.type=none
batch.size=20
acks=all
I expected the producer to fail with NotEnoughReplicasException as mentioned in this.
public class NotEnoughReplicasException
extends RetriableException
Number of insync replicas for the partition is lower than >min.insync.replicas
but it worked normally.
Am I missing something? How can I create the scenario?
*************** EDIT **********************
Instead of producing the messages from console producer, I tried to generate messages from java code. This time, I got the expected exception in the kafka broker. Although I expected it in the producer (java code). As this experiment is raising more questions, I've posted another question.

is acks set to "all"? if not, try setting it to all

I believe that error is for transactional producer, you may need to add this config:
transactional.id=TID-TEST
if still not working, please check your replicator factor and min insync isr for the internal topic: __transaction_state

Delete unused Kafka partition

I performed reassignment of a topic within my cluster. There were some problems throughout (running out of disk space on one of the target brokers), but I managed to fix it and the process completed succesfully.
It seems, however, that while one of the partitions was reassigned to another brokers, the data was not removed from the source broker's disk. And since the partion is quite big, I'd like it gone.
For obvious reasons I do not want to login to shell and rm -rf the directory. What would be the steps I could take to debug why the data was not deleted and then how to "encourage" the cluster to performe a clean up?
For a while thought that the retention policy might kick in and delete the data, but it's set to run every 10 minutes and it's been over a day since the reassignment has finished.
The replicas are as follows:
# bin/kafka-topics.sh -describe --zookeeper 1.2.3.4 --topic topic-name
Topic:topic-name PartitionCount:4 ReplicationFactor:3 Configs:retention.ms=3153600000000,compression.type=lz4
Topic: topic-name Partition: 0 Leader: 1014 Replicas: 1014,1012,1002 Isr: 1012,1002,1014
Topic: topic-name Partition: 1 Leader: 1007 Replicas: 1007,1006,1003 Isr: 1006,1007,1003 <--- this is the partition
Topic: topic-name Partition: 2 Leader: 1013 Replicas: 1013,1008,1001 Isr: 1013,1008,1001
Topic: topic-name Partition: 3 Leader: 1011 Replicas: 1011,1016,1010 Isr: 1010,1011,1016
And here we can see that the broker 1008 holds two partitions: 2 (it should) and 1 (it should not, this we need gone).
/data_disk_0/kafka-logs# cat meta.properties | grep broker.id
broker.id=1008
/data_disk_0/kafka-logs# du -h --max-depth=1 . | grep topic-name-1
295G ./topic-name-2
292G ./topic-name-1
edit: What's curious, all files in the topic directory (/data_disk_0/kafka-logs/topic-name-1/*) are opened by Kafka (checked with lsof). I don't know whether it's a default behaviour for Kafka to read all files in its data dir regardless of their status or it means that these files are still being somehow used.

It is not possible to delete partitions from a topic in Kafka. A partition is a file, Kafka assigns data to each partition depending on the Key. Let's say the partition 2 has data for the Key AAA, but the AAA Key isn't longer produced then you might see partition 2 not used.
Take a look to this video:
https://developer.confluent.io/learn-kafka/apache-kafka/partitions/#:~:text=Kafka%20Partitioning&text=Partitioning%20takes%20the%20single%20topic,many%20nodes%20in%20the%20cluster.
The only way is to delete the topic and create it again with the correct number of partitions

Kafka per topic retention.bytes and global log.retention.bytes not working

We are running a 6 node cluster of kafka 0.11.0. We have set a global as well as a per-topic retention in bytes, neither of which is being applied. There are no errors that I can see in the logs, just nothing being deleted (by size; the time retention does seem to be working)
See relevant configs below:
./config/server.properties :
# global retention 75GB or 60 days, segment size 512MB
log.retention.bytes=75000000000
log.retention.check.interval.ms=60000
log.retention.hours=1440
log.cleanup.policy=delete
log.segment.bytes=536870912
topic configuration (30GB):
[tstumpges#kafka-02 kafka]$ bin/kafka-topics.sh --zookeeper zk-01:2181/kafka --describe --topic stg_logtopic
Topic:stg_logtopic PartitionCount:12 ReplicationFactor:3 Configs:retention.bytes=30000000000
Topic: stg_logtopic Partition: 0 Leader: 4 Replicas: 4,5,6 Isr: 4,5,6
Topic: stg_logtopic Partition: 1 Leader: 5 Replicas: 5,6,1 Isr: 5,1,6
...
And, disk usage showing 910GB usage for one partition!
[tstumpges#kafka-02 kafka]$ sudo du -s -h /data1/kafka-data/*
82G /data1/kafka-data/stg_logother3-2
155G /data1/kafka-data/stg_logother2-9
169G /data1/kafka-data/stg_logother1-6
910G /data1/kafka-data/stg_logtopic-4
I can see there are plenty of segment log files (512MB each) in the partition directory... what is going on?!
Thanks in advance,
Thunder

Found the answer to this via the kafka user mailing list. We were apparently hitting kafka bug KAFKA-6030 (Integer overflow in log cleaner cleanable ratio computation)
Upgrading to v1.0.0 has fixed this for us!

Creating a Kafka topic results in no leader

I am using Kafka v0.9.0.1 (Scala v2.11) and the com.101tec:zkclient v0.7. I am trying to use AdminUtils to create a kafka topic. My code is the following.
String zkServers = "node1:2181,node2:2181,node3:2181,node4:2181";
Integer sessionTimeout = (int)TimeUnit.SECONDS.toMillis(10L);
Integer connectionTimeout = (int)TimeUnit.SECONDS.toMillis(8L);
ZkSerializer zkSerializer = ZKStringSerializer$.MODULE$;
Boolean isSecureKafkaCluster = false;
String topic = "test";
Integer partitions = 1;
Integer replication = 3;
ZkClient zkClient = new ZkClient(zkServers, sessionTimeout, connectionTimeout, zkSerializer);
ZkUtils zkUtils = new ZkUtils(zkClient, new ZkConnection(zkServers), isSecureKafkaCluster)
if(!AdminUtils.topicExists(zkUtils, topic)) {
AdminUtils.createTopic(zkUtils, topic, partitions, replications, new Properties());
}
The topic is actually created as verified by the following command.
bin/kafka-topics.sh --describe --zookeeper node1:2181 --topic test
However, the output is not as expected.
Topic:test PartitionCount:1 ReplicationFactor:1 Configs:
Topic: test Partition: 0 Leader: -1 Replicas: 4 Isr:
If I use the script.
bin/kafka-topics.sh --create --zookeeper node1:2181 --replication-factor 3 --partitions 1 --topic topic1
Then I see the following.
Topic:test1 PartitionCount:1 ReplicationFactor:3 Configs:
Topic: test1 Partition: 0 Leader: 2 Replicas: 2,3,4 Isr: 2
Any ideas on what I'm doing wrong? The effect is that if I use a Producer to send a ProducerRecord to the topic, nothing shows up on the topic.

I had the same issue.
Solution:
Clean zk meta info (/brokers/topic)
Clean all /data dir to remove all topic-partition folders belongs to that topic
Restart the whole kafka cluster all brokers at once.
Recreate that topic.
This solved my problem. And I think the root cause was the defect from kafka itself failing to handle clean removal topics (this has been fixed since v1.0.0).
Edit:
even with Kafka(>= v1.0.0), sometimes deleting topic will stuck if you are deleting an empty topic or if your kafka cluster is under extreme load.
solution would be as simple as restarting the controller broker. (you can always find the controller broker under ZK: /controller by get /controller). so just restarting one broker instead of the whole kafka cluster.

Apache kafka : broker leader -1 (topic received from Orion via Cygnus)

I'm working with Apache Kafka and receiving topics from Orion Context Broker via Cygnus (Fiware Labs)
I'm receiving 10 topics, and I can see data arriving in the consumer console for 8 topics.
But for 2 others topics, I cannot see any data arriving. And there is no error code (the consumer is just empty). If i try to add a test line to the topic via the producer console, i get this error:
ERROR Error when sending message to topic sensors_presence2_sensors with key: null, value: 4 bytes with error: Batch Expired (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
So I used the describe command and I get this :
Topic:sensors_presence2_sensors PartitionCount:1 ReplicationFactor:1 Configs:
Topic: sensors_presence2_sensors Partition: 0 Leader: -1 Replicas: 2 Isr:
I'm just starting with Kafka, so for the moment I have 1 broker(0) and no partition. But why is my leader -1 ? This broker do not even exist. How can I change that ? I didn't choose the configuration for my topic, they arrived automatically from Cygnus (Orion Context Broker) with a OrionKafkaSink.
An example of one of the 8 topics that works well :
Topic:sensors_presence1_sensors PartitionCount:1 ReplicationFactor:1 Configs:
Topic:sensors_presence1_sensors Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Thanks
Edit : In Cygnus logs, it shows that data is correctly send to kafka :
time=2016-03-02T11:07:09.504UTC | lvl=INFO | trans=1456915468-194-0000000039 | srv=egmmqtt | subsrv=egmmqttpath | function=persistAggregation | comp=Cygnus | msg=com.telefonica.iot.cygnus.sinks.OrionKafkaSink[279] : [kafka-sink] Persisting data at OrionKafkaSink. Topic (sensors_presence2_sensors), Data (...

The result of describe command is showing Replicas:2 and Isr: (empty) that means broker with id 2 was active at the time of creation of that topic and the same broker(id=2) is not active now. Because of that Isr( In sync replicas) showing empty.
There is no chance of getting Replicas: 2 when you have only one node (broker-id=0) kafka cluster. Make broker-2 up and everything will works well.
Hope this helps!

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse