I performed reassignment of a topic within my cluster. There were some problems throughout (running out of disk space on one of the target brokers), but I managed to fix it and the process completed succesfully.
It seems, however, that while one of the partitions was reassigned to another brokers, the data was not removed from the source broker's disk. And since the partion is quite big, I'd like it gone.
For obvious reasons I do not want to login to shell and rm -rf the directory. What would be the steps I could take to debug why the data was not deleted and then how to "encourage" the cluster to performe a clean up?
For a while thought that the retention policy might kick in and delete the data, but it's set to run every 10 minutes and it's been over a day since the reassignment has finished.
The replicas are as follows:
# bin/kafka-topics.sh -describe --zookeeper 1.2.3.4 --topic topic-name
Topic:topic-name PartitionCount:4 ReplicationFactor:3 Configs:retention.ms=3153600000000,compression.type=lz4
Topic: topic-name Partition: 0 Leader: 1014 Replicas: 1014,1012,1002 Isr: 1012,1002,1014
Topic: topic-name Partition: 1 Leader: 1007 Replicas: 1007,1006,1003 Isr: 1006,1007,1003 <--- this is the partition
Topic: topic-name Partition: 2 Leader: 1013 Replicas: 1013,1008,1001 Isr: 1013,1008,1001
Topic: topic-name Partition: 3 Leader: 1011 Replicas: 1011,1016,1010 Isr: 1010,1011,1016
And here we can see that the broker 1008 holds two partitions: 2 (it should) and 1 (it should not, this we need gone).
/data_disk_0/kafka-logs# cat meta.properties | grep broker.id
broker.id=1008
/data_disk_0/kafka-logs# du -h --max-depth=1 . | grep topic-name-1
295G ./topic-name-2
292G ./topic-name-1
edit: What's curious, all files in the topic directory (/data_disk_0/kafka-logs/topic-name-1/*) are opened by Kafka (checked with lsof). I don't know whether it's a default behaviour for Kafka to read all files in its data dir regardless of their status or it means that these files are still being somehow used.
It is not possible to delete partitions from a topic in Kafka. A partition is a file, Kafka assigns data to each partition depending on the Key. Let's say the partition 2 has data for the Key AAA, but the AAA Key isn't longer produced then you might see partition 2 not used.
Take a look to this video:
https://developer.confluent.io/learn-kafka/apache-kafka/partitions/#:~:text=Kafka%20Partitioning&text=Partitioning%20takes%20the%20single%20topic,many%20nodes%20in%20the%20cluster.
The only way is to delete the topic and create it again with the correct number of partitions
Related
The problem
Hi, I am trying to visualize Kafka lags using Grafana. I have been trying to log kafka lags using Metricbeat and doing the math myself since Metricbeat does not support logging Kafka lags in the version that I am using (but it has been implemented recently). Instead of using max(partition.offset.newest) - max(consumergroup.offset) to calculate the lags, I am using sum(partition.offset.newest) - sum(consumergroup.offset) filtered on a particular kafka.topic.name. However, the sum does not tally, upon further investigation, I found out that the count does not even tally! The count for partition offsets is 30 per 10s while the count for consumergroup offsets is 12 per 10s. I expect the count for both to be the same
I do not understand why Metricbeat logs the partition more than the consumergroup. At first I thought it was because of my Metricbeat configuration where I have 2 host groups defined, which might caused it to be logged multiple times. However, after changing my configurations, the count just droppped by half.
TL;DR
Why is the Metricbeat counts of partition and consumergroup different?
Setup
Kafka 2 brokers
Kafka topic partitions:
Topic: xxx PartitionCount:3 ReplicationFactor:2 Configs:
Topic: xxx Partition: 0 Leader: 2 Replicas: 2,1 Isr: 2,1
Topic: xxx Partition: 1 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: xxx Partition: 2 Leader: 2 Replicas: 2,1 Isr: 2,1
Metricbeat config (modules.d/kafka.yml):
- module: kafka
#metricsets:
# - partition
# - consumergroup
period: 10s
hosts: ["xxx.yyy:9092"]
Versions
Kafka 2.11-0.11.0.0
Elasticsearch-7.2.0
Kibana-7.2.0
Metricbeats-7.2.0
after much debugging I have figured out what is wrong:
For some reason, my kafka broker 1 has only producer metric and no consumer metric, connecting to broker 2 solved this problem. Connecting both brokers will add both metrics together.
Lucene uses fuzzy search so my data has some other consumer groups inside as well. For exact word matching, use kafka.partition.topic.keyword: 'xxx' instead. This made the ratio of my kafka producer offset to consumer offset 2:1
metricbeat logs the replicas as well, so I need to set NOT kafka.partition.partition.is_leader: false to get all partition leaders. This made the consumer to partition ratio 1:1.
After the 3 steps is done, I can use the formula sum(partition.offset.newest) - sum(consumergroup.offset) to get the lags
However, I do not know why broker 1 doesn't have the consumer information.
We are running a 6 node cluster of kafka 0.11.0. We have set a global as well as a per-topic retention in bytes, neither of which is being applied. There are no errors that I can see in the logs, just nothing being deleted (by size; the time retention does seem to be working)
See relevant configs below:
./config/server.properties :
# global retention 75GB or 60 days, segment size 512MB
log.retention.bytes=75000000000
log.retention.check.interval.ms=60000
log.retention.hours=1440
log.cleanup.policy=delete
log.segment.bytes=536870912
topic configuration (30GB):
[tstumpges#kafka-02 kafka]$ bin/kafka-topics.sh --zookeeper zk-01:2181/kafka --describe --topic stg_logtopic
Topic:stg_logtopic PartitionCount:12 ReplicationFactor:3 Configs:retention.bytes=30000000000
Topic: stg_logtopic Partition: 0 Leader: 4 Replicas: 4,5,6 Isr: 4,5,6
Topic: stg_logtopic Partition: 1 Leader: 5 Replicas: 5,6,1 Isr: 5,1,6
...
And, disk usage showing 910GB usage for one partition!
[tstumpges#kafka-02 kafka]$ sudo du -s -h /data1/kafka-data/*
82G /data1/kafka-data/stg_logother3-2
155G /data1/kafka-data/stg_logother2-9
169G /data1/kafka-data/stg_logother1-6
910G /data1/kafka-data/stg_logtopic-4
I can see there are plenty of segment log files (512MB each) in the partition directory... what is going on?!
Thanks in advance,
Thunder
Found the answer to this via the kafka user mailing list. We were apparently hitting kafka bug KAFKA-6030 (Integer overflow in log cleaner cleanable ratio computation)
Upgrading to v1.0.0 has fixed this for us!
I have a partition with the following Replicas:
Topic: topicname Partition: 10 Leader: 1 Replicas: 1,2,4,3 Isr: 1,2,3
Where Replica 4 is a non-existent broker. I accidentally added this broker into the replica set as a typo.
I want to remove 4 from the Replica set. but after running kafka-reassign-partitions.sh, the reassignment to remove Replica #4 never finishes.
kafka-reassign-partitions.sh --zookeeper myzookeeperhost:2181 --reassignment-json-file remove4.txt --execute
Where remove4.txt looks like
{ "partitions": [
{ "topic": "topicname", "partition": 2, "replicas": [1,2,3] }
], "version": 1 }
The reassignment is stuck:
kafka-reassign-partitions.sh --zookeeper myzookeeperhost:2181 --reassignment-json-file remove4.txt --verify
Status of partition reassignment:
Reassignment of partition [topicname,10] is still in progress
I checked the controller log, it looks like the reassignment command was picked up, but nothing happens afterwards:
[2017-08-01 06:46:07,653] DEBUG [PartitionsReassignedListener on 101 (the controller broker)]: Partitions reassigned listener fired for path /admin/reassign_partitions. Record partitions to be reassigned {"version":1,"partitions":[{"topic":"topicname","partition":10,"replicas":[1,2,3]}]} (kafka.controller.PartitionsReassignedListener)
Any ideas on what I'm doing wrong? How to I remove broker #4 from the replica set?
update: I'm running kafka 10
I was able to solve this issue by spinning up a new broker with a broker-id matching the one that was added (in your case 4).
The Kafka Quickstart guide shows you how to spin up a broker with a specific id. Once you have your node up with id 4, run:
./bin/kafka-topics.sh --zookeeper localhost:2181 --topic badbrokertest --describe
You should see that all the replicas are in the isr column like so:
Topic:badbrokertest PartitionCount:3 ReplicationFactor:3 Configs:
Topic: badbrokertest Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: badbrokertest Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: badbrokertest Partition: 2 Leader: 1 Replicas: 1,2,3,4 Isr: 1,2,3,4
Now you can reassign your partitions!
./bin/kafka-reassign-partitions.sh --reassignment-json-file badbroker2.json --zookeeper localhost:2181
Where badbroker2.json looks like:
{
"version":1,
"partitions":[
{"topic":"badbrokertest","partition":0,"replicas":[1,2,3]},
{"topic":"badbrokertest","partition":1,"replicas":[1,2,3]},
{"topic":"badbrokertest","partition":2,"replicas":[1,2,3]}
]
}
So in short, once you've sync'd all your replicas by adding the missing broker you can remove the unneeded broker.
If you're working with several servers, be sure to set the listeners field in the config to make your temporary broker available to the others brokers. The Quickstart guide doesn't consider that case.
listeners=PLAINTEXT://10.9.1.42:9093
I'm working with Apache Kafka and receiving topics from Orion Context Broker via Cygnus (Fiware Labs)
I'm receiving 10 topics, and I can see data arriving in the consumer console for 8 topics.
But for 2 others topics, I cannot see any data arriving. And there is no error code (the consumer is just empty). If i try to add a test line to the topic via the producer console, i get this error:
ERROR Error when sending message to topic sensors_presence2_sensors with key: null, value: 4 bytes with error: Batch Expired (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
So I used the describe command and I get this :
Topic:sensors_presence2_sensors PartitionCount:1 ReplicationFactor:1 Configs:
Topic: sensors_presence2_sensors Partition: 0 Leader: -1 Replicas: 2 Isr:
I'm just starting with Kafka, so for the moment I have 1 broker(0) and no partition. But why is my leader -1 ? This broker do not even exist. How can I change that ? I didn't choose the configuration for my topic, they arrived automatically from Cygnus (Orion Context Broker) with a OrionKafkaSink.
An example of one of the 8 topics that works well :
Topic:sensors_presence1_sensors PartitionCount:1 ReplicationFactor:1 Configs:
Topic:sensors_presence1_sensors Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Thanks
Edit : In Cygnus logs, it shows that data is correctly send to kafka :
time=2016-03-02T11:07:09.504UTC | lvl=INFO | trans=1456915468-194-0000000039 | srv=egmmqtt | subsrv=egmmqttpath | function=persistAggregation | comp=Cygnus | msg=com.telefonica.iot.cygnus.sinks.OrionKafkaSink[279] : [kafka-sink] Persisting data at OrionKafkaSink. Topic (sensors_presence2_sensors), Data (...
The result of describe command is showing Replicas:2 and Isr: (empty) that means broker with id 2 was active at the time of creation of that topic and the same broker(id=2) is not active now. Because of that Isr( In sync replicas) showing empty.
There is no chance of getting Replicas: 2 when you have only one node (broker-id=0) kafka cluster. Make broker-2 up and everything will works well.
Hope this helps!
I installed kafka on a linux server. I defined a topic with a few partitions. I know that each partition is mapped to a physical file on disk, but I don't know where it is.
Where are the partition files saved ?
In your config/server.properties you'll find a section on "Log Basics". The property log.dirs is defining where your logs/partitions will be stored on disk.
By default on Linux it is stored in /tmp/kafka-logs. If you will navigate to this folder you will see something like this:
recovery-point-offset-checkpoint
replication-offset-checkpoint
topic-0
msg-0
msg-1
Which means that you have two topics (topic which has 1 partition and msg which has 2).
As it was noted by Ludd, you can find the location inside config/server.properties file by looking for log.dirs.
Try running this command
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic test
you will get output
Topic:test Partition: 0 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
now try going to \config file
cat server.properties
and search for broker_id
if broker_id matches with leader number then topic partition is stored in that broker