kafka how to delete topic with no leader - apache-kafka

I can't delete / reassign / make any change to topics without leader
reproduce:
Create a topic with ReplicationFactor=1
Shutdown the only one broker host
Use kafka-topic --delete to delete the topic
Delete process will never end (I waited for more than 6 month, and it starting to get hurt)
describe of topic
Topic:topic_73 PartitionCount:1 ReplicationFactor:1
Configs:unclean.leader.election.enable=true
Topic: topic_73 Partition: 0 Leader: -1 Replicas: 755 Isr:
broker 755 can never gone back
how can i fix this?

You can try this script to clear metadata from zookeeper and directly delete the kafka logs.
./clear.sh /path-to-kafka-logs sampletopic /path-to-kafka-bin-dir
clear.sh content is as below:
#!/bin/bash
# this script is for the situation that leader is set to -1 and there is no ISR
ZK_HOST=`hostname`
ROOT_DIR=$1
TOPIC=$2
KAFKA_LOG_DIR=$3
# make sure kafka service is stopped while running this
rm -rf ${KAFKA_LOG_DIR}/${TOPIC}*
${ROOT_DIR}/kafka/bin/kafka-topics.sh --zookeeper ${ZK_HOST}:2181 --topic ${TOPIC} --delete
${ROOT_DIR}/kafka/bin/zookeeper-shell.sh ${ZK_HOST}:2181 rmr /brokers/topics/${TOPIC}
${ROOT_DIR}/kafka/bin/zookeeper-shell.sh ${ZK_HOST}:2181 rmr /admin/delete_topics/${TOPIC}
make sure chmod +x clear.sh before using it.

Related

Altering kafka topic - deleting config results in "Invalid config(s): retention.bytes"

I'm trying to delete a topic-level configuration retention.bytes that was applied on my topic. But when I try to delete the config with the command as described in Kafka documentation I am getting the following message:
kafka-0:/opt/bitnami/kafka/bin$ kafka-configs.sh --bootstrap-server kafka:9092 --entity-type topics --entity-name foo --alter --delete-config retention.bytes
Invalid config(s): retention.bytes
I've already dug into Kafka's source code but the only thing it mentions is that it throws this error if "the command if any of the configs to be deleted does not exist". However when I describe my topic, I can see the config there:
kafka-0:/opt/bitnami/kafka/bin$ kafka-topics.sh --bootstrap-server kafka:9092 --describe --topic foo
Topic: foo PartitionCount: 1 ReplicationFactor: 3 Configs: cleanup.policy=compact,delete,flush.ms=1000,segment.bytes=1073741824,retention.ms=7776000000,flush.messages=10000,max.message.bytes=1000012,retention.bytes=1073741824
Topic: foo Partition: 0 Leader: 2 Replicas: 2,0,1 Isr: 2,1,0
So obviously it does exist...
Could anyone pinpoint what the problem could be here?
I needed to do the same thing today, looks like the same issue.
After some tinkering, I figured it out.
First, I was able to add the retention.bytes (I increased it 10 times compare to default 10737418240). After that, I was able to delete it, and it fell back to the default.
So, I think, you can only delete it if you override it yourself. Otherwise it does not have that config, and uses default value. Which means, you cannot make retention size to be unlimited.
Edit:
Overriding the default policy with -1 will enable unlimited retention (do so at your own risk).

Kafka configuration min.insync.replicas not working

Its my early days in learning kafka. And I am checking out every kafka property/concept in my local machine.
So I came across this property min.insync.replicas and here is my understanding. Please correct me if I've misunderstood anything.
Once a message is sent to a topic, the message must be written to at least min.insync.replicas number of followers.
min.insync.replicas also includes the leader.
If number of available live brokers( indirectly, in sync replicas ) are less than the specified min.insync.replicas , then producer will raise an exception failing to publish the message.
Following are the steps I followed to create the above scenario
Started 3 brokers in local with broker Ids 0, 1 and 2
created the topic insync and set min.insync.replicas to 2
using the following command
sudo ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic insync --config min.insync.replicas=2
Describe the topic resulted in the following
Topic:insync PartitionCount:1 ReplicationFactor:3 Configs:min.insync.replicas=2
Topic: insync Partition: 0 Leader: 2 Replicas: 2,0,1 Isr: 1,2,0
At this point, I made sure the property I've provided is picked by kafka
I started sending messages and consuming them from terminal using following command
Producer: ./kafka-console-producer.sh --broker-list localhost:9092 --topic insync --producer.config ../config/producer.properties
Consumer: ./kafka-console-consumer.sh --zookeeper localhost:2181 --topic insync
At this point, I was able to send and receive messages successfully.
Bought down 2 brokers (0 and 2) and described the topic and resulted in following
Topic:insync PartitionCount:1 ReplicationFactor:3 Configs:min.insync.replicas=2
Topic: insync Partition: 0 Leader: 1 Replicas: 2,0,1 Isr: 1
At this point, the In Sync Replicas are just 1(Isr: 1)
Then I tried to produce the message and it worked. I was able to send messages from console-producer and I could see those messages in console consumer.
My Kafka version: kafka_2.10-0.10.0.0
following are the producer properties:
bootstrap.servers=localhost:9092
compression.type=none
batch.size=20
acks=all
I expected the producer to fail with NotEnoughReplicasException as mentioned in this.
public class NotEnoughReplicasException
extends RetriableException
Number of insync replicas for the partition is lower than >min.insync.replicas
but it worked normally.
Am I missing something? How can I create the scenario?
*************** EDIT **********************
Instead of producing the messages from console producer, I tried to generate messages from java code. This time, I got the expected exception in the kafka broker. Although I expected it in the producer (java code). As this experiment is raising more questions, I've posted another question.
is acks set to "all"? if not, try setting it to all
I believe that error is for transactional producer, you may need to add this config:
transactional.id=TID-TEST
if still not working, please check your replicator factor and min insync isr for the internal topic: __transaction_state

Kafka: configuration by partition or by topic?

I created a topic in my kafka cluster with the following command.
/opt/kafka/bin/kafka-topics.sh --zookeeper kaf1:2181,kaf2:2181,kaf3:2181 --create --topic mytopic --partitions 10 --replication-factor 2 --config retention.bytes=1074000000 --config delete.retention.ms=6000 --config segment.bytes=105000000
So, if I understand correctly the documentation, I have a topic with 10 partitions replicate 2 times beetween my 3 kafka hosts.
Next, each kafka host must retain 1Go of data. Each segment has a size of 100Mo and all old logs will be delete after 1 minute.
Now, when I do a du -h on my logs directory on a kafka hosts, I have this:
1,2G ./mytopic-2
1,1G ./mytopic-8
1,2G ./mytopic-9
1,1G ./mytopic-6
1,1G ./mytopic-3
1,1G ./mytopic-0
1,2G ./mytopic-4
7,6G .
I thought get 1Go for the directory entirely and not for each partition.
So my question is simple, the topic configuration is for each partition or for the all topic ?
Thanks.
Please, see the picture below (distribution of partitions by the cluster nodes might be different):

Where kafka stores partitions for the topics?

I installed kafka on a linux server. I defined a topic with a few partitions. I know that each partition is mapped to a physical file on disk, but I don't know where it is.
Where are the partition files saved ?
In your config/server.properties you'll find a section on "Log Basics". The property log.dirs is defining where your logs/partitions will be stored on disk.
By default on Linux it is stored in /tmp/kafka-logs. If you will navigate to this folder you will see something like this:
recovery-point-offset-checkpoint
replication-offset-checkpoint
topic-0
msg-0
msg-1
Which means that you have two topics (topic which has 1 partition and msg which has 2).
As it was noted by Ludd, you can find the location inside config/server.properties file by looking for log.dirs.
Try running this command
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic test
you will get output
Topic:test Partition: 0 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
now try going to \config file
cat server.properties
and search for broker_id
if broker_id matches with leader number then topic partition is stored in that broker

Is there a way to delete all the data from a topic or delete the topic before every run?

Is there a way to delete all the data from a topic or delete the topic before every run?
Can I modify the KafkaConfig.scala file to change the logRetentionHours property? Is there a way the messages gets deleted as soon as the consumer reads it?
I am using producers to fetch the data from somewhere and sending the data to a particular topic where a consumer consumes, can I delete all the data from that topic on every run? I want only new data every time in the topic. Is there a way to reinitialize the topic somehow?
As I mentioned here Purge Kafka Queue:
Tested in Kafka 0.8.2, for the quick-start example: First, Add one line to server.properties file under config folder:
delete.topic.enable=true
then, you can run this command:
bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test
Don't think it is supported yet. Take a look at this JIRA issue "Add delete topic support".
To delete manually:
Shutdown the cluster
Clean kafka log dir (specified by the log.dir attribute in kafka config file ) as well the zookeeper data
Restart the cluster
For any given topic what you can do is
Stop kafka
Clean kafka log specific to partition, kafka stores its log file in a format of "logDir/topic-partition" so for a topic named "MyTopic" the log for partition id 0 will be stored in /tmp/kafka-logs/MyTopic-0 where /tmp/kafka-logs is specified by the log.dir attribute
Restart kafka
This is NOT a good and recommended approach but it should work.
In the Kafka broker config file the log.retention.hours.per.topic attribute is used to define The number of hours to keep a log file before deleting it for some specific topic
Also, is there a way the messages gets deleted as soon as the consumer reads it?
From the Kafka Documentation :
The Kafka cluster retains all published messages—whether or not they have been consumed—for a configurable period of time. For example if the log retention is set to two days, then for the two days after a message is published it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so retaining lots of data is not a problem.
In fact the only metadata retained on a per-consumer basis is the position of the consumer in in the log, called the "offset". This offset is controlled by the consumer: normally a consumer will advance its offset linearly as it reads messages, but in fact the position is controlled by the consumer and it can consume messages in any order it likes. For example a consumer can reset to an older offset to reprocess.
For finding the start offset to read in Kafka 0.8 Simple Consumer example they say
Kafka includes two constants to help, kafka.api.OffsetRequest.EarliestTime() finds the beginning of the data in the logs and starts streaming from there, kafka.api.OffsetRequest.LatestTime() will only stream new messages.
You can also find the example code there for managing the offset at your consumer end.
public static long getLastOffset(SimpleConsumer consumer, String topic, int partition,
long whichTime, String clientName) {
TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition);
Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo = new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1));
kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(requestInfo, kafka.api.OffsetRequest.CurrentVersion(),clientName);
OffsetResponse response = consumer.getOffsetsBefore(request);
if (response.hasError()) {
System.out.println("Error fetching data Offset Data the Broker. Reason: " + response.errorCode(topic, partition) );
return 0;
}
long[] offsets = response.offsets(topic, partition);
return offsets[0];
}
Tested with kafka 0.10
1. stop zookeeper & Kafka server,
2. then go to 'kafka-logs' folder , there you will see list of kafka topic folders, delete folder with topic name
3. go to 'zookeeper-data' folder , delete data inside that.
4. start zookeeper & kafka server again.
Note : if you are deleting topic folder/s inside kafka-logs but not from zookeeper-data folder, then you will see topics are still there.
Below are scripts for emptying and deleting a Kafka topic assuming localhost as the zookeeper server and Kafka_Home is set to the install directory:
The script below will empty a topic by setting its retention time to 1 second and then removing the configuration:
#!/bin/bash
echo "Enter name of topic to empty:"
read topicName
/$Kafka_Home/bin/kafka-configs --zookeeper localhost:2181 --alter --entity-type topics --entity-name $topicName --add-config retention.ms=1000
sleep 5
/$Kafka_Home/bin/kafka-configs --zookeeper localhost:2181 --alter --entity-type topics --entity-name $topicName --delete-config retention.ms
To fully delete topics you must stop any applicable kafka broker(s) and remove it's directory(s) from the kafka log dir (default: /tmp/kafka-logs) and then run this script to remove the topic from zookeeper. To verify it's been deleted from zookeeper the output of ls /brokers/topics should no longer include the topic:
#!/bin/bash
echo "Enter name of topic to delete from zookeeper:"
read topicName
/$Kafka_Home/bin/zookeeper-shell localhost:2181 <<EOF
rmr /brokers/topics/$topicName
ls /brokers/topics
quit
EOF
As a dirty workaround, you can adjust per-topic runtime retention settings, e.g. bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my_topic --config retention.bytes=1 (retention.bytes=0 might also work)
After a short while kafka should free the space. Not sure if this has any implications compared to re-creating the topic.
ps. Better bring retention settings back, once kafka done with cleaning.
You can also use retention.ms to persist historical data
We tried pretty much what the other answers are describing with moderate level of success.
What really worked for us (Apache Kafka 0.8.1) is the class command
sh kafka-run-class.sh kafka.admin.DeleteTopicCommand --topic yourtopic --zookeeper localhost:2181
For brew users
If you're using brew like me and wasted a lot of time searching for the infamous kafka-logs folder, fear no more. (and please do let me know if that works for you and multiple different versions of Homebrew, Kafka etc :) )
You're probably going to find it under:
Location:
/usr/local/var/lib/kafka-logs
How to actually find that path
(this is also helpful for basically every app you install through brew)
1) brew services list
kafka started matbhz
/Users/matbhz/Library/LaunchAgents/homebrew.mxcl.kafka.plist
2) Open and read that plist you found above
3) Find the line defining server.properties location open it, in my case:
/usr/local/etc/kafka/server.properties
4) Look for the log.dirs line:
log.dirs=/usr/local/var/lib/kafka-logs
5) Go to that location and delete the logs for the topics you wish
6) Restart Kafka with brew services restart kafka
All data about topics and its partitions are stored in tmp/kafka-logs/. Moreover they are stored in a format topic-partionNumber, so if you want to delete a topic newTopic, you can:
stop kafka
delete the files rm -rf /tmp/kafka-logs/newTopic-*
As of kafka 2.3.0 version, there is an alternate way to soft deletion of Kafka (old approach are deprecated ).
Update retention.ms to 1 sec (1000ms) then set it again after a min, to default setting i.e 7 days (168 hours, 604,800,000 in ms )
Soft deletion:- (rentention.ms=1000) (using kafka-configs.sh)
bin/kafka-configs.sh --zookeeper 192.168.1.10:2181 --alter --entity-name kafka_topic3p3r --entity-type topics --add-config retention.ms=1000
Completed Updating config for entity: topic 'kafka_topic3p3r'.
Setting to default:- 7 days (168 hours , retention.ms= 604800000)
bin/kafka-configs.sh --zookeeper 192.168.1.10:2181 --alter --entity-name kafka_topic3p3r --entity-type topics --add-config retention.ms=604800000
Simplest way without restarting servers(I am using this with AWS MSK seamlessly):
cd kafka_2.12-2.6.2/bin
Topic Deletion:
Please replace $topic_name:
./kafka-topics.sh \
--bootstrap-server $kafka_bootstrap_servers \
--command-config client.properties \
--delete \
--topic $topic_name
Here is the client.properties file:
kafka_2.12-2.6.2/bin/client.properties
ssl.truststore.location=/usr/lib/jvm/java-11-openjdk-amd64/lib/security/cacerts
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
max.request.size=104857600
Topic Data Deletion:
Option A:
./kafka-delete-records.sh \
--bootstrap-server $kafka_bootstrap_servers \
--command-config client.properties \
--offset-json-file ./delete-records.json
This is most clean way to delete the data immediately rather than waiting for Kafka to do this as a background job. But there is one time extra effort on specifiying all the partitions for a particular topic in the delete JSON file.
Here is the delete-records.json content is:
{
"partitions": [
{
"topic": $topic_name,
"partition": 0,
"offset": -1
},
{
"topic": $topic_name,
"partition": 1,
"offset": -1
},
{
"topic": $topic_name,
"partition": 2,
"offset": -1
}
],
"version": 1
}
Option B:
Step1:
./kafka-configs.sh \
--bootstrap-server $kafka_bootstrap_servers \
--command-config client.properties
--alter \
--entity-type topics \
--add-config retention.ms=1 \
--entity-name $topic_name
Now, wait for couple of minutes to let Kafka delete the data from topic and now come back and revert to default 7 days data retention.
Step2:
./kafka-configs.sh \
--bootstrap-server $kafka_bootstrap_servers \
--command-config client.properties
--alter \
--entity-type topics \
--add-config retention.ms=604800000 \
--entity-name $topic_name
Stop ZooKeeper and Kafka
In server.properties, change log.retention.hours value. You can comment log.retention.hours and add log.retention.ms=1000. It would keep the record on Kafka Topic for only one second.
Start zookeeper and kafka.
Check on consumer console. When I opened the console for the first time, record was there. But when I opened the console again, the record was removed.
Later on, you can set the value of log.retention.hours to your desired figure.
I use the utility below to cleanup after my integration test run.
It uses the latest AdminZkClient api. The older api has been deprecated.
import javax.inject.Inject
import kafka.zk.{AdminZkClient, KafkaZkClient}
import org.apache.kafka.common.utils.Time
class ZookeeperUtils #Inject() (config: AppConfig) {
val testTopic = "users_1"
val zkHost = config.KafkaConfig.zkHost
val sessionTimeoutMs = 10 * 1000
val connectionTimeoutMs = 60 * 1000
val isSecure = false
val maxInFlightRequests = 10
val time: Time = Time.SYSTEM
def cleanupTopic(config: AppConfig) = {
val zkClient = KafkaZkClient.apply(zkHost, isSecure, sessionTimeoutMs, connectionTimeoutMs, maxInFlightRequests, time)
val zkUtils = new AdminZkClient(zkClient)
val pp = new Properties()
pp.setProperty("delete.retention.ms", "10")
pp.setProperty("file.delete.delay.ms", "1000")
zkUtils.changeTopicConfig(testTopic , pp)
// zkUtils.deleteTopic(testTopic)
println("Waiting for topic to be purged. Then reset to retain records for the run")
Thread.sleep(60000L)
val resetProps = new Properties()
resetProps.setProperty("delete.retention.ms", "3000000")
resetProps.setProperty("file.delete.delay.ms", "4000000")
zkUtils.changeTopicConfig(testTopic , resetProps)
}
}
There is an option delete topic. But, it marks the topic for deletion. Zookeeper later deletes the topic. Since this can be unpredictably long, I prefer the retention.ms approach
do:
cd /path/to/kafkaInstallation/kafka-server
bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic name_of_kafka_topic
then you can recreate it using:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic name_of_kafka_topic
In manually deleting a topic from a kafka cluster , you just might check this out https://github.com/darrenfu/bigdata/issues/6
A vital step missed a lot in most solution is in deleting the /config/topics/<topic_name> in ZK.
I use this script:
#!/bin/bash
topics=`kafka-topics --list --zookeeper zookeeper:2181`
for t in $topics; do
for p in retention.ms retention.bytes segment.ms segment.bytes; do
kafka-topics --zookeeper zookeeper:2181 --alter --topic $t --config ${p}=100
done
done
sleep 60
for t in $topics; do
for p in retention.ms retention.bytes segment.ms segment.bytes; do
kafka-topics --zookeeper zookeeper:2181 --alter --topic $t --delete-config ${p}
done
done
There are two solutions to clean up topics data
Change the zookeeper dataDir path "dataDir=/dataPath" to some other
value, delete kafka logs folder and restart zookeeper and kafka
server
Run zkCleanup.sh from zookeeper server