Kafka consumer with new API not working

I found something very weird with Kafka.
I have a producer with 3 brokers:
bin/kafka-console-producer.sh --broker-list localhost:9093,localhost:9094,localhost:9095 --topic topic
Then I try to run a consumer with the new API:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9093,localhost:9094,localhost:9095 --topic topic --from-beginning
I got nothing! BUT if I use the old API:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic topic
I got my messages!
What am I doing wrong?
PS: I am using Kafka 0.10.

I eventually resolved my problem thanks to this similar post : Kafka bootstrap-servers vs zookeeper in kafka-console-consumer
I believe it was a bug or a wrong configuration on my side leading to an inconsistency between ZooKeeper and Kafka.
SOLUTION:
First, be sure topic deletion is enabled in the server.properties file of each of your brokers:
# Switch to enable topic deletion or not, default value is false
delete.topic.enable=true
Then delete the topic :
bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic myTopic
Remove all the log.dirs directories of your brokers (under /tmp by default).
EDIT: I faced the problem again and also had to remove the ZooKeeper log files in /tmp/zookeeper/version-2/.
Finally, delete the topic under /brokers/topics in ZooKeeper as follows:
$ kafka/bin/zookeeper-shell.sh localhost:2181
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is disabled
rmr /brokers/topics/myTopic
And restart your brokers and create your topic again.
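Putting those steps together, the whole reset looks roughly like this (a sketch assuming the default /tmp data dirs and the topic name myTopic from above; stop the relevant services before wiping their data dirs):
# 1. with ZooKeeper still up, mark the topic for deletion
bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic myTopic
# 2. stop the brokers and ZooKeeper, then wipe the data dirs
rm -rf /tmp/kafka-logs*              # each broker's log.dirs
rm -rf /tmp/zookeeper/version-2      # ZooKeeper data (see EDIT above)
# 3. restart ZooKeeper and remove any leftover znode
bin/zookeeper-shell.sh localhost:2181 rmr /brokers/topics/myTopic
# 4. restart the brokers and recreate the topic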

After fighting a while with the same problem: specify --partition and the console consumer with the new API works (but hangs...). I have CDH 5.12 + Kafka 0.11 (from the parcel).
UPD:
I also found out that Kafka 0.11 (versioned as 3.0.0 in the CDH parcel) does not consume messages correctly. After downgrading to Kafka 0.10 it became OK; --partition is not needed any more.

I had the same problem, but I was using a single broker instance for my "cluster", and I was getting this error in /var/log/messages:
[2018-04-04 22:29:39,854] ERROR [KafkaApi-20] Number of alive brokers '1' does not meet the required replication factor '3' for the offsets topic (configured via 'offsets.topic.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
I just added the setting offsets.topic.replication.factor=1 to my server configuration file and restarted, and it started to work fine.
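Once a consumer has run against the fixed broker, you can check that the offsets topic was created with the right factor (a sketch, assuming the quickstart ZooKeeper at localhost:2181):
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic __consumer_offsets
# ReplicationFactor should now be 1 on the single-broker "cluster"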

Related

Kafka: query the detailed configuration of all topics

We have a Kafka Linux server with Kafka version 2.6.
I want to query the detailed configuration of all topics, and I am using the --bootstrap-server flag.
Here is my approach (172.23.248.85 is the Kafka IP):
kafka-configs.sh --describe --bootstrap-server 172.23.248.85:6667 --entity-type topics
Dynamic configs for topic ggt_opl are:
Error while executing config command with args '--describe --bootstrap-server 172.23.248.85:6667 --entity-type topics'
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.UnsupportedVersionException: The broker does not support DESCRIBE_CONFIGS
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at kafka.admin.ConfigCommand$.getResourceConfig(ConfigCommand.scala:543)
at kafka.admin.ConfigCommand$.$anonfun$describeResourceConfig$4(ConfigCommand.scala:504)
at kafka.admin.ConfigCommand$.$anonfun$describeResourceConfig$4$adapted(ConfigCommand.scala:496)
at scala.collection.immutable.List.foreach(List.scala:333)
at kafka.admin.ConfigCommand$.describeResourceConfig(ConfigCommand.scala:496)
at kafka.admin.ConfigCommand$.describeConfig(ConfigCommand.scala:480)
at kafka.admin.ConfigCommand$.processCommand(ConfigCommand.scala:303)
at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:96)
at kafka.admin.ConfigCommand.main(ConfigCommand.scala)
Caused by: org.apache.kafka.common.errors.UnsupportedVersionException: The broker does not support DESCRIBE_CONFIGS
Where am I wrong?
Another way that does work on my Kafka is:
kafka-topics.sh --zookeeper zoo_server:2181 --describe
but I want to know if we can get the describe details with --bootstrap-server.
We also tried:
kafka-topics.sh --bootstrap-server=172.23.248.85:6667 --describe
Error while executing topic command : The broker does not support DESCRIBE_CONFIGS
[2021-09-02 09:35:34,054] ERROR org.apache.kafka.common.errors.UnsupportedVersionException: The broker does not support DESCRIBE_CONFIGS
(kafka.admin.TopicCommand$)
But, on the other hand, the following option does work:
kafka-topics.sh --bootstrap-server=172.23.248.85:6667 --list
So I do not understand why --list works but --describe doesn't.
--bootstrap-server with --describe uses the AdminClient class, which your brokers need to be a certain version to support, as the error says.
Related: KIP-133 (implemented in Kafka 0.11, according to the JIRA).
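Since the client tooling is 2.6, you can also ask the broker directly which API versions it supports; this confirms whether DescribeConfigs is available at all (a diagnostic sketch, not a fix):
kafka-broker-api-versions.sh --bootstrap-server 172.23.248.85:6667
# look for DescribeConfigs in the listing; if the broker predates KIP-133,
# it won't be offered and only the --zookeeper path will work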

kafka consumer group id does not work as expected

I am new to Apache Kafka. When I went through the quickstart instructions at http://kafka.apache.org/quickstart with the latest version, kafka_2.12-2.2.0, I hit a problem and can't figure it out by myself.
The issue is: on my laptop, I created 3 brokers to simulate a cluster situation.
Each broker has its own server property file. I made the changes below in each server property file and left the other values at their defaults.
broker.id=1 (server2: broker.id=2; server3: broker.id=3)
listeners=PLAINTEXT://127.0.0.1:9092 (server2: 127.0.0.1:9093; server3: 127.0.0.1:9094)
log.dirs=/tmp/kafka-logs (server2: /tmp/kafka-logs-2; server3: /tmp/kafka-logs-3)
num.partitions=3 (for all servers)
offsets.topic.replication.factor=3 (for all servers)
After I started ZK and those 3 brokers, I can create a topic 'TestTopic' with 3 partitions on any broker:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 3 --topic TestTopic
And then I use the commands below to start 3 consumers in the same group 'rickygroup'.
//consumer one
bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --from-beginning --topic TestTopic —group.id rickygroup —group.name rickygroup
//consumer two
bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9093 --from-beginning --topic TestTopic —group.id rickygroup —group.name rickygroup
//consumer three
bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9094 --from-beginning --topic TestTopic —group.id rickygroup —group.name rickygroup
Now, I use another terminal to publish some messages on topic 'TestTopic'. The issue is that all 3 of the above consumers get exactly the same messages, all of them. My understanding is that the 3 consumers should divide the messages among themselves instead of each receiving all of them; otherwise, the consumer group shows duplicated consumption instead of balanced consumption.
Is there any misunderstanding of the consumer group concept on my part, or did I do anything wrong here?
The console consumer uses --group (with two dashes), not -group.id and/or -group.name, which are not parsed options.
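For example, each of the three consumers would be started like this (same command per terminal, varying only the bootstrap port):
bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic TestTopic --group rickygroup
Note that --from-beginning only takes effect when the group has no committed offsets yet.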

Kafka Multi Node setup "Unreasonable length" in Zookeeper logs

I have set up a multi-node Kafka cluster; everything seems to work well and shows no error logs until I try to push a message through a producer. Then I get the message:
Bootstrap broker host2:2181 disconnected (org.apache.kafka.clients.NetworkClient)
and on the zookeeper logs i am getting:
"WARN Exception causing close of session 0x0 due to java.io.IOException:
Unreasonable length = 1701969920 (org.apache.zookeeper.server.NIOServerCnxn)"
I cleaned up my data directory, which is "/var/zookeeper/data"; still no luck.
Any help on this would be much appreciated.
Vaibhav, looking at this line (Bootstrap broker host2:2181), it looks like you are trying to connect to the ZooKeeper instance rather than a broker instance. By default the Kafka broker runs on port 9092, so the producer and consumer should be created as per the commands below.
Producer :
bin/kafka-console-producer.sh --broker-list host1:9092,host2:9092 \
--topic "topic_name"
Consumer:
bin/kafka-console-consumer.sh --bootstrap-server <broker_host_ip>:9092 \
--topic "topic_name" --from-beginning

Consumer not receiving messages, kafka console, new consumer api, Kafka 0.9

I am doing the Kafka Quickstart for Kafka 0.9.0.0.
I have zookeeper listening at localhost:2181 because I ran
bin/zookeeper-server-start.sh config/zookeeper.properties
I have a single broker listening at localhost:9092 because I ran
bin/kafka-server-start.sh config/server.properties
I have a producer posting to topic "test" because I ran
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
yello
is this thing on?
let's try another
gimme more
When I run the old API consumer, it works:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
However, when I run the new API consumer, I don't get anything:
bin/kafka-console-consumer.sh --new-consumer --topic test --from-beginning \
--bootstrap-server localhost:9092
Is it possible to subscribe to a topic from the console consumer using the new API? How can I fix this?
On my MAC box I was facing the same issue of the console consumer not consuming any messages when I used the command
kafka-console-consumer --bootstrap-server localhost:9095 --from-beginning --topic my-replicated-topic
But when I tried with
kafka-console-consumer --bootstrap-server localhost:9095 --from-beginning --topic my-replicated-topic --partition 0
it happily listed the messages sent. Is this a bug in Kafka 1.10.11?
I just ran into this issue and the solution was to delete /brokers in zookeeper and restart the kafka nodes.
bin/zookeeper-shell <zk-host>:2181
and then
rmr /brokers
Not sure why this solves it.
When I enabled debug logging, I saw this error message over and over again in the consumer:
2017-07-07 01:20:12 DEBUG AbstractCoordinator:548 - Sending GroupCoordinator request for group test to broker xx.xx.xx.xx:9092 (id: 1007 rack: null)
2017-07-07 01:20:12 DEBUG AbstractCoordinator:559 - Received GroupCoordinator response ClientResponse(receivedTimeMs=1499390412231, latencyMs=84, disconnected=false, requestHeader={api_key=10,api_version=0,correlation_id=13,client_id=consumer-1}, responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}) for group test
2017-07-07 01:20:12 DEBUG AbstractCoordinator:581 - Group coordinator lookup for group test failed: The group coordinator is not available.
2017-07-07 01:20:12 DEBUG AbstractCoordinator:215 - Coordinator discovery failed for group test, refreshing metadata
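Before removing /brokers wholesale, you can inspect what is registered there (same zookeeper-shell as above):
ls /brokers/ids      # broker ids ZooKeeper currently knows about
ls /brokers/topics   # topics ZooKeeper currently knows about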
For me the solution described in this thread worked: https://stackoverflow.com/a/51540528/7568227
Check that
offsets.topic.replication.factor
(or probably other config parameters related to replication)
is not higher than the number of brokers. That was the problem in my case.
There was no need to use --partition 0 anymore after this fix.
Otherwise, I recommend following the debugging procedure described in the mentioned thread.
In my case, this doesn't work
kafka-console-consumer --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
and this works
kafka-console-consumer --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic --partition 0
because the topic __consumer_offsets was located on the inaccessible broker. Basically, I had forgotten to replicate it. Relocating __consumer_offsets solved my issue, as sketched below.
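Relocation can be done with the partition reassignment tool. A minimal sketch for a single partition follows; note that __consumer_offsets has 50 partitions by default, so the real JSON needs an entry per partition, and the broker ids [1,2] are placeholders:
cat > reassign.json <<EOF
{"version":1,"partitions":[
  {"topic":"__consumer_offsets","partition":0,"replicas":[1,2]}
]}
EOF
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file reassign.json --execute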
Was getting the same issue on my Mac.
I checked the logs and found the following error.
Number of alive brokers '1' does not meet the required replication factor '3' for the offsets topic (configured via 'offsets.topic.replication.factor').
This error can be ignored if the cluster is starting up and not all brokers are up yet.
This can be fixed by changing the replication factor to 1. Add the following line in server.properties and restart Kafka/Zookeeper.
offsets.topic.replication.factor=1
I got the same problem, and now I have figured it out.
When you use --zookeeper, it is supposed to be given the ZooKeeper address as a parameter.
When you use --bootstrap-server, it is supposed to be given the broker address as a parameter.
Your localhost is the culprit here.
If you replace the word localhost with the actual hostname, it should work, like this:
producer
./bin/kafka-console-producer.sh --broker-list \
sandbox-hdp.hortonworks.com:9092 --topic test
consumer:
./bin/kafka-console-consumer.sh --topic test --from-beginning \
--bootstrap-server sandbox-hdp.hortonworks.com:9092
This problem also impacts ingesting data from Kafka using Flume and sinking the data to HDFS.
To fix the above issue:
Stop Kafka brokers
Connect to the ZooKeeper cluster and remove the /brokers znode
Restart kafka brokers
There is no issue with respect to the Kafka client version or the Scala version we are using in the cluster; ZooKeeper might simply have wrong information about the broker hosts.
To verify the action:
Create topic in kafka.
Open a producer channel and feed some messages to it.
$ kafka-console-producer --broker-list slavenode03.cdh.com:9092 --topic rkkrishnaa3210
Open a consumer channel to consume the message from a specific topic.
$ kafka-console-consumer --bootstrap-server slavenode01.cdh.com:9092 --topic rkkrishnaa3210 --from-beginning
To test this from flume:
Flume agent config:
rk.sources = source1
rk.channels = channel1
rk.sinks = sink1
rk.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
rk.sources.source1.zookeeperConnect = ip-20-0-21-161.ec2.internal:2181
rk.sources.source1.topic = rkkrishnaa321
rk.sources.source1.groupId = flume1
rk.sources.source1.channels = channel1
rk.sources.source1.interceptors = i1
rk.sources.source1.interceptors.i1.type = timestamp
rk.sources.source1.kafka.consumer.timeout.ms = 100
rk.channels.channel1.type = memory
rk.channels.channel1.capacity = 10000
rk.channels.channel1.transactionCapacity = 1000
rk.sinks.sink1.type = hdfs
rk.sinks.sink1.hdfs.path = /user/ce_rk/kafka/%{topic}/%y-%m-%d
rk.sinks.sink1.hdfs.rollInterval = 5
rk.sinks.sink1.hdfs.rollSize = 0
rk.sinks.sink1.hdfs.rollCount = 0
rk.sinks.sink1.hdfs.fileType = DataStream
rk.sinks.sink1.channel = channel1
Run flume agent:
flume-ng agent --conf . -f flume.conf -Dflume.root.logger=DEBUG,console -n rk
Observe from the consumer logs that the message from the topic is written to HDFS:
18/02/16 05:21:14 INFO internals.AbstractCoordinator: Successfully joined group flume1 with generation 1
18/02/16 05:21:14 INFO internals.ConsumerCoordinator: Setting newly assigned partitions [rkkrishnaa3210-0] for group flume1
18/02/16 05:21:14 INFO kafka.SourceRebalanceListener: topic rkkrishnaa3210 - partition 0 assigned.
18/02/16 05:21:14 INFO kafka.KafkaSource: Kafka source source1 started.
18/02/16 05:21:14 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: source1: Successfully registered new MBean.
18/02/16 05:21:14 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: source1 started
18/02/16 05:21:41 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/02/16 05:21:42 INFO hdfs.BucketWriter: Creating /user/ce_rk/kafka/rkkrishnaa3210/18-02-16/FlumeData.1518758501920.tmp
18/02/16 05:21:48 INFO hdfs.BucketWriter: Closing /user/ce_rk/kafka/rkkrishnaa3210/18-02-16/FlumeData.1518758501920.tmp
18/02/16 05:21:48 INFO hdfs.BucketWriter: Renaming /user/ce_rk/kafka/rkkrishnaa3210/18-02-16/FlumeData.1518758501920.tmp to /user/ce_rk/kafka/rkkrishnaa3210/18-02-16/FlumeData.1518758501920
18/02/16 05:21:48 INFO hdfs.HDFSEventSink: Writer callback called.
Use this:
$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
Note: Remove --new-consumer from your command
For reference see here: https://kafka.apache.org/quickstart
Can you please try like this:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic my-replicated-topic
In my case it didn't work using either approach. Then I also increased the log level to DEBUG in config/log4j.properties and started the console consumer:
./bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --from-beginning --topic MY_TOPIC
Then I got the log below:
[2018-03-11 12:11:25,711] DEBUG [MetadataCache brokerId=10] Error while fetching metadata for MY_TOPIC-3: leader not available (kafka.server.MetadataCache)
The point here is that I have two Kafka nodes but one is down, and for some reason, by default the kafka-console-consumer will not consume if some partition is unavailable because its node is down (partition 3 in this case). This doesn't happen in my application.
Possible solutions are:
Start up the brokers that are down
Delete the topic and create it again; that way, all partitions will be placed on the online broker node (sketched below)
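A sketch of the delete-and-recreate option (partition and replication counts are placeholders; adjust to your setup):
bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic MY_TOPIC
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic MY_TOPIC --partitions 3 --replication-factor 1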
Run the below command from bin:
./kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server localhost:9092
"test" is the topic name
I had this problem where the consumer finished executing immediately in kafka_2.12-2.3.0.tgz. I tried debugging, but no logs were printed.
It runs fine with kafka_2.12-2.2.2.
Also try running ZooKeeper and Kafka from the quickstart guide!
In my case, broker.id=1 in server.properties was the problem.
It should be broker.id=0 when you use only one Kafka server for development.
Don't forget to remove all logs and restart ZooKeeper and Kafka:
Remove /tmp/kafka-logs (defined in server.properties file)
Remove [your_kafka_home]/logs
Restart ZooKeeper and Kafka
In kafka_2.11-0.11.0.0 the --zookeeper option is deprecated; use --bootstrap-server instead, which takes the broker IP address and port. If you give the correct broker parameters, you will be able to consume messages.
e.g. $ bin/kafka-console-consumer.sh --bootstrap-server :9093 --topic test --from-beginning
I'm using port 9093; for you it may vary.
replication factor must be at least 3
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic test

Is there a way to delete all the data from a topic or delete the topic before every run?

Can I modify the KafkaConfig.scala file to change the logRetentionHours property? Is there a way the messages get deleted as soon as the consumer reads them?
I am using producers to fetch data from somewhere and send it to a particular topic, where a consumer consumes it. Can I delete all the data from that topic on every run? I want only new data in the topic every time. Is there a way to reinitialize the topic somehow?
As I mentioned here, Purge Kafka Queue:
Tested in Kafka 0.8.2 for the quick-start example: first, add one line to the server.properties file under the config folder:
delete.topic.enable=true
then, you can run this command:
bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test
I don't think it is supported yet. Take a look at the JIRA issue "Add delete topic support".
To delete manually:
Shut down the cluster
Clean the Kafka log dir (specified by the log.dir attribute in the Kafka config file) as well as the ZooKeeper data
Restart the cluster
For any given topic, what you can do is:
Stop Kafka
Clean the Kafka log specific to the partition; Kafka stores its log files in the format "logDir/topic-partition", so for a topic named "MyTopic" the log for partition id 0 will be stored in /tmp/kafka-logs/MyTopic-0, where /tmp/kafka-logs is specified by the log.dir attribute
Restart Kafka
This is NOT a good or recommended approach, but it should work.
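As a sketch of those three steps, using the MyTopic example and the default log dir from above:
# stop kafka first
rm -rf /tmp/kafka-logs/MyTopic-*   # one directory per partition: MyTopic-0, MyTopic-1, ...
# then restart kafka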
In the Kafka broker config file, the log.retention.hours.per.topic attribute defines the number of hours to keep a log file before deleting it for a specific topic.
Also, is there a way the messages get deleted as soon as the consumer reads them?
From the Kafka documentation:
The Kafka cluster retains all published messages—whether or not they have been consumed—for a configurable period of time. For example if the log retention is set to two days, then for the two days after a message is published it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so retaining lots of data is not a problem.
In fact the only metadata retained on a per-consumer basis is the position of the consumer in the log, called the "offset". This offset is controlled by the consumer: normally a consumer will advance its offset linearly as it reads messages, but in fact the position is controlled by the consumer and it can consume messages in any order it likes. For example a consumer can reset to an older offset to reprocess.
For finding the start offset to read, the Kafka 0.8 Simple Consumer example says:
Kafka includes two constants to help, kafka.api.OffsetRequest.EarliestTime() finds the beginning of the data in the logs and starts streaming from there, kafka.api.OffsetRequest.LatestTime() will only stream new messages.
You can also find the example code there for managing the offset on the consumer end:
public static long getLastOffset(SimpleConsumer consumer, String topic, int partition,
                                 long whichTime, String clientName) {
    TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition);
    Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo =
            new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
    // Request a single offset for the given time (EarliestTime() or LatestTime())
    requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1));
    kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(
            requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientName);
    OffsetResponse response = consumer.getOffsetsBefore(request);
    if (response.hasError()) {
        System.out.println("Error fetching offset data from the broker. Reason: "
                + response.errorCode(topic, partition));
        return 0;
    }
    long[] offsets = response.offsets(topic, partition);
    return offsets[0];
}
Tested with Kafka 0.10:
1. Stop ZooKeeper & the Kafka server.
2. Then go to the 'kafka-logs' folder; there you will see a list of Kafka topic folders. Delete the folder with the topic name.
3. Go to the 'zookeeper-data' folder and delete the data inside it.
4. Start ZooKeeper & the Kafka server again.
Note: if you delete the topic folder(s) inside kafka-logs but not the data in the zookeeper-data folder, you will see that the topics are still there.
Below are scripts for emptying and deleting a Kafka topic, assuming localhost as the ZooKeeper server and Kafka_Home set to the install directory.
The script below will empty a topic by setting its retention time to 1 second and then removing that configuration:
#!/bin/bash
echo "Enter name of topic to empty:"
read topicName
/$Kafka_Home/bin/kafka-configs --zookeeper localhost:2181 --alter --entity-type topics --entity-name $topicName --add-config retention.ms=1000
sleep 5
/$Kafka_Home/bin/kafka-configs --zookeeper localhost:2181 --alter --entity-type topics --entity-name $topicName --delete-config retention.ms
To fully delete topics you must stop any applicable Kafka broker(s), remove their directories from the Kafka log dir (default: /tmp/kafka-logs), and then run this script to remove the topic from ZooKeeper. To verify it has been deleted from ZooKeeper, the output of ls /brokers/topics should no longer include the topic:
#!/bin/bash
echo "Enter name of topic to delete from zookeeper:"
read topicName
/$Kafka_Home/bin/zookeeper-shell localhost:2181 <<EOF
rmr /brokers/topics/$topicName
ls /brokers/topics
quit
EOF
As a dirty workaround, you can adjust per-topic runtime retention settings, e.g. bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my_topic --config retention.bytes=1 (retention.bytes=0 might also work)
After a short while kafka should free the space. Not sure if this has any implications compared to re-creating the topic.
PS: Better bring the retention settings back once Kafka is done with the cleaning.
You can also use retention.ms to control how long historical data is kept.
We tried pretty much what the other answers describe, with a moderate level of success.
What really worked for us (Apache Kafka 0.8.1) is the class command:
sh kafka-run-class.sh kafka.admin.DeleteTopicCommand --topic yourtopic --zookeeper localhost:2181
For brew users
If you're using brew like me and wasted a lot of time searching for the infamous kafka-logs folder, fear no more. (And please do let me know if this works for you across different versions of Homebrew, Kafka, etc. :) )
You're probably going to find it under:
Location:
/usr/local/var/lib/kafka-logs
How to actually find that path
(this is also helpful for basically every app you install through brew)
1) brew services list
kafka started matbhz
/Users/matbhz/Library/LaunchAgents/homebrew.mxcl.kafka.plist
2) Open and read that plist you found above
3) Find the line defining the server.properties location and open it; in my case:
/usr/local/etc/kafka/server.properties
4) Look for the log.dirs line:
log.dirs=/usr/local/var/lib/kafka-logs
5) Go to that location and delete the logs for the topics you wish to remove
6) Restart Kafka with brew services restart kafka
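If your layout matches the above, steps 2-4 collapse into a one-liner (a sketch; Homebrew paths vary by version and platform):
grep '^log.dirs' "$(brew --prefix)/etc/kafka/server.properties"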
All data about topics and their partitions is stored in /tmp/kafka-logs/. Moreover, it is stored in the format topic-partitionNumber, so if you want to delete a topic newTopic, you can:
stop kafka
delete the files: rm -rf /tmp/kafka-logs/newTopic-*
As of Kafka version 2.3.0, there is an alternate way to soft-delete data from Kafka (the old approaches are deprecated).
Update retention.ms to 1 sec (1000 ms), then set it back after a minute to the default setting, i.e. 7 days (168 hours, 604,800,000 ms).
Soft deletion: (retention.ms=1000) (using kafka-configs.sh)
bin/kafka-configs.sh --zookeeper 192.168.1.10:2181 --alter --entity-name kafka_topic3p3r --entity-type topics --add-config retention.ms=1000
Completed Updating config for entity: topic 'kafka_topic3p3r'.
Setting back to the default: 7 days (168 hours, retention.ms=604800000)
bin/kafka-configs.sh --zookeeper 192.168.1.10:2181 --alter --entity-name kafka_topic3p3r --entity-type topics --add-config retention.ms=604800000
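You can confirm the override took effect (and, after the second command, that it was reset) by describing the topic config:
bin/kafka-configs.sh --zookeeper 192.168.1.10:2181 --describe --entity-type topics --entity-name kafka_topic3p3r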
The simplest way, without restarting servers (I use this with AWS MSK seamlessly):
cd kafka_2.12-2.6.2/bin
Topic Deletion:
Please replace $topic_name:
./kafka-topics.sh \
--bootstrap-server $kafka_bootstrap_servers \
--command-config client.properties \
--delete \
--topic $topic_name
Here is the client.properties file:
kafka_2.12-2.6.2/bin/client.properties
ssl.truststore.location=/usr/lib/jvm/java-11-openjdk-amd64/lib/security/cacerts
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
max.request.size=104857600
Topic Data Deletion:
Option A:
./kafka-delete-records.sh \
--bootstrap-server $kafka_bootstrap_servers \
--command-config client.properties \
--offset-json-file ./delete-records.json
This is the cleanest way to delete the data immediately, rather than waiting for Kafka to do it as a background job. But there is a one-time extra effort of specifying all the partitions of the topic in the delete JSON file (a sketch for counting them follows the JSON).
Here is the delete-records.json content:
{
  "partitions": [
    { "topic": "$topic_name", "partition": 0, "offset": -1 },
    { "topic": "$topic_name", "partition": 1, "offset": -1 },
    { "topic": "$topic_name", "partition": 2, "offset": -1 }
  ],
  "version": 1
}
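To know how many partition entries the JSON needs, you can count the topic's partitions first (a sketch using the same variables):
./kafka-topics.sh --bootstrap-server $kafka_bootstrap_servers --command-config client.properties --describe --topic $topic_name | grep -c 'Partition:'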
Option B:
Step1:
./kafka-configs.sh \
--bootstrap-server $kafka_bootstrap_servers \
--command-config client.properties \
--alter \
--entity-type topics \
--add-config retention.ms=1 \
--entity-name $topic_name
Now, wait a couple of minutes to let Kafka delete the data from the topic, then come back and revert to the default 7-day data retention.
Step2:
./kafka-configs.sh \
--bootstrap-server $kafka_bootstrap_servers \
--command-config client.properties \
--alter \
--entity-type topics \
--add-config retention.ms=604800000 \
--entity-name $topic_name
Stop ZooKeeper and Kafka.
In server.properties, change the log.retention.hours value. You can comment out log.retention.hours and add log.retention.ms=1000; this keeps records on the Kafka topic for only one second.
Start ZooKeeper and Kafka.
Check on the consumer console. When I opened the console for the first time, the record was there. But when I opened the console again, the record was removed.
Later on, you can set the value of log.retention.hours back to your desired figure.
I use the utility below to clean up after my integration test runs.
It uses the latest AdminZkClient API; the older API has been deprecated.
import java.util.Properties
import javax.inject.Inject
import kafka.zk.{AdminZkClient, KafkaZkClient}
import org.apache.kafka.common.utils.Time

class ZookeeperUtils @Inject()(config: AppConfig) {
  val testTopic = "users_1"
  val zkHost = config.KafkaConfig.zkHost
  val sessionTimeoutMs = 10 * 1000
  val connectionTimeoutMs = 60 * 1000
  val isSecure = false
  val maxInFlightRequests = 10
  val time: Time = Time.SYSTEM

  def cleanupTopic(config: AppConfig) = {
    val zkClient = KafkaZkClient.apply(zkHost, isSecure, sessionTimeoutMs, connectionTimeoutMs, maxInFlightRequests, time)
    val zkUtils = new AdminZkClient(zkClient)
    // Shorten the delete/retention intervals so the topic is purged quickly
    val pp = new Properties()
    pp.setProperty("delete.retention.ms", "10")
    pp.setProperty("file.delete.delay.ms", "1000")
    zkUtils.changeTopicConfig(testTopic, pp)
    // zkUtils.deleteTopic(testTopic)
    println("Waiting for topic to be purged. Then reset to retain records for the run")
    Thread.sleep(60000L)
    // Restore longer intervals so the run's records are retained
    val resetProps = new Properties()
    resetProps.setProperty("delete.retention.ms", "3000000")
    resetProps.setProperty("file.delete.delay.ms", "4000000")
    zkUtils.changeTopicConfig(testTopic, resetProps)
  }
}
There is a delete topic option, but it only marks the topic for deletion; ZooKeeper deletes the topic later. Since this can take unpredictably long, I prefer the retention.ms approach.
do:
cd /path/to/kafkaInstallation/kafka-server
bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic name_of_kafka_topic
then you can recreate it using:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic name_of_kafka_topic
For manually deleting a topic from a Kafka cluster, you might check this out: https://github.com/darrenfu/bigdata/issues/6
A vital step missed in most solutions is deleting /config/topics/<topic_name> in ZK.
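For example, from zookeeper-shell (substitute your topic name):
bin/zookeeper-shell.sh localhost:2181
rmr /config/topics/<topic_name>
rmr /brokers/topics/<topic_name>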
I use this script:
#!/bin/bash
topics=`kafka-topics --list --zookeeper zookeeper:2181`
for t in $topics; do
    for p in retention.ms retention.bytes segment.ms segment.bytes; do
        kafka-topics --zookeeper zookeeper:2181 --alter --topic $t --config ${p}=100
    done
done
sleep 60
for t in $topics; do
    for p in retention.ms retention.bytes segment.ms segment.bytes; do
        kafka-topics --zookeeper zookeeper:2181 --alter --topic $t --delete-config ${p}
    done
done
There are two solutions to clean up topic data:
1. Change the ZooKeeper dataDir path ("dataDir=/dataPath") to some other value, delete the Kafka logs folder, and restart the ZooKeeper and Kafka servers.
2. Run zkCleanup.sh from the ZooKeeper server.