Clean Kafka topic in a cluster

I know I can clean a Kafka topic on a broker either by deleting the logs under /data/kafka-logs/topic/* or by setting the retention.ms config to 1000. I want to know how to clean topics in a multi-node cluster. Should I stop the Kafka process on each broker, delete the logs, and restart Kafka, or would doing so on the leader broker alone suffice? And if I want to clean up by setting retention.ms to 1000, do I need to set it on each broker?

To delete all messages in a specific topic, you can run kafka-delete-records.sh.
For example, I have a topic called test, which has 4 partitions.
Create a JSON file, for example j.json:
{
  "partitions": [
    { "topic": "test", "partition": 0, "offset": -1 },
    { "topic": "test", "partition": 1, "offset": -1 },
    { "topic": "test", "partition": 2, "offset": -1 },
    { "topic": "test", "partition": 3, "offset": -1 }
  ],
  "version": 1
}
Now delete all the messages with this command (an offset of -1 truncates each partition to its current end, i.e. removes every record):
/opt/kafka/confluent-4.1.1/bin/kafka-delete-records --bootstrap-server 192.168.XX.XX:9092 --offset-json-file j.json
After executing the command, a confirmation like this is displayed:
Records delete operation completed:
partition: test-0 low_watermark: 7
partition: test-1 low_watermark: 7
partition: test-2 low_watermark: 7
partition: test-3 low_watermark: 7
If you want to delete an entire topic, you can use kafka-topics.
For example, to delete the test topic:
/opt/kafka/confluent-4.0.0/bin/kafka-topics --zookeeper 109.XXX.XX.XX:2181 --delete --topic test
You do not need to restart Kafka for either operation. Note also that retention.ms is a topic-level configuration: it is stored cluster-wide (in ZooKeeper), so you set it once for the topic rather than on each broker.
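For the retention-based cleanup from the question, a single kafka-configs call therefore covers the whole cluster. A minimal sketch, assuming a local ZooKeeper and the test topic from above:

# Lower retention to 1 second; the setting applies to every replica on every broker
./bin/kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics \
--entity-name test --add-config retention.ms=1000
# After the log cleaner has dropped the old segments, remove the override
./bin/kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics \
--entity-name test --delete-config retention.ms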

Related

Kafka console consumer to read avro messages in HDP 3

I am trying to consume Kafka Avro messages with the console consumer and am not sure how to deserialize them.
sh /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server localhost:6667 --topic test --consumer.config /home/user/kafka.consumer.properties --from-beginning --value-deserializer ByteArrayDeserializer
The Avro schema in the Schema Registry for the test topic is:
{
  "type": "record",
  "namespace": "test",
  "name": "TestRecord",
  "fields": [
    { "name": "Name", "type": "string", "default": "null" },
    { "name": "Age", "type": "int", "default": -1 }
  ]
}
We are using HDP 3.1 and kafka-clients 2.0.0.3.1.0.0-78.
Could someone tell me which deserializer is required to read Avro messages from the console?
Use kafka-avro-console-consumer
e.g.
sh /usr/hdp/current/kafka-broker/bin/kafka-avro-console-consumer.sh \
--bootstrap-server localhost:6667 \
--topic test \
--from-beginning \
--property schema.registry.url=http://localhost:8081
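If your distribution does not ship that wrapper script, the same result can usually be obtained with the plain console consumer plus Confluent's Avro message formatter. This is a sketch that assumes the Schema Registry client jars are on the broker's classpath:

sh /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh \
--bootstrap-server localhost:6667 \
--topic test \
--from-beginning \
--formatter io.confluent.kafka.formatter.AvroMessageFormatter \
--property schema.registry.url=http://localhost:8081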

Kafka rebalance the data in a topic due to slow(er) consumer

For example, say I have a topic with 4 partitions. I send 4k messages to this topic, and each partition gets 1k messages. Due to outside factors, 3 of the consumers process all 1k of their respective messages, but the 4th partition only got through 200 messages, leaving 800 left to process. Is there a mechanism that lets me "rebalance" the data in the topic, say, giving partitions 1-3 200 messages each of partition 4's data, so that all partitions have 200 messages apiece to process?
I am not looking for a way of adding additional nodes to the consumer group and having Kafka balance the partitions.
Edit: added the output from kafka-reassign-partitions:
Current partition replica assignment
{
  "version": 1,
  "partitions": [
    { "topic": "MyTopic", "partition": 0, "replicas": [0], "log_dirs": ["any"] },
    { "topic": "MyTopic", "partition": 1, "replicas": [0], "log_dirs": ["any"] },
    { "topic": "MyTopic", "partition": 4, "replicas": [0], "log_dirs": ["any"] },
    { "topic": "MyTopic", "partition": 3, "replicas": [0], "log_dirs": ["any"] },
    { "topic": "MyTopic", "partition": 2, "replicas": [0], "log_dirs": ["any"] },
    { "topic": "MyTopic", "partition": 5, "replicas": [0], "log_dirs": ["any"] }
  ]
}
Proposed partition reassignment configuration
{
  "version": 1,
  "partitions": [
    { "topic": "MyTopic", "partition": 3, "replicas": [0], "log_dirs": ["any"] },
    { "topic": "MyTopic", "partition": 0, "replicas": [0], "log_dirs": ["any"] },
    { "topic": "MyTopic", "partition": 5, "replicas": [0], "log_dirs": ["any"] },
    { "topic": "MyTopic", "partition": 2, "replicas": [0], "log_dirs": ["any"] },
    { "topic": "MyTopic", "partition": 4, "replicas": [0], "log_dirs": ["any"] },
    { "topic": "MyTopic", "partition": 1, "replicas": [0], "log_dirs": ["any"] }
  ]
}
The partition is assigned when a message is produced; messages are never automatically moved between partitions. In general, each partition can have multiple consumers (with different consumer group ids) consuming at different paces, so the broker can't move messages between partitions based on the slowness of one consumer group. There are a few things you can try, though:
more partitions, hoping for a fairer distribution of load (you can have more partitions than consumers); see the sketch after this list
have producers explicitly set the partition on each message, to produce a distribution among partitions that the consumers can better cope with
have consumers monitor their lag and actively unsubscribe from partitions when they fall behind, so as to let other consumers pick up the load
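As a sketch of the first option: adding partitions is a one-line admin operation. The topic name and ZooKeeper address below are placeholders, and note that adding partitions changes the key-to-partition mapping for keyed messages:

# Grow MyTopic from 6 to 12 partitions; existing messages stay where they are
./bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic MyTopic --partitions 12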
A couple of things you can do to improve performance:
Increase the number of partitions.
Increase the number of consumers in the consumer group that is consuming the partitions.
The first spreads the load across more partitions, and the second increases the parallelism with which the partitions are consumed, so messages are processed more quickly.
I hope this helps. You can refer to this link for more background:
https://xyu.io/2016/02/29/balancing-kafka-on-jbod/
Kafka consumers are part of consumer groups. A group has one or more consumers in it. Each partition gets assigned to one consumer.
If you have more consumers than partitions, then some of your consumers will be idle. If you have more partitions than consumers, more than one partition may get assigned to a single consumer.
Whenever a new consumer joins, a rebalance gets initiated and the new consumer is assigned some partitions previously assigned to other consumers.
For example, if there are 20 partitions all being consumed by one consumer, and another consumer joins, there'll be a rebalance.
During a rebalance, the consumer group briefly "pauses".
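To see this assignment, and how far behind each consumer is, the stock tooling can report per-partition lag; the group name below is a placeholder:

# Shows, per partition: current offset, log-end offset, lag, and the assigned consumer
./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group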

Kafka consumer not able to consume messages using bootstrap server name

I am facing an issue consuming messages using the bootstrap server, i.e. the Kafka broker. Any idea why it is not able to consume messages without ZooKeeper?
Kafka Version: kafka_2.11-1.0.0
Zookeeper Version: kafka_2.11-1.0.0
Zookeeper Host and port: zkp02.mp.com:2181
Kafka Host and port: kfk03.mp.com:9092
Producing some messages:
[kfk03.mp.com ~]$ /bnsf/kafka/bin/kafka-console-producer.sh --broker-list kfk03.mp.com:9092 --topic test
>hi
>hi
The consumer is not able to consume messages if I give --bootstrap-server:
[kfk03.mp.com ~]$
/bnsf/kafka/bin/kafka-console-consumer.sh --bootstrap-server kfk03.mp.com:9092 --topic test --from-beginning
The consumer is able to consume messages when --zookeeper is given instead of --bootstrap-server:
[kfk03.mp.com ~]$ /bnsf/kafka/bin/kafka-console-consumer.sh --zookeeper zkp02.mp.com:2181 --topic test --from-beginning
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
hi
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
{"properties": {"messageType": "test", "sentDateTime": "2018-02-25T21:46:00.000+0000"}, "name": "Uttam Anand", "age": 29}
hi
hi
uttam
hi
hi
hi
hello
hi
^CProcessed a total of 17 messages
When consuming messages from Kafka with the bootstrap-server parameter, the connection goes through the Kafka broker instead of ZooKeeper, and the broker stores consumer offsets in the __consumer_offsets topic.
Check whether __consumer_offsets is present in your topic list. If it's not there, check the Kafka logs to find the reason.
We faced a similar issue. In our case __consumer_offsets had not been created because of the following error:
ERROR [KafkaApi-1001] Number of alive brokers '1' does not meet the required replication factor '3' for the offsets topic (configured via 'offsets.topic.replication.factor').
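A quick way to check, and the usual fix for a single-broker setup (host names taken from the question):

# Does the offsets topic exist?
/bnsf/kafka/bin/kafka-topics.sh --zookeeper zkp02.mp.com:2181 --list | grep __consumer_offsets
# If it was never created because of the error above, set this in server.properties
# (it only takes effect before the topic is first created):
# offsets.topic.replication.factor=1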

How do I delete/clean Kafka queued messages without deleting Topic

Is there any way to delete queued messages without deleting the Kafka topic?
I want to delete the queued messages when the consumer is activated.
I know there are several ways, like:
Resetting retention time
$ ./bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic MyTopic --config retention.ms=1000
Deleting the Kafka log files
$ rm -rf /data/kafka-logs/<topic/Partition_name>
In 0.11 or higher you can run the bin/kafka-delete-records.sh command to mark messages for deletion.
https://github.com/apache/kafka/blob/trunk/bin/kafka-delete-records.sh
For example, publish 100 messages
seq 100 | ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic mytest
then delete 90 of those 100 messages with the new kafka-delete-records.sh command-line tool
./bin/kafka-delete-records.sh --bootstrap-server localhost:9092 --offset-json-file ./offsetfile.json
where offsetfile.json contains
{"partitions": [{"topic": "mytest", "partition": 0, "offset": 90}], "version":1 }
and then consume the messages from the beginning to verify that 90 of the 100 messages are indeed marked as deleted.
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytest --from-beginning
91
92
93
94
95
96
97
98
99
100
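To confirm the new earliest offset without consuming, GetOffsetShell can report it; a sketch, where --time -2 requests the earliest available offset per partition:

./bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic mytest --time -2
# expected output of the form: mytest:0:90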

Increasing Replication Factor in Kafka gives error - "There is an existing assignment running"

I am trying to increase the replication factor of a topic in Apache Kafka. In order to do so I am using the command
kafka-reassign-partitions --zookeeper ${zookeeperid} --reassignment-json-file ${aFile} --execute
Initially my topic has a replication factor of 1 and 5 partitions; I am trying to increase its replication factor to 3. There are quite a few messages in the topic. When I run the above command, the error is: "There is an existing assignment running".
My JSON file looks like this:
{
  "version": 1,
  "partitions": [
    { "topic": "IncreaseReplicationTopic", "partition": 0, "replicas": [2,4,0] },
    { "topic": "IncreaseReplicationTopic", "partition": 1, "replicas": [3,2,1] },
    { "topic": "IncreaseReplicationTopic", "partition": 2, "replicas": [4,1,0] },
    { "topic": "IncreaseReplicationTopic", "partition": 3, "replicas": [0,1,3] },
    { "topic": "IncreaseReplicationTopic", "partition": 4, "replicas": [1,4,2] }
  ]
}
I am not able to figure out where I am going wrong. Any pointers will be greatly appreciated.
This message means that another reassignment (of this or any other topic) is still being executed; only one reassignment can run in the cluster at a time.
Try again once the current one has finished, and you won't see this message.
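You can check whether the earlier reassignment has completed using the --verify switch with the JSON file that started that reassignment (shown here with the question's own variables):

kafka-reassign-partitions --zookeeper ${zookeeperid} --reassignment-json-file ${aFile} --verify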