Change kafka topic retention from infinite to finite

I have a Kafka topic configured with cleanup.policy=compact and retention.ms=-1, i.e. infinite retention.
What I want to do is change that: I want the events to be compacted from time to time. When I create a new topic with the following configuration, it works fine:
cleanup.policy=compact
retention.ms=5000
min.cleanable.dirty.ratio=0
segment.ms=1000
When I produce events to such a topic, it is compacted as expected.
The problem is when I have a topic with retention set to infinite (-1) and change its config with kafka-configs --alter. Using kafka-topics --describe I can verify that the changes were successful, but the topic is not compacted. How can I "trigger" those config changes? I also tried restarting the broker and ZooKeeper, but that didn't do the trick.
I'm using Kafka 2.7.0.
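Roughly the alter I'm running, with my-topic and localhost:9092 as placeholders, followed by the describe I use to check the result:
kafka-configs --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter \
  --add-config cleanup.policy=compact,retention.ms=5000,min.cleanable.dirty.ratio=0,segment.ms=1000
kafka-configs --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --describe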

Related

Kafka topics not created empty

I have a Kafka cluster consisting of 3 servers, all connected through ZooKeeper. But when I delete a topic that has some information and create a topic again with the same name, the offset does not start from zero.
I tried restarting both Kafka and Zookeeper and deleting the topics directly from Zookeeper.
What I expect is to have a clean topic when I create it again.
I found the problem: a consumer was consuming from the topic, so the topic was never actually deleted. I used this tool to get a GUI that let me see the topics easily: https://github.com/tchiotludo/kafkahq. Anyway, the consumers can be seen by running this:
bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
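If a group shows up there, describing it will show which members are still attached and which partitions they hold (my-group is a placeholder):
bin/kafka-consumer-groups.sh --describe --group my-group --bootstrap-server localhost:9092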

Kafka Consumer does not receive data when one of the brokers is down

Kafka Quickstart
Using Kafka v2.1.0 on RHEL v6.9
Consumer fails to receive data when one of the Kafka brokers is down.
Steps performed:
1. Start zookeeper
2. Start Kafka-Server0 (localhost:9092, kafkalogs1)
3. Start Kafka-Server1 (localhost:9094, kafkalog2)
4. Create topic "test1", num of partitions = 1, replication factor = 2
5. Run producer for topic "test1"
6. Run consumer
7. Send messages from the producer
8. Receive messages on the consumer side.
All the above steps worked without any issues.
When I shutdown Kafka-Server0, the consumer stops getting data from Producer.
When I bring back up Kafka-Server0, the consumer starts to get messages from where it left off.
These are the commands used
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test1
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test1
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic test1
The behavior is the same (no message received on the consumer side) when I run the consumer with two servers specified in the --bootstrap-server option.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9094 --topic test1
Any idea why the consumer stops getting messages when server0 is down even though the replication factor for the topic test1 was set to 2?
There is a similar question already, but it was not answered completely:
Kafka 0.10 quickstart: consumer fails when "primary" broker is brought down
If the offsets topic is unavailable, you cannot consume.
Look for these in the server.properties file, see the comment in the snippet below, and increase them accordingly (this only applies if the topic doesn't already exist):
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability, such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
According to your previous question, it looks like the topic only has one replica.
See how you can increase the replication factor for an existing topic.
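A sketch of doing that with the partition reassignment tool, assuming broker ids 0 and 1 and showing only partition 0 (__consumer_offsets has 50 partitions by default, so every partition needs an entry in the JSON):
cat > increase-replication.json <<'EOF'
{"version":1,"partitions":[{"topic":"__consumer_offsets","partition":0,"replicas":[0,1]}]}
EOF
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-replication.json --execute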
In the initial versions of Kafka, offsets were managed in ZooKeeper, but Kafka has continuously evolved over time, introducing a lot of new features. Now Kafka manages offsets in an internal topic, __consumer_offsets.
Think of a scenario where you created a topic with a replication factor of 1. If the broker goes down, the data exists only on that Kafka node, which is down, so you can't get the data. The same applies to the __consumer_offsets topic.
You need to revisit server.properties in order to get the behavior you are expecting. But if you still want to consume the messages from the replica partition, you may need to restart the console consumer with --from-beginning.
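To check what the offsets topic currently looks like (ZooKeeper assumed on localhost:2181), something like:
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic __consumer_offsets
will show the replication factor and ISR for each of its partitions.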

Checking Offset of Kafka topic for a storm consumer

I am using storm-kafka-client 1.2.1 and creating my spout config for KafkaTridentSpoutOpaque as below:
kafkaSpoutConfig = KafkaSpoutConfig.builder(brokerURL, kafkaTopic)
        .setProp(ConsumerConfig.GROUP_ID_CONFIG, "storm-kafka-group")
        .setProcessingGuarantee(ProcessingGuarantee.AT_MOST_ONCE)
        .setProp(ConsumerConfig.CLIENT_ID_CONFIG, InetAddress.getLocalHost().getHostName())
        .build();
I am unable to find either my group id or the offset in Kafka or ZooKeeper. In ZooKeeper I tried zkCli.sh and ls /consumers, but there was nothing there, which I think is because Kafka itself now maintains offsets rather than ZooKeeper.
I also tried Kafka with the command below:
bin/kafka-run-class.sh kafka.admin.ConsumerGroupCommand --list --bootstrap-server localhost:9092
Note: This will not show information about old Zookeeper-based consumers.
console-consumer-20130
console-consumer-82696
console-consumer-6106
console-consumer-67393
console-consumer-14333
console-consumer-21174
console-consumer-64550
Can someone help me find my offset, and will my events be replayed from Kafka if I restart the topology?
Trident doesn't store offsets in Kafka, but in Storm's Zookeeper. If you're running with default settings for Storm's Zookeeper config the path in Storm's Zookeeper will be something like /coordinator/<your-topology-id>/meta.
The objects below that path will contain the first and last offset, as well as topic partition for each batch. So e.g. /coordinator/<your-topology-id>/meta/15 would contain the first and last offset emitted in batch number 15.
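A quick way to inspect those nodes with the ZooKeeper CLI, assuming Storm's ZooKeeper is on localhost:2181 and using batch 15 as an example:
bin/zkCli.sh -server localhost:2181
ls /coordinator/<your-topology-id>/meta
get /coordinator/<your-topology-id>/meta/15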
Whether the spout replays offsets after restart is controlled by the FirstPollOffsetStrategy you set in the KafkaSpoutConfig. The default is UNCOMMITTED_EARLIEST, which does not start over on restart. See the Javadoc at https://github.com/apache/storm/blob/v1.2.1/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L126.

Kafka topic reappears after 10 sec of deletion

I'm facing an issue with Kafka topic deletion.
I'm using the Kafka REST APIs to create/delete topics and to produce and consume messages. When I delete a topic it is deleted, but after some time, say 10 seconds, the topic reappears.
I have checked the consumer group offsets and the LAG is listed as negative.
docker run --net=host --rm confluentinc/cp-kafka:3.1.0 kafka-consumer-groups --zookeeper localhost:2181 --describe --group grp1
GROUP  TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  OWNER
grp1   topic1  0          3               0               -3   none
It seems deleting a Kafka topic still has some bugs.
The only way to delete a topic permanently is as follows:
stop the brokers
remove the directories on disk
sudo rm -rf kafka_data_dir/topic_name
remove the topic from zookeeper:
bin/zkCli.sh - to start zookeeper shell
rmr /config/topics/topic_name
rmr /brokers/topics/topic_name
rmr /admin/delete_topics/topic_name
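After the brokers are started again, the topic should no longer be listed (ZooKeeper assumed on localhost:2181):
bin/kafka-topics.sh --list --zookeeper localhost:2181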
You might have enabled the auto.create.topics.enable property, which automatically creates a topic whenever a producer or consumer issues a send / subscribe / assign request to the Kafka broker for it.
Once you deleted the topic, either a consumer or a producer issued such a request to the broker, which in turn created a new topic with the same name.
You can disable the property (default: true) and re-test your setup, creating the topic explicitly with AdminUtils instead. Topic deletion is much improved in the latest versions of Kafka.
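A minimal sketch of turning auto-creation off, assuming the broker config file is at config/server.properties (edit the existing line instead if the property is already set there), then restart the broker:
echo "auto.create.topics.enable=false" >> config/server.properties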

Kafka topic is marked for deletion but not getting deleted in kafka 0.9

I am trying to delete my Kafka topic with the following command.
bin/kafka-topics.sh --zookeeper <zkserver>:2181 --delete --topic test1
My Kafka version is 0.9 and I have also set the delete.topic.enable flag to true. Still, when I run the above command, my topic is only marked for deletion and not actually deleted.
A logical topic is composed of multiple partitions, and each partition may have multiple replicas. In other words, your topic is physically distributed across multiple broker instances.
If any of those instances is down, your topic deletion will not be able to finish.
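One way to check whether all replicas of the topic are reachable before retrying the delete, using the same ZooKeeper address as in the question:
bin/kafka-topics.sh --describe --zookeeper <zkserver>:2181 --topic test1
If the Isr column is missing some of the broker ids listed under Replicas, a broker holding one of the replicas is down.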
There was an orphan producer process running against that topic, spawned by my Java Kafka producer program. I eventually discovered it when I started a console consumer on the same topic. After manually killing that process I was able to delete the topic.
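For reference, the console consumer invocation was roughly this (Kafka 0.9's console consumer takes the ZooKeeper address):
bin/kafka-console-consumer.sh --zookeeper <zkserver>:2181 --topic test1 --from-beginning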