Kafka topics not created empty

I have a Kafka cluster consisting of 3 servers, all connected through Zookeeper. But when I delete a topic that has some data and create a topic with the same name again, the offset does not start from zero.
I tried restarting both Kafka and Zookeeper and deleting the topics directly from Zookeeper.
What I expect is to have a clean topic when I create it again.

I found the problem. A consumer was still consuming from the topic, so the topic was never actually deleted. I used this tool, which gives a GUI that let me inspect the topics easily: https://github.com/tchiotludo/kafkahq. In any case, the consumers can be seen by running:
bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
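For completeness, a hedged cleanup sequence (the group and topic names are placeholders, and it assumes a Kafka version where kafka-topics.sh accepts --bootstrap-server):
# Check whether any group still has members attached to the topic
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group <GROUP>
# Once the consumers are stopped, delete the topic and verify it is gone
bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic <TOPIC>
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list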

Related

MM2.0 consumer group behavior

I'm trying to run some tests to understand MM2 behavior. As part of that I had the following questions:
How to correctly pass a custom consumer group for MM2 in mm2.properties?
Based on this question, I tried passing <alias>.group.id=temp_cons_group in mm2.properties, and on restarting the MM2 instance I could see the consumer group mentioned in the MM2 logs.
However, when I list the consumer groups registered on the source broker, the group doesn't show up.
How to test if the property <alias>.consumer.auto.offset.reset works?
Here, I want to consume the same messages again, so in reference to the question I tried setting <source_alias>.consumer.auto.offset.reset to earliest and restarted MM2.
I was able to see the property set correctly in MM2 logs but did not get the messages from the beginning in the target cluster topic.
How do I start a MM2 instance to start consuming messages from a specific offset for a topic present in the source cluster?
MirrorMaker does not use a consumer group to run and instead uses the assign() API, so it's expected that you don't see a group.
It's hard to "test". One way to verify this configuration was picked up is to check that it's present in the logs when MirrorMaker starts its consumers.
This is currently not trivial to do. There's a KIP in progress to improve the process but at the moment it requires manually updating the internal offset topic from your Connect instance. At a very high level, here's the process:
First, ensure MirrorMaker is not running. Then you need to find the offset records for MirrorMaker in the offsets topic using a command like:
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic <CONNECT_OFFSET_TOPIC> \
--from-beginning \
--property print.key=true | grep <SOURCE_CONNECTOR_NAME>
You will see records with offsets for each partition MirrorMaker handles. To update the offsets, you need to produce new records to this topic with the offsets you want. For each partition, ensure your record has the same key as the existing message so it replaces the existing stored offsets.
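As a rough sketch of that last step (the topic name is the same placeholder as above, and the exact key and value layout must be copied from the records you read in the previous step), the console producer can write such a keyed record:
# parse.key makes the console producer split each input line into key and value
# at the chosen separator character
./bin/kafka-console-producer.sh --bootstrap-server localhost:9092 \
  --topic <CONNECT_OFFSET_TOPIC> \
  --property parse.key=true \
  --property key.separator=@
# Then enter a line of the form <existing key>@<new value>, reusing the exact
# key printed earlier and a value shaped like the existing one, e.g. {"offset": 100}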

How to check that Kafka does rebalance?

I'm writing a Go service that works with Kafka. I have problems with bad commits when the broker rebalances. I want to run an experiment that forces Kafka to rebalance and see how the service behaves.
What I do:
running Kafka in Docker locally (broker, zookeeper, schema registry and control center)
created a topic with 2 partitions
running producer that sends messages to both partitions
Then I run two consumers with the same groupID, and after that I close one. It seems to me that the broker should start rebalancing at this moment. Or not? Whose logs should I check for it?
You can check that by running the following commands:
bin/kafka-consumer-groups --bootstrap-server host:9092 --list
and to describe:
bin/kafka-consumer-groups --bootstrap-server host:9092 --describe --group foo
Full documentation can be found here: Kafka consumer group
Whose logs should I check for it?
Check the running consumer's logs, assuming the library you're using actually logs such information.
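From the broker side, one hedged way to watch the rebalance happen (the group name foo and the two-partition topic are taken from the question) is to describe the group before and after stopping one consumer:
# With both consumers up, each member owns one of the two partitions
bin/kafka-consumer-groups --bootstrap-server host:9092 --describe --group foo
# Stop one consumer, wait a few seconds, then describe again: both partitions
# should now show the single remaining member in the CONSUMER-ID column
bin/kafka-consumer-groups --bootstrap-server host:9092 --describe --group foo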

Best way to run multiple Kafka console consumers?

I write consumed Kafka messages to a file (backup.log). To do this, I created a service on my CentOS host which runs: kafka-console-consumer.sh --bootstrap-server kafka1:9092 --topic test --consumer.config consumer.properties >> /backup/backup.log
I have two Linux hosts.
Each Linux host runs one Kafka consumer, and each consumer belongs to its own consumer group. So I have two consumer groups.
This way, if one consumer is down, the other consumer still has all the messages, as it belongs to another consumer group.
Currently, I have only one topic with four partitions.
Question: I have to consume multiple topics. I read that it's recommended to add consumers (parallelize) instead of using only one consumer for too many topics/partitions.
-> What is the best way to achieve this? Duplicate the service that runs kafka-console-consumer.sh? Or is there another way?
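To illustrate the "duplicate the service" option from the question (the second topic name and the log paths are placeholders), that would mean one console consumer process per topic:
kafka-console-consumer.sh --bootstrap-server kafka1:9092 --topic test --consumer.config consumer.properties >> /backup/backup-test.log
kafka-console-consumer.sh --bootstrap-server kafka1:9092 --topic other --consumer.config consumer.properties >> /backup/backup-other.log
Alternatively, a single console consumer process can subscribe to several topics at once with a regex via --whitelist 'test|other' (renamed --include in newer Kafka versions).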

Kafka topic creation on a specific broker

Regarding Kafka topic creation: my understanding is that a Kafka cluster can have several brokers/nodes/servers, and each broker can host one or more topics. A created topic can live on one or more brokers, depending on the partitions specified during topic creation. Is there any way to tell on which broker(s) a topic and its partitions should be created?
Regards,
Lokesh
When creating a topic, you can either just specify the number of partitions and replicas and let Kafka distribute them, or you can directly specify the assignment - which partition and replica goes where.
If you are using the kafka-topics.sh script, which is part of Kafka, you can use the --replica-assignment option for this. For example:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic topic1 --replica-assignment 0:1:2,0:1:2,0:1:2
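To unpack that example (broker IDs 0, 1 and 2 are assumed to exist): each comma-separated entry describes one partition, and each colon-separated list names the brokers holding that partition's replicas, with the first ID being the preferred leader. So a variant pinning a two-partition topic to brokers 1 and 2 only could look like:
# partition 0 -> replicas on brokers 1,2 (preferred leader 1)
# partition 1 -> replicas on brokers 2,1 (preferred leader 2)
bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic topic2 --replica-assignment 1:2,2:1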
If the topic already exists, you can use the kafka-reassign-partitions.sh tool to change the assignment.
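As a rough sketch of that (the file name reassign.json and the broker IDs are assumptions), the tool takes a JSON file describing the desired replica placement:
cat > reassign.json <<'EOF'
{"version":1,"partitions":[{"topic":"topic1","partition":0,"replicas":[1,2]}]}
EOF
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file reassign.json --execute
# Check progress afterwards with the same file
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file reassign.json --verify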
This might contain some more details about it: https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-2.2CreateTopics

Kafka topic is marked for deletion but not getting deleted in kafka 0.9

I am trying to delete my Kafka topic with the following command.
bin/kafka-topics.sh --zookeeper <zkserver>:2181 --delete --topic test1
My Kafka version is 0.9 and I have also set the delete.topic.enable flag to true. Still, when I run the above command, my topic is only marked for deletion and not actually deleted.
A logical topic is composed of multiple partitions, and each partition may have multiple copies. In other words, your topic is physically distributed across multiple broker instances.
If any instance is down, your topic deletion will not be able to finish.
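A hedged way to check for that condition is to describe the topic and look for partitions with no live leader (Leader: -1) or a shrunken ISR:
bin/kafka-topics.sh --zookeeper <zkserver>:2181 --describe --topic test1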
There was an orphaned producer process still running against that topic, spawned by my Java Kafka producer program. I eventually discovered it when I started a console consumer on the same topic. After manually killing that process I was able to delete the topic.
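If you suspect a similar orphaned client, one hedged way to spot processes still connected to the broker (this assumes the default port 9092 and that you run it on the broker host):
# Lists established TCP connections to the broker port, with the owning PID
lsof -iTCP:9092 -sTCP:ESTABLISHED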