Data deleted from kafka topics - apache-kafka

The topics that I started yesterday no longer contain any data. The topics appear in the list command, but all the data is gone. The data was produced and stored in the topic. However, after a period of time that I don't know, the data disappeared. It should be noted that I did consume the data once.
How?
What settings should I change?

If you are trying to consume using the same consumer group and that group's offsets have not expired yet, then you will not get anything back.
If you are not consuming as part of the same consumer group, perhaps topic data is expired and deleted?
You can check the start and end offset of topic; e.g.
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test-orders --time -2
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test-orders --time -1
If the reported start and end offsets match, there is no data in the topic. In that case, take a look at retention.ms at the topic level, or log.retention.ms (or log.retention.hours) at the broker level. Increasing the value of this config keeps the records longer at the expense of additional disk space.
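To make the comparison of the two GetOffsetShell runs less error-prone, here is a minimal Python sketch. It assumes the tool prints one line per partition in the form topic:partition:offset; the sample strings are made-up values, not output from a real cluster, and the function names are only illustrative.

```python
def parse_offsets(output):
    """Parse GetOffsetShell-style lines (assumed format: topic:partition:offset)."""
    offsets = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split from the right so a topic name containing ':' would still parse.
        topic, partition, offset = line.rsplit(":", 2)
        offsets[(topic, int(partition))] = int(offset)
    return offsets


def retained_counts(earliest_output, latest_output):
    """Return end offset minus start offset for each partition.

    This is the size of the retained offset range. A value of 0 means the
    partition currently holds no data.
    """
    start = parse_offsets(earliest_output)
    end = parse_offsets(latest_output)
    return {tp: end[tp] - start[tp] for tp in end if tp in start}


# Hypothetical output of the "--time -2" (earliest) and "--time -1" (latest) runs.
earliest = "test-orders:0:120\ntest-orders:1:75\n"
latest = "test-orders:0:120\ntest-orders:1:90\n"

counts = retained_counts(earliest, latest)
for (topic, partition), count in sorted(counts.items()):
    print(topic, partition, count)
```

In this made-up sample, partition 0 has matching start and end offsets, so its data has been removed, while partition 1 still retains a range of 15 offsets.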

Related

delete specific messages from kafka topic __consumer_offsets

I want to delete all messages that are contained in the __consumer_offsets table that start with a given key (resetting one particular consumer group without affecting the rest).
Is there a way to do this?
Kafka comes with a ConsumerGroupCommand tool. You can find some information in the Kafka documentation.
If you plan to reset a particular Consumer Group ("myConsumerGroup") without affecting the rest you can use
> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --reset-offsets --group myConsumerGroup --topic topic1 --to-latest
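Depending on your requirements, you can reset the offsets for each partition of the topic with that tool. Note that without --execute the tool only prints the offsets it would set (a dry run), so add --execute to actually apply the reset, and make sure the consumer group is inactive while you do it. The tool's help output and the documentation explain the options.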

Kafka Consumer does not receive data when one of the brokers is down

Kafka Quickstart
Using Kafka v2.1.0 on RHEL v6.9
Consumer fails to receive data when one of the Kafka brokers is down.
Steps performed:
1. Start zookeeper
2. Start Kafka-Server0 (localhost:9092, kafkalogs1)
3. Start Kafka-Server1 (localhost:9094, kafkalog2)
4. Create topic "test1", num of partitions = 1, replication factor = 2
5. Run producer for topic "test1"
6. Run consumer
7. Send messages from the producer
8. Receive messages on the consumer side.
All the above steps worked without any issues.
When I shut down Kafka-Server0, the consumer stops getting data from the producer.
When I bring back up Kafka-Server0, the consumer starts to get messages from where it left off.
These are the commands used
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test1
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test1
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic test1
The behavior is the same (no message received on the consumer side) when I run the consumer with two servers specified in the --bootstrap-server option.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9094 --topic test1
Any idea why the consumer stops getting messages when server0 is down even though the replication factor for the topic test1 was set to 2?
There is a similar question already but it was not answered completely
Kafka 0.10 quickstart: consumer fails when "primary" broker is brought down
If the offsets topic is unavailable, you cannot consume.
Look at these settings in the server.properties file, read the comment above them, and increase the values accordingly (this only applies if the topic doesn't already exist):
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
According to your previous question, looks like it only has one replica
See how you can increase replication factor for an existing topic
In early versions of Kafka, offsets were managed in ZooKeeper, but Kafka has evolved over time and introduced many new features. Kafka now stores offsets in the internal topic __consumer_offsets.
Consider a scenario where you created a topic with a replication factor of 1. If the broker goes down, the data exists only on the Kafka node that is down, so you can't get to it. The same applies to the __consumer_offsets topic.
You need to revisit server.properties to get the behavior you are expecting. If you still want to consume the messages from the replica partition, you may need to restart the console consumer with --from-beginning (the flag takes no value).

How to find the consumer topic and the group id from the __consumer_offsets topic in kafka?

I am trying to parse the logs from the __consumer_offsets topic in kafka. The idea is to find the group id, topic and the consumer which is creating load in my kafka cluster.
The command I am executing is below
bin/kafka-console-consumer.sh --topic __consumer_offsets --bootstrap-server brokers --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter" --new-consumer --consumer.config consumer.conf
Now the output looks like this
[dedupeconsumergroup,daas.dedupe.avrosyslog.incoming,4]::[OffsetMetadata[8646,NO_METADATA],CommitTime 1538115746766,ExpirationTime 1538202146766]
[dedupeconsumergroup,daas.dedupe.avrosyslog.incoming,6]::[OffsetMetadata[8639,NO_METADATA],CommitTime 1538115746766,ExpirationTime 1538202146766]
Can someone help me in understanding this log or point me to the documentation. Thanks in advance.
[dedupeconsumergroup,daas.dedupe.avrosyslog.incoming,6]::[OffsetMetadata[8639,NO_METADATA],CommitTime 1538115746766,ExpirationTime 1538202146766]
It means that the consumer group named dedupeconsumergroup committed offset 8639 on partition 6 of the topic daas.dedupe.avrosyslog.incoming.
Which particular consumer read a particular offset is probably a little more complicated to find, as you would have to know which partition was assigned to which consumer at a given point in time. The assignment can change over time due to rebalancing.
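If you want to process many of these lines, a small parser helps. The following is a rough Python sketch that only assumes the line layout shown in the question; the regular expression and the function name are my own, and group or topic names containing commas or brackets would need a stricter parser. CommitTime and ExpirationTime are treated as epoch milliseconds.

```python
import re
from datetime import datetime, timezone

# Layout assumed from the sample output in the question:
# [group,topic,partition]::[OffsetMetadata[offset,metadata],CommitTime ms,ExpirationTime ms]
LINE_RE = re.compile(
    r"\[(?P<group>[^,\]]+),(?P<topic>[^,\]]+),(?P<partition>\d+)\]::"
    r"\[OffsetMetadata\[(?P<offset>\d+),(?P<metadata>[^\]]*)\],"
    r"CommitTime (?P<commit>\d+),ExpirationTime (?P<expire>\d+)\]"
)


def parse_offset_record(line):
    """Turn one formatter line into a dict, or raise ValueError if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised offsets line: " + line)
    commit_ms = int(match.group("commit"))
    expire_ms = int(match.group("expire"))
    return {
        "group": match.group("group"),
        "topic": match.group("topic"),
        "partition": int(match.group("partition")),
        "offset": int(match.group("offset")),
        "metadata": match.group("metadata"),
        # Epoch milliseconds converted to a timezone-aware UTC datetime.
        "commit_time": datetime.fromtimestamp(commit_ms / 1000, tz=timezone.utc),
        # How long the committed offset is kept before it expires, in hours.
        "retention_hours": (expire_ms - commit_ms) / 3_600_000,
    }


sample = (
    "[dedupeconsumergroup,daas.dedupe.avrosyslog.incoming,6]::"
    "[OffsetMetadata[8639,NO_METADATA],CommitTime 1538115746766,"
    "ExpirationTime 1538202146766]"
)
record = parse_offset_record(sample)
print(record["group"], record["topic"], record["partition"], record["offset"])
print(record["commit_time"].isoformat(), record["retention_hours"])
```

For the sample line, the difference between ExpirationTime and CommitTime is 86,400,000 ms, i.e. the committed offset is kept for 24 hours after the commit. Grouping the parsed records by group and topic and counting them is one way to see which group commits most often.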

Kafka Consumer offset fetch

I am using a Kafka version where offsets are stored in Kafka, i.e. in __consumer_offsets.
How can I retrieve the consumer offset when the consumer is down or inactive?
The command below works, but with the flaw that it reads from the beginning rather than from the latest offset:
kafka-console-consumer --consumer.config /tmp/consumer.config --formatter "kafka.coordinator.GroupMetadataManager\$OffsetsMessageFormatter" --zookeeper <> --topic __consumer_offsets --from-beginning
If your consumer is down and you want last committed offset of the topic for the specific group please refer to the below mentioned sources
https://github.com/yahoo/kafka-manager
KafkaConsumerOffsets

Kafka 10.2 new consumer vs old consumer

I've spent some hours to figure out what was going on but didn't manage to find the solution.
Here is my set up on a single machine:
1 zookeeper running
3 broker running (on port 9092/9093/9094)
1 topic with 3 partitions and a replication factor of 3 (the partitions are properly assigned across the brokers)
I'm using the Kafka console producer to insert messages. If I check the replication offset (cat replication-offset-checkpoint), I see that my messages are properly ingested by Kafka.
Now I use the kafka console consumer (new):
sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic testTopicPartitionned2
I don't see anything consumed. I tried deleting my logs folders (/tmp/kafka-logs-[1,2,3]) and creating new topics, but still nothing.
However when I use the old kafka consumer:
sudo bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic testTopicPartitionned2
I can see my messages.
Am I missing something big here to make this new consumer work ?
Thanks in advance.
Check what value the consumer is using for the auto.offset.reset property.
This determines where a consumer group without a previously committed offset starts reading messages from a partition.
Check the Kafka docs for more on this.
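To illustrate what auto.offset.reset decides, here is a simplified Python model of the start-position logic. It is not Kafka client code: the function and its parameters are hypothetical, and it only mirrors the documented behaviour that a valid committed offset wins, and otherwise "earliest", "latest" (the default) or "none" applies.

```python
def starting_offset(committed, log_start, log_end, auto_offset_reset="latest"):
    """Model where a consumer begins reading a partition.

    committed: last committed offset for the group, or None if there is none.
    log_start / log_end: earliest and latest offsets currently in the partition.
    """
    # A committed offset that still lies inside the log is always used.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # No committed offset, or it is out of range: the reset policy applies.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    raise LookupError("no valid committed offset and auto.offset.reset=none")


# A brand-new group on a partition holding offsets 0..50:
print(starting_offset(None, 0, 50, "latest"))
print(starting_offset(None, 0, 50, "earliest"))
# A group that already committed offset 20 resumes there regardless of the policy:
print(starting_offset(20, 0, 50, "latest"))
```

With "latest", a new group starts at the end of the log and only sees messages produced after it subscribes, which can look like "nothing is consumed" when the producer has already finished.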
Try providing all your brokers to the --bootstrap-server argument to see if you notice any difference:
sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --from-beginning --topic testTopicPartitionned2
Also, your topic name is rather long. I assume you've already made sure you provide the correct topic name.