How to identify a specific message in Kafka - apache-kafka

As I understand it, a Kafka message can be identified by its topic, partition and offset. If I store each message along with its topic, partition and offset in my local database, I can compare against this when a new Kafka message is received to make sure I don't insert the same message again.
But by default a Kafka topic has a retention policy that keeps messages for only 7 days. After that the messages are removed.
My question is: after a Kafka message is removed by the retention policy, will its offset be reused for a new message? If so, I could mistake a new message for an existing one because they would share the same offset. Please advise how offsets work with the retention policy and how to handle this. Thank you!

No. As long as the Kafka cluster is not recreated (and the topic itself is not deleted and recreated), a partition will not reuse offsets, even after retention has removed older messages. It is common to store the offset (e.g. in a database, or automatically using consumer groups) to know up to which point a consumer has processed a topic.
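The deduplication idea from the question can be sketched as follows. This is only a minimal illustration, assuming a local SQLite database; the table name `processed_messages`, its column names, and the helpers `open_store` and `record_if_new` are hypothetical and not part of any Kafka API. Because offsets within a partition are not reused, the triple (topic, partition, offset) can act as a primary key.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) dedup table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS processed_messages (
               topic        TEXT    NOT NULL,
               partition_id INTEGER NOT NULL,
               msg_offset   INTEGER NOT NULL,
               payload      TEXT,
               PRIMARY KEY (topic, partition_id, msg_offset)
           )"""
    )
    return conn


def record_if_new(conn, topic, partition, offset, payload):
    """Store the message once. Return True if it was new, False if already seen."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed_messages "
        "(topic, partition_id, msg_offset, payload) VALUES (?, ?, ?, ?)",
        (topic, partition, offset, payload),
    )
    conn.commit()
    # rowcount is 0 when the primary key already existed and the insert was ignored
    return cur.rowcount == 1
```

A consumer loop would call `record_if_new` for each received record and skip further processing when it returns False.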

Related

If my service consumes Kafka messages, can kafka somehow lose my offsets?

I have a service that connects to Kafka as a message consumer. For every message I read, I commit that message's offset, so that if my service shuts down and restarts, it continues reading from the last read message onwards. My understanding is that the committed offset is maintained by Kafka.
Now my question is: do I have to worry about the offset? Can Kafka somehow lose that information, so that when the service restarts it starts reading from the beginning or the end of the topic, depending on my initial offset config? Or, if Kafka loses my offset, will it also have lost all messages in the topic, so that it is fine to read from the beginning?
Note: I use spring-kafka on the service, but not sure if that is relevant to the question.
In most cases where you have an active consumer (with manual or auto-committing), you don't need to worry about it.
The case where you do need to consider the behavior of the auto.offset.reset setting is when the offsets.retention.minutes time on the broker has elapsed while your consumer group(s) are inactive. When this happens, Kafka removes the offsets stored for those inactive groups from the __consumer_offsets topic.
Losing offsets doesn't affect the source topic. The topics your client reads have their own independent retention settings, and their messages can be removed as well (or not), depending on how you've configured them.

Removing one message from a topic in Kafka

I'm new to Kafka and I have one question. Can I delete only ONE message from a topic if I know the topic, the offset and the partition? And if not, is there any alternative?
It is not possible to remove a single message from a Kafka topic, even though you know its partition and offset.
Keep in mind that Kafka is not a key/value store; a topic is an append-only(!) log that represents a stream of data.
If you are looking for alternatives to removing a single message, you may:
Have your consumer clients ignore that message
Enable log compaction and send a tombstone message (a record with the same key and a null value)
Write a simple job (KafkaStreams) to consume the data, filter out that one message and produce all messages to a new topic.
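The "ignore" and "filter into a new topic" alternatives both come down to skipping one (partition, offset) pair while passing everything else through. The sketch below models that with plain Python objects rather than a real Kafka client; the `Record` type and the `without_message` helper are hypothetical names used only for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    """Stand-in for a consumed Kafka record (not a client library type)."""
    partition: int
    offset: int
    value: str


def without_message(
    records: Iterable[Record], partition: int, offset: int
) -> Iterator[Record]:
    """Yield every record except the one at the given partition and offset."""
    for record in records:
        if record.partition == partition and record.offset == offset:
            continue  # this is the single message we want to drop
        yield record


source = [Record(0, 0, "a"), Record(0, 1, "bad"), Record(0, 2, "c")]
kept = list(without_message(source, partition=0, offset=1))
```

Note that records copied into a new topic are assigned new offsets there, so the offsets of the source topic should not be relied on downstream.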

Kafka to Kafka -> reading source kafka topic multiple times

I am new to Kafka. I have a setup with a source Kafka topic whose messages have the default retention of 7 days. I have 3 brokers, and the topic has 1 partition and a replication factor of 1.
When I consume messages from the source Kafka topic and write them to my target Kafka topic, the messages arrive in the same order. My question is about reprocessing: when I try to read all the messages from the source topic again and write them to the target topic, the target topic receives nothing. I know that duplication should normally be avoided, but let's say I have 100 messages in my source topic and I expect 200 messages in my target topic after running the job twice. Instead I get 100 messages on the first run, and the second run returns nothing.
Can someone please explain why this is happening and what the mechanism behind it is?
A Kafka consumer reads data from the partitions of a topic. Within a consumer group, each partition is assigned to only one consumer at a time.
Once a message has been read and its offset committed, the same consumer group will not receive it again unless the offset is reset or the consumer seeks back. Let me first explain the current offset. When we call the poll method, Kafka sends us some messages. Let us assume we have 100 records in the partition. The initial position of the current offset is 0. We make our first call and receive 100 messages. Now the current offset moves to 100.
The current offset is a pointer to the next record the consumer will receive, just past the last record returned by the most recent poll. Once that position is committed, a restarted consumer in the same group resumes from it, which is why your second run returns nothing. See the following URL for a fuller explanation.
https://www.learningjournal.guru/courses/kafka/kafka-foundation-training/offset-management/
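The behavior described above can be illustrated with a toy model of a single partition. This is not the Kafka client API: `PartitionReader` and its methods are hypothetical and only mimic how a position advances on poll and can be moved back with a seek.

```python
class PartitionReader:
    """Toy model of one consumer's position in a single partition."""

    def __init__(self, log):
        self.log = list(log)
        self.position = 0  # offset of the next record to hand out

    def poll(self, max_records=500):
        """Return up to max_records records and advance the position."""
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def seek(self, offset):
        """Move the position so that the next poll starts at the given offset."""
        if not 0 <= offset <= len(self.log):
            raise ValueError("offset out of range")
        self.position = offset


reader = PartitionReader(range(100))
first_run = reader.poll()    # all 100 records, position is now 100
second_run = reader.poll()   # nothing new, so an empty batch
reader.seek(0)               # rewind to the start of the partition
replay = reader.poll()       # the same 100 records again
```

In this model the second run is empty only because the position was not moved back; after `seek(0)` the same records are returned again.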

Group name is not polling data

I have a topic with messages and a consumer with the group name "KafkaConsumerExample". When I restart the consumer, all the messages from the topic are received without issues. But when I change the name of my consumer group, using the same consumer code, the consumer does not pull data from the topic. What could be the reason? Changing the consumer group name changes how the topic is read. Can you please help?
By default you will not consume the same messages again using the same consumer group name. This is because Kafka stores the committed offsets of each group to track what it has already consumed, and a restarted consumer resumes from them. (This is progress tracking; on its own it does not provide exactly-once semantics.)
If you wish to consume the same data again from the topic, you can change the name of the consumer group and set auto.offset.reset to earliest.
I hope this helps! Let me know if this addresses your question.
The problem you're running into is that a new consumer group has no committed offsets, so where it starts reading is decided by the auto.offset.reset consumer property (see its documentation). The default is "latest": the consumer starts at the end of each partition and only sees messages produced after it starts, regardless of what other consumer groups have read. Set this property to "earliest" to make the consumer read from the first available message in each partition.
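As a rough illustration of how the starting point is chosen, here is a simplified model. The function `starting_position` and its parameters are hypothetical; the real decision is made inside the consumer client, and this sketch only captures the rule that a valid committed offset wins and auto.offset.reset applies otherwise.

```python
def starting_position(committed, log_start, log_end, auto_offset_reset="latest"):
    """Return the offset a consumer would start from in this simplified model.

    committed: the group's committed offset for the partition, or None
    log_start: first offset still retained in the partition
    log_end:   offset that the next produced message will receive
    """
    # A committed offset that is still inside the retained range is used as-is.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # Otherwise (new group, expired offsets, or offset removed by retention)
    # the reset policy decides.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    if auto_offset_reset == "none":
        raise LookupError("no valid committed offset for this group")
    raise ValueError(f"unknown auto.offset.reset value: {auto_offset_reset!r}")
```

For example, a brand-new group on a partition holding offsets 0 to 99 would start at 100 with "latest" and therefore see no existing messages, while "earliest" would start it at 0.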

How do i reset, update or clear the offset of each consumer group for a topic in Kafka?

I have a topic, let's say test001, and suppose there are 10000 messages in it. I have two consumer groups, let's say test-group1 and test-group2, consuming messages from this topic.
If the test-group1 consumers have consumed 4000 messages and the test-group2 consumers have consumed 4500 messages, how can I:
Reset the offset of the test-group1 consumer group to 0?
Update the test-group1 consumer group's offset to 4500?
Delete the messages from the topic and reset the offset of all consumer groups to 0?
I don't think you can reset the offset at the consumer group level. You can use the seek method (in the Java client API) to move to the start of the partition (offset 0), the end, or any other offset of your choice. Try exploring some of the CLI options such as kafka-consumer-groups.sh and kafka-topics.sh.
This ticket shows that one can produce directly to the __consumer_offsets topic to overwrite the offsets, using the special "__admin_client" id:
https://issues.apache.org/jira/browse/KAFKA-5246
I'm not familiar with the format of the __consumer_offsets topic messages. This post may help a bit but you'd need to do more digging yourself:
http://dayooliyide.com/post/kafka-consumer-offsets-topic/
It may be simpler to write an app that given your group id does a seek to the given position and commits the offset.
Offsets are stored for each topic+partition+group.id, not overall for the entire topic. In older versions you cannot delete committed offsets; you can only commit new ones, or wait for them to expire from the __consumer_offsets topic (offsets.retention.minutes, which defaults to 24 hours before Kafka 2.0 and 7 days since).
Since 0.11, kafka-consumer-groups.sh includes a --reset-offsets option, so you can change offsets from the CLI independently of the consuming apps (the group must be inactive while you do so).
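To make the "stored per topic+partition+group.id" point concrete, here is a toy model of committed offsets. It is not how the __consumer_offsets topic is implemented; the `OffsetStore` class is hypothetical and simply keys offsets by (group, topic, partition), using the group and topic names from the question.

```python
class OffsetStore:
    """Toy model: committed offsets keyed by (group, topic, partition)."""

    def __init__(self):
        self._offsets = {}

    def commit(self, group, topic, partition, offset):
        """Record the offset; committing again simply overwrites the old value."""
        self._offsets[(group, topic, partition)] = offset

    def committed(self, group, topic, partition):
        """Return the committed offset, or None if the group has none."""
        return self._offsets.get((group, topic, partition))


store = OffsetStore()
store.commit("test-group1", "test001", 0, 4000)
store.commit("test-group2", "test001", 0, 4500)

# "Reset" test-group1 by committing offset 0, then move it to 4500.
store.commit("test-group1", "test001", 0, 0)
after_reset = store.committed("test-group1", "test001", 0)
store.commit("test-group1", "test001", 0, 4500)
```

Changing the offset of one group leaves the other group untouched, which is why each group has to be reset separately.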