Using seek to get only uncommitted offsets from the beginning - apache-kafka

I am using Spring Kafka and have a requirement to listen to a DLQ topic and move each message to another topic after a few minutes. I acknowledge a message only once it has been put on the other topic; otherwise I do not commit it and call kafkaListenerEndpointRegistry.stop(), which stops my Kafka consumer. A scheduled cron job then runs every 3 minutes and restarts the consumer with kafkaListenerEndpointRegistry.start(); since auto.offset.reset is set to earliest, the consumer gets all messages from the previously uncommitted offset and checks their eligibility to be put on the other topic.
This approach works fine for small volumes, but for very large volumes I am not seeing the expected retries on both topics. I suspect this might be happening because I am using kafkaListenerEndpointRegistry.stop() to stop the consumer. If I could seek to the beginning of each partition and get all messages from the uncommitted offset, I would not have to stop and start my consumer.
For this, I tried ConsumerSeekAware.onPartitionsAssigned and called callback.seekToBeginning() to reset the offsets. But it also consumes all committed offsets, which puts a huge load on my services. Am I missing something, or does seekToBeginning always read all messages (committed and uncommitted)?
And is there any way to trigger partition assignment manually while the Kafka consumer is running, so that it goes through the onPartitionsAssigned method?

since auto.offset.reset is set to earliest, the consumer gets all messages from the previously uncommitted offset
auto.offset.reset is meaningless if there is a committed offset; it just determines the behavior if there is no committed offset.
does seekToBeginning always read all messages (committed and uncommitted)?
Kafka maintains two pointers - the current position and the committed offset; seek has nothing to do with the committed offset. seekToBeginning just changes the position to the earliest record, so the next poll will return all records.
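For illustration, a minimal sketch with the plain KafkaConsumer API (broker, group, and topic names are placeholders) showing the two pointers: a seek moves only the position, and the committed offset stays where it was until a commit:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class TwoPointersDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
            props.put("group.id", "dlq-retry-group");           // placeholder group
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("dlq-topic", 0); // placeholder topic
                consumer.assign(Collections.singletonList(tp));

                // seekToBeginning moves only the POSITION pointer...
                consumer.seekToBeginning(Collections.singletonList(tp));
                System.out.println("position:  " + consumer.position(tp));
                // ...the COMMITTED pointer is untouched by any seek.
                System.out.println("committed: " + consumer.committed(tp));

                // So the next poll returns every record from the start of
                // the partition, committed or not.
                consumer.poll(Duration.ofSeconds(1));
            }
        }
    }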
This approach works fine for small volumes, but for very large volumes I am not seeing the expected retries on both topics. I suspect this might be happening because I am using kafkaListenerEndpointRegistry.stop() to stop the consumer.
That should not be a problem; you might want to consider using a container stopping error handler instead; then throw an exception and the container will stop itself (you should also set the stopImmediate container property).
https://docs.spring.io/spring-kafka/docs/current/reference/html/#container-stopping-error-handlers
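A minimal sketch of that setup (bean name and type parameters are illustrative); the listener then stops its own container by throwing, instead of a scheduler calling kafkaListenerEndpointRegistry.stop():

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
    import org.springframework.kafka.core.ConsumerFactory;
    import org.springframework.kafka.listener.ContainerStoppingErrorHandler;

    @Configuration
    public class KafkaConfig {

        @Bean
        public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
                ConsumerFactory<String, String> consumerFactory) {
            ConcurrentKafkaListenerContainerFactory<String, String> factory =
                    new ConcurrentKafkaListenerContainerFactory<>();
            factory.setConsumerFactory(consumerFactory);
            // Throwing from the listener now stops the container itself; the
            // failed record stays uncommitted and is redelivered on the next start().
            factory.setErrorHandler(new ContainerStoppingErrorHandler());
            // Stop before processing the remaining records from the current poll.
            factory.getContainerProperties().setStopImmediate(true);
            return factory;
        }
    }

On recent Spring Kafka versions (2.8+), setErrorHandler is deprecated; the equivalent is factory.setCommonErrorHandler(new CommonContainerStoppingErrorHandler()).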

Related

Kafka consumer - how to recognize offset skipping/missing offsets?

Setup:
We have a Debezium/Kafka Connect setup with a Debezium Oracle producer and a Confluent JDBC consumer/sink.
Starting position / background / problem:
Due to high traffic we have decreased log.retention.minutes to 1 hour, which is suitable 99% of the time.
But in some rare cases one of the Kafka consumers slows down and can't keep up any longer. In that case messages are deleted in Kafka (due to the aforementioned retention period) before they have been picked up and handled by the consumer.
In the default config, the consumer then skips the missing records by choosing the earliest available offset. This leads to inconsistencies on the target side.
Question:
How to handle those situations (if raising the log.retention.minutes isn't an option)?
Note: We would be fine if the consumer would just throw an exception/stop/etc. in case it can't find a message for its given offset.
What we've tried so far...
We tried setting auto.offset.reset to none for the consumer and expected the consumer to stop in case it can't find an offset. In theory this should work. In practice it immediately throws an exception when the consumer is instantiated, because there is no first/initial offset.
Final thoughts
So is there another config parameter we could use? (Something like "throw an exception if an offset is missing/skipped, but not on first start"?) Or is there a JMX metric we could monitor in case a consumer is skipping messages?
setting auto.offset.reset to none for the consumer and expected the consumer to stop in case it can't find an offset
That's what it'll do, yes.
In practice it immediately throws an exception when the consumer is instantiated because there is no first/initial offset
You'll need to actually initialize the group first, then seek it to the earliest offset, e.g. kafka-consumer-groups --bootstrap-server <broker> --group connect-<name> --reset-offsets --to-earliest --all-topics --execute
Something like "throw an exception if an offset is missing/skipped, but not on first start"?
There's nothing in auto.offset.reset to differentiate between "first" and "next" starts. But you could create the connector with consumer.override.auto.offset.reset=earliest, then wait for it to be running, then set it back to none with a PUT /connectors/<name>/config call. Then repeat whenever it stops running again.
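A hedged sketch of that flip using the Connect REST API (host, connector name, and config keys are placeholders; note that PUT /connectors/<name>/config expects the connector's full configuration, not a patch):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class TightenOffsetReset {
        public static void main(String[] args) throws Exception {
            HttpClient http = HttpClient.newHttpClient();

            // After the connector (created with consumer.override.auto.offset.reset=
            // earliest) reports RUNNING via GET /connectors/my-sink/status, PUT the
            // full config back with the override flipped to none.
            String config = "{\"connector.class\":\"io.confluent.connect.jdbc.JdbcSinkConnector\","
                    + "\"consumer.override.auto.offset.reset\":\"none\"}";

            HttpRequest put = HttpRequest.newBuilder()
                    .uri(URI.create("http://connect-host:8083/connectors/my-sink/config"))
                    .header("Content-Type", "application/json")
                    .PUT(HttpRequest.BodyPublishers.ofString(config))
                    .build();
            HttpResponse<String> resp = http.send(put, HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.statusCode() + " " + resp.body());
        }
    }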
JMX metric we could monitor in case a consumer is skipping messages
Not that I know of; the metrics mostly report bytes processed. You'd have to additionally track how many bytes you expect it to read.
You'd also need other monitoring to detect log segments being deleted on the broker, and to compare those offset ranges with the offsets your consumer is currently reading.

Apache Kafka Cleanup while consuming messages

Playing around with Apache Kafka and its retention mechanism, I'm thinking about the following situation:
A consumer fetches the first batch of messages with offsets 1-5
The cleaner deletes the first 10 messages, so the topic now has offsets 11-15
In the next poll, the consumer fetches the next batch with offsets 11-15
As you can see, the consumer lost offsets 6-10.
Question: is such a situation possible at all? In other words, will the cleaner execute while there is an active consumer? If yes, is the consumer able to somehow recognize that gap?
Yes, such a scenario can happen. The exact steps will be a bit different:
Consumer fetches message 1-5
Messages 1-10 are deleted
Consumer tries to fetch message 6 but this offset is out of range
Consumer uses its offset reset policy auto.offset.reset to find a new valid offset.
If set to latest, the consumer moves to the end of the partition
If set to earliest the consumer moves to offset 11
If none or unset, the consumer throws an exception
To avoid such scenarios, you should monitor the lead of your consumer group. It's similar to the lag, but the lead indicates how far the consumer is from the start of the partition. Being near the start carries the risk of messages being deleted before they are consumed.
If consumers are near the limit, you can dynamically add more consumers or increase the topic retention size/time if needed.
Setting auto.offset.reset to none will throw an exception if this happens; the other values only log it.
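As a sketch of monitoring the lead from inside the application: the Java consumer exposes a records-lead-min fetch metric per assigned partition via consumer.metrics(); a lead near zero means the consumer is reading close to the start of the log, where the cleaner may delete records out from under it.

    import java.util.Map;

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;

    public final class LeadMonitor {
        // Logs records-lead-min: the minimum distance, across assigned
        // partitions, between the consumer's position and the start of the
        // log. A value near zero means records may soon be deleted unread.
        static void logLead(KafkaConsumer<?, ?> consumer) {
            for (Map.Entry<MetricName, ? extends Metric> e : consumer.metrics().entrySet()) {
                if ("records-lead-min".equals(e.getKey().name())) {
                    System.out.println(e.getKey().tags() + " lead=" + e.getValue().metricValue());
                }
            }
        }
    }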
Question: is such a situation possible at all? Will the cleaner execute while there is an active consumer?
Yes, if the messages have crossed their TTL (time-to-live) period before they are consumed, this situation is possible.
Is the consumer able to somehow recognize that gap?
If you suspect your configuration (high consumer lag, low TTL) might lead to this, the consumer should track offsets. The kafka-consumer-groups.sh command shows the position of all consumers in a consumer group as well as how far behind the end of the log they are.

Kafka Polling Mechanism

Kafka messages posted by the producer keep reappearing at the consumer end after a specific interval.
I tried to consume a message from my Kafka topic and ran into the issue explained above. I suppose it happens due to re-polling after 5 minutes (the default poll interval). Is my understanding correct?
My expected result is that the message should not be reprocessed again and again; it should be processed only once. How can I achieve that?
Your configuration seems to be
enable.auto.commit: false and auto.commit.interval.ms: some value
The second configuration is causing messages to appear after a specific interval (some value).
The same message appears at the consumer end for processing because it was not processed successfully the first time.
If no last-offset information is available with ZooKeeper or the broker, and auto.offset.reset is set to smallest (or earliest), then processing will start from the 0th offset.
Change auto.offset.reset to largest (or latest) if you do not want to reprocess the same messages.
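A minimal sketch of that setup (broker, group, topic, and the process() helper are placeholders): auto-commit is off and the offset is committed manually only after processing succeeds, so a record is redelivered only if processing actually failed.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ProcessOnceConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("group.id", "my-group");                // placeholder
            props.put("enable.auto.commit", "false");         // no background commits
            props.put("auto.offset.reset", "latest");         // don't replay history
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic")); // placeholder
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        process(record);       // your business logic
                    }
                    consumer.commitSync();     // commit only after processing succeeded
                }
            }
        }

        private static void process(ConsumerRecord<String, String> record) {
            System.out.println(record.offset() + ": " + record.value());
        }
    }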

KafkaConsumer resume partition cannot continue to receive uncommitted messages

I'm using one topic, one partition, one consumer; the Kafka client version is 0.10.
I got two different results:
If I pause the partition first, then produce a message and invoke the resume method, KafkaConsumer can poll the uncommitted message successfully.
But if I produce the message first and don't commit its offset, then pause the partition and, after several seconds, invoke the resume method, KafkaConsumer will not receive the uncommitted message. I checked it on the Kafka server using kafka-consumer-groups.sh; it shows LOG-END-OFFSET minus CURRENT-OFFSET = LAG = 1.
I have been trying to figure this out for two days. I repeated such tests many times and the results are always the same. I need some suggestions, or someone to explain Kafka's underlying mechanism here.
For your observation #2: if you restart the application, it will supply all records from the uncommitted offset, i.e. the missing record; and if your consumer again does not commit, the record will be sent again when the application registers the consumer with Kafka upon restart. This is expected.
Assuming you are using consumer.poll(), which creates a hybrid-streaming interface: it accumulates data coming into Kafka for the given duration and provides it to the consumer for processing once the duration is finished. This continuous accumulation happens in the background and does not depend on whether you have committed an offset or not.
From the KafkaConsumer javadoc:
The position of the consumer gives the offset of the next record that will be given out. It will be one larger than the highest offset the consumer has seen in that partition. It automatically advances every time the consumer receives messages in a call to poll(long).
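A small sketch reproducing observation #2 (names are placeholders; it uses the current poll(Duration) overload, while 0.10 clients have poll(long)): pause()/resume() only gate fetching, and the position keeps pointing past every record already returned by poll(), committed or not, so resume() does not re-deliver it.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class PauseResumeDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("group.id", "pause-demo");              // placeholder
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("test-topic", 0); // placeholder
                consumer.assign(Collections.singletonList(tp));

                consumer.poll(Duration.ofSeconds(1));  // record returned; position advances
                consumer.pause(Collections.singletonList(tp));
                // ... the record's offset is never committed ...
                consumer.resume(Collections.singletonList(tp));
                consumer.poll(Duration.ofSeconds(1));  // empty: position is already past it

                // To see the record again in the same instance, seek back explicitly:
                consumer.seek(tp, consumer.committed(tp) == null ? 0
                        : consumer.committed(tp).offset());
            }
        }
    }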

Simple-Kafka-consumer message delivery duplication

I am trying to implement a simple Producer-->Kafka-->Consumer application in Java. I am able to produce as well as consume messages successfully, but the problem occurs when I restart the consumer: some of the already-consumed messages get picked up again by the consumer from Kafka (not all messages, but a few of the last consumed ones).
I have set autooffset.reset=largest in my consumer and my autocommit.interval.ms property is set to 1000 milliseconds.
Is this redelivery of some already-consumed messages a known problem, or are there other settings that I am missing here?
Basically, is there a way to ensure that none of the previously consumed messages get picked up/consumed again by the consumer?
Kafka uses ZooKeeper to store consumer offsets. Since ZooKeeper operations are pretty slow, it's not advisable to commit the offset after consuming every message.
It's possible to add a shutdown hook to the consumer that manually commits the topic offset before exit. However, this won't help in certain situations (like a JVM crash or kill -9). To guard against those situations, I'd advise implementing custom commit logic that commits the offset locally after processing each message (to a file or local database), and also commits the offset to ZooKeeper every 1000 ms. Upon consumer startup, both locations should be queried, and the maximum of the two values used as the consumption offset.
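The ZooKeeper-based consumer described here is long gone, but the shutdown-hook idea carries over to the modern KafkaConsumer; a sketch (broker, group, and topic are placeholders) where the hook wakes up the blocked poll so the loop can commit once more and close cleanly:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.errors.WakeupException;

    public class GracefulConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("group.id", "my-group");                // placeholder
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            Thread mainThread = Thread.currentThread();

            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                consumer.wakeup(); // makes a blocked poll() throw WakeupException
                try { mainThread.join(); } catch (InterruptedException ignored) { }
            }));

            try {
                consumer.subscribe(Collections.singletonList("my-topic")); // placeholder
                while (true) {
                    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                        System.out.println(record.offset() + ": " + record.value());
                    }
                    consumer.commitSync();
                }
            } catch (WakeupException expected) {
                // normal shutdown path
            } finally {
                consumer.commitSync(); // one last commit before exit
                consumer.close();
            }
        }
    }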