I am using a Kafka Publish connector that publishes the payload supplied through Postman. A Kafka listener then consumes messages from that topic (which has only one partition). In the same listener flow there is a Seek operation that tries to read a value at a particular offset, but because offsets are auto-committed, the seek does not return the offset value we provide. I would appreciate suggestions on how to make the Seek operation read values from previous offsets.
Set the acknowledgement mode to MANUAL in the Kafka connector configuration and make sure the flow ends with a Commit operation when it completes successfully.
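For comparison, here is a minimal sketch of the same idea with the plain Kafka Consumer API (which the connector wraps): auto commit is disabled, the consumer seeks to an earlier offset, and the commit happens only after processing. The topic name, offset, and broker address are assumptions for illustration.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ManualSeekExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "seek-demo");                 // assumed group id
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");           // manual acknowledgement
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);              // single-partition topic (assumed name)
            consumer.assign(List.of(tp));

            consumer.seek(tp, 42L);                                             // hypothetical earlier offset
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            records.forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));

            consumer.commitSync();                                              // commit only after successful processing
        }
    }
}
```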
Related
I use Spring Kafka to insert messages into a database, processing them as a batch with a ConcurrentKafkaListenerContainerFactory. When an error occurs:
If it is a bad message, I send that message to another topic.
If the connection fails or times out, I roll back both the database transaction and the producer transaction to prevent false positives.
I also don't understand the assignmentCommitOption option: how does it work, and what is the difference between ALWAYS, NEVER, LATEST_ONLY, and LATEST_ONLY_NO_TX?
If there is no current committed offset for a partition that is assigned, this option controls whether or not to commit an initial offset during the assignment.
It is really only useful when using auto.offset.reset=latest.
Consider this scenario.
Application comes up and is assigned a "new" partition; the consumer will be positioned at the end of the topic.
No records are received from that topic/partition and the application is stopped.
A record is then published to the topic/partition and the consumer application restarted.
Since there is still no committed offset for the partition, it will again be positioned at the end and we won't receive the published record.
This may be what you want, but it may not be.
Setting the option to ALWAYS, LATEST_ONLY, or LATEST_ONLY_NO_TX (default) will cause the initial position to be committed during assignment so the published record will be received.
The _NO_TX variant commits the offset via the Consumer, the other one commits it via a transactional producer.
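As a rough sketch of where this option is configured in Spring Kafka (the ConsumerFactory bean and the choice of ALWAYS are illustrative assumptions, not a recommendation):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties.AssignmentCommitOption;

@Configuration
public class KafkaContainerConfig {

    // Assumes a ConsumerFactory<String, String> bean is defined elsewhere in the application.
    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {

        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);

        // Commit the initial position on assignment when there is no committed offset yet,
        // so records published while the group was inactive are not skipped with auto.offset.reset=latest.
        factory.getContainerProperties()
               .setAssignmentCommitOption(AssignmentCommitOption.ALWAYS);

        return factory;
    }
}
```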
I'm working on a Spring Boot application that uses Kafka Streams. I want to manage the Kafka offset myself and commit it only after a message has been processed successfully. This matters because I need to be certain I won't lose messages even if Kafka is restarted or ZooKeeper is down. My current situation is that when my Kafka goes down and comes back up, my consumer starts from the beginning and consumes all the previous messages.
I also need to know the difference between managing the Kafka offset automatically using autoCommitOffset and managing it manually using HBase, ZooKeeper, or checkpoints.
Also, what are the benefits of managing it manually if there is an automatic config we can use?
You have no guarantee of durability with auto commit.
Older Kafka clients stored offsets in ZooKeeper, but offsets are now stored in the broker to minimize dependencies. The Kafka Streams API has no way to integrate offset storage outside of Kafka itself, so if you choose external storage you must use the Consumer API to look up and seek/commit offsets yourself; even then you can still end up with less-than-optimal message processing.
my current situation is when my Kafka is down and up my consumer starts from the beginning and consumes all the previous messages
Sounds like you set auto.offset.reset=earliest and you never commit any offsets at all...
The auto commit setting does a periodic commit, not "automatic after reading any message".
If you want to guarantee delivery, you need to set at least acks=1 on the producer and actually call commitSync in the consumer.
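A minimal sketch of that combination, assuming String keys/values, a hypothetical events topic, and a local broker; the producer asks for acknowledgements and the consumer commits only after processing succeeds:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DurableDeliverySketch {

    static KafkaProducer<String, String> producer(String bootstrap) {
        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        p.put(ProducerConfig.ACKS_CONFIG, "all");   // at least acks=1; "all" is stricter
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<>(p);
    }

    static void consumeLoop(String bootstrap) {
        Properties c = new Properties();
        c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        c.put(ConsumerConfig.GROUP_ID_CONFIG, "durable-demo");          // assumed group id
        c.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");       // no periodic auto commit
        c.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        c.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        c.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(List.of("events"));                      // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                records.forEach(r -> process(r.value()));               // process first...
                if (!records.isEmpty()) {
                    consumer.commitSync();                              // ...then commit, so a crash replays instead of losing data
                }
            }
        }
    }

    static void process(String value) { /* application-specific processing */ }

    public static void main(String[] args) {
        try (KafkaProducer<String, String> p = producer("localhost:9092")) {
            p.send(new ProducerRecord<>("events", "hello"));
        }
        consumeLoop("localhost:9092");
    }
}
```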
Is it possible to do kafka stream processing from a specific offset of input topic to an end offset?
I have a Kafka Streams application that consumes an input topic, but for some reason it failed. I fixed the issue and started it again, but it started consuming from the latest offset of the input topic. I know the offset of the input topic up to which the application had processed. Now, how can I process the input topic from one offset to another? I am using Confluent Platform 5.1.2.
Kafka Streams supports only two possible values for auto.offset.reset: "earliest" or "latest". You can't set it to a specific offset in your application code.
There is, however, an option during an application reset. If you use the application reset script, you can pass the --to-offset property with the specific offset, and it will reset the application to that point.
<path-to-confluent>/bin/kafka-streams-application-reset --application-id app1 --input-topics a,b --to-offset 1000
You can find the details in the documentation:
https://docs.confluent.io/5.1.2/streams/developer-guide/app-reset-tool.html
If you are fixing bugs, it is better to reset to the earliest state if possible.
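For context, a minimal Streams configuration sketch showing that only "earliest" or "latest" can be set in code (the application id, broker address, and topology are assumptions for illustration):

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsResetConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "app1");                 // matches --application-id used with the reset tool
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Only "earliest" or "latest" are possible here; a specific offset is not.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("a").to("output");                                       // hypothetical topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```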
I deal with timeseries data for a live application. So old data has no significance. I just want to process data received after the stream app has started and not from previously committed offset. What is the correct way to ignore old records on kafka stream app after restart?
With the Kafka Consumer API I generally used the seekToEnd() method to skip forward to the latest record. Is there an equivalent mechanism for Streams?
I want to avoid filtering through all messages since last commit to ignore old messages.
You can create another consumer using the Kafka Consumer API with the same group.id as the applicationId of your Kafka Streams application, and use that consumer to do a seekToEnd() before starting your stream. Disable auto commit for this special consumer and commit the offsets manually after seekToEnd(); then start your stream.
Make sure the stream does not start until the offsets from this reset consumer have been committed.
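A rough sketch of that reset step with the Kafka Consumer API, under the assumption of a single input topic and hypothetical names; the KafkaStreams instance should only be started after this method returns:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class SkipToLatestBeforeStreams {

    // Moves the committed offsets of the Streams app's group to the log end,
    // so the topology only sees records produced after this point.
    static void seekGroupToEnd(String bootstrap, String applicationId, String inputTopic) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, applicationId);          // same group as the Streams app
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");      // commit explicitly below
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = consumer.partitionsFor(inputTopic).stream()
                    .map(pi -> new TopicPartition(pi.topic(), pi.partition()))
                    .collect(Collectors.toList());

            consumer.assign(partitions);
            consumer.seekToEnd(partitions);

            Map<TopicPartition, OffsetAndMetadata> endOffsets = new HashMap<>();
            for (TopicPartition tp : partitions) {
                endOffsets.put(tp, new OffsetAndMetadata(consumer.position(tp)));
            }
            consumer.commitSync(endOffsets);
        }
        // Start the KafkaStreams instance only after this method has returned.
    }

    public static void main(String[] args) {
        seekGroupToEnd("localhost:9092", "my-streams-app", "input-topic"); // hypothetical names
    }
}
```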
I'm using Apache Kafka (not Confluent) and have a simple Kafka producer that pulls from a REST API, sends the data to Kafka, and shuts down.
I had this for testing while I was developing a consumer.
In the consumer I can keep track of the offset, but I don't seem to be able to set a custom offset in my producer.
In my rest calls I would need to keep track of a date so that I don't pull the same data all the time. Am I forced to store that "last timestamp" myself or am I missing something?
I think that in your scenario you aren't interested in a "Kafka" offset on the producer side (when you write to Kafka) but in an "offset" tracking the latest data you pulled from the REST API, so yes, you have to handle that yourself.
On the Kafka producer side you are able to learn the offset assigned to the latest sent message (inside the RecordMetadata), but it has no relationship with the latest timestamp at which you pulled data from the REST API.
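For illustration, a small sketch (with an assumed topic and broker) showing that the producer only learns the Kafka offset from RecordMetadata, while the REST "last timestamp" must be stored by the application itself:

```java
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ProducerOffsetSketch {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The offset assigned by the broker comes back in RecordMetadata,
            // but it says nothing about where you left off in the REST API.
            RecordMetadata metadata =
                    producer.send(new ProducerRecord<>("rest-data", "payload")).get();  // hypothetical topic
            System.out.printf("partition=%d offset=%d%n", metadata.partition(), metadata.offset());

            // The "last timestamp" for the REST calls has to be persisted separately,
            // e.g. in a file or database, by the application itself.
        }
    }
}
```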