I'm using Apache Kafka (no confluent) and have a simple Kafka producer that pulls from a rest API, sends data to Kafka and shuts down.
I had this for testing while I was developing a consumer.
In the consumer I can keep track of the offset, but I don't seem to be able to set a custom offset in my producer.
In my rest calls I would need to keep track of a date so that I don't pull the same data all the time. Am I forced to store that "last timestamp" myself or am I missing something?
I think that in your scenario you aren't interested in a "Kafka" offset on the producer side (when you write to Kafka) but an "offset" tracing latest data you pulled from the REST API so you are right you have to do that by yourself.
On the Kafka producer side you are able to know the offset assigned to the latest sent message (inside the RecordMetadata) but it has no relationship with the latest timestamp when you pulled data from the REST API.
Related
I am using a kafka publish connector, which is publishing the payload supplied through postman. Now, there is kafka listener which listens to the messages in that topic (which has only 1 partition). In the same kafka listener flow there is a seek operation which is trying to seek a value for the particular offset. And as it is getting auto committed, the seek is not fetching the the offset value we are providing. Need your valuable suggestions to make the seek operator seek values from previous offsets.
Set the acknowledgement mode to MANUAL in the Kafka connector configuration and ensure to have a Commit operation when it ends correctly.
I have a database with time series data and this data is sent to Kafka.
Many consumers build aggregations and reporting based on this data.
My Kafka cluster stores data with TTL for 1 day.
But how I can build a new report and run a new consumer from 0th position that does not exist in Kafka but exists in source storage.
For example - some callback for the producer if I request an offset that does not exist in Kafka?
If it is not possible please advise other architectural solutions. I want to use the same codebase to aggregate this data.
For example - some callback for the producer if I request an offset
that does not exist in Kafka?
If the data does not exist in Kafka, you cannot consume it much less do any aggregation on top of it.
Moreover, there is no concept of a consumer requesting a producer. Producer sends data to Kafka broker(s) and consumers consume from those broker(s). There is no direct interaction between a producer and a consumer as such.
Since you say that the data still exists in the source DB, you can fetch your data from there and reproduce it to Kafka.
When you produce that data again, they will be new messages which will be eventually consumed by the consumers as usual.
In case you would like to differentiate between initial consumption and re-consumption, you can produce these messages to a new topic and have your consumers consume from them.
Other way is to increase your TTL (I suppose you mean retention in Kafka when you say TTL) and then you can seek back to a timestamp in the consumers using the offsetsForTimes(Map<TopicPartition,Long> timestampToSearch) and seek(TopicPartition topicPartition, long offset) methods.
I'm working on an application of spring boot which uses Kafka stream, in my application, I want to manage Kafka offset and commit the offset in case of the successful message processing only. This is important, to be certain I won't lose messages even if Kafka restarted or the zookeeper is down. my current situation is when my Kafka is down and up my consumer starts from the beginning and consumes all the previous messages.
also, I need to know what is the difference between managing the Kafka offset automatic using autoCommitOffset and manging it manually using HBase or zookeeper or checkpoints?
also, what are the benefits of managing it manually if there is an automatic config we can use?
You have no guarantee of durability with auto commit
Older Kafka clients did use Zookeeper for offset storage, but now it is all in the broker to minimize dependencies. Kafka Streams API has no way to integrate offset storage outside of Kafka itself, and so you must use the Consumer API to lookup and seek/commit offsets to external storage, if you choose to do so, however, you can still then end up with less than optimal message processing.
my current situation is when my Kafka is down and up my consumer starts from the beginning and consumes all the previous messages
Sounds like you set auto.offset.reset=earliest and you never commit any offsets at all...
The auto commit setting does a periodic commit, not "automatic after reading any message".
If you want to guarantee delivery, then you need to set at least acks=1 in the producer and actually do a commitSync in the consumer
We could able to insert data to kafka using kafka producer API and offset got incremented as well could able to consume data using kafka consumer API and consumer offset got incremented.
But sometimes, Offset was not working properly in the process of push and consume data from kafka. Please help me out.
We have a requirement that the consumer should be able to go back to any point of message stream and reprocess the message.
It looks like to set an offset in High level consumer as it needs setting offset in the zookeeper and re-start of consumer is needed.
Simple consumer supports this ,but need to handle broker leader election ,broker leader failure etc.
The new consumer api provide this ,but it looks like this is still in beta.
So we might have to select simple consumer. Any know issues with simple consumers