I have kept my Kafka client's offset, but the Kafka server deleted the data after a retention period,
so I get nothing when I ask the server for data.
How can I get the Kafka server's current valid offset in this situation?
I just want to consume data from now on.
SimpleConsumer.getOffsetsBefore(topic, partition, time, 1)
time should be kafka.api.OffsetRequest.EarliestTime(), which returns the smallest valid offset still available on the broker.
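If you're on the newer Java client instead of the old Scala SimpleConsumer, a minimal sketch of the equivalent (broker address, topic and partition are placeholders):

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
    TopicPartition tp = new TopicPartition("my-topic", 0);
    consumer.assign(Collections.singletonList(tp));

    // beginningOffsets (Java client 0.10.1+) asks the broker for the first
    // offset retention has not deleted yet -- the equivalent of EarliestTime().
    long earliest = consumer.beginningOffsets(Collections.singletonList(tp)).get(tp);
    consumer.seek(tp, earliest);  // or simply consumer.seekToBeginning(Collections.singletonList(tp))
}
```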
I have code in place to find the offsets and the TopicPartition from the KafkaConsumer, but I can't find a way to retrieve just the timestamp based on that information.
I have looked through ConsumerRecord, but since this is a monitoring service I don't think I should .poll(), as I might cause some records to fall through if my monitoring service polls directly from Kafka.
I know the kafka-console-consumer CLI can fetch the timestamp of a message given a partition and offset, but I'm not sure whether there is an SDK for that.
Does anyone have any insights or reading I can go through to get the time lag? I have been trying to find an SDK or any kind of API that can do this.
There is no other way (as of 3.1) to do this: you have to consumer.poll(). If you only want to access a single record, set the max.poll.records property to 1 so you don't waste effort. A consumer can basically be treated as an accessor to a remote array of records; what you are doing is just accessing record[offset] and reading that record's timestamp.
So to sum it up (sketched below):
get a timestamp from an offset -> seek + poll with max.poll.records=1,
get an offset from a timestamp -> offsetsForTimes.
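A minimal sketch of the seek + poll direction, assuming a placeholder broker, topic, partition and offset:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1);  // fetch exactly one record per poll

try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
    TopicPartition tp = new TopicPartition("my-topic", 0);
    consumer.assign(Collections.singletonList(tp));
    consumer.seek(tp, 42L);  // the offset whose timestamp we want

    for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofSeconds(5))) {
        System.out.printf("offset=%d timestamp=%d%n", record.offset(), record.timestamp());
    }
}
```

For the opposite direction, consumer.offsetsForTimes(Collections.singletonMap(tp, someTimestampMillis)) returns an OffsetAndTimestamp holding the offset of the first record at or after that timestamp, or null if there is none.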
If I understood your question: given a ConsumerRecord, from Kafka 0.10+ every record has a .timestamp() method.
Alternatively, given a topic, a (list of) partition(s), and offset(s), you'd need to seek a consumer with max.poll.records=1, then extract the timestamp from each polled partition after the seeked position.
The Confluent Monitoring Interceptors already do something very similar to what you're asking, but for Control Center.
I have a database with time series data and this data is sent to Kafka.
Many consumers build aggregations and reporting based on this data.
My Kafka cluster stores data with a TTL of 1 day.
But how can I build a new report and run a new consumer from the 0th position, which no longer exists in Kafka but does exist in the source storage?
For example, is there some callback to the producer if I request an offset that does not exist in Kafka?
If that is not possible, please advise on other architectural solutions. I want to use the same codebase to aggregate this data.
For example - some callback for the producer if I request an offset that does not exist in Kafka?
If the data does not exist in Kafka, you cannot consume it, much less do any aggregation on top of it.
Moreover, there is no concept of a consumer requesting anything from a producer. Producers send data to Kafka broker(s) and consumers consume from those broker(s); there is no direct interaction between a producer and a consumer.
Since you say the data still exists in the source DB, you can fetch it from there and re-produce it to Kafka.
When you produce that data again, the messages will be new and will eventually be consumed by the consumers as usual.
If you would like to differentiate between the initial consumption and the re-consumption, you can produce these messages to a new topic and have your consumers consume from it, as in the sketch below.
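A minimal sketch of such a replay, assuming a hypothetical fetchRowsFromSourceDb() helper and a placeholder topic name reports-replay:

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    List<String> rows = fetchRowsFromSourceDb();  // hypothetical: rows serialized as strings
    for (String row : rows) {
        // Replayed messages land in a fresh topic so consumers can tell
        // a replay from the original stream.
        producer.send(new ProducerRecord<>("reports-replay", row));
    }
    producer.flush();  // make sure everything reaches the broker before exiting
}
```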
Another way is to increase your TTL (I suppose you mean retention in Kafka when you say TTL) and then seek back to a timestamp in the consumers using the offsetsForTimes(Map<TopicPartition,Long> timestampsToSearch) and seek(TopicPartition partition, long offset) methods, as sketched below.
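A minimal sketch of that seek-by-timestamp, assuming retention still covers the target time (broker, topic and partition are placeholders):

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
    TopicPartition tp = new TopicPartition("my-topic", 0);
    consumer.assign(Collections.singletonList(tp));

    long targetTs = System.currentTimeMillis() - 6 * 60 * 60 * 1000L;  // e.g. six hours ago
    OffsetAndTimestamp found =
            consumer.offsetsForTimes(Collections.singletonMap(tp, targetTs)).get(tp);

    if (found != null) {  // null when no message exists at/after targetTs
        consumer.seek(tp, found.offset());
        // consumer.poll(...) from here returns messages from targetTs onwards
    }
}
```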
We are able to insert data into Kafka using the producer API, and the offset increments as expected; likewise, we are able to consume data using the consumer API, and the consumer offset increments as well.
But sometimes the offsets do not behave as expected while producing to and consuming from Kafka. Please help me out.
I want to read all the messages in Kafka starting from a specific time.
Say I want to read all messages between 0600 and 0800.
"Request messages between two timestamps from Kafka" suggests using offsetsForTimes as the solution.
The problem with that solution is:
Say my consumer is switched on every day at 1300. It would not have read any messages that day, which effectively means no offset was committed at/after 0600, which means offsetsForTimes(<partition name>, <0600 for that day in millis>) will return null.
Is there any way I can read a message that was published to a Kafka topic at a certain time, irrespective of committed offsets?
offsetsForTimes() returns the offsets of the first messages produced at or after the requested timestamps. It works regardless of whether any offsets were committed, because the offsets are looked up directly in the partition logs.
So yes, you should use this method to find the first offset produced after 0600, seek to that position, and consume messages until you reach 0800. A sketch follows below.
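A minimal sketch of that approach for a single placeholder partition, computing today's 0600 and 0800 in the local time zone:

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
    TopicPartition tp = new TopicPartition("my-topic", 0);
    consumer.assign(Collections.singletonList(tp));

    ZoneId zone = ZoneId.systemDefault();
    long start = LocalDate.now().atTime(LocalTime.of(6, 0)).atZone(zone).toInstant().toEpochMilli();
    long end   = LocalDate.now().atTime(LocalTime.of(8, 0)).atZone(zone).toInstant().toEpochMilli();

    OffsetAndTimestamp startPos =
            consumer.offsetsForTimes(Collections.singletonMap(tp, start)).get(tp);

    if (startPos != null) {  // null when nothing was produced at/after 0600
        long logEnd = consumer.endOffsets(Collections.singletonList(tp)).get(tp);
        consumer.seek(tp, startPos.offset());

        boolean done = false;
        while (!done && consumer.position(tp) < logEnd) {
            for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofSeconds(1))) {
                if (record.timestamp() >= end) { done = true; break; }  // past 0800
                // handle the record here
            }
        }
    }
}
```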
I'm using Apache Kafka (not Confluent) and have a simple Kafka producer that pulls from a REST API, sends the data to Kafka, and shuts down.
I set this up for testing while I was developing a consumer.
In the consumer I can keep track of the offset, but I don't seem to be able to set a custom offset in my producer.
In my REST calls I need to keep track of a date so that I don't pull the same data every time. Am I forced to store that "last timestamp" myself, or am I missing something?
I think that in your scenario you aren't interested in a "Kafka" offset on the producer side (where you write to Kafka) but in an "offset" tracking the latest data you pulled from the REST API, so you are right: you have to handle that yourself.
On the Kafka producer side you can learn the offset assigned to the latest sent message (inside the RecordMetadata), but it has no relationship to the timestamp at which you last pulled data from the REST API.
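For illustration, a minimal sketch of reading that broker-assigned offset from the RecordMetadata (topic name is a placeholder; the blocking .get() keeps the example short):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Assumes a surrounding method declared "throws Exception" for the .get().
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    RecordMetadata meta =
            producer.send(new ProducerRecord<>("rest-data", "payload")).get();
    // The broker-assigned offset of the message we just sent:
    System.out.printf("partition=%d offset=%d%n", meta.partition(), meta.offset());
}
```

The "last pulled" timestamp for the REST API still has to be persisted by your own code, e.g. in a file or a small state table.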