How to make Kafka consumer read from last committed offset but not from last consumed offset?

My requirement is simple, yet I am not able to implement it using a plain consumer: I would like to consume records from the last committed offset position every time I poll. That is, once I have polled a set of records, if I do not manually commit the offsets for those records, I would expect the same set of records to be returned on the next poll. Is this possible with a plain Kafka consumer? FYI, I have already configured my consumer not to auto-commit.
The current workaround I employ is manually seeking to the last committed offset before every poll, but this adds a needless round trip and adds latency to message processing. Is there an out-of-the-box configuration available in the Kafka consumer to achieve what I am expecting?
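For reference, a rough sketch of that workaround is below. It is an illustration only, assuming a client at 2.4 or newer (for the Set overload of committed()), with placeholder topic, group and server names:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import java.util.Set;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class ReReadUntilCommitted {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("group.id", "my-group");                // placeholder
            props.put("enable.auto.commit", "false");         // commits are manual
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic")); // placeholder
                while (true) {
                    // The workaround: rewind every assigned partition to its last
                    // committed offset, so an uncommitted batch is served again.
                    Set<TopicPartition> assignment = consumer.assignment();
                    Map<TopicPartition, OffsetAndMetadata> committed = consumer.committed(assignment);
                    for (Map.Entry<TopicPartition, OffsetAndMetadata> e : committed.entrySet()) {
                        if (e.getValue() != null) {
                            consumer.seek(e.getKey(), e.getValue().offset());
                        }
                    }
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    records.forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
                    // Only this commit makes the next poll move past this batch.
                    consumer.commitSync();
                }
            }
        }
    }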

Related

Is it possible in Kafka to read messages in reverse manner?

Can a new consumer group be created with a consumer assigned to an existing topic, but somehow set to consume backward, so that the offset moves from the latest message at that moment to the earliest in every partition?
Kafka topics are meant to be consumed sequentially, in the order of appearance within the topic partitions.
However, I see two options to solve your issue:
You can steer which data the consumer polls from the topic partition: have your consumer seek to the latest offset, consume it, then seek to the latest offset minus one and read only one record, then seek to the offset before that, and so on. Although I have never seen it done, this should be possible with consumer.seek and the consumer configuration max.poll.records (see the sketch after this list).
You could use any kind of state store and order it descending by offset for each partition. Then have another consumer read the state store in the desired order.
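Here is a rough sketch of the first option, assuming a single, manually assigned partition; max.poll.records=1 makes each poll return exactly one record, and the topic name and servers are placeholders:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class BackwardReader {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("enable.auto.commit", "false");         // no commits needed for a scan
            props.put("max.poll.records", "1");               // exactly one record per poll
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            TopicPartition tp = new TopicPartition("my-topic", 0); // placeholder
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.assign(Collections.singletonList(tp));
                long begin = consumer.beginningOffsets(Collections.singletonList(tp)).get(tp);
                long end = consumer.endOffsets(Collections.singletonList(tp)).get(tp);
                // Walk from the newest offset down to the oldest, one seek per record.
                for (long offset = end - 1; offset >= begin; offset--) {
                    consumer.seek(tp, offset);
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    records.forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
                }
            }
        }
    }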

How does Kafka provide the next batch of records to poll when commitAsync fails to commit the offset

I have a use case regarding consuming records with a Kafka consumer.
For instance,
I have 1 topic with 1 partition. It currently has 10 records, and while the first 10 records are being consumed, another 10 records are written to the partition.
1. myConsumer polls for the first time and returns the first 10 records, say offsets 0 - 9.
2. It processes all of those records successfully.
3. It invokes commitAsync() to commit the last offset to Kafka.
4. The commit response is still being processed; it can be a success or a failure.
5. But since commitAsync is asynchronous, the consumer continues to poll for the next batch.
Now, how does Kafka or the consumer's poll know that it has to read from the 10th position, given that the commitAsync request has not yet completed?
Please help me understand this concept.
Committing an offset tells the broker that the consumer has processed the corresponding messages successfully. The consumer itself keeps track of its own progress in memory (except at consumer startup, when it fetches its last committed offset from the broker).
At step 5 in your description, the offset commit is in progress. So:
The broker does not know that records 0-9 have been processed.
The consumer itself has read the messages, so it knows it has read messages 0-9. It will therefore know to read from the 10th onwards next.
Possible Scenarios
Let's say the commit fails for (0-9). If your next batch, say (10-15), is processed and committed successfully, then there is no harm done, since that commit marks to the broker that processing up to 15 is complete.
Let's say the commit fails for (0-9), your next batch (10-15) is processed, and before committing, the consumer goes down. When your consumer is brought back up, it takes its state from the broker (which has no commit for either batch), so it will start reading from the 0th message.
You can come up with several other scenarios as well. I guess the bottom line is that the commit becomes important when your consumer is restarted for whatever reason and has to get its last processed offset from the Kafka broker.
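To make the distinction between the in-memory position and the committed offset concrete, here is a sketch of the usual commitAsync/commitSync pattern. It is a fragment that assumes consumer is an already-subscribed KafkaConsumer<String, String> with auto-commit disabled, and process() standing in for your own logic:

    try {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            records.forEach(r -> process(r)); // process(...) is your own logic
            // Fire-and-forget: the consumer keeps polling forward from its
            // in-memory position even if this commit is slow or fails.
            consumer.commitAsync((offsets, exception) -> {
                if (exception != null) {
                    // A later successful commit will cover this gap; just log it.
                    System.err.println("async commit failed: " + exception.getMessage());
                }
            });
        }
    } finally {
        try {
            consumer.commitSync(); // one durable commit before shutting down
        } finally {
            consumer.close();
        }
    }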

Reading messages for a specific timestamp in Kafka

I want to read all the messages starting from a specific time in Kafka.
Say I want to read all messages between 0600 and 0800.
Request messages between two timestamps from Kafka
suggests using offsetsForTimes.
The problem with that solution is:
If, say, my consumer is switched on every day at 1300, it would not have read any messages that day, which effectively means no offset was committed at/after 0600, which means offsetsForTimes(<partition name>, <0600 for that day in millis>) will return null.
Is there any way I can read a message that was published to the Kafka topic at a certain time, irrespective of offsets?
offsetsForTimes() returns, for each requested partition, the offset of the first message with a timestamp at or after the requested time. It works regardless of whether offsets were committed, because the offsets are looked up directly from the partition logs.
So yes, you should use this method to find the first offset produced after 0600, seek to that position, and consume messages until you reach 0800.
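A sketch of that approach for a single partition could look like the following; the 0600/0800 window is computed for the current day in the system time zone, and the topic and server names are placeholders:

    import java.time.Duration;
    import java.time.ZonedDateTime;
    import java.time.temporal.ChronoUnit;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
    import org.apache.kafka.common.TopicPartition;

    public class TimeWindowReader {
        public static void main(String[] args) {
            ZonedDateTime midnight = ZonedDateTime.now().truncatedTo(ChronoUnit.DAYS);
            long startMs = midnight.withHour(6).toInstant().toEpochMilli(); // 0600
            long endMs = midnight.withHour(8).toInstant().toEpochMilli();   // 0800

            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            TopicPartition tp = new TopicPartition("my-topic", 0); // placeholder
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.assign(Collections.singletonList(tp));
                // Look up the first offset with a timestamp >= 0600, straight from the log.
                OffsetAndTimestamp start =
                    consumer.offsetsForTimes(Collections.singletonMap(tp, startMs)).get(tp);
                if (start == null) {
                    return; // nothing was produced at or after 0600
                }
                consumer.seek(tp, start.offset());
                boolean done = false;
                while (!done) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    if (records.isEmpty()) {
                        break; // caught up with the end of the partition
                    }
                    for (ConsumerRecord<String, String> r : records) {
                        if (r.timestamp() >= endMs) {
                            done = true; // past the 0800 boundary, stop
                            break;
                        }
                        System.out.printf("%d %s%n", r.timestamp(), r.value());
                    }
                }
            }
        }
    }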

Relationship between maxPollRecords and autoCommitEnable in Kafka

Can someone please give me a good example of, and the relationship between, the Kafka params maxPollRecords and autoCommitEnable?
There is no relationship as such between them. Let me explain the two configs to you.
In Kafka there are two ways a consumer can commit offsets:
1. Manual offset commit - the responsibility of committing offsets lies with the developer.
2. Auto-commit - the Kafka consumer takes responsibility for committing offsets for you. It works like this: on every poll() call you make on the consumer, it checks whether it is time to commit offsets (dictated by the auto.commit.interval.ms configuration); if it is, it commits them.
For example, suppose auto.commit.interval.ms is set to 7 seconds and every call to poll() takes 8 seconds. Then a particular call to poll() will check whether the commit interval has elapsed, which in this example it will have, and commit the offsets fetched by the previous poll.
Offsets are also committed when a consumer is closed.
Here are some links you can look at -
https://kafka.apache.org/documentation/#consumerconfigs
https://kafka.apache.org/11/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
Does Kafka lose messages if the consumer holds a message longer than the auto-commit interval?
Now, on to max.poll.records. With this configuration you can tell the Kafka consumer the maximum number of records you would like it to return on a single call to poll(). Note that you will generally not change this default unless your record processing is slow and you want to ensure that your consumer is not considered dead because it is taking too long to process too many records.
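To make the two settings concrete, here is a minimal configuration sketch (imports omitted; the values are illustrative, not recommendations):

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // placeholder
    props.put("group.id", "example-group");           // placeholder
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    // Auto-commit: poll() commits the previous batch's offsets roughly every 7 s.
    props.put("enable.auto.commit", "true");
    props.put("auto.commit.interval.ms", "7000");
    // Independently, cap a single poll() at 100 records so a slow processor
    // still calls poll() often enough not to be considered dead.
    props.put("max.poll.records", "100");
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);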

Kafka subscribe commit offset manually

I am using Kafka 0.9 and am confused by the behavior of subscribe.
Why does it expect a group.id with subscribe?
Do we need to commit the offset manually using commitSync? Even if I don't do that, I see that it always starts from the latest.
Is there a way to replay the messages from the beginning?
Why does it expect a group.id with subscribe?
The concept of consumer groups is used by Kafka to enable parallel consumption of topics: every message will be delivered once per consumer group, no matter how many consumers are actually in that group. This is why the group parameter is mandatory; without it, Kafka would not know how this consumer should be treated in relation to the other consumers that might subscribe to the same topic.
Whenever you start a consumer, it joins a consumer group, and based on how many other consumers are in this group, it is assigned partitions to read from. For these partitions it then checks whether a last read offset is known; if one is found, it starts reading messages from that point.
If no offset is found, the parameter auto.offset.reset controls whether reading starts at the earliest or latest message in the partition.
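For illustration, the corresponding setting would be (a fragment, with props being the consumer's configuration Properties):

    // With no committed offset for this group, start from the earliest message
    // in each assigned partition instead of the default "latest".
    props.put("auto.offset.reset", "earliest");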
Do we need to commit the offset manually using commitSync? Even if I don't do that, I see that it always starts from the latest.
Whether or not you need to commit the offset depends on the value you choose for the parameter enable.auto.commit. By default this is set to true, which means the consumer will automatically commit its offset regularly (how often is defined by auto.commit.interval.ms). If you set this to false, then you will need to commit the offsets yourself.
This default behavior is probably also what is causing your "problem" of the consumer always starting with the latest message: since the offset was auto-committed, it will use that committed offset.
Is there a way to replay the messages from the beginning?
If you want to start reading from the beginning every time, you can call seekToBeginning, which will reset to the first message in all subscribed partitions if called with an empty collection, or just in those partitions that you pass in.
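A sketch of that replay pattern might look like this; it is a fragment that assumes consumer is a configured KafkaConsumer<String, String> (imports omitted). Since partitions are only assigned during the rebalance triggered by poll(), the seek is done in the rebalance callback:

    consumer.subscribe(Collections.singletonList("my-topic"), // placeholder topic
        new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                consumer.seekToBeginning(partitions); // rewind every partition we were given
            }
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // nothing to do for a pure replay
            }
        });
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        records.forEach(r -> System.out.println(r.value()));
    }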