Apache Kafka: commitSync after pause

In our code, we plan to commit offsets manually. Our processing of the data is long-running, and hence we follow the previously suggested pattern:
Read the records
Process the records in their own thread
Pause the consumer
Continue polling the paused consumer so that it stays alive
When the records are processed, commit the offsets
When the commit is done, resume the consumer
The code looks somewhat like this:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(kafkaConfig.getTopicPolling());
    if (!records.isEmpty()) {
        task = pool.submit(new ProcessorTask(processor, createRecordsList(records)));
    }
    if (shouldPause(task)) {
        consumer.pause(listener.getPartitions());
    }
    if (isDoneProcessing(task)) {
        consumer.commitSync();
        consumer.resume(listener.getPartitions());
    }
}
If you notice, we commit using commitSync() (without any parameters).
Since the consumer is paused, in the next iteration we would get no records, but commitSync() would happen later. In that case, which offsets would it try to commit? I have read the Definitive Guide and searched online, but cannot find any information about it.
I think we should explicitly save the offsets, but I am not sure whether the current code would be an issue.
Any information would be helpful.
Thanks,
Prateek

If you call consumer.commitSync() with no parameters, it commits the latest offsets that your consumer has received. Since you can receive many messages in a single poll(), you might want finer control over the commit and explicitly commit a specific offset, such as that of the latest message your consumer has successfully processed. This can be done by calling commitSync(Map<TopicPartition, OffsetAndMetadata> offsets).
You can see the syntax for the two ways to call commitSync in the KafkaConsumer Javadoc: http://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#commitSync()
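As a rough illustration (not part of the original answer), here is a minimal sketch of committing an explicit offset for the last record you finished processing; the helper name commitProcessedRecord is made up for this example, and the committed value is record.offset() + 1 because the committed offset should be the next message your application will consume:
import java.util.Collections;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// Commit the offset of the record that was just processed.
// The committed value is record.offset() + 1, i.e. the next message to consume.
static void commitProcessedRecord(Consumer<String, String> consumer,
                                  ConsumerRecord<String, String> record) {
    TopicPartition partition = new TopicPartition(record.topic(), record.partition());
    OffsetAndMetadata nextOffset = new OffsetAndMetadata(record.offset() + 1);
    consumer.commitSync(Collections.singletonMap(partition, nextOffset));
}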

Related

Kafka resume consumer can't receive the first message

I need to pause the Kafka consumer from consuming messages from the topic until the message reaches its waiting time. For this, I used the pause/resume methods in Kafka. But when I resume, the first message that was consumed before pausing will not be received again. Still, the offset of the topic has not been updated, since I do manual acknowledgment (the lag is one).
@StreamListener(ChannelName.MESSAGE_INPUT_RETRY_CHANNEL)
public void onMessageRetryReceive(org.springframework.messaging.Message<Message> message,
                                  @Header(KafkaHeaders.CONSUMER) KafkaConsumer<?, ?> consumer) {
    long waitTime = // Calculate the wait time of the message
    Acknowledgment acknowledgment = message.getHeaders().get(KafkaHeaders.ACKNOWLEDGMENT, Acknowledgment.class);
    if (waitTime > 0) {
        consumer.pause(Collections.singleton(new TopicPartition("message-retry-topic", 0)));
    } else {
        messageProducer.sendMessage(message.getPayload());
        acknowledgment.acknowledge();
    }
}

@Bean
public ApplicationListener<ListenerContainerIdleEvent> idleListener() {
    return event -> {
        boolean isReady = // Logic to check if ready to resume
        if (isReady) {
            event.getConsumer().resume(event.getConsumer().paused());
        }
    };
}
This relates to the question mentioned in "KafkaConsumer resume partition cannot continue to receive uncommitted messages", but I'm not sure how the seek() method can help retrieve the first consumed message. I'm using Spring Cloud Stream. I need some suggestions on this.
The fact that you don't call acknowledgment.acknowledge() doesn't mean that your KafkaConsumer instance doesn't keep the last consumed position in memory.
Committed offsets are definitely needed for subsequent consumers on the partition; the currently running consumer doesn't need that information to be committed, because it keeps it in its own in-memory state.
To be able to re-consume the same record, you need to perform a seek() operation.
See the docs for more info: https://docs.spring.io/spring-kafka/docs/current/reference/html/#seek
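To make the idea concrete, here is a minimal sketch (assumptions: you track the partition and offset of the paused record yourself; the helper name and parameters are made up for this example) of seeking back before resuming so the next poll() returns that record again:
import java.util.Collections;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;

// Rewind the partition to the offset of the record we paused on, then resume;
// the next poll() will deliver that record again.
static void resumeFromPendingRecord(Consumer<?, ?> consumer,
                                    TopicPartition partition,
                                    long pendingOffset) {
    consumer.seek(partition, pendingOffset);
    consumer.resume(Collections.singleton(partition));
}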

how to get last committed offset from read_committed Kafka Consumer

I am using the transactional KafkaProducer to send messages to a topic. This works fine. I use a KafkaConsumer with read_committed isolation level, and I have an issue with the seek and seekToEnd methods. According to the documentation, the seek and seekToEnd methods give me the LSO (Last Stable Offset). But this is a bit confusing, as they always give me the same value, the end of the topic, no matter whether the last entry was committed (by the producer) or is part of an aborted transaction.
For example, after I abort the last 5 attempts to insert 20,000 messages each, the last 100,000 records should not be read by the consumer. But seekToEnd moves to the end of the topic (including the 100,000 messages), while poll() does not return them.
I am looking for a way to retrieve the last committed offset (i.e. the last message successfully committed by the producer). There seems to be no proper API method for this, so do I need to roll my own?
One option would be to move backwards and poll until no more records are retrieved, which would yield the last committed message, but I would assume that Kafka provides such a method.
We use Kafka 1.0.0.
The KafkaConsumer class has some useful methods such as partitionsFor, beginningOffsets and endOffsets, as well as committed and position.
Check which one fits your needs; in particular, carefully consider all four offset-related methods.
The partitionsFor method returns a complete metadata object with other information, but it can be useful for enriching your logging.
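Purely as an illustration (not from the original answer, and assuming a consumer that already has the partition tp assigned), a minimal sketch of the four offset-related methods mentioned above:
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// Assuming 'consumer' is a KafkaConsumer and 'tp' is a TopicPartition assigned to it.
Set<TopicPartition> partitions = Collections.singleton(tp);
Map<TopicPartition, Long> earliest = consumer.beginningOffsets(partitions); // first available offset per partition
Map<TopicPartition, Long> end = consumer.endOffsets(partitions);            // end offset per partition
long nextToFetch = consumer.position(tp);                                   // next offset poll() would return
OffsetAndMetadata lastCommitted = consumer.committed(tp);                   // null if nothing has been committed yet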
To get the last committed offset of a topic partition you can use the KafkaConsumer.committed(TopicPartition partition) method.
TopicPartition topicPartition = new TopicPartition(record.topic(), record.partition());
// Note: committed() returns null if no offset has ever been committed for this partition.
Long committedOffset = consumer.committed(topicPartition).offset();
System.out.println("last committed offset: " + committedOffset);

Kafka consumer offset commit when later message is consumed first

I have a Java Kafka consumer in which I fetch ConsumerRecords in a batch to process. The sample code is as follows:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        DoSomeProcessing(record.value());
    }
    consumer.commitAsync();
}

private void DoSomeProcessing(String record) {
    // Make an external call to a system which can take a random amount of time
    // for different requests, or time out after 5 seconds.
}
The problem I have is which offset to commit if a later record has finished processing but the previous record has not yet completed or timed out.
Let's suppose I get 2 records in a batch: the external call for the 1st message is still awaited, while the call for the 2nd has completed. If I wait up to 5 seconds for the external response, consumption from Kafka can become very slow in some cases. If I do not wait for the 1st request to complete before doing another poll, what offset do I commit to Kafka? If I commit 2 and the consumer crashes, the 1st message will be lost, because next time the latest committed offset would be 2.
I think you analyzed the problem correctly, and the answer is probably what you suspect: you can't commit an offset until every offset less than or equal to it has been processed. That's just how Kafka works: it's very much oriented around strong ordering.
The solution is to increase the number of partitions and consumers so you get the parallelism you desire. This is not great from some angles (you need more threads and resources), but at least you get to write synchronous code.
What you can do is set up an error pipeline. For the messages that are failing, you commit the offset, push the message to an error queue, and process it later.
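A minimal sketch of that idea (the error topic name "my-topic-errors" and the pre-configured errorProducer are assumptions for this example): the failed record is forwarded to a separate topic so its offset can be committed and the main consumer can keep moving.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Forward a record whose processing failed (or timed out) to an error/retry topic
// so the main consumer can commit its offset and continue.
static void sendToErrorTopic(Producer<String, String> errorProducer,
                             ConsumerRecord<String, String> failedRecord) {
    errorProducer.send(new ProducerRecord<>("my-topic-errors",
                                            failedRecord.key(),
                                            failedRecord.value()));
}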

Is there a way to stop Kafka consumer at a specific offset?

I can seek to a specific offset. Is there a way to stop the consumer at a specific offset? In other words, consume up to a given offset. As far as I know, Kafka does not offer such a function. Please correct me if I am wrong.
E.g. a partition has offsets 1-10. I only want to consume from 3 to 8. After consuming the 8th message, the program should exit.
Right, Kafka does not offer this function, but you could achieve this in your consumer code. You could try using commitSync() to control it.
public void commitSync(Map<TopicPartition, OffsetAndMetadata> offsets)
Commit the specified offsets for the specified list of topics and partitions.
This commits offsets to Kafka. The offsets committed using this API will be used on the first fetch after every rebalance and also on startup. As such, if you need to store offsets in anything other than Kafka, this API should not be used. The committed offset should be the next message your application will consume, i.e. lastProcessedMessageOffset + 1.
This is a synchronous commit and will block until either the commit succeeds or an unrecoverable error is encountered (in which case it is thrown to the caller).
Something like this:
while (goAhead) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        if (record.offset() > OFFSET_BOUND) {
            consumer.commitSync(Collections.singletonMap(
                    new TopicPartition(record.topic(), record.partition()),
                    new OffsetAndMetadata(record.offset())));
            goAhead = false;
            break;
        }
        process(record);
    }
}
You should set "enable.auto.commit" to false in the code above. In your case OFFSET_BOUND could be set to 8, because the committed offset is then 9 in your example, so next time the consumer will fetch from that position.
Assuming that partition offsets are contiguous (i.e. the log is not compacted), you could configure your consumer (using the max.poll.records config) so that it reads a certain number of records in each poll. This would let you stop at the offset you want.
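Just to illustrate the setting mentioned above (a sketch only; bootstrap server, group id, and the record count are placeholder values):
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "6"); // e.g. offsets 3-8 => at most 6 records per poll
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);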
As far as I know, max.poll.records is a client-side feature; the Kafka fetch protocol only has byte-size limitations: https://kafka.apache.org/protocol#The_Messages_Fetch
So in general you will read more messages under the hood.

How to commit manually with Kafka Stream?

Is there a way to commit manually with Kafka Stream?
Usually, when using the KafkaConsumer, I do something like below:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        // process records
    }
    consumer.commitAsync();
}
This is where I'm calling commit manually. I don't see a similar API for KStream.
Commits are handled by Streams internally and fully automatically, so there is usually no reason to commit manually. Note that Streams handles this differently than consumer auto-commit; in fact, auto-commit is disabled for the internally used consumer and Streams manages commits "manually". The reason is that commits can only happen at certain points during processing to ensure no data can get lost (there are many internal dependencies with regard to updating state and flushing results).
For more frequent commits, you can reduce the commit interval via the StreamsConfig parameter commit.interval.ms.
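For example, a minimal configuration sketch (application id and bootstrap server are placeholders) that lowers the commit interval:
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties streamsProps = new Properties();
streamsProps.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
streamsProps.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
streamsProps.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000); // commit roughly every second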
Nevertheless, manual commits are possible indirectly via the low-level Processor API. You can use the context object that is provided via the init() method to call context#commit(). Note that this is only a "request to Streams" to commit as soon as possible; it is not issuing a commit directly.
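To sketch what that looks like (an illustration only, assuming the classic Processor<K, V> flavour of the API; class and variable names are made up for this example):
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

// A low-level Processor that requests a commit after every record it processes.
public class CommitRequestingProcessor implements Processor<String, String> {
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;   // keep the context handed to us by Streams
    }

    @Override
    public void process(String key, String value) {
        // ... do the actual per-record work here ...
        context.commit();          // only a request; Streams commits at the next safe point
    }

    @Override
    public void close() { }
}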