Kafka auto-commit doesn't take effect for slower transactions - apache-kafka

Using Spring Boot Kafka - v 2.2.1.RELEASE (which uses kafka-clients.jar v 2.3.1)
I'm using confluent v 5.1.2 for my Kafka Broker.
I've enabled auto-commit (enable.auto.commit=true) with an auto-commit interval of 3 seconds.
I've got a Spring-based listener class (@KafkaListener), and processing each message takes about 6-7 seconds because it's a resource-intensive operation.
What I noticed is that the offsets are never committed by the listener and the lag permanently remains at 550.
Because of this (I think), the messages are reprocessed by the listener indefinitely.
When I changed the listener to manually commit the offset, it works fine.
Why is auto-commit not working in my case, where a long-running transaction is involved?
Are there any settings missing, or set to too small/large a value? I'd like to know the recommended strategy for such long-running transactions.
Thanks!

Auto commit runs on the consumer thread so auto.commit.interval.ms should be considered a minimum - the actual commit won't happen until the next poll.
However, Spring recommends not using auto commit and letting the container perform the commits (e.g. with AckMode.RECORD); it is more deterministic and reliable.
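With Spring Boot this amounts to two properties; a minimal sketch, assuming the Boot-managed container factory:

spring.kafka.consumer.enable-auto-commit=false
spring.kafka.listener.ack-mode=record

The container then commits each record's offset as soon as the listener method returns, so a 6-7 second listener simply delays its own commit rather than losing it.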

Related

spring-kafka - error handling via group rebalance

I'm writing a Kafka listener which should just forward the messages from a topic to a JMS queue. I need to stop processing new messages only for a custom exception, JmsBrokerConnectionException, but I want to continue processing new messages for any other exception (i.e. invalid data) and send error messages to a DLT.
I am using spring-kafka 2.2.7 and cannot upgrade it.
I have currently a solution which uses:
SeekToCurrentErrorHandler (configured with 0 retries and a DeadLetterPublishingRecoverer)
a retry template used in the @KafkaListener method, configured with Integer.MAX_VALUE retries, which retries only for JmsBrokerConnectionException
MANUAL_IMMEDIATE ack
The solution seems to do the job, but it has the drawback that, for long outages of the JMS broker, it would cause a rebalance every max.poll.interval.ms (i.e. every 5 minutes).
The question:
Is it a good idea to let max.poll.interval.ms expire and have a group rebalance to handle error conditions for which you want to stop message consumption?
I don't have high-throughput requirements.
The input topic has 10 partitions and I will have 2 consumers.
I know there are other solutions using stateful retry or pausing/resuming the container, but I'd like to keep using the current solution unless I am missing any major drawbacks about it.
I am using spring-kafka 2.2.7 and cannot upgrade it.
That version is no longer supported.
Version 2.3 added backoff and exception classification to the STCEH, eliminating the need for a retry template at the listener level.
That said, you can use stateful retry (https://docs.spring.io/spring-kafka/docs/current/reference/html/#stateful-retry) with a STCEH that always retries, and do the dead letter publishing in the RecoveryCallback at the listener level. The consumer record is available in the retry context with the RetryingMessageListenerAdapter.CONTEXT_RECORD key.
Since you are doing manual acks, you will also need to commit the offset via the CONTEXT_CONSUMER key.
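A minimal sketch of that arrangement, assuming spring-kafka 2.2.x APIs (the retryTemplate() bean, generic types, and wiring are illustrative, not taken from the question):

import java.util.Collections;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;
import org.springframework.kafka.listener.adapter.RetryingMessageListenerAdapter;

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory, KafkaTemplate<Object, Object> template) {

    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);

    DeadLetterPublishingRecoverer dltRecoverer = new DeadLetterPublishingRecoverer(template);

    // Stateful retry rethrows after each failed attempt so the error handler
    // re-seeks and poll() runs between attempts, keeping the consumer in the group.
    factory.setStatefulRetry(true);
    factory.setRetryTemplate(retryTemplate()); // illustrative: retries JmsBrokerConnectionException forever
    factory.setRecoveryCallback(context -> {
        ConsumerRecord<?, ?> record = (ConsumerRecord<?, ?>)
                context.getAttribute(RetryingMessageListenerAdapter.CONTEXT_RECORD);
        dltRecoverer.accept(record, (Exception) context.getLastThrowable());

        // Manual acks: commit the recovered record's offset via the consumer.
        Consumer<?, ?> consumer = (Consumer<?, ?>)
                context.getAttribute(RetryingMessageListenerAdapter.CONTEXT_CONSUMER);
        consumer.commitSync(Collections.singletonMap(
                new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1)));
        return null;
    });

    // An STCEH with no max-failures limit, i.e. it always retries.
    factory.setErrorHandler(new SeekToCurrentErrorHandler());
    return factory;
}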

Kafka consumer consuming older messages on restarting

I am consuming Kafka messages from a topic, but the issue is that every time the consumer restarts, it reads older, already-processed messages.
I have used auto.offset.reset=earliest. Will setting offsets manually using commitAsync help me overcome this issue?
I see that Kafka already has auto-commit enabled (true) by default.
I have used auto.offset.reset=earliest. Will setting offsets manually using commitAsync help me overcome this issue?
The setting auto.offset.reset=earliest only applies when the consumer group has no committed offset: in that case the consumer reads from the earliest available offset instead of the latest. So the first time you start your process with a new group.id and this setting, it will read from the beginning of the topic.
Here is how the issue can be debugged:
If your consumer group.id is the same across every restart, you need to check whether the commit is actually happening.
Cross-check that you are not manually overriding enable.auto.commit to false anywhere.
Next, check the auto-commit interval (auto.commit.interval.ms), which is 5 seconds by default, and see whether you have changed it to something higher and are restarting your process before the commit gets triggered.
You can also use commitAsync(), or even commitSync(), to trigger the commit manually. Use commitSync() (a blocking call) to test whether any exception is thrown while committing. A few possible errors during committing are (from the docs):
CommitFailedException - thrown when you try to commit partitions that are no longer assigned to this consumer, for example because the consumer is no longer part of the group.
RebalanceInProgressException - thrown if the consumer instance is in the middle of a rebalance, so it is not yet determined which partitions would be assigned to it.
TimeoutException - thrown if the timeout specified by default.api.timeout.ms expires before successful completion of the offset commit.
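A sketch of forcing a blocking commit to surface these errors (the broker address, group id, and topic are placeholders, and process() stands in for your own logic):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.CommitFailedException;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.RebalanceInProgressException;
import org.apache.kafka.common.errors.TimeoutException;

public class CommitDebugConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "my-group");                // placeholder
        props.put("enable.auto.commit", "false");         // we commit manually below
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                }
                try {
                    consumer.commitSync(); // blocking; throws if the commit fails
                } catch (CommitFailedException | RebalanceInProgressException | TimeoutException e) {
                    // These are exactly the failure modes listed above.
                    System.err.println("Offset commit failed: " + e);
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
    }
}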
Apart from this:
Also check whether you are calling seek() or seekToBeginning() anywhere in your consumer code. If you are, then calling poll() will likely return older messages as well.
If you are using embedded Kafka for testing, the topic and the consumer groups are likely created every time you restart your test, thereby reading from the start. Check whether it is a similar case.
Without looking at the code it is hard to tell what exactly the error is; this answer only provides insight into debugging your scenario.

Kafka fails to keep track of last-committed offset

Is there any known issue with the Kafka broker in managing offsets? The problem we are facing is that when we restart the Kafka consumer (i.e. an app restart), sometimes all the offsets are reset to 0.
We are completely clueless as to why the consumers are not able to start from the last committed offset.
We are eventually facing this issue in prod, wherein the whole queue of events is replayed again:
spring-boot - 2.2.6.RELEASE
spring-kafka - 2.3.7.RELEASE
kafka-clients - 2.3.1
apache-kafka - kafka_2.12-2.3.1
We have 10 topics with 50 partitions each, all belonging to the same group; we increase the topic-partition and consumer counts at run time based on load.
auto-commit = false
sync commit each offset after processing
max-poll-records is set to 1
With all this config it runs as expected in the local setup; after deploying to prod we see these issues, but not at every restart.
Is there any config that I'm missing?
Completely Clueless!!!!!
Do not enable auto-commit per the suggestion in the other answer; the listener container will commit the offsets more reliably, and, as you say, you don't have the problem all the time.
Is it possible that you receive no records for a week?
Or is it possible that your broker has a shorter offsets.retention.minutes property?
In Kafka 2.0 the default was changed from 1 day to 1 week. If the offsets have been removed because they expired and you then restart the consumer, you'll get exactly the behavior you observe.
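If expired offsets are the suspect, the retention can be raised on the broker; a sketch for the broker's server.properties (the value is illustrative: 20160 minutes is 14 days):

# keep committed offsets for 14 days instead of the default 7
offsets.retention.minutes=20160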
You need to make sure that:
1) You are using the same Consumer Group ID
2) auto.offset.reset is set to latest
spring.kafka.consumer.group-id=your-consumer-group-id
spring.kafka.consumer.auto-offset-reset=latest
In case you are still seeing this issue, try enabling auto-commit:
spring.kafka.consumer.enable-auto-commit=true
and if the issue goes away then it means that your manual commits are not working as expected.

Kafka offset management: auto vs. manual

I'm working on a Spring Boot application which uses Kafka Streams. In my application, I want to manage the Kafka offsets and commit an offset only after the message has been processed successfully. This is important so that I won't lose messages even if Kafka restarts or ZooKeeper is down. My current situation is that when my Kafka goes down and comes back up, my consumer starts from the beginning and consumes all the previous messages.
Also, I need to know the difference between managing Kafka offsets automatically using autoCommitOffset and managing them manually using HBase, ZooKeeper, or checkpoints.
Also, what are the benefits of managing them manually if there is an automatic config we can use?
You have no guarantee of durability with auto-commit.
Older Kafka clients did use ZooKeeper for offset storage, but now it is all in the broker, to minimize dependencies. The Kafka Streams API has no way to integrate offset storage outside of Kafka itself, so you would have to use the Consumer API to look up and seek/commit offsets to external storage, if you choose to do so; even then, you can still end up with less-than-optimal message processing.
my current situation is that when my Kafka goes down and comes back up, my consumer starts from the beginning and consumes all the previous messages
Sounds like you set auto.offset.reset=earliest and you never commit any offsets at all...
The auto-commit setting does a periodic commit, not an automatic commit after reading each message.
If you want to guarantee delivery, you need to set at least acks=1 in the producer and actually call commitSync() in the consumer.
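A minimal sketch of those two settings as plain client configs (all other properties left at their defaults):

# producer: require the partition leader to acknowledge each send
acks=1
# consumer: disable periodic auto-commit and commit explicitly instead
enable.auto.commit=false

With auto-commit off, call consumer.commitSync() (or commitAsync()) only after a record has actually been processed.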

KafkaIO checkpoint - how to commit offsets to Kafka

I'm running a job using the Beam KafkaIO source in Google Dataflow and cannot find an easy way to persist offsets across job restarts (the job update option is not enough; I need to restart the job).
Comparing Beam's KafkaIO against PubsubIO (or, to be precise, comparing PubsubCheckpoint with KafkaCheckpointMark), I can see that checkpoint persistence is not implemented in KafkaIO (the KafkaCheckpointMark.finalizeCheckpoint method is empty), whereas it is implemented in PubsubCheckpoint.finalizeCheckpoint, which performs the acknowledgement to PubSub.
Does this mean I have no means of reliably managing Kafka offsets on job restarts with minimum effort?
Options I considered so far:
Implement my own logic for persisting offsets - sounds complicated; I'm using Beam through Scio in Scala.
Do nothing, but that would result in many duplicates on job restarts (the topic has a 30-day retention period).
Enable auto-commit, but that would result in lost messages, so it's even worse.
There are two options: enable commitOffsetsInFinalize() in KafkaIO, or enable auto-commit in the Kafka consumer configuration. Note that while commitOffsetsInFinalize() is more in sync with what has been processed in Beam than Kafka's auto-commit, it does not provide strong guarantees of exactly-once processing. Imagine a two-stage pipeline: Dataflow finalizes the Kafka reader after the first stage, without waiting for the second stage to complete. If you restart the pipeline from scratch at that time, you would not process the records that completed the first stage but haven't been processed by the second. The issue is no different for PubsubIO.
Regarding option (2): you can configure KafkaIO to start reading from a specific timestamp (assuming the Kafka server supports it, i.e. version 0.10+). But that does not look any better than enabling auto-commit.
That said, KafkaIO should support finalize. It might be simpler to use than enabling auto-commit (we would need to think about frequency, etc.). We haven't had many users asking for it. Please mention it on user@beam.apache.org if you can.
[Update: I am adding support for committing offsets to KafkaCheckpointMark in PR 4481]
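With that in place, a minimal sketch of a committing KafkaIO source (method names per recent Beam releases; the broker address, topic, and group id are placeholders, and pipeline is an existing org.apache.beam.sdk.Pipeline):

import java.util.Collections;

import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

pipeline.apply(KafkaIO.<String, String>read()
        .withBootstrapServers("broker-1:9092")   // placeholder address
        .withTopic("my-topic")                   // placeholder topic
        // commitOffsetsInFinalize() needs a consumer group to commit against
        .withConsumerConfigUpdates(
                Collections.<String, Object>singletonMap(ConsumerConfig.GROUP_ID_CONFIG, "my-group"))
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        // commit offsets back to Kafka when a checkpoint is finalized, so a
        // restarted job resumes from them instead of replaying the topic
        .commitOffsetsInFinalize());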