Kafka fails to keep track of last-committed offset - apache-kafka

Is there any known issue with the Kafka broker in managing offsets? The problem we are facing is that when we restart the Kafka consumer (i.e., an app restart), sometimes all the offsets are reset to 0.
We are completely clueless as to why the consumers are not able to start from the last committed offset.
We are now facing this issue in prod, where the whole queue of events gets replayed again:
spring-boot: 2.2.6.RELEASE
spring-kafka: 2.3.7.RELEASE
kafka-clients: 2.3.1
apache-kafka: kafka_2.12-2.3.1
We have 10 topics with 50 partitions each, all consumed by the same consumer group, and we increase the topic-partition and consumer count at run-time based on load.
auto-commit = false
sync commit each offset after processing
max-poll-records is set to 1
With all this config it runs as expected in the local setup, but after deploying to prod we see such issues, though not on every restart.
Is there any config that I'm missing?
Completely clueless!

Do not enable auto commit per the suggestion in another answer; the listener container will more reliably commit the offsets and, as you say, you don't have the problem all the time.
Is it possible that you receive no records for a week?
Or, is it possible that your broker has a shorter offsets.retention.minutes property?
In 2.0, it was changed from a 1 day default to 1 week. If the offsets have been removed because they expired and you restart the consumer, you'll get the behavior you observe.
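To check whether the committed offsets actually survived on the broker, you can list them for the group with the AdminClient. A minimal diagnostic sketch, assuming a broker at localhost:9092 and the group id your-consumer-group-id (both placeholders):

// Lists the offsets currently stored for the group so you can see whether they
// survived the restart; broker address and group id below are placeholders.
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommittedOffsetCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            Map<TopicPartition, OffsetAndMetadata> offsets = admin
                    .listConsumerGroupOffsets("your-consumer-group-id")
                    .partitionsToOffsetAndMetadata()
                    .get();
            // An empty map (or missing partitions) means the broker no longer has
            // committed offsets for the group, e.g. because offsets.retention.minutes expired.
            offsets.forEach((tp, om) -> System.out.println(tp + " -> " + om.offset()));
        }
    }
}

If this comes back empty right before a restart that replays everything, the offsets most likely expired on the broker rather than never being committed.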

You need to make sure that:
1) You are using the same Consumer Group ID
2) auto.offset.reset is set to latest
spring.kafka.consumer.group-id=your-consumer-group-id
spring.kafka.consumer.auto-offset-reset=latest
If you are still seeing the issue, try enabling auto-commit:
spring.kafka.consumer.enable-auto-commit=true
If the issue goes away, it means your manual commits are not working as expected.
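If you want to keep manual commits, here is a minimal sketch of a manually acknowledging listener; it assumes spring-kafka with the container ack mode set to MANUAL_IMMEDIATE (e.g. spring.kafka.listener.ack-mode=manual_immediate), and the topic and group id are placeholders:

// Assumes spring.kafka.listener.ack-mode=manual_immediate and enable-auto-commit=false;
// "your-topic" and "your-consumer-group-id" are placeholders.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class ManualAckListener {

    @KafkaListener(topics = "your-topic", groupId = "your-consumer-group-id")
    public void listen(ConsumerRecord<String, String> record, Acknowledgment ack) {
        process(record);
        // Commits this record's offset right away; if processing throws before this
        // line, nothing is committed and the record is redelivered after a restart.
        ack.acknowledge();
    }

    private void process(ConsumerRecord<String, String> record) {
        // business logic goes here
    }
}

You can then watch the group with kafka-consumer-groups.sh and confirm that the committed offsets actually advance between restarts.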

Related

Is it possible to force abort a Kafka transaction?

We have a test Kafka cluster that we were using to experiment with various settings.
One of the settings that was adjusted was to set the transaction.max.timeout.ms to 7 days.
While that setting was in place we had a network failure to one of the ZK nodes. It was brief but enough that it triggered a broker leader election. This leader election wasn't clean as it only registered 6 of the 8 brokers when it came up. We manually triggered another election and everything came up cleanly.
The problem that we have now is that we have a bunch of zombie transactions that have not aborted or committed.
This means that our apps that use transactions/have an isolation level of read_committed are no longer reading from certain partitions.
I know this is because that Last Stable Offset (LSO) is at the point where the transaction was created.
I've tested this by using the console consumer to read from a particular topic:partition offset, which worked fine; after adding --isolation-level read_committed it doesn't return any records.
Is there any way to force the transaction coordinator to abort the zombie transactions, or to manually set the LSO? I've even 'purged' the topic by setting retention.ms to 100 and seen the consumer group offset record shift, but any read_committed clients still won't read from the partition and the consumer group won't advance past the log rotation.
Thanks
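In case it helps anyone reproducing this, the same check can be done with the Java consumer instead of the console consumer; a minimal sketch (topic name, partition and bootstrap server are placeholders):

// Reads the same topic/partition twice from the beginning, once per isolation level.
// If read_committed returns nothing while read_uncommitted returns records, reads
// are blocked at the Last Stable Offset by an open (zombie) transaction.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LsoCheck {
    static long countFrom(String isolationLevel) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, isolationLevel);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("stuck-topic", 0);
            consumer.assign(Collections.singletonList(tp));
            consumer.seekToBeginning(Collections.singletonList(tp));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            return records.count();
        }
    }

    public static void main(String[] args) {
        System.out.println("read_uncommitted: " + countFrom("read_uncommitted"));
        System.out.println("read_committed:   " + countFrom("read_committed"));
    }
}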

Kafka consumer consuming older messages on restarting

I am consuming Kafka messages from a topic, but the issue is that every time the consumer restarts it reads older, already-processed messages.
I have used auto.offset.reset=earliest. Will committing offsets manually using commitAsync help me overcome this issue?
I see that Kafka already has enable.auto.commit set to true by default.
When auto.offset.reset=earliest is set, the consumer will read from the earliest available offset instead of from the last committed offset. So the first time you start your process with a new group.id and this setting at earliest, it will read from the starting offset.
Here is how the issue can be debugged:
If your consumer group.id is the same across every restart, you need to check whether the commit is actually happening.
Cross-check whether you are manually overriding enable.auto.commit to false anywhere.
Next, check the auto commit interval (auto.commit.interval.ms), which is 5 seconds by default, and see whether you have changed it to something higher and are restarting your process before the commit gets triggered.
You can also use commitAsync() or even commitSync() to trigger the commit manually. Use commitSync() (a blocking call) to test whether any exception is thrown while committing; see the sketch after this list. A few possible errors during committing are (from the docs):
CommitFailedException - when you are trying to commit to partitions that are no longer assigned to this consumer, for example because the consumer is no longer part of the group.
RebalanceInProgressException - if the consumer instance is in the middle of a rebalance, so it is not yet determined which partitions would be assigned to the consumer.
TimeoutException - if the timeout specified by default.api.timeout.ms expires before successful completion of the offset commit.
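A minimal sketch of such a test commit, assuming a consumer that is already subscribed and has polled at least once; it only demonstrates catching the exceptions listed above:

// Helper for debugging commits; the consumer is assumed to be set up elsewhere.
import org.apache.kafka.clients.consumer.CommitFailedException;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.errors.RebalanceInProgressException;
import org.apache.kafka.common.errors.TimeoutException;

public class CommitDebug {
    static void commitAndLog(Consumer<?, ?> consumer) {
        try {
            consumer.commitSync(); // blocks until the broker confirms the commit
            System.out.println("offset commit succeeded");
        } catch (CommitFailedException e) {
            System.err.println("partitions no longer assigned to this consumer: " + e.getMessage());
        } catch (RebalanceInProgressException e) {
            System.err.println("rebalance in progress, retry after the next poll: " + e.getMessage());
        } catch (TimeoutException e) {
            System.err.println("commit timed out (default.api.timeout.ms): " + e.getMessage());
        }
    }
}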
Apart from this:
Also check whether you are doing seek() or seekToBeginning() anywhere in your consumer code. If you are, and you then call poll(), you will likely get older messages as well.
If you are using Embedded Kafka for testing, the topic and the consumer groups are likely created anew every time you restart your test, thereby reading from the start. Check whether this is a similar case.
Without looking at the code it is hard to tell what exactly the error is; this answer only provides insight into debugging your scenario.

Kafka auto-commit doesn't take effect for slower transactions

Using Spring Boot Kafka - v 2.2.1.RELEASE (which uses kafka-clients.jar v 2.3.1)
I'm using confluent v 5.1.2 for my Kafka Broker.
I've enabled auto-commit (set to true) with an auto-commit interval of 3 seconds.
I've got a Spring-based listener class (@KafkaListener), and each message takes about 6-7 seconds to process as it's a resource-intensive operation.
What I noticed is that the offsets are never committed by the listener and the lag permanently remains at 550.
Because of this (I think), the messages are reprocessed indefinitely by the listener.
When I changed the listener to manually commit the offset, it works fine.
Why is the autocommit not working in my case where a long running transaction is involved?
Is there any settings missing or set to too small/large a number? I'd like to know what's the recommended strategy for such long running transactions.
Thanks!
Auto commit runs on the consumer thread so auto.commit.interval.ms should be considered a minimum - the actual commit won't happen until the next poll.
However, Spring recommends not using auto commit and letting the container perform the commits (e.g. with AckMode.RECORD); it is more deterministic and reliable.
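A minimal sketch of such a container setup, assuming Spring Boot auto-configures the ConsumerFactory bean and that enable.auto.commit is set to false on the consumer (bean names follow the Boot defaults):

// Lets the listener container commit after each processed record instead of
// relying on the consumer's auto commit.
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;

@Configuration
public class KafkaContainerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // Commit the offset after each record the listener finishes processing;
        // requires enable.auto.commit=false on the consumer.
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.RECORD);
        return factory;
    }
}

With this in place, even a 6-7 second record is committed as soon as the listener returns, so the lag should drop after each record instead of staying stuck.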

Pentaho Data Integration - Kafka Consumer

I am using the Kafka Consumer Plugin for Pentaho CE and would appreciate your help with its usage. I would like to know whether any of you have been in a situation where Pentaho failed and you lost messages (based on the official docs there's no way to read a message twice, am I wrong?). If this situation occurs, how do you capture these messages so you can reprocess them?
reference:
http://wiki.pentaho.com/display/EAI/Apache+Kafka+Consumer
Kafka retains messages for the configured retention period whether they've been consumed or not, so it allows consumers to go back to an offset they previously processed and pick up there again.
I haven't used the Kafka plugin myself, but it looks like you can disable auto-commit and manage that yourself. You'll probably need the Kafka system tools from Apache and some command line steps in the job. You'd have to fetch the current offset at the start, get the last offset from the messages you consume and if the job/batch reaches the finish, commit that last offset to the cluster.
It could be that you can also provide the starting offset as a field (message key?) to the plugin, but I can't find any documentation on what that does. In that scenario, you could store the offset with your destination data and go back to the last offset there at the start of each run. A failed run wouldn't update the destination offset, so would not lose any messages.
If you go the second route, pay attention to the auto.offset.reset setting and behavior, as it may happen that the last offset in your destination has already disappeared from the cluster if it's been longer than the retention period.
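A minimal sketch of the first approach (fetch the group's committed offset at the start, commit only after the batch finishes); the topic, partition, group id and broker address are placeholders:

// One batch run: resume from the last committed offset, process the records, and
// commit the last processed offset only if the whole batch succeeds.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class BatchOffsetJob {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "pentaho-batch");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        TopicPartition tp = new TopicPartition("source-topic", 0);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));

            // Start from the last offset this group committed (or the beginning if none).
            OffsetAndMetadata committed = consumer.committed(tp);
            consumer.seek(tp, committed == null ? 0L : committed.offset());

            long lastProcessed = -1L;
            ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(10));
            for (ConsumerRecord<String, String> record : batch) {
                // process(record); if this throws, nothing is committed and the next
                // run re-reads the same records instead of losing them
                lastProcessed = record.offset();
            }
            if (lastProcessed >= 0) {
                // Commit the offset *after* the last processed record.
                consumer.commitSync(Collections.singletonMap(tp,
                        new OffsetAndMetadata(lastProcessed + 1)));
            }
        }
    }
}

A failed run never reaches the commit, so the next run re-reads the same records rather than losing them.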

What determines Kafka consumer offset?

I am relatively new to Kafka. I have done a bit of experimenting with it, but a few things are unclear to me regarding consumer offset. From what I have understood so far, when a consumer starts, the offset it will start reading from is determined by the configuration setting auto.offset.reset (correct me if I am wrong).
Now say for example that there are 10 messages (offsets 0 to 9) in the topic, and a consumer happened to consume 5 of them before it went down (or before I killed the consumer). Then say I restart that consumer process. My questions are:
If the auto.offset.reset is set to earliest, is it always going to start consuming from offset 0?
If the auto.offset.reset is set to latest, is it going to start consuming from offset 5?
Is the behavior regarding this kind of scenario always deterministic?
Please don't hesitate to comment if anything in my question is unclear.
It is a bit more complex than you described.
The auto.offset.reset config kicks in ONLY if your consumer group does not have a valid offset committed somewhere (the two offset storages currently supported are Kafka and ZooKeeper), and it also depends on what sort of consumer you use.
If you use the high-level Java consumer, imagine the following scenarios:
You have a consumer in consumer group group1 that has consumed 5 messages and died. The next time you start this consumer, it won't even use the auto.offset.reset config; it will continue from the place it died, because it will just fetch the stored offset from the offset storage (Kafka or ZK, as mentioned).
You have messages in a topic (as you described) and you start a consumer in a new consumer group group2. There is no offset stored anywhere, and this time the auto.offset.reset config decides whether to start from the beginning of the topic (earliest) or from the end of the topic (latest).
One more thing that affects which offset values correspond to the earliest and latest configs is the log retention policy. Imagine you have a topic with retention configured to 1 hour. You produce 5 messages, and then an hour later you produce 5 more. The latest offset remains the same as in the previous example, but the earliest one can't be 0, because Kafka will already have removed the first 5 messages; thus the earliest available offset is 5.
None of the above applies to the SimpleConsumer; every time you run it, it decides where to start from using the auto.offset.reset config.
If you use a Kafka version older than 0.9, you have to replace earliest and latest with smallest and largest.
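A minimal sketch of the two scenarios above with the post-0.9 Java consumer (broker address, topic and group ids are placeholders):

// group1 is assumed to already have committed offsets: auto.offset.reset is ignored
// and the consumer resumes where it left off. group2 is assumed to be new: there is
// no committed offset, so it starts from the earliest *available* offset (which may
// be greater than 0 once retention has removed older messages).
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OffsetResetDemo {
    static KafkaConsumer<String, String> consumerFor(String groupId) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        // Only used when the group has no valid committed offset for a partition.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("demo-topic"));
        return consumer;
    }

    public static void main(String[] args) {
        try (KafkaConsumer<String, String> c1 = consumerFor("group1");
             KafkaConsumer<String, String> c2 = consumerFor("group2")) {
            System.out.println("group1 (resumes from committed offset): " + c1.poll(Duration.ofSeconds(5)).count());
            System.out.println("group2 (falls back to auto.offset.reset): " + c2.poll(Duration.ofSeconds(5)).count());
        }
    }
}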
Just an update: from Kafka 0.9 onward, Kafka uses a new Java consumer, and the accepted auto.offset.reset values have changed. From the manual:
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw an exception to the consumer if no previous offset is found for the consumer's group
anything else: throw an exception to the consumer.
I spent some time finding this after checking the accepted answer, so I thought it might be useful for the community to post it.
Furthermore, there's offsets.retention.minutes: if the time since the last commit is greater than offsets.retention.minutes, then auto.offset.reset also kicks in.