CURRENT-OFFSET and LAG of a Kafka consumer group that has no active members - apache-kafka

How are these two set? The behaviour I observe with kafka-consumer-groups.sh is that when a new message is appended to a certain partition, at first its LOG-END-OFFSET and LAG columns increase, and after some time the CURRENT-OFFSET column increases and the LAG column decreases, although no offset was actually committed by any consumer, as there are no active consumers. Am I right, and does this always happen with consumer groups that have no active members, or is there a possibility to turn off the second stage, which simulates offsets being committed by non-existing consumers? This is actually confusing: you have to take into account that there are no active members in the consumer group in order to have the right perspective on what the CURRENT-OFFSET and LAG columns actually mean (not much, in that case).
OK, it seems that the consumer actually does continuously connect, poll the messages and commit the offsets, but in a volatile fashion (disconnecting each time), so that kafka-consumer-groups.sh always reports the group as if it has no active members.
This is a Flink job that acts this way. Is that possible?

If the retention policy kicks in and deletes old messages, the lag can decrease (if fewer messages are published than deleted), since the CURRENT-OFFSET positions itself at the earliest available record.
I'd check what the retention policy for your topic is, since this may be due to deleted messages: the lag doesn't take purged messages into account, only the ones still available.
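If you want to confirm those settings, a minimal sketch using the Java AdminClient could look like this (the bootstrap server and topic name are placeholders, not taken from the question):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collections;
import java.util.Properties;

public class ShowRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic"); // placeholder
            Config config = admin.describeConfigs(Collections.singletonList(topic)).all().get().get(topic);
            // Print the settings that decide when old records are purged
            for (String name : new String[]{"cleanup.policy", "retention.ms", "retention.bytes", "segment.bytes"}) {
                System.out.println(name + " = " + config.get(name).value());
            }
        }
    }
}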

This has nothing to do with connecting to and disconnecting from the Kafka cluster; that would be way too slow and inefficient. It has to do with the way the Flink Kafka consumer is implemented, which is described here: Flink Kafka Connector
The committed offsets are only a means to expose the consumer’s progress for monitoring purposes.
What it basically does is this: it does not subscribe to topics the way standard consumers do, using consumer groups with their coordinator and leader mechanisms, but instead assigns partitions directly, and it only commits offsets to a consumer group for monitoring purposes (although it also has modes of using these offsets for continuation, see here). That is why these groups appear to Kafka as having no active members while still getting offsets committed.
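As an illustration of that pattern with the plain Java consumer (the group id and topic are made-up placeholders, this is not Flink's actual code): the consumer assigns a partition directly, so it never becomes a member of the group, yet it still commits offsets under a group.id, which kafka-consumer-groups.sh then reports as a group with no active members but moving offsets.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

public class AssignAndCommit {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "monitoring-only-group");      // placeholder group id
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0); // placeholder topic
            // assign() bypasses the group coordinator: no membership, no rebalancing
            consumer.assign(Collections.singletonList(tp));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                List<ConsumerRecord<String, String>> batch = records.records(tp);
                if (!batch.isEmpty()) {
                    long lastOffset = batch.get(batch.size() - 1).offset();
                    // Committed only so that external tools can see progress for this member-less group
                    consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(lastOffset + 1)));
                }
            }
        }
    }
}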

Related

Kafka consumer group's offset stuck for one topic

I have an application that uses fs2-kafka for reading business events from a kafka cluster. In that application, I have multiple fs2-kafka consumers, each subscribed to a different topic. But one of the consumers seems to be stuck, as it does not consume any events.
Checking the consumer group's offsets yielded the following results:
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
consumer topic 0 - 5 - consumer-consumer-1-99c1c19a-faaf-40e6-a3dc-75b7d04e96f9 /10.0.3.2 consumer-consumer-1
(edited slightly for privacy)
I have also managed to get the CURRENT-OFFSET to be 1 (though, it seems like no actual consuming happened because none of my logs were triggered), but regardless - the group does not seem to want to move its offset.
The topic has just one partition and there's only one consumer/consumer group reading from that topic. There is no reason I can see for Kafka to keep consumers from consuming. If it matters - that topic, as well as any other topic in this cluster, is created automatically, using Kafka's "AUTO_CREATE_TOPICS". (this is a dev environment, it's simply more convenient than creating topics by hand)
The strangest thing is this - the same code, working on a different topic, works. Also, as is always the case with these things, the issue does not reproduce on my laptop. There are barely any differences between my local Kafka and the Kafka in our dev cluster.
Originally, I had just one consumer group for the entire application. I have now tried multiple consumer groups per consumer and even sharing a consumer for reading from multiple topics. The only topic that's stuck is this one, every other topic works.
I have also tried:
Restarting kafka and the app, updating kafka to a newer version
Manually resetting the consumer group's offsets
Deleting the topic
Apart from deleting all the data of kafka, I believe I have tried everything on my and kafka's side.

Kafka consumer-group liveness empty topic partitions

Following up on this question - I would like to know the semantics between consumer groups and offset expiry. In general, I'm curious how the Kafka protocol determines that some specific offset (for a consumer-group, topic, partition combination) has expired. Is it based on periodic commits from consumers that are part of the group protocol, or does the expiry clock only start after all consumers are deemed dead/closed? I'm thinking this could have repercussions when dealing with topic partitions to which data isn't produced frequently. In my case, we have a consumer group reading from a fairly idle topic (not much data produced). Since the consumer group doesn't periodically commit any offsets, can we ever be in danger of losing previously committed offsets? For example, when some unforeseen rebalance happens, the topic partitions could get reassigned with the offset commits lost, and this could cause the consumer to read data from the earliest (configured auto.offset.reset) point.
For user topics, offset expiry / topic retention is completely decoupled from consumer-group offsets. Segments do not "reopen" when a consumer accesses them.
At a minimum, segment.bytes, retention.ms (or minutes/hours) and retention.bytes all determine when log segments get deleted.
For the internal __consumer_offsets topic, offsets.retention.minutes controls when committed offsets are deleted (also in coordination with that topic's segment.bytes).
The LogCleaner thread actively removes closed segments on a periodic basis; it is not the consumers that do it. If a lagging consumer requests offsets from a segment that has already been deleted, auto.offset.reset is applied.
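For reference, a sketch of adjusting those retention settings programmatically with the Java AdminClient (topic name and values are arbitrary examples, and incrementalAlterConfigs assumes a reasonably recent broker):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;

public class SetRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic"); // placeholder
            Collection<AlterConfigOp> ops = Arrays.asList(
                    // keep records for 7 days, roll segments at 1 GiB (example values only)
                    new AlterConfigOp(new ConfigEntry("retention.ms", "604800000"), AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("segment.bytes", "1073741824"), AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Collections.singletonMap(topic, ops)).all().get();
        }
    }
}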

Horizontally Scaled Kafka Consumers consuming different offsets

I've been developing a kafka consumer application (C# in kubernetes) and have been running it as a single node for a while, consuming from a single topic.
I noticed today that the topic I have been consuming from was quite full - I was doing continuous processing and was at offsets around ~38k (in general, agnostic of partition), but the records my producer was putting on the topic (also ignoring partition differences) were around offsets ~58k.
I decided to scale up another consumer pod - same code and config all around (group id, etc)
When it came online, it logged that it was processing messages in the ~58k offset range. I considered that this was maybe just a different partition, but I can see the same partition in both logs (with different offsets).
I was under the impression that if multiple consumers had the same group id, message consumption would be balanced between them, in order.
In other words, why wouldn't my second (or n-th) consumer come online and process messages in the same offset range as my first consumer which has been running for days?
I did eye some of the IConsumer settings such as:
https://docs.confluent.io/platform/current/clients/confluent-kafka-dotnet/_site/api/Confluent.Kafka.ConsumerConfig.html#Confluent_Kafka_ConsumerConfig_QueuedMinMessages
which seems to specify the minimum number of messages to keep in a "local consumer queue" (default: 100,000), but I don't know if this actually means that ConsumerA has laid claim to 100k+ messages and ConsumerB is naturally starting 100k further down the line
Other notes:
What limited access I have to the Administrative Tools (Control Center) shows that my consumer group id is about 900k messages behind.
ControlCenter says my topic has 60 partitions
Autocommit is not off (default: true)
Regardless of the autocommit setting, I am still doing a _consumer.Commit(msg) in the finally{} block after processing each individual message
I don't want to kill my long-running consumer (which is still processing like a champ) in case there's an offset retention problem and I will "miss" all messages in the delta between these two

Kafka assigning partitions, do you need to commit offsets

I have an app that is running in several instances, and each instance needs to consume all messages from all partitions of a topic.
I have 2 strategies that I am aware of:
create a unique consumer group id for each app instance and subscribe and commit as usual;
the downside is that Kafka still needs to maintain a consumer group on behalf of each instance.
ask Kafka for all partitions of the topic and assign the consumer to all of them. As I understand it, no consumer group is then created on behalf of the consumer in Kafka. So the question is whether there is still a need to commit offsets, as there is no consumer group on the Kafka side to keep up to date. The consumer was created without assigning it a 'group.id'.
ask Kafka for all partitions of the topic and assign the consumer to all of them. As I understand it, no consumer group is then created on behalf of the consumer in Kafka. So the question is whether there is still a need to commit offsets, as there is no consumer group on the Kafka side to keep up to date. The consumer was created without assigning it a 'group.id'.
When you call consumer.assign() instead of consumer.subscribe(), no group.id property is required, which means that no group is required or maintained by Kafka.
Committing offsets is basically keeping track of what has been processed so that you don't process it again. This may as well be done manually: for example, by reading the polled messages and writing their offsets to a file once the messages have been processed.
In this case, your program is responsible for writing the offsets and also for reading from the next offset upon restart, using consumer.seek().
The only drawback is that if you want to move your consumer from one machine to another, you need to copy this file as well.
You can also store the offsets in some database that is accessible from any machine, in case you don't want the file to be copied (though writing to a file may be relatively simpler and faster).
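A minimal sketch of that file-based approach with the plain Java consumer (the file name, topic and server are made-up placeholders): the consumer is created without a group.id, assigns the partition directly, seeks to the offset saved by the previous run, and persists its position after each processed batch.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class AssignedConsumer {
    public static void main(String[] args) throws Exception {
        Path offsetFile = Paths.get("offsets-partition-0.txt"); // placeholder local offset store

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");       // placeholder
        props.put("enable.auto.commit", "false");                // no group.id, so auto commit must stay off
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0); // placeholder topic
            consumer.assign(Collections.singletonList(tp));

            // Resume from the offset persisted by a previous run, if any
            if (Files.exists(offsetFile)) {
                long next = Long.parseLong(Files.readString(offsetFile).trim());
                consumer.seek(tp, next);
            }

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                if (!records.isEmpty()) {
                    // Persist the next offset to read, once the whole batch has been processed
                    Files.writeString(offsetFile, Long.toString(consumer.position(tp)));
                }
            }
        }
    }
}
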
On the other hand, if there is a consumer group, then as long as your consumer has access to Kafka, Kafka will let your consumer automatically resume from the last committed offset.
There will always be a consumer group setting. If you're not setting it, whatever consumer you're running will use its default setting or Kafka will assign one.
Kafka will keep track of the offset of all consumers using the consumer group.
There is still a need to commit offsets. If no offsets are being committed, Kafka will have no idea what has been read already.
Here is the command to view all your consumer groups and their lag:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --all-groups
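To make that last point concrete, here is a small sketch (group id, topic and server are placeholders): a consumer that subscribes with a group.id and commits after processing will, when restarted with the same group.id, continue from the last committed offset.

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class GroupedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        props.put("group.id", "my-app-group");              // placeholder group id
        props.put("enable.auto.commit", "false");           // we commit explicitly below
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                records.forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
                if (!records.isEmpty()) {
                    // Kafka stores these offsets under the group id; a restarted
                    // consumer with the same group id continues from here.
                    consumer.commitSync();
                }
            }
        }
    }
}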

Apache Kafka Cleanup while consuming messages

Playing around with Apache Kafka and its retention mechanism, I'm thinking about the following situation:
A consumer fetches first batch of messages with offsets 1-5
The cleaner deletes the first 10 messages, so the topic now has offsets 11-15
In the next poll, the consumer fetches the next batch with offsets 11-15
As you can see, the consumer lost offsets 6-10.
The question is: is such a situation possible at all? In other words, will the cleaner execute while there is an active consumer? If yes, is the consumer able to somehow recognize that gap?
Yes, such a scenario can happen. The exact steps will be a bit different:
Consumer fetches message 1-5
Messages 1-10 are deleted
Consumer tries to fetch message 6 but this offset is out of range
Consumer uses its offset reset policy auto.offset.reset to find a new valid offset.
If set to latest, the consumer moves to the end of the partition
If set to earliest the consumer moves to offset 11
If none or unset, the consumer throws an exception
To avoid such scenarios, you should monitor the lead of your consumer group. It's similar to the lag, but the lead indicates how far from the start of the partition the consumer is. Being near the start carries the risk of messages being deleted before they are consumed.
If consumers are near that limit, you can dynamically add more consumers or increase the topic retention size/time if needed.
Setting auto.offset.reset to none will make the consumer throw an exception if this happens; the other values only log it.
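As a sketch of how a consumer can detect that gap explicitly (group id and topic are placeholders): with auto.offset.reset set to none, the Java consumer's poll() throws an InvalidOffsetException when the requested position has already been deleted, and the application can decide where to resume.

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.InvalidOffsetException;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class GapAwareConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        props.put("group.id", "gap-aware-group");           // placeholder group id
        props.put("auto.offset.reset", "none");              // fail instead of silently jumping
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                try {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    records.forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
                } catch (InvalidOffsetException e) {
                    // The requested offsets no longer exist (deleted by retention) or there is no
                    // initial offset; decide explicitly where to resume instead of losing this silently.
                    System.err.println("Offset gap detected for " + e.partitions());
                    consumer.seekToBeginning(e.partitions());
                }
            }
        }
    }
}
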
Question: is such a situation possible at all? Will the cleaner execute while there is an active consumer?
Yes, if the messages have crossed the TTL (time-to-live) period before they are consumed, this situation is possible.
Is the consumer able to somehow recognize that gap?
If you suspect that your configuration (high consumer lag, low TTL) might lead to this, the consumer should track its offsets. The kafka-consumer-groups.sh command gives you the position of all consumers in a consumer group as well as how far behind the end of the log they are.