Is there any mechanism to get notified (by a specific logfile entry or similar) when an event in a Kafka topic expires due to retention policies? (I know this should be avoided by design, but still.)
I know about consumer lag monitoring tools that track offset discrepancies between a published event and the related consumer groups, but afaik they provide only numbers (the offset difference).
In other, simpler words: how can we find out whether Kafka events were never consumed and therefore expired?
The broker logs record the deletions performed for retention, but those entries refer to log segment files, not to particular messages.
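One way to approximate an answer without parsing broker logs is to compare a group's committed offset with the partition's earliest available offset. The sketch below is only the arithmetic; the class and method names are hypothetical, and it assumes the two inputs were fetched elsewhere (for example with KafkaConsumer.committed and KafkaConsumer.beginningOffsets). The result counts offsets, so on compacted or transactional topics it is an upper bound on the number of records.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: estimates how many offsets expired before a group read them.
final class RetentionGap {

    // committedOffset is the next offset the group would read;
    // logStartOffset is the earliest offset still present in the partition.
    static long expiredUnconsumed(long committedOffset, long logStartOffset) {
        return Math.max(0L, logStartOffset - committedOffset);
    }

    // Returns only the partitions where something expired unread.
    static Map<Integer, Long> gapsByPartition(Map<Integer, Long> committed,
                                              Map<Integer, Long> logStart) {
        Map<Integer, Long> gaps = new LinkedHashMap<>();
        for (Map.Entry<Integer, Long> entry : logStart.entrySet()) {
            Long committedOffset = committed.get(entry.getKey());
            if (committedOffset == null) {
                continue; // no committed offset for this partition, nothing to compare
            }
            long gap = expiredUnconsumed(committedOffset, entry.getValue());
            if (gap > 0) {
                gaps.put(entry.getKey(), gap);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Sample values, made up for illustration.
        Map<Integer, Long> committed = Map.of(0, 100L, 1, 250L);
        Map<Integer, Long> logStart = Map.of(0, 140L, 1, 200L);
        System.out.println(gapsByPartition(committed, logStart));
    }
}
```

Running such a check periodically and logging any non-empty result would give the "specific logfile entry" asked about, at offset granularity.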
Related
In my case, it is a valid possibility that a consumer is offline for a longer period. During that offline period, events are still published to the topic.
When the consumer comes back online, it will re-use its existing consumer group, which has been lagging. Is it possible to skip forward to the latest message only? That is, ignore all earlier messages. In other words, I want to alter the offset to the latest message prior to consuming.
There is the spring.kafka.consumer.auto-offset-reset property, but as far as I understand, it only applies to new consumer groups. Here, I am re-using an existing consumer group when the consumer comes back online. That said, if there is a way to automatically prune a consumer group when its consumer goes offline, this property could work, but I am not sure whether such functionality exists.
I am working with the Spring Boot Kafka integration.
You can use the consumer's seek method: calculate the last offset of each partition, subtract one, seek to that position, commit it, and then start polling.
Otherwise, simply don't use the same group. Generate a unique one and/or disable auto commits; then you're guaranteed to always use the auto.offset.reset config, and lag is meaningless across app restarts.
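The second option could look roughly like the following. This is a sketch using plain java.util.Properties; the class name, the group prefix, and the broker address are placeholders, while the keys are the standard consumer configs group.id, enable.auto.commit, and auto.offset.reset.

```java
import java.util.Properties;
import java.util.UUID;

// Hypothetical helper: builds consumer settings that start from the latest offset on every start.
final class FreshGroupConfig {

    static Properties consumerProps(String bootstrapServers, String groupPrefix) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // A new group id per start means there are never committed offsets to resume from.
        props.setProperty("group.id", groupPrefix + "-" + UUID.randomUUID());
        // Without auto commits the client itself never stores a position for this group.
        props.setProperty("enable.auto.commit", "false");
        // With no committed offset, the consumer starts at the end of each partition.
        props.setProperty("auto.offset.reset", "latest");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" and "my-app" are placeholder values.
        Properties props = consumerProps("localhost:9092", "my-app");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

A real consumer would also need key.deserializer and value.deserializer. Note that in Spring Kafka the listener container can commit offsets on its own, so there the unique group id is the part that reliably forces auto.offset.reset to apply.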
I'm using a regex pattern to subscribe to a group of topics, which might be created dynamically. However, it can take quite a while before the consumer discovers newly created topics.
I can set the topic.metadata.refresh.interval.ms property to change the polling interval, but I'm concerned that short intervals might lead to overhead. So I think a notification approach would be better, i.e., when a new topic is created, the creator notifies the consumer service.
I'm looking for an API that forces the consumer to refresh its topic metadata. I didn't find one after looking through the Kafka consumer APIs. Any ideas?
The only API for this would be to .close() the consumer and re-subscribe it upon receiving such a "notification event".
Is there any way to detect a crash or shutdown of a consumer?
I want the Kafka server to publish an event to all Kafka clients (publishers, consumers, ...) when that happens.
Is it possible?
Kafka keeps track of the consumed offsets of each consumer group in a special internal topic (__consumer_offsets). You could set up a dedicated "monitoring service" that constantly consumes from that internal topic and triggers notification/alerting mechanisms as needed, so that your publishers and other consumers are notified programmatically. This other SO question has more details about that.
Depending on your use case, lag monitoring is also a really good way to know whether your consumers are falling behind and/or have crashed. There are multiple solutions for that out there, or again, you could build your own to customize the alerting/notification behavior.
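If you build your own, the core check is small. The sketch below uses hypothetical names and assumes the end offsets and committed offsets were fetched elsewhere (for example with KafkaConsumer.endOffsets and AdminClient.listConsumerGroupOffsets). Treating a missing committed offset as 0 is a simplification made for this example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: per-partition lag plus a simple "consumer looks stalled" heuristic.
final class LagCheck {

    // Lag is the distance between the end of the partition and the group's committed offset.
    static Map<Integer, Long> lagByPartition(Map<Integer, Long> endOffsets,
                                             Map<Integer, Long> committed) {
        Map<Integer, Long> lag = new TreeMap<>();
        for (Map.Entry<Integer, Long> entry : endOffsets.entrySet()) {
            // Assumption: a partition without a committed offset counts as unread from offset 0.
            long committedOffset = committed.getOrDefault(entry.getKey(), 0L);
            lag.put(entry.getKey(), Math.max(0L, entry.getValue() - committedOffset));
        }
        return lag;
    }

    // Heuristic: the committed offset did not move between two samples
    // although there is still unread data in the partition.
    static boolean looksStalled(long previousCommitted, long currentCommitted, long currentEnd) {
        return currentCommitted == previousCommitted && currentEnd > currentCommitted;
    }

    public static void main(String[] args) {
        // Sample values, made up for illustration.
        Map<Integer, Long> end = Map.of(0, 500L, 1, 80L);
        Map<Integer, Long> committed = Map.of(0, 420L);
        System.out.println(lagByPartition(end, committed));
        System.out.println(looksStalled(420L, 420L, 500L));
    }
}
```

A stalled group is only a hint that the consumer crashed or was shut down; a slow but healthy consumer can look the same if the sampling interval is too short.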
Messages are disappearing from a topic when using the Confluent client. The only ones I can see (as long as I don't reload the page) are messages which I create using the "Produce" option on the same page. The Kafka configurations are OK (I think), but I still don't understand what is wrong.
Looks like you are producing and consuming messages through a web browser.
Consumers typically subscribe to a topic and commit the offsets they have consumed. Subsequent polls do not return the older messages (unless you do a seek operation), only newly produced ones.
The term "disappearing" may apply in two contexts:
1. As said above, the consumer has already consumed that message and doesn't consume it again (because it has already polled it).
2. Your topic retention policy could be deleting older messages. You can check this with built-in tools like kafka-console-consumer or kafka-avro-console-consumer and the --from-beginning flag. If the messages are there, the issue is with your consumer.
If you are calling consumer.poll() on every reload, you will only get the messages after the previous call to poll (i.e. those produced after the last reload). If you want all messages that have been present in the topic, from the beginning or from some point in time, you need to seek to the beginning or to some timestamp or offset. See seek in KafkaConsumer.
1. Is it possible to track who the consumers are in Kafka? Perhaps some way for a consumer to 'register' itself as a consumer to a Kafka topic?
2. If #1 is possible, then is it also possible for Kafka to track the time when a consumer consumed a message?
It is possible to implement these features in the application itself, but I wonder if Kafka already provides some way to do this. I can't find any documentation on these features so perhaps this is not possible in Kafka, but it would be great to get confirmation. Thank you!
You can track consumer groups, but I do not think you can track the consumers within a group very easily. Within a group, Kafka gives you the lag, and to get an actual time from that lag you would need to read the records at the offsets in question.
There is no other such "registration process".
What you could do is develop an "interceptor" that is able to track messages and times throughout the system. That is how Confluent Control Center is able to graphically display if/when consumers get messages.
However, that requires additional configuration on all consumers. More specifically, the interceptor must be on each consumer's classpath.