Is there any way to detect a deleted consumer group in Kafka?

I have a Kafka cluster that uses Kafka itself to store consumer group offsets, and I wonder whether the __consumer_offsets topic carries an event for a deleted consumer group,
or how I can subscribe to such an event without repeatedly asking Kafka whether a specific consumer group still exists.

There is no such Kafka event, just as there is no event when a topic with cleanup.policy=delete removes a segment of messages.
You could parse the broker server logs, look for the LogCleaner actions, and push those into a Kafka topic if you really wanted to, but you still would not know which groups were being removed.
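If periodic polling turns out to be acceptable after all, a minimal sketch of detecting a deleted group with the Java AdminClient could look like the following. The broker address and the poll interval are placeholders; the approach is simply to snapshot the group list and report groups that disappear between polls (they may have been deleted explicitly or expired via offset retention).

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ConsumerGroupListing;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

public class GroupWatcher {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            Set<String> previous = Set.of();
            while (true) {
                // Snapshot all groups currently known to the cluster
                Set<String> current = admin.listConsumerGroups().all().get()
                        .stream().map(ConsumerGroupListing::groupId)
                        .collect(Collectors.toSet());
                // Anything in the previous snapshot but missing now is gone
                previous.stream()
                        .filter(g -> !current.contains(g))
                        .forEach(g -> System.out.println("group deleted or expired: " + g));
                previous = current;
                Thread.sleep(10_000); // poll interval; tune to taste
            }
        }
    }
}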

Related

Retrieve Message Consuming Time in Kafka

Suppose I have a topic called topic1 in Kafka, and a consumer group called group1 with 8 consumers consuming messages from topic1.
If I search for a past message in Kafka, how can I find out which consumer consumed this message, and when?
Kafka doesn't store any information about what actions consumers take; it only cares about the offsets they commit back (if they commit at all).
As commented, you're better off using log collection frameworks along with systems like Elasticsearch or Splunk to search for such historical information.
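Kafka itself won't record this, so each consumer has to emit the audit trail for the log pipeline to index. A minimal sketch, assuming the topic1/group1 names above and an SLF4J logger whose output your log collector ships to Elasticsearch or Splunk (the log fields are illustrative):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Properties;

public class AuditingConsumer {
    private static final Logger LOG = LoggerFactory.getLogger(AuditingConsumer.class);

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "group1");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("topic1"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> rec : records) {
                    // Log who consumed what and when, so the log pipeline can index it
                    LOG.info("consumed topic={} partition={} offset={} key={} at={}",
                            rec.topic(), rec.partition(), rec.offset(), rec.key(), Instant.now());
                    // ... actual processing ...
                }
            }
        }
    }
}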

Kafka consumer: how to check if all the messages in the topic partition are completely consumed?

Is there any API or attribute that can be used or compared to determine whether all messages in one topic partition have been consumed? We are working on a test that will use another consumer in the same consumer group to check whether the topic partition still has any messages. One of our app services also uses Kafka to process internal events, so is there any way to track the progress of message consumption?
Yes, you can use the admin API.
From the admin API you can get the end offsets for each partition of the topic, as well as a given consumer group's committed offsets. If all messages have been read, subtracting the latter from the former evaluates to 0 (zero lag) for every partition.
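A minimal sketch of that check with the Java AdminClient, reusing the topic1/group1 names from above and assuming Kafka 2.5+ (where AdminClient.listOffsets is available):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class LagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Committed offsets for the group, keyed by partition
            Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets("group1")
                         .partitionsToOffsetAndMetadata().get();
            // End offsets for the same partitions
            Map<TopicPartition, OffsetSpec> latest = committed.keySet().stream()
                    .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> ends =
                    admin.listOffsets(latest).all().get();
            // Zero lag on every partition means the group is fully caught up
            boolean caughtUp = committed.entrySet().stream().allMatch(e ->
                    ends.get(e.getKey()).offset() - e.getValue().offset() == 0);
            System.out.println("fully consumed: " + caughtUp);
        }
    }
}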

Kafka assigning partitions, do you need to commit offsets

I have an app that runs in several instances, and each instance needs to consume all messages from all partitions of a topic.
I am aware of 2 strategies:
1. Create a unique consumer group id for each app instance, then subscribe and commit as usual;
the downside is that Kafka still needs to maintain a consumer group on behalf of each instance.
2. Ask Kafka for all partitions of the topic and assign the consumer to all of them. As I understand it, no consumer group is then created on Kafka's side on behalf of the consumer. So the question is whether there is still a need to commit offsets, given there is no consumer group on the Kafka side to keep up to date. The consumer was created without assigning it a 'group.id'.
When you call consumer.assign() instead of consumer.subscribe(), no group.id property is required, which means that no group is created or maintained by Kafka.
Committing offsets is basically a way of keeping track of what has been processed so that you don't process it again. This may just as well be done manually, for example by reading the polled messages and writing the offsets to a file once the messages have been processed.
In that case, your program is responsible for writing the offsets and for resuming from the next offset upon restart using consumer.seek().
The only drawback is that if you want to move your consumer from one machine to another, you need to copy this file as well.
You can also store the offsets in a database that is accessible from any machine if you don't want to copy the file around (though writing to a file may be relatively simpler and faster).
If, on the other hand, there is a consumer group, then as long as your consumer has access to Kafka, Kafka will let your consumer resume automatically from the last committed offset.
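Here is a minimal sketch of that manual bookkeeping, assuming a single-partition topic topic1 and a local offset file (both placeholders). Note that no group.id is set, since the consumer uses assign() rather than subscribe():

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class AssignedConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // no group.id: we use assign(), so Kafka maintains no group for us
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Path offsetFile = Path.of("topic1-0.offset");
        TopicPartition tp = new TopicPartition("topic1", 0);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(List.of(tp));
            // Resume from the offset persisted on the previous run, if any
            if (Files.exists(offsetFile)) {
                consumer.seek(tp, Long.parseLong(Files.readString(offsetFile).trim()));
            } else {
                consumer.seekToBeginning(List.of(tp));
            }
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> rec : records) {
                    System.out.println(rec.value()); // ... actual processing ...
                    // Persist the NEXT offset to read, after processing succeeds
                    Files.writeString(offsetFile, Long.toString(rec.offset() + 1));
                }
            }
        }
    }
}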
There will always be a consumer group setting. If you don't set one yourself, the consumer you're running will fall back to its default or have one assigned.
Kafka keeps track of every consumer's offsets by consumer group.
So there is still a need to commit offsets: if no offsets are committed, Kafka has no idea what has already been read.
Here is the command to view all your consumer groups and their lag:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --all-groups
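The --describe output lists, per partition, the group's committed offset next to the log end offset; the difference is the lag. A roughly shaped example (values illustrative):

GROUP   TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST       CLIENT-ID
group1  topic1  0          42              45              3    consumer-1   /10.0.0.5  client-1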

Replicator filter for already consumed messages

Why does Confluent Replicator replicate messages which have already been consumed? There are many use cases where a message is no longer needed on the topic once it has been consumed. Isn't it a good idea to filter out already-consumed messages (provided there is a consumer registry of some sort)?
It shouldn't re-replicate messages it has already copied, as long as offsets are being committed back correctly for Replicator's own consumer group.
Messages consumed by your other processes are not removed from the Kafka topic by consumption, and since Replicator runs as a separate consumer group, it has no way of knowing what those processes have consumed.

Does Kafka support wildcard topic matching and subscription on the broker side?

I would like the consumer to subscribe to all topics matching the prefix I*.
New topics starting with I may be created in the Kafka cluster while the consumer is already up and running.
Is there a way the Kafka cluster/broker can automatically subscribe the consumer to new topics starting with I?
Or is the only way to do this kind of dynamic topic filtering/discovery/subscription for the consumer to refresh its metadata at some interval and check for newly added topics?
The consumer is the one that subscribes and polls; the brokers do not push this metadata to the consumers.
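That said, the Java consumer already does the periodic refresh for you when you subscribe with a regex: matched topics are re-evaluated on every metadata refresh, whose frequency is controlled by metadata.max.age.ms. A minimal sketch, assuming topics prefixed with I and a broker at localhost:9092 (group name is a placeholder; pattern subscription requires a group.id):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Properties;
import java.util.regex.Pattern;

public class PrefixSubscriber {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "prefix-subscriber");
        props.put("metadata.max.age.ms", "30000"); // how often new topics are discovered
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Client-side regex: topics created later that match "I.*" are picked up
            // automatically on the next metadata refresh
            consumer.subscribe(Pattern.compile("I.*"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> rec : records)
                    System.out.printf("%s-%d@%d: %s%n",
                            rec.topic(), rec.partition(), rec.offset(), rec.value());
            }
        }
    }
}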