Azure Kafka trigger: how to receive only events after a specified offset - apache-kafka

How can I start reading topic messages from a specific offset or partition using the Azure Kafka trigger?
I'm building a new trigger for an existing topic, but I do not want to consume all the previous messages.
I've tried setting AutoOffsetReset = AutoOffsetReset.Earliest and AutoOffsetReset.Latest, both in my code and in the config file, but the consumer still consumes all the messages from all the offsets, which is not what I want.

There are no consumer config properties for setting specific offsets.
If you know, or can configure, the consumer group id, then you'll need to externally use kafka-consumer-groups.sh to first set the group offsets to a particular position before starting the trigger.
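For illustration, the external reset step could be scripted before the trigger is deployed. The sketch below only assembles the kafka-consumer-groups.sh argument list for a fixed offset on one partition; it does not run the tool or contact a broker. The class name, group, topic and broker address are placeholders, and it assumes the group has no active members when the command is eventually run.

```java
import java.util.List;

/** Builds (but does not run) a kafka-consumer-groups.sh offset reset command. */
final class ResetOffsetsCommand {

    // All arguments are placeholders supplied by the caller.
    static List<String> build(String bootstrap, String group, String topic,
                              int partition, long offset) {
        return List.of(
            "bin/kafka-consumer-groups.sh",
            "--bootstrap-server", bootstrap,
            "--group", group,
            "--topic", topic + ":" + partition,  // topic:partition limits the reset to one partition
            "--reset-offsets",
            "--to-offset", Long.toString(offset),
            "--execute");                         // apply the change instead of only previewing it
    }

    public static void main(String[] args) {
        // Prints the command so it can be reviewed or handed to ProcessBuilder.
        System.out.println(String.join(" ",
            build("localhost:9092", "my-trigger-group", "my-topic", 0, 1500L)));
    }
}
```

Once the group's offsets have been set this way, a trigger started with the same consumer group id should pick up from the chosen position rather than from auto.offset.reset.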

Related

Is there any way to detect a deleted consumer group in Kafka?

I have a Kafka server that uses Kafka itself to store consumer group offsets. I wonder whether the __consumer_offsets topic would have an event for a deleted consumer group?
Or how can I subscribe to this event without asking Kafka whether a specific consumer group exists?
There is no such Kafka event when a topic with cleanup.policy=delete removes a segment of messages.
You could parse the broker server logs and look for the LogCleaner actions and push them into a Kafka topic if you really wanted to, but you still would not know what groups were being removed.

Kafka assigning partitions, do you need to commit offsets

I have an app that runs in several instances, and each instance needs to consume all messages from all partitions of a topic.
I am aware of two strategies:
1. Create a unique consumer group id for each app instance and subscribe and commit as usual. The downside is that Kafka still needs to maintain a consumer group on behalf of each consumer.
2. Ask Kafka for all partitions of the topic and assign the consumer to all of them. As I understand it, no consumer group is then created on behalf of the consumer in Kafka. So the question is whether there is still a need to commit offsets, as there is no consumer group on the Kafka side to keep up to date. The consumer was created without assigning it a 'group.id'.
When you call consumer.assign() instead of consumer.subscribe(), no group.id property is required, which means that no group is required or maintained by Kafka.
Committing offsets is basically keeping track of what has been processed so that you don't process it again. You can just as well do this yourself, for example by reading the polled messages and writing their offsets to a file once the messages have been processed.
In this case, your program is responsible for writing the offsets and also for resuming from the next offset upon restart using consumer.seek().
The only drawback is that if you want to move your consumer from one machine to another, you need to copy this file as well.
You can also store the offsets in a database that is accessible from any machine if you don't want to copy the file (though writing to a file may be relatively simpler and faster).
On the other hand, if there is a consumer group, so long as your consumer has access to Kafka, your Kafka will let your consumer automatically read from the last committed offset.
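The file-based bookkeeping described in this answer could be sketched as follows. This is a minimal illustration using only the Java standard library; the class name OffsetFileStore, the "topic-partition" key format and the properties-file layout are assumptions for the example, not anything Kafka prescribes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Keeps the next offset to read for each partition in a local properties file. */
final class OffsetFileStore {
    private final Path file;

    OffsetFileStore(Path file) {
        this.file = file;
    }

    /** Records the offset of the next record to read, i.e. last processed offset + 1. */
    void save(String topic, int partition, long nextOffset) {
        Properties props = loadAll();
        props.setProperty(topic + "-" + partition, Long.toString(nextOffset));
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "next offsets to read");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the stored offset, or the fallback when nothing has been saved yet. */
    long load(String topic, int partition, long fallback) {
        String value = loadAll().getProperty(topic + "-" + partition);
        return value == null ? fallback : Long.parseLong(value);
    }

    private Properties loadAll() {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }
}
```

On restart, the value returned by load(...) would be passed to consumer.seek() for the assigned partition, and save(...) would be called only after a polled batch has been fully processed.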
There will always be a consumer group setting. If you're not setting it, whatever consumer you're running will use its default setting or Kafka will assign one.
Kafka will keep track of the offset of all consumers using the consumer group.
There is still a need to commit offsets. If no offsets are being committed, Kafka will have no idea what has been read already.
Here is the command to view all your consumer groups and their lag:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --all-groups

Group name is not polling data

I have a topic with messages and a consumer with the group name "KafkaConsumerExample". When I restart the consumer, all the messages from the topic are received without issues. But when I change the name of my consumer group, with the same consumer code, the consumer does not pull data from the topic. What would be the reason for this? Changing the consumer group name changes how the topic is read. Can you please help?
A consumer will not receive already-consumed messages again under the same consumer group name unless the group's offsets are reset. This is because Kafka stores the committed offsets of each consumer group to record what that group has already consumed, so a restarted consumer in the same group resumes after its last committed offset. This avoids reprocessing, although on its own it does not give exactly-once semantics.
If you wish to consume the same data again from the topic, you need to change the name of the consumer group.
I hope this helps! Let me know if I addressed your question or if there is anything else.
The problem you're running into is that a new consumer group has no committed offsets, so its starting position is decided by the auto.offset.reset consumer property. The default is "latest", which starts at the end of each partition, so the new group only sees messages produced after it joins (this can look as if it continues roughly where other groups left off). Set this property to "earliest" to make the consumer read from the first available message in each partition.
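As a rough sketch of that setting, the properties for a new group that should start from the oldest retained message might look like this. Only java.util.Properties is used, so nothing connects to a broker; the class name, broker address, group name and the choice of string deserializers are placeholders for the example.

```java
import java.util.Properties;

/** Builds consumer settings for a group that should start at the oldest retained record. */
final class NewGroupConfig {

    static Properties fromBeginning(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        // Consulted only when the group has no committed offset for a partition.
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder broker address and group name.
        fromBeginning("localhost:9092", "KafkaConsumerExample-v2").list(System.out);
    }
}
```

These properties would then be handed to the consumer's constructor. Once the group has committed offsets, auto.offset.reset no longer influences where it resumes.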

producer to multiple subscriber paradigm

We have a dispatcher which receives a message - and then 'fans' it out to multiple downstream environments.
Each set of downstream environments needs to consume this message.
Will it suffice to tag the different sets of environments with different group.id values to force all the environments to consume the same message (1 producer, multiple subscriber broadcast)?
If a particular environment (group) crashes, will it be possible to replay the messages to that particular group only?
Yes, this is typically how you achieve such a data flow.
If you have multiple consumer groups subscribed to the same topics, they will all consume all messages. As you said, you use the group.id configuration to identify each consumer group.
In addition each consumer group tracks its own offsets. So you can easily make a particular group replay part of the log without impacting the other groups. This can be achieved for example by using the kafka-consumer-groups.sh tool with one of the reset options.
Yes, that's how Kafka works. So long as the topic's retention still covers the data, any particular consumer group can re-consume from any offset in the log, whether from the beginning or from the last point it successfully read. All other consumers are unaffected.
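The independence of per-group offsets can be illustrated with a toy in-memory model. This is not Kafka client code: FanOutModel and its methods are invented for the example, it models a single partition, and a group with no stored offset starts at 0 (comparable to auto.offset.reset=earliest).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory toy model of one partition read by several consumer groups. */
final class FanOutModel {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> committed = new HashMap<>(); // group -> next offset

    void produce(String message) {
        log.add(message);
    }

    /** Returns everything the group has not read yet and advances only that group's offset. */
    List<String> poll(String group) {
        int from = committed.getOrDefault(group, 0);
        List<String> batch = new ArrayList<>(log.subList(from, log.size()));
        committed.put(group, log.size());
        return batch;
    }

    /** Rewinds a single group, similar in spirit to a reset with kafka-consumer-groups.sh. */
    void reset(String group, int offset) {
        committed.put(group, offset);
    }
}
```

In this model, two groups polling the same log each receive every message, and calling reset("env1", 1) makes only env1 re-read from offset 1 while env2 keeps its position.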

Using Kafka & Zookeeper, Different group id does not retrieve Kafka msgs from the beginning

We have kafka v2.10 and zookeeper v3.4 set up and working. We have written high level consumers consuming log msgs from Kafka. Consumer A starts up consuming msgs for topic T and group id G1 (following the high level consumer example provided on Apache Kafka documentation).
Then when consumer B starts up with the same topic T but group id G2, it connects to kafka/zookeeper, but consumes log msgs starting with the offset after the last one used by Consumer A.
My understanding is that it should be given log msgs starting with the lowest offset available in Kafka for that topic. Any idea why it's not doing that?
We are not replicating Kafka or ZooKeeper yet. Our setup at this point is simple and straightforward, and we are trying to get it to work with basic functionality.
Any help is appreciated.
Also, do you know where we can locate the new directories that ZooKeeper supposedly creates every time a consumer with a new group id connects to it (for tracking offsets for that group id)?
Can you try adding this to the configuration when creating the consumer group:
// "smallest" is the old high-level consumer's value; the newer Java consumer uses "earliest".
props.put("auto.offset.reset", "smallest");