Kafka consumer-client is not registering offset of consumer group on zookeeper - apache-kafka

I'm trying to create multiple consumers with different consumer groups to a kafka topic using kafka-clients v.0.10.2.1. Although I'm not able to retrieve the last offset commited by a consumer group.
Currently my Consumer properties looks like this
Properties cproperties = new Properties();
cproperties.put(ConsumerConfig.GROUP_ID_CONFIG, groupID);
cproperties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, my-broker));
cproperties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
cproperties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
cproperties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, taskDecoder.getClass());
cproperties.put(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG, "60000");
And without the property Auto-reset-offset I can't consume from the topic, but i can't use this config, I need the consumer group registered on zookeeper.
So, I need to create a consumer group on zookeeper /consumers too.

You need to include the property auto.offset.reset to earliest (or latest, depending on what are you trying to achieve) in order to avoid throwing an exception when an offset is not found (probably because data is deleted).
You also need to make sure that you manually commit offsets since you've disable auto-commit.
cproperties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
To do so, you can either use commitSync()
This commits offsets to Kafka. The offsets committed using this API
will be used on the first fetch after every rebalance and also on
startup. As such, if you need to store offsets in anything other than
Kafka, this API should not be used. The committed offset should be the
next message your application will consume, i.e.
lastProcessedMessageOffset + 1.
or commitAsync()
This commits offsets only to Kafka. The offsets committed using this
API will be used on the first fetch after every rebalance and also on
startup. As such, if you need to store offsets in anything other than
Kafka, this API should not be used.
This is an asynchronous call and will not block. Any errors
encountered are either passed to the callback (if provided) or
discarded.
Note that if you don't commit offsets an exception will be thrown if auto.offset.reset is set to none.
What to do when there is no initial offset in Kafka or if the current
offset does not exist any more on the server (e.g. because that data
has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.

Related

Kaka auto.offset.reset query

My project uses Kafka 0.10.2 version. Iam setting enable.auto.commit=false and auto.offset.reset=latest in the consumer. If consumer is restarted after maintenance, the consumer is reading again from first offset instead of waiting for latest offset messages. Any reasons why is this happening? Have i understood the configurations wrongly?
My requirement is the consumer should not auto commit and should read only the new messages put into the topic when it is active.
Just because you aren't auto committing doesn't guarnatee there are no manual commits.
Regardless, auto.offset.reset=latest will never send the consumer group to the beginning of the topic. Sounds like whatever Kafka tool / library you are using is calling a consumer.seekToBeginning call on its own.
For Understanding purpose , The Consumer property auto.offset.reset determines what to do if there is no valid offset in Kafka for the Consumer’s Consumer Group Based on the below scenarios :
– When a particular Consumer Group starts the first time
– If the Consumer offset is less than the smallest offset
– If the Consumer offset is greater than the last offset
▪ The value can be one of:
– earliest: Automatically reset the offset to the earliest available
– latest: Automatically reset to the latest offset available
– none: Throw an exception if no previous offset can be found for the
ConsumerGroup
▪ The default is latest

Apache Kafka - How to wait for subscription to finish

I'm using Kafka 2.1.0 and want to publish a message once the subscription has taken place. Is there a way for the producer to know if a subscription has happened and then publish a message? Otherwise I'd be losing the 1st message every time.
From https://kafka.apache.org/documentation.html#newconsumerconfigs, auto.offset.reset states:
What to do when there is no initial offset in Kafka or if the current
offset does not exist any more on the server (e.g. because that data
has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found
for the consumer's group
anything else: throw exception to the consumer.
Default value of auto.offset.reset is latest. To ensure that your consumer doesn't loses out first record, you need to set auto.offset.reset to earliest.

Kafka manual offset managment issue

While implementing manual offset management, I encountered the following issue: (using 0.9)
In order to manage the offsets manually, for each consumed record, I retrieve the current offset of the record and commit the new offset (currentOffset + 1, since the offset reset strategy is "latest").
When a new consumer group is created, it has no explicit offsets (offset is "unknown"), therefore, if it didn't consume messages from all existing partitions before it is stopped, it will have committed offsets for only part of the partitions (the ones the consumer got messages from), while the offset for the rest of the partitions will still be "unknown".
When the consumer is started again, it gets only some of the messages that were produced while it was down (only the ones from the partitions that had a committed offset), the messages from partitions with "unknown" offset are lost and will never be consumed due to the offset reset strategy.
Since it's unacceptable in my case to miss any messages once a consumer group is created, I'd like to explicitly commit an offset for each partition before starting consumption.
To do that I found two options:
Use low level consumer to send an offset request.
Use high level consumer, call consumer.poll(0) (to trigger the assignment), then call consumer.assignment(), and for each TopicPartition call consumer.committed(topicPartition); consumer.seekToEnd(topicPartition); consumer.position(topicPartition) and eventually commit all offsets.
Both are more complex and noisy than I'd expect (I'd expect a simpler API I could use to get the log end position for all partitions assigned to a consumer).
Any thoughts or ideas for a better implementation would be appreciated.
10x.
Using consumer API totally depends upon where are you committing offsets.
If your offsets are getting stored in Kafka broker then definitely
you should use high-level consumer API it will provide you with more control
over offsets.
If you are keeping offsets in zookeeper than you can use any old consumer API like
List< KafkaStream < byte[], byte[] > > streams
=consumer.createMessageStreamsByFilter(new Whitelist(topicRegex),1)

kafka subscribe commit offset manually

I am using Kafka 9 and confused with the behavior of subscribe.
Why does it expects group.id with subscribe.
Do we need to commit the offset manually using commitSync. Even if don't do that I see that it always starts from the latest.
Is there a way a replay the messages from beginning.
Why does it expects group.id with subscribe?
The concept of consumer groups is used by Kafka to enable parallel consumption of topics - every message will be delivered once per consumer group, no matter how many consumers actually are in that group. This is why the group parameter is mandatory, without a group Kafka would not know how this consumer should be treated in relation to other consumers that might subscribe to the same topic.
Whenever you start a consumer it will join a consumer group, based on how many other consumers are in this consumer group it will then be assigned partitions to read from. For these partitions it then checks whether a list read offset is known, if one is found it will start reading messages from this point.
If no offset is found, the parameter auto.offset.reset controls whether reading starts at the earliest or latest message in the partition.
Do we need to commit the offset manually using commitSync? Even if
don't do that I see that it always starts from the latest.
Whether or not you need to commit the offset depends on the value you choose for the parameter enable.auto.commit. By default this is set to true, which means the consumer will automatically commit its offset regularly (how often is defined by auto.commit.interval.ms). If you set this to false, then you will need to commit the offsets yourself.
This default behavior is probably also what is causing your "problem" where your consumer always starts with the latest message. Since the offset was auto-committed it will use that offset.
Is there a way a replay the messages from beginning?
If you want to start reading from the beginning every time, you can call seekToBeginning, which will reset to the first message in all subscribed partitions if called without parameters, or just those partitions that you pass in.

Kafka Consumer - Poll behaviour

I'm facing some serious problems trying to implement a solution for my needs, regarding KafkaConsumer (>=0.9).
Let's imagine I have a function that has to read just n messages from a kafka topic.
For example: getMsgs(5) --> gets next 5 kafka messages in topic.
So, I have a loop that looks like this. Edited with actual correct parameters. In this case, the consumer's max.poll.records param was set to 1, so the actual loop only iterated once. Different consumers(some of them iterated through many messages) shared an abstract father (this one), that's why it's coded that way. The numMss part was ad-hoc for this consumer.
for (boolean exit= false;!exit;)
{
Records = consumer.poll(config.pollTime);
for (Record r:records)
{
processRecord(r); //do my things
numMss++;
if (numMss==maximum) //maximum=5
{
exit=true;
break;
}
}
}
Taking this into account, the problem is that the poll() method could get more than 5 messages. For example, if it gets 10 messages, my code will forget forever those other 5 messages, since Kafka will think they're already consumed.
I tried commiting the offset but doesn't seem to work:
consumer.commitSync(Collections.singletonMap(partition,
new OffsetAndMetadata(record.offset() + 1)));
Even with the offset configuration, whenever I launch again the consumer, it won't start from the 6th message (remember, I just wanted 5 messages), but from the 11th (since the first poll consumed 10 messages).
Is there any solution for this, or maybe (most surely) am I missing something?
Thanks in advance!!
You can set max.poll.records to whatever number you like such that at most you will get that many records on each poll.
For your use case that you stated in this problem you don't have to commit offsets explicitly by yourself. you can just set enable.auto.commit to trueand set auto.offset.reset to earliest such that it will kick in when there is no consumer group.id (other words when you are about start reading from a partition for the very first time). Once you have a group.id and some consumer offsets stored in Kafka and in case your Kafka consumer process dies it will continue from the last committed offset since it is the default behavior because when a consumer starts it will first look for if there are any committed offsets and if so, will continue from the last committed offset and auto.offset.reset won't kick in.
Had you disabled auto commit by setting enable.auto.commit to false. You need to disable that if you want to manually commit the offset. Without that next call to poll() will automatically commit the latest offset of the messages you received from previous poll().
From Kafka 0.9 the auto.offset.reset parameter names have changed;
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.
set auto.offset.reset property as "earliest". Then try consume, you will get the consumed records from the committed offset.
Or you use consumer.seek(TopicPartition, offset) api before poll.