Apache Kafka - How to wait for subscription to finish - scala

I'm using Kafka 2.1.0 and want to publish a message once the subscription has taken place. Is there a way for the producer to know if a subscription has happened and then publish a message? Otherwise I'd be losing the 1st message every time.

From https://kafka.apache.org/documentation.html#newconsumerconfigs, auto.offset.reset states:
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.
The default value of auto.offset.reset is latest. To ensure that your consumer doesn't lose the first record, set auto.offset.reset to earliest; this takes effect as long as the consumer group has no committed offset yet.
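A minimal sketch of such a consumer configuration, using plain java.util.Properties with the documented string keys so it compiles without the Kafka library. The class name FirstMessageConsumerConfig, the bootstrap address and group name passed in, and the String deserializers are assumptions for illustration. Note that earliest only applies while the group has no committed offset; it does not make the producer wait for the subscription.

```java
import java.util.Properties;

// Hypothetical helper: settings for a consumer that must not miss records
// produced before its first partition assignment.
final class FirstMessageConsumerConfig {
    private FirstMessageConsumerConfig() {}

    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers); // "host:port[,host:port...]"
        props.put("group.id", groupId);
        // With no committed offset for this group, start at the oldest retained
        // record instead of the default "latest" (which skips earlier records).
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```

The resulting Properties can be passed to new KafkaConsumer<String, String>(props) before calling subscribe(...).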

Related

Kafka change Offset from Latest to earliest

I have a consumer with 'latest' offset. If I change it to earliest, does my consumer read the messages from the starting offset, or does it continue from where the latest offset left off? The consumer group name does not change.
If you've started the consumer already, and it has created a group (committed offsets), then auto.offset.reset value doesn't matter; it'll continue reading from the last committed offset.
If you want to reset the offsets for an existing group, you need to run kafka-consumer-groups --reset-offsets tool, or manually call seek methods on the consumer before it starts polling.
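The precedence described in this answer (an explicit seek first, then the committed offset, and only then auto.offset.reset) can be modeled in plain Java. This is a simplified, hypothetical model rather than the client's actual logic: StartPosition and its parameters are invented for illustration, and it assumes any committed or seek offset is still within the retained log.

```java
import java.util.OptionalLong;

// Hypothetical model of where a consumer group starts reading one partition.
final class StartPosition {
    private StartPosition() {}

    /**
     * explicitSeek:    offset passed to seek() before the first poll(), if any
     * committed:       the group's committed offset, if any
     * logStart/logEnd: earliest retained offset / offset of the next record to be written
     */
    static long resolve(OptionalLong explicitSeek, OptionalLong committed,
                        String autoOffsetReset, long logStart, long logEnd) {
        if (explicitSeek.isPresent()) {
            return explicitSeek.getAsLong();   // seek() overrides everything
        }
        if (committed.isPresent()) {
            return committed.getAsLong();      // auto.offset.reset is never consulted
        }
        switch (autoOffsetReset) {             // only reached for a group with no offset
            case "earliest":
                return logStart;
            case "latest":
                return logEnd;
            default:
                throw new IllegalStateException(
                        "no committed offset and auto.offset.reset=" + autoOffsetReset);
        }
    }
}
```

For a group that has already committed offset 42, the model returns 42 whether the setting is "latest" or "earliest", which is why changing the setting alone has no effect; only a seek (or the reset tool rewriting the committed offset) moves the group.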

Kafka auto.offset.reset query

My project uses Kafka version 0.10.2. I am setting enable.auto.commit=false and auto.offset.reset=latest in the consumer. If the consumer is restarted after maintenance, it reads again from the first offset instead of waiting for new messages at the latest offset. Any reasons why this is happening? Have I understood the configurations wrongly?
My requirement is that the consumer should not auto commit and should read only the new messages put into the topic while it is active.
Just because you aren't auto committing doesn't guarantee there are no manual commits.
Regardless, auto.offset.reset=latest will never send the consumer group to the beginning of the topic. It sounds like whatever Kafka tool / library you are using is calling consumer.seekToBeginning(...) on its own.
For understanding purposes: the consumer property auto.offset.reset determines what to do if there is no valid offset in Kafka for the consumer's consumer group, in the following scenarios:
– When a particular consumer group starts for the first time
– If the consumer offset is less than the smallest offset
– If the consumer offset is greater than the last offset
▪ The value can be one of:
– earliest: Automatically reset the offset to the earliest offset available
– latest: Automatically reset to the latest offset available
– none: Throw an exception if no previous offset can be found for the consumer group
▪ The default is latest
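The three scenarios above boil down to a range check on the committed offset. Below is a hypothetical, dependency-free model (OffsetResetCheck is not a Kafka class); logEnd is assumed to be the offset the next produced record will receive, and IllegalStateException is a stand-in for the client's own exceptions (the real consumer throws, for example, NoOffsetForPartitionException when the policy is none).

```java
import java.util.OptionalLong;

// Hypothetical model of when auto.offset.reset is consulted for one partition.
final class OffsetResetCheck {
    private OffsetResetCheck() {}

    /**
     * True when the group has no usable committed offset: none was ever committed,
     * it points before the earliest retained offset (data deleted), or it points
     * past the log end offset.
     */
    static boolean needsReset(OptionalLong committed, long logStart, long logEnd) {
        if (!committed.isPresent()) {
            return true;                       // group starting for the first time
        }
        long offset = committed.getAsLong();
        return offset < logStart || offset > logEnd;
    }

    /** Applies the policy; "latest" jumps to the end, never to the beginning. */
    static long reset(String autoOffsetReset, long logStart, long logEnd) {
        if ("earliest".equals(autoOffsetReset)) {
            return logStart;
        }
        if ("latest".equals(autoOffsetReset)) {
            return logEnd;
        }
        // "none" (or any other value): the model simply refuses to pick a position.
        throw new IllegalStateException("no valid offset and auto.offset.reset=" + autoOffsetReset);
    }
}
```

An in-range committed offset (for example 50 in a log spanning 5 to 100) never triggers a reset, so with latest the only way back to the beginning is an explicit seek.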

Kafka consumer-client is not registering offset of consumer group on zookeeper

I'm trying to create multiple consumers with different consumer groups for a Kafka topic using kafka-clients v0.10.2.1. However, I'm not able to retrieve the last offset committed by a consumer group.
Currently my consumer properties look like this:
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties cproperties = new Properties();
cproperties.put(ConsumerConfig.GROUP_ID_CONFIG, groupID);
cproperties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, myBroker); // placeholder: "host:port" of a broker
cproperties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // offsets must be committed manually
cproperties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
cproperties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, taskDecoder.getClass());
cproperties.put(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG, "60000");
Without the auto.offset.reset property I can't consume from the topic, but I can't use this config: I need the consumer group registered in ZooKeeper.
So I also need to create a consumer group under /consumers in ZooKeeper.
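You need to set the property auto.offset.reset to earliest (or latest, depending on what you are trying to achieve) in order to avoid an exception being thrown when an offset is not found (probably because the data was deleted).
You also need to make sure that you commit offsets manually, since you've disabled auto-commit. Note that the Java consumer in kafka-clients stores committed offsets in Kafka (the internal __consumer_offsets topic), not in ZooKeeper, so nothing will appear under /consumers.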
cproperties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
To do so, you can use either commitSync():
This commits offsets to Kafka. The offsets committed using this API will be used on the first fetch after every rebalance and also on startup. As such, if you need to store offsets in anything other than Kafka, this API should not be used. The committed offset should be the next message your application will consume, i.e. lastProcessedMessageOffset + 1.
or commitAsync():
This commits offsets only to Kafka. The offsets committed using this API will be used on the first fetch after every rebalance and also on startup. As such, if you need to store offsets in anything other than Kafka, this API should not be used.
This is an asynchronous call and will not block. Any errors encountered are either passed to the callback (if provided) or discarded.
Note that if you don't commit offsets, an exception will be thrown if auto.offset.reset is set to none:
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.
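The "lastProcessedMessageOffset + 1" rule quoted above is easy to get wrong. Below is a dependency-free sketch of a hypothetical OffsetTracker helper (not part of kafka-clients) that records, per partition, the offset to commit. It keys partitions by a "topic-partition" string instead of TopicPartition so it runs without the Kafka library; in real code you would turn the snapshot into a Map<TopicPartition, OffsetAndMetadata> and pass it to commitSync(map) or commitAsync(map, callback).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: remembers, per partition, the offset to commit next.
final class OffsetTracker {
    private final Map<String, Long> toCommit = new HashMap<>();

    // Call only after a record has been fully processed.
    void markProcessed(String topic, int partition, long offset) {
        // Kafka expects the offset of the *next* record to read, i.e.
        // lastProcessedMessageOffset + 1; Math::max keeps it from moving backwards.
        toCommit.merge(topic + "-" + partition, offset + 1, Math::max);
    }

    // Copy of the pending commits, keyed by "topic-partition".
    Map<String, Long> snapshot() {
        return new HashMap<>(toCommit);
    }
}
```

After processing offsets 0 to 4 of partition 0, the tracker holds 5 for that partition, so a restarted consumer resumes at the first unprocessed record rather than re-reading offset 4.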

Kafka Consumer - Poll behaviour

I'm facing some serious problems trying to implement a solution for my needs, regarding KafkaConsumer (>=0.9).
Let's imagine I have a function that has to read just n messages from a kafka topic.
For example: getMsgs(5) --> gets next 5 kafka messages in topic.
So, I have a loop that looks like this (edited with the actual parameters). In this case, the consumer's max.poll.records param was set to 1, so the actual loop only iterated once. Different consumers (some of them iterated through many messages) shared an abstract parent (this one), which is why it's coded that way. The numMss part was ad hoc for this consumer.
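import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

for (boolean exit = false; !exit; )
{
    // key/value types assumed to be String here
    ConsumerRecords<String, String> records = consumer.poll(config.pollTime);
    for (ConsumerRecord<String, String> r : records)
    {
        processRecord(r); // do my things
        numMss++;
        if (numMss == maximum) // maximum = 5
        {
            exit = true;
            break;
        }
    }
}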
Taking this into account, the problem is that the poll() method could return more than 5 messages. For example, if it returns 10 messages, my code will lose the other 5 forever, since Kafka will consider them already consumed.
I tried committing the offset, but it doesn't seem to work:
consumer.commitSync(Collections.singletonMap(partition,
new OffsetAndMetadata(record.offset() + 1)));
Even with the offset configuration, whenever I launch the consumer again, it doesn't start from the 6th message (remember, I just wanted 5 messages) but from the 11th (since the first poll consumed 10 messages).
Is there any solution for this, or maybe (most surely) am I missing something?
Thanks in advance!!
You can set max.poll.records to whatever number you like such that at most you will get that many records on each poll.
For the use case stated in this question, you don't have to commit offsets explicitly yourself. You can just set enable.auto.commit to true and set auto.offset.reset to earliest, so that it kicks in when the consumer group has no committed offsets (in other words, when the group is about to start reading from a partition for the very first time). Once you have a group.id with consumer offsets stored in Kafka, a restarted consumer continues from the last committed offset. This is the default behavior: on startup, a consumer first looks for committed offsets and, if there are any, resumes from them, so auto.offset.reset does not kick in.
Did you disable auto commit by setting enable.auto.commit to false? You need to disable it if you want to commit offsets manually; otherwise auto-commit, which runs inside poll(), will commit the offsets of all the records returned by the previous poll().
From Kafka 0.9 on, the auto.offset.reset values have changed:
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.
Set the auto.offset.reset property to "earliest", then try consuming; you will get records starting from the committed offset.
Alternatively, call the consumer.seek(TopicPartition, offset) API before poll().
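The behaviour described in the question can be reproduced with a small, dependency-free simulation. PollSimulation and its parameters are hypothetical: it treats a partition as offsets 0..size-1, assumes poll() returns up to max.poll.records records from the current position, and models auto-commit as committing the position after the whole batch (a simplification: the real client commits on a timer, every auto.commit.interval.ms, inside poll()).

```java
// Hypothetical simulation of reading exactly n records from one partition.
final class PollSimulation {
    private PollSimulation() {}

    /**
     * Returns the offset a restarted consumer would resume from.
     * commitProcessedOnly = false mimics auto-commit (position after the batch);
     * true commits lastProcessedOffset + 1.
     */
    static long resumeOffsetAfterReading(int n, int maxPollRecords, long size,
                                         boolean commitProcessedOnly) {
        long position = 0;   // next offset poll() will return
        long committed = 0;  // what a restarted consumer resumes from
        int processed = 0;
        while (processed < n && position < size) {
            long batchEnd = Math.min(position + maxPollRecords, size); // one poll()
            long lastProcessed = -1;
            for (long offset = position; offset < batchEnd && processed < n; offset++) {
                lastProcessed = offset; // processRecord(...) would go here
                processed++;
            }
            position = batchEnd;        // poll() has already moved past the batch
            committed = commitProcessedOnly ? lastProcessed + 1 : position;
        }
        return committed;
    }
}
```

With n = 5 and a 10-record batch, committing the position resumes at offset 10 (the 11th message), while committing the last processed offset + 1 resumes at offset 5; setting max.poll.records to 5 also resumes at 5. In a long-running consumer you would additionally call consumer.seek(partition, lastProcessed + 1), because the in-memory position has already moved past the batch and the next poll() would otherwise skip the unprocessed records.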

What determines Kafka consumer offset?

I am relatively new to Kafka. I have done a bit of experimenting with it, but a few things are unclear to me regarding consumer offset. From what I have understood so far, when a consumer starts, the offset it will start reading from is determined by the configuration setting auto.offset.reset (correct me if I am wrong).
Now say for example that there are 10 messages (offsets 0 to 9) in the topic, and a consumer happened to consume 5 of them before it went down (or before I killed the consumer). Then say I restart that consumer process. My questions are:
If the auto.offset.reset is set to earliest, is it always going to start consuming from offset 0?
If the auto.offset.reset is set to latest, is it going to start consuming from offset 5?
Is the behavior regarding this kind of scenario always deterministic?
Please don't hesitate to comment if anything in my question is unclear.
It is a bit more complex than you described.
The auto.offset.reset config kicks in ONLY if your consumer group does not have a valid offset committed somewhere (the two supported offset storages are Kafka and ZooKeeper), and it also depends on what sort of consumer you use.
If you use a high-level Java consumer, imagine the following scenarios:
You have a consumer in consumer group group1 that has consumed 5 messages and died. The next time you start this consumer, it won't even use the auto.offset.reset config and will continue from where it died, because it simply fetches the stored offset from the offset storage (Kafka or ZooKeeper, as I mentioned).
You have messages in a topic (like you described) and you start a consumer in a new consumer group group2. There is no offset stored anywhere, and this time the auto.offset.reset config will decide whether to start from the beginning of the topic (earliest) or from the end of the topic (latest).
One more thing that affects which offset values correspond to the earliest and latest configs is the log retention policy. Imagine you have a topic with retention configured to 1 hour. You produce 5 messages, and then an hour later you post 5 more messages. The latest offset will remain the same as in the previous example, but the earliest one can no longer be 0, because Kafka will already have removed those messages, so the earliest available offset will be 5.
None of the above applies to SimpleConsumer: every time you run it, it decides where to start from using the auto.offset.reset config.
If you use a Kafka version older than 0.9, you have to replace earliest and latest with smallest and largest.
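The retention example in this answer can be checked with a toy model. RetentionModel is hypothetical and simplified: it assumes records are appended in time order and that each record can expire individually, whereas Kafka actually deletes whole log segments, so on a real broker the earliest offset can lag behind what this model computes.

```java
import java.util.List;

// Hypothetical model: offset = index in the list of append times (ms).
final class RetentionModel {
    private RetentionModel() {}

    // Earliest offset still retained at nowMs under a time-based retention.
    static long earliestOffset(List<Long> appendTimesMs, long nowMs, long retentionMs) {
        for (int offset = 0; offset < appendTimesMs.size(); offset++) {
            if (nowMs - appendTimesMs.get(offset) <= retentionMs) {
                return offset;                 // first record not yet expired
            }
        }
        return appendTimesMs.size();           // everything expired: earliest == latest
    }

    // Latest offset: the offset the next produced record will get.
    static long latestOffset(List<Long> appendTimesMs) {
        return appendTimesMs.size();
    }
}
```

With 5 records appended at t = 0, 5 more at t = 1 h, and a 1 hour retention checked just after the second batch, the earliest offset is 5 and the latest is 10, matching the answer.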
Just an update: from Kafka 0.9 onward, Kafka uses a new Java version of the consumer, and the auto.offset.reset values have changed. From the manual:
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.
I spent some time finding this after checking the accepted answer, so I thought it might be useful to post it for the community.
Furthermore, there is offsets.retention.minutes: if the time since the last commit exceeds offsets.retention.minutes, the committed offsets expire and auto.offset.reset kicks in as well.
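The expiry rule in this last comment can be modeled directly. CommittedOffsetExpiry is hypothetical and deliberately simplified: it only compares the last commit time with offsets.retention.minutes (a broker setting whose default was raised from 1 day to 7 days in Kafka 2.0), and it ignores the refinement in Kafka 2.1 and later where offsets of a group that still has active members are not expired.

```java
// Hypothetical model of broker-side expiry of a group's committed offset.
final class CommittedOffsetExpiry {
    private CommittedOffsetExpiry() {}

    /** True when auto.offset.reset would apply because there is no usable commit. */
    static boolean resetApplies(boolean hasCommit, long lastCommitMs, long nowMs,
                                long offsetsRetentionMinutes) {
        if (!hasCommit) {
            return true;                               // never committed
        }
        long retentionMs = offsetsRetentionMinutes * 60_000L;
        return nowMs - lastCommitMs > retentionMs;     // commit expired on the broker
    }
}
```

A consumer group that last committed 6 minutes ago under a 5 minute retention is treated as having no offset, so it restarts according to auto.offset.reset even though it had consumed before.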