Current offset behavior when set by kafka-consumer-groups to earliest? - apache-kafka

I have a kafka topic with 25 partitions and the cluster has been running for 5 months.
As per my understanding for each partition for a given topic, the offset starts from 0,1,2... (un-bounded)
I see log-end-offset at a very high value (right now -> 1230628032)
I created a new consumer group with offset being set to earliest; so i expected the offset from which a client for that consumer group will start from offset 0.
The command which I used to create a new consumer group with offset to earliest:
kafka-consumer-groups --bootstrap-server <IP_address>:9092 --reset-offsets --to-earliest --topic some-topic --group to-earliest-cons --execute
I see the consumer group being created. I expected the current-offset being to 0; however when I described the consumer group the current offset was very high , at the moment --> 1143755193.
The record retention period set is for 7 days (standard value).
My question is why didn't we see the first offset from which a consumer from this consumer group will read 0? Has it to do something with data-retention?
Can anyone help understand this?

It is exactly data retention. It is highly probable that Kafka already removed old messages with offset 0 from your partitions, so it doesn't make sense to start from 0. Instead, Kafka will set offset to the earliest available message on your partition. You can check those offsets using:
./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list <IP_address>:9092 --topic some-topic --time -2
You will probably see values really close to what you're seeing as new consumer offset.
You can also try and set offset explicitly to 0:
./kafka-consumer-groups.sh --bootstrap-server <IP_address>:9092 --reset-offsets --to-offset 0 --topic some-topic --group to-earliest-cons --execute
However, you will see warning that offset 0 does not exist and it will use higher value (aforementioned earliest message available)
New offset (0) is lower than earliest offset for topic partition some-topic. Value will be set to 1143755193

Related

How do we know we reached the last message in the kafka topic using KafkaMessageListenerContainer

I am running consumer using KafkaMessageListenerContainer.I need to stop the consumer on the topic's last message.How can i identify particular message is the last message in the topic.
You can get it with the following shell command (remove --partition parameter to get offsets for all topic's partitions):
./bin/kafka-run-class kafka.tools.GetOffsetShell --broker-list <host>:<port> --topic <topic-name> --partition <partition-number> --time -1
As you can see, this is using the GetOffsetShell [0] object that you may use to get the last message offset and compare it with record's offset.
[0] https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/GetOffsetShell.scala

How to get log end offset of all partitions for a given kafka topic using kafka command line?

When I describe a kafka topic it doesn't show the log end offset of any partition but show all the other metadata such as ISR,Replicas,Leader.
How do I see a log end offset of the partition for a given topic?
Ran this: ./kafka-topics.sh --zookeeper zk-service:2181 --describe --topic "__consumer_offsets"
Output Doesn't have a offset column.
Note: Need Only the log end offset.
Since you're only looking for the log end offset for a topic, you can use kafka-run-class with the kafka.tools.GetOffsetShell class.
Assuming your topic is __consumer_offsets, you would get the end offset by running:
./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --time -1 --topic __consumer_offsets
Change the --broker-list localhost:9092 to your desired Kafka address. This will list all of the log end offsets for each partition in the topic.
install kafkacat, its an easy to use kafka tool:
sudo apt-get update
sudo apt-get install kafkacat
kafkacat -C -b <kafka-broker-ip-and-port> -t <topic> -o -1
This will not consume anything because the offset is incremented after a message is added. But it will give you the offsets for all the partitions. Note however that this isn't the current offset that you are consuming at... The above answers will help you more in terms of looking into partition lag.
Following is the command you would need to get the offset of all partitions for a given kafka topic for a given consumer group:
kafka-consumer-groups --bootstrap-server <kafka-broker-list-with-ports> --describe --group <consumer-group-name>
Please note that the <consumer-group-name> at the end is important as the offsets are committed by consumers that are typically a part of a consumer group.
The output of this command may look something like:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
<topic-name> 0 62 62 0 <consumer-id> <host> <client>
In your post however, you're trying to get this information for the internal topic __consumer_offsets so you would need a consumer group which would have consumers consuming from this internal topic. You could perhaps do the following:
kafka-console-consumer --bootstrap-server <kafka-broker-list-with-ports> --topic __consumer_offsets --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter" --max-messages 5
Output of the above command:
[<consumer-group-name>,<topic-name>,0]::[OffsetMetadata[481690879,NO_METADATA],CommitTime 1479708539051,ExpirationTime 1480313339051]
Just use the <consumer-group-name> from the output and put it in the kafka-consumer-groups command mentioned in the beginning and you'll get the offset details for all the 50 partitions for the given consumer group only.
I hope this helps.

WARN New offset is higher than latest offset. Why I cannot change offset in Kafka?

I use the last version Kafka 1.1.0
Use command:
./kafka-consumer-groups.sh --bootstrap-server localhost:9093 --describe --group notifications-service-group
Then I try change offset for topic "notification", partition № 4:
./kafka-consumer-groups.sh --bootstrap-server localhost:9093 --group notifications-service-group --reset-offsets --topic notification:4 --to-offset 229400
and I see error:
WARN New offset (229400) is higher than latest offset for topic partition notification-4. Value will be set to 0 (kafka.admin.ConsumerGroupCommand$)
I try use any numbers from 1 to 229400 and get same error.
What means that error and how solve it?
In your command to update the offset you have the topic set as notification but the topic name is notifications with an 's'. So notification:4 -> notifications:4 should work.

Reset kafka LAG (change offset) within consumer group in Kafka-python

I found this where I reset my LAG with the kafka-consumer-groups.sh tool How to change start offset for topic? but I am needing to reset it within the application. I found this example, but it doesn't seem to reset it. kafka-python read from last produced message after a consumer restart example
consumer = KafkaConsumer("MyTopic", bootstrap_servers=self.kafka_server + ":" + str(self.kafka_port),
enable_auto_commit=False,
group_id="MyTopic.group")
consumer.poll()
consumer.seek_to_end()
consumer.commit()
... continue on with other code...
Running bin\windows\kafka-consumer-groups.bat --bootstrap-server localhost:9092 --group MyTopic.group --describe still shows that both partitions have a LAG. How can I get the current-offset to "fast-foward" to the end?
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
MyTopic 0 52110 66195 14085 kafka-python-1.4.2-6afb6901-c651-4534-a482-15358db42c22 /Host1 kafka-python-1.4.2
MyTopic 1 52297 66565 14268 kafka-python-1.4.2-c70e0a71-7d61-46a1-97bc-aa2726a8109b /Host2 kafka-python-1.4.2
You may want this:
def consumer_from_offset(topic, group_id, offset):
"""return the consumer from a certain offset"""
consumer = KafkaConsumer(bootstrap_servers=broker_list, group_id=group_id)
tp = TopicPartition(topic=topic, partition=0)
consumer.assign([tp])
consumer.seek(tp, offset)
return consumer
consumer = consumer_from_offset('topic', 'group', 0)
for msg in consumer:
# it will consume the msg beginning from offset 0
print(msg)
In order to "fast forward" the offset of consumer group, means to clear the LAG, you need to create new consumer that will join the same group.
the console command for that is:
kafka-console-consumer.sh --bootstrap-server <brokerIP>:9092 --topic <topicName> --consumer-property group.id=<groupName>
In parallel you can run the command to see the lags like you described, and you will see the lag wiped.

Why kafka consumer (0.10.0.0) for new consumer group does see old/previously published messages?

I have a producer that publishes messages on a topic called 'mytopic' just fine. I have 2 consumers in 2 different consumer groups listening for these messages. I started these 2 consumers and producer in following sequence.
1) Start consumer 1 in group 'group1'
2) start producer to publish several hundreds messages
After sometime I check the offset of consumer 1, which is as I expect:
/opt/kafka_2.11-0.10.0.0/bin/kafka-consumer-offset-checker.sh --zookeeper localhost:2181 --topic mytopic --group group1
Output:
Group Topic Pid Offset logSize Lag Owner
group1 mytopic 0 30230 36942 6712 none
3) Now I start consumer 2 in group 'group2' to listen to the same messages but it comes back with 0 messages on every poll() call.
The offset check for this consumer shows me that its offset is same as the logSize.
/opt/kafka_2.11-0.10.0.0/bin/kafka-consumer-offset-checker.sh --zookeeper localhost:2181 --topic mytopic --group group2
Output:
Group Topic Pid Offset logSize Lag Owner
group2 mytopic 0 36942 36942 0 none
Same problem for any other consumer with a new consumer group. Why is the consumer joining a new consumer group after messages have been published not seeing the old messages even though messages exists on the topic (ie., haven't been deleted)?
You need to change parameter setting auto.offset.reset to value "earliest" in you consumer configuration -- default value is "latest" telling a new consumer to start consuming at the current end of the log.