Kafka Consumer Attached to partition but not consuming messages - apache-kafka

I am new to Kafka. I have a single-node Kafka broker (v0.10.2) and a ZooKeeper (3.4.9), and I am using the new Kafka consumer APIs. One strange thing I observed: when I start multiple Kafka consumers for multiple topics, all in a single group, and run ./kafka-consumer-groups.sh for that group, a few of the consumers are attached to the group but do not consume any messages.
Below is the output of the group command.
TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                      HOST         CLIENT-ID
topic1  0          288             288             0    consumer-8-c9487cd3-573b-4c97-87c1-ddf2063ab5ae  /<serverip>  consumer-8
topic1  1          283             283             0    consumer-8-c9487cd3-573b-4c97-87c1-ddf2063ab5ae  /<serverip>  consumer-8
topic1  2          279             279             0    consumer-8-c9487cd3-573b-4c97-87c1-ddf2063ab5ae  /<serverip>  consumer-8
topic2  0          -               9               -    consumer-1-b0476dc8-099c-4a62-a68c-e9dc9c0a5bed  /<serverip>  consumer-1
topic2  1          -               2               -    consumer-1-b0476dc8-099c-4a62-a68c-e9dc9c0a5bed  /<serverip>  consumer-1
topic3  0          450             450             0    consumer-3-63c07703-17d0-471b-8c5f-17347699f108  /<serverip>  consumer-3
topic4  1          -               54              -    consumer-2-94dcc209-8377-45ce-8473-9ab0d85951c4  /<serverip>
topic2  2          441             441             0    consumer-5-bcfffc99-5915-41f4-b3e4-970baa204c14  /<serverip>
So can someone help me understand why, for topic topic2 partition 0, CURRENT-OFFSET shows - and LAG shows -, even though there are messages on the server (LOG-END-OFFSET is 9)?
This is happening very frequently and restarting the consumers solves the issue temporarily.
Any help will be appreciated.

Related

how to drain records in a kafka topic

During planned application maintenance activities, there is a need to drain all the messages in a Kafka topic.
In MQ, we can monitor the queue depth and start the maintenance activities once all the messages are consumed. In Kafka, do we have a similar mechanism to find out whether all messages in the topic have been consumed, so that it is safe to shut down the producer and consumer?
Using the following command you can monitor the LAG of your consumer group; once the lag is 0, there are no more messages in the topic to consume:
./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group count_errors --describe
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER
count_errors logs 2 2908278 2908278 0 consumer-1_/10.8.0.55
count_errors logs 3 2907501 2907501 0 consumer-1_/10.8.0.43
count_errors logs 4 2907541 2907541 0 consumer-1_/10.8.0.177
count_errors logs 1 2907499 2907499 0 consumer-1_/10.8.0.115
count_errors logs 0 2907469 2907469 0 consumer-1_/10.8.0.126
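To script the drain check, the describe output can be parsed and the LAG column summed. The sketch below is plain Python with no Kafka client needed; it assumes the column layout shown above and treats a LAG of - (no committed offset yet) as nothing consumed for that partition.

```python
def total_lag(describe_output: str) -> int:
    """Sum the LAG column of `kafka-consumer-groups.sh --describe` output.

    A LAG of '-' (no committed offset yet) is counted as the full
    LOG-END-OFFSET for that partition, i.e. nothing consumed.
    Assumes notes/warnings appear only before the header row.
    """
    lag = 0
    header = None
    for line in describe_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "LAG" in fields:      # header row: remember the column positions
            header = fields
            continue
        if header is None:       # skip notes printed before the header
            continue
        lag_field = fields[header.index("LAG")]
        if lag_field == "-":
            lag += int(fields[header.index("LOG-END-OFFSET")])
        else:
            lag += int(lag_field)
    return lag

sample = """GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER
count_errors logs 2 2908278 2908278 0 consumer-1_/10.8.0.55
count_errors logs 3 2907501 2907501 0 consumer-1_/10.8.0.43"""
print(total_lag(sample))  # 0 -> safe to start maintenance
```

In practice you would feed this the captured stdout of the command above and loop until it returns 0.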

Kafka console consumer commits wrong offset when using --max-messages

I have a Kafka console consumer, version 1.1.0, that I use to get messages from Kafka.
When I use the kafka-console-consumer.sh script with the option --max-messages, it seems to be committing the wrong offsets.
I've created a topic and a consumer group and read some messages:
/kafka_2.11-1.1.0/bin/kafka-consumer-groups.sh --bootstrap-server 192.168.1.23:9092 --describe --group my-consumer-group
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
test.offset 1 374 374 0 - - -
test.offset 0 0 375 375 - - -
Then I read 10 messages like this:
/kafka_2.11-1.1.0/bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.23:9092 --topic test.offset --timeout-ms 1000 --max-messages 10 --consumer.config /kafka_2.11-1.1.0/config/consumer.properties
1 var_1
3 var_3
5 var_5
7 var_7
9 var_9
11 var_11
13 var_13
15 var_15
17 var_17
19 var_19
Processed a total of 10 messages
But now the offsets show that it read all the messages in the topic:
/kafka_2.11-1.1.0/bin/kafka-consumer-groups.sh --bootstrap-server 192.168.1.23:9092 --describe --group my-consumer-group
Note: This will not show information about old Zookeeper-based consumers.
Consumer group 'my-consumer-group' has no active members.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
test.offset 1 374 374 0 - - -
test.offset 0 375 375 0 - - -
And now, when I want to read more messages, I get an error that there are no more messages in the topic:
/kafka_2.11-1.1.0/bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.23:9092 --topic test.offset --timeout-ms 1000 --max-messages 10 --consumer.config /kafka_2.11-1.1.0/config/consumer.properties
[2020-02-28 08:27:54,782] ERROR Error processing message, terminating consumer process: (kafka.tools.ConsoleConsumer$)
kafka.consumer.ConsumerTimeoutException
at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:98)
at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:129)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:84)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
Processed a total of 0 messages
What am I doing wrong? Why did the offset move to the last message in the topic rather than forward by just 10 messages?
This is about the auto-commit feature of the Kafka consumer. As described in this link:
The easiest way to commit offsets is to allow the consumer to do it
for you. If you configure enable.auto.commit=true, then every five
seconds the consumer will commit the largest offset your client
received from poll(). The five-second interval is the default and is
controlled by setting auto.commit.interval.ms. Just like everything
else in the consumer, the automatic commits are driven by the poll
loop. Whenever you poll, the consumer checks if it is time to commit,
and if it is, it will commit the offsets it returned in the last poll.
So in your case, when your consumer polls, it receives up to 500 messages (the default value of max.poll.records), and after 5 seconds it commits the largest offset returned from the last poll (375 in your case), even though you specified --max-messages as 10.
--max-messages: The maximum number of messages to
consume before exiting. If not set,
consumption is continual.
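A toy model of that interaction (not the real client code) makes the behavior concrete: one poll() fetches a whole batch of up to max.poll.records, and auto-commit later commits the end of that batch, regardless of how many messages the tool printed before hitting --max-messages.

```python
def console_consumer(log_end_offset, start, max_messages, max_poll_records=500):
    """Toy model of kafka-console-consumer with auto-commit enabled.

    Returns (messages_printed, committed_offset) for one run.
    This is a simplification for illustration, not the real client logic.
    """
    # One poll pulls a whole batch, regardless of --max-messages.
    batch = list(range(start, min(start + max_poll_records, log_end_offset)))
    # The tool stops printing after --max-messages...
    printed = batch[:max_messages]
    # ...but auto-commit covers the last offset returned by poll().
    committed = batch[-1] + 1 if batch else start
    return len(printed), committed

printed, committed = console_consumer(log_end_offset=375, start=0, max_messages=10)
print(printed, committed)  # 10 375 -> 10 messages printed, offset 375 committed
```

The fix for the question above would be to disable auto-commit (or commit offsets manually after the wanted number of messages), so only the consumed positions are committed.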

Offset for consumer group resetted for one partition

During the last maintenance of Kafka, which required a rolling restart of the Kafka brokers, we witnessed a reset of consumer group offsets for certain partitions.
At 11:14 am, everything is fine for the consumer group and we see no consumer lag:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 0 105130857 105130857 0 st-...
...
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 6 78591770 78591770 0 st-...
However 5 minutes later, during the rolling restart of brokers, we have a reset for one partition and a consumer lag of millions of events.
$ bin/kafka-consumer-groups --bootstrap-server XXX:9093,XXX... --command-config secrets.config --group st-xx --describe
Note: This will not show information about old Zookeeper-based consumers.
[2019-08-26 12:44:13,539] WARN Connection to node -5 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2019-08-26 12:44:13,707] WARN [Consumer clientId=consumer-1, groupId=st-xx] Connection to node -5 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
Consumer group 'st-xx' has no active members.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 0 105132096 105132275 179
...
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 6 15239401 78593165 63353764 ...
In the last two hours, the offset for the partition hasn't recovered and we now need to patch it manually. We had similar issues during the previous rolling restart of brokers.
Has anyone seen something like this before? The only clue we could find is this ticket; however, we run Kafka version 1.0.1-kafka3.1.0.

Kafka: Describe Consumer Group Offset

While the Kafka consumer application is up and running, we are able to use the kafka-consumer-groups.sh to describe and retrieve the offset status.
However, if the application goes down, the command just displays that the application is REBALANCING.
Is there a way to just see the lag of a particular consumer group, even if the application is not up and running?
For example, I would like this output
GROUP|TOPIC|PARTITION|CURRENT-OFFSET|LOG-END-OFFSET|LAG
hrly_ingest_grp|src_hrly|4|63832846|63832846|0
hrly_ingest_grp|src_hrly|2|38372346|38372346|0
hrly_ingest_grp|src_hrly|0|58642250|58642250|0
hrly_ingest_grp|src_hrly|5|96295762|96295762|0
hrly_ingest_grp|src_hrly|3|50602337|50602337|0
hrly_ingest_grp|src_hrly|1|29288993|29288993|0
You can use kt (Kafka tool) - https://github.com/fgeller/kt
The command to query offset and lag is as follows:
kt group -group groupName -topic topicName -partitions all
Even if the consumer application is down, this command will show the offset of each consumer of that group:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group
Output:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
topic3 0 241019 395308 154289 consumer2-e76ea8c3-5d30-4299-9005-47eb41f3d3c4 /127.0.0.1 consumer2
topic2 1 520678 803288 282610 consumer2-e76ea8c3-5d30-4299-9005-47eb41f3d3c4 /127.0.0.1 consumer2
topic3 1 241018 398817 157799 consumer2-e76ea8c3-5d30-4299-9005-47eb41f3d3c4 /127.0.0.1 consumer2
topic1 0 854144 855809 1665 consumer1-3fc8d6f1-581a-4472-bdf3-3515b4aee8c1 /127.0.0.1 consumer1
topic2 0 460537 803290 342753 consumer1-3fc8d6f1-581a-4472-bdf3-3515b4aee8c1 /127.0.0.1 consumer1
topic3 2 243655 398812 155157 consumer4-117fe4d3-c6c1-4178-8ee9-eb4a3954bee0 /127.0.0.1 consumer4

Explain replication-offset-checkpoint AND recovery-point-offset in Kafka

Can someone explain what these files, present inside the Kafka broker log directory, mean?
root#a2md23297l:/tmp/kafka-logs-1# cat recovery-point-offset-checkpoint
0
5
my-topic 0 0
kafkatopic_R2P1_1 0 0
my-topic 1 0
kafkatopic_R2P1 0 0
test 0 0
root#a2md23297l:/tmp/kafka-logs-1# cat replication-offset-checkpoint
0
5
my-topic 0 0
kafkatopic_R2P1_1 0 2
my-topic 1 0
kafkatopic_R2P1 0 2
test 0 57
FYI: my-topic, kafkatopic_R2P1_1, kafkatopic_R2P1, and test are the topics that were created.
Thanks in advance.
AFAIK: recovery-point-offset-checkpoint is the internal broker log where Kafka tracks which messages (from-to offset) were successfully checkpointed to disk.
replication-offset-checkpoint is the internal broker log where Kafka tracks which messages (from-to offset) were successfully replicated to other brokers.
For more details you can take a deeper look at: kafka/core/src/main/scala/kafka/server/LogOffsetMetadata.scala and ReplicaManager.scala. The code is commented pretty well.
Marko is spot on. To add a few details:
The first two lines: 0 is the version of the checkpoint file format, and 5 is the number of topic-partition entries that follow (i.e. the partitions present in that log directory).
On each entry line, the number next to the topic name is the partition number of that topic.
The last number is the offset that was flushed to disk (recovery-point-offset-checkpoint) or, in replication-offset-checkpoint, the last offset up to which the replicas successfully replicated the data.