What does consumer LAG mean in Consumer Group - apache-kafka

I'm observing that the Kafka consumer is intermittently unable to receive messages that the producer sends. When I checked the consumer group, I saw non-zero LAG values:
docker run --net=host --rm <docker image> kafka-consumer-groups --zookeeper localhost:2181 --describe --group mgmt_testing
GROUP         TOPIC         PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  OWNER
mgmt_testing  mgmt_testing  0          44              44              0    mgmt_testing_aws-us-east-1-mr3-10-10-8-218-1561090200381-21858516-0
mgmt_testing  mgmt_testing  1          35              35              0    mgmt_testing_aws-us-east-1-mr3-10-10-8-218-1561090200381-21858516-0
mgmt_testing  mgmt_testing  2          39              39              0    mgmt_testing_aws-us-east-1-mr3-10-10-8-218-1561090200381-21858516-0
mgmt_testing  mgmt_testing  3          37              37              0    mgmt_testing_aws-us-east-1-mr3-10-10-8-218-1561090200381-21858516-0
mgmt_testing  mgmt_testing  4          25              38              13   mgmt_testing_aws-us-east-1-mr3-10-10-8-218-1561090200381-21858516-0
mgmt_testing  mgmt_testing  5          458             666             208  mgmt_testing_aws-us-east-1-mr3-10-10-8-218-1561090200381-21858516-0
mgmt_testing  mgmt_testing  6          808167          808181          14   mgmt_testing_aws-us-east-1-mr3-10-10-8-218-1561090200381-21858516-0
mgmt_testing  mgmt_testing  7          434028          434041          13   mgmt_testing_aws-us-east-1-mr3-10-10-8-218-1561090200381-21858516-0
What does LAG mean here? And could this be the reason the consumer is not receiving messages?

Essentially, lag is the delay between publishing a message to a Kafka broker and consuming it. Concretely, a partition's LAG is LOG-END-OFFSET minus CURRENT-OFFSET: the number of messages the consumer group has not yet consumed from that partition. There's a good description on Sematext's website: https://sematext.com/blog/kafka-consumer-lag-offsets-monitoring/
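If you want to monitor lag programmatically rather than through the CLI, here is a minimal sketch using the kafka-python library; the broker address is a placeholder, and the topic/group names are taken from the question:

from kafka import KafkaConsumer, TopicPartition

# Hypothetical broker address; substitute your own
consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",
    group_id="mgmt_testing",
    enable_auto_commit=False,
)

topic = "mgmt_testing"
partitions = [TopicPartition(topic, p) for p in consumer.partitions_for_topic(topic)]
end_offsets = consumer.end_offsets(partitions)  # LOG-END-OFFSET per partition

for tp in partitions:
    committed = consumer.committed(tp)  # CURRENT-OFFSET; None if never committed
    lag = end_offsets[tp] - (committed or 0)
    print(f"partition={tp.partition} current={committed} end={end_offsets[tp]} lag={lag}")

consumer.close()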

Related

kafka + how to add additional partitions to the existing XX partitions?

Here is an example of creating a new topic named test_test with 10 partitions:
kafka-topics.sh --create --zookeeper zookeeper01:2181 --replication-factor 3 --partitions 10 --topic test_test
Created topic "test_test".
[root@kafka01 kafka-data]# \ls -ltr | grep test_test
drwxr-xr-x 2 kafka hadoop 4096 Mar 22 16:53 test_test-8
drwxr-xr-x 2 kafka hadoop 4096 Mar 22 16:53 test_test-5
drwxr-xr-x 2 kafka hadoop 4096 Mar 22 16:53 test_test-2
drwxr-xr-x 2 kafka hadoop 4096 Mar 22 16:53 test_test-0
drwxr-xr-x 2 kafka hadoop 4096 Mar 22 16:53 test_test-7
drwxr-xr-x 2 kafka hadoop 4096 Mar 22 16:53 test_test-4
drwxr-xr-x 2 kafka hadoop 4096 Mar 22 16:53 test_test-1
drwxr-xr-x 2 kafka hadoop 4096 Mar 22 16:53 test_test-9
drwxr-xr-x 2 kafka hadoop 4096 Mar 22 16:53 test_test-6
drwxr-xr-x 2 kafka hadoop 4096 Mar 22 16:53 test_test-3
Now we want to add 10 more partitions to the topic test_test.
How do we add additional partitions to the existing 10?
You can run this command:
./bin/kafka-topics.sh --alter --bootstrap-server localhost:9092 --topic test_test --partitions 20
By the way, there are two things to consider when changing partitions:
Decreasing the number of partitions is not allowed.
If you add more partitions to a topic, key-based ordering of messages can no longer be guaranteed.
Note: if your Kafka version is older than 2.2, you must use the --zookeeper parameter instead of --bootstrap-server.
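If you prefer to do this programmatically rather than through the CLI, here is a minimal sketch using kafka-python's KafkaAdminClient; the broker address is a placeholder:

from kafka.admin import KafkaAdminClient, NewPartitions

# Hypothetical broker address; substitute your own
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# total_count is the new total number of partitions, not the number to add,
# so this grows test_test from 10 partitions to 20
admin.create_partitions({"test_test": NewPartitions(total_count=20)})
admin.close()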
Moreover, you should take into consideration that adding partitions triggers a rebalance, which makes all of this topic's consumers unavailable for a period of time.
A rebalance is the process of re-assigning partitions to consumers. It happens when new partitions are added, a new consumer joins the group, or a consumer leaves (whether due to an exception, network problems, or a deliberate shutdown).
To preserve reading consistency, the consumer group entirely stops receiving messages during a rebalance, until the new partition assignment takes effect.
This relatively short answer explains rebalance very well.
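If you want to watch these rebalances happen yourself, kafka-python lets you attach a listener when subscribing; a minimal sketch, where the broker address and group id are placeholders:

from kafka import KafkaConsumer, ConsumerRebalanceListener

class LoggingRebalanceListener(ConsumerRebalanceListener):
    def on_partitions_revoked(self, revoked):
        # Called before the rebalance starts: a good place to commit offsets
        print(f"partitions revoked: {revoked}")

    def on_partitions_assigned(self, assigned):
        # Called once the new assignment takes effect
        print(f"partitions assigned: {assigned}")

consumer = KafkaConsumer(bootstrap_servers="localhost:9092", group_id="test-group")
consumer.subscribe(topics=["test_test"], listener=LoggingRebalanceListener())

for message in consumer:
    print(message.partition, message.offset, message.value)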

Spark SQL group data by range and trigger alerts

I am processing a data stream from Kafka using Structured Streaming with PySpark. I want to publish alerts to Kafka in Avro format when the readings are abnormal.
source  temperature  timestamp
1001    21           4/28/2019 10:25
1001    22           4/28/2019 10:26
1001    23           4/28/2019 10:27
1001    24           4/28/2019 10:28
1001    25           4/28/2019 10:29
1001    34           4/28/2019 10:30
1001    37           4/28/2019 10:31
1001    36           4/28/2019 10:32
1001    38           4/28/2019 10:33
1001    40           4/28/2019 10:34
1001    41           4/28/2019 10:35
1001    42           4/28/2019 10:36
1001    45           4/28/2019 10:37
1001    47           4/28/2019 10:38
1001    50           4/28/2019 10:39
1001    41           4/28/2019 10:40
1001    42           4/28/2019 10:41
1001    45           4/28/2019 10:42
1001    47           4/28/2019 10:43
1001    50           4/28/2019 10:44
Transform
source  range  count  alert
1001    21-25  5      HIGH
1001    26-30  5      MEDIUM
1001    40-45  5      MEDIUM
1001    45-50  5      HIGH
I have defined a window of 20 seconds sliding every 1 second. I am able to publish alerts with a simple where condition, but I am not able to transform the data frame as shown above and trigger alerts when the count reaches 20 for any alert priority (i.e., all records in a window match a single priority, e.g. HIGH -> count(20)). Can anyone suggest how to do this?
Also, I am able to publish the data in JSON format but not in Avro. Scala and Java have a to_avro() function, but PySpark does not have one.
Appreciate your response
I was able to solve this problem using the Bucketizer feature transform from Spark's ML library. See: How to bin in PySpark?
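For illustration, here is a minimal sketch of that approach on a static sample; the column names, split points, and alert threshold are assumptions standing in for the real stream:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml.feature import Bucketizer

spark = SparkSession.builder.appName("temperature-alerts").getOrCreate()

# Static sample standing in for the Kafka stream
df = spark.createDataFrame(
    [(1001, 21.0, "2019-04-28 10:25:00"),
     (1001, 34.0, "2019-04-28 10:30:00"),
     (1001, 42.0, "2019-04-28 10:36:00"),
     (1001, 50.0, "2019-04-28 10:39:00")],
    ["source", "temperature", "ts"],
).withColumn("ts", F.to_timestamp("ts"))

# Bin temperatures into ranges; the split points are illustrative
splits = [-float("inf"), 26.0, 31.0, 40.0, 46.0, float("inf")]
bucketizer = Bucketizer(splits=splits, inputCol="temperature", outputCol="bucket")
binned = bucketizer.transform(df)

# Count readings per source, sliding window, and bucket, then keep only
# the windows whose count crosses the alert threshold
counts = (binned
          .groupBy("source", F.window("ts", "20 seconds", "1 second"), "bucket")
          .count())
alerts = counts.where(F.col("count") >= 20)
alerts.show(truncate=False)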

apache kafka NoReplicaOnlineException

Using Apache Kafka with a single node (1 Zookeeper, 1 Broker) I get this exception (repeated multiple times):
kafka.common.NoReplicaOnlineException: No replica in ISR for partition __consumer_offsets-2 is alive. Live brokers are: [Set()], ISR brokers are: [0]
What does it mean? Note that I am starting the KafkaServer programmatically, and I am able to send to and consume from a topic using the CLI tools.
It seems I should tell this node that it is operating in standalone mode - how should I do this?
This seems to happen during startup.
Full exception:
17-11-07 19:43:44 NP-3255AJ193091.home ERROR [state.change.logger:107] - [Controller id=0 epoch=54] Initiated state change for partition __consumer_offsets-16 from OfflinePartition to OnlinePartition failed
kafka.utils.ShutdownableThread.run ShutdownableThread.scala: 64
kafka.controller.ControllerEventManager$ControllerEventThread.doWork ControllerEventManager.scala: 52
kafka.metrics.KafkaTimer.time KafkaTimer.scala: 31
kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply ControllerEventManager.scala: 53 (repeats 2 times)
kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp ControllerEventManager.scala: 53
kafka.controller.KafkaController$Startup$.process KafkaController.scala: 1581
kafka.controller.KafkaController.elect KafkaController.scala: 1681
kafka.controller.KafkaController.onControllerFailover KafkaController.scala: 298
kafka.controller.PartitionStateMachine.startup PartitionStateMachine.scala: 58
kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange PartitionStateMachine.scala: 81
scala.collection.TraversableLike$WithFilter.foreach TraversableLike.scala: 732
scala.collection.mutable.HashMap.foreach HashMap.scala: 130
scala.collection.mutable.HashMap.foreachEntry HashMap.scala: 40
scala.collection.mutable.HashTable$class.foreachEntry HashTable.scala: 236
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply HashMap.scala: 130 (repeats 2 times)
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply TraversableLike.scala: 733
kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply PartitionStateMachine.scala: 81
kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply PartitionStateMachine.scala: 84
kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange PartitionStateMachine.scala: 163
kafka.controller.PartitionStateMachine.electLeaderForPartition PartitionStateMachine.scala: 303
kafka.controller.OfflinePartitionLeaderSelector.selectLeader PartitionLeaderSelector.scala: 65
kafka.common.NoReplicaOnlineException: No replica in ISR for partition __consumer_offsets-16 is alive. Live brokers are: [Set()], ISR brokers are: [0]

Upgrading consumer from Kafka 8 to 10 with no code changes fails in ZookeeperConsumerConnector.RebalanceListener

I changed my Maven pom.xml to use the 0.10.1.0 client jar, and without changing any of the client code I ran both a producer and consumer.
The producer added messages to the Kafka 10 cluster fine (verified by kafka-consumer-offset-checker.sh), but the consumers that should have covered the 10 partitions in the topic did not seem to register at all. All partitions are unowned.
The consumer offset and owner output:
kafka-consumer-offset-checker.sh --zookeeper localhost:2181 --topic eddude-default-topic --group optimizer-group
[2017-06-28 12:56:06,493] WARN WARNING: ConsumerOffsetChecker is deprecated and will be dropped in releases following 0.9.0. Use ConsumerGroupCommand instead. (kafka.tools.ConsumerOffsetChecker$)
Group            Topic                 Pid  Offset  logSize  Lag  Owner
optimizer-group  eddude-default-topic  0    28      28       0    none
optimizer-group  eddude-default-topic  1    2       2        0    none
optimizer-group  eddude-default-topic  2    87      87       0    none
optimizer-group  eddude-default-topic  3    0       0        0    none
optimizer-group  eddude-default-topic  4    0       0        0    none
optimizer-group  eddude-default-topic  5    2       5        3    none
optimizer-group  eddude-default-topic  6    80      80       0    none
optimizer-group  eddude-default-topic  7    29      29       0    none
optimizer-group  eddude-default-topic  8    15      15       0    none
optimizer-group  eddude-default-topic  9    0       0        0    none
And here is the relevant consumer client error from my app log:
2017-06-28 12:55:24,702 ERROR [ConnectorManagerEventPool 1] An error occurred starting KafkaTopicSet 4:eddude-default-topic
kafka.common.ConsumerRebalanceFailedException: optimizer-group_L-SEA-10002721-1498679709599-7154a218 can't rebalance after 4 retries
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:670) ~[kafka_2.10-0.10.1.0.jar:na]
at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(ZookeeperConsumerConnector.scala:977) ~[kafka_2.10-0.10.1.0.jar:na]
at kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:264) ~[kafka_2.10-0.10.1.0.jar:na]
at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:85) ~[kafka_2.10-0.10.1.0.jar:na]
at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:97) ~[kafka_2.10-0.10.1.0.jar:na]
at com.ebay.traffic.messaging.optimizer.impl.kafka.KafkaTopicSet.start(KafkaTopicSet.java:160) ~[classes/:na]
I am just using the same Kafka 8 client code I already had and ignoring the deprecation warnings for now. Shouldn't it work as-is?
I could also post details like the configuration properties and code establishing the actual producer and consumer, but I thought I'd first simply ask in case it is an obvious answer.

Kafka Consumer Attached to partition but not consuming messages

I am new to Kafka. I have a single-node Kafka broker (v0.10.2) and a ZooKeeper (3.4.9), and I am using the new Kafka consumer APIs. One strange thing I observed: when I start multiple Kafka consumers for multiple topics in a single group and run ./kafka-consumer-groups.sh for that group, a few of the consumers are attached to the group but do not consume any messages.
Below are the stats from the group command.
TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                      HOST         CLIENT-ID
topic1  0          288             288             0    consumer-8-c9487cd3-573b-4c97-87c1-ddf2063ab5ae  /<serverip>  consumer-8
topic1  1          283             283             0    consumer-8-c9487cd3-573b-4c97-87c1-ddf2063ab5ae  /<serverip>  consumer-8
topic1  2          279             279             0    consumer-8-c9487cd3-573b-4c97-87c1-ddf2063ab5ae  /<serverip>  consumer-8
topic2  0          -               9               -    consumer-1-b0476dc8-099c-4a62-a68c-e9dc9c0a5bed  /<serverip>  consumer-1
topic2  1          -               2               -    consumer-1-b0476dc8-099c-4a62-a68c-e9dc9c0a5bed  /<serverip>  consumer-1
topic3  0          450             450             0    consumer-3-63c07703-17d0-471b-8c5f-17347699f108  /<serverip>  consumer-3
topic4  1          -               54              -    consumer-2-94dcc209-8377-45ce-8473-9ab0d85951c4  /<serverip>
topic2  2          441             441             0    consumer-5-bcfffc99-5915-41f4-b3e4-970baa204c14  /<serverip>
So can someone help me understand why, for topic2 partition 0, CURRENT-OFFSET and LAG show - even though messages are still on the server (LOG-END-OFFSET shows 9)?
This happens very frequently, and restarting the consumers solves the issue temporarily.
Any help will be appreciated.