Same offset and partition record is getting consumed twice causing duplicates - apache-kafka

I am consuming records with an application written in spring-kafka, and I am seeing behaviour I cannot explain.
My consumer application runs with a concurrency of 2, meaning two consumer threads are subscribed to a topic that has two partitions. I consume each record and write it to a table using an upsert, together with its offset, partition, and insert timestamp.
I am seeing duplicate rows with the same offset and partition in the table, which should not occur. The timestamp values are identical, which means the inserts occurred at the same time. I am not sure how this is possible, and I don't see any issue in the log either. I don't know what is happening at the producer end, but two values cannot share the same offset anyway, so I am not sure whether this is an issue at the consumer end or the producer end. Any suggestions that would help me triage this issue?
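For reference, an upsert whose key is (partition, offset) cannot leave two rows for the same record: a second write with the same key overwrites the first. The sketch below illustrates that property in memory. `OffsetKeyedTable` and its methods are hypothetical stand-ins for the real table, not part of the application described here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the target table; not part of the original application.
final class OffsetKeyedTable {
    // Row identity is "partition:offset", mirroring a unique key on those two columns.
    private final Map<String, String> rows = new ConcurrentHashMap<>();

    /** Inserts or overwrites the row for (partition, offset); returns true only for a first insert. */
    boolean upsert(int partition, long offset, String payload) {
        return rows.put(partition + ":" + offset, payload) == null;
    }

    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        OffsetKeyedTable table = new OffsetKeyedTable();
        table.upsert(0, 500387L, "first delivery");
        table.upsert(0, 500387L, "redelivery");      // same key: overwrites, adds no row
        table.upsert(1, 499503L, "other partition");
        System.out.println(table.rowCount());
    }
}
```

If the real table ends up with two rows for one (partition, offset) pair, one thing worth checking is whether the upsert's key actually includes those two columns.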
Kafka log:
I don't see anything unusual in the log either.
14:29:56.318 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version: 2.4.0
14:29:56.318 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId: 77a89fcf8d7fa018
14:29:56.318 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1604154596318
14:29:56.319 [main] INFO org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Subscribed to topic(s): kaas.pe.enrollment.csp.ts2
14:29:57.914 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.Metadata - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Cluster ID: 6cbv7QOaSW6j1vXrOCE4jA
14:29:57.914 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.Metadata - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Cluster ID: 6cbv7QOaSW6j1vXrOCE4jA
14:29:57.915 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Discovered group coordinator apslp1563.uhc.com:9093 (id: 2147483574 rack: null)
14:29:57.915 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Discovered group coordinator apslp1563.uhc.com:9093 (id: 2147483574 rack: null)
14:29:57.923 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] (Re-)joining group
14:29:57.924 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] (Re-)joining group
14:29:58.121 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] (Re-)joining group
14:29:58.125 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] (Re-)joining group
14:30:13.127 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Finished assignment for group at generation 23: {consumer-csp-prov-emerald-test-1-19d92ba5-5dc3-433d-b967-3cf1ce1b4174=org.apache.kafka.clients.consumer.ConsumerPartitionAssignor$Assignment@d7e2a1f, consumer-csp-prov-emerald-test-2-5833c212-7031-4ab1-944b-7e26f7d7a293=org.apache.kafka.clients.consumer.ConsumerPartitionAssignor$Assignment@53c3aad3}
14:30:13.131 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Successfully joined group with generation 23
14:30:13.131 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Successfully joined group with generation 23
14:30:13.134 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Adding newly assigned partitions: kaas.pe.enrollment.csp.ts2-1
14:30:13.134 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Adding newly assigned partitions: kaas.pe.enrollment.csp.ts2-0
14:30:13.143 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Setting offset for partition kaas.pe.enrollment.csp.ts2-0 to the committed offset FetchPosition{offset=500387, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=apslp1559.uhc.com:9093 (id: 69 rack: null), epoch=37}}
14:30:13.143 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Setting offset for partition kaas.pe.enrollment.csp.ts2-1 to the committed offset FetchPosition{offset=499503, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=apslp1562.uhc.com:9093 (id: 72 rack: null), epoch=36}}
Code:
@KafkaListener(topics = "${app.topic}", groupId = "${app.group_id_config}")
public void run(ConsumerRecord<String, GenericRecord> record, Acknowledgment acknowledgement) throws Exception {
    try {
        // These variables are not declared in this snippet, so they are declared outside the
        // method (presumably as fields that both consumer threads share).
        prov_tin_number = record.value().get("providerTinNumber").toString();
        prov_tin_type = record.value().get("providerTINType").toString();
        enroll_type = record.value().get("enrollmentType").toString();
        vcp_prov_choice_ind = record.value().get("vcpProvChoiceInd").toString();
        error_flag = "";
        // Upsert one row per record, tagged with the record's partition and offset.
        dataexecutor.peStremUpsertTbl(prov_tin_number, prov_tin_type, enroll_type, vcp_prov_choice_ind, error_flag,
                record.partition(), record.offset());
        // Acknowledge only after the upsert returns without throwing.
        acknowledgement.acknowledge();
    } catch (Exception ex) {
        // Failures are only printed; the record is not acknowledged in this case.
        System.out.println(record);
        System.out.println(ex.getMessage());
    }
}
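The listener above assigns `prov_tin_number` and the other values without declaring them, so they live outside the method. With a concurrency of 2, both container threads call the same method, and any state held outside it is shared between them. A minimal, dependency-free sketch of the alternative, where every per-record value is a local variable, is shown below. `LocalStateHandler`, its `table` map, and the simplified `handle` signature are hypothetical; the sketch shows the pattern only and is not a confirmed diagnosis of the duplicates.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical, dependency-free stand-in for the listener; names are illustrative only.
final class LocalStateHandler {
    // Stand-in for the table: key is "partition:offset", value is the stored TIN number.
    final Map<String, String> table = new ConcurrentHashMap<>();

    // Every per-record value is a local variable, so concurrent calls cannot overwrite each other.
    void handle(int partition, long offset, Map<String, String> value) {
        String provTinNumber = value.get("providerTinNumber");
        table.put(partition + ":" + offset, provTinNumber);
    }

    public static void main(String[] args) throws InterruptedException {
        LocalStateHandler handler = new LocalStateHandler();
        ExecutorService pool = Executors.newFixedThreadPool(2); // mirrors concurrency = 2
        for (int p = 0; p < 2; p++) {
            final int partition = p;
            pool.submit(() -> {
                for (long offset = 0; offset < 1000; offset++) {
                    handler.handle(partition, offset,
                            Map.of("providerTinNumber", partition + "-" + offset));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(handler.table.size()); // one row per (partition, offset) pair
    }
}
```

In the real listener, the equivalent change would be to declare the four values as local `String` variables inside `run`.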

Related

Confluent Control Center failure: Unable to fetch consumer offsets for cluster id

I am running Confluent Platform (version 6.1.1) with the following components: 3 brokers, 3 ZooKeeper nodes, Schema Registry, 3 Kafka Connect workers, KSQL, and Confluent Control Center (CCC).
The CCC has entered a failed state and I am having difficulty bringing it back.
To start clean, I created another EC2 instance (m4.2xlarge) and configured a new CCC on it, with the aim of connecting it to the current cluster. The new CCC has exactly the same configuration as the failed one, except for a different confluent.controlcenter.id.
I start the CCC and it runs. I can access the CCC UI, but it does not work properly: pages take too long to load, and the reported state of the Connect cluster and of the brokers keeps flipping between healthy and unhealthy.
For example it looks like this (see screenshots):
After running for a certain amount of time, it restarts automatically, and it keeps restarting every 5-7 minutes.
When it starts, I see a bunch of new topics created in the Kafka cluster.
After that, in the control-center.log I see:
INFO [main] Setting offsets for topic=_confluent-monitoring (io.confluent.controlcenter.KafkaHelper)
INFO [main] found 12 topicPartitions for topic=_confluent-monitoring (io.confluent.controlcenter.KafkaHelper)
INFO [main] Setting offsets for topic=_confluent-metrics (io.confluent.controlcenter.KafkaHelper)
INFO [main] found 12 topicPartitions for topic=_confluent-metrics (io.confluent.controlcenter.KafkaHelper)
INFO [main] action=starting topology=command (io.confluent.controlcenter.ControlCenter)
INFO [main] waiting for streams to be in running state REBALANCING (io.confluent.command.CommandStore)
INFO [main] Streams state RUNNING (io.confluent.command.CommandStore)
INFO [main] action=started topology=command (io.confluent.controlcenter.ControlCenter)
INFO [main] action=starting operation=command-migration (io.confluent.controlcenter.ControlCenter)
INFO [main] action=completed operation=command-migration (io.confluent.controlcenter.ControlCenter)
INFO [main] action=starting topology=monitoring (io.confluent.controlcenter.ControlCenter)
INFO [main] action=started topology=monitoring (io.confluent.controlcenter.ControlCenter)
INFO [main] Starting Health Check (io.confluent.controlcenter.ControlCenter)
INFO [main] Starting Alert Manager (io.confluent.controlcenter.ControlCenter)
INFO [main] Starting Consumer Offsets Fetch (io.confluent.controlcenter.ControlCenter)
INFO [control-center-heartbeat-0] current clusterId=lCRehAk0RqmLR04nhXKHtA (io.confluent.controlcenter.healthcheck.HealthCheck)
INFO [control-center-heartbeat-0] broker id set has changed new={1001=[10.251.xx.xx:9093 (id: 1001 rack: null)], 1002=[10.251.xx.xx:9093 (id: 1002 rack: null)], 1003=[10.251.xx.xx:9093 (id: 1003 rack: null)]} removed={} (io.confluent.controlcenter.healthcheck.HealthCheck)
INFO [control-center-heartbeat-0] new controller=10.251.xx.xx:9093 (id: 1002 rack: null) (io.confluent.controlcenter.healthcheck.HealthCheck)
INFO [main] Initial capacity 128, increased by 64, maximum capacity 2147483647. (io.confluent.rest.ApplicationServer)
INFO [main] Adding listener: http://0.0.0.0:9021 (io.confluent.rest.ApplicationServer)
INFO [main] x509=X509@3a8ead9(ip-44-135-xx-xx.eu-central-1.compute.internal,h=[ip-44-135-xx-xx.eu-central-1.compute.internal],w=[]) for Server@7c8b37a8[provider=null,keyStore=file:///var/kafka-ssl/server.keystore.jks,trustStore=file:///var/kafka-ssl/client.truststore.jks] (org.eclipse.jetty.util.ssl.SslContextFactory)
INFO [main] x509=X509@3831f4c2(caroot,h=[eu-central-1.compute.internal],w=[]) for Server@7c8b37a8[provider=null,keyStore=file:///var/kafka-ssl/server.keystore.jks,trustStore=file:///var/kafka-ssl/client.truststore.jks] (org.eclipse.jetty.util.ssl.SslContextFactory)
INFO [main] jetty-9.4.38.v20210224; built: 2021-02-24T20:25:07.675Z; git: 288f3cc74549e8a913bf363250b0744f2695b8e6; jvm 11.0.13+8-LTS (org.eclipse.jetty.server.Server)
INFO [main] DefaultSessionIdManager workerName=node0 (org.eclipse.jetty.server.session)
INFO [main] No SessionScavenger set, using defaults (org.eclipse.jetty.server.session)
INFO [main] node0 Scavenging every 660000ms (org.eclipse.jetty.server.session)
INFO [main] Started o.e.j.s.ServletContextHandler@1ef5cde4{/,[jar:file:/usr/share/java/acl/acl-6.1.1.jar!/io/confluent/controlcenter/rest/static],AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
INFO [main] Started o.e.j.s.ServletContextHandler@5401c6a8{/ws,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
INFO [main] Started NetworkTrafficServerConnector@5d6b5d3d{HTTP/1.1, (http/1.1)}{0.0.0.0:9021} (org.eclipse.jetty.server.AbstractConnector)
INFO [main] Started @36578ms (org.eclipse.jetty.server.Server)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=monitoring-input-topic-progress-.count type=monitoring cluster= value=0.0 (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=monitoring-input-topic-progress-.rate type=monitoring cluster= value=0.0 (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=monitoring-input-topic-progress-.timestamp type=monitoring cluster= value=NaN (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=monitoring-input-topic-progress-.min type=monitoring cluster= value=1.7976931348623157E308 (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=metrics-input-topic-progress-lCRehAk0RqmLR04nhXKHtA.count type=metrics cluster=lCRehAk0RqmLR04nhXKHtA value=0.0 (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=metrics-input-topic-progress-lCRehAk0RqmLR04nhXKHtA.rate type=metrics cluster=lCRehAk0RqmLR04nhXKHtA value=0.0 (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=metrics-input-topic-progress-lCRehAk0RqmLR04nhXKHtA.timestamp type=metrics cluster=lCRehAk0RqmLR04nhXKHtA value=NaN (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=metrics-input-topic-progress-lCRehAk0RqmLR04nhXKHtA.min type=metrics cluster=lCRehAk0RqmLR04nhXKHtA value=1.7976931348623157E308 (io.confluent.controlcenter.util.StreamProgressReporter)
WARN [control-center-heartbeat-0] misconfigured topic=_confluent-command config=segment.bytes value=1073741824 expected=134217728 (io.confluent.controlcenter.healthcheck.HealthCheck)
WARN [control-center-heartbeat-0] misconfigured topic=_confluent-command config=delete.retention.ms value=86400000 expected=259200000 (io.confluent.controlcenter.healthcheck.HealthCheck)
INFO [control-center-heartbeat-0] misconfigured topic=_confluent-metrics config=min.insync.replicas value=1 expected=2 (io.confluent.controlcenter.healthcheck.HealthCheck)
WARN [control-center-heartbeat-1] Unable to fetch consumer offsets for cluster id lCRehAk0RqmLR04nhXKHtA (io.confluent.controlcenter.data.ConsumerOffsetsFetcher)
java.util.concurrent.TimeoutException
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:108)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at io.confluent.controlcenter.data.ConsumerOffsetsDao.getAllConsumerGroupDescriptions(ConsumerOffsetsDao.java:220)
at io.confluent.controlcenter.data.ConsumerOffsetsDao.getAllConsumerGroupOffsets(ConsumerOffsetsDao.java:58)
at io.confluent.controlcenter.data.ConsumerOffsetsFetcher.run(ConsumerOffsetsFetcher.java:73)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
WARN [kafka-admin-client-thread | adminclient-3] failed fetching description for consumerGroup=_confluent-ksql-eim_ksql_non_prodquery_CSAS_SDL_STMTS_GG_347 (io.confluent.controlcenter.data.ConsumerOffsetsDao)
org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=1654853629184, tries=1, nextAllowedTryMs=1654853629324) timed out at 1654853629224 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.DisconnectException: Cancelled describeConsumerGroups request with correlation id 168 due to node 1001 being disconnected
WARN [kafka-admin-client-thread | adminclient-3] failed fetching description for consumerGroup=connect-mongo-dci-grid-partner-test11 (io.confluent.controlcenter.data.ConsumerOffsetsDao)
org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=1654853629184, tries=1, nextAllowedTryMs=1654853629324) timed out at 1654853629224 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeConsumerGroups
WARN [kafka-admin-client-thread | adminclient-3] failed fetching description for consumerGroup=_confluent-ksql-eim_ksql_non_prodquery_CSAS_SDL_STMTS_UPWARD_GG_355 (io.confluent.controlcenter.data.ConsumerOffsetsDao)
org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=1654853629184, tries=1, nextAllowedTryMs=1654853629324) timed out at 1654853629224 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: describeConsumerGroups
WARN [kafka-admin-client-thread | adminclient-3] failed fetching description for consumerGroup=_eim_c3_non_prod-4 (io.confluent.controlcenter.data.ConsumerOffsetsDao)
org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=1654853629184, tries=1, nextAllowedTryMs=1654853629324) timed out at 1654853629224 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: describeConsumerGroups
...
and so on...
WARN [control-center-heartbeat-1] Unable to fetch consumer offsets for cluster id lCRehAk0RqmLR04nhXKHtA (io.confluent.controlcenter.data.ConsumerOffsetsFetcher)
java.util.concurrent.TimeoutException
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:108)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at io.confluent.controlcenter.data.ConsumerOffsetsDao.getAllConsumerGroupDescriptions(ConsumerOffsetsDao.java:220)
at io.confluent.controlcenter.data.ConsumerOffsetsDao.getAllConsumerGroupOffsets(ConsumerOffsetsDao.java:58)
at io.confluent.controlcenter.data.ConsumerOffsetsFetcher.run(ConsumerOffsetsFetcher.java:73)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
and so on...
In the control-center-kafka.log I see:
INFO [control-center-heartbeat-1] Kafka version: 6.1.1-ce (org.apache.kafka.common.utils.AppInfoParser)
INFO [control-center-heartbeat-1] Kafka commitId: 73deb3aeb1f8647c (org.apache.kafka.common.utils.AppInfoParser)
INFO [control-center-heartbeat-1] Kafka startTimeMs: 1654853610852 (org.apache.kafka.common.utils.AppInfoParser)
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-5-consumer, groupId=_eim_c3_non_prod-4] Resetting offset for partition _eim_c3_non_prod-4-monitoring-message-rekey-store-7 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[10.251.6.2:9093 (id: 1002 rack: null)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-5-consumer, groupId=_eim_c3_non_prod-4] Resetting offset for partition _eim_c3_non_prod-4-monitoring-trigger-event-rekey-7 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[10.251.6.2:9093 (id: 1002 rack: null)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-5-consumer, groupId=_eim_c3_non_prod-4] Resetting offset for partition _eim_c3_non_prod-4-MonitoringStream-ONE_MINUTE-repartition-7 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[10.251.6.2:9093 (id: 1002 rack: null)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-5-consumer, groupId=_eim_c3_non_prod-4] Resetting offset for partition _eim_c3_non_prod-4-aggregatedTopicPartitionTableWindows-ONE_MINUTE-repartition-7 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[10.251.6.1:9093 (id: 1001 rack: null)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
and so on ...
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-10-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1003: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-3] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-3-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1002: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-3-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1001: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-10] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-10-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1002: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-5-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=1478925475, epoch=1) to node 1003: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-6-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=1947312909, epoch=1) to node 1002: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
and so on ...
Any ideas what can be wrong here?
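The `deadlineMs`, `timed out at`, and `startTimeMs` values in the logs above are epoch milliseconds, and every failed `describeConsumerGroups` call shown carries the same deadline. The small sketch below converts them and prints the gaps, which may help when lining these entries up against broker-side logs. `AdminCallTimeline` is an illustrative helper, not part of Control Center or Kafka, and treating `startTimeMs` as the start of the client that issued the calls is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative helper; the class and method names are not part of Control Center or Kafka.
final class AdminCallTimeline {
    static Duration between(long fromEpochMs, long toEpochMs) {
        return Duration.between(Instant.ofEpochMilli(fromEpochMs), Instant.ofEpochMilli(toEpochMs));
    }

    public static void main(String[] args) {
        long clientStartMs = 1654853610852L; // "Kafka startTimeMs" from control-center-kafka.log
        long deadlineMs    = 1654853629184L; // deadlineMs of the describeConsumerGroups calls
        long timedOutAtMs  = 1654853629224L; // "timed out at" value in the same log entries

        System.out.println("deadline (UTC): " + Instant.ofEpochMilli(deadlineMs));
        System.out.println("client start -> deadline: "
                + between(clientStartMs, deadlineMs).toMillis() + " ms");
        System.out.println("deadline -> timeout reported: "
                + between(deadlineMs, timedOutAtMs).toMillis() + " ms");
    }
}
```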

When collecting data with Modbus protocol through kafka Producer, collection stops after a certain period of time

I have deployed a Kafka cluster on a GCP instance.
I run the connector with Kafka Connect in distributed mode, configured through config/connect-distributed.properties.
I start collecting data through the REST API using the following command:
curl -X POST -H "Content-Type:application/json" \
  --data '{
    "name": "operation1",
    "config": {
      "connector.class": "org.apache.plc4x.kafka.Plc4xSourceConnector",
      "default.topic": "operation1",
      "tasks.max": "1",
      "sources": "Modbus",
      "sources.Modbus.connectionString": "modbus:tcp://<IP address:port>",
      "sources.Modbus.pollReturnInterval": "10000",
      "sources.Modbus.bufferSize": "5000",
      "sources.Modbus.jobReferences": "operation1",
      "jobs": "operation1",
      "jobs.operation1.fields": "BMS1-1, BMS1-2, BMS2-1, BMS2-2, BMS2-3, PCS, ETC",
      "jobs.operation1.interval": "1000",
      "jobs.operation1.fields.BMS1-1": "input-register:1[125]",
      "jobs.operation1.fields.BMS1-2": "input-register:126[12]",
      "jobs.operation1.fields.BMS2-1": "input-register:201[125]",
      "jobs.operation1.fields.BMS2-2": "input-register:326[125]",
      "jobs.operation1.fields.BMS2-3": "input-register:451[16]",
      "jobs.operation1.fields.PCS": "input-register:501[89]",
      "jobs.operation1.fields.ETC": "input-register:601[5]"
    }
  }' http://localhost:8083/connectors
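To make the size of this job concrete, the sketch below parses the seven field addresses from the request above and prints the register range and count for each, plus the total polled per interval. The Modbus specification limits a single Read Input Registers request to 125 registers, and three of the fields sit exactly at that limit. `RegisterRanges` is an illustrative helper and not part of PLC4X. How the PLC4X driver maps these addresses onto wire requests is not shown in the source and is not assumed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper, not part of PLC4X: parses "input-register:<start>[<count>]" strings.
final class RegisterRanges {
    private static final Pattern ADDRESS = Pattern.compile("input-register:(\\d+)\\[(\\d+)\\]");
    // Maximum quantity for one Read Input Registers request in the Modbus specification.
    static final int MODBUS_MAX_REGISTERS_PER_READ = 125;

    /** Returns {start, count} for one field address, or throws if the string does not match. */
    static int[] parse(String address) {
        Matcher m = ADDRESS.matcher(address);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected address: " + address);
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    /** Sums the register counts of all field addresses in the map. */
    static int totalRegisters(Map<String, String> fields) {
        int total = 0;
        for (String address : fields.values()) {
            total += parse(address)[1];
        }
        return total;
    }

    public static void main(String[] args) {
        // Field addresses copied from the connector config in the question.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("BMS1-1", "input-register:1[125]");
        fields.put("BMS1-2", "input-register:126[12]");
        fields.put("BMS2-1", "input-register:201[125]");
        fields.put("BMS2-2", "input-register:326[125]");
        fields.put("BMS2-3", "input-register:451[16]");
        fields.put("PCS", "input-register:501[89]");
        fields.put("ETC", "input-register:601[5]");

        for (Map.Entry<String, String> e : fields.entrySet()) {
            int[] r = parse(e.getValue());
            int last = r[0] + r[1] - 1; // last register in the config's own numbering
            String note = r[1] >= MODBUS_MAX_REGISTERS_PER_READ ? " (at or above the per-request limit)" : "";
            System.out.println(e.getKey() + ": registers " + r[0] + ".." + last + note);
        }
        System.out.println("registers per poll: " + totalRegisters(fields));
    }
}
```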
In the log of the Kafka Connect worker started with config/connect-distributed.properties, the following messages appear and collection is successful. However, collection stops after a certain amount of time (minutes or hours).
[2022-05-10 05:36:44,522] INFO [operation1|task-0|offsets] WorkerSourceTask{id=operation1-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. (org.apache.kafka.connect.runtime.WorkerSourceTask:484)
[2022-05-10 05:36:54,526] INFO [operation1|task-0|offsets] WorkerSourceTask{id=operation1-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. (org.apache.kafka.connect.runtime.WorkerSourceTask:484)
[2022-05-10 05:37:04,530] INFO [operation1|task-0|offsets] WorkerSourceTask{id=operation1-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. (org.apache.kafka.connect.runtime.WorkerSourceTask:484)
[2022-05-10 05:37:14,534] INFO [operation1|task-0|offsets] WorkerSourceTask{id=operation1-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. (org.apache.kafka.connect.runtime.WorkerSourceTask:484)
[2022-05-10 05:37:24,550] INFO [operation1|task-0|offsets] WorkerSourceTask{id=operation1-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. (org.apache.kafka.connect.runtime.WorkerSourceTask:484)
[2022-05-10 05:37:34,554] INFO [operation1|task-0|offsets] WorkerSourceTask{id=operation1-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. (org.apache.kafka.connect.runtime.WorkerSourceTask:484)
After a certain amount of time, the log message changes to the following:
[2022-05-10 05:42:36,597] WARN [operation1|task-0] Exception during scraping of Job operation1, Connection-Alias Modbus: Error-message: null - for stack-trace change logging to DEBUG (org.apache.plc4x.java.scraper.triggeredscraper.TriggeredScraperTask:148)
[2022-05-10 05:42:38,598] WARN [operation1|task-0] Exception during scraping of Job operation1, Connection-Alias Modbus: Error-message: null - for stack-trace change logging to DEBUG (org.apache.plc4x.java.scraper.triggeredscraper.TriggeredScraperTask:148)
[2022-05-10 05:42:40,598] WARN [operation1|task-0] Exception during scraping of Job operation1, Connection-Alias Modbus: Error-message: null - for stack-trace change logging to DEBUG (org.apache.plc4x.java.scraper.triggeredscraper.TriggeredScraperTask:148)
[2022-05-10 05:42:42,599] WARN [operation1|task-0] Exception during scraping of Job operation1, Connection-Alias Modbus: Error-message: null - for stack-trace change logging to DEBUG (org.apache.plc4x.java.scraper.triggeredscraper.TriggeredScraperTask:148)
[2022-05-10 05:42:44,600] WARN [operation1|task-0] Exception during scraping of Job operation1, Connection-Alias Modbus: Error-message: null - for stack-trace change logging to DEBUG (org.apache.plc4x.java.scraper.triggeredscraper.TriggeredScraperTask:148)
At this point, if I check the status of the connector with curl, it is still RUNNING:
curl -X GET localhost:8083/connectors/operation1/status
{"name":"operation1","connector":{"state":"RUNNING","worker_id":"<IP>:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"<IP>:8083"}],"type":"source"}
I really don't know why this happens. Any help is appreciated.
After changing the log level to DEBUG, I see the following:
[2022-05-10 08:14:18,708] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-12 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,708] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-0 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,708] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-6 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-18 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-9 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-3 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-15 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-21 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-24 at position FetchPosition{offset=18793, offsetEpoch=Optional[23], currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Built incremental fetch (sessionId=714175396, epoch=1010) for node 1. Added 0 partition(s), altered 0 partition(s), removed 0 partition(s), replaced 0 partition(s) out of 9 partition(s) (org.apache.kafka.clients.FetchSessionHandler:351)
[2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), toReplace=(), implied=(connect-offsets-12, connect-offsets-0, connect-offsets-6, connect-offsets-18, connect-offsets-9, connect-offsets-3, connect-offsets-15, connect-offsets-21, connect-offsets-24), canUseTopicIds=True) to broker <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:274)
[2022-05-10 08:14:18,709] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-1, correlationId=3034) and timeout 30000 to node 1: FetchRequestData(clusterId=null, replicaId=-1, maxWaitMs=500, minBytes=1, maxBytes=52428800, isolationLevel=0, sessionId=714175396, sessionEpoch=1010, topics=[], forgottenTopicsData=[], rackId='') (org.apache.kafka.clients.NetworkClient:521)
[2022-05-10 08:14:18,757] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Received FETCH response from node 1 for request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-2, correlationId=3030): FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=1712137779, responses=[]) (org.apache.kafka.clients.NetworkClient:879)
[2022-05-10 08:14:18,757] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Node 1 sent an incremental fetch response with throttleTimeMs = 0 for session 1712137779 with 0 response partition(s), 1 implied partition(s) (org.apache.kafka.clients.FetchSessionHandler:584)
[2022-05-10 08:14:18,758] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-status-2 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>76.32:9092 (id: 1 rack: null)], epoch=23}} to node <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,758] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Built incremental fetch (sessionId=1712137779, epoch=1006) for node 1. Added 0 partition(s), altered 0 partition(s), removed 0 partition(s), replaced 0 partition(s) out of 1 partition(s) (org.apache.kafka.clients.FetchSessionHandler:351)
[2022-05-10 08:14:18,758] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), toReplace=(), implied=(connect-status-2), canUseTopicIds=True) to broker <IP>76.32:9092 (id: 1 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:274)
[2022-05-10 08:14:18,758] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-2, correlationId=3033) and timeout 30000 to node 1: FetchRequestData(clusterId=null, replicaId=-1, maxWaitMs=500, minBytes=1, maxBytes=52428800, isolationLevel=0, sessionId=1712137779, sessionEpoch=1006, topics=[], forgottenTopicsData=[], rackId='') (org.apache.kafka.clients.NetworkClient:521)
[2022-05-10 08:14:18,759] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Received FETCH response from node 0 for request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-2, correlationId=3031): FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=619420322, responses=[]) (org.apache.kafka.clients.NetworkClient:879)
[2022-05-10 08:14:18,759] DEBUG [Consumer clientId=consumer-connect-cluster-3, groupId=connect-cluster] Received FETCH response from node 0 for request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-3, correlationId=1014): FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=208110829, responses=[]) (org.apache.kafka.clients.NetworkClient:879)
[2022-05-10 08:14:18,759] DEBUG [Consumer clientId=consumer-connect-cluster-3, groupId=connect-cluster] Node 0 sent an incremental fetch response with throttleTimeMs = 0 for session 208110829 with 0 response partition(s), 1 implied partition(s) (org.apache.kafka.clients.FetchSessionHandler:584)
[2022-05-10 08:14:18,759] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Node 0 sent an incremental fetch response with throttleTimeMs = 0 for session 619420322 with 0 response partition(s), 2 implied partition(s) (org.apache.kafka.clients.FetchSessionHandler:584)
[2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-3, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-configs-0 at position FetchPosition{offset=698, offsetEpoch=Optional[54], currentLeader=LeaderAndEpoch{leader=Optional[<IP>92.153:9092 (id: 0 rack: null)], epoch=54}} to node <IP>92.153:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-status-3 at position FetchPosition{offset=129, offsetEpoch=Optional[50], currentLeader=LeaderAndEpoch{leader=Optional[<IP>92.153:9092 (id: 0 rack: null)], epoch=54}} to node <IP>92.153:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-3, groupId=connect-cluster] Built incremental fetch (sessionId=208110829, epoch=1008) for node 0. Added 0 partition(s), altered 0 partition(s), removed 0 partition(s), replaced 0 partition(s) out of 1 partition(s) (org.apache.kafka.clients.FetchSessionHandler:351)
[2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-3, groupId=connect-cluster] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), toReplace=(), implied=(connect-configs-0), canUseTopicIds=True) to broker <IP>92.153:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:274)
[2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-3, groupId=connect-cluster] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-3, correlationId=1015) and timeout 30000 to node 0: FetchRequestData(clusterId=null, replicaId=-1, maxWaitMs=500, minBytes=1, maxBytes=52428800, isolationLevel=0, sessionId=208110829, sessionEpoch=1008, topics=[], forgottenTopicsData=[], rackId='') (org.apache.kafka.clients.NetworkClient:521)
[2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-status-0 at position FetchPosition{offset=116, offsetEpoch=Optional[54], currentLeader=LeaderAndEpoch{leader=Optional[<IP>92.153:9092 (id: 0 rack: null)], epoch=54}} to node <IP>92.153:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Built incremental fetch (sessionId=619420322, epoch=1008) for node 0. Added 0 partition(s), altered 0 partition(s), removed 0 partition(s), replaced 0 partition(s) out of 2 partition(s) (org.apache.kafka.clients.FetchSessionHandler:351)
[2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), toReplace=(), implied=(connect-status-0, connect-status-3), canUseTopicIds=True) to broker <IP>92.153:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:274)
[2022-05-10 08:14:18,760] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-2, correlationId=3034) and timeout 30000 to node 0: FetchRequestData(clusterId=null, replicaId=-1, maxWaitMs=500, minBytes=1, maxBytes=52428800, isolationLevel=0, sessionId=619420322, sessionEpoch=1008, topics=[], forgottenTopicsData=[], rackId='') (org.apache.kafka.clients.NetworkClient:521)
[2022-05-10 08:14:18,812] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Received FETCH response from node 2 for request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-1, correlationId=3032): FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=581764107, responses=[]) (org.apache.kafka.clients.NetworkClient:879)
[2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Node 2 sent an incremental fetch response with throttleTimeMs = 0 for session 581764107 with 0 response partition(s), 8 implied partition(s) (org.apache.kafka.clients.FetchSessionHandler:584)
[2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-8 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-14 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-2 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-20 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-11 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-5 at position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-23 at position FetchPosition{offset=599, offsetEpoch=Optional[23], currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,813] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-offsets-17 at position FetchPosition{offset=70, offsetEpoch=Optional[23], currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:18,814] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Built incremental fetch (sessionId=581764107, epoch=1006) for node 2. Added 0 partition(s), altered 0 partition(s), removed 0 partition(s), replaced 0 partition(s) out of 8 partition(s) (org.apache.kafka.clients.FetchSessionHandler:351)
[2022-05-10 08:14:18,814] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), toReplace=(), implied=(connect-offsets-8, connect-offsets-14, connect-offsets-2, connect-offsets-20, connect-offsets-11, connect-offsets-5, connect-offsets-23, connect-offsets-17), canUseTopicIds=True) to broker <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:274)
[2022-05-10 08:14:18,814] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-1, correlationId=3035) and timeout 30000 to node 2: FetchRequestData(clusterId=null, replicaId=-1, maxWaitMs=500, minBytes=1, maxBytes=52428800, isolationLevel=0, sessionId=581764107, sessionEpoch=1006, topics=[], forgottenTopicsData=[], rackId='') (org.apache.kafka.clients.NetworkClient:521)
[2022-05-10 08:14:18,977] DEBUG [operation1|task-0] Job statistics (operation1, Modbus) number of requests: 354 (201 success, 43.2 % failed, 0.0 % too slow), min latency: 82.47 ms, mean latency: 93.20 ms, median: 89.56 ms (org.apache.plc4x.java.scraper.triggeredscraper.TriggeredScraperImpl:250)
[2022-05-10 08:14:19,073] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Received FETCH response from node 2 for request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-2, correlationId=3032): FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=1118973913, responses=[]) (org.apache.kafka.clients.NetworkClient:879)
[2022-05-10 08:14:19,073] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Node 2 sent an incremental fetch response with throttleTimeMs = 0 for session 1118973913 with 0 response partition(s), 2 implied partition(s) (org.apache.kafka.clients.FetchSessionHandler:584)
[2022-05-10 08:14:19,073] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-status-4 at position FetchPosition{offset=85, offsetEpoch=Optional[23], currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:19,073] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Added READ_UNCOMMITTED fetch request for partition connect-status-1 at position FetchPosition{offset=115, offsetEpoch=Optional[23], currentLeader=LeaderAndEpoch{leader=Optional[<IP>156.202:9092 (id: 2 rack: null)], epoch=23}} to node <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:1245)
[2022-05-10 08:14:19,074] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Built incremental fetch (sessionId=1118973913, epoch=1008) for node 2. Added 0 partition(s), altered 0 partition(s), removed 0 partition(s), replaced 0 partition(s) out of 2 partition(s) (org.apache.kafka.clients.FetchSessionHandler:351)
[2022-05-10 08:14:19,074] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), toReplace=(), implied=(connect-status-1, connect-status-4), canUseTopicIds=True) to broker <IP>156.202:9092 (id: 2 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:274)
[2022-05-10 08:14:19,074] DEBUG [Consumer clientId=consumer-connect-cluster-2, groupId=connect-cluster] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-2, correlationId=3035) and timeout 30000 to node 2: FetchRequestData(clusterId=null, replicaId=-1, maxWaitMs=500, minBytes=1, maxBytes=52428800, isolationLevel=0, sessionId=1118973913, sessionEpoch=1008, topics=[], forgottenTopicsData=[], rackId='') (org.apache.kafka.clients.NetworkClient:521)
[2022-05-10 08:14:19,126] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Received FETCH response from node 0 for request with header RequestHeader(apiKey=FETCH, apiVersion=13, clientId=consumer-connect-cluster-1, correlationId=3033): FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=407599491, responses=[]) (org.apache.kafka.clients.NetworkClient:879)
[2022-05-10 08:14:19,126] DEBUG [Consumer clientId=consumer-connect-cluster-1, groupId=connect-cluster] Node 0 sent an incremental fetch response with throttleTimeMs = 0 for session 407599491 with 0 response partition(s), 8 implied partition(s) (org.apache.kafka.clients.FetchSessionHandler:584)
This seems to have been an issue with how the PLC4X connector handles errors. It caused the connector to stop requesting new messages from the Modbus server after a timeout occurred. Interestingly, if the TCP connection to the Modbus server was interrupted, the PLC4X connector would reconnect and start polling again.
Can you please try building the latest PLC4X connector from the PLC4X GitHub repo? A fix has been pushed to it.
PLC4X Kafka Connector Repository
The PLC4X Kafka Connector doesn't fail the Kafka connector on a failed Kafka Connector->PLC connection. Instead it waits for the connection to be restored and begins polling again.
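To illustrate the intended behaviour (keep polling after a timeout instead of silently stopping), here is a minimal sketch. It is not the PLC4X implementation; `poll_with_retry` and `make_flaky_reader` are hypothetical names, and the reader only simulates a Modbus request that times out on chosen calls.

```python
import time
from typing import Callable, List, Optional, Set


def make_flaky_reader(fail_on: Set[int]) -> Callable[[], int]:
    """Return a fake read function that raises TimeoutError on the given call numbers."""
    calls = {"n": 0}

    def read_once() -> int:
        calls["n"] += 1
        if calls["n"] in fail_on:
            raise TimeoutError("simulated Modbus timeout")
        return calls["n"]  # pretend the call number is the value read

    return read_once


def poll_with_retry(read_once: Callable[[], int], attempts: int,
                    backoff_s: float = 0.0) -> List[Optional[int]]:
    """Poll `attempts` times; a timeout is recorded as None and polling continues."""
    results: List[Optional[int]] = []
    for _ in range(attempts):
        try:
            results.append(read_once())
        except TimeoutError:
            results.append(None)   # record the failed request
            time.sleep(backoff_s)  # wait before the next request instead of giving up
    return results


# The second request times out, but the third one is still issued.
results = poll_with_retry(make_flaky_reader({2}), attempts=3)
```

The point of the sketch is only that a timeout is counted as a failed request (as in the "Job statistics" log line) and does not end the polling loop.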
From your comments it would also seem that you have a Modbus server available on a public IP address. This isn't the best design as Modbus provides no security.

How to fix kafka streams problem related to group coordinator is unavailable or invalid, will attempt rediscovery

I have a problem when I run a Kafka Streams application with PROCESSING_GUARANTEE_CONFIG set to exactly-once semantics. With other settings, for example at-least-once semantics, it works very well.
I noticed in the logs that something is going wrong. I found some recommendations here for fixing this problem, but unfortunately they didn't help me :(
03:35:28.627 INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Discovered group coordinator kafka:9093 (id: 2147483646 rack: null)
03:35:28.627 INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Group coordinator kafka:9093 (id: 2147483646 rack: null) is unavailable or invalid, will attempt rediscovery
03:35:28.628 INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=to-transform] Discovered group coordinator kafka:9093 (id: 2147483646 rack: null)
03:35:28.628 INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Group coordinator kafka:9093 (id: 2147483646 rack: null) is unavailable or invalid, will attempt rediscovery
03:35:48.628 INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Discovered group coordinator kafka:9093 (id: 2147483646 rack: null)
03:35:48.630 INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Found no committed offset for partition topic-0
03:35:48.631 INFO o.a.k.c.c.KafkaConsumer - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions
03:35:48.631 INFO o.a.k.s.p.i.StreamThread - stream-thread [transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1] State transition from PARTITIONS_ASSIGNED to RUNNING
03:35:48.631 INFO o.a.k.s.KafkaStreams - stream-client [transform-f8268b2b-4673-49ac-9396-6a2b86d45697] State transition from REBALANCING to RUNNING
03:35:48.632 INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Attempt to heartbeat failed for since member id transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer-6aacbde6-4553-43ee-bc2f-2b5718e55acf is not valid.
03:35:48.632 INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Found no committed offset for partition topic-0
03:35:48.633 INFO o.a.k.c.c.i.SubscriptionState - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Resetting offset for partition topic-0 to offset 0.
03:35:48.634 INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
03:35:48.634 INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1-consumer, groupId=transform] Lost previously assigned partitions topic-0
03:35:48.634 INFO o.a.k.s.p.i.StreamThread - stream-thread [transform-f8268b2b-4673-49ac-9396-6a2b86d45697-StreamThread-1] at state RUNNING: partitions [topic-0] lost due to missed rebalance.
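A side note on the odd-looking coordinator id in the log above: as far as I know, the Java client does not print the broker id there but a separate connection id, computed as Integer.MAX_VALUE minus the broker id. Under that assumption, 2147483646 maps back to broker 1, which matches the KAFKA_BROKER_ID: 1 used in this setup. A tiny sketch of the arithmetic (the function name is made up for illustration):

```python
INT32_MAX = 2**31 - 1  # same value as Java's Integer.MAX_VALUE


def broker_id_from_coordinator_id(coordinator_id: int) -> int:
    """Recover the broker id, assuming coordinator id = Integer.MAX_VALUE - broker id."""
    return INT32_MAX - coordinator_id


# Coordinator id taken from the log lines above.
broker_id = broker_id_from_coordinator_id(2147483646)
```

So the large id by itself is not a sign of a problem; it only identifies which broker acts as group coordinator.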
For example, the first recommendation was that if I run just a single Kafka broker node, I have to set the partition and replication configs to 1 as well. The second recommendation was to restart the Kafka broker. Neither gave any results.
kafka:
  image: wurstmeister/kafka:2.12-2.4.1
  ports:
    - "9092:9092"
    - "9093:9093"
  depends_on:
    - zookeeper
  links:
    - zookeeper:zk
  environment:
    KAFKA_BROKER_ID: 1
    KAFKA_LISTENERS: OUTSIDE://kafka:9092,INSIDE://kafka:9093
    KAFKA_ADVERTISED_LISTENERS: OUTSIDE://localhost:9092,INSIDE://kafka:9093
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT,PLAINTEXT:PLAINTEXT
    KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
    KAFKA_LOG_RETENTION_HOURS: 1
    KAFKA_MESSAGE_MAX_BYTES: 1048576
    KAFKA_REPLICA_FETCH_MAX_BYTES: 1048576
    KAFKA_GROUP_MAX_SESSION_TIMEOUT_MS: 30000
    KAFKA_NUM_PARTITIONS: 1
    KAFKA_DEFAULT_REPLICATION_FACTOR: 1
    KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS: 1
    KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    KAFKA_TRANSACTION_STATE_LOG_NUM_PARTITIONS: 1
    KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
    KAFKA_DELETE_RETENTION_MS: 86400000
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    KAFKA_CREATE_TOPICS: topic:1:1, transform:1:1
Thanks for any help
kind regards, Victor
There can be many reasons for the observed issue. In general, exactly-once is more expensive and puts a higher load on the brokers and the Kafka Streams application.
Also note that if you really want exactly-once processing, you should run with at least 3 brokers (and topics should be configured with a replication factor of 3 and min.insync.replicas of 2). Otherwise, EOS cannot really be guaranteed.
Increasing commit.interval.ms might help to mitigate the issue. Note that for EOS this might lead to higher processing latency (that is the reason why the default commit interval is reduced to 100 ms if EOS is enabled). If you can accept a higher latency, you might want to increase it to, for example, 1 second.
Also, there is heavy investment in EOS, and newer versions contain many improvements. If you can, you might want to upgrade to the upcoming 2.6 release and test the new "eos_beta" processing mode (processing.guarantee set to "exactly_once_beta"; requires brokers 2.5 or newer).
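As a rough sketch of the settings discussed above, the snippet below builds the Streams configuration as plain key/value pairs and renders it as properties text. The property names (processing.guarantee, commit.interval.ms, replication.factor) are standard Kafka Streams configs; the application id and bootstrap address are assumptions read off the question's logs, and both helper functions are hypothetical.

```python
from typing import Dict


def eos_streams_config(commit_interval_ms: int = 1000) -> Dict[str, str]:
    """Build an illustrative Kafka Streams config for exactly-once processing."""
    return {
        "application.id": "transform",         # assumption: matches groupId=transform in the logs
        "bootstrap.servers": "kafka:9093",     # assumption: the INSIDE listener from the question
        "processing.guarantee": "exactly_once",
        # Raised from the 100 ms EOS default to trade latency for less commit load.
        "commit.interval.ms": str(commit_interval_ms),
        # Replication factor for internal topics; needs at least 3 brokers.
        # min.insync.replicas=2 is set on the broker or topic side, not here.
        "replication.factor": "3",
    }


def to_properties(config: Dict[str, str]) -> str:
    """Render the config as sorted key=value lines, as in a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


config = eos_streams_config()
properties_text = to_properties(config)
```

With the single-broker compose setup from the question, replication.factor would have to stay at 1, which is exactly the case where EOS cannot really be guaranteed.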

Unable to consume with specific consumer group on a Kafka cluster

When I try to consume a topic with a specific consumer group it fails.
I can consume the same topic with a new consumer group.
When I run the describe command, there are no consumers attached to any partitions:
GROUP  TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
dracg  clog   1          105288          105588          300  -            -     -
dracg  clog   2          104232          104532          300  -            -     -
dracg  clog   3          104525          104820          295  -            -     -
dracg  clog   0          104941          105243          302  -            -     -
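To make the numbers in this output easier to check, here is a small sketch that parses it, sums the lag, and lists the partitions that have no owning consumer (CONSUMER-ID shown as "-"). The helper name `parse_describe` is hypothetical; the text is copied from the output above.

```python
from typing import Dict, List

DESCRIBE_OUTPUT = """\
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
dracg clog 1 105288 105588 300 - - -
dracg clog 2 104232 104532 300 - - -
dracg clog 3 104525 104820 295 - - -
dracg clog 0 104941 105243 302 - - -
"""


def parse_describe(text: str) -> List[Dict[str, str]]:
    """Split whitespace-separated describe output into one dict per partition row."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


rows = parse_describe(DESCRIBE_OUTPUT)

# Total number of records the group still has to read.
total_lag = sum(int(row["LAG"]) for row in rows)

# Partitions that currently have no consumer assigned.
unowned = [int(row["PARTITION"]) for row in rows if row["CONSUMER-ID"] == "-"]

# Sanity check: LAG should equal LOG-END-OFFSET minus CURRENT-OFFSET.
lag_consistent = all(
    int(row["LOG-END-OFFSET"]) - int(row["CURRENT-OFFSET"]) == int(row["LAG"])
    for row in rows
)
```

All four partitions have committed offsets and growing lag but no owner, so the group has state while nothing is reading.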
Even the console consumer cannot consume with this consumer group.
Below is the relevant group-join section of the console consumer log, including the group leader:
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Joining group with current subscription: [clog] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Sending JoinGroup (JoinGroupRequestData(groupId='dracg', sessionTimeoutMs=10000, rebalanceTimeoutMs=300000, memberId='consumer-1-0d44d911-c975-4dfe-83d9-4b96b5fc9638', groupInstanceId='null', protocolType='consumer', protocols=[JoinGroupRequestProtocol(name='range', metadata=[0, 0, 0, 0, 0, 1, 0, 23, 112, 114, 101, 112, 114, 111, 100, 45, 70, 66, 77, 66, 45, 99, 104, 97, 110, 110, 101, 108, 108, 111, 103, 0, 0, 0, 0])])) to coordinator kafkanode3.example.com:9092 (id: 2147483644 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Received successful JoinGroup response: JoinGroupResponseData(throttleTimeMs=0, errorCode=0, generationId=5117, protocolName='range', leader='rdkafka-b5a0e7ac-8311-410f-bf04-b2b2712bad7a', memberId='consumer-1-0d44d911-c975-4dfe-83d9-4b96b5fc9638', members=[]) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Sending follower SyncGroup to coordinator kafkanode3.example.com:9092 (id: 2147483644 rack: null): SyncGroupRequestData(groupId='dracg', generationId=5117, memberId='consumer-1-0d44d911-c975-4dfe-83d9-4b96b5fc9638', groupInstanceId='null', assignments=[]) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
INFO [Consumer clientId=consumer-1, groupId=dracg] Successfully joined group with generation 5117 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Enabling heartbeat thread (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
INFO [Consumer clientId=consumer-1, groupId=dracg] Setting newly assigned partitions: (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Sending Heartbeat request to coordinator kafkanode3.example.com:9092 (id: 2147483644 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Received successful Heartbeat response (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Sending asynchronous auto-commit of offsets {} (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Completed asynchronous auto-commit of offsets {} (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Sending Heartbeat request to coordinator kafkanode3.example.com:9092 (id: 2147483644 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Received successful Heartbeat response (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Sending Heartbeat request to coordinator kafkanode3.example.com:9092 (id: 2147483644 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
DEBUG [Consumer clientId=consumer-1, groupId=dracg] Received successful Heartbeat response (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
Kafka version 2.3.0
If you have any clue on this please share.
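One thing that can be read off the JoinGroup response in the log is who the group leader is. The sketch below pulls the quoted fields out of that line and compares the leader id with this consumer's member id. The line is copied from the log above; the regex and variable names are only illustrative. The "rdkafka-" prefix suggests, but does not prove, that the leader is a librdkafka-based client still in the group (member ids normally start with the client.id, and "rdkafka" is librdkafka's default).

```python
import re

JOIN_RESPONSE = (
    "JoinGroupResponseData(throttleTimeMs=0, errorCode=0, generationId=5117, "
    "protocolName='range', leader='rdkafka-b5a0e7ac-8311-410f-bf04-b2b2712bad7a', "
    "memberId='consumer-1-0d44d911-c975-4dfe-83d9-4b96b5fc9638', members=[])"
)

# Collect every name='value' pair from the response text.
fields = dict(re.findall(r"(\w+)='([^']*)'", JOIN_RESPONSE))

# The console consumer is the leader only if both ids are identical.
is_leader = fields["leader"] == fields["memberId"]

# Heuristic only: the leader's member id starts with librdkafka's default client.id.
leader_looks_like_librdkafka = fields["leader"].startswith("rdkafka-")
```

Since the console consumer joins as a follower ("Sending follower SyncGroup") and then logs an empty "Setting newly assigned partitions:", the assignment was computed by that other member, so it may be worth checking which client that is.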

Kafka Consumer Group Rebalance and Group Coordinator dead

I have been playing around with Kafka (1.0.0) for a couple of months, trying to understand how consumer groups work. I have a single-broker Kafka setup and I am using Kafka-Connect-Cassandra to write messages from topics to database tables. I have 10 topics, each with just one partition, and a single consumer group with 10 consumer instances (one for each topic).
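As background for the setup described above, here is a simplified sketch of how range-style assignment distributes partitions when every member subscribes to every topic. It is a hand-written approximation of the idea, not Kafka's RangeAssignor code, and the consumer and topic names are made up. Whether the sink actually runs 10 separate consumers is an assumption; the log further down shows a single consumer (consumer-7) holding all topic partitions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def range_assign(consumers: List[str],
                 partitions_per_topic: Dict[str, int]) -> Dict[str, List[Tuple[str, int]]]:
    """Assign each topic's partitions to sorted members in contiguous ranges."""
    assignment: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    members = sorted(consumers)
    for topic in sorted(partitions_per_topic):
        count = partitions_per_topic[topic]
        per_member, extra = divmod(count, len(members))
        start = 0
        for index, member in enumerate(members):
            # The first `extra` members get one additional partition.
            size = per_member + (1 if index < extra else 0)
            assignment[member].extend((topic, p) for p in range(start, start + size))
            start += size
    return dict(assignment)


# Hypothetical names: 10 consumers and 10 single-partition topics.
consumers = [f"consumer-{i}" for i in range(10)]
topics = {f"topic{i}": 1 for i in range(1, 11)}
assignment = range_assign(consumers, topics)
busy = [member for member, parts in assignment.items() if parts]
```

Under these assumptions only the first member ends up with work, because partition 0 of every topic goes to the same consumer and the other nine stay idle.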
While running this setup I sometimes see the following logs in kafka-connect console:
1:
[Worker clientId=connect-1, groupId=connect-cluster] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Worker clientId=connect-1, groupId=connect-cluster] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Revoking previously assigned partitions [topic1-0, topic2-0, ....] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:336)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Successfully joined group with generation 349 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Setting newly assigned partitions [topic1-0, topic2-0, ....] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:341)
After this, it starts consuming messages and writes them to the Cassandra tables.
This happens frequently, at irregular intervals.
However, sometimes the connector stops and shuts down. Then it starts and consumes messages again. This is the log:
INFO [Worker clientId=connect-1, groupId=connect-cluster] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Worker clientId=connect-1, groupId=connect-cluster] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO WorkerSinkTask{id=cassandra-sink-casb-0} Committing offsets asynchronously using sequence number 42: {topic1-0=OffsetAndMetadata{offset=1074, metadata=''}, topic2-0=OffsetAndMetadata{offset=112, metadata=''}, ...}} (org.apache.kafka.connect.runtime.WorkerSinkTask:311)
INFO Rebalance started (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1214)
INFO Stopping connector cassandra-sink-casb (org.apache.kafka.connect.runtime.Worker:304)
INFO Stopping task cassandra-sink-casb-0 (org.apache.kafka.connect.runtime.Worker:464)
INFO Stopping Cassandra sink. (com.datamountaineer.streamreactor.connect.cassandra.sink.CassandraSinkTask:79)
INFO Shutting down Cassandra driver session and cluster. (com.datamountaineer.streamreactor.connect.cassandra.sink.CassandraJsonWriter:253)
INFO Stopped connector cassandra-sink-casb (org.apache.kafka.connect.runtime.Worker:320)
INFO Finished stopping tasks in preparation for rebalance (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1244)
INFO [Worker clientId=connect-1, groupId=connect-cluster] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:336)
INFO [Worker clientId=connect-1, groupId=connect-cluster] Successfully joined group with generation 7 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO Joined group and got assignment: Assignment{error=0, leader='connect-1-1dc56cda-ed54-4181-a5f9-d11022d8e8c3', leaderUrl='http://127.0.1.1:8083/', offset=8, connectorIds=[cassandra-sink-casb], taskIds=[cassandra-sink-casb-0]} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1192)
INFO Starting connectors and tasks using config offset 8 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:837)
INFO Starting connector cassandra-sink-casb (org.apache.kafka.connect.runtime.distributed.DistributedHerder:890)
2:
org.apache.kafka.clients.consumer.CommitFailedException:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms,
which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum
size of batches returned in poll() with max.poll.records.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:722)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:600)
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1250)
at org.apache.kafka.connect.runtime.WorkerSinkTask.doCommitSync(WorkerSinkTask.java:299)
at org.apache.kafka.connect.runtime.WorkerSinkTask.doCommit(WorkerSinkTask.java:327)
at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:398)
at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:547)
at org.apache.kafka.connect.runtime.WorkerSinkTask.access$1300(WorkerSinkTask.java:62)
at org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsRevoked(WorkerSinkTask.java:618)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:419)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:359)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:316)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:295)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1146)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1111)
at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:410)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:283)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:198)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:166)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
INFO [Consumer clientId=consumer-5, groupId=connect-cassandra-sink-casb] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Consumer clientId=consumer-5, groupId=connect-cassandra-sink-casb] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Consumer clientId=consumer-5, groupId=connect-cassandra-sink-casb] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:336)
INFO [Consumer clientId=consumer-5, groupId=connect-cassandra-sink-casb] Successfully joined group with generation 343 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Consumer clientId=consumer-5, groupId=connect-cassandra-sink-casb] Setting newly assigned partitions [topic1-0, topic2-0,...] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:341)
INFO WorkerSinkTask{id=cassandra-sink-casb-0} Committing offsets asynchronously using sequence number 155: {topic1-0=OffsetAndMetadata{offset=836, metadata=''}, topic2-0=OffsetAndMetadata{offset=86, metadata=''}, ...}} (org.apache.kafka.connect.runtime.WorkerSinkTask:311)
Again, sometimes Kafka Connect starts consuming messages after the rebalance, and sometimes it shuts down.
I have the following questions:
1) Why does the group coordinator (Kafka broker) die?
I am looking into several Kafka configs to resolve these issues, such as connections.max.idle.ms, max.poll.records, session.timeout.ms, group.min.session.timeout.ms and group.max.session.timeout.ms.
I am not sure which values would make things run smoothly.
2) Why does the rebalance occur?
I know a group rebalance can occur when a task is added or changed, etc., but I haven't changed anything. Sometimes the Kafka Connect framework seems to handle the error a bit too aggressively and kills the Connect tasks instead of carrying on working.
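As a rough sanity check on the CommitFailedException message above, the relationship between max.poll.records and max.poll.interval.ms can be sketched in Java. This is only an illustration under assumptions: the class PollBudget, its method names, and the 1-second-per-record write time are hypothetical and not measured from this setup; 500 and 300000 are the stock consumer defaults for max.poll.records and max.poll.interval.ms.

```java
public class PollBudget {

    // Worst-case time between two poll() calls if every record in a full
    // batch takes perRecordMillis to process (hypothetical cost model).
    public static long worstCaseBatchMillis(int maxPollRecords, long perRecordMillis) {
        return Math.multiplyExact((long) maxPollRecords, perRecordMillis);
    }

    // True if a full batch is processed strictly before max.poll.interval.ms expires.
    public static boolean fitsPollInterval(int maxPollRecords, long perRecordMillis,
                                           long maxPollIntervalMs) {
        return worstCaseBatchMillis(maxPollRecords, perRecordMillis) < maxPollIntervalMs;
    }

    // Largest batch size n such that n * perRecordMillis < maxPollIntervalMs.
    public static long largestSafeMaxPollRecords(long perRecordMillis, long maxPollIntervalMs) {
        if (perRecordMillis <= 0) {
            throw new IllegalArgumentException("perRecordMillis must be positive");
        }
        return (maxPollIntervalMs - 1) / perRecordMillis;
    }

    public static void main(String[] args) {
        int maxPollRecords = 500;          // consumer default for max.poll.records
        long maxPollIntervalMs = 300_000L; // consumer default for max.poll.interval.ms
        long perRecordMillis = 1_000L;     // ASSUMPTION: hypothetical Cassandra write time

        System.out.println("worst-case batch time (ms): "
                + worstCaseBatchMillis(maxPollRecords, perRecordMillis));
        System.out.println("fits within max.poll.interval.ms: "
                + fitsPollInterval(maxPollRecords, perRecordMillis, maxPollIntervalMs));
        System.out.println("largest safe max.poll.records: "
                + largestSafeMaxPollRecords(perRecordMillis, maxPollIntervalMs));
    }
}
```

If the real per-record time makes a full batch exceed the interval, the consumer is removed from the group and the next commit fails as in the trace above. For a sink connector, these consumer settings would normally be overridden in the worker properties with the consumer. prefix (for example consumer.max.poll.records).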