Running Kafka Connect in distributed mode, no obvious errors, but data does not end up in sink connector - apache-kafka

I am running Kafka Connect using this repo, https://github.com/entechlog/kafka-examples/tree/master/kafka-connect-standalone, with extra configs added for AWS MSK IAM authentication. I have also updated the .env file to use different variables, such as the AWS MSK IAM jar file and the AWS key/secret key credentials, plus a few other minor things. Note that the repo runs in standalone mode, but I changed the launch shell script to run in distributed mode: exec connect-distributed /etc/"${COMPONENT}"/"${COMPONENT}".properties. I have NOT created a file called kafka-connect.properties.template, because when I do, I get a whole host of errors such as Missing required configuration "group.id" which has no default value. That makes no sense to me, since I can see group.id in the docker-compose.yml file.
My goal is to get data from a third-party Kafka cluster into BigQuery.
When I run my docker-compose file, I see no errors and a few warnings, but nothing that stands out to me. I get a lot of warnings like this: [2021-12-14 01:57:37,917] WARN The configuration 'camelcase.default.dataset' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:380). The dataset setting is required to send data to BigQuery, so I don't understand why settings like the dataset are not being picked up.
Here are the latest logs:
[2021-12-14 01:57:37,921] INFO Kafka version: 6.2.1-ce (org.apache.kafka.common.utils.AppInfoParser:119)
[2021-12-14 01:57:37,922] INFO Kafka commitId: 14770bfc4e973178 (org.apache.kafka.common.utils.AppInfoParser:120)
[2021-12-14 01:57:37,922] INFO Kafka startTimeMs: 1639447057921 (org.apache.kafka.common.utils.AppInfoParser:121)
[2021-12-14 01:57:39,332] INFO [Producer clientId=producer-3] Cluster ID: k2eIXxm_RkmWu2-R2d0N1Q (org.apache.kafka.clients.Metadata:279)
[2021-12-14 01:57:39,413] INFO [Consumer clientId=consumer-connect-kafka-connect-group-3, groupId=connect-kafka-connect-group] Cluster ID: k2eIXxm_RkmWu2-R2d0N1Q (org.apache.kafka.clients.Metadata:279)
[2021-12-14 01:57:39,428] INFO [Consumer clientId=consumer-connect-kafka-connect-group-3, groupId=connect-kafka-connect-group] Subscribed to partition(s): connect-configs-0 (org.apache.kafka.clients.consumer.KafkaConsumer:1123)
[2021-12-14 01:57:39,428] INFO [Consumer clientId=consumer-connect-kafka-connect-group-3, groupId=connect-kafka-connect-group] Seeking to EARLIEST offset of partition connect-configs-0 (org.apache.kafka.clients.consumer.internals.SubscriptionState:619)
[2021-12-14 01:57:40,727] INFO Finished reading KafkaBasedLog for topic connect-configs (org.apache.kafka.connect.util.KafkaBasedLog:228)
[2021-12-14 01:57:40,728] INFO Started KafkaBasedLog for topic connect-configs (org.apache.kafka.connect.util.KafkaBasedLog:230)
[2021-12-14 01:57:40,729] INFO Started KafkaConfigBackingStore (org.apache.kafka.connect.storage.KafkaConfigBackingStore:290)
[2021-12-14 01:57:40,729] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Herder started (org.apache.kafka.connect.runtime.distributed.DistributedHerder:312)
[2021-12-14 01:57:45,059] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Cluster ID: k2eIXxm_RkmWu2-R2d0N1Q (org.apache.kafka.clients.Metadata:279)
[2021-12-14 01:57:45,089] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Discovered group coordinator <bootstrap server and port here> (id: 2147483643 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:848)
[2021-12-14 01:57:45,095] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Rebalance started (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator:221)
[2021-12-14 01:57:45,095] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:538)
[2021-12-14 01:57:45,957] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:538)
[2021-12-14 01:57:49,038] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Successfully joined group with generation Generation{generationId=1, memberId='connect-1-a4cc5355-60da-46c3-8228-bff2de664f2c', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:594)
[2021-12-14 01:57:49,160] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Successfully synced group in generation Generation{generationId=1, memberId='connect-1-a4cc5355-60da-46c3-8228-bff2de664f2c', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:758)
[2021-12-14 01:57:49,161] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Joined group at generation 1 with protocol version 2 and got assignment: Assignment{error=0, leader='connect-1-a4cc5355-60da-46c3-8228-bff2de664f2c', leaderUrl='http://kafka-connect:8083/', offset=10, connectorIds=[], taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1694)
[2021-12-14 01:57:49,162] WARN [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1119)
[2021-12-14 01:57:49,162] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Current config state offset -1 is behind group assignment 10, reading to end of config log (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1183)
[2021-12-14 01:57:49,359] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Finished reading to end of log and updated config snapshot, new config log offset: 10 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1190)
[2021-12-14 01:57:49,359] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Starting connectors and tasks using config offset 10 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1244)
[2021-12-14 01:57:49,359] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1272)
[2021-12-14 01:57:50,791] INFO [Worker clientId=connect-1, groupId=connect-kafka-connect-group] Session key updated (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1582)
And also this:
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource will be ignored.
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource will be ignored.
WARNING: The (sub)resource method createConnector in org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains empty path annotation.
WARNING: The (sub)resource method listConnectors in org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains empty path annotation.
WARNING: The (sub)resource method listConnectorPlugins in org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource contains empty path annotation.
And when I navigate to http://localhost:8083/connectors in my browser, I get an empty list [].
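An empty list from /connectors means the worker has no connector registered. In distributed mode the worker is started with worker properties only, and connectors are submitted through the REST API. Below is a minimal sketch of that step, not a confirmed fix for this setup. The connector name, topic, project, dataset and keyfile path are all placeholders, and the BigQuery-specific keys (project, defaultDataset, keyfile) are assumptions that should be checked against the installed connector version.

```python
import json
import urllib.request


def build_connector_request(connect_url, name, config):
    """Build (but do not send) a POST /connectors request for a distributed worker."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request; needs a running Connect worker at the target URL."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")


# Hypothetical sink config: every value is a placeholder, and the
# BigQuery-specific keys are assumptions to verify against the connector docs.
bigquery_sink_config = {
    "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
    "tasks.max": "1",
    "topics": "my_topic",
    "project": "my-gcp-project",
    "defaultDataset": "my_dataset",
    "keyfile": "/path/to/keyfile.json",
}

request = build_connector_request(
    "http://localhost:8083", "bigquery-sink", bigquery_sink_config
)
```

Calling send(request) against the running worker should then make the connector name show up at http://localhost:8083/connectors.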

Related

Confluent Control Center failure: Unable to fetch consumer offsets for cluster id

I am running Confluent Platform (version 6.1.1). I deploy the following components: 3 brokers, 3 ZK, Schema Registry, 3 Kafka Connect, KSQL and Confluent Control Center (CCC).
The CCC has entered a failed state and I am having difficulty bringing it back.
To make things cleaner, I created another EC2 instance (m4.2xlarge) and configured a new CCC on it, with the aim of connecting it to the current cluster. The new CCC has exactly the same configuration as the failed one, but with a different confluent.controlcenter.id.
I start the CCC and it runs. I can access the CCC UI, but it does not work properly: pages take too long to load, and the reported state of both the Connect cluster and the brokers keeps flipping between healthy and unhealthy.
For example it looks like this (see screenshots):
After running for a certain amount of time, it restarts automatically, and it keeps restarting every 5-7 minutes.
When it is started, I see a bunch of new topics created in the Kafka cluster.
After that, in the control-center.log I see:
INFO [main] Setting offsets for topic=_confluent-monitoring (io.confluent.controlcenter.KafkaHelper)
INFO [main] found 12 topicPartitions for topic=_confluent-monitoring (io.confluent.controlcenter.KafkaHelper)
INFO [main] Setting offsets for topic=_confluent-metrics (io.confluent.controlcenter.KafkaHelper)
INFO [main] found 12 topicPartitions for topic=_confluent-metrics (io.confluent.controlcenter.KafkaHelper)
INFO [main] action=starting topology=command (io.confluent.controlcenter.ControlCenter)
INFO [main] waiting for streams to be in running state REBALANCING (io.confluent.command.CommandStore)
INFO [main] Streams state RUNNING (io.confluent.command.CommandStore)
INFO [main] action=started topology=command (io.confluent.controlcenter.ControlCenter)
INFO [main] action=starting operation=command-migration (io.confluent.controlcenter.ControlCenter)
INFO [main] action=completed operation=command-migration (io.confluent.controlcenter.ControlCenter)
INFO [main] action=starting topology=monitoring (io.confluent.controlcenter.ControlCenter)
INFO [main] action=started topology=monitoring (io.confluent.controlcenter.ControlCenter)
INFO [main] Starting Health Check (io.confluent.controlcenter.ControlCenter)
INFO [main] Starting Alert Manager (io.confluent.controlcenter.ControlCenter)
INFO [main] Starting Consumer Offsets Fetch (io.confluent.controlcenter.ControlCenter)
INFO [control-center-heartbeat-0] current clusterId=lCRehAk0RqmLR04nhXKHtA (io.confluent.controlcenter.healthcheck.HealthCheck)
INFO [control-center-heartbeat-0] broker id set has changed new={1001=[10.251.xx.xx:9093 (id: 1001 rack: null)], 1002=[10.251.xx.xx:9093 (id: 1002 rack: null)], 1003=[10.251.xx.xx:9093 (id: 1003 rack: null)]} removed={} (io.confluent.controlcenter.healthcheck.HealthCheck)
INFO [control-center-heartbeat-0] new controller=10.251.xx.xx:9093 (id: 1002 rack: null) (io.confluent.controlcenter.healthcheck.HealthCheck)
INFO [main] Initial capacity 128, increased by 64, maximum capacity 2147483647. (io.confluent.rest.ApplicationServer)
INFO [main] Adding listener: http://0.0.0.0:9021 (io.confluent.rest.ApplicationServer)
INFO [main] x509=X509#3a8ead9(ip-44-135-xx-xx.eu-central-1.compute.internal,h=[ip-44-135-xx-xx.eu-central-1.compute.internal],w=[]) for Server#7c8b37a8[provider=null,keyStore=file:///var/kafka-ssl/server.keystore.jks,trustStore=file:///var/kafka-ssl/client.truststore.jks] (org.eclipse.jetty.util.ssl.SslContextFactory)
INFO [main] x509=X509#3831f4c2(caroot,h=[eu-central-1.compute.internal],w=[]) for Server#7c8b37a8[provider=null,keyStore=file:///var/kafka-ssl/server.keystore.jks,trustStore=file:///var/kafka-ssl/client.truststore.jks] (org.eclipse.jetty.util.ssl.SslContextFactory)
INFO [main] jetty-9.4.38.v20210224; built: 2021-02-24T20:25:07.675Z; git: 288f3cc74549e8a913bf363250b0744f2695b8e6; jvm 11.0.13+8-LTS (org.eclipse.jetty.server.Server)
INFO [main] DefaultSessionIdManager workerName=node0 (org.eclipse.jetty.server.session)
INFO [main] No SessionScavenger set, using defaults (org.eclipse.jetty.server.session)
INFO [main] node0 Scavenging every 660000ms (org.eclipse.jetty.server.session)
INFO [main] Started o.e.j.s.ServletContextHandler#1ef5cde4{/,[jar:file:/usr/share/java/acl/acl-6.1.1.jar!/io/confluent/controlcenter/rest/static],AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
INFO [main] Started o.e.j.s.ServletContextHandler#5401c6a8{/ws,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
INFO [main] Started NetworkTrafficServerConnector#5d6b5d3d{HTTP/1.1, (http/1.1)}{0.0.0.0:9021} (org.eclipse.jetty.server.AbstractConnector)
INFO [main] Started #36578ms (org.eclipse.jetty.server.Server)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=monitoring-input-topic-progress-.count type=monitoring cluster= value=0.0 (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=monitoring-input-topic-progress-.rate type=monitoring cluster= value=0.0 (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=monitoring-input-topic-progress-.timestamp type=monitoring cluster= value=NaN (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=monitoring-input-topic-progress-.min type=monitoring cluster= value=1.7976931348623157E308 (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=metrics-input-topic-progress-lCRehAk0RqmLR04nhXKHtA.count type=metrics cluster=lCRehAk0RqmLR04nhXKHtA value=0.0 (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=metrics-input-topic-progress-lCRehAk0RqmLR04nhXKHtA.rate type=metrics cluster=lCRehAk0RqmLR04nhXKHtA value=0.0 (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=metrics-input-topic-progress-lCRehAk0RqmLR04nhXKHtA.timestamp type=metrics cluster=lCRehAk0RqmLR04nhXKHtA value=NaN (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=metrics-input-topic-progress-lCRehAk0RqmLR04nhXKHtA.min type=metrics cluster=lCRehAk0RqmLR04nhXKHtA value=1.7976931348623157E308 (io.confluent.controlcenter.util.StreamProgressReporter)
WARN [control-center-heartbeat-0] misconfigured topic=_confluent-command config=segment.bytes value=1073741824 expected=134217728 (io.confluent.controlcenter.healthcheck.HealthCheck)
WARN [control-center-heartbeat-0] misconfigured topic=_confluent-command config=delete.retention.ms value=86400000 expected=259200000 (io.confluent.controlcenter.healthcheck.HealthCheck)
INFO [control-center-heartbeat-0] misconfigured topic=_confluent-metrics config=min.insync.replicas value=1 expected=2 (io.confluent.controlcenter.healthcheck.HealthCheck)
WARN [control-center-heartbeat-1] Unable to fetch consumer offsets for cluster id lCRehAk0RqmLR04nhXKHtA (io.confluent.controlcenter.data.ConsumerOffsetsFetcher)
java.util.concurrent.TimeoutException
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:108)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at io.confluent.controlcenter.data.ConsumerOffsetsDao.getAllConsumerGroupDescriptions(ConsumerOffsetsDao.java:220)
at io.confluent.controlcenter.data.ConsumerOffsetsDao.getAllConsumerGroupOffsets(ConsumerOffsetsDao.java:58)
at io.confluent.controlcenter.data.ConsumerOffsetsFetcher.run(ConsumerOffsetsFetcher.java:73)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
WARN [kafka-admin-client-thread | adminclient-3] failed fetching description for consumerGroup=_confluent-ksql-eim_ksql_non_prodquery_CSAS_SDL_STMTS_GG_347 (io.confluent.controlcenter.data.ConsumerOffsetsDao)
org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=1654853629184, tries=1, nextAllowedTryMs=1654853629324) timed out at 1654853629224 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.DisconnectException: Cancelled describeConsumerGroups request with correlation id 168 due to node 1001 being disconnected
WARN [kafka-admin-client-thread | adminclient-3] failed fetching description for consumerGroup=connect-mongo-dci-grid-partner-test11 (io.confluent.controlcenter.data.ConsumerOffsetsDao)
org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=1654853629184, tries=1, nextAllowedTryMs=1654853629324) timed out at 1654853629224 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeConsumerGroups
WARN [kafka-admin-client-thread | adminclient-3] failed fetching description for consumerGroup=_confluent-ksql-eim_ksql_non_prodquery_CSAS_SDL_STMTS_UPWARD_GG_355 (io.confluent.controlcenter.data.ConsumerOffsetsDao)
org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=1654853629184, tries=1, nextAllowedTryMs=1654853629324) timed out at 1654853629224 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: describeConsumerGroups
WARN [kafka-admin-client-thread | adminclient-3] failed fetching description for consumerGroup=_eim_c3_non_prod-4 (io.confluent.controlcenter.data.ConsumerOffsetsDao)
org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=1654853629184, tries=1, nextAllowedTryMs=1654853629324) timed out at 1654853629224 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: describeConsumerGroups
...
and so on...
WARN [control-center-heartbeat-1] Unable to fetch consumer offsets for cluster id lCRehAk0RqmLR04nhXKHtA (io.confluent.controlcenter.data.ConsumerOffsetsFetcher)
java.util.concurrent.TimeoutException
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:108)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at io.confluent.controlcenter.data.ConsumerOffsetsDao.getAllConsumerGroupDescriptions(ConsumerOffsetsDao.java:220)
at io.confluent.controlcenter.data.ConsumerOffsetsDao.getAllConsumerGroupOffsets(ConsumerOffsetsDao.java:58)
at io.confluent.controlcenter.data.ConsumerOffsetsFetcher.run(ConsumerOffsetsFetcher.java:73)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
and so on...
In the control-center-kafka.log I see:
INFO [control-center-heartbeat-1] Kafka version: 6.1.1-ce (org.apache.kafka.common.utils.AppInfoParser)
INFO [control-center-heartbeat-1] Kafka commitId: 73deb3aeb1f8647c (org.apache.kafka.common.utils.AppInfoParser)
INFO [control-center-heartbeat-1] Kafka startTimeMs: 1654853610852 (org.apache.kafka.common.utils.AppInfoParser)
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-5-consumer, groupId=_eim_c3_non_prod-4] Resetting offset for partition _eim_c3_non_prod-4-monitoring-message-rekey-store-7 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[10.251.6.2:9093 (id: 1002 rack: null)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-5-consumer, groupId=_eim_c3_non_prod-4] Resetting offset for partition _eim_c3_non_prod-4-monitoring-trigger-event-rekey-7 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[10.251.6.2:9093 (id: 1002 rack: null)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-5-consumer, groupId=_eim_c3_non_prod-4] Resetting offset for partition _eim_c3_non_prod-4-MonitoringStream-ONE_MINUTE-repartition-7 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[10.251.6.2:9093 (id: 1002 rack: null)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-5-consumer, groupId=_eim_c3_non_prod-4] Resetting offset for partition _eim_c3_non_prod-4-aggregatedTopicPartitionTableWindows-ONE_MINUTE-repartition-7 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[10.251.6.1:9093 (id: 1001 rack: null)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
and so on ...
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-10-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1003: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-3] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-3-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1002: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-3-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1001: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-10] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-10-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1002: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-5-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=1478925475, epoch=1) to node 1003: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-6-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=1947312909, epoch=1) to node 1002: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
and so on ...
Any ideas what can be wrong here?
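Since the new CCC runs on a freshly created EC2 instance and the logs show DisconnectException against nodes 1001 to 1003, one thing worth ruling out is plain network reachability from that instance to every broker listener. That is only one possible cause, and the logs above do not confirm it. The sketch below checks TCP reachability only; it does not validate the TLS handshake or authentication on port 9093. The broker addresses in the usage note are placeholders.

```python
import socket


def unreachable_brokers(brokers, timeout_seconds=3.0):
    """Return (host, port, error) for every broker that fails a plain TCP connect."""
    failures = []
    for host, port in brokers:
        try:
            # Only opens and closes a TCP connection; no Kafka protocol is spoken.
            with socket.create_connection((host, port), timeout=timeout_seconds):
                pass
        except OSError as error:
            failures.append((host, port, str(error)))
    return failures
```

Run from the CCC host, for example unreachable_brokers([("broker-1.example.internal", 9093), ("broker-2.example.internal", 9093)]). An empty list means every listener accepted the connection.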

kafka stream client shutdown

I have an Apache Kafka Streams application. I notice that it sometimes shuts down when a rebalance occurs, for no apparent reason. It doesn't even throw an exception.
Here are some logs showing this:
[2022-03-08 17:13:37,024] INFO [Consumer clientId=svc-stream-collector-StreamThread-1-consumer, groupId=svc-stream-collector] Adding newly assigned partitions: (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2022-03-08 17:13:37,024] ERROR stream-thread [svc-stream-collector-StreamThread-1] A Kafka Streams client in this Kafka Streams application is requesting to shutdown the application (org.apache.kafka.streams.processor.internals.StreamThread)
[2022-03-08 17:13:37,030] INFO stream-client [svc-stream-collector] State transition from REBALANCING to PENDING_ERROR (org.apache.kafka.streams.KafkaStreams)
old state:REBALANCING new state:PENDING_ERROR
[2022-03-08 17:13:37,031] INFO [Consumer clientId=svc-stream-collector-StreamThread-1-consumer, groupId=svc-stream-collector] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2022-03-08 17:13:37,032] INFO stream-thread [svc-stream-collector-StreamThread-1] Informed to shut down (org.apache.kafka.streams.processor.internals.StreamThread)
[2022-03-08 17:13:37,032] INFO stream-thread [svc-stream-collector-StreamThread-1] State transition from PARTITIONS_REVOKED to PENDING_SHUTDOWN (org.apache.kafka.streams.processor.internals.StreamThread)
[2022-03-08 17:13:37,067] INFO stream-thread [svc-stream-collector-StreamThread-1] Thread state is already PENDING_SHUTDOWN, skipping the run once call after poll request (org.apache.kafka.streams.processor.internals.StreamThread)
[2022-03-08 17:13:37,067] WARN stream-thread [svc-stream-collector-StreamThread-1] Detected that shutdown was requested. All clients in this app will now begin to shutdown (org.apache.kafka.streams.processor.internals.StreamThread)
I suspect it is because there are no newly assigned partitions in the log line below:
[2022-03-08 17:13:37,024] INFO [Consumer clientId=svc-stream-collector-StreamThread-1-consumer, groupId=svc-stream-collector] Adding newly assigned partitions: (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
However, I'm not sure exactly why this error occurs. Any help would be appreciated.
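As far as I know, the "requesting to shutdown the application" message is logged when some instance in the group has asked for an application-wide shutdown, for example from an uncaught exception handler that returns SHUTDOWN_APPLICATION. Treat that as an assumption to verify. If it holds, the original exception would be in another instance's log rather than this one. To line up the order of events across instances, here is a small sketch that extracts the state transitions from log lines in the format shown above. The regular expression is written against these sample lines only.

```python
import re

# Matches lines such as:
# [2022-03-08 17:13:37,030] INFO stream-client [app] State transition from A to B (...)
TRANSITION = re.compile(
    r"\[(?P<ts>[^\]]+)\] \w+ "
    r"(?P<source>stream-(?:client|thread) \[[^\]]+\]) "
    r"State transition from (?P<old>\w+) to (?P<new>\w+)"
)


def state_transitions(log_lines):
    """Return (timestamp, source, old_state, new_state) for each transition line."""
    transitions = []
    for line in log_lines:
        match = TRANSITION.search(line)
        if match:
            transitions.append(
                (match["ts"], match["source"], match["old"], match["new"])
            )
    return transitions
```

Feeding each instance's log file through state_transitions and sorting the results by timestamp shows which client left RUNNING or REBALANCING first.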

Kafka Connect - Restarting the worker causes a rebalance issue

I'm using a 2-node Kafka Connect cluster in distributed mode. The workers run fine, but the moment I restart the worker service on one node, the connector that was running on that node goes to UNASSIGNED, and exactly 5 minutes later it changes to ASSIGNED. I don't know why this happens; shouldn't the connector's tasks simply move to the other running node?
Here are the logs (5 minutes after the worker restart):
Rebalance started [org.apache.kafka.connect.runtime.distributed.WorkerCoordinator:221]
[2021-08-17 07:23:46,120] [INFO] [Worker clientId=connect-1, groupId=debezium-cluster1] (Re-)joining group [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:538]
[2021-08-17 07:23:46,124] [INFO] [Worker clientId=connect-1, groupId=debezium-cluster1] Successfully joined group with generation Generation{generationId=27, memberId='connect-1-56d39766-4974-4203-945b-6eee4fe811e7', protocol='sessioned'} [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:594]
[2021-08-17 07:23:46,128] [INFO] [Worker clientId=connect-1, groupId=debezium-cluster1] Successfully synced group in generation Generation{generationId=27, memberId='connect-1-56d39766-4974-4203-945b-6eee4fe811e7', protocol='sessioned'} [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:758]
[2021-08-17 07:23:46,129] [INFO] [Worker clientId=connect-1, groupId=debezium-cluster1] Joined group at generation 27 with protocol version 2 and got assignment: Assignment{error=0, leader='connect-1-ccdf6d6a-eeab-423c-9611-56795d0deca9', leaderUrl='http://172.30.32.13:8083/', offset=20, connectorIds=[mysql-connector-01], taskIds=[mysql-connector-01-0], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 [org.apache.kafka.connect.runtime.distributed.DistributedHerder:1694]
[2021-08-17 07:23:46,129] [INFO] [Worker clientId=connect-1, groupId=debezium-cluster1] Starting connectors and tasks using config offset 20 [org.apache.kafka.connect.runtime.distributed.DistributedHerder:1244]
[2021-08-17 07:23:46,130] [INFO] [Worker clientId=connect-1, groupId=debezium-cluster1] Starting task mysql-connector-01-0 [org.apache.kafka.connect.runtime.distributed.DistributedHerder:1286]
[2021-08-17 07:23:46,131] [INFO] [Worker clientId=connect-1, groupId=debezium-cluster1] Starting connector mysql-connector-01 [org.apache.kafka.connect.runtime.distributed.DistributedHerder:1321]
I tried to restart the connector, but it's not working.
curl -X POST 172.30.34.99:8083/connectors/mysql-connector-01/restart
{"error_code":409,"message":"Cannot complete request momentarily due to no known leader URL, likely because a rebalance was underway."}
I found the cause: it is due to Kafka Connect's scheduled rebalance delay (the scheduled.rebalance.max.delay.ms worker setting, which defaults to 5 minutes). A great blog post with more details: https://www.confluent.io/blog/incremental-cooperative-rebalancing-in-kafka/
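While that delay is running, REST calls such as the restart above can come back with 409. A minimal sketch of retrying the restart until the rebalance settles, using the same endpoint as the curl command. The function names, attempt count and wait time are my own choices for illustration, not recommended values.

```python
import time
import urllib.error
import urllib.request


def restart_connector(connect_url, name):
    """POST /connectors/{name}/restart and return the HTTP status code."""
    request = urllib.request.Request(
        f"{connect_url.rstrip('/')}/connectors/{name}/restart", method="POST"
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 409 is returned while a rebalance is underway.
        return error.code


def retry_while_rebalancing(action, attempts=10, wait_seconds=30.0, sleep=time.sleep):
    """Call action() until it returns something other than 409, or attempts run out."""
    status = None
    for attempt in range(attempts):
        status = action()
        if status != 409:
            break
        if attempt < attempts - 1:
            sleep(wait_seconds)
    return status
```

Usage would look like retry_while_rebalancing(lambda: restart_connector("http://172.30.34.99:8083", "mysql-connector-01")). If the 5 minute wait itself is the problem, lowering scheduled.rebalance.max.delay.ms in the worker config is the other lever.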

Confluent RabbitMQ Source Connector - configuration, license related error?

Our Kafka setup consists of brokers on AWS MSK and Confluent Kafka Connect (confluentinc/cp-kafka-connect:5.5.1) running in a pod on AWS EKS.
We are trying to use the Confluent RabbitMQ Source Connector (trial version of the commercial connector) https://docs.confluent.io/5.5.1/connect/kafka-connect-rabbitmq/index.html and are getting the error below.
Connector Config -
{
"connector.class": "io.confluent.connect.rabbitmq.RabbitMQSourceConnector",
"confluent.topic.bootstrap.servers": "b-1.###.amazonaws.com:9092, b-2.###.amazonaws.com:9092,b-3.###.amazonaws.com:9092,b-4.###.amazonaws.com:9092",
"tasks.max": "1",
"rabbitmq.password": "user",
"rabbitmq.queue": "my_queue",
"rabbitmq.username": "pass",
"rabbitmq.virtual.host": "/",
"rabbitmq.port": "port",
"confluent.topic.replication.factor": "1",
"rabbitmq.host": "rabbit_host_ip",
"name": "Rabbit_Source_RT4",
"kafka.topic": "my_topic",
"value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter"
}
GET Connector Status -
{
"name": "Rabbit_Source_RT4,
"connector": {
"state": "FAILED",
"worker_id": "kfk-connect:8083",
"trace": "java.lang.NullPointerException\n\tat io.confluent.license.License.readFully(License.java:195)\n\tat io.confluent.license.License.loadPublicKey(License.java:187)\n\tat io.confluent.license.License.loadPublicKey(License.java:181)\n\tat io.confluent.license.LicenseManager.loadPublicKey(LicenseManager.java:553)\n\tat io.confluent.license.LicenseManager.registerOrValidateLicense(LicenseManager.java:331)\n\tat io.confluent.connect.utils.licensing.ConnectLicenseManager.registerOrValidateLicense(ConnectLicenseManager.java:257)\n\tat io.confluent.connect.rabbitmq.RabbitMQSourceConnector.doStart(RabbitMQSourceConnector.java:62)\n\tat io.confluent.connect.rabbitmq.RabbitMQSourceConnector.start(RabbitMQSourceConnector.java:56)\n\tat org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:110)\n\tat org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:135)\n\tat org.apache.kafka.connect.runtime.WorkerConnector.transitionTo(WorkerConnector.java:195)\n\tat org.apache.kafka.connect.runtime.Worker.startConnector(Worker.java:259)\n\tat org.apache.kafka.connect.runtime.distributed.DistributedHerder.startConnector(DistributedHerder.java:1229)\n\tat org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1300(DistributedHerder.java:127)\n\tat org.apache.kafka.connect.runtime.distributed.DistributedHerder$14.call(DistributedHerder.java:1245)\n\tat org.apache.kafka.connect.runtime.distributed.DistributedHerder$14.call(DistributedHerder.java:1241)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n"
},
"tasks": [],
"type": "source"
}
The connector state is FAILED and no task is created. I also tried updating the configuration, but I get the same error every time.
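For readers triaging a similar status response, here is a minimal Python sketch that pulls the connector state, the root exception and the top stack frame out of the status body. The function name summarize_status and the shortened sample payload are illustrative assumptions; the field names (connector, state, trace, tasks) are the ones shown in the response above.

```python
import json

def summarize_status(status):
    """Summarize a Kafka Connect connector status body (already parsed to a dict)."""
    connector = status.get("connector", {})
    # The trace is one string with embedded newlines and tabs; split it into frames.
    lines = [ln.strip() for ln in connector.get("trace", "").splitlines() if ln.strip()]
    exception = lines[0] if lines else None
    first_frame = None
    if len(lines) > 1 and lines[1].startswith("at "):
        first_frame = lines[1][3:]  # drop the leading "at "
    return {
        "name": status.get("name"),
        "state": connector.get("state"),
        "exception": exception,
        "first_frame": first_frame,
        "task_count": len(status.get("tasks", [])),
    }

# Shortened copy of the status response above (trace cut to two frames for brevity).
SAMPLE = r'''
{
  "name": "Rabbit_Source_RT4",
  "connector": {
    "state": "FAILED",
    "worker_id": "kfk-connect:8083",
    "trace": "java.lang.NullPointerException\n\tat io.confluent.license.License.readFully(License.java:195)\n\tat io.confluent.license.License.loadPublicKey(License.java:187)\n"
  },
  "tasks": [],
  "type": "source"
}
'''

summary = summarize_status(json.loads(SAMPLE))
for key, value in summary.items():
    print(f"{key}: {value}")
```

Here the top frame is in io.confluent.license.License, which suggests the connector fails during the license check rather than while connecting to RabbitMQ.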
Logs -
[2021-01-07 15:21:17,884] INFO Kafka version: 5.5.1-ccs (org.apache.kafka.common.utils.AppInfoParser)
[2021-01-07 15:21:17,884] INFO Kafka commitId: a0a0000zzz0a0000 (org.apache.kafka.common.utils.AppInfoParser)
[2021-01-07 15:21:17,884] INFO Kafka startTimeMs: 1610032877884 (org.apache.kafka.common.utils.AppInfoParser)
[2021-01-07 15:21:17,884] INFO [Producer clientId=Rabbit_Source_RT4-license-manager] Cluster ID: -aAaAzxcvA1a0weaaa11A (org.apache.kafka.clients.Metadata)
[2021-01-07 15:21:17,887] INFO [Consumer clientId=Rabbit_Source_RT4-license-manager, groupId=null] Cluster ID: -aAaAzxcvA1a0weaaa11A (org.apache.kafka.clients.Metadata)
[2021-01-07 15:21:17,890] INFO [Consumer clientId=Rabbit_Source_RT4-license-manager, groupId=null] Subscribed to partition(s): _confluent-command-0 (org.apache.kafka.clients.consumer.KafkaConsumer)
[2021-01-07 15:21:17,890] INFO [Consumer clientId=Rabbit_Source_RT4-license-manager, groupId=null] Seeking to EARLIEST offset of partition _confluent-command-0 (org.apache.kafka.clients.consumer.internals.SubscriptionState)
[2021-01-07 15:21:17,899] INFO [Consumer clientId=Rabbit_Source_RT4-license-manager, groupId=null] Resetting offset for partition _confluent-command-0 to offset 0. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
[2021-01-07 15:21:17,900] INFO Finished reading KafkaBasedLog for topic _confluent-command (org.apache.kafka.connect.util.KafkaBasedLog)
[2021-01-07 15:21:17,900] INFO Started KafkaBasedLog for topic _confluent-command (org.apache.kafka.connect.util.KafkaBasedLog)
[2021-01-07 15:21:17,900] INFO Started License Store (io.confluent.license.LicenseStore)
[2021-01-07 15:21:17,901] INFO Validating Confluent License (io.confluent.connect.utils.licensing.ConnectLicenseManager)
[2021-01-07 15:21:17,906] INFO Closing License Store (io.confluent.license.LicenseStore)
[2021-01-07 15:21:17,906] INFO Stopping KafkaBasedLog for topic _confluent-command (org.apache.kafka.connect.util.KafkaBasedLog)
[2021-01-07 15:21:17,908] INFO [Producer clientId=Rabbit_Source_RT4-license-manager] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2021-01-07 15:21:17,910] INFO Stopped KafkaBasedLog for topic _confluent-command (org.apache.kafka.connect.util.KafkaBasedLog)
[2021-01-07 15:21:17,910] INFO Closed License Store (io.confluent.license.LicenseStore)
[2021-01-07 15:21:17,910] ERROR WorkerConnector{id=Rabbit_Source_RT4} Error while starting connector (org.apache.kafka.connect.runtime.WorkerConnector)
java.lang.NullPointerException
at io.confluent.license.License.readFully(License.java:195)
at io.confluent.license.License.loadPublicKey(License.java:187)
at io.confluent.license.License.loadPublicKey(License.java:181)
at io.confluent.license.LicenseManager.loadPublicKey(LicenseManager.java:553)
at io.confluent.license.LicenseManager.registerOrValidateLicense(LicenseManager.java:331)
at io.confluent.connect.utils.licensing.ConnectLicenseManager.registerOrValidateLicense(ConnectLicenseManager.java:257)
at io.confluent.connect.rabbitmq.RabbitMQSourceConnector.doStart(RabbitMQSourceConnector.java:62)
at io.confluent.connect.rabbitmq.RabbitMQSourceConnector.start(RabbitMQSourceConnector.java:56)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:110)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:135)
at org.apache.kafka.connect.runtime.WorkerConnector.transitionTo(WorkerConnector.java:195)
at org.apache.kafka.connect.runtime.Worker.startConnector(Worker.java:259)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startConnector(DistributedHerder.java:1229)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1300(DistributedHerder.java:127)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$14.call(DistributedHerder.java:1245)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$14.call(DistributedHerder.java:1241)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2021-01-07 15:21:17,913] INFO Finished creating connector Rabbit_Source_RT4 (org.apache.kafka.connect.runtime.Worker)
[2021-01-07 15:21:17,913] INFO [Worker clientId=connect-1, groupId=compose-kfk-connect-group] Skipping reconfiguration of connector Rabbit_Source_RT4 since it is not running (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
[2021-01-07 15:21:17,913] INFO [Worker clientId=connect-1, groupId=compose-kfk-connect-group] Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
Output of GET /connector-plugins request contains -
{"class":"io.confluent.connect.rabbitmq.RabbitMQSourceConnector","type":"source","version":"0.0.0.0"},
I also checked and found that the '_confluent-command' topic does not contain any messages.
Is this because the trial period is over and an Enterprise license is needed, or is it due to some error in the configuration?
How can I verify the time remaining on the trial (since we are not using Control Center)?
Thanks in advance.

worker not recovered - Current config state offset 5 is behind group assignment 20, reading to end of config log

I am using a 3-node Kafka Connect cluster to write data from a source to a Kafka topic, and from the topic to a destination. Everything works fine in distributed mode, but when one of the workers is stopped and then restarted, I get the messages below.
[2017-09-13 23:48:44,519] WARN Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:741)
[2017-09-13 23:48:44,519] INFO Current config state offset 5 is behind group assignment 20, reading to end of config log (org.apache.kafka.connect.runtime.distributed.DistributedHerder:785)
[2017-09-13 23:48:45,018] INFO Finished reading to end of log and updated config snapshot, new config log offset: 5 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:789)
[2017-09-13 23:48:45,018] INFO Current config state offset 5 does not match group assignment 20. Forcing rebalance. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:765)
[2017-09-13 23:48:45,018] INFO Rebalance started (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1187)
[2017-09-13 23:48:45,018] INFO Wasn't unable to resume work after last rebalance, can skip stopping connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1219)
[2017-09-13 23:48:45,018] INFO (Re-)joining group connect-cluster (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:432)
[2017-09-13 23:48:45,023] INFO Successfully joined group connect-cluster with generation 38 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:399)
[2017-09-13 23:48:45,023] INFO Joined group and got assignment: Assignment{error=0, leader='connect-1-e51c1e8b-c95a-406b-8c56-2a0d4fc432f6', leaderUrl='http://10.10.10.10:8083/', offset=20, connectorIds=[], taskIds=[oracle_jdbc_sink_test-0]} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1166)
[2017-09-13 23:48:45,023] WARN Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:741)
[2017-09-13 23:48:45,023] INFO Current config state offset 5 is behind group assignment 20, reading to end of config log (org.apache.kafka.connect.runtime.distributed.DistributedHerder:785)
[2017-09-13 23:48:45,535] INFO Finished reading to end of log and updated config snapshot, new config log offset: 5 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:789)
[2017-09-13 23:48:45,535] INFO Current config state offset 5 does not match group assignment 20. Forcing rebalance. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:765)
[2017-09-13 23:48:45,535] INFO Rebalance started (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1187)
[2017-09-13 23:48:45,535] INFO Wasn't unable to resume work after last rebalance, can skip stopping connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1219)
[2017-09-13 23:48:45,535] INFO (Re-)joining group connect-cluster (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:432)
[2017-09-13 23:48:45,540] INFO Successfully joined group connect-cluster with generation 38 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:399)
[2017-09-13 23:48:45,540] INFO Joined group and got assignment: Assignment{error=0, leader='connect-1-e51c1e8b-c95a-406b-8c56-2a0d4fc432f6', leaderUrl='http://10.10.10.10:8083/', offset=20, connectorIds=[], taskIds=[oracle_jdbc_sink_test-0]} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1166)
[2017-09-13 23:48:45,540] WARN Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:741)
[2017-09-13 23:48:45,540] INFO Current config state offset 5 is behind group assignment 20, reading to end of config log (org.apache.kafka.connect.runtime.distributed.DistributedHerder:785)
[2017-09-13 23:48:46,042] INFO Finished reading to end of log and updated config snapshot, new config log offset: 5 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:789)
[2017-09-13 23:48:46,042] INFO Current config state offset 5 does not match group assignment 20. Forcing rebalance. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:765)
[2017-09-13 23:48:46,042] INFO Rebalance started (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1187)
[2017-09-13 23:48:46,042] INFO Wasn't unable to resume work after last rebalance, can skip stopping connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1219)
You can try resolving this issue by deleting and recreating the config topics, or by changing the group.id.
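Before deleting topics, it may be worth checking that the restarted worker really uses the same settings as the others. Workers that share a group.id are expected to share the same config.storage.topic, offset.storage.topic and status.storage.topic. A worker reading a different config topic than the leader is one possible cause of this loop, though the logs above do not confirm it. The sketch below is a minimal Python check under that assumption. The worker names, topic names and the helper names parse_properties and find_mismatches are hypothetical.

```python
SHARED_KEYS = ("config.storage.topic", "offset.storage.topic", "status.storage.topic")

def parse_properties(text):
    """Parse simple key=value worker properties, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def find_mismatches(workers):
    """Return (group_id, key, {worker: value}) for every shared key that differs
    between workers with the same group.id."""
    by_group = {}
    for name, props in workers.items():
        by_group.setdefault(props.get("group.id"), {})[name] = props
    mismatches = []
    for group_id, members in by_group.items():
        for key in SHARED_KEYS:
            values = {name: props.get(key) for name, props in members.items()}
            if len(set(values.values())) > 1:
                mismatches.append((group_id, key, values))
    return mismatches

# Hypothetical worker configs; in practice, read each worker's properties file.
GOOD = """
group.id=connect-cluster
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
"""
BAD = GOOD.replace("connect-configs", "connect-configs-old")

workers = {
    "worker-1": parse_properties(GOOD),
    "worker-2": parse_properties(GOOD),
    "worker-3": parse_properties(BAD),  # the restarted worker in this example
}

mismatches = find_mismatches(workers)
for group_id, key, values in mismatches:
    print(f"group {group_id}: {key} differs: {values}")
```

If the check reports nothing, the settings match and the suggestion above (recreating the config topics or moving the workers to a new group.id) is the next thing to try.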