Logstash & Kafka - java.io.EOFException: null. Node -1 disconnected - apache-kafka

I am trying to set up the ELK stack with one of my Logstash filters on my local machine.
I have an input file that goes into a Kafka queue, is parsed by my Logstash filter, and is then output to Elasticsearch. When I run my test.sh, which runs the Logstash filter on the input file, these are the resulting errors in logstash --debug.
I am not sure what this error means; all of my settings are localhost with default ports. Any guidance would be appreciated, as the error itself does not tell me much.
[2019-01-23T15:09:27,263][DEBUG][org.apache.kafka.clients.NetworkClient] [Consumer clientId=logstash-0, groupId=test-retry] Initialize connection to node localhost:2181 (id: -1 rack: null) for sending metadata request
[2019-01-23T15:09:27,264][DEBUG][org.apache.kafka.clients.NetworkClient] [Consumer clientId=logstash-0, groupId=test-retry] Initiating connection to node localhost:2181 (id: -1 rack: null)
[2019-01-23T15:09:27,265][DEBUG][org.apache.kafka.common.network.Selector] [Consumer clientId=logstash-0, groupId=test-retry] Created socket with SO_RCVBUF = 342972, SO_SNDBUF = 146988, SO_TIMEOUT = 0 to node -1
[2019-01-23T15:09:27,265][DEBUG][org.apache.kafka.clients.NetworkClient] [Consumer clientId=logstash-0, groupId=test-retry] Completed connection to node -1. Fetching API versions.
[2019-01-23T15:09:27,265][DEBUG][org.apache.kafka.clients.NetworkClient] [Consumer clientId=logstash-0, groupId=test-retry] Initiating API versions fetch from node -1.
[2019-01-23T15:09:27,266][DEBUG][org.apache.kafka.common.network.Selector] [Consumer clientId=logstash-0, groupId=test-retry] Connection with localhost/127.0.0.1 disconnected
java.io.EOFException: null
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:96) ~[kafka-clients-2.0.1.jar:?]
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:335) ~[kafka-clients-2.0.1.jar:?]
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:296) ~[kafka-clients-2.0.1.jar:?]
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:562) ~[kafka-clients-2.0.1.jar:?]
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:498) [kafka-clients-2.0.1.jar:?]
at org.apache.kafka.common.network.Selector.poll(Selector.java:427) [kafka-clients-2.0.1.jar:?]
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:510) [kafka-clients-2.0.1.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:271) [kafka-clients-2.0.1.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242) [kafka-clients-2.0.1.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) [kafka-clients-2.0.1.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161) [kafka-clients-2.0.1.jar:?]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:243) [kafka-clients-2.0.1.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:314) [kafka-clients-2.0.1.jar:?]
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1218) [kafka-clients-2.0.1.jar:?]
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1181) [kafka-clients-2.0.1.jar:?]
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115) [kafka-clients-2.0.1.jar:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_172]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_172]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_172]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_172]
at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:423) [jruby-complete-9.1.13.0.jar:?]
at org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:290) [jruby-complete-9.1.13.0.jar:?]
at org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:28) [jruby-complete-9.1.13.0.jar:?]
at org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:90) [jruby-complete-9.1.13.0.jar:?]
at org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:145) [jruby-complete-9.1.13.0.jar:?]
at usr.local.Cellar.logstash.$6_dot_5_dot_4.libexec.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_input_minus_kafka_minus_8_dot_2_dot_1.lib.logstash.inputs.kafka.RUBY$block$thread_runner$1(/usr/local/Cellar/logstash/6.5.4/libexec/vendor/bundle/jruby/2.3.0/gems/logstash-input-kafka-8.2.1/lib/logstash/inputs/kafka.rb:253) [jruby-complete-9.1.13.0.jar:?]
at org.jruby.runtime.CompiledIRBlockBody.callDirect(CompiledIRBlockBody.java:145) [jruby-complete-9.1.13.0.jar:?]
at org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:71) [jruby-complete-9.1.13.0.jar:?]
at org.jruby.runtime.Block.call(Block.java:124) [jruby-complete-9.1.13.0.jar:?]
at org.jruby.RubyProc.call(RubyProc.java:289) [jruby-complete-9.1.13.0.jar:?]
at org.jruby.RubyProc.call(RubyProc.java:246) [jruby-complete-9.1.13.0.jar:?]
at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:104) [jruby-complete-9.1.13.0.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
[2019-01-23T15:09:27,267][DEBUG][org.apache.kafka.clients.NetworkClient] [Consumer clientId=logstash-0, groupId=test-retry] Node -1 disconnected.
[2019-01-23T15:09:27,267][DEBUG][org.apache.kafka.clients.NetworkClient] [Consumer clientId=logstash-0, groupId=test-retry] Give up sending metadata request since no node is available
and so on ...
[2019-01-23T15:09:28,340][DEBUG][org.apache.kafka.clients.NetworkClient] [Consumer clientId=logstash-0, groupId=test-retry] Initialize connection to node localhost:2181 (id: -1 rack: null) for sending metadata request

You have set Logstash to connect to ZooKeeper, not Kafka:
Initialize connection to node localhost:2181
Port 2181 is ZooKeeper's. Make sure the bootstrap server is the Kafka broker, which in your case would be localhost:9092.
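For reference, here is a minimal sketch of the corrected kafka input block for the logstash-input-kafka plugin; the topic name is a placeholder for whatever your pipeline actually reads:
input {
  kafka {
    bootstrap_servers => "localhost:9092"  # the Kafka broker, not ZooKeeper on 2181
    topics => ["your-topic"]               # placeholder topic name
    group_id => "test-retry"               # matches the groupId in the log above
  }
}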

Related

KAFKA connect EOFException when running connect-standalone with confluent broker and source connector

I am trying to run a connect-standalone application with a sample source connector (FileSource or JDBC connector).
I am getting constantly repeating error messages like:
[2022-11-14 18:33:09,641] DEBUG [local-file-source|task-0] [Producer clientId=connector-producer-local-file-source-0] Connection with xxxx.westeurope.azure.confluent.cloud/ (channelId=-1) disconnected (org.apache.kafka.common.network.Selector:606)
java.io.EOFException
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:97)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:452)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:402)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:674)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:576)
at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:560)
at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243)
at java.base/java.lang.Thread.run(Thread.java:1589)
[2022-11-14 18:33:09,643] INFO [local-file-source|task-0] [Producer clientId=connector-producer-local-file-source-0] Node -1 disconnected. (org.apache.kafka.clients.NetworkClient:935)
[2022-11-14 18:33:09,645] DEBUG [local-file-source|task-0] [Producer clientId=connector-producer-local-file-source-0] Cancelled in-flight API_VERSIONS request with correlation id 0 due to node -1 being disconnected (elapsed time since creation: 34ms, elapsed time since send: 34ms, request timeout: 30000ms): ApiVersionsRequestData(clientSoftwareName='apache-kafka-java', clientSoftwareVersion='3.2.1') (org.apache.kafka.clients.NetworkClient:335)
[2022-11-14 18:33:09,647] WARN [local-file-source|task-0] [Producer clientId=connector-producer-local-file-source-0] Bootstrap broker xxxx.westeurope.azure.confluent.cloud:9092 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient:1063)
[2022-11-14 18:33:09,757] DEBUG [local-file-source|task-0] [Producer clientId=connector-producer-local-file-source-0] Initialize connection to node xxxx.westeurope.azure.confluent.cloud:9092 (id: -1 rack: null) for sending metadata request (org.apache.kafka.clients.NetworkClient:1160)
[2022-11-14 18:33:09,758] DEBUG [local-file-source|task-0] Resolved host xxxx.westeurope.azure.confluent.cloud as (org.apache.kafka.clients.ClientUtils:113)
[2022-11-14 18:33:09,758] DEBUG [local-file-source|task-0] [Producer clientId=connector-producer-local-file-source-0] Initiating connection to node xxxx.westeurope.azure.confluent.cloud:9092 (id: -1 rack: null) using address xxxx.westeurope.azure.confluent.cloud/ (org.apache.kafka.clients.NetworkClient:989)
[2022-11-14 18:33:09,787] DEBUG [local-file-source|task-0] [Producer clientId=connector-producer-local-file-source-0] Created socket with SO_RCVBUF = 32768, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node -1 (org.apache.kafka.common.network.Selector:531)
[2022-11-14 18:33:09,788] DEBUG [local-file-source|task-0] [Producer clientId=connector-producer-local-file-source-0] Completed connection to node -1. Fetching API versions. (org.apache.kafka.clients.NetworkClient:951)
[2022-11-14 18:33:09,789] DEBUG [local-file-source|task-0] [Producer clientId=connector-producer-local-file-source-0] Initiating API versions fetch from node -1. (org.apache.kafka.clients.NetworkClient:965)
[2022-11-14 18:33:09,789] DEBUG [local-file-source|task-0] [Producer clientId=connector-producer-local-file-source-0] Sending API_VERSIONS request with header RequestHeader(apiKey=API_VERSIONS, apiVersion=3, clientId=connector-producer-local-file-source-0, correlationId=1) and timeout 30000 to node -1: ApiVersionsRequestData(clientSoftwareName='apache-kafka-java', clientSoftwareVersion='3.2.1') (org.apache.kafka.clients.NetworkClient:521)
[2022-11-14 18:33:09,817] DEBUG [local-file-source|task-0] [Producer clientId=connector-producer-local-file-source-0] Connection with xxxx.westeurope.azure.confluent.cloud/ (channelId=-1) disconnected
I can create a topic with the kafka-topics.sh command, write messages to the topic through the console producer, and read from the topic through the console consumer, as well as with connect-standalone using sink connectors.
If I run the Kafka server and ZooKeeper locally, everything seems to work fine.
Command line:
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties
connect-standalone.properties
bootstrap.servers=pkc-pj9zy.westeurope.azure.confluent.cloud:9092
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
ssl.endpoint.identification.algorithm=https
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="name" password="passphrase";
ssl.protocol=TLSv1.2
ssl.enabled.protocols=TLSv1.2
plugin.path=./plugins,./libs
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
connect-file-source.properties
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=test_ksql_af_file_source-test
auto.create=true
auto.evolve=true
Got it sorted out. I was missing the following properties in my settings:
producer.security.protocol=SASL_SSL
producer.sasl.mechanism=PLAIN
producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="name" password="passphrase";
Unfortunately, the log did not give a hint about this.
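The reason is that in Kafka Connect the worker-level security settings apply only to the worker's own internal clients; the clients created for connector tasks read the producer.-prefixed settings (source tasks) and the consumer.-prefixed settings (sink tasks). A sketch of the prefixed block for a worker running both kinds of connectors, with placeholder credentials:
# Clients used by source-connector tasks
producer.security.protocol=SASL_SSL
producer.sasl.mechanism=PLAIN
producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="name" password="passphrase";
# Clients used by sink-connector tasks
consumer.security.protocol=SASL_SSL
consumer.sasl.mechanism=PLAIN
consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="name" password="passphrase";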

Confluent Control Center failure: Unable to fetch consumer offsets for cluster id

I am running Confluent Platform (version 6.1.1). I deploy the following components: 3 brokers, 3 ZooKeepers, Schema Registry, 3 Kafka Connect workers, KSQL, and Confluent Control Center (CCC).
The CCC has entered a failed state and I am having difficulty bringing it back.
To make things cleaner, I created another EC2 instance (m4.2xlarge) where I configured a new CCC with the aim of connecting it to the current cluster. The new CCC has exactly the same configuration as the failed one, but with a different confluent.controlcenter.id.
I start the CCC and it is running. I can access the CCC UI, but it is not working properly: the pages take too long to load, and it keeps flip-flopping on the state of the Connect cluster (sometimes healthy, sometimes not) and on the state of the brokers (sometimes healthy, sometimes not). (Screenshots omitted.)
After running for a certain amount of time, it restarts automatically and keeps restarting every 5-7 minutes.
When it starts, I see a bunch of new topics created in the Kafka cluster.
After that, in the control-center.log, I see:
INFO [main] Setting offsets for topic=_confluent-monitoring (io.confluent.controlcenter.KafkaHelper)
INFO [main] found 12 topicPartitions for topic=_confluent-monitoring (io.confluent.controlcenter.KafkaHelper)
INFO [main] Setting offsets for topic=_confluent-metrics (io.confluent.controlcenter.KafkaHelper)
INFO [main] found 12 topicPartitions for topic=_confluent-metrics (io.confluent.controlcenter.KafkaHelper)
INFO [main] action=starting topology=command (io.confluent.controlcenter.ControlCenter)
INFO [main] waiting for streams to be in running state REBALANCING (io.confluent.command.CommandStore)
INFO [main] Streams state RUNNING (io.confluent.command.CommandStore)
INFO [main] action=started topology=command (io.confluent.controlcenter.ControlCenter)
INFO [main] action=starting operation=command-migration (io.confluent.controlcenter.ControlCenter)
INFO [main] action=completed operation=command-migration (io.confluent.controlcenter.ControlCenter)
INFO [main] action=starting topology=monitoring (io.confluent.controlcenter.ControlCenter)
INFO [main] action=started topology=monitoring (io.confluent.controlcenter.ControlCenter)
INFO [main] Starting Health Check (io.confluent.controlcenter.ControlCenter)
INFO [main] Starting Alert Manager (io.confluent.controlcenter.ControlCenter)
INFO [main] Starting Consumer Offsets Fetch (io.confluent.controlcenter.ControlCenter)
INFO [control-center-heartbeat-0] current clusterId=lCRehAk0RqmLR04nhXKHtA (io.confluent.controlcenter.healthcheck.HealthCheck)
INFO [control-center-heartbeat-0] broker id set has changed new={1001=[10.251.xx.xx:9093 (id: 1001 rack: null)], 1002=[10.251.xx.xx:9093 (id: 1002 rack: null)], 1003=[10.251.xx.xx:9093 (id: 1003 rack: null)]} removed={} (io.confluent.controlcenter.healthcheck.HealthCheck)
INFO [control-center-heartbeat-0] new controller=10.251.xx.xx:9093 (id: 1002 rack: null) (io.confluent.controlcenter.healthcheck.HealthCheck)
INFO [main] Initial capacity 128, increased by 64, maximum capacity 2147483647. (io.confluent.rest.ApplicationServer)
INFO [main] Adding listener: http://0.0.0.0:9021 (io.confluent.rest.ApplicationServer)
INFO [main] x509=X509@3a8ead9(ip-44-135-xx-xx.eu-central-1.compute.internal,h=[ip-44-135-xx-xx.eu-central-1.compute.internal],w=[]) for Server@7c8b37a8[provider=null,keyStore=file:///var/kafka-ssl/server.keystore.jks,trustStore=file:///var/kafka-ssl/client.truststore.jks] (org.eclipse.jetty.util.ssl.SslContextFactory)
INFO [main] x509=X509@3831f4c2(caroot,h=[eu-central-1.compute.internal],w=[]) for Server@7c8b37a8[provider=null,keyStore=file:///var/kafka-ssl/server.keystore.jks,trustStore=file:///var/kafka-ssl/client.truststore.jks] (org.eclipse.jetty.util.ssl.SslContextFactory)
INFO [main] jetty-9.4.38.v20210224; built: 2021-02-24T20:25:07.675Z; git: 288f3cc74549e8a913bf363250b0744f2695b8e6; jvm 11.0.13+8-LTS (org.eclipse.jetty.server.Server)
INFO [main] DefaultSessionIdManager workerName=node0 (org.eclipse.jetty.server.session)
INFO [main] No SessionScavenger set, using defaults (org.eclipse.jetty.server.session)
INFO [main] node0 Scavenging every 660000ms (org.eclipse.jetty.server.session)
INFO [main] Started o.e.j.s.ServletContextHandler@1ef5cde4{/,[jar:file:/usr/share/java/acl/acl-6.1.1.jar!/io/confluent/controlcenter/rest/static],AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
INFO [main] Started o.e.j.s.ServletContextHandler@5401c6a8{/ws,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
INFO [main] Started NetworkTrafficServerConnector@5d6b5d3d{HTTP/1.1, (http/1.1)}{0.0.0.0:9021} (org.eclipse.jetty.server.AbstractConnector)
INFO [main] Started @36578ms (org.eclipse.jetty.server.Server)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=monitoring-input-topic-progress-.count type=monitoring cluster= value=0.0 (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=monitoring-input-topic-progress-.rate type=monitoring cluster= value=0.0 (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=monitoring-input-topic-progress-.timestamp type=monitoring cluster= value=NaN (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=monitoring-input-topic-progress-.min type=monitoring cluster= value=1.7976931348623157E308 (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=metrics-input-topic-progress-lCRehAk0RqmLR04nhXKHtA.count type=metrics cluster=lCRehAk0RqmLR04nhXKHtA value=0.0 (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=metrics-input-topic-progress-lCRehAk0RqmLR04nhXKHtA.rate type=metrics cluster=lCRehAk0RqmLR04nhXKHtA value=0.0 (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=metrics-input-topic-progress-lCRehAk0RqmLR04nhXKHtA.timestamp type=metrics cluster=lCRehAk0RqmLR04nhXKHtA value=NaN (io.confluent.controlcenter.util.StreamProgressReporter)
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-1] name=metrics-input-topic-progress-lCRehAk0RqmLR04nhXKHtA.min type=metrics cluster=lCRehAk0RqmLR04nhXKHtA value=1.7976931348623157E308 (io.confluent.controlcenter.util.StreamProgressReporter)
WARN [control-center-heartbeat-0] misconfigured topic=_confluent-command config=segment.bytes value=1073741824 expected=134217728 (io.confluent.controlcenter.healthcheck.HealthCheck)
WARN [control-center-heartbeat-0] misconfigured topic=_confluent-command config=delete.retention.ms value=86400000 expected=259200000 (io.confluent.controlcenter.healthcheck.HealthCheck)
INFO [control-center-heartbeat-0] misconfigured topic=_confluent-metrics config=min.insync.replicas value=1 expected=2 (io.confluent.controlcenter.healthcheck.HealthCheck)
WARN [control-center-heartbeat-1] Unable to fetch consumer offsets for cluster id lCRehAk0RqmLR04nhXKHtA (io.confluent.controlcenter.data.ConsumerOffsetsFetcher)
java.util.concurrent.TimeoutException
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:108)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at io.confluent.controlcenter.data.ConsumerOffsetsDao.getAllConsumerGroupDescriptions(ConsumerOffsetsDao.java:220)
at io.confluent.controlcenter.data.ConsumerOffsetsDao.getAllConsumerGroupOffsets(ConsumerOffsetsDao.java:58)
at io.confluent.controlcenter.data.ConsumerOffsetsFetcher.run(ConsumerOffsetsFetcher.java:73)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
WARN [kafka-admin-client-thread | adminclient-3] failed fetching description for consumerGroup=_confluent-ksql-eim_ksql_non_prodquery_CSAS_SDL_STMTS_GG_347 (io.confluent.controlcenter.data.ConsumerOffsetsDao)
org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=1654853629184, tries=1, nextAllowedTryMs=1654853629324) timed out at 1654853629224 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.DisconnectException: Cancelled describeConsumerGroups request with correlation id 168 due to node 1001 being disconnected
WARN [kafka-admin-client-thread | adminclient-3] failed fetching description for consumerGroup=connect-mongo-dci-grid-partner-test11 (io.confluent.controlcenter.data.ConsumerOffsetsDao)
org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=1654853629184, tries=1, nextAllowedTryMs=1654853629324) timed out at 1654853629224 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeConsumerGroups
WARN [kafka-admin-client-thread | adminclient-3] failed fetching description for consumerGroup=_confluent-ksql-eim_ksql_non_prodquery_CSAS_SDL_STMTS_UPWARD_GG_355 (io.confluent.controlcenter.data.ConsumerOffsetsDao)
org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=1654853629184, tries=1, nextAllowedTryMs=1654853629324) timed out at 1654853629224 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: describeConsumerGroups
WARN [kafka-admin-client-thread | adminclient-3] failed fetching description for consumerGroup=_eim_c3_non_prod-4 (io.confluent.controlcenter.data.ConsumerOffsetsDao)
org.apache.kafka.common.errors.TimeoutException: Call(callName=describeConsumerGroups, deadlineMs=1654853629184, tries=1, nextAllowedTryMs=1654853629324) timed out at 1654853629224 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: describeConsumerGroups
...
and so on...
WARN [control-center-heartbeat-1] Unable to fetch consumer offsets for cluster id lCRehAk0RqmLR04nhXKHtA (io.confluent.controlcenter.data.ConsumerOffsetsFetcher)
java.util.concurrent.TimeoutException
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:108)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at io.confluent.controlcenter.data.ConsumerOffsetsDao.getAllConsumerGroupDescriptions(ConsumerOffsetsDao.java:220)
at io.confluent.controlcenter.data.ConsumerOffsetsDao.getAllConsumerGroupOffsets(ConsumerOffsetsDao.java:58)
at io.confluent.controlcenter.data.ConsumerOffsetsFetcher.run(ConsumerOffsetsFetcher.java:73)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
and so on...
In the control-center-kafka.log I see:
INFO [control-center-heartbeat-1] Kafka version: 6.1.1-ce (org.apache.kafka.common.utils.AppInfoParser)
INFO [control-center-heartbeat-1] Kafka commitId: 73deb3aeb1f8647c (org.apache.kafka.common.utils.AppInfoParser)
INFO [control-center-heartbeat-1] Kafka startTimeMs: 1654853610852 (org.apache.kafka.common.utils.AppInfoParser)
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-5-consumer, groupId=_eim_c3_non_prod-4] Resetting offset for partition _eim_c3_non_prod-4-monitoring-message-rekey-store-7 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[10.251.6.2:9093 (id: 1002 rack: null)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-5-consumer, groupId=_eim_c3_non_prod-4] Resetting offset for partition _eim_c3_non_prod-4-monitoring-trigger-event-rekey-7 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[10.251.6.2:9093 (id: 1002 rack: null)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-5-consumer, groupId=_eim_c3_non_prod-4] Resetting offset for partition _eim_c3_non_prod-4-MonitoringStream-ONE_MINUTE-repartition-7 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[10.251.6.2:9093 (id: 1002 rack: null)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-5-consumer, groupId=_eim_c3_non_prod-4] Resetting offset for partition _eim_c3_non_prod-4-aggregatedTopicPartitionTableWindows-ONE_MINUTE-repartition-7 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[10.251.6.1:9093 (id: 1001 rack: null)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
and so on ...
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-10-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1003: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-3] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-3-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1002: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-3-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1001: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
INFO [_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-10] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-10-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 1002: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-5-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=1478925475, epoch=1) to node 1003: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
INFO [kafka-coordinator-heartbeat-thread | _eim_c3_non_prod-4] [Consumer clientId=_eim_c3_non_prod-4-b6c9d6bd-717d-4559-bcfe-a4c9be647b7f-StreamThread-6-consumer, groupId=_eim_c3_non_prod-4] Error sending fetch request (sessionId=1947312909, epoch=1) to node 1002: (org.apache.kafka.clients.FetchSessionHandler)
org.apache.kafka.common.errors.DisconnectException
and so on ...
Any ideas what could be wrong here?
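One way to narrow this down is to reproduce the failing call outside Control Center. Below is a minimal AdminClient sketch, assuming the broker address and truststore path from the logs above and placeholder security values; adjust to match the CCC's actual client settings. If this also times out, the problem is on the broker or network side rather than in the CCC itself:
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class DescribeGroupProbe {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "10.251.6.1:9093");
        // Same security settings as the CCC (placeholders):
        props.put("security.protocol", "SSL");
        props.put("ssl.truststore.location", "/var/kafka-ssl/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // The same call that times out in the CCC logs:
            System.out.println(admin
                    .describeConsumerGroups(Collections.singletonList("_eim_c3_non_prod-4"))
                    .all()
                    .get(30, TimeUnit.SECONDS));
        }
    }
}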

Same offset and partition record is getting consumed twice causing duplicates

I am trying to consume records using an application written with spring-kafka. I am facing a very unusual condition and am not able to understand why it is happening.
My consumer application runs with a concurrency of 2, meaning 2 consumer threads subscribed to a topic with two partitions. I consume records and place them into a table using an upsert with the offset, partition, and insert timestamp (a hypothetical sketch of such an upsert appears after the listener code below).
I am seeing duplicate values with the same offset and partition in the table, which should not occur. There is no difference in the timestamp values, meaning the inserts occurred at the same time. I am not sure how that is possible, and I don't see any issue in the logs either. I am not sure what is happening at the producer end, but we can't have 2 values at the same offset anyway, so I don't know whether this is an issue at the consumer end or the producer end. Any suggestions or thoughts that would help me triage this issue?
Kafka log:
I don't see anything unusual in the log either.
14:29:56.318 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version: 2.4.0
14:29:56.318 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId: 77a89fcf8d7fa018
14:29:56.318 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1604154596318
14:29:56.319 [main] INFO org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Subscribed to topic(s): kaas.pe.enrollment.csp.ts2
14:29:57.914 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.Metadata - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Cluster ID: 6cbv7QOaSW6j1vXrOCE4jA
14:29:57.914 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.Metadata - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Cluster ID: 6cbv7QOaSW6j1vXrOCE4jA
14:29:57.915 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Discovered group coordinator apslp1563.uhc.com:9093 (id: 2147483574 rack: null)
14:29:57.915 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Discovered group coordinator apslp1563.uhc.com:9093 (id: 2147483574 rack: null)
14:29:57.923 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] (Re-)joining group
14:29:57.924 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] (Re-)joining group
14:29:58.121 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] (Re-)joining group
14:29:58.125 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] (Re-)joining group
14:30:13.127 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Finished assignment for group at generation 23: {consumer-csp-prov-emerald-test-1-19d92ba5-5dc3-433d-b967-3cf1ce1b4174=org.apache.kafka.clients.consumer.ConsumerPartitionAssignor$Assignment@d7e2a1f, consumer-csp-prov-emerald-test-2-5833c212-7031-4ab1-944b-7e26f7d7a293=org.apache.kafka.clients.consumer.ConsumerPartitionAssignor$Assignment@53c3aad3}
14:30:13.131 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Successfully joined group with generation 23
14:30:13.131 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Successfully joined group with generation 23
14:30:13.134 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Adding newly assigned partitions: kaas.pe.enrollment.csp.ts2-1
14:30:13.134 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Adding newly assigned partitions: kaas.pe.enrollment.csp.ts2-0
14:30:13.143 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-1, groupId=csp-prov-emerald-test] Setting offset for partition kaas.pe.enrollment.csp.ts2-0 to the committed offset FetchPosition{offset=500387, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=apslp1559.uhc.com:9093 (id: 69 rack: null), epoch=37}}
14:30:13.143 [org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-csp-prov-emerald-test-2, groupId=csp-prov-emerald-test] Setting offset for partition kaas.pe.enrollment.csp.ts2-1 to the committed offset FetchPosition{offset=499503, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=apslp1562.uhc.com:9093 (id: 72 rack: null), epoch=36}}
Code:
@KafkaListener(topics = "${app.topic}", groupId = "${app.group_id_config}")
public void run(ConsumerRecord<String, GenericRecord> record, Acknowledgment acknowledgement) throws Exception {
    try {
        // Pull the fields of interest out of the Avro payload
        prov_tin_number = record.value().get("providerTinNumber").toString();
        prov_tin_type = record.value().get("providerTINType").toString();
        enroll_type = record.value().get("enrollmentType").toString();
        vcp_prov_choice_ind = record.value().get("vcpProvChoiceInd").toString();
        error_flag = "";
        // Upsert into the table, carrying the partition and offset along
        dataexecutor.peStremUpsertTbl(prov_tin_number, prov_tin_type, enroll_type, vcp_prov_choice_ind, error_flag,
                record.partition(), record.offset());
        acknowledgement.acknowledge();
    } catch (Exception ex) {
        System.out.println(record);
        System.out.println(ex.getMessage());
    }
}
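For reference, a hypothetical sketch of what an upsert keyed on partition and offset could look like on the database side, assuming PostgreSQL and made-up table/column names. If the table really has a unique constraint on (partition, offset), a redelivered record becomes a no-op, so two rows with the same offset and partition should be impossible:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class EnrollmentDao {
    // Idempotent insert: requires a unique constraint on (kafka_partition, kafka_offset).
    public void upsert(Connection conn, String provTinNumber, int partition, long offset)
            throws SQLException {
        String sql = "INSERT INTO enrollment (kafka_partition, kafka_offset, prov_tin_number) "
                + "VALUES (?, ?, ?) "
                + "ON CONFLICT (kafka_partition, kafka_offset) DO NOTHING";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, partition);
            ps.setLong(2, offset);
            ps.setString(3, provTinNumber);
            ps.executeUpdate();
        }
    }
}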

Kafka send to azure event hub

I've set up Kafka on my machine and I'm trying to set up MirrorMaker to consume from a local topic and mirror it to an Azure Event Hub, but so far I've been unable to do it, and I get the following error:
ERROR Error when sending message to topic dev-eh-kafka-test with key: null, value: 5 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
After some time I realized that this must be on the producer side, so I tried simply using the kafka-console-producer tool directly against the event hub and got the same error.
Here is my producer settings file:
bootstrap.servers=dev-we-eh-feed.servicebus.windows.net:9093
compression.type=none
max.block.ms=0
# for event hub
sasl.mechanism=PLAIN
security.protocol=SASL_SSL
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://dev-we-eh-feed.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=*****";
Here is the command to spin the producer:
kafka-console-producer.bat --broker-list dev-we-eh-feed.servicebus.windows.net:9093 --topic dev-eh-kafka-test
My Event Hubs namespace has an event hub named dev-eh-kafka-test.
Has anyone been able to do this? Eventually the idea would be to secure this with an SSL certificate, but first I need to get the connection working.
I tried using both Apache Kafka 1.1.1 and Confluent Kafka 4.1.3 (because that is the version the client is using).
==== UPDATE 1
Someone showed me how to get more logs, and this seems to be the detailed version of the error:
[2020-02-28 17:32:08,010] DEBUG [Producer clientId=console-producer] Initialize connection to node dev-we-eh-feed.servicebus.windows.net:9093 (id: -1 rack: null) for sending metadata request (org.apache.kafka.clients.NetworkClient)
[2020-02-28 17:32:08,010] DEBUG [Producer clientId=console-producer] Initiating connection to node dev-we-eh-feed.servicebus.windows.net:9093 (id: -1 rack: null) (org.apache.kafka.clients.NetworkClient)
[2020-02-28 17:32:08,010] DEBUG [Producer clientId=console-producer] Created socket with SO_RCVBUF = 32768, SO_SNDBUF = 102400, SO_TIMEOUT = 0 to node -1 (org.apache.kafka.common.network.Selector)
[2020-02-28 17:32:08,010] DEBUG [Producer clientId=console-producer] Completed connection to node -1. Fetching API versions. (org.apache.kafka.clients.NetworkClient)
[2020-02-28 17:32:08,010] DEBUG [Producer clientId=console-producer] Initiating API versions fetch from node -1. (org.apache.kafka.clients.NetworkClient)
[2020-02-28 17:32:08,010] DEBUG [Producer clientId=console-producer] Connection with dev-we-eh-feed.servicebus.windows.net/51.144.238.23 disconnected (org.apache.kafka.common.network.Selector)
java.io.EOFException
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:124)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:93)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:235)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:196)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:559)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:495)
at org.apache.kafka.common.network.Selector.poll(Selector.java:424)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
at java.base/java.lang.Thread.run(Thread.java:830)
[2020-02-28 17:32:08,010] DEBUG [Producer clientId=console-producer] Node -1 disconnected. (org.apache.kafka.clients.NetworkClient)
So here is the configuration that worked (it seems I was missing client.id).
Also, it seems you cannot choose the destination topic; it must have the same name as the source...
bootstrap.servers=dev-we-eh-feed.servicebus.windows.net:9093
client.id=mirror_maker_producer
request.timeout.ms=60000
sasl.mechanism=PLAIN
security.protocol=SASL_SSL
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://dev-we-eh-feed.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=******";
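For completeness, a sketch of how this producer config would be passed to MirrorMaker (using the same Windows .bat tooling as above); local-consumer.properties is a placeholder for a config pointing at the local source cluster:
kafka-mirror-maker.bat --consumer.config local-consumer.properties --producer.config producer.properties --whitelist dev-eh-kafka-test
The --whitelist topic doubles as the destination name, which is consistent with the observation above that the destination topic must match the source.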

Kafka Consumer Group Rebalance and Group Coordinator dead

I have been playing around with Kafka (1.0.0) for a couple of months, trying to understand how consumer groups work. I have a single-broker Kafka setup and I am using Kafka-Connect-Cassandra to consume messages from topics into database tables. I have 10 topics, each with just one partition, and I have a single consumer group with 10 consumer instances (one for each topic).
While running this setup, I sometimes see the following logs in the kafka-connect console:
1:
[Worker clientId=connect-1, groupId=connect-cluster] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Worker clientId=connect-1, groupId=connect-cluster] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Revoking previously assigned partitions [topic1-0, topic2-0, ....] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:336)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Successfully joined group with generation 349 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
[Consumer clientId=consumer-7, groupId=connect-cassandra-sink-casb] Setting newly assigned partitions [topic1-0, topic2-0, ....] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:341)
After this it starts consuming messages and writing to the Cassandra tables.
This happens frequently, at irregular intervals.
Sometimes, however, the connector stops and shuts down, then starts and consumes messages again. This is the log:
INFO [Worker clientId=connect-1, groupId=connect-cluster] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Worker clientId=connect-1, groupId=connect-cluster] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO WorkerSinkTask{id=cassandra-sink-casb-0} Committing offsets asynchronously using sequence number 42: {topic1-0=OffsetAndMetadata{offset=1074, metadata=''}, topic2-0=OffsetAndMetadata{offset=112, metadata=''}, ...}} (org.apache.kafka.connect.runtime.WorkerSinkTask:311)
INFO Rebalance started (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1214)
INFO Stopping connector cassandra-sink-casb (org.apache.kafka.connect.runtime.Worker:304)
INFO Stopping task cassandra-sink-casb-0 (org.apache.kafka.connect.runtime.Worker:464)
INFO Stopping Cassandra sink. (com.datamountaineer.streamreactor.connect.cassandra.sink.CassandraSinkTask:79)
INFO Shutting down Cassandra driver session and cluster. (com.datamountaineer.streamreactor.connect.cassandra.sink.CassandraJsonWriter:253)
INFO Stopped connector cassandra-sink-casb (org.apache.kafka.connect.runtime.Worker:320)
INFO Finished stopping tasks in preparation for rebalance (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1244)
INFO [Worker clientId=connect-1, groupId=connect-cluster] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:336)
INFO [Worker clientId=connect-1, groupId=connect-cluster] Successfully joined group with generation 7 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO Joined group and got assignment: Assignment{error=0, leader='connect-1-1dc56cda-ed54-4181-a5f9-d11022d8e8c3', leaderUrl='http://127.0.1.1:8083/', offset=8, connectorIds=[cassandra-sink-casb], taskIds=[cassandra-sink-casb-0]} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1192)
INFO Starting connectors and tasks using config offset 8 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:837)
INFO Starting connector cassandra-sink-casb (org.apache.kafka.connect.runtime.distributed.DistributedHerder:890)
2:
org.apache.kafka.clients.consumer.CommitFailedException:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms,
which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum
size of batches returned in poll() with max.poll.records.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:722)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:600)
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1250)
at org.apache.kafka.connect.runtime.WorkerSinkTask.doCommitSync(WorkerSinkTask.java:299)
at org.apache.kafka.connect.runtime.WorkerSinkTask.doCommit(WorkerSinkTask.java:327)
at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:398)
at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:547)
at org.apache.kafka.connect.runtime.WorkerSinkTask.access$1300(WorkerSinkTask.java:62)
at org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsRevoked(WorkerSinkTask.java:618)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:419)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:359)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:316)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:295)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1146)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1111)
at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:410)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:283)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:198)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:166)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
INFO [Consumer clientId=consumer-5, groupId=connect-cassandra-sink-casb] Marking the coordinator qa-server:9092 (id: 2147483647 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Consumer clientId=consumer-5, groupId=connect-cassandra-sink-casb] Discovered group coordinator qa-server:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Consumer clientId=consumer-5, groupId=connect-cassandra-sink-casb] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:336)
INFO [Consumer clientId=consumer-5, groupId=connect-cassandra-sink-casb] Successfully joined group with generation 343 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Consumer clientId=consumer-5, groupId=connect-cassandra-sink-casb] Setting newly assigned partitions [topic1-0, topic2-0,...] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:341)
INFO WorkerSinkTask{id=cassandra-sink-casb-0} Committing offsets asynchronously using sequence number 155: {topic1-0=OffsetAndMetadata{offset=836, metadata=''}, topic2-0=OffsetAndMetadata{offset=86, metadata=''}, ...}} (org.apache.kafka.connect.runtime.WorkerSinkTask:311)
Again, sometimes Kafka-Connect starts consuming messages after the rebalance, and sometimes it shuts down.
I have the following questions:
1) Why does the group coordinator (the Kafka broker) die?
I am looking into multiple Kafka configs to resolve these issues, like connections.max.idle.ms, max.poll.records, session.timeout.ms, group.min.session.timeout.ms and group.max.session.timeout.ms. I am not sure what the best configs would be for things to run smoothly (a sketch of where such overrides would live is below).
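For what it's worth, consumer-side tuning for the sink tasks goes into the Connect worker properties with the consumer. prefix, while group.min.session.timeout.ms and group.max.session.timeout.ms are broker settings that belong in server.properties. A sketch with illustrative values, not recommendations:
consumer.session.timeout.ms=30000
consumer.heartbeat.interval.ms=10000
consumer.max.poll.records=100
consumer.max.poll.interval.ms=300000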
2) Why does the rebalance occur?
I know a group rebalance can occur when adding a new task, changing a task, etc., but I haven't changed anything. The Kafka Connect framework sometimes seems to handle the error a bit too aggressively and kills the connect tasks instead of carrying on working.