I have a Spring Boot application running on Kubernetes, connected to a MongoDB database (deployed with the Mongo Operator).
Sometimes (I can't see a pattern so far) the connection between the Spring application and MongoDB breaks. After some time the connection is restored, but in the meantime users may get an error.
Logs from the application:
12:00:32.606 INFO org.mongodb.driver.connection : Closed connection [connectionId{localValue:3, serverValue:17292}] to kompas2mongo-0.kompas2mongo-svc.dev.svc.cluster.local:27017 because there was a socket exception raised by this connection.
12:00:32.607 INFO org.mongodb.driver.cluster : No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=UNKNOWN, connectionMode=SINGLE, serverDescriptions=[ServerDescription{address=kompas2mongo-0.kompas2mongo-svc.dev.svc.cluster.local:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketReadException: Prematurely reached end of stream}}]}. Waiting for 30000 ms before timing out
12:00:32.612 INFO org.mongodb.driver.cluster : Exception in monitor thread while connecting to server kompas2mongo-0.kompas2mongo-svc.dev.svc.cluster.local:27017
com.mongodb.MongoSocketOpenException: Exception opening socket
at com.mongodb.internal.connection.SocketStream.open(SocketStream.java:70)
at com.mongodb.internal.connection.InternalStreamConnection.open(InternalStreamConnection.java:143)
at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.lookupServerDescription(DefaultServerMonitor.java:188)
at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:144)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.Net.pollConnect(Native Method)
at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:589)
at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:542)
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:333)
at java.base/java.net.Socket.connect(Socket.java:648)
at com.mongodb.internal.connection.SocketStreamHelper.initialize(SocketStreamHelper.java:107)
at com.mongodb.internal.connection.SocketStream.initializeSocket(SocketStream.java:79)
at com.mongodb.internal.connection.SocketStream.open(SocketStream.java:65)
... 4 common frames omitted
12:00:42.808 INFO org.mongodb.driver.cluster : Cluster description not yet available. Waiting for 30000 ms before timing out
12:01:02.612 ERROR o.a.c.c.C.[.[.[.[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [/api] threw exception [Request processing failed; nested exception is org.springframework.dao.DataAccessResourceFailureException: Prematurely reached end of stream; nested exception is com.mongodb.MongoSocketReadException: Prematurely reached end of stream] with root cause
com.mongodb.MongoSocketReadException: Prematurely reached end of stream
Logs from MongoDB:
2022-08-21T12:00:33.583Z INFO controllers/replica_set_controller.go:132 Reconciling MongoDB {"ReplicaSet": "dev/kompas2mongo"}
2022-08-21T12:00:33.583Z DEBUG controllers/replica_set_controller.go:134 Validating MongoDB.Spec {"ReplicaSet": "dev/kompas2mongo"}
2022-08-21T12:00:33.583Z DEBUG controllers/replica_set_controller.go:143 Ensuring the service exists {"ReplicaSet": "dev/kompas2mongo"}
2022-08-21T12:00:33.600Z INFO controllers/replica_set_controller.go:390 The service already exists... moving forward: services "kompas2mongo-svc" already exists {"ReplicaSet": "dev/kompas2mongo"}
2022-08-21T12:00:33.600Z INFO controllers/replica_set_controller.go:308 Creating/Updating AutomationConfig {"ReplicaSet": "dev/kompas2mongo"}
2022-08-21T12:00:33.628Z INFO agent/agent_readiness.go:52 All 1 Agents have reached Goal state {"ReplicaSet": "dev/kompas2mongo"}
2022-08-21T12:00:33.628Z INFO controllers/replica_set_controller.go:288 Creating/Updating StatefulSet {"ReplicaSet": "dev/kompas2mongo"}
2022-08-21T12:00:33.639Z DEBUG controllers/replica_set_controller.go:298 Ensuring StatefulSet is ready, with type: RollingUpdate {"ReplicaSet": "dev/kompas2mongo"}
2022-08-21T12:00:33.639Z INFO controllers/mongodb_status_options.go:110 ReplicaSet is not yet ready, retrying in 10 seconds
2022-08-21T12:00:34.612Z INFO controllers/replica_set_controller.go:132 Reconciling MongoDB {"ReplicaSet": "dev/kompas2mongo"}
Mongo is configured as a Replica Set and application.yaml looks like this:
spring.data.mongodb:
  host: ${MONGO_HOST}
  database: ${MONGO_DATABASE}
  username: ${MONGO_USERNAME}
  password: ${MONGO_PASSWORD}
The documentation shows a different connection string format for replica sets. Could this be the issue? Or do you have other ideas?
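For reference, a replica-set connection string in the standard MongoDB URI format (`mongodb://user:password@host1,host2/database?replicaSet=name`) could be assembled as in the sketch below. The class and method names are hypothetical, the database name `kompas` and the credentials are placeholders, and the replica set name `kompas2mongo` is an assumption based on the resource name in the operator logs; the real name should be checked against the deployment.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Sketch only: assembles a MongoDB replica-set connection string from its parts.
final class MongoUriSketch {

    // Percent-encodes credentials. URLEncoder writes spaces as '+', so convert those to %20.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // hosts are "host:port" entries; list every replica set member you know about.
    static String replicaSetUri(String user, String password, List<String> hosts,
                                String database, String replicaSet) {
        return "mongodb://" + encode(user) + ":" + encode(password) + "@"
                + String.join(",", hosts)
                + "/" + database
                + "?replicaSet=" + replicaSet;
    }

    public static void main(String[] args) {
        // Placeholder credentials and database; replica set name is an assumption.
        String uri = replicaSetUri("appUser", "p@ss word",
                List.of("kompas2mongo-0.kompas2mongo-svc.dev.svc.cluster.local:27017"),
                "kompas", "kompas2mongo");
        System.out.println(uri);
    }
}
```

The resulting string would typically go into `spring.data.mongodb.uri` in place of the separate host, username and password properties. Whether that alone fixes the dropped connections is not confirmed by the logs above.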
Related
I launch Kafdrop to connect to a Kafka cluster running in Confluent Cloud:
java -jar kafdrop-3.30.0.jar
---kafka.brokerConnect=.......confluent.cloud:9092
I see this error:
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes
, message=Request processing failed; nested exception is kafdrop.service.KafkaAdminClientException: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes, path=/}
2022-07-31 11:25:45.662 INFO 44652 [ionShutdownHook] i.u.Undertow : stopping server: Undertow - 2.2.16.Final
2022-07-31 11:25:45.672 INFO 44652 [ionShutdownHook] i.u.s.s.ServletContextImpl : Destroying Spring FrameworkServlet 'dispatcherServlet'
What am I missing here?
I have an Artemis configuration (shared storage) with the following ha-policy (for master and backup):
<ha-policy>
  <shared-store>
    <master>
      <failover-on-shutdown>true</failover-on-shutdown>
    </master>
  </shared-store>
</ha-policy>
<ha-policy>
  <shared-store>
    <slave>
      <failover-on-shutdown>true</failover-on-shutdown>
    </slave>
  </shared-store>
</ha-policy>
Client connection string:
(tcp://master:61616,tcp://backup:61616)?ha=true&retryInterval=1000&retryIntervalMultiplier=1.0&reconnectAttempts=10
At ~18:38 the server crashed, then at ~18:48 it recovered.
Some applications were unable to reconnect correctly without a restart, with the following errors:
APP 1
Master node crash:
2020-08-06 18:38:37,873 [Thread-0 (ActiveMQ-client-global-threads)] WARN org.apache.activemq.artemis.core.client - AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
Master Node now active (the backup went into passive mode):
2020-08-06 18:47:50,949 [Thread-1 (ActiveMQ-client-global-threads)] WARN org.apache.activemq.artemis.core.client - AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
Why? Other applications reconnected correctly
2020-08-06 18:47:59,994 [Thread-1 (ActiveMQ-client-global-threads)] WARN org.apache.activemq.artemis.core.client - AMQ212005: Tried 10 times to connect. Now giving up on reconnecting it.
2020-08-06 18:47:59,998 [Camel (camel-1) thread #4 - JmsConsumer[xxx]] WARN org.apache.camel.component.jms.DefaultJmsMessageListenerContainer - Setup of JMS message listener invoker failed for destination 'xxx' - trying to recover. Cause: Session is closed
2020-08-06 18:47:59,999 [Camel (camel-1) thread #4 - JmsConsumer[xxx]] INFO org.apache.camel.component.jms.DefaultJmsMessageListenerContainer - Successfully refreshed JMS Connection
2020-08-06 18:48:00,006 [Camel (camel-1) thread #3 - JmsConsumer[xxx]] WARN org.apache.camel.component.jms.DefaultJmsMessageListenerContainer - Setup of JMS message listener invoker failed for destination 'xxx' - trying to recover. Cause: Session is closed
This error was not fixed even though the cluster had recovered:
2020-08-06 18:49:25,033 [Camel (camel-1) thread #5 - JmsConsumer[xxx]] INFO org.apache.camel.component.jms.DefaultJmsMessageListenerContainer - Successfully refreshed JMS Connection
2020-08-06 18:49:25,033 [Camel (camel-1) thread #7 - JmsConsumer[xxx]] WARN org.apache.camel.component.jms.DefaultJmsMessageListenerContainer - Setup of JMS message listener invoker failed for destination 'xxx' - trying to recover. Cause: AMQ119010: Connection is destroyed
APP 2
Master node crash:
2020-08-06 18:38:37.883 WARN 1 --- [Thread-1 (ActiveMQ-client-global-threads)] org.apache.activemq.artemis.core.client : AMQ212037: Connection failure to master/master:61616 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
2020-08-06 18:38:46.935 WARN 1 --- [Thread-1 (ActiveMQ-client-global-threads)] org.apache.activemq.artemis.core.client : AMQ212005: Tried 10 times to connect. Now giving up on reconnecting it.
2020-08-06 18:38:46.939 WARN 1 --- [DefaultMessageListenerContainer-1] o.s.j.l.DefaultMessageListenerContainer : Setup of JMS message listener invoker failed for destination 'yyyy' - trying to recover. Cause: Session is closed
2020-08-06 18:38:46.945 WARN 1 --- [DefaultMessageListenerContainer-1] o.s.j.l.DefaultMessageListenerContainer : Setup of JMS message listener invoker failed for destination 'yyyy' - trying to recover. Cause: Session is closed
2020-08-06 18:38:46.963 INFO 1 --- [Thread-7] o.s.j.c.SingleConnectionFactory : Encountered a JMSException - resetting the underlying JMS Connection
javax.jms.JMSException: ActiveMQDisconnectedException[errorType=DISCONNECTED message=AMQ219015: The connection was disconnected because of server shutdown]
at org.apache.activemq.artemis.jms.client.ActiveMQConnection$JMSFailureListener.connectionFailed(ActiveMQConnection.java:750) ~[artemis-jms-client-2.10.1.jar!/:2.10.1]
at org.apache.activemq.artemis.jms.client.ActiveMQConnection$JMSFailureListener.connectionFailed(ActiveMQConnection.java:771) ~[artemis-jms-client-2.10.1.jar!/:2.10.1]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.callSessionFailureListeners(ClientSessionFactoryImpl.java:704) ~[artemis-core-client-2.10.1.jar!/:2.10.1]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:640) ~[artemis-core-client-2.10.1.jar!/:2.10.1]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:507) ~[artemis-core-client-2.10.1.jar!/:2.10.1]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.access$600(ClientSessionFactoryImpl.java:73) ~[artemis-core-client-2.10.1.jar!/:2.10.1]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$DelegatingFailureListener.connectionFailed(ClientSessionFactoryImpl.java:1229) ~[artemis-core-client-2.10.1.jar!/:2.10.1]
at org.apache.activemq.artemis.spi.core.protocol.AbstractRemotingConnection.callFailureListeners(AbstractRemotingConnection.java:77) ~[artemis-core-client-2.10.1.jar!/:2.10.1]
at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:220) ~[artemis-core-client-2.10.1.jar!/:2.10.1]
at org.apache.activemq.artemis.spi.core.protocol.AbstractRemotingConnection.fail(AbstractRemotingConnection.java:220) ~[artemis-core-client-2.10.1.jar!/:2.10.1]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$CloseRunnable.run(ClientSessionFactoryImpl.java:1018) ~[artemis-core-client-2.10.1.jar!/:2.10.1]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) ~[artemis-commons-2.10.1.jar!/:2.10.1]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) ~[artemis-commons-2.10.1.jar!/:2.10.1]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) ~[artemis-commons-2.10.1.jar!/:2.10.1]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) ~[artemis-commons-2.10.1.jar!/:2.10.1]
Caused by: org.apache.activemq.artemis.api.core.ActiveMQDisconnectedException: AMQ219015: The connection was disconnected because of server shutdown
... 7 common frames omitted
2020-08-06 18:38:51.945 WARN 1 --- [DefaultMessageListenerContainer-2] o.s.j.l.DefaultMessageListenerContainer : Setup of JMS message listener invoker failed for destination 'xxx' - trying to recover. Cause: AMQ219010: Connection is destroyed
2020-08-06 18:39:21.965 ERROR 1 --- [DefaultMessageListenerContainer-2] o.s.j.l.DefaultMessageListenerContainer : Could not refresh JMS Connection for destination 'yyyy' - retrying using FixedBackOff{interval=5000, currentAttempts=0, maxAttempts=unlimited}. Cause: Failed to create session factory; nested exception is ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ219013: Timed out waiting to receive cluster topology. Group:null]
2020-08-06 18:39:52.983 ERROR 1 --- [DefaultMessageListenerContainer-2] o.s.j.l.DefaultMessageListenerContainer : Could not refresh JMS Connection for destination 'yyy' - retrying using FixedBackOff{interval=5000, currentAttempts=0, maxAttempts=unlimited}. Cause: Failed to create session factory; nested exception is ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ219013: Timed out waiting to receive cluster topology. Group:null]
2020-08-06 18:40:23.992 ERROR 1 --- [DefaultMessageListenerContainer-2] o.s.j.l.DefaultMessageListenerContainer : Could not refresh JMS Connection for destination 'yyyy' - retrying using FixedBackOff{interval=5000, currentAttempts=1, maxAttempts=unlimited}. Cause: Failed to create session factory; nested exception is ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ219013: Timed out waiting to receive cluster topology. Group:null]
Master Node now active (the backup went into passive mode):
2020-08-06 18:47:50.949 WARN 1 --- [Thread-5 (ActiveMQ-client-global-threads)] org.apache.activemq.artemis.core.client : AMQ212037: Connection failure to backup/backup:61616 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
2020-08-06 18:47:53.145 ERROR 1 --- [DefaultMessageListenerContainer-2] o.s.j.l.DefaultMessageListenerContainer : Could not refresh JMS Connection for destination 'yyyy' - retrying using FixedBackOff{interval=5000, currentAttempts=8, maxAttempts=unlimited}. Cause: Failed to create session factory; nested exception is ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ219013: Timed out waiting to receive cluster topology. Group:null]
2020-08-06 18:47:53.146 WARN 1 --- [Thread-5 (ActiveMQ-client-global-threads)] org.apache.activemq.artemis.core.client : AMQ212004: Failed to connect to server.
This error was not fixed even though the cluster had recovered:
2020-08-06 18:47:58.147 ERROR 1 --- [DefaultMessageListenerContainer-2] o.s.j.l.DefaultMessageListenerContainer : Could not refresh JMS Connection for destination 'yyyy' - retrying using FixedBackOff{interval=5000, currentAttempts=9, maxAttempts=unlimited}. Cause: Failed to create session factory; nested exception is ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ219024: Could not select a TransportConfiguration to create SessionFactory]
2020-08-06 18:48:53.160 ERROR 1 --- [DefaultMessageListenerContainer-2] o.s.j.l.DefaultMessageListenerContainer : Could not refresh JMS Connection for destination 'yyyy' - retrying using FixedBackOff{interval=5000, currentAttempts=20, maxAttempts=unlimited}. Cause: Failed to create session factory; nested exception is ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ219024: Could not select a TransportConfiguration to create SessionFactory]
I tried to reproduce the error but could not. Does anyone have ideas about the correct settings?
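For reference, the retry window implied by the client connection string (`retryInterval=1000`, `retryIntervalMultiplier=1.0`, `reconnectAttempts=10`) can be estimated with a small calculation. The sketch below is a rough model, not the client's actual implementation: the class and method names are hypothetical, the 2000 ms cap for `maxRetryInterval` is assumed because the connection string does not set it, and it does not model whether the client waits before the first attempt or after the last one.

```java
// Sketch only: estimates how long a client keeps retrying before giving up.
final class RetryWindowSketch {

    // Sums one wait per attempt. The first wait is retryIntervalMs; each later wait
    // is multiplied by `multiplier` and capped at maxRetryIntervalMs.
    static long totalWaitMs(long retryIntervalMs, double multiplier,
                            long maxRetryIntervalMs, int reconnectAttempts) {
        long total = 0;
        double interval = retryIntervalMs;
        for (int attempt = 0; attempt < reconnectAttempts; attempt++) {
            total += Math.min((long) interval, maxRetryIntervalMs);
            interval *= multiplier;
        }
        return total;
    }

    public static void main(String[] args) {
        // Values from the connection string; 2000 ms for maxRetryInterval is an assumption.
        long windowMs = totalWaitMs(1000, 1.0, 2000, 10);
        System.out.println("Approximate retry window: " + windowMs + " ms");
    }
}
```

Under these assumptions the window is about 10 seconds, which is in line with the roughly 9 seconds between the AMQ212037 and AMQ212005 messages in the APP 2 log, and far shorter than the roughly 10 minute outage. A larger `reconnectAttempts` value, or `-1` for unlimited attempts, would lengthen the window; whether that resolves the listener container errors was not verified here.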
When the Spring Cloud Data Flow http-source app pod starts on Kubernetes, I notice the following two messages in the console:
Connect Timeout Exception on Url - http://localhost:8888.
Could not locate PropertySource: I/O error on GET request for "http://localhost:8888/http-source/default": Connection refused (Connection refused); nested exception is java.net.ConnectException
How can I resolve this?
subscriber to the 'errorChannel' channel
2019-09-15 05:17:26.773 INFO 1 --- [ main] o.s.i.channel.PublishSubscribeChannel : Channel 'application-1.errorChannel' has 1 subscriber(s).
2019-09-15 05:17:26.774 INFO 1 --- [ main] o.s.i.endpoint.EventDrivenConsumer : started _org.springframework.integration.errorLogger
2019-09-15 05:17:27.065 INFO 1 --- [ main] c.c.c.ConfigServicePropertySourceLocator : Fetching config from server at : http://localhost:8888
2019-09-15 05:17:27.137 INFO 1 --- [ main] c.c.c.ConfigServicePropertySourceLocator : Connect Timeout Exception on Url - http://localhost:8888. Will be trying the next url if available
2019-09-15 05:17:27.141 WARN 1 --- [ main] c.c.c.ConfigServicePropertySourceLocator : Could not locate PropertySource: I/O error on GET request for "http://localhost:8888/http-source/default": Connection refused (Connection refused); nested exception is java.net.ConnectException: Connection refused (Connection refused)
If you look carefully, the first of the following messages is logged at INFO and the second at WARN; neither is an error.
Connect Timeout Exception on Url - http://localhost:8888.
Could not locate PropertySource: I/O error on GET request for "http://localhost:8888/http-source/default": Connection refused (Connection refused); nested exception is java.net.ConnectException
You'd see this WARN message for all the apps that we ship, as well as for the SCDF and Skipper servers that run on K8s. It means that the app, SCDF, or Skipper doesn't have a config-server configured, so the client falls back to the default http://localhost:8888.
Background: we provide the config-server dependency in all the apps that we ship to help you get started with it quickly.
If you don't use the config-server, that's fine; the message is harmless and there is nothing to worry about.
I am trying to connect MongoDB as a source to a Kafka server, but when I run the Debezium MongoDB source connector I get an error. I don't understand why it timed out.
[2019-08-22 13:28:58,194] INFO Cluster description not yet available. Waiting for 30000 ms before timing out (org.mongodb.driver.cluster:71)
[2019-08-22 13:28:58,648] INFO Exception in monitor thread while connecting to server morgan-shard-00-00-ayfai.mongodb.net:27017 (org.mongodb.driver.cluster:76)
com.mongodb.MongoSocketReadException: Prematurely reached end of stream
at com.mongodb.internal.connection.SocketStream.read(SocketStream.java:112)
at com.mongodb.internal.connection.InternalStreamConnection.receiveResponseBuffers(InternalStreamConnection.java:554)
at com.mongodb.internal.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:425)
at com.mongodb.internal.connection.InternalStreamConnection.receiveCommandMessageResponse(InternalStreamConnection.java:289)
at com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:255)
at com.mongodb.internal.connection.CommandHelper.sendAndReceive(CommandHelper.java:83)
at com.mongodb.internal.connection.CommandHelper.executeCommand(CommandHelper.java:33)
at com.mongodb.internal.connection.InternalStreamConnectionInitializer.initializeConnectionDescription(InternalStreamConnectionInitializer.java:106)
at com.mongodb.internal.connection.InternalStreamConnectionInitializer.initialize(InternalStreamConnectionInitializer.java:63)
at com.mongodb.internal.connection.InternalStreamConnection.open(InternalStreamConnection.java:127)
at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:117)
at java.lang.Thread.run(Thread.java:748)
[2019-08-22 13:29:08,196] INFO Created connector mongodb-source-connector (org.apache.kafka.connect.cli.ConnectStandalone:112)
[2019-08-22 13:29:28,195] ERROR Error while reading the 'shards' collection in the 'config' database: Timed out after 30000 ms while waiting to connect. Client view of cluster state is {type=UNKNOWN, servers=[{address=morgan-shard-00-00-ayfai.mongodb.net:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketReadException: Prematurely reached end of stream}}] (io.debezium.connector.mongodb.ReplicaSetDiscovery:80)
com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting to connect. Client view of cluster state is {type=UNKNOWN, servers=[{address=morgan-shard-00-00-ayfai.mongodb.net:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketReadException: Prematurely reached end of stream}}]
at com.mongodb.internal.connection.BaseCluster.getDescription(BaseCluster.java:179)
at com.mongodb.internal.connection.SingleServerCluster.getDescription(SingleServerCluster.java:41)
at com.mongodb.client.internal.MongoClientDelegate.getConnectedClusterDescription(MongoClientDelegate.java:136)
at com.mongodb.client.internal.MongoClientDelegate.createClientSession(MongoClientDelegate.java:94)
at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.getClientSession(MongoClientDelegate.java:249)
at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:172)
at com.mongodb.client.internal.MongoIterableImpl.execute(MongoIterableImpl.java:132)
at com.mongodb.client.internal.MongoIterableImpl.iterator(MongoIterableImpl.java:86)
at com.mongodb.client.internal.MappingIterable.iterator(MappingIterable.java:39)
at io.debezium.connector.mongodb.MongoUtil.contains(MongoUtil.java:183)
at io.debezium.connector.mongodb.MongoUtil.contains(MongoUtil.java:172)
at io.debezium.connector.mongodb.MongoUtil.onDatabase(MongoUtil.java:116)
at io.debezium.connector.mongodb.MongoUtil.onCollection(MongoUtil.java:131)
at io.debezium.connector.mongodb.MongoUtil.onCollectionDocuments(MongoUtil.java:150)
at io.debezium.connector.mongodb.ReplicaSetDiscovery.getReplicaSets(ReplicaSetDiscovery.java:67)
at io.debezium.connector.mongodb.ReplicaSetMonitorThread.run(ReplicaSetMonitorThread.java:65)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2019-08-22 13:29:28,195] INFO Cluster description not yet available. Waiting for 30000 ms before timing out (org.mongodb.driver.cluster:71)
[2019-08-22 13:29:58,196] ERROR Error while trying to get information about the replica sets (io.debezium.connector.mongodb.ReplicaSetMonitorThread:87)
com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting to connect. Client view of cluster state is {type=UNKNOWN, servers=[{address=morgan-shard-00-00-ayfai.mongodb.net:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketReadException: Prematurely reached end of stream}}]
at com.mongodb.internal.connection.BaseCluster.getDescription(BaseCluster.java:179)
at com.mongodb.internal.connection.SingleServerCluster.getDescription(SingleServerCluster.java:41)
at com.mongodb.Mongo.getClusterDescription(Mongo.java:412)
at com.mongodb.Mongo.getReplicaSetStatus(Mongo.java:455)
at io.debezium.connector.mongodb.ReplicaSetDiscovery.getReplicaSets(ReplicaSetDiscovery.java:85)
at io.debezium.connector.mongodb.ReplicaSetMonitorThread.run(ReplicaSetMonitorThread.java:65)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Without looking at the connector configuration it is impossible to know the exact issue, but in my experience with MongoDB Atlas and Debezium, it can be an issue with SSL.
Try enabling SSL:
mongodb.ssl.enabled: true
https://debezium.io/documentation/reference/1.2/connectors/mongodb.html#mongodb-property-mongodb-ssl-enabled
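To show where that setting sits among the other connector properties, here is a minimal sketch that builds the configuration as a `java.util.Properties` object. The property keys follow the Debezium 1.x MongoDB connector documentation linked above; the class and method names are hypothetical, and the replica set name and logical name are placeholders, since the question does not show the actual connector configuration.

```java
import java.util.Properties;

// Sketch only: a Debezium MongoDB connector configuration with SSL enabled.
final class DebeziumMongoConfigSketch {

    // replicaSetHosts uses the "replicaSetName/host:port" form expected by mongodb.hosts.
    static Properties connectorProperties(String replicaSetHosts, String logicalName) {
        Properties props = new Properties();
        props.setProperty("name", "mongodb-source-connector"); // connector name from the log
        props.setProperty("connector.class", "io.debezium.connector.mongodb.MongoDbConnector");
        props.setProperty("mongodb.hosts", replicaSetHosts);
        props.setProperty("mongodb.name", logicalName);
        props.setProperty("mongodb.ssl.enabled", "true"); // the setting suggested above
        return props;
    }

    public static void main(String[] args) {
        // "<replica-set-name>" and "morgan" are placeholders, not values from the question.
        Properties props = connectorProperties(
                "<replica-set-name>/morgan-shard-00-00-ayfai.mongodb.net:27017", "morgan");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The printed key=value lines have the same shape as a standalone connector properties file, so they can be compared against the file actually passed to Kafka Connect.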
In an attempt to reduce the storage on my AWS instance, I decided to launch a new, smaller instance and set up Kafka again from scratch using the Ansible playbook we had from before. I then terminated the old, larger instance and moved its IP address, which the other brokers were configured with, to my new instance.
When tailing my ZooKeeper logs, however, I see this error:
2018-04-13 14:17:34,884 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker#810] - Connection broken for id 1, my id = 2, error =
java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:153)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.net.SocketInputStream.read(SocketInputStream.java:211)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:795)
2018-04-13 14:17:34,885 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker#813] - Interrupting SendWorker
2018-04-13 14:17:34,884 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker#727] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:879)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:65)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:715)
I double-checked, and all 3 Kafka broker IP addresses are correctly listed in these locations; I also restarted all their services to be safe.
/etc/hosts
/etc/kafka/config/server.properties
/etc/zookeeper/conf/zoo.cfg
/etc/filebeat/filebeat.yml