MongoDB Debezium Fails to connect due to ssl handshake failure - apache-kafka

I'm running a MongoDB Debezium Kafka Connector on AWS MSK, and the connector goes to the failed status with this error on the MongoDB server Error receiving request from client: SSLHandshakeFailed: The server is configured to only allow SSL connections and com.mongodb.MongoSocketReadException: Prematurely reached end of stream in the debezium logs.
Below is my debezium configuration, and I have enabled mongodb.ssl.enabled=true.
Does anybody know if I'm missing something from the configuration?
I also enabled the mongodb.ssl.invalid.hostname.allowed but that didn't fix the issue
connector.class=io.debezium.connector.mongodb.MongoDbConnector
mongodb.ssl.enabled=true
collection.include.list=***
mongodb.password=***
tasks.max=2
mongodb.user=***
mongodb.ssl.invalid.hostname.allowed=true
mongodb.hosts=***
database.include.list=***
Debezium stack trace:
at
com.mongodb.connection.BaseCluster.getDescription(BaseCluster.java:160)
at com.mongodb.Mongo.getClusterDescription(Mongo.java:378) at
com.mongodb.Mongo.getReplicaSetStatus(Mongo.java:414) at
io.debezium.connector.mongodb.ConnectionContext.clientForPrimary(ConnectionContext.java:335)
at
io.debezium.connector.mongodb.ConnectionContext.lambda$primaryClientFor$1(ConnectionContext.java:179)
at
io.debezium.connector.mongodb.ConnectionContext.lambda$primaryClientFor$2(ConnectionContext.java:188)
at
io.debezium.connector.mongodb.ConnectionContext$MongoPrimary.execute(ConnectionContext.java:258)
at
io.debezium.connector.mongodb.ConnectionContext$MongoPrimary.databaseNames(ConnectionContext.java:296)
at
io.debezium.connector.mongodb.MongoDbConnectorConfig$DatabaseRecommender.lambda$validValues$1(MongoDbConnectorConfig.java:239)
at java.base/java.util.HashMap$Values.forEach(HashMap.java:977) at
io.debezium.connector.mongodb.ReplicaSets.onEachReplicaSet(ReplicaSets.java:102)
at
io.debezium.connector.mongodb.MongoDbConnectorConfig$DatabaseRecommender.validValues(MongoDbConnectorConfig.java:236)
at io.debezium.config.Field.validate(Field.java:567) at
io.debezium.config.Field.lambda$validate$7(Field.java:583) at
java.base/java.util.Arrays$ArrayList.forEach(Arrays.java:4390) at
io.debezium.config.Field.validate(Field.java:580) at
io.debezium.config.Configuration.lambda$validate$25(Configuration.java:1653)
at
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at
java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at
java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at
java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at
java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at io.debezium.config.Field$Set.forEachTopLevelField(Field.java:127)
at io.debezium.config.Configuration.validate(Configuration.java:1652)
at
io.debezium.connector.mongodb.MongoDbConnector.validate(MongoDbConnector.java:194)
at
org.apache.kafka.connect.runtime.AbstractHerder.validateConnectorConfig(AbstractHerder.java:375)
at
org.apache.kafka.connect.runtime.AbstractHerder.lambda$validateConnectorConfig$1(AbstractHerder.java:326)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829) [2022-04-14
03:41:56,279] INFO Closing all connections to
(io.debezium.connector.mongodb.ConnectionContext:75) [2022-04-14 03:41:56,280] ERROR Uncaught exception in REST call to /connectors
(org.apache.kafka.connect.runtime.rest.errors.ConnectExceptionMapper:61)
org.apache.kafka.connect.errors.ConnectException: Unable to connect to
primary node of 'atlas-:27017' after 2 failed attempts

Related

Debezium connector fails after "Searching for WAL resume position"

I'm using debezium postgres connector to capture few tables' record in kafka. The debezium connector closed after "Searching for WAL resume position". I've give the error below. Any help would be greatly appreciated.
2022-12-12 07:46:51,834 INFO Postgres|postgres|streaming Searching for WAL resume position [io.debezium.connector.postgresql.PostgresStreamingChangeEventSource]
2022-12-12 07:46:56,548 ERROR || WorkerSourceTask{id=ksqldb-connector-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted [org.apache.kafka.connect.runtime.WorkerTask]
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:223)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:149)
at org.apache.kafka.connect.runtime.AbstractWorkerSourceTask.convertTransformedRecord(AbstractWorkerSourceTask.java:474)
at org.apache.kafka.connect.runtime.AbstractWorkerSourceTask.sendRecords(AbstractWorkerSourceTask.java:387)
at org.apache.kafka.connect.runtime.AbstractWorkerSourceTask.execute(AbstractWorkerSourceTask.java:354)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:244)
at org.apache.kafka.connect.runtime.AbstractWorkerSourceTask.run(AbstractWorkerSourceTask.java:72)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.apicurio.registry.rest.client.exception.RestClientException
at io.apicurio.registry.rest.client.impl.ErrorHandler.parseError(ErrorHandler.java:95)
at io.apicurio.rest.client.JdkHttpClient.sendRequest(JdkHttpClient.java:205)
at io.apicurio.registry.rest.client.impl.RegistryClientImpl.createArtifact(RegistryClientImpl.java:240)
at io.apicurio.registry.rest.client.RegistryClient.createArtifact(RegistryClient.java:143)
at io.apicurio.registry.resolver.DefaultSchemaResolver.lambda$handleAutoCreateArtifact$2(DefaultSchemaResolver.java:236)
at io.apicurio.registry.resolver.ERCache.lambda$getValue$0(ERCache.java:142)
at io.apicurio.registry.resolver.ERCache.retry(ERCache.java:181)
at io.apicurio.registry.resolver.ERCache.getValue(ERCache.java:141)
at io.apicurio.registry.resolver.ERCache.getByContent(ERCache.java:121)
at io.apicurio.registry.resolver.DefaultSchemaResolver.handleAutoCreateArtifact(DefaultSchemaResolver.java:234)
at io.apicurio.registry.resolver.DefaultSchemaResolver.getSchemaFromRegistry(DefaultSchemaResolver.java:115)
at io.apicurio.registry.resolver.DefaultSchemaResolver.resolveSchema(DefaultSchemaResolver.java:88)
at io.apicurio.registry.utils.converter.ExtJsonConverter.fromConnectData(ExtJsonConverter.java:97)
at org.apache.kafka.connect.runtime.AbstractWorkerSourceTask.lambda$convertTransformedRecord$5(AbstractWorkerSourceTask.java:474)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:173)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:207)
... 12 more
2022-12-12 07:46:56,548 INFO || Stopping down connector [io.debezium.connector.common.BaseSourceTask]
2022-12-12 07:46:56,585 INFO Postgres|postgres|streaming WAL resume position 'null' discovered [io.debezium.connector.postgresql.PostgresStreamingChangeEventSource]
2022-12-12 07:46:56,588 INFO Postgres|postgres|streaming Connection gracefully closed [io.debezium.jdbc.JdbcConnection]
2022-12-12 07:46:56,691 INFO Postgres|postgres|streaming Connection gracefully closed [io.debezium.jdbc.JdbcConnection]
Actually it was Apicurio error. The error was all the container was running in one network in which Apicurio was not added. I added Apicurio in the network , then it got resolved.

Kafka-Kinesis-Connector Commit of offsets threw an unexpected exception for sequence number: null

I have a java project which is using Kafka-Kinesis-Connector as a connector used with Kafka Connect to publish messages from Kafka to Amazon Kinesis Stream which in turn triggers Lambda. The service we had was using kafka kinesis client 1.7.3 lib and amazon kinesis producer v 0.14.7. In order to migrate to latest version of AWS SDK Java V1 , these two libs were updated to versions 1.14.8 and
[2022-10-16 21:59:15,481] ERROR WorkerSinkTask{id=fm-sync-loads-analytics-nprod-test-01-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Error: The child process has been shutdown and can no longer accept messages. (org.apache.kafka.connect.runtime.WorkerSinkTask)
com.amazonaws.services.kinesis.producer.DaemonException: The child process has been shutdown and can no longer accept messages.
at com.amazonaws.services.kinesis.producer.Daemon.add(Daemon.java:173)
at com.amazonaws.services.kinesis.producer.KinesisProducer.addUserRecord(KinesisProducer.java:625)
at com.amazonaws.services.kinesis.producer.KinesisProducer.addUserRecord(KinesisProducer.java:535)
at com.amazonaws.services.kinesis.producer.KinesisProducer.addUserRecord(KinesisProducer.java:411)
at com.amazon.kinesis.kafka.AmazonKinesisSinkTask.addUserRecord(AmazonKinesisSinkTask.java:235)
at com.amazon.kinesis.kafka.AmazonKinesisSinkTask.put(AmazonKinesisSinkTask.java:143)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:545)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:200)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)
[2022-10-16 21:59:15,483] WARN WorkerSinkTask{id=fm-sync-loads-analytics-nprod-test-01-0} Offset commit failed during close (org.apache.kafka.connect.runtime.WorkerSinkTask)
[2022-10-16 21:59:15,483] ERROR WorkerSinkTask{id=fm-sync-loads-analytics-nprod-test-01-0} Commit of offsets threw an unexpected exception for sequence number 1: null (org.apache.kafka.connect.runtime.WorkerSinkTask)
com.amazonaws.services.kinesis.producer.DaemonException: The child process has been shutdown and can no longer accept messages.
at com.amazonaws.services.kinesis.producer.Daemon.add(Daemon.java:173)
at com.amazonaws.services.kinesis.producer.KinesisProducer.flush(KinesisProducer.java:916)
at com.amazonaws.services.kinesis.producer.KinesisProducer.flush(KinesisProducer.java:936)
at com.amazonaws.services.kinesis.producer.KinesisProducer.flushSync(KinesisProducer.java:962)
at com.amazon.kinesis.kafka.AmazonKinesisSinkTask.lambda$flush$0(AmazonKinesisSinkTask.java:108)
at java.base/java.util.HashMap$Values.forEach(HashMap.java:981)
at com.amazon.kinesis.kafka.AmazonKinesisSinkTask.flush(AmazonKinesisSinkTask.java:106)
at org.apache.kafka.connect.sink.SinkTask.preCommit(SinkTask.java:125)
at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:382)
at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:597)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)
[2022-10-16 21:59:16,452] ERROR WorkerSinkTask{id=fm-sync-loads-analytics-nprod-test-01-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:567)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:200)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: com.amazonaws.services.kinesis.producer.DaemonException: The child process has been shutdown and can no longer accept messages.

Connection refused to Schema Registry

I have installed the new version of confluent i.e 5.4 and since after that I am unable to connect to the confluent, my schema registry also gets terminated untimely.
Today when I started the confluent and tried to produce the data, I recieved the following error:
2020-03-05 12:25:00,453] ERROR Failed to send HTTP request to endpoint: http://localhost:8081/subjects/avro-key/versions (io.confluent.kafka.schemaregistry.client.rest.RestService:245)
java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.Socket.connect(Socket.java:609)
at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:341)
at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:362)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1248)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)
at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1015)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1362)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1337)
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:241)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:322)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:422)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:414)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:400)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:140)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:196)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:172)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:71)
at io.confluent.kafka.formatter.AvroMessageReader.readMessage(AvroMessageReader.java:199)
at kafka.tools.ConsoleProducer$.main(ConsoleProducer.scala:55)
at kafka.tools.ConsoleProducer.main(ConsoleProducer.scala)'
Updated the question with the Schema-registry logs:
INFO Logging initialized #865ms to org.eclipse.jetty.util.log.Slf4jLog (org.eclipse.jetty.util.log:169)
[2020-03-09 12:35:51,851] INFO Adding listener: http://0.0.0.0:8081 (io.confluent.rest.ApplicationServer:316)
[2020-03-09 12:35:52,366] INFO Created schema registry namespace localhost:2181 /schema_registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryConfig:709)
[2020-03-09 12:35:53,329] INFO Initializing KafkaStore with broker endpoints: PLAINTEXT://LAP-LIN-897:9092 (io.confluent.kafka.schemaregistry.storage.KafkaStore:108)
[2020-03-09 12:38:03,215] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication:77)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: Error initializing kafka store while initializing schema registry
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:248)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.initSchemaRegistry(SchemaRegistryRestApplication.java:75)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.configureBaseApplication(SchemaRegistryRestApplication.java:90)
at io.confluent.rest.Application.configureHandler(Application.java:217)
at io.confluent.rest.ApplicationServer.doStart(ApplicationServer.java:185)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:43)
Caused by: io.confluent.kafka.schemaregistry.storage.exceptions.StoreInitializationException: Timed out trying to create or validate schema topic configuration
at io.confluent.kafka.schemaregistry.storage.KafkaStore.createOrVerifySchemaTopic(KafkaStore.java:177)
at io.confluent.kafka.schemaregistry.storage.KafkaStore.init(KafkaStore.java:119)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:246)
... 6 more
Caused by: java.util.concurrent.TimeoutException
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:108)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at io.confluent.kafka.schemaregistry.storage.KafkaStore.createOrVerifySchemaTopic(KafkaStore.java:170)
... 8 more
'

Postgres JDBC client getting stuck at reading from socket

I have a PostGIS database and a client built on top of HikariCP to read the data from a database. My client can read the data without any problem on some machines. However, on some other machines client gets stuck and is not able to read any data throwing socket timeout exceptions.
MyClass:120 - Failed to execute HikariProxyPreparedStatement#2091541230 wrapping <my-query>.
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:332)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:155)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
...
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:140)
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:109)
at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:67)
at org.postgresql.core.PGStream.receiveChar(PGStream.java:293)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1947)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:306)
... 32 more
ProxyConnection:161 - HikariPool-1 - Connection org.postgresql.jdbc.PgConnection#1aafd32f marked as broken because of SQLSTATE(08006), ErrorCode(0)
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:332)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:155)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
...
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:140)
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:109)
at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:67)
at org.postgresql.core.PGStream.receiveChar(PGStream.java:293)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1947)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:306)
... 31 more
Before client throws SocketTimeoutException on the database side, I monitored pg_stat_activity table. The corresponding row for the query above had wait_event_type=Client and wait_event=ClientWrite. In addition, database server logged messages indicating connection is lost.
LOG: unexpected EOF on client connection with an open transaction
LOG: could not send data to client: Connection timed out
FATAL: connection to client lost
Versions
PostGIS-jdbc: 2.2.1 (postgresql jdbc: 9.4.1208.jre7)
HikariCP: 3.1.0
Postgres server: 10.3
PostGIS server: 2.4.4
If I don't set socketTimeout through jdbc connection string, then connection would get stuck forever. Once the connection reaches it's max life time, it would be dropped and connected again. However, it still cannot read the data. When I set socketTimeout, then exception would be thrown.
UPDATE
If socketTimeout is not set, then pg_stat_activity table would have a row for the connection with the following values: state=idle in transaction, wait_event_type=Client and wait_event=ClientRead.
My guess is that some sort of network setting is blocking the read from the server on the client side. How can I further debug this and find the root cause?
We found out that this was caused by the database server's MTU setting. MTU was set to 9000 by default and resulted in packet loss. Changing it to 1500 resolved the issue.

Schema Registry won't start after upgrading to Confluent 4.1

I have recently upgraded Confluent to 4.1 but schema registry seems to have some issues. On confluent start schema-registry (and consequently ksql-server) cannot start.
Here's the error I get in the logs of schema-registry:
[2018-04-20 11:27:38,426] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication:65)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: Error initializing kafka store while initializing schema registry
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:203)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:63)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:41)
at io.confluent.rest.Application.createServer(Application.java:165)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:43)
Caused by: io.confluent.kafka.schemaregistry.storage.exceptions.StoreInitializationException: io.confluent.kafka.schemaregistry.storage.exceptions.StoreException: Failed to write Noop record to kafka store.
at io.confluent.kafka.schemaregistry.storage.KafkaStore.init(KafkaStore.java:139)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:201)
... 4 more
Caused by: io.confluent.kafka.schemaregistry.storage.exceptions.StoreException: Failed to write Noop record to kafka store.
at io.confluent.kafka.schemaregistry.storage.KafkaStore.getLatestOffset(KafkaStore.java:423)
at io.confluent.kafka.schemaregistry.storage.KafkaStore.waitUntilKafkaReaderReachesLastOffset(KafkaStore.java:276)
at io.confluent.kafka.schemaregistry.storage.KafkaStore.init(KafkaStore.java:137)
... 5 more
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:77)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
at io.confluent.kafka.schemaregistry.storage.KafkaStore.getLatestOffset(KafkaStore.java:418)
... 7 more
Caused by: org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
[2018-04-20 11:27:38,430] INFO Shutting down schema registry (io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry:726)
[2018-04-20 11:27:38,430] INFO [kafka-store-reader-thread-_schemas]: Shutting down (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread:66)
[2018-04-20 11:27:38,431] INFO [kafka-store-reader-thread-_schemas]: Stopped (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread:66)
[2018-04-20 11:27:38,440] INFO [kafka-store-reader-thread-_schemas]: Shutdown completed (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread:66)
[2018-04-20 11:27:38,446] INFO KafkaStoreReaderThread shutdown complete. (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread:227)
I have no clue why this error is reported and the error messages are not that meaningful to me.
After failing, confluent start schema-registry and confluent start ksql-server bring both services up but when starting KSQL I get the following warning:
**************** WARNING ******************
Remote server address may not be valid:
Error issuing GET to KSQL server
Caused by: java.net.ConnectException: Connection refused (Connection refused)
Caused by: Could not connect to the server.
*******************************************
When trying to run a command (e.g. show tables;) the following error is reported:
ksql> show tables;
Error issuing POST to KSQL server
Caused by: java.net.ConnectException: Connection refused (Connection refused)
Caused by: Could not connect to the server.
EDIT: I've fixed this by destroying current run (confluent destroy) but it would be interesting if someone could explain this issue.
From the info you've posted it feels like you may have had some zombie processes or bad data somewhere, though I can't be sure.
The Schema Registry was complaining that it couldn't write a message to Kafka, because the Kafka broker was complaining that it didn't own the topic partition the Schema Registry was writing to. This might of been caused by a previous Kafka broker, (from the old install), still running.
Did you confluent stop before upgrading?
Using confluent destroy, as you did, to flatted/reset the installation is always a good option, as long as you're not precious about your data. Checking for spurious processes, (or using the old 'reboot machine' trick), can also be a good place to start when things aren't behaving as you'd expect.
Glad its all sorted now :D
Andy