Thingsboard AWS server freezes - postgresql

I want to run a self-managed ThingsBoard on AWS (t2.micro).
I have installed ThingsBoard CE on a t2.micro AWS instance running Ubuntu 20.04 server.
I followed the AWS setup and Ubuntu install guides (PostgreSQL + built-in queue service).
I also set up HAProxy using this guide.
I was able to successfully log in to my ThingsBoard. I only changed the passwords and checked the basic functionality, but didn't create any new dashboards or make any modifications.
After this I left the machine on, running ThingsBoard. The next day I could not reach ThingsBoard, and although the AWS instance was running I could not SSH into it anymore. After stopping and starting the instance (a reboot didn't work) everything was OK again (I could SSH in and ThingsBoard was reachable).
I can reproduce this failure just by leaving the instance on: it seems that after several hours (5-8 hrs) ThingsBoard (or something else, I'm not sure) fails, which freezes the whole machine.
I have checked two things:
I checked CPU utilization in AWS monitoring.
It seems that after some hours there is a big jump in CPU load, and then it drops back to almost zero. While ThingsBoard is running it stays roughly constant. See the screenshot from AWS monitoring.
I checked the ThingsBoard logs (in /var/log/thingsboard):
There are some errors, but unfortunately most of them are not enough for me to guess what the problem with the fresh installation could be. Here are some lines from the log:
2021-11-12 00:21:59,626 [http-nio-0.0.0.0-8080-exec-13] INFO o.a.coyote.http11.Http11Processor - Error parsing HTTP request header
Note: further occurrences of HTTP request parsing errors will be logged at DEBUG level.
java.lang.IllegalArgumentException: Invalid character found in method name
[0x160x030x010x00{0x010x000x00w0x030x030x170xb80xb80xe50xef0x000xb50x0a&0x930x020x00:0xde0xd70xa00xab0xb
70x8bU0xc00x92r0x9330x10O0x8c<o0xf70xf90x000x000x1a0xc0/0xc0+0xc00x110xc00x070xc00x130xc00x090xc00x140xc00x0a0x000x050x00/0x0050xc00x120x000x0a0x010x000x0040x000x050x000x050x010x000
x000x000x000x000x0a0x000x080x000x060x000x170x000x180x000x190x000x0b0x000x020x010x000x000x0d0x000x100x000x0e0x040x010x040x030x020x010x020x030x040x010x050x010x060x010xff0x010x000x010x00...].
HTTP method names must be tokens
at org.apache.coyote.http11.Http11InputBuffer.parseRequestLine(Http11InputBuffer.java:417)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:261)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:893)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1707)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.base/java.lang.Thread.run(Thread.java:829)
2021-11-12 00:22:01,486 [sql-queue-2-ts-4-thread-1] WARN com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#4393afd0 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
2021-11-12 00:22:01,487 [sql-queue-2-ts latest-8-thread-1] WARN com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#75b9496b (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
2021-11-12 00:22:01,487 [sql-queue-0-ts latest-6-thread-1] WARN com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#31849eec (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
2021-11-12 00:22:01,487 [sql-queue-0-ts-2-thread-1] WARN com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#725fafe3 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
Some more:
2021-11-12 00:23:46,205 [sql-log-1-thread-1] INFO o.t.s.dao.sql.TbSqlBlockingQueue - Queue-2 [TS Latest] queueSize [9] totalAdded [0] totalSaved [0] totalFailed [0]
2021-11-12 00:23:47,741 [sql-queue-0-ts-2-thread-1] WARN o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:47,742 [sql-queue-2-ts-4-thread-1] WARN o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:47,742 [sql-queue-2-ts latest-8-thread-1] WARN o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:47,742 [sql-queue-0-ts latest-6-thread-1] WARN o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:48,022 [sql-queue-0-ts-2-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 634223ms.
2021-11-12 00:23:48,058 [sql-queue-0-ts-2-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.
2021-11-12 00:23:48,022 [sql-queue-0-ts latest-6-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 634223ms.
2021-11-12 00:23:48,059 [sql-queue-0-ts latest-6-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.
2021-11-12 00:23:48,022 [sql-queue-2-ts latest-8-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 624177ms.
2021-11-12 00:23:48,059 [sql-queue-2-ts latest-8-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.
2021-11-12 00:23:48,023 [sql-queue-2-ts-4-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 627819ms.
2021-11-12 00:23:48,059 [sql-queue-2-ts-4-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.
And finally:
2021-11-12 00:33:10,919 [sql-queue-0-ts latest-6-thread-1] ERROR o.t.s.dao.sql.TbSqlBlockingQueue - [TS Latest] Failed to save 1 entities
org.springframework.transaction.CannotCreateTransactionException: Could not open JPA EntityManager for transaction; nested exception is org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:448)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.startTransaction(AbstractPlatformTransactionManager.java:400)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:373)
at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:574)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:361)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:118)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:692)
at org.thingsboard.server.dao.sqlts.insert.latest.psql.PsqlLatestInsertTsRepository$$EnhancerBySpringCGLIB$$381b448c.saveOrUpdate(<generated>)
at org.thingsboard.server.dao.sqlts.SqlTimeseriesLatestDao.lambda$init$3(SqlTimeseriesLatestDao.java:133)
at org.thingsboard.server.dao.sql.TbSqlBlockingQueue.lambda$init$2(TbSqlBlockingQueue.java:71)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:48)
at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:42)
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113)
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:111)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:138)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getConnectionForTransactionManagement(LogicalConnectionManagedImpl.java:276)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.begin(LogicalConnectionManagedImpl.java:284)
at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.begin(JdbcResourceLocalTransactionCoordinatorImpl.java:246)
at org.hibernate.engine.transaction.internal.TransactionImpl.begin(TransactionImpl.java:83)
at org.springframework.orm.jpa.vendor.HibernateJpaDialect.beginTransaction(HibernateJpaDialect.java:184)
at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:402)
... 16 common frames omitted
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 634223ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:695)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:197)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:128)
at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122)
at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:108)
... 23 common frames omitted
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:877)
at org.postgresql.jdbc.PgConnection.setNetworkTimeout(PgConnection.java:1610)
at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:560)
at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:173)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
... 28 common frames omitted
What is interesting is that the timestamps of the CPU spikes don't precisely correlate with the error messages in the log.
I apologise for the long error messages, but right now I don't know what the root cause could be.
I haven't tried to reinstall the whole machine yet.
My question would be: how should I proceed? Has anyone ever faced similar issues? What logs/services/etc. should I check to grasp the root cause?
Should I try a machine with more resources? Should I try a different database and queue service?
In its current form this ThingsBoard instance is not stable even for tests.
Edit: Sorry, I could not properly format the first part of the error log.
Edit 2: The first link was wrong.

Here are some points:
It looks like the operating system ran out of memory and became unresponsive. To fix the issue, try to limit the Java heap memory (see the sketch after these points).
For a 4 GB instance a Java heap limit such as JAVA_OPTS="$JAVA_OPTS -Xms1024M -Xmx1024M" may be useful, because Java uses some non-heap memory as well, and PostgreSQL and other services also need memory to run.
t2 instances on AWS may slow the whole thing down through CPU throttling. Instances like c6 or m5 are better options performance-wise.
In-memory queues may lead to out-of-memory issues and data loss in case of a heavy message rate or processing congestion caused by a third party. Consider using Kafka to make your installation much more stable and solid.
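To make the first two points concrete, here is a minimal sketch assuming a Ubuntu .deb install; the config path /usr/share/thingsboard/conf/thingsboard.conf, the journald flags and the exact OOM log wording are assumptions, so adjust them to your setup.
# 1) Confirm the out-of-memory hypothesis after the instance comes back up.
#    Reading the previous boot's kernel log requires persistent journald storage;
#    grepping /var/log/syslog is a fallback on Ubuntu.
journalctl -b -1 -k | grep -iE 'out of memory|oom-killer'
grep -i 'out of memory' /var/log/syslog
free -h    # current RAM/swap headroom
# 2) Cap the ThingsBoard JVM heap so PostgreSQL and the OS keep some memory.
#    Path assumed for a package install; add or adjust the JAVA_OPTS line.
sudo nano /usr/share/thingsboard/conf/thingsboard.conf
#    export JAVA_OPTS="$JAVA_OPTS -Xms1024M -Xmx1024M"
sudo systemctl restart thingsboard
On a 1 GB t2.micro even a capped heap may be too tight, which matches the fact that simply adding RAM (see the follow-up below) made the freezes disappear.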

After I increased the RAM to 4 GB (from 1 GB), the ThingsBoard server has been up without problems and there are no sporadic freezes anymore. As there were no other provable suggestions for the problem and my system now works without issues, I consider the question answered.

Related

wildfly failed to start up due to exception in infinispan

We are seeing the following error, which is only happening on a few testbeds:
18-Jan-2023 15:26:15,846 CST WARN [TCP] (TQ-Bundler-7,ejb,cdada7bd-7d38-41d0-afa9-9b820c587a29) JGRP000032: cdada7bd-7d38-41d0-afa9-9b820c587a29: no physical address for f4315409-4d0f-148d-8d7d-8fbc74f11179, dropping message
18-Jan-2023 15:26:20,774 CST WARN [ClusterTopologyManagerImpl] (MSC service thread 1-5) ISPN000329: Unable to read rebalancing status from coordinator 8096d1bd-2e66-4fd0-9a98-ac78dfd9d171: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 10 from 8096d1bd-2e66-4fd0-9a98-ac78dfd9d171
at org.inf...#9.4.18.Final//org.infinispan.remoting.transport.impl.SingleTargetRequest.onTimeout(SingleTargetRequest.java:65)
at org.inf...#9.4.18.Final//org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
at org.inf...#9.4.18.Final//org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
We also noticed that sometimes this issue went away after a reboot, but other times it stayed the same.
The WildFly version is 19.1.0.
Can someone shed some light on this issue?
Thanks.

Hikari Connection Pool - org.postgresql.util.PSQLException: This ResultSet is closed

I am trying to create a Hikari connection pool with a Postgres database.
The Hikari version is 3.4.1 and the Postgres driver version is 42.2.5.
My connection properties are as follows:
mdb.hcp.driverClassName=org.postgresql.Driver
mdb.hcp.username=user1
mdb.hcp.jdbcUrl=jdbc:postgresql://localhost:2001/db1?ssl=false
mdb.hcp.password=pass
mdb.hcp.maximumPoolSize=1
mdb.contractSchema=schema1
I am getting the exception below:
06-12-2019 17:40:08.181 [main] INFO com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Driver does not support get/set network timeout for connections. (Method org.postgresql.jdbc4.Jdbc4Connection.getNetworkTimeout() is not yet implemented.)
06-12-2019 17:40:08.649 [main] WARN com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Default transaction isolation level detection failed (This ResultSet is closed.).
06-12-2019 17:40:08.656 [main] ERROR com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Error thrown while acquiring connection from data source
org.postgresql.util.PSQLException: This ResultSet is closed.
at org.postgresql.jdbc2.AbstractJdbc2ResultSet.checkClosed(AbstractJdbc2ResultSet.java:2852)
at org.postgresql.jdbc2.AbstractJdbc2ResultSet.setFetchSize(AbstractJdbc2ResultSet.java:1875)
at org.postgresql.jdbc4.Jdbc4Statement.createResultSet(Jdbc4Statement.java:37)
at org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler.handleResultRows(AbstractJdbc2Statement.java:221)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1853)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:561)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:405)
at org.postgresql.jdbc2.AbstractJdbc2Connection.execSQLUpdate(AbstractJdbc2Connection.java:382)
at org.postgresql.jdbc2.AbstractJdbc2Connection.getTransactionIsolation(AbstractJdbc2Connection.java:904)
at com.zaxxer.hikari.pool.PoolBase.checkDefaultIsolation(PoolBase.java:471)
at com.zaxxer.hikari.pool.PoolBase.checkDriverSupport(PoolBase.java:434)
at com.zaxxer.hikari.pool.PoolBase.setupConnection(PoolBase.java:402)
at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:355)
at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:201)
at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:473)
at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:562)
at com.zaxxer.hikari.pool.HikariPool.(HikariPool.java:115)
at com.zaxxer.hikari.HikariDataSource.(HikariDataSource.java:81)
Please help!
Try switching to a newer Postgres driver version: https://mvnrepository.com/artifact/org.postgresql/postgresql
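One thing worth noting: the stack trace goes through org.postgresql.jdbc2/jdbc4 classes, which belong to much older drivers than 42.2.5, so an old jar may be shadowing the one you declared. If the project is built with Maven, a quick (hedged) way to see which org.postgresql artifacts actually end up on the classpath is:
# Lists every resolved org.postgresql dependency, including transitive ones
# that may pull in an outdated driver.
mvn dependency:tree -Dincludes=org.postgresql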

Postgres JDBC client getting stuck at reading from socket

I have a PostGIS database and a client built on top of HikariCP to read data from the database. My client can read the data without any problem on some machines. However, on some other machines the client gets stuck and is not able to read any data, throwing socket timeout exceptions.
MyClass:120 - Failed to execute HikariProxyPreparedStatement#2091541230 wrapping <my-query>.
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:332)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:155)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
...
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:140)
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:109)
at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:67)
at org.postgresql.core.PGStream.receiveChar(PGStream.java:293)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1947)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:306)
... 32 more
ProxyConnection:161 - HikariPool-1 - Connection org.postgresql.jdbc.PgConnection#1aafd32f marked as broken because of SQLSTATE(08006), ErrorCode(0)
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:332)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:155)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
...
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:140)
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:109)
at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:67)
at org.postgresql.core.PGStream.receiveChar(PGStream.java:293)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1947)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:306)
... 31 more
Before the client throws the SocketTimeoutException, I monitored the pg_stat_activity table on the database side. The corresponding row for the query above had wait_event_type=Client and wait_event=ClientWrite. In addition, the database server logged messages indicating the connection was lost.
LOG: unexpected EOF on client connection with an open transaction
LOG: could not send data to client: Connection timed out
FATAL: connection to client lost
Versions
PostGIS-jdbc: 2.2.1 (postgresql jdbc: 9.4.1208.jre7)
HikariCP: 3.1.0
Postgres server: 10.3
PostGIS server: 2.4.4
If I don't set socketTimeout through the JDBC connection string, the connection gets stuck forever. Once the connection reaches its max lifetime, it is dropped and reconnected; however, it still cannot read the data. When I do set socketTimeout, the exception above is thrown.
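(For reference, a hedged example of such a connection string; the host, database and the 30-second value are placeholders, and in the PostgreSQL JDBC driver socketTimeout is given in seconds:)
jdbc:postgresql://dbhost:5432/gisdb?socketTimeout=30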
UPDATE
If socketTimeout is not set, the pg_stat_activity table has a row for the connection with the following values: state=idle in transaction, wait_event_type=Client and wait_event=ClientRead.
My guess is that some network setting is blocking the read from the server on the client side. How can I further debug this and find the root cause?
We found out that this was caused by the database server's MTU setting. The MTU was set to 9000 by default, which resulted in packet loss. Changing it to 1500 resolved the issue.
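For anyone debugging the same symptom, a rough sketch of how the interface MTU can be inspected and tested on Linux; eth0 and the 8972-byte payload (9000 minus 28 bytes of IP/ICMP headers) are assumptions for a typical jumbo-frame setup:
ip link show eth0                    # shows the current MTU
ping -M do -s 8972 -c 3 DB_HOST      # do 9000-byte, don't-fragment packets survive the path? (DB_HOST is a placeholder)
sudo ip link set dev eth0 mtu 1500   # temporary change; persist it in your network configuration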

Zookeeper unable to talk to new Kafka broker

In an attempt to reduce the storage on my AWS instance, I decided to launch a new, smaller instance and set up Kafka again from scratch using the Ansible playbook we had from before. I then terminated the old, larger instance and assigned the IP address that it and the other brokers had been using to my new instance.
When tailing my Zookeeper logs, however, I'm seeing this error:
2018-04-13 14:17:34,884 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker#810] - Connection broken for id 1, my id = 2, error =
java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:153)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.net.SocketInputStream.read(SocketInputStream.java:211)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:795)
2018-04-13 14:17:34,885 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker#813] - Interrupting SendWorker
2018-04-13 14:17:34,884 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker#727] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:879)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:65)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:715)
I double-checked that all 3 Kafka broker IP addresses are correctly listed in these locations, and I restarted all their services to be safe:
/etc/hosts
/etc/kafka/config/server.properties
/etc/zookeeper/conf/zoo.cfg
/etc/filebeat/filebeat.yml
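A few hedged sanity checks that can help after swapping an IP onto a rebuilt node; the hostname, the Zookeeper data directory and port 2181 are assumptions based on a stock installation:
getent hosts kafka-broker-1        # does name resolution point at the new IP? (hostname is a placeholder)
cat /var/lib/zookeeper/myid        # each ensemble member needs a unique id matching its server.N entry in zoo.cfg
echo srvr | nc localhost 2181      # Zookeeper four-letter command: shows mode (leader/follower) and connection counts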

Confluent schema registry fails on start with NoSuchMethodError

Exception in thread "main" java.lang.NoSuchMethodError: io.confluent.rest.Application.parseListeners(Ljava/util/List;ILjava/util/List;Ljava/lang/String;)Ljava/util/List;
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.getPortForIdentity(KafkaSchemaRegistry.java:204)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.(KafkaSchemaRegistry.java:133)
etc/schema-registry/schema-registry.properties
listeners=http://0.0.0.0:8081
kafkastore.connection.url=localhost:2181
kafkastore.topic=_schemas
debug=false
Kafka and Zookeeper are already running.
Why do logs like the following keep coming from Zookeeper?
[2017-10-17 09:57:31,352] INFO Accepted socket connection from /13.**.**.***:39572 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2017-10-17 09:57:31,352] WARN Exception causing close of session 0x0 due to java.io.EOFException (org.apache.zookeeper.server.NIOServerCnxn)
[2017-10-17 09:57:31,352] INFO Closed socket connection for client /13.58.108.150:39572 (no session established for client) (org.apache.zookeeper.server.NIOServerCnxn)
[2017-10-17 09:57:31,438] INFO Accepted socket connection from /13.**.**.***:39574 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2017-10-17 09:57:31,438] WARN Exception causing close of session 0x0 due to java.io.EOFException (org.apache.zookeeper.server.NIOServerCnxn)
[2017-10-17 09:57:31,438] INFO Closed socket connection for client /13.**.***.**:39574 (no session established for client) (org.apache.zookeeper.server.NIOServerCnxn)
I was wondering whether this could be the cause of the schema-registry failure.
Any suggestions?
NoSuchMethodError indicates your CLASSPATH is misconfigured.
It's not clear which version you're running or which OS you're using, but Windows is not officially supported, later versions of Confluent Platform have likely fixed this, and using the Docker images should work as well.
In my situation the problem was caused by the hostname; check whether the hostname is set to "localhost".
Problem: "Schema registry fails on start"
Test solution: set the hostname to "localhost"
If this solves your problem, you can make the hostname permanent:
modify the file /etc/hostname
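A sketch of what that could look like on a systemd-based distro; hostnamectl makes the change persistent in one step, while editing /etc/hostname by hand also works:
hostnamectl                               # show the current hostname
sudo hostnamectl set-hostname localhost   # persistent change on systemd systems
# or edit /etc/hostname manually and restart the schema-registry service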