WildFly failed to start up due to an exception in Infinispan

We are seeing the following error, which is only happening on a few testbeds:
18-Jan-2023 15:26:15,846 CST WARN [TCP] (TQ-Bundler-7,ejb,cdada7bd-7d38-41d0-afa9-9b820c587a29) JGRP000032: cdada7bd-7d38-41d0-afa9-9b820c587a29: no physical address for f4315409-4d0f-148d-8d7d-8fbc74f11179, dropping message
18-Jan-2023 15:26:20,774 CST WARN [ClusterTopologyManagerImpl] (MSC service thread 1-5) ISPN000329: Unable to read rebalancing status from coordinator 8096d1bd-2e66-4fd0-9a98-ac78dfd9d171: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 10 from 8096d1bd-2e66-4fd0-9a98-ac78dfd9d171
at org.inf...#9.4.18.Final//org.infinispan.remoting.transport.impl.SingleTargetRequest.onTimeout(SingleTargetRequest.java:65)
at org.inf...#9.4.18.Final//org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
at org.inf...#9.4.18.Final//org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
We also noticed that sometimes this issue went away after a reboot, but other times it stayed the same.
The WildFly version is 19.1.0.
Can someone shed some light on this issue?
Thanks.

Related

Debezium connector fails after "Searching for WAL resume position"

I'm using the Debezium Postgres connector to capture a few tables' records in Kafka. The Debezium connector closed after "Searching for WAL resume position". I've given the error below. Any help would be greatly appreciated.
2022-12-12 07:46:51,834 INFO Postgres|postgres|streaming Searching for WAL resume position [io.debezium.connector.postgresql.PostgresStreamingChangeEventSource]
2022-12-12 07:46:56,548 ERROR || WorkerSourceTask{id=ksqldb-connector-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted [org.apache.kafka.connect.runtime.WorkerTask]
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:223)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:149)
at org.apache.kafka.connect.runtime.AbstractWorkerSourceTask.convertTransformedRecord(AbstractWorkerSourceTask.java:474)
at org.apache.kafka.connect.runtime.AbstractWorkerSourceTask.sendRecords(AbstractWorkerSourceTask.java:387)
at org.apache.kafka.connect.runtime.AbstractWorkerSourceTask.execute(AbstractWorkerSourceTask.java:354)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:244)
at org.apache.kafka.connect.runtime.AbstractWorkerSourceTask.run(AbstractWorkerSourceTask.java:72)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.apicurio.registry.rest.client.exception.RestClientException
at io.apicurio.registry.rest.client.impl.ErrorHandler.parseError(ErrorHandler.java:95)
at io.apicurio.rest.client.JdkHttpClient.sendRequest(JdkHttpClient.java:205)
at io.apicurio.registry.rest.client.impl.RegistryClientImpl.createArtifact(RegistryClientImpl.java:240)
at io.apicurio.registry.rest.client.RegistryClient.createArtifact(RegistryClient.java:143)
at io.apicurio.registry.resolver.DefaultSchemaResolver.lambda$handleAutoCreateArtifact$2(DefaultSchemaResolver.java:236)
at io.apicurio.registry.resolver.ERCache.lambda$getValue$0(ERCache.java:142)
at io.apicurio.registry.resolver.ERCache.retry(ERCache.java:181)
at io.apicurio.registry.resolver.ERCache.getValue(ERCache.java:141)
at io.apicurio.registry.resolver.ERCache.getByContent(ERCache.java:121)
at io.apicurio.registry.resolver.DefaultSchemaResolver.handleAutoCreateArtifact(DefaultSchemaResolver.java:234)
at io.apicurio.registry.resolver.DefaultSchemaResolver.getSchemaFromRegistry(DefaultSchemaResolver.java:115)
at io.apicurio.registry.resolver.DefaultSchemaResolver.resolveSchema(DefaultSchemaResolver.java:88)
at io.apicurio.registry.utils.converter.ExtJsonConverter.fromConnectData(ExtJsonConverter.java:97)
at org.apache.kafka.connect.runtime.AbstractWorkerSourceTask.lambda$convertTransformedRecord$5(AbstractWorkerSourceTask.java:474)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:173)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:207)
... 12 more
2022-12-12 07:46:56,548 INFO || Stopping down connector [io.debezium.connector.common.BaseSourceTask]
2022-12-12 07:46:56,585 INFO Postgres|postgres|streaming WAL resume position 'null' discovered [io.debezium.connector.postgresql.PostgresStreamingChangeEventSource]
2022-12-12 07:46:56,588 INFO Postgres|postgres|streaming Connection gracefully closed [io.debezium.jdbc.JdbcConnection]
2022-12-12 07:46:56,691 INFO Postgres|postgres|streaming Connection gracefully closed [io.debezium.jdbc.JdbcConnection]
It was actually an Apicurio error: all the containers were running in one network to which Apicurio was not added. I added Apicurio to that network and the problem was resolved. A sketch of the resulting setup is shown below.
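For reference, a minimal Docker Compose sketch of that kind of fix; the service names, images, tags and the cdc-net network are illustrative, not taken from the original setup:

# hypothetical docker-compose.yml excerpt: Kafka Connect (running the Debezium
# connector) and the Apicurio registry join the same user-defined network, so the
# converter can reach the registry by its service hostname
services:
  apicurio:
    image: apicurio/apicurio-registry-mem:2.4.1.Final   # illustrative image tag
    networks:
      - cdc-net
  connect:
    image: debezium/connect:2.1                         # illustrative image tag
    networks:
      - cdc-net

networks:
  cdc-net:
    driver: bridge

You can verify with docker network inspect cdc-net that both containers are attached to the network before restarting the connector.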

What does the message "The Critical Analyzer detected slow paths on the broker" mean in the Artemis broker?

Setup: I have an Artemis broker HA cluster with 3 brokers. The HA policy is replication. Each broker is running in its own VM.
Problem: When I leave my brokers running for a long time, usually after 5-6 hours, I get the error below.
2022-11-21 21:32:37,902 WARN  [org.apache.activemq.artemis.utils.critical.CriticalMeasure] Component org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager is expired on path 0
2022-11-21 21:32:37,902 INFO  [org.apache.activemq.artemis.core.server] AMQ224107: The Critical Analyzer detected slow paths on the broker. It is recommended that you enable trace logs on org.apache.activemq.artemis.utils.critical while you troubleshoot this issue. You should disable the trace logs when you have finished troubleshooting.
2022-11-21 21:32:37,902 ERROR [org.apache.activemq.artemis.core.server] AMQ224079: The process for the virtual machine will be killed, as component org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager#46d59067 is not responsive
2022-11-21 21:32:37,969 WARN  [org.apache.activemq.artemis.core.server] AMQ222199: Thread dump: *******************************************************************************
Complete Thread dump "Thread-517 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$7#437da279)" Id=602 TIMED_WAITING on java.util.concurrent.SynchronousQueue$TransferStack#75f49105
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.SynchronousQueue$TransferStack#75f49105
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
What does this really mean? I understand that the critical analyzer sees an error and halts the broker, but what is causing this error?
You may take a look at the documentation. Basically, you are experiencing some issue that the broker detects, and it shuts itself down before it becomes too unresponsive. By setting the critical-analyzer policy to LOG you might get more clues about the issue.
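For reference, a minimal sketch of the relevant settings in broker.xml; the timeout values shown are just the documented defaults, adjust to taste:

<!-- broker.xml excerpt: keep the critical analyzer on, but log instead of killing the JVM -->
<critical-analyzer>true</critical-analyzer>
<critical-analyzer-timeout>120000</critical-analyzer-timeout>            <!-- ms before a path is considered stuck -->
<critical-analyzer-check-period>60000</critical-analyzer-check-period>   <!-- how often the analyzer checks -->
<critical-analyzer-policy>LOG</critical-analyzer-policy>                  <!-- HALT | SHUTDOWN | LOG -->

With LOG the broker only records the slow path instead of halting, which gives you time to capture thread dumps and find out what is actually blocking the journal.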

ThingsBoard AWS server freezes

I want to run a self-managed ThingsBoard on AWS (t2.micro).
I have installed ThingsBoard CE on a t2.micro AWS instance running Ubuntu 20.04 server.
I followed the AWS setup and Ubuntu install guides (PostgreSQL + built-in queue service).
I also set up HAProxy using this guide.
I was able to successfully log in to my ThingsBoard. I only changed the passwords and checked the basic functionality, but didn't create any new dashboards or make any modifications.
After this I left the machine on, running ThingsBoard. The next day I could not reach ThingsBoard, and although the AWS instance was running I could not SSH into it anymore. After stopping and starting the instance (a reboot didn't work) everything was OK (I could SSH in and ThingsBoard was reachable).
I can reproduce this failure just by leaving the instance on; it seems that after several hours (5-8 hrs) ThingsBoard (or something else, I'm not sure) fails, which freezes the whole machine.
I have checked two things:
I checked CPU utilization on AWS monitoring.
It seems that after some hours there is a big jump in CPU load and then it drops back to almost zero. While ThingsBoard is running, it is constant. See the screenshot from AWS monitoring.
I checked the ThingsBoard logs (in /var/log/thingsboard):
There are some errors, but unfortunately most of them are not enough for me to guess what the problem could be with a fresh installation. Here are some lines from the log:
2021-11-12 00:21:59,626 [http-nio-0.0.0.0-8080-exec-13] INFO o.a.coyote.http11.Http11Processor - Error parsing HTTP request header
Note: further occurrences of HTTP request parsing errors will be logged at DEBUG level.
java.lang.IllegalArgumentException: Invalid character found in method name
[0x160x030x010x00{0x010x000x00w0x030x030x170xb80xb80xe50xef0x000xb50x0a&0x930x020x00:0xde0xd70xa00xab0xb
70x8bU0xc00x92r0x9330x10O0x8c<o0xf70xf90x000x000x1a0xc0/0xc0+0xc00x110xc00x070xc00x130xc00x090xc00x140xc00x0a0x000x050x00/0x0050xc00x120x000x0a0x010x000x0040x000x050x000x050x010x000
x000x000x000x000x0a0x000x080x000x060x000x170x000x180x000x190x000x0b0x000x020x010x000x000x0d0x000x100x000x0e0x040x010x040x030x020x010x020x030x040x010x050x010x060x010xff0x010x000x010x00...].
HTTP method names must be tokens
at org.apache.coyote.http11.Http11InputBuffer.parseRequestLine(Http11InputBuffer.java:417)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:261)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:893)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1707)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.base/java.lang.Thread.run(Thread.java:829)
2021-11-12 00:22:01,486 [sql-queue-2-ts-4-thread-1] WARN com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#4393afd0 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
2021-11-12 00:22:01,487 [sql-queue-2-ts latest-8-thread-1] WARN com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#75b9496b (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
2021-11-12 00:22:01,487 [sql-queue-0-ts latest-6-thread-1] WARN com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#31849eec (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
2021-11-12 00:22:01,487 [sql-queue-0-ts-2-thread-1] WARN com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#725fafe3 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
Some more:
2021-11-12 00:23:46,205 [sql-log-1-thread-1] INFO o.t.s.dao.sql.TbSqlBlockingQueue - Queue-2 [TS Latest] queueSize [9] totalAdded [0] totalSaved [0] totalFailed [0]
2021-11-12 00:23:47,741 [sql-queue-0-ts-2-thread-1] WARN o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:47,742 [sql-queue-2-ts-4-thread-1] WARN o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:47,742 [sql-queue-2-ts latest-8-thread-1] WARN o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:47,742 [sql-queue-0-ts latest-6-thread-1] WARN o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:48,022 [sql-queue-0-ts-2-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 634223ms.
2021-11-12 00:23:48,058 [sql-queue-0-ts-2-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.
2021-11-12 00:23:48,022 [sql-queue-0-ts latest-6-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 634223ms.
2021-11-12 00:23:48,059 [sql-queue-0-ts latest-6-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.
2021-11-12 00:23:48,022 [sql-queue-2-ts latest-8-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 624177ms.
2021-11-12 00:23:48,059 [sql-queue-2-ts latest-8-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.
2021-11-12 00:23:48,023 [sql-queue-2-ts-4-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 627819ms.
2021-11-12 00:23:48,059 [sql-queue-2-ts-4-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.
And at the end:
2021-11-12 00:33:10,919 [sql-queue-0-ts latest-6-thread-1] ERROR o.t.s.dao.sql.TbSqlBlockingQueue - [TS Latest] Failed to save 1 entities
org.springframework.transaction.CannotCreateTransactionException: Could not open JPA EntityManager for transaction; nested exception is org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:448)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.startTransaction(AbstractPlatformTransactionManager.java:400)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:373)
at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:574)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:361)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:118)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:692)
at org.thingsboard.server.dao.sqlts.insert.latest.psql.PsqlLatestInsertTsRepository$$EnhancerBySpringCGLIB$$381b448c.saveOrUpdate(<generated>)
at org.thingsboard.server.dao.sqlts.SqlTimeseriesLatestDao.lambda$init$3(SqlTimeseriesLatestDao.java:133)
at org.thingsboard.server.dao.sql.TbSqlBlockingQueue.lambda$init$2(TbSqlBlockingQueue.java:71)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:48)
at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:42)
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113)
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:111)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:138)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getConnectionForTransactionManagement(LogicalConnectionManagedImpl.java:276)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.begin(LogicalConnectionManagedImpl.java:284)
at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.begin(JdbcResourceLocalTransactionCoordinatorImpl.java:246)
at org.hibernate.engine.transaction.internal.TransactionImpl.begin(TransactionImpl.java:83)
at org.springframework.orm.jpa.vendor.HibernateJpaDialect.beginTransaction(HibernateJpaDialect.java:184)
at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:402)
... 16 common frames omitted
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 634223ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:695)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:197)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:128)
at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122)
at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:108)
... 23 common frames omitted
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:877)
at org.postgresql.jdbc.PgConnection.setNetworkTimeout(PgConnection.java:1610)
at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:560)
at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:173)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
... 28 common frames omitted
What is interesting is that the timestamps of the CPU load going to max don't precisely correlate with the error messages in the log.
I apologise for the long error messages, but right now I don't know what the root cause could be.
I haven't tried to reinstall the whole machine yet.
My question would be: how should I proceed? Has anyone ever faced similar issues? What logs/services/etc. should I check to grasp the root cause?
Should I try using a machine with more resources? Should I try another database and queue service?
In its current form this ThingsBoard instance is not stable even for tests.
Edit: Sorry, I could not format the first part of the error output properly.
Edit 2: The first link was wrong.
Here are some points:
It looks like the operating system ran out of memory and became unresponsive. To fix the issue, try to manage the Java heap memory.
For a 4 GB instance, this Java heap limit, JAVA_OPTS="$JAVA_OPTS -Xms1024M -Xmx1024M", may be useful, because Java uses some non-heap memory as well, and PostgreSQL and other services also require memory to run (see the sketch after this list).
t2 instances on AWS may slow the whole thing down through CPU throttling. Instances like c6 or m5 are better options performance-wise.
In-memory queues may result in out-of-memory issues and data loss under a heavy message rate or when processing is congested by a third party. Consider using Kafka to make your installation much more stable and solid.
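A minimal sketch of where that heap setting usually goes, assuming the default package install path /usr/share/thingsboard/conf/thingsboard.conf; adjust the path if your layout differs:

# /usr/share/thingsboard/conf/thingsboard.conf (path assumed from a default .deb install)
# Cap the ThingsBoard JVM heap so the OS, PostgreSQL and the queue keep some memory for themselves.
export JAVA_OPTS="$JAVA_OPTS -Xms1024M -Xmx1024M"

After editing, restart the service (e.g. sudo systemctl restart thingsboard) and watch free -m for a few hours to confirm the box no longer runs out of memory.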
After I increased the RAM to 4 GB (from 1 GB), the ThingsBoard server has been up without problems. There are no sporadic freezes anymore. As there were no other provable suggestions for the problem and my system now works without issues, I consider the question answered.

Windows Kafka java.nio.file.FileSystemException

This error occurs very frequently on Windows Server 2012.
Kafka version: 2.3.1
The error log:
[2019-12-05 03:57:51,567] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for MetadataLog-0 in dir D:\GpsPlatform\kafka\.\tmp\kafka-logs
Caused by: java.nio.file.FileSystemException: D:\GpsPlatform\kafka\.\tmp\kafka-logs\MetadataLog-0\00000000000003368617.index -> D:\GpsPlatform\kafka\.\tmp\kafka-logs\MetadataLog-0\00000000000003368617.index.deleted: The process cannot access the file because it is being used by another process.
at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:395)
at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
at java.base/java.nio.file.Files.move(Files.java:1425)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:815)
at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:209)
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:509)
at kafka.log.Log.asyncDeleteSegment(Log.scala:1982)
at kafka.log.Log.deleteSegment(Log.scala:1967)
at kafka.log.Log.$anonfun$deleteSegments$3(Log.scala:1493)
at kafka.log.Log.$anonfun$deleteSegments$3$adapted(Log.scala:1493)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1493)
at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
at kafka.log.Log.maybeHandleIOException(Log.scala:2085)
at kafka.log.Log.deleteSegments(Log.scala:1484)
at kafka.log.Log.deleteOldSegments(Log.scala:1479)
at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1557)
at kafka.log.Log.deleteOldSegments(Log.scala:1547)
at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:914)
at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:911)
at scala.collection.immutable.List.foreach(List.scala:392)
at kafka.log.LogManager.cleanupLogs(LogManager.scala:911)
at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:395)
at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:830)
Suppressed: java.nio.file.FileSystemException: D:\GpsPlatform\kafka\.\tmp\kafka-logs\MetadataLog-0\00000000000003368617.index -> D:\GpsPlatform\kafka\.\tmp\kafka-logs\MetadataLog-0\00000000000003368617.index.deleted: The process cannot access the file because it is being used by another process.
at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:309)
at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
at java.base/java.nio.file.Files.move(Files.java:1425)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:812)
... 29 more
After running for a period of time, a similar exception is reported, causing Kafka to crash. How can I completely resolve this exception?
If you have to use Kafka in a Windows environment, you have to disable log retention.
In Kafka server.properties:
log.retention.hours=-1
log.cleaner.enable=false
# Remove any other entries that start with log.retention.*
To run Kafka on Windows, it's recommended to do so using WSL2, as detailed here; otherwise you will keep encountering the kind of problems described above. A rough sketch of that approach is shown below.
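A minimal sketch of the WSL2 route, assuming Ubuntu under WSL2 and the Kafka 2.3.1 tarball; the paths and version are illustrative:

# install a WSL2 distribution from an elevated PowerShell prompt (requires Windows 10 2004+ / Windows 11)
wsl --install -d Ubuntu

# inside the WSL2 shell: unpack Kafka on the Linux filesystem (not under /mnt/c,
# so Windows file locking never touches the log segments), then start ZooKeeper and the broker
tar -xzf kafka_2.12-2.3.1.tgz && cd kafka_2.12-2.3.1
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties

Keeping log.dirs on the Linux filesystem is the key point: the FileSystemException above comes from Windows refusing to rename index files that another handle still holds open, which does not happen on ext4 inside WSL2.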

Root cause for Connection broken for id 1, my id = 3, error =

I am using Confluent 4 for the Kafka and ZooKeeper installation.
On our Kafka cluster environment (3 brokers and 3 ZooKeeper nodes running on 3 AWS instances),
we are seeing the set of warnings below repeatedly recorded in the broker's server.log file.
We have not observed any functional issues due to this yet, but we have not been able to find the root cause, and there is a chance that in the future it will affect the clients or other broker nodes; we are not sure yet. Below is the set of warnings:
[2018-04-03 12:00:40,707] WARN Interrupted while waiting for message on queue (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1097)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:74)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:932)
[2018-04-03 12:00:40,707] WARN Connection broken for id 1, my id = 3, error = (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1013)
[2018-04-03 12:00:40,708] WARN Interrupting SendWorker (org.apache.zookeeper.server.quorum.QuorumCnxManager)
[2018-04-03 12:00:40,707] WARN Send worker leaving thread (org.apache.zookeeper.server.quorum.QuorumCnxManager)
This set of warnings gets repeated and is observed on all 3 Kafka nodes.
If anyone has any idea why this warning is generated, please let me know.
Thanks in advance.
This sounds like a known issue with newer versions of ZooKeeper. Check out this JIRA: https://issues.apache.org/jira/browse/ZOOKEEPER-2938
In my case, I was replacing a ZK node and the old one was still running, which I didn't realize, so I had created two nodes with the same "myid". A sketch of what that misconfiguration looks like is shown below.
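For illustration, a minimal sketch of the relevant ZooKeeper config; the hostnames and data directory are hypothetical, not taken from the original cluster:

# zoo.cfg on every node: each server.N line must point at a distinct, live host
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

# each node's dataDir (e.g. /var/lib/zookeeper) contains a myid file holding that node's unique ID
# on node 3:
#   echo 3 > /var/lib/zookeeper/myid

If two running hosts ever share the same myid (for example, an old node that was supposed to be decommissioned), the quorum connections between them can keep being dropped and re-established, producing exactly this kind of "Connection broken for id ..." noise.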