Apache Flink Zookeeper Checkpoint Conflict After Upgrade - apache-zookeeper

I recently upgraded an Apache Flink cluster from 1.3.2 to 1.4.2 and now I get following exception in Zookeeper Logs
2018-06-19 18:45:34,658 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#649] - Got user-level KeeperException when processing sessionid:0x363f3574f420001 type:create cxid:0x2284d zxid:0x900016e9d txntype:-1 reqpath:n/a Error Path:/flink/cluster_one/checkpoints/5e8ad58b9f2ef81b155d0e15b23d2365/0000000000000010305/81783429-1405-4609-9cdb-ce9dc95b8272 Error:KeeperErrorCode = NodeExists for /flink/cluster_one/checkpoints/5e8ad58b9f2ef81b155d0e15b23d2365/0000000000000010305/81783429-1405-4609-9cdb-ce9dc95b8272
Because of this the Apache Flink node where this is thrown is repeatedly kicked out of the cluster and then rejoins.
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink#tm03-dev:6124/user/taskmanager#-50864731]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.messages.JobManagerMessages$LeaderSessionMessage".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
... 1 more
06/22/2018 16:11:29 Job execution switched to status FAILING.
java.lang.Exception: Cannot deploy task Source: Custom Source -> StringToJSONEvents (1/1) (a78bb6a5d7c90a7b5cee785bc4ac2426) - TaskManager (37d2c90dd316c87974dda50e0b4525d6 # tm03-dev (dataPort=6125)) not responding after a timeout of 10000 ms
at org.apache.flink.runtime.executiongraph.Execution.lambda$deploy$3(Execution.java:529)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink#tm03-dev:6124/user/taskmanager#-50864731]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.messages.JobManagerMessages$LeaderSessionMessage".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
... 1 more
06/22/2018 16:11:29 FilterLoginFailedEvents -> Sink: SendRequestToService(1/1) switched to CANCELING
06/22/2018 16:11:29 FilterLoginFailedEvents -> Sink: SendRequestToService(1/1) switched to CANCELED
In TaskManager logs this keeps repeating
2018-06-22 15:59:45.168 [main-SendThread(ip-10-11-21-15.domain:2181)] DEBUG o.a.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x163f3574f380005 after 0ms
2018-06-22 15:59:58.513 [main-SendThread(ip-10-11-21-15.domain:2181)] DEBUG o.a.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x163f3574f380005 after 0ms
2018-06-22 16:00:10.658 [flink-akka.actor.default-dispatcher-8172] INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink#flink-jobmanager01-flink-dev.domain.com:26229/user/jobmanager (attempt 9305, timeout: 30000 milliseconds)
2018-06-22 16:00:11.858 [main-SendThread(ip-10-11-21-15.domain:2181)] DEBUG o.a.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x163f3574f380005 after 0ms
2018-06-22 16:00:14.971 [flink-akka.actor.default-dispatcher-9313] DEBUG akka.remote.transport.ProtocolStateActor flink-akka.remote.default-remote-dispatcher-15 - Association between local [tcp://flink#tm03-flink-dev:25422] and remote [tcp://flink#flink-jobmanager01-flink-dev.domain.com:26229] was disassociated because the ProtocolStateActor failed: Unknown
2018-06-22 16:00:14.971 [flink-akka.actor.default-dispatcher-9313] DEBUG akka.remote.transport.ProtocolStateActor flink-akka.remote.default-remote-dispatcher-15 - Association between local [tcp://flink#tm03-flink-dev:25422] and remote [tcp://flink#flink-jobmanager01-flink-dev.domain.com:26229] was disassociated because the ProtocolStateActor failed: Unknown
2018-06-22 16:00:14.971 [flink-akka.actor.default-dispatcher-9313] WARN akka.remote.ReliableDeliverySupervisor flink-akka.remote.default-remote-dispatcher-15 - Association with remote system [akka.tcp://flink#flink-jobmanager01-flink-dev.domain.com:26229] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
2018-06-22 16:00:14.971 [flink-akka.actor.default-dispatcher-9313] WARN akka.remote.ReliableDeliverySupervisor flink-akka.remote.default-remote-dispatcher-15 - Association with remote system [akka.tcp://flink#flink-jobmanager01-flink-dev.domain.com:26229] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
2018-06-22 16:00:14.971 [flink-akka.actor.default-dispatcher-9313] DEBUG akka.remote.transport.netty.NettyTransport New I/O worker #2 - Remote connection to [flink-jobmanager01-flink-dev.domain.com/10.11.21.24:26229] was disconnected because of [id: 0x735f1b8b, /10.11.21.13:25422 :> flink-jobmanager01-flink-dev.domain.com/10.11.21.24:26229] DISCONNECTED
2018-06-22 16:00:14.971 [flink-akka.actor.default-dispatcher-9313] DEBUG akka.remote.transport.netty.NettyTransport New I/O worker #2 - Remote connection to [flink-jobmanager01-flink-dev.domain.com/10.11.21.24:26229] was disconnected because of [id: 0x735f1b8b, /10.11.21.13:25422 :> flink-jobmanager01-flink-dev.domain.com/10.11.21.24:26229] DISCONNECTED
2018-06-22 16:00:25.199 [main-SendThread(ip-10-11-21-15.domain:2181)] DEBUG o.a.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x163f3574f380005 after 0ms
2018-06-22 16:00:38.539 [main-SendThread(ip-10-11-21-15.domain:2181)] DEBUG o.a.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x163f3574f380005 after 0ms
2018-06-22 16:00:40.679 [flink-akka.actor.default-dispatcher-8172] INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink#flink-jobmanager01-flink-dev.domain.com:26229/user/jobmanager (attempt 9306, timeout: 30000 milliseconds)
Another clue, on the leader JobManager such 2 messages are displayed in the log every few seconds:
2018-06-27 17:33:46.632 [jobmanager-future-thread-2] DEBUG o.a.flink.runtime.rest.handler.legacy.metrics.MetricFetcher - Could not retrieve QueryServiceGateway.
java.util.concurrent.CompletionException: akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://flink#tm03-dev:6124/), Path(/user/MetricQueryService_64bde0e9e6f3f0a906a30e88c261c9d7)]
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:442)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at scala.concurrent.Promise$class.complete(Promise.scala:55)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:157)
at scala.concurrent.Promise$class.failure(Promise.scala:104)
at scala.concurrent.impl.Promise$DefaultPromise.failure(Promise.scala:157)
at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:68)
at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:66)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:76)
at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:75)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:534)
at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:558)
at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:595)
at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:584)
at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:98)
at akka.remote.ReliableDeliverySupervisor$$anonfun$gated$1.applyOrElse(Endpoint.scala:353)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:203)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://flink#tm03-dev:6124/), Path(/user/MetricQueryService_64bde0e9e6f3f0a906a30e88c261c9d7)]
... 27 common frames omitted
2018-06-27 17:34:01.625 [flink-akka.actor.default-dispatcher-19] DEBUG org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler - Error while handling request.
java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.NotFoundException: Could not find job 93d6fa4fb5b2355bb03253cb80d81ef5.
at org.apache.flink.runtime.rest.handler.legacy.AbstractExecutionGraphRequestHandler.lambda$handleJsonRequest$0(AbstractExecutionGraphRequestHandler.java:70)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.rest.handler.legacy.ExecutionGraphCache.lambda$getExecutionGraph$0(ExecutionGraphCache.java:130)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:444)
at akka.dispatch.OnComplete.internal(Future.scala:259)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at scala.concurrent.Promise$class.complete(Promise.scala:55)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:157)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:534)
at org.apache.flink.runtime.jobmanager.MemoryArchivist$$anonfun$handleMessage$1.applyOrElse(MemoryArchivist.scala:123)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at org.apache.flink.runtime.jobmanager.MemoryArchivist.aroundReceive(MemoryArchivist.scala:65)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.rest.NotFoundException: Could not find job 93d6fa4fb5b2355bb03253cb80d81ef5.
... 53 common frames omitted
What does this exception mean?
Are we supposed to erase contents of Zookeeper folders (high-availability.zookeeper.path.root) before upgrade?

This question was related to Apache Flink 1.4.2 akka.actor.ActorNotFound.
After we
restarted the JobManagers
increased TaskManager Memory size from 1GB to 2GB
The issue seems to be gone and the cluster is working fine now.

Related

taskManager could not connect to the newly elected jobmanager

I have a Flink cluster running on minikube : 1 jobmanager and 3 taskmanagers.
I am using Kubernetes Ha service to handle jobmanager leader election.
when i am trying to kill the jobmanager to simulate a crash , the taskmanager could not connect
the new jobmanager it try always to connect the previous ip address of the jobmanager that was terminated.
here is the exception :
2021-05-05 12:14:28.126 [flink-akka.actor.default-dispatcher-3] WARN akka.remote.ReliableDeliverySupervisor flink-akka.remote.default-remote-dispatcher-7 - Association with remote system [akka.tcp://flink#172.17.0.7:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink#172.17.0.7:6123]] Caused by: [java.net.NoRouteToHostException: No route to host]
2021-05-05 12:14:28.131 [flink-akka.actor.default-dispatcher-3] ERROR o.a.f.runtime.rest.handler.cluster.ClusterOverviewHandler - Unhandled exception.
org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:386)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
at scala.concurrent.java8.FuturesConvertersImpl$CF$$anon$1.accept(FutureConvertersImpl.scala:61)
at scala.concurrent.java8.FuturesConvertersImpl$CF$$anon$1.accept(FutureConvertersImpl.scala:53)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rpc.exceptions.RpcConnectionException: Could not connect to rpc endpoint under address akka.tcp://flink#172.17.0.7:6123/user/rpc/resourcemanager_0.
at org.apache.flink.runtime.rpc.akka.AkkaRpcService.lambda$resolveActorAddress$10(AkkaRpcService.java:570)
at scala.concurrent.java8.FuturesConvertersImpl$CF$$anon$1.accept(FutureConvertersImpl.scala:59)
... 5 common frames omitted
Caused by: org.apache.flink.runtime.rpc.exceptions.RpcConnectionException: Could not connect to rpc endpoint under address akka.tcp://flink#172.17.0.7:6123/user/rpc/resourcemanager_0.
... 7 common frames omitted
Caused by: akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://flink#172.17.0.7:6123/), Path(/user/rpc/resourcemanager_0)]
at akka.actor.ActorSelection.$anonfun$resolveOne$1(ActorSelection.scala:71)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:81)
at akka.dispatch.BatchingExecutor.execute(BatchingExecutor.scala:120)
at akka.dispatch.BatchingExecutor.execute$(BatchingExecutor.scala:114)
at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:80)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:573)
at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:556)
at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:593)
at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:582)
at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:104)
at akka.remote.EndpointWriter.postStop(Endpoint.scala:606)
at akka.actor.Actor.aroundPostStop(Actor.scala:536)
at akka.actor.Actor.aroundPostStop$(Actor.scala:536)
at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:458)
at akka.actor.dungeon.FaultHandling.finishTerminate(FaultHandling.scala:210)
at akka.actor.dungeon.FaultHandling.terminate(FaultHandling.scala:172)
at akka.actor.dungeon.FaultHandling.terminate$(FaultHandling.scala:142)
at akka.actor.ActorCell.terminate(ActorCell.scala:429)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:533)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:549)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:283)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Getting started with keycloak and kubernetes

I am trying to run keycloak on kubernetes so I follow this example on my kubernetes cluster. But when I run command:
kubectl create -f https://raw.githubusercontent.com/keycloak/keycloak-quickstarts/latest/kubernetes-examples/keycloak.yaml
I got database error after 2 minutes. Why I can not run this simple example without error ?
00:15:29,210 INFO [org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory] (ServerService Thread Pool -- 66) Node name: keycloak-b6b94bd59-v96tk, Site name: null
00:17:39,565 WARN [org.jboss.jca.core.connectionmanager.pool.strategy.OnePool] (ServerService Thread Pool -- 66) IJ000604: Throwable while attempting to get a new connection: null: javax.resource.ResourceException: IJ031084: Unable to create connection
at org.jboss.ironjacamar.jdbcadapters#1.4.23.Final//org.jboss.jca.adapters.jdbc.local.LocalManagedConnectionFactory.createLocalManagedConnection(LocalManagedConnectionFactory.java:345)
at org.jboss.ironjacamar.jdbcadapters#1.4.23.Final//org.jboss.jca.adapters.jdbc.local.LocalManagedConnectionFactory.getLocalManagedConnection(LocalManagedConnectionFactory.java:352)
at org.jboss.ironjacamar.jdbcadapters#1.4.23.Final//org.jboss.jca.adapters.jdbc.local.LocalManagedConnectionFactory.createManagedConnection(LocalManagedConnectionFactory.java:287)
at org.jboss.ironjacamar.impl#1.4.23.Final//org.jboss.jca.core.connectionmanager.pool.mcp.SemaphoreConcurrentLinkedDequeManagedConnectionPool.createConnectionEventListener(SemaphoreConcurrentLinkedDequeManagedConnectionPool.java:1322)
at org.jboss.ironjacamar.impl#1.4.23.Final//org.jboss.jca.core.connectionmanager.pool.mcp.SemaphoreConcurrentLinkedDequeManagedConnectionPool.getConnection(SemaphoreConcurrentLinkedDequeManagedConnectionPool.java:499)
at org.jboss.ironjacamar.impl#1.4.23.Final//org.jboss.jca.core.connectionmanager.pool.AbstractPool.getSimpleConnection(AbstractPool.java:632)
at org.jboss.ironjacamar.impl#1.4.23.Final//org.jboss.jca.core.connectionmanager.pool.AbstractPool.getConnection(AbstractPool.java:604)
at org.jboss.ironjacamar.impl#1.4.23.Final//org.jboss.jca.core.connectionmanager.AbstractConnectionManager.getManagedConnection(AbstractConnectionManager.java:624)
at org.jboss.ironjacamar.impl#1.4.23.Final//org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.getManagedConnection(TxConnectionManagerImpl.java:440)
at org.jboss.ironjacamar.impl#1.4.23.Final//org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:789)
at org.jboss.ironjacamar.jdbcadapters#1.4.23.Final//org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:151)
at org.jboss.as.connector#21.0.2.Final//org.jboss.as.connector.subsystems.datasources.WildFlyDataSource.getConnection(WildFlyDataSource.java:64)
at org.keycloak.keycloak-model-jpa#12.0.4//org.keycloak.connections.jpa.DefaultJpaConnectionProviderFactory.getConnection(DefaultJpaConnectionProviderFactory.java:371)
at org.keycloak.keycloak-model-jpa#12.0.4//org.keycloak.connections.jpa.updater.liquibase.lock.LiquibaseDBLockProvider.lazyInit(LiquibaseDBLockProvider.java:65)
at org.keycloak.keycloak-model-jpa#12.0.4//org.keycloak.connections.jpa.updater.liquibase.lock.LiquibaseDBLockProvider.lambda$waitForLock$2(LiquibaseDBLockProvider.java:96)
at org.keycloak.keycloak-server-spi-private#12.0.4//org.keycloak.models.utils.KeycloakModelUtils.suspendJtaTransaction(KeycloakModelUtils.java:654)
at org.keycloak.keycloak-model-jpa#12.0.4//org.keycloak.connections.jpa.updater.liquibase.lock.LiquibaseDBLockProvider.waitForLock(LiquibaseDBLockProvider.java:94)
at org.keycloak.keycloak-services#12.0.4//org.keycloak.services.resources.KeycloakApplication$1.run(KeycloakApplication.java:136)
at org.keycloak.keycloak-server-spi-private#12.0.4//org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:228)
at org.keycloak.keycloak-services#12.0.4//org.keycloak.services.resources.KeycloakApplication.startup(KeycloakApplication.java:129)
at org.keycloak.keycloak-wildfly-extensions#12.0.4//org.keycloak.provider.wildfly.WildflyPlatform.onStartup(WildflyPlatform.java:29)
at org.keycloak.keycloak-services#12.0.4//org.keycloak.services.resources.KeycloakApplication.<init>(KeycloakApplication.java:115)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at org.jboss.resteasy.resteasy-jaxrs#3.13.2.Final//org.jboss.resteasy.core.ConstructorInjectorImpl.construct(ConstructorInjectorImpl.java:152)
at org.jboss.resteasy.resteasy-jaxrs#3.13.2.Final//org.jboss.resteasy.spi.ResteasyProviderFactory.createProviderInstance(ResteasyProviderFactory.java:2815)
at org.jboss.resteasy.resteasy-jaxrs#3.13.2.Final//org.jboss.resteasy.spi.ResteasyDeployment.createApplication(ResteasyDeployment.java:371)
at org.jboss.resteasy.resteasy-jaxrs#3.13.2.Final//org.jboss.resteasy.spi.ResteasyDeployment.startInternal(ResteasyDeployment.java:283)
at org.jboss.resteasy.resteasy-jaxrs#3.13.2.Final//org.jboss.resteasy.spi.ResteasyDeployment.start(ResteasyDeployment.java:93)
at org.jboss.resteasy.resteasy-jaxrs#3.13.2.Final//org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.init(ServletContainerDispatcher.java:140)
at org.jboss.resteasy.resteasy-jaxrs#3.13.2.Final//org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.init(HttpServletDispatcher.java:42)
at io.undertow.servlet#2.2.2.Final//io.undertow.servlet.core.LifecyleInterceptorInvocation.proceed(LifecyleInterceptorInvocation.java:117)
at org.wildfly.extension.undertow#21.0.2.Final//org.wildfly.extension.undertow.security.RunAsLifecycleInterceptor.init(RunAsLifecycleInterceptor.java:78)
at io.undertow.servlet#2.2.2.Final//io.undertow.servlet.core.LifecyleInterceptorInvocation.proceed(LifecyleInterceptorInvocation.java:103)
at io.undertow.servlet#2.2.2.Final//io.undertow.servlet.core.ManagedServlet$DefaultInstanceStrategy.start(ManagedServlet.java:305)
at io.undertow.servlet#2.2.2.Final//io.undertow.servlet.core.ManagedServlet.createServlet(ManagedServlet.java:145)
at io.undertow.servlet#2.2.2.Final//io.undertow.servlet.core.DeploymentManagerImpl$2.call(DeploymentManagerImpl.java:588)
at io.undertow.servlet#2.2.2.Final//io.undertow.servlet.core.DeploymentManagerImpl$2.call(DeploymentManagerImpl.java:559)
at io.undertow.servlet#2.2.2.Final//io.undertow.servlet.core.ServletRequestContextThreadSetupAction$1.call(ServletRequestContextThreadSetupAction.java:42)
at io.undertow.servlet#2.2.2.Final//io.undertow.servlet.core.ContextClassLoaderSetupAction$1.call(ContextClassLoaderSetupAction.java:43)
at org.wildfly.extension.undertow#21.0.2.Final//org.wildfly.extension.undertow.security.SecurityContextThreadSetupAction.lambda$create$0(SecurityContextThreadSetupAction.java:105)
at org.wildfly.extension.undertow#21.0.2.Final//org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1530)
at org.wildfly.extension.undertow#21.0.2.Final//org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1530)
at org.wildfly.extension.undertow#21.0.2.Final//org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1530)
at org.wildfly.extension.undertow#21.0.2.Final//org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1530)
at io.undertow.servlet#2.2.2.Final//io.undertow.servlet.core.DeploymentManagerImpl.start(DeploymentManagerImpl.java:601)
at org.wildfly.extension.undertow#21.0.2.Final//org.wildfly.extension.undertow.deployment.UndertowDeploymentService.startContext(UndertowDeploymentService.java:97)
at org.wildfly.extension.undertow#21.0.2.Final//org.wildfly.extension.undertow.deployment.UndertowDeploymentService$1.run(UndertowDeploymentService.java:78)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at org.jboss.threads#2.4.0.Final//org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
at org.jboss.threads#2.4.0.Final//org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1990)
at org.jboss.threads#2.4.0.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
at org.jboss.threads#2.4.0.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
at java.base/java.lang.Thread.run(Thread.java:834)
at org.jboss.threads#2.4.0.Final//org.jboss.threads.JBossThread.run(JBossThread.java:513)
Caused by: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at com.mysql.jdbc#8.0.22//com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(SQLError.java:174)
at com.mysql.jdbc#8.0.22//com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:64)
at com.mysql.jdbc#8.0.22//com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:836)
at com.mysql.jdbc#8.0.22//com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:456)
at com.mysql.jdbc#8.0.22//com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:246)
at com.mysql.jdbc#8.0.22//com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:198)
at org.jboss.ironjacamar.jdbcadapters#1.4.23.Final//org.jboss.jca.adapters.jdbc.local.LocalManagedConnectionFactory.createLocalManagedConnection(LocalManagedConnectionFactory.java:321)
... 57 more
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at com.mysql.jdbc#8.0.22//com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)
at com.mysql.jdbc#8.0.22//com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)
at com.mysql.jdbc#8.0.22//com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)
at com.mysql.jdbc#8.0.22//com.mysql.cj.exceptions.ExceptionFactory.createCommunicationsException(ExceptionFactory.java:167)
at com.mysql.jdbc#8.0.22//com.mysql.cj.protocol.a.NativeSocketConnection.connect(NativeSocketConnection.java:89)
at com.mysql.jdbc#8.0.22//com.mysql.cj.NativeSession.connect(NativeSession.java:144)
at com.mysql.jdbc#8.0.22//com.mysql.cj.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:956)
at com.mysql.jdbc#8.0.22//com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:826)
... 61 more
Caused by: java.net.ConnectException: Connection timed out (Connection timed out)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at com.mysql.jdbc#8.0.22//com.mysql.cj.protocol.StandardSocketFactory.connect(StandardSocketFactory.java:155)
at com.mysql.jdbc#8.0.22//com.mysql.cj.protocol.a.NativeSocketConnection.connect(NativeSocketConnection.java:63)
... 64 more
00:17:39,573 FATAL [org.keycloak.services] (ServerService Thread Pool -- 66) Error during startup: java.lang.RuntimeException: Failed to connect to database
your error is clearly informing keylock is trying to connect with the database but can't get it there might be an issue with your configuration file or environment you are passing.
You can try this out : https://github.com/harsh4870/Keycloack-postgres-kubernetes-deployment
let me know if these files don't work. it's not for a production use case but you can use it for setting up the dev environment.

To achieve Kafka-mule integration using Mule 4, I installed zookeeper&kafka on my machine. Both the servers connection goes off after sometime

I started by creating zookeeper and kafka servers respectively on my machine. The connection gets established for both the servers but goes off after a few mins, throwing some errors.
ZOOKEEPER ERROR LOG-
'''2020-05-11 14:32:36,908 [myid:] - INFO [SyncThread:0:FileTxnLog#284] - Creating new log file: log.1
2020-05-11 14:36:05,170 [myid:] - WARN [NIOWorkerThread-1:NIOServerCnxn#373] - Close of session 0x10000cc587b0003
java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:324)
at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-05-11 14:36:10,589 [myid:] - INFO [SessionTracker:ZooKeeperServer#600] - Expiring session 0x10000cc587b0003, timeout of 6000ms exceeded
'''
KAFKA ERROR LOG - Below are the kafka server error logs## Heading ##
'''[2020-05-11 14:33:41,077] INFO [KafkaServer id=0] started (kafka.server.KafkaServer)
[2020-05-11 14:36:04,762] ERROR Error while writing to checkpoint file C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint (kafka.server.LogDirFailureChannel)
java.nio.file.FileAlreadyExistsException: C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint.tmp -> C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:81)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
.....
[2020-05-11 14:36:04,776] INFO [ReplicaManager broker=0] Stopping serving replicas in dir C:\Kafka\Kafkakafka-logs (kafka.server.ReplicaManager)
[2020-05-11 14:36:04,776] ERROR [ReplicaManager broker=0] Error while writing to highwatermark file in directory C:\Kafka\Kafkakafka-logs (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.KafkaStorageException: Error while writing to checkpoint file C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint
Caused by: java.nio.file.FileAlreadyExistsException: C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint.tmp -> C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:81)
'''

How to investigate failing dataproc worker processes?

I'm running a PySpark job and Im having trouble determining the cause of failure on worker processes.
While my job is running I started noticing stack traces in the job output such as:
16/04/10 03:24:21 WARN org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1460240417530_0021_01_000003 on host: cluster-2-w-0.c.my-project.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
[Stage 0:=================================> (19 + 13) / 32]16/04/10 03:26:21 WARN org.apache.spark.rpc.netty.NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(2,Container marked as failed: container_1460240417530_0021_01_000003 on host: cluster-2-w-0.c.my-project.internal. Exit status: -100. Diagnostics: Container released on a *lost* node)] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:359)
at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receive$1.applyOrElse(YarnSchedulerBackend.scala:176)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
... 11 more
16/04/10 03:26:40 WARN org.apache.spark.rpc.netty.NettyRpcEndpointRef: Error sending message [message = RequestExecutors(23,0,Map())] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Failure.recover(Try.scala:185)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at org.spark-project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at scala.concurrent.Promise$class.complete(Promise.scala:55)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.processBatch$1(Future.scala:643)
at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply$mcV$sp(Future.scala:658)
at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635)
at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at scala.concurrent.Future$InternalCallbackExecutor$Batch.run(Future.scala:634)
at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:685)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:241)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:242)
... 7 more
[Stage 0:=================================> (19 + 13) / 32]
Ill also notice the overall CPU usage of the cluster slowly drop as worker nodes fail. These nodes seem to permanently fail and do not re-join the cluster:
I'm using preemtible machines but when I check the status of these machines they are still running and have not been preempted. So I'm guessing its something wrong on the worker:
It could be because of the heavy workload in the workers. Try to increase spark.network.timeout (default 120) to a bigger number.
If that is not resolving the error, most likely causes are garbage collection. Try to run a memory profile with the following options.
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/ -XX:+CMSClassUnloadingEnabled

IBM Worklight 6.0 - Error while building and deploying an application

I am getting below error while building an app in Worklight Studio:
AUDIT ] CWWKF0011I: The server worklight is ready to run a smarter planet.
[ERROR ] Failed executing POST /applications/upload
java.lang.RuntimeException: java.net.SocketTimeoutException: Socket operation timed out before it could be completed
LOG:
org.jboss.resteasy.spi.ReaderException: java.lang.RuntimeException: java.net.SocketTimeoutException: Socket operation timed out before it could be completed
at org.jboss.resteasy.core.MessageBodyParameterInjector.inject(MessageBodyParameterInjector.java:202)
at org.jboss.resteasy.core.MethodInjectorImpl.injectArguments(MethodInjectorImpl.java:136)
at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:159)
at org.jboss.resteasy.core.ResourceMethod.invokeOnTarget(ResourceMethod.java:257)
at org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:222)
at org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:211)
at org.jboss.resteasy.core.SynchronousDispatcher.getResponse(SynchronousDispatcher.java:542)
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:524)
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:126)
at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:208)
at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:55)
at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:50)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1240)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:760)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:443)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.invokeTarget(WebAppFilterChain.java:127)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:88)
at com.worklight.core.auth.impl.AuthenticationFilter$1.execute(AuthenticationFilter.java:192)
at com.worklight.core.auth.impl.AuthenticationServiceBean.accessResource(AuthenticationServiceBean.java:76)
at com.worklight.core.auth.impl.AuthenticationFilter.doFilter(AuthenticationFilter.java:196)
at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:194)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:85)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:949)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1029)
at com.ibm.ws.webcontainer.webapp.WebApp.handleRequest(WebApp.java:4499)
at com.ibm.ws.webcontainer.osgi.DynamicVirtualHost$2.handleRequest(DynamicVirtualHost.java:282)
at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:954)
at com.ibm.ws.webcontainer.osgi.DynamicVirtualHost$2.run(DynamicVirtualHost.java:252)
at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink$TaskWrapper.run(HttpDispatcherLink.java:584)
at com.ibm.ws.threading.internal.Worker.executeWork(Worker.java:439)
at com.ibm.ws.threading.internal.Worker.run(Worker.java:421)
at java.lang.Thread.run(Thread.java:695)
Caused by: java.lang.RuntimeException: java.net.SocketTimeoutException: Socket operation timed out before it could be completed
at org.jboss.resteasy.plugins.providers.multipart.MultipartInputImpl$BinaryOnlyMessageBuilder.body(MultipartInputImpl.java:140)
at org.apache.james.mime4j.parser.MimeStreamParser.parse(MimeStreamParser.java:101)
at org.jboss.resteasy.plugins.providers.multipart.MultipartInputImpl$BinaryMessage.<init>(MultipartInputImpl.java:153)
at org.jboss.resteasy.plugins.providers.multipart.MultipartInputImpl$BinaryMessage.<init>(MultipartInputImpl.java:146)
at org.jboss.resteasy.plugins.providers.multipart.MultipartInputImpl.parse(MultipartInputImpl.java:197)
at org.jboss.resteasy.plugins.providers.multipart.MultipartReader.readFrom(MultipartReader.java:50)
at org.jboss.resteasy.plugins.providers.multipart.MultipartReader.readFrom(MultipartReader.java:20)
at org.jboss.resteasy.core.interception.MessageBodyReaderContextImpl.proceed(MessageBodyReaderContextImpl.java:105)
at org.jboss.resteasy.plugins.interceptors.encoding.GZIPDecodingInterceptor.read(GZIPDecodingInterceptor.java:63)
at org.jboss.resteasy.core.interception.MessageBodyReaderContextImpl.proceed(MessageBodyReaderContextImpl.java:108)
at org.jboss.resteasy.core.MessageBodyParameterInjector.inject(MessageBodyParameterInjector.java:169)
... 32 more
Caused by: java.net.SocketTimeoutException: Socket operation timed out before it could be completed
at com.ibm.ws.tcpchannel.internal.SocketRWChannelSelector.checkForTimeouts(SocketRWChannelSelector.java:415)
at com.ibm.ws.tcpchannel.internal.ChannelSelector.run(ChannelSelector.java:226)
... 1 more
[11/14/13 11:33:38:370 GMT+05:30] 0000003f org.jboss.resteasy.core.SynchronousDispatcher E Failed executing POST /applications/upload
org.jboss.resteasy.spi.ReaderException: java.lang.RuntimeException: java.net.SocketTimeoutException: Socket operation timed out before it could be completed
at org.jboss.resteasy.core.MessageBodyParameterInjector.inject(MessageBodyParameterInjector.java:202)
at org.jboss.resteasy.core.MethodInjectorImpl.injectArguments(MethodInjectorImpl.java:136)
at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:159)
at org.jboss.resteasy.core.ResourceMethod.invokeOnTarget(ResourceMethod.java:257)
at org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:222)
at org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:211)
at org.jboss.resteasy.core.SynchronousDispatcher.getResponse(SynchronousDispatcher.java:542)
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:524)
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:126)
at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:208)
at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:55)
at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:50)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1240)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:760)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:443)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.invokeTarget(WebAppFilterChain.java:127)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:88)
at com.worklight.core.auth.impl.AuthenticationFilter$1.execute(AuthenticationFilter.java:192)
at com.worklight.core.auth.impl.AuthenticationServiceBean.accessResource(AuthenticationServiceBean.java:76)
at com.worklight.core.auth.impl.AuthenticationFilter.doFilter(AuthenticationFilter.java:196)
at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:194)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:85)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:949)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1029)
at com.ibm.ws.webcontainer.webapp.WebApp.handleRequest(WebApp.java:4499)
at com.ibm.ws.webcontainer.osgi.DynamicVirtualHost$2.handleRequest(DynamicVirtualHost.java:282)
at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:954)
at com.ibm.ws.webcontainer.osgi.DynamicVirtualHost$2.run(DynamicVirtualHost.java:252)
at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink$TaskWrapper.run(HttpDispatcherLink.java:584)
at com.ibm.ws.threading.internal.Worker.executeWork(Worker.java:439)
at com.ibm.ws.threading.internal.Worker.run(Worker.java:421)
at java.lang.Thread.run(Thread.java:695)
Caused by: java.lang.RuntimeException: java.net.SocketTimeoutException: Socket operation timed out before it could be completed
at org.jboss.resteasy.plugins.providers.multipart.MultipartInputImpl$BinaryOnlyMessageBuilder.body(MultipartInputImpl.java:140)
at org.apache.james.mime4j.parser.MimeStreamParser.parse(MimeStreamParser.java:101)
at org.jboss.resteasy.plugins.providers.multipart.MultipartInputImpl$BinaryMessage.<init>(MultipartInputImpl.java:153)
at org.jboss.resteasy.plugins.providers.multipart.MultipartInputImpl$BinaryMessage.<init>(MultipartInputImpl.java:146)
at org.jboss.resteasy.plugins.providers.multipart.MultipartInputImpl.parse(MultipartInputImpl.java:197)
at org.jboss.resteasy.plugins.providers.multipart.MultipartReader.readFrom(MultipartReader.java:50)
at org.jboss.resteasy.plugins.providers.multipart.MultipartReader.readFrom(MultipartReader.java:20)
at org.jboss.resteasy.core.interception.MessageBodyReaderContextImpl.proceed(MessageBodyReaderContextImpl.java:105)
at org.jboss.resteasy.plugins.interceptors.encoding.GZIPDecodingInterceptor.read(GZIPDecodingInterceptor.java:63)
at org.jboss.resteasy.core.interception.MessageBodyReaderContextImpl.proceed(MessageBodyReaderContextImpl.java:108)
at org.jboss.resteasy.core.MessageBodyParameterInjector.inject(MessageBodyParameterInjector.java:169)
... 32 more
Caused by: java.net.SocketTimeoutException: Socket operation timed out before it could be completed
at com.ibm.ws.tcpchannel.internal.SocketRWChannelSelector.checkForTimeouts(SocketRWChannelSelector.java:415)
at com.ibm.ws.tcpchannel.internal.ChannelSelector.run(ChannelSelector.java:226)
... 1 more
My system configuration is:
MAC 10.9
Eclipse JUNO SR2
Java 6
See if the answer here helps: org.eclipse.jetty.io.EofException at Build and deploy
Open the eclipse.ini file and
Add to it the following: X:MaxPermSize=320m -Djava.util.Arrays.useLegacyMergeSort=true
It could also be that you have a firewall that is blocking you somewhat - if you do have one, try disabling it and see if deployment passes.