Blocked Thread on Ignite Cluster

Blocked Thread on Ignite Cluster - kubernetes

Running Ignite inside kubernetes environment, but after a while when the load is increased getting below Blocked Thread in Thread dump,
qtp102185114-154-acceptor-2#6275fad4-ServerConnector#6a3aa688{HTTP/1.1}{0.0.0.0:8080} - priority:5 - threadId:0x00007f78a6e1c000 - nativeId:0x84 - state:BLOCKED
stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:234)
- waiting to lock <0x00000000a049f430> (a java.lang.Object)
at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:377)
at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:500)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
Can anyone explain what's going wrong here? Given Blocked thread is on every node of a cluster.

It looks like Jetty (which is used for REST API internally in Ignite) doesn't have enough threads in the pool to accept all incoming connection when loading increases. I would suggest checking CPU utilization firstly and it is not 100% then you are able to increase threads count in Jetty XML configuration

Related

Updating Java Heap size while deploying/installing keycloak using bitnami/keycloak

I keep getting a OutOfMemoryError while deploying/installing keycloak using bitnami/keycloak in a Kubernetes pod.
2023-02-09 17:33:49,316 ERROR [org.jboss.threads.errors] (build-19) Thread Thread[build-19,5,build group] threw an uncaught exception: java.util.concurrent.RejectedExecutionException
at org.jboss.threads.RejectingExecutor.execute(RejectingExecutor.java:38)
at org.jboss.threads.EnhancedQueueExecutor.rejectException(EnhancedQueueExecutor.java:2049)
at org.jboss.threads.EnhancedQueueExecutor.doStartThread(EnhancedQueueExecutor.java:1713)
at org.jboss.threads.EnhancedQueueExecutor.execute(EnhancedQueueExecutor.java:787)
at io.quarkus.builder.BuildContext.depFinished(BuildContext.java:261)
at io.quarkus.builder.BuildContext.run(BuildContext.java:294)
at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)
at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2449)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1452)
at java.base/java.lang.Thread.run(Thread.java:829)
at org.jboss.threads.JBossThread.run(JBossThread.java:501)
Suppressed: java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
at java.base/java.lang.Thread.start0(Native Method)
at java.base/java.lang.Thread.start(Thread.java:798)
at org.jboss.threads.JBossThread.start(JBossThread.java:558)
at org.jboss.threads.EnhancedQueueExecutor.doStartThread(EnhancedQueueExecutor.java:1709)
... 8 more
I suspect it could be because of the Java Heap size. Please correct me if I am wrong.
I have checked the resources allocated to the machine; everything looks fine. The question is how do I update the heap size in value.yaml file? Is JAVA_OPTS_APPEND in https://github.com/bitnami/charts/blob/main/bitnami/keycloak/templates/configmap-env-vars.yaml is the env variable to be overridden?
If anybody has an example, please share.

Drools 6.5 ConcurrentModificationException with LinkedHashMap Fact

In our Java application using Drools 6.5 final release, we use Disruptor to run the same rules by different user threads, each thread has its own dedicated Session object while all the sessions are created from a common KieBase. Dev/QA did not see the following error but in Production, we see the error: the object being inserted is LinkedHashMap instance and this object will definitely being processed by one user thread (based on hashCode of immutable object coming with the message), so this is strange that this LinkedHashCode object would be modified by thread other than the user threads. Any thoughts on what could be the cause?
07:04:15.719 ERROR [RuleHandler6] erf.SupportsProfilingHandlerBase - Exception -
java.util.ConcurrentModificationException
at java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:711)
at java.util.LinkedHashMap$LinkedEntryIterator.next(LinkedHashMap.java:744)
at java.util.LinkedHashMap$LinkedEntryIterator.next(LinkedHashMap.java:742)
at java.util.AbstractMap.hashCode(AbstractMap.java:507)
at org.drools.core.common.EqualityAssertMapComparator.hashCodeOf(EqualityAssertMapComparator.java:46)
at org.drools.core.util.ObjectHashMap.get(ObjectHashMap.java:90)
at org.drools.core.common.ClassAwareObjectStore.getHandleForObject(ClassAwareObjectStore.java:150)
at org.drools.core.common.NamedEntryPoint.getFactHandle(NamedEntryPoint.java:680)
at consolidator.services.DroolsKieContainer$SessionWrapper.internalFire(DroolsKieContainer.java:198)
at consolidator.services.DroolsKieContainer$SessionWrapper.fire(DroolsKieContainer.java:175)
at consolidator.services.DroolsKieService.fire(DroolsKieService.java:153)
at consolidator.disruptor.RuleHandler.handleFIX(RuleHandler.java:88)
at consolidator.disruptor.RuleHandler.onEventCore(RuleHandler.java:68)
at consolidator.disruptor.RuleHandler.onEventCore(RuleHandler.java:15)
at consolidator.perf.SupportsProfilingHandlerBase.onEvent(SupportsProfilingHandlerBase.java:43)
at consolidator.perf.SupportsProfilingHandlerBase.onEvent(SupportsProfilingHandlerBase.java:9)
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

MongoDB java driver - massive number of parked threads

In a long run I am seeing massive number of threads getting piled up by MongoDB Java driver (v3.0.3). All these threads are server monitoring threads, all parked waiting:
cluster-ClusterId{value='562233d1b26c940820028340', description='null'}-192.168.0.2:27017
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForSignalOrTimeout(DefaultServerMonitor.java:237)
com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForNext(DefaultServerMonitor.java:218)
com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:167)
java.lang.Thread.run(Unknown Source)
Right now there are about 250 of them. I don't think that many threads are needed to monitor a connection to a single database host. I am obviously doing something wrong.., but as far as I can tell we didn't do any setting changes when moved from driver v2 to v3. Could be a bug in driver? Any ideas?

This issue has been fixed in 3.2.2.
https://jira.mongodb.org/browse/JAVA-2074

Apache Camel blocking on multicast

I have an application that consists of many modules deployed in several JBoss instances. Those modules broadcast their version numbers via JMS to each other.
It works in the following way. Each module periodically broadcasts its versions like this:
camelTemplate.asyncSendBody(moduleVersionsOutputChannel, new ArrayList<>(moduleVersions));
Here's the definition of moduleVersionsOutputChannel:
<endpoint id="moduleVersionsOutputChannel" uri="direct:module.versions.output.channel"/>
Then there's a route that consumes messages from moduleVersionsOutputChannel (referred as source below) and posts the messages into topics:
from(source).multicast()
.parallelProcessing().timeout(moduleVersionsMonitoringConfig.getMulticastTimeout())
.to(moduleVersionsMonitoringConfig.getChannels())
.id(moduleVersionsService.getCurrentModuleId() + "-outbound");
At the beginning everything works fine, but after some time (hours or days) the threads that do multicast are blocked. I observe blocks at 2 different points.
The first block happens in the thread where asyncSendBody is called:
"moduleVersionsBroadcastScheduler-1#57542" prio=5 tid=0x423 nid=NA waiting
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Unsafe.java:-1)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
at org.apache.camel.processor.MulticastProcessor.doProcessParallel(MulticastProcessor.java:328)
at org.apache.camel.processor.MulticastProcessor.process(MulticastProcessor.java:214)
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:72)
at org.apache.camel.processor.interceptor.TraceInterceptor.process(TraceInterceptor.java:163)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:191)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:191)
at org.apache.camel.component.direct.DirectProducer.process(DirectProducer.java:51)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:191)
at org.apache.camel.processor.UnitOfWorkProducer.process(UnitOfWorkProducer.java:73)
at org.apache.camel.impl.ProducerCache$2.doInProducer(ProducerCache.java:378)
at org.apache.camel.impl.ProducerCache$2.doInProducer(ProducerCache.java:346)
at org.apache.camel.impl.ProducerCache.doInProducer(ProducerCache.java:242)
at org.apache.camel.impl.ProducerCache.sendExchange(ProducerCache.java:346)
at org.apache.camel.impl.ProducerCache.send(ProducerCache.java:184)
at org.apache.camel.impl.DefaultProducerTemplate.send(DefaultProducerTemplate.java:124)
at org.apache.camel.impl.DefaultProducerTemplate.sendBody(DefaultProducerTemplate.java:137)
at org.apache.camel.impl.DefaultProducerTemplate$14.call(DefaultProducerTemplate.java:621)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:2025)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132)
at org.apache.camel.impl.DefaultProducerTemplate.asyncSendBody(DefaultProducerTemplate.java:626)
at ...
Another time the execution blocks at the later point when it actually tries to post a message to JMS topics:
"Camel (moduleVersionsContext) thread #1044 - Multicast#52031" daemon prio=5 tid=0xa16 nid=NA waiting
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Unsafe.java:-1)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:472)
at org.hornetq.core.client.impl.ClientProducerCreditsImpl.acquireCredits(ClientProducerCreditsImpl.java:90)
at org.hornetq.core.client.impl.ClientProducerImpl.sendRegularMessage(ClientProducerImpl.java:307)
at org.hornetq.core.client.impl.ClientProducerImpl.doSend(ClientProducerImpl.java:288)
at org.hornetq.core.client.impl.ClientProducerImpl.send(ClientProducerImpl.java:140)
at org.hornetq.jms.client.HornetQMessageProducer.doSend(HornetQMessageProducer.java:438)
at org.hornetq.jms.client.HornetQMessageProducer.send(HornetQMessageProducer.java:205)
at org.springframework.jms.core.JmsTemplate.doSend(JmsTemplate.java:589)
at org.apache.camel.component.jms.JmsConfiguration$CamelJmsTemplate.doSend(JmsConfiguration.java:336)
at org.apache.camel.component.jms.JmsConfiguration$CamelJmsTemplate.doSendToDestination(JmsConfiguration.java:275)
at org.apache.camel.component.jms.JmsConfiguration$CamelJmsTemplate.access$100(JmsConfiguration.java:217)
at org.apache.camel.component.jms.JmsConfiguration$CamelJmsTemplate$1.doInJms(JmsConfiguration.java:231)
at org.springframework.jms.core.JmsTemplate.execute(JmsTemplate.java:466)
at org.apache.camel.component.jms.JmsConfiguration$CamelJmsTemplate.send(JmsConfiguration.java:228)
at org.apache.camel.component.jms.JmsProducer.doSend(JmsProducer.java:431)
at org.apache.camel.component.jms.JmsProducer.processInOnly(JmsProducer.java:385)
at org.apache.camel.component.jms.JmsProducer.process(JmsProducer.java:153)
at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:110)
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:72)
at org.apache.camel.processor.interceptor.TraceInterceptor.process(TraceInterceptor.java:163)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:191)
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:398)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:191)
at org.apache.camel.util.AsyncProcessorHelper.process(AsyncProcessorHelper.java:105)
at org.apache.camel.processor.MulticastProcessor.doProcessParallel(MulticastProcessor.java:712)
at org.apache.camel.processor.MulticastProcessor.access$200(MulticastProcessor.java:83)
at org.apache.camel.processor.MulticastProcessor$1.call(MulticastProcessor.java:293)
at org.apache.camel.processor.MulticastProcessor$1.call(MulticastProcessor.java:278)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Can you tell me whether I have a bug in my Camel configuration or it's a known Camel issue and there's some workaround?
My environment:
Apache Camel 2.12.3
JBoss 6.1 EAP
OpenJDK 1.7.0_55
Ubuntu Linux 12.04 LTS

I had similar problem. Try to configure the cxf producer endpoint to synchronous. This resolved my issue
http://camel.465427.n5.nabble.com/Intermittent-STUCK-threads-in-Weblogic-Server-due-to-camel-application-td5750624.html

Spark fails on big shuffle jobs with java.io.IOException: Filesystem closed

I often find spark fails with large jobs with a rather unhelpful meaningless exception. The worker logs look normal, no errors, but they get state "KILLED". This is extremely common for large shuffles, so operations like .distinct.
The question is, how do I diagnose what's going wrong, and ideally, how do I fix it?
Given that a lot of these operations are monoidal I've been working around the problem by splitting the data into, say 10, chunks, running the app on each chunk, then running the app on all of the resulting outputs. In other words - meta-map-reduce.
14/06/04 12:56:09 ERROR client.AppClient$ClientActor: Master removed our application: FAILED; stopping client
14/06/04 12:56:09 WARN cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
14/06/04 12:56:09 WARN scheduler.TaskSetManager: Loss was due to java.io.IOException
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:703)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:779)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:209)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:164)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:149)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:27)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
at scala.collection.AbstractIterator.toList(Iterator.scala:1157)
at $line5.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:13)
at $line5.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:13)
at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:450)
at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:450)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
at org.apache.spark.scheduler.Task.run(Task.scala:53)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

As of September 1st 2014, this is an "open improvement" in Spark. Please see https://issues.apache.org/jira/browse/SPARK-3052. As syrza pointed out in the given link, the shutdown hooks are likely done in incorrect order when an executor failed which results in this message. I understand you will have to little more investigation to figure out the main cause of problem (i.e. why your executor failed). If it is a large shuffle, it might be an out-of-memory error which cause executor failure which then caused the Hadoop Filesystem to be closed in their shutdown hook. So, the RecordReaders in running tasks of that executor throw "java.io.IOException: Filesystem closed" exception. I guess it will be fixed in subsequent release and then you will get more helpful error message :)

Something calls DFSClient.close() or DFSClient.abort(), closing the client. The next file operation then results in the above exception.
I would try to figure out what calls close()/abort(). You could use a breakpoint in your debugger, or modify the Hadoop source code to throw an exception in these methods, so you would get a stack trace.

The exception about “file system closed” can be solved if the spark job is running on a cluster. You can set properties like spark.executor.cores , spark.driver.cores and spark.akka.threads to the maximum values w.r.t your resource availability. I had the same problem when my dataset was pretty large with JSON data about 20 million records. I fixed it with the above properties and it ran like a charm. In my case, I set those properties to 25,25 and 20 respectively. Hope it helps!!
Reference Link:
http://spark.apache.org/docs/latest/configuration.html

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Blocked Thread on Ignite Cluster - kubernetes

Related

Updating Java Heap size while deploying/installing keycloak using bitnami/keycloak

Drools 6.5 ConcurrentModificationException with LinkedHashMap Fact

MongoDB java driver - massive number of parked threads

Apache Camel blocking on multicast

Spark fails on big shuffle jobs with java.io.IOException: Filesystem closed

Categories

Resources