Kubernetes java.io.IOException: Broken pipe error - kubernetes

I am trying to deploy a pod to a Kubernetes 11.2 cluster and I am seeing this error.
The weird part is that this works fine on another cluster (different environments) with the exact same config. The only thing I noticed is that the nodes of this failing cluster felt a little slow to log in to. Otherwise everything seems to deploy correctly, except for this error.
There are no changes to the code or the deployment config, so what could cause this to happen only on this particular cluster and not in the other environments? (It works on dev, test and pre-prod; it doesn't work on prod.) I am totally dazed and not sure whether this is an infrastructure issue, possibly with the Kubernetes config, or whether the application needs to be able to handle this. Another thing: I don't see the nodes going down or any errors related to memory or lack of resources, such as disk pressure.
Any advice would be highly appreciated.
2018-09-02 18:29:51.048 INFO 29 --- [-nio-443-exec-6] f.a.AutowiredAnnotationBeanPostProcessor : JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
2018-09-02 18:29:56.493 ERROR 29 --- [-nio-443-exec-6] c.t.s.est.server.ServletDispatcher : An unexpected error occured while processing a request on the following uri /.well-known/est/App Service/senroll
org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
...
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_171]
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_171]
...
2018-09-02 18:29:56.499 ERROR 29 --- [-nio-443-exec-6] o.a.c.c.C.[.[.[/].[servletDispatcher] : Servlet.service() for servlet [servletDispatcher] in context with path [] threw exception
java.lang.IllegalStateException: Cannot call sendError() after the response has been committed
at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:472) ~[tomcat-embed-core-8.5.16.jar!/:8.5.16]
at com.trilliantnetworks.security.est.server.ServletDispatcher.unexpectedError(ServletDispatcher.java:230) ~[est-servlet-1.0.1-SNAPSHOT.jar!/:na]
at com.trilliantnetworks.security.est.server.ServletDispatcher.doPost(ServletDispatcher.java:211) ~[est-servlet-1.0.1-SNAPSHOT.jar!/:na]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:661) ~[tomcat-embed-core-8.5.16.jar!/:8.5.16]
...
...
2018-09-02 18:29:56.527 ERROR 29 --- [-nio-443-exec-6] o.a.c.c.C.[Tomcat].[localhost] : Exception Processing ErrorPage[errorCode=0, location=/error]
org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:321) ~[tomcat-embed-core-8.5.16.jar!/:8.5.16]
at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:284) ~[tomcat-embed-core-8.5.16.jar!/:8.5.16]
The nodes are fine; they are not going down or anything. I also looked at the events to see if any memory-related errors show up, but unfortunately there are none.
So I am seeing this error in one of the pods, which installs some things into the database. This is the first step and should not fail.
Mainly what is happening is the read timeout, so I am wondering if there is some kind of timeout I can set in the cluster to wait a little longer for the API response.
2018-09-02 18:33:35.818 INFO 29 --- [ main] c.t.s.c.i.r.impl.ExternalSignerService : Exception while generatePermanentKeyStore:java.net.SocketTimeoutException: Read timed out
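If the timeout turns out to be in the application's own HTTP call rather than in Kubernetes, it can usually be raised on the client side. A minimal sketch, assuming a plain HttpURLConnection is used; the URL and timeout values here are illustrative, not taken from the actual application:

import java.net.HttpURLConnection;
import java.net.URL;

public class LongerTimeoutCall {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; substitute the real signer/API URL.
        URL url = new URL("https://signer.example.internal/api/generate");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(10_000);  // 10 s to establish the connection
        conn.setReadTimeout(120_000);    // wait up to 2 minutes before SocketTimeoutException: Read timed out
        System.out.println("HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}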

Related

Service weblogic.server.ServerLifeCycleService was started at level 9 but it has a run level of 10

I have little knowledge of WebLogic, and the need arose to deploy the application on a new server.
When I start the WebLogic 12c AdminServer (startAdminServer.sh), the following error occurs:
Caused By: java.lang.IllegalStateException: Service weblogic.server.ServerLifeCycleService was started at level 9 but it has a run level of 10. The full descriptor is SystemDescriptor(
implementation=weblogic.server.ServerLifeCycleService
name=ServerLifeCycleService
contracts={weblogic.server.ServerLifeCycleService,weblogic.server.ServerService}
scope=org.glassfish.hk2.runlevel.RunLevel
qualifiers={javax.inject.Named}
descriptorType=CLASS
descriptorVisibility=NORMAL
metadata=runLevelValue={10}
rank=0
loader=HK2LoaderImpl(weblogic.utils.classloaders.GenericClassLoader#1623b78d finder: weblogic.utils.classloaders.CodeGenClassFinder#4afb17c1 annotation: )
proxiable=null
proxyForSameScope=null
analysisName=null
id=158
locatorId=0
identityHashCode=1248382004
reified=true)
at org.glassfish.hk2.runlevel.internal.AsyncRunLevelContext.validate(AsyncRunLevelContext.java:446)
at org.glassfish.hk2.runlevel.internal.AsyncRunLevelContext.findOrCreate(AsyncRunLevelContext.java:299)
at org.glassfish.hk2.runlevel.RunLevelContext.findOrCreate(RunLevelContext.java:85)
at org.jvnet.hk2.internal.Utilities.createService(Utilities.java:2126)
at org.jvnet.hk2.internal.ServiceLocatorImpl.internalGetService(ServiceLocatorImpl.java:777)
Truncated. see log file for complete stacktrace
Can you help me?

How can I solve java.lang.IllegalStateException: Illegal access?

I am using Eclipse to develop a webapp. Everything was working fine, but sometimes when I tried to publish and then start my server, I received the following error:
Oct 03, 2019 6:52:55 PM org.apache.catalina.loader.WebappClassLoaderBase checkStateForResourceLoading
INFO: Illegal access: this web application instance has been stopped already. Could not load []. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load []. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1383)
at org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:1036)
at com.mysql.cj.jdbc.AbandonedConnectionCleanupThread.checkThreadContextClassLoader(AbandonedConnectionCleanupThread.java:117)
at com.mysql.cj.jdbc.AbandonedConnectionCleanupThread.run(AbandonedConnectionCleanupThread.java:84)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
For the first few times, I just restarted the server and it worked fine, but now even a restart doesn't solve the problem. I read about it online, and some sources mentioned that it might be a thread-management issue, but I couldn't get further than that. Can anybody please provide some info? Thanks.
The MySQL driver is trying to clean up on web application shutdown, which is good. Unfortunately it is trying to load additional classes to perform this clean-up and, since the web application is shutting down, class loading is no longer available. Ideally, the driver would load all the classes it needs for clean-up when it first loads; that would avoid this issue.
If you can figure out which classes it is trying to load - a check of the driver source code should show you which - then you should be able to load them yourself on application start in, for example, a ServletContextListener.
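A minimal sketch of that suggestion using a ServletContextListener; the listener class name and the preloaded class name are placeholders for illustration, since the exact class the driver fails to load has to be determined from the driver source:

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

@WebListener
public class DriverClassPreloader implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        try {
            // Hypothetical class name: replace with whatever class the driver tries to load during clean-up.
            Class.forName("com.mysql.cj.SomeCleanupHelper");
        } catch (ClassNotFoundException e) {
            sce.getServletContext().log("Could not preload MySQL clean-up class", e);
        }
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        // Nothing to do here; the driver performs its own clean-up on shutdown.
    }
}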
This problem can also occur if you created an entity class and forgot to give it an ID column. Assign an @Id to a column and run again; the issue should resolve.
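For context, a minimal sketch of what that means in JPA terms; the entity name and fields are purely illustrative:

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity
public class Customer {

    // Without an @Id-annotated field, the persistence provider rejects the entity at startup.
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;

    // getters and setters omitted for brevity
}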

Spark fails on big shuffle jobs with java.io.IOException: Filesystem closed

I often find Spark fails on large jobs with a rather unhelpful, meaningless exception. The worker logs look normal, with no errors, but they get the state "KILLED". This is extremely common for large shuffles, e.g. operations like .distinct.
The question is, how do I diagnose what's going wrong, and ideally, how do I fix it?
Given that a lot of these operations are monoidal, I've been working around the problem by splitting the data into, say, 10 chunks, running the app on each chunk, and then running the app on all of the resulting outputs. In other words: meta-map-reduce.
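A rough sketch of that workaround, assuming plain-text input on HDFS and using the Spark Java API; the paths and chunk count are illustrative, and in practice each chunk might be processed by a separate application run, as the question describes, rather than inside a single driver:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ChunkedDistinct {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("chunked-distinct"));

        int chunks = 10;
        // First pass: run the (monoidal) operation on each chunk independently.
        for (int i = 0; i < chunks; i++) {
            JavaRDD<String> chunk = sc.textFile("hdfs:///data/input/chunk-" + i);
            chunk.distinct().saveAsTextFile("hdfs:///data/intermediate/chunk-" + i);
        }

        // Second pass: run the same operation over all intermediate outputs.
        JavaRDD<String> combined = sc.textFile("hdfs:///data/intermediate/chunk-*");
        combined.distinct().saveAsTextFile("hdfs:///data/output");

        sc.stop();
    }
}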
14/06/04 12:56:09 ERROR client.AppClient$ClientActor: Master removed our application: FAILED; stopping client
14/06/04 12:56:09 WARN cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
14/06/04 12:56:09 WARN scheduler.TaskSetManager: Loss was due to java.io.IOException
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:703)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:779)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:209)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:164)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:149)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:27)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
at scala.collection.AbstractIterator.toList(Iterator.scala:1157)
at $line5.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:13)
at $line5.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:13)
at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:450)
at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:450)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
at org.apache.spark.scheduler.Task.run(Task.scala:53)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
As of September 1st 2014, this is an "open improvement" in Spark. Please see https://issues.apache.org/jira/browse/SPARK-3052. As syrza pointed out in the given link, the shutdown hooks are likely run in the wrong order when an executor fails, which results in this message. I understand you will have to do a little more investigation to figure out the main cause of the problem (i.e. why your executor failed). If it is a large shuffle, it might be an out-of-memory error which causes executor failure, which then causes the Hadoop FileSystem to be closed in its shutdown hook. So the RecordReaders in the running tasks of that executor throw the "java.io.IOException: Filesystem closed" exception. I guess it will be fixed in a subsequent release and then you will get a more helpful error message :)
Something calls DFSClient.close() or DFSClient.abort(), closing the client. The next file operation then results in the above exception.
I would try to figure out what calls close()/abort(). You could use a breakpoint in your debugger, or modify the Hadoop source code to throw an exception in these methods, so you would get a stack trace.
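A tiny sketch of that diagnostic idea. The class below is a hypothetical stand-in, not the real DFSClient; the point is simply to capture the call stack of whoever invokes close() without changing behaviour:

public class ResourceWrapper {

    public void close() {
        // Print the full call stack without actually throwing, so the normal close still happens.
        new Exception("close() was called from:").printStackTrace();
        // ... the original close logic would follow here ...
    }
}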
The exception about "file system closed" can be solved if the Spark job is running on a cluster. You can set properties like spark.executor.cores, spark.driver.cores and spark.akka.threads to the maximum values with respect to your resource availability. I had the same problem when my dataset was pretty large, with JSON data of about 20 million records. I fixed it with the above properties and it ran like a charm. In my case, I set those properties to 25, 25 and 20 respectively. Hope it helps!
Reference Link:
http://spark.apache.org/docs/latest/configuration.html
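A minimal sketch of setting those properties programmatically with the Spark Java API; the 25/25/20 values are the ones reported in the answer and should be adjusted to your own resources (spark.akka.threads only applies to older, Akka-based Spark versions):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class TunedContext {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("big-shuffle-job")
                .set("spark.executor.cores", "25")
                .set("spark.driver.cores", "25")
                .set("spark.akka.threads", "20");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job logic, e.g. the chunked distinct shown earlier ...
        sc.stop();
    }
}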

org.neo4j.kernel.lifecycle.LifecycleException in Neo4j

I've been using Neo4j 1.9 RC1 for the past two months. Yesterday, after an Eclipse crash, I started getting this exception:
Exception in thread "main" java.lang.RuntimeException: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.impl.transaction.TxManager#bf5743' was successfully initialized, but failed to start. Please see attached cause exception.
at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:282)
at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:90)
at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:75)
at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:60)
at fr.inria.atlanmod.neo4emf.drivers.impl.PersistenceService.<init>(PersistenceService.java:44)
at fr.inria.atlanmod.neo4emf.drivers.impl.PersistenceServiceFactory.createPersistenceService(PersistenceServiceFactory.java:27)
at fr.inria.atlanmod.neo4emf.drivers.impl.PersistenceManager.<init>(PersistenceManager.java:80)
at fr.inria.atlanmod.neo4emf.impl.Neo4emfResource.<init>(Neo4emfResource.java:58)
at fr.inria.atlanmod.neo4emf.impl.Neo4emfResourceFactory.createResource(Neo4emfResourceFactory.java:58)
at main.JDTASTMain.main(JDTASTMain.java:35)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.impl.transaction.TxManager#bf5743' was successfully initialized, but failed to start. Please see attached cause exception.
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:497)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:104)
at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:260)
... 9 more
Caused by: org.neo4j.graphdb.TransactionFailureException: Unable to start TM, no active tx log file found but found either tm_tx_log.1 or tm_tx_log.2 file, please set one of them as active or remove them.
at org.neo4j.kernel.impl.transaction.TxManager.openLog(TxManager.java:738)
at org.neo4j.kernel.impl.transaction.TxManager.start(TxManager.java:138)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:491)
... 11 more
I am running it with Java 1.7. Any ideas?
It seems that your data directory is corrupt. There is already a suggestion printed to the log to fix that issue:
Unable to start TM, no active tx log file found but found either
tm_tx_log.1 or tm_tx_log.2 file, please set one of them as active or
remove them.

JBoss startup error

On an existing application I am trying to start up the JBoss server and I get the following error.
Unfortunately I do not yet know clearly what has been configured and is being used on this JBoss. For now, I just want to get the error sorted out quickly.
If, looking at the stack trace, you can suggest what kind of configuration is missing or what I should be looking at to fix this problem, I'll be really grateful.
Any help is appreciated.
------------------ After Skaffman's help below, I was able to reduce the stack trace to this ------------------
07:36:36,971 ERROR [URLDeploymentScanner] Incomplete Deployment listing:
--- MBeans waiting for other MBeans ---
ObjectName: xyz.management:service=Queue,name=managementQueue
State: CONFIGURED
I Depend On:
jboss.messaging:service=ServerPeer
ObjectName: xyz.management:service=Queue,name=indexQueue
State: CONFIGURED
I Depend On:
jboss.messaging:service=ServerPeer
ObjectName: xyz.management:service=Queue,name=adaptiveLearningQueue
State: CONFIGURED
I Depend On:
jboss.messaging:service=ServerPeer
ObjectName: xyz.management:service=Queue,name=xyzErrorQueue
State: CONFIGURED
I Depend On:
jboss.messaging:service=ServerPeer
--- MBEANS THAT ARE THE ROOT CAUSE OF THE PROBLEM ---
ObjectName: jboss.messaging:service=ServerPeer
State: NOTYETINSTALLED
Depends On Me:
xyz.management:service=Queue,name=managementQueue
xyz.management:service=Queue,name=indexQueue
xyz.management:service=Queue,name=adaptiveLearningQueue
xyz.management:service=Queue,name=xyzErrorQueue
javax.naming.NameNotFoundException: XAConnectionFactory not bound
at org.jnp.server.NamingServer.getBinding(NamingServer.java:529)
at org.jnp.server.NamingServer.getBinding(NamingServer.java:537)
at org.jnp.server.NamingServer.getObject(NamingServer.java:543)
at org.jnp.server.NamingServer.lookup(NamingServer.java:296)
at org.jnp.interfaces.NamingContext.lookup(NamingContext.java:627)
at org.jnp.interfaces.NamingContext.lookup(NamingContext.java:589)
at javax.naming.InitialContext.lookup(InitialContext.java:392)
at org.jboss.ejb.plugins.jms.DLQHandler.createService(DLQHandler.java:182)
at org.jboss.system.ServiceMBeanSupport.jbossInternalCreate(ServiceMBeanSupport.java:260)
at org.jboss.system.ServiceMBeanSupport.create(ServiceMBeanSupport.java:188)
at org.jboss.ejb.plugins.jms.JMSContainerInvoker.innerStartDelivery(JMSContainerInvoker.java:665)
at org.jboss.ejb.plugins.jms.JMSContainerInvoker$ExceptionListenerImpl$ExceptionListenerRunnable.run(JMSContainerInvoker.java:1594)
at java.lang.Thread.run(Thread.java:619)
I think you have multiple issues here.
JBoss is trying to deploy the file velocity.log, which it has found in its deploy directory. This is clearly not what you intended, but realise that JBoss will try to deploy any file that gets dropped into the deploy directory. You need to find out what is putting it there, and stop it.
The second problem is that you have a bunch of JMS deployments (e.g. MDBs) somewhere in your application, but JMS is not present (or has not been configured correctly) on this server.