Spark fails on big shuffle jobs with java.io.IOException: Filesystem closed - scala

I often find that Spark fails on large jobs with a rather unhelpful, meaningless exception. The worker logs look normal, with no errors, but they get state "KILLED". This is extremely common for large shuffles, i.e. operations like .distinct.
The question is, how do I diagnose what's going wrong, and ideally, how do I fix it?
Given that a lot of these operations are monoidal, I've been working around the problem by splitting the data into, say, 10 chunks, running the app on each chunk, then running the app on all of the resulting outputs. In other words: meta-map-reduce.
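Roughly, that workaround looks like this (a minimal sketch only; the chunk count, paths and the .distinct step are illustrative, not my actual job):
import org.apache.spark.{SparkConf, SparkContext}

object ChunkedDistinct {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("chunked-distinct"))
    val chunks = 10

    // Pass 1: run the monoidal operation on each chunk independently,
    // so no single job has to shuffle the full dataset.
    for (i <- 0 until chunks) {
      sc.textFile(s"hdfs:///data/input/chunk-$i")
        .distinct()
        .saveAsTextFile(s"hdfs:///data/intermediate/chunk-$i")
    }

    // Pass 2: run the same operation over the much smaller chunk outputs.
    sc.textFile("hdfs:///data/intermediate/chunk-*")
      .distinct()
      .saveAsTextFile("hdfs:///data/output")
  }
}
The failure itself looks like this: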
14/06/04 12:56:09 ERROR client.AppClient$ClientActor: Master removed our application: FAILED; stopping client
14/06/04 12:56:09 WARN cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
14/06/04 12:56:09 WARN scheduler.TaskSetManager: Loss was due to java.io.IOException
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:703)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:779)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:209)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:164)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:149)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:27)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
at scala.collection.AbstractIterator.toList(Iterator.scala:1157)
at $line5.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:13)
at $line5.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:13)
at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:450)
at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:450)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
at org.apache.spark.scheduler.Task.run(Task.scala:53)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

As of September 1st 2014, this is an "open improvement" in Spark. Please see https://issues.apache.org/jira/browse/SPARK-3052. As syrza pointed out in that link, the shutdown hooks are likely run in the wrong order when an executor fails, which results in this message. You will have to do a little more investigation to figure out the main cause of the problem (i.e. why your executor failed). If it is a large shuffle, it might be an out-of-memory error that causes the executor failure, which then causes the Hadoop FileSystem to be closed in its shutdown hook. The RecordReaders in the running tasks of that executor then throw the "java.io.IOException: Filesystem closed" exception. I guess it will be fixed in a subsequent release, and then you will get a more helpful error message :)
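One mitigation worth trying while you investigate (my own suggestion, not something SPARK-3052 prescribes): disable the shared Hadoop FileSystem cache and/or its automatic close, so one executor's shutdown hook cannot close the instance other tasks are still reading through. Both keys are standard Hadoop configuration and can be passed via Spark's spark.hadoop. prefix:
import org.apache.spark.{SparkConf, SparkContext}

// fs.hdfs.impl.disable.cache gives each caller its own FileSystem instance
// instead of the shared cached one; fs.automatic.close stops the FileSystem
// shutdown hook from closing instances on JVM exit.
val conf = new SparkConf()
  .setAppName("big-shuffle-job")
  .set("spark.hadoop.fs.hdfs.impl.disable.cache", "true")
  .set("spark.hadoop.fs.automatic.close", "false")
val sc = new SparkContext(conf)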

Something calls DFSClient.close() or DFSClient.abort(), closing the client. The next file operation then results in the above exception.
I would try to figure out what calls close()/abort(). You could use a breakpoint in your debugger, or modify the Hadoop source code to throw an exception in these methods, so you would get a stack trace.
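A tiny sketch of the second idea, using a stand-in method rather than the actual Hadoop source: constructing and printing a Throwable inside the suspect method reveals the caller without aborting anything.
object CloseTracer {
  // Stand-in for the method you suspect (e.g. DFSClient.close()).
  def close(): Unit = {
    new Exception("close() was called from:").printStackTrace()
    // ... the original close logic would follow here ...
  }

  def main(args: Array[String]): Unit = close()
}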

The "Filesystem closed" exception can sometimes be resolved when the Spark job is running on a cluster. You can set properties like spark.executor.cores, spark.driver.cores and spark.akka.threads to the maximum values with respect to your resource availability. I had the same problem when my dataset was pretty large, with about 20 million records of JSON data. I fixed it with the above properties and it ran like a charm. In my case, I set those properties to 25, 25 and 20 respectively. Hope it helps!
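For reference, setting those properties programmatically would look roughly like this (the values are the ones from my run; tune them to what your cluster can actually provide):
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .set("spark.executor.cores", "25")
  .set("spark.driver.cores", "25")
  .set("spark.akka.threads", "20")
val sc = new SparkContext(conf)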
Reference Link:
http://spark.apache.org/docs/latest/configuration.html

Related

Spark Structured Streaming recovering from a query exception

Is it possible to recover automatically from an exception thrown during query execution?
Context: I'm developing a Spark application that reads data from a Kafka topic, processes the data, and outputs to S3. However, after running for a couple of days in production, the Spark application faces occasional network hiccups from S3 that cause an exception to be thrown and stop the application. It's also worth mentioning that this application runs on Kubernetes using GCP's Spark k8s Operator.
From what I've seen so far, these exceptions are minor and a simple restart of the application solves the issue. Can we handle those exceptions and restart the structured streaming query automatically?
Here's an example of a thrown exception:
Exception in thread "main" org.apache.spark.sql.streaming.StreamingQueryException: Job aborted.
=== Streaming Query ===
Identifier: ...
Current Committed Offsets: ...
Current Available Offsets: ...
Current State: ACTIVE
Thread State: RUNNABLE
Logical Plan: ...
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:297)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
Caused by: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at io.blahblahView$$anonfun$11$$anonfun$apply$2.apply(View.scala:90)
at io.blahblahView$$anonfun$11$$anonfun$apply$2.apply(View.scala:82)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at io.blahblahView$$anonfun$11.apply(View.scala:82)
at io.blahblahView$$anonfun$11.apply(View.scala:79)
at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:537)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:535)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:534)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
... 1 more
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://.../view/v1/_temporary/0
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:993)
at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:734)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:291)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:361)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334)
at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:166)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:187)
... 47 more
What's the simplest way of taking care of such issues automatically?
After spending too many hours trying to find an elegant fix to this issue, and not finding anything, here's what I came up with.
Some might say it's a hack, but it's simple, it works and solves a complex problem. I tested it in production and it solves the issue of recovering automatically from failure due to an occasional minor exception.
I call it the Query Watchdog. Here's the simplest version, where the watchdog will retry running the query indefinitely:
val writer = df.writeStream...

while (true) {
  val query = writer.start()
  try {
    query.awaitTermination()
  } catch {
    case e: StreamingQueryException =>
      println("Streaming Query Exception caught!: " + e)
  }
}
Some people might want to replace the while(true) with some kind of counter to limit the number of retries, as in the sketch below. Someone could also supplement this code to send notifications through Slack or email whenever a retry happens. Others could simply collect the number of retries in Prometheus.
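For example, a bounded-retry variant might look like this, reusing the writer from the snippet above (maxRetries and the logging are illustrative additions, not part of any Spark API):
import org.apache.spark.sql.streaming.StreamingQueryException

val maxRetries = 5
var attempts = 0
var finished = false
while (!finished && attempts < maxRetries) {
  val query = writer.start()
  try {
    query.awaitTermination()
    finished = true // the query terminated cleanly, so stop retrying
  } catch {
    case e: StreamingQueryException =>
      attempts += 1
      println(s"Streaming query failed (attempt $attempts of $maxRetries): " + e)
  }
}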
Hope it helps,
Cheers
Since you are using the Spark Operator, why not use its restart functionality? If the controller notices that the application has stopped, it will re-submit it automatically.
This will work assuming that the application fails, i.e. the driver pod stops. There are some cases where a driver exception is thrown but the driver pod keeps running without doing anything. In that case the Spark Operator will think that the application is still running ok.
No, there is no reliable way to do this. BTW, "no" is also an answer.
Logic for checking exceptions is generally via try/catch running on the driver.
Unexpected situations at the executor level are already handled by the Spark framework itself for Structured Streaming; if the error is non-recoverable, the app/job simply crashes after signalling the error(s) back to the driver, unless you code a try/catch within the various foreachXXX constructs.
That said, for the foreachXXX constructs it is not clear that the micro-batch will be recoverable with such an approach; as far as I can see, some part of the micro-batch is highly likely to be lost. It is hard to test, though; see the sketch below.
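For what it's worth, the try/catch-inside-foreachBatch idea would look something like the following sketch (the rate source, sink path and the decision to just log and continue are all illustrative; as noted above, a swallowed failure likely means that micro-batch's output is lost):
import org.apache.spark.sql.{DataFrame, SparkSession}

object ForeachBatchGuard {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("foreachBatch-guard")
      .getOrCreate()

    // Toy source; substitute your Kafka reader here.
    val df = spark.readStream.format("rate").load()

    val query = df.writeStream
      .foreachBatch { (batch: DataFrame, batchId: Long) =>
        try {
          batch.write.mode("append").parquet(s"/tmp/out/batch=$batchId")
        } catch {
          case e: Exception =>
            // Keeps the stream alive, but this batch's output may be lost.
            println(s"Batch $batchId failed: " + e)
        }
      }
      .start()

    query.awaitTermination()
  }
}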
Given that Spark standardly caters for things you cannot hook into, why would it be possible to insert a loop or try/catch into the source of the program? Likewise, broadcast variables are an issue, although some say they have techniques around this. In any case, it is not in the spirit of the framework.
So, a good question, as I have wondered about this myself (in the past).
Depending on your Spark runtime and environment, an alternative, recommended for example in the Databricks documentation, is to simply let the streaming queries fail so that retries can be handled at the Spark job level.
One of the benefits of this is that it decouples the retry policy and related email notifications from your application.

Kubernetes java.io.IOException: Broken pipe error

I am trying to deploy a pod to a Kubernetes 11.2 cluster and am seeing this error.
The weird part is that this works fine on another cluster (different envs) and the config is exactly the same. The only thing I noticed with this failing cluster is that the nodes felt a little slow to log in to. But everything seems to be deploying correctly, except for this error.
There are no changes to the code or the config being deployed, so what could be the reason for this to happen only on this particular cluster and not in other envs? It works on dev, test and pre-prod but doesn't work on prod. I am totally dazed and not sure if this is an infrastructure issue, possibly in the Kubernetes config, or whether the application needs to be able to handle this. Another thing: I don't see the nodes going down or any errors related to memory or lack of resources, like disk pressure and such.
Any advice would be highly appreciated.
2018-09-02 18:29:51.048 INFO 29 --- [-nio-443-exec-6] f.a.AutowiredAnnotationBeanPostProcessor : JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
2018-09-02 18:29:56.493 ERROR 29 --- [-nio-443-exec-6] c.t.s.est.server.ServletDispatcher : An unexpected error occured while processing a request on the following uri /.well-known/est/App Service/senroll
org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
...
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_171]
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_171]
...
2018-09-02 18:29:56.499 ERROR 29 --- [-nio-443-exec-6] o.a.c.c.C.[.[.[/].[servletDispatcher] : Servlet.service() for servlet [servletDispatcher] in context with path [] threw exception
java.lang.IllegalStateException: Cannot call sendError() after the response has been committed
at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:472) ~[tomcat-embed-core-8.5.16.jar!/:8.5.16]
at com.trilliantnetworks.security.est.server.ServletDispatcher.unexpectedError(ServletDispatcher.java:230) ~[est-servlet-1.0.1-SNAPSHOT.jar!/:na]
at com.trilliantnetworks.security.est.server.ServletDispatcher.doPost(ServletDispatcher.java:211) ~[est-servlet-1.0.1-SNAPSHOT.jar!/:na]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:661) ~[tomcat-embed-core-8.5.16.jar!/:8.5.16]
...
...
2018-09-02 18:29:56.527 ERROR 29 --- [-nio-443-exec-6] o.a.c.c.C.[Tomcat].[localhost] : Exception Processing ErrorPage[errorCode=0, location=/error]
org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:321) ~[tomcat-embed-core-8.5.16.jar!/:8.5.16]
at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:284) ~[tomcat-embed-core-8.5.16.jar!/:8.5.16]
The nodes are fine. They are not going down or anything. I also looked at the events to see if any errors related to memory show up, but unfortunately nothing.
So, I am seeing this error in one of the pods, which installs some things to the database. This is the first step and should not fail.
Mainly what is happening is the read timeout. So I am wondering if there is some kind of timeout I can set in the cluster to wait a little longer for the API response.
2018-09-02 18:33:35.818 INFO 29 --- [ main] c.t.s.c.i.r.impl.ExternalSignerService : Exception while generatePermanentKeyStore:java.net.SocketTimeoutException: Read timed out

Osmosis throws strange Java error

I have been attempting to import a north-america-latest.osm.pbf (from Geofabrik) into a Postgres database for some time. After reviewing the wiki's detailed usage page thoroughly, I set up the database to include all necessary tables (pgSnapshot) via the included SQL scripts. I also made sure that osmosis was functioning as intended by running a smaller file (Antarctica) through, and I got the results I expected. However, when I attempt the same process with the North America file, I get an error that is dissimilar to others that have been reported on the web. I am trying to get this data onto a server; uploads to my local machine seem to be fine.
Here's my command (run via the command prompt):
C:\Users\eddie\Desktop>osmosis --read-pbf-fast north-america-latest.osm.pbf --log-progress interval=3000 --write-pgsql nodeLocationStoreType="TempFile" host=1*.8*.*.*0* database=osm postgresSchema=osm_updates user=eddie password=***
Here is the error message I get:
SEVERE: Thread for task 1-read-pbf-fast failed
org.springframework.dao.EmptyResultDataAccessException: Incorrect result
size: expected 1, actual 0
at org.springframework.dao.support.DataAccessUtils.requiredSingleResult(DataAccessUtils.java:71)
at org.springframework.jdbc.core.JdbcTemplate.queryForObject(JdbcTemplate.java:495)
at org.springframework.jdbc.core.JdbcTemplate.queryForObject(JdbcTemplate.java:500)
at org.openstreetmap.osmosis.pgsnapshot.common.SchemaVersionValidator.validateDBVersion(SchemaVersionValidator.java:64)
at org.openstreetmap.osmosis.pgsnapshot.common.SchemaVersionValidator.validateVersion(SchemaVersionValidator.java:47)
at org.openstreetmap.osmosis.pgsnapshot.v0_6.impl.CopyFilesetLoader.run(CopyFilesetLoader.java:77)
at org.openstreetmap.osmosis.pgsnapshot.v0_6.PostgreSqlCopyWriter.complete(PostgreSqlCopyWriter.java:117)
at org.openstreetmap.osmosis.core.progress.v0_6.EntityProgressLogger.complete(EntityProgressLogger.java:82)
at org.openstreetmap.osmosis.pbf2.v0_6.PbfReader.run(PbfReader.java:96)
at java.lang.Thread.run(Unknown Source)
Jul 19, 2018 8:28:24 AM org.openstreetmap.osmosis.core.Osmosis main
SEVERE: Execution aborted.
org.openstreetmap.osmosis.core.OsmosisRuntimeException: One or more tasks failed.
at org.openstreetmap.osmosis.core.pipeline.common.Pipeline.waitForCompletion(Pipeline.java:146)
at org.openstreetmap.osmosis.core.Osmosis.run(Osmosis.java:92)
at org.openstreetmap.osmosis.core.Osmosis.main(Osmosis.java:37)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchStandard(Launcher.java:330)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:238)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
at org.codehaus.classworlds.Launcher.main(Launcher.java:47)
I am running osmosis .46, Postgres/PostGis 10/2.4 on Windows 10 with 12GB of RAM and 2 Intel 2.4GHz processors.
UPDATE: the error now occurs even when I run smaller files. Additionally, osmosis behaves as if it is processing a larger file (reaches node 4614331685 for Antarctica) as seen through the progress logger messages. An upload of OSM data for Canada to my local went through without any issues, so the problem probably has to do with the server I am trying to connect to. If anyone has any clues as to how to decipher the error message though, I'd like to hear them!
I got osmosis to work by taking @mmd's advice and turning off the schema validation; see the note below. Even though I ran the pgsnapshot scripts and had successfully put data there before, something about doing all of North America seemed to throw it off. I'll update this answer after subsequent database updates.
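For anyone following along: the pgsnapshot writer exposes schema validation as a task option. I'm writing the option name from memory, so treat it as an assumption and verify it against the detailed usage page before relying on it; the invocation would look like:
osmosis --read-pbf-fast north-america-latest.osm.pbf --log-progress interval=3000 --write-pgsql validateSchemaVersion=no nodeLocationStoreType="TempFile" host=1*.8*.*.*0* database=osm postgresSchema=osm_updates user=eddie password=***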

RESTEasy Exception: RESTEASY003770: Response is committed, can't handle exception

I'm getting a strange exception from my RESTEasy server. This happens when a particularly large response is being returned to the client.
The exception occurs after my code has returned the result object to RESTEasy, so it's all inside the RESTEasy layer. The server is Tomcat.
Small responses are fine, but large ones are triggering the error. This happens if I return JSON or XML, and is triggered by size of the response, not the content in the objects I'm returning.
I've searched around but haven't found anything helpful yet, and am pretty much lost at this point...
org.jboss.resteasy.spi.UnhandledException: RESTEASY003770: Response is committed, can't handle exception
at org.jboss.resteasy.core.SynchronousDispatcher.writeException(SynchronousDispatcher.java:174)
at org.jboss.resteasy.core.SynchronousDispatcher.writeResponse(SynchronousDispatcher.java:478)
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:422)
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:209)
at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:221)
at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:56)
at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:51)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:522)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1095)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:672)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1502)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1458)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:393)
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:426)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:339)
at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:418)
at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:406)
at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:97)
at org.jboss.resteasy.plugins.server.servlet.HttpServletResponseWrapper$DeferredOutputStream.write(HttpServletResponseWrapper.java:46)
at org.jboss.resteasy.util.CommitHeaderOutputStream.write(CommitHeaderOutputStream.java:71)
at org.jboss.resteasy.util.DelegatingOutputStream.write(DelegatingOutputStream.java:48)
at com.fasterxml.jackson.core.json.UTF8JsonGenerator._flushBuffer(UTF8JsonGenerator.java:2003)
at com.fasterxml.jackson.core.json.UTF8JsonGenerator.close(UTF8JsonGenerator.java:1049)
at org.jboss.resteasy.plugins.providers.jackson.ResteasyJackson2Provider.writeTo(ResteasyJackson2Provider.java:209)
at org.jboss.resteasy.core.interception.AbstractWriterInterceptorContext.writeTo(AbstractWriterInterceptorContext.java:131)
at org.jboss.resteasy.core.interception.ServerWriterInterceptorContext.writeTo(ServerWriterInterceptorContext.java:60)
at org.jboss.resteasy.core.interception.AbstractWriterInterceptorContext.proceed(AbstractWriterInterceptorContext.java:120)
at org.jboss.resteasy.plugins.interceptors.encoding.GZIPEncodingInterceptor.aroundWriteTo(GZIPEncodingInterceptor.java:100)
at org.jboss.resteasy.core.interception.AbstractWriterInterceptorContext.proceed(AbstractWriterInterceptorContext.java:124)
at org.jboss.resteasy.plugins.interceptors.encoding.ServerContentEncodingAnnotationFilter.aroundWriteTo(ServerContentEncodingAnnotationFilter.java:60)
at org.jboss.resteasy.core.interception.AbstractWriterInterceptorContext.proceed(AbstractWriterInterceptorContext.java:124)
at org.jboss.resteasy.core.ServerResponseWriter.writeNomapResponse(ServerResponseWriter.java:98)
at org.jboss.resteasy.core.SynchronousDispatcher.writeResponse(SynchronousDispatcher.java:473)
... 27 more
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.tomcat.util.net.NioChannel.write(NioChannel.java:124)
at org.apache.tomcat.util.net.NioBlockingSelector.write(NioBlockingSelector.java:101)
at org.apache.tomcat.util.net.NioSelectorPool.write(NioSelectorPool.java:172)
at org.apache.coyote.http11.InternalNioOutputBuffer.writeToSocket(InternalNioOutputBuffer.java:139)
at org.apache.coyote.http11.InternalNioOutputBuffer.flushBuffer(InternalNioOutputBuffer.java:244)
at org.apache.coyote.http11.InternalNioOutputBuffer.addToBB(InternalNioOutputBuffer.java:189)
at org.apache.coyote.http11.InternalNioOutputBuffer.access$000(InternalNioOutputBuffer.java:41)
at org.apache.coyote.http11.InternalNioOutputBuffer$SocketOutputBuffer.doWrite(InternalNioOutputBuffer.java:320)
at org.apache.coyote.http11.filters.IdentityOutputFilter.doWrite(IdentityOutputFilter.java:93)
at org.apache.coyote.http11.AbstractOutputBuffer.doWrite(AbstractOutputBuffer.java:256)
at org.apache.coyote.Response.doWrite(Response.java:501)
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:388)
... 47 more
Update: I actually had this happen again! Scott's answer below covered the first case, but the second time was due to inserting a null value for a response header. RESTEasy accepted the null header value at the time I inserted it, but then blew up when it tried to serialize the response after exiting from my code.
OK, having been through this a couple of times now: I've personally seen this behaviour when there is an issue with the front-end reverse proxy, if you are using one. For example, I've seen issues when using nginx where the system was deployed and run with different users. A permission issue on temporary directories means you might not see the problem until the response hits a certain size. For example (it's one line; the line breaks are for visibility):
2017-04-20T17:46:40.03987 [nginx] 2017/04/20 13:46:40 [crit] 64537#0:
*14195 open() "/usr/local/var/run/nginx/proxy_temp/6/43/0000000436"
failed (13: Permission denied) while reading upstream, client: 10.0.0.1,
server: foo.bar.com, request: "GET /this/awesome/endpoint HTTP/1.1",
upstream: "http://127.0.0.1:9999/this/awesome/endpoint", host: "foo.bar.com"
Caused by: org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
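If that is what is happening in your case, the fix belongs on the proxy side rather than in RESTEasy. A minimal sketch, assuming the workers run as the nginx user and buffer into the proxy_temp path from the log above (both are assumptions; adjust to your install):
# Give the nginx worker user ownership of the proxy buffering directory.
chown -R nginx:nginx /usr/local/var/run/nginx/proxy_temp
# Reload nginx so the workers pick up the change.
nginx -s reload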
It seems the client is closing the connection before the response is sent completely to it. Can you increase the client timeout and then check?

Spring MongoDB TCP connection ends then does a query

I am having a very strange problem with Spring 3.1 and MongoDB. I am using the Spring Mongo libs. This is a pretty simple call that just pulls some data and inserts some new data, but while it is in the loop pulling new data, the connection is killed. From a capture I can see it pull data and insert data, then do another query, but after a two-second delay it sends a FIN packet to the server. The server then, a little bit later, tries to send data back to the client, but by then the client sends RSTs as it has already killed the connection.
Please let me know what other information I can provide to help understand this issue. I can provide a download link for the capture file as well.
Thank you for your help
Trace output
org.springframework.dao.DataAccessResourceFailureException: can't call something : id561la.ytel.com/172.31.214.55:27017/cdrstat; nested exception is com.mongodb.MongoException$Network: can't call something : id561la.ytel.com/172.31.214.55:27017/cdrstat
at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:56)
at org.springframework.data.mongodb.core.MongoTemplate.potentiallyConvertRuntimeException(MongoTemplate.java:1665)
at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1548)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1336)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1322)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:495)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:486)
at dao.BaseMongoSQLDAO.queryForList(BaseMongoSQLDAO.java:35)
at dao.PrefixStatSqlMapDAO.list(PrefixStatSqlMapDAO.java:59)
at services.CDRReaderService.getPrefixStat(CDRReaderService.java:705)
at services.CDRReaderService.processFile(CDRReaderService.java:659)
at services.CDRReaderService.access$100(CDRReaderService.java:42)
at services.CDRReaderService$1.run(CDRReaderService.java:150)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
Caused by: com.mongodb.MongoException$Network: can't call something : id561la.ytel.com/172.31.214.55:27017/cdrstat
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:295)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:257)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:310)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:295)
at com.mongodb.DBCursor._check(DBCursor.java:368)
at com.mongodb.DBCursor._hasNext(DBCursor.java:459)
at com.mongodb.DBCursor.hasNext(DBCursor.java:484)
at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1534)
... 13 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at org.bson.io.Bits.readFully(Bits.java:46)
at org.bson.io.Bits.readFully(Bits.java:33)
at org.bson.io.Bits.readFully(Bits.java:28)
at com.mongodb.Response.<init>(Response.java:40)
at com.mongodb.DBPort.go(DBPort.java:124)
at com.mongodb.DBPort.call(DBPort.java:74)
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:286)
... 20 more