Spark job Killed: /tmp/spark-driver.log does not exist - pyspark

I am running a Spark job in Cloudera Data Science Workbench. Sometimes it runs okay, but sometimes it fails with this error:
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /tmp/spark-driver.log (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
Upon checking, the file exists:
cdsw#jw4l5ll7jj0l3bcy:~$ ls /tmp/spark-driver.log
/tmp/spark-driver.log
I've already looked at the Spark UI logs and can't find any other error; this is the only one we found. I'm getting desperate for answers, so any leads would be appreciated.
Thanks!

The stack trace indicates a permission problem with the /tmp directory (or with the existing /tmp/spark-driver.log file) on your machines: the file is there, but your session's user is not allowed to open it for writing. You can refer to the following answer by Michail N, and there is a quick diagnostic sketch below. I hope it solves your issue.
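A minimal way to check, assuming a Linux shell inside the CDSW session (the path is the one from your error; the fix commands are generic suggestions, not a CDSW-specific recipe):

# who owns the existing log file, and who am I?
ls -l /tmp/spark-driver.log
id
# if the file is yours but not writable, make it writable again
chmod u+w /tmp/spark-driver.log
# or remove the stale file so log4j can recreate it
# (only possible if you own it, since /tmp normally has the sticky bit set)
rm /tmp/spark-driver.log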

Related

Osmosis throws strange Java error

I have been attempting to import a north-america-latest.osm.pbf (from Geofabrik) into a Postgres database for some time. After reviewing the wiki's detailed usage page thoroughly, I set up the database to include all necessary tables (pgSnapshot) via the included SQL scripts. I also made sure that Osmosis was functioning as intended by running a smaller file (Antarctica) through it, and I got the results I expected. However, when I attempt to do the same process with the North America file, I get an error that is dissimilar to others that have been reported on the web. I am trying to get this data onto a server; uploads to my local machine seem to be fine.
Here's my code (via command prompt) :
C:\Users\eddie\Desktop>osmosis --read-pbf-fast north-america-latest.osm.pbf --log-progress interval=3000 --write-pgsql nodeLocationStoreType="TempFile" host=1*.8*.*.*0* database=osm postgresSchema=osm_updates user=eddie password=***
Here is the error message I get:
SEVERE: Thread for task 1-read-pbf-fast failed
org.springframework.dao.EmptyResultDataAccessException: Incorrect result
size: expected 1, actual 0
at org.springframework.dao.support.DataAccessUtils.requiredSingleResult(DataAccessUtils.java:71)
at org.springframework.jdbc.core.JdbcTemplate.queryForObject(JdbcTemplate.java:495)
at org.springframework.jdbc.core.JdbcTemplate.queryForObject(JdbcTemplate.java:500)
at org.openstreetmap.osmosis.pgsnapshot.common.SchemaVersionValidator.validateDBVersion(SchemaVersionValidator.java:64)
at org.openstreetmap.osmosis.pgsnapshot.common.SchemaVersionValidator.validateVersion(SchemaVersionValidator.java:47)
at org.openstreetmap.osmosis.pgsnapshot.v0_6.impl.CopyFilesetLoader.run(CopyFilesetLoader.java:77)
at org.openstreetmap.osmosis.pgsnapshot.v0_6.PostgreSqlCopyWriter.complete(PostgreSqlCopyWriter.java:117)
at org.openstreetmap.osmosis.core.progress.v0_6.EntityProgressLogger.complete(EntityProgressLogger.java:82)
at org.openstreetmap.osmosis.pbf2.v0_6.PbfReader.run(PbfReader.java:96)
at java.lang.Thread.run(Unknown Source)
Jul 19, 2018 8:28:24 AM org.openstreetmap.osmosis.core.Osmosis main
SEVERE: Execution aborted.
org.openstreetmap.osmosis.core.OsmosisRuntimeException: One or more tasks failed.
at org.openstreetmap.osmosis.core.pipeline.common.Pipeline.waitForCompletion(Pipeline.java:146)
at org.openstreetmap.osmosis.core.Osmosis.run(Osmosis.java:92)
at org.openstreetmap.osmosis.core.Osmosis.main(Osmosis.java:37)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchStandard(Launcher.java:330)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:238)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
at org.codehaus.classworlds.Launcher.main(Launcher.java:47)
I am running Osmosis 0.46 and Postgres/PostGIS 10/2.4 on Windows 10 with 12 GB of RAM and 2 Intel 2.4 GHz processors.
UPDATE: the error now occurs even when I run smaller files. Additionally, osmosis behaves as if it is processing a larger file (reaches node 4614331685 for Antarctica) as seen through the progress logger messages. An upload of OSM data for Canada to my local went through without any issues, so the problem probably has to do with the server I am trying to connect to. If anyone has any clues as to how to decipher the error message though, I'd like to hear them!
I got Osmosis to work by taking #mmd's advice of turning off the schema validation (see the command sketch below). Even though I ran the pgsnapshot scripts and had successfully loaded data there before, something about doing all of North America seemed to throw it off. I'll update this answer after subsequent database updates.
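For reference, this is roughly what the working command looks like with validation disabled. The validateSchemaVersion option name is from memory (it is a common option of the Osmosis database tasks), so double-check it against the detailed usage page for your Osmosis version:

C:\Users\eddie\Desktop>osmosis --read-pbf-fast north-america-latest.osm.pbf --log-progress interval=3000 --write-pgsql validateSchemaVersion=no nodeLocationStoreType="TempFile" host=1*.8*.*.*0* database=osm postgresSchema=osm_updates user=eddie password=***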

Liferay server in Spring Tool Suite (STS) gives permission denied error

Using liferay-ce-portal-7.0-ga3 on my Mac (El Capitan 10.11), I have set up my Liferay 7.x server in Spring Tool Suite; it comes with the Tomcat 8 server included. I have installed the Liferay IDE in my STS and generated a Liferay Plugin Project, but when I try to run the application I get the Permission denied exception shown below. I have sudo credentials on my Mac. How can I get rid of this Permission denied exception when trying to run my application? I have attached a screenshot of the error too.
java.util.logging.ErrorManager: 4
java.io.FileNotFoundException: /Users/remo/Projects/xnet/Development/tools/liferay-ce-portal-7.0-ga3/tomcat-8.0.32/logs/catalina.2016-12-08.log (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.juli.FileHandler.openWriter(FileHandler.java:384)
at org.apache.juli.FileHandler.<init>(FileHandler.java:96)
at org.apache.juli.AsyncFileHandler.<init>(AsyncFileHandler.java:71)
at org.apache.juli.AsyncFileHandler.<init>(AsyncFileHandler.java:67)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.juli.ClassLoaderLogManager.readConfiguration(ClassLoaderLogManager.java:562)
at org.apache.juli.ClassLoaderLogManager.readConfiguration(ClassLoaderLogManager.java:505)
at org.apache.juli.ClassLoaderLogManager.readConfiguration(ClassLoaderLogManager.java:309)
at java.util.logging.LogManager$3.run(LogManager.java:399)
at java.util.logging.LogManager$3.run(LogManager.java:396)
at java.security.AccessController.doPrivileged(Native Method)
at java.util.logging.LogManager.readPrimordialConfiguration(LogManager.java:396)
at java.util.logging.LogManager.access$800(LogManager.java:145)
at java.util.logging.LogManager$2.run(LogManager.java:345)
at java.security.AccessController.doPrivileged(Native Method)
at java.util.logging.LogManager.ensureLogManagerInitialized(LogManager.java:338)
at java.util.logging.LogManager.getLogManager(LogManager.java:378)
at java.util.logging.Logger.demandLogger(Logger.java:448)
at java.util.logging.Logger.getLogger(Logger.java:502)
at com.sun.jmx.remote.util.ClassLogger.<init>(ClassLogger.java:55)
at sun.management.jmxremote.ConnectorBootstrap.<clinit>(ConnectorBootstrap.java:846)
at sun.management.Agent.startAgent(Agent.java:257)
at sun.management.Agent.startAgent(Agent.java:447)
java.util.logging.ErrorManager: 4
java.io.FileNotFoundException: /Users/remo/Projects/xnet/Development/tools/liferay-ce-portal-7.0-ga3/tomcat-8.0.32/logs/localhost.2016-12-08.log (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
My best guess is that you've used your sudo power to start STS/Liferay/Tomcat as root once. Now your logfiles and OSGi state files are owned by root and can't be overwritten when you're starting the server as an unprivileged user.
As it's quite bad practice to run an internet-facing server as root, I'd suggest running sudo chown -R remo /Users/remo/Projects/xnet/Development/tools/liferay-ce-portal-7.0-ga3/ (note that on macOS the recursive flag for chown is the uppercase -R). This command recursively changes ownership of the files in /Users/remo/Projects/xnet/Development/tools/liferay-ce-portal-7.0-ga3/ to your user (remo).
Starting the server from Eclipse is typically done with your own user account and should then work.
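A small sketch to confirm the diagnosis before changing anything (the paths are the ones from your stack trace):

# list everything under the bundle that is currently owned by root
find /Users/remo/Projects/xnet/Development/tools/liferay-ce-portal-7.0-ga3 -user root -ls
# hand the whole bundle back to your own user, then start the server again without sudo
sudo chown -R remo /Users/remo/Projects/xnet/Development/tools/liferay-ce-portal-7.0-ga3
ls -l /Users/remo/Projects/xnet/Development/tools/liferay-ce-portal-7.0-ga3/tomcat-8.0.32/logs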

Merge operation fails - gpload utility (Greenplum)

We would like to describe the problem below:
We have a small Greenplum (GPDB) cluster. On it, we are trying to do data integration using the Talend tool.
We are trying to load the incremental data from one table to another table. Quite simple... I thought...
Job Data Flow is
tgreenplumconnection
|
tmssqlinput--->thdfsoutput-->tmap-->tgreenplumgpload--tgreenplumcommit
We are getting this error:
Exception in thread "Thread-1" java.lang.RuntimeException: Cannot run program "gpload": CreateProcess error=2, The system cannot find the file specified
at bigdata.sormaster_stg0_copy_0_1.SorMaster_stg0_Copy$2.run(SorMaster_stg0_Copy.java:6425)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
at java.lang.ProcessImpl.start(ProcessImpl.java:137)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
at java.lang.Runtime.exec(Runtime.java:620)
at java.lang.Runtime.exec(Runtime.java:528)
at bigdata.sormaster_stg0_copy_0_1.SorMaster_stg0_Copy$2.run(SorMaster_stg0_Copy.java:6413)

PriviledgedActionException while running kmeans on hadoop

I am trying to run KMeans on Hadoop, using these guidelines:
http://www.slideshare.net/titusdamaiyanti/hadoop-installation-k-means-clustering-mapreduce?qid=44b5881c-089d-474b-b01d-c35a2f91cc67&v=qf1&b=&from_search=1#likes-panel
I am running this in Eclipse Luna. When I execute it, both map and reduce show that they are 100% complete, but I am not getting any output. Instead I am getting the following error at the end. Please help me solve this.
15/03/20 11:29:44 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-hduser/mapred/staging/hduser378797276/.staging/job_local378797276_0002
15/03/20 11:29:44 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:java.io.IOException: No input paths specified in job
Exception in thread "main" java.io.IOException: No input paths specified in job
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:193)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
at com.clustering.mapreduce.KMeansClusteringJob.main(KMeansClusteringJob.java:114)
You have to provide the input file location before running the MapReduce program. There are two ways of providing it:
Using Eclipse, go to the run configuration and pass the input and output locations as program arguments.
Package your program as a jar file and run the command below inside your Hadoop cluster:
hadoop jar NameOfYourJarFile InputFileLocation OutputFileLocation
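As a concrete illustration (the jar name and HDFS paths are made up for the example, and the driver class name is the one from your stack trace; your driver must actually hand these arguments to FileInputFormat/FileOutputFormat, otherwise the "No input paths specified" error remains):

# check that the input really is where you think it is
hadoop fs -ls /user/hduser/kmeans/input
# run the job with explicit input and output locations
# (the class name can be omitted if the jar's manifest already sets a main class)
hadoop jar kmeans.jar com.clustering.mapreduce.KMeansClusteringJob /user/hduser/kmeans/input /user/hduser/kmeans/output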

Spark fails on big shuffle jobs with java.io.IOException: Filesystem closed

I often find that Spark fails on large jobs with a rather unhelpful, meaningless exception. The worker logs look normal, with no errors, but they end up in state "KILLED". This is extremely common for large shuffles, e.g. operations like .distinct.
The question is: how do I diagnose what's going wrong, and ideally, how do I fix it?
Given that a lot of these operations are monoidal, I've been working around the problem by splitting the data into, say, 10 chunks, running the app on each chunk, and then running the app on all of the resulting outputs. In other words: meta-map-reduce.
14/06/04 12:56:09 ERROR client.AppClient$ClientActor: Master removed our application: FAILED; stopping client
14/06/04 12:56:09 WARN cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
14/06/04 12:56:09 WARN scheduler.TaskSetManager: Loss was due to java.io.IOException
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:703)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:779)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:209)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:164)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:149)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:27)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
at scala.collection.AbstractIterator.toList(Iterator.scala:1157)
at $line5.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:13)
at $line5.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:13)
at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:450)
at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:450)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
at org.apache.spark.scheduler.Task.run(Task.scala:53)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
As of September 1st 2014, this is an "open improvement" in Spark. Please see https://issues.apache.org/jira/browse/SPARK-3052. As syrza pointed out in the given link, the shutdown hooks are likely run in the wrong order when an executor fails, which results in this message. I understand you will have to do a little more investigation to figure out the main cause of the problem (i.e. why your executor failed). If it is a large shuffle, it might be an out-of-memory error that causes the executor failure, which then causes the Hadoop FileSystem to be closed in its shutdown hook. So the RecordReaders in the running tasks of that executor throw the "java.io.IOException: Filesystem closed" exception. I guess it will be fixed in a subsequent release, and then you will get a more helpful error message :)
Something calls DFSClient.close() or DFSClient.abort(), closing the client. The next file operation then results in the above exception.
I would try to figure out what calls close()/abort(). You could use a breakpoint in your debugger, or modify the Hadoop source code to throw an exception in these methods, so you would get a stack trace.
The "Filesystem closed" exception can be resolved if the Spark job is running on a cluster. You can set properties like spark.executor.cores, spark.driver.cores and spark.akka.threads to the maximum values your resource availability allows. I had the same problem when my dataset was pretty large, with about 20 million records of JSON data. I fixed it with the above properties and it ran like a charm. In my case, I set those properties to 25, 25 and 20 respectively (a sketch of the invocation is shown after the reference link). Hope it helps!!
Reference Link:
http://spark.apache.org/docs/latest/configuration.html
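For completeness, a sketch of how those properties can be passed on the command line. The jar name is a placeholder, the values 25/25/20 are the ones from the answer above and should be tuned to your own cluster, and spark.akka.threads only exists on the older Akka-based Spark versions (pre-2.0) that this question is about:

spark-submit \
  --conf spark.executor.cores=25 \
  --conf spark.driver.cores=25 \
  --conf spark.akka.threads=20 \
  your-application.jar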