When I run my app from the sbt console with my-app/run (my-app is a module), I get this error:
java.io.FileNotFoundException: file:.../target/bg-jobs/sbt_eea980c4/job-3/target/aaa32a5e/b2227e4f/my-app_2.13-0.1.0-SNAPSHOT.jar!/my_file.csv
My directory structure is my-app/src/main/resources/my_file.csv. The way I am accessing the file in my code is:
val file = new File(getClass.getResource("/my_file.csv").getPath)
What am I missing here?
sbt stuck your resources in a jar, so you can't access them as a file. You can use getClass().getResourceAsStream("/my_file.csv") instead.
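For example, a minimal sketch in Scala of reading the bundled CSV via a stream instead of a File (adjust to however you actually consume the file):

import scala.io.Source

val in = getClass.getResourceAsStream("/my_file.csv")
val lines = Source.fromInputStream(in).getLines().toList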
I have no clue why in the world this would be causing me this much grief, but it is. How can I simply grab my txt file from the uber jar's packaged resources directory when I run spark-submit and pass it to spark.read? Yes, from the IDE it is simple and works. But running with spark-submit plagues me with the good old fashioned:
Path does not exist: file:/opt/spark/jars/<myjar>.jar!/datasets/mllib/sample_kmeans_data.txt
My folder structure is very standard:
src
  main
    resources
      sample_kmeans_data.txt
My very vanilla loader:
val kmeansData =
  getClass.getClassLoader.getResource("datasets/mllib/sample_kmeans_data.txt").getPath

val dataset: DataFrame = spark.read
  .format("libsvm")
  .load(kmeansData)

dataset.show
I have also confirmed the datasets folder is at the root level after extracting the jar, and I've tried many different versions of the class loader, all leading to the same error. Lastly, reading the file as a stream or input buffer without Spark works fine, so I can clearly get to the file from the jar with spark-submit. What trips me up is what Spark's loader needs as an input path into the jar.
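One workaround that is sometimes suggested in this situation: since spark.read needs a real filesystem path rather than a jar!/... URL, copy the resource out of the jar to a temporary file first and hand Spark that path. A minimal sketch, assuming the resource path from the code above:

import java.nio.file.{Files, StandardCopyOption}

// Copy the bundled resource to a temp file so Spark gets a plain filesystem path.
val in = getClass.getClassLoader.getResourceAsStream("datasets/mllib/sample_kmeans_data.txt")
val tmp = Files.createTempFile("sample_kmeans_data", ".txt")
Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)

val dataset = spark.read.format("libsvm").load(tmp.toString)
dataset.show()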
I'm not able to run a simple Spark job in Scala IDE (a Maven Spark project) installed on Windows 7.
Spark core dependency has been added.
val conf = new SparkConf().setAppName("DemoDF").setMaster("local")
val sc = new SparkContext(conf)
val logData = sc.textFile("File.txt")
logData.count()
Error:
16/02/26 18:29:33 INFO SparkContext: Created broadcast 0 from textFile at FrameDemo.scala:13
16/02/26 18:29:34 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD.count(RDD.scala:1143)
at com.org.SparkDF.FrameDemo$.main(FrameDemo.scala:14)
at com.org.SparkDF.FrameDemo.main(FrameDemo.scala)
Here is a good explanation of your problem with the solution.
Download the version of winutils.exe from https://github.com/steveloughran/winutils.
Set up your HADOOP_HOME environment variable on the OS level or programmatically:
System.setProperty("hadoop.home.dir", "full path to the folder with winutils");
Enjoy
Download winutils.exe
Create folder, say C:\winutils\bin
Copy winutils.exe inside C:\winutils\bin
Set environment variable HADOOP_HOME to C:\winutils
Follow this:
Create a bin folder in any directory (to be used in step 3).
Download winutils.exe and place it in the bin directory.
Now add System.setProperty("hadoop.home.dir", "PATH/TO/THE/DIR"); in your code.
1) Download winutils.exe from https://github.com/steveloughran/winutils
2) Create a directory in Windows, e.g. "C:\winutils\bin"
3) Copy winutils.exe into the above bin folder.
4) Set the property in the code:
System.setProperty("hadoop.home.dir", "file:///C:/winutils/");
5) Create the folder C:\temp and give it 777 permissions.
6) Add the config property to the Spark session: .config("spark.sql.warehouse.dir", "file:///C:/temp") (see the sketch below)
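Putting steps 4 and 6 together, a sketch of what the session setup might look like (the paths are just examples):

import org.apache.spark.sql.SparkSession

System.setProperty("hadoop.home.dir", "C:\\winutils") // step 4

val spark = SparkSession.builder()
  .appName("Demo")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "file:///C:/temp") // step 6
  .getOrCreate()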
You can alternatively download winutils.exe from GitHub:
https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin
Replace hadoop-2.7.1 with the version you want, and place the file in D:\hadoop\bin
If you do not have access rights to the environment variable settings
on your machine, simply add the below line to your code:
System.setProperty("hadoop.home.dir", "D:\\hadoop");
On Windows 10 you should add two different entries:
(1) Add the new variable and value as - HADOOP_HOME and path (i.e. c:\Hadoop) under System Variables.
(2) Add/append new entry to the "Path" variable as "C:\Hadoop\bin".
The above worked for me.
If you see the issue below:
ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
then do the following steps:
Download winutils.exe from http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe
and keep it under the bin folder of any folder you created, e.g. C:\Hadoop\bin.
In the program, add the following line before creating the SparkContext or SparkConf:
System.setProperty("hadoop.home.dir", "C:\\Hadoop");
I got the same problem while running unit tests. The following workaround gets rid of this message:
// Point hadoop.home.dir at the current working directory and create an
// empty bin/winutils.exe there so Hadoop's Shell check can find a binary.
File workaround = new File(".");
System.getProperties().put("hadoop.home.dir", workaround.getAbsolutePath());
new File("./bin").mkdirs();
new File("./bin/winutils.exe").createNewFile();
from: https://issues.cloudera.org/browse/DISTRO-544
Setting the HADOOP_HOME environment variable in the system properties didn't work for me. But this did:
Set HADOOP_HOME in the Eclipse Run Configurations environment tab.
Follow the 'Windows Environment Setup' from here
Download winutils.exe and hadoop.dll on your Windows machine.
Create the folder C:\hadoop\bin.
Copy winutils.exe and hadoop.dll into the newly created bin folder.
Set the environment variable:
HADOOP_HOME=C:\hadoop
On top of setting your HADOOP_HOME environment variable in Windows to C:\winutils, you also need to make sure you are an administrator of the machine. If you are not, and adding environment variables prompts you for admin credentials (even under USER variables), then those variables will only take effect once you start your command prompt as administrator.
I also faced a similar problem with the following setup: Java 1.8.0_121, Spark spark-1.6.1-bin-hadoop2.6, Windows 10 and Eclipse Oxygen. When I ran my WordCount.java in Eclipse using HADOOP_HOME as a system variable, as mentioned in the previous post, it did not work. What worked for me is:
System.setProperty("hadoop.home.dir", "PATH/TO/THE/DIR");
where PATH/TO/THE/DIR/bin contains winutils.exe. This works whether you run within Eclipse as a Java application or via spark-submit from cmd using the command
spark-submit --class groupid.artifactid.classname --master local[2] <path to the jar file created with Maven> <path to a demo test file> <path to output directory>
Example: go to the bin location of Spark (home/location/bin) and execute spark-submit as mentioned:
D:\BigData\spark-2.3.0-bin-hadoop2.7\bin>spark-submit --class com.bigdata.abdus.sparkdemo.WordCount --master local[1] D:\BigData\spark-quickstart\target\spark-quickstart-0.0.1-SNAPSHOT.jar D:\BigData\spark-quickstart\wordcount.txt
That's a tricky one... Your drive letter must be capital. For example "C:\..."
I'm trying to install the Play! Framework on a Linux Mint box, but I'm having a hard time getting Play going. After the installation, I get the following error message when I type play help at the command line:
$ play help
java.io.FileNotFoundException: /home/play-2.1.1/framework/sbt/boot/update.log (No such file or directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:218)
at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
at java.io.FileWriter.<init>(FileWriter.java:90)
at xsbt.boot.Update.<init>(Checks.java:51)
at xsbt.boot.Launch.update(Launch.scala:275)
at xsbt.boot.Launch$$anonfun$jnaLoader$1.apply(Launch.scala:120)
at scala.Option.getOrElse(Option.scala:108)
at xsbt.boot.Launch.jnaLoader$2f324eef(Launch.scala:115)
at xsbt.boot.Launch.<init>(Launch.scala:94)
at xsbt.boot.Launcher$.apply(Launch.scala:290)
at xsbt.boot.Launch$.apply(Launch.scala:16)
at xsbt.boot.Boot$.runImpl(Boot.scala:31)
at xsbt.boot.Boot$.main(Boot.scala:20)
at xsbt.boot.Boot.main(Boot.scala)
Error during sbt execution: java.io.FileNotFoundException: /home/play-2.1.1/framework/sbt/boot/update.log (No such file or directory)
As can be seen, Play is installed in the /home directory. The content of the .bashrc file is as follows:
$ cat ~/.bashrc
export PATH=""/home/play-2.1.1:$PATH""
export JAVA_HOME=/usr/local/java/jdk1.7.0_12
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib/dt.jar:.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar
I wonder if I've left out something during the installation process that accounts for the difficulty in getting Play up and running. I will appreciate helpful hints and advice. Many thanks.
Make sure you have write permissions on the Play framework path. You will get this error if you do not.
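If you want to verify this before changing anything, a quick sketch from the Scala REPL (the path below is just the one from the error message):

import java.io.File

val bootDir = new File("/home/play-2.1.1/framework/sbt/boot")
println(s"exists=${bootDir.exists}, writable=${bootDir.canWrite}")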
I have tried ensime/sbt on Mac OS. First, I opened a .scala file in a project folder created using sbt on the command line, then I ran ensime and it worked fine, but whenever I run ensime-sbt (C-c C-v s), I get:
java.io.IOException: No such file or directory
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:883)
at xsbt.boot.Locks$.apply0(Locks.scala:34)
at xsbt.boot.Locks$.apply(Locks.scala:27)
at scala.collection.Iterable$class.$init$(Proxy.scala:32)
at xsbt.boot.Launch$ScalaProvider.<init>(Launch.scala:107)
at xsbt.boot.Launch$$anonfun$1.apply(Launch.scala:83)
at org.apache.ivy.plugins.namespace.NamespaceRule.newEntry(Cache.scala:17)
at org.apache.ivy.plugins.namespace.NamespaceRule.apply(Cache.scala:12)
at xsbt.boot.Launch.getScala(Launch.scala:85)
at xsbt.boot.Launch$.run(Launch.scala:49)
at xsbt.boot.Launch$$anonfun$explicit$1.apply(Launch.scala:43)
at xsbt.boot.Launch$.launch(Launch.scala:68)
at xsbt.boot.Launch$.apply(Launch.scala:14)
at xsbt.boot.Boot$.runImpl(Boot.scala:24)
at xsbt.boot.Boot$.main(Boot.scala:15)
at xsbt.boot.Boot.main(Boot.scala)
Error during sbt execution: java.io.IOException: No such file or directory
Process sbt exited abnormally with code 1
I tried using sbt from the command line and everything works from there (compile/run/console). I'm using sbt 0.10.1 and the latest binary ensime on emacs24 (2011/07/24) on Mac OS.
Any idea what I'm doing wrong?
I had this, and after running it under strace I found the issue. The ensime-sbt.el function searches up from the cwd looking for ./project/build.properties. On finding this directory/file it assumes it has found the root directory.
So just create this file and the issue should disappear (see the example below). It would be nice if ensime created this file by default, seeing as it's a required file for the sbt function to work.
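For example, a project/build.properties containing just the single line sbt.version=0.10.1 (the sbt version mentioned in the question; adjust to yours) is enough for that lookup to succeed.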
I got the same error. The situation seems to be that sbt tried, but failed, to create the ".sbt" and ".ivy" directories in the user's home directory. Maybe the reason is that the OS user doesn't have permission to write to the home directory.
It's probably something related to permissions.
I checked the Locks.scala source at https://github.com/harrah/xsbt/blob/0.10/launch/Locks.scala and guessed that "file.getParentFile.mkdirs()" did not work because permission was denied.
I encountered the same problem yesterday, and got it to run a minute ago by launching with sudo:
sudo emacs xxx.scala
You can change the sbt.ivy.home and ivy.home properties. So, to augment Joachim's first solution, you would set both system properties, like this:
java -Dsbt.ivy.home=/tmp/.ivy2/ -Divy.home=/tmp/.ivy2/ -jar `dirname $0`/sbt-launch.jar "$@"
Hope this resolves your problem.
This error also occurs when the files in the home directory that sbt tries to access are not owned by the user that runs it. Run chmod 777 on those directories in the home directory and the issue will be solved.