java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries - Spark in Eclipse on Windows 7

I'm not able to run a simple Spark job in Scala IDE (a Maven Spark project) installed on Windows 7.
The Spark core dependency has been added.
val conf = new SparkConf().setAppName("DemoDF").setMaster("local")
val sc = new SparkContext(conf)
val logData = sc.textFile("File.txt")
logData.count()
Error:
16/02/26 18:29:33 INFO SparkContext: Created broadcast 0 from textFile at FrameDemo.scala:13
16/02/26 18:29:34 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD.count(RDD.scala:1143)
at com.org.SparkDF.FrameDemo$.main(FrameDemo.scala:14)
at com.org.SparkDF.FrameDemo.main(FrameDemo.scala)

Here is a good explanation of your problem, along with the solution.
Download the version of winutils.exe from https://github.com/steveloughran/winutils.
Set up your HADOOP_HOME environment variable on the OS level or programmatically:
System.setProperty("hadoop.home.dir", "full path to the folder with winutils");
Enjoy

Download winutils.exe.
Create a folder, say C:\winutils\bin.
Copy winutils.exe into C:\winutils\bin.
Set the environment variable HADOOP_HOME to C:\winutils.

Follow this:
Create a bin folder in any directory (to be used in step 3).
Download winutils.exe and place it in the bin directory.
Now add System.setProperty("hadoop.home.dir", "PATH/TO/THE/DIR"); in your code, where PATH/TO/THE/DIR is the folder that contains bin, not the bin folder itself.

1) Download winutils.exe from https://github.com/steveloughran/winutils
2) Create a directory on Windows, e.g. C:\winutils\bin
3) Copy winutils.exe into the above bin folder.
4) Set the system property in the code
System.setProperty("hadoop.home.dir", "C:\\winutils");
5) Create a folder C:\temp and give it 777 permissions.
6) Add the config property to the Spark session: .config("spark.sql.warehouse.dir", "file:///C:/temp")
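A minimal sketch of how steps 4–6 fit together in a Spark 2.x application (paths are the examples used above; adjust as needed):
import org.apache.spark.sql.SparkSession

// folder that contains bin\winutils.exe
System.setProperty("hadoop.home.dir", "C:\\winutils")
val spark = SparkSession.builder()
  .appName("DemoDF")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "file:///C:/temp")  // step 6
  .getOrCreate()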

You can alternatively download winutils.exe from GitHub:
https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin
Replace hadoop-2.7.1 with the version you want and place the file in D:\hadoop\bin.
If you do not have access rights to the environment variable settings on your machine, simply add the below line to your code:
System.setProperty("hadoop.home.dir", "D:\\hadoop");

On Windows 10 you should add two different entries:
(1) Add a new variable HADOOP_HOME with the path to the Hadoop folder (e.g. C:\Hadoop) under System Variables.
(2) Add/append a new entry to the "Path" variable: "C:\Hadoop\bin".
The above worked for me.
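For example, step (1) can also be done from an administrator command prompt with a sketch like the following (assuming C:\Hadoop is the folder containing bin\winutils.exe); note that setx only affects newly opened command prompts, and C:\Hadoop\bin still has to be appended to Path, e.g. via the Environment Variables dialog described above:
setx HADOOP_HOME "C:\Hadoop"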

If you see the issue below:
ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
then do the following steps:
Download winutils.exe from http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe
and keep it under the bin folder of any folder you created, e.g. C:\Hadoop\bin,
and in the program add the following line before creating the SparkContext or SparkConf:
System.setProperty("hadoop.home.dir", "C:\\Hadoop");

I got the same problem while running unit tests. The following workaround allows you to get rid of this message:
File workaround = new File(".");
System.getProperties().put("hadoop.home.dir", workaround.getAbsolutePath());
new File("./bin").mkdirs();
new File("./bin/winutils.exe").createNewFile();
from: https://issues.cloudera.org/browse/DISTRO-544
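In Scala, the same workaround could be wrapped in a small helper (a hypothetical object, not part of any library) and called once in your test setup, before the first SparkContext is created:
import java.io.File

// Hypothetical helper: point hadoop.home.dir at the working directory and
// create an empty bin\winutils.exe stub so the lookup no longer fails.
object WinutilsStub {
  def install(): Unit = {
    System.setProperty("hadoop.home.dir", new File(".").getAbsolutePath)
    new File("./bin").mkdirs()
    new File("./bin/winutils.exe").createNewFile()  // empty stub; only silences the error message
  }
}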

Setting the HADOOP_HOME environment variable in system properties didn't work for me. But this did:
Set HADOOP_HOME in the Environment tab of the Eclipse Run Configurations.
Follow the 'Windows Environment Setup' from here

Download winutils.exe and hadoop.dll on your Windows machine.
Create the folder C:\hadoop\bin.
Copy winutils.exe and hadoop.dll into the newly created bin folder.
Set up the environment variable
HADOOP_HOME=C:\hadoop
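As a quick sanity check (using the hypothetical path from the steps above), running the binary directly with no arguments should print its usage text, confirming it actually executes on your machine:
C:\hadoop\bin\winutils.exe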

On top of setting your HADOOP_HOME environment variable in Windows to C:\winutils, you also need to make sure you are the administrator of the machine. If not, and adding environment variables prompts you for admin credentials (even under USER variables), then those variables will only take effect once you start your command prompt as administrator.

I have also faced a similar problem, with the following details: Java 1.8.0_121,
Spark spark-1.6.1-bin-hadoop2.6, Windows 10 and Eclipse Oxygen. When I ran my WordCount.java in Eclipse using HADOOP_HOME as a system variable, as mentioned in the previous post, it did not work. What worked for me is:
System.setProperty("hadoop.home.dir", "PATH/TO/THE/DIR");
where PATH/TO/THE/DIR/bin contains winutils.exe, whether you run within Eclipse as a Java application or via spark-submit from cmd using
spark-submit --class groupid.artifactid.classname --master local[2] /path/to/the/maven-built/jar /path/to/a/demo/test/file /path/to/the/output/directory
Example: go to the bin location of your Spark installation and execute spark-submit as mentioned:
D:\BigData\spark-2.3.0-bin-hadoop2.7\bin>spark-submit --class com.bigdata.abdus.sparkdemo.WordCount --master local[1] D:\BigData\spark-quickstart\target\spark-quickstart-0.0.1-SNAPSHOT.jar D:\BigData\spark-quickstart\wordcount.txt

That's a tricky one... Your drive letter must be capital. For example "C:\..."

Related

The root scratch dir: /tmp/hive on HDFS should be writable

I am trying to run my spark Scala code on eclipse in Windows 10. While trying to read my CSV file I am getting the error:
The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
I have gone through various similar questions on the platform but nothing worked for me.
Things I have tried so far:
1- Tried winutils for Spark 2.3.1 (the version of Spark I have to use)
2- Gave permissions using:
C:\hadoop\bin\winutils.exe chmod 777 C:\tmp\hive
Is /tmp/hive the same as C:\tmp\hive? Is there any fix I can try programmatically in Spark and Scala to use a different location while creating the Spark session?

Scala Standalone JAR with a conf Folder

I'm using the sbt assembly jar plugin to create a standalone jar file. My project folder structure would look like this:
MyProject
-src
- main
- scala
- mypackages and source files
- conf // contains application.conf, application.test.conf and so on
- test
-project // contains all the build related files
- README.md
I now want to be able to run the fat jar that I produce against a version of the application.conf that I specify as a System property!
So here is what I do in my unit test!
System.setProperty("environment", "test")
And this is how I load the config in one of the files in my src folder:
val someEnv = Option(System.getProperty("environment", "")).filter(_.nonEmpty) // gives me Some(test)
val name = s"application.${someEnv.get}.conf"
I can see that the environment variable is set and I get the environment passed in. But later on I load application.test.conf as below:
ConfigFactory.load(name).resolve()
However, it loads just the default application.conf and not the one that I specify!
What is wrong in my case? Where should I put the conf folder? I'm trying to run it against my unit test which is inside the test folder!
I believe you need to specify the full name of the configuration file. The .conf is optional. Try
ConfigFactory.load(s"application.${someEnv.get}").resolve()
The docs for ConfigFactory.load(String) indicate you need to supply
name (optionally without extension) of a resource on classpath
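A minimal sketch of that suggestion combined with the question's lookup (falling back to the default application.conf when no environment is set):
import com.typesafe.config.{Config, ConfigFactory}

val someEnv = Option(System.getProperty("environment", "")).filter(_.nonEmpty)
val config: Config = someEnv
  .map(env => ConfigFactory.load(s"application.$env"))  // e.g. resolves application.test.conf from the classpath
  .getOrElse(ConfigFactory.load())                       // default application.conf
  .resolve()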
OK! Here is what I had to do: change the name of the folder where the config file is located. I originally had it as conf and I had to rename it to resources, and bang, it worked!

Eclipse run error

When i try to run my code on Eclipse this error appears:
Usage: javaw [-options] class [args...]
(to execute a class)
or javaw [-options] -jar jarfile [args...]
(to execute a jar file)
where options include:
-d32 use a 32-bit data model if available
-d64 use a 64-bit data model if available
-server to select the "server" VM
-hotspot is a synonym for the "server" VM [deprecated]
The default VM is server.
-cp <class search path of directories and zip/jar files>
-classpath <class search path of directories and zip/jar files>
A ; separated list of directories, JAR archives,
and ZIP archives to search for class files.
-D<name>=<value>
set a system property
-verbose:[class|gc|jni]
enable verbose output
-version print product version and exit
-version:<value>
require the specified version to run
-showversion print product version and continue
-jre-restrict-search | -no-jre-restrict-search
include/exclude user private JREs in the version search
-? -help print this help message
-X print help on non-standard options
-ea[:<packagename>...|:<classname>]
-enableassertions[:<packagename>...|:<classname>]
enable assertions with specified granularity
-da[:<packagename>...|:<classname>]
-disableassertions[:<packagename>...|:<classname>]
disable assertions with specified granularity
-esa | -enablesystemassertions
enable system assertions
-dsa | -disablesystemassertions
disable system assertions
-agentlib:<libname>[=<options>]
load native agent library <libname>, e.g. -agentlib:hprof
see also, -agentlib:jdwp=help and -agentlib:hprof=help
-agentpath:<pathname>[=<options>]
load native agent library by full pathname
-javaagent:<jarpath>[=<options>]
load Java programming language agent, see java.lang.instrument
-splash:<imagepath>
show splash screen with specified image
See http://www.oracle.com/technetwork/java/javase/documentation/index.html for more details.
I tried commenting out my entire code and this error still appears.
It seems you haven't set your java path correctly.
Setting Up Eclipse with Java 1.6 on Windows
How To Install and Get Started with Java Programming
Run eclipse in clean mode
Edit the eclipse.ini file located in your Eclipse install directory and insert -clean as the first line.
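Alternatively, the same flag can be passed once on the command line instead of editing eclipse.ini (run from the Eclipse install directory):
eclipse -clean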
If this is happening to a specific project only and other projects are running fine then your default run configuration might have changed. You may try the following
- Run -> Run As -> 1 Java Application.
I fixed this issue by deleting some of my old runtime configurations. Eclipse then started automatically generating them again.

RxTx installation on windows java.lang.NoClassDefFoundError: gnu/io/CommPort

I put RXTXcomm.jar into the jre/lib/ext folder, but I still get a NoClassDefFoundError. Isn't this folder automatically added to the global classpath?
Thanks
Yes, it is automatically added to the classpath, but RXTXcomm uses JNI/native external libraries (.so and .dll files), so you must provide the path to them when running your program from the command line (note that -D options must come before -jar):
java -Djava.library.path="PATH_TO_EXTERNAL_LIBRARIES" -jar yourprogram.jar
for linux:
suppose you unpacked the rxtx.zip to
/home/user/
if you have a 32-bit x86 platform:
PATH_TO_EXTERNAL_LIBRARIES = /home/user/Linux/i686-unknown-linux-gnu/
if you have a 64-bit x86 platform then it would be:
PATH_TO_EXTERNAL_LIBRARIES = /home/user/Linux/x86_64-unknown-linux-gnu/
for windows:
suppose you downloaded and unpacked it to C:\rxtxt
PATH_TO_EXTERNAL_LIBRARIES = C:\rxtxt\Windows\i368-mingw32\
If you find it cumbersome to do this from the command line, you can do it from your code (before opening the port via RXTXcomm):
System.setProperty("java.library.path","PATH_TO_EXTERNAL_LIBRARIES");
EDIT:
Of course, you must put RXTXcomm.jar on your classpath in addition to all of the above. If running from the command line as a jar-packaged program (yourprogram.jar), inside the jar you must have a META-INF folder that contains a MANIFEST.MF with the following entries:
Class-Path: lib/RXTXcomm.jar
Main-Class: pkg.Main
and yourprogram.jar must be in a folder which has a lib folder containing RXTXcomm.jar. Also, the class with the
public static void main(String[] args)
method must be called Main and reside in a package named pkg (just replace pkg.Main with whatever you have).
Then you can run your program successfully and open a serial port if you have one. This approach also eliminates the need to copy anything into the jre/lib/ext folder.
EDIT^2:
or, if you don't want to pack your program in a jar, position yourself in the folder which contains the pkg folder and write:
java -cp .;PATH_TO_RXTX/RXTXcomm.jar -Djava.library.path="PATH_TO_EXTERNAL_LIBRARIES" pkg.Main
(use : instead of ; as the classpath separator on Linux; all paths can be relative or absolute)
EDIT^3:
but I would recommend Java Simple Serial Connector instead of RXTXcomm:
it handles heavy load from multiple threads, as opposed to RXTXcomm (tested in production)
the external libraries are packed in the jar, so there is no need to set java.library.path

java.io.IOException when running sbt from ensime?

I have tried ensime/sbt on Mac OS. First, I opened a .scala file in a project folder created using sbt on the command line, then I ran ensime and it still worked fine, but whenever I run ensime-sbt (C-c C-v s), I get:
java.io.IOException: No such file or directory
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:883)
at xsbt.boot.Locks$.apply0(Locks.scala:34)
at xsbt.boot.Locks$.apply(Locks.scala:27)
at scala.collection.Iterable$class.$init$(Proxy.scala:32)
at xsbt.boot.Launch$ScalaProvider.<init>(Launch.scala:107)
at xsbt.boot.Launch$$anonfun$1.apply(Launch.scala:83)
at org.apache.ivy.plugins.namespace.NamespaceRule.newEntry(Cache.scala:17)
at org.apache.ivy.plugins.namespace.NamespaceRule.apply(Cache.scala:12)
at xsbt.boot.Launch.getScala(Launch.scala:85)
at xsbt.boot.Launch$.run(Launch.scala:49)
at xsbt.boot.Launch$$anonfun$explicit$1.apply(Launch.scala:43)
at xsbt.boot.Launch$.launch(Launch.scala:68)
at xsbt.boot.Launch$.apply(Launch.scala:14)
at xsbt.boot.Boot$.runImpl(Boot.scala:24)
at xsbt.boot.Boot$.main(Boot.scala:15)
at xsbt.boot.Boot.main(Boot.scala)
Error during sbt execution: java.io.IOException: No such file or directory
Process sbt exited abnormally with code 1
I tried using sbt from the command line and everything works from there (compile/run/console). I'm using sbt 0.10.1 and the latest binary ensime on emacs24 (2011/07/24) on Mac OS.
Any idea what I'm doing wrong?
I had this and after applying strace I found the issue. The ensime-sbt.el function searches up from the cwd looking for ./project/build.properties. On finding this dir/file it assumes this is the root directory.
So just create this file and this issue should disappear. Would be nice if ensime created this file by default seeing as it's a required file for the sbt function to work.
I got the same error. It seems that sbt tried but failed to create the ".sbt" and ".ivy" directories in the user's home directory. Maybe the reason is that the OS user doesn't have permission to write to the home directory.
It's probably something related to permissions.
I checked the Locks.scala source (https://github.com/harrah/xsbt/blob/0.10/launch/Locks.scala) and guessed that "file.getParentFile.mkdirs()" did not work because of a permission denial.
I encountered the same problem yesterday, and got it to run a minute ago by adding sudo:
"sudo emacs xxx.scala"
You can change the sbt.ivy.home and ivy.home properties. So, to augment Joachim's first solution, you would set both system properties, like this:
java -Dsbt.ivy.home=/tmp/.ivy2/ -Divy.home=/tmp/.ivy2/ -jar `dirname $0`/sbt-launch.jar "$@"
Hope this resolves your problem.
This error also occurs when the files in the home directory that sbt tries to access are not owned by the user who tries to run it. Run chmod 777 on the directories in the home folder and the issue will be solved.