I am running the WordCount example in Eclipse Luna 3.8. My job runs fine on the LocalJobRunner, but I want it to run on a YARN cluster because I want to access the Hadoop logs. Somewhere I read that if a job runs locally it does not create logs until it is submitted to the ResourceManager, and submitting a job to the ResourceManager is only possible when the job runs on YARN.
My working environment:
hadoop-2.6.0 running in pseudo-distributed mode.
Eclipse Luna 3.8.
Any help will be appreciated.
Initialize the Job with YARN-specific configuration. Add these settings in the driver:
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://localhost:8020");
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.address", "localhost:8032");
You need yarn-site.xml and core-site.xml on your classpath, as well as all the YARN and MapReduce jars (dependencies). You may already have those jars from Maven or similar, but you are most likely missing the config files. You can add them to the classpath from the "Run Configurations" dialog in Eclipse. I assume you have a local Hadoop installation with these configuration files and can run hadoop commands; in that case you can point your classpath at that installation's conf and lib directories. It may be tedious, but start by pointing only at the conf dir (which contains core-site.xml and yarn-site.xml) and see if that works. If not, also exclude Eclipse's local YARN and MapReduce dependencies (Maven or similar) and explicitly add them from your installation dir. Check this article for setting the classpath for Hadoop 1:
https://letsdobigdata.wordpress.com/2013/12/07/running-hadoop-mapreduce-application-from-eclipse-kepler/
Here's another article, from MapR (ignore the MapR client related setup):
https://mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr/
You can follow similar steps for Hadoop 2 (YARN), but the basic idea is that your application's runtime has to pick up the correct jars and config files on the classpath to be able to successfully deploy the job to the cluster.
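If juggling the Eclipse classpath gets tedious, an alternative sketch is to load the site files explicitly in the driver with Configuration.addResource. The paths below are assumptions; point them at wherever your own installation keeps its *-site.xml files:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
// Example paths only; use your installation's config directory
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"));
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/yarn-site.xml"));
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/mapred-site.xml"));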
I am new to Spark, and as I am learning this framework I have figured out that, to the best of my knowledge, there are two ways to run a Spark application written in Scala:
Package the project into a JAR file, and then run it with the spark-submit script.
Run the project directly with sbt run.
I am wondering what the difference between those two modes of execution is, especially since running with sbt run can throw a java.lang.InterruptedException while the same code runs perfectly with spark-submit.
Thanks!
sbt is a build tool (that I like running on Linux) that does not necessarily imply Spark usage; it just so happens to be used, like IntelliJ, for Spark applications.
You can package and run an application in a single JVM under the sbt console, but not at scale. So, if you have created a Spark application with its dependencies declared, sbt will compile the code with package and create a jar file, with the required dependencies resolved so you can run it locally.
You can also use the assembly option in sbt, which creates an uber jar (fat jar) with all dependencies contained in the jar; you upload that to your cluster and run it by invoking spark-submit. So, again, if you have created a Spark application with its dependencies declared, sbt will, via assembly, compile the code and create an uber jar with all required dependencies (except any external files you need to ship to the workers) so you can run it on your cluster (in general).
sbt and spark-submit are two completely different things.
sbt is a build tool. If you have created a Spark application, sbt will help you compile that code and create a jar file with the required dependencies etc.
spark-submit is used to submit a Spark job to a cluster manager. You may be using standalone, Mesos, or YARN as your cluster manager; spark-submit will submit your job to the cluster manager and your job will start on the cluster.
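For illustration, a typical spark-submit invocation of an assembled jar looks something like this (the class name, jar name, and master here are placeholders, not taken from the question):
$ spark-submit --class com.example.MyApp --master yarn --deploy-mode cluster myapp-assembly.jar arg1 arg2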
Hope this helps.
Cheers!
I am trying to use Eclipse (Kepler) for building and running MapReduce (v2) jobs, using the Maven plugin.
I am able to successfully build the project, and I see that Maven created a jar file as the final output; suppose that jar is mapreducedemo.jar.
Now, my question is how to run this jar using eclipse?
I tried it at the command prompt and it works fine, like this:
$ hadoop jar mapreducedemo.jar MainDriver input output
The thing is, hadoop is a shell script that internally sets all the environment variables and the required jars.
How can we run this mapreducedemo.jar using eclipse?
Any answers would be great help.
Thanks,
Vipin
You should be able to just run MainDriver as a Java application from within Eclipse. Maven will make sure you have all your dependencies, and MainDriver, once it has configured the job, will submit it for execution, just as it does when you run the hadoop jar command.
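As a sketch of what that looks like in practice (this is an assumed skeleton, not the asker's actual MainDriver), a driver built on ToolRunner takes the same "input output" arguments from the Eclipse Run Configuration that it would from the hadoop jar command:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MainDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "mapreducedemo");
        job.setJarByClass(MainDriver.class);
        // job.setMapperClass(...); job.setReducerClass(...);   // your own classes here
        FileInputFormat.addInputPath(job, new Path(args[0]));   // "input"
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // "output"
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic options (-D, -conf, ...) just like the hadoop script does
        System.exit(ToolRunner.run(new Configuration(), new MainDriver(), args));
    }
}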
Is it possible to start a Map Reduce job on a remote cluster with the Eclipse Run Dialog (F11)?
Currently I have to run it with the External Tool Chain Dialog and Maven.
Note: executing it on a local cluster is no big deal with the Run Dialog, but for a remote connection it is mandatory to have a compiled JAR; otherwise you get a ClassNotFoundException (even if Jar-By-Class is set).
Our current Setup is:
Spring-Data-Hadoop 1.0.0
STS - Springsource Toolsuite
Maven
CDH4
This is what we set in our applicationContext.xml (it is what you would specify in the *-site.xml files on vanilla Hadoop):
<hdp:configuration id="hadoopConfiguration">
fs.defaultFS=hdfs://carolin.ixcloud.net:8020
mapred.job.tracker=michaela.ixcloud.net:8021
</hdp:configuration>
Is there a way to tell Eclipse that it should build a JAR when the Run Dialog is executed?
I do not know whether it builds a new jar (maybe you have to extract a jar to a folder), but adding your jar under "Run Configurations -> Classpath" clears up the ClassNotFoundException problem.
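Another option, not part of the original answer and only a sketch: if you already build the jar with Maven, you can point the job at it from the driver so the client ships that jar to the remote cluster instead of relying on whatever happens to be on the Eclipse classpath. The path below is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "remote job");
// Ship a pre-built jar (e.g. produced by mvn package) instead of relying on setJarByClass,
// which only works when the classes are already packaged in a jar on the classpath.
job.setJar("/path/to/target/myjob-1.0.jar");   // hypothetical path to your built jar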
I have a Map/Reduce program which loads a file and reads it into HBase. How do I execute my program through Eclipse? I googled and found two ways:
1) Using the Eclipse Hadoop plugin
2) Creating a jar file and executing it on the Hadoop server
But can I execute my Map/Reduce program by giving the connection details and running it in Eclipse? Can anyone tell me the exact procedure to run an HBase Map/Reduce program?
I have done the following:
Installed and configured hadoop (and hdfs) on my machine
Built a maven-ized java project with all of the classes for my hadoop job
One of those classes is my "MR" or "Job" class that has a static main method that configures and submits my hadoop job
I run the MR class in Eclipse as a java application
The job runs in hadoop using the libraries in the java project's classpath (and therefore doesn't show up in the job tracker). Any reference to HDFS files uses the HDFS file system you installed and formatted using the non-eclipse hadoop install.
This works great with the debugger in Eclipse, although JUnit tests are kind of a pain to build by hand.
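For the HBase question above, the "MR" or "Job" class you run from Eclipse usually wires the HBase output with TableMapReduceUtil. The sketch below uses assumed names (table "mytable", column family "cf", tab-separated input) and is not the asker's actual code; the connection details come from the hbase-site.xml it picks up on the classpath:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class FileToHBaseJob {

    // Parses "rowkey<TAB>value" lines into HBase Puts
    public static class LineToPutMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t", 2);
            Put put = new Put(Bytes.toBytes(fields[0]));
            // On HBase 0.9x use put.add(...) instead of put.addColumn(...)
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(fields[1]));
            context.write(new ImmutableBytesWritable(Bytes.toBytes(fields[0])), put);
        }
    }

    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml (ZooKeeper quorum etc.) from the classpath,
        // which is where the "connection details" come from when running in Eclipse
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "file to hbase");
        job.setJarByClass(FileToHBaseJob.class);
        job.setMapperClass(LineToPutMapper.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Configures TableOutputFormat for the target table; no reducer needed for a straight load
        TableMapReduceUtil.initTableReducerJob("mytable", null, job);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}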
I am using Orion server for my Java-based web application. I have a run configuration that launches Orion with the correct classpaths and all necessary configuration. I also have several ANT scripts for copying files to the build path. I want to create an ANT script that shuts down Orion, copies necessary files, and restarts Orion. I can shutdown and copy in ANT, but I can't figure out how to launch a run configuration. I prefer to reference the launch configuration as opposed to specifying all of the configurations in the ANT script as well. Is this possible?
With Eclipse Remote Control you can launch Eclipse run configurations from a simple Java client application.
Ant4Eclipse is an Eclipse plugin that looks like it can do what you are asking. I have never used it myself, so I can't guarantee it, but reading their documentation, they say you can create an executor task that works on your launch configuration artifact. You then reference this task in your build file.