Running a Map Reduce Program in Eclipse

I have a Map/Reduce program which loads a file and reads it into HBase. How do I execute my program through Eclipse? I googled and found two ways:
1) Using Eclipse Hadoop plugin
2) Create a jar file and execute it in Hadoop server
But can I execute my Map/Reduce program by providing the connection details and running it in Eclipse? Can anyone tell me the exact procedure to run an HBase Map/Reduce program?

I have done the following:
Installed and configured Hadoop (and HDFS) on my machine
Built a Maven-ized Java project with all of the classes for my Hadoop job
One of those classes is my "MR" or "Job" class that has a static main method that configures and submits my Hadoop job (a rough sketch of such a class follows below)
I run the MR class in Eclipse as a Java application
The job runs in Hadoop using the libraries on the Java project's classpath (and therefore doesn't show up in the JobTracker). Any reference to HDFS files uses the HDFS file system you installed and formatted with the non-Eclipse Hadoop install.
This works great with the debugger in Eclipse, although JUnit tests are kind of a pain to build by hand.
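For illustration only, a driver class along these lines (the class, job, and host names here are placeholders, not the original poster's code) can be run directly as a Java application in Eclipse. With no *-site.xml on the classpath it falls back to the local job runner and local file system; pointing fs.defaultFS at your separately installed HDFS makes it use that instead:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountJob {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Uncomment to target the HDFS you formatted with your local install
        // (the URL is an example); without it, and without core-site.xml on the
        // classpath, the local job runner and local file system are used.
        // conf.set("fs.defaultFS", "hdfs://localhost:9000");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountJob.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submits the job and blocks until it completes; with the local job runner
        // everything stays in one JVM, so Eclipse breakpoints in the mapper and
        // reducer are hit under the debugger.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}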

Related

Difference in running a spark application with sbt run or with spark-submit script

I am new to Spark, and while learning this framework I figured out that, to the best of my knowledge, there are two ways to run a Spark application written in Scala:
Package the project into a JAR file, and then run it with the spark-submit script.
Run the project directly with sbt run.
I am wondering what the difference between those two modes of execution is, especially since running with sbt run can throw a java.lang.InterruptedException while the same code runs perfectly with spark-submit.
Thanks!
SBT is a build tool (one that I like running on Linux) that does not necessarily imply Spark usage. It just so happens that, like IntelliJ, it is commonly used for Spark applications.
You can package and run an application in a single JVM under the SBT console, but not at scale. So, if you have created a Spark application with its dependencies declared, SBT will compile the code with the package task and create a jar file with the required dependencies etc. to run locally.
You can also use the assembly option in SBT, which creates an uber jar (fat jar) with all dependencies contained in the jar; you upload that jar to your cluster and run it by invoking spark-submit. So, again, if you have created a Spark application with its dependencies declared, SBT will, via assembly, compile the code and create an uber jar with all required dependencies etc. (except any external files you need to ship to the workers) to run on your cluster (in general).
sbt and spark-submit are two completely different things:
sbt is a build tool. If you have created a Spark application, sbt will help you compile that code and create a jar file with the required dependencies etc.
spark-submit is used to submit a Spark job to a cluster manager. You may be using standalone, Mesos, or YARN as your cluster manager. spark-submit submits your job to the cluster manager, and your job starts on the cluster.
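To make the two roles concrete, here is a minimal Spark application in Java using the Spark 2.x API (SparkWordCount and the paths below are made-up names for illustration, not from the question). This is the kind of main class that sbt or Maven packages into the jar and that spark-submit then hands to the cluster manager via --class:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkWordCount {
    public static void main(String[] args) {
        // The master URL is normally supplied by spark-submit (--master),
        // so it is not hard-coded here.
        SparkConf conf = new SparkConf().setAppName("SparkWordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // args[0] is an input path, e.g. a file on HDFS.
        JavaRDD<String> lines = sc.textFile(args[0]);
        JavaRDD<String> words = lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator());
        System.out.println("number of words: " + words.count());

        sc.stop();
    }
}

Packaged into a jar, it would be launched with something along the lines of spark-submit --class SparkWordCount --master <master-url> myapp.jar hdfs:///path/to/input (the jar name and paths are placeholders).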
Hope this helps.
Cheers!

Apache Beam Program execution without using Maven

I want to run a simple example Beam program using the Apache Spark runner.
1) I was able to compile the program locally without problems.
2) I want to push the JAR file to a QA box where Maven is not installed.
3) I see the examples use a Maven command which compiles and executes the example program.
4) Could you please tell me the steps to run the code without installing Maven?
5) The spark-submit command runs fine.
6) Do you want me to put all the dependent JAR files one by one into the /opt/mapr/spark/spark-2.1.0/jars directory to execute the program?
Thanks.
You can do this by following the instructions in the Beam Spark Runner documentation.
These instructions demonstrate how to submit your application by doing the following:
Packaging your application, with dependencies, as an uber jar.
Submitting your packaged application using spark-submit.
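As a rough sketch of what that packaged application might contain (the class name and file paths here are invented for illustration, not taken from the question), a Beam pipeline's entry point is an ordinary Java main class. When the uber jar is launched through spark-submit, passing --runner=SparkRunner in the program arguments tells Beam to execute on Spark, provided the Spark runner dependency is inside the jar:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class BeamOnSparkExample {
    public static void main(String[] args) {
        // Options such as --runner=SparkRunner are parsed from the command line.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
        Pipeline p = Pipeline.create(options);

        p.apply("ReadLines", TextIO.read().from("/tmp/beam-input.txt"))   // example input path
         .apply("CountLines", Count.globally())
         .apply("Format", MapElements.into(TypeDescriptors.strings())
                                     .via(n -> "line count: " + n))
         .apply("WriteResult", TextIO.write().to("/tmp/beam-output"));    // example output prefix

        p.run().waitUntilFinish();
    }
}

Following the Beam Spark Runner documentation, the invocation then looks roughly like spark-submit --class BeamOnSparkExample ./target/your-shaded.jar --runner=SparkRunner (the jar and class names are placeholders), with no Maven needed on the QA box.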

How to run mapreduce jar using eclipse

I am trying to use Eclipse (Kepler) for building and running MapReduce (v2) jobs, using the Maven plugin.
I am able to build the project successfully, and I see that Maven created a jar file as the final output; suppose that jar is mapreducedemo.jar.
Now, my question is: how do I run this jar using Eclipse?
I tried it at the command prompt and it works fine, like this:
--> $ hadoop jar mapreducedemo.jar MainDriver input output.
The thing is, hadoop is a shell script that internally sets all the environment variables and required jars.
How can we run this mapreducedemo.jar using Eclipse?
Any answers would be a great help.
Thanks,
Vipin
You should be able to just run MainDriver as a Java application from within Eclipse. Maven will make sure you have all your dependencies, and MainDriver, once it has configured the job, will submit it for execution, just as it does when you run the hadoop jar command.
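Since the original MainDriver is not shown, the following is only a hedged sketch of the usual shape of such a driver, built on Hadoop's ToolRunner; the same "input output" arguments used with hadoop jar go into Run Configurations -> Arguments -> Program arguments in Eclipse:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MainDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // args[0] = input path, args[1] = output path, exactly as in
        // "hadoop jar mapreducedemo.jar MainDriver input output".
        Job job = Job.getInstance(getConf(), "mapreducedemo");
        job.setJarByClass(MainDriver.class);
        // job.setMapperClass(...);   // your mapper class here
        // job.setReducerClass(...);  // your reducer class here
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner also handles generic options such as -D key=value and -conf.
        System.exit(ToolRunner.run(new Configuration(), new MainDriver(), args));
    }
}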

Hadoop and Hbase configuration in Eclipse

I am using Windows 7 and Cygwin. I have successfully configured Hadoop 1.0.3 and HBase 0.94.16, and I can also create a table and insert data into it.
Now I want to configure Hadoop and HBase in Eclipse (Windows 7), so please suggest if you have any ideas. Thank you.
After spending a whole day on it, I finally got the solution. These are the steps to configure HBase in the Eclipse IDE.
Using Cygwin, HBase is running successfully.
First, get the required jar files from the HBase and Hadoop lib folders (hadoop, hbase, hbase-test, commons-logging, commons-configuration).
Create a simple Java project and add all of these jar files to it (ProjectName -> Build Path -> Configure Build Path).
After these steps, attach the HBase conf folder to your project (ProjectName -> Build Path -> Link Source).
Then run your program to create a table in HBase.
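For that last step, a minimal sketch against the HBase 0.94-era client API (the table and column family names are just examples) could look like the following; because the conf folder is linked into the project, HBaseConfiguration.create() picks up your hbase-site.xml:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the conf folder linked into the project.
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Example table with one column family; the names are placeholders.
        HTableDescriptor desc = new HTableDescriptor("mytable");
        desc.addFamily(new HColumnDescriptor("cf"));
        if (!admin.tableExists("mytable")) {
            admin.createTable(desc);
        }
        admin.close();
    }
}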
I think you might find this tool useful. It comes in pretty handy and allows us to run HBase inside Eclipse without much fuss. You just have to pull it into Eclipse and run it. Once you run it you'll get a tray icon on your PC. Then you just have to do a right-click + Start Server and it'll start HBase inside your Eclipse. And the good thing is that you don't need a Hadoop setup for this. All the tables, including -ROOT- and .META., will be created on your local FS.
HTH

Start a Hadoop Map Reduce Job on a remote cluster in Eclipse with the run dialog (F11)

Is it possible to start a Map Reduce job on a remote cluster with the Eclipse Run Dialog (F11)?
Currently I have to run it with the External Tool Chain Dialog and Maven.
Note: Executing it on a local cluster is no big deal with the Run Dialog, but for a remote connection it's mandatory to have a compiled JAR; otherwise you get a ClassNotFoundException (even if jar-by-class is set).
Our current setup is:
Spring-Data-Hadoop 1.0.0
STS - Springsource Toolsuite
Maven
CDH4
This is what we set in our applicationContext.xml (it is what you would specify in the *-site.xml files on a vanilla Hadoop install):
<hdp:configuration id="hadoopConfiguration">
fs.defaultFS=hdfs://carolin.ixcloud.net:8020
mapred.job.tracker=michaela.ixcloud.net:8021
</hdp:configuration>
Is there a way to tell Eclipse it should build a JAR when the Run Dialog is executed?
I do not know if it builds a new jar (maybe you must extract a jar to a folder), but adding your jar under Run Configurations -> Classpath clears up the ClassNotFoundException problem.
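As a further, hedged sketch (not part of the answer above, and using the plain Hadoop API rather than Spring Data Hadoop): if you pre-build the jar yourself (for example with mvn package), you can also point the job at it programmatically with Job.setJar, so the jar is shipped to the remote cluster and its classes are found there. The host names below are copied from the applicationContext.xml above, and the jar path is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RemoteSubmitSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same settings as in the applicationContext.xml above.
        conf.set("fs.defaultFS", "hdfs://carolin.ixcloud.net:8020");
        conf.set("mapred.job.tracker", "michaela.ixcloud.net:8021");

        Job job = Job.getInstance(conf, "remote job");
        // Point at a jar built beforehand (the path is an example); shipping the
        // jar with the job is what lets the remote side load your classes instead
        // of throwing ClassNotFoundException.
        job.setJar("target/myjob-0.0.1-SNAPSHOT.jar");
        // job.setMapperClass(...); job.setReducerClass(...);
        // FileInputFormat / FileOutputFormat paths as usual ...
        job.waitForCompletion(true);
    }
}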