Apache Beam program execution without using Maven

I want to run a simple example Beam program using the Apache Spark runner.
1) I was able to compile the program locally without any issues.
2) I want to push the JAR file to a QA box where Maven is not installed.
3) The examples are documented with a Maven command that compiles and executes the example program.
4) Could you please tell me the steps to run the code without installing Maven?
5) The spark-submit command runs fine.
6) Do I need to put all the dependent JAR files one by one into the /opt/mapr/spark/spark-2.1.0/jars directory to execute the program?
Thanks.

You can do this by following the instructions in the Beam Spark Runner documentation.
Those instructions show how to submit your application by doing the following (a sketch is given below):
Package your application, with its dependencies, as an uber JAR.
Submit the packaged application using spark-submit.
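For illustration only, a submission along these lines should work; the main class, JAR name, and master URL below are placeholders, not values taken from the question:

# Sketch: submit a shaded Beam pipeline JAR to a Spark standalone master.
# Replace the class, JAR, and master URL with your own values.
spark-submit \
  --class org.example.MyBeamPipeline \
  --master spark://qa-host:7077 \
  my-beam-pipeline-bundled.jar \
  --runner=SparkRunner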

Related

Difference in running a spark application with sbt run or with spark-submit script

I am new to Spark, and while learning the framework I have figured out that, to the best of my knowledge, there are two ways to run a Spark application written in Scala:
Package the project into a JAR file, and then run it with the spark-submit script.
Run the project directly with sbt run.
I am wondering what the difference between those two modes of execution is, especially since running with sbt run can throw a java.lang.InterruptedException while the same code runs perfectly with spark-submit.
Thanks!
SBT is a build tool (which I like running on Linux) that does not necessarily imply Spark usage; it just happens to be used, much like IntelliJ, for Spark applications.
You can package and run an application in a single JVM under the SBT console, but not at scale. So, if you have created a Spark application with its dependencies declared, sbt will compile the code with package, create a JAR file, and put the required dependencies on the classpath so you can run it locally.
You can also use the assembly option in SBT, which creates an uber JAR (fat JAR) with all dependencies contained in the JAR; you upload that JAR to your cluster and run it by invoking spark-submit. So, again, if you have created a Spark application with its dependencies declared, SBT will, via assembly, compile the code and create an uber JAR with all required dependencies (except any external files that you need to ship to the workers yourself) to run on your cluster (in general). A sketch of that workflow follows.
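A minimal sketch, assuming the sbt-assembly plugin is already configured in project/plugins.sbt; the main class, JAR path, and master URL are placeholders:

# Build the uber JAR with sbt-assembly (the plugin must be configured).
sbt assembly
# Submit the resulting JAR to the cluster; names and paths are examples.
spark-submit --class com.example.Main --master yarn \
  target/scala-2.11/my-app-assembly-0.1.jar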
sbt and spark-submit are two completely different things:
sbt is a build tool. If you have created a Spark application, sbt will help you compile that code and create a JAR file with the required dependencies declared.
spark-submit is used to submit a Spark job to the cluster manager. You may be using standalone, Mesos, or YARN as your cluster manager; spark-submit sends your job to the cluster manager, and the job starts on the cluster. For example:
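The master URLs below are illustrative, and my-app.jar and com.example.Main are placeholder names:

# Standalone cluster manager
spark-submit --master spark://host:7077 --class com.example.Main my-app.jar
# Mesos
spark-submit --master mesos://host:5050 --class com.example.Main my-app.jar
# YARN (the resource manager address comes from the Hadoop configuration)
spark-submit --master yarn --deploy-mode cluster --class com.example.Main my-app.jar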
Hope this helps.
Cheers!

compile scala-spark file to jar file

I'm working on a frequent itemsets project, and I use the FP-Growth algorithm. I depend on the version developed in Scala for Spark:
https://github.com/apache/spark/blob/v2.1.0/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala
I need to modify this code and recompile it to get a JAR file that I can include in spark-shell and whose functions I can call in Spark.
The problem is that spark-shell is an interpreter, and it finds errors in this file. I've tried sbt with Eclipse, but it did not succeed.
What I need is a compiler that can use the latest versions of the Scala and spark-shell libraries to compile this file into a JAR file.
Got your question now!
All you need to do is add the dependency JARs (Scala, Java, etc.) appropriate for the machine on which you are going to use your own JAR. Then add the JARs to spark-shell, and you can use it like below:
spark-shell --jars your_jar.jar
Follow these steps:
check out the Spark repository
modify the files you want to modify
build the project
run the ./dev/make-distribution.sh script, which is inside the Spark repository
run spark-shell from your Spark distribution (a sketch of these steps follows)
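For example, the sketch below assumes you are modifying FPGrowth.scala against the v2.1.0 tag; the distribution name and Maven profile are illustrative:

# Check out the Spark source at the tag you want to modify.
git clone https://github.com/apache/spark.git
cd spark
git checkout v2.1.0
# ... edit mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ...
# Build a runnable distribution (the name and profile here are examples).
./dev/make-distribution.sh --name custom-fpm --tgz -Phadoop-2.7
# Launch the shell from the freshly built distribution in dist/.
dist/bin/spark-shell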

Cucumber and Eclipse IDE; How to make jar for a test case

I'm a Cucumber and Eclipse beginner and have a few questions; I hope you can help me get through this. I created a sample Cucumber test scenario, sample test steps, and a Cucumber runner. The scenario runs fine within the Eclipse IDE (Neon). I used Maven as the dependency manager, and I also installed the Maven command-line module. The step code is Java.
Here is the (basic) question: how do I create a JAR file from my Cucumber test scenario so that I can execute it via the command line and bring the test scenario to Jenkins CI? Is there anything I need to do with Maven BEFORE I can build the JAR file?
Thanks a lot folks!
If you run Cucumber using the JUnit runner, then all you have to do to run it from the command line is execute Maven and make sure you invoke a lifecycle phase that includes running the unit tests. One way would be
mvn test
An example that might get you up and running can be found at
https://github.com/cucumber/cucumber-java-skeleton
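For a Jenkins job, invocations along these lines are typical; RunCucumberTest is a placeholder for whatever your JUnit-based Cucumber runner class is called:

# Run the whole test phase; Surefire picks up the JUnit-based Cucumber runner.
mvn clean test
# Or run only the runner class (placeholder name) via the Surefire test property.
mvn test -Dtest=RunCucumberTest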

How to run mapreduce jar using eclipse

I am trying to use Eclipse (Kepler) for building and running MapReduce (v2) programs, using the Maven plugin.
I am able to build the project successfully, and I see that Maven created a JAR file as the final output; suppose that JAR is mapreducedemo.jar.
Now, my question is: how do I run this JAR using Eclipse?
I tried it at the command prompt and it works fine, like this:
$ hadoop jar mapreducedemo.jar MainDriver input output
The thing is, hadoop is a shell script, and it sets all the environment variables and required JARs internally.
How can we run this mapreducedemo.jar using Eclipse?
Any answers would be a great help.
Thanks,
Vipin
You should be able to just run MainDriver as a Java application from within Eclipse. Maven will make sure you have all your dependencies, and MainDriver, once it has configured the job, will submit it for execution, just as it does when you run the hadoop jar command.
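If you ever want to reproduce outside Eclipse what the hadoop wrapper script sets up, a rough sketch (reusing the JAR, class, and paths from the question) is:

# hadoop classpath prints the JARs and config directories the wrapper would add.
java -cp mapreducedemo.jar:$(hadoop classpath) MainDriver input output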

Running a Map Reduce Program in Eclipse

I have a Map/Reduce program which loads a file and reads it into HBase. How do I execute my program through Eclipse? I googled and found two ways:
1) Using the Eclipse Hadoop plugin
2) Creating a JAR file and executing it on the Hadoop server
But can I execute my Map/Reduce program by supplying the connection details and running it in Eclipse? Can anyone tell me the exact procedure to run an HBase Map/Reduce program?
I have done the following:
Installed and configured Hadoop (and HDFS) on my machine
Built a Maven-ized Java project with all of the classes for my Hadoop job
One of those classes is my "MR" or "Job" class, which has a static main method that configures and submits my Hadoop job
I run the MR class in Eclipse as a Java application
The job runs in Hadoop using the libraries on the Java project's classpath (and therefore doesn't show up in the job tracker). Any reference to HDFS files uses the HDFS file system you installed and formatted using the non-Eclipse Hadoop install.
This works great with the debugger in Eclipse, although JUnit tests are kind of a pain to build by hand.