create project jar in scala - scala

I have a self-contained application in SBT. My data is stored on HDFS (the Hadoop file system). How can I build a jar file so that I can run my work on another machine?
The directory structure of my project is the following:
/MyProject
  /target
    /scala-2.11
      /MyApp_2.11-1.0.jar
  /src
    /main
      /scala

If you don't have any dependencies, then running sbt package will create a jar with all your code.
You can then run your Spark app as:
$SPARK_HOME/bin/spark-submit --name "an-app" my-app.jar
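For reference, a minimal build.sbt for this kind of app could look like the sketch below (the name, versions, and the choice of spark-core are placeholders; Spark is marked "provided" because spark-submit supplies it at runtime):

// build.sbt - minimal sketch, names and versions are illustrative
name := "my-app"
version := "1.0"
scalaVersion := "2.11.12"

// Spark itself is on spark-submit's classpath, so it is not packaged into the jar
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.8" % "provided"

Running sbt package with such a build produces the jar under target/scala-2.11/, as in the directory listing above.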
If your project has external dependencies (other than Spark itself; if it's just Spark or any of its dependencies, then the above approach still works), then you have two options:
1) Use the sbt assembly plugin to create an uber jar with your entire classpath (a setup sketch follows after these two options). Running sbt assembly will create another jar which you can use in the same way as before.
2) If you only have a few simple dependencies (say, just joda-time), then you can simply include them in your spark-submit invocation:
$SPARK_HOME/bin/spark-submit --name "an-app" --packages "joda-time:joda-time:2.9.6" my-app.jar
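As a rough sketch of option 1 (the plugin version here is an assumption; check the sbt-assembly releases for the current one), the setup amounts to adding the plugin and, usually, a merge strategy for duplicate files:

// project/plugins.sbt - plugin version is illustrative
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt - a deliberately blunt merge strategy, just enough to get a working uber jar
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}

Running sbt assembly then produces a jar named something like target/scala-2.11/my-app-assembly-1.0.jar, which you pass to spark-submit in place of the package jar.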

Unlike Java, Scala does not require the file's package name to match the directory name. In fact, for simple tests like this,
you can place this file in the root directory of your SBT project, if you prefer.
From the root directory of the project, you can compile the project:
$ sbt compile
Run the project:
$ sbt run
Package the project:
$ sbt package
Here is a link to help you understand:
http://alvinalexander.com/scala/sbt-how-to-compile-run-package-scala-project
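As a concrete example (the object name is arbitrary), a single file such as Hello.scala placed anywhere under the project is enough to exercise all three commands:

// Hello.scala - minimal app; sbt run discovers the main method automatically
object Hello {
  def main(args: Array[String]): Unit = {
    println("Hello from sbt")
  }
}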

Related

How to run jar generated by sbt package via sbt

I have a jar that was created by running sbt package.
I've already set the main class for the jar in build.sbt:
mainClass in (Compile, packageBin) := Some("com.company.mysql.Main")
addCommandAlias("updatemysql", "runMain com.company.mysql.Main")
I've tried
sbt "runMain target/scala-2.12/update-mysql_2.12-0.1-SNAPSHOT.jar"
sbt target/scala-2.12/update-mysql_2.12-0.1-SNAPSHOT.jar com.company.mysql.Main
sbt target/scala-2.12/update-mysql_2.12-0.1-SNAPSHOT.jar:com.company.mysql.Main
sbt update-mysql-assembly-0.1-SNAPSHOT.jar/run
sbt run update-mysql-assembly-0.1-SNAPSHOT.jar
^ this gives "No main class detected", even though the main class is set in build.sbt, as shown a few lines above.
I need to run the jar through sbt because it's the only way I know to override the config file that is contained in the jar, using -Dpath.to.config.param=new_value.
In sbt, run and runMain use a classpath containing all the dependencies as well as the folders with the outputs of the compilation tasks, which means that neither of them takes a JAR as an argument.
I think it would be possible to run this particular JAR from sbt by writing a custom task that depends on the output of the package task (that is, the JAR's file path) and runs it as an external process, as sketched below... though from the question it seems that this is not the actual problem.
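A minimal sketch of such a task, assuming sbt 1.x and a made-up task name runPackagedJar, could look like this in build.sbt:

// build.sbt - hypothetical task that forks the packaged JAR as an external process
lazy val runPackagedJar = taskKey[Unit]("Run the JAR produced by 'package' in a separate JVM")

runPackagedJar := {
  import scala.sys.process._
  // the JAR's file path is the output of the packageBin task
  val jar = (packageBin in Compile).value
  // -D flags go to the forked JVM, not to the program's arguments
  val exit = Seq("java", "-Dpath.to.config.param=new_value", "-jar", jar.getAbsolutePath).!
  if (exit != 0) sys.error(s"JAR exited with code $exit")
}

Note that a jar built by package does not contain its dependencies, so for anything beyond a self-contained program the assembly jar used below is the more practical target.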
The actual problem is running the JAR with flags passed to the JVM rather than to the program itself, which can be achieved with something like:
# clean assembly ensures that there is only 1 JAR in target
# update-mysql_2.12-*.jar picks that one JAR no matter what its version is
# -D arguments NEED to be passed before -jar so they go to the JVM and not to the JAR
sbt clean assembly && \
java -Dpath.to.config.param=new_value -jar target/scala-2.12/update-mysql_2.12-*.jar

How to add external jar files to a spark scala project

I am trying to use a Scala LSH implementation (https://github.com/marufaytekin/lsh-spark) in my Spark project. I cloned the repository with some changes to the sbt file (added an Organisation).
To use this implementation, I compiled it using sbt compile, moved the jar file to the "lib" folder of my project, and updated my project's sbt configuration file accordingly.
Now, when I try to compile my project using sbt compile, it fails to load the external jar file, showing the error message "unresolved dependency: com.lendap.spark.lsh.LSH#lsh-scala_2.10;0.0.1-SNAPSHOT: not found".
Am I following the right steps for adding an external jar file?
How do I solve the dependency issue?
As an alternative, you can build the lsh-spark project and add the jar to your Spark application.
To add external jars, the addJar option can be used when executing the Spark application. Refer to Running Spark applications on YARN.
This issue isn't related to Spark but to the sbt configuration.
Make sure you followed the correct folder structure imposed by sbt and added your jar to the lib folder, as explained here: the lib folder should be at the same level as build.sbt (cf. this post).
You might also want to check out this SO post.
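For reference, unmanaged jars placed in lib/ need no declaration at all; sbt adds them to the classpath automatically. A sketch of the only setting you might need, assuming you want a non-default folder name, is:

// build.sbt - only needed if the jar is NOT in the default lib/ folder
unmanagedBase := baseDirectory.value / "custom_lib"

Also make sure the same artifact is not still listed in libraryDependencies: an unmanaged jar must not be declared there as well, otherwise sbt will try (and fail) to resolve it from a repository, which matches the "unresolved dependency" error in the question.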

sbt-assembly: Create jar for a single project of a multi-project build

I have a multi-project build.sbt file. I would like to assemble the jar for just one of the projects. Currently, I do the following:
$ sbt
project analysis
assembly
...
exit
I would like to save a few steps and assemble the jar for the project "analysis" from the command line. Is there a way to do this?
Thanks.
You can invoke sbt tasks directly from the command line, without entering its interactive shell:
$ sbt analysis/assembly
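For context, a minimal sketch of such a multi-project build.sbt (all project names except analysis are placeholders) might look like:

// build.sbt - the "analysis" identifier is what analysis/assembly refers to
lazy val core = (project in file("core"))

lazy val analysis = (project in file("analysis"))
  .dependsOn(core)

lazy val root = (project in file("."))
  .aggregate(core, analysis)

With this layout, sbt analysis/assembly runs the assembly task only for the analysis project; the same scoping works inside the sbt shell.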

Running Spark sbt project without sbt?

I have a Spark project which I can run from sbt console. However, when I try to run it from the command line, I get Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkContext. This is expected, because the Spark libs are listed as provided in the build.sbt.
How do I configure things so that I can run the JAR from the command line, without having to use sbt console?
To run Spark standalone, you need to build a Spark assembly.
Run sbt/sbt assembly in the Spark root dir. This will create: assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
Then you build your job jar with dependencies (either with sbt assembly or maven-shade-plugin)
You can use the resulting binaries to run your spark job from the command line:
ADD_JARS=job-jar-with-dependencies.jar SPARK_LOCAL_IP=<IP> java -cp spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar:job-jar-with-dependencies.jar com.example.jobs.SparkJob
Note: If you need a different HDFS version, you need to follow additional steps before building the assembly. See "About Hadoop Versions".
Using the sbt assembly plugin, we can create a single jar. After doing that, you can simply run it using the java -jar command.
For more details refer
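To illustrate the "provided" setup mentioned in the question (versions are placeholders matching the era of the answer), the build typically marks Spark as provided so that the job jar contains only your code and the non-Spark dependencies:

// build.sbt - sketch: Spark stays out of the job jar and is supplied at runtime
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
  "joda-time"         % "joda-time"  % "2.9.6"              // example of a dependency that gets bundled
)

The java -cp invocation above then puts the Spark assembly jar on the runtime classpath, which is what makes the NoClassDefFoundError go away.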

How to create executable single jar which include webapp resources by sbt-assembly with scalatra

I'm making a webapp using the Scalatra framework via sbt & xsbt-web-plugin.
I want to package all resources (templates, CSS, JS) into a single jar.
In sbt with the sbt-assembly plugin, the assembly command makes a single jar which includes all of the project's dependencies.
$ java -jar myproject.jar
and when I open it in a browser, I get:
Could not load resource: [/WEB-INF/views/index.scaml]; are you sure it's within [null]?
I unzipped the jar to confirm that it does not include src/main/webapp/*.
How can I configure sbt to include src/main/webapp/* and build an executable jar?
Resources are meant to be put under the resources folders. There are two such folders:
src/main/resources for resources available at runtime
src/test/resources for resources available only during testing
sbt will package those automatically for you when you run package-war or test. The project does not need to have the assembly plugin for sbt to include resources.
In your case, you should put the WEB-INF directory in src/main/resources/WEB-INF/.
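If you would rather keep the files under src/main/webapp instead of moving them, one alternative (a sketch, not part of the answer above) is to declare that directory as an extra resource directory so it gets packaged into the jar:

// build.sbt - sketch: treat src/main/webapp as an additional resource directory
unmanagedResourceDirectories in Compile += baseDirectory.value / "src" / "main" / "webapp"

Either way, WEB-INF/views/index.scaml ends up at the root of the jar's classpath, which is where the error message above suggests it is being looked up.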