How to add external jar files to a Spark Scala project - scala

I am trying to use an LSH implementation in Scala (https://github.com/marufaytekin/lsh-spark) in my Spark project. I cloned the repository and made some changes to the sbt file (added an organisation).
To use this implementation, I compiled it using sbt compile, moved the jar file to the "lib" folder of my project, and updated my project's sbt configuration file, which looks like this:
Now when I try to compile my project using sbt compile, it fails to resolve the external jar file, showing the error message "unresolved dependency: com.lendap.spark.lsh.LSH#lsh-scala_2.10;0.0.1-SNAPSHOT: not found".
Am I following the right steps for adding an external jar file?
How do I solve the dependency issue?

As an alternative, you can build the lsh-spark project yourself and add the resulting jar to your Spark application.
To add external jars at runtime, the SparkContext.addJar method (or the --jars option of spark-submit) can be used when executing the Spark application; a minimal sketch follows. Refer to Running Spark on YARN.
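A minimal sketch of that approach, assuming the jar has already been built locally (the path below is hypothetical):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: distribute an extra jar to the executors at runtime.
val sc = new SparkContext(new SparkConf().setAppName("lsh-demo"))
sc.addJar("/path/to/lsh-scala_2.10-0.0.1-SNAPSHOT.jar")

Equivalently, spark-submit accepts a --jars option taking a comma-separated list of jars to ship with the application.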

This issue isn't related to Spark but to the sbt configuration.
Make sure you followed the correct folder structure imposed by sbt and added your jar to the lib folder, as explained here: the lib folder should be at the same level as build.sbt (cf. this post).
You might also want to check out this SO post.
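For reference, a sketch of the layout sbt expects for an unmanaged jar (the jar name is taken from the error message above):

my-project
├── build.sbt
└── lib
    └── lsh-scala_2.10-0.0.1-SNAPSHOT.jar

With the jar sitting in lib it is picked up automatically; it should not also be declared in libraryDependencies, which is what makes sbt try to resolve it from a repository and produces the "unresolved dependency" error.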

Related

IntelliJ Scala maven build jar not generating class files

I have created a Spark Scala (version 2.11) application and am trying to build it with Maven (version 3) in IntelliJ. The first time, I was able to compile and build the jar with Maven successfully, and to test the Spark application on the cluster using that jar as well. The next time, I modified some of the existing Scala class code and tried to build again; the code compiled and generated the jar file without any issues, but there are no Scala classes in the latest jar file. I would like to know why the Maven build is not generating the class files when I build. Can you please let me know what the problem could be and how I can fix it?
The easiest way to build Scala applications for Spark is to use SBT and a fat-jar plugin. Details are already described here:
How to build an Uber JAR (Fat JAR) using SBT within IntelliJ IDEA?
Just don't forget to exclude the Spark jars from the fat jar by marking them as provided, as sketched below.
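A minimal build.sbt sketch of that provided scope (the Spark version is a placeholder):

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"

With this scope, sbt-assembly compiles against Spark but leaves it out of the fat jar, since the cluster provides it at runtime.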

How to package and reference a text file in the resources folder

I have a Spark project using Scala and sbt. At one point it references a text file, which I want to be packaged.
This is how it is referenced in the application source:
getClass.getResource("/myFile.txt")
This works fine when running the source code with sbt run, but I want it to be packaged and deployed to a server.
In build.sbt, after some googling, I got this to work:
import NativePackagerHelper._
mappings in Universal ++= directory("src/main/resources")
Adding this meant that myFile.txt appears in the resources folder of the package, created using
sbt universal:packageBin
resulting folder structure:

target
└── universal
    ├── bin
    ├── lib
    └── resources
However, when I run my packaged application from bin/my-application.bat, I get the following error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:/Software/my-application-0.0.1/lib/my-application-0.0.1.jar!/myFile.txt;
Bear in mind I have zero experience of deploying Scala or JVM-based things, so you may have to spoonfeed me the explanation.
EDIT: I later realised that the text file was in fact included in the .jar file.
The issue then was that getResource does not work in this case, and I had to adapt my code to use getResourceAsStream, as sketched below.
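A minimal sketch of that adaptation (the file name is taken from the question):

import scala.io.Source

// getResourceAsStream works whether the resource sits on disk or inside a jar.
val stream = getClass.getResourceAsStream("/myFile.txt")
val contents = Source.fromInputStream(stream).mkString

Note that APIs expecting a filesystem path (such as Spark's readers, which raised the AnalysisException above) cannot open a resource packed inside a jar; reading it as a stream sidesteps that.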
This can have multiple reasons.
Include files in your resulting jar
You added this snippet, which is not correct:
import NativePackagerHelper._
mappings in Universal ++= directory("src/main/resources")
The src/main/resources directory is the resourceDirectory in Compile, and its contents are always present in the packaged jar file (not the zip!). So I would highly recommend removing this snippet, as you will otherwise have your files twice on your classpath.
The mappings in Universal (documentation link) define the content of the created package (with universal:packageBin, the zip file). I assume that you are using the JavaAppPackaging plugin, which configures your entire build. By default, all dependencies and your actual build artifact end up in the lib folder, and start scripts are placed in bin.
The start scripts also create a valid classpath, which by default includes all libraries in lib and nothing else.
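For context, a sketch of how that plugin is typically wired up (the plugin version is a placeholder):

// in project/plugins.sbt
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.0.3")

// in build.sbt
enablePlugins(JavaAppPackaging)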
TL;DR You simply put your files in src/main/resources and they will be available on the classpath.
How to find files on the classpath
You posted this snippet
getClass.getResource("/myFile.txt")
This will look up a file called myFile.txt at the root of your classpath. As suggested in the comments, you should open your application jar file and check for a text file myFile.txt at the root; otherwise it won't be found.
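One quick way to check, assuming the JDK's jar tool is on your PATH (the jar name is taken from the error message above):

jar tf my-application-0.0.1.jar

jar tf lists every entry in the archive; myFile.txt should appear at the top level of the listing.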
hope that helps,
Muki

How to use Phantom in Scala IDE

I want to use phantom with my Scala IDE. For this I cloned the GitHub repository and created a .jar file of phantom using sbt -> compile -> package. I added this .jar file to the build path in my Scala IDE, but the import
import com.websudos.phantom.connectors._
still throws the error
object connector is not a member of com.websudos.phantom.
When using the autocomplete function of the Scala IDE, it shows only the import for
import com.websudos.phantom.example
I don't know why the jar was created for example but not for the other packages.
I searched the internet, but all the other options involve adding the dependency to the sbt build path, which I don't want to do.
Use sbt-assembly instead to create a fat jar, as sketched below.
https://github.com/sbt/sbt-assembly
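A sketch of the setup (the plugin version is a placeholder; pick one compatible with your sbt):

// in project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

Running sbt assembly then produces a single fat jar under target/scala-2.xx/ containing phantom together with all of its transitive dependencies; adding that one jar to the IDE build path avoids the missing-module problem.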

How to create standalone jar file for elastic4s from sources?

I'm trying to create a standalone jar file from the elastic4s sources on GitHub. I've used sbt compile as detailed on the GitHub page, but I can't find the jar file.
How do I use sbt to create the jar file so I can import it into my projects as a dependency?
The compile task will only compile the project.
> help compile
Compiles sources.
If you want to create a jar file and use it as a dependency in your project, you have two ways of doing that.
Unmanaged dependency (not recommended)
For an unmanaged dependency, run +package, which will create a jar file for each supported Scala version; you can then use it in your projects as an unmanaged dependency by copying the package-generated jar to the lib folder of your project.
The jar files will be located in target/scala-2.11 and target/scala-2.10, depending on the Scala version you want to use it with.
Publish to Local Repository (recommended yet imperfect)
If you want to include your custom-built elastic4s as a managed dependency, you have to run +publishLocal. This will do the same as above, but additionally it will publish the artifact to your local repository. Assuming you've built it with version := "1.2.1.1-SNAPSHOT", you can include it in your project by just adding:
libraryDependencies += "com.sksamuel.elastic4s" %% "elastic4s" % "1.2.1.1-SNAPSHOT"
What makes the approach imperfect is that once you share the project on GitHub (or any other project-sharing platform), people will have to run publishLocal themselves to be able to build your project. The dependency should therefore go to one of the official binary repositories, so that when the dependency is needed, it's downloaded from the Internet. Consult Publishing.
What is the + character in front of the commands
The + in the commands is for cross-building: if you don't use it, the command will be executed only for the scalaVersion declared in the build.sbt of the project.
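For reference, a minimal build.sbt sketch of the settings cross-building relies on (the versions are placeholders):

scalaVersion := "2.11.7"
crossScalaVersions := Seq("2.10.5", "2.11.7")

With this in place, +package and +publishLocal run once per entry in crossScalaVersions.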

How to build a jar file out of a GitHub project in sbt to be used in a Scala program

I am trying to use Scala to access Amazon's DynamoDB and found this great package on GitHub: https://github.com/piotrga/async-dynamo
So I downloaded the code as a zip file, unzipped it, and then ran "sbt clean test", getting the following error:
error sbt.ResolveException: unresolved dependency: asyncdynamo#async-dynamo;1.6.0: not found
Question: is this the correct way to generate a jar file that I can include in my Scala program, or is there a better way?
Thanks in advance.
EDIT:
Just for the benefit of others: the Scala sbt documentation provides lots of information regarding the build process.
Instead of generating a jar file, you can just run sbt publish-local and then include the lines for the managed dependency in the other project.
sbt/Ivy will see that you have the artifact, so you don't need to add the jar to the other project, which is much cleaner.
Then, for example, if you need to update the other project, you don't need to replace the jar again - just publish-local again, then clean and run your other project! A sketch of the workflow follows.
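A sketch of that workflow (the version is hypothetical; use whatever the library's build declares). In the cloned async-dynamo checkout:

sbt publish-local

Then, in the consuming project's build.sbt:

libraryDependencies += "asyncdynamo" % "async-dynamo" % "1.6.0"

sbt resolves the artifact from the local Ivy repository (~/.ivy2/local) like any other managed dependency.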
You are not the only one to have problems with this, it seems; see the GitHub issues page:
https://github.com/piotrga/async-dynamo/issues
The command 'sbt clean test' will run the tests sbt detects. If you want a .jar file, you could use 'sbt clean package', which produces a .jar in the target/ folder.
I cloned the repo and was able to run sbt package after changing release.sbt a bit. I had to change the 'publishTo' variable, as it seemed to depend on the repository creator's local environment variable, so I just commented it out.
I did not get the dependency problem, so I suppose it is correctly declared. The tests it tries to run do fail, though, but sbt package still produces the .jar just fine.
EDIT: As Matthias Schlaipfer pointed out in the comments, the more elegant way (and much easier) is just to add this as a dependency in your build.sbt. From the readme, this is what you need to add:
resolvers += "piotrga" at
"https://github.com/piotrga/piotrga.github.com/tree/master/maven-repo"
libraryDependencies += "asyncdynamo" % "async-dynamo" % "1.6"