How to package and reference a text file in the resources folder - Scala

I have a Spark project using Scala and sbt. At one point it references a text file, which I want to be included in the package.
This is how it is referenced in the application source:
getClass.getResource("/myFile.txt")
This works fine when running the source code with sbt run, but I want the application to be packaged and deployed to a server.
In build.sbt, after some googling, I got this to work:
import NativePackagerHelper._
mappings in Universal ++= directory("src/main/resources")
Adding this meant that myFile.txt appears in the resources folder of the package, created using:
sbt universal:packageBin
The resulting folder structure:
target/
  universal/
    bin/
    lib/
    resources/
However, when I run my packaged application from bin/my-application.bat, I get the following error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:/Software/my-application-0.0.1/lib/my-application-0.0.1.jar!/myFile.txt;
Bear in mind I have zero experience of deploying Scala or JVM-based things, so you may have to spoon-feed me the explanation.
EDIT: I later realised that the text file was in fact included in the .jar file. The issue then was that getResource does not work in this case, and I had to adapt my code to use getResourceAsStream.
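For reference, a minimal sketch of the adapted code (assuming the resource is read into a String; error handling omitted):

import scala.io.Source

// getResourceAsStream works both when the file sits on disk and when it is
// packaged inside a jar, because it reads the resource as a stream instead of
// resolving it to a filesystem path.
val stream = getClass.getResourceAsStream("/myFile.txt")
val contents = Source.fromInputStream(stream).mkString
stream.close()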

This can have multiple causes.
Include files in your resulting jar
You added this snippet, which is not correct:
import NativePackagerHelper._
mappings in Universal ++= directory("src/main/resources")
The src/main/resources directory is the resourceDirectory in Compile, and its contents are always included in the packaged jar file (not the zip!). I would highly recommend removing this snippet, as otherwise your files will end up on the classpath twice.
The mappings in Universal (documentation link) define the content of the created package (with universal:packageBin, the zip file). I assume that you are using the JavaAppPackaging plugin, which configures your entire build. By default, all dependencies and your actual build artifact end up in the lib folder, and start scripts are placed in bin.
The start scripts also create a valid classpath, which includes all libraries in lib and nothing else by default.
TL;DR You simply put your files in src/main/resources and they will be available on the classpath.
How to find files on the classpath
You posted this snippet
getClass.getResource("/myFile.txt")
This will look up a file called myFile.txt in the roots of your classpath. As suggested in the comments, you should open your application jar file and check that myFile.txt sits at its root; otherwise it won't be found.
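For example, you can list the jar's contents from the command line and check for the file (the jar path is taken from the error message above):
jar tf lib/my-application-0.0.1.jar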
Hope that helps,
Muki

Related

How to add external jar files to a spark scala project

I am trying to use an LSH implementation in Scala (https://github.com/marufaytekin/lsh-spark) in my Spark project. I cloned the repository and made some changes to the sbt file (added an organisation).
To use this implementation, I compiled it using sbt compile, moved the jar file to the "lib" folder of my project, and updated my project's sbt configuration file, which looks like this:
Now when I try to compile my project using sbt compile, it fails to resolve the external jar file, showing the error message "unresolved dependency: com.lendap.spark.lsh.LSH#lsh-scala_2.10;0.0.1-SNAPSHOT: not found".
Am I following the right steps for adding an external jar file?
How do I solve the dependency issue?
As an alternative, you can build the lsh-spark project and add the jar to your Spark application.
To add external jars, the addJar option can be used when executing the Spark application. Refer to Running Spark applications on YARN.
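For example, with spark-submit this would look roughly as follows (the main class and jar names here are illustrative):
spark-submit --class com.example.MyApp --jars lsh-scala_2.10-0.0.1-SNAPSHOT.jar my-application.jar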
This issue isn't related to Spark but to the sbt configuration.
Make sure you followed the correct folder structure imposed by sbt and added your jar to the lib folder: the lib folder should be at the same level as build.sbt (cf. this post), as in the sketch below.
You might also want to check out this SO post.
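For reference, the layout sbt expects for unmanaged jars looks like this (names are illustrative):
my-project/
  build.sbt
  lib/
    lsh-scala_2.10-0.0.1-SNAPSHOT.jar
  src/
    main/
      scala/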

Add a .so library with class files to sbt scala project

I need to call some C functions from my sbt project. I have already used SWIG to create a .so file along with the .class files.
I wrapped everything in a jar file and put it in the lib/ folder, but compilation keeps failing with "not found: value".
How can I use the .so library generated by SWIG, along with the .class files, in an sbt project?
The name of the library is libsample.so.
I can successfully load the library in sbt with System.loadLibrary("sample"), but I cannot call sample.entry() (not found: value sample...).
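Roughly, the call site I am aiming for looks like this (assuming the SWIG-generated wrapper class is named sample after the module):

object Main {
  // libsample.so must be reachable via java.library.path for this call to succeed
  System.loadLibrary("sample")

  def main(args: Array[String]): Unit = {
    // call into the native code through the SWIG-generated wrapper
    println(sample.entry())
  }
}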
Can you use something like Dependency Walker and make sure you are not missing any dependencies on the box?

How to set up an SBT dependency for Scala code that is located in the same directory as Build.scala

Our project features a kind of ad hoc "plugin" that reads CSV files and stuffs the contents into a database.
This code is defined in /project/DataMigrate.scala
We had our own poorly implemented CSV parser that isn't up to the task anymore, so I tried adding https://github.com/tototoshi/scala-csv to the libraryDependencies in /project/Build.scala, but that did not make it importable from DataMigrate.scala. I also tried putting the library dependency in /project/build.sbt, as I read something about a "build definition of a build definition", but that did not seem to help either.
Is it at all possible to add dependencies for code that lives in /project?
sbt is recursive: just as you can define dependencies and settings of the actual project in [something].sbt or project/[something].scala, you can define dependencies and settings of the project's project (any ad hoc plugins etc.) in project/[something].sbt or project/project/[something].scala.
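A minimal sketch for this case (the version number is illustrative; the coordinates are those published by the scala-csv project):

// project/build.sbt
// Dependencies declared here belong to the build definition itself, so they
// become importable from code under /project, such as DataMigrate.scala.
libraryDependencies += "com.github.tototoshi" %% "scala-csv" % "1.3.10"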

How to make sbt-native-packager refrain from putting my resources into a jar file?

I'm using the sbt-native-packager plugin to generate a start script for my application, which is very convenient, as this plugin generates the correct classpath specification with all my library dependencies. I am not distributing this application, so I'm not packaging the entire thing into one tarball. I just use the lib directory generated by sbt-native-packager, which contains all the jar files my project depends on: both third-party libraries and the jar file containing my own class and resource files.
In my project's src/main/resources directory I have files that I want to be able to edit without having to use sbt-native-packager to regenerate the entire installation, for example configuration files. This is difficult because those files are zipped up in the jar file with all my classes.
Question: how can I tell sbt-native-packager not to put my resource files into a jar-file, while still generating the start-script with the correct classpath for those resource files to be located and read by my application as they are now from within the jar file? If this means leaving all my class files out of a jar file that is fine, as long as the files from src/main/resources remain as files that I can change without re-invoking sbt stage and as long as the start-script works.
While it is possible to filter these resources, I would suggest putting them into a different directory and adding that directory to the classpath.
Modifying the start script generated by sbt-native-packager is a bit cumbersome, as the class com.typesafe.sbt.packager.archetypes.JavaAppBashScript that generates the classpath prefixes all paths with $lib_dir/. The cleanest approach would probably be to provide your own implementation and use it to generate the bashScriptDefines.
A simpler but hacky way would be to just add the following lines to your build.sbt:
packageArchetype.java_server

// add your config files to the classpath for running inside sbt
unmanagedClasspath in Compile += Attributed.blank(sourceDirectory.value / "main" / "config")

// map all files in src/main/config to config in the packaged app
mappings in Universal ++= {
  val configDir = sourceDirectory.value / "main" / "config"
  for {
    file <- (configDir ** AllPassFilter).get
    relative <- file.relativeTo(configDir.getParentFile)
    mapping = file -> relative.getPath
  } yield mapping
}

scriptClasspath ~= (cp => "../config" +: cp)
This will prepend $lib_dir/../config to your start script's classpath. If your app has to run on Windows, you will have to provide similar settings for the batScriptDefines.
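With this in place the config files remain editable on disk, and ordinary classpath lookups still find them. A minimal sketch of reading one such file at runtime (the file name app.conf is hypothetical):

import scala.io.Source

// Resolves to src/main/config/app.conf inside sbt and to config/app.conf in
// the staged package, since both locations are on the classpath.
val stream = getClass.getResourceAsStream("/app.conf")
val text = Source.fromInputStream(stream).mkString
stream.close()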

Scala Play "dist" command creates zip archive with 2 entries with identical file names

Our Play Scala project includes the Stanford NLP package as a dependency, declared as follows in our Build.scala file:
val coreNlp = "edu.stanford.nlp" % "stanford-corenlp" % "3.2.0"
The dependency is resolved on Maven and results in the download of two distinct jar files:
stanford-corenlp-3.2.0.jar
stanford-corenlp-3.2.0-models.jar
as shown when searching Maven.
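For context, the models jar is the same artifact with a classifier; in sbt a declaration along these lines is what typically pulls it in:

val coreNlpModels = "edu.stanford.nlp" % "stanford-corenlp" % "3.2.0" classifier "models"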
So far, so good: our Play app works properly.
But when we package the app using one of the commands below:
play dist
sbt dist
both jar files end up with the exact same name,
edu.stanford.nlp.stanford-corenlp-3.2.0.jar
in the resulting .zip archive artifact, as shown by running the following command on the .zip file:
unzip -lv generated-dist.zip
When the .zip is later inflated to deploy the app, only one of the two jar files survives, and we're unable to run the app thus deployed, because the other jar is missing.
Is there a workaround, e.g. a different way to declare the dependency, to avoid name collision when dist is executed?