sbt package: exclude Python files in JAR file

I'd like to figure out how to exclude the Python files from the JAR file generated by the sbt package command.
The Delta Lake project uses SBT version 0.13.18 and the JAR file is built with sbt package.
The directory structure is as follows:
python/
  delta/
    testing/
      utils.py
    tests/
      test_sql.py
src/
  main/
    ...scala files
build.sbt
run-tests.py
It follows the standard SBT project structure with a couple of Python files added in.
The Python files are included in the JAR file when sbt package is run and I'd like to figure out how to exclude all the Python files.
Here's what I tried adding to the build.sbt file:
mappings in (Compile, packageBin) ~= { _.filter(!_._1.getName.endsWith("py")) } per this answer
excludeFilter in Compile := "*.py" per this answer
Neither of these worked.

Haven't tested it, but I think something like this should work when you build a fat JAR with sbt-assembly:
assemblyMergeStrategy in assembly := {
  case PathList(parts @ _*) if parts.last.endsWith(".py") => MergeStrategy.discard
  case _ => MergeStrategy.first // or whatever you currently have for your other merges
}
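If you only need the plain sbt package output rather than a fat JAR, filtering the packageBin mappings is the usual approach. Here is an untested sketch in sbt 0.13 syntax that filters on the destination path inside the JAR rather than on the file name:
mappings in (Compile, packageBin) := {
  // Drop every entry whose destination path inside the JAR ends in .py
  (mappings in (Compile, packageBin)).value.filterNot { case (_, pathInJar) =>
    pathInJar.endsWith(".py")
  }
}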

Exclude Packages From Source Directory in SBT

I have been trying to get this to work: I would like to exclude some packages from the final packaged JAR.
// Filter when packaging
excludeFilter in (Compile, unmanagedSources) ~= { _ ||
  ((f: File) =>
    f.getPath.containsSlice("/encode/"))
}
That piece of code in my build.sbt does indeed exclude the encode package in my src directory completely, but how do I do it when I need to exclude multiple such packages? Any ideas?
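One way to generalize that filter is to test each source file's path against a list of package directories. An untested sketch along those lines, where the decode package is a hypothetical second package to exclude:
// Exclude every source file whose path contains one of these package directories
excludeFilter in (Compile, unmanagedSources) ~= { _ ||
  ((f: File) =>
    Seq("/encode/", "/decode/").exists(pkg => f.getPath.containsSlice(pkg)))
}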

sbt: set the base-directory of a remote RootProject

Disclaimer: I am new to sbt and Scala so I might be missing obvious things.
My objective here is to use the Scala compiler as a library from my main project. I was initially doing that by manually placing the scala jars in a libs directory in my project and then including that dir in my classpath. Note that at the time I wasn't using sbt. Now, I want to use sbt and also download the scala sources from github, build the scala jars and then build my project. I start by creating 2 directories: myProject and myProject/project. I then create the following 4 files:
The sbt version file:
// File 1: project/build.properties
sbt.version=0.13.17
The plugins file (not relevant to this question):
// File 2: project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-buildinfo" % "0.7.0")
The build.sbt file:
// File 3: build.sbt
lazy val root = (project in file(".")).
  settings(
    inThisBuild(List(
      organization := "me",
      scalaVersion := "2.11.12",
      version := "0.1.0-SNAPSHOT"
    )),
    name := "a name"
  ).dependsOn(ScalaDep)

lazy val ScalaDep = RootProject(uri("https://github.com/scala/scala.git"))
My source file:
// File 4: Test.scala
import scala.tools.nsc.MainClass

object Test extends App {
  println("Hello World !")
}
If I run sbt inside myProject then sbt will download the scala sources from github and then try to compile them. The problem is that the base-directory is still myProject. This means that if the scala sbt source files refer to something that is in the scala base-directory they won't find it. For example, the scala/project/VersionUtil.scala file tries to open the scala/versions.properties file that lies in the scala base-directory.
Question: How can I set up sbt to download a GitHub repo and then build it using that project's base-directory instead of mine (that is, the base-directory of myProject in the example above)?
Hope that makes sense.
I would really appreciate any feedback on this.
Thanks in advance !
In the Scala ecosystem you usually depend on binary artifacts (libraries) that are published in Maven or Ivy repositories. Virtually all Scala projects publish binaries, including the compiler. So all you have to do is add the line below to your project settings:
libraryDependencies += "org.scala-lang" % "scala-compiler" % scalaVersion.value
dependsOn is used for dependencies between sub-projects in the same build.
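Concretely, the build.sbt from the question would become something like the following (untested; only the dependency mechanism changes, and the RootProject line goes away):
lazy val root = (project in file(".")).
  settings(
    inThisBuild(List(
      organization := "me",
      scalaVersion := "2.11.12",
      version := "0.1.0-SNAPSHOT"
    )),
    name := "a name",
    // Pull the compiler in as a published binary artifact
    libraryDependencies += "org.scala-lang" % "scala-compiler" % scalaVersion.value
  )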
For browsing sources you could use an IDE. IntelliJ IDEA can readily import sbt projects and download and attach sources for library dependencies. Eclipse has an sbt plugin that does the same, as does Ensime. Or just git clone the repository.

Including a Spark Package JAR file in a SBT generated fat JAR

The spark-daria project is uploaded to Spark Packages and I'm accessing spark-daria code in another SBT project with the sbt-spark-package plugin.
I can include spark-daria in the fat JAR file generated by sbt assembly with the following code in the build.sbt file.
spDependencies += "mrpowers/spark-daria:0.3.0"
val requiredJars = List("spark-daria-0.3.0.jar")
assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filter { f =>
    !requiredJars.contains(f.data.getName)
  }
}
This code feels like a hack. Is there a better way to include spark-daria in the fat JAR file?
N.B. I want to build a semi-fat JAR file here. I want spark-daria to be included in the JAR file, but I don't want all of Spark in the JAR file!
The README for version 0.2.6 states the following:
In any case where you really can't specify Spark dependencies using sparkComponents (e.g. you have exclusion rules) and configure them as provided (e.g. standalone jar for a demo), you may use spIgnoreProvided := true to properly use the assembly plugin.
You should then use this flag on your build definition and set your Spark dependencies as provided as I do with spark-sql:2.2.0 in the following example:
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"
Please note that by setting this, your IDE may no longer have the dependency references it needs to compile and run your code locally, which means you may have to add the necessary JARs to the classpath by hand. I do this often on IntelliJ: I keep a Spark distribution on my machine and add its jars directory to the IntelliJ project definition (this question may help you with that, should you need it).
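Putting the pieces together, the relevant build.sbt fragment would look roughly like this (an untested sketch based on the README excerpt above):
// Let sbt-assembly treat provided dependencies correctly
spIgnoreProvided := true
// spark-daria comes from Spark Packages and ends up in the semi-fat JAR
spDependencies += "mrpowers/spark-daria:0.3.0"
// Spark itself is provided, so it stays out of the JAR
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"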

How can I publish my scala play project without application.conf?

In my case I run these commands to publish my Scala project library:
./activator clean compile
./activator test
./activator publish-local
I only need my local library's application.conf for tests. But I want to use the application.conf of my new project, which uses this library JAR as a dependency.
How can I build the JAR without application.conf?
Try to add the following to your build.sbt:
mappings in (Compile, packageBin) ~= { (ms: Seq[(File, String)]) =>
  ms filterNot { case (file, dest) =>
    dest.contains("application.conf")
  }
}
This is adapted from another question (removing files from the dist task output), but it should work here too.
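Note that this only filters the mappings for the packaged JAR: the test classpath still uses the unpacked resource directories, so your tests will keep seeing application.conf while the published artifact ships without it, which matches the requirement above.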

Multiple project dependencies in SBT native packager

I am using the SBT native packager plugin (https://github.com/sbt/sbt-native-packager) for a project composed of multiple modules.
In my SBT settings I have:
lazy val settings = packageArchetype.java_application ++ Seq(
  ...
  // Java is required to install this application
  debianPackageDependencies in Debian ++= Seq("java2-runtime"),
  // Include the module JAR in the ZIP file
  mappings in Universal <+= (packageBin in Compile) map { jar =>
    jar -> ("lib/" + jar.getName)
  }
)
The problem is that the generated ZIP or DEB, for example, does not seem to include my project's module dependencies. There is only the final module's JAR and the libraries it uses, but not the modules it depends on.
Do you know how I could fix that?
Found a solution to my problem:
I needed to add exportJars := true to my settings so that all my internal dependencies are embedded in the package.
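For reference, here is the settings block from the question with that flag added (untested; the settings carried over from the question are unchanged):
lazy val settings = packageArchetype.java_application ++ Seq(
  // Export each sub-project's packaged JAR instead of its classes directory,
  // so inter-project dependencies end up in the generated package
  exportJars := true,
  // ...the remaining settings from the question stay as they were:
  debianPackageDependencies in Debian ++= Seq("java2-runtime"),
  mappings in Universal <+= (packageBin in Compile) map { jar =>
    jar -> ("lib/" + jar.getName)
  }
)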