Shading over third party classes - scala

I'm currently facing a problem deploying an uber JAR to a Spark Streaming application: the same library is present in two different versions, which causes Spark to throw run-time exceptions. The library in question is Typesafe Config.
After trying many things, my solution was to resort to shading the provided dependency so it won't clash with the JAR supplied by Spark at run-time.
Hence, I went to the documentation for sbt-assembly and under shading, I saw the following example:
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.commons.io.**" -> "shadeio.@1")
    .inLibrary("commons-io" % "commons-io" % "2.4", ...).inProject
)
Attempting to shade over com.typesafe.config, I tried applying the following solution to my build.sbt:
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.typesafe.config.**" -> "shadeio.@1").inProject
)
I assumed this would rename any reference to Typesafe Config in my project. But it doesn't work: it matches multiple classes in my project and causes them to be removed from the uber JAR. I see this when running sbt assembly:
Fully-qualified classname does not match jar entry:
jar entry: ***/Identifier.class
class name: **/Identifier.class
Omitting ***/OtherIdentifier.class.
Fully-qualified classname does not match jar entry:
jar entry: ***\SparkBaseJobRunner$$anonfun$1.class
class name: ***/SparkBaseJobRunner$$anonfun$1.class
I also attempted using:
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.typesafe.config.**" -> "shadeio.@1")
    .inLibrary("com.typesafe" % "config" % "1.3.0")
)
This did complete the assembly of the uber JAR, but it didn't have the desired run-time effect.
I'm not sure I fully comprehend the effect shading has on my build process with sbt.
How can I shade over references to com.typesafe.config in my project so that when I invoke the library at run-time, Spark loads my shaded version and the version clash is avoided?
I'm running sbt-assembly v0.14.1

It turns out this was a bug in sbt-assembly where shading was completely broken on Windows. It caused class files to be removed from the uber JAR and tests to fail, since those classes were unavailable.
I created a pull request to fix this. Starting with version 0.14.3 of sbt-assembly, the shading feature works properly. All you need to do is update to the relevant version in plugins.sbt:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
In order to shade a specific JAR in your project, you do the following:
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.typesafe.config.**" -> "my_conf.@1")
    .inLibrary("com.typesafe" % "config" % "1.3.0")
    .inProject
)
This will rename the com.typesafe.config package so it is packaged under my_conf. You can then verify this using jar -tf on your assembly (irrelevant parts omitted for brevity):
***> jar -tf myassembly.jar
my_conf/
my_conf/impl/
my_conf/parser/
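For reference, here is roughly what the relevant pieces of the build look like once combined (a sketch only; the plugin and library versions are the ones mentioned above, and the explicit dependency line is an assumption about the user's build):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

// build.sbt
libraryDependencies += "com.typesafe" % "config" % "1.3.0"

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.typesafe.config.**" -> "my_conf.@1")
    .inLibrary("com.typesafe" % "config" % "1.3.0")
    .inProject
)

With this in place, the library's classes are relocated under my_conf inside the uber JAR, and references to com.typesafe.config in your own compiled classes are rewritten to my_conf at assembly time, so the Config classes shipped with Spark no longer shadow yours.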
Edit
I wrote a blog post describing the issue and the process that led to it for anyone interested in a more in-depth explanation.

Related

How to externalize protobuf files in JVM ecosystem?

I stumbled upon this Akka gRPC tutorial, which suggests that we can create a jar from a project that has .proto files under src/main/proto and add it as a dependency in client and server projects to build their respective stubs.
libraryDependencies += "com.example" %% "my-grpc-service" % "1.0.0" % "protobuf-src"
But this doesn't seem to work! Are there any example projects that demonstrate how this would work in action? How can we externalize protobuf sources and use them in a JVM-based project?
I was able to figure out how to externalise protobuf files properly, as per the suggestion in the akka-grpc docs.
The problem was that I was not adding the sbt-akka-grpc plugin, which sbt requires to recognise .proto files and include them in the packaged jar. Without this plugin no .proto files are made available in the packaged jar.
addSbtPlugin("com.lightbend.akka.grpc" % "sbt-akka-grpc" % "1.1.0")
Make sure to add the organization setting in your build.sbt so the jar is prepared with the correct coordinates.
organization := "com.iamsmkr"
Also, if you wish to cross-compile this jar against multiple Scala versions, add the following entries to your build.sbt:
scalaVersion := "2.13.3"
crossScalaVersions := Seq(scalaVersion.value, "2.12.14")
and then to publish:
$ sbt +publishLocal
With appropriate jars published you can now add them as dependencies in your client and server projects like so:
libraryDependencies +=
"com.iamsmkr" %% "prime-protobuf" % protobufSourceVersion % "protobuf-src"
You can check out this project I am working on to see this in action.
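Putting those pieces together, the build of the protobuf-sources project might look roughly like this (a sketch; the coordinates and versions are the ones quoted above, and enabling AkkaGrpcPlugin explicitly is an assumption about how the plugin is wired in):

// project/plugins.sbt
addSbtPlugin("com.lightbend.akka.grpc" % "sbt-akka-grpc" % "1.1.0")

// build.sbt of the project that only carries the .proto files (under src/main/proto)
organization := "com.iamsmkr"
name := "prime-protobuf"
version := "1.0.0"

scalaVersion := "2.13.3"
crossScalaVersions := Seq(scalaVersion.value, "2.12.14")

// assumption: the plugin is enabled for this project so the .proto files get packaged
enablePlugins(AkkaGrpcPlugin)

Publishing with sbt +publishLocal then makes the jar available for the client and server builds to consume via the protobuf-src configuration shown above.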
Alternate Way
An alternate way I figured out is to keep your .proto files in a root directory and then refer to them in the client and server build.sbt like so:
PB.protoSources.in(Compile) := Seq(sourceDirectory.value / ".." / ".." / "proto")
Check out this project to see it in action.
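As a rough sketch of what that relative path resolves to (directory names here are assumptions), the shared proto directory sits at the repository root with the client and server builds beside it:

// Layout assumed by the setting above:
//   repo-root/
//     proto/        <- shared .proto files
//     client/       <- has its own build.sbt with the PB.protoSources override
//     server/       <- likewise
//
// In client/build.sbt and server/build.sbt, sourceDirectory.value is <module>/src,
// so ".." / ".." climbs back up to repo-root before descending into proto/:
PB.protoSources.in(Compile) := Seq(sourceDirectory.value / ".." / ".." / "proto")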

Including a Spark Package JAR file in an SBT-generated fat JAR

The spark-daria project is uploaded to Spark Packages and I'm accessing spark-daria code in another SBT project with the sbt-spark-package plugin.
I can include spark-daria in the fat JAR file generated by sbt assembly with the following code in the build.sbt file.
spDependencies += "mrpowers/spark-daria:0.3.0"

val requiredJars = List("spark-daria-0.3.0.jar")
assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filter { f =>
    !requiredJars.contains(f.data.getName)
  }
}
This code feels like a hack. Is there a better way to include spark-daria in the fat JAR file?
N.B. I want to build a semi-fat JAR file here. I want spark-daria to be included in the JAR file, but I don't want all of Spark in the JAR file!
The README for version 0.2.6 states the following:
In any case where you really can't specify Spark dependencies using sparkComponents (e.g. you have exclusion rules) and configure them as provided (e.g. standalone jar for a demo), you may use spIgnoreProvided := true to properly use the assembly plugin.
You should then set this flag in your build definition and mark your Spark dependencies as provided, as I do with spark-sql 2.2.0 in the following example:
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"
Please note that by setting this, your IDE may no longer have the dependency references it needs to compile and run your code locally, which means you would have to add the necessary JARs to the classpath by hand. I do this often in IntelliJ: I keep a Spark distribution on my machine and add its jars directory to the IntelliJ project definition (this question may help you with that, should you need it).
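Combining this with the spDependencies line from the question, the build.sbt for the semi-fat JAR would look roughly like this (a sketch; spName is a hypothetical package name and the versions are the ones used above):

// build.sbt (sbt-spark-package and sbt-assembly assumed in project/plugins.sbt)
spName := "mrpowers/my-app"   // hypothetical Spark Packages name
sparkVersion := "2.2.0"
spIgnoreProvided := true      // per the README, needed to use the assembly plugin with provided Spark deps

// spark-daria is a real dependency and ends up in the fat JAR
spDependencies += "mrpowers/spark-daria:0.3.0"

// Spark itself is provided by the cluster and stays out of the fat JAR
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"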

sbt assembly package dependencies in multiple artifacts

I am trying to generate, with sbt assembly, several jars from a single project, each containing some of the dependencies.
So far I have found only this Q&A that is close to what I am looking for. However, I don't need separate configs; basically, when I run assembly, I just want to generate all the different jars.
To be more concrete, I want to generate:
One jar with my code and some general dependencies
One jar with the Hadoop dependencies <- this is the problem, as I don't know how to say "generate another jar that has only those dependencies"
One jar with Scala
Without going deep into complex sbt configurations, you could try another approach. Since the Hadoop dependencies are standard, you can mark them as provided in your build to exclude them from the fat jar.
"org.apache.hadoop" % "hadoop-client" % "2.6.0" % "provided"
For Scala, the library jar is also standard and can be downloaded separately by your "user". To remove it from the fat jar, use the following setting (sbt-assembly 0.13.0):
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
The user of your fat jar is then expected to provide both the Scala and Hadoop libraries on the classpath.
For example, when using Spark this is the correct approach, as these two libraries are both provided by the Spark runtime environment. The same logic applies to the Hadoop MapReduce environment.
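A minimal sketch of the resulting build (the versions are the ones mentioned above; your own code and general dependencies are unaffected):

// Hadoop is provided by the cluster, so exclude it from the fat jar
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0" % "provided"

// Do not bundle the Scala library either; the consumer supplies it on the classpath
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)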

How do I resolve my own test artifacts in SBT?

One of my projects will provide a jar intended to be used for unit testing in several other projects. So far I have managed to make sbt produce an objects-commons_2.10-0.1-SNAPSHOT-test.jar and have it published in my repository.
However, I can't find a way to tell sbt to use that artifact with the testing scope in other projects.
Adding the following dependencies in my build.scala will not get the test artifact loaded.
"com.company" %% "objects-commons" % "0.1-SNAPSHOT",
"com.company" %% "objects-commons" % "0.1-SNAPSHOT-test" % "test",
What I need is to use the default .jar file as a compile and runtime dependency and the -test.jar as a dependency in my test scope. But somehow sbt never tries to resolve the test jar.
How to use test artifacts
To enable publishing the test artifact when the main artifact is published, add the following to the library's build.sbt:
publishArtifact in (Test, packageBin) := true
Publish your artifact. There should be at least two JARs: objects-commons_2.10.jar and objects-commons_2.10-test.jar.
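For the library side, the relevant build.sbt settings would look roughly like this (coordinates are taken from the question; the exact Scala patch version is an assumption implied by the _2.10 suffix):

organization := "com.company"
name := "objects-commons"
version := "0.1-SNAPSHOT"
scalaVersion := "2.10.4" // assumption; any 2.10.x gives the _2.10 artifact names

// publish the test jar alongside the main jar
publishArtifact in (Test, packageBin) := true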
To use the library at runtime and the test library in the test scope, add the following lines to the build.sbt of the main application:
libraryDependencies ++= Seq(
  "com.company" % "objects-commons_2.10" % "0.1-SNAPSHOT",
  "com.company" % "objects-commons_2.10" % "0.1-SNAPSHOT" % "test" classifier "tests" // for sbt 0.12: classifier test (not tests with an s)
)
The first entry loads the runtime libraries, and the second entry makes the "tests" artifact available only in the test scope.
I created an example project:
git clone git@github.com:schleichardt/stackoverflow-answers.git --branch so15290881-how-do-i-resolve-my-own-test-artifacts-in-sbt
Or you can view the example directly on GitHub.
Your problem is that sbt thinks your two jars are the same artifact, but with different versions. It takes the "latest", which is 0.1-SNAPSHOT, and ignores 0.1-SNAPSHOT-test. This is the same behaviour you would see if, for instance, you had 0.1-SNAPSHOT and 0.2-SNAPSHOT.
I don't know what is in these two jars, but if you want them both to be on the classpath, which is what you seem to want to do, then you'll need to change the name of the test artifact to objects-commons-test, as Kazuhiro suggested. It seems that this should be easy enough for you, since you're already putting it in the repo yourself.
It will work fine if you change the name like this.
"com.company" %% "objects-commons" % "0.1-SNAPSHOT",
"com.company" %% "objects-commons-test" % "0.1-SNAPSHOT" % "test",

When does an SBT package get downloaded/built?

I want to use http://dispatch.databinder.net/Dispatch.html.
The site indicates I must add this to project/plugins.sbt:
libraryDependencies += "net.databinder.dispatch" %% "core" % "0.9.1"
which I did. I then restarted the play console and compiled.
Importing doesn't work:
import dispatch._
Guess I have been silly, but then I never used a build system when using Java.
How must I trigger the process that downloads/builds the package? Where are the jars (or equivalent) stored; can I reuse them? When is the package available for use by the Play application?
It doesn't say you should add it to project/plugins.sbt; that is the wrong place. It says to add it to the build.sbt file at the root of your project. Since this is a Play project, project/Build.scala might be more appropriate; I don't know whether it will pick up settings from build.sbt or not.
To add the dependency in your Build.scala:
val appDependencies = Seq(
  "net.databinder.dispatch" %% "core" % "0.9.1"
)
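For context, in a Play 2.0-style project the appDependencies value sits inside project/Build.scala roughly like this (a sketch; the application name and version are placeholders):

import sbt._
import Keys._
import PlayProject._

object ApplicationBuild extends Build {

  val appName = "my-play-app"      // placeholder
  val appVersion = "1.0-SNAPSHOT"  // placeholder

  val appDependencies = Seq(
    "net.databinder.dispatch" %% "core" % "0.9.1"
  )

  val main = PlayProject(appName, appVersion, appDependencies, mainLang = SCALA)
}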
You probably need to run sbt update.
From the sbt Command Line Reference:
update: Resolves and retrieves external dependencies as described in library dependencies.