sbt+spring assembly deduplicate: different file contents found in the following - scala

When I run sbt assembly on my Scala project, I get an error like this: deduplicate: different file contents found in the following: (the list of conflicting files was posted as a picture)
and my build.sbt:
sbt version: 0.13.15
scala version: 2.8.11
jdk: 1.8

You need to set the assembly merge strategy to either take one of the files, concat them or remove them completely:
assemblyMergeStrategy in assembly := {
case PathList("org", "springframework", xs @ _*) => MergeStrategy.last
case x => MergeStrategy.defaultMergeStrategy(x)
}
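A sketch of how the three options could map onto sbt-assembly's built-in strategies, assuming the same assemblyMergeStrategy in assembly key style as above. The reference.conf and META-INF/DEPENDENCIES cases are illustrative assumptions, not taken from the asker's build; adjust the patterns to the paths listed in your own error message.

```scala
// build.sbt (sketch, not a definitive configuration)
assemblyMergeStrategy in assembly := {
  // Take one of the files: keep the copy that comes last on the classpath.
  case PathList("org", "springframework", xs @ _*) => MergeStrategy.last
  // Concatenate: join every copy into a single file (illustrative case).
  case "reference.conf" => MergeStrategy.concat
  // Remove completely: drop the file from the assembled jar (illustrative case).
  case PathList("META-INF", "DEPENDENCIES") => MergeStrategy.discard
  // Everything else: fall back to the previously configured strategy.
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```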

I am using JDK 8 (also tried 11), sbt 1.4.7, Scala 2.13.3, and the following two plugins in my rootprojectdir/project/plugins.sbt:
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.8.0")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")
assemblyMergeStrategy in assembly := {
case PathList(ps @ _*) if ps.contains("module-info.class") => MergeStrategy.discard
case x => MergeStrategy.defaultMergeStrategy(x)
}
I have also added the above strategy to my build.sbt, but I am still getting the same error. The sbt-assembly plugin ticket is still open on GitHub.
Any workaround would be much appreciated.

Related

sbt-assembly is not including scala libraries

The code I'm writing will be run in AWS Lambda which only has the Java 8 runtime installed so I need the scala libraries to be included in my jar. When I give it the jar I built with sbt-assembly I'm getting java.lang.NoClassDefFoundError: scala/Function3.
This is all I have in build.sbt for the assembly plugin:
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = true)
assemblyMergeStrategy in assembly := {
case PathList("META-INF", _ @ _*) => MergeStrategy.discard
case _ => MergeStrategy.first
}
This still happens whether or not I have the line that discards the META-INF files.
I used brew to install scala and I have tried setting my $SCALA_HOME to /usr/local/opt/scala/idea (caveats section of brew info scala) and /usr/local/bin/scala (output of which scala)
:: EDIT ::
I unpacked the jar and found that the class in question was actually included in the jar here: scala/Function3_scala-library-2.12.7_scala-library-2.12.7_scala-library-2.12.7.class
I added scala-library.jar to my classpath in Intellij, removed the target folder and ran sbt assembly and that worked.

Rename assembly-generated uberjar in SBT

How to rename and move an uberjar generated with SBT assembly plugin?
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
My assemblyMergeStrategy (for META-INF removal):
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
It generates something like :
target/scala-2.12/my-project-assembly-0.1.jar
which I would like to be able to rename automatically (and generate in another directory) with a consistent name, without needing a separate script.
You can find a bit of documentation on the project's page. It lists the keys you can override for the assembly task.
The ones you are searching for are assemblyJarName and assemblyOutputPath. Then, your project build should look something like:
lazy val myProject = (project in file(".")).
settings(
...
assemblyJarName in assembly := "myName.jar",
assemblyOutputPath in assembly := "...",
...
)
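A minimal sketch of the two keys with concrete values filled in. The "dist" folder is a hypothetical choice for illustration, and the sketch assumes the same sbt-assembly 0.14.x key style as the question.

```scala
// build.sbt (sketch): "myName.jar" comes from the answer above; "dist" is a hypothetical folder.
lazy val myProject = (project in file("."))
  .settings(
    assemblyJarName in assembly := "myName.jar",
    // assemblyOutputPath is the full path of the jar file, not just a directory.
    assemblyOutputPath in assembly :=
      baseDirectory.value / "dist" / (assemblyJarName in assembly).value
  )
```

Setting assemblyOutputPath alone is enough to control both the location and the file name; reusing assemblyJarName here only keeps the name defined in one place.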

Proper way to make a Spark Fat Jar using SBT

I need a Fat Jar with Spark because I'm creating a custom node for Knime. Basically it's a self-contained jar executed inside Knime and I assume a Fat Jar is the only way to spawn a local Spark Job. Eventually we will go on submitting a job to a remote cluster but for now I need it to spawn this way.
That said, I made a Fat Jar using this: https://github.com/sbt/sbt-assembly
I made an empty sbt project, included spark-core in the dependencies and assembled the Jar. I added it to the manifest of my custom Knime node and tried to spawn a simple job (parallelize a collection, collect it and print it). It starts but I get this error:
No configuration setting found for key 'akka.version'
I have no idea how to solve it.
Edit: this is my build.sbt
name := "SparkFatJar"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.3.0"
)
libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.3.8"
assemblyJarName in assembly := "SparkFatJar.jar"
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
I found this merge strategy for Spark somewhere on the internet but I can't find the source right now.
I think the issue is with how you've set up assemblyMergeStrategy. Try this:
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case "application.conf" => MergeStrategy.concat
case "reference.conf" => MergeStrategy.concat
case x =>
val baseStrategy = (assemblyMergeStrategy in assembly).value
baseStrategy(x)
}
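To see why concat matters here, the toy simulation below contrasts "keep the first file" with "concatenate all files". It is plain Scala and does not use the sbt-assembly API; the two config strings are hypothetical stand-ins for reference.conf files shipped in different jars.

```scala
object MergeDemo {
  // Hypothetical stand-ins for reference.conf contents from two different jars.
  val otherConf = "some.library.setting = true"
  val akkaConf  = "akka.version = \"2.3.8\""
  val files: Seq[String] = Seq(otherConf, akkaConf)

  // Roughly what a "first" strategy does: keep one copy, drop the rest.
  def first(contents: Seq[String]): String = contents.head

  // Roughly what a "concat" strategy does: join every copy into one file.
  def concat(contents: Seq[String]): String = contents.mkString("\n")

  def main(args: Array[String]): Unit = {
    println(first(files).contains("akka.version"))  // false: the akka key is lost
    println(concat(files).contains("akka.version")) // true: the akka key survives
  }
}
```

With the first strategy, whichever reference.conf happens to come first wins, so a key such as akka.version can disappear from the fat JAR.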

How to build an Uber JAR (Fat JAR) using SBT within IntelliJ IDEA?

I'm using SBT (within IntelliJ IDEA) to build a simple Scala project.
I would like to know what is the simplest way to build an Uber JAR file (aka Fat JAR, Super JAR).
I'm currently using SBT but when I'm submitting my JAR file to Apache Spark I get the following error:
Exception in thread "main" java.lang.SecurityException: Invalid
signature file digest for Manifest main attributes
Or this error during compilation time:
java.lang.RuntimeException: deduplicate: different file contents found
in the following:
PATH\DEPENDENCY.jar:META-INF/DEPENDENCIES
PATH\DEPENDENCY.jar:META-INF/MANIFEST.MF
It looks like this is because some of my dependencies include signature files (META-INF) which need to be removed in the final Uber JAR file.
I tried to use the sbt-assembly plugin like that:
/project/assembly.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")
/project/plugins.sbt
logLevel := Level.Warn
/build.sbt
lazy val commonSettings = Seq(
name := "Spark-Test",
version := "1.0",
scalaVersion := "2.11.4"
)
lazy val app = (project in file("app")).
settings(commonSettings: _*).
settings(
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.2.0",
"org.apache.spark" %% "spark-streaming" % "1.2.0",
"org.apache.spark" % "spark-streaming-twitter_2.10" % "1.2.0"
)
)
When I click "Build Artifact..." in IntelliJ IDEA I get a JAR file. But I end up with the same error...
I'm new to SBT and not very experienced with IntelliJ IDEA.
Thanks.
In the end I skipped IntelliJ IDEA entirely to avoid adding noise to my overall understanding :)
I started reading the official SBT tutorial.
I created my project with the following file structure :
my-project/project/assembly.sbt
my-project/src/main/scala/myPackage/MyMainObject.scala
my-project/build.sbt
I added the sbt-assembly plugin in my assembly.sbt file, which allows me to build a fat JAR:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")
My minimal build.sbt looks like :
lazy val root = (project in file(".")).
settings(
name := "my-project",
version := "1.0",
scalaVersion := "2.11.4",
mainClass in Compile := Some("myPackage.MyMainObject")
)
val sparkVersion = "1.2.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming-twitter" % sparkVersion
)
// META-INF discarding
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
{
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
}
Note: The % "provided" means not to include the dependency in the final fat JAR (those libraries are already included in my workers)
Note: META-INF discarding inspired by this answer.
Note: Meaning of % and %%
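For the notes above, a short sketch of the difference between % and %%, assuming scalaVersion is 2.11.x as in this build: %% appends the Scala binary version suffix to the artifact name, while % uses the name exactly as written.

```scala
// With scalaVersion := "2.11.4", these two lines resolve the same artifact.
libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.2.0"
libraryDependencies += "org.apache.spark" %  "spark-streaming-twitter_2.11" % "1.2.0"

// "provided" keeps the dependency on the compile classpath
// but leaves it out of the jar produced by sbt assembly.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" % "provided"
```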
Now I can build my fat JAR using SBT (how to install it) by running the following command in my /my-project root folder:
sbt assembly
My fat JAR is now located in the newly generated /target folder:
/my-project/target/scala-2.11/my-project-assembly-1.0.jar
For those who want to embed SBT within IntelliJ IDEA: How to run sbt-assembly tasks from within IntelliJ IDEA?
3 Step Process For Building Uber JAR/Fat JAR in IntelliJ Idea:
Uber JAR/Fat JAR: a JAR file containing all of its external library dependencies.
Adding SBT Assembly plugin in IntelliJ Idea
Go to the ProjectName/project/plugins.sbt file and add this line: addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")
Adding Merge,Discard and Do Not Add strategy in build.sbt
Go to ProjectName/build.sbt file and add the Strategy for Packaging of an Uber JAR
Merge Strategy: if two packages conflict over the version of a library, decides which one to pack into the Uber JAR.
Discard Strategy: removes files from a library that you do not want packaged in the Uber JAR.
Do Not Add Strategy: keeps a package out of the Uber JAR. For example, spark-core will already be present on your Spark cluster, so we should not package it in the Uber JAR.
Merge Strategy and Discard Strategy Basic Code :
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
So you are discarding the META-INF files with MergeStrategy.discard, and for the rest of the files you take the first occurrence of a file when there is a conflict, using MergeStrategy.first.
Do not Add Strategy Basic Code :
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1" % "provided"
Since spark-core will already be on our cluster, we do not want to add it to our Uber JAR file, so we add % "provided" at the end of its library dependency.
Building Uber JAR with all its dependencies
In terminal type sbt assembly for building up the package
Voila!!! Uber JAR is built. JAR will be in ProjectName/target/scala-XX
Add the following line to your project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")
Add the following to your build.sbt
mainClass in assembly := some("package.MainClass")
assemblyJarName := "desired_jar_name_after_assembly.jar"
val meta = """META.INF(.)*""".r
assemblyMergeStrategy in assembly := {
case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
case n if n.startsWith("reference.conf") => MergeStrategy.concat
case n if n.endsWith(".conf") => MergeStrategy.concat
case meta(_) => MergeStrategy.discard
case x => MergeStrategy.first
}
The assembly merge strategy is used to resolve conflicts that occur when creating the fat JAR.
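The meta regex in the snippet above is used as a pattern-match extractor. The plain Scala sketch below (no sbt required) shows how that match behaves; the isMeta helper is a hypothetical name introduced only for this illustration.

```scala
object MetaRegexDemo {
  // Same regex as in the build above; the "." also matches the "-" in META-INF.
  val meta = """META.INF(.)*""".r

  // A Regex used in a case clause must match the whole string, not a substring.
  def isMeta(path: String): Boolean = path match {
    case meta(_) => true
    case _       => false
  }

  def main(args: Array[String]): Unit = {
    println(isMeta("META-INF/MANIFEST.MF")) // true
    println(isMeta("reference.conf"))       // false
  }
}
```

Because the whole path has to match, only entries that start with META-INF are discarded by that case.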

sbt: how to write a MergeStrategy to choose a specific library version

Here is the attempt to exclude the javax.servlet classes:
libraryDependencies ++= Seq(
..
("org.apache.spark" % "spark-core_2.10" % sparkVersion \
% "compile->default" withSources()).exclude("org.mortbay.jetty", "servlet-api"),
Here is my attempt at a MergeStrategy:
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
..
case PathList("javax", "servlet", xs @ _*) => MergeStrategy.singleOrError
..
Following shows that multiple copies of the javax.servlet classes are being loaded:
[error] (*:assembly) singleOrError: found multiple files for same
target path: [error]
C:\Users\s80035683.ivy2\cache\org.mortbay.jetty\servlet-api-2.5\jars\servlet-api-2.5-6.1.14.jar:javax/servlet/Filter.class
[error]
C:\Users\s80035683.ivy2\cache\javax.servlet\servlet-api\jars\servlet-api-2.5.jar:javax/servlet/Filter.class
[error]
C:\Users\s80035683.ivy2\cache\org.eclipse.jetty.orbit\javax.servlet\orbits\javax.servlet-3.0.0.v201112011016.jar:javax/servlet/Filter.class
[error]
C:\Users\s80035683.ivy2\cache\org.mortbay.jetty\servlet-api\jars\servlet-api-2.5-20110124.jar:javax/servlet/Filter.class
Note: this is not a DUPLICATE since similar questions do NOT address this issue. For example, the answer to the following question ("highest version selected by default") is NOT working:
How could SBT choose highest version amongst dependencies?
You can use sbt-dependency-graph and see from where the servlet-api is included as a transitive dependency.
You can do that by using what-depends-on <organization> <module> <revision> command.
Knowing that you can exclude the transitive dependency.
This is easier than excluding class from a specific jar (unless you don't care which jar, and you just want to take it from any jar).
You can also check my answer to another question on how to write a custom MergeStrategy, which would know from which jar the class was coming.
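A sketch of excluding the transitive dependencies, using the organization and module names that appear in the Ivy cache paths of the error output. Which copy to keep is an assumption here (the org.eclipse.jetty.orbit one is left in), and the excludes only apply to artifacts pulled in through spark-core, so confirm the actual origin with what-depends-on first.

```scala
// build.sbt (sketch): sparkVersion is the value already defined in the asker's build.
libraryDependencies += ("org.apache.spark" % "spark-core_2.10" % sparkVersion)
  .exclude("org.mortbay.jetty", "servlet-api")      // servlet-api-2.5-20110124.jar
  .exclude("org.mortbay.jetty", "servlet-api-2.5")  // servlet-api-2.5-6.1.14.jar
  .exclude("javax.servlet", "servlet-api")          // servlet-api-2.5.jar
```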