Deduplication in build.sbt - scala

I am building a combined Spark and Kafka fat jar, and conflicting copies of jline files on the dependency path need to be deduplicated. My attempt is the merge strategy at the end of this build.sbt file:
import sbt._
import sbt.Keys._
import java.io.File
import AssemblyKeys._
name := "kafkascala"
version := "0.1.0-SNAPSHOT"
scalaVersion := "2.10.4"
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.10" % "0.9.1",
"org.apache.spark" % "spark-examples_2.10" % "0.9.1",
"org.apache.spark" % "spark-tools_2.10" % "0.9.1",
"org.apache.kafka" % "kafka_2.10" % "0.8.1.1" intransitive()
withSources
)
resolvers += "Apache repo" at "https://repository.apache.org/content/repositories/releases"
assemblySettings
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
case PathList("maven","jline","jline","pom.properties") => MergeStrategy.discard
case x => old(x)
}
}
Here is the error output from running sbt assembly
java.lang.RuntimeException: deduplicate: different file contents found in the following:
C:\Users\S80035683\.ivy2\cache\jline\jline\jars\jline-0.9.94.jar:META-INF/maven/jline/jline/pom.properties
C:\Users\S80035683\.ivy2\cache\org.jruby\jruby-complete\jars\jruby-complete-1.6.5.jar:META-INF/maven/jline/jline/pom.properties
at sbtassembly.Plugin$Assembly$.sbtassembly$Plugin$Assembly$$applyStrategy$1(Plugin.scala:253)
at sbtassembly.Plugin$Assembly$$anonfun$15.apply(Plugin.scala:270)
at sbtassembly.Plugin$Assembly$$anonfun$15.apply(Plugin.scala:267)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at sbtassembly.Plugin$Assembly$.applyStrategies(Plugin.scala:272)
at sbtassembly.Plugin$Assembly$.x$4$lzycompute$1(Plugin.scala:172)
at sbtassembly.Plugin$Assembly$.x$4$1(Plugin.scala:170)
at sbtassembly.Plugin$Assembly$.stratMapping$lzycompute$1(Plugin.scala:170)
at sbtassembly.Plugin$Assembly$.stratMapping$1(Plugin.scala:170)
at sbtassembly.Plugin$Assembly$.inputs$lzycompute$1(Plugin.scala:214)
at sbtassembly.Plugin$Assembly$.inputs$1(Plugin.scala:204)
at sbtassembly.Plugin$Assembly$.apply(Plugin.scala:230)
at sbtassembly.Plugin$Assembly$$anonfun$assemblyTask$1.apply(Plugin.scala:373)
at sbtassembly.Plugin$Assembly$$anonfun$assemblyTask$1.apply(Plugin.scala:370)
at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:42)
at sbt.std.Transform$$anon$4.work(System.scala:64)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:237)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:237)
at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:18)
at sbt.Execute.work(Execute.scala:244)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:237)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:237)
at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:160)
at sbt.CompletionService$$anon$2.call(CompletionService.scala:30)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
[error] (*:assembly) deduplicate: different file contents found in the following:
[error] C:\Users\S80035683\.ivy2\cache\jline\jline\jars\jline-0.9.94.jar:META-INF/maven/jline/jline/pom.properties
[error] C:\Users\S80035683\.ivy2\cache\org.jruby\jruby-complete\jars\jruby-complete-1.6.5.jar:META-INF/maven/jline/jline/pom.properties
Here is the section of build.sbt that was attempting to address the issue:
assemblySettings
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
case PathList("maven","jline","jline","pom.properties") => MergeStrategy.discard
case x => old(x)
}
}

Your PathList is wrong. You're missing "META-INF" in front of the "maven". Something like this should work:
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
case PathList("META-INF", "maven","jline","jline", "pom.properties" ) => MergeStrategy.discard
case PathList("META-INF", "maven","jline","jline", "pom.xml" ) => MergeStrategy.discard
case x => old(x)
}
}
or a bit more concise (exclude both pom.properties and pom.xml in one line):
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
case PathList("META-INF", "maven","jline","jline", ps) if ps.startsWith("pom") => MergeStrategy.discard
case x => old(x)
}
}
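To see why the leading "META-INF" segment matters, here is a minimal standalone sketch in plain Scala. Segments is a hypothetical stand-in for sbt-assembly's PathList, assumed here to split an entry path into segments; it is not the plugin's actual implementation. A pattern with a fixed number of segments only matches paths with exactly that many segments, so the four-segment pattern from the question never matches the five-segment entry from the error message.

```scala
object PathSegmentsDemo {
  // Hypothetical stand-in for PathList: splits a jar entry path on '/' or '\'.
  object Segments {
    def unapplySeq(path: String): Option[Seq[String]] =
      Some(path.split("[/\\\\]").toSeq)
  }

  // Pattern from the question: no "META-INF" segment in front.
  def askersRule(path: String): String = path match {
    case Segments("maven", "jline", "jline", "pom.properties") => "discard"
    case _ => "default"
  }

  // Pattern from the answer: full path, any file name starting with "pom".
  def fixedRule(path: String): String = path match {
    case Segments("META-INF", "maven", "jline", "jline", name)
        if name.startsWith("pom") => "discard"
    case _ => "default"
  }

  def main(args: Array[String]): Unit = {
    val entry = "META-INF/maven/jline/jline/pom.properties"
    println(askersRule(entry)) // prints "default": the rule is skipped
    println(fixedRule(entry))  // prints "discard"
  }
}
```

In the real build the fall-through case is old(x), which ends up at the default deduplicate strategy and fails as shown in the question.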

I have reached the conclusion that the defaultMergeStrategy in sbt-assembly causes deduplicate errors unnecessarily. According to the sbt-assembly author, the build fails rather than selecting the first file because the user should have to decide explicitly. But most users don't know what the error means, and it takes them ages to figure out that changing deduplicate to first just makes things work. I think making the build fail was a mistake. The following is how the defaultMergeStrategy should look:
// Strat copied from defaultMergeStrategy with the
// "fail and confuse the hell out the user" lines changed to
// "just bloody work and stop pissing everyone off"
mergeStrategy in assembly <<= (mergeStrategy in assembly) ((old) => {
case x if Assembly.isConfigFile(x) =>
MergeStrategy.concat
case PathList(ps @ _*) if Assembly.isReadme(ps.last) || Assembly.isLicenseFile(ps.last) =>
MergeStrategy.rename
case PathList("META-INF", xs @ _*) =>
(xs map {_.toLowerCase}) match {
case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) =>
MergeStrategy.discard
case ps @ (x :: xs) if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
MergeStrategy.discard
case "plexus" :: xs =>
MergeStrategy.discard
case "services" :: xs =>
MergeStrategy.filterDistinctLines
case ("spring.schemas" :: Nil) | ("spring.handlers" :: Nil) =>
MergeStrategy.filterDistinctLines
case _ => MergeStrategy.first // Changed deduplicate to first
}
case PathList(_*) => MergeStrategy.first // added this line
})

Related

unable to merge aws classes in sbt

I'm having trouble assembling a scala project with sbt. I have a merge conflict involving aws dependency jars.
I looked at a bunch of posts and I don't understand why my merge strategy is not working.
This is my assembly error:
[error] (*:assembly) deduplicate: different file contents found in the following:
[error] /Users/xxxx/.ivy2/cache/com.amazon.redshift/redshift-jdbc42/jars/redshift-jdbc42-1.2.27.1051.jar:com/amazonaws/auth/AWSCredentials.class
[error] /Users/xxxx/.ivy2/cache/com.amazonaws/aws-java-sdk-core/jars/aws-java-sdk-core-1.11.339.jar:com/amazonaws/auth/AWSCredentials.class
[error] deduplicate: different file contents found in the following:
[error] /Users/xxxx/.ivy2/cache/com.amazon.redshift/redshift-jdbc42/jars/redshift-jdbc42-1.2.27.1051.jar:com/amazonaws/auth/AWSCredentialsProvider.class
[error] /Users/xxxx/.ivy2/cache/com.amazonaws/aws-java-sdk-core/jars/aws-java-sdk-core-1.11.339.jar:com/amazonaws/auth/AWSCredentialsProvider.class
[error] deduplicate: different file contents found in the following:
[error] /Users/xxxx/.ivy2/cache/com.amazon.redshift/redshift-jdbc42/jars/redshift-jdbc42-1.2.27.1051.jar:com/amazonaws/auth/AWSSessionCredentials.class
[error] /Users/xxxx/.ivy2/cache/com.amazonaws/aws-java-sdk-core/jars/aws-java-sdk-core-1.11.339.jar:com/amazonaws/auth/AWSSessionCredentials.class
[error] deduplicate: different file contents found in the following:
[error] /Users/xxxx/.ivy2/cache/com.amazon.redshift/redshift-jdbc42/jars/redshift-jdbc42-1.2.27.1051.jar:com/amazonaws/auth/AWSSessionCredentialsProvider.class
[error] /Users/xxxx/.ivy2/cache/com.amazonaws/aws-java-sdk-core/jars/aws-java-sdk-core-1.11.339.jar:com/amazonaws/auth/AWSSessionCredentialsProvider.class
[error] deduplicate: different file contents found in the following:
[error] /Users/xxxx/.ivy2/cache/com.amazon.redshift/redshift-jdbc42/jars/redshift-jdbc42-1.2.27.1051.jar:mozilla/public-suffix-list.txt
[error] /Users/xxxx/.ivy2/cache/org.apache.httpcomponents/httpclient/jars/httpclient-4.5.5.jar:mozilla/public-suffix-list.txt
[error] Total time: 2 s, completed Feb 20, 2020 9:14:32 AM
This is my build.sbt
name := "scala-redshift-connection"
version := "0.1"
scalaVersion := "2.11.12"
resolvers += "Mulesoft" at "https://repository.mulesoft.org/nexus/content/repositories/public/"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-secretsmanager" % "1.11.339"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.1.0" % Test
libraryDependencies += "com.amazon.redshift" % "redshift-jdbc42" % "1.2.27.1051"
assemblyMergeStrategy in assembly := {
{
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
}
I have also tried this:
assemblyMergeStrategy in assembly := {
{
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case PathList("com", "amazonaws", "auth", xs @ _*) => MergeStrategy.last
case x => MergeStrategy.first
}
}
I'm using sbt-assembly version 0.14.10
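You're picking the last copy of the com.amazonaws.auth classes, which still keeps them in the jar. It should work if you discard either the Redshift copy or the Amazon SDK copy. In addition, you have to pick only one of the public-suffix-list.txt files. Note that MergeStrategy.discard drops every matching file, so the rule below is only safe if nothing in the fat jar needs the com.amazonaws.auth classes at runtime.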
assemblyMergeStrategy in assembly := {
{
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case PathList("com", "amazonaws", "auth", xs @ _*) => MergeStrategy.discard
case PathList(xs @ _*) if xs.last == "public-suffix-list.txt" => MergeStrategy.first
case _ => MergeStrategy.first
}
}
In my case I used a different library which also had a dependency on Apache httpclient and only had to pick first for the public-suffix-list.txt.

Error with sbt-assembly and Play Framework

Trying to build a fat jar of a play (2.6.6) + scala.js application, getting
[error] (play/*:assembly) deduplicate: different file contents found in the following:
[error] /home/user/.ivy2/cache/com.typesafe.play/play_2.12/jars/play_2.12-2.6.6.jar:play/reference-overrides.conf
[error] /home/user/.ivy2/cache/com.typesafe.play/play-akka-http-server_2.12/jars/play-akka-http-server_2.12-2.6.6.jar:play/reference-overrides.conf
build.sbt
mainClass in assembly := Some("play.core.server.ProdServerStart")
//fullClasspath in assembly += Attributed.blank(PlayKeys.playPackageAssets.value)
(Inspired by https://www.playframework.com/documentation/2.6.6/Deploying#Using-the-SBT-assembly-plugin)
(but not using playPackageAssets at the moment)
my assembly.sbt contains just addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")
I also tried with a "non-standard" config:
assemblyMergeStrategy in assembly := {
// Building fat jar without META-INF
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
// Take last config file
case PathList(ps @ _*) if ps.last endsWith ".conf" => MergeStrategy.last
case o =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(o)
}
but no luck either. What is the correct way to fix this?
You need to tell sbt-assembly how to merge these two reference-overrides.conf config files:
assemblyMergeStrategy in assembly := {
// Building fat jar without META-INF
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
// Take last config file
case PathList(ps @ _*) if ps.last endsWith ".conf" => MergeStrategy.last
case PathList("reference-overrides.conf") => MergeStrategy.concat
case o =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(o)
}
I faced the same problem and solved it by adding the following to .
case PathList("play", "reference-overrides.conf") => MergeStrategy.concat
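Pattern-match cases are tried from top to bottom and the first match wins, so a rule for play/reference-overrides.conf presumably has to appear before the generic ".conf" rule to take effect. A sketch of how the pieces might be combined, using the key names from the question (sbt-assembly 0.14.x) and assuming the setting is applied to the project that actually runs assembly (the error is reported for play/*:assembly):

```scala
// build.sbt fragment: a sketch, not a verified Play configuration
assemblyMergeStrategy in assembly := {
  // Most specific rule first: concatenate Play's reference-overrides.conf files
  case PathList("play", "reference-overrides.conf") => MergeStrategy.concat
  // Building fat jar without META-INF
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // Take the last copy of any other .conf file
  case PathList(ps @ _*) if ps.last endsWith ".conf" => MergeStrategy.last
  // Everything else falls back to the previously defined strategy
  case o =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(o)
}
```

Concatenating keeps the settings from both files, whereas last would silently drop one of them.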

Dependency issue with Scalding and Hadoop with sbt-assembly

I'm trying to build a fat jar with sbt for a simple Hadoop job that I want to run on Amazon EMR. However, when I run sbt assembly I get the following error:
[error] (*:assembly) deduplicate: different file contents found in the following:
[error] /Users/trenthauck/.ivy2/cache/org.mortbay.jetty/jsp-2.1/jars/jsp-2.1-6.1.14.jar:org/apache/jasper/compiler/Node$ChildInfo.class
[error] /Users/trenthauck/.ivy2/cache/tomcat/jasper-compiler/jars/jasper-compiler-5.5.12.jar:org/apache/jasper/compiler/Node$ChildInfo.class
[error] Total time: 10 s, completed Sep 14, 2013 4:49:24 PM
I attempted to follow the suggestion here https://groups.google.com/forum/#!topic/simple-build-tool/tzkq5TioIqM however it didn't work.
My build.sbt looks like:
import AssemblyKeys._
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
{
case PathList("org", "apache", "jasper", xs @ _*) => MergeStrategy.last
case x => old(x)
}
}
assemblySettings
name := "Scaling Play"
version := "SNAPSHOT-0.1"
scalaVersion := "2.10.1"
libraryDependencies ++= Seq(
"com.twitter" % "scalding-core_2.10" % "0.8.8",
"com.twitter" % "scalding-args_2.10" % "0.8.8",
"com.twitter" % "scalding-date_2.10" % "0.8.8",
"org.apache.hadoop" % "hadoop-core" % "1.0.0"
)
The order of the directives is important. Your build updates the merge strategy and then overwrites it a line later with assemblySettings. Declaring assemblySettings first and then updating the merge strategy solves it.
The updated build.sbt:
import AssemblyKeys._
assemblySettings
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
{
case PathList("org", "apache", "jasper", xs @ _*) => MergeStrategy.last
case x => old(x)
}
}
…
After that you will discover that there are a lot more conflicting classes and other files. In this case you will require the following merges:
case PathList("org", "apache", xs @ _*) => MergeStrategy.last
case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
case PathList("project.clj") => MergeStrategy.last
case PathList("overview.html") => MergeStrategy.last
case x => old(x)
Note that applying merge strategies to class files can cause problems when the conflicting copies come from incompatible versions of that class. If that is the case, the problem is larger, because the dependencies themselves are incompatible with each other. You then have to remove the dependency and find or build a compatible version.
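As a rough illustration of removing a conflicting dependency instead of merging its classes, the sketch below uses sbt's exclude method on a module. The organization and name ("tomcat", "jasper-compiler") are taken from the Ivy cache path in the error message. That jasper-compiler arrives transitively through hadoop-core is an assumption that should be checked against the actual dependency tree before relying on this.

```scala
// build.sbt fragment: a sketch under the assumption stated above
libraryDependencies ++= Seq(
  "com.twitter" % "scalding-core_2.10" % "0.8.8",
  "com.twitter" % "scalding-args_2.10" % "0.8.8",
  "com.twitter" % "scalding-date_2.10" % "0.8.8",
  // Assumption: hadoop-core is the module that pulls in tomcat:jasper-compiler.
  // With it excluded, only the jsp-2.1 copy of org.apache.jasper remains.
  ("org.apache.hadoop" % "hadoop-core" % "1.0.0").exclude("tomcat", "jasper-compiler")
)
```

If the excluded jar turns out to contain classes the job needs at runtime, the exclusion has to be dropped again and a compatible pair of versions found instead.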

sbt-assembly and multiple class defs in dependencies

Being new to sbt and the sbt-assembly plugin I am confused about how one deals with builds involving different class definitions within dependencies I am trying to package.
[error] (*:assembly) deduplicate: different file contents found in the following:
[error] /Users/dm/.ivy2/cache/org.apache.tika/tika-app/jars/tika-app-1.3.jar:javax/xml/XMLConstants.class
[error] /Users/dm/.ivy2/cache/stax/stax-api/jars/stax-api-1.0.1.jar:javax/xml/XMLConstants.class
[error] /Users/dm/.ivy2/cache/xml-apis/xml-apis/jars/xml-apis-1.3.03.jar:javax/xml/XMLConstants.class
I've added:
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
{
case PathList("javax", "xml", xs @ _*) => MergeStrategy.first
}
}
to my build.sbt file, but I'm still getting the error above (regardless of whether or not it's in the build file). Any guidance would be greatly appreciated.
Thanks,
Don
I think you're close. Make sure that you add any rewiring after assemblySettings are loaded, and also pass any patterns you're not handling to the default strategy:
assemblySettings
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
{
case PathList("javax", "xml", xs @ _*) => MergeStrategy.first
case x => old(x)
}
}
Just an update—with current sbt (0.13.8) and sbt-assembly (0.13.0) versions, Eugene's code becomes:
assemblyMergeStrategy in assembly := {
case PathList("javax", "xml", xs @ _*) => MergeStrategy.first
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x)
}

assembly-merge-strategy issues using sbt-assembly

I am trying to convert a scala project into a deployable fat jar using sbt-assembly. When I run my assembly task in sbt I am getting the following error:
Merging 'org/apache/commons/logging/impl/SimpleLog.class' with strategy 'deduplicate'
:assembly: deduplicate: different file contents found in the following:
[error] /Users/home/.ivy2/cache/commons-logging/commons-logging/jars/commons-logging-1.1.1.jar:org/apache/commons/logging/impl/SimpleLog.class
[error] /Users/home/.ivy2/cache/org.slf4j/jcl-over-slf4j/jars/jcl-over-slf4j-1.6.4.jar:org/apache/commons/logging/impl/SimpleLog.class
Now from the sbt-assembly documentation:
If multiple files share the same relative path (e.g. a resource named
application.conf in multiple dependency JARs), the default strategy is
to verify that all candidates have the same contents and error out
otherwise. This behavior can be configured on a per-path basis using
either one of the following built-in strategies or writing a custom one:
MergeStrategy.deduplicate is the default described above
MergeStrategy.first picks the first of the matching files in classpath order
MergeStrategy.last picks the last one
MergeStrategy.singleOrError bails out with an error message on conflict
MergeStrategy.concat simply concatenates all matching files and includes the result
MergeStrategy.filterDistinctLines also concatenates, but leaves out duplicates along the way
MergeStrategy.rename renames the files originating from jar files
MergeStrategy.discard simply discards matching files
Going by this I setup my build.sbt as follows:
import sbt._
import Keys._
import sbtassembly.Plugin._
import AssemblyKeys._
name := "my-project"
version := "0.1"
scalaVersion := "2.9.2"
crossScalaVersions := Seq("2.9.1","2.9.2")
//assemblySettings
seq(assemblySettings: _*)
resolvers ++= Seq(
"Typesafe Releases Repository" at "http://repo.typesafe.com/typesafe/releases/",
"Typesafe Snapshots Repository" at "http://repo.typesafe.com/typesafe/snapshots/",
"Sonatype Repository" at "http://oss.sonatype.org/content/repositories/releases/"
)
libraryDependencies ++= Seq(
"org.scalatest" %% "scalatest" % "1.6.1" % "test",
"org.clapper" %% "grizzled-slf4j" % "0.6.10",
"org.scalaz" % "scalaz-core_2.9.2" % "7.0.0-M7",
"net.databinder.dispatch" %% "dispatch-core" % "0.9.5"
)
scalacOptions += "-deprecation"
mainClass in assembly := Some("com.my.main.class")
test in assembly := {}
mergeStrategy in assembly := mergeStrategy.first
In the last line of the build.sbt, I have:
mergeStrategy in assembly := mergeStrategy.first
Now, when I run SBT, I get the following error:
error: value first is not a member of sbt.SettingKey[String => sbtassembly.Plugin.MergeStrategy]
mergeStrategy in assembly := mergeStrategy.first
Can somebody point out what I might be doing wrong here?
Thanks
As of the current version, 0.11.2 (2014-03-25), the way to define the merge strategy is different.
This is documented here, the relevant part is:
NOTE:
mergeStrategy in assembly expects a function, you can't do
mergeStrategy in assembly := MergeStrategy.first
The new way is (copied from the same source):
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
{
case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
case "application.conf" => MergeStrategy.concat
case "unwanted.txt" => MergeStrategy.discard
case x => old(x)
}
}
This is possibly applicable to earlier versions as well, I don't know exactly when it has changed.
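If the intent really is "first" for every path, one way to satisfy the function type reported in the error message (String => MergeStrategy) is to wrap the strategy in a function that ignores its argument. This is only a sketch, assuming the 0.11.x key name mergeStrategy and the imports used in the question:

```scala
// build.sbt fragment: a sketch, assuming sbt-assembly 0.11.x key names
import sbtassembly.Plugin._
import AssemblyKeys._

assemblySettings

// The setting expects String => MergeStrategy, so return the same
// strategy for every path instead of assigning the strategy itself.
mergeStrategy in assembly := { (_: String) => MergeStrategy.first }
```

This bypasses all default rules, including the ones that discard entries such as META-INF/MANIFEST.MF, so the pattern-matching form that falls back to old(x) is probably the safer choice.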
I think it should be MergeStrategy.first with a capital M, so mergeStrategy in assembly := MergeStrategy.first.
This is the proper way to merge most common Java/Scala projects.
It takes care of META-INF and classes.
The service registration in META-INF is also taken care of.
assemblyMergeStrategy in assembly := {
case x if Assembly.isConfigFile(x) =>
MergeStrategy.concat
case PathList(ps @ _*) if Assembly.isReadme(ps.last) || Assembly.isLicenseFile(ps.last) =>
MergeStrategy.rename
case PathList("META-INF", xs @ _*) =>
(xs map {_.toLowerCase}) match {
case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) =>
MergeStrategy.discard
case ps @ (x :: xs) if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
MergeStrategy.discard
case "plexus" :: xs =>
MergeStrategy.discard
case "services" :: xs =>
MergeStrategy.filterDistinctLines
case ("spring.schemas" :: Nil) | ("spring.handlers" :: Nil) =>
MergeStrategy.filterDistinctLines
case _ => MergeStrategy.first
}
case _ => MergeStrategy.first}
I have just set up a small sbt project that needs to rewire some merge strategies and found the answer a little outdated. Here is my working code for these versions (as of 4-7-2015):
sbt 0.13.8
scala 2.11.6
assembly 0.13.0
mergeStrategy in assembly := {
case x if x.startsWith("META-INF") => MergeStrategy.discard // Bumf
case x if x.endsWith(".html") => MergeStrategy.discard // More bumf
case x if x.contains("slf4j-api") => MergeStrategy.last
case x if x.contains("org/cyberneko/html") => MergeStrategy.first
case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last // For Log$Logger.class
case x =>
val oldStrategy = (mergeStrategy in assembly).value
oldStrategy(x)
}
With a newer sbt version (0.13.11), I was getting the error for slf4j and, for the time being, took the easy way out shown below. Please also check the answer to "Scala SBT Assembly cannot merge due to de-duplication error in StaticLoggerBinder.class", which mentions the sbt-dependency-graph tool, a handy way to track down the conflict manually.
assemblyMergeStrategy in assembly <<= (assemblyMergeStrategy in assembly) {
(old) => {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
}
Quick update: mergeStrategy is deprecated. Use assemblyMergeStrategy. Apart from that, earlier responses are still solid
Add the following to build.sbt to use Kafka as a source or destination:
assemblyMergeStrategy in assembly := {
// To add Kafka as a source; this case must come before the generic META-INF rule, or it is never reached
case "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister" =>
MergeStrategy.concat
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}