I get many "Deduplicate found..." errors when building the project with SBT:
[error] Deduplicate found different file contents in the following:
[error] Jar name = netty-all-4.1.68.Final.jar, jar org = io.netty, entry target = io/netty/handler/ssl/SslProvider.class
[error] Jar name = netty-handler-4.1.50.Final.jar, jar org = io.netty, entry target = io/netty/handler/ssl/SslProvider.class
...
For now I am considering the option of shading all the libraries (as shown here):
libraryDependencies ++= Seq(
"com.rometools" % "rome" % "1.18.0",
"com.typesafe.scala-logging" %% "scala-logging" % "3.9.5", // log
"ch.qos.logback" % "logback-classic" % "1.4.5", // log
"com.lihaoyi" %% "upickle" % "1.6.0", // file-io
"net.liftweb" %% "lift-json" % "3.5.0", // json
"org.apache.spark" %% "spark-sql" % "3.2.2", // spark
"org.apache.spark" %% "spark-core" % "3.2.2" % "provided", // spark
"org.postgresql" % "postgresql" % "42.5.1", // spark + postgresql
)
So I added the following shade rules:
assemblyShadeRules in assembly := Seq(
ShadeRule.rename("com.lihaoyi.**" -> "crdaa.#1")
.inLibrary("com.lihaoyi" %% "upickle" % "1.6.0")
.inProject,
ShadeRule.rename("ch.qos.logback.**" -> "crdbb.#1")
.inLibrary("ch.qos.logback" % "logback-classic" % "1.4.5")
.inProject,
ShadeRule.rename("com.typesafe.**" -> "crdcc.#1")
.inLibrary("com.typesafe.scala-logging" %% "scala-logging" % "3.9.5")
.inProject,
ShadeRule.rename("org.apache.spark.spark-sql.**" -> "crddd.#1")
.inLibrary("org.apache.spark" %% "spark-sql" % "3.2.2")
.inProject,
ShadeRule.rename("org.apache.spark.spark-core.**" -> "crdee.#1")
.inLibrary("org.apache.spark" %% "spark-core" % "3.2.2")
.inProject,
ShadeRule.rename("com.rometools.**" -> "crdff.#1")
.inLibrary("com.rometools" % "rome" % "1.18.0")
.inProject,
ShadeRule.rename("org.postgresql.postgresql.**" -> "crdgg.#1")
.inLibrary("org.postgresql" % "postgresql" % "42.5.1")
.inProject,
ShadeRule.rename("net.liftweb.**" -> "crdhh.#1")
.inLibrary("net.liftweb" %% "lift-json" % "3.5.0")
.inProject,
)
But after reloading SBT, when I run assembly I get the same duplicate errors.
What can be the problem here?
PS:
ThisBuild / scalaVersion := "2.13.10"
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.0")
Update
Finally I ditched the renaming in favor of unmanagedJars plus not including the Spark dependencies (most of the errors were caused by them) by marking them as provided.
After that only the Deduplicate errors for module-info.class remained, but their solution (changing the merge strategy) is described in the sbt-assembly docs.
That is, I downloaded Spark separately, copied its jars into the ./jarlib directory (!!! not into the ./lib directory), and changed the following in the build config:
libraryDependencies ++= Seq(
//...
"org.apache.spark" %% "spark-sql" % "3.2.3" % "provided",
"org.apache.spark" %% "spark-core" % "3.2.3" % "provided",
)
unmanagedJars in Compile += file("./jarlib")
ThisBuild / assemblyMergeStrategy := {
case PathList("module-info.class") => MergeStrategy.discard
case x if x.endsWith("/module-info.class") => MergeStrategy.discard
case x =>
val oldStrategy = (ThisBuild / assemblyMergeStrategy).value
oldStrategy(x)
}
The Spark jars have been included in the final jar.
Update 2
As noted in the comments, unmanagedJars is useless in this case, so I removed the unmanagedJars line from build.sbt.
Note that the Spark jars, which aren't included in the final jar, must be on the classpath when you start the jar.
In my case I copied the Spark jars plus the final jar into the ./app folder and start the jar with:
java -cp "./app/*" main.Main
...where main.Main is the main class.
Something like this (put it in your build.sbt) is how you typically resolve the deduplication errors that arise when your libraries pull in overlapping transitive dependencies of their own:
assemblyMergeStrategy in assembly := {
case PathList("javax", "activation", _*) => MergeStrategy.first
case PathList("com", "sun", _*) => MergeStrategy.first
case "META-INF/io.netty.versions.properties" => MergeStrategy.first
case "META-INF/mime.types" => MergeStrategy.first
case "META-INF/mailcap.default" => MergeStrategy.first
case "META-INF/mimetypes.default" => MergeStrategy.first
case d if d.endsWith(".jar:module-info.class") => MergeStrategy.first
case d if d.endsWith("module-info.class") => MergeStrategy.first
case d if d.endsWith("/MatchersBinder.class") => MergeStrategy.discard
case d if d.endsWith("/ArgumentsProcessor.class") => MergeStrategy.discard
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x)
}
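The general idea behind this pattern: handle known-benign duplicates explicitly (first, discard, concat) and let everything else fall through to the old strategy, so genuinely conflicting classes still fail loudly instead of being merged silently.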
Related
When I run sbt assembly, it prints an error like this:
[error] (*:assembly) scala.MatchError: org\apache\commons\io\IOCase.class (of class java.lang.String)
and these are my configurations:
1. assembly.sbt:
import AssemblyKeys._
assemblySettings
mergeStrategy in assembly := {
case PathList("org", "springframework", xs#_*) => MergeStrategy.last
}
2. build.sbt:
import AssemblyKeys._
lazy val root = (project in file(".")).
settings(
name := "DmpRealtimeFlow",
version := "1.0",
scalaVersion := "2.11.8",
libraryDependencies += "com.jd.ads.index" % "ad_index_dmp_common" % "0.0.4-SNAPSHOT",
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0" % "provided",
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.0" % "provided",
libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.1.0" % "provided",
libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.8",
libraryDependencies += "org.springframework" % "spring-beans" % "3.1.0.RELEASE",
libraryDependencies += "org.springframework" % "spring-context" % "3.1.0.RELEASE",
libraryDependencies += "org.springframework" % "spring-core" % "3.1.0.RELEASE",
libraryDependencies += "org.springframework" % "spring-orm" % "3.1.0.RELEASE",
libraryDependencies += "org.mybatis" % "mybatis" % "3.2.1" % "compile",
libraryDependencies += "org.mybatis" % "mybatis-spring" % "1.2.2",
libraryDependencies += "c3p0" % "c3p0" % "0.9.1.2"
)
3. Project tools:
sbt:0.13.5
assembly:0.11.2
java:1.7
scala:2.11.8
Any help?
The problem may be the missing default case in the mergeStrategy in assembly block:
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x)
Also, mergeStrategy is deprecated and assemblyMergeStrategy should be used instead.
Basically, the
{
case PathList("org", "springframework", xs @ _*) => MergeStrategy.last
}
is a partial function String => MergeStrategy defined for only one kind of input, i.e. for classes with the package prefix "org\springframework". However, it is applied to all class files in the project, and the first one that doesn't match that prefix (org\apache\commons\io\IOCase.class) causes a MatchError.
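Put together, a corrected block might look like this (a sketch that combines the rule from the question with a default case, using the non-deprecated key):
assemblyMergeStrategy in assembly := {
  // Rule from the question: keep the last copy of duplicated Spring classes.
  case PathList("org", "springframework", xs @ _*) => MergeStrategy.last
  // Default case: fall back to the previous strategy so every other
  // class file (e.g. org/apache/commons/io/IOCase.class) still matches.
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}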
I work on an sbt-managed Spark project with a spark-cloudant dependency. The code is available on GitHub (on the spark-cloudant-compile-issue branch).
I've added the following line to build.sbt:
"cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"
And so build.sbt looks as follows:
name := "Movie Rating"
version := "1.0"
scalaVersion := "2.10.5"
libraryDependencies ++= {
val sparkVersion = "1.6.0"
Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming-kafka" % sparkVersion % "provided",
"org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
"org.apache.kafka" % "kafka-log4j-appender" % "0.9.0.0",
"org.apache.kafka" % "kafka-clients" % "0.9.0.0",
"org.apache.kafka" %% "kafka" % "0.9.0.0",
"cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"
)
}
assemblyMergeStrategy in assembly := {
case PathList("org", "apache", "spark", xs # _*) => MergeStrategy.first
case PathList("scala", xs # _*) => MergeStrategy.discard
case PathList("META-INF", "maven", "org.slf4j", xs # _* ) => MergeStrategy.first
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x)
}
unmanagedBase <<= baseDirectory { base => base / "lib" }
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
When I execute sbt assembly I get the following error:
java.lang.RuntimeException: Please add any Spark dependencies by
supplying the sparkVersion and sparkComponents. Please remove:
org.apache.spark:spark-core:1.6.0:provided
Probably related: https://github.com/databricks/spark-csv/issues/150
Can you try adding spIgnoreProvided := true to your build.sbt?
(This might not be the answer and I could have just posted a comment but I don't have enough reputation)
NOTE I still can't reproduce the issue, but I think it does not really matter.
java.lang.RuntimeException: Please add any Spark dependencies by supplying the sparkVersion and sparkComponents.
In your case, your build.sbt is missing an sbt resolver to find the spark-cloudant dependency. You should add the following line to build.sbt:
resolvers += "spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
PROTIP I strongly recommend using spark-shell first, and switching to sbt only once you're comfortable with the package (esp. if you're new to sbt and perhaps other libraries/dependencies too). It's too much to digest in one bite. Follow https://spark-packages.org/package/cloudant-labs/spark-cloudant.
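For example, following that page, the package can be tried out in the shell first (a sketch assuming Spark's --packages flag, with the coordinates from the spark-packages page):
spark-shell --packages cloudant-labs:spark-cloudant:1.6.4-s_2.10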
I have the following SBT files:
.
-- root
-- plugins.sbt
-- build.sbt
With plugins.sbt containing the following:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")
And build.sbt containing the following:
import sbt.Keys._
resolvers in ThisBuild ++= Seq("Apache Development Snapshot Repository" at "https://repository.apache.org/content/repositories/snapshots/", Resolver.sonatypeRepo("public"))
name := "flink-experiment"
lazy val commonSettings = Seq(
organization := "my.organisation",
version := "0.1.0-SNAPSHOT"
)
val flinkVersion = "1.1.0"
val sparkVersion = "2.0.0"
val kafkaVersion = "0.8.2.1"
val hadoopDependencies = Seq(
"org.apache.avro" % "avro" % "1.7.7" % "provided",
"org.apache.avro" % "avro-mapred" % "1.7.7" % "provided"
)
val flinkDependencies = Seq(
"org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
"org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
"org.apache.flink" %% "flink-connector-kafka-0.8" % flinkVersion exclude("org.apache.kafka", "kafka_${scala.binary.version}")
)
val sparkDependencies = Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming-kafka-0-8" % sparkVersion exclude("org.apache.kafka", "kafka_${scala.binary.version}")
)
val kafkaDependencies = Seq(
"org.apache.kafka" %% "kafka" % "0.8.2.1"
)
val toolDependencies = Seq(
"com.github.scopt" %% "scopt" % "3.5.0"
)
val testDependencies = Seq(
"org.scalactic" %% "scalactic" % "2.2.6",
"org.scalatest" %% "scalatest" % "2.2.6" % "test"
)
lazy val root = (project in file(".")).
settings(commonSettings: _*).
settings(
libraryDependencies ++= hadoopDependencies,
libraryDependencies ++= flinkDependencies,
libraryDependencies ++= sparkDependencies,
libraryDependencies ++= kafkaDependencies,
libraryDependencies ++= toolDependencies,
libraryDependencies ++= testDependencies
).
enablePlugins(AssemblyPlugin)
run in Compile <<= Defaults.runTask(fullClasspath in Compile, mainClass in(Compile, run), runner in(Compile, run))
mainClass in assembly := Some("my.organization.experiment.Experiment")
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
Now sbt clean assembly sadly gives the following exception:
[error] (root/*:assembly) deduplicate: different file contents found in the following:
[error] /home/kevin/.ivy2/cache/org.apache.spark/spark-streaming-kafka-0-8_2.10/jars/spark-streaming-kafka-0-8_2.10-2.0.0.jar:org/apache/spark/unused/UnusedStubClass.class
[error] /home/kevin/.ivy2/cache/org.apache.spark/spark-tags_2.10/jars/spark-tags_2.10-2.0.0.jar:org/apache/spark/unused/UnusedStubClass.class
[error] /home/kevin/.ivy2/cache/org.spark-project.spark/unused/jars/unused-1.0.0.jar:org/apache/spark/unused/UnusedStubClass.class
How can I fix this?
https://github.com/sbt/sbt-assembly#excluding-jars-and-files
You can define an assemblyMergeStrategy and probably discard any of the files you listed, as they are all in the 'unused' package.
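A minimal sketch of that idea (assuming it is safe to drop the stub entirely, since the class is a placeholder by design):
assemblyMergeStrategy in assembly := {
  // All three jars listed in the error ship the same UnusedStubClass;
  // discard every copy rather than picking one.
  case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}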
You can override the default strategy for conflicts:
val defaultMergeStrategy: String => MergeStrategy = {
case x if Assembly.isConfigFile(x) =>
MergeStrategy.concat
case PathList(ps @ _*) if Assembly.isReadme(ps.last) || Assembly.isLicenseFile(ps.last) =>
MergeStrategy.rename
case PathList("META-INF", xs # _*) =>
(xs map {_.toLowerCase}) match {
case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) =>
MergeStrategy.discard
case ps @ (x :: xs) if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
MergeStrategy.discard
case "plexus" :: xs =>
MergeStrategy.discard
case "services" :: xs =>
MergeStrategy.filterDistinctLines
case ("spring.schemas" :: Nil) | ("spring.handlers" :: Nil) =>
MergeStrategy.filterDistinctLines
case _ => MergeStrategy.deduplicate
}
case _ => MergeStrategy.deduplicate
}
As you can see, assembly's default strategy is MergeStrategy.deduplicate; you can add a new case for UnusedStubClass that maps to MergeStrategy.first.
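For instance (a sketch; the path comes from the error output above), adding a case like this before the final deduplicate fallbacks would keep the first copy of the stub:
case PathList("org", "apache", "spark", "unused", xs @ _*) => MergeStrategy.first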
I'm new to sbt/assembly. I'm trying to resolve some dependency problems, and it seems the only way to do it is through a custom merge strategy. However, whenever I try to add a merge strategy I get a seemingly random MatchError on compiling:
[error] (*:assembly) scala.MatchError: org/apache/spark/streaming/kafka/KafkaUtilsPythonHelper$$anonfun$13.class (of class java.lang.String)
I'm showing this match error for the kafka library, but if I take out that library altogether, I get a MatchError on another library. If I take out all the libraries, I get a MatchError on my own code. None of this happens if I remove the assemblyMergeStrategy block. I'm clearly missing something incredibly basic, but for the life of me I can't find it, and I can't find anyone else who has this problem. I've tried the older mergeStrategy syntax, but as far as I can tell from the docs and SO, this is the proper way to write it now. Please help?
Here is my project/assembly.sbt:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
And my project.sbt file:
name := "Clerk"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
"org.apache.spark" %% "spark-sql" % "1.6.1" % "provided",
"org.apache.spark" %% "spark-streaming" % "1.6.1" % "provided",
"org.apache.kafka" %% "kafka" % "0.8.2.1",
"ch.qos.logback" % "logback-classic" % "1.1.7",
"net.logstash.logback" % "logstash-logback-encoder" % "4.6",
"com.typesafe.scala-logging" %% "scala-logging" % "3.1.0",
"org.apache.spark" %% "spark-streaming-kafka" % "1.6.1",
("org.apache.spark" %% "spark-streaming-kafka" % "1.6.1").
exclude("org.spark-project.spark", "unused")
)
assemblyMergeStrategy in assembly := {
case PathList("org.slf4j", "impl", xs # _*) => MergeStrategy.first
}
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
You're missing a default case for your merge strategy pattern match:
assemblyMergeStrategy in assembly := {
case PathList("org.slf4j", "impl", xs # _*) => MergeStrategy.first
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x)
}
I am trying to create a jar file for my project, using the sbt assembly command to generate one.
But I get an error when it starts merging files:
scala.MatchError:
spray\http\parser\ProtocolParameterRules$$anonfun$DeltaSeconds$1.class
(of class java.lang.String)
My build.sbt looks like this:
lazy val commonSettings = Seq(
name := "SampleSpray",
version := "1.0",
scalaVersion := "2.11.7",
organization := "com.test"
)
mainClass in assembly := Some("com.example.Boot")
lazy val root = (project in file(".")).
settings(commonSettings: _*).
settings(
name := "test",
resolvers += "spray repo" at "http://repo.spray.io",
libraryDependencies ++= {
val akkaV = "2.3.9"
val sprayV = "1.3.3"
Seq(
"io.spray" %% "spray-can" % sprayV,
"io.spray" %% "spray-routing" % sprayV,
"io.spray" %% "spray-json" % "1.3.2",
"io.spray" %% "spray-testkit" % sprayV % "test",
"com.typesafe.akka" %% "akka-actor" % akkaV,
"com.typesafe.akka" %% "akka-testkit" % akkaV % "test",
"org.specs2" %% "specs2-core" % "2.3.11" % "test",
"com.sksamuel.elastic4s" %% "elastic4s-core" % "2.1.0",
"com.sksamuel.elastic4s" %% "elastic4s-jackson" % "2.1.0",
"net.liftweb" %% "lift-json" % "2.6+"
)
}
)
assemblyOption in assembly := (assemblyOption in assembly).value.copy(cacheUnzip = false)
assemblyMergeStrategy in assembly := {
case "BaseDateTime.class" => MergeStrategy.first
}
I don't know why this error occurs.
The setting assemblyMergeStrategy in assembly has the type String => MergeStrategy.
In your sbt file you are using the partial function
{
case "BaseDateTime.class" => MergeStrategy.first
}
which is syntactic sugar for
(s:String) => {
s match {
case "BaseDateTime.class" => MergeStrategy.first
}
}
This representation shows that the given function does not exhaustively match all passed strings. In your case sbt-assembly tried to merge the file spray\http\parser\ProtocolParameterRules$$anonfun$DeltaSeconds$1.class into the fat jar, but could not find any matching merge strategy. You also need a "default" case:
(s:String) => {
s match {
case "BaseDateTime.class" => MergeStrategy.first
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x)
}
}
Or, written as a partial function:
{
case "BaseDateTime.class" => MergeStrategy.first
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x)
}
I also ran into the same issue when sbt-assembly's assembly task failed to create a fat jar due to a name conflict between elasticsearch and its transitive joda-time dependency. Elasticsearch redefines the class org.joda.time.base.BaseDateTime, which is already implemented in the joda-time library. I followed your approach to tell sbt-assembly how to resolve this conflict, using the following assemblyMergeStrategy:
assemblyMergeStrategy in assembly := {
case "org/joda/time/base/BaseDateTime.class" => MergeStrategy.first
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x)
}