I have the following SBT files:
.
└── root
    ├── plugins.sbt
    └── build.sbt
With plugins.sbt containing the following:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")
And build.sbt containing the following:
import sbt.Keys._
resolvers in ThisBuild ++= Seq("Apache Development Snapshot Repository" at "https://repository.apache.org/content/repositories/snapshots/", Resolver.sonatypeRepo("public"))
name := "flink-experiment"
lazy val commonSettings = Seq(
  organization := "my.organisation",
  version := "0.1.0-SNAPSHOT"
)

val flinkVersion = "1.1.0"
val sparkVersion = "2.0.0"
val kafkaVersion = "0.8.2.1"

val hadoopDependencies = Seq(
  "org.apache.avro" % "avro" % "1.7.7" % "provided",
  "org.apache.avro" % "avro-mapred" % "1.7.7" % "provided"
)

val flinkDependencies = Seq(
  "org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-connector-kafka-0.8" % flinkVersion exclude("org.apache.kafka", "kafka_${scala.binary.version}")
)

val sparkDependencies = Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming-kafka-0-8" % sparkVersion exclude("org.apache.kafka", "kafka_${scala.binary.version}")
)

val kafkaDependencies = Seq(
  "org.apache.kafka" %% "kafka" % kafkaVersion
)

val toolDependencies = Seq(
  "com.github.scopt" %% "scopt" % "3.5.0"
)

val testDependencies = Seq(
  "org.scalactic" %% "scalactic" % "2.2.6",
  "org.scalatest" %% "scalatest" % "2.2.6" % "test"
)

lazy val root = (project in file(".")).
  settings(commonSettings: _*).
  settings(
    libraryDependencies ++= hadoopDependencies,
    libraryDependencies ++= flinkDependencies,
    libraryDependencies ++= sparkDependencies,
    libraryDependencies ++= kafkaDependencies,
    libraryDependencies ++= toolDependencies,
    libraryDependencies ++= testDependencies
  ).
  enablePlugins(AssemblyPlugin)
run in Compile <<= Defaults.runTask(fullClasspath in Compile, mainClass in(Compile, run), runner in(Compile, run))
mainClass in assembly := Some("my.organization.experiment.Experiment")
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
Now sbt clean assembly sadly gives the following exception:
[error] (root/*:assembly) deduplicate: different file contents found in the following:
[error] /home/kevin/.ivy2/cache/org.apache.spark/spark-streaming-kafka-0-8_2.10/jars/spark-streaming-kafka-0-8_2.10-2.0.0.jar:org/apache/spark/unused/UnusedStubClass.class
[error] /home/kevin/.ivy2/cache/org.apache.spark/spark-tags_2.10/jars/spark-tags_2.10-2.0.0.jar:org/apache/spark/unused/UnusedStubClass.class
[error] /home/kevin/.ivy2/cache/org.spark-project.spark/unused/jars/unused-1.0.0.jar:org/apache/spark/unused/UnusedStubClass.class
How can I fix this?
See the sbt-assembly docs on excluding JARs and files (https://github.com/sbt/sbt-assembly#excluding-jars-and-files): you can define an assemblyMergeStrategy and probably discard any file that you listed, since they are all in the 'unused' package.
You can override the default strategy for conflicts:
val defaultMergeStrategy: String => MergeStrategy = {
  case x if Assembly.isConfigFile(x) =>
    MergeStrategy.concat
  case PathList(ps @ _*) if Assembly.isReadme(ps.last) || Assembly.isLicenseFile(ps.last) =>
    MergeStrategy.rename
  case PathList("META-INF", xs @ _*) =>
    (xs map {_.toLowerCase}) match {
      case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) =>
        MergeStrategy.discard
      case ps @ (x :: xs) if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
        MergeStrategy.discard
      case "plexus" :: xs =>
        MergeStrategy.discard
      case "services" :: xs =>
        MergeStrategy.filterDistinctLines
      case ("spring.schemas" :: Nil) | ("spring.handlers" :: Nil) =>
        MergeStrategy.filterDistinctLines
      case _ => MergeStrategy.deduplicate
    }
  case _ => MergeStrategy.deduplicate
}
As you can see, assembly's default strategy is MergeStrategy.deduplicate; you can add a new case for UnusedStubClass that uses MergeStrategy.first.
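For example, a minimal sketch for build.sbt that targets exactly the conflict from the error above (the path in the case pattern is copied from the error message; extend it if other entries conflict as well):

assemblyMergeStrategy in assembly := {
  // The stub class is a placeholder, so keeping the first copy found is safe.
  case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") =>
    MergeStrategy.first
  case x =>
    // Defer to sbt-assembly's default strategy for everything else.
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}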
Related
I get many "Deduplicate found..." errors when building the project with SBT:
[error] Deduplicate found different file contents in the following:
[error] Jar name = netty-all-4.1.68.Final.jar, jar org = io.netty, entry target = io/netty/handler/ssl/SslProvider.class
[error] Jar name = netty-handler-4.1.50.Final.jar, jar org = io.netty, entry target = io/netty/handler/ssl/SslProvider.class
...
For now I am considering the option of shading all the libraries (as here):
libraryDependencies ++= Seq(
  "com.rometools" % "rome" % "1.18.0",
  "com.typesafe.scala-logging" %% "scala-logging" % "3.9.5", // log
  "ch.qos.logback" % "logback-classic" % "1.4.5", // log
  "com.lihaoyi" %% "upickle" % "1.6.0", // file-io
  "net.liftweb" %% "lift-json" % "3.5.0", // json
  "org.apache.spark" %% "spark-sql" % "3.2.2", // spark
  "org.apache.spark" %% "spark-core" % "3.2.2" % "provided", // spark
  "org.postgresql" % "postgresql" % "42.5.1", // spark + postgresql
)
So I added the following shade rules:
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.lihaoyi.**" -> "crdaa.@1")
    .inLibrary("com.lihaoyi" %% "upickle" % "1.6.0")
    .inProject,
  ShadeRule.rename("ch.qos.logback.**" -> "crdbb.@1")
    .inLibrary("ch.qos.logback" % "logback-classic" % "1.4.5")
    .inProject,
  ShadeRule.rename("com.typesafe.**" -> "crdcc.@1")
    .inLibrary("com.typesafe.scala-logging" %% "scala-logging" % "3.9.5")
    .inProject,
  ShadeRule.rename("org.apache.spark.spark-sql.**" -> "crddd.@1")
    .inLibrary("org.apache.spark" %% "spark-sql" % "3.2.2")
    .inProject,
  ShadeRule.rename("org.apache.spark.spark-core.**" -> "crdee.@1")
    .inLibrary("org.apache.spark" %% "spark-core" % "3.2.2")
    .inProject,
  ShadeRule.rename("com.rometools.**" -> "crdff.@1")
    .inLibrary("com.rometools" % "rome" % "1.18.0")
    .inProject,
  ShadeRule.rename("org.postgresql.postgresql.**" -> "crdgg.@1")
    .inLibrary("org.postgresql" % "postgresql" % "42.5.1")
    .inProject,
  ShadeRule.rename("net.liftweb.**" -> "crdhh.@1")
    .inLibrary("net.liftweb" %% "lift-json" % "3.5.0")
    .inProject,
)
But after reloading SBT, when I run assembly I get the same duplicate errors.
What could be the problem here?
PS:
ThisBuild / scalaVersion := "2.13.10"
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.0")
Update
Finally, I ditched the rename in favor of unmanagedJars + not including the Spark dependencies (most of the errors were caused by them) by marking them as provided.
After that, only the deduplicate errors for module-info.class remained, but their solution (changing the merge strategy) is described in the sbt-assembly docs.
That is, I downloaded Spark separately, copied its jars into the ./jarlib directory (!!! not into the ./lib directory), and changed the following in the build config:
libraryDependencies ++= Seq(
  //...
  "org.apache.spark" %% "spark-sql" % "3.2.3" % "provided",
  "org.apache.spark" %% "spark-core" % "3.2.3" % "provided",
)

unmanagedJars in Compile += file("./jarlib")

ThisBuild / assemblyMergeStrategy := {
  case PathList("module-info.class") => MergeStrategy.discard
  case x if x.endsWith("/module-info.class") => MergeStrategy.discard
  case x =>
    val oldStrategy = (ThisBuild / assemblyMergeStrategy).value
    oldStrategy(x)
}
The Spark jars were then included in the final jar.
Update 2
As noted in the comments, unmanagedJars is useless in that case, so I removed the unmanagedJars line from build.sbt.
Note: the Spark jars that aren't included in the final jar file should be on the classpath when you start the jar.
In my case I copied the Spark jars plus the final jar into the ./app folder and start the jar with:
java -cp "./app/*" main.Main
... where main.Main is the main class.
Something like this (put it in your build.sbt) is how you typically resolve the deduplication errors that arise when your libraries have overlapping dependencies of their own:
assemblyMergeStrategy in assembly := {
  case PathList("javax", "activation", _*) => MergeStrategy.first
  case PathList("com", "sun", _*) => MergeStrategy.first
  case "META-INF/io.netty.versions.properties" => MergeStrategy.first
  case "META-INF/mime.types" => MergeStrategy.first
  case "META-INF/mailcap.default" => MergeStrategy.first
  case "META-INF/mimetypes.default" => MergeStrategy.first
  case d if d.endsWith(".jar:module-info.class") => MergeStrategy.first
  case d if d.endsWith("module-info.class") => MergeStrategy.first
  case d if d.endsWith("/MatchersBinder.class") => MergeStrategy.discard
  case d if d.endsWith("/ArgumentsProcessor.class") => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
I'm trying to run my fat jar using scala -classpath "target/scala-2.13/Capstone-assembly-0.1.0-SNAPSHOT.jar" src/main/scala/project/Main.scala, but I get an error caused by .toString:

val generateUUID: UserDefinedFunction = udf((str: String) => nameUUIDFromBytes(str.getBytes).toString)

No TypeTag available for String

When I run from the IDE everything works, but it doesn't from the jar.
My build.sbt:
ThisBuild / version := "0.1.0-SNAPSHOT"
ThisBuild / scalaVersion := "2.13.8"
lazy val root = (project in file("."))
  .settings(
    name := "Capstone"
  )

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.2.0",
  "org.apache.spark" %% "spark-sql" % "3.2.0",
  "org.scalatest" %% "scalatest" % "3.2.12" % "test",
  "org.rogach" %% "scallop" % "4.1.0"
)
compileOrder := CompileOrder.JavaThenScala
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
If I delete .toString I get the error "Schema for type java.util.UUID is not supported".
I tried to change String to java.util.String or scala.Predef.String, but this didn't work.
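Note that the command above hands the scala runner a source file, so it recompiles src/main/scala/project/Main.scala on the fly instead of executing the class already packaged into the assembly. One thing worth trying, assuming the main class is project.Main (a guess inferred from the source path), is to launch it directly from the fat jar:

java -cp "target/scala-2.13/Capstone-assembly-0.1.0-SNAPSHOT.jar" project.Main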
I am working on designing a multi-module project in SBT. The idea is to build two subprojects, one with an Apache Spark 2.3 dependency and the other with an Apache Spark 2.4 dependency. Each subproject should be capable of building an individual jar. I am pasting the build.sbt below. When I try to run sbt assembly on each of these subprojects, I get the error below. I tried removing the cache folder in .ivy2 and recreating the project structure.
Error message (relevant logs only):
[IJ]> multi2/assembly
[info] Including from cache: jackson-xc-1.9.13.jar
[info] Including from cache: minlog-1.3.0.jar
[warn] Merging 'META-INF\ASL2.0' with strategy 'discard'
[warn] Merging 'META-INF\DEPENDENCIES' with strategy 'discard'
[error] C:\Users\user_home\.ivy2\cache\org.apache.spark\spark-core_2.11\jars\spark-core_2.11-2.4.3.jar:org/apache/spark/unused/UnusedStubClass.class
[error] C:\Users\user_home\.ivy2\cache\org.apache.spark\spark-launcher_2.11\jars\spark-launcher_2.11-2.4.3.jar:org/apache/spark/unused/UnusedStubClass.class
[error] C:\Users\user_home\.ivy2\cache\org.apache.spark\spark-tags_2.11\jars\spark-tags_2.11-2.4.3.jar:org/apache/spark/unused/UnusedStubClass.class
[error] C:\Users\user_home\.ivy2\cache\org.spark-project.spark\unused\jars\unused-1.0.0.jar:org/apache/spark/unused/UnusedStubClass.class
Likewise, there are other classes that cause errors like the ones above.
Please advise, and let me know if you need any additional information.
name := "testrepo"
organization in ThisBuild := "com.yt"
scalaVersion in ThisBuild := "2.11.8"
scalaBinaryVersion := "2.11"
lazy val common = (project in file("common"))
  .settings(
    name := "common",
    commonSettings,
    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.8.7",
    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.8.7",
    dependencyOverrides += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.8.7",
    dependencyOverrides += "io.netty" % "netty" % "3.9.9.Final",
    dependencyOverrides += "commons-net" % "commons-net" % "2.2",
    dependencyOverrides += "com.google.guava" % "guava" % "11.0.2",
    dependencyOverrides += "com.google.code.findbugs" % "jsr305" % "1.3.9",
    libraryDependencies ++= commonDependencies
  )
  .disablePlugins(AssemblyPlugin)
lazy val multi1 = (project in file("multi1"))
  .settings(
    name := "multi1",
    commonSettings,
    assemblySettings,
    libraryDependencies ++= commonDependencies ++ Seq(
      dependencies.spark23
    )
  )
  .dependsOn(
    common
  )

lazy val multi2 = (project in file("multi2"))
  .settings(
    name := "multi2",
    commonSettings,
    assemblySettings,
    libraryDependencies ++= commonDependencies ++ Seq(
      dependencies.spark24
    )
  )
  .dependsOn(
    common
  )
/*val overrides = Seq("com.fasterxml.jackson.core" % "jackson-core" % "2.8.7",
"com.fasterxml.jackson.core" % "jackson-databind" % "2.8.7",
"com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.8.7") */
// DEPENDENCIES
lazy val dependencies =
  new {
    val hivejdbc = "org.apache.hive" % "hive-jdbc" % "0.13.0" % "provided"

    val spark23V = "2.3.0.cloudera3"
    val spark24V = "2.4.3"

    val spark23 = "org.apache.spark" %% "spark-core" % spark23V
    val spark24 = "org.apache.spark" %% "spark-core" % spark24V
  }
lazy val commonDependencies = Seq(
  dependencies.hivejdbc
)

lazy val compilerOptions = Seq(
  "-unchecked",
  "-feature",
  "-language:existentials",
  "-language:higherKinds",
  "-language:implicitConversions",
  "-language:postfixOps",
  "-deprecation",
  "-encoding",
  "utf8"
)

lazy val commonSettings = Seq(
  scalacOptions ++= compilerOptions,
  resolvers ++= Seq(
    "Local Maven Repository" at "file://" + Path.userHome.absolutePath + "/.m2/repository",
    Resolver.sonatypeRepo("releases"),
    Resolver.sonatypeRepo("snapshots")
  )
)
lazy val assemblySettings = Seq(
  assemblyJarName in assembly := name.value + ".jar",
  assemblyMergeStrategy in assembly := {
    case PathList("org.apache.spark", "spark-core_2.11", xs @ _*) => MergeStrategy.last
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case "application.conf" => MergeStrategy.concat
    case x =>
      val oldStrategy = (assemblyMergeStrategy in assembly).value
      oldStrategy(x)
  }
)
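Note: PathList matches on the path segments of the conflicting entry inside the jars ("org", "apache", "spark", ...), not on Maven coordinates, so the case PathList("org.apache.spark", "spark-core_2.11", xs @ _*) above never matches anything. A sketch of a merge strategy that would actually catch the UnusedStubClass conflict reported here, assuming keeping the first copy of the stub class is acceptable:

assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") =>
    MergeStrategy.first
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case "application.conf" => MergeStrategy.concat
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}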
When I run sbt assembly, it prints an error like this:
[error] (*:assembly) scala.MatchError: org\apache\commons\io\IOCase.class (of class java.lang.String)
And these are my configurations:
1. assembly.sbt:
import AssemblyKeys._
assemblySettings
mergeStrategy in assembly := {
  case PathList("org", "springframework", xs @ _*) => MergeStrategy.last
}
2. build.sbt:
import AssemblyKeys._
lazy val root = (project in file(".")).
  settings(
    name := "DmpRealtimeFlow",
    version := "1.0",
    scalaVersion := "2.11.8",
    libraryDependencies += "com.jd.ads.index" % "ad_index_dmp_common" % "0.0.4-SNAPSHOT",
    libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0" % "provided",
    libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.1.0" % "provided",
    libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.1.0" % "provided",
    libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.8",
    libraryDependencies += "org.springframework" % "spring-beans" % "3.1.0.RELEASE",
    libraryDependencies += "org.springframework" % "spring-context" % "3.1.0.RELEASE",
    libraryDependencies += "org.springframework" % "spring-core" % "3.1.0.RELEASE",
    libraryDependencies += "org.springframework" % "spring-orm" % "3.1.0.RELEASE",
    libraryDependencies += "org.mybatis" % "mybatis" % "3.2.1" % "compile",
    libraryDependencies += "org.mybatis" % "mybatis-spring" % "1.2.2",
    libraryDependencies += "c3p0" % "c3p0" % "0.9.1.2"
  )
3. Project tools:
sbt: 0.13.5
assembly: 0.11.2
java: 1.7
scala: 2.11.8

Any help?
The problem may be the missing default case in the mergeStrategy in assembly block:

case x =>
  val oldStrategy = (assemblyMergeStrategy in assembly).value
  oldStrategy(x)
Also, mergeStrategy is deprecated and assemblyMergeStrategy should be used instead.
Basically, the

{
  case PathList("org", "springframework", xs @ _*) => MergeStrategy.last
}

is a partial function String => MergeStrategy defined for only one kind of input, i.e. for classes with the package prefix "org\springframework". However, it is applied to all class files in the project, and the first one that doesn't match that prefix (org\apache\commons\io\IOCase.class) causes the MatchError.
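Putting both points together, a minimal corrected assembly.sbt might look like this sketch (keeping the existing springframework rule, adding the default case, and using the assemblyMergeStrategy key recommended above, assuming a plugin version that provides it):

import AssemblyKeys._

assemblySettings

assemblyMergeStrategy in assembly := {
  case PathList("org", "springframework", xs @ _*) => MergeStrategy.last
  case x =>
    // Fall back to the plugin's default strategy for all other entries.
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}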
I am trying to create a jar file for my project, using the sbt assembly command to generate one, but I get an error when it starts merging files:
scala.MatchError:
spray\http\parser\ProtocolParameterRules$$anonfun$DeltaSeconds$1.class
(of class java.lang.String)
My build.sbt looks like this:
lazy val commonSettings = Seq(
  name := "SampleSpray",
  version := "1.0",
  scalaVersion := "2.11.7",
  organization := "com.test"
)
mainClass in assembly := Some("com.example.Boot")
lazy val root = (project in file(".")).
  settings(commonSettings: _*).
  settings(
    name := "test",
    resolvers += "spray repo" at "http://repo.spray.io",
    libraryDependencies ++= {
      val akkaV = "2.3.9"
      val sprayV = "1.3.3"
      Seq(
        "io.spray" %% "spray-can" % sprayV,
        "io.spray" %% "spray-routing" % sprayV,
        "io.spray" %% "spray-json" % "1.3.2",
        "io.spray" %% "spray-testkit" % sprayV % "test",
        "com.typesafe.akka" %% "akka-actor" % akkaV,
        "com.typesafe.akka" %% "akka-testkit" % akkaV % "test",
        "org.specs2" %% "specs2-core" % "2.3.11" % "test",
        "com.sksamuel.elastic4s" %% "elastic4s-core" % "2.1.0",
        "com.sksamuel.elastic4s" %% "elastic4s-jackson" % "2.1.0",
        "net.liftweb" %% "lift-json" % "2.6+"
      )
    }
  )

assemblyOption in assembly := (assemblyOption in assembly).value.copy(cacheUnzip = false)

assemblyMergeStrategy in assembly := {
  case "BaseDateTime.class" => MergeStrategy.first
}
I don't know why this error occurs.
The setting assemblyMergeStrategy in assembly has the type String => MergeStrategy.
In your sbt file you are using the partial function
{
  case "BaseDateTime.class" => MergeStrategy.first
}
which is syntactic sugar for
(s: String) => {
  s match {
    case "BaseDateTime.class" => MergeStrategy.first
  }
}
This representation shows that the given function does not exhaustively match all passed strings. In your case sbt-assembly tried to merge the file named spray\http\parser\ProtocolParameterRules$$anonfun$DeltaSeconds$1.class into the fat jar, but could not find any matching merge strategy. You also need a "default" case:
(s: String) => {
  s match {
    case "BaseDateTime.class" => MergeStrategy.first
    case x =>
      val oldStrategy = (assemblyMergeStrategy in assembly).value
      oldStrategy(x)
  }
}
Or, written as a partial function:
{
  case "BaseDateTime.class" => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
I also ran into the same issue when sbt-assembly's assembly task failed to create a fat jar due to a name conflict between elasticsearch and its transitive joda-time dependency. Elasticsearch redefines the class org.joda.time.base.BaseDateTime, which is already implemented in the joda-time library. I followed your approach to tell sbt-assembly how to resolve this conflict, using the following assemblyMergeStrategy:
assemblyMergeStrategy in assembly := {
  case "org/joda/time/base/BaseDateTime.class" => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}