assemblyMergeStrategy type error - scala

I am trying to set up a small project to build an AWS Lambda written in Scala:

javacOptions ++= Seq("-source", "1.8", "-target", "1.8", "-Xlint")

lazy val root = (project in file(".")).
  settings(
    name := "xxx",
    version := "0.1",
    scalaVersion := "2.12.3",
    retrieveManaged := true
  )

libraryDependencies ++= Seq(
  "com.amazonaws" % "aws-lambda-java-core" % "1.1.0" % Provided,
  "com.amazonaws" % "aws-lambda-java-events" % "1.1.0" % Provided,
  "org.scalatest" % "scalatest" % "2.2.6" % Test
)

scalacOptions += "-deprecation"

assemblyMergeStrategy in assembly <<= (assemblyMergeStrategy in assembly) {
  (old) => {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x => MergeStrategy.first
  }
}
This results in:

xxx/build.sbt:25: error: not found: value assemblyMergeStrategy
assemblyMergeStrategy in assembly <<= (assemblyMergeStrategy in assembly) {
^
[error] Type error in expression
The source of inspiration was this blog.
I also tried the version provided there, since mergeStrategy might have been replaced by assemblyMergeStrategy.

Did you reference the assembly plugin in your project/plugins.sbt file? assemblyMergeStrategy is defined by that plugin:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
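With the plugin in place, note also that the <<= operator used above was deprecated in sbt 0.13.13 and removed in sbt 1.x. A minimal sketch of the same merge strategy written with := (assuming sbt-assembly 0.14.x and that no custom fallback is needed):

assemblyMergeStrategy in assembly := {
  // Discard META-INF entries and keep the first copy of everything else.
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}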

Related

No TypeTag available for String

I'm trying to run my fat jar using scala -classpath "target/scala-2.13/Capstone-assembly-0.1.0-SNAPSHOT.jar" src/main/scala/project/Main.scala, but I get an error caused by .toString in val generateUUID: UserDefinedFunction = udf((str: String) => nameUUIDFromBytes(str.getBytes).toString): No TypeTag available for String. When I run from the IDE everything works, but it does not work from the jar.
My build.sbt:
ThisBuild / version := "0.1.0-SNAPSHOT"
ThisBuild / scalaVersion := "2.13.8"

lazy val root = (project in file("."))
  .settings(
    name := "Capstone"
  )

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.2.0",
  "org.apache.spark" %% "spark-sql" % "3.2.0",
  "org.scalatest" %% "scalatest" % "3.2.12" % "test",
  "org.rogach" %% "scallop" % "4.1.0"
)

compileOrder := CompileOrder.JavaThenScala

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
If I delete .toString I get the error Schema for type java.util.UUID is not supported.
I tried changing String to java.util.String or scala.Predef.String, but this didn't work.

Multi Module SBT projects issues

I am working on designing a multi-module project in SBT.
The idea is to build two subprojects: one with an Apache Spark 2.3 dependency and the other with Apache Spark 2.4. Each subproject should be capable of building its own jar. My build.sbt is pasted below. When I run sbt assembly on either of these subprojects, I get the error below. I have tried removing the cache folder in .ivy2 and recreating the project structure.

Error message (relevant logs only):
[IJ]> multi2/assembly
[info] Including from cache: jackson-xc-1.9.13.jar
[info] Including from cache: minlog-1.3.0.jar
[warn] Merging 'META-INF\ASL2.0' with strategy 'discard'
[warn] Merging 'META-INF\DEPENDENCIES' with strategy 'discard'
[error] C:\Users\user_home\.ivy2\cache\org.apache.spark\spark-core_2.11\jars\spark-core_2.11-2.4.3.jar:org/apache/spark/unused/UnusedStubClass.class
[error] C:\Users\user_home\.ivy2\cache\org.apache.spark\spark-launcher_2.11\jars\spark-launcher_2.11-2.4.3.jar:org/apache/spark/unused/UnusedStubClass.class
[error] C:\Users\user_home\.ivy2\cache\org.apache.spark\spark-tags_2.11\jars\spark-tags_2.11-2.4.3.jar:org/apache/spark/unused/UnusedStubClass.class
[error] C:\Users\user_home\.ivy2\cache\org.spark-project.spark\unused\jars\unused-1.0.0.jar:org/apache/spark/unused/UnusedStubClass.class
Likewise, there are other classes that cause errors like those above.
Please suggest a fix, and let me know if you need any additional information.
name := "testrepo"
organization in ThisBuild := "com.yt"
scalaVersion in ThisBuild := "2.11.8"
scalaBinaryVersion := "2.11"
lazy val common = (project in file("common"))
.settings(
name := "common",
commonSettings,
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.8.7",
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.8.7",
dependencyOverrides += "com.fasterxml.jackson.module" %% "jackson-module-scala" %
dependencyOverrides += "io.netty" % "netty" % "3.9.9.Final",
dependencyOverrides += "commons-net" % "commons-net" % "2.2",
dependencyOverrides += "com.google.guava" % "guava" % "11.0.2",
dependencyOverrides += "com.google.code.findbugs" % "jsr305" % "1.3.9",
libraryDependencies ++= commonDependencies
)
.disablePlugins(AssemblyPlugin)
lazy val multi1 = (project in file("multi1"))
  .settings(
    name := "multi1",
    commonSettings,
    assemblySettings,
    libraryDependencies ++= commonDependencies ++ Seq(
      dependencies.spark23
    )
  )
  .dependsOn(
    common
  )

lazy val multi2 = (project in file("multi2"))
  .settings(
    name := "multi2",
    commonSettings,
    assemblySettings,
    libraryDependencies ++= commonDependencies ++ Seq(
      dependencies.spark24
    )
  )
  .dependsOn(
    common
  )
/*
val overrides = Seq(
  "com.fasterxml.jackson.core" % "jackson-core" % "2.8.7",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.8.7",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.8.7")
*/

// DEPENDENCIES
lazy val dependencies =
  new {
    val hivejdbc = "org.apache.hive" % "hive-jdbc" % "0.13.0" % "provided"
    val spark23V = "2.3.0.cloudera3"
    val spark24V = "2.4.3"
    val spark23 = "org.apache.spark" %% "spark-core" % spark23V
    val spark24 = "org.apache.spark" %% "spark-core" % spark24V
  }

lazy val commonDependencies = Seq(
  dependencies.hivejdbc
)
lazy val compilerOptions = Seq(
  "-unchecked",
  "-feature",
  "-language:existentials",
  "-language:higherKinds",
  "-language:implicitConversions",
  "-language:postfixOps",
  "-deprecation",
  "-encoding",
  "utf8"
)

lazy val commonSettings = Seq(
  scalacOptions ++= compilerOptions,
  resolvers ++= Seq(
    "Local Maven Repository" at "file://" + Path.userHome.absolutePath + "/.m2/repository",
    Resolver.sonatypeRepo("releases"),
    Resolver.sonatypeRepo("snapshots")
  )
)
lazy val assemblySettings = Seq(
  assemblyJarName in assembly := name.value + ".jar",
  assemblyMergeStrategy in assembly := {
    case PathList("org.apache.spark", "spark-core_2.11", xs @ _*) => MergeStrategy.last
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case "application.conf" => MergeStrategy.concat
    case x =>
      val oldStrategy = (assemblyMergeStrategy in assembly).value
      oldStrategy(x)
  }
)
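An editorial aside on the settings above: PathList matches the slash-separated path segments of an entry inside the jar, not Maven coordinates, so case PathList("org.apache.spark", "spark-core_2.11", xs @ _*) never matches anything. A minimal sketch of assemblySettings that would cover the UnusedStubClass conflicts from the log, following the same approach as the answer to the deduplicate question further down (an illustration, not a verified fix):

lazy val assemblySettings = Seq(
  assemblyJarName in assembly := name.value + ".jar",
  assemblyMergeStrategy in assembly := {
    // Several Spark jars ship an identical org/apache/spark/unused/UnusedStubClass.class;
    // any copy will do, so keep the first one encountered.
    case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") =>
      MergeStrategy.first
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case "application.conf" => MergeStrategy.concat
    case x =>
      val oldStrategy = (assemblyMergeStrategy in assembly).value
      oldStrategy(x)
  }
)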

Where do you put assemblyMergeStrategy in build.sbt?

I have a MergeStrategy problem. How do I resolve it? Why are all those squiggly lines there?
The error message is Type mismatch, expected: String => MergeStrategy, actual: String => Any
I am new to Scala, so I have no idea what that syntax means. I have tried copying different merge strategies from all over Stack Overflow and none of them work.
I have Scala version 2.12.7 and sbt version 1.2.6.
My build.sbt looks like this:
lazy val root = (project in file(".")).
  settings(
    name := "bigdata-mx-2",
    version := "0.1",
    scalaVersion := "2.12.7",
    mainClass in Compile := Some("Main")
  )

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-core" % "1.2.1",
  "org.apache.parquet" % "parquet-hadoop" % "1.10.0",
  "junit" % "junit" % "4.12" % Test,
  "org.scalatest" %% "scalatest" % "3.2.0-SNAP10" % Test,
  "org.scalacheck" %% "scalacheck" % "1.14.0" % Test,
  "org.scala-lang" % "scala-library" % "2.12.7"
)

// Where do I put this thing:
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
Maybe I'm not putting it into the right place, where does it go?
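For reference, a sketch that is not from the original thread: the merge strategy is an ordinary setting, so it can sit at the top level of build.sbt as above, or be scoped to the project inside .settings(...):

lazy val root = (project in file("."))
  .settings(
    name := "bigdata-mx-2",
    version := "0.1",
    scalaVersion := "2.12.7",
    mainClass in Compile := Some("Main"),
    // Scoping the merge strategy to the project keeps the assembly config in one place.
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case x => MergeStrategy.first
    }
  )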

SBT Assembly - Deduplicate

I got the following SBT files:
.
-- root
-- plugins.sbt
-- build.sbt
With plugins.sbt containing the following:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")
And build.sbt containing the following:
import sbt.Keys._
resolvers in ThisBuild ++= Seq("Apache Development Snapshot Repository" at "https://repository.apache.org/content/repositories/snapshots/", Resolver.sonatypeRepo("public"))
name := "flink-experiment"
lazy val commonSettings = Seq(
  organization := "my.organisation",
  version := "0.1.0-SNAPSHOT"
)
val flinkVersion = "1.1.0"
val sparkVersion = "2.0.0"
val kafkaVersion = "0.8.2.1"
val hadoopDependencies = Seq(
  "org.apache.avro" % "avro" % "1.7.7" % "provided",
  "org.apache.avro" % "avro-mapred" % "1.7.7" % "provided"
)

val flinkDependencies = Seq(
  "org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-connector-kafka-0.8" % flinkVersion exclude("org.apache.kafka", "kafka_${scala.binary.version}")
)

val sparkDependencies = Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming-kafka-0-8" % sparkVersion exclude("org.apache.kafka", "kafka_${scala.binary.version}")
)

val kafkaDependencies = Seq(
  "org.apache.kafka" %% "kafka" % "0.8.2.1"
)

val toolDependencies = Seq(
  "com.github.scopt" %% "scopt" % "3.5.0"
)

val testDependencies = Seq(
  "org.scalactic" %% "scalactic" % "2.2.6",
  "org.scalatest" %% "scalatest" % "2.2.6" % "test"
)
lazy val root = (project in file(".")).
  settings(commonSettings: _*).
  settings(
    libraryDependencies ++= hadoopDependencies,
    libraryDependencies ++= flinkDependencies,
    libraryDependencies ++= sparkDependencies,
    libraryDependencies ++= kafkaDependencies,
    libraryDependencies ++= toolDependencies,
    libraryDependencies ++= testDependencies
  ).
  enablePlugins(AssemblyPlugin)

run in Compile <<= Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run))

mainClass in assembly := Some("my.organization.experiment.Experiment")
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
Now sbt clean assembly sadly gives the following exception:
[error] (root/*:assembly) deduplicate: different file contents found in the following:
[error] /home/kevin/.ivy2/cache/org.apache.spark/spark-streaming-kafka-0-8_2.10/jars/spark-streaming-kafka-0-8_2.10-2.0.0.jar:org/apache/spark/unused/UnusedStubClass.class
[error] /home/kevin/.ivy2/cache/org.apache.spark/spark-tags_2.10/jars/spark-tags_2.10-2.0.0.jar:org/apache/spark/unused/UnusedStubClass.class
[error] /home/kevin/.ivy2/cache/org.spark-project.spark/unused/jars/unused-1.0.0.jar:org/apache/spark/unused/UnusedStubClass.class
How can I fix this?
https://github.com/sbt/sbt-assembly#excluding-jars-and-files
You can define an assemblyMergeStrategy and probably discard any file that you listed, as they are all in the 'unused' package.
You can override the default strategy for conflicts:
val defaultMergeStrategy: String => MergeStrategy = {
  case x if Assembly.isConfigFile(x) =>
    MergeStrategy.concat
  case PathList(ps @ _*) if Assembly.isReadme(ps.last) || Assembly.isLicenseFile(ps.last) =>
    MergeStrategy.rename
  case PathList("META-INF", xs @ _*) =>
    (xs map {_.toLowerCase}) match {
      case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) =>
        MergeStrategy.discard
      case ps @ (x :: xs) if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
        MergeStrategy.discard
      case "plexus" :: xs =>
        MergeStrategy.discard
      case "services" :: xs =>
        MergeStrategy.filterDistinctLines
      case ("spring.schemas" :: Nil) | ("spring.handlers" :: Nil) =>
        MergeStrategy.filterDistinctLines
      case _ => MergeStrategy.deduplicate
    }
  case _ => MergeStrategy.deduplicate
}
As you can see, assembly's default strategy is MergeStrategy.deduplicate; you can add a new case that matches the UnusedStubClass path and returns MergeStrategy.first, as sketched below.
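A minimal sketch of that override, with the path segments taken from the error log above (assuming the default strategy remains the fallback):

assemblyMergeStrategy in assembly := {
  // The same stub class ships in several Spark-related jars; any copy is fine.
  case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") =>
    MergeStrategy.first
  case x =>
    // Delegate everything else to the previous (default) strategy.
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}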

scala.MatchError when creating fat jar using sbt assembly

I am trying to create a jar file for my project. I am using the sbt assembly command to generate one, but I get an error when it starts merging files:
scala.MatchError:
spray\http\parser\ProtocolParameterRules$$anonfun$DeltaSeconds$1.class
(of class java.lang.String)
My build.sbt looks like this:
lazy val commonSettings = Seq(
  name := "SampleSpray",
  version := "1.0",
  scalaVersion := "2.11.7",
  organization := "com.test"
)

mainClass in assembly := Some("com.example.Boot")

lazy val root = (project in file(".")).
  settings(commonSettings: _*).
  settings(
    name := "test",
    resolvers += "spray repo" at "http://repo.spray.io",
    libraryDependencies ++= {
      val akkaV = "2.3.9"
      val sprayV = "1.3.3"
      Seq(
        "io.spray" %% "spray-can" % sprayV,
        "io.spray" %% "spray-routing" % sprayV,
        "io.spray" %% "spray-json" % "1.3.2",
        "io.spray" %% "spray-testkit" % sprayV % "test",
        "com.typesafe.akka" %% "akka-actor" % akkaV,
        "com.typesafe.akka" %% "akka-testkit" % akkaV % "test",
        "org.specs2" %% "specs2-core" % "2.3.11" % "test",
        "com.sksamuel.elastic4s" %% "elastic4s-core" % "2.1.0",
        "com.sksamuel.elastic4s" %% "elastic4s-jackson" % "2.1.0",
        "net.liftweb" %% "lift-json" % "2.6+"
      )
    }
  )

assemblyOption in assembly := (assemblyOption in assembly).value.copy(cacheUnzip = false)

assemblyMergeStrategy in assembly := {
  case "BaseDateTime.class" => MergeStrategy.first
}
I don't know why this error occurs.
The setting assemblyMergeStrategy in assembly has the type String => MergeStrategy.
In your sbt file you are using the partial function
{
  case "BaseDateTime.class" => MergeStrategy.first
}

which is syntactic sugar for

(s: String) => {
  s match {
    case "BaseDateTime.class" => MergeStrategy.first
  }
}
This representation shows that the given function will not exhaustively match all passed strings. In your case sbt-assembly tried to merge the file named spray\http\parser\ProtocolParameterRules$$anonfun$DeltaSeconds$1.class into the fat jar, but could not find any matching merge strategy. You need a "default" case also:
(s: String) => {
  s match {
    case "BaseDateTime.class" => MergeStrategy.first
    case x =>
      val oldStrategy = (assemblyMergeStrategy in assembly).value
      oldStrategy(x)
  }
}
Or written as a partial function:
{
  case "BaseDateTime.class" => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
I also ran into the same issue when sbt-assembly's assembly task failed to create a fat jar due to a name conflict between elasticsearch and its transitive joda-time dependency. Elasticsearch redefines the class org.joda.time.base.BaseDateTime, which is already implemented in the joda-time library. I followed your approach to tell sbt-assembly how to resolve this conflict, using the following assemblyMergeStrategy:
assemblyMergeStrategy in assembly := {
  case "org/joda/time/base/BaseDateTime.class" => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}