Conflicting files in uber-jar creation in SBT using sbt-assembly - scala

I am trying to compile and package a fat jar using SBT, and I keep running into the following error. I have tried everything from library dependency excludes to merge strategies.
[trace] Stack trace suppressed: run last *:assembly for the full output.
[error] (*:assembly) deduplicate: different file contents found in the following:
[error] /Users/me/.ivy2/cache/org.slf4j/slf4j-api/jars/slf4j-api-1.7.10.jar:META-INF/maven/org.slf4j/slf4j-api/pom.properties
[error] /Users/me/.ivy2/cache/com.twitter/parquet-format/jars/parquet-format-2.2.0-rc1.jar:META-INF/maven/org.slf4j/slf4j-api/pom.properties
[error] Total time: 113 s, completed Jul 10, 2015 1:57:21 AM
The current incarnation of my build.sbt file is below:
import AssemblyKeys._
assemblySettings
name := "ldaApp"
version := "0.1"
scalaVersion := "2.10.4"
mainClass := Some("myApp")
libraryDependencies +="org.scalanlp" %% "breeze" % "0.11.2"
libraryDependencies +="org.scalanlp" %% "breeze-natives" % "0.11.2"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.3.1"
libraryDependencies +="org.ini4j" % "ini4j" % "0.5.4"
jarName in assembly := "myApp"
net.virtualvoid.sbt.graph.Plugin.graphSettings
libraryDependencies += "org.slf4j" %% "slf4j-api"" % "1.7.10" % "provided"
I realize I am doing something wrong...I just have no idea what.

Here is how you can handle these merge issues.
import sbtassembly.Plugin._
lazy val assemblySettings = sbtassembly.Plugin.assemblySettings ++ Seq(
  publishArtifact in packageScala := false, // Remove scala from the uber jar
  mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
    {
      case PathList("META-INF", "CHANGES.txt") => MergeStrategy.first
      // ...
      case PathList(ps @ _*) if ps.last endsWith "pom.properties" => MergeStrategy.first
      case x => old(x)
    }
  }
)
Then add these settings to your project.
lazy val projectToJar = Project(id = "MyApp", base = file(".")).settings(assemblySettings: _*)
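The snippet above uses the old sbt-assembly 0.11 plugin API (import sbtassembly.Plugin._ and <<=). With sbt-assembly 0.14+ the plugin is an auto-plugin and the same merge strategy can be written without the deprecated operator; a minimal sketch, assuming the plugin is enabled in project/plugins.sbt:
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "CHANGES.txt") => MergeStrategy.first
  case PathList(ps @ _*) if ps.last endsWith "pom.properties" => MergeStrategy.first
  case x =>
    // fall back to the plugin's default strategy for everything else
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}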

I got your assembly build running by removing spark from the fat jar (mllib is already included in spark).
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.3.1" % "provided"
Like vitalii said in a comment, this solution was already here. I understand that spending hours on a problem without finding the fix can be frustrating but please be nice.

Related

Caliban federation with Scala 3

There is no caliban.federation for Scala 3 yet.
My question is: what is the correct way to use it along with Scala 3 libraries?
For now I have these dependencies in my build.sbt:
lazy val `bookings` =
  project
    .in(file("."))
    .settings(
      scalaVersion := "3.0.1",
      name := "bookings"
    )
    .settings(commonSettings)
    .settings(dependencies)

lazy val dependencies = Seq(
  libraryDependencies ++= Seq(
    "com.github.ghostdogpr" %% "caliban-zio-http" % "1.1.0"
  ),
  libraryDependencies ++= Seq(
    org.scalatest.scalatest,
    org.scalatestplus.`scalacheck-1-15`,
  ).map(_ % Test),
  libraryDependencies +=
    ("com.github.ghostdogpr" %% "caliban-federation" % "1.1.0")
      .cross(CrossVersion.for3Use2_13)
)
But when I try to build it, I get this error:
[error] (update) Conflicting cross-version suffixes in:
dev.zio:zio-query,
org.scala-lang.modules:scala-collection-compat,
dev.zio:zio-stacktracer,
dev.zio:izumi-reflect,
com.github.ghostdogpr:caliban-macros,
dev.zio:izumi-reflect-thirdparty-boopickle-shaded,
dev.zio:zio,
com.github.ghostdogpr:caliban,
dev.zio:zio-streams
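The conflict comes from mixing the _3 artifacts pulled in by caliban-zio-http (resolved for Scala 3) with the _2.13 artifacts pulled in by caliban-federation. A hedged sketch of one way to keep the suffixes consistent, not verified against these exact versions, is to apply the same cross-version to both caliban modules so their zio transitives agree:
libraryDependencies ++= Seq(
  "com.github.ghostdogpr" %% "caliban-zio-http"   % "1.1.0",
  "com.github.ghostdogpr" %% "caliban-federation" % "1.1.0"
).map(_.cross(CrossVersion.for3Use2_13))  // both resolve the _2.13 builds and their _2.13 zio dependencies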

Running Fat jar and getting path not found exception in SBT

I have a Scala project which refers to a file at src/main/resources/MyFile.csv.
I created a fat jar, and the following is my build.sbt:
name := "my-project"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.7",
  "org.apache.spark" %% "spark-sql" % "2.4.7"
)

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case "application.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}
But when I try running my fat jar I get the following error
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Path does not exist:
file:/path/to/myProject/target/scala-2.11/src/main/resources/MyFile.csv
I am using the following command to run the jar
java -cp myProject.jar com.organization.myProject
Is there a way to include my file/folder paths in the build.sbt file?
UPDATE
I changed my build.sbt as follows
name := "my-project"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.7",
  "org.apache.spark" %% "spark-sql" % "2.4.7"
)

scalaSource in Compile := baseDirectory.value / "main"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case "application.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}
But it is still giving the path-not-found exception. Does anybody have an idea?
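One thing to note: after assembly the CSV lives inside the jar, so a file-system path like target/scala-2.11/src/main/resources/MyFile.csv no longer exists. A hedged sketch of loading the bundled resource by copying it out of the classpath to a temporary file before handing Spark a path (assuming a SparkSession named spark is in scope; the header option is an assumption):
import java.nio.file.{Files, StandardCopyOption}

// Sketch: resolve MyFile.csv from the classpath (it is packaged into the fat jar)
// and copy it to a temp file, because Spark expects a real file/HDFS path.
val tmp = Files.createTempFile("MyFile", ".csv")
val in  = getClass.getResourceAsStream("/MyFile.csv")
Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
in.close()

val df = spark.read.option("header", "true").csv(tmp.toString)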

How to fix "origin location must be absolute" error in sbt project (with Spark 2.4.5 and DeltaLake 0.6.1)?

I am trying to set up an SBT project for Spark 2.4.5 with Delta Lake 0.6.1. My build file is as follows.
However, it seems this configuration cannot resolve some dependencies.
[info] Reapplying settings...
[info] Set current project to red-basket-pipelnes (in build file:/Users/ashika.umagiliya/git-repo/redbasket-pipelines/red-basket-pipelnes/)
[info] Updating ...
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.antlr#antlr4;4.7: org.antlr#antlr4;4.7!antlr4.pom(pom.original) origin location must be absolute: file:/Users/ashika.umagiliya/.m2/repository/org/antlr/antlr4/4.7/antlr4-4.7.pom
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn]
[warn] Note: Unresolved dependencies path:
[warn] org.antlr:antlr4:4.7
[warn] +- io.delta:delta-core_2.11:0.6.1 (/Users/ashika.umagiliya/git-repo/redbasket-pipelines/red-basket-pipelnes/build.sbt#L13-26)
[warn] +- com.mycompany.dpd.solutions:deltalake-pipelnes_2.11:1.0
[error] sbt.librarymanagement.ResolveException: unresolved dependency: org.antlr#antlr4;4.7: org.antlr#antlr4;4.7!antlr4.pom(pom.original) origin location must be absolute: file:/Users/ashika.umagiliya/.m2/repository/org/antlr/antlr4/4.7/antlr4-4.7.pom
[error] at sbt.internal.librarymanagement.IvyActions$.resolveAndRetrieve(IvyActions.scala:332)
build.sbt
name := "deltalake-pipelnes"
version := "1.0"
organization := "com.mycompany.dpd.solutions"
// The compatible Scala version for Spark 2.4.1 is 2.11
scalaVersion := "2.11.12"
val sparkVersion = "2.4.5"
val scalatestVersion = "3.0.5"
val deltaLakeCore = "0.6.1"
val sparkTestingBaseVersion = s"${sparkVersion}_0.14.0"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-avro" % sparkVersion % "provided",
  "io.delta" %% "delta-core" % deltaLakeCore,
  "org.scalatest" %% "scalatest" % scalatestVersion % "test",
  "com.holdenkarau" %% "spark-testing-base" % sparkTestingBaseVersion % "test"
)

assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", xs @ _*) => MergeStrategy.last
  case PathList("changelog.txt") => MergeStrategy.last
  case PathList(ps @ _*) if ps.last contains "spring" => MergeStrategy.last
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

resolvers ++= Seq(
  "SPDB Maven Repository" at "https://artifactory.mycompany-it.com/spdb-mvn/",
  Resolver.mavenLocal
)

publishMavenStyle := true

publishTo := {
  val repoBaseUrl = "https://artifactory.mycompany-it.com/"
  if (isSnapshot.value)
    Some("snapshots" at repoBaseUrl + "spdb-mvn-snapshot/")
  else
    Some("releases" at repoBaseUrl + "spdb-mvn-release/")
}

publishConfiguration := publishConfiguration.value.withOverwrite(true)
publishLocalConfiguration := publishLocalConfiguration.value.withOverwrite(true)
credentials += Credentials(Path.userHome / ".sbt" / ".credentials")

artifact in (Compile, assembly) := {
  val art = (artifact in (Compile, assembly)).value
  art.withClassifier(Some("assembly"))
}

addArtifact(artifact in (Compile, assembly), assembly)

parallelExecution in Test := false
Any tips on how to fix this?
I haven't managed to figure out exactly when and why it happens, but I have run into similar resolution-related errors before.
Whenever I run into issues like yours, I usually delete the affected directory (e.g. /Users/ashika.umagiliya/.m2/repository/org/antlr) and start over. It usually helps. If not, I delete ~/.ivy2 too and start over. That's a bit of a sledgehammer, but it does the job (if all you have is a hammer, everything looks like a nail).
I always make sure to use the latest and greatest sbt. You seem to be on macOS, so use sdk update early and often.
I'd also recommend using the latest and greatest versions of the libraries; more specifically, for Spark that would be 2.4.7 (in the 2.4.x line), while Delta Lake should be 0.8.0.
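In this build file that recommendation would amount to bumping the two version vals, e.g.:
val sparkVersion  = "2.4.7"   // latest 2.4.x at the time, per the suggestion above
val deltaLakeCore = "0.8.0"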

SBT, how to add unmanaged JARs to IntelliJ?

I have this build.sbt file:
import sbt.Keys.libraryDependencies

lazy val scalatestVersion = "3.0.4"
lazy val scalaMockTestSupportVersion = "3.6.0"
lazy val typeSafeConfVersion = "1.3.2"
lazy val scalaLoggingVersion = "3.7.2"
lazy val logbackClassicVersion = "1.2.3"

lazy val commonSettings = Seq(
  organization := "com.stulsoft",
  version := "0.0.1",
  scalaVersion := "2.12.4",
  scalacOptions ++= Seq(
    "-feature",
    "-language:implicitConversions",
    "-language:postfixOps"),
  libraryDependencies ++= Seq(
    "com.typesafe.scala-logging" %% "scala-logging" % scalaLoggingVersion,
    "ch.qos.logback" % "logback-classic" % logbackClassicVersion,
    "com.typesafe" % "config" % typeSafeConfVersion,
    "org.scalatest" %% "scalatest" % scalatestVersion % "test",
    "org.scalamock" %% "scalamock-scalatest-support" % scalaMockTestSupportVersion % "test"
  )
)

unmanagedJars in Compile += file("lib/opencv-331.jar")

lazy val pimage = project.in(file("."))
  .settings(commonSettings)
  .settings(
    name := "pimage"
  )

parallelExecution in Test := true
It works fine if I use sbt run, but I cannot run it from IntelliJ.
I receive the error:
java.lang.UnsatisfiedLinkError: no opencv_java331 in java.library.path
I can add the library manually (File -> Project Structure -> Libraries -> + the necessary dir).
My question is: is it possible to set up build.sbt so that it automatically creates the IntelliJ project with the specified library?
I would say: try dragging and dropping the dependency into /lib, which should be in the root directory of your project; if it's not there, create it.
Run commands:
sbt reload
sbt update
Lastly, you could try something like:
File -> Project Structure -> Modules -> mark all the modules (usually 1 to 3) and delete them (don't worry, this won't delete your files) -> hit the green plus sign and select Import Module -> select the root directory of your project, and it should then refresh.
If none of these help, I'm out of ideas.
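As a side note, the UnsatisfiedLinkError refers to the native opencv_java331 library rather than the jar itself, so a hedged sketch (assuming the native library sits next to the jar in lib/) is to fork and point java.library.path at that directory; for IntelliJ the same -D option would go into the run configuration's VM options:
fork := true  // javaOptions are only applied to a forked JVM
javaOptions += s"-Djava.library.path=${(baseDirectory.value / "lib").getAbsolutePath}"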

Run Scala Spark with SBT

The code below causes Spark to become unresponsive:
System.setProperty("hadoop.home.dir", "H:\\winutils");
val sparkConf = new SparkConf().setAppName("GroupBy Test").setMaster("local[1]")
val sc = new SparkContext(sparkConf)
def main(args: Array[String]) {
val text_file = sc.textFile("h:\\data\\details.txt")
val counts = text_file
.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
println(counts);
}
I'm setting hadoop.home.dir in order to avoid the error mentioned here: Failed to locate the winutils binary in the hadoop binary path
This is what my build.sbt file looks like:
lazy val root = (project in file(".")).
  settings(
    name := "hello",
    version := "1.0",
    scalaVersion := "2.11.0"
  )

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11" % "1.6.0"
)
Should Scala Spark be compilable/runnable using the sbt code in the file?
I think the code is fine; it was taken verbatim from http://spark.apache.org/examples.html, but I am not sure whether the Hadoop winutils path is required.
Update: "The solution was to use fork := true in the main build.sbt"
Here is the reference: Spark: ClassNotFoundException when running hello world example in scala 2.11
This is the content of my build.sbt. Note that if your internet connection is slow, it might take some time.
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.1",
"org.apache.spark" %% "spark-mllib" % "1.6.1",
"org.apache.spark" %% "spark-sql" % "1.6.1",
"org.slf4j" % "slf4j-api" % "1.7.12"
)
run in Compile <<= Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run))
In the main method I added this; the path depends on where you placed the winutils folder.
System.setProperty("hadoop.home.dir", "c:\\winutil")