Spark Build Fails Because Of Avro Mapred Dependency - scala

I have a scala spark project that fails because of some dependency hell. Here is my build.sbt:
scalaVersion := "2.13.3"
val SPARK_VERSION = "3.2.0"
libraryDependencies ++= Seq(
"com.typesafe" % "config" % "1.3.1",
"com.github.pathikrit" %% "better-files" % "3.9.1",
"org.apache.commons" % "commons-compress" % "1.14",
"commons-io" % "commons-io" % "2.6",
"com.typesafe.scala-logging" %% "scala-logging" % "3.9.4",
"ch.qos.logback" % "logback-classic" % "1.2.3" exclude ("org.slf4j", "*"),
"org.plotly-scala" %% "plotly-render" % "0.8.1",
"org.apache.spark" %% "spark-sql" % SPARK_VERSION,
"org.apache.spark" %% "spark-mllib" % SPARK_VERSION,
// Test dependencies
"org.scalatest" %% "scalatest" % "3.2.10" % Test,
"com.amazon.deequ" % "deequ" % "2.0.0-spark-3.1" % Test,
"org.awaitility" % "awaitility" % "3.0.0" % Test,
"org.apache.spark" %% "spark-core" % SPARK_VERSION % Test,
"org.apache.spark" %% "spark-sql" % SPARK_VERSION % Test
)
Here is the build failure:
[error] stack trace is suppressed; run 'last update' for the full output
[error] stack trace is suppressed; run 'last ssExtractDependencies' for the full output
[error] (update) lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
[error] https://repo1.maven.org/maven2/org/apache/avro/avro-mapred/1.10.2/avro-mapred-1.10.2-hadoop2.jar: not found: https://repo1.maven.org/maven2/org/apache/avro/avro-mapred/1.10.2/avro-mapred-1.10.2-hadoop2.jar
[error] (ssExtractDependencies) lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
[error] https://repo1.maven.org/maven2/org/apache/avro/avro-mapred/1.10.2/avro-mapred-1.10.2-hadoop2.jar: not found: https://repo1.maven.org/maven2/org/apache/avro/avro-mapred/1.10.2/avro-mapred-1.10.2-hadoop2.jar
[error] Total time: 5 s, completed Dec 19, 2021, 5:14:33 PM
[info] shutting down sbt server
Is this caused by the fact that I'm using Scala 2.13?

I had to do the inevitable and add this to my build.sbt:
ThisBuild / useCoursier := false
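For context, here is a minimal sketch of where that setting sits in the build. `ThisBuild / useCoursier := false` makes sbt (1.3 and later) fall back to Ivy for dependency resolution instead of Coursier. The dependency list is abbreviated from the question, and whether Ivy resolves every other artifact in this build is an assumption, not something verified here.

```scala
// build.sbt (abbreviated sketch; dependency list shortened from the question)
ThisBuild / useCoursier := false // resolve with Ivy instead of Coursier (sbt 1.3+)

scalaVersion := "2.13.3"
val SPARK_VERSION = "3.2.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"   % SPARK_VERSION,
  "org.apache.spark" %% "spark-mllib" % SPARK_VERSION
)
```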

Related

Error:- java.lang.NoClassDefFoundError: scala/Product$class

Test project Build.sbt
scalaVersion := "2.12.8"
crossScalaVersions := Seq("2.11.12", "2.12.8")
val scalaTestVersion = "3.0.5"
val rocksDBVersion = "5.17.2"
val kafkaVersion = "2.2.0"
lazy val deps = Seq(
// "javax.ws.rs" % "javax.ws.rs-api" % "2.1" jar(),
"org.apache.kafka" % "kafka-clients" % kafkaVersion,
"org.apache.kafka" % "kafka-clients" % kafkaVersion classifier "test",
"org.apache.kafka" % "kafka-streams" % kafkaVersion,
"org.apache.kafka" % "kafka-streams" % kafkaVersion classifier "test",
"org.apache.kafka" %% "kafka" % kafkaVersion,
"org.apache.kafka" % "kafka-streams-test-utils" % kafkaVersion,
"org.scalatest" %% "scalatest" % scalaTestVersion % "test",
"org.rocksdb" % "rocksdbjni" % rocksDBVersion % "test"
)
Test Project plugins.sbt
addSbtPlugin("com.github.gseitz" % "sbt-release" % "1.0.8")
This is the sbt configuration file for my test application. When I run sbt package, it creates a jar file, which I then have to use in my other project.
Other project sbt file
scalaVersion := "2.12.6"
object Dependencies {
val deps = Seq(
"org.json4s" %% "json4s-native" % "3.5.3",
"ch.qos.logback" % "logback-classic" % "1.2.3",
"cool.graph" % "cuid-java" % "0.1.1",
"org.apache.kafka" % "kafka-streams" % "2.2.0",
"com.typesafe" % "config" % "1.3.3",
"org.scalaz" %% "scalaz-core" % "7.2.20",
"org.apache.commons" % "commons-lang3" % "3.8.1",
//Test
"org.scalatest" %% "scalatest" % "3.0.4" % Test,
"org.mockito" % "mockito-all" % "1.10.19" % Test,
"com.abc" %% "kafka-streams-test-kit" % "2.2.0" % Test
)
}
Other Project plugins.sbt
addSbtPlugin("se.marcuslonnberg" % "sbt-docker" % "1.5.0")
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.0")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
addSbtPlugin("com.github.gseitz" % "sbt-release" % "1.0.8")
I'm new to Scala and don't know what I did wrong: when I put my test Scala jar into the other project, it gives me the following error.
Error:-
java.lang.NoClassDefFoundError: scala/Product$class
[error] at com.shepherd.kafka.streams.test.kit.MockedStreams$Builder.<init>(MockedStreams.scala:17)
[error] at com.shepherd.kafka.streams.test.kit.MockedStreams$.apply(MockedStreams.scala:15)
[error] at com.shepherd.integration.WindowedThresholdSpec.$anonfun$new$4(WindowedThresholdSpec.scala:38)
[error] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
Can anyone please help me with this? I'm new to Scala. I have also seen the question "java.lang.NoClassDefFoundError: scala/Product$class", but it did not help me, so I am posting my problem here.
I tried it both ways, adding the jar directly to the project and pulling it from S3, but the error still occurs.
resolvers += "Java.net Maven2 Repository" at "https://repo1.maven.org/maven2/"
resolvers += "S3" at "https://s3-eu-west-1.amazonaws.com/com.abc/" // test jar file
NOTE: I have set the same Scala version ("2.12.6") in both applications, but the issue still exists.

SBT Assembly plugin Errors out

I have written the following sbt file
name := "Test"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies ++= Seq(
"org.apache.hadoop" % "hadoop-client" % "2.7.1",
"org.apache.spark" % "spark-core_2.10" % "1.3.0",
"org.apache.avro" % "avro" % "1.7.7",
"org.apache.avro" % "avro-mapred" % "1.7.7"
)
mainClass := Some("com.test.Foo")
I also have the following assembly.sbt file in the project folder:
resolvers += Resolver.url("bintray-sbt-plugins", url("http://dl.bintray.com/sbt/sbt-plugin-releases"))(Resolver.ivyStylePatterns)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.0")
When I run sbt assembly, I get a huge list of errors:
[error] (*:assembly) deduplicate: different file contents found in the following:
[error] /Users/abhishek.srivastava/.ivy2/cache/com.esotericsoftware.kryo/kryo/bundles/kryo-2.21.jar:com/esotericsoftware/minlog/Log$Logger.class
[error] /Users/abhishek.srivastava/.ivy2/cache/com.esotericsoftware.minlog/minlog/jars/minlog-1.2.jar:com/esotericsoftware/minlog/Log$Logger.class
[error] deduplicate: different file contents found in the following:
[error] /Users/abhishek.srivastava/.ivy2/cache/com.esotericsoftware.kryo/kryo/bundles/kryo-2.21.jar:com/esotericsoftware/minlog/Log.class
[error] /Users/abhishek.srivastava/.ivy2/cache/com.esotericsoftware.minlog/minlog/jars/minlog-1.2.jar:com/esotericsoftware/minlog/Log.class
[error] deduplicate: different file contents found in the following:
I was able to resolve the problem. There is actually no need to build a fat jar, because the spark-submit tool will have everything on the classpath anyway.
Thus the right way to build the jar file is:
name := "Test"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies ++= Seq(
"org.apache.hadoop" % "hadoop-client" % "2.7.1" % "provided",
"org.apache.spark" % "spark-core_2.10" % "1.3.0" % "provided",
"org.apache.avro" % "avro" % "1.7.7" % "provided",
"org.apache.avro" % "avro-mapred" % "1.7.7" % "provided"
)
mainClass := Some("com.test.Foo")
1. use the MergeStrategy, see sbt-assembly
2. exclude the duplicated jars, such as:
lazy val hbaseLibSeq = Seq(
("org.apache.hbase" % "hbase" % hbaseVersion).
excludeAll(
ExclusionRule(organization = "org.slf4j"),
ExclusionRule(organization = "org.mortbay.jetty"),
ExclusionRule(organization = "javax.servlet")),
("net.java.dev.jets3t" % "jets3t" % "0.6.1" % "provided").
excludeAll(ExclusionRule(organization = "javax.servlet"))
)
3. use the provided scope
show dependency tree:
➜ cat ~/.sbt/0.13/plugins/plugins.sbt
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.7.5")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
➜ cat ~/.sbt/0.13/global.sbt
net.virtualvoid.sbt.graph.Plugin.graphSettings
➜ sbt dependency-graph
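As a sketch of option 1, a merge strategy for the minlog classes that clash in the question's error output might look like the following. It uses the `assemblyMergeStrategy` key as documented for sbt-assembly 0.14.x (the version in the question's assembly.sbt). Choosing `MergeStrategy.first` for these classes is an assumption about which copy is acceptable, so check that kryo's bundled minlog and the standalone minlog jar are compatible before relying on it.

```scala
// build.sbt (sbt-assembly 0.14.x syntax)
assemblyMergeStrategy in assembly := {
  // kryo-2.21.jar bundles its own copy of minlog; keep whichever copy comes first (assumption)
  case PathList("com", "esotericsoftware", "minlog", xs @ _*) => MergeStrategy.first
  // everything else: defer to the plugin's default strategy
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

Falling through to the default strategy in the last case keeps the plugin's built-in handling of META-INF and similar files intact.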

sbt project is very slow to resolve dependencies

My sbt project takes more than 15 minutes when I do
sbt clean compile
I am on a beefy machine on AWS. I am fairly certain it's not a resource issue with CPU or internet bandwidth. Also, I have run this command a few times, so the Ivy cache is populated.
Here are all my build-related files:
/build.sbt
name := "ProjectX"
version := "1.0"
scalaVersion := "2.10.5"
libraryDependencies += ("org.apache.spark" %% "spark-streaming" % "1.4.1")
.exclude("org.slf4j", "slf4j-log4j12")
.exclude("log4j", "log4j")
.exclude("commons-logging", "commons-logging")
.%("provided")
libraryDependencies += ("org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.4.1")
.exclude("org.slf4j", "slf4j-log4j12")
.exclude("log4j", "log4j")
.exclude("commons-logging", "commons-logging")
libraryDependencies += "org.mongodb" %% "casbah" % "2.8.1"
//test
libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.4" % "test"
//logging
libraryDependencies ++= Seq(
//facade
"org.slf4j" % "slf4j-api" % "1.7.12",
"org.clapper" %% "grizzled-slf4j" % "1.0.2",
//jcl (used by aws sdks)
"org.slf4j" % "jcl-over-slf4j" % "1.7.12",
//log4j1 (spark)
"org.slf4j" % "log4j-over-slf4j" % "1.7.12",
//log4j2
"org.apache.logging.log4j" % "log4j-api" % "2.3",
"org.apache.logging.log4j" % "log4j-core" % "2.3",
"org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.3"
//alternative to log4j2
//"org.slf4j" % "slf4j-simple" % "1.7.5"
)
/project/build.properties
sbt.version = 0.13.8
/project/plugins.sbt
logLevel := Level.Warn
addSbtPlugin("org.scalastyle" %% "scalastyle-sbt-plugin" % "0.7.0")
resolvers += "sonatype-releases" at "https://oss.sonatype.org/content/repositories/releases/"
/project/assembly.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
In the log, do you see entries like:
[info] [SUCCESSFUL ] org.apache.spark#spark-streaming-kinesis-asl_2.10;1.4.1!spark-streaming-kinesis-asl_2.10.jar (239ms)
That's a sign that you're downloading these artifacts. In other words, the AMI you're launching doesn't have the Ivy cache populated.
Using sbt 0.13.12 on my laptop with SSD, I get about 5s for clean and then update.
so-31956971> update
[info] Updating {file:/xxx/so-31956971/}app...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[success] Total time: 5 s, completed Aug 25, 2016 4:00:00 AM
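If the log shows that the cache is already populated and resolution itself is the slow part, one option that is not part of the answer above is sbt's cached resolution, available from sbt 0.13.7. This is a minimal sketch, assuming the setting is added to the build.sbt from the question; whether it helps for this particular dependency graph is untested.

```scala
// build.sbt: reuse previously computed dependency graphs between runs (sbt 0.13.7+)
updateOptions := updateOptions.value.withCachedResolution(true)
```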

play-json breaks sbt build

Suddenly, as of today, my project has stopped compiling successfully. Upon further investigation, I found that the cause is the play-json library that I include in my dependencies.
Here's my build.sbt:
name := """project-name"""
version := "1.0"
scalaVersion := "2.10.2"
libraryDependencies ++= Seq(
"com.typesafe.akka" %% "akka-actor" % "2.2.1",
"com.typesafe.akka" %% "akka-testkit" % "2.2.1",
"org.scalatest" %% "scalatest" % "1.9.1" % "test",
"org.bouncycastle" % "bcprov-jdk16" % "1.46",
"com.sun.mail" % "javax.mail" % "1.5.1",
"com.typesafe.slick" %% "slick" % "2.0.1",
"org.postgresql" % "postgresql" % "9.3-1101-jdbc41",
"org.slf4j" % "slf4j-nop" % "1.6.4",
"com.drewnoakes" % "metadata-extractor" % "2.6.2",
"com.typesafe.play" %% "play-json" % "2.2.2"
)
resolvers += "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/"
If I create a new project in Activator with all the lines except "com.typesafe.play" %% "play-json" % "2.2.2", it compiles successfully. But once I add play-json, I get the following error:
[error] References to undefined settings:
[error]
[error] *:playCommonClassloader from echo:run
[error]
[error] docs:managedClasspath from echo:run
[error]
[error] *:playReloaderClassloader from echo:run
[error]
[error] echo:playVersion from echo:echoTracePlayVersion
[error]
[error] *:playRunHooks from echo:playRunHooks
[error] Did you mean echo:playRunHooks ?
[error]
And I keep getting this error even if I remove play-json line. Why is it so? What should I do to fix it?

sbt 0.11.1 doesn't retrieve scalatra 2.1.0-SNAPSHOT dependency

I've just upgraded to sbt 0.11.1, which doesn't seem to be fetching a certain dependency. Things worked fine before the upgrade.
I have this dependency:
"org.scalatra" %% "scalatra" % "2.1.0-SNAPSHOT",
And when I compile:
> update
[success] Total time: 0 s, completed Nov 18, 2011 5:44:16 PM
> compile
[info] Compiling 29 Scala sources and 1 Java source to /home/yang/pod/sales/scala/target/scala-2.9.1/classes...
[error] /home/yang/pod/sales/scala/src/main/scala/com/pod/Web.scala:125: not found: type ScalatraServlet
[error] class PodWeb extends ScalatraServlet with ScalateSupport with FileUploadSupport {
[error] ^
[error] class file needed by ScalateSupport is missing.
[error] reference type ScalatraKernel of package org.scalatra refers to nonexisting symbol.
[error] two errors found
[error] {file:/home/yang/pod/sales/scala/}pod/compile:compile: Compilation failed
[error] Total time: 10 s, completed Nov 18, 2011 5:44:45 PM
The file seems to be missing:
$ ls /home/yang/.ivy2/cache/org.scalatra/scalatra_2.9.1/jars/
scalatra_2.9.1-2.1.0-SNAPSHOT-sources.jar
The file exists in the repo, though:
https://oss.sonatype.org/content/repositories/snapshots/org/scalatra/scalatra_2.9.1/2.1.0-SNAPSHOT/
This is still happening even if I blow away ~/.ivy2/. Any hints what's happening?
Complete build.sbt below:
name := "pod"
version := "1.0"
scalaVersion := "2.9.1"
seq(coffeeSettings: _*)
seq(webSettings :_*)
seq(sbtprotobuf.ProtobufPlugin.protobufSettings: _*)
libraryDependencies ++= Seq(
"org.scalaquery" % "scalaquery_2.9.0" % "0.9.4",
"postgresql" % "postgresql" % "9.0-801.jdbc4", // % "runtime",
"com.jolbox" % "bonecp" % "0.7.1.RELEASE",
"ru.circumflex" % "circumflex-orm" % "2.1-SNAPSHOT",
"ru.circumflex" % "circumflex-core" % "2.1-SNAPSHOT",
"net.sf.ehcache" % "ehcache-core" % "2.4.3",
// snapshots needed for scala 2.9.0 support
"org.scalatra" %% "scalatra" % "2.1.0-SNAPSHOT",
"org.scalatra" %% "scalatra-scalate" % "2.1.0-SNAPSHOT",
"org.scalatra" %% "scalatra-fileupload" % "2.1.0-SNAPSHOT",
"org.fusesource.scalate" % "scalate-jruby" % "1.5.0",
"org.fusesource.scalamd" % "scalamd" % "1.5", // % runtime,
"org.mortbay.jetty" % "jetty" % "6.1.22",
"net.debasishg" % "sjson_2.9.0" % "0.12",
"com.lambdaworks" % "scrypt" % "1.2.0",
"org.mortbay.jetty" % "jetty" % "6.1.22" % "container",
// "org.bowlerframework" %% "core" % "0.4.1",
"net.sf.opencsv" % "opencsv" % "2.1",
"org.apache.commons" % "commons-math" % "2.2",
"org.apache.commons" % "commons-lang3" % "3.0",
"com.google.protobuf" % "protobuf-java" % "2.4.1",
"ch.qos.logback" % "logback-classic" % "0.9.29",
"org.scalatest" % "scalatest_2.9.0" % "1.6.1",
"com.h2database" % "h2" % "1.3.158",
"pentaho.weka" % "pdm-3.7-ce" % "SNAPSHOT",
// this line doesn't work due to sbt bug:
// https://github.com/harrah/xsbt/issues/263
// work around by manually downloading this into the lib/ directory
// "org.rosuda" % "jri" % "0.9-1" from "https://dev.partyondata.com/deps/jri-0.9-1.jar",
"net.java.dev.jna" % "jna" % "3.3.0",
"org.scalala" % "scalala_2.9.0" % "1.0.0.RC2-SNAPSHOT",
"rhino" % "js" % "1.7R2",
"junit" % "junit" % "4.9",
"org.apache.commons" % "commons-email" % "1.2",
"commons-validator" % "commons-validator" % "1.3.1",
"oro" % "oro" % "2.0.8", // validator depends on this
"javax.servlet" % "servlet-api" % "2.5" % "provided->default"
)
fork in run := true
javaOptions in run ++= Seq(
"-Xmx3G",
"-Djava.library.path=" + System.getenv("HOME") +
"/R/x86_64-pc-linux-gnu-library/2.13/rJava/jri:" +
"/usr/lib/R/site-library/rJava/jri"
)
//javaOptions in run ++= Seq(
// "-Dcom.sun.management.jmxremote",
// "-Dcom.sun.management.jmxremote.port=3000",
// "-Dcom.sun.management.jmxremote.authenticate=false",
// "-Dcom.sun.management.jmxremote.ssl=false"
//)
scalacOptions ++= Seq("-g:vars", "-deprecation", "-unchecked")
// needed for the scalatra snapshots
resolvers ++= Seq(
"POD" at "https://dev.partyondata.com/deps/",
"Scala-Tools Snapshots" at "http://scala-tools.org/repo-snapshots/",
"Sonatype OSS Snapshots" at "http://oss.sonatype.org/content/repositories/snapshots/",
"Sonatype OSS releases" at "http://oss.sonatype.org/content/repositories/releases",
"ScalaNLP" at "http://repo.scalanlp.org/repo",
"Pentaho" at "http://repo.pentaho.org/artifactory/pentaho/",
"FuseSource snapshots" at "http://repo.fusesource.com/nexus/content/repositories/snapshots",
"JBoss" at "https://repository.jboss.org/nexus/content/repositories/thirdparty-releases"
)
initialCommands in consoleQuick := """
import scalala.scalar._;
import scalala.tensor.::;
import scalala.tensor.mutable._;
import scalala.tensor.dense._;
import scalala.tensor.sparse._;
import scalala.library.Library._;
import scalala.library.LinearAlgebra._;
import scalala.library.Statistics._;
import scalala.library.Plotting._;
import scalala.operators.Implicits._;
//
import scala.collection.{mutable => mut}
import scala.collection.JavaConversions._
import ru.circumflex.orm._
import ru.circumflex.core._
"""
//
// sxr
//
// addCompilerPlugin("org.scala-tools.sxr" %% "sxr" % "0.2.7")
//
// scalacOptions <+= (scalaSource in Compile) { "-P:sxr:base-directory:" + _.getAbsolutePath }
After blowing away not just ~/.ivy2 but ~/.m2 and ~/.sbt as well, everything worked again.
Sometimes Ivy cache entries get corrupted. Simply remove ~/.ivy2/cache/org.scalatra/scalatra_2.9.1/jars/ and let sbt re-fetch the dependency from the remote repo. If that doesn't work, try removing the entire cache directory (~/.ivy2/cache).
I have had occasions where Ivy has got confused. I can't tell you why, unfortunately, but I have found that things work fine after deleting the entire ~/.ivy2 directory hierarchy. Clearly you'll have to download all your dependencies again, though :-(