Scala module 2.10.0 requires Jackson Databind version >= 2.10.0 and < 2.11.0 - scala

I have an sbt project and I want to write a test using ScalaTest and a shared Spark session. A few weeks ago my project started failing with this error:
java.lang.ExceptionInInitializerError
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
.....
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.0 requires Jackson Databind version >= 2.10.0 and < 2.11.0
at com.fasterxml.jackson.module.scala.JacksonModule.setupModule(JacksonModule.scala:61)
at com.fasterxml.jackson.module.scala.JacksonModule.setupModule$(JacksonModule.scala:46)
Here is a very simple test:
import org.apache.spark.sql.QueryTest.checkAnswer
import org.apache.spark.sql.Row
import org.apache.spark.sql.test.SharedSparkSession

class SparkTestSpec extends SharedSparkSession {
  import testImplicits._

  test("join - join using") {
    val df = Seq(1, 2, 3).toDF("int")
    checkAnswer(df, Row(1) :: Row(2) :: Row(3) :: Nil)
  }
}
And the sbt config:
ThisBuild / scalaVersion := "2.12.10"

val sparkVersion = "3.1.0"
val scalaTestVersion = "3.2.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion % Test,
  "org.apache.spark" %% "spark-sql" % sparkVersion % Test classifier "tests",
  "org.apache.spark" %% "spark-catalyst" % sparkVersion % Test,
  "org.apache.spark" %% "spark-catalyst" % sparkVersion % Test classifier "tests",
  "org.apache.spark" %% "spark-hive" % sparkVersion % Test,
  "org.apache.spark" %% "spark-hive" % sparkVersion % Test classifier "tests",
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-core" % sparkVersion % Test classifier "tests",
  "log4j" % "log4j" % "1.2.17",
  "org.slf4j" % "slf4j-log4j12" % "1.7.30",
  "org.scalatest" %% "scalatest" % scalaTestVersion % Test,
  "org.scalatestplus" %% "scalacheck-1-14" % "3.2.2.0"
)

This is a very classic issue with Jackson. The error tells you that all of your dependencies must agree on a single Jackson version, and in your build they don't: usually Spark and another library each pull in Jackson transitively, in different versions.
What you need to do is:
1. Run sbt dependencyTree to identify which libraries are pulling in Jackson and in which versions.
2. Define dependencyOverrides to force the same Jackson version for all Jackson modules (which version you pick depends on compatibility with the other libraries that need it); see the sketch below.
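For example, a minimal sketch of such an override, assuming you choose to align everything on Jackson 2.10.0 (the version the Scala module in the error message expects):

dependencyOverrides ++= Seq(
  // force a single Jackson version across all transitive dependencies
  "com.fasterxml.jackson.core" % "jackson-core" % "2.10.0",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.10.0",
  "com.fasterxml.jackson.core" % "jackson-annotations" % "2.10.0",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.10.0"
)

After adding the override, rerun sbt dependencyTree (or sbt evicted) to confirm that only one Jackson version ends up on the classpath.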

Related

sbt package not adding dependencies

I am trying to build a jar using sbt package.
build.sbt:
name := "Simple Project"
version := "0.1"
scalaVersion := "2.11.8"
val sparkVersion = "2.3.2"
val connectorVersion = "2.3.0"
val cassandraVersion = "3.11"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
"org.scalaj" %% "scalaj-http" % "2.4.2",
"com.datastax.spark" %% "spark-cassandra-connector" % connectorVersion
)
The sbt package runs successfully but does not add spark-cassandra-connector and scalaj-http to the final jar created.
Do I need to add anything?
If you want the jar to contain all your dependencies, you have to use the sbt-assembly plugin:
https://github.com/sbt/sbt-assembly
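As a rough sketch (the plugin version below is only an example; check the sbt-assembly page for the current one), add the plugin in project/plugins.sbt:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

and then build the fat jar with sbt assembly instead of sbt package. Dependencies marked "provided" (the Spark artifacts) are left out of the assembled jar, while spark-cassandra-connector and scalaj-http get bundled.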

Getting error while running Scala Spark code to list blobs in storage

I am getting the error below while trying to list blobs using the google-cloud-storage library:
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
I have tried changing the version of the google-cloud-storage library in build.sbt, but I keep getting the same error.
import com.google.auth.oauth2.GoogleCredentials
import com.google.cloud.storage._
import com.google.cloud.storage.Storage.BlobListOption

val credentials: GoogleCredentials = GoogleCredentials.getApplicationDefault()
val storage: Storage = StorageOptions.newBuilder()
  .setCredentials(credentials)
  .setProjectId(projectId)
  .build()
  .getService()
val blobs = storage.list(bucketName, BlobListOption.currentDirectory(), BlobListOption.prefix(path))
My build.sbt looks like this:
version := "0.1"
scalaVersion := "2.11.8"
logBuffered in Test := false
libraryDependencies ++=
Seq(
"org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
"org.apache.spark" %% "spark-sql" % "2.2.0" % "provided",
"org.scalatest" %% "scalatest" % "3.0.0" % Test,
"com.typesafe" % "config" % "1.3.1",
"org.scalaj" %% "scalaj-http" % "2.4.0",
"com.google.cloud" % "google-cloud-storage" % "1.78.0"
)
Please, help me.
This is happening because Spark uses an older version of the Guava library than google-cloud-storage does, and that older version doesn't have the Preconditions.checkArgument method. This leads to the java.lang.NoSuchMethodError exception.
You can find a more detailed answer and instructions on how to fix this issue here.
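One common workaround (a sketch, not the only option) is to build a fat jar with sbt-assembly and shade Guava, so the newer version needed by google-cloud-storage doesn't collide with the older one on Spark's classpath:

// build.sbt (requires the sbt-assembly plugin)
assemblyShadeRules in assembly := Seq(
  // relocate the Guava classes bundled into your jar so they don't
  // clash with the Guava version that ships with Spark
  ShadeRule.rename("com.google.common.**" -> "repackaged.com.google.common.@1").inAll
)

With the rename in place, your google-cloud-storage calls resolve to the relocated, newer Guava inside the fat jar, while Spark keeps using its own copy.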

How to import CrossValidatorModel using sbt for Scala Spark

I have a problem with CrossValidatorModel using Scala and sbt.
These are my dependencies:
libraryDependencies ++= Seq(
  // spark
  "org.apache.spark" %% "spark-core" % "2.3.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.3.1" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.3.1" % "provided",
  // protobuf
  "com.thesamet.scalapb" %% "scalapb-runtime" % scalapbVersion % "protobuf",
  // for grpc
  "io.grpc" % "grpc-netty" % grpcJavaVersion,
  "com.thesamet.scalapb" %% "scalapb-runtime-grpc" % scalapbVersion
)
And this is my code:
import org.apache.spark.ml.tuning.CrossValidatorModel
But when I use CrossValidatorModel, it gives me an error like this:
(grpc-default-executor-0) java.lang.NoClassDefFoundError: org/apache/spark/ml/tuning/CrossValidatorModel$
java.lang.NoClassDefFoundError: org/apache/spark/ml/tuning/CrossValidatorModel$
at MlModel$lr_model$.<init>(server.scala:28)
at MlModel$lr_model$.<clinit>(server.scala)
at HelloWorldServer$RouteGuideImpl.getLabel(server.scala:77)
I tried Scala 2.11 and 2.12, and Spark 2.3.1 and 2.1.0, but I get the same error.
Thanks.
Solved by changing this:
"org.apache.spark" %% "spark-mllib" % "2.3.1" % "provided",
to
"org.apache.spark" %% "spark-mllib" % "2.3.1",

Spark dependencies configuration for streaming from Twitter

I am trying to run a Spark application with Twitter streaming. However, I am constantly experiencing problems with dependencies.
When I use the org.apache.bahir spark-streaming-twitter dependency I get this error:
module not found: org.apache.bahir#spark-streaming-twitter;2.0.0
Here is the corresponding build.sbt file:
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
"org.apache.bahir" %% "spark-streaming-twitter" % "2.0.0",
"org.apache.spark" %% "spark-core" % "2.3.0",
"org.apache.spark" % "spark-streaming_2.11" % "2.3.0",
"com.typesafe" % "config" % "1.3.0",
"org.twitter4j" % "twitter4j-stream" % "4.0.6"
)
But when I use the older streaming dependency I get a ClassNotFoundException: org.apache.spark.Logging error.
Here is the corresponding build.sbt:
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.3.0",
"org.apache.spark" % "spark-streaming_2.11" % "2.3.0",
"com.typesafe" % "config" % "1.3.0",
"org.twitter4j" % "twitter4j-stream" % "4.0.6",
"org.apache.spark" %% "spark-streaming-twitter" % "1.6.3"
)
In order to run my application, I run the sbt clean and package commands.
So which dependencies should I use, and how should I configure them to run my application?
The Twitter backend was removed from Spark in the 2.0 release, and the bahir version you declared doesn't match your Spark version. Also, bahir's Twitter module already pulls in the twitter4j-stream dependency (4.0.4 at the moment). Use:
val sparkVersion = "2.3.0"
libraryDependencies ++= Seq(
"org.apache.bahir" %% "spark-streaming-twitter" % sparkVersion,
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion
)

How to set up a spark build.sbt file?

I have been trying all day and cannot figure out how to make it work.
So I have a common library that will be my core lib for Spark.
My build.sbt file is not working:
name := "CommonLib"
version := "0.1"
scalaVersion := "2.12.5"
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
// resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
// resolvers += Resolver.sonatypeRepo("public")
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
  "org.apache.spark" % "spark-sql_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
  "org.apache.hadoop" % "hadoop-common" % "2.7.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
  // "org.apache.spark" % "spark-sql_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
  "org.apache.spark" % "spark-hive_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
  "org.apache.spark" % "spark-yarn_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
  "com.github.scopt" %% "scopt" % "3.7.0"
)
//addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.6")
//libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
//libraryDependencies ++= {
// val sparkVer = "2.1.0"
// Seq(
// "org.apache.spark" %% "spark-core" % sparkVer % "provided" withSources()
// )
//}
All the commented-out lines are the things I have already tried, and I don't know what to do anymore.
My goal is to have Spark 2.3 working and to have scopt available too.
For my sbt version, I have 1.1.1 installed.
Thank you.
I think I had two main issues.
Spark is not compatible with Scala 2.12 yet, so moving to 2.11.12 solved one issue.
The second issue is that for the IntelliJ sbt console to reload build.sbt, you either need to kill and restart the console or use the reload command, which I didn't know, so I was not actually using the latest build.sbt file.
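For reference, a minimal build.sbt along those lines (a sketch; the Spark 2.3.x patch version and the provided scope are assumptions you may want to adjust) could look like this:

name := "CommonLib"
version := "0.1"
scalaVersion := "2.11.12"

val sparkVersion = "2.3.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
  "com.github.scopt" %% "scopt" % "3.7.0"
)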
There's a Giter8 template that should work nicely:
https://github.com/holdenk/sparkProjectTemplate.g8
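To bootstrap a project from the template (a quick usage note, assuming an sbt launcher recent enough to support the new command):

sbt new holdenk/sparkProjectTemplate.g8

and then follow the prompts for the project name and versions.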