I have the following build.sbt file:
name := "myProject"
version := "1.0"
scalaVersion := "2.11.8"
javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
dependencyOverrides ++= Set(
"com.fasterxml.jackson.core" % "jackson-core" % "2.8.1"
)
// additional libraries
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
"org.apache.spark" %% "spark-sql" % "2.0.0" % "provided",
"org.apache.spark" %% "spark-hive" % "2.0.0" % "provided",
"com.databricks" %% "spark-csv" % "1.4.0",
"org.scalactic" %% "scalactic" % "2.2.1",
"org.scalatest" %% "scalatest" % "2.2.1" % "test",
"org.scalacheck" %% "scalacheck" % "1.12.4",
"com.holdenkarau" %% "spark-testing-base" % "2.0.0_0.4.4" % "test",
)
However, when I am running the code, I get this error:
An exception or error caused a run to abort.
java.lang.ExceptionInInitializerError
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Jackson version is too old 2.4.4
at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:56)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:549)
at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
... 58 more
Why is this the case?
I've added a newer version of Jackson to dependencyOverrides(after looking here Spark Parallelize? (Could not find creator property with name 'id')), so an older version shouldn't be used.
jackson-core and jackson-databind versions should match (at least up to the minor version, I believe).
So remove the dependencyOverrides and have
libraryDependencies ++= Seq(
...
"com.fasterxml.jackson.core" % "jackson-databind" % "2.8.1"
)
Or specify both in dependencyOverrides
dependencyOverrides ++= Set(
"com.fasterxml.jackson.core" % "jackson-core" % "2.8.1"
"com.fasterxml.jackson.core" % "jackson-databind" % "2.8.1"
)
Though I'm not sure I understand what you are trying to do; the linked question seems to say that you should used an older version (2.4.4).
Related
I have an sbt project that I am trying to build into a jar with the sbt-assembly plugin.
build.sbt:
name := "project-name"
version := "0.1"
scalaVersion := "2.11.12"
val sparkVersion = "2.4.0"
libraryDependencies ++= Seq(
"org.scalatest" %% "scalatest" % "3.0.5" % "test",
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "test",
// spark-hive dependencies for DataFrameSuiteBase. https://github.com/holdenk/spark-testing-base/issues/143
"org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
"com.amazonaws" % "aws-java-sdk" % "1.11.513" % "provided",
"com.amazonaws" % "aws-java-sdk-sqs" % "1.11.513" % "provided",
"com.amazonaws" % "aws-java-sdk-s3" % "1.11.513" % "provided",
//"org.apache.hadoop" % "hadoop-aws" % "3.1.1"
"org.json" % "json" % "20180813"
)
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs # _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
test in assembly := {}
// https://github.com/holdenk/spark-testing-base
fork in Test := true
javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
parallelExecution in Test := false
When I build the project with sbt assembly, the resulting jar contains /org/junit/... and /org/opentest4j/... files
Is there any way to not include these test related files in the final jar?
I have tried replacing the line:
"org.scalatest" %% "scalatest" % "3.0.5" % "test"
with:
"org.scalatest" %% "scalatest" % "3.0.5" % "provided"
I am also wondering how the files are included in the jar as junit is not referenced inside build.sbt (there are junit tests in the project however)?
Updated:
name := "project-name"
version := "0.1"
scalaVersion := "2.11.12"
val sparkVersion = "2.4.0"
val excludeJUnitBinding = ExclusionRule(organization = "junit")
libraryDependencies ++= Seq(
// Provided
"org.apache.spark" %% "spark-core" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
"com.amazonaws" % "aws-java-sdk" % "1.11.513" % "provided",
"com.amazonaws" % "aws-java-sdk-sqs" % "1.11.513" % "provided",
"com.amazonaws" % "aws-java-sdk-s3" % "1.11.513" % "provided",
// Test
"org.scalatest" %% "scalatest" % "3.0.5" % "test",
// Necessary
"org.json" % "json" % "20180813"
)
excludeDependencies += excludeJUnitBinding
// https://stackoverflow.com/questions/25144484/sbt-assembly-deduplication-found-error
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs # _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
// https://github.com/holdenk/spark-testing-base
fork in Test := true
javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
parallelExecution in Test := false
To exclude certain transitive dependencies of a dependency, use the excludeAll or exclude methods.
The exclude method should be used when a pom will be published for the project. It requires the organization and module name to exclude.
For example:
libraryDependencies +=
"log4j" % "log4j" % "1.2.15" exclude("javax.jms", "jms")
The excludeAll method is more flexible, but because it cannot be represented in a pom.xml, it should only be used when a pom doesn’t need to be generated.
For example,
libraryDependencies +=
"log4j" % "log4j" % "1.2.15" excludeAll(
ExclusionRule(organization = "com.sun.jdmk"),
ExclusionRule(organization = "com.sun.jmx"),
ExclusionRule(organization = "javax.jms")
)
In certain cases a transitive dependency should be excluded from all dependencies. This can be achieved by setting up ExclusionRules in excludeDependencies(For sbt 0.13.8 and above).
excludeDependencies ++= Seq(
ExclusionRule("commons-logging", "commons-logging")
)
JUnit jar file downloads as part of below dependencies.
"org.apache.spark" %% "spark-core" % sparkVersion % "provided" //(junit)
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided"// (junit)
"com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "test" //(org.junit)
To exclude junit file please update your dependency as below.
val excludeJUnitBinding = ExclusionRule(organization = "junit")
"org.scalatest" %% "scalatest" % "3.0.5" % "test",
"org.apache.spark" %% "spark-core" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "test" excludeAll(excludeJUnitBinding)
Update:
Please update your build.abt as below.
resolvers += Resolver.url("bintray-sbt-plugins",
url("https://dl.bintray.com/eed3si9n/sbt-plugins/"))(Resolver.ivyStylePatterns)
val excludeJUnitBinding = ExclusionRule(organization = "junit")
libraryDependencies ++= Seq(
// Provided
"org.apache.spark" %% "spark-core" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "provided" excludeAll(excludeJUnitBinding),
"org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
//"com.amazonaws" % "aws-java-sdk" % "1.11.513" % "provided",
//"com.amazonaws" % "aws-java-sdk-sqs" % "1.11.513" % "provided",
//"com.amazonaws" % "aws-java-sdk-s3" % "1.11.513" % "provided",
// Test
"org.scalatest" %% "scalatest" % "3.0.5" % "test",
// Necessary
"org.json" % "json" % "20180813"
)
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs # _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
fork in Test := true
javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
parallelExecution in Test := false
plugin.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
I have tried and it's not downloading junit jar file.
I am trying to run a Spark application with a Twitter streaming. However, I constantly experiencing problems with dependencies.
When I use org.apache.bahir spark-streaming-twitter dependency I get such an error:
module not found: org.apache.bahir#spark-streaming-twitter;2.0.0
Here is the corresponding build.sbt file:
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
"org.apache.bahir" %% "spark-streaming-twitter" % "2.0.0",
"org.apache.spark" %% "spark-core" % "2.3.0",
"org.apache.spark" % "spark-streaming_2.11" % "2.3.0",
"com.typesafe" % "config" % "1.3.0",
"org.twitter4j" % "twitter4j-stream" % "4.0.6"
)
But when I use older streaming dependency I get ClassNotFoundException: : org.apache.spark.Logging error.
Here is the corresponding build.sbt:
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.3.0",
"org.apache.spark" % "spark-streaming_2.11" % "2.3.0",
"com.typesafe" % "config" % "1.3.0",
"org.twitter4j" % "twitter4j-stream" % "4.0.6",
"org.apache.spark" %% "spark-streaming-twitter" % "1.6.3"
)
In order to run my application, I run sbt clean and package commands.
So what dependencies should I use and how to configure them to run my application?
Twitter backend has been removed from Spark with 2.0 release and version of bahir you declared doesn't match Spark version. Finally bahir Twitter already comes with twitter4j-stream dependency (4.0.4 at this moment). Use:
val sparkVersion = "2.3.0"
libraryDependencies ++= Seq(
"org.apache.bahir" %% "spark-streaming-twitter" % sparkVersion,
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion
)
I have been trying all day and cannot figure out how to make it work.
So I have a common library that will be my core lib for spark.
My build.sbt file is not working:
name := "CommonLib"
version := "0.1"
scalaVersion := "2.12.5"
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
// resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
// resolvers += Resolver.sonatypeRepo("public")
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
"org.apache.spark" % "spark-sql_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
"org.apache.hadoop" % "hadoop-common" % "2.7.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
// "org.apache.spark" % "spark-sql_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
"org.apache.spark" % "spark-hive_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
"org.apache.spark" % "spark-yarn_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
"com.github.scopt" %% "scopt" % "3.7.0"
)
//addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.6")
//libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
//libraryDependencies ++= {
// val sparkVer = "2.1.0"
// Seq(
// "org.apache.spark" %% "spark-core" % sparkVer % "provided" withSources()
// )
//}
All the commented out are all the test I've done and I don't know what to do anymore.
My goal is to have spark 2.3 to work and to have scope available too.
For my sbt version, I have 1.1.1 installed.
Thank you.
I think I had two main issues.
Spark is not compatible with scala 2.12 yet. So moving to 2.11.12 solved one issue
The second issue is that for intelliJ SBT console to reload the build.sbt you either need to kill and restart the console or use the reload command which I didnt know so I was not actually using the latest build.sbt file.
There's a Giter8 template that should work nicely:
https://github.com/holdenk/sparkProjectTemplate.g8
I'm using this template to develop a microservice:
http://www.typesafe.com/activator/template/activator-service-container-tutorial
My sbt file is like this:
import sbt._
import Keys._
name := "activator-service-container-tutorial"
version := "1.0.1"
scalaVersion := "2.11.6"
crossScalaVersions := Seq("2.10.5", "2.11.6")
resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases"
libraryDependencies ++= {
val containerVersion = "1.0.1"
val configVersion = "1.2.1"
val akkaVersion = "2.3.9"
val liftVersion = "2.6.2"
val sprayVersion = "1.3.3"
Seq(
"com.github.vonnagy" %% "service-container" % containerVersion,
"com.github.vonnagy" %% "service-container-metrics-reporting" % containerVersion,
"com.typesafe" % "config" % configVersion,
"com.typesafe.akka" %% "akka-actor" % akkaVersion exclude ("org.scala-lang" , "scala-library"),
"com.typesafe.akka" %% "akka-slf4j" % akkaVersion exclude ("org.slf4j", "slf4j-api") exclude ("org.scala-lang" , "scala-library"),
"ch.qos.logback" % "logback-classic" % "1.1.3",
"io.spray" %% "spray-can" % sprayVersion,
"io.spray" %% "spray-routing" % sprayVersion,
"net.liftweb" %% "lift-json" % liftVersion,
"com.typesafe.akka" %% "akka-testkit" % akkaVersion % "test",
"io.spray" %% "spray-testkit" % sprayVersion % "test",
"junit" % "junit" % "4.12" % "test",
"org.scalaz.stream" %% "scalaz-stream" % "0.7a" % "test",
"org.specs2" %% "specs2-core" % "3.5" % "test",
"org.specs2" %% "specs2-mock" % "3.5" % "test",
"com.twitter" %% "finagle-http" % "6.25.0",
"com.twitter" %% "bijection-util" % "0.7.2"
)
}
scalacOptions ++= Seq(
"-unchecked",
"-deprecation",
"-Xlint",
"-Ywarn-dead-code",
"-language:_",
"-target:jvm-1.7",
"-encoding", "UTF-8"
)
crossPaths := false
parallelExecution in Test := false
assemblyJarName in assembly := "santo.jar"
mainClass in assembly := Some("Service")
The project compiles fine!
But when I run assembly, the terminal show me this:
[error] (*:assembly) deduplicate: different file contents found in the following:
[error] /path/.ivy2/cache/io.dropwizard.metrics/metrics-core/bundles/metrics-core-3.1.1.jar:com/codahale/metrics/ConsoleReporter$1.class
[error] /path/.ivy2/cache/com.codahale.metrics/metrics-core/bundles/metrics-core-3.0.1.jar:com/codahale/metrics/ConsoleReporter$1.class
What options do I have to fix it?
Thanks
The issue as it seems transitive dependency of the dependency is resulting with two different versions of metrics-core. The best thing to do would be to used the right library dependency so that you end up with a single version of this library. Please use https://github.com/jrudolph/sbt-dependency-graph , if it is difficult to figure out dependencies.
If it is not possible to get to a single version then you would most likely to go down exclude route . I assume, this only work, if there is compatibility between the all required versions.
This question is partially related to sbt-scalariform plugin - can't resolve settings. I managed to run scalariform from command line as SBT task.
Now the problem is with IDEA. When I open my build.sbt, which looks like this:
import scalariform.formatter.preferences._
name := """scheduling-backend"""
version := "1.0"
scalaVersion := "2.10.2"
resolvers += "spray repo" at "http://repo.spray.io"
resolvers += "spray nightlies" at "http://nightlies.spray.io"
resolvers += "SpringSource Milestone Repository" at "http://repo.springsource.org/milestone"
resolvers += "Neo4j Cypher DSL Repository" at "http://m2.neo4j.org/content/repositories/releases"
libraryDependencies ++= Seq(
"com.typesafe.akka" %% "akka-actor" % "2.3.0",
"com.typesafe.akka" %% "akka-slf4j" % "2.3.0",
"com.typesafe.akka" %% "akka-testkit" % "2.3.0" % "test",
"com.typesafe.akka" %% "akka-persistence-experimental" % "2.3.0",
"io.spray" % "spray-can" % "1.3.0",
"io.spray" % "spray-routing" % "1.3.0",
"io.spray" % "spray-testkit" % "1.3.0" % "test",
"io.spray" %% "spray-json" % "1.2.5",
"ch.qos.logback" % "logback-classic" % "1.0.13",
"org.specs2" %% "specs2" % "1.14" % "test",
"org.springframework.scala" % "spring-scala" % "1.0.0.M2",
"org.springframework.data" % "spring-data-neo4j" % "3.0.0.RELEASE",
"org.springframework.data" % "spring-data-neo4j-rest" % "3.0.0.RELEASE",
"javax.validation" % "validation-api" % "1.1.0.Final",
"com.github.nscala-time" %% "nscala-time" % "0.8.0",
"org.neo4j" % "neo4j-kernel" % "2.0.1" % "test" classifier "tests",
"com.sun.jersey" % "jersey-core" % "1.9",
"org.mockito" % "mockito-all" % "1.9.5"
)
scalacOptions ++= Seq(
"-unchecked",
"-deprecation",
"-Xlint",
"-Ywarn-dead-code",
"-language:_",
"-target:jvm-1.7",
"-encoding", "UTF-8"
)
org.scalastyle.sbt.ScalastylePlugin.Settings
scalariformSettings
ScalariformKeys.preferences := ScalariformKeys.preferences.value
.setPreference(AlignParameters, true)
.setPreference(CompactControlReadability, true)
IDEA reports problems with my file.
I am getting Cannot resolve symbol scalariformSettings and Cannot resolve symbol ScalariformKeyseven if I everything works from terminal.
adding import com.typesafe.sbt.SbtScalariform._ to build.sbt seems to fix the error on 13.1.1 with scala plugin 0.33.403, but I have to admit it ignored the import at first and then randomly started to see it.