I'm trying to build a "hello world"-esque app that uses Spark streaming to stream data from a Kafka broker (this works), filters/processes this data, and pushes it to a (local) web browser using the Scalatra web framework and its supported web sockets functionality from Atmosphere. The Kafka/Spark chunk works independently, and the Scalatra/Atmosphere chunk also works independently. It's when I try to bring the two halves together that I run into issues with library dependencies.
The real question: how do I go about selecting library versions for which Spark will play nice with Scalatra?
A bare bones Scalatra/Atmosphere app works fine as follows:
organization := "com.example"
name := "example app"
version := "0.1.0"
scalaVersion := "2.12.2"
val ScalatraVersion = "2.5.+"
libraryDependencies ++= Seq(
"org.json4s" %% "json4s-jackson" % "3.5.2",
"org.scalatra" %% "scalatra" % ScalatraVersion,
"org.scalatra" %% "scalatra-scalate" % ScalatraVersion,
"org.scalatra" %% "scalatra-specs2" % ScalatraVersion % "test",
"org.scalatra" %% "scalatra-atmosphere" % ScalatraVersion,
"org.eclipse.jetty" % "jetty-webapp" % "9.4.6.v20170531" % "provided",
"javax.servlet" % "javax.servlet-api" % "3.1.0" % "provided"
)
enablePlugins(JettyPlugin)
But if I add new dependencies for Spark and Spark streaming, and knock the Scala version down to 2.11 (required for Spark-Kafka streaming):
organization := "com.example"
name := "example app"
version := "0.1.0"
scalaVersion := "2.11.8"
val ScalatraVersion = "2.5.+"
val SparkVersion = "2.2.0"
libraryDependencies ++= Seq(
"org.json4s" %% "json4s-jackson" % "3.5.2",
"org.scalatra" %% "scalatra" % ScalatraVersion,
"org.scalatra" %% "scalatra-scalate" % ScalatraVersion,
"org.scalatra" %% "scalatra-specs2" % ScalatraVersion % "test",
"org.scalatra" %% "scalatra-atmosphere" % ScalatraVersion,
"org.eclipse.jetty" % "jetty-webapp" % "9.4.6.v20170531" % "provided",
"javax.servlet" % "javax.servlet-api" % "3.1.0" % "provided"
)
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % SparkVersion,
"org.apache.spark" %% "spark-streaming" % SparkVersion,
"org.apache.spark" %% "spark-streaming-kafka-0-8" % SparkVersion
)
enablePlugins(JettyPlugin)
The code compiles, but I get SBT's eviction warning:
[warn] There may be incompatibilities among your library dependencies.
[warn] Here are some of the libraries that were evicted:
[warn] * org.json4s:json4s-jackson_2.11:3.2.11 -> 3.5.3
[warn] Run 'evicted' to see detailed eviction warnings
Then finally, when Jetty tries to run the web server, it fails with this error:
WARN:oejuc.AbstractLifeCycle:main: FAILED org.eclipse.jetty.annotations.ServletContainerInitializersStarter#53fb3dab: java.lang.NoClassDefFoundError: com/sun/jersey/spi/inject/InjectableProvider
java.lang.NoClassDefFoundError: com/sun/jersey/spi/inject/InjectableProvider
How do I get to the bottom of this? I'm new to the Scala world, and the intricacies of dependencies are blowing my mind.
One way to remove the eviction warning is to declare the required version explicitly using dependencyOverrides.
Try adding the following to your SBT file and rebuilding the application:
dependencyOverrides += "org.json4s" % "json4s-jackson_2.11" % "3.5.3"
See the SBT documentation on dependencyOverrides for details.
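As a minimal sketch, the same override can be written with %% so that the Scala suffix follows scalaVersion instead of being hardcoded. It assumes 3.5.3 (the version named in the eviction warning) is the one you want to keep. Note that an override only picks a version; it does not make libraries built against 3.2.11 binary compatible with 3.5.3, so runtime errors are still possible.

```scala
// build.sbt (sketch): force a single json4s-jackson version for all transitive users.
// Assumption: 3.5.3 is the version to keep, taken from the eviction warning in the question.
dependencyOverrides += "org.json4s" %% "json4s-jackson" % "3.5.3"
```

Running evicted in the sbt shell, as the warning itself suggests, lists which other libraries were evicted and by which versions.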
I just started using Spark 2.2 on HDP 2.6 and I am facing issues when trying to run sbt compile.
Error
[info] Updated file /home/maria_dev/structuredstreaming/project/build.properties: set sbt.version to 1.3.0
[info] Loading project definition from /home/maria_dev/structuredstreaming/project
[info] Fetching artifacts of
[info] Fetched artifacts of
[error] lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
[error] https://repo1.maven.org/maven2/com/squareup/okhttp3/okhttp-urlconnection/3.7.0/okhttp-urlconnection-3.7.0.jar: download error: Caught java.net.UnknownHostException: repo1.maven.org (repo1.maven.org) while downloading https://repo1.maven.org/maven2/com/squareup/okhttp3/okhttp-urlconnection/3.7.0/okhttp-urlconnection-3.7.0.jar
The build.sbt file is as below:
scalaVersion := "2.11.8"
resolvers ++= Seq(
"Conjars" at "http://conjars.org/repo",
"Hortonworks Releases" at "http://repo.hortonworks.com/content/groups/public"
)
publishMavenStyle := true
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.2.0.2.6.3.0-235",
"org.apache.spark" %% "spark-sql" % "2.2.0.2.6.3.0-235",
"org.apache.phoenix" % "phoenix-spark2" % "4.7.0.2.6.3.0-235",
"org.apache.phoenix" % "phoenix-core" % "4.7.0.2.6.3.0-235",
"org.apache.kafka" % "kafka-clients" % "0.10.1.2.6.3.0-235",
"org.apache.spark" %% "spark-streaming" % "2.0.2" % "provided",
"org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.2",
"org.apache.spark" %% "spark-sql-kafka-0-10" % "2.0.2" % "provided",
"com.typesafe" % "config" % "1.3.1",
"com.typesafe.play" %% "play-json" % "2.7.2",
"com.solarmosaic.client" %% "mail-client" % "0.1.0",
"org.json4s" %% "json4s-jackson" % "3.2.10",
"org.apache.logging.log4j" % "log4j-api-scala_2.11" % "11.0",
"com.databricks" %% "spark-avro" % "3.2.0",
"org.elasticsearch" %% "elasticsearch-spark-20" % "5.0.0-alpha5",
"io.spray" %% "spray-json" % "1.3.3"
)
retrieveManaged := true
fork in run := true
It looks like Coursier is attempting to fetch dependencies from repo1.maven.org, which is being blocked. The Scala Metals people have an explanation here. Basically, you have to set a global Coursier config pointing to your corporate proxy server by creating a mirror.properties file that looks like this:
central.from=https://repo1.maven.org/maven2
central.to=http://mycorporaterepo.com:8080/nexus/content/groups/public
Depending on your OS, the file location is:
Windows: C:\Users\\AppData\Roaming\Coursier\config\mirror.properties
Linux: ~/.config/coursier/mirror.properties
MacOS: ~/Library/Preferences/Coursier/mirror.properties
You might also need to set up SBT to use a proxy for downloading dependencies. For that, you will need to edit this file:
~/.sbt/repositories
Set it to the following:
[repositories]
local
maven-central: http://mycorporaterepo.com:8080/nexus/content/groups/public
The combination of those two settings should take care of everything you need to do to point SBT to the correct places.
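If configuring the mirror is not an option, sbt 1.3 (the version shown in the log) also lets a build opt out of Coursier and fall back to the Ivy-based resolution used by earlier sbt versions. This is only a sketch of that opt-out; it does not by itself make repo1.maven.org reachable, so it assumes a proxy repository is configured as described above.

```scala
// build.sbt (sketch): opt out of Coursier so dependency resolution goes through Ivy.
// Assumption: sbt 1.3.x, where the useCoursier setting was introduced.
ThisBuild / useCoursier := false
```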
I'm trying to build a Scalatra application that runs code with Spark. I can actually build the fat jar with sbt-assembly and the endpoints work, but when running tests with org.scalatra.test.scalatest._ I get the following error:
*** RUN ABORTED ***
java.lang.NoSuchFieldError: INSTANCE
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:146)
at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:964)
at org.scalatra.test.HttpComponentsClient$class.createClient(HttpComponentsClient.scala:100)
at my.package.MyServletTests.createClient(MyServletTests.scala:5)
at org.scalatra.test.HttpComponentsClient$class.submit(HttpComponentsClient.scala:63)
at my.package.MyServletTests.submit(MyServletTests.scala:5)
at org.scalatra.test.Client$class.post(Client.scala:62)
at my.package.MyServletTests.post(MyServletTests.scala:5)
at org.scalatra.test.Client$class.post(Client.scala:60)
at my.package.MyServletTests.post(MyServletTests.scala:5)
...
From other sources, this seems to be an httpclient version conflict, since Scalatra and Spark depend on different versions. These sources suggested using the Maven Shade Plugin to rename one of these versions. I am, however, using sbt instead of Maven. Even though sbt has shading functionality, it only applies when creating the fat jar, and I need a solution that also works during development tests.
Sources are:
Conflict between httpclient version and Apache Spark
Error while trying to use an API. java.lang.NoSuchFieldError: INSTANCE
Is there any way to resolve this kind of conflict using SBT? I'm running Eclipse Scala-IDE and these are my dependencies in build.sbt:
val scalatraVersion = "2.6.5"
// scalatra
libraryDependencies ++= Seq(
"org.scalatra" %% "scalatra" % scalatraVersion,
"org.scalatra" %% "scalatra-scalatest" % scalatraVersion % "test",
"org.scalatra" %% "scalatra-specs2" % scalatraVersion,
"org.scalatra" %% "scalatra-swagger" % scalatraVersion,
"ch.qos.logback" % "logback-classic" % "1.2.3" % "runtime",
"org.eclipse.jetty" % "jetty-webapp" % "9.4.9.v20180320" % "container;compile",
"javax.servlet" % "javax.servlet-api" % "3.1.0" % "provided"
)
// From other projects:
// spark
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.4.0",
"org.apache.spark" %% "spark-mllib" % "2.4.0",
"org.apache.spark" %% "spark-sql" % "2.4.0"
)
// scalatest
libraryDependencies += "org.scalatest" % "scalatest_2.11" % "3.0.5" % "test"
I am trying to set up IntelliJ IDEA with sbt for a Scala project. The external dependencies are specified in my build.sbt and also listed inside the IDE, as can be seen in the screenshot. However, I still get compiler errors saying that the respective symbols cannot be resolved. Can anyone point me in the right direction?
Contents of my build.sbt:
lazy val midas = (project in file("."))
.settings(
name := "test",
mainClass in assembly := Some("core.Service"),
assemblyJarName in assembly := "test.jar",
test in assembly := {},
libraryDependencies ++= Seq(
"org.slf4j" % "slf4j-api" % "1.7.25",
"com.typesafe.akka" %% "akka-actor" % "2.5.13",
"com.typesafe.akka" %% "akka-slf4j" % "2.5.13",
"com.typesafe.akka" %% "akka-remote" % "2.5.13",
"org.scala-lang.modules" %% "scala-xml" % "1.1.0",
"com.typesafe.play" %% "play-json" % "2.6.9",
"com.typesafe.slick" %% "slick" % "3.2.3",
"com.typesafe.scala-logging" %% "scala-logging" % "3.9.0",
"ch.qos.logback" % "logback-classic" % "1.2.3",
"ch.qos.logback" % "logback-core" % "1.2.3",
"com.mchange" % "c3p0" % "0.9.5.2",
"joda-time" % "joda-time" % "2.10",
"org.joda" % "joda-convert" % "2.0.1",
"net.sourceforge.jtds" % "jtds" % "1.3.1"
)
)
Still, the dependencies for both Akka and joda-time cannot be resolved inside the IDE. Running sbt compile from the command line works fine, however.
joda-convert does not include org.joda.time.DateTime - you need "joda-time" % "joda-time" % "2.9.4" or so.
Once you have added this (and fixed your missing Akka import, maybe "com.typesafe.akka" %% "akka-actor" % "2.4.17"), you need to refresh the sbt project.
I press Shift, Shift to bring up "Search Everywhere" and type "sbt". Choose sbt under Tool Windows, then click "Refresh all sbt projects" (blue reload icon) in the sbt window. I close this window because it is not usually useful.
Hope that helps.
I am trying to use the Cloudera sparkts library for time series forecasting, but I am unable to download the library using sbt.
Below is my build.sbt file. I can see that the Maven repository has version 0.4.0 available, so I am not sure what I am doing wrong.
Can anyone please help me find out what is wrong with my sbt file?
import sbt.complete.Parsers._
scalaVersion := "2.11.8"
name := "Forecast Stock Price using Spark TimeSeries library"
val sparkVersion = "1.5.2"
//resolvers ++= Seq("Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/")
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion withSources(),
"org.apache.spark" %% "spark-streaming" % sparkVersion withSources(),
"org.apache.spark" %% "spark-sql" % sparkVersion withSources(),
"org.apache.spark" %% "spark-hive" % sparkVersion withSources(),
"org.apache.spark" %% "spark-streaming-twitter" % sparkVersion withSources(),
"org.apache.spark" %% "spark-mllib" % sparkVersion withSources(),
"com.databricks" %% "spark-csv" % "1.3.0" withSources(),
"com.cloudera.sparkts" %% "sparkts" % "0.4.0"
)
Change
"com.cloudera.sparkts" %% "sparkts" % "0.4.0"
to
"com.cloudera.sparkts" % "sparkts" % "0.4.0"
sparkts is only distributed for Scala 2.11; it does not encode the Scala version in the artifact name.
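To make the difference concrete, here is a short sketch of what the two operators resolve to, assuming scalaVersion := "2.11.8" as in the question. The spark-csv line is taken from the question's own build and is used only for contrast.

```scala
// build.sbt (sketch), with scalaVersion := "2.11.8"

// %% appends the Scala binary version to the artifact name, so this asks for
// "sparkts_2.11", which (per the answer above) is not what is published:
// libraryDependencies += "com.cloudera.sparkts" %% "sparkts" % "0.4.0"

// % uses the artifact name exactly as written, so this asks for "sparkts":
libraryDependencies += "com.cloudera.sparkts" % "sparkts" % "0.4.0"

// For artifacts that do carry the suffix, the two forms below are equivalent:
libraryDependencies += "com.databricks" %% "spark-csv" % "1.3.0"
// libraryDependencies += "com.databricks" % "spark-csv_2.11" % "1.3.0"
```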
I am trying to run a basic Spark Streaming example on my machine using IntelliJ, but I am unable to resolve the dependency issues.
Please help me fix it.
name := "demoSpark"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq("org.apache.spark"% "spark-core_2.11"%"2.1.0",
"org.apache.spark" % "spark-sql_2.10" % "2.1.0",
"org.apache.spark" % "spark-streaming_2.11" % "2.1.0",
"org.apache.spark" % "spark-mllib_2.10" % "2.1.0"
)
At the very least, all the dependencies must use the same version of Scala, not a mix of 2.10 and 2.11. You can use the %% operator in sbt to ensure the right version is selected (the one you specified in scalaVersion):
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.1.0",
"org.apache.spark" %% "spark-sql" % "2.1.0",
"org.apache.spark" %% "spark-streaming" % "2.1.0",
"org.apache.spark" %% "spark-mllib" % "2.1.0"
)
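Since the Spark modules should also share one Spark version, a common variation keeps that version in a single val. This is only a sketch; the name sparkVersion is an illustrative choice, not something sbt requires.

```scala
// build.sbt (sketch): one Scala version and one Spark version for every Spark module.
scalaVersion := "2.11.8"

// Hypothetical helper value; change it in one place to upgrade all modules together.
val sparkVersion = "2.1.0"

libraryDependencies ++= Seq("spark-core", "spark-sql", "spark-streaming", "spark-mllib")
  .map(module => "org.apache.spark" %% module % sparkVersion)
```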