Using the Cloudera sparkts library in a project using sbt - Scala

I am trying to use the Cloudera sparkts library for time series forecasting, but I am unable to download the library using sbt.
Below is my build.sbt file. I can see that the Maven repo has a 0.4.0 version distributed, so I am not sure what I am doing wrong.
Can anyone please help me understand what is wrong with my sbt file?
import sbt.complete.Parsers._
scalaVersion := "2.11.8"
name := "Forecast Stock Price using Spark TimeSeries library"
val sparkVersion = "1.5.2"
//resolvers ++= Seq("Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/")
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion withSources(),
"org.apache.spark" %% "spark-streaming" % sparkVersion withSources(),
"org.apache.spark" %% "spark-sql" % sparkVersion withSources(),
"org.apache.spark" %% "spark-hive" % sparkVersion withSources(),
"org.apache.spark" %% "spark-streaming-twitter" % sparkVersion withSources(),
"org.apache.spark" %% "spark-mllib" % sparkVersion withSources(),
"com.databricks" %% "spark-csv" % "1.3.0" withSources(),
"com.cloudera.sparkts" %% "sparkts" % "0.4.0"
)

Change
"com.cloudera.sparkts" %% "sparkts" % "0.4.0"
to
"com.cloudera.sparkts" % "sparkts" % "0.4.0"
sparkts is only distributed for Scala 2.11; it does not encode the Scala version in the artifact name.
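For illustration, here is how the two operators resolve under the scalaVersion above (a minimal sketch using the coordinates from this build.sbt):
// with scalaVersion := "2.11.8"
"org.apache.spark" %% "spark-core" % sparkVersion   // resolves to the artifact spark-core_2.11
"com.cloudera.sparkts" % "sparkts" % "0.4.0"        // resolves to the artifact sparkts (no Scala suffix)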

Related

Cannot create a new scala project in IntelliJ

I cannot create a new scala project in IntelliJ. It does not recognize the scala syntax. For example, in the below code
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion
)
It does not recognize the word Seq. This is my assembly file:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")
My build.properties is:
sbt.version=0.13.16

Need some help in fixing the Spark streaming dependency (Scala sbt)

I am trying to run a basic Spark streaming example on my machine using IntelliJ, but I am unable to resolve the dependency issues.
Please help me fix it.
name := "demoSpark"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq("org.apache.spark"% "spark-core_2.11"%"2.1.0",
"org.apache.spark" % "spark-sql_2.10" % "2.1.0",
"org.apache.spark" % "spark-streaming_2.11" % "2.1.0",
"org.apache.spark" % "spark-mllib_2.10" % "2.1.0"
)
At the very least, all the dependencies must use the same version of Scala, not a mix of 2.10 and 2.11. You can use the %% operator in sbt to ensure the right artifact is selected for the Scala version you specified in scalaVersion:
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.1.0",
"org.apache.spark" %% "spark-sql" % "2.1.0",
"org.apache.spark" %% "spark-streaming" % "2.1.0",
"org.apache.spark" %% "spark-mllib" % "2.1.0"
)
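With scalaVersion := "2.11.8", each %% line above is just shorthand for the explicitly suffixed artifact name; for illustration (equivalent, spelled out by hand):
// "org.apache.spark" %% "spark-streaming" % "2.1.0" is the same as:
"org.apache.spark" % "spark-streaming_2.11" % "2.1.0"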

How to troubleshoot SBT's library dependency warnings?

I'm trying to build a "hello world"-esque app that uses Spark streaming to stream data from a Kafka broker (this works), filters/processes this data, and pushes it to a (local) web browser using the Scalatra web framework and its supported web sockets functionality from Atmosphere. The Kafka/Spark chunk works independently, and the Scalatra/Atmosphere chunk also works independently. It's when I try to bring the two halves together that I run into issues with library dependencies.
The real question: how do I go about selecting library versions for which Spark will play nice with Scalatra?
A bare bones Scalatra/Atmosphere app works fine as follows:
organization := "com.example"
name := "example app"
version := "0.1.0"
scalaVersion := "2.12.2"
val ScalatraVersion = "2.5.+"
libraryDependencies ++= Seq(
"org.json4s" %% "json4s-jackson" % "3.5.2",
"org.scalatra" %% "scalatra" % ScalatraVersion,
"org.scalatra" %% "scalatra-scalate" % ScalatraVersion,
"org.scalatra" %% "scalatra-specs2" % ScalatraVersion % "test",
"org.scalatra" %% "scalatra-atmosphere" % ScalatraVersion,
"org.eclipse.jetty" % "jetty-webapp" % "9.4.6.v20170531" % "provided",
"javax.servlet" % "javax.servlet-api" % "3.1.0" % "provided"
)
enablePlugins(JettyPlugin)
But if I add new dependencies for Spark and Spark streaming, and knock the Scala version down to 2.11 (required for Spark-Kafka streaming):
organization := "com.example"
name := "example app"
version := "0.1.0"
scalaVersion := "2.11.8"
val ScalatraVersion = "2.5.+"
val SparkVersion = "2.2.0"
libraryDependencies ++= Seq(
"org.json4s" %% "json4s-jackson" % "3.5.2",
"org.scalatra" %% "scalatra" % ScalatraVersion,
"org.scalatra" %% "scalatra-scalate" % ScalatraVersion,
"org.scalatra" %% "scalatra-specs2" % ScalatraVersion % "test",
"org.scalatra" %% "scalatra-atmosphere" % ScalatraVersion,
"org.eclipse.jetty" % "jetty-webapp" % "9.4.6.v20170531" % "provided",
"javax.servlet" % "javax.servlet-api" % "3.1.0" % "provided"
)
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % SparkVersion,
"org.apache.spark" %% "spark-streaming" % SparkVersion,
"org.apache.spark" %% "spark-streaming-kafka-0-8" % SparkVersion
)
enablePlugins(JettyPlugin)
The code compiles, but I get SBT's eviction warning:
[warn] There may be incompatibilities among your library dependencies.
[warn] Here are some of the libraries that were evicted:
[warn] * org.json4s:json4s-jackson_2.11:3.2.11 -> 3.5.3
[warn] Run 'evicted' to see detailed eviction warnings
Then finally, when Jetty tries to run the web server, it fails with this error:
WARN:oejuc.AbstractLifeCycle:main: FAILED org.eclipse.jetty.annotations.ServletContainerInitializersStarter#53fb3dab: java.lang.NoClassDefFoundError: com/sun/jersey/spi/inject/InjectableProvider
java.lang.NoClassDefFoundError: com/sun/jersey/spi/inject/InjectableProvider
How do I get to the bottom of this? I'm new to the Scala world, and the intricacies of dependencies are blowing my mind.
One way to remove the eviction warning is to pin the required version of the library using dependencyOverrides.
Try adding the following to your SBT file and re-building the application:
dependencyOverrides += "org.json4s" % "json4s-jackson_2.11" % "3.5.3"
Check the SBT documentation for details.
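Since scalaVersion is 2.11 in this build, the same override can equivalently be written with %%, and the evicted task mentioned in the warning shows exactly which versions were replaced. A sketch:
// equivalent override; the _2.11 suffix is filled in from scalaVersion
dependencyOverrides += "org.json4s" %% "json4s-jackson" % "3.5.3"
// then, from the sbt shell, inspect the details:
// > evicted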

override guava dependency version of spark

Spark depends on an old version of Guava.
I build my Spark project with sbt-assembly, excluding Spark using "provided", and including the latest version of Guava.
However, when running sbt-assembly, the Guava dependency is also excluded from the jar.
My build.sbt:
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
"com.google.guava" % "guava" % "11.0"
)
If I remove the % "provided", then both Spark and Guava are included.
So, how can I exclude Spark and include Guava?
You are looking for shading. See the sbt-assembly documentation on shading, but basically you need to add shading instructions. Something like this:
// rename Guava's packages (which live under com.google.common) inside the assembled jar;
// "@1" captures the matched package suffix
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "my_guava.@1")
    .inLibrary("com.google.guava" % "guava" % "11.0")
    .inProject
)
There is also the corresponding maven-shade-plugin for those who prefer maven.
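For completeness, the shading above assumes the sbt-assembly plugin is enabled; a minimal sketch (the plugin version is an assumption, use whichever matches your sbt release):
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")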

Compile Spark stream kinesis sample application using SBT tool

I am trying to compile and create a jar for the Spark Kinesis streaming Scala application provided by Spark itself at the link below.
Kinesis word count sample
Following is my sbt file. It has all the dependencies and compiles fine for simple programs.
name := "stream parser"
version := "1.0"
val sparkVersion = "2.0.2"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-streaming-kinesis-asl" % sparkVersion % "provided"
)
But the Kinesis sample throws the following error while compiling on my Ubuntu system:
trait Logging in package internal cannot be accessed in package org.apache.spark.internal
[error] object KinesisWordCountASL extends Logging {
[error] ^
The Logging class import:
import org.apache.spark.internal.Logging
Any idea what could be the problem?
According to the Spark repo, in Spark 2.0 and above the Logging trait is included in the package org.apache.spark.internal; in previous releases it lived in the package org.apache.spark. I'd suggest running your example against Spark 2.0.0 or later, or changing the import to match the Spark version you are building against.
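The two import locations side by side, as described above:
// Spark releases before 2.0:
// import org.apache.spark.Logging
// Spark 2.0 and later:
// import org.apache.spark.internal.Logging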
The same example compiles without any issues with the following build.sbt:
scalaVersion := "2.10.6"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.0.2",
"org.apache.spark" %% "spark-sql" % "2.0.2",
"org.apache.spark" %% "spark-streaming" % "2.0.2",
"org.apache.spark" %% "spark-streaming-kinesis-asl" % "2.0.2",
"com.amazonaws" % "amazon-kinesis-producer" % "0.10.2"
)
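Note that the "provided" scope in the question's build.sbt means the Spark jars are not packaged into the assembly; they are expected to be supplied at runtime, typically by spark-submit. The general invocation shape (the names below are placeholders, not from the sample):
spark-submit --class <your main class> <path to assembly jar> [application args]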