Not able to execute my SparkStreaming Program - scala

I have written the following Scala code and my platform is Cloudera CDH 5.2.1 on CentOS 6.5
Tutorial.scala
import org.apache.spark
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
import TutorialHelper._
object Tutorial {
def main(args: Array[String]) {
val checkpointDir = TutorialHelper.getCheckPointDirectory()
val consumerKey = "..."
val consumerSecret = "..."
val accessToken = "..."
val accessTokenSecret = "..."
try {
TutorialHelper.configureTwitterCredentials(consumerKey, consumerSecret, accessToken, accessTokenSecret)
val ssc = new StreamingContext(new SparkContext(), Seconds(1))
val tweets = TwitterUtils.createStream(ssc, None)
val tweetText = tweets.map(tweet => tweet.getText())
tweetText.print()
ssc.checkpoint(checkpointDir)
ssc.start()
ssc.awaitTermination()
} finally {
//ssc.stop()
}
}
}
My build.sbt file looks like
import AssemblyKeys._ // put this at the top of the file
name := "Tutorial"
scalaVersion := "2.10.3"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
"org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"
)
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
resourceDirectory in Compile := baseDirectory.value / "resources"
assemblySettings
mergeStrategy in assembly := {
case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
case "log4j.properties" => MergeStrategy.discard
case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
case "reference.conf" => MergeStrategy.concat
case _ => MergeStrategy.first
}
I also created a file called projects/plugin.sbt which has the following content
addSbtPlugin("net.virtual-void" % "sbt-cross-building" % "0.8.1")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.1")
and project/build.scala
import sbt._
object Plugins extends Build {
lazy val root = Project("root", file(".")) dependsOn(
uri("git://github.com/sbt/sbt-assembly.git#0.9.1")
)
}
after this I can build my "uber" assembly by using
sbt assembly
now I run my code using
sudo -u hdfs spark-submit --class Tutorial --master local /tmp/Tutorial-assembly-0.1-SNAPSHOT.jar
I get the error
Configuring Twitter OAuth
Property twitter4j.oauth.accessToken set as [...]
Property twitter4j.oauth.consumerSecret set as [...]
Property twitter4j.oauth.accessTokenSecret set as [...]
Property twitter4j.oauth.consumerKey set as [...]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/12/21 16:04:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-------------------------------------------
Time: 1419199472000 ms
-------------------------------------------
-------------------------------------------
Time: 1419199473000 ms
-------------------------------------------
14/12/21 16:04:33 ERROR ReceiverSupervisorImpl: Error stopping receiver 0org.apache.spark.Logging$class.log(Logging.scala:52)
org.apache.spark.streaming.twitter.TwitterReceiver.log(TwitterInputDStream.scala:60)
org.apache.spark.Logging$class.logInfo(Logging.scala:59)
org.apache.spark.streaming.twitter.TwitterReceiver.logInfo(TwitterInputDStream.scala:60)
org.apache.spark.streaming.twitter.TwitterReceiver.onStop(TwitterInputDStream.scala:101)
org.apache.spark.streaming.receiver.ReceiverSupervisor.stopReceiver(ReceiverSupervisor.scala:136)
org.apache.spark.streaming.receiver.ReceiverSupervisor.stop(ReceiverSupervisor.scala:112)
org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:127)
org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:106)
org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:264)

You need to use sbt assembly plugin to prepare "assembled" jar file with all dependencies. It should contain all twitter util classes.
Links:
1. https://github.com/sbt/sbt-assembly
2. http://prabstechblog.blogspot.com/2014/04/creating-single-jar-for-spark-project.html
3. http://eugenezhulenev.com/blog/2014/10/18/run-tests-in-standalone-spark-cluster/
Or you can take a look at my Spark-Twitter project, it has configured sbt-assembly plugin: http://eugenezhulenev.com/blog/2014/11/20/twitter-analytics-with-spark/

CDH 5.2 packages Spark 1.1.0, but you build.sbt is using 1.0.0. Update the versions below and rebuild should fix your problem.
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
"org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"
)

Related

Error while running sbt package: object apache is not a member of package org

When I try sbt package in my below code I get these following errors
object apache is not a member of package org
not found: value SparkSession
MY Spark Version: 2.4.4
My Scala Version: 2.11.12
My build.sbt
name := "simpleApp"
version := "1.0"
scalaVersion := "2.11.12"
//libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.4"
libraryDependencies ++= {
val sparkVersion = "2.4.4"
Seq( "org.apache.spark" %% "spark-core" % sparkVersion)
}
my Scala project
import org.apache.spark.sql.SparkSession
object demoapp {
def main(args: Array[String]) {
val logfile = "C:/SUPPLENTA_INFORMATICS/demo/hello.txt"
val spark = SparkSession.builder.appName("Simple App in Scala").getOrCreate()
val logData = spark.read.textFile(logfile).cache()
val numAs = logData.filter(line => line.contains("Washington")).count()
println(s"Lines are: $numAs")
spark.stop()
}
}
If you want to use Spark SQL, you also have to add the spark-sql module to the dependencies:
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.4"
Also, note that you have to reload your project in SBT after changing the build definition and import the changes in intelliJ.

NoClassDefFoundError: scala/Product$class in spark app

I am building a Spark application with bash script and I have an only spark-sql and core dependencies in the build.sbt file. So every time I call some rdd methods or convert the data to case class for dataset creation I get this error:
Caused by: java.lang.NoClassDefFoundError: scala/Product$class
I suspect that it is a dependency error. So how should I change my dependencies to fix this?
dependencies list:
import sbt._
object Dependencies {
lazy val scalaCsv = "com.github.tototoshi" %% "scala-csv" % "1.3.5"
lazy val sparkSql = "org.apache.spark" %% "spark-sql" % "2.3.3"
lazy val sparkCore = "org.apache.spark" %% "spark-core" % "2.3.3"
}
build.sbt file:
import Dependencies._
lazy val root = (project in file(".")).
settings(
inThisBuild(List(
scalaVersion := "2.11.12",
version := "test"
)),
name := "project",
libraryDependencies ++= Seq(scalaCsv, sparkSql, sparkCore),
mainClass in (Compile, run) := Some("testproject.spark.Main")
)
I launch spark app with spark 2.3.3 as my spark home directory like this:
#!/bin/sh
$SPARK_HOME/bin/spark-submit \
--class "testproject.spark.Main " \
--master local[*] \
target/scala-2.11/test.jar
Not sure what was the problem exactly, however, I have recreated the project and moved the source code there. The error disappeared

Specs2 test within plays gives me "could not find implicit value for evidence parameter of type org.specs2.main.CommandLineAsResult

I'm trying to write a test case for a simple REST API in Play2/Scala that send/receives JSON. My test looks like the following:
import org.junit.runner.RunWith
import org.specs2.matcher.JsonMatchers
import org.specs2.mutable._
import org.specs2.runner.JUnitRunner
import play.api.libs.json.{Json, JsArray, JsValue}
import play.api.test.Helpers._
import play.api.test._
import play.test.WithApplication
/**
* Add your spec here.
* You can mock out a whole application including requests, plugins etc.
* For more information, consult the wiki.
*/
#RunWith(classOf[JUnitRunner])
class APIv1Spec extends Specification with JsonMatchers {
val registrationJson = Json.parse("""{"device":"576b9cdc-d3c3-4a3d-9689-8cd2a3e84442", |
"firstName":"", "lastName":"Johnny", "email":"justjohnny#test.com", |
"pass":"myPassword", "acceptTermsOfService":true}
""")
def dropJsonElement(json : JsValue, element : String) = (json \ element).get match {
case JsArray(items) => util.dropAt(items, 1)
}
def invalidRegistrationData(remove : String) = {
dropJsonElement(registrationJson,remove)
}
"API" should {
"Return Error on missing first name" in new WithApplication {
val result= route(
FakeRequest(
POST,
"/api/v1/security/register",
FakeHeaders(Seq( ("Content-Type", "application/json") )),
invalidRegistrationData("firstName").toString()
)
).get
status(result) must equalTo(BAD_REQUEST)
contentType(result) must beSome("application/json")
}
...
However when I attempt to run sbt test, I get the following error:
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=384M; support was removed in 8.0
[info] Loading project definition from /home/cassius/brentspace/esalestracker/project
[info] Set current project to eSalesTracker (in build file:/home/cassius/brentspace/esalestracker/)
[info] Compiling 3 Scala sources to /home/cassius/brentspace/esalestracker/target/scala-2.11/test-classes...
[error] /home/cassius/brentspace/esalestracker/test/APIv1Spec.scala:34: could not find implicit value for evidence parameter of type org.specs2.main.CommandLineAsResult[play.test.WithApplication{val result: scala.concurrent.Future[play.api.mvc.Result]}]
[error] "Return Error on missing first name" in new WithApplication {
[error] ^
[error] one error found
[error] (test:compileIncremental) Compilation failed
[error] Total time: 3 s, completed 18/01/2016 9:30:42 PM
I have similar tests in other applications, but it looks like the new version of specs adds a lot of support for Futures and other things that invalidate previous tutorials. I'm on Scala 2.11.6, Activator 1.3.6 and my build.sbt looks like the following:
name := """eSalesTracker"""
version := "1.0-SNAPSHOT"
lazy val root = (project in file(".")).enablePlugins(PlayScala)
scalaVersion := "2.11.6"
libraryDependencies ++= Seq(
jdbc,
cache,
ws,
"com.typesafe.slick" %% "slick" % "3.1.0",
"org.postgresql" % "postgresql" % "9.4-1206-jdbc42",
"org.slf4j" % "slf4j-api" % "1.7.13",
"ch.qos.logback" % "logback-classic" % "1.1.3",
"ch.qos.logback" % "logback-core" % "1.1.3",
evolutions,
specs2 % Test,
"org.specs2" %% "specs2-matcher-extra" % "3.7" % Test
)
resolvers += "scalaz-bintray" at "http://dl.bintray.com/scalaz/releases"
resolvers += Resolver.url("Typesafe Ivy releases", url("https://repo.typesafe.com/typesafe/ivy-releases"))(Resolver.ivyStylePatterns)
// Play provides two styles of routers, one expects its actions to be injected, the
// other, legacy style, accesses its actions statically.
routesGenerator := InjectedRoutesGenerator
I think you are using the wrong WithApplication import.
Use this one:
import play.api.test.WithApplication
The last line of the testcase should be the assertion/evaluation statement.
e.g. before the last } of the failing testcase method put the statement false must beEqualTo(true) and run it again.

"object github is not a member of package com" for WeKanban2 example project in Scala in action book

I'm new in Scala so started from reading "Scala in Action" book 2013 year. But I cannot build plain sbt WeKanban2 project. I use newer version scala and sbt then in example.
I get error when run sbt:
error: object github is not a member of package com import com.github.siasia.WebPlugin._
my build.properties:
//was sbt.version=0.12.0
sbt.version=0.13.7
plugins.sbt
// was libraryDependencies <+= sbtVersion(v => "com.github.siasia" %% "xsbt-eb-plugin" (v+"-0.2.11.1"))
addSbtPlugin("com.earldouglas" % "xsbt-web-plugin" % "0.7.0")
build.sbt
import com.github.siasia.WebPlugin._
name := "weKanban"
organization := "scalainaction"
version := "0.2"
//was: scalaVersion := "2.10.0"
scalaVersion := "2.11.4"
scalacOptions ++= Seq("-unchecked", "-deprecation")
resolvers ++= Seq(
"Scala-Tools Maven2 Releases Repository" at "http://scala-tools.org/repo-releases",
"Scala-Tools Maven2 Snapshots Repository" at "http://scala-tools.org/repo-snapshots"
)
libraryDependencies ++= Seq(
"org.scalaz" %% "scalaz-core" % scalazVersion,
"org.scalaz" %% "scalaz-http" % scalazVersion,
"org.eclipse.jetty" % "jetty-servlet" % jettyVersion % "container",
"org.eclipse.jetty" % "jetty-webapp" % jettyVersion % "test, container",
"org.eclipse.jetty" % "jetty-server" % jettyVersion % "container",
"com.h2database" % "h2" % "1.2.137",
"org.squeryl" % "squeryl_2.10" % "0.9.5-6"
)
seq(webSettings :_*)
build.scala
import sbt._
import Keys._
object H2TaskManager {
var process: Option[Process] = None
lazy val H2 = config("h2") extend(Compile)
val startH2 = TaskKey[Unit]("start", "Starts H2 database")
val startH2Task = startH2 in H2 <<= (fullClasspath in Compile) map { cp =>
startDatabase(cp.map(_.data).map(_.getAbsolutePath()).filter(_.contains("h2database")))}
def startDatabase(paths: Seq[String]) = {
process match {
case None =>
val cp = paths.mkString(System.getProperty("path.seperator"))
val command = "java -cp " + cp + " org.h2.tools.Server"
println("Starting Database with command: " + command)
process = Some(Process(command).run())
println("Database started ! ")
case Some(_) =>
println("H2 Database already started")
}
}
val stopH2 = TaskKey[Unit]("stop", "Stops H2 database")
val stopH2Task = stopH2 in H2 :={
process match {
case None => println("Database already stopped")
case Some(_) =>
println("Stopping database...")
process.foreach{_.destroy()}
process = None
println("Database stopped...")
}
}
}
object MainBuild extends Build {
import H2TaskManager._
lazy val scalazVersion = "6.0.3"
lazy val jettyVersion = "7.3.0.v20110203"
lazy val wekanban = Project(
"wekanban",
file(".")) settings(startH2Task, stopH2Task)}
What should I do to fix this error? What should I do to start app from Idea 14 in debug mode?
Your import
import com.github.siasia.WebPlugin._
failed to find path github and below. In your plugins.sbt you wrote:
// was libraryDependencies <+= sbtVersion(v => "com.github.siasia" %% "xsbt-eb-plugin" (v+"-0.2.11.1"))
addSbtPlugin("com.earldouglas" % "xsbt-web-plugin" % "0.7.0")
The github-path probably existed in the outcommented com.github.siasia.xsbt-eb-plugin. However the com.earldouglas.xsbt-web-plugin have only the path com.earldouglas.xwp. You should check out if any of the classes there (https://github.com/earldouglas/xsbt-web-plugin/tree/master/src/main/scala) resembles what you need to import.
Probably not related exactly to this package but similar. The library was present in sbt and everything looked ok. But I still was unable to build the project so I went to my $HOME/.ivy2/cache located the dependency directory and deleted it, then went back the $PROJECT_HOME/ and executed sbt build

Compilation errors with spark cassandra connector and SBT

I'm trying to get the DataStax spark cassandra connector working. I've created a new SBT project in IntelliJ, and added a single class. The class and my sbt file is given below. Creating spark contexts seem to work, however, the moment I uncomment the line where I try to create a cassandraTable, I get the following compilation error:
Error:scalac: bad symbolic reference. A signature in CassandraRow.class refers to term catalyst
in package org.apache.spark.sql which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling CassandraRow.class.
Sbt is kind of new to me, and I would appreciate any help in understanding what this error means (and of course, how to resolve it).
name := "cassySpark1"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.1.0"
libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector" % "1.1.0" withSources() withJavadoc()
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector-java" % "1.1.0-alpha2" withSources() withJavadoc()
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
And my class:
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._
object HelloWorld { def main(args:Array[String]): Unit ={
System.setProperty("spark.cassandra.query.retry.count", "1")
val conf = new SparkConf(true)
.set("spark.cassandra.connection.host", "cassandra-hostname")
.set("spark.cassandra.username", "cassandra")
.set("spark.cassandra.password", "cassandra")
val sc = new SparkContext("local", "testingCassy", conf)
> //val foo = sc.cassandraTable("keyspace name", "table name")
val rdd = sc.parallelize(1 to 100)
val sum = rdd.reduce(_+_)
println(sum) } }
You need to add spark-sql to dependencies list
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.1.0"
Add library dependency in your project's pom.xml file. It seems they have changed the Vector.class dependencies location in the new refactoring.