Spark-cassandra-connector: toArray does not work - scala

I am using the spark-cassandra-connector with Scala and I want to read data from Cassandra and display it via the toArray method.
However, I get an error message saying that toArray is not a member of the class, even though it is listed in the API. Could somebody help me find my error?
Here are my files:
build.sbt:
name := "Simple_Project"
version := "1.0"
scalaVersion := "2.11.8"
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0-preview"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0-preview"
resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"
libraryDependencies += "datastax" % "spark-cassandra-connector" % "2.0.0-M2-s_2.11"
SimpleScala.scala:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import com.datastax.spark.connector._
import com.datastax.spark.connector.rdd._
import org.apache.spark.sql.cassandra._
import org.apache.spark.sql.SQLContext
import com.datastax.spark.connector.cql.CassandraConnector._
object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    conf.set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    val rdd_2 = sc.cassandraTable("test_2", "words")
    rdd_2.toArray.foreach(println)
  }
}
CQL statements run in cqlsh:
CREATE KEYSPACE test_2 WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1 };
CREATE TABLE test_2.words (word text PRIMARY KEY, count int);
INSERT INTO test_2.words (word, count) VALUES ('foo', 20);
INSERT INTO test_2.words (word, count) VALUES ('bar', 20);
Error message:
[info] Loading global plugins from /home/andi/.sbt/0.13/plugins
[info] Resolving org.scala-sbt.ivy#ivy;2.3.0-sbt-2cc8d2761242b072cedb0a04cb39435[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Loading project definition from /home/andi/test_spark/project
[info] Updating {file:/home/andi/test_spark/project/}test_spark-build...
[info] Resolving org.scala-sbt.ivy#ivy;2.3.0-sbt-2cc8d2761242b072cedb0a04cb39435[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Set current project to Simple_Project (in build file:/home/andi/test_spark/)
[info] Compiling 1 Scala source to /home/andi/test_spark/target/scala-2.11/classes...
[error] /home/andi/test_spark/src/main/scala/SimpleApp.scala:50: value toArray is not a member of com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow]
[error] rdd_2.toArray.foreach(println)
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
Many thanks in advance,
Andi

The CassandraTableScanRDD.toArray method was deprecated and then removed in the 2.0.0 release of the Spark Cassandra Connector; it was available up to the 1.6.0 release. You can use the collect method instead.

Unfortunately, the Spark Cassandra Connector documentation still uses toArray. In any case, here is what will work:
rdd_2.collect.foreach(println)
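For reference, a minimal sketch of the full corrected main method, based on the code in the question (note that collect pulls the whole table onto the driver, so it only makes sense for small result sets):
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._
object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("Simple Application")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    val rdd_2 = sc.cassandraTable("test_2", "words")
    // collect materializes the RDD on the driver, replacing the removed toArray
    rdd_2.collect.foreach(println)
  }
}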

Related

sbt - object apache is not a member of package org

I want to deploy and submit a Spark program using sbt, but it's throwing an error.
Code:
package in.goai.spark
import org.apache.spark.{SparkContext, SparkConf}
object SparkMeApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("First Spark")
    val sc = new SparkContext(conf)
    val fileName = args(0)
    val lines = sc.textFile(fileName).cache
    val c = lines.count
    println(s"There are $c lines in $fileName")
  }
}
build.sbt
name := "First Spark"
version := "1.0"
organization := "in.goai"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"
resolvers += Resolver.mavenLocal
Under the first/project directory:
build.properties
sbt.version=0.13.9
When I try to run sbt package, it throws the error given below.
[root@hadoop first]# sbt package
[info] Loading project definition from /home/training/workspace_spark/first/project
[info] Set current project to First Spark (in build file:/home/training/workspace_spark/first/)
[info] Compiling 1 Scala source to /home/training/workspace_spark/first/target/scala-2.11/classes...
[error] /home/training/workspace_spark/first/src/main/scala/LineCount.scala:3: object apache is not a member of package org
[error] import org.apache.spark.{SparkContext, SparkConf}
[error] ^
[error] /home/training/workspace_spark/first/src/main/scala/LineCount.scala:9: not found: type SparkConf
[error] val conf = new SparkConf().setAppName("First Spark")
[error] ^
[error] /home/training/workspace_spark/first/src/main/scala/LineCount.scala:11: not found: type SparkContext
[error] val sc = new SparkContext(conf)
[error] ^
[error] three errors found
[error] (compile:compile) Compilation failed
[error] Total time: 4 s, completed May 10, 2018 4:05:10 PM
I have also tried with extends App, but there was no change.
Please remove resolvers += Resolver.mavenLocal from build.sbt. Since spark-core is available on Maven Central, there is no need for a local resolver.
After that, you can try sbt clean package.
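For reference, the resulting build.sbt would look roughly like this (the same settings as in the question, minus the local resolver):
name := "First Spark"
version := "1.0"
organization := "in.goai"
scalaVersion := "2.11.8"
// spark-core is resolved from Maven Central, so no extra resolver is needed
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"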

Not found value spark SBT project

Hi, I am trying to set up a small Spark application in sbt.
My build.sbt is
import Dependencies._
name := "hello"
version := "1.0"
scalaVersion := "2.11.8"
val sparkVersion = "1.6.1"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.spark" %% "spark-streaming-twitter" % sparkVersion
)
libraryDependencies += scalaTest % Test
Everything works fine and all dependencies are resolved by sbt, but when I try importing spark in my hello.scala project file I get this error:
not found: value spark
My hello.scala file is:
package example
import org.apache.spark._
import org.apache.spark.SparkContext._
object Hello extends fileImport with App {
  println(greeting)
  anime.select("*").orderBy($"rating".desc).limit(10).show()
}
trait fileImport {
  lazy val greeting: String = "hello"
  var anime = spark.read.option("header", true).csv("C:/anime.csv")
  var ratings = spark.read.option("header", true).csv("C:/rating.csv")
}
Here is the error output I get:
[info] Compiling 1 Scala source to C:\Users\haftab\Downloads\sbt-0.13.16\sbt\alfutaim\target\scala-2.11\classes...
[error] C:\Users\haftab\Downloads\sbt-0.13.16\sbt\alfutaim\src\main\scala\example\Hello.scala:12: not found: value spark
[error] var anime = spark.read.option("header", true).csv("C:/anime.csv")
[error] ^
[error] C:\Users\haftab\Downloads\sbt-0.13.16\sbt\alfutaim\src\main\scala\example\Hello.scala:13: not found: value spark
[error] var ratings = spark.read.option("header", true).csv("C:/rating.csv")
[error] ^
[error] two errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 3 s, completed Sep 10, 2017 1:44:47 PM
The spark value is pre-initialized in spark-shell only;
in your own code you need to initialize the spark variable yourself:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("testings").master("local").getOrCreate
You can change the "testings" name to whatever you like. The .master option is optional if you want to run the code using spark-submit. A sketch applying this to the question's code follows below.
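As an illustration, the fileImport trait from the question could build the session itself along these lines. This is only a sketch: it assumes a Spark 2.x spark-sql dependency is added to the build, since SparkSession does not exist in the 1.6.1 artifacts listed above.
import org.apache.spark.sql.SparkSession
trait fileImport {
  lazy val greeting: String = "hello"
  // create the session explicitly instead of relying on the spark value provided by spark-shell;
  // lazy vals defer creation until first use
  lazy val spark = SparkSession.builder().appName("testings").master("local").getOrCreate()
  lazy val anime = spark.read.option("header", true).csv("C:/anime.csv")
  lazy val ratings = spark.read.option("header", true).csv("C:/rating.csv")
}
Note that the $"rating" syntax used in Hello also needs import spark.implicits._ once the session is available.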

value wholeTextFiles is not a member of org.apache.spark.SparkContext

I have some Scala code like the following:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark._
object RecipeIO {
  val sc = new SparkContext(new SparkConf().setAppName("Recipe_Extraction"))
  def read(INPUT_PATH: String): org.apache.spark.rdd.RDD[(String)] = {
    val data = sc.wholeTextFiles("INPUT_PATH")
    val files = data.map { case (filename, content) => filename }
    (files)
  }
}
When I compile this code using sbt it gives me the error:
value wholeTextFiles is not a member of org.apache.spark.SparkContext.
I am importing everything that is required, but it's still giving me this error.
However, when I replace wholeTextFiles with textFile, the code compiles.
What might be the problem here and how do I resolve it?
Thanks in advance!
Environment:
Scala compiler version 2.10.2
spark-1.2.0
Error:
[info] Set current project to RecipeIO (in build file:/home/akshat/RecipeIO/)
[info] Compiling 1 Scala source to /home/akshat/RecipeIO/target/scala-2.10.4/classes...
[error] /home/akshat/RecipeIO/src/main/scala/RecipeIO.scala:14: value wholeTexFiles is not a member of org.apache.spark.SparkContext
[error] val data = sc.wholeTexFiles(INPUT_PATH)
[error] ^
[error] one error found
[error] {file:/home/akshat/RecipeIO/}default-55aff3/compile:compile: Compilation failed
[error] Total time: 16 s, completed Jun 15, 2015 11:07:04 PM
My build.sbt file looks like this :
name := "RecipeIO"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "0.9.0-incubating"
libraryDependencies += "org.eclipse.jetty" % "jetty-server" % "8.1.2.v20120308"
ivyXML :=
  <dependency org="org.eclipse.jetty.orbit" name="javax.servlet" rev="3.0.0.v201112011016">
    <artifact name="javax.servlet" type="orbit" ext="jar"/>
  </dependency>
You have a typo: it should be wholeTextFiles instead of wholeTexFiles.
As a side note, I think you want sc.wholeTextFiles(INPUT_PATH) and not sc.wholeTextFiles("INPUT_PATH") if you really want to use the INPUT_PATH variable.
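Putting both points together, the read method could look roughly like this:
def read(INPUT_PATH: String): org.apache.spark.rdd.RDD[String] = {
  // wholeTextFiles (note the spelling) returns (filename, content) pairs;
  // pass the INPUT_PATH variable rather than the string literal "INPUT_PATH"
  val data = sc.wholeTextFiles(INPUT_PATH)
  val files = data.map { case (filename, content) => filename }
  files
}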

Not able to execute my SparkStreaming Program

I have written the following Scala code and my platform is Cloudera CDH 5.2.1 on CentOS 6.5
Tutorial.scala
import org.apache.spark
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
import TutorialHelper._
object Tutorial {
  def main(args: Array[String]) {
    val checkpointDir = TutorialHelper.getCheckPointDirectory()
    val consumerKey = "..."
    val consumerSecret = "..."
    val accessToken = "..."
    val accessTokenSecret = "..."
    try {
      TutorialHelper.configureTwitterCredentials(consumerKey, consumerSecret, accessToken, accessTokenSecret)
      val ssc = new StreamingContext(new SparkContext(), Seconds(1))
      val tweets = TwitterUtils.createStream(ssc, None)
      val tweetText = tweets.map(tweet => tweet.getText())
      tweetText.print()
      ssc.checkpoint(checkpointDir)
      ssc.start()
      ssc.awaitTermination()
    } finally {
      //ssc.stop()
    }
  }
}
My build.sbt file looks like
import AssemblyKeys._ // put this at the top of the file
name := "Tutorial"
scalaVersion := "2.10.3"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"
)
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
resourceDirectory in Compile := baseDirectory.value / "resources"
assemblySettings
mergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
  case "log4j.properties" => MergeStrategy.discard
  case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
  case "reference.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}
I also created a file called projects/plugin.sbt which has the following content
addSbtPlugin("net.virtual-void" % "sbt-cross-building" % "0.8.1")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.1")
and project/build.scala
import sbt._
object Plugins extends Build {
  lazy val root = Project("root", file(".")) dependsOn(
    uri("git://github.com/sbt/sbt-assembly.git#0.9.1")
  )
}
After this I can build my "uber" assembly by using:
sbt assembly
Now I run my code using:
sudo -u hdfs spark-submit --class Tutorial --master local /tmp/Tutorial-assembly-0.1-SNAPSHOT.jar
I get the error
Configuring Twitter OAuth
Property twitter4j.oauth.accessToken set as [...]
Property twitter4j.oauth.consumerSecret set as [...]
Property twitter4j.oauth.accessTokenSecret set as [...]
Property twitter4j.oauth.consumerKey set as [...]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/12/21 16:04:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-------------------------------------------
Time: 1419199472000 ms
-------------------------------------------
-------------------------------------------
Time: 1419199473000 ms
-------------------------------------------
14/12/21 16:04:33 ERROR ReceiverSupervisorImpl: Error stopping receiver 0
org.apache.spark.Logging$class.log(Logging.scala:52)
org.apache.spark.streaming.twitter.TwitterReceiver.log(TwitterInputDStream.scala:60)
org.apache.spark.Logging$class.logInfo(Logging.scala:59)
org.apache.spark.streaming.twitter.TwitterReceiver.logInfo(TwitterInputDStream.scala:60)
org.apache.spark.streaming.twitter.TwitterReceiver.onStop(TwitterInputDStream.scala:101)
org.apache.spark.streaming.receiver.ReceiverSupervisor.stopReceiver(ReceiverSupervisor.scala:136)
org.apache.spark.streaming.receiver.ReceiverSupervisor.stop(ReceiverSupervisor.scala:112)
org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:127)
org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:106)
org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:264)
You need to use the sbt-assembly plugin to prepare an "assembled" jar file with all dependencies. It should contain all the Twitter util classes.
Links:
1. https://github.com/sbt/sbt-assembly
2. http://prabstechblog.blogspot.com/2014/04/creating-single-jar-for-spark-project.html
3. http://eugenezhulenev.com/blog/2014/10/18/run-tests-in-standalone-spark-cluster/
Or you can take a look at my Spark-Twitter project, which has the sbt-assembly plugin configured: http://eugenezhulenev.com/blog/2014/11/20/twitter-analytics-with-spark/
CDH 5.2 packages Spark 1.1.0, but your build.sbt is using 1.0.0. Updating the versions as below and rebuilding should fix your problem:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided",
  "org.apache.spark" %% "spark-streaming-twitter" % "1.1.0"
)
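After the version bump, rebuild the assembly and resubmit it the same way as before (commands taken from the question):
sbt clean assembly
sudo -u hdfs spark-submit --class Tutorial --master local /tmp/Tutorial-assembly-0.1-SNAPSHOT.jar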

Test in Eclipse works but sbt throws MissingRequirementError: object scala.runtime in compiler mirror not found

I am messing around with parsing and scala.tools.nsc.interactive.Global in Scala, and I ran into a problem while executing tests under sbt. The tests run fine from Eclipse, both with JUnitRunner and the ScalaTest plugin. After a long time spent on Google, I can't figure out how to fix this.
When I execute sbt test, the following error is thrown:
Exception encountered when attempting to run a suite with class name: compileutils.CompileTest *** ABORTED ***
[info] java.lang.ExceptionInInitializerError:
[info] at compileutils.CompileTest$$anonfun$3.apply$mcV$sp(CompileTest.scala:18)
[info] at compileutils.CompileTest$$anonfun$3.apply(CompileTest.scala:16)
[info] at compileutils.CompileTest$$anonfun$3.apply(CompileTest.scala:16)
[info] at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
[info] at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
[info] at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info] at org.scalatest.Transformer.apply(Transformer.scala:22)
[info] at org.scalatest.Transformer.apply(Transformer.scala:20)
[info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:158)
[info] ...
[info] Cause: scala.reflect.internal.MissingRequirementError: object scala.runtime in compiler mirror not found.
[info] at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
[info] at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
[info] at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
[info] at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
[info] at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
[info] at scala.reflect.internal.Mirrors$RootsBase.getPackage(Mirrors.scala:172)
[info] at scala.reflect.internal.Mirrors$RootsBase.getRequiredPackage(Mirrors.scala:175)
[info] at scala.reflect.internal.Definitions$DefinitionsClass.RuntimePackage$lzycompute(Definitions.scala:183)
[info] at scala.reflect.internal.Definitions$DefinitionsClass.RuntimePackage(Definitions.scala:183)
[info] at scala.reflect.internal.Definitions$DefinitionsClass.RuntimePackageClass$lzycompute(Definitions.scala:184)
[info] ...
The class under test
package compileutils
import scala.tools.nsc.Settings
import scala.tools.nsc.interactive.Global
import scala.tools.nsc.reporters.ConsoleReporter
import scala.tools.nsc.interactive.Response
import scala.io.Source
import scala.reflect.internal.util.SourceFile
import scala.reflect.internal.util.BatchSourceFile
import scala.reflect.io.AbstractFile
import java.io.File
object Compile {
  val settings = new Settings
  val reporter = new ConsoleReporter(settings)
  val global = new Global(settings, reporter, "Study compile")
  def parse(source: String): Compile.this.global.Tree = {
    val sourceFile = new BatchSourceFile(".", source)
    global.askReload(List(sourceFile), new Response[Unit])
    global.parseTree(sourceFile)
  }
  def loadTypes(source: String): Either[Compile.this.global.Tree, Throwable] = {
    val sourceFile = new BatchSourceFile(".", source)
    val tResponse = new Response[global.Tree]
    global.askReload(List(sourceFile), new Response[Unit])
    global.askLoadedTyped(sourceFile, tResponse)
    tResponse.get
  }
}
The test
package compileutils
import org.scalatest.BeforeAndAfter
import org.junit.runner.RunWith
import org.scalatest.junit.JUnitRunner
import org.scalatest.FunSuite
import org.scalatest.Matchers._
@RunWith(classOf[JUnitRunner])
class CompileTest extends FunSuite with BeforeAndAfter {
  val testSource = "class FromString {val s = \"dsasdsad \"}"
  before {}
  after {}
  test("parse") {
    //when
    val tree = Compile.parse(testSource)
    //then
    tree should not be null
  }
  test("typer") {
    //when
    val typ = Compile.loadTypes(testSource)
    //then
    typ should be('left)
  }
}
build.sbt
name := "Compiler study"
version := "0.1"
val scalaBuildVersion = "2.10.3"
scalaVersion := scalaBuildVersion
libraryDependencies += "org.scala-lang" % "scala-compiler" % scalaBuildVersion
libraryDependencies += "org.scala-lang" % "scala-library" % scalaBuildVersion
libraryDependencies += "org.scala-lang" % "scala-reflect" % scalaBuildVersion
libraryDependencies += "org.scalatest" %% "scalatest" % "2.1.0" % "test"
libraryDependencies += "junit" % "junit" % "4.11" % "test"
Environment:
sbt launcher version 0.13.0
Scala compiler version 2.10.3 -- Copyright 2002-2013, LAMP/EPFL
javac 1.6.0_45
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=13.10
DISTRIB_CODENAME=saucy
DISTRIB_DESCRIPTION="Ubuntu 13.10"
It looks like the scala jar isn't in your classpath when running sbt - make sure to add scala-library.jar to your classpath before you run sbt.
Based on one of your comments, it looks like you're running on Windows. You might also be running into runtime jar access errors there if the classpath contains strange characters or spaces, or permission errors (e.g., if Eclipse is running under an admin account while sbt isn't).
Try reordering your dependency list to put scala-library ahead of scala-compiler. If that doesn't work, try the troubleshooting advice here.
The scala-library.jar was not missing from sbt's classpath but from the classpath of Global. I had to set it in code.
After modifying the source to
val settings = new Settings
val scalaLibraryPath = "/home/csajka/.ivy2/cache/org.scala-lang/scala-library/jars/scala-library-2.10.3.jar"
settings.bootclasspath.append(scalaLibraryPath)
settings.classpath.append(scalaLibraryPath)
val reporter = new ConsoleReporter(settings)
val global = new Global(settings, reporter, "Study compile")
the problem disappeared.
Thanks for the tip @blueberryfields!
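As a side note (my own variant, not part of the original fix): the hard-coded ivy cache path can be avoided by locating the scala-library jar via a class it contains, assuming the tests run on a regular JVM classpath where the code source is available:
val settings = new Settings
// derive the scala-library jar location from a class that lives inside it
val scalaLibraryPath = classOf[List[_]].getProtectionDomain.getCodeSource.getLocation.getPath
settings.bootclasspath.append(scalaLibraryPath)
settings.classpath.append(scalaLibraryPath)
val reporter = new ConsoleReporter(settings)
val global = new Global(settings, reporter, "Study compile")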