Scala and GraphX in Spark - scala

Any idea why we get these errors?
ubuntu@group-3-vm1:~/software/sbt/bin$ ./sbt package
[info] Set current project to hello (in build file:/home/ubuntu/software/sbt/bin/)
[info] Compiling 1 Scala source to /home/ubuntu/software/sbt/bin/target/scala-2.11/classes...
[error] /home/ubuntu/software/sbt/bin/hi.scala:1: object apache is not a member of package org
[error] import org.apache.spark.SparkContext
[error] ^
[error] /home/ubuntu/software/sbt/bin/hi.scala:2: object apache is not a member of package org
[error] import org.apache.spark.SparkContext._
[error] ^
The code is:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.sql.SQLContext
import org.apache.spark.api.java._
import org.apache.spark.api.java.function._
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib._
import org.apache.spark.graphx.PartitionStrategy._
//class PartBQ1{
object PartBQ1 {
  val conf = new SparkConf().setMaster("spark://10.0.1.31:7077")
    .setAppName("CS-838-Assignment2-Question2")
    .set("spark.driver.memory", "1g")
    .set("spark.eventLog.enabled", "true")
    .set("spark.eventLog.dir", "/home/ubuntu/storage/logs")
    .set("spark.executor.memory", "21g")
    .set("spark.executor.cores", "4")
    .set("spark.cores.max", "4")
    .set("spark.task.cpus", "1")
  val sc = new SparkContext(conf)
  val sql_ctx = new SQLContext(sc)
  val graph = GraphLoader.edgeListFile(sc, "data2.txt")
}

You seem to be missing an sbt build file, for example:
simple.sbt
name := "Simple Project"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1"
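Since the code above also uses GraphX and SQLContext, the build will likely need those Spark modules as well; note too that the compile log shows target/scala-2.11, so scalaVersion should match the Scala version your Spark build uses. A minimal sketch of the extra lines (versions mirror the spark-core line above and are an assumption):
libraryDependencies += "org.apache.spark" %% "spark-graphx" % "1.5.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.5.1"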

Related

sbt - object apache is not a member of package org

I want to deploy and submit a Spark program using sbt, but it's throwing an error.
Code:
package in.goai.spark

import org.apache.spark.{SparkContext, SparkConf}

object SparkMeApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("First Spark")
    val sc = new SparkContext(conf)
    val fileName = args(0)
    val lines = sc.textFile(fileName).cache
    val c = lines.count
    println(s"There are $c lines in $fileName")
  }
}
build.sbt
name := "First Spark"
version := "1.0"
organization := "in.goai"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"
resolvers += Resolver.mavenLocal
Under the first/project directory, build.properties contains:
sbt.version=0.13.9
When I try to run sbt package, it throws the error given below.
[root@hadoop first]# sbt package
[info] Loading project definition from /home/training/workspace_spark/first/project
[info] Set current project to First Spark (in build file:/home/training/workspace_spark/first/)
[info] Compiling 1 Scala source to /home/training/workspace_spark/first/target/scala-2.11/classes...
[error] /home/training/workspace_spark/first/src/main/scala/LineCount.scala:3: object apache is not a member of package org
[error] import org.apache.spark.{SparkContext, SparkConf}
[error] ^
[error] /home/training/workspace_spark/first/src/main/scala/LineCount.scala:9: not found: type SparkConf
[error] val conf = new SparkConf().setAppName("First Spark")
[error] ^
[error] /home/training/workspace_spark/first/src/main/scala/LineCount.scala:11: not found: type SparkContext
[error] val sc = new SparkContext(conf)
[error] ^
[error] three errors found
[error] (compile:compile) Compilation failed
[error] Total time: 4 s, completed May 10, 2018 4:05:10 PM
I have also tried extending App, but there is no change.
Please remove resolvers += Resolver.mavenLocal from build.sbt. Since spark-core is available on Maven Central, there is no need for a local resolver.
After that, try sbt clean package.
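For reference, a minimal sketch of the resulting build.sbt (the same coordinates as in the question, with only the resolver line dropped):
name := "First Spark"
version := "1.0"
organization := "in.goai"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"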

Spark-cassandra-connector: toArray does not work

I am using the spark-cassandra-connector with Scala and I want to read data from Cassandra and display it via the toArray method.
However, I get an error message saying it is not a member of the class, even though it is indicated in the API. Could somebody help me find my error?
Here are my files:
build.sbt:
name := "Simple_Project"
version := "1.0"
scalaVersion := "2.11.8"
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0-preview"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0-preview"
resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"
libraryDependencies += "datastax" % "spark-cassandra-connector" % "2.0.0-M2-s_2.11"
SimpleScala.scala:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import com.datastax.spark.connector._
import com.datastax.spark.connector.rdd._
import org.apache.spark.sql.cassandra._
import org.apache.spark.sql.SQLContext
import com.datastax.spark.connector.cql.CassandraConnector._
object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    conf.set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    val rdd_2 = sc.cassandraTable("test_2", "words")
    rdd_2.toArray.foreach(println)
  }
}
Functions for cqlsh:
CREATE KEYSPACE test_2 WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1 };
CREATE TABLE test_2.words (word text PRIMARY KEY, count int);
INSERT INTO test_2.words (word, count) VALUES ('foo', 20);
INSERT INTO test_2.words (word, count) VALUES ('bar', 20);
Error message:
[info] Loading global plugins from /home/andi/.sbt/0.13/plugins
[info] Resolving org.scala-sbt.ivy#ivy;2.3.0-sbt-2cc8d2761242b072cedb0a04cb39435[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Loading project definition from /home/andi/test_spark/project
[info] Updating {file:/home/andi/test_spark/project/}test_spark-build...
[info] Resolving org.scala-sbt.ivy#ivy;2.3.0-sbt-2cc8d2761242b072cedb0a04cb39435[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Set current project to Simple_Project (in build file:/home/andi/test_spark/)
[info] Compiling 1 Scala source to /home/andi/test_spark/target/scala-2.11/classes...
[error] /home/andi/test_spark/src/main/scala/SimpleApp.scala:50: value toArray is not a member of com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow]
[error] rdd_2.toArray.foreach(println)
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
Many thanks in advance,
Andi
The CassandraTableScanRDD.toArray method has been deprecated and removed as of the 2.0.0 release of the Spark Cassandra Connector; it was available up to the 1.6.0 release. You can use the collect method instead.
Unfortunately, the Spark Cassandra Connector documentation still uses toArray. In any case, here is what will work:
rdd_2.collect.foreach(println)
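Keep in mind that collect materialises the whole table on the driver, so for anything beyond a small test table a bounded sample is safer; take is a standard RDD action and should work here as well (the row count of 20 is just an illustration):
rdd_2.take(20).foreach(println)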

sc.textFile cannot resolve symbol at textFile

I am using IntelliJ for Scala. Below is my code:
import org.apache.spark.SparkConf
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
object XXXX extends App {
  val sc = new SparkConf()
  val DataRDD = sc.textFile("/Users/itru/Desktop/XXXX.bz2").cache()
}
my build.sbt file contains below:
name := "XXXXX"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.5.2" %"provided",
"org.apache.spark" %% "spark-sql" % "1.5.2" % "provided")
Packages are downloaded, but I still see the error "cannot resolve symbol at textFile". Am I missing any library dependencies?
You haven't initialised your SparkContext.
val sparkConf = new SparkConf()
val sc = new SparkContext(sparkConf)
val dataRDD = sc.textFile("/Users/itru/Desktop/XXXX.bz2").cache()
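Put together, a minimal sketch of the object might look like this (the setMaster("local[*]") call and the app name are assumptions for running locally from the IDE, not part of the original code):
import org.apache.spark.{SparkConf, SparkContext}

object XXXX extends App {
  // local[*] is an assumption for a local IDE run; drop it when launching via spark-submit
  val sparkConf = new SparkConf().setAppName("XXXX").setMaster("local[*]")
  val sc = new SparkContext(sparkConf)
  val dataRDD = sc.textFile("/Users/itru/Desktop/XXXX.bz2").cache()
}
Also note that spark-core is marked "provided" in the build.sbt above; that scope keeps it on the compile classpath, but the jars may not be available at runtime when running directly from sbt or the IDE unless the scope is adjusted or the job is submitted with spark-submit.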

Scala program isn't seeing dependencies downloaded via SBT

I'm writing a script to try to get Cassandra and Spark working together, but I can't even get the program to compile. I am using SBT as the build tool, and I have declared all the dependencies required for the program. The first time I ran sbt run it downloaded the dependencies, but I got an error when it started compiling the Scala code, shown below:
[info] Compiling 1 Scala source to /home/vagrant/ScalaTest/target/scala-2.10/classes...
[error] /home/vagrant/ScalaTest/src/main/scala/ScalaTest.scala:6: not found: type SparkConf
[error] val conf = new SparkConf(true)
[error] ^
[error] /home/vagrant/ScalaTest/src/main/scala/ScalaTest.scala:9: not found: type SparkContext
[error] val sc = new SparkContext("spark://192.168.10.11:7077", "test", conf)
[error] ^
[error] two errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 3 s, completed Jun 5, 2015 2:40:09 PM
This is the SBT build file
lazy val root = (project in file(".")).
  settings(
    name := "ScalaTest",
    version := "1.0"
  )

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.3.0-M1"
and this is the actual Scala program
import com.datastax.spark.connector._

object ScalaTest {
  def main(args: Array[String]) {
    val conf = new SparkConf(true)
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext("spark://192.168.10.11:7077", "test", conf)
  }
}
Here is my directory structure
- ScalaTest
- build.sbt
- project
- src
- main
- scala
- ScalaTest.scala
- target
I don't know if this is the problem, but you're not importing the SparkConf and SparkContext class definitions. Try adding the following to your Scala file:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
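If the connector does not pull Spark in transitively (or declares it as a provided dependency), the build may also need spark-core declared explicitly; a hedged sketch, where matching the connector's 1.3 line with Spark 1.3.0 is an assumption:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"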

value wholeTextFiles is not a member of org.apache.spark.SparkContext

I have Scala code like the one below:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark._
object RecipeIO {
  val sc = new SparkContext(new SparkConf().setAppName("Recipe_Extraction"))

  def read(INPUT_PATH: String): org.apache.spark.rdd.RDD[(String)] = {
    val data = sc.wholeTextFiles("INPUT_PATH")
    val files = data.map { case (filename, content) => filename }
    (files)
  }
}
When I compile this code using sbt, it gives me the error:
value wholeTextFiles is not a member of org.apache.spark.SparkContext.
I am importing everything that is required, but it still gives me this error.
However, when I replace wholeTextFiles with textFile, the code compiles.
What might be the problem here, and how do I resolve it?
Thanks in advance!
Environment:
Scala compiler version 2.10.2
spark-1.2.0
Error:
[info] Set current project to RecipeIO (in build file:/home/akshat/RecipeIO/)
[info] Compiling 1 Scala source to /home/akshat/RecipeIO/target/scala-2.10.4/classes...
[error] /home/akshat/RecipeIO/src/main/scala/RecipeIO.scala:14: value wholeTexFiles is not a member of org.apache.spark.SparkContext
[error] val data = sc.wholeTexFiles(INPUT_PATH)
[error] ^
[error] one error found
[error] {file:/home/akshat/RecipeIO/}default-55aff3/compile:compile: Compilation failed
[error] Total time: 16 s, completed Jun 15, 2015 11:07:04 PM
My build.sbt file looks like this :
name := "RecipeIO"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "0.9.0-incubating"
libraryDependencies += "org.eclipse.jetty" % "jetty-server" % "8.1.2.v20120308"
ivyXML :=
<dependency org="org.eclipse.jetty.orbit" name="javax.servlet" rev="3.0.0.v201112011016">
<artifact name="javax.servlet" type="orbit" ext="jar"/>
</dependency>
You have a typo: it should be wholeTextFiles instead of wholeTexFiles.
As a side note, I think you want sc.wholeTextFiles(INPUT_PATH) and not sc.wholeTextFiles("INPUT_PATH") if you really want to use the INPUT_PATH variable.
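Putting both points together, a corrected sketch of the object would be:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RecipeIO {
  val sc = new SparkContext(new SparkConf().setAppName("Recipe_Extraction"))

  def read(INPUT_PATH: String): RDD[String] = {
    // pass the INPUT_PATH parameter instead of the string literal "INPUT_PATH"
    val data = sc.wholeTextFiles(INPUT_PATH)
    data.map { case (filename, content) => filename }
  }
}
Also worth checking: the build.sbt above pins spark-core 0.9.0-incubating while the environment section mentions spark-1.2.0; if that older artifact predates wholeTextFiles, bumping the dependency to match the installed Spark version would be needed as well.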