value wholeTextFiles is not a member of org.apache.spark.SparkContext - scala

I have a Scala code like below :-
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark._
object RecipeIO {
  val sc = new SparkContext(new SparkConf().setAppName("Recipe_Extraction"))
  def read(INPUT_PATH: String): org.apache.spark.rdd.RDD[(String)] = {
    val data = sc.wholeTextFiles("INPUT_PATH")
    val files = data.map { case (filename, content) => filename }
    (files)
  }
}
When I compile this code using sbt, it gives me the error:
value wholeTextFiles is not a member of org.apache.spark.SparkContext.
I am importing everything that is required, but it still gives me this error.
But when I replace wholeTextFiles with textFile, the code compiles.
What might be the problem here, and how do I resolve it?
Thanks in advance!
Environment:
Scala compiler version 2.10.2
spark-1.2.0
Error:
[info] Set current project to RecipeIO (in build file:/home/akshat/RecipeIO/)
[info] Compiling 1 Scala source to /home/akshat/RecipeIO/target/scala-2.10.4/classes...
[error] /home/akshat/RecipeIO/src/main/scala/RecipeIO.scala:14: value wholeTexFiles is not a member of org.apache.spark.SparkContext
[error] val data = sc.wholeTexFiles(INPUT_PATH)
[error] ^
[error] one error found
[error] {file:/home/akshat/RecipeIO/}default-55aff3/compile:compile: Compilation failed
[error] Total time: 16 s, completed Jun 15, 2015 11:07:04 PM
My build.sbt file looks like this:
name := "RecipeIO"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "0.9.0-incubating"
libraryDependencies += "org.eclipse.jetty" % "jetty-server" % "8.1.2.v20120308"
ivyXML :=
<dependency org="org.eclipse.jetty.orbit" name="javax.servlet" rev="3.0.0.v201112011016">
<artifact name="javax.servlet" type="orbit" ext="jar"/>
</dependency>

You have a typo: it should be wholeTextFiles instead of wholeTexFiles.
As a side note, I think you want sc.wholeTextFiles(INPUT_PATH) and not sc.wholeTextFiles("INPUT_PATH") if you really want to use the INPUT_PATH variable.
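For reference, a corrected sketch of the read method, with the typo fixed and the path passed as the variable rather than the string literal (the parameter is renamed inputPath here purely for readability):
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object RecipeIO {
  val sc = new SparkContext(new SparkConf().setAppName("Recipe_Extraction"))

  // wholeTextFiles yields one (filename, content) pair per file under the path
  def read(inputPath: String): RDD[String] = {
    val data = sc.wholeTextFiles(inputPath)
    data.map { case (filename, _) => filename }
  }
}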

Related

h2o scala code compile error not found object ai

I am trying to compile and run simple h2o Scala code, but when I do sbt package I get errors.
Am I missing something in the sbt file?
This is my h2o Scala code:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql._
import ai.h2o.automl.AutoML
import ai.h2o.automl.AutoMLBuildSpec
import org.apache.spark.h2o._
object H2oScalaEg1 {
  def main(args: Array[String]): Unit = {
    val sparkConf1 = new SparkConf().setMaster("local[2]").setAppName("H2oScalaEg1App")
    val sparkSession1 = SparkSession.builder.config(conf = sparkConf1).getOrCreate()
    val h2oContext = H2OContext.getOrCreate(sparkSession1.sparkContext)
    import h2oContext._
    import java.io.File
    import h2oContext.implicits._
    import water.Key
  }
}
And this is my sbt file.
name := "H2oScalaEg1Name"
version := "1.0"
scalaVersion := "2.11.12"
scalaSource in Compile := baseDirectory.value / ""
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.3"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.0"
libraryDependencies += "ai.h2o" % "h2o-core" % "3.22.1.3" % "runtime" pomOnly()
When I do sbt package I get these errors
[error] /home/myuser1/h2oScalaEg1/H2oScalaEg1.scala:7:8: not found: object ai
[error] import ai.h2o.automl.AutoML
[error] ^
[error] /home/myuser1/h2oScalaEg1/H2oScalaEg1.scala:8:8: not found: object ai
[error] import ai.h2o.automl.AutoMLBuildSpec
[error] ^
[error] /home/myuser1/h2oScalaEg1/H2oScalaEg1.scala:10:25: object h2o is not a member of package org.apache.spark
[error] import org.apache.spark.h2o._
[error] ^
[error] /home/myuser1/h2oScalaEg1/H2oScalaEg1.scala:20:20: not found: value H2OContext
[error] val h2oContext = H2OContext.getOrCreate(sparkSession1.sparkContext)
[error] ^
[error] /home/myuser1/h2oScalaEg1/H2oScalaEg1.scala:28:10: not found: value water
[error] import water.Key
[error] ^
[error] 5 errors found
How can I fix this problem?
My Spark version is spark-2.2.3-bin-hadoop2.7.
Thanks,
marrel
pomOnly() in build.sbt tells the dependency management handlers that the jar libs/artifacts for this dependency should not be loaded and that only the metadata should be looked up.
Try to use libraryDependencies += "ai.h2o" % "h2o-core" % "3.22.1.3" instead.
Edit 1: Additionally I think you are missing (at least) one library dependency:
libraryDependencies += "ai.h2o" % "h2o-automl" % "3.22.1.3"
see: https://search.maven.org/artifact/ai.h2o/h2o-automl/3.22.1.5/pom
Edit 2:
The last dependency you are missing is sparkling-water-core:
libraryDependencies += "ai.h2o" % "sparkling-water-core_2.11" % "2.4.6" should do the trick.
Here is the GitHub source for sparkling-water/core/src/main/scala/org/apache/spark/h2o.
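Putting the three changes together, the dependency section of build.sbt would look roughly like this (a sketch; the sparkling-water-core version is the one suggested above and should match your Spark 2.2.x / Scala 2.11 build):
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.3"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.0"
// pull the actual h2o jars rather than only the pom metadata
libraryDependencies += "ai.h2o" % "h2o-core" % "3.22.1.3"
libraryDependencies += "ai.h2o" % "h2o-automl" % "3.22.1.3"
// provides org.apache.spark.h2o._ and H2OContext
libraryDependencies += "ai.h2o" % "sparkling-water-core_2.11" % "2.4.6"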

sbt - object apache is not a member of package org

I want to deploy and submit a Spark program using sbt, but it's throwing an error.
Code:
package in.goai.spark
import org.apache.spark.{SparkContext, SparkConf}
object SparkMeApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("First Spark")
    val sc = new SparkContext(conf)
    val fileName = args(0)
    val lines = sc.textFile(fileName).cache
    val c = lines.count
    println(s"There are $c lines in $fileName")
  }
}
build.sbt
name := "First Spark"
version := "1.0"
organization := "in.goai"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"
resolvers += Resolver.mavenLocal
Under first/project directory
build.properties
sbt.version=0.13.9
When I try to run sbt package it throws the error given below.
[root@hadoop first]# sbt package
[info] Loading project definition from /home/training/workspace_spark/first/project
[info] Set current project to First Spark (in build file:/home/training/workspace_spark/first/)
[info] Compiling 1 Scala source to /home/training/workspace_spark/first/target/scala-2.11/classes...
[error] /home/training/workspace_spark/first/src/main/scala/LineCount.scala:3: object apache is not a member of package org
[error] import org.apache.spark.{SparkContext, SparkConf}
[error] ^
[error] /home/training/workspace_spark/first/src/main/scala/LineCount.scala:9: not found: type SparkConf
[error] val conf = new SparkConf().setAppName("First Spark")
[error] ^
[error] /home/training/workspace_spark/first/src/main/scala/LineCount.scala:11: not found: type SparkContext
[error] val sc = new SparkContext(conf)
[error] ^
[error] three errors found
[error] (compile:compile) Compilation failed
[error] Total time: 4 s, completed May 10, 2018 4:05:10 PM
I have tried with extends App too, but no change.
Please remove resolvers += Resolver.mavenLocal from build.sbt. Since spark-core is available on Maven, we don't need to use local resolvers.
After that, you can try sbt clean package.
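For reference, the resulting build.sbt is simply the one from the question with the resolver line removed:
name := "First Spark"
version := "1.0"
organization := "in.goai"
scalaVersion := "2.11.8"
// spark-core is resolved from the default Maven repositories, so no extra resolver is needed
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"
After sbt clean package, the jar should appear under target/scala-2.11/ and can be submitted with spark-submit.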

Not found value spark SBT project

Hi, I am trying to set up a small Spark application in sbt.
My build.sbt is
import Dependencies._
name := "hello"
version := "1.0"
scalaVersion := "2.11.8"
val sparkVersion = "1.6.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-streaming-twitter" % sparkVersion
)
libraryDependencies += scalaTest % Test
Everything works fine and all dependencies get resolved by sbt, but when I try importing spark in my hello.scala project file I get this error:
not found: value spark
my hello.scala file is
package example
import org.apache.spark._
import org.apache.spark.SparkContext._
object Hello extends fileImport with App {
  println(greeting)
  anime.select("*").orderBy($"rating".desc).limit(10).show()
}
trait fileImport {
  lazy val greeting: String = "hello"
  var anime = spark.read.option("header", true).csv("C:/anime.csv")
  var ratings = spark.read.option("header", true).csv("C:/rating.csv")
}
Here is the error I get:
[info] Compiling 1 Scala source to C:\Users\haftab\Downloads\sbt-0.13.16\sbt\alfutaim\target\scala-2.11\classes...
[error] C:\Users\haftab\Downloads\sbt-0.13.16\sbt\alfutaim\src\main\scala\example\Hello.scala:12: not found: value spark
[error] var anime = spark.read.option("header", true).csv("C:/anime.csv")
[error] ^
[error] C:\Users\haftab\Downloads\sbt-0.13.16\sbt\alfutaim\src\main\scala\example\Hello.scala:13: not found: value spark
[error] var ratings = spark.read.option("header", true).csv("C:/rating.csv")
[error] ^
[error] two errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 3 s, completed Sep 10, 2017 1:44:47 PM
spark is initialized automatically in spark-shell only; in your own code you need to initialize the spark variable yourself:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("testings").master("local").getOrCreate
You can change "testings" to your desired name. The .master option is optional if you want to run the code using spark-submit.
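Applied to the code from the question, a sketch could look like the following. Note that SparkSession lives in the spark-sql module and only exists from Spark 2.0 onward, so this assumes the build is moved to a 2.x spark-sql dependency:
import org.apache.spark.sql.SparkSession

object Hello extends fileImport with App {
  println(greeting)
  import spark.implicits._ // brings the $"rating" column syntax into scope
  anime.select("*").orderBy($"rating".desc).limit(10).show()
}

trait fileImport {
  lazy val greeting: String = "hello"

  // spark-shell creates this session for you; a standalone app has to build it itself
  lazy val spark = SparkSession.builder()
    .appName("testings")
    .master("local")
    .getOrCreate()

  lazy val anime = spark.read.option("header", true).csv("C:/anime.csv")
  lazy val ratings = spark.read.option("header", true).csv("C:/rating.csv")
}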

Spark-cassandra-connector: toArray does not work

I am using the spark-cassandra-connector with Scala and I want to read data from Cassandra and display it via the method toArray.
However, I get an error message that it is not a member of the class, even though it is indicated in the API. Could somebody help me find my error?
Here are my files:
build.sbt:
name := "Simple_Project"
version := "1.0"
scalaVersion := "2.11.8"
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0-preview"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0-preview"
resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"
libraryDependencies += "datastax" % "spark-cassandra-connector" % "2.0.0-M2-s_2.11"
SimpleScala.scala:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import com.datastax.spark.connector._
import com.datastax.spark.connector.rdd._
import org.apache.spark.sql.cassandra._
import org.apache.spark.sql.SQLContext
import com.datastax.spark.connector.cql.CassandraConnector._
object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    conf.set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    val rdd_2 = sc.cassandraTable("test_2", "words")
    rdd_2.toArray.foreach(println)
  }
}
Statements for cqlsh:
CREATE KEYSPACE test_2 WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1 };
CREATE TABLE test_2.words (word text PRIMARY KEY, count int);
INSERT INTO test_2.words (word, count) VALUES ('foo', 20);
INSERT INTO test_2.words (word, count) VALUES ('bar', 20);
Error message:
[info] Loading global plugins from /home/andi/.sbt/0.13/plugins
[info] Resolving org.scala-sbt.ivy#ivy;2.3.0-sbt-2cc8d2761242b072cedb0a04cb39435[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Loading project definition from /home/andi/test_spark/project
[info] Updating {file:/home/andi/test_spark/project/}test_spark-build...
[info] Resolving org.scala-sbt.ivy#ivy;2.3.0-sbt-2cc8d2761242b072cedb0a04cb39435[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Set current project to Simple_Project (in build file:/home/andi/test_spark/)
[info] Compiling 1 Scala source to /home/andi/test_spark/target/scala-2.11/classes...
[error] /home/andi/test_spark/src/main/scala/SimpleApp.scala:50: value toArray is not a member of com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow]
[error] rdd_2.toArray.foreach(println)
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
Many thanks in advance,
Andi
The CassandraTableScanRDD.toArray method has been deprecated and removed as of the 2.0.0 release of the Spark Cassandra Connector; it was there until the 1.6.0 release. You can use the collect method instead.
Unfortunately, the Spark Cassandra Connector documentation still uses toArray. Anyway, here is what will work:
rdd_2.collect.foreach(println)
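In the main method from the question the change is a one-liner; keep in mind that collect materializes the whole table on the driver, so it is only suitable for small result sets:
val rdd_2 = sc.cassandraTable("test_2", "words")
// collect replaces the removed toArray and returns an Array[CassandraRow]
rdd_2.collect.foreach(println)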

SBT cannot resolve class declared in src/main/scala in a src/test/scala test class

I am trying to make my own custom CSV reader. I am using IntelliJ IDEA 14 with sbt and the specs2 test framework.
The class I declared in src/main is as follows:
import java.io.FileInputStream
import scala.io.Source
class CSVStream(filePath: String) {
  val csvStream = Source.fromInputStream(new FileInputStream(filePath)).getLines()
  val headers = csvStream.next().split("\\,", -1)
}
The content of the test file in src/test is as follows:
import org.specs2.mutable._
object CSVStreamSpec {
  val csvSourcePath = getClass.getResource("/csv_source.csv").getPath
}
class CSVStreamSpec extends Specification {
  import CSVStreamLib.CSVStreamSpec._
  "The CSV Stream reader" should {
    "Extract the header" in {
      val csvSource = CSVStream(csvSourcePath)
    }
  }
}
The build.sbt file contains the following:
name := "csvStreamLib"
version := "1.0"
scalaVersion := "2.11.4"
libraryDependencies ++= Seq("org.specs2" %% "specs2-core" % "2.4.15" % "test")
parallelExecution in Test := false
The error I am getting when I type test is as follows:
[error] /Users/raiyan/IdeaProjects/csvStreamLib/src/test/scala/csvStreamSpec.scala:18: not found: value CSVStream
[error] val csvSource = CSVStream(csvSourcePath)
[error] ^
[error] one error found
[error] (test:compile) Compilation failed
[error] Total time: 23 s, completed 30-Dec-2014 07:44:46
How do I make the CSVStream class accessible to the CSVStreamSpec class in the test file?
Update:
I tried it with sbt from the command line. The result is the same.
You forgot the new keyword. Without it, the compiler looks for the companion object named CSVStream, not the class. Since there is none, it complains. Add new and it'll work.
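Concretely, the failing line becomes:
val csvSource = new CSVStream(csvSourcePath)
Alternatively, if you want to keep the CSVStream(...) call syntax, you could add a companion object with an apply method next to the class in src/main/scala (a sketch, not part of the original code):
// companion object so that CSVStream(path) works without new
object CSVStream {
  def apply(filePath: String): CSVStream = new CSVStream(filePath)
}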