SparkConf not found running spark neo4j connector example - scala

I execute the spark neo4j example code like so:
spark-shell --conf spark.neo4j.bolt.password=TestNeo4j --packages neo4j-contrib:neo4j-spark-connector:1.0.0-RC1,graphframes:graphframes:0.1.0-spark1.6 -i neo4jspark.scala
My Scala file:
import org.neo4j.spark._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.SparkConf
val conf = new SparkConf.setMaster("local").setAppName("neo4jspark")
val sc = new SparkContext(conf)
val neo = Neo4j(sc)
val rdd = neo.cypher("MATCH (p:POINT) RETURN p").loadRowRdd
rdd.count
The error:
Loading neo4jspark.scala...
import org.neo4j.spark._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.SparkConf
<console>:38: error: not found: value SparkConf
val conf = new SparkConf.setMaster("local").setAppName("neo4jspark")
^
<console>:38: error: not found: value conf
val sc = new SparkContext(conf)
^
<console>:39: error: not found: value Neo4j
val neo = Neo4j(sc)
^
<console>:38: error: not found: value neo
val rdd = neo.cypher("MATCH (p:POINT) RETURN p").loadRowRdd
^
<console>:39: error: object count is not a member of package org.apache.spark.streaming.rdd
rdd.count
^
I am importing SparkConf, so I have no idea why it says there is no value for it. Am I missing something (simple, I hope)?
EDIT: This seems to be a version error:
I ran it with this startup command:
spark-shell --conf spark.neo4j.bolt.password=TestNeo4j --packages neo4j-contrib:neo4j-spark-connector:2.0.0-M2,graphframes:graphframes:0.2.0-spark2.0-s_2.11 -i neo4jspark.scala
I still get the conf error, but it does run. Now I just need to figure out why anything I do with the returned RDD breaks :D This is on a Mac. I just tested the same versions of everything on my Windows machine, and it breaks as shown here, mostly because it couldn't resolve import org.neo4j.spark._. Here is the error:
<console>:23: error: object neo4j is not a member of package org
import org.neo4j.spark._
^
No idea what is different about windows than the mac I am deving on :/
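A likely explanation for the conf error, offered as a sketch rather than a confirmed fix: new SparkConf.setMaster(...) is missing the parentheses after SparkConf. Without them, Scala parses SparkConf.setMaster as a type path and looks for a value named SparkConf, which matches "not found: value SparkConf" and also explains why the error survives the version change in the EDIT. The wrapper object Neo4jSparkConf is a hypothetical name used only to keep the snippet self-contained.

```scala
import org.apache.spark.SparkConf

// Hypothetical wrapper object; only the conf definition matters here.
object Neo4jSparkConf {
  // `new SparkConf()` first creates the conf, then the setters are chained on it.
  // Writing `new SparkConf.setMaster(...)` instead asks the compiler to instantiate
  // a type `SparkConf.setMaster`, so it looks for a *value* named SparkConf.
  val conf: SparkConf = new SparkConf()
    .setMaster("local")
    .setAppName("neo4jspark")
}
```

Inside spark-shell a SparkContext named sc already exists, so the remaining lines can reuse it (val neo = Neo4j(sc) and so on) instead of constructing a second context. The later rdd.count error is most likely a knock-on effect: once val rdd fails to compile, the name rdd resolves to the package org.apache.spark.streaming.rdd brought in by import org.apache.spark.streaming._, which is exactly the package named in that error message.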

Related

Packaging scala class on databricks (error: not found: value dbutils)

Trying to make a package with a class
package x.y.Log
import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.{DataFrame}
import org.apache.spark.sql.functions.{lit, explode, collect_list, struct}
import org.apache.spark.sql.types.{StructField, StructType}
import java.util.Calendar
import java.text.SimpleDateFormat
import org.apache.spark.sql.functions._
import spark.implicits._
class Log{
...
}
Everything runs fine in the same notebook, but once I try to create a package that I could use in other notebooks, I get errors:
<notebook>:11: error: not found: object spark
import spark.implicits._
^
<notebook>:21: error: not found: value dbutils
val notebookPath = dbutils.notebook.getContext().notebookPath.get
^
<notebook>:22: error: not found: value dbutils
val userName = dbutils.notebook.getContext.tags("user")
^
<notebook>:23: error: not found: value dbutils
val userId = dbutils.notebook.getContext.tags("userId")
^
<notebook>:41: error: not found: value spark
var rawMeta = spark.read.format("json").option("multiLine", true).load("/FileStore/tables/xxx.json")
^
<notebook>:42: error: value $ is not a member of StringContext
.filter($"Name".isin(readSources))
Does anyone know how to package this class with these libs?
Assuming you are running Spark 2.x, the statement import spark.implicits._ only works when you have a SparkSession object in scope. The implicits object is defined inside the SparkSession class, so it has to be reached through a SparkSession instance; it extends SQLImplicits, which carries over from earlier versions of Spark. You can check the SparkSession source code on GitHub to verify. Create the session inside the class and import from it:
package x.y.Log
import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{lit, explode, collect_list, struct}
import org.apache.spark.sql.types.{StructField, StructType}
import java.util.Calendar
import java.text.SimpleDateFormat
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession
class Log{
val spark: SparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()
import spark.implicits._
...[rest of the code below]
}
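An alternative to building the session inside the class, sketched under the assumption that the calling notebook already has a SparkSession: pass it in as a constructor parameter. A constructor parameter is a stable identifier, so import spark.implicits._ compiles and the $"..." column syntax becomes available. The filterByName method is hypothetical, standing in for the .filter($"Name".isin(readSources)) call from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical variant of the Log class that receives the session from its caller
class Log(spark: SparkSession) {
  // implicits is a member of the SparkSession instance, so the import needs that instance
  import spark.implicits._

  // Hypothetical helper: keep rows whose Name column is one of `names`.
  // The $"Name" syntax is provided by the implicits import above.
  def filterByName(df: DataFrame, names: Seq[String]): DataFrame =
    df.filter($"Name".isin(names: _*))
}
```

In a notebook this would be used as new Log(spark). Passing the session in avoids a hidden getOrCreate inside the class and lets the class be exercised with a local session outside Databricks.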

Running wordcount failed in scala

I am trying to run a word count program in Scala. Here's what my code looks like:
package myspark;
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.implicits._
object WordCount {
def main(args: Array[String]) {
val sc = new SparkContext( "local", "Word Count", "/home/hadoop/spark-2.2.0-bin-hadoop2.7/bin", Nil, Map(), Map())
val input = sc.textFile("/myspark/input.txt")
Val count = input.flatMap(line ⇒ line.split(" "))
.map(word ⇒ (word, 1))
.reduceByKey(_ + _)
count.saveAsTextFile("outfile")
System.out.println("OK");
}
}
Then I tried to execute it in spark.
spark-shell -i /myspark/WordCount.scala
And I get this error.
... 149 more
<console>:14: error: not found: value spark
import spark.implicits._
^
<console>:14: error: not found: value spark
import spark.sql
^
That file does not exist
Can someone please explain the error in this code? I am very new to both Spark and Scala. I have verified that the input.txt file is in the mentioned location.
You can take a look here to get started: Learning Spark - WordCount
Other than that, I can see a number of errors:
import org.apache.spark..implicits._: the two dots won't work, and there is no org.apache.spark.implicits package in any case; implicits comes from a SparkSession instance.
Also, have you added the Spark dependency to your project (possibly with "provided" scope)? You need at least that to run Spark code.
First of all, check whether you have added the right dependencies. I can also see a few mistakes in your code.
Create a SparkSession, not a SparkContext (see the SparkSession API):
import org.apache.spark.sql.SparkSession
val spark = SparkSession
.builder()
.appName("Spark SQL basic example")
.config("spark.some.config.option", "some-value")
.getOrCreate()
Then use this spark variable:
import spark.implicits._
I am not sure why you wrote import org.apache.spark..implicits._ with two dots between spark and implicits.
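Putting the answers together, a corrected version of the program might look like the sketch below. It assumes Spark 2.x with spark-sql on the classpath, keeps the input path and output directory from the question, and moves the counting into a wordCounts helper (a name introduced here) so the logic can be checked on its own. It also replaces the capitalized Val, which Scala does not accept, with val.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object WordCount {
  // (word, count) pairs; splitting on a single space mirrors the question's code
  def wordCounts(lines: RDD[String]): RDD[(String, Int)] =
    lines
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // SparkSession is the Spark 2.x entry point; its SparkContext is still available
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("Word Count")
      .getOrCreate()

    val input = spark.sparkContext.textFile("/myspark/input.txt")
    wordCounts(input).saveAsTextFile("outfile")
    println("OK")
    spark.stop()
  }
}
```

After packaging (for example with sbt package), this would be launched with spark-submit --class WordCount followed by the jar path (use myspark.WordCount if you keep the package myspark clause). Loading the file with spark-shell -i only defines the object; it never calls main. The "That file does not exist" line in the output also suggests the shell could not find /myspark/WordCount.scala at that path.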

Error found when importing spark.implicits

I am using Spark 1.4.0.
When I tried to import spark.implicits with this command:
import spark.implicits._
this error appeared:
<console>:19: error: not found: value spark
import spark.implicits._
^
Can anyone help me to resolve this problem ?
It's because SparkSession is only available from Spark 2.0 onward; in the Spark 2.x REPL, the spark value is a SparkSession object.
In Spark 1.4, use
import sqlContext.implicits._
The sqlContext value is created automatically in the Spark 1.x REPL.
For completeness: outside the REPL, you first have to create the sqlContext yourself:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
val conf = new SparkConf().setMaster("local").setAppName("my app")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
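As a sketch of what that import provides, assuming Spark 1.3 or later: toDF is not a method of RDD itself, it is added by sqlContext.implicits._. The ImplicitsDemo object and pairsToDF method are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical helper object for illustration
object ImplicitsDemo {
  // Build a small DataFrame from an RDD of tuples.
  // The import must name a stable SQLContext value; here it is the method parameter.
  def pairsToDF(sqlContext: SQLContext): DataFrame = {
    import sqlContext.implicits._
    sqlContext.sparkContext
      .parallelize(Seq(("a", 1), ("b", 2)))
      .toDF("word", "count") // available only because of the implicits import
  }
}
```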

44: error: value read is not a member of object org.apache.spark.sql.SQLContext

I am using Spark 1.6.1 and Scala 2.10.5. I am trying to read a CSV file through the com.databricks spark-csv package.
I launch spark-shell with the following options:
spark-shell --packages com.databricks:spark-csv_2.10:1.5.0 --driver-class-path path to/sqljdbc4.jar
and here is the whole code:
import java.util.Properties
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
val conf = new SparkConf().setAppName("test").setMaster("local").set("spark.driver.allowMultipleContexts", "true");
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val df = SQLContext.read().format("com.databricks.spark.csv").option("inferScheme","true").option("header","true").load("path_to/data.csv");
I am getting the following error:
error: value read is not a member of object org.apache.spark.sql.SQLContext,
and the "^" is pointing toward "SQLContext.read().format" in the error message.
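I tried the suggestions on Stack Overflow and other sites, but nothing seems to work.
SQLContext (capitalized) refers to the companion object, Scala's counterpart to static members, and that object has no read member.
You should use the sqlContext variable instead, because read is defined on SQLContext instances, not on the object. read is also declared without parentheses, so write read rather than read().
So the code should be: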
val df = sqlContext.read.format("com.databricks.spark.csv").option("inferSchema","true").option("header","true").load("path_to/data.csv");

Why do I get a type error in model.predictOnValues when I try the official example of Streaming Kmeans Clustering of Apache Spark?

I'm trying the Streaming Clustering example code at the end of the official guide, but I get a type error. Here is my code:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.clustering.StreamingKMeans
object Kmeans {
def main(args: Array[String]) {
val conf = new SparkConf().setMaster("local[2]").setAppName("kmeans")
val ssc = new StreamingContext(conf, Seconds(3))
val trainingData = ssc.textFileStream("training").map(Vectors.parse)
val testData = ssc.textFileStream("test").map(LabeledPoint.parse)
val numDimensions = 3
val numClusters = 2
val model = new StreamingKMeans()
.setK(numClusters)
.setDecayFactor(1.0)
.setRandomCenters(numDimensions, 0.0)
model.trainOn(trainingData)
model.predictOnValues(testData).print()
ssc.start()
ssc.awaitTermination()
}
}
But when I run
sbt package
I get the following error:
[error] found : org.apache.spark.streaming.dstream.DStream[org.apache.spark.mllib.regression.LabeledPoint]
[error] required: org.apache.spark.streaming.dstream.DStream[(?, org.apache.spark.mllib.linalg.Vector)]
[error] model.predictOnValues(testData).print()
[error] ^
[error] one error found
[error] (compile:compile) Compilation failed
You need to map testData: DStream[LabeledPoint] to a DStream[(K, Vector)]:
model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()
You can find the complete example here: StreamingKMeansExample.scala
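predictOnValues is generic in its key type: it takes a DStream[(K, Vector)] and returns a DStream[(K, Int)], keeping each key next to the index of its nearest cluster. With the mapping above, the key is the point's label. The local sketch below mirrors that per-pair step on a plain Seq using latestModel(), so it can be checked without a running StreamingContext; KeyedPrediction and its methods are illustrative names, not part of MLlib.

```scala
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint

// Illustrative helper object, not an MLlib API
object KeyedPrediction {
  // The same mapping used in the fix: label becomes the key, features the value
  def toKeyed(lp: LabeledPoint): (Double, Vector) = (lp.label, lp.features)

  // Keep each key and replace the features with the predicted cluster index,
  // using the model's current centers
  def predictLocal(model: StreamingKMeans, points: Seq[LabeledPoint]): Seq[(Double, Int)] = {
    val current = model.latestModel()
    points.map(toKeyed).map { case (label, features) => (label, current.predict(features)) }
  }
}
```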