I'm using Jupyter Notebook to work with my Scala code via the Apache Toree kernel.
How do I import a locally saved Scala file (say, TweetData.scala) into a notebook in the same directory?
Doing this directly throws an error:
In[1]: import TweetData._
Out[1]: Name: Compile Error
Message: <console>:17: error: not found: value TweetData
import TweetData._
^
StackTrace:
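For context, a minimal sketch of what such a file might contain (the object name TweetData and its members here are hypothetical). A .scala file sitting next to the notebook is not compiled automatically by the kernel, so the import only resolves once the object is actually in the session, e.g. by defining it in a cell of the same notebook or packaging it into a jar the kernel can see:

// Hypothetical contents of TweetData.scala: a plain top-level object.
// Defining this in a notebook cell (or adding a compiled jar to the kernel)
// is what makes `import TweetData._` resolve.
object TweetData {
  case class Tweet(user: String, text: String, retweets: Int)

  val tweets: List[Tweet] = List(
    Tweet("alice", "hello spark", 10),
    Tweet("bob", "hello scala", 5)
  )
}

// In a later cell:
import TweetData._
tweets.map(_.user)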
Related
I am trying to read data from Neo4j using a Spark job, and the import itself throws an error. I tried to import org.neo4j.spark._ in IntelliJ IDEA, but it shows the error "Cannot resolve symbol spark". When I try it in the spark-shell, it throws:
":23: error: object neo4j is not a member of package org
import org.neo4j.spark._"
Spark version - 3.1.1
dependencies
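The asker's dependency list isn't shown. For reference, a hedged sketch of how the Neo4j Spark connector is typically declared for an sbt project; the artifact and version strings below are assumptions and should be checked against the connector's release matrix for Spark 3.1.x:

// build.sbt (sketch; version strings are assumptions)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.1.1" % Provided,
  "org.neo4j" % "neo4j-connector-apache-spark_2.12" % "4.1.2_for_spark_3"
)

In spark-shell the same artifact can be passed with --packages instead of relying on the project classpath; without it, org.neo4j.spark is simply not on the classpath, which is exactly what both errors say.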
I am trying to launch a Jupyter notebook with Scala. For that I used almond, but I am running into a problem when trying to import:
import org.apache.spark._
the error is:
object apache is not a member of package org
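With almond, Spark is not on the notebook classpath by default, so org.apache.spark cannot resolve until the artifacts are fetched. A minimal sketch using the Ammonite-style $ivy imports that almond supports (the Spark version below is a placeholder and must match the kernel's Scala version):

// Fetch Spark onto the notebook classpath first; only then do the
// org.apache.spark packages become importable.
import $ivy.`org.apache.spark::spark-sql:2.4.8`

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("almond-notebook")
  .master("local[*]")
  .getOrCreate()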
Running spark-shell --packages "graphframes:graphframes:0.7.0-spark2.4-s_2.11" in the bash shell works and I can successfully import graphframes 0.7, but when I try to use it in a Scala Jupyter notebook like this:
import scala.sys.process._
"spark-shell --packages \"graphframes:graphframes:0.7.0-spark2.4-s_2.11\""!
import org.graphframes._
it gives the error message:
<console>:53: error: object graphframes is not a member of package org
import org.graphframes._
From what I can tell, this means it runs the bash command but still cannot find the retrieved package.
I am doing this on an EMR Notebook running a spark scala kernel.
Do I have to set some sort of spark library path in the jupyter environment?
That simply shouldn't work. Your code is just an attempt to start a new, independent Spark shell. Furthermore, Spark packages have to be loaded when the SparkContext is initialized for the first time.
You should either add (assuming these are correct versions)
spark.jars.packages graphframes:graphframes:0.7.0-spark2.4-s_2.11
to your Spark configuration files, or use the equivalent setting on your SparkConf / SparkSession builder's .config before the SparkSession is initialized.
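A minimal sketch of the builder-based variant, assuming graphframes 0.7.0 for Spark 2.4 / Scala 2.11 really is the right build for the cluster:

import org.apache.spark.sql.SparkSession

// The packages setting must be in place before the first SparkSession /
// SparkContext of the notebook session is created.
val spark = SparkSession.builder()
  .appName("graphframes-notebook")
  .config("spark.jars.packages", "graphframes:graphframes:0.7.0-spark2.4-s_2.11")
  .getOrCreate()

import org.graphframes._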
I have installed the Scala kernel based on this doc: https://github.com/jupyter-scala/jupyter-scala
Kernel is there:
$ jupyter kernelspec list
Available kernels:
python3 /usr/local/homebrew/Cellar/python3/3.6.4_2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel/resources
scala /Users/bobyfarell/Library/Jupyter/kernels/scala
When I try to use Spark in the notebook I get this:
val sparkHome = "/opt/spark-2.3.0-bin-hadoop2.7"
val scalaVersion = scala.util.Properties.versionNumberString
import org.apache.spark.ml.Pipeline
Compilation Failed
Main.scala:57: object apache is not a member of package org
; import org.apache.spark.ml.Pipeline
^
I tried:
Setting SPARK_HOME and CLASSPATH to the location of $SPARK_HOME/jars
Setting -cp option pointing to $SPARK_HOME/jars in kernel.json
Setting classpath.add call before imports
None of these helped. Please note that I don't want to use Toree; I want to use a standalone Spark and Scala kernel with Jupyter. A similar issue is reported here too: https://github.com/jupyter-scala/jupyter-scala/issues/63
It doesn't look like you are following the jupyter-scala directions for using Spark. You have to load Spark into the kernel using its special imports.
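jupyter-scala (now almond) is Ammonite-based, so the "special imports" are $ivy dependency imports rather than a CLASSPATH or -cp setting. A rough sketch under that assumption, with placeholder versions:

// Load the Spark artifacts into the kernel first; SPARK_HOME, CLASSPATH
// and -cp tricks for the plain Scala REPL are not picked up by this kernel.
import $ivy.`org.apache.spark::spark-sql:2.3.0`
import $ivy.`org.apache.spark::spark-mllib:2.3.0`

import org.apache.spark.ml.Pipeline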
When I open a shell in Windows 10 and type spark-shell, it opens the Scala REPL. After that I type
import org.apache.spark.SparkContext
and it works fine in the shell. There is no problem.
But when I try to run this command in Eclipse, it gives me the error:
error: object apache is not a member of package org
import org.apache.spark.SparkContext
Why does it give that error? What should I do now?
I did some research on the internet; some say to set the classpath, etc. I tried everything, but it remains unsolved.
Please help
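spark-shell ships with the Spark jars already on its classpath, which is why the import works there; an Eclipse project only sees what its own build declares. A minimal sketch assuming an sbt-based project (the versions are placeholders and should match the installed Spark):

// build.sbt: add Spark so the IDE can resolve org.apache.spark.*
scalaVersion := "2.11.12"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.8"

With the dependency on the build path, import org.apache.spark.SparkContext compiles in Eclipse just as it does in spark-shell.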