<console>:25: error: object databricks is not a member of package com - scala

I actually work on zeppelin with spark and scala. I want to import the library which contain : import com.databricks.spark.xml.
I tried but I have still the same mistake in zeppelin mistake : <console>:25: error: object databricks is not a member of package com.
What I've done actually ? I create a note in Zeppelin with this code : %dep
z.load("com.databricks:spark-xml_2.11:jar:0.5.0"). Even with that, the interpreter don't work. It's like it don't succeed to load the library.
Have you an idea why it don't work ?
Thanks for your help and have a nice day !

Your problem is very common and not intuitive to solve. I resolved an issue similar to this (I wanted to load the postgres jdbc connector in AWS EMR and I was using a linux terminal). Your issue can be resolved by checking if you can:
load the jar file manually to the environment that is hosting Zeppelin.
add the path of the jar file to your CLASSPATH environment variable. I don't know where you're hosting your files that manage your CLASSPATH env, but in EMR, my file, viewed from the Zeppelin root directory, was here: /usr/lib/zeppelin/conf/zeppelin-env.sh
download the zeppelin interpreter with
$ sudo ./bin/install-interpreter.sh --name "" --artifact
add the interpreter in Zeppelin wby going to the Zeppelin Interpreter GUI and add in the interpreter group.
Reboot Zeppelin with:
$ sudo stop zeppelin
$ sudo start zeppelin
It's very likely that your configurations may vary slightly, but I hope this helps provide some structure and relevance.

Related

pyspark pip installation hive-site.xml

I have installed pyspark using (pipenv install pyspark) and type pyspark after activating 'pipenv shell'
I can able to open pyspark terminal and able to run few spark code.
but I am trying to figure out to enable Hive (for that where I need to place hive-site.xml (with mysql metastore properties) and not able to see any spark/config folder in order to place hive-site.xml).
Unfortunately the existing application much relied on Pipefile (so i have to follow pipenv install pyspark)

Snowpark connection errors with 0.6.0 jar

I am trying to use snowpark(0.6.0) via Jupiter notebooks(after installing Scala almond kernel). I am using Windows laptop. Had to change the examples here a bit to work around windows. Following documentation here
https://docs.snowflake.com/en/developer-guide/snowpark/quickstart-jupyter.html
Ran into this error
java.lang.NoClassDefFoundError: Could not initialize class com.snowflake.snowpark.Session$
ammonite.$sess.cmd5$Helper.<init>(cmd5.sc:6)
ammonite.$sess.cmd5$.<init>(cmd5.sc:7)
ammonite.$sess.cmd5$.<clinit>(cmd5.sc:-1)
Also tried earlier with IntelliJ IDE,got bunch of errors with missing dependencies for log4j etc.
Can I get help.
Have not set it up in Widows but only with Linux.
You have to do the setup steps for each notebook that is going to use Snowpark (part from installing the kernel).
It's important to make sure you are using a unique folder for each notebook, as in step 2 in the guide.
What was the output of the import $ivy.com.snowflake:snowpark:0.6.0?

Jupyter for Scala with spylon-kernel without having to install Spark

Based on web search and as highly recommended, I am trying to run Jupyter on my local for Scala (using spylon-kernel).
I was able to create a notebook but while trying to run/play a Scala code snippet, I see this message initializing scala interpreter and in the console, I see this error:
ValueError: Couldn't find Spark, make sure SPARK_HOME env is set or Spark is in an expected location (e.g. from homebrew installation).
I am not planning to install Spark. Is there a way I can still use Jupyter for Scala without installing Spark?
I am new to Jupyter and the ecosystem. Pardon me for the amateur question.
Thanks

How to add customized jar in Jupyter Notebook in Scala

I need to use a third party jar (mysql) in my Scala script, if I use spark shell, I can specify the jar in the starting command like below:
spark2-shell --driver-class-path mysql-connector-java-5.1.15.jar --jars /opt/cloudera/parcels/SPARK2/lib/spark2/jars/mysql-connector-java-5.1.15.jar
However, how can I do this in Jupyter notebook? I remember there is a magic way to do it in pyspark, I am using Scala, and I can't change the environment setting of the kernel I am using.
I have the solution now, and it is very simple indeed as below:
Use a toree based Scala kernel (which is what I am using)
Use AddJar: in the notebook and run it, the jar will be downloaded and voila!
That's it.
AddJar http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.15/mysql-connector-java-5.1.15.jar

Setting Specific Python in Zeppelin Interpreter

What do I need to do beyond setting "zeppelin.pyspark.python" to make a Zeppelin interpreter us a specific Python executable?
Background:
I'm using Apache Zeppelin connected to a Spark+Mesos cluster. The cluster's worked fine for several years. Zeppelin is new and works fine in general.
But I'm unable to import numpy within functions applied to an RDD in pyspark. When I use Python subprocess to locate the Python executable, it shows that the code is being run in the system's Python, not in the virutalenv it needs to be in.
So I've seen a few questions on this issue that say the fix is to set "zeppelin.pyspark.python" to point to the correct python. I've done that and restarted the interpreter a few times. But it is still using the system Python.
Is there something additional I need to do? This is using Zeppelin 0.7.
On an older, custom snapshot build of Zeppelin I've been using on an EMR cluster, I set the following two properties to use a specific virtualenv:
"zeppelin.pyspark.python": "/path/to/bin/python",
"spark.executorEnv.PYSPARK_PYTHON": "/path/to/bin/python"
When you are in your activated venv in python:
(my_venv)$ python
>>> import sys
>>> sys.executable
# http://localhost:8080/#/interpreters
# search for 'python'
# set `zeppelin.python` to output of `sys.executable`