I have a Python 2 environment on Windows 10 with Jupyter Notebook.
After following the instructions in this tutorial I managed to install Spark on Windows 10:
https://medium.com/@GalarnykMichael/install-spark-on-windows-pyspark-4498a5d8d66c
but when I try to run the cell magic for SQL I get the following error:
ERROR:root:Cell magic %%sql not found.
When I ran %lsmagic I could not find an sql cell magic among them.
I also noticed there was no option for a PySpark kernel when starting a new notebook in Jupyter.
Are you trying to use plain SQL or Spark SQL? I've used ipython-sql, which was great, and there's also sparkmagic, which sounds like what you're looking for. Try installing sparkmagic, which does provide a %%sql magic.
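For example, a minimal setup sketch for both routes (assuming a notebook with pip available; sparkmagic additionally needs a Livy endpoint to talk to):
# Option 1: ipython-sql, for plain SQL against any SQLAlchemy-supported database
!pip install ipython-sql
%load_ext sql
# Option 2: sparkmagic, which provides %%sql cells backed by Spark via Livy
!pip install sparkmagic
%load_ext sparkmagic.magics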
I am trying to use Snowpark (0.6.0) via Jupyter notebooks (after installing the Scala Almond kernel). I am using a Windows laptop and had to change the examples here a bit to work around Windows. I am following the documentation here:
https://docs.snowflake.com/en/developer-guide/snowpark/quickstart-jupyter.html
I ran into this error:
java.lang.NoClassDefFoundError: Could not initialize class com.snowflake.snowpark.Session$
ammonite.$sess.cmd5$Helper.<init>(cmd5.sc:6)
ammonite.$sess.cmd5$.<init>(cmd5.sc:7)
ammonite.$sess.cmd5$.<clinit>(cmd5.sc:-1)
I also tried earlier with the IntelliJ IDE and got a bunch of errors about missing dependencies for log4j etc.
Can I get some help?
I have not set it up on Windows, only on Linux.
You have to do the setup steps for each notebook that is going to use Snowpark (apart from installing the kernel).
It's important to make sure you are using a unique folder for each notebook, as in step 2 in the guide.
What was the output of import $ivy.`com.snowflake:snowpark:0.6.0`?
Based on a web search and strong recommendations, I am trying to run Jupyter locally for Scala (using spylon-kernel).
I was able to create a notebook, but when I try to run a Scala code snippet, I see the message "initializing scala interpreter" and, in the console, this error:
ValueError: Couldn't find Spark, make sure SPARK_HOME env is set or Spark is in an expected location (e.g. from homebrew installation).
I am not planning to install Spark. Is there a way I can still use Jupyter for Scala without installing Spark?
I am new to Jupyter and the ecosystem. Pardon me for the amateur question.
Thanks
I'm trying to analyze my datasets on DB2 on Cloud in a Jupyter notebook created in Watson Studio. Using the %sql magic for connecting to DB2 doesn't work out of the box; it reports that there is no such module. According to an IBM tutorial, you need to run the %run db2re.ipynb command in a Jupyter cell before connecting to DB2. But when I run this cell nothing happens and the %sql magic is still not available. Any advice is appreciated.
In general, there are two ways of accessing libraries in Watson Studio:
- Install or import a library, then reference it. Note that you need to specify the --user option.
- First save your own scripts, then import them.
There are also the built-in line and cell magics.
With that, I think I got it to work the following way:
1st cell, download db2re.ipynb to your environment:
%%sh
wget https://raw.githubusercontent.com/DB2-Samples/Db2re/master/db2re.ipynb
2nd cell, install necessary library:
!pip install --user qgrid
3rd cell, run the db2re.ipynb notebook extension:
%run db2re.ipynb
Thereafter, I was able to run a %sql command.
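If the db2re magic follows the usual Db2 notebook magic pattern, connecting and querying looks something like this (a sketch only; the exact CONNECT syntax is defined in db2re.ipynb itself, and the host, credentials, and table name here are placeholders for your own instance):
%sql CONNECT TO BLUDB USER myuser USING mypassword HOST myhost.services.dal.bluemix.net PORT 50000
%sql SELECT * FROM MYSCHEMA.MYTABLE FETCH FIRST 5 ROWS ONLY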
What do I need to do, beyond setting zeppelin.pyspark.python, to make a Zeppelin interpreter use a specific Python executable?
Background:
I'm using Apache Zeppelin connected to a Spark+Mesos cluster. The cluster has worked fine for several years; Zeppelin is new but works fine in general.
But I'm unable to import numpy within functions applied to an RDD in PySpark. When I use the Python subprocess module to locate the Python executable, it shows that the code is being run in the system's Python, not in the virtualenv it needs to be in.
I've seen a few questions on this issue saying the fix is to set zeppelin.pyspark.python to point to the correct Python. I've done that and restarted the interpreter a few times, but it is still using the system Python.
Is there something additional I need to do? This is using Zeppelin 0.7.
On an older, custom snapshot build of Zeppelin that I've been using on an EMR cluster, I set the following two properties to use a specific virtualenv:
"zeppelin.pyspark.python": "/path/to/bin/python",
"spark.executorEnv.PYSPARK_PYTHON": "/path/to/bin/python"
With your virtualenv activated, find the path of its Python executable:
(my_venv)$ python
>>> import sys
>>> sys.executable
Then open the interpreter settings (http://localhost:8080/#/interpreters), search for 'python', and set zeppelin.python to the output of sys.executable.
I am using a Jupyter Notebook on IBM Data Science Experience. Is it possible to enable SQL Magics/IPython-sql? How can I install it?
I want to connect to dashDB/DB2 and run SQL statements.
Yes, it is possible to use the ipython-sql (SQL Magics) module in Jupyter notebooks. The trick is to install it into the user space. Run the following in a code cell:
!pip install --user ipython-sql
If you want to connect to DB2 or dashDB, you also need to install the related database drivers. Because the SQL Magics depend on SQLAlchemy, use these commands (running them in the same cell as the command above works):
!pip install --user ibm_db
!pip install --user ibm_db_sa
Once everything is installed, you need to load the SQL Magics extension:
%load_ext sql
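With the ibm_db_sa driver installed, connecting and querying then looks roughly like this (a sketch; user, password, host, and table are placeholders, and the real values come from your service credentials):
%sql db2+ibm_db://myuser:mypassword@myhost.bluemix.net:50000/BLUDB
%sql SELECT COUNT(*) FROM MYSCHEMA.MYTABLE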
I took the instructions for installing the SQL Magics in Data Science Experience from this blog post, which also has an example of how to connect to the database.
There is also another way to run SQL statements against dashDB from IBM Data Science Experience: the ibmdbpy and ibmdbR libraries are already pre-deployed for Python and R notebooks, respectively, so you don't have to set up anything before using them.
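For illustration, a minimal ibmdbpy sketch (the connection string and table name are placeholder assumptions for your own dashDB instance):
from ibmdbpy import IdaDataBase, IdaDataFrame

# connect using a full connection string (an ODBC DSN name works too)
idadb = IdaDataBase(dsn="BLUDB;Hostname=myhost;Port=50000;PROTOCOL=TCPIP;UID=myuser;PWD=mypassword")
idadf = IdaDataFrame(idadb, "MYSCHEMA.MYTABLE")  # a lazy, in-database view of the table
print(idadf.head())  # the work is pushed down to the database
idadb.close()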
Here is a sample for Python:
https://apsportal.ibm.com/analytics/notebooks/5a59ba9b-02b2-40e4-b955-9727cb68c88b/view?access_token=09240b783432f1a62004bcc82b48a7aed07afc401e2f94a77c7e087b74d8c053
And here is one for R:
https://apsportal.ibm.com/analytics/notebooks/4ff39dad-f497-40c6-941c-43162c347819/view?access_token=9b2ae23b8ec4d8223a7f88950db66a72c736b269ef6cf1d658bb1fcd49c78f35