How to run magic commands in Jupyter deployed on Watson Studio? - db2

I'm trying to analyze my datasets on Db2 on Cloud in a Jupyter notebook created in Watson Studio. Using the "%sql" magic to connect to Db2 doesn't work out of the box; it fails with a "no such module" error. According to an IBM tutorial, you have to run the "%run db2re.ipynb" command in a Jupyter cell before connecting to Db2. But when I run this cell nothing happens, and the "%sql" magic is still not available. Any advice is appreciated.

In general, there are two ways of accessing libraries in Watson Studio:
- Install or import a library, then reference it. Note that you need to specify the --user option.
- First save your own scripts, then import them.
There are also the built-in line and cell magics.
With that, I think I got it to work the following way:
1st cell, download db2re.ipynb to your environment:
%%sh
wget https://raw.githubusercontent.com/DB2-Samples/Db2re/master/db2re.ipynb
2nd cell, install the necessary library:
!pip install --user qgrid
3rd cell, run the db2re.ipynb notebook extension:
%run db2re.ipynb
Thereafter, I was able to run a %sql command.
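For reference, a first use of the magic might look like the following (a sketch only: the connection values are placeholders, and the exact CONNECT syntax supported by the db2re magic may differ slightly):
%sql CONNECT TO BLUDB USER <user> USING <password> HOST <hostname> PORT 50000
%sql SELECT * FROM MYTABLE FETCH FIRST 5 ROWS ONLY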

Related

Install or import in Python?

I am a beginner in Python, still trying to learn the basics. I am mostly interested in using it for data analysis and visualizations, with packages such as matplotlib.
Most of the examples I see use the code
"import matplotlib"
or something similar.
But there are also cases where people suggest running pip install and then using the package.
So, as a rule of thumb, when should one use import and when should one install through the terminal?
Let's say you want to use some library; call it ABC, and suppose it has a function called function1.
If you write
import ABC
ABC.function1()
you will get an error, because Python can't find a library called ABC in your virtual environment. You must first install it, using pip install ABC in your terminal. After that, the same code will work.
You must install a library before you can use it.
There is no rule of thumb for choosing an installation method; any of them will do. The aim is simply that the library is available when you run the code, otherwise you will get an import error.
On Windows, to install a package/library, use the following command at the command prompt:
python3 -m pip install matplotlib
To upgrade it, use:
python3 -m pip install --upgrade matplotlib
You can install and upgrade packages/libraries through Jupyter too, as shown below.
Once installed, place import <library_name> at the top of the code in which you want to use that library.
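For example, in a Jupyter cell (a sketch; matplotlib is just the package from the question, and the %pip magic requires a reasonably recent IPython, with !pip install as the older alternative):
%pip install matplotlib
and then, in the code that uses it:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
plt.show()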

Spark Cell magic not found

I have a Python 2 environment on Windows 10 with Jupyter Notebook.
After following the instructions in this tutorial, I managed to install Spark on Windows 10:
https://medium.com/@GalarnykMichael/install-spark-on-windows-pyspark-4498a5d8d66c
But when I try to run the cell magic for SQL, I get the following error:
ERROR:root:Cell magic %%sql not found.
When I used %lsmagic, I could not find an sql cell magic among them.
I also noticed there was no option for a PySpark kernel when starting a new notebook in Jupyter.
Are you trying to use SQL or Spark SQL? I've used ipython-sql, which was great, and there's also sparkmagic, which sounds like what you're looking for. Try installing sparkmagic, which does provide a %%sql magic.
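If plain SQL against a database is all you need, ipython-sql can be set up like this (a sketch; the in-memory SQLite URL is just a placeholder connection string):
!pip install ipython-sql
%load_ext sql
%sql sqlite://
%sql CREATE TABLE test (id INTEGER)
Note that sparkmagic is a different beast: it talks to a Spark cluster through a Livy endpoint and installs its own PySpark kernels, so it needs that server-side setup before its %%sql magic appears.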

Setting Specific Python in Zeppelin Interpreter

What do I need to do, beyond setting "zeppelin.pyspark.python", to make a Zeppelin interpreter use a specific Python executable?
Background:
I'm using Apache Zeppelin connected to a Spark+Mesos cluster. The cluster's worked fine for several years. Zeppelin is new and works fine in general.
But I'm unable to import numpy within functions applied to an RDD in pyspark. When I use the Python subprocess module to locate the Python executable, it shows that the code is being run in the system's Python, not in the virtualenv it needs to be in.
I've seen a few questions on this issue that say the fix is to set "zeppelin.pyspark.python" to point to the correct Python. I've done that and restarted the interpreter a few times, but it is still using the system Python.
Is there something additional I need to do? This is using Zeppelin 0.7.
On an older, custom snapshot build of Zeppelin I've been using on an EMR cluster, I set the following two properties to use a specific virtualenv:
"zeppelin.pyspark.python": "/path/to/bin/python",
"spark.executorEnv.PYSPARK_PYTHON": "/path/to/bin/python"
When you are in your activated venv, find the interpreter path:
(my_venv)$ python
>>> import sys
>>> sys.executable
Then open the interpreter settings (http://localhost:8080/#/interpreters), search for 'python', and set zeppelin.python to the output of sys.executable.
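To check whether the executors (and not just the driver) picked up the right Python, one quick test from a pyspark paragraph is the following sketch (it assumes the SparkContext sc that Zeppelin provides):
%pyspark
import sys
# Python used by the driver:
print(sys.executable)
# Python used by an executor (the lambda runs on a worker):
print(sc.parallelize([0]).map(lambda _: __import__("sys").executable).collect())
If the two paths differ, the executor-side setting (e.g. spark.executorEnv.PYSPARK_PYTHON above) is the one that still needs fixing.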

Postgres/VirtualEnv/Flask

This one may be for the experts!
So here is the problem:
I have a Flask app that gets its data from PostgreSQL, and it runs fine in my normal environment. I tried to deploy it locally using virtualenv, and after pip installing all the requirements, the only one that gave me trouble is psycopg2, which depends on the PostgreSQL client libraries.
I then used this amazing article to help me install it, by putting export PATH=/Library/PostgreSQL/9.3/bin:$PATH in my .bash_profile file.
But now I get this error:
Library not loaded: libssl.1.0.0.dylib ...Image not found
What is going on?
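This error usually means the dynamic linker cannot find the OpenSSL library that psycopg2 was built against. As a sketch of a common workaround (an assumption, not confirmed in this thread; the path matches the PostgreSQL 9.3 install above), point the loader at the directory that ships those dylibs:
# assumption: /Library/PostgreSQL/9.3/lib contains libssl.1.0.0.dylib
export DYLD_FALLBACK_LIBRARY_PATH=/Library/PostgreSQL/9.3/lib:$DYLD_FALLBACK_LIBRARY_PATH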

How can I enable SQL Magics in Jupyter Notebooks on IBM Data Science Experience?

I am using a Jupyter Notebook on IBM Data Science Experience. Is it possible to enable SQL Magics/IPython-sql? How can I install it?
I want to connect to dashDB/DB2 and run SQL statements.
Yes, it is possible to use the ipython-sql (SQL Magics) module in Jupyter Notebooks. The trick is to install it into the user space. Run the following in a code cell:
!pip install --user ipython-sql
If you want to connect to DB2 or dashDB, you also need to install the related database drivers. Because the SQL Magics depend on SQLAlchemy, use these commands (the same cell as the command above works):
!pip install --user ibm_db
!pip install --user ibm_db_sa
Once everything is installed, you need to load the SQL Magics extension:
%load_ext sql
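After that, connecting follows SQLAlchemy's URL format. A sketch (hostname, port, and credentials are placeholders; BLUDB is the usual dashDB database name):
%sql ibm_db_sa://<user>:<password>@<hostname>:50000/BLUDB
%sql SELECT * FROM MYTABLE FETCH FIRST 5 ROWS ONLY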
I took the instructions on installing SQL Magics in the Data Science Experience from this blog post. It also has an example on how to connect to the database.
There is also another way to run SQL against dashDB from IBM Data Science Experience: it comes with the ibmdbpy and ibmdbR libraries pre-deployed for Python and R notebooks, respectively, so you don't have to set up anything before using them.
Here is a sample for Python:
https://apsportal.ibm.com/analytics/notebooks/5a59ba9b-02b2-40e4-b955-9727cb68c88b/view?access_token=09240b783432f1a62004bcc82b48a7aed07afc401e2f94a77c7e087b74d8c053
And here is one for R:
https://apsportal.ibm.com/analytics/notebooks/4ff39dad-f497-40c6-941c-43162c347819/view?access_token=9b2ae23b8ec4d8223a7f88950db66a72c736b269ef6cf1d658bb1fcd49c78f35
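For a flavor of the ibmdbpy approach, here is a minimal sketch (the DSN values and the GOSALES.EMPLOYEE sample table are placeholders to adapt to your own dashDB instance):

from ibmdbpy import IdaDataBase, IdaDataFrame

# Connect with an ODBC-style DSN string (placeholder credentials):
idadb = IdaDataBase(dsn="DASHDB;Database=BLUDB;Hostname=<hostname>;Port=50000;PROTOCOL=TCPIP;UID=<user>;PWD=<password>")

# Wrap a table as a DataFrame-like object; operations are pushed down into the database:
idadf = IdaDataFrame(idadb, "GOSALES.EMPLOYEE")
print(idadf.head())

idadb.close()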