How to query Redshift from a Glue Jupyter notebook, e.g. using the psycopg2 or boto3 libraries? - amazon-redshift

I am not able to make use of the boto3 or psycopg2 libraries from a Glue Jupyter notebook.
I tried importing the psycopg2 and boto3 libraries.
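No answer is recorded here, but one route that avoids installing a Postgres driver is the Redshift Data API via boto3, which Glue notebooks typically ship with. The sketch below is only illustrative: the cluster identifier, database, user, and region are placeholders, and the notebook's IAM role must be allowed to call redshift-data.

import time
import boto3

# Placeholders: replace the cluster identifier, database, user, and region with your own.
client = boto3.client("redshift-data", region_name="us-east-1")

resp = client.execute_statement(
    ClusterIdentifier="my-redshift-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql="SELECT current_date;",
)

# The Data API is asynchronous: poll until the statement finishes, then fetch results.
while True:
    desc = client.describe_statement(Id=resp["Id"])
    if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(1)

if desc["Status"] == "FINISHED":
    for record in client.get_statement_result(Id=resp["Id"])["Records"]:
        print(record)
else:
    print("Statement did not finish:", desc.get("Error"))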

Related

Impossible to import koalas in scala notebook

It seems basic, but from what I see on the Databricks website, nothing works on my side.
I have installed the koalas package on my cluster.
But when I try to import the package in my Scala notebook, I get an error:
command-3313152839336470:1: error: not found: value databricks
import databricks.koalas
If I do it in Python, everything works fine
Cluster and notebook details are attached as screenshots.
Thanks for your help
Matt
Koalas is a Python package that mimics the interfaces of Pandas (another Python package). Currently no Scala version is published, even though the project may contain some Scala code. The goal of Koalas is to provide a drop-in replacement for Pandas that makes use of the distributed nature of Apache Spark. Since Pandas is only available in Python, I don't expect a direct port of this to Scala.
https://github.com/databricks/koalas
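For comparison, the same import does work from a Python notebook; a minimal sketch using the standalone databricks.koalas package (pre-Spark-3.2):

import databricks.koalas as ks

# Koalas mirrors the pandas API on top of Spark DataFrames.
kdf = ks.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
print(kdf.sum())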
For Scala, your best bet is to use the Dataset and DataFrame APIs of Spark:
https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Dataset.html
https://databricks.com/blog/2016/01/04/introducing-apache-spark-datasets.html

ToreeInstall ERROR | Unknown interpreter PySpark. toree can not install PySpark

When I install PySpark for Jupyter notebook, I use this command:
jupyter toree install --kernel_name=tanveer --interpreters=PySpark --python="/usr/lib/python3.6"
But I get this error:
[ToreeInstall] ERROR | Unknown interpreter PySpark. Skipping installation of PySpark interpreter
So I don't know what the problem is. I have set up Toree's Scala and SQL interpreters successfully. Thanks.
Toree version 0.3.0 removed support for PySpark and SparkR:
Removed support for PySpark and Spark R in Toree (use specific kernels)
Release notes here: incubator-toree release notes
I am not sure what "use specific kernels" means and continue to look for a Jupyter PySpark kernel.
As also mentioned in Lee's answer, Toree version 0.3.0 removed support for PySpark and SparkR. As per their release notes, they asked users to "use specific kernels". For PySpark, this means manually installing pyspark to be used with Jupyter.
The steps are simple, as follows:
Install pyspark, either with pip install pyspark, or by downloading the Apache Spark binary package and decompressing it into a specific folder.
Add the following three environment variables. How to do this depends on your OS; for example, on macOS I added the following lines to ~/.bash_profile:
export SPARK_HOME=<path_to_your_installed_spark_files>
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
That's it. To start your PySpark Jupyter notebook, simply run pyspark from your command line and choose the "Python" kernel.
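As a quick sanity check in the first cell (a minimal sketch; the pyspark launcher normally creates the session for you, and getOrCreate() just reuses it):

from pyspark.sql import SparkSession

# Reuse the session created by the `pyspark` launcher (or create one if missing).
spark = SparkSession.builder.getOrCreate()
print(spark.version)
spark.range(5).show()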
Refer to https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781788835367/1/ch01lvl1sec17/installing-jupyter
or
https://opensource.com/article/18/11/pyspark-jupyter-notebook for more detailed instructions.

Cannot import sparknlp on Databricks

I'm trying to do an
import sparknlp
on the Databricks platform and I'm getting a similar message to the one reported at After installing sparknlp, cannot import sparknlp
I can't figure out how to get the python wrapper installed... I can access the spark-nlp library via Scala but I can't get the python version working. Any tips would be greatly appreciated!
This error can occur when the sparknlp jars have been loaded correctly, but the Python wrapper library cannot be imported. Make sure you have installed those wrappers correctly; check out the sparknlp documentation site.
As the documentation page says, make sure that after installing the Python sparknlp library:
pip install --index-url https://test.pypi.org/simple/ spark-nlp==1.5.4
your PYTHONPATH environment variable can locate the sparknlp wrappers.
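For example, a quick check from a notebook cell (a rough sketch; the path below is a placeholder for wherever the Python wrappers actually landed):

import sys

# Placeholder path: point this at the directory containing the sparknlp package
# if it is not already on PYTHONPATH.
sys.path.append("/path/to/spark-nlp/python")

import sparknlp
print(sparknlp.__file__)  # confirms which wrapper installation is being picked up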

Setting Specific Python in Zeppelin Interpreter

What do I need to do beyond setting "zeppelin.pyspark.python" to make a Zeppelin interpreter use a specific Python executable?
Background:
I'm using Apache Zeppelin connected to a Spark+Mesos cluster. The cluster's worked fine for several years. Zeppelin is new and works fine in general.
But I'm unable to import numpy within functions applied to an RDD in pyspark. When I use a Python subprocess to locate the Python executable, it shows that the code is being run in the system's Python, not in the virtualenv it needs to be in.
So I've seen a few questions on this issue that say the fix is to set "zeppelin.pyspark.python" to point to the correct python. I've done that and restarted the interpreter a few times. But it is still using the system Python.
Is there something additional I need to do? This is using Zeppelin 0.7.
On an older, custom snapshot build of Zeppelin I've been using on an EMR cluster, I set the following two properties to use a specific virtualenv:
"zeppelin.pyspark.python": "/path/to/bin/python",
"spark.executorEnv.PYSPARK_PYTHON": "/path/to/bin/python"
With your virtualenv activated, check the interpreter path in python:
(my_venv)$ python
>>> import sys
>>> sys.executable
# http://localhost:8080/#/interpreters
# search for 'python'
# set `zeppelin.python` to output of `sys.executable`

How can I enable SQL Magics in Jupyter Notebooks on IBM Data Science Experience?

I am using a Jupyter Notebook on IBM Data Science Experience. Is it possible to enable SQL Magics/IPython-sql? How can I install it?
I want to connect to dashDB/DB2 and run SQL statements.
Yes, it is possible to use the IPython-sql (SQL Magics) module in the Jupyter Notebooks. The trick is to install it into the user space. Run the following in a code cell:
!pip install --user ipython-sql
If you want to connect to DB2 or dashDB, then you would need to install the related database drivers. Because the SQL Magics depend on SQLAlchemy, use these commands (same cell as the command above works):
!pip install --user ibm_db
!pip install --user ibm_db_sa
Once everything is installed, you need to load the SQL Magics extension:
%load_ext sql
I took the instructions on installing SQL Magics in the Data Science Experience from this blog post. It also has an example on how to connect to the database.
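For illustration only, such a connection cell might look like the following (hostname, port, and credentials are placeholders; take the real values from your dashDB/DB2 service credentials):

%sql ibm_db_sa://username:password@hostname.bluemix.net:50000/BLUDB
%sql SELECT 1 FROM sysibm.sysdummy1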
There is also another way to run SQL against dashDB from IBM Data Science Experience: the ibmdbpy and ibmdbR libraries come pre-deployed for Python and R notebooks, respectively, so you don't have to set up anything before using them.
Here is a sample for Python:
https://apsportal.ibm.com/analytics/notebooks/5a59ba9b-02b2-40e4-b955-9727cb68c88b/view?access_token=09240b783432f1a62004bcc82b48a7aed07afc401e2f94a77c7e087b74d8c053
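In rough outline, the Python side looks like this (a sketch based on ibmdbpy's IdaDataBase/IdaDataFrame interface; the JDBC URL, credentials, and table name are placeholders):

from ibmdbpy import IdaDataBase, IdaDataFrame

# Placeholder JDBC URL; the sample notebook builds this from the service credentials.
idadb = IdaDataBase(dsn="jdbc:db2://hostname.bluemix.net:50000/BLUDB:user=username;password=password;")

# Run plain SQL and get a pandas DataFrame back ...
counts = idadb.ida_query("SELECT COUNT(*) FROM GOSALES.BRANCH")

# ... or wrap an existing table in a pandas-like IdaDataFrame.
ida_df = IdaDataFrame(idadb, "GOSALES.BRANCH")
print(ida_df.head())

idadb.close()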
And here is one for R:
https://apsportal.ibm.com/analytics/notebooks/4ff39dad-f497-40c6-941c-43162c347819/view?access_token=9b2ae23b8ec4d8223a7f88950db66a72c736b269ef6cf1d658bb1fcd49c78f35