Why can't I import 'pandas_udf' in Jupyter notebook? - pyspark

I run the following code in a Jupyter notebook but get an ImportError. Note that 'udf' can be imported in the same notebook.
from pyspark.sql.functions import pandas_udf
ImportError Traceback (most recent call last)
in ()
----> 1 from pyspark.sql.functions import pandas_udf
ImportError: cannot import name 'pandas_udf'
Does anyone know how to fix it? Thank you very much!

It looks like you started Jupyter Notebook on its own, rather than starting pyspark with Jupyter Notebook as the driver, which is done with the following command:
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
If your Jupyter Notebook server process is running on another machine, you may want to use this command instead, which makes it available on all IP addresses of your server.
(NOTE: This could be a security issue if your server is on a public or untrusted network.)
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip=0.0.0.0 " pyspark
I will revise my answer if the problem still persists after you start Jupyter Notebook this way.
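One more thing worth ruling out: pandas_udf was added in Spark 2.3, so an older PySpark can import udf but not pandas_udf. Below is a minimal sketch for checking this from the notebook. The helper name supports_pandas_udf is hypothetical, and the parsing assumes a dotted version string such as "2.2.1".

```python
def supports_pandas_udf(version):
    """Return True if a PySpark version string is 2.3 or newer.

    Hypothetical helper; assumes the string starts with "major.minor".
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 3)


if __name__ == "__main__":
    try:
        import pyspark
    except ImportError:
        # The kernel cannot see PySpark at all; fix that first.
        print("pyspark is not importable from this kernel")
    else:
        print(pyspark.__version__, supports_pandas_udf(pyspark.__version__))
```

If this reports a version below 2.3, upgrading Spark is likely needed regardless of how the notebook is started.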

Related

Unable to connect to Hive LLAP from pyspark

I am using the pyspark kernel inside JupyterHub and want to connect to Hive LLAP from Spark. I am able to create a Spark session, but when I try to execute
from pyspark_llap import HiveWarehouseSession
it fails with a "no module found" error for pyspark_llap.
I can execute the same command in the Python kernel, where it succeeds.
Kindly suggest what configuration is needed to import HiveWarehouseSession from pyspark_llap inside the pyspark kernel.
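Since the import works in one kernel and not the other, a reasonable first step is to compare what each kernel actually sees. The sketch below is only a diagnostic, not a fix; describe_kernel is a hypothetical helper name. Run it in both kernels and compare the interpreter and search path.

```python
import importlib.util
import sys


def describe_kernel(module_name):
    """Collect the facts that decide whether module_name is importable here."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,     # which Python this kernel runs
        "module_found": spec is not None, # can the module be located?
        "sys_path": list(sys.path),       # where this kernel looks for modules
    }


if __name__ == "__main__":
    info = describe_kernel("pyspark_llap")
    print("executable:", info["executable"])
    print("pyspark_llap found:", info["module_found"])
    for entry in info["sys_path"]:
        print("  ", entry)
```

Any path entry that appears only in the Python kernel's output is a candidate for what the pyspark kernel is missing.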

cannot import LanguageTranslatorV3 in AWS EC2

I am trying to use the IBM Natural Language Translator on AWS EC2. However, I cannot import LanguageTranslatorV3 on the EC2 instance, although the same import works on my laptop. The error is shown below. Can anyone help me solve this problem? Thank you!
from watson_developer_cloud import LanguageTranslatorV3
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-62-0d0e9b329a15> in <module>()
----> 1 from watson_developer_cloud import LanguageTranslatorV3
ImportError: cannot import name 'LanguageTranslatorV3'
It looks like you are missing a dependency on watson-developer-cloud in your application's requirements.txt file.
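Note that "cannot import name" means Python found a watson_developer_cloud package but not that name inside it, so one possibility (an assumption, not confirmed by the question) is that the EC2 instance has a different version installed than the laptop. A small sketch for comparing the two machines follows; installed_version is a hypothetical helper, and importlib.metadata requires Python 3.8 or later.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run on both the laptop and the EC2 instance and compare the output.
    print("watson-developer-cloud:", installed_version("watson-developer-cloud"))
```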

Using profiles with ipython/jupyter

Here is help output from ipython:
Examples
ipython notebook # start the notebook
ipython notebook --profile=sympy # use the sympy profile
ipython notebook --certfile=mycert.pem # use SSL/TLS certificate
This seems straightforward, but when invoking
$ ipython notebook --profile=pyspark
The following warning occurs:
[W 20:54:38.623 NotebookApp] Unrecognized alias: '--profile=pyspark',
it will probably have no effect.
So then the online help is inconsistent with the warning message.
What is the correct way to activate the profile?
Update: I tried reversing the order as follows:
$ ipython --profile=pyspark notebook
But then a different warning occurs:
[TerminalIPythonApp] WARNING | File not found: u'notebook'
The option is for the ipython binary, but you are trying to pass the option to the notebook application, as evident from the warning, which is from NotebookApp:
[W 20:54:38.623 NotebookApp] Unrecognized alias: '--profile=pyspark',
it will probably have no effect.
That's basically saying you are passing an option to notebook which it doesn't recognize, so it won't have any effect.
You need to pass the option to ipython:
ipython --profile=foo -- notebook
The online docs are inaccurate. The order needs to be reversed, with the --profile option before notebook:
$ ipython --profile=pyspark notebook
But the issues go beyond that.
It seems that Jupyter (the successor to the IPython notebook) may not respect IPython profiles at all.
There are multiple reports of this. Here is one from the Spark mailing list:
Does anyone have a pointer to Jupyter configuration with pyspark? The
current material on python inotebook is out of date, and jupyter
ignores ipython profiles.
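If profiles are ignored, one profile-free alternative is to put Spark's Python sources on sys.path from inside the notebook. The sketch below assumes a standard Spark distribution layout (a python directory plus a py4j source zip under python/lib) and a SPARK_HOME environment variable; pyspark_paths is a hypothetical helper name.

```python
import glob
import os
import sys


def pyspark_paths(spark_home):
    """Return the sys.path entries needed to import pyspark from spark_home.

    Assumes the usual layout: <spark_home>/python and a py4j source zip
    under <spark_home>/python/lib.
    """
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home:
        # Prepend so these entries win over any other pyspark on the path.
        sys.path[:0] = pyspark_paths(spark_home)
        print("added:", pyspark_paths(spark_home))
    else:
        print("SPARK_HOME is not set")
```

After this runs in the first notebook cell, import pyspark should resolve against the Spark installation rather than depending on a profile.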

Connection to pymongo

I am trying to connect to MongoDB on a Mac using pymongo. I am getting the following error:
>>> from pymongo import MongoClient
Traceback (most recent call last):
File "", line 1, in
from pymongo import MongoClient
ImportError: cannot import name 'MongoClient'
I have tried Connection also. But it gives the same error. Any help?
Steps for troubleshooting:
First of all, verify if your environment is activated and you are in the correct environment.
If it is active and you're in the correct environment, then verify if you have installed pymongo.
If it's not installed in your environment, install it using pip install pymongo in the environment you want to work.
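The first two checks can be sketched from inside Python itself. locate_module is a hypothetical helper name. Because the error is "cannot import name" rather than "No module named", it is also worth looking at where pymongo is loaded from: if the reported location is a file in your own project directory (for example a script named pymongo.py), that file may be shadowing the real library. This is a common cause, though not confirmed for this question.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Step 1: which environment is this interpreter running in?
    print("environment prefix:", sys.prefix)
    # Step 2: is pymongo visible here, and from which file?
    print("pymongo location:", locate_module("pymongo"))
```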

How to use read_gbq or other bq in IPython to access datasets hosted in BigQuery

I am using the IPython notebook to read the Google BigQuery public dataset for natality.
I have installed the Google API client with
easy_install --upgrade google-api-python-client
However, the installed API is still not detected.
Does anyone have an IPython notebook to share that accesses the public dataset and loads it into a dataframe?
import pandas as pd
projectid = "xxxx"
data_frame = pd.read_gbq('SELECT * FROM xxxx', project_id = projectid)
303 if not _GOOGLE_API_CLIENT_INSTALLED:
--> 304 raise ImportError('Could not import Google API Client.')
305
306 if not _GOOGLE_FLAGS_INSTALLED:
ImportError: Could not import Google API Client
I have shared the iPython Notebook used at
http://nbviewer.ipython.org/urls/dl.dropbox.com/s/d77u2xarscagw0b/BigQuery_Trial8.ipynb?dl=0
Additional info:
I am running on a server with a docker instance used for the iPython server.
I have run the curl https://sdk.cloud.google.com | bash installation on the linux server
I have tried to run some of the shared notebooks
nbviewer.ipython.org/gist/fhoffa/6459195
or nbviewer.ipython.org/gist/fhoffa/6472099
However I also get
ImportError: No module named bq
I suspect it is a simple case of missing dependencies.
If anyone has any clues, help is welcome.
As I said here: https://stackoverflow.com/a/31708375/2533394
I solved the problem with this:
pip install --force-reinstall uritemplate.py
Make sure your Pandas is version 0.17 or higher:
pip install -U pandas
You can check with:
import pandas as pd
pd.__version__