How to import modules in IPython Clusters - ipython

I am trying to import some of my personal modules into my IPython Clusters. I am using Anacondas on Windows Vista 64 bit
from IPython.parallel import Client
rc = Client()
dview = rc[:]
with dview.sync_imports():
import lib.rf
It is giving me this error:
No module named 'lib.rf'
I can import the module in the rest of my IPython notebook, as I have this .bat file to start ipython notebook:
cd C:\Users\Jon\workspace\bf
set PYTHONPATH=%PYTHONPATH%;C:\Users\Jon\workspace\bf
C:\Anaconda\envs\p33\scripts\ipython notebook
I am using this similar code to start my ip clusters:
cd C:\Users\Jon\workspace\bf
set PYTHONPATH=%PYTHONPATH%;C:\Users\Jon\workspace\bf
C:\Anaconda\envs\p33\Scripts\ipcluster start --n=7
Why is this not working?
More info:
If I print out sys.path, I get a list that contains C:\Users\Jon\workspace\bf
If I print out the paths of my clusters, I get the same list:
%px sys.path
['',
'',
'',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\distribute-0.6.28-py3.3.egg',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\pykalman-0.9.5-py3.3.egg',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\patsy-0.2.1-py3.3.egg',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\joblib-0.8.3_r1-py3.3.egg',
'C:\\Users\\Jon\\workspace\\bf',
'C:\\Users\\Jon\\workspace\\bf\\my_numba',
'C:\\Anaconda\\envs\\p33\\python33.zip',
'C:\\Anaconda\\envs\\p33\\DLLs',
'C:\\Anaconda\\envs\\p33\\lib',
'C:\\Anaconda\\envs\\p33',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\Sphinx-1.2.3-py3.3.egg',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\win32',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\win32\\lib',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\Pythonwin',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\runipy-0.1.1-py3.3.egg',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\setuptools-7.0-py3.3.egg',
'C:\\Anaconda\\envs\\p33\\lib\\site-packages\\IPython\\extensions']
In [45]:
Further analysis:
%px lib.__path__
Out[0:11]: _NamespacePath(['C:\\Anaconda\\envs\\p33\\lib\\site-packages\\win32\\lib'])
lib.__path__
Out[57]: ['.\\lib']
Looks like the ipcluster and notebook are looking at lib in different places. I have tried renaming lib to mylib. It has not helped.

It seems that with dview.sync_imports() is being run someplace other than your IPython Notebook environment and is therefore relying a different PYTHONPATH. It is definitely not being run on one of the cluster engines and so wouldn't expect it to leverage your cluster settings of PYTHONPATH.
I'm thinking you'll need to have that directory in your PYTHONPATH (not your PATH) for the calling python environment because that is the location from which you are importing the modules.
The impact of the bit you have about setting the PYTHONPATH in the DOS shell from which you invoke ipclusters isn't clear to me. I can see that one might expect this to let the engines know about your directory, but I'm wondering if that PYTHONPATH gets initilized to the environment from which you call IPython.parallel.Client.

Related

Jupyter ImportError: No module named py4j.protocol despite py4j is installed

I read some posts regarding to the error I am seeing now when import pyspark, some suggest to install py4j, and I already did, and yet I am still seeing the error.
I am using a conda environment, here is the steps:
1. create a yml file and include the needed packages (including the py4j)
2. create a env based on the yml
3. create a kernel pointing to the env
4. start the kernel in Jupyter
5. running `import pyspark` throws error: ImportError: No module named py4j.protocol
The issue is resolved with adding environment section in kernel.json and explicitely specify the variables of the following:
"env": {
"HADOOP_CONF_DIR": "/etc/spark2/conf/yarn-conf",
"PYSPARK_PYTHON":"/opt/cloudera/parcels/Anaconda/bin/python",
"SPARK_HOME": "/opt/cloudera/parcels/SPARK2",
"PYTHONPATH": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.7-src.zip:/opt/cloudera/parcels/SPARK2/lib/spark2/python/",
"PYTHONSTARTUP": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/shell.py",
"PYSPARK_SUBMIT_ARGS": " --master yarn --deploy-mode client pyspark-shell"
}

uWSGI + virtualenv 'No module named site'

So this seems to be a really common problem with this setup, but I can't find any solutions that work on SO. I've setup a very new Ubuntu 15.04 server, then installed nginx, virtualenv (and -wrapper), and uWSGI (via apt-get, so globally, not inside the virtualenv).
My virtualenv is located at /root/Env/example. Inside of the virtualenv, I installed Django, then at /srv/www/example/app ran Django's startproject command with the project name example, so I have vaguely this structure:
-root
-Env
-example
-bin
-lib
-srv
-www
-example
-app
-example
manage.py
-example
wsgi.py
...
My example.ini file for uWSGI looks like this:
[uwsgi]
project = example
plugin = python
chdir = /srv/www/example/app/example
home = /root/Env/example
module = example.wsgi:application
master = true
processes = 5
socket = /run/uwsgi/app/example/example.socket
chmod-socket = 664
uid = www-data
gid = www-data
vacuum = true
But no matter whether I run this via uwsgi --ini /etc/uwsgi/apps-enabled/example.ini or via daemon, I get the exact same error:
Python version: 2.7.9 (default, Apr 2 2015, 15:37:21) [GCC 4.9.2]
Set PythonHome to /root/Env/example
ImportError: No module named site
I should note that the Django project works via the built-in development server ./manage.py runserver, and that when I remove home = /root/Env/example the thing works (but is obviously using the global Python and Django rather than the virtualenv versions, which means it's useless for a proper virtualenv setup).
Can anyone see some obvious path error that I'm not seeing? As far as I can tell, home is entirely correct based on my directory structure, and everything else in the ini too, so why is it not working with this ImportError?
In my case, I was seeing this issue because the django app I was trying to run was written in python 3 whereas uwsgi was configured for python 2. I fixed the problem by:
recompiling uwsgi to support both python 2 and python 3 apps
(I followed this guide)
adding this to my mydjangoproject_uwsgi.ini:
plugins = python35 # or whatever you specified while compiling uwsgi
For other folks using Django, you should also make sure you are correctly specifying the following:
# Django dir that contains manage.py
chdir = /var/www/project/myprojectname
# Django wsgi (myprojectname is the name of your top-level project)
module = myprojectname.wsgi:application
# the virtualenv you are using (full path)
home = /home/ubuntu/Env/mydjangovenv
plugins = python35
As #Freek said, site refers to a python module.
The error claims that python cannot find that package, which is because you have specified python_home to the wrong location.
I've encountered with the same problem and my uwsgi.ini is like below:
[uwsgi]
# variable
base = /home/xx/
# project settings
chdir = %(base)/
module = botservice.uwsgi:application
home = %(base)/env/bin
For this configuration uwsgi can find python executable in /env/bin but no packages could be found under this folder. So I changed home to
home = %(base)/env/
and it worked for me.
In your case, I suggest digging into home directive and point it to a location which contains both python executable and packages.
The site module is in the root of django.
First check is to activate the virtualenv manually (source /root/Env/example/bin/activate, start python and import site). If that fails, pip install django.
Assuming that django is correctly installed in the virtualenv, make sure that uWSGI activates the virtualenv. Relevant uWSGI configuration directives:
plugins = python
virtualenv = /root/Env/example
and in case you have error importing example.wsgi:
pythonpath = /srv/www/example/app/example

sharing a namespace across multiple ipython notebooks

I would like to work with several ipython notebooks at once sharing the same namespace. Is there currently (ipython-1.1.0) a way to do this?
I tried creating different notebooks on the same ipython kernel, but the notebooks don't share a namespace. Also, I've been able to use a terminal console alongside a notebook on the same namespace using the answers in Using IPython console along side IPython notebook, but I couldn't find the notebook equivalent of the --existing argument.
Thanks a lot
Unfortunately this no longer works, you get error message ipython.kernel replaced by ipython.parallel.
A less elegant way than above to alter this is to change IPython/frontend/html/notebook/kernelmanager.py around line 273 from
kernel_id = self.kernel_for_notebook(notebook_id)
to
kernel_id = None
for notebook_id in self._notebook_mapping:
kernel_id = self._notebook_mapping[notebook_id]
break
For Anaconda python, replace start_kernel in kernelmanager.py with
def start_kernel(self, kernel_id=None, path=None, **kwargs):
global saved_kernel_id
if saved_kernel_id:
return saved_kernel_id
if kernel_id is None:
kwargs['extra_arguments'] = self.kernel_argv
if path is not None:
kwargs['cwd'] = self.cwd_for_path(path)
kernel_id = super(MappingKernelManager, self).start_kernel(**kwargs)
self.log.info("Kernel started: %s" % kernel_id)
self.log.debug("Kernel args: %r" % kwargs)
self.add_restart_callback(kernel_id,
lambda : self._handle_kernel_died(kernel_id),
'dead',
)
else:
self._check_kernel_id(kernel_id)
self.log.info("Using existing kernel: %s" % kernel_id)
saved_kernel_id = kernel_id
return kernel_id
and add
saved_kernel_id = None
above
class MappingKernelManager(MultiKernelManager):
True IPython gurus, please supply the correct fix. A lot of people using notebooks want the ability to share the kernel, it's natural, because one notebook quickly grows too big to work with a single complex application, so it is easier to be able to break down the application into multiple notebooks.
Also, gurus, while you're listening, it would be nice to have a collapse-expand feature as in Mathematica so you can only view the part of the notebook you care about and you can zoom out the rest.
The IPython Notebook does not have the equivalent of --existing. Notebooks do not share kernels. It is not a limitation of the notebook itself, it is just a design decision made in the notebook server code. The server code can be modified, for instance, to have all notebooks share the same kernel. You can do this with a little monkeypatching in your IPython configuration. Start by creating a profile:
$ ipython profile create singlekernel
[ProfileCreate] Generating default config file: u'~/.ipython/profile_singlekernel/ipython_config.py'
[ProfileCreate] Generating default config file: u'~/.ipython/profile_singlekernel/ipython_qtconsole_config.py'
[ProfileCreate] Generating default config file: u'~/.ipython/profile_singlekernel/ipython_notebook_config.py'
[ProfileCreate] Generating default config file: u'~/.ipython/profile_singlekernel/ipython_nbconvert_config.py'
and edit $(ipython locate profile singlekernel)/ipython_notebook_config.py to contain:
# Configuration file for ipython-notebook.
c = get_config()
import os
import uuid
from IPython.kernel.multikernelmanager import MultiKernelManager
def start_kernel(self, **kwargs):
"""Minimal override of MKM.start_kernel that always returns the same kernel"""
kernel_id = kwargs.pop('kernel_id', str(uuid.uuid4()))
if self.km is None:
self.km = self.kernel_manager_factory(connection_file=os.path.join(
self.connection_dir, "kernel-%s.json" % kernel_id),
parent=self, autorestart=True, log=self.log
)
if not self.km.is_alive():
self.log.info("starting single kernel")
self.km.start_kernel(**kwargs)
else:
self.log.info("reusing existing kernel")
self._kernels[kernel_id] = self.km
return kernel_id
MultiKernelManager.km = None
MultiKernelManager.start_kernel = start_kernel
This just overrides the kernel starting mechanism to start only one kernel and return it at every subsequent request,
rather than starting a new one for each kernel ID.
Now whenever you start the notebook server with
ipython notebook --profile singlekernel
all of the notebooks in that session will share the same kernel.

Does virtualenv just modify the pythonpath?

I'm deploying Django on a production server together with virtualenv, and am having trouble activating virtualenv on the server with
source .../bin/activate
I did a little research, and found that the pythonpath is changed depending on if we are or aren't in a virtualenv.
sys.path (with virtualenv activated)
['',
'/.../virtualenv/test_path/bin',
'/.../virtualenv/test_path/local/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg',
'/.../virtualenv/test_path/local/lib/python2.7/site-packages/pip-1.2.1-py2.7.egg',
'/.../virtualenv/test_path/lib/python2.7',
'/.../virtualenv/test_path/lib/python2.7/plat-linux2',
'/.../virtualenv/test_path/lib/python2.7/lib-tk',
'/.../virtualenv/test_path/lib/python2.7/lib-old',
'/.../virtualenv/test_path/lib/python2.7/lib-dynload',
'/usr/lib/python2.7',
'/usr/lib/python2.7/plat-linux2',
'/usr/lib/python2.7/lib-tk',
'/.../virtualenv/test_path/local/lib/python2.7/site-packages',
'/.../virtualenv/test_path/local/lib/python2.7/site-packages/IPython/extensions']
sys.path (without activating virtualenv):
['',
'/usr/local/bin',
'/usr/lib/python2.7',
'/usr/lib/python2.7/plat-linux2',
'/usr/lib/python2.7/lib-tk',
'/usr/lib/python2.7/lib-old',
'/usr/lib/python2.7/lib-dynload',
'/usr/local/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages/PIL',
'/usr/lib/python2.7/dist-packages/gst-0.10',
'/usr/lib/python2.7/dist-packages/gtk-2.0',
'/usr/lib/pymodules/python2.7',
'/usr/local/lib/python2.7/dist-packages/IPython/extensions']
Is is sufficient to just change the pythonpath to point to the virtualenv
.../python2.7/site-packages
folder to get the same results as running
source .../bin/activate
?
No, it is not. virtualenv is not only about site-packages, it is about a whole isolated python environment.
Doing source /path/to/venv/bin/activate just changes your $PATH environment variable to include your virtualenv bin directory as first lookup.
If you call python directly, it is just a shortcut for:
$ /path/to/venv/bin/python myscript.py
And if you call pip in an activated virtualenv, it is the same as:
$ /path/to/venv/bin/pip install XYZ

How to start IPython Notebook with a specified namespace

I have an GUI-based (TraitsUI/PyQt/Envisage) application written in Python. I would like to spawn an IPython Notebook in which I expose a small API and a number of objects. Those objects include a SQLAlchemy session and a bunch of SQLAlchemy models.
I've looked a lot, but I can't find any examples of this. I can start a notebook:
from IPython.frontend.html.notebook import notebookapp
app = notebookapp.NotebookApp.instance()
app.initialize()
app.start()
and that works well enough (although I'd prefer if 'start' was nonblocking... I assume I can do it in another thread if needed), but I can't alter the namespace.
I've also found examples like this:
from IPython.zmq.ipkernel import IPKernelApp
namespace = dict(z=1010)
kapp = IPKernelApp.instance()
kapp.initialize()
# Update the ns we want with special variables auto-created by the kernel
namespace.update(kapp.shell.user_ns)
# Now set the kernel's ns to be ours
kapp.shell.user_ns = namespace
kapp.start()
But I'm not sure how to actually open the Notebook from here.
Does anybody have any suggestions?
Thanks!
>>> import IPython
>>> z=1010
>>> IPython.embed()
Python 3.5.2 (default, Oct 8 2019, 13:06:37)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: z
Out[1]: 1010