Using profiles with ipython/jupyter

Here is help output from ipython:
Examples
ipython notebook # start the notebook
ipython notebook --profile=sympy # use the sympy profile
ipython notebook --certfile=mycert.pem # use SSL/TLS certificate
Seems straightforward... but then when invoking:
$ ipython notebook --profile=pyspark
The following warning occurs:
[W 20:54:38.623 NotebookApp] Unrecognized alias: '--profile=pyspark',
it will probably have no effect.
So then the online help is inconsistent with the warning message.
What is the correct way to activate the profile?
Update: I tried reversing the order as follows:
$ ipython --profile=pyspark notebook
But then a different warning occurs:
[TerminalIPythonApp] WARNING | File not found: u'notebook'

The --profile option belongs to the ipython binary itself, but you are passing it to the notebook application, as is evident from the warning, which comes from NotebookApp:
[W 20:54:38.623 NotebookApp] Unrecognized alias: '--profile=pyspark',
it will probably have no effect.
That's basically saying you are passing an option to notebook which it doesn't recognize, so it won't have any effect.
You need to pass the option to ipython:
ipython --profile=foo -- notebook
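To confirm which profile actually got picked up, you can ask the running session itself. A minimal check, assuming a recent IPython where the shell exposes .profile and .profile_dir (run it in any notebook or console cell):
# Run inside the session to see which profile is active
ip = get_ipython()
print(ip.profile)               # e.g. 'pyspark'
print(ip.profile_dir.location)  # directory the profile was loaded from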

The online docs are inaccurate. The order needs to be reversed, with the --profile option before notebook:
$ ipython --profile=pyspark notebook
But the issues go beyond even that.
It seems that Jupyter (the successor to the IPython notebook) may not respect IPython profiles at all.
There are multiple references to this. Here is one from the Spark mailing list:
Does anyone have a pointer to Jupyter configuration with pyspark? The
current material on IPython notebook is out of date, and Jupyter
ignores IPython profiles.
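One workaround I have seen for when Jupyter ignores --profile is to run the profile's startup scripts by hand from inside the kernel. A minimal sketch, assuming the default ~/.ipython layout and a profile named pyspark (the profile name and paths here are illustrative):
import glob, os
from IPython.paths import get_ipython_dir  # on IPython < 4 this lives in IPython.utils.path

# Execute each startup script, as the profile machinery would have done on launch
startup_dir = os.path.join(get_ipython_dir(), "profile_pyspark", "startup")
for script in sorted(glob.glob(os.path.join(startup_dir, "*.py"))):
    exec(compile(open(script).read(), script, "exec"))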

Related

kedro context and catalog missing from ipython session

I launched an IPython session and am trying to load a dataset.
I am running
df = catalog.load("test_dataset")
I am facing the error below:
NameError: name 'catalog' is not defined
I also tried %reload_kedro but got the error below:
UsageError: Line magic function `%reload_kedro` not found.
I am not able to load the context either.
I am running the kedro environment from a Docker container.
I am not sure where I am going wrong.
New in 0.17.5, there is a fallback option; please run the following commands in your Jupyter/IPython session:
%load_ext kedro.extras.extensions.ipython
%reload_kedro <path_to_project_root>
This should help you get up and running.
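If the extension route still fails inside the container, you can wire things up by hand. A minimal sketch, assuming the kedro 0.17.x session APIs (the project path is a placeholder you would adjust to your Docker mount):
from pathlib import Path
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

project_path = Path("/path/to/project_root")  # placeholder: your project root inside the container
metadata = bootstrap_project(project_path)    # registers the project's settings and package
with KedroSession.create(metadata.package_name, project_path) as session:
    context = session.load_context()
    df = context.catalog.load("test_dataset")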

Why can't I import 'pandas_udf' in Jupyter notebook?

I run the following code in a Jupyter notebook but get an ImportError. Note that 'udf' can be imported in Jupyter.
from pyspark.sql.functions import pandas_udf
ImportError                               Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 from pyspark.sql.functions import pandas_udf
ImportError: cannot import name 'pandas_udf'
Does anyone know how to fix it? Thank you very much!
It looks like you started jupyter notebook by itself, rather than starting pyspark with jupyter notebook as the driver, which is done with the following command:
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
If your jupyter notebook server process is running on another machine, you may want to use this command to make it listen on all IP addresses of your server.
(NOTE: this could be a potential security issue if your server is on a public or untrusted network.)
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip=0.0.0.0 " pyspark
I will revise my answer if the problem still persists after you start jupyter notebook like that.
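Once the notebook is started through pyspark like that, a quick smoke test for pandas_udf (a minimal sketch; it assumes Spark >= 2.3 with the pyarrow package installed, since pandas_udf is Arrow-based, and that `spark` is the session pyspark predefines):
import pandas as pd
from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf("long", PandasUDFType.SCALAR)  # Spark 3 also accepts the newer type-hint style
def plus_one(v: pd.Series) -> pd.Series:
    return v + 1

# `spark` is predefined when the notebook is launched via pyspark
spark.range(5).select(plus_one("id")).show()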

What is a spark kernel for apache toree?

I have a Spark cluster whose master is on 192.168.0.60:7077.
I have been using jupyter notebook to write some pyspark scripts.
I now want to move on to Scala, but I don't know the Scala world.
I am trying to use Apache Toree.
I installed it, downloaded the Scala kernel, and ran it to the point of opening a Scala notebook. Up to there, everything seemed OK :-/
But I can't find the Spark context, and there are errors in Jupyter's server logs:
[I 16:20:35.953 NotebookApp] Kernel started: afb8cb27-c0a2-425c-b8b1-3874329eb6a6
Starting Spark Kernel with SPARK_HOME=/Users/romain/spark
Error: Master must start with yarn, spark, mesos, or local
Run with --help for usage help or --verbose for debug output
[I 16:20:38.956 NotebookApp] KernelRestarter: restarting kernel (1/5)
As I don't know Scala, I am not sure what the issue is here.
It could be that:
I need a spark kernel (according to https://github.com/ibm-et/spark-kernel/wiki/Getting-Started-with-the-Spark-Kernel )
I need to add an option on the server (the error message says 'Master must start with yarn, spark, mesos, or local')
or something else :-/
I just wanted to migrate from Python to Scala, and I have spent a few hours lost just on starting up the Jupyter IDE :-/
It looks like you are using Spark in standalone deploy mode. As Tzach suggested in his comment, the following should work:
SPARK_OPTS='--master=spark://192.168.0.60:7077' jupyter notebook
SPARK_OPTS expects the usual spark-submit parameter list.
If that does not help, you would need to check the SPARK_MASTER_PORT value in conf/spark-env.sh (7077 is the default).
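Once the notebook comes up, it is worth confirming that the kernel actually picked up the master. A minimal check from a PySpark session (in Toree's Scala kernel the equivalent line is the same, sc.master):
# `sc` is predefined in a PySpark-driven notebook
print(sc.master)  # expect: spark://192.168.0.60:7077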

synchronizing history when ipython notebook and console are connected to the same kernel

I have ipython notebook running on a remote server, started with:
ipython notebook --profile=nbserver
which I access from my local machine. Furthermore, I ssh into the remote server and start an ipython console (terminal) there. I have found the following command to work well:
ipython console --existing \
~/.config/ipython/profile_nbserver/security/kernel-*.json
Now I am connected to the same remote kernel from two different clients (let's call them the browser and the terminal). Everything works well, except for one annoying detail:
1) In the browser, I type a=1.
2) In the terminal, I type b=2.
3) In both clients I can see both commands using %history. But when I cycle through the history in the terminal using Up, it only shows the commands typed in the terminal (i.e., b=2). Similarly, I am unable to use a + PageDown in the terminal to go back through the history and find the command starting with a.
From what I understand, my two clients are using two separate history files, history.sqlite. But why does %history show all commands?
Question:
Is there any way to configure both clients to use a single history.sqlite?
I find having easy access to history absolutely crucial. Moreover, I see the terminal and the browser as complementary: they both have tradeoffs and are best used in combination.
You can set where the history is stored either on the command line:
ipython --HistoryManager.hist_file=$HOME/ipython_hist.sqlite
or within the IPython config files:
# in ipython_config.py
import os
c.HistoryManager.hist_file = os.path.expanduser("~/ipython_hist.sqlite")
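To verify that both clients ended up on the same file, you can ask each running session where its history lives. A minimal check (HistoryManager.hist_file is the same setting the options above configure):
# Run this in both the notebook and the console; the two paths should match
ip = get_ipython()
print(ip.history_manager.hist_file)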

IPython notebook won't open some files

I have a git folder with several IPython notebook files in it. I've just gotten a new computer and installed IPython. Some files open fine; others, however, display this error:
Error loading notebook, bad request.
The log looks like:
2014-07-16 00:20:11.523 [NotebookApp] WARNING | Unreadable Notebook: /nas-6000/wclab/Ahmed/Notebook/01 - Boundary Layer.ipynb [Errno 5] Input/output error
WARNING:tornado.access:400 GET /api/notebooks/01%20-%20Boundary%20Layer.ipynb?_=1405434011080 (127.0.0.1) 3.00ms referer=linktofile
The read/write and owner permissions are the same for each of the files. The files open fine on my other computers; it's just this new one. Any ideas?
Cheers,
James
Errno 5 is a low-level I/O error, usually reported when your disk has bad sectors.
I don't think the error is related to the file or to IPython; check your disk with an appropriate tool (fsck if you are using Linux).
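To confirm it is the medium (the disk, or here possibly the NAS mount) rather than IPython, try reading the raw bytes of one failing notebook from plain Python; if the underlying I/O is broken, this should raise the same [Errno 5] with no IPython involved (the path below is the one from your log):
# Plain-Python read test, no IPython involved
with open("/nas-6000/wclab/Ahmed/Notebook/01 - Boundary Layer.ipynb", "rb") as f:
    f.read()  # raises OSError/IOError [Errno 5] if the device cannot read the blocks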