Can't launch PySpark in browser (Windows 10) - pyspark

I'm trying to launch the PySpark notebook in my browser by typing pyspark at the console, but I get the following error:
c:\Spark\bin>pyspark
python: can't open file 'notebook': [Errno 2] No such file or directory
What am I doing wrong here?
Please help.

Sounds like the Jupyter notebook is either not installed or not on your path.
I prefer to use Anaconda for my Python distribution; Jupyter comes standard, and the installer sets up all the necessary path information as well.
After that, as long as you have set PYSPARK_DRIVER_PYTHON=jupyter and PYSPARK_DRIVER_PYTHON_OPTS='notebook' correctly, you are good to go.
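On Windows, a minimal sketch of setting these persistently with setx (assuming the c:\Spark layout from the question; open a new console afterwards so the values take effect):
setx PYSPARK_DRIVER_PYTHON jupyter
setx PYSPARK_DRIVER_PYTHON_OPTS notebook
c:\Spark\bin\pyspark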

You want to launch the Jupyter notebook when you invoke the command pyspark. Therefore you need to add the following to your .bash_profile or .zshrc:
export PYSPARK_SUBMIT_ARGS="pyspark-shell"
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
Then run pyspark.

Related

Can I execute a file (or some lines of python) within a running jupyterlab kernel for a notebook?

I've got a notebook that has become a bit unwieldy, and I'm doing some refactoring, which isn't fun.
I was wondering if it would be possible to execute code in this notebook from the command line for debugging.
Ideally, I would run something like:
run-in-jupyter $notebook file.py
and see the output from the command line. There is an interpreter in JupyterLab that can do this, so this makes me think that it is possible.
I have done a brief search but couldn't find much:
How to run an .ipynb Jupyter Notebook from terminal? I explicitly don't want to do this (I want to run commands in an existing instance).
There is this library, but it seems quite involved, and some of the results I found on the internet were from people who were not able to use it.
jupyter console (pip install jupyter-console) connects to a running Jupyter kernel from the terminal. Details on running kernels can be found amongst Jupyter's runtime files; on my box these live in ~/.local/share/jupyter/runtime. You can find the path to the kernel data file corresponding to an open notebook with %config IPKernelApp.connection_file, which will look something like ~/.local/share/jupyter/runtime/kernel-55da8a07-b67d-4584-9ec6-f24e4a26cbbd.json.
You can then connect from the command line with
jupyter console --existing ~/.local/share/jupyter/runtime/kernel-55da8a07-b67d-4584-9ec6-f24e4a26cbbd.json
You can pipe commands into it, as shown:
echo h=87 | jupyter console --existing 55da8a07-b67d-4584-9ec6-f24e4a26cbbd --simple-prompt -y
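If you would rather drive the kernel from a script, the jupyter_client package (which jupyter console itself builds on) can do the same thing. A minimal sketch, assuming the connection file found above:
import os
from jupyter_client import BlockingKernelClient

# path reported by %config IPKernelApp.connection_file in the notebook
cf = os.path.expanduser(
    "~/.local/share/jupyter/runtime/kernel-55da8a07-b67d-4584-9ec6-f24e4a26cbbd.json")

client = BlockingKernelClient()
client.load_connection_file(cf)  # read the kernel's ports and auth key
client.start_channels()          # open the messaging channels
client.execute("h = 87")         # queue code to run in the existing kernel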

Where to run the put command

I am a Windows user with a Scala kernel set up in Jupyter Notebook. I have an ML model saved as a .pmml file inside a jar file, and I need to put that jar file in the Snowflake stage. In the Snowflake documentation, the following command is used to do that.
The command for Windows is:
put file://c:\data\data.csv @~/staged;
My question is: where should I execute the command with my user directory details? Should it be in the Scala kernel in the Jupyter notebook, in cmd, or in Snowflake itself?
My analogous command:
put file://C:\Users\psengar\myJar.jar @~/staged;
You can install SnowSQL and run the PUT command from there.
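A hedged sketch of such a session (the account and user names are placeholders; the file path is the one from the question):
snowsql -a <account_identifier> -u <username>
-- then, at the snowsql prompt:
put file://C:\Users\psengar\myJar.jar @~/staged;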

jupyter setup i18n on existing notebook

I have been trying to translate the Jupyter Notebook interface into my native language, using the existing i18n implementation. I have already created the translation files just as the README advised, and now I want to add them to Jupyter:
https://github.com/jupyter/notebook/tree/master/notebook/i18n
But I can't find the /notebook/i18n/ folder on my computer (Ubuntu 16.04). Do I have to install Jupyter one more time, or can I just add the translation files to the already existing Jupyter installation on my machine and run it?
I just reinstalled Jupyter, and this time the i18n folder is in its place in:
/usr/local/lib/python3.5/dist-packages/notebook/i18n/i18n
First, find the lib path with Python:
from distutils.sysconfig import get_python_lib
print(get_python_lib())
And you will find it in
${lib_path}/notebook/i18n/
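Putting the two steps together (a small convenience sketch, not part of the original answer):
import os
from distutils.sysconfig import get_python_lib

# directory where the notebook package keeps its translations
i18n_dir = os.path.join(get_python_lib(), "notebook", "i18n")
print(i18n_dir)  # place your translation files under this directory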

how to access pyspark from jupyter notebook

I have been using pyspark [with Python 2.7] in an IPython notebook on Ubuntu 14.04 quite successfully by creating a special profile for Spark and starting the notebook by calling $ ipython notebook --profile spark. The mechanism for creating the Spark profile is given on many websites, but I have used the one given here.
The $HOME/.ipython/profile_spark/startup/00-pyspark-setup.py contains the following code:
import os
import sys

# Configure the environment
if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = '/home/osboxes/spark16'

# Create a variable for our root path
SPARK_HOME = os.environ['SPARK_HOME']

# Add the PySpark/py4j to the Python Path
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "build"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
I have just created a new Ubuntu 16.04 VM for my students, where I want them to run pyspark programs in the IPython notebook. Python and PySpark are working quite well. We are using Spark 1.6.
However, I have discovered that the current versions of the IPython notebook [or Jupyter notebook], whether downloaded through Anaconda or installed with sudo pip install ipython, DO NOT SUPPORT the --profile option, and all configuration parameters have to be specified in the ~/.jupyter/jupyter_notebook_config.py file.
Can someone please help me with the config parameters that I need to put into this file? Or is there an alternative solution? I have tried findspark as explained here but could not make it work. findspark got installed, but findspark.init() failed, possibly because it was written for Python 3.
My challenge is that everything is working just fine on my old installation of IPython on my machine, but my students, who are installing everything from scratch, cannot get pyspark going on their VMs.
I work with Spark locally, just for test purposes, from ~/apps/spark-1.6.2-bin-hadoop2.6/bin/pyspark:
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" ~/apps/spark-1.6.2-bin-hadoop2.6/bin/pyspark
I have found a ridiculously simple answer to my own question by looking at the advice given on this page.
Forget about all configuration files etc. Simply start the notebook with this command: $ IPYTHON_OPTS="notebook" pyspark
That's all.
Obviously the paths to Spark have to be set as given here,
and if you get an error with Py4J then look at this page.
With this you are good to go. The Spark context is available as sc, so don't create it again.
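One caveat worth adding (not from the original answer): IPYTHON_OPTS was removed in Spark 2.0, so on newer Spark versions the equivalent is:
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
pyspark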
With Python 2.7.13 from Anaconda 4.3.0 and Spark 2.1.0 on Ubuntu 16.04:
$ cd
$ gedit .bashrc
Add the following lines (where ***** is the proper path):
export SPARK_HOME=*****/spark-2.1.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
export PATH=$SPARK_HOME/sbin:$PATH
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
Save, then do:
$ *****/anaconda2/bin/pip install py4j
$ cd
$ source .bashrc
Check if it works with:
$ ipython
In [1]: import pyspark
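Since this is Spark 2.x, you can go one step further than the import as a smoke test (SparkSession is the 2.x entry point; the local[*] master is an assumption for a local run):
In [2]: from pyspark.sql import SparkSession
In [3]: spark = SparkSession.builder.master("local[*]").getOrCreate()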
For more details, go here.

Cannot create a "notebook" using IPython Notebook

I have recently installed IPython using Enthought's EPD Python install, and when starting the IPython HTML notebook from the command prompt by typing:
ipython notebook --pylab=inline
I manage to get the localhost browser notebook screen pop up correctly.
However when I try to create a new notebook by clicking "New Notebook" I get the following error message:
"Creating Notebook Failed The error was: Unexpected error while autosaving notebook: C:\Windows\System32\Untitled0.ipynb [Errno 17] No usable temporary file name found"
I am assuming this is because I may not have write privileges for that particular drive. So I have tried to go into the "ipython_notebook_config.py" file and change the following:
# The directory to use for notebooks and kernels.
c.NotebookApp.notebook_dir = u'C:\Users\Stuart\Documents\iPython'
and
c.FileNotebookManager.notebook_dir = u'C:\Users\Stuart\Documents\iPython'
I have then closed down all the cmd windows and started the IPython notebook again. But when I click on "New Notebook" I get the same error message as before:
"Creating Notebook Failed The error was: Unexpected error while autosaving notebook: C:\Windows\System32\Untitled0.ipynb [Errno 17] No usable temporary file name found"
Could someone please help me as to how I can get this working? Any help very much appreciated.
The answer kindly provided by @Jakob in the comments above did the trick:
"Can you try switching to C:\Users\Stuart\Documents\iPython in the terminal before starting the notebook?"
Just change the directory in which your IPython notebook runs. To do this, right-click on the shortcut and edit its properties. In the properties there is a field named "Start in" or something like that; put your path in this field.
I just experienced the same problem. I even erased all the untitled.ipynb files in the directory. Then I realized that I had other copies of Anaconda terminal open. When I closed them and tried again, things went back to normal.
If you run IPython as administrator you won't run into the error when starting a new notebook. To do that, right-click on the IPython shortcut and click "Run as administrator".
I also had the same problem; I was not able to create a new notebook or access the existing notebooks present in that directory.
Error message: Unexpected error while saving file: /path/ database is locked
It turns out my old Anaconda Jupyter notebook terminals were open and running in the background. Every time I started Jupyter notebook, I used the new instance, which led me to this problem. When I closed all terminals and restarted a new Jupyter notebook terminal, it started working again.
Many of the problems with Anaconda/Jupyter/Notebooks can be solved by examining and cleaning up what you have in your environment variables, such as Path, or if you are trying to set up directories to store the Notebooks that you develop.
There is a very good discussion of environmental variables here:
http://johnatten.com/2014/12/07/adding-and-editing-path-environment-variables-in-windows/
It is obvious that if Anaconda/Jupyter/Notebook can't find the files, it can't run them.
At a minimum, your Path environment variable should contain:
c:\users\*******\Anaconda3 (where ******* is your user name)
c:\users\*******\Anaconda3\Scripts
Then you can create environment variables that point to your personal Notebook code directories (note: there can't be any spaces in the paths). In Windows, go to System Properties --> Environment Variables and add them to the User and System variables:
Variable    Value
NOTEBOOK    address of your personal Notebook location
TESTING     address of your Notebook Testing location
With this setup you can run, at the Anaconda command prompt:
jupyter notebook %TESTING%
or
jupyter notebook %NOTEBOOK%
Another way to start in your own Notebook directory is to edit jupyter_notebook_config.py.
Go to:
## The directory to use for notebooks and kernels.
c.NotebookApp.notebook_dir = 'your Notebook directory address goes here'
Remove the ## and enter your directory, using / instead of \ in the path.
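For example (the path is borrowed from the earlier question; substitute your own):
c.NotebookApp.notebook_dir = 'C:/Users/Stuart/Documents/iPython'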
Then anytime you enter 'jupyter notebook' you will start at your Notebook Directory.