jupyter: using a dataframe created in another notebook - pyspark

I'm using Jupyter notebooks to manipulate some data.
I want to know if it is possible, in Jupyter, to use a DataFrame df1 created in the notebook n1.ipynb in another notebook n2.ipynb without recreating it.
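Each notebook runs its own kernel, so there is no shared in-memory state between n1.ipynb and n2.ipynb. A common workaround is to persist the DataFrame to disk from n1.ipynb and read it back in n2.ipynb; a minimal PySpark sketch (the parquet path is hypothetical):

# In n1.ipynb: write df1 once so other notebooks can reuse it
df1.write.mode("overwrite").parquet("/tmp/df1.parquet")

# In n2.ipynb: read it back instead of recreating it
# (assumes a SparkSession named spark already exists in that notebook)
df1 = spark.read.parquet("/tmp/df1.parquet")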

Related

Save Jupyter notebook before nbconvert

I want to export a Jupyter notebook from within the notebook itself, using this command:
!jupyter nbconvert --to html MyNotebook.ipynb --output MyNotebook.html
This works correctly, except that I need to manually save the notebook first for the latest outputs to end up in the final HTML copy. Is there a command to save the notebook (like hitting the save button) before doing the nbconvert?
Run the following in a cell of its own before calling nbconvert; the cell magic has to be the first line of the cell:
%%javascript
IPython.notebook.save_notebook()
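If you would rather trigger the save and the export from a single Python cell, a rough sketch using the classic Notebook front end (the pause length is arbitrary, since the save call is asynchronous, and this JavaScript API does not work in JupyterLab):

import time
from IPython.display import Javascript, display

# Ask the classic Notebook front end to save the current notebook
display(Javascript("IPython.notebook.save_notebook()"))
time.sleep(3)  # give the asynchronous save a moment to finish

!jupyter nbconvert --to html MyNotebook.ipynb --output MyNotebook.html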

Start notebook from other notebook

Using jupyter-lab
%run otherNotebook.ipynb
gives the following error message
Error: file not found otherNotebook.ipynb.py
How can I use the %run magic and prevent it from appending .py to the file name?
As described here, %run is for running a named file inside IPython as a program, and Jupyter notebooks are not Python programs.
Notebooks can be converted to Python programs/scripts using Jupytext; following that conversion you could then use %run.
Alternatively, you can use nbconvert or Papermill to execute a notebook. Papermill makes it easy to pass in parameters at run time. I have an example of both commented out in code under 'Step #5' here and 'Step #2' here.
If you are actually trying to bring the code into your present notebook, you may want to explore importing Jupyter notebooks as modules. importnb is recommended here for making importing notebooks more convenient. Or there is the subnotebook project, which lets you run a notebook as you would call a Python function: pass parameters and get results back, including output contents.
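For the Jupytext route, a rough sketch of what that looks like from a notebook cell (assumes the jupytext package is installed; the file name is the one from the question):

# Convert the other notebook to a plain .py script, then run that script
!jupytext --to py otherNotebook.ipynb
%run otherNotebook.py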

Can't launch PySpark in browser (windows 10)

I'm trying to launch a PySpark notebook in my browser by typing pyspark in the console, but I get the following error:
c:\Spark\bin>pyspark
python: can't open file 'notebook': [Errno 2] No such file or directory
What am I doing wrong here?
Please help?
Sounds like Jupyter Notebook is either not installed or not on your PATH.
I prefer to use Anaconda as my Python distribution; Jupyter comes standard with it and it installs all the necessary path information as well.
After that, as long as you have set PYSPARK_DRIVER_PYTHON=jupyter and PYSPARK_DRIVER_PYTHON_OPTS='notebook' correctly, you are good to go.
You want to launch the Jupyter notebook when you invoke the command pyspark. Therefore you need to add the following to your ~/.bash_profile or ~/.zshrc:
export PYSPARK_SUBMIT_ARGS="pyspark-shell"
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
After sourcing the profile, running pyspark will open the notebook in the browser.
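Since the question is about Windows, where a bash profile does not apply, another common workaround is to start Jupyter normally and attach PySpark from inside the notebook with the findspark package; a rough sketch, assuming findspark is installed and SPARK_HOME points at c:\Spark:

import findspark
findspark.init()  # locates SPARK_HOME and puts pyspark on sys.path

from pyspark.sql import SparkSession

# Build a local SparkSession inside the notebook
spark = SparkSession.builder.master("local[*]").appName("notebook").getOrCreate()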

Execute a Jupyter notebook from the console

I have some data analysis steps combined in a Jupyter notebook.
As the data change, I want to be able to
1. Re-run all the cells (to take the new data into account)
2. Convert to HTML for viewing
I know I can do #2 through jupyter nbconvert, but how do I do #1 without manually interacting with the notebook web interface?
nbconvert can do that as well, with the --execute argument.
https://nbconvert.readthedocs.io/en/latest/execute_api.html#executing-notebooks-from-the-command-line
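The combined command is jupyter nbconvert --to html --execute MyNotebook.ipynb. If you would rather drive it from Python, a rough sketch using the nbconvert API (the file names are hypothetical):

import nbformat
from nbconvert import HTMLExporter
from nbconvert.preprocessors import ExecutePreprocessor

# Re-execute every cell, then write an HTML copy of the result
nb = nbformat.read("analysis.ipynb", as_version=4)
ExecutePreprocessor(timeout=600).preprocess(nb, {"metadata": {"path": "."}})

body, _ = HTMLExporter().from_notebook_node(nb)
with open("analysis.html", "w", encoding="utf-8") as f:
    f.write(body)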

jupyter notebook kernel dies when loading mat files using scipy

I am using Jupyter notebook for my Python work. I have a large .mat dataset file which I am trying to load with scipy's loadmat function. When I try to load the data file, my Jupyter kernel dies and restarts without any error in the console.
The plain Python console works fine and loads the data properly.
I am using an Ubuntu machine.
Can someone please let me know what the issue is with the Jupyter notebook?
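A kernel that restarts silently while loading a large file is often a sign the process ran out of memory and was killed by the OS. One way to reduce the footprint is to inspect the .mat file first and load only the variables you actually need; a rough sketch with scipy (the file and variable names are hypothetical):

import scipy.io

# List the variables stored in the .mat file without loading their data
print(scipy.io.whosmat("big_dataset.mat"))

# Load only the variable that is needed instead of the whole file
mat = scipy.io.loadmat("big_dataset.mat", variable_names=["X"])
X = mat["X"]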