I want to be able to run through a list of config files and use %run to import variables from each config file into a Databricks notebook.
But I can't find a way to dynamically change the file that follows %run.
I have tried specifying a parameter like this:
config = './config.py'
%run $config
But it doesn't work. I cannot use dbutils.notebook.run(config) because I won't get access to the variables in my main notebook.
Can anyone think of a way to do this?
Since you have already mentioned config files, I will assume the config files are already available at some path and that they are not Databricks notebooks.
You can use Python's configparser in one notebook to read the config files, and reference that notebook's path with %run in the main notebook (or skip the extra notebook entirely by using configparser in the main notebook directly), for example as in the sketch below.
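A minimal sketch of the configparser approach, assuming an INI-style file; the path, section, and option names below are hypothetical, not taken from the question:
import configparser

config = configparser.ConfigParser()
config.read('/dbfs/path/to/config.ini')  # hypothetical location of the config file

# values come back as strings unless you use the typed getters
input_path = config['paths']['input_path']
threshold = config.getfloat('model', 'threshold')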
Reference: How to read a config file using python
My code is written in Jupyter and saved in .ipynb format.
We want to use Airflow to schedule the execution and define the dependencies.
How can the notebooks be executed in Airflow?
I know I can convert them to Python files first, but the graphs generated on the fly will be difficult to handle.
Is there any easier solution? Thanks
You can also use a combination of Airflow + Papermill.
Papermill
Papermill is a tool for running jupyter notebooks with parameters: https://github.com/nteract/papermill
Running a Jupyter notebook is very easy; you can do it from a Python script:
import papermill as pm

pm.execute_notebook(
    'path/to/input.ipynb',
    'path/to/output.ipynb',
    parameters=dict(alpha=0.6, ratio=0.1)
)
or from CLI:
$ papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
It will run the notebook from the input path, create a copy at the output path, and update that copy after each cell runs.
Airflow Integration
To integrate it with Airflow, there is a dedicated papermill operator for running parametrized notebooks: https://airflow.readthedocs.io/en/latest/howto/operator/papermill.html
You can set up the same input/output/parameters arguments directly in the DAG definition and use templating for the Airflow variables:
# Import path for the Papermill provider package (apache-airflow-providers-papermill);
# older Airflow versions expose the operator under a different module path.
from airflow.providers.papermill.operators.papermill import PapermillOperator

run_this = PapermillOperator(
    task_id="run_example_notebook",
    dag=dag,
    input_nb="/tmp/hello_world.ipynb",
    output_nb="/tmp/out-{{ execution_date }}.ipynb",
    parameters={"msgs": "Ran from Airflow at {{ execution_date }}!"},
)
We encountered this problem before and spent quite a few days solving it.
We packaged it as a Dockerfile and published it on GitHub: https://github.com/michaelchanwahyan/datalab.
It is done by modifying the open source package nbparameterize and integrating the passing of arguments such as execution_date. Graphs generated on the fly can also be updated and saved inside the notebook.
When it is executed:
the notebook is read and the parameters are injected;
the notebook is then executed and the output overwrites the original path.
Besides, it also installs and configures common tools such as Spark, Keras, TensorFlow, etc.
Another alternative is to use Ploomber (disclaimer: I'm the author). It uses papermill under the hood to build multi-stage pipelines. Tasks can be notebooks, scripts, functions, or any combination of them. You can run locally, on Airflow, or on Kubernetes (using Argo Workflows).
This is what a pipeline declaration looks like:
tasks:
  - source: notebook.ipynb
    product:
      nb: output.html
      data: output.csv
  - source: another.ipynb
    product:
      nb: another.html
      data: another.csv
Repository
Exporting to Airflow
Exporting to Kubernetes
Sample pipelines
Is there a way to open a new Jupyter notebook from a template? (which would not itself be modified)
I would expect something like this perhaps:
jupyter --template <template-filename>
(re-using the existing jupyter-notebook server session if there is one already)
But I don't seem to see how to do it (as of Jupyter 4.0.6)
I admit it's a rather hacky workaround, but just in case there is no implemented solution: You could make an alias in your .bashrc like this:
alias newnb='cp -i ~/templates/jupyter.ipynb new_notebook.ipynb && jupyter notebook'
Then the command newnb in your terminal copies the template file to your current directory and starts a Jupyter notebook session in that directory, where you can open new_notebook.ipynb with the given content.
IPython Notebook has a setting for the default working directory:
c.FileNotebookManager.notebook_dir = '/path/to/my/desired/dir'
Is there an analogous setting for the IPython console (terminal)? I have tried adjusting the following configuration parameter:
c.TerminalInteractiveShell.ipython_dir = '/path/to/my/desired/dir'
but this seems to have no effect. There is also no comment as to what this parameter is supposed to affect.
How can I configure IPython so that my working directory upon start will be /path/to/my/desired/dir, irrespective of where I start ipython?
From your home directory, go to .ipython, then your profile directory (probably profile_default), then startup. In there, create a new file with the extension .ipy, containing the lines:
import os
os.chdir('/path/to/my/desired/dir')
As pointed out by crowie in the comments, the .ipy extension also enables you to use IPython "magic" commands, so you could instead say:
%cd /path/to/my/desired/dir
Does anyone know if it is possible to run an IPython/Jupyter notebook non-interactively from the command line and have the resulting .ipynb file saved with the results of the run? If it isn't already possible, how hard would it be to implement with PhantomJS, something to turn the kernel on and off, and something to turn the web server on and off?
To be more specific, let's assume I already have a notebook original.ipynb and I want to rerun all cells in that notebook and save the results in a new notebook new.ipynb, but do this with one single command on the command line without requiring interaction either in the browser or to close the kernel or web server, and assuming no kernel or web server is already running.
example command:
$ ipython notebook run original.ipynb --output=new.ipynb
Yes, it is possible and easy; it will (mostly) be in IPython core for 2.0. I would suggest looking at those examples for now.
[edit]
$ jupyter nbconvert --to notebook --execute original.ipynb --output=new.ipynb
It is now in Jupyter NbConvert. NbConvert comes with a bunch of preprocessors that are disabled by default; two of them (ClearOutputPreprocessor and ExecutePreprocessor) are of interest. You can either enable them in your (local|global) config file(s) via c.<PreprocessorName>.enabled=True (uppercase, since that's Python), or on the command line with --ExecutePreprocessor.enabled=True, keeping the rest of the command as usual.
The --ExecutePreprocessor.enabled=True flag has a convenient --execute alias that can be used on recent versions of NbConvert. It can be combined with --inplace if desired.
For example, converting to HTML after running the notebook headless:
$ jupyter nbconvert --to=html --execute RunMe.ipynb
Converting to PDF after stripping outputs:
$ ipython nbconvert --to=pdf --ClearOutputPreprocessor.enabled=True RunMe.ipynb
This (of course) also works with non-Python kernels, by spawning a <insert-your-language-here> kernel, if you set --profile=<your fav profile>. The conversion can take quite a while, as it needs to rerun the notebook. You can do notebook-to-notebook conversion with the --to=notebook option.
There are various other options (timeout, allow errors, ...) that might need to be set or unset depending on the use case. See the documentation, and of course jupyter nbconvert --help, --help-all, or the nbconvert online documentation for more information.
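If you prefer to drive this from Python rather than the command line, the ExecutePreprocessor behind --execute can also be used programmatically through nbconvert's API. A minimal sketch (file names taken from the question above; the timeout and kernel name are arbitrary choices):
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

# read the source notebook
nb = nbformat.read('original.ipynb', as_version=4)

# run every cell, using the current directory as the working directory
ep = ExecutePreprocessor(timeout=600, kernel_name='python3')
ep.preprocess(nb, {'metadata': {'path': '.'}})

# write the executed copy to a new file
nbformat.write(nb, 'new.ipynb')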
Until this functionality becomes part of the core, I put together a little command-line app that does just what you want. It's called runipy and you can install it with pip install runipy. The source and readme are on github.
Run and replace original .ipynb file:
jupyter nbconvert --ExecutePreprocessor.timeout=-1 --to notebook --inplace --execute original.ipynb
To cover some features such as parallel workers, input parameters, e-mail sending, or S3 input/output, you can install jupyter-runner:
pip install jupyter-runner
Readme on github: https://github.com/omar-masmoudi/jupyter-runner
One more way is to use papermill; it has a command-line interface.
Usage example (you need to specify an output path for the execution results to be stored):
papermill your_notebook.ipynb logs/yourlog.out.ipynb
You can also specify required parameters if you wish, with a -p flag for each one:
papermill your_notebook.ipynb logs/yourlog.out.ipynb -p env "prod" -p tests "e2e"
One more papermill-related reply: https://stackoverflow.com/a/55458141/2957102
You can just run the IPython notebook server from the command line:
ipython notebook --pylab inline
This will start the server in non-interactive mode and all output is printed below the code. You can then save the .ipynb file, which includes code and output.