I am running a Databricks notebook from an Azure DevOps pipeline using the Execute Databricks Notebook task, and I pass the vertical name and branch name to the notebook. Using dbutils, I am able to get the required values. The notebook has Scala code and shell script code in different cells, and I need to use the notebook parameters in the shell script cell.
Can someone please suggest how I can use notebook parameters in the shell script?
%scala
val verticalName = dbutils.widgets.get("vertical")
val branchName = dbutils.widgets.get("branch")
println(verticalName)
println(branchName)
%sh
echo $verticalName
echo $branchName
How can I pass a dynamic path to the %run command in Databricks, given that a function defined in another notebook needs to be executed in the current notebook?
dbutils.notebook.run(dbutils.widgets.get("<widget name>"), 120, {})  # positional: path, timeout_seconds, arguments
I want to be able to loop through a list of config files and use %run to import variables from each config file into a Databricks notebook.
But I can't find a way to dynamically change the file that follows %run.
I have tried specifying a parameter like this:
config = './config.py'
%run $config
But it doesn't work. I cannot use dbutils.notebook.run(config) as I won't get access to the variables in my main notebook.
Can anyone think of a way to do this?
Since you mention config files, I will assume the config files are already available at some path and are not Databricks notebooks.
You can use Python's configparser in a separate notebook to read the config files and pull that notebook into the main notebook with %run (or skip the separate notebook and use configparser directly in the main notebook).
Reference: How to read a config file using python
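A minimal sketch of the configparser approach, assuming the config files are INI-style (sections with key = value lines). The helper name load_configs is hypothetical. It reads every file in a list and returns plain dictionaries, so the main notebook can loop over config files without needing a dynamic %run.

```python
import configparser


def load_configs(paths):
    """Read each INI-style config file and return {path: {section: {key: value}}}."""
    results = {}
    for path in paths:
        parser = configparser.ConfigParser()
        # read() silently skips missing files and returns the list it parsed
        parsed = parser.read(path)
        if not parsed:
            raise FileNotFoundError(path)
        # Values are strings; [DEFAULT] entries are merged into every section
        results[str(path)] = {s: dict(parser[s]) for s in parser.sections()}
    return results
```

Usage, with hypothetical file names: configs = load_configs(["./config_dev.ini", "./config_prod.ini"]), then configs["./config_dev.ini"]["db"]["host"]. Convert non-string values yourself (or use parser.getint / getboolean) if you need typed settings.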
My code is written in Jupyter and saved in .ipynb format.
We want to use Airflow to schedule the execution and define the dependencies.
How can the notebooks be executed in Airflow?
I know I can convert them to Python files first, but the graphs generated on the fly will be difficult to handle.
Is there an easier solution? Thanks
You can also use a combination of Airflow and papermill.
Papermill
Papermill is a tool for running jupyter notebooks with parameters: https://github.com/nteract/papermill
Running a Jupyter notebook is easy; you can do it from a Python script:
import papermill as pm
pm.execute_notebook(
'path/to/input.ipynb',
'path/to/output.ipynb',
parameters = dict(alpha=0.6, ratio=0.1)
)
or from CLI:
$ papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
This runs the notebook at the input path, creates a copy at the output path, and updates that copy after each cell runs.
Airflow Integration
To integrate it with Airflow, there is a dedicated papermill operator for running parametrized notebooks: https://airflow.readthedocs.io/en/latest/howto/operator/papermill.html
You can set up the same input/output/parameters arguments directly in the DAG definition and use templating for Airflow variables:
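# Airflow 2.x (apache-airflow-providers-papermill); Airflow 1.10 used
# from airflow.operators.papermill_operator import PapermillOperator
from airflow.providers.papermill.operators.papermill import PapermillOperator

# `dag` is a DAG object defined earlier in the same DAG file
run_this = PapermillOperator(
    task_id="run_example_notebook",
    dag=dag,
    input_nb="/tmp/hello_world.ipynb",
    output_nb="/tmp/out-{{ execution_date }}.ipynb",
    parameters={"msgs": "Ran from Airflow at {{ execution_date }}!"},
)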
We ran into this problem before and spent a couple of days solving it.
We packaged the solution with a Dockerfile and published it on GitHub: https://github.com/michaelchanwahyan/datalab.
It works by modifying the open-source package nbparameterize and passing in arguments such as execution_date. Graphs generated on the fly are also updated and saved inside the notebook.
When it runs, the notebook is read and the parameters are injected; the notebook is then executed, and the output overwrites the notebook at its original path.
The image also installs and configures common tools such as Spark, Keras, and TensorFlow.
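The read-and-inject step can be sketched with the standard library alone, since an .ipynb file is JSON. This is not the datalab/nbparameterize code: as a simplification it prepends a new code cell tagged "injected-parameters" (papermill uses a similarly tagged cell) instead of editing an existing parameter cell. The helper name inject_parameters is hypothetical, and the notebook is assumed to be in nbformat 4.

```python
import json


def inject_parameters(nb_path, out_path, params):
    """Prepend a code cell that assigns `params` to a copy of an nbformat-4 notebook."""
    with open(nb_path, encoding="utf-8") as f:
        nb = json.load(f)

    # repr() turns str, int, float, bool, None, list and dict values into Python literals
    source = [f"{name} = {value!r}\n" for name, value in params.items()]
    cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": source,
    }
    if nb.get("nbformat_minor", 0) >= 5:
        cell["id"] = "injected-parameters"  # nbformat 4.5+ requires a cell id

    # Later cells see these names because the injected cell runs first
    nb["cells"].insert(0, cell)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1)
    return nb
```

The injected copy can then be executed in place, for example with jupyter nbconvert --to notebook --execute --inplace run.ipynb (run.ipynb being a hypothetical output name), so the rendered graphs are saved inside the notebook.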
Another alternative is Ploomber (disclaimer: I'm the author). It uses papermill under the hood to build multi-stage pipelines. Tasks can be notebooks, scripts, functions, or any combination of them, and pipelines can run locally, on Airflow, or on Kubernetes (using Argo Workflows).
This is what a pipeline declaration looks like:
tasks:
  - source: notebook.ipynb
    product:
      nb: output.html
      data: output.csv

  - source: another.ipynb
    product:
      nb: another.html
      data: another.csv
Repository
Exporting to Airflow
Exporting to Kubernetes
Sample pipelines
Does anyone know if it is possible to run an IPython/Jupyter notebook non-interactively from the command line and have the resulting .ipynb file saved with the results of the run? If it isn't already possible, how hard would it be to implement with PhantomJS, something to turn the kernel on and off, and something to turn the web server on and off?
To be more specific, let's assume I already have a notebook original.ipynb and I want to rerun all cells in that notebook and save the results in a new notebook new.ipynb, but do this with one single command on the command line without requiring interaction either in the browser or to close the kernel or web server, and assuming no kernel or web server is already running.
example command:
$ ipython notebook run original.ipynb --output=new.ipynb
Yes, it is possible and easy. It will (mostly) be in IPython core for 2.0; for now, I would suggest looking at those examples.
[edit]
$ jupyter nbconvert --to notebook --execute original.ipynb --output=new.ipynb
It is now in Jupyter NbConvert. NbConvert comes with a number of preprocessors that are disabled by default; two of them (ClearOutputPreprocessor and ExecutePreprocessor) are of interest here. You can enable them either in your (local|global) config file(s) via c.<PreprocessorName>.enabled=True (capital T, since the config file is Python), or on the command line with --ExecutePreprocessor.enabled=True, keeping the rest of the command as usual.
The --ExecutePreprocessor.enabled=True option has a convenient --execute alias in recent versions of NbConvert. It can be combined with --inplace if desired.
For example, to convert to HTML after running the notebook headless:
$ jupyter nbconvert --to=html --execute RunMe.ipynb
or to convert to PDF after stripping the outputs:
$ jupyter nbconvert --to=pdf --ClearOutputPreprocessor.enabled=True RunMe.ipynb
This does (of course) work with non-Python kernels, by spawning a <insert-your-language-here> kernel if you set --profile=<your fav profile>. The conversion can take a long time, since it has to rerun the notebook. You can do notebook-to-notebook conversion with the --to=notebook option.
There are various other options (timeout, allow errors, ...) that you may need to set or unset depending on your use case. See jupyter nbconvert --help and --help-all, or the online nbconvert documentation, for more information.
Until this functionality becomes part of the core, I put together a little command-line app that does just what you want. It's called runipy and you can install it with pip install runipy. The source and readme are on github.
Run and replace original .ipynb file:
jupyter nbconvert --ExecutePreprocessor.timeout=-1 --to notebook --inplace --execute original.ipynb
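If you need to trigger this from a Python script (for example, a scheduler job), one option is to shell out to the same jupyter nbconvert command. A minimal sketch, assuming jupyter is on PATH; the helper names nbconvert_execute_cmd and run_notebook are hypothetical, while --to, --execute, --output, --inplace, --allow-errors and --ExecutePreprocessor.timeout are real nbconvert options.

```python
import subprocess


def nbconvert_execute_cmd(src, output=None, timeout=None, inplace=False, allow_errors=False):
    """Build the argv for `jupyter nbconvert --to notebook --execute`."""
    cmd = ["jupyter", "nbconvert", "--to", "notebook", "--execute"]
    if timeout is not None:
        # Per-cell timeout in seconds; -1 disables it
        cmd.append(f"--ExecutePreprocessor.timeout={timeout}")
    if allow_errors:
        cmd.append("--allow-errors")  # keep executing after a cell raises
    if inplace:
        cmd.append("--inplace")  # overwrite the source notebook
    elif output is not None:
        cmd.append(f"--output={output}")
    cmd.append(src)
    return cmd


def run_notebook(src, **kwargs):
    # check=True raises CalledProcessError if nbconvert exits non-zero (e.g. a cell failed)
    subprocess.run(nbconvert_execute_cmd(src, **kwargs), check=True)
```

For example, run_notebook("original.ipynb", timeout=-1, inplace=True) runs the same command as the answer above, and run_notebook("original.ipynb", output="new.ipynb") matches the one from the question.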
For features such as parallel workers, input parameters, email sending, or S3 input/output, you can install jupyter-runner:
pip install jupyter-runner
Readme on github: https://github.com/omar-masmoudi/jupyter-runner
Another option is papermill, which has a command-line interface.
Usage example (you need to specify an output path where the executed notebook will be stored):
papermill your_notebook.ipynb logs/yourlog.out.ipynb
You can also pass parameters, with one -p flag per parameter:
papermill your_notebook.ipynb logs/yourlog.out.ipynb -p env "prod" -p tests "e2e"
Another related papermill answer: https://stackoverflow.com/a/55458141/2957102
You can just run the IPython Notebook server from the command line:
ipython notebook --pylab inline
This will start the server in non-interactive mode, and all output is printed below the code. You can then save the .ipynb file, which includes both code and output.
I want to accomplish a loop over R code within an IPython notebook. What is the best way to do this?
l = []
for i in range(10):
# execute R script
%%R -i i -o result #some arbitrary R code
# and use the output
l.append(result)
Can this be done inside a notebook (looping over the next cell)?
Have you looked into rmagic and the rpy2 module?
If you have R scripts, then you can call them and assign their output to a variable using the shell command notation:
var=!R_script arguments....
This does not require installing rpy2, since ! shell command execution is built into IPython. You can pass the values of notebook variables by using $var in the argument list.
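The same loop can also be written with Python's subprocess module, which works outside IPython as well. A minimal sketch, assuming Rscript is on PATH and that the R script reads its argument with commandArgs(trailingOnly = TRUE) and prints its result to stdout; run_r_script and my_script.R are hypothetical names.

```python
import subprocess


def run_r_script(script, *args, rscript="Rscript"):
    """Run `rscript script args...` and return the script's stdout, stripped."""
    completed = subprocess.run(
        [rscript, script, *map(str, args)],  # command-line arguments must be strings
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return completed.stdout.strip()
```

Usage mirroring the question: l = [run_r_script("my_script.R", i) for i in range(10)]. Inside a notebook, the shell notation from the answer also works in a loop, e.g. result = !Rscript my_script.R $i. Either way, the R output comes back as text, so convert it (for example with float()) before further use.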