How to run notebook inside another notebook in databricks? - pyspark

How can I pass a dynamic path to the %run command in Databricks? A function defined in another notebook needs to be executed in the current notebook.

# "notebook_path" is a placeholder widget name holding the target notebook's path
dbutils.notebook.run(dbutils.widgets.get("notebook_path"), 120, {})  # path, timeout_seconds, arguments

Related

Is there a way to parameterize magic commands in Databricks notebooks?

I want to be able to run through a list of config files and use %run to import variables from the config files into a Databricks notebook.
But I can't find a method to dynamically change the file following %run.
I have tried specifying a parameter like this:
config = './config.py'
%run $config
But it doesn't work. I cannot use dbutils.notebook.run(config) as I won't get access to the variables in my main notebook.
Can anyone think of a way to do this?
Since you have already mentioned config files, I will assume that they are already available at some path and are not Databricks notebooks.
You can use Python's configparser in a helper notebook to read the config files and include that notebook via %run in the main notebook (or skip the helper notebook entirely and use configparser directly in the main notebook).
Reference: How to read a config file using python
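A minimal sketch of the configparser approach (config.ini, the [params] section, and the option names are illustrative; adjust them to your actual config files):
# load_config.py, or a cell in a helper notebook included via %run
import configparser

config = configparser.ConfigParser()
config.read('./config.ini')  # example path to one of the config files

# expose the values as plain variables for the main notebook
alpha = config.getfloat('params', 'alpha')
env = config.get('params', 'env')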

using scala variable in shell script

I am running a Databricks notebook from an Azure DevOps pipeline using the Execute Databricks Notebook task. I am passing a vertical name and a branch name to my notebook. Using dbutils I am able to get the required values. I have Scala code and shell script code in different cells of the same notebook, and I have to use the notebook parameters in the shell script cell.
Can someone please suggest how I can use the notebook parameters in the shell script?
%scala
val verticalName = dbutils.widgets.get("vertical")
val branchName = dbutils.widgets.get("branch")
println(verticalName)
println(branchName)
%sh
echo $verticalName
echo $branchName
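One common workaround (a sketch, not part of the original thread) is to re-read the widgets in a Python cell and export them as environment variables; %sh cells are spawned from the driver process and generally inherit its environment. VERTICAL and BRANCH are illustrative variable names:
%python
import os

# make the notebook parameters visible to later %sh cells via the environment
os.environ["VERTICAL"] = dbutils.widgets.get("vertical")
os.environ["BRANCH"] = dbutils.widgets.get("branch")
A later %sh cell can then use echo $VERTICAL and echo $BRANCH.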

How to run jupyter notebook in airflow

My code is written in jupyter and saved as .ipynb format.
We want to use airflow to schedule the execution and define the dependencies.
How can the notebooks be executed in airflow?
I know I can convert them to Python files first, but the graphs generated on the fly will be difficult to handle.
Is there an easier solution? Thanks
You can also use a combination of Airflow and Papermill.
Papermill
Papermill is a tool for running Jupyter notebooks with parameters: https://github.com/nteract/papermill
Running a Jupyter notebook is very easy; you can do it from a Python script:
import papermill as pm

pm.execute_notebook(
    'path/to/input.ipynb',
    'path/to/output.ipynb',
    parameters=dict(alpha=0.6, ratio=0.1)
)
or from CLI:
$ papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
and it will run a notebook from the input path, create a copy in the output path and update this copy after each cell run.
Airflow Integration
To integrate it with Airflow, there is a dedicated papermill operator for running parametrized notebooks: https://airflow.readthedocs.io/en/latest/howto/operator/papermill.html
You can set up the same input/output/parameters arguments directly in the DAG definition and use templating for the Airflow variables:
run_this = PapermillOperator(
    task_id="run_example_notebook",
    dag=dag,
    input_nb="/tmp/hello_world.ipynb",
    output_nb="/tmp/out-{{ execution_date }}.ipynb",
    parameters={"msgs": "Ran from Airflow at {{ execution_date }}!"}
)
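For context, a minimal DAG around that operator might look like the sketch below (the import path assumes the apache-airflow-providers-papermill package on Airflow 2.x; adjust it for older Airflow versions):
from datetime import datetime

from airflow import DAG
from airflow.providers.papermill.operators.papermill import PapermillOperator

with DAG(
    dag_id="example_papermill",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_this = PapermillOperator(
        task_id="run_example_notebook",
        input_nb="/tmp/hello_world.ipynb",
        output_nb="/tmp/out-{{ execution_date }}.ipynb",
        parameters={"msgs": "Ran from Airflow at {{ execution_date }}!"},
    )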
We encountered this problem before and spent quite a few days solving it.
We packaged it as a Docker file and published it on GitHub: https://github.com/michaelchanwahyan/datalab.
It is done by modifying the open source package nbparameterize and integrating the passing of arguments such as execution_date. Graphs generated on the fly can also be updated and saved inside the notebook.
When it is executed:
the notebook is read and the parameters are injected
the notebook is executed and the output overwrites the original path
Besides, it also installs and configures common tools such as Spark, Keras, TensorFlow, etc.
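For illustration, the same read-inject-execute-and-overwrite flow can be sketched with plain Papermill (this is not the datalab tooling itself; the path and parameter name below are made up):
import papermill as pm

# execute the notebook in place: the output path equals the input path, so the
# stored notebook (including regenerated graphs) is overwritten by the executed copy
pm.execute_notebook(
    '/notebooks/report.ipynb',
    '/notebooks/report.ipynb',
    parameters={'execution_date': '2020-01-01'},
)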
Another alternative is to use Ploomber (disclaimer: I'm the author). It uses Papermill under the hood to build multi-stage pipelines. Tasks can be notebooks, scripts, functions, or any combination of them. You can run locally, on Airflow, or on Kubernetes (using Argo Workflows).
This is what a pipeline declaration looks like:
tasks:
  - source: notebook.ipynb
    product:
      nb: output.html
      data: output.csv
  - source: another.ipynb
    product:
      nb: another.html
      data: another.csv
Repository
Exporting to Airflow
Exporting to Kubernetes
Sample pipelines
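To run such a pipeline locally, the Ploomber CLI provides a build command (a sketch, assuming the spec above is saved as pipeline.yaml in the working directory):
$ pip install ploomber
$ ploomber build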

How to write MATLAB functions in Jupyter Notebook?

Overview
I am using the MATLAB kernel in Jupyter Notebook. I would like to write a function in the notebook, rather than referring to a function that is saved in another .m file. The problem is that when I try to do so, I get the error:
Error: Function definitions are not permitted in this context.
Visual example: in a new notebook, defining the function directly in a cell produces the error above.
It does work if I put the function in a new .m file and then call the function from the notebook, but this is inconvenient. Is there a way to define functions from within a Jupyter Notebook directly?
My software
MATLAB 2017b
Windows 10
Jupyter running in Chrome
Jupyter installed via Anaconda
The documentation indicates that you can use the magic:
%%file name_of_your_function.m
To take your example, your cell should be written as follows:
%%file fun.m
function out = fun(in)
out = in + 1;
end
This creates a new file called fun.m. This allows MATLAB to do what it needs (a function in a separate file), and also allows you to write your function directly in the Jupyter Notebook.

Can you `source` the `ipython profile` (Jupyter) while in a notebook?

The IPython profile or Jupyter profile path: ~/.ipython/profile_default/startup/startup.ipy
I update this quite often.
Is there a way to source this from within a notebook, like sourcing ~/.bash_profile in the terminal after you make an update? My current method is to close the kernel and the Jupyter session and then restart.
You can use %run to do this:
%run ~/.ipython/profile_default/startup/startup.ipy
If you do %run -i [script], then your current interactive namespace will be available to the script.
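For example, to re-run the startup file while giving it access to the notebook's current namespace:
%run -i ~/.ipython/profile_default/startup/startup.ipy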