I have two different Jupyter notebooks running on the same server. I would like to access a few of the variables of one notebook from the other notebook (basically, I have to check whether two different versions of the algorithm give the same results). Is there a way to do this?
Thanks
Between two Jupyter notebooks, you can use the %store magic command.
In the first Jupyter notebook:
data = 'string or data-table to pass'
%store data
del data # This will DELETE the data from the memory of the first notebook
In the second Jupyter notebook:
%store -r data
data
You can find more information here.
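As a side note (based on the IPython storemagic documentation, using hypothetical variable names data1 and data2), %store can also handle several variables at once, and %store -r with no variable names restores everything that has been stored:
%store data1 data2   # store several variables in one call
%store               # list the variables that are currently stored
%store -r            # in the other notebook: restore all stored variables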
If you only need something quick and dirty, you can use the pickle module to make the data persistent (save it to a file) and then load it from your other notebook. For example:
import pickle
a = ['test value','test value 2','test value 3']
# Choose a file name
file_name = "sharedfile"
# Open the file for writing
with open(file_name, 'wb') as my_file_obj:
    pickle.dump(a, my_file_obj)
# The file you have just saved can be opened in a different session
# (or IPython notebook) and the contents will be preserved.
# Now select the (same) file to open (e.g. in another notebook)
file_name = "sharedfile"
# Open the file for reading (binary mode, matching the way it was written)
with open(file_name, 'rb') as file_object:
    # Load the object from the file into var b
    b = pickle.load(file_object)
print(b)
>>> ['test value','test value 2','test value 3']
You can also use magic commands to do this. The cell magic %%cache in the IPython notebook can be used to cache the results and outputs of long-running computations in a persistent pickle file. It is useful when some computations in a notebook are long and you want to easily save the results in a file.
To use it in your notebook, you first need to install the module ipycache (for example with pip install ipycache), as this cell magic is not a built-in magic command.
Then load the extension in your notebook:
%load_ext ipycache
Then, create a cell with:
%%cache mycache.pkl var1 var2
var1 = 1 # you can put any code you want here,
var2 = 2 # just make sure this cell is not empty.
When you execute this cell the first time, the code is executed, and the variables var1 and var2 are saved in mycache.pkl in the current directory along with the outputs. Rich display outputs are only saved if you use the development version of IPython. When you execute this cell again, the code is skipped, the variables are loaded from the file and injected into the namespace, and the outputs are restored in the notebook.
Alternatively use $file_name instead of mycache.pkl, where file_name is a variable holding the path to the file used for caching.
Use the --force or -f option to force the cell's execution and overwrite the file.
Use the --read or -r option to prevent the cell's execution and always load the variables from the cache. An exception is raised if the file does not exist.
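For instance, to force the cell above to re-run and refresh its cache file, the cell would look like this (a sketch reusing the example above):
%%cache --force mycache.pkl var1 var2
var1 = 1
var2 = 2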
Reference: the ipycache GitHub repository and the example notebook.
I created and tested a Python wheel multiple times on my local machine and on Azure Databricks as a job, and it worked fine.
Now I'm trying to create an Azure Data Factory pipeline that triggers the wheel stored in Azure Databricks (dbfs:/..) every time a new file is stored in a blob storage container.
The wheel takes a parameter (-f) whose value is the new file name. In the previous tests I passed it to the wheel using argparse inside script.py and the parameters section of the Databricks job.
I created the pipeline and set two parameters, param and value, that I want to pass to the wheel; their values are -f and new-file.txt. See image here
Then I created a Databricks Python file in the ADF workspace and pasted the wheel path into the Python file section. Now I'm wondering if this is the right way to do it.
I passed the parameters in the way you can see in the image below, and I didn't add any library since I had already attached the wheel in the upper section (I also tried adding the wheel as a library, but nothing changed). See image here
I created the trigger for blob storage and checked that the parameters exist in the trigger JSON file. When trying to trigger the pipeline I received this error: See image here
I checked for errors in the code and changed the encoding to UTF-8 as suggested in other questions in the community, but nothing changed.
At this point, I think that either I didn't set up the blob storage trigger correctly or the wheel can't be attached the way I did it. I didn't add other resources to the workspace, so I only have the Databricks Python file.
Any advice is really appreciated, thanks for the help!
If I understand correctly, your goal is to launch a wheel package from a Databricks Python notebook using Azure Data Factory, calling the notebook via the Databricks Python activity.
I think the problem you are facing is in how the Python wheel is called from the notebook.
Here is an example I tried that is close to your needs, and it worked fine.
I created a hello.py script and put it at the path /dbfs/FileStore/jars/
Here is the content of hello.py (it just prints the provided argument):
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('-f', help='file', type=str)
args = parser.parse_args()
print('You provided the value : ', args.f)
I created a Python notebook on Databricks that takes arguments and passes them to the hello.py script.
This code defines the parameters that the notebook can take (these correspond to the parameters you pass via Azure Data Factory when calling the Databricks activity):
dbutils.widgets.text("value", "new_file.txt")
dbutils.widgets.text("param", "-f")
This code retrieves the parameters passed to the Databricks notebook:
param = dbutils.widgets.get("param")
value = dbutils.widgets.get("value")
And finally we call the hello.py script to execute our custom code, as follows:
!python /dbfs/FileStore/jars/hello.py $param $value
Pay attention to the ! at the beginning; it runs the line as a shell command from the notebook cell.
Hope this helps; don't forget to mark the answer :)
I tried the code below in the PyCharm IDE and found that the file is created in append mode.
In the course lecture we learned that a file would be created only with write mode ('w').
with open('xyz.txt', mode='a') as xyz_file:
    xyz_file.write('This file is created in append mode')

with open('xyz.txt', mode='r') as xyz_file:
    print(xyz_file.read())
I was aware that no file named xyz.txt existed in my Python working directory, yet with the code above it is created and the text is appended.
The Python open() function will create the file if the mode is 'w', 'a', or 'x'.
I believe that if you want to be able to write to a file but still get an error if the file does not already exist, the 'r+' mode will do that.
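Here is a small sketch illustrating the difference (the file names are arbitrary):
with open('appended.txt', mode='a') as f:
    # 'a' (like 'w' and 'x') creates the file if it does not exist
    f.write('created by append mode\n')

try:
    with open('missing.txt', mode='r+') as f:
        # 'r+' allows reading and writing, but raises if the file is absent
        f.write('will not get here')
except FileNotFoundError as err:
    print(err)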
I am trying to export a few charts to Excel (.xlsx format) through a QlikView macro and to save the file on post-reload at a particular location. The file works perfectly fine when it is run manually or from the batch (.bat) file on double click.
But when it is scheduled to run from the QlikView Management Console through the external file (.bat file), it generates the Excel extract but the file is blank. The error is:
Error: Paste method of Worksheet class failed
I have checked the permissions/location of the file and that is not the issue.
A post-reload trigger that saves charts via a macro will not work via the QMC (neither post-reload triggers nor front-end/chart manipulations work via the QMC).
To solve that, I do the following:
1) Set a reload in the QMC to refresh the data in your document.
2) After a successful reload, a second document triggers the macro from the first document to save the charts. This also gave me trouble because it created a conflict (you cannot open QlikView from QlikView, I know that sounds like nonsense), so in the second document I run the macro from the first one like this (via PsExec):
EXECUTE "C:\Qlikview\PROD APPLICATION\modules\scripts\edx\PsExec64.exe" *\\SERVER_NAME* -u *SERVER_NAME\User* -p *password* -i 1 -d -high cmd /c ""C:\Program Files\QlikView\qv.exe" "C:\Qlikview\PROD APPLICATION\modules\$(vDocument).qvw" /vvRun=yes
I use the variable vRun so that the on-open macro runs only when it is set to yes.
In the macro, the application is set to close after saving the charts:
ActiveDocument.UnlockAll
ActiveDocument.ClearAll true
ActiveDocument.Save
ActiveDocument.GetApplication.quit
end sub
I have one notebook named "paths" and I want to use some of the vals declared in it in another notebook, but I am getting an error. I am using Spark with Scala.
%run "/paths"
The error is:
Notebook not found: paths. Notebooks can be specified via a relative path (./Notebook or ../folder/Notebook) or via an absolute path (/Abs/Path/to/Notebook). Make sure you are specifying the path correctly.
Please use %run /Users/testuser/TESTParam
You can also pass additional parameters, like:
%run /Users/testuser/TESTParam $runtype = "Sequential" $configuration="xxx" $runnumber=1-10
You can run the code below to get the current notebook's absolute path:
dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
Then you can use %run <path>.
My Jupyter workflow for exploratory analysis looks like:
Fiddle with some parameters.
Run the notebook; generate output.
Eyeball outputs.
Repeat.
Can anyone suggest a command to make the notebook save a copy of itself (e.g. as an HTML file in the output folder), so that if I want to recreate a particular experiment (the results from a particular parameter set) I can do so?
Yes you can. Just add a save cell using cell magic: after using nbconvert, you can rename the file and append a date.
%%bash
jupyter nbconvert --to html MyNotebookName.ipynb
mv MyNotebookName.html $(date +"%m_%d_%Y-%H%M%S")_MyNotebookName.html
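If you prefer to keep it in Python, the same idea looks roughly like this (a minimal sketch, assuming jupyter nbconvert is available on the PATH; MyNotebookName is a placeholder for your notebook's name):
import subprocess
from datetime import datetime

notebook = "MyNotebookName"  # placeholder; use your notebook's name
stamp = datetime.now().strftime("%m_%d_%Y-%H%M%S")

# Convert the notebook to HTML with a timestamped base name
# (nbconvert appends the .html extension to the --output value)
subprocess.run(
    ["jupyter", "nbconvert", "--to", "html",
     "--output", f"{stamp}_{notebook}",
     f"{notebook}.ipynb"],
    check=True,
)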