I have a notebook named "paths" and I want to use some of the vals declared in it in another notebook, but I am getting an error. I am using Spark with Scala.
%run "/paths"
Error:
Notebook not found: paths. Notebooks can be specified via a relative path (./Notebook or ../folder/Notebook) or via an absolute path (/Abs/Path/to/Notebook). Make sure you are specifying the path correctly.
Please use %run /Users/testuser/TESTParam
You can also pass additional parameters, like:
%run /Users/testuser/TESTParam $runtype = "Sequential" $configuration="xxx" $runnumber=1-10
You can run the code below to get the current notebook's absolute path:
dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
Then you can use %run <path>.
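For reference, parameters passed with $name=value become widget values in the notebook you call. Here is a minimal sketch of how the called notebook could read them, assuming hypothetical widgets named after the parameters above (the dbutils.widgets.get call is the same in Scala):
# Hypothetical contents of the called notebook (/Users/testuser/TESTParam)
dbutils.widgets.text("runtype", "")
dbutils.widgets.text("configuration", "")
dbutils.widgets.text("runnumber", "")

runtype = dbutils.widgets.get("runtype")                # "Sequential"
configuration = dbutils.widgets.get("configuration")    # "xxx"
runnumber = dbutils.widgets.get("runnumber")            # "1-10"
print(runtype, configuration, runnumber)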
I created a Python wheel and tested it multiple times on my local machine and on Azure Databricks as a job, and it worked fine.
Now I'm trying to create an Azure Data Factory pipeline that triggers the wheel stored in Azure Databricks (dbfs:/..) every time a new file lands in a blob storage container.
The wheel takes a parameter (-f) whose value is the new file name. In the previous tests I passed it to the wheel using argparse inside script.py and the parameters section of the Databricks job.
I created the pipeline and set two parameters, param and value, that I want to pass to the wheel; their values are -f and new-file.txt. See image here
Then I created a Databricks Python file in the ADF workspace and pasted the wheel path into the Python file section. Now I'm wondering if this is the right way to do it.
I passed the parameters in the way you can see in the image below, and I didn't add any library since I've already attached the wheel in the upper section (I also tried adding the wheel as a library, but nothing changed). See image here
I've created the trigger for blob storage and checked that the parameters exist in the trigger JSON file. When trying to trigger the pipeline I received this error: See image here
I checked for errors in the code and changed the encoding to UTF-8 as suggested in other questions in the community, but nothing changed.
At this point, I think that either I didn't trigger the blob storage correctly or the wheel can't be attached the way I've done it. I didn't add other resources in the workspace, so I only have the Databricks Python file.
Any advice is really appreciated; thanks for the help!
If I understand correctly, your goal is to launch a wheel package from a Databricks Python notebook using Azure Data Factory, calling the notebook via the Databricks Python activity.
I think the problem you are facing occurs when calling the Python wheel from the notebook.
Here is an example I tried that is close to your needs, and it worked fine.
I created a hello.py script and put it at the path /dbfs/FileStore/jars/.
Here is the content of hello.py (it just prints the provided argument):
import argparse

# Parse the -f argument passed on the command line
parser = argparse.ArgumentParser()
parser.add_argument('-f', help='file', type=str)
args = parser.parse_args()

# Echo the value back so it shows up in the job/notebook output
print('You provided the value : ', args.f)
I created a Python notebook on Databricks that takes arguments and passes them to the hello.py script.
This code defines the parameters that the notebook can take (these correspond to the parameters you pass via Azure Data Factory when calling the Databricks activity):
dbutils.widgets.text("value", "new_file.txt")
dbutils.widgets.text("param", "-f")
This code retrieves the parameters passed to the Databricks notebook:
param = dbutils.widgets.get("param")
value = dbutils.widgets.get("value")
And finally we call the hello.py script to execute our custom code as follows:
!python /dbfs/FileStore/jars/hello.py $param $value
Pay attention to the ! at the beginning.
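If you prefer not to rely on the shell magic, a roughly equivalent sketch using Python's subprocess module (same hypothetical /dbfs/FileStore/jars/hello.py path as above) would be:
import subprocess

# Run the script with the widget values and raise if it exits with a non-zero code
result = subprocess.run(
    ["python", "/dbfs/FileStore/jars/hello.py", param, value],
    capture_output=True, text=True, check=True)
print(result.stdout)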
Hope this helps; don't forget to mark the answer :).
I'm trying to run my Spark program using the spark-submit command (I'm working with Scala). I specified the master address, the class name, the jar file with all dependencies, the input file and then the output file, but I'm getting an error:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Multiple sources found for csv
(org.apache.spark.sql.execution.datasources.v2.csv.CSVDataSourceV2,
org.apache.spark.sql.execution.datasources.csv.CSVFileFormat), please
specify the fully qualified class name.;
Here is a screenshot of this error. What is it about? How can I fix it?
Thank you
You have some warnings here as well.
If you run your fat jar with the correct permissions, you should get output like this from ./spark-submit.
Check whether you have correctly set the environment variables for Spark (in ~/.bashrc). Also check the source CSV file permissions; that may be the problem.
If you are running in a Linux environment, set the permissions for the source CSV folder as follows:
sudo chmod -R 777 /source_folder
After that, try running ./spark-submit with your fat jar again.
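Separately, the error message itself asks you to specify the fully qualified class name, so it may also be worth naming the CSV data source explicitly instead of using the short name "csv". A minimal sketch (PySpark syntax for illustration, with a placeholder input path; the Scala spark.read call is analogous, and the class name is taken from the error message):
# Use the fully qualified class from the error message instead of format("csv")
df = (spark.read
      .format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat")
      .option("header", "true")
      .load("/path/to/input.csv"))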
I have two different Jupyter notebooks running on the same server. What I would like to do is access some (only a few) of the variables of one notebook from the other notebook (basically, I have to check whether two different versions of the algorithm give the same results). Is there a way to do this?
Thanks
Between two Jupyter notebooks, you can use the %store magic command.
In the first Jupyter notebook:
data = 'string or data-table to pass'
%store data
del data # This will DELETE the data from the memory of the first notebook
In the second Jupyter notebook:
%store -r data
data
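A few other %store options that can be handy here (based on IPython's built-in storemagic):
%store            # list all stored variables
%store -r data    # restore only the variable "data"
%store -d data    # delete "data" from the store
%store -z         # clear the entire store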
You can find more information here.
If you only need something quick and dirty, you can use the pickle module to make the data persistent (save it to a file) and then have it picked up by your other notebook. For example:
import pickle
a = ['test value','test value 2','test value 3']
# Choose a file name
file_name = "sharedfile"
# Open the file for writing
with open(file_name, 'wb') as my_file_obj:
    pickle.dump(a, my_file_obj)
# The file you have just saved can be opened in a different session
# (or iPython notebook) and the contents will be preserved.
# Now select the (same) file to open (e.g. in another notebook);
# remember to import pickle there as well.
file_name = "sharedfile"
# Open the file for reading (binary mode, to match how it was written)
with open(file_name, 'rb') as file_object:
    # Load the object from the file into var b
    b = pickle.load(file_object)
print(b)
>>> ['test value','test value 2','test value 3']
You can use magic commands to do this. The cell magic %%cache in the IPython notebook can be used to cache the results and outputs of long-running computations in a persistent pickle file. This is useful when some computations in a notebook are long and you want to easily save the results to a file.
To use it in your notebook, you first need to install the ipycache module, since this cell magic is not built in.
Then load the extension in your notebook:
%load_ext ipycache
Then, create a cell with:
%%cache mycache.pkl var1 var2
var1 = 1  # you can put any code you want here,
var2 = 2  # just make sure this cell is not empty.
When you execute this cell the first time, the code is executed, and the variables var1 and var2 are saved in mycache.pkl in the current directory along with the outputs. Rich display outputs are only saved if you use the development version of IPython. When you execute this cell again, the code is skipped, the variables are loaded from the file and injected into the namespace, and the outputs are restored in the notebook.
Alternatively use $file_name instead of mycache.pkl, where file_name is a variable holding the path to the file used for caching.
Use the --force or -f option to force the cell's execution and overwrite the file.
Use the --read or -r option to prevent the cell's execution and always load the variables from the cache. An exception is raised if the file does not exist.
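For example, a cell using those flags might look like this (same hypothetical mycache.pkl file as above):
%%cache --force mycache.pkl var1 var2
# --force re-runs the code and overwrites mycache.pkl even if it already exists
var1 = 1
var2 = 2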
Reference: the GitHub repository of ipycache and its example notebook.
I am new to image processing, and what I am trying to do is resize an image and store it in tif format, but the command window reports an error saying "you don't have the permission to write".
My code is imwrite(B,'myNewFile.tif');
and after running it, it shows:
Error using imwrite (line 10)
Unable to open file "myNewFile.tif" for writing. You may not have write permission.
Do I have to create a file by the name 'myNewFile' before writing the above code?
As the error message states, you are trying to write the file myNewFile.tif to the current working directory. However, you do not have write permission in the current working directory. This is an OS issue, not a Matlab one.
What you can do is change the current working directory (using the cd command) and write the image to a different folder where you do have write permission.
Alternatively, you can supply a full path for the image file name, directing it to a folder where you have write permission:
imwrite( B, fullfile( '/path/to/where/you/can/write', 'myNewFile.tif' ) );
Here are links to the description of some Matlab commands that might help you:
pwd can be used to check what is your current working directory.
You can use cd to change the current working directory.
fullfile helps you construct file names and paths in a generic way without worrying about OS peculiarities.
This is my code:
filename_date = strcat('Maayanei_yeshua-IC_',file_date,'.pdf')
filenamepdf = strcat(filename,'.pdf')
rename(['C:\Users\user\Desktop\' filenamepdf],['C:\Users\user\Desktop\' filename_date]);
And I get the error:
<??? Error using ==> movefile The system cannot find the path specified.>
or
<??? Undefined function or method 'rename' for input arguments of type 'char'.>
I checked hundreds of times and the file is there... I don't know why it can't find it. Any help?
Use the command
doc rename
to discover that rename is for working with FTP servers, which you are not doing here. What you want is the movefile command, e.g. movefile(source, destination).
Use the help window brought up by helpwin to look up all the commands you are using.
Also, from the command prompt try
dir(['C:\Users\user\Desktop\' filenamepdf])
to verify the file you want to move exists.