Can't trigger a wheel in azure data factory - azure-data-factory

I created and tested a Python wheel several times on my local machine and on Azure Databricks as a job, and it worked fine.
Now I'm trying to create an Azure Data Factory pipeline that triggers the wheel stored in Azure Databricks (dbfs:/..) every time a new file lands in a blob storage container.
The wheel takes a parameter (-f) whose value is the new file name. In the previous tests I passed it to the wheel using argparse inside script.py and the parameters section of the Databricks job.
I created the pipeline and set two parameters, param and value, that I want to pass to the wheel; their values are -f and new-file.txt. See image here
Then I created a Databricks Python file activity in the ADF workspace and pasted the wheel path into the Python file section. I'm wondering whether this is the right way to do it.
I passed the parameters as you can see in the image below, and I didn't add any library since I had already attached the wheel in the upper section (I also tried adding the wheel as a library, but nothing changed). See image here
I created the trigger for blob storage and checked that the parameters exist in the trigger JSON file. When I try to trigger the pipeline I get this error: See image here
I checked the code for errors and changed the encoding to UTF-8 as suggested in other questions in the community, but nothing changes.
At this point I think that either I didn't set up the blob storage trigger correctly or the wheel can't be attached the way I did it. I haven't added other resources to the workspace, so I only have the Databricks Python file activity.
Any advice is really appreciated,
thanks for the help!

If I understand correctly, your goal is to launch a wheel package from a Databricks Python notebook using Azure Data Factory, calling the notebook via the Databricks Python activity.
I think the problem you are facing is in how the Python wheel is called from the notebook.
Here is an example I tried that is close to your needs, and it worked fine.
I created a hello.py script and put it at the path /dbfs/FileStore/jars/
Here is the content of hello.py (it just prints the provided argument):
import argparse

# parse the -f argument passed on the command line
parser = argparse.ArgumentParser()
parser.add_argument('-f', help='file', type=str)
args = parser.parse_args()
print('You provided the value:', args.f)
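To check the script locally before wiring it into the notebook, you can run it from a shell with the same flag that will eventually be passed from Data Factory, for example:
python hello.py -f new_file.txt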
I created a Python notebook on Databricks that takes arguments and passes them to the hello.py script.
This code defines the parameters the notebook can take (they correspond to the parameters you pass from Azure Data Factory when calling the Databricks activity):
dbutils.widgets.text("value", "new_file.txt")
dbutils.widgets.text("param", "-f")
This code retrieves the parameters passed to the Databricks notebook:
param = dbutils.widgets.get("param")
value = dbutils.widgets.get("value")
And finally, we call the hello.py Python script to execute our custom code as follows:
!python /dbfs/FileStore/jars/hello.py $param $value
Pay attention to the ! at the beginning.
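If you prefer not to rely on the shell escape, the same call can be made from a notebook cell with subprocess; this is just a minimal sketch using the same path and widget values as above:
import subprocess

# run hello.py with the parameters retrieved from the widgets
result = subprocess.run(
    ['python', '/dbfs/FileStore/jars/hello.py', param, value],
    capture_output=True, text=True, check=True
)
print(result.stdout)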
Hope this helps your needs, and don't forget to mark the answer :)

Related

List content of a directory in Spark code in Azure Synapse

In Databricks, the Scala command dbutils.fs.ls lists the contents of a directory. However, I'm working in a notebook in Azure Synapse, which doesn't have the dbutils package. What is the Spark command corresponding to dbutils.fs.ls?
%%scala
dbutils.fs.ls("abfss://container@datalake.dfs.core.windows.net/outputs/wrangleddata")
%%spark
// list the content of a directory. ????
Just use mssparkutils; it's a rough equivalent, and the main documentation page is here. A simple example:
mssparkutils.fs.ls("/")
mssparkutils.fs.ls("abfss://container@datalake.dfs.core.windows.net/outputs/wrangleddata")
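The call returns a list of file info objects, so you can loop over the results and read their properties directly. A small sketch for a PySpark cell (attribute names follow the mssparkutils documentation; double-check against your runtime):
# list the directory and print basic properties of each entry
files = mssparkutils.fs.ls('abfss://container@datalake.dfs.core.windows.net/outputs/wrangleddata')
for f in files:
    print(f.name, f.size, f.isDir)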

AzCopy ignore if source file is older

Is there an option to handle the following situation:
I have a pipeline with a Copy Files task in it, used to upload a static HTML file from git to blob storage. Everything works perfectly. But sometimes this file is changed in blob storage (using hosted application tools). So the question is: can I "detect" whether my git file is older than the target blob file and have the copy task skip it, leaving it untouched? My initial idea was to use Azure file copy and its "Optional Arguments" textbox; however, I couldn't find the required option in the documentation. Does it allow such things, or should this case be handled some other way?
I think you're looking for the ifSourceNewer value for the --overwrite option.
--overwrite string Overwrite the conflicting files and blobs at the destination if this flag is set to true. (default true) Possible values include true, false, prompt, and ifSourceNewer.
More info: azcopy copy - Options
Agree with ickvdbosch. The ifSourceNewer value for the --overwrite option could meet your requirements.
error: couldn't parse "ifSourceNewer" into a "OverwriteOption"
Based on my test, I could reproduce this issue in the Azure file copy task.
It seems that the ifSourceNewer value can't be set for the Overwrite option in the Azure file copy task.
Workaround: you could use a PowerShell task to run an azcopy script that uploads the files with --overwrite=ifSourceNewer.
For example:
azcopy copy "filepath" "BlobURLwithSASToken" --overwrite=ifSourceNewer --recursive
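If you end up scripting the copy yourself rather than using the built-in task, the same flag applies. A minimal Python sketch that shells out to azcopy (it assumes azcopy is on the PATH and that the placeholders below are replaced with your real path and SAS URL):
import subprocess

source = 'filepath'                  # local file or folder to upload
destination = 'BlobURLwithSASToken'  # blob/container URL including a SAS token

# copy only the files that are newer at the source than at the destination
subprocess.run(
    ['azcopy', 'copy', source, destination, '--overwrite=ifSourceNewer', '--recursive'],
    check=True
)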
For more detailed info, you could refer to this doc.
For the issue with the Azure file copy task, I suggest you submit a feedback ticket via the following link: Report task issues.

Macro to generate xlsx works fine manually but not from the batch through Qlikview Management Console

I am trying to export a few charts to Excel (.xlsx format) through a Qlikview macro and to save the file on post-reload at a particular location. The file works perfectly when the macro is run manually or from the batch file (.bat) on double click.
But when it is scheduled to run from the Qlikview Management Console through the external file (.bat), it generates the Excel extract but the file is blank. The error is:
Error: Paste method of Worksheet class failed
I have checked the permissions/location of the file and that is not the issue.
A postreload trigger that saves charts via a macro will not work via QMC (neither postreload triggers nor frontend/chart manipulations work via QMC).
To solve that I do the following:
1) Set a reload in QMC to refresh the data in your document.
2) After a successful reload, a second document triggers the macro from the first document to save the charts. That also gave me trouble because it generated a conflict (you cannot open Qlikview from Qlikview... I know that it sounds like nonsense), so in the second document I run the macro from the first one like this (via PsExec):
EXECUTE "C:\Qlikview\PROD APPLICATION\modules\scripts\edx\PsExec64.exe" *\\SERVER_NAME* -u *SERVER_NAME\User* -p *password* -i 1 -d -high cmd /c ""C:\Program Files\QlikView\qv.exe" "C:\Qlikview\PROD APPLICATION\modules\$(vDocument).qvw" /vvRun=yes
I use the variable vRun to specify that the on-open macro runs only when it is set to yes,
and in the macro the app is closed after saving the charts:
Sub SaveChartsAndQuit ' sub name assumed; the original snippet showed only the body
  ActiveDocument.UnlockAll
  ActiveDocument.ClearAll true
  ActiveDocument.Save
  ActiveDocument.GetApplication.Quit
End Sub

Jenkins Pipeline - Create file in workspace (Windows Slave)

For a number of reasons, it would be really useful if I could create a file from a Jenkins pipeline and put it in my workspace. If I could do this, I could avoid pulling in some repositories that I currently pull in for just one or two files, keep those files in a maintainable place, and also use this to create temporary PowerShell scripts, working around a limitation of the solution described in https://stackoverflow.com/a/42576572
This might be possible through a Pipeline utility, although https://jenkins.io/doc/pipeline/steps/pipeline-utility-steps/ doesn't list any such utility; or it might be possible using a batch script - as long as that can be passed in as a string
You can do something like this:
node('') {
  stage('test') {
    // create a file in the workspace from a Windows batch step
    bat 'echo something > file.txt'
    // read the file back into a Groovy variable
    String out = readFile('file.txt').trim()
    print out           // prints the variable, Groovy style
    bat "echo ${out}"   // the Groovy variable can be interpolated into later batch steps
    // if the file held a Groovy script, it could be loaded and its functions called via: load 'file.groovy'
  }
}

Jenkins Powershell Output

I would like to capture the output of some variables to be used elsewhere in the job, using the Jenkins PowerShell plugin.
Is this possible?
My goal is to somehow build the latest tag, and the PowerShell script was meant to achieve that. Outputting to a text file would not help, and environment variables can't be used because the process is seemingly forked, unfortunately.
Besides EnvInject, the other common approach for sharing data between build steps is to store results in files located in the job workspace.
The idea is to skip using environment variables altogether and just write/read files.
It seems that the only solution is to combine this with the EnvInject plugin. You can create a text file with key-value pairs from PowerShell and then export them into the build using the EnvInject plugin.
You should make the workspace persistent for this job; then you can save the data you need to a file. Other jobs can then access this persistent workspace, or use it as their own, as long as they are on the same node.
Another option would be to use Jenkins' built-in artifact retention: at the end of the job's configure page there is an option to retain files matching a pattern (e.g. *.xml or last_build_number). These are then given a specific address that can be used by other jobs regardless of which node they are on; the address can be on the master or the node, IIRC.
For the simple case of wanting to read a single object from PowerShell, you can convert it to a JSON string in PowerShell and then convert it back in Groovy. Here's an example:
// run PowerShell and capture its stdout as a JSON string
def pathsJSON = powershell(returnStdout: true, script: "ConvertTo-Json ((Get-ChildItem -Path *.txt) | select -Property Name)")
def paths = []
if (pathsJSON != '') {
  // parse the JSON back into a Groovy object (readJSON is from the Pipeline Utility Steps plugin)
  paths = readJSON text: pathsJSON
}