How to use the output of a Databricks activity in a future activity inside ADF? - azure-data-factory

I have a Databricks activity in ADF and I pass the output with the code below:
dbutils.notebook.exit(message_json)
Now I want to use this output in the next Databricks activity.
From my search, I think I should add the last output to the base parameters of the second activity. Am I right?
And another question: how can I use this output inside the Databricks notebook?
Edit: the output is JSON, as in the screenshot below.

As per the documentation, you can consume the output of a Databricks Notebook activity in Data Factory by using an expression such as @{activity('databricks notebook activity name').output.runOutput}.
If you are passing a JSON object, you can retrieve values by appending property names.
Example: @{activity('databricks notebook activity name').output.runOutput.PropertyName}.
I reproduced the scenario and it works fine.
Below is the sample notebook.
import json

# Build a sample list, serialize it to JSON, and return it to ADF as the notebook's exit value
dates = ['2017-12-11', '2017-12-10', '2017-12-09', '2017-12-08', '2017-12-07']
return_json = json.dumps(dates)
dbutils.notebook.exit(return_json)
This is how the Notebook2 activity settings look:
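In essence, the Notebook2 activity receives the first notebook's exit value as a base parameter. A minimal sketch, assuming the first activity is named Notebook1 and the base parameter is named input (both names are placeholders, not from the original post). Base parameter value on the Notebook2 activity:
@{activity('Notebook1').output.runOutput}
Inside the second notebook, read the parameter and parse the JSON:
import json

# "input" is the assumed base parameter name; ADF exposes base parameters as notebook widgets
message = dbutils.widgets.get("input")
dates = json.loads(message)
print(dates)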
Pipeline ran successfully.

Related

ADF Copy only when a new CSV file is placed in the source and copy to the Container

I want to copy the file from the source to the target container, but only when the source file is new
(the latest file is placed in the source). I am not sure how to proceed with this, and I am not sure about the syntax to check whether the source file is newer than the target. Should I use two Get Metadata activities to check the source and target last-modified dates and use an If Condition? I tried a few ways but it didn't work.
Any help will be handy.
The syntax I used for the condition gives me this error:
@if(greaterOrEquals(ticks(activity('Get Metadata_File').output.lastModified),activity('Get Metadata_File2')),True,False)
Error message:
The function 'greaterOrEquals' expects all of its parameters to be either integer or decimal numbers. Found invalid parameter types: 'Object'
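(For context, the error occurs because the second argument passed to greaterOrEquals is the whole Get Metadata_File2 activity object rather than a number. A corrected comparison, assuming both Get Metadata activities return a lastModified field, would look something like the following.)
@if(greaterOrEquals(ticks(activity('Get Metadata_File').output.lastModified),ticks(activity('Get Metadata_File2').output.lastModified)),true,false)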
You can try one of the Pipeline Templates that ADF offers.
Use this template to copy new and changed files only by using LastModifiedDate. This template first selects the new and changed files only by their attribute "LastModifiedDate", and then copies them from the data source store to the data destination store. You can also go to the "Copy Data Tool" to get a pipeline for the same scenario with more connectors.
View documentation
OR...
You can use a Storage Event Trigger to run the pipeline with the copy activity whenever a new file is written to storage.
Follow detailed example here: Create a trigger that runs a pipeline in response to a storage event
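As a rough illustration (the trigger name, container path, and pipeline name below are placeholders, not from the original answer), a blob event trigger definition looks something like this:
{
    "name": "NewSourceFileTrigger",
    "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
            "blobPathBeginsWith": "/source/blobs/",
            "ignoreEmptyBlobs": true,
            "events": [ "Microsoft.Storage.BlobCreated" ],
            "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyNewFilePipeline",
                    "type": "PipelineReference"
                }
            }
        ]
    }
}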

Flutter | Retrieve ffprobe data

I'm using the flutter_ffmpeg package; specifically, I'm trying to retrieve information about the chapter marks from a .m4b file, which is an audiobook, by using this method:
_flutterFFmpeg.executeWithArguments(['-i', widget.book.path, '-print_format', 'json', '-show_chapters', '-loglevel', 'error']);
I was able to output this data as a JSON map in the console. The thing is, I need to use this data inside my application. Is there a way to get access to those chapters as a variable using another approach, or maybe to access this data directly from the console log printed by the method shown earlier?

Azure Data Factory Passing a pipeline-processed file as a mail attachment with logic app

I have an ADF pipeline moving a file to blob storage. I am trying to pass the processed file as a parameter to my web activity so that I can use it as an email attachment. I am successfully passing the following parameters:
{
"Title":"Error File Received From MOE",
"Message": "This is a test message.",
"DataFactoryName":"#{pipeline().DataFactory}",
"PipelineName":"#{pipeline().Pipeline}",
"PipelineRunId":"#{pipeline().RunId}",
"Time":"#{utcnow()}",
"File":????????????????????????????
}
But, how should I specify the path to the file I just processed within the same pipeline?
Any help would be greatly appreciated,
Thanks
Eric
I'm copying data to an output container. My current assumption is that one file is uploaded per day, and I then use two Get Metadata activities to get the lastModified attribute of the files and filter out the name of the most recently uploaded file.
Get the child items in the Get Metadata1 activity.
Then, in the ForEach activity, get the child items via the dynamic content @activity('Get Metadata1').output.childItems.
Inside the ForEach activity, in the Get Metadata2 activity, point the json4 dataset at the output container.
Enter the dynamic content @item().name to iterate over the filename list.
In the If Condition activity, use @equals(dayOfMonth(activity('Get Metadata2').output.lastModified),dayOfMonth(utcnow())) to determine whether the file was uploaded today.
In the True activity, add the dynamic content @concat('https://{account}.blob.core.windows.net/{Path}/',item().name) to assign the value to the variable.
The output is as follows:
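With the variable populated in the True branch, the web activity body from the question can then reference it; a sketch assuming the variable is named fileURL (a placeholder name, not from the original post):
"File":"@{variables('fileURL')}"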

Debugging ADF and Databricks with display dataframes

I have ADF pipelines that call an Azure Databricks notebook. I want to call an ADF pipeline in normal mode (high performance) and then in debug mode.
When in debug mode, I want to display some DFs (DataFrames) in Databricks. But when run normally, the DFs should not be displayed.
To achieve this I am thinking of sending a parameter from ADF (debug=true) and letting the display happen inside an 'if' condition in the Databricks notebook. Is this the recommended approach, or are there built-in functionalities in Databricks or ADF?
If I understand the ask, you are trying to capture whether the pipeline was initiated in debug mode or by a scheduled trigger. I think you can use the expression
@pipeline().TriggerName.
At least when I tested in debug mode it shows the value "Sandbox", while for a scheduled run it shows the name of the trigger that started the pipeline.
You can pass this as a parameter to the notebook and put an IF statement around your logic.
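A minimal notebook sketch of that idea (the base parameter name trigger_name and the DataFrame df are placeholders, not from the original answer):
# Read the trigger name passed in from ADF as a base parameter
trigger_name = dbutils.widgets.get("trigger_name")

# Debug runs report the trigger name as "Sandbox"; only display DataFrames then
if trigger_name == "Sandbox":
    display(df)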
HTH

Executing an SPSS stream with Python

I need help with a Python script to run a stream in SPSS Modeler. I'm currently using the code below to export data into an Excel file and it works, but it requires one manual step before the data is exported to the Excel file.
stream = modeler.script.stream()                 # get the current stream in SPSS Modeler
output1 = stream.findByType("excelexport", "1")  # find the Excel export node named "1"
results = []                                     # then run the whole stream
output1.run(results)                             # but here I need to press a button to finish execution (have a look at the screenshots)
output1 = stream.findByType("excelexport", "2")  # this is the next step
results = []
output1.run(results)
I would like to fully automate the stream. Please help me! Thanks a lot!
I can help you only with the legacy script. I have several Excel export nodes in my streams, and they are saved according to the month and year of reference.
set Excel_1.full_filename = "\\PATH\TO\FILE.xlsx"
execute Excel_1
Take a look at the image example, because Stack Overflow is mangling the written code.
And to fully automate it, you also have to set all the passwords in the initial nodes, for example:
set Database.username = "TEST"
set Database.password = "PASSWORD"
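If your Modeler version uses the Python scripting API rather than legacy script, the same idea is to set the node's full_filename property from the script before running it, so no manual step is needed. A minimal sketch, assuming the node labels "1" and "2" from the question and a made-up output path:
stream = modeler.script.stream()

# Point each Excel export node at its output file, then run it,
# so the node dialog never has to be touched by hand
for label in ["1", "2"]:
    node = stream.findByType("excelexport", label)
    node.setPropertyValue("full_filename", "C:/exports/report_" + label + ".xlsx")
    node.run([])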
In the stream properties window -> Execution tab, have you selected 'Run this script' on stream execution? If you make this selection, you can run the stream and produce your output without even opening the SPSS Modeler user interface (via Modeler batch).