Stop ADF pipeline execution if there is no data

I need to stop the execution of an ADF pipeline if there is no data in a table, but this should not generate an error; it should only stop the execution. Is this possible?

You can use an If Condition activity: first validate whether there is any data in the table (for example with a Lookup activity); if yes, run the remaining activities in the True branch, otherwise do nothing.
The pipeline would exit without any issues.
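As a minimal sketch, assuming a Lookup activity named Lookup1 that runs a SELECT COUNT(*) AS cnt query against the table with "First row only" enabled, the If Condition expression could be:
@greater(activity('Lookup1').output.firstRow.cnt, 0)
Put the downstream activities in the True branch and leave the False branch empty; when the count is 0, the pipeline simply completes successfully without doing anything.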


Trigger Date for reruns

My pipeline's activities need the date of the run as a parameter. Currently I get the current date inside the pipeline from the utcnow() function. Ideally this would be something I could set dynamically on the trigger, so that rerunning a failed day would set the parameter correctly; right now a rerun executes the pipeline again, but with today's date instead of the failed run's date.
I am used to Airflow, where such things are pretty easy to do, including scheduling reruns. Probably I am thinking too much in Airflow terms, but I can't wrap my head around a better solution.
In ADF, there is no direct support for passing the date on which a pipeline failed back into the trigger.
You can get the trigger time using @pipeline().TriggerTime.
This system variable gives the time at which the trigger started the pipeline run.
You can store this trigger value for every pipeline run and use it as a parameter when rerunning the pipeline that failed.
Reference: Microsoft documentation on system variables in ADF
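One common pattern, sketched here under the assumption that the pipeline has a parameter named RunDate and is started by a schedule trigger, is to default that parameter to the trigger time and reference the parameter instead of utcnow() inside the pipeline. In the trigger definition, the parameter value would be:
@trigger().scheduledTime
and inside the pipeline activities you would reference:
@pipeline().parameters.RunDate
For a rerun of a failed day, you can then start the pipeline manually and type the failed date into RunDate.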
To work around my problem, I created a nested structure of pipelines: the top pipeline sets a variable for the date and then calls the other pipelines, passing that variable along.
With this I still can't rerun the top pipeline, but rerunning Execute Pipeline 1/2/3 reruns them with the right variable set. It is still not perfect, since the top pipeline run stays in an error state and it is difficult to keep track of what needs to be rerun, but it is a partial solution.
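A minimal sketch of that nested structure, with assumed names: in the top pipeline, a Set variable activity sets a variable RunDate to
@formatDateTime(pipeline().TriggerTime, 'yyyy-MM-dd')
and each Execute Pipeline activity passes
@variables('RunDate')
as the value of the child pipeline's RunDate parameter. Rerunning an individual Execute Pipeline activity then reuses the value that was set in that run.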

Azure Data Factory: Get the result of a query on a Databricks notebook to create a condition

I want the result of a query in a Databricks notebook to be the success or failure condition of the pipeline, so that the pipeline reprocesses, for example, the Copy data activity in Azure Data Factory.
For example:
If x = 1, terminate the pipeline; if not, reprocess (with a limit of 3 attempts).
What's the best way to do this?
You can do this with the help of If Condition and Until activities in ADF.
Please go through the sample demonstration below.
This is the sample notebook code from Databricks:
# your code that produces x
x = 1
# return the value to ADF; it will appear as activity('Notebook1').output.runOutput
dbutils.notebook.exit(x)
In ADF, first create an array variable that will be used in the Until activity.
The length of this array controls how many times the reprocessing runs.
Next, add your Databricks Notebook activity.
Now add an If Condition activity and give it the expression below:
@equals(activity('Notebook1').output.runOutput, 1)
If this evaluates to true, the pipeline has to be terminated, so add a Fail activity in the True branch of the If Condition.
Here you can give any error message you want.
Leave the False branch of the If Condition empty.
Now add an Until activity and connect the success output of the If Condition to it.
Inside the Until activity you can place any activity; if you want to reprocess another pipeline, you can use an Execute Pipeline activity. Here I have used a Copy activity.
After the Copy activity, add an Append variable activity, select the array variable defined earlier, and append any single value to it.
Now give the expression below in the Until condition:
@equals(length(variables('iter')), 4)
So the activities inside the Until will be reprocessed 3 times when x != 1.
If x = 1 in the notebook, the pipeline fails and terminates at the If Condition.
If x != 1 in the notebook, the Until reprocesses the Copy activity 3 times.
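One thing to keep in mind, as an aside: dbutils.notebook.exit() returns its value to ADF as a string in runOutput, so depending on how the notebook exits you may need to compare against a string instead, for example:
@equals(activity('Notebook1').output.runOutput, '1')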

ADF fail error. Property 'output' cannot be selected

I have a Fail activity inside my Until activity.
On the Until I have an expression that fails the pipeline if an error is encountered inside it:
@or(greater(variables('RunDate'), utcnow()), activity('Fail').output)
I've been getting this error for the last few days. I have 4 other pipelines that work the same way, but only this one gives me this error.
I'm not sure what this error means. It seems it can't select the output of the Fail activity, but I'm not sure why.
You want to stop the Until activity if any of your child activities fails.
I reproduced a similar scenario, used the Fail activity output in the Until condition, and got the same error.
This is the scenario inside my Until that causes the error.
You will get this error when all of the inner activities inside the Until succeed. Because every activity succeeded, the pipeline run does not register the Fail activity in that iteration, and when the flow reaches the Until condition it fails because it knows nothing about the Fail activity.
Even in an iteration where one of your activities fails and the Fail activity executes, you will still get an error in the Until condition, because activity('Fail').output returns a JSON object, which does not fit the or() logic that expects a boolean value.
So you cannot use the Fail activity output like that in the Until condition.
The workaround is to use two Set variable activities inside the Until.
The Until activity stops executing its inner activities when its condition becomes true; it keeps executing as long as the condition is false.
First create a pipeline variable of boolean type.
Set the variable to false when you want to continue execution after success; in your case, connect the success output of your last activity to this Set variable activity.
Set the variable to true when you want to stop execution after a failure, combined with your date condition.
Now build the Until condition as per your requirement using or(your date condition, boolean variable), or use and() if that fits better.
Just make sure the condition evaluates to true when you want to stop and false when you want to continue.
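As a sketch, assuming the boolean variable is named HasFailed and the date variable is RunDate as in the question, the Until condition could be:
@or(greater(variables('RunDate'), utcnow()), variables('HasFailed'))
with the Set variable activity on the success path setting HasFailed to false and the one on the failure path setting it to true.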
NOTE: In my repro, the Until only raises the error "Activity failed because an inner activity failed" when a child activity fails in the final iteration, not in every iteration.

Run an Azure Data Factory Pipeline Continuously

I have a requirement to incrementally copy data from one SQL table to another SQL table. The watermark or key column is an Identity column. My boss wants me to restart the load as soon as it's done...and as you know, the completion time may vary. In Azure Data Factory, the trigger options are Scheduled, Tumbling Window and Custom Event. Does anyone know which option would allow me to achieve this continuous running of the pipeline and how to configure it?
Create a new pipeline and call it "run forever". Add an Until activity with a never-true condition, e.g. @equals(1, 2).
Inside the Until, use an Execute Pipeline activity to run the pipeline that copies between the tables. Ensure "Wait on completion" is ticked.
If the table copy fails, "run forever" will fail and will have to be restarted manually. You likely do not want "run forever" to be scheduled, because the scheduled invocations will queue up if it does not, in fact, fail.
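A minimal sketch of that "run forever" pipeline, with assumed names: the Until condition is
@equals(1, 2)
and inside the Until there is a single Execute Pipeline activity pointing at the incremental copy pipeline with "Wait on completion" ticked; optionally, a Wait activity after it adds a short pause between loads.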
Pipelines in ADF are batch-based. You can set up "micro batches" of 1 minute with a schedule trigger or 5 minutes with a tumbling window trigger.
ADF always runs as a batch. If you want to load the data continuously, consider Stream Analytics / Event Hubs, which handle real-time data.

Do pipeline variables persist between runs?

I'm building a simple data flow pipeline between two Cosmos DBs. The pipeline starts with the data flow, which grabs the pipeline variable "LastPipelineStartTime" and passes it as a parameter to the data flow, where the query uses it to get all new data with c._ts >= "LastPipelineStartTime". Then, on data flow success, a Set Variable activity updates the variable to pipeline().TriggerTime. Essentially, I'm always grabbing only the new data between pipeline runs.
My question is: it looks like the variable reverts back to its default value of 0 on each debug run, and so it grabs everything each time. Am I misunderstanding or misusing pipeline variables? Thanks!
As far as I know, a variable set in the Set Variable activity has its own life cycle: it exists only during the current execution of the pipeline. Any change to a variable does not persist into the next run.
To implement your needs, please refer to the workarounds below:
1. If you execute the ADF pipeline on a schedule, you can just pass the schedule time as a parameter into it to make sure you grab only new data.
2. If the frequency is random, persist the trigger time somewhere else (e.g. a simple file in blob storage) and, before the data flow activity, use a Lookup activity to grab that time from the blob storage file.
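A sketch of the second workaround, with assumed names: keep a small JSON file such as lastrun.json in blob storage containing { "LastPipelineStartTime": "2020-01-01T00:00:00Z" }. A Lookup activity named LookupLastRun reads that file, and the data flow parameter is set to
@activity('LookupLastRun').output.firstRow.LastPipelineStartTime
At the end of a successful run, write @pipeline().TriggerTime back into the file (for example with a Copy activity or an Azure Function) so the next run picks up where this one left off.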